Add v4 C7/C6 fast classify and small-segment v4 scaffolding
52
docs/analysis/PF_STATUS_V4_202502.md
Normal file
@@ -0,0 +1,52 @@
# PF/OS baseline (PF2, small-object v4 status)

- Command (Release, v4: C7+C6 forced onto v4, v3 OFF):
```
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
HAKMEM_BENCH_MIN_SIZE=16 \
HAKMEM_BENCH_MAX_SIZE=1024 \
HAKMEM_SS_OS_STATS=1 \
HAKMEM_SMALL_HEAP_V4_ENABLED=1 \
HAKMEM_SMALL_HEAP_V4_CLASSES=0xC0 \
HAKMEM_SMALL_HEAP_V3_ENABLED=0 \
perf stat -e cycles,instructions,task-clock,page-faults \
./bench_random_mixed_hakmem 1000000 400 1
```
- Results (environment: Release build, ws=400, iters=1M):
  - Throughput: **31,779,973 ops/s** (time=0.031s)
  - perf stat: cycles=205,322,023 / instructions=385,092,104 / task-clock=51.40ms / page-faults=6,702
  - `[SS_OS_STATS]`: alloc=2 free=4 madvise=2 madvise_enomem=0 madvise_disabled=0 mmap_total=2
- Remarks:
  - This is the pf/OS baseline with v4 forced for C7+C6. Throughput is lower than the v3 baseline (~40M ops/s), but the page-fault count and OS stats are fixed here as the starting point for PF2.
  - In upcoming A/B runs that wire in SmallSegmentBox_v4, the metric is how far page-faults and SS_OS_STATS can be reduced from these values.
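
To make later A/B comparisons easier to read, the raw counters above can be normalized per operation. The following is a minimal sketch, assuming the 1,000,000 iterations passed to the benchmark equal the number of operations; `perf_derive` is a hypothetical helper, not part of the HAKMEM tree. Because `perf stat` counts the whole process (startup included), the per-op figures are upper bounds rather than steady-state costs.

```shell
# Hypothetical helper (not part of HAKMEM): derive normalized metrics
# from whole-process perf stat counters.
#   $1 = cycles, $2 = instructions, $3 = page-faults, $4 = iterations
# Assumption: one benchmark iteration counts as one operation.
perf_derive() {
  awk -v c="$1" -v i="$2" -v pf="$3" -v n="$4" 'BEGIN {
    printf "ipc=%.2f cycles_per_op=%.1f pf_per_1k_ops=%.2f\n",
           i / c, c / n, pf * 1000 / n
  }'
}

# Counters copied from the Release run above.
perf_derive 205322023 385092104 6702 1000000
# -> ipc=1.88 cycles_per_op=205.3 pf_per_1k_ops=6.70
```

Running the same helper on a SmallSegmentBox_v4 A/B run would give directly comparable `pf_per_1k_ops` values, provided the iteration count and working set stay the same.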

## DEBUG perf (cycles:u, -O0/-g, v4=C7+C6)

- Build:
```
make clean
CFLAGS='-O0 -g' USE_LTO=0 OPT_LEVEL=0 NATIVE=0 make bench_random_mixed_hakmem -j4
```
- Command:
```
HAKMEM_PROFILE=DEBUG_TINY_FRONT_PERF \
HAKMEM_BENCH_MIN_SIZE=16 \
HAKMEM_BENCH_MAX_SIZE=1024 \
HAKMEM_SMALL_HEAP_V4_ENABLED=1 \
HAKMEM_SMALL_HEAP_V4_CLASSES=0xC0 \
HAKMEM_SMALL_HEAP_V3_ENABLED=0 \
perf record -F 5000 --call-graph dwarf -e cycles:u \
-o perf.data.pf_v4 ./bench_random_mixed_hakmem 1000000 400 1
```
- Throughput: **15,173,790 ops/s** (DEBUG, ws=400, iters=1M, v4=C7+C6)
- Top self% (perf report --stdio):
  - free 14.37% (of which 3.39% is inside small_heap_free_fast_v4)
  - tiny_alloc_gate_fast 13.33%
  - main 12.93%
  - malloc 7.09%
  - ss_map_lookup 4.97% / hak_super_registry_init + memset combined ~4.5%
  - small_heap_alloc_fast_v4 2.23%
  - hak_tiny_size_to_class 2.21% / tiny_route_get 2.34% / front_gate_unified_enabled 2.36% / tiny_route_is_heap_kind 2.09%
  - xorshift32 2.08%
- Notes:
  - Even with v4 forced, gate/classify/ss_map_lookup still stand out. PF3 will check whether they drop on their own, together with page faults, once the Segment/OS side is in place.
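
To put a number on "still stand out", the self% entries for the front-end path can be added up. This is a rough sketch: the grouping of symbols into gate/classify/route/lookup is an assumption based on their names, and `sum_self_pct` is a hypothetical helper rather than anything shipped with HAKMEM or perf.

```shell
# Hypothetical helper: sum a list of self% values (one per argument).
sum_self_pct() {
  printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.2f\n", s }'
}

# Assumed front-end group, values copied from the list above:
#   tiny_alloc_gate_fast 13.33, ss_map_lookup 4.97,
#   hak_tiny_size_to_class 2.21, tiny_route_get 2.34,
#   front_gate_unified_enabled 2.36, tiny_route_is_heap_kind 2.09
sum_self_pct 13.33 4.97 2.21 2.34 2.36 2.09
# -> 27.30
```

Under that grouping the front-end path accounts for roughly 27% of user-space cycles in this -O0 build, against 2.23% for small_heap_alloc_fast_v4 itself. The -O0 build disables inlining, so the split may look different in a Release profile.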