# PF/OS baseline (PF2, small-object v4 status)
- Command (Release; v4: C7+C6 forced onto v4, v3 OFF):
```
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
HAKMEM_BENCH_MIN_SIZE=16 \
HAKMEM_BENCH_MAX_SIZE=1024 \
HAKMEM_SS_OS_STATS=1 \
HAKMEM_SMALL_HEAP_V4_ENABLED=1 \
HAKMEM_SMALL_HEAP_V4_CLASSES=0xC0 \
HAKMEM_SMALL_HEAP_V3_ENABLED=0 \
perf stat -e cycles,instructions,task-clock,page-faults \
./bench_random_mixed_hakmem 1000000 400 1
```
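`HAKMEM_SMALL_HEAP_V4_CLASSES=0xC0` reads as a per-class bitmask: 0xC0 = 0b11000000 sets bits 6 and 7, which matches "C7+C6". The POSIX sh sketch below decodes and builds such masks. It assumes that bit i selects tiny class Ci and that the classes run from C0 to C7. The helper names are hypothetical and are not part of hakmem.

```sh
#!/bin/sh
# Assumption: bit i of HAKMEM_SMALL_HEAP_V4_CLASSES selects tiny class Ci,
# with classes C0..C7 (inferred from this note, where 0xC0 means C7+C6).
# Variables are global because POSIX sh has no `local`.

# Print the classes selected by MASK, e.g. "C6 C7" for 0xC0.
v4_mask_to_classes() {   # usage: v4_mask_to_classes MASK
  mask=$(($1))           # arithmetic expansion accepts 0x-prefixed hex
  out=''
  c=0
  while [ "$c" -lt 8 ]; do
    if [ $(( (mask >> c) & 1 )) -eq 1 ]; then
      out="${out:+$out }C$c"   # append with a separating space
    fi
    c=$((c + 1))
  done
  printf '%s\n' "$out"
}

# Build a mask from class indices, e.g. "6 7" -> 0xC0.
v4_classes_to_mask() {   # usage: v4_classes_to_mask INDEX...
  mask=0
  for c in "$@"; do
    mask=$(( mask | (1 << c) ))
  done
  printf '0x%02X\n' "$mask"
}

v4_mask_to_classes 0xC0
v4_classes_to_mask 6 7
```

`v4_mask_to_classes 0xC0` prints `C6 C7`, and `v4_classes_to_mask 6 7` prints `0xC0`.
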
- Results (environment: Release build, ws=400, iters=1M):
  - Throughput: **31,779,973 ops/s** (time=0.031s)
  - perf stat: cycles=205,322,023 / instructions=385,092,104 / task-clock=51.40 ms / page-faults=6,702
  - `[SS_OS_STATS]`: alloc=2 free=4 madvise=2 madvise_enomem=0 madvise_disabled=0 mmap_total=2
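- Observations:
  - These are the pf/OS reference values with v4 forced for C7+C6. Throughput is lower than the v3 baseline (~40M ops/s, so roughly 20% lower), but the page-fault count and OS stats are fixed here as the starting point for PF2.
  - In the upcoming A/B runs that wire in SmallSegmentBox_v4, the metric is how far page-faults and SS_OS_STATS can be reduced from these values.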
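
The recorded totals can be reduced to per-operation ratios, and the planned A/B runs need a signed comparison against this baseline. The POSIX sh/awk sketch below does both. The helper names (`pf_derive`, `ss_os_stat`, `pct_change`) are hypothetical. The sketch also makes two assumptions: ops = iters = 1,000,000, and the `[SS_OS_STATS]` line consists of whitespace-separated `key=value` tokens as transcribed here. Because `perf stat` counts the whole process (task-clock is 51.40 ms against the benchmark's own 0.031 s), the per-op figures include startup and overstate the hot-loop cost.

```sh
#!/bin/sh
# Hypothetical helpers for the PF2 baseline; names are illustrative only.

# Per-op ratios from perf stat totals.
# Assumption: OPS equals the iteration count (1,000,000 here).
pf_derive() {   # usage: pf_derive CYCLES INSTRUCTIONS PAGE_FAULTS OPS
  awk -v c="$1" -v i="$2" -v p="$3" -v n="$4" 'BEGIN {
    printf "IPC=%.2f cycles/op=%.1f instr/op=%.1f pf/1k-ops=%.2f\n",
           i / c, c / n, i / n, p * 1000 / n
  }'
}

# Print KEY's value from an [SS_OS_STATS] line; exit status 1 if absent.
# Assumption: the line consists of whitespace-separated key=value tokens.
ss_os_stat() {  # usage: ss_os_stat KEY "LINE"
  printf '%s\n' "$2" | awk -v k="$1" '
    { for (f = 1; f <= NF; f++)
        if (index($f, k "=") == 1) { print substr($f, length(k) + 2); found = 1; exit } }
    END { exit !found }'
}

# Signed percentage change of CANDIDATE relative to BASELINE.
pct_change() {  # usage: pct_change BASELINE CANDIDATE
  awk -v b="$1" -v c="$2" 'BEGIN { printf "%+.1f%%\n", (c - b) * 100 / b }'
}

# PF2 baseline values recorded in this note.
PF2_PAGE_FAULTS=6702
PF2_STATS='[SS_OS_STATS] alloc=2 free=4 madvise=2 madvise_enomem=0 madvise_disabled=0 mmap_total=2'

pf_derive 205322023 385092104 "$PF2_PAGE_FAULTS" 1000000
ss_os_stat mmap_total "$PF2_STATS"
```

With the recorded counters, `pf_derive` prints `IPC=1.88 cycles/op=205.3 instr/op=385.1 pf/1k-ops=6.70`. For a hypothetical candidate run with 5,000 page faults, `pct_change "$PF2_PAGE_FAULTS" 5000` prints `-25.4%`.
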
## DEBUG perf (cycles:u, -O0/-g, v4=C7+C6)
- Build:
```
make clean
CFLAGS='-O0 -g' USE_LTO=0 OPT_LEVEL=0 NATIVE=0 make bench_random_mixed_hakmem -j4
```
- Command:
```
HAKMEM_PROFILE=DEBUG_TINY_FRONT_PERF \
HAKMEM_BENCH_MIN_SIZE=16 \
HAKMEM_BENCH_MAX_SIZE=1024 \
HAKMEM_SMALL_HEAP_V4_ENABLED=1 \
HAKMEM_SMALL_HEAP_V4_CLASSES=0xC0 \
HAKMEM_SMALL_HEAP_V3_ENABLED=0 \
perf record -F 5000 --call-graph dwarf -e cycles:u \
-o perf.data.pf_v4 ./bench_random_mixed_hakmem 1000000 400 1
```
- Throughput: **15,173,790 ops/s** (DEBUG, ws=400, iters=1M, v4=C7+C6)
- Top self% entries (perf report --stdio):
  - free 14.37% (of which small_heap_free_fast_v4: 3.39%)
  - tiny_alloc_gate_fast 13.33%
  - main 12.93%
  - malloc 7.09%
  - ss_map_lookup 4.97% / hak_super_registry_init + memset combined ~4.5%
  - small_heap_alloc_fast_v4 2.23%
  - hak_tiny_size_to_class 2.21% / tiny_route_get 2.34% / front_gate_unified_enabled 2.36% / tiny_route_is_heap_kind 2.09%
  - xorshift32 2.08%
- Notes:
  - Even with v4 forced, gate/classify/ss_map_lookup still stand out. PF3 will check whether they drop naturally, together with pf, once the Segment/OS side is in place.
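
One way to quantify "still stand out" is to sum the self% values above for a symbol group. In this sketch, both the grouping (front gate, size/route classification, and `ss_map_lookup`) and the helper name `sum_self_pct` are assumptions made for illustration. With the values listed above, it prints `27.30`, so these six symbols account for about 27% of the sampled user-mode cycles in the DEBUG build. At -O0 little is inlined, so Release proportions may differ.

```sh
#!/bin/sh
# Hypothetical helper: sum self% over symbols whose name matches an ERE.
# Input: one "symbol percent" pair per line on stdin.
sum_self_pct() {   # usage: sum_self_pct REGEX < pairs
  awk -v re="$1" '$1 ~ re { s += $2 } END { printf "%.2f\n", s + 0 }'
}

# self% values transcribed from the DEBUG perf report in this note.
PF_V4_SELF='tiny_alloc_gate_fast 13.33
front_gate_unified_enabled 2.36
tiny_route_get 2.34
hak_tiny_size_to_class 2.21
tiny_route_is_heap_kind 2.09
ss_map_lookup 4.97
small_heap_alloc_fast_v4 2.23
xorshift32 2.08'

# Assumed grouping: front gate + size/route classification + ss_map_lookup.
GATE_RE='^(tiny_alloc_gate_fast|front_gate_unified_enabled|tiny_route_get|hak_tiny_size_to_class|tiny_route_is_heap_kind|ss_map_lookup)$'

printf '%s\n' "$PF_V4_SELF" | sum_self_pct "$GATE_RE"
```

The same helper can track whether this group shrinks in the PF3 A/B runs.
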