Optimize C6 heavy and C7 ultra performance analysis with design refinements

- Update environment profile presets and visibility analysis
- Enhance small object and tiny segment v4 box implementations
- Refine C7 ultra and C6 heavy allocation strategies
- Add comprehensive performance metrics and design documentation

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-10 22:57:26 +09:00
parent 9460785bd6
commit 2a13478dc7
25 changed files with 718 additions and 41 deletions
--- a/docs/analysis/PF_STATUS_V4_202502.md
+++ b/docs/analysis/PF_STATUS_V4_202502.md
@@ -1,3 +1,23 @@
+# PF/OS baseline
+
+# BASELINE-LOCK (Mixed 16–1024B v3 vs v4, Release)
+- Common command settings (ws=400, iters=1M):
+  ```
+  HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE
+  HAKMEM_BENCH_MIN_SIZE=16
+  HAKMEM_BENCH_MAX_SIZE=1024
+  ```
+- v3 primary configuration (C7-only v3, v4/segment all OFF, fast classify v3 ON):
+  - `HAKMEM_SMALL_HEAP_V3_ENABLED=1 HAKMEM_SMALL_HEAP_V3_CLASSES=0x80 HAKMEM_SMALL_HEAP_V4_ENABLED=0 HAKMEM_SMALL_HEAP_V4_CLASSES=0 HAKMEM_TINY_PTR_FAST_CLASSIFY_V4_ENABLED=0 HAKMEM_SMALL_SEGMENT_V4_ENABLED=0`
+  - Throughput: **33.7–33.9M ops/s** (2 runs, no segv/assert)
+- v4 forced (C7+C6 v4 + fast classify v4, v3 OFF, segment OFF):
+  - `HAKMEM_SMALL_HEAP_V3_ENABLED=0 HAKMEM_SMALL_HEAP_V3_CLASSES=0 HAKMEM_SMALL_HEAP_V4_ENABLED=1 HAKMEM_SMALL_HEAP_V4_CLASSES=0xC0 HAKMEM_TINY_PTR_FAST_CLASSIFY_V4_ENABLED=1`
+  - Throughput: **32.0–32.5M ops/s**
+- C7-only v4 (C6 v1, v3 OFF, fast classify v4 ON):
+  - `HAKMEM_SMALL_HEAP_V4_CLASSES=0x80 HAKMEM_SMALL_HEAP_V3_ENABLED=0`
+  - Throughput: **≈33.0M ops/s**
+- Decision: the v3 configuration above remains the primary choice for the current Mixed workload. The v4 family stays opt-in as a research box.
+
 # PF/OS baseline (PF2, small-object v4 state)

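 - Command (Release, v4: C7+C6 forced to v4, v3 OFF):
@@ -20,6 +40,29 @@
  - pf/OS reference values with v4 (C7+C6) forced. Slower than the v3 reference (~40M), but the pf numbers and OS stats are fixed here as the starting point for PF2.
  - In future A/B runs that hook up SmallSegmentBox_v4, the metric is how far page-faults/SS_OS_STATS can be reduced from these values.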

+## PF3: smallsegment_v4 gate A/B (C7+C6 v4 forced)
+
+- Command (Release, v4: C7+C6, v3 OFF):
+  ```
+  export HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE
+  export HAKMEM_BENCH_MIN_SIZE=16
+  export HAKMEM_BENCH_MAX_SIZE=1024
+  export HAKMEM_SMALL_HEAP_V4_ENABLED=1
+  export HAKMEM_SMALL_HEAP_V4_CLASSES=0xC0
+  export HAKMEM_SMALL_HEAP_V3_ENABLED=0
+  export HAKMEM_SMALL_HEAP_V3_CLASSES=0
+  perf stat -e cycles,instructions,task-clock,page-faults \
+    env HAKMEM_SMALL_SEGMENT_V4_ENABLED=0 ./bench_random_mixed_hakmem 1000000 400 1
+  perf stat -e cycles,instructions,task-clock,page-faults \
+    env HAKMEM_SMALL_SEGMENT_V4_ENABLED=1 ./bench_random_mixed_hakmem 1000000 400 1
+  ```
+- Results (ws=400, iters=1M):
+  - OFF: Throughput **28,890,266 ops/s**, page-faults=6,744, task-clock=54.84ms
+  - ON : Throughput **28,849,781 ops/s**, page-faults=6,746, task-clock=61.49ms
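+- Observations:
+  - Routing through the smallsegment_v4 gate leaves pf/ops essentially unchanged (the current implementation is a thin layer over the Tiny v1 lease).
+  - The "entry point via Segment" now exists, so from PF4 onward we will implement dedicated mmap/segment splitting and rerun the A/B.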
+
 ## DEBUG perf (cycles:u, -O0/-g, v4=C7+C6)

 - Build: