# Phase 18: Hot Text Isolation v1 - Next Instructions

## Status

- Phase 17 confirms **Case B**: allocator logic delta is negligible; gap is **layout/I-cache**.
- Next: reduce instruction footprint + improve I-cache locality via **Hot Text Isolation**.

Refs:
- Phase 17 results: `docs/analysis/PHASE17_FORCE_LIBC_GAP_VALIDATION_1_AB_TEST_RESULTS.md`
- Phase 18 design: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_DESIGN.md`

---

## 0. Goal / Success Criteria

Primary (v1 is expected to be low-risk with a modest effect):
- Mixed (16–1024B) throughput of **+2%** or better is GO (a realistic bar for layout work)

Secondary (must move in the right direction):
- I-cache misses reduced (guideline: **-10%** or better)
- Total instructions reduced (guideline: **-5%** or better)

If throughput is NEUTRAL but counters improve significantly, keep as research box and iterate once.

---

## 1. Patch Plan (small, reversible)

### Patch 1: Hot/Cold attribute SSOT (L0 Box)

Add:
- `core/box/hot_text_attrs_box.h`

Defines:
- `HAK_HOT_FN`, `HAK_COLD_FN` (no-op when `HAKMEM_HOT_TEXT_ISOLATION=0`)

Usage:
- annotate only a short, high-impact list first:
  - wrappers: `malloc/free/calloc/realloc`
  - FastLane entry helpers (if non-inline)
  - cold helpers: `malloc_cold/free_cold`, wrapper diagnostics

Rollback: build knob off.

### Patch 2: Wrapper TU split (L1 Box boundary)

Move wrapper definitions out of `core/hakmem.c`:
- new: `core/hak_wrappers_box.c`
  - `#include "box/hak_wrappers.inc.h"`
- remove wrapper include from `core/hakmem.c`

Rationale:
- Prevents wrapper text from being interleaved with unrelated code in one TU.
- Sets up link-order clustering.

Rollback: restore include in `core/hakmem.c` and drop new TU.

### Patch 3 (optional): bench-only section GC

Makefile knob:
- `HOT_TEXT_GC_SECTIONS=0/1` (research-only)

When `=1`, add for bench builds:
- `-ffunction-sections -fdata-sections`
- `-Wl,--gc-sections`

Notes:
- Keep it bench-only first (do not touch shared lib build until proven stable).
- **Phase 18 v1 outcome**: This exact flag set caused an I-cache regression in this repo/toolchain.
  - Ref: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md`
  - Therefore, **Patch 3 is NO-GO for now** unless combined with explicit hot symbol ordering.

---

## 2. A/B Procedure (required)

### 2.1 Baseline build (OFF)

```sh
make clean
make -j bench_random_mixed_hakmem bench_random_mixed_system
ls -lh bench_random_mixed_hakmem bench_random_mixed_system
scripts/run_mixed_10_cleanenv.sh
```

Perf stat (1 run, 200M iters):

```sh
perf stat -e cycles,instructions,branches,branch-misses,cache-misses,dTLB-load-misses,minor-faults -- \
  env -i PATH="$PATH" HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
  ./bench_random_mixed_hakmem 200000000 400 1
```

### 2.2 Optimized build (ON)

```sh
make clean
make -j HOT_TEXT_ISOLATION=1 bench_random_mixed_hakmem bench_random_mixed_system
ls -lh bench_random_mixed_hakmem bench_random_mixed_system
scripts/run_mixed_10_cleanenv.sh
```

Perf stat (same command):

```sh
perf stat -e cycles,instructions,branches,branch-misses,cache-misses,dTLB-load-misses,minor-faults -- \
  env -i PATH="$PATH" HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
  ./bench_random_mixed_hakmem 200000000 400 1
```

### 2.3 System ceiling check (optional)

```sh
./bench_random_mixed_system 200000000 400 1 2>&1 | rg "Throughput" || true
```

---

## 3. GO/NO-GO Decision

- **GO**: Mixed 10-run mean **+2%** or better and no health regressions
- **NEUTRAL**: within ±2% → keep as research box, iterate once (more cold isolation or better clustering)
- **NO-GO**: **-2%** or worse → rollback and freeze

Health profiles:

```sh
scripts/verify_health_profiles.sh
```

---

## 4. Reporting (required artifacts)

Create:
- `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md`
  - throughput A/B (10-run)
  - binary sizes
  - perf stat table (cycles/instructions/I-cache)
  - conclusion (GO/NEUTRAL/NO-GO)

Update:
- `CURRENT_TASK.md` (Phase 18 status + next)

---

## 5. Notes / guardrails

- This phase intentionally compares **different binaries** (layout is the target), but keep the environment clean (`env -i`, fixed profile, same machine).
- Avoid "delete code" experiments; only isolate/cold/cluster.
- Keep "cold" truly cold: no allocations, no logging, no TLS-heavy helpers.

---

## 6. If v1 is NEUTRAL: proceed immediately to Phase 18 v2 (BENCH_MINIMAL)

To directly cut the "instructions 2x" gap seen in Phase 17, layout work alone is likely insufficient; it will probably also be necessary to **compile out the fixed ENV/stats/debug costs mixed into the hot path**.

Next step (bench-only binary, reversible):
- With `HAKMEM_BENCH_MINIMAL=1` (Makefile knob):
  - Pin the always-on path for FastLane / wrappers and turn ENV gates into compile-time constants
  - Compile out hot counters entirely
  - Observe via `perf stat` only (no always-on logging)

Expected: +10–20% (if instruction footprint really is the dominant factor, this is where throughput should move substantially)
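
The GO/NO-GO thresholds from section 3 can be applied mechanically once the two 10-run means are known. The following is a minimal sketch, not a script that exists in the repo: the function name `classify_ab` is hypothetical, it assumes `awk` is available, and it covers only the throughput criterion (the health-profile check still has to pass separately for a GO).

```sh
# Hypothetical helper (not part of the repo): classify an A/B throughput
# result using the section 3 thresholds.
# Usage: classify_ab BASELINE_MEAN OPTIMIZED_MEAN
classify_ab() {
    awk -v base="$1" -v opt="$2" 'BEGIN {
        if (base <= 0) { print "ERROR: baseline must be > 0"; exit 1 }
        delta = (opt - base) * 100 / base    # percent change vs. baseline
        if (delta >= 2)        verdict = "GO"        # +2% or better
        else if (delta <= -2)  verdict = "NO-GO"     # -2% or worse
        else                   verdict = "NEUTRAL"   # within +/-2%
        print verdict
    }'
}
```

With illustrative placeholder inputs (not measurements), `classify_ab 100 103` prints `GO` and `classify_ab 100 97` prints `NO-GO`. Keeping the thresholds in one function avoids re-deriving the percentage by hand for each report.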
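
Sections 3 and 4 both rely on a 10-run mean. As a hedged sketch, the mean can be computed from one throughput value per line on stdin. The name `mean_of_runs` is hypothetical, and extracting the numbers from the output of `scripts/run_mixed_10_cleanenv.sh` is deliberately left out because that output format is not specified here.

```sh
# Hypothetical helper (not part of the repo): arithmetic mean of numbers
# read from stdin, one value per line. Blank lines are skipped.
# Exits non-zero if no samples were provided.
mean_of_runs() {
    awk 'NF { sum += $1; n++ }
         END {
             if (n == 0) exit 1          # no samples: signal failure
             printf "%.2f\n", sum / n    # mean, two decimal places
         }'
}
```

For example, `printf '%s\n' 10 20 30 | mean_of_runs` prints `20.00` (placeholder inputs, not benchmark results).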