# Phase 18: Hot Text Isolation v1 — Next Instructions

## Status

- Phase 17 confirms **Case B**: the allocator-logic delta is negligible; the gap comes from **layout/I-cache**.
- Next: reduce the instruction footprint and improve I-cache locality via **Hot Text Isolation**.

Refs:
- Phase 17 results: `docs/analysis/PHASE17_FORCE_LIBC_GAP_VALIDATION_1_AB_TEST_RESULTS.md`
- Phase 18 design: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_DESIGN.md`

---

## 0. Goal / Success Criteria

Primary (v1 is expected to be low-risk with a modest effect):
- Mixed (16–1024B) throughput of **+2%** or more is a GO (a realistic bar for layout work)

Secondary (must move in the right direction):
- I-cache misses reduced (guideline: **-10%** or better)
- Total instructions reduced (guideline: **-5%** or better)

If throughput is NEUTRAL but the counters improve significantly, keep it as a research box and iterate once.

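
The secondary criteria above can be sketched as a small check over two `perf stat` counters. This is a minimal sketch only: the function names are hypothetical, and treating the -10% / -5% guidelines as hard thresholds is an assumption (the text only asks that the counters move in the right direction).

```c
#include <assert.h>
#include <stdbool.h>

/* Relative change in percent; negative means the optimized build is lower. */
static double pct_change(double baseline, double optimized) {
    assert(baseline > 0.0);
    return (optimized - baseline) / baseline * 100.0;
}

/* Hypothetical helper: both counters must drop by at least the guideline
 * amounts (assumption: guidelines used as hard thresholds). */
static bool secondary_ok(double icache_base, double icache_opt,
                         double instr_base, double instr_opt) {
    return pct_change(icache_base, icache_opt) <= -10.0 &&
           pct_change(instr_base, instr_opt) <= -5.0;
}
```
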

---

## 1. Patch Plan (small, reversible)

### Patch 1: Hot/Cold attribute SSOT (L0 Box)

Add:
- `core/box/hot_text_attrs_box.h`

Defines:
- `HAK_HOT_FN`, `HAK_COLD_FN` (no-op when `HAKMEM_HOT_TEXT_ISOLATION=0`)

Usage:
- annotate only a short, high-impact list first:
  - wrappers: `malloc/free/calloc/realloc`
  - FastLane entry helpers (if non-inline)
  - cold helpers: `malloc_cold/free_cold`, wrapper diagnostics

Rollback: build knob off.

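
As a rough illustration of Patch 1, the header could look something like the sketch below. The macro names and the knob come from the plan above; the attribute choices (`hot`, `cold`, `noinline`) and the two `demo_*` functions are assumptions made for the example, not the project's actual code.

```c
/* Sketch of what core/box/hot_text_attrs_box.h could contain (assumption). */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef HAKMEM_HOT_TEXT_ISOLATION
#define HAKMEM_HOT_TEXT_ISOLATION 0  /* knob off by default: macros expand to nothing */
#endif

#if HAKMEM_HOT_TEXT_ISOLATION && defined(__GNUC__)
/* GCC and Clang accept these; they hint the compiler to group hot and cold text. */
#define HAK_HOT_FN  __attribute__((hot))
#define HAK_COLD_FN __attribute__((cold, noinline))
#else
#define HAK_HOT_FN
#define HAK_COLD_FN
#endif

/* Hypothetical cold helper: the rare zero-size case lives outside the hot text. */
HAK_COLD_FN static size_t demo_fix_zero_size(void) {
    return 1;
}

/* Hypothetical hot entry: rounds a request up to a 16-byte multiple. */
HAK_HOT_FN static size_t demo_round_up16(size_t n) {
    if (n == 0) n = demo_fix_zero_size();
    assert(n <= SIZE_MAX - 15);  /* guard the addition against overflow */
    return (n + 15) & ~(size_t)15;
}
```

With the knob at 0 the two macros vanish, so the same source builds both the baseline and the optimized binary.
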
### Patch 2: Wrapper TU split (L1 Box boundary)

Move wrapper definitions out of `core/hakmem.c`:
- new: `core/hak_wrappers_box.c`
  - `#include "box/hak_wrappers.inc.h"`
- remove wrapper include from `core/hakmem.c`

Rationale:
- Prevents wrapper text from being interleaved with unrelated code in one TU.
- Sets up link-order clustering.

Rollback: restore include in `core/hakmem.c` and drop new TU.

### Patch 3 (optional): bench-only section GC

Makefile knob:
- `HOT_TEXT_ISOLATION=0/1`

When `=1`, add for bench builds:
- `-DHAKMEM_HOT_TEXT_ISOLATION=1`
- `-ffunction-sections -fdata-sections`
- `LDFLAGS += -Wl,--gc-sections`

Notes:
- Keep it bench-only first (do not touch shared lib build until proven stable).
- If toolchain rejects `--gc-sections` or results are unstable → skip this patch.

---

## 2. A/B Procedure (required)

### 2.1 Baseline build (OFF)

```sh
make clean
make -j bench_random_mixed_hakmem bench_random_mixed_system
ls -lh bench_random_mixed_hakmem bench_random_mixed_system
scripts/run_mixed_10_cleanenv.sh
```

Perf stat (1 run, 200M iters):

```sh
perf stat -e cycles,instructions,branches,branch-misses,cache-misses,dTLB-load-misses,minor-faults -- \
  env -i PATH="$PATH" HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
  ./bench_random_mixed_hakmem 200000000 400 1
```

Note: this event list has no I-cache event. To evaluate the I-cache criterion from section 0, add one that `perf list` reports on the test machine (for example `L1-icache-load-misses`) to both the OFF and ON runs.

### 2.2 Optimized build (ON)

```sh
make clean
make -j HOT_TEXT_ISOLATION=1 bench_random_mixed_hakmem bench_random_mixed_system
ls -lh bench_random_mixed_hakmem bench_random_mixed_system
scripts/run_mixed_10_cleanenv.sh
```

Perf stat (same command):

```sh
perf stat -e cycles,instructions,branches,branch-misses,cache-misses,dTLB-load-misses,minor-faults -- \
  env -i PATH="$PATH" HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
  ./bench_random_mixed_hakmem 200000000 400 1
```

### 2.3 System ceiling check (optional)

```sh
./bench_random_mixed_system 200000000 400 1 2>&1 | rg "Throughput" || true
```

---

## 3. GO/NO-GO Decision

- **GO**: Mixed 10-run mean **+2%** or better and no health regressions
- **NEUTRAL**: between -2% and +2% (exclusive) → keep as research box, iterate once (more cold isolation or better clustering)
- **NO-GO**: **-2%** or worse → rollback and freeze

Health profiles:

```sh
scripts/verify_health_profiles.sh
```

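
The decision rule above can be written down as a small classifier over the two 10-run means. This is a sketch under stated assumptions: the type and function names are hypothetical, and mapping a health regression to NO-GO is an interpretation, since the text only says that GO requires no health regressions.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { HAK_GO, HAK_NEUTRAL, HAK_NO_GO } hak_decision; /* hypothetical */

/* Arithmetic mean of n throughput samples (n must be > 0). */
static double run_mean(const double *xs, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += xs[i];
    return sum / (double)n;
}

/* Classify the optimized build against the baseline.
 * Assumption: a failed health check is reported as NO-GO. */
static hak_decision classify(double base_mean, double opt_mean, int health_ok) {
    assert(base_mean > 0.0);
    double delta_pct = (opt_mean - base_mean) / base_mean * 100.0;
    if (!health_ok || delta_pct <= -2.0) return HAK_NO_GO;
    if (delta_pct >= 2.0) return HAK_GO;
    return HAK_NEUTRAL;
}
```
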

---

## 4. Reporting (required artifacts)

Create:
- `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md`
  - throughput A/B (10-run)
  - binary sizes
  - perf stat table (cycles/instructions/I-cache)
  - conclusion (GO/NEUTRAL/NO-GO)

Update:
- `CURRENT_TASK.md` (Phase 18 status + next)

---

## 5. Notes / guardrails

- This phase intentionally compares **different binaries** (layout is the target), but keep the environment clean (`env -i`, fixed profile, same machine).
- Avoid “delete code” experiments; only isolate/cold/cluster.
- Keep “cold” truly cold: no allocations, no logging, no TLS-heavy helpers.

---

## 6. If v1 is NEUTRAL: proceed immediately to Phase 18 v2 (BENCH_MINIMAL)

To directly cut the “instructions 2x” gap from Phase 17, layout work alone is probably not enough: the fixed costs of ENV/stats/debug code mixed into the hot path most likely need to be **compiled out** as well.

Next step (bench-only binary, reversible):

- With `HAKMEM_BENCH_MINIMAL=1` (Makefile knob):
  - pin the always-on FastLane / wrapper path and turn ENV gates into compile-time constants
  - compile out hot counters entirely
  - observe with `perf stat` only (no always-on logging)

Expected: +10–20% (if instruction footprint really is the dominant factor, this is where the numbers should move significantly)

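
The compile-out idea can be illustrated with a short sketch. Only the `HAKMEM_BENCH_MINIMAL` knob comes from the plan above; the `DEMO_*` macros, the `DEMO_FASTLANE` environment variable, and the functions are hypothetical stand-ins, and the runtime cache shown here is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>

#ifndef HAKMEM_BENCH_MINIMAL
#define HAKMEM_BENCH_MINIMAL 0
#endif

#if HAKMEM_BENCH_MINIMAL
/* Bench build: the gate is a constant, so the branch folds away,
 * and the counter update disappears from the hot path. */
#define DEMO_FASTLANE_ENABLED() 1
#define DEMO_COUNT(c) ((void)0)
#else
static unsigned long demo_hot_calls;  /* hypothetical hot counter */

/* Runtime ENV gate, read once and cached (hypothetical variable name). */
static int demo_fastlane_enabled(void) {
    static int cached = -1;
    if (cached < 0) {
        const char *v = getenv("DEMO_FASTLANE");
        cached = (v == NULL || v[0] != '0');  /* enabled unless set to "0..." */
    }
    return cached;
}
#define DEMO_FASTLANE_ENABLED() demo_fastlane_enabled()
#define DEMO_COUNT(c) ((void)++(c))
#endif

/* Hypothetical hot entry: returns 1 for the fast lane, 0 for the fallback. */
static int demo_alloc_path(void) {
    DEMO_COUNT(demo_hot_calls);
    return DEMO_FASTLANE_ENABLED() ? 1 : 0;
}
```

Building with `-DHAKMEM_BENCH_MINIMAL=1` leaves `demo_alloc_path` with no environment lookup and no counter store, which is the kind of fixed cost this step targets.
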