Phase 18 v1: Hot Text Isolation — NO-GO (I-cache regression)
## Summary

Phase 18 v1 attempted layout optimization using section splitting + GC:
- `-ffunction-sections -fdata-sections -Wl,--gc-sections`

Result: **Catastrophic I-cache regression**
- Throughput: -0.87% (48.94M → 48.52M ops/s)
- I-cache misses: +91.06% (131K → 250K)
- Variance: +80% (σ=0.45M → σ=0.81M)

Root cause: Section-based splitting without explicit hot symbol ordering
fragments code locality, destroying natural compiler/LTO layout.

## Build Knob Safety

Makefile updated to separate concerns:
- `HOT_TEXT_ISOLATION=1` → attributes only (safe, but no perf gain)
- `HOT_TEXT_GC_SECTIONS=1` → section splitting (currently NO-GO)

Both kept as research boxes (default OFF).

## Verdict

Freeze Phase 18 v1:
- Do NOT use section-based linking without strong ordering strategy
- Keep hot/cold attributes as placeholder (currently unused)
- Proceed to Phase 18 v2: BENCH_MINIMAL compile-out

Expected impact v2: +10-20% via instruction count reduction
- GO threshold: +5% minimum, +8% preferred
- Only continue if instructions clearly drop

## Files

New:
- docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md

Modified:
- Makefile (build knob safety isolation)
- CURRENT_TASK.md (Phase 18 v1 verdict)
- docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_NEXT_INSTRUCTIONS.md

## Lessons

1. Layout optimization is extremely fragile without ordering guarantees
2. I-cache is first-order performance factor (IPC=2.30 is memory-bound)
3. Compiler defaults may be better than manual section splitting
4. Next frontier: instruction count reduction (stats/ENV removal)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@@ -0,0 +1,55 @@
# Phase 18 v1: Hot Text Isolation / Layout Control — A/B Test Results

**Date**: 2025-12-15
**Verdict**: ❌ **NO-GO** (I-cache regression)

---

## Executive Summary

Phase 18 v1 attempted a low-risk layout improvement using section splitting + GC:
- `-ffunction-sections -fdata-sections -Wl,--gc-sections`

The result was worse than baseline:
- throughput regressed slightly (-0.87%)
- **I-cache misses nearly doubled** (+91.06%)
- variance increased significantly (σ +80%)

Conclusion: section-based linking **without strong ordering/clustering** is harmful for this codebase/toolchain at present.

---

## A/B Results (Mixed)

| Metric | Baseline | Optimized | Delta |
|--------|----------|-----------|-------|
| Throughput | 48.94M ops/s | 48.52M ops/s | **-0.87%** |
| I-cache misses | 131K | 250K | **+91.06%** |
| Instructions | 41.29B | 41.32B | +0.09% |
| Variance (σ) | 0.45M | 0.81M | **+80%** |
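
The Delta column is the relative change against baseline, `(optimized - baseline) / baseline`. As a sanity check, the small C sketch below recomputes it from the rounded values in the table. `percent_delta` is an illustrative helper, not part of the benchmark tooling. Because the table inputs are rounded, the results land near the reported deltas rather than matching them exactly (roughly -0.86% for throughput against the reported -0.87%), which suggests the reported figures were derived from unrounded measurements.

```c
#include <assert.h>

/* Relative change of `optimized` versus `baseline`, in percent.
 * Hypothetical helper for checking the table; not taken from the repo. */
static double percent_delta(double baseline, double optimized)
{
    assert(baseline != 0.0); /* a zero baseline has no relative change */
    return (optimized - baseline) / baseline * 100.0;
}
```

For instance, `percent_delta(131.0, 250.0)` evaluates to about 90.8 and `percent_delta(0.45, 0.81)` to about 80.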

---

## Root Cause (most likely)

The section-splitting flags increase fragmentation and can destroy the natural locality produced by:
- compiler + LTO layout decisions
- existing hot/cold separation by function boundaries

Without an explicit “hot symbol ordering” mechanism (link-order file / linker script / PGO call-graph sort), the remaining `.text.*` fragments may become poorly clustered, increasing front-end stalls.

This matches observations:
- instructions did not decrease
- I-cache misses increased dramatically
- throughput regressed modestly (front-end disruption + variability)

---

## Decision

Freeze Phase 18 v1:
- Treat `--gc-sections` path as **NO-GO** in this environment unless combined with explicit link ordering.

Next:
- Phase 18 v2: **BENCH_MINIMAL compile-out** (reduce instruction footprint directly).
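
A minimal sketch of what "compile-out" could look like, assuming the Phase 18 v2 knob reaches C code as a `BENCH_MINIMAL` macro. The macro spelling, the `HAK_STAT_INC` helper, the `g_alloc_calls` counter, and `hot_path_round_up` are hypothetical names used only for illustration; the actual hooks in the repo may differ. The point is that with the knob on, bookkeeping expands to nothing, so the hot path executes fewer instructions instead of merely being laid out differently.

```c
#include <stddef.h>

/* Assumption: minimal bench builds pass -DBENCH_MINIMAL=1; default is off. */
#ifndef BENCH_MINIMAL
#define BENCH_MINIMAL 0
#endif

/* Hypothetical stats counter; stays at zero when stats are compiled out. */
static unsigned long g_alloc_calls;

#if BENCH_MINIMAL
/* Compiled out: no load, increment, or store is emitted on the hot path. */
#define HAK_STAT_INC(counter) ((void)0)
#else
#define HAK_STAT_INC(counter) ((void)++(counter))
#endif

/* Stand-in for a hot-path function that normally updates statistics. */
static size_t hot_path_round_up(size_t size)
{
    HAK_STAT_INC(g_alloc_calls);
    return (size + 15u) & ~(size_t)15u; /* round up to a multiple of 16 */
}
```

Whether this pays off would still need to be confirmed by the instruction counter, in line with the "only continue if instructions clearly drop" rule in the commit message.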
@@ -58,16 +58,17 @@ Rollback: restore include in `core/hakmem.c` and drop new TU.
### Patch 3 (optional): bench-only section GC

Makefile knob:
- `HOT_TEXT_ISOLATION=0/1`
- `HOT_TEXT_GC_SECTIONS=0/1` (research-only)

When `=1`, add for bench builds:
- `-DHAKMEM_HOT_TEXT_ISOLATION=1`
- `-ffunction-sections -fdata-sections`
- `LDFLAGS += -Wl,--gc-sections`
- `-Wl,--gc-sections`
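
For context, here is a sketch of the kind of annotation `-DHAKMEM_HOT_TEXT_ISOLATION=1` could gate, assuming GCC or Clang. `HAK_HOT`, `HAK_COLD`, and the two functions are hypothetical names; the source only states that the knob enables hot/cold attributes. The `hot` and `cold` function attributes are hints that may let the toolchain group functions into separate text subsections. They do not by themselves impose an explicit symbol order.

```c
#include <stddef.h>

/* Assumption: the Makefile knob defines this to 1; default is off. */
#ifndef HAKMEM_HOT_TEXT_ISOLATION
#define HAKMEM_HOT_TEXT_ISOLATION 0
#endif

#if HAKMEM_HOT_TEXT_ISOLATION && (defined(__GNUC__) || defined(__clang__))
#define HAK_HOT  __attribute__((hot))
#define HAK_COLD __attribute__((cold, noinline))
#else
/* Knob off or unknown compiler: the annotations expand to nothing. */
#define HAK_HOT
#define HAK_COLD
#endif

/* Rare path: error accounting that should stay out of the hot text. */
static HAK_COLD int slow_path_report(int *error_count)
{
    *error_count += 1;
    return -1;
}

/* Frequent path: validates a size and rarely falls back to the cold path. */
static HAK_HOT int fast_path_check(size_t size, int *error_count)
{
    if (size == 0) {
        return slow_path_report(error_count);
    }
    return 0;
}
```

With the knob off the macros are empty, so behavior is identical either way; only code placement hints change.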

Notes:
- Keep it bench-only first (do not touch shared lib build until proven stable).
- If toolchain rejects `--gc-sections` or results are unstable → skip this patch.
- **Phase 18 v1 outcome**: This exact flag set caused an I-cache regression in this repo/toolchain.
- Ref: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md`
- Therefore, **Patch 3 is NO-GO for now** unless combined with explicit hot symbol ordering.

---