Phase 18 v1: Hot Text Isolation — NO-GO (I-cache regression)

## Summary

Phase 18 v1 attempted layout optimization using section splitting + GC:
- `-ffunction-sections -fdata-sections -Wl,--gc-sections`

Result: **Catastrophic I-cache regression**
- Throughput: -0.87% (48.94M → 48.52M ops/s)
- I-cache misses: +91.06% (131K → 250K)
- Variability: σ +80% (0.45M → 0.81M)

Root cause: Section-based splitting without explicit hot symbol ordering
fragments code locality, destroying natural compiler/LTO layout.

## Build Knob Safety

Makefile updated to separate concerns:
- `HOT_TEXT_ISOLATION=1` → attributes only (safe, but no perf gain)
- `HOT_TEXT_GC_SECTIONS=1` → section splitting (currently NO-GO)

Both kept as research boxes (default OFF).

## Verdict

Freeze Phase 18 v1:
- Do NOT use section-based linking without a strong ordering strategy
- Keep hot/cold attributes as placeholder (currently unused)
- Proceed to Phase 18 v2: BENCH_MINIMAL compile-out

Expected impact v2: +10-20% via instruction count reduction
- GO threshold: +5% minimum, +8% preferred
- Only continue if instructions clearly drop
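
The GO thresholds above can be written down as a small decision helper. This is a minimal sketch, not code from the repository: `verdict_t` and `classify_gain` are hypothetical names, and the separate "instructions clearly drop" condition is not modeled because the text gives no numeric cutoff for it.

```c
#include <assert.h>

/* Hypothetical verdict levels mirroring the thresholds in the text. */
typedef enum {
    VERDICT_NO_GO,        /* gain below +5% */
    VERDICT_GO_MINIMUM,   /* gain in [+5%, +8%) */
    VERDICT_GO_PREFERRED  /* gain of +8% or more */
} verdict_t;

/* Classify a candidate build by its throughput gain over the baseline.
 * Both arguments are mean throughputs in the same unit (e.g. M ops/s). */
static verdict_t classify_gain(double baseline_ops, double candidate_ops) {
    assert(baseline_ops > 0.0);
    double gain_pct = (candidate_ops - baseline_ops) / baseline_ops * 100.0;
    if (gain_pct >= 8.0) return VERDICT_GO_PREFERRED;
    if (gain_pct >= 5.0) return VERDICT_GO_MINIMUM;
    return VERDICT_NO_GO;
}
```

Fed the Phase 18 v1 means (48.94M baseline, 48.52M candidate), the helper returns `VERDICT_NO_GO`, which matches the verdict recorded here.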

## Files

New:
- docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md

Modified:
- Makefile (build knob safety isolation)
- CURRENT_TASK.md (Phase 18 v1 verdict)
- docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_NEXT_INSTRUCTIONS.md

## Lessons

1. Layout optimization is extremely fragile without ordering guarantees
2. I-cache is a first-order performance factor (IPC=2.30 is memory-bound)
3. Compiler defaults may be better than manual section splitting
4. Next frontier: instruction count reduction (stats/ENV removal)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Moe Charm (CI)
2025-12-15 05:53:58 +09:00
parent f8e7cf05b4
commit b1912d6587
4 changed files with 108 additions and 4 deletions


@@ -1,5 +1,26 @@
# Mainline Tasks (Current)
## Update memo (2025-12-15 Phase 18 HOT-TEXT-ISOLATION-1)
### Phase 18 HOT-TEXT-ISOLATION-1: Hot Text Isolation v1 - ❌ NO-GO / FROZEN
Result: the Mixed 10-run mean regressed (**-0.87%**) and I-cache misses worsened (**+91.06%**). Fine-grained sectioning via `-ffunction-sections -Wl,--gc-sections` destroyed I-cache locality. The hot/cold attributes are implemented but not yet applied, so only the downsides materialized.
- A/B results: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md`
- Instructions: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_NEXT_INSTRUCTIONS.md`
- Design: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_DESIGN.md`
- Mitigation: roll back with `HOT_TEXT_ISOLATION=0` (default)
Main causes:
- Section-based linking destroys the natural compiler locality
- The link-order change introduced by `--gc-sections` fragments the I-cache footprint
- The hot/cold attributes are not actually applied (incomplete implementation)
Key findings:
- Reconfirms the Phase 17 conclusion: the bottleneck is **instruction count** and **memory latency**
- Code layout optimization cannot break through the 2.30 IPC wall
- Next step: move to Phase 18 v2 (BENCH_MINIMAL), which cuts instruction count directly
## Update memo (2025-12-14 Phase 6 FRONT-FASTLANE-1)
### Phase 6 FRONT-FASTLANE-1: Front FastLane (Layer Collapse) - ✅ GO / promoted to mainline
@@ -475,6 +496,7 @@ Cumulative improvements achieved in Phases 6-10:
**Design**: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_DESIGN.md`
**Instructions**: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_NEXT_INSTRUCTIONS.md`
**Results (v1)**: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md` (❌ NO-GO / I-cache misses worsened)
Implementation gate (reversible):
- Makefile knob: `HOT_TEXT_ISOLATION=0/1`


@@ -114,6 +114,32 @@ else ifeq ($(BUILD_FLAVOR),debug)
CFLAGS_SHARED += -DHAKMEM_BUILD_DEBUG=1
endif
# ------------------------------------------------------------
# Phase 18: Hot Text Isolation (I-cache locality optimization)
# ------------------------------------------------------------
# Enable (safe): make HOT_TEXT_ISOLATION=1 bench_random_mixed_hakmem
# Default: OFF (research box, requires A/B validation)
# What it does:
# - Adds -DHAKMEM_HOT_TEXT_ISOLATION=1 (hot/cold attribute macros only)
#
# NOTE (Phase 18 v1 NO-GO):
# - The section-splitting + --gc-sections experiment caused a large I-cache regression.
# - Keep it behind a separate opt-in knob (HOT_TEXT_GC_SECTIONS=1) if needed for research.
HOT_TEXT_ISOLATION ?= 0
ifeq ($(HOT_TEXT_ISOLATION),1)
CFLAGS += -DHAKMEM_HOT_TEXT_ISOLATION=1
CFLAGS_SHARED += -DHAKMEM_HOT_TEXT_ISOLATION=1
endif
# Research-only (currently NO-GO): function/data sections + --gc-sections.
# Enable explicitly only when combined with an ordering strategy.
HOT_TEXT_GC_SECTIONS ?= 0
ifeq ($(HOT_TEXT_GC_SECTIONS),1)
CFLAGS += -ffunction-sections -fdata-sections
CFLAGS_SHARED += -ffunction-sections -fdata-sections
LDFLAGS += -Wl,--gc-sections
endif
# Default: enable Box Theory refactor for Tiny (Phase 6-1.7)
# This is the best performing option currently (4.19M ops/s)
# NOTE: Disabled while testing ULTRA_SIMPLE with SFC integration


@@ -0,0 +1,55 @@
# Phase 18 v1: Hot Text Isolation / Layout Control — A/B Test Results
**Date**: 2025-12-15
**Verdict**: ❌ **NO-GO** (I-cache regression)
---
## Executive Summary
Phase 18 v1 attempted a low-risk layout improvement using section splitting + GC:
- `-ffunction-sections -fdata-sections -Wl,--gc-sections`
Result was worse than baseline:
- throughput slightly regressed
- **I-cache misses nearly doubled**
- variance increased significantly
Conclusion: section-based linking **without strong ordering/clustering** is harmful for this codebase/toolchain at present.
---
## A/B Results (Mixed)
| Metric | Baseline | Optimized | Delta |
|--------|----------|-----------|-------|
| Throughput | 48.94M ops/s | 48.52M ops/s | **-0.87%** |
| I-cache misses | 131K | 250K | **+91.06%** |
| Instructions | 41.29B | 41.32B | +0.09% |
| Variance (σ) | 0.45M | 0.81M | **+80%** |
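
As a sanity check, the Delta column can be recomputed from the rounded Baseline and Optimized values. The sketch below uses hypothetical helper names (`pct_delta`, `print_table_deltas`) and is not part of the repository. Recomputing from the rounded inputs gives roughly -0.86%, +90.8%, +0.07% and +80.0%, slightly different from the reported deltas, presumably because those were derived from unrounded measurements.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change of `after` versus `before`, in percent. */
static double pct_delta(double before, double after) {
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Recompute each table row from the rounded values shown in the table. */
static void print_table_deltas(void) {
    printf("Throughput:     %+.2f%%\n", pct_delta(48.94, 48.52));  /* M ops/s */
    printf("I-cache misses: %+.2f%%\n", pct_delta(131.0, 250.0));  /* K misses */
    printf("Instructions:   %+.2f%%\n", pct_delta(41.29, 41.32));  /* billions */
    printf("Sigma:          %+.2f%%\n", pct_delta(0.45, 0.81));    /* M ops/s */
}
```

Calling `print_table_deltas()` from a small `main` is enough to reproduce the comparison. The signs and magnitudes agree with the table, so the conclusion does not depend on the rounding.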
---
## Root Cause (most likely)
The section-splitting flags increase fragmentation and can destroy the natural locality produced by:
- compiler + LTO layout decisions
- existing hot/cold separation by function boundaries
Without an explicit “hot symbol ordering” mechanism (link-order file / linker script / PGO call-graph sort), the remaining `.text.*` fragments may become poorly clustered, increasing front-end stalls.
This matches observations:
- instructions did not decrease
- I-cache misses increased dramatically
- throughput regressed modestly (front-end disruption + variability)
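
For context, the attribute-only knob (`-DHAKMEM_HOT_TEXT_ISOLATION=1`) could gate macros along the following lines. This is a sketch under stated assumptions: `HAK_HOT_FN`, `HAK_COLD_FN` and the two demo functions are hypothetical names, and the real macros in the repository may look different. On GCC-compatible compilers, `hot` and `cold` are hints that, on many targets, group such functions into dedicated text subsections. That grouping is the kind of explicit clustering the section-splitting flags alone did not provide.

```c
#include <stddef.h>

/* Only HAKMEM_HOT_TEXT_ISOLATION comes from the Makefile; the macro names
 * below are hypothetical. An undefined macro evaluates to 0 inside #if. */
#if HAKMEM_HOT_TEXT_ISOLATION && defined(__GNUC__)
#  define HAK_HOT_FN  __attribute__((hot))
#  define HAK_COLD_FN __attribute__((cold, noinline))
#else
#  define HAK_HOT_FN
#  define HAK_COLD_FN
#endif

/* Rarely taken path: marked cold so it can be kept away from hot code. */
HAK_COLD_FN static size_t demo_slow_path(size_t n) {
    return n * 2;
}

/* Frequently taken path: marked hot; falls back to the cold path for
 * large inputs. */
HAK_HOT_FN static size_t demo_fast_path(size_t n) {
    if (n > 1024) {
        return demo_slow_path(n);
    }
    return n + 1;
}
```

With the knob off, both macros expand to nothing and the code compiles unchanged, which is consistent with the knob being described as safe but without a performance gain while the attributes remain unapplied.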
---
## Decision
Freeze Phase 18 v1:
- Treat `--gc-sections` path as **NO-GO** in this environment unless combined with explicit link ordering.
Next:
- Phase 18 v2: **BENCH_MINIMAL compile-out** (reduce instruction footprint directly).
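
To illustrate what a compile-out could look like, the sketch below removes a stats counter and an environment lookup from the fast path when a build flag is set. All names here (`HAKMEM_BENCH_MINIMAL`, `HAK_STAT_INC`, `HAKMEM_DEMO_FEATURE`, the demo functions) are assumptions made for illustration. The actual Phase 18 v2 design is not shown in this commit.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical build flag; defaults to the full (instrumented) build. */
#ifndef HAKMEM_BENCH_MINIMAL
#  define HAKMEM_BENCH_MINIMAL 0
#endif

/* Demo statistics counter (hypothetical). */
static uint64_t g_demo_alloc_count;

#if HAKMEM_BENCH_MINIMAL
/* Minimal build: the counter update disappears from the instruction stream. */
#  define HAK_STAT_INC(counter) ((void)0)
#else
#  define HAK_STAT_INC(counter) ((void)++(counter))
#endif

/* Feature gate: fixed at compile time in the minimal build, read from a
 * (hypothetical) environment variable otherwise. */
static int demo_feature_enabled(void) {
#if HAKMEM_BENCH_MINIMAL
    return 1;
#else
    const char *v = getenv("HAKMEM_DEMO_FEATURE");
    return v == NULL || v[0] != '0';
#endif
}

/* Stand-in for an allocator fast path that carries instrumentation. */
static int demo_alloc_fast(int size_class) {
    HAK_STAT_INC(g_demo_alloc_count);
    return size_class + 1;
}
```

Building the same file with and without `-DHAKMEM_BENCH_MINIMAL=1` and comparing retired instruction counts would show whether the removal is visible, which is the condition the commit message sets for continuing with v2.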


@@ -58,16 +58,17 @@ Rollback: restore include in `core/hakmem.c` and drop new TU.
 ### Patch 3 (optional): bench-only section GC
 Makefile knob:
-- `HOT_TEXT_ISOLATION=0/1`
+- `HOT_TEXT_GC_SECTIONS=0/1` (research-only)
 When `=1`, add for bench builds:
-- `-DHAKMEM_HOT_TEXT_ISOLATION=1`
 - `-ffunction-sections -fdata-sections`
-- `LDFLAGS += -Wl,--gc-sections`
+- `-Wl,--gc-sections`
 Notes:
 - Keep it bench-only first (do not touch shared lib build until proven stable).
-- If toolchain rejects `--gc-sections` or results are unstable → skip this patch.
+- **Phase 18 v1 outcome**: This exact flag set caused an I-cache regression in this repo/toolchain.
+- Ref: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md`
+- Therefore, **Patch 3 is NO-GO for now** unless combined with explicit hot symbol ordering.
 ---