From b1912d6587e336a80fbef4a7c7a835519d32143c Mon Sep 17 00:00:00 2001
From: "Moe Charm (CI)"
Date: Mon, 15 Dec 2025 05:53:58 +0900
Subject: [PATCH] =?UTF-8?q?Phase=2018=20v1:=20Hot=20Text=20Isolation=20?=
 =?UTF-8?q?=E2=80=94=20NO-GO=20(I-cache=20regression)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Summary

Phase 18 v1 attempted layout optimization using section splitting + GC:
- `-ffunction-sections -fdata-sections -Wl,--gc-sections`

Result: **Catastrophic I-cache regression**
- Throughput: -0.87% (48.94M → 48.52M ops/s)
- I-cache misses: +91.06% (131K → 250K)
- Variance: +80% (σ=0.45M → σ=0.81M)

Root cause: Section-based splitting without explicit hot symbol ordering
fragments code locality, destroying the natural compiler/LTO layout.

## Build Knob Safety

Makefile updated to separate concerns:
- `HOT_TEXT_ISOLATION=1` → attributes only (safe, but no perf gain)
- `HOT_TEXT_GC_SECTIONS=1` → section splitting (currently NO-GO)

Both kept as research boxes (default OFF).

## Verdict

Freeze Phase 18 v1:
- Do NOT use section-based linking without a strong ordering strategy
- Keep hot/cold attributes as a placeholder (currently unused)
- Proceed to Phase 18 v2: BENCH_MINIMAL compile-out

Expected impact v2: +10-20% via instruction count reduction
- GO threshold: +5% minimum, +8% preferred
- Only continue if instructions clearly drop

## Files

New:
- docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md

Modified:
- Makefile (build knob safety isolation)
- CURRENT_TASK.md (Phase 18 v1 verdict)
- docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_NEXT_INSTRUCTIONS.md

## Lessons

1. Layout optimization is extremely fragile without ordering guarantees
2. I-cache is a first-order performance factor (IPC=2.30 is memory-bound)
3. Compiler defaults may be better than manual section splitting
4. Next frontier: instruction count reduction (stats/ENV removal)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5
---
 CURRENT_TASK.md                               | 22 ++++++++
 Makefile                                      | 26 +++++++++
 ...18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md | 55 +++++++++++++++++++
 ..._HOT_TEXT_ISOLATION_1_NEXT_INSTRUCTIONS.md |  9 +--
 4 files changed, 108 insertions(+), 4 deletions(-)
 create mode 100644 docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md

diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index fa9f10ac..7cae4f45 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -1,5 +1,26 @@
 # Mainline Task (Current)
 
+## Update Notes (2025-12-15 Phase 18 HOT-TEXT-ISOLATION-1)
+
+### Phase 18 HOT-TEXT-ISOLATION-1: Hot Text Isolation v1 — ❌ NO-GO / FROZEN
+
+Result: the Mixed 10-run mean regressed by **-0.87%** and I-cache misses worsened by **+91.06%**. Fine-grained sectioning via `-ffunction-sections -Wl,--gc-sections` destroyed I-cache locality. The hot/cold attributes are implemented but not yet applied, so only the downsides materialized.
+
+- A/B results: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md`
+- Instructions: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_NEXT_INSTRUCTIONS.md`
+- Design: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_DESIGN.md`
+- Mitigation: roll back with `HOT_TEXT_ISOLATION=0` (default)
+
+Main causes:
+- Section-based linking destroys the natural compiler locality
+- The link-order change caused by `--gc-sections` fragments the I-cache
+- Hot/cold attributes are not actually applied (incomplete implementation)
+
+Key findings:
+- Reconfirms the Phase 17 conclusion: the bottleneck is **instruction count** and **memory latency**
+- Code layout optimization cannot break through the 2.30 IPC wall
+- Next move: proceed to Phase 18 v2 (BENCH_MINIMAL), which cuts instruction count directly
+
 ## Update Notes (2025-12-14 Phase 6 FRONT-FASTLANE-1)
 
 ### Phase 6 FRONT-FASTLANE-1: Front FastLane (Layer Collapse) — ✅ GO / promoted to mainline
@@ -475,6 +496,7 @@ Cumulative improvements achieved in Phases 6-10:
 
 **Design**: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_DESIGN.md`
 **Instructions**: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_NEXT_INSTRUCTIONS.md`
+**Results (v1)**: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md` (❌ NO-GO / I-cache misses worsened)
 
 Implementation gate (reversible):
 - Makefile knob: `HOT_TEXT_ISOLATION=0/1`
diff --git a/Makefile b/Makefile
index fc68d352..91001d4a 100644
--- a/Makefile
+++ b/Makefile
@@ -114,6 +114,32 @@ else ifeq ($(BUILD_FLAVOR),debug)
   CFLAGS_SHARED += -DHAKMEM_BUILD_DEBUG=1
 endif
 
+# ------------------------------------------------------------
+# Phase 18: Hot Text Isolation (I-cache locality optimization)
+# ------------------------------------------------------------
+# Enable (safe): make HOT_TEXT_ISOLATION=1 bench_random_mixed_hakmem
+# Default: OFF (research box, requires A/B validation)
+# What it does:
+#   - Adds -DHAKMEM_HOT_TEXT_ISOLATION=1 (hot/cold attribute macros only)
+#
+# NOTE (Phase 18 v1 NO-GO):
+#   - The section-splitting + --gc-sections experiment caused a large I-cache regression.
+#   - Keep it behind a separate opt-in knob (HOT_TEXT_GC_SECTIONS=1) if needed for research.
+HOT_TEXT_ISOLATION ?= 0
+ifeq ($(HOT_TEXT_ISOLATION),1)
+  CFLAGS += -DHAKMEM_HOT_TEXT_ISOLATION=1
+  CFLAGS_SHARED += -DHAKMEM_HOT_TEXT_ISOLATION=1
+endif
+
+# Research-only (currently NO-GO): function/data sections + --gc-sections.
+# Enable explicitly only when combined with an ordering strategy.
+HOT_TEXT_GC_SECTIONS ?= 0
+ifeq ($(HOT_TEXT_GC_SECTIONS),1)
+  CFLAGS += -ffunction-sections -fdata-sections
+  CFLAGS_SHARED += -ffunction-sections -fdata-sections
+  LDFLAGS += -Wl,--gc-sections
+endif
+
 # Default: enable Box Theory refactor for Tiny (Phase 6-1.7)
 # This is the best performing option currently (4.19M ops/s)
 # NOTE: Disabled while testing ULTRA_SIMPLE with SFC integration
diff --git a/docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md b/docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md
new file mode 100644
index 00000000..5bebd455
--- /dev/null
+++ b/docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md
@@ -0,0 +1,55 @@
+# Phase 18 v1: Hot Text Isolation / Layout Control — A/B Test Results
+
+**Date**: 2025-12-15
+**Verdict**: ❌ **NO-GO**(I-cache regression)
+
+---
+
+## Executive Summary
+
+Phase 18 v1 attempted a low-risk layout improvement using section splitting + GC:
+- `-ffunction-sections -fdata-sections -Wl,--gc-sections`
+
+Result was worse than baseline:
+- throughput slightly regressed
+- **I-cache misses nearly doubled**
+- variance increased significantly
+
+Conclusion: section-based linking **without strong ordering/clustering** is harmful for this codebase/toolchain at present.
+
+---
+
+## A/B Results (Mixed)
+
+| Metric | Baseline | Optimized | Delta |
+|--------|----------|-----------|-------|
+| Throughput | 48.94M ops/s | 48.52M ops/s | **-0.87%** |
+| I-cache misses | 131K | 250K | **+91.06%** |
+| Instructions | 41.29B | 41.32B | +0.09% |
+| Variance (σ) | 0.45M | 0.81M | **+80%** |
+
+---
+
+## Root Cause (most likely)
+
+The section-splitting flags increase fragmentation and can destroy the natural locality produced by:
+- compiler + LTO layout decisions
+- existing hot/cold separation by function boundaries
+
+Without an explicit “hot symbol ordering” mechanism (link-order file / linker script / PGO call-graph sort), the remaining `.text.*` fragments may become poorly clustered, increasing front-end stalls.
+
+This matches observations:
+- instructions did not decrease
+- I-cache misses increased dramatically
+- throughput regressed modestly (front-end disruption + variability)
+
+---
+
+## Decision
+
+Freeze Phase 18 v1:
+- Treat `--gc-sections` path as **NO-GO** in this environment unless combined with explicit link ordering.
+
+Next:
+- Phase 18 v2: **BENCH_MINIMAL compile-out** (reduce instruction footprint directly).
+
diff --git a/docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_NEXT_INSTRUCTIONS.md b/docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_NEXT_INSTRUCTIONS.md
index c2ac5be3..c1a9c82b 100644
--- a/docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_NEXT_INSTRUCTIONS.md
+++ b/docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_NEXT_INSTRUCTIONS.md
@@ -58,16 +58,17 @@ Rollback: restore include in `core/hakmem.c` and drop new TU.
 
 ### Patch 3 (optional): bench-only section GC
 
 Makefile knob:
-- `HOT_TEXT_ISOLATION=0/1`
+- `HOT_TEXT_GC_SECTIONS=0/1`(research-only)
 
 When `=1`, add for bench builds:
-- `-DHAKMEM_HOT_TEXT_ISOLATION=1`
 - `-ffunction-sections -fdata-sections`
-- `LDFLAGS += -Wl,--gc-sections`
+- `-Wl,--gc-sections`
 
 Notes:
 - Keep it bench-only first (do not touch shared lib build until proven stable).
-- If toolchain rejects `--gc-sections` or results are unstable → skip this patch.
+- **Phase 18 v1 outcome**: This exact flag set caused an I-cache regression in this repo/toolchain.
+  - Ref: `docs/analysis/PHASE18_HOT_TEXT_ISOLATION_1_AB_TEST_RESULTS.md`
+  - Therefore, **Patch 3 is NO-GO for now** unless combined with explicit hot symbol ordering.
 
 ---
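
Note on the attributes-only knob: the sketch below shows one way hot/cold attribute macros could be gated on `-DHAKMEM_HOT_TEXT_ISOLATION=1`. Only that define comes from the patch. The `HAK_HOT` / `HAK_COLD` names, the `demo_cache` type and both functions are hypothetical, and the real macros in the tree may look different. It illustrates attribute placement only; the commit message describes this path as safe but without a perf gain.

```c
#include <assert.h>
#include <stddef.h>

/* Only HAKMEM_HOT_TEXT_ISOLATION is taken from the patch. The macro names
 * HAK_HOT / HAK_COLD and everything prefixed demo_ are hypothetical. */
#if defined(HAKMEM_HOT_TEXT_ISOLATION) && HAKMEM_HOT_TEXT_ISOLATION && \
    (defined(__GNUC__) || defined(__clang__))
#  define HAK_HOT  __attribute__((hot))
#  define HAK_COLD __attribute__((cold, noinline))
#else
#  define HAK_HOT   /* knob off: expands to nothing */
#  define HAK_COLD
#endif

#define DEMO_CACHE_CAP 4

typedef struct {
    int    slots[DEMO_CACHE_CAP];
    size_t count;    /* number of valid entries in slots[] */
    size_t refills;  /* how many times the slow path ran */
} demo_cache;

/* Rare path: refill the cache. Marked cold so the compiler is allowed to
 * place it away from the frequently executed code. */
HAK_COLD static void demo_refill(demo_cache *c) {
    for (size_t i = 0; i < DEMO_CACHE_CAP; i++) {
        c->slots[i] = (int)i;
    }
    c->count = DEMO_CACHE_CAP;
    c->refills++;
}

/* Common path: pop one entry; only an empty cache reaches the cold refill. */
HAK_HOT static int demo_pop(demo_cache *c) {
    assert(c != NULL);
    if (c->count == 0) {
        demo_refill(c);
    }
    return c->slots[--c->count];
}
```

With the define unset, both macros expand to nothing, so the default build is unchanged. The attributes are hints to the compiler and do not by themselves impose a link order, which is the missing piece the A/B results point to.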