Phase 3 Finalization: D1 20-run validation, D2 frozen, baseline established
Summary:
- D1 (Free route cache): 20-run validation → PROMOTED TO DEFAULT
- Baseline (20-run, ROUTE=0): 46.30M ops/s (mean), 46.30M (median)
- Optimized (20-run, ROUTE=1): 47.32M ops/s (mean), 47.39M (median)
- Mean gain: +2.19%, Median gain: +2.37%
- Decision: GO (both criteria met: mean >= +1.0%, median >= +0.0%)
- Implementation: Added HAKMEM_FREE_STATIC_ROUTE=1 to MIXED preset
- D2 (Wrapper env cache): FROZEN
- Previous result: -1.44% regression (TLS overhead > benefit)
- Status: Research box (do not pursue further)
- Default: OFF (not included in MIXED_TINYV3_C7_SAFE preset)
- Baseline Phase 3: 46.04M ops/s (Mixed, 10-run, 2025-12-13)
Cumulative Gains (Phase 2-3):
B3: +2.89%, B4: +1.47%, C3: +2.20%, D1: +2.19%
Total: ~7.6-9.0% (conservative: 7.6%, i.e. additive with D1's earlier +1.06%; with D1 at +2.19%: 8.75% additive, ~9.04% multiplicative)
MID_V3 fix: +13% (structural change, Mixed OFF by default)
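The additive and multiplicative totals can be reproduced with a short C sketch. It assumes each gain was measured on top of the optimizations already adopted, which makes the multiplicative (compounded) figure the better model. The function names are illustrative and are not part of hakmem.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of per-optimization gains, in percent (simple additive estimate). */
double gains_additive_pct(const double *gains_pct, size_t n) {
    assert(gains_pct != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += gains_pct[i];
    }
    return total;
}

/* Compounded gain in percent: (product of (1 + g/100)) - 1.
 * Assumes each gain was measured on top of the previously adopted ones. */
double gains_compounded_pct(const double *gains_pct, size_t n) {
    assert(gains_pct != NULL || n == 0);
    double factor = 1.0;
    for (size_t i = 0; i < n; i++) {
        factor *= 1.0 + gains_pct[i] / 100.0;
    }
    return (factor - 1.0) * 100.0;
}
```

With B3, B4, C3 and D1 at +2.89%, +1.47%, +2.20% and +2.19%, this gives 8.75% additive and about 9.04% compounded. With D1 at its earlier +1.06%, the additive sum is 7.62%.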
Documentation Updates:
- PHASE3_FINALIZATION_SUMMARY.md: Comprehensive Phase 3 report
- PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md: D1/D2 final status
- PHASE3_D1_FREE_ROUTE_CACHE_1_DESIGN.md: 20-run validation results
- PHASE3_D2_WRAPPER_ENV_CACHE_1_DESIGN.md: FROZEN status
- ENV_PROFILE_PRESETS.md: D1 ADOPT, D2 FROZEN
- PHASE3_BASELINE_AND_CANDIDATES.md: Post-D1/D2 status
- CURRENT_TASK.md: Phase 3 complete summary
Next:
- D3 requires perf validation (tiny_alloc_gate_fast self% ≥5%)
- Or Phase 4 planning if no more D3-class targets
- Current active optimizations: B3, B4, C3, D1, MID_V3 fix
Files Changed:
- docs/analysis/PHASE3_FINALIZATION_SUMMARY.md (new, 580+ lines)
- docs/analysis/*.md (6 files updated with D1/D2 results)
- CURRENT_TASK.md (Phase 3 status update)
- analyze_d1_results.py (statistical analysis script)
- core/bench_profile.h (D1 promoted to default in MIXED preset)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 22:42:22 +09:00
# Phase 3: Cache Locality - D1/D2 Validation Complete
2025-12-13 17:32:34 +09:00
Phase 3 D2: Wrapper Env Cache - [DECISION: NO-GO]
Target: Reduce wrapper_env_cfg() overhead in malloc/free hot path
- Strategy: Cache wrapper env configuration pointer in TLS
- Approach: Fast pointer cache (TLS caches const wrapper_env_cfg_t*)
Implementation:
- core/box/wrapper_env_cache_env_box.h: ENV gate (HAKMEM_WRAP_ENV_CACHE)
- core/box/wrapper_env_cache_box.h: TLS cache layer (wrapper_env_cfg_fast)
- core/box/hak_wrappers.inc.h: Integration into malloc/free hot paths
- ENV gate: HAKMEM_WRAP_ENV_CACHE=0/1 (default OFF)
A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (D2=0): 46.52M ops/s (avg), 46.47M ops/s (median)
- Optimized (D2=1): 45.85M ops/s (avg), 45.98M ops/s (median)
- Change: avg -1.44%, median -1.05% (DECISION: NO-GO)
Analysis:
- Regression cause: TLS cache adds overhead (branch + TLS access)
- wrapper_env_cfg() is already minimal (pointer return after simple check)
- Adding TLS caching layer makes it worse, not better
- Branch prediction penalty outweighs any potential savings
Cumulative Phase 2-3:
- B3: +2.89%, B4: +1.47%, C3: +2.20%
- D1: +1.06% (opt-in; before the 20-run validation), D2: -1.44% (NO-GO)
- Total: ~7.6% additive (excluding D2)
Decision: FREEZE as research box (default OFF, regression confirmed)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 22:03:27 +09:00
## Current Status
### BASELINE_PHASE3 (10-run, Mixed, ws=400, 20M iters)
- Mean: 46.04M ops/s, Median: 46.04M ops/s, StdDev: 0.14M ops/s
- Baseline established: 2025-12-13
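The following is a minimal sketch of how Mean, Median and StdDev for an N-run baseline can be computed from per-run throughput samples (M ops/s). It is not the project's `analyze_d1_results.py`. The names `summarize_runs` and `run_stats` are hypothetical, and the sketch assumes sample variance (n - 1 denominator). StdDev is the square root of `var`. The square root is left out so the sketch does not need libm.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    double mean;
    double median;
    double var; /* sample variance (n - 1 denominator); StdDev = sqrt(var) */
} run_stats;

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Summarize n >= 2 per-run throughput samples (e.g. M ops/s). */
run_stats summarize_runs(const double *samples, size_t n) {
    assert(samples != NULL && n >= 2);
    run_stats s = {0.0, 0.0, 0.0};

    /* Median: sort a copy so the caller's array is left untouched. */
    double *sorted = malloc(n * sizeof *sorted);
    assert(sorted != NULL);
    memcpy(sorted, samples, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);
    s.median = (n % 2 == 1) ? sorted[n / 2]
                            : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    free(sorted);

    /* Mean, then sample variance around it. */
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += samples[i];
    s.mean = sum / (double)n;

    double ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = samples[i] - s.mean;
        ss += d * d;
    }
    s.var = ss / (double)(n - 1);
    return s;
}
```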
### C3: Static Routing ✅ ADOPT
- Promoted `HAKMEM_TINY_STATIC_ROUTE=1` to the `MIXED_TINYV3_C7_SAFE` default (bypasses policy_snapshot)
- Gain: +2.20% proven
- Design notes: `docs/analysis/PHASE3_C3_STATIC_ROUTING_1_DESIGN.md`
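Here is a sketch of the static-routing idea under stated assumptions. The route for each tiny size class is resolved once at init, and the hot path then reads a flat table instead of consulting a policy snapshot on every call. `policy_lookup`, `route_init`, `route_for_class`, the route kinds and the class count are all hypothetical. Only the `HAKMEM_TINY_STATIC_ROUTE` env name comes from this document. `route_init()` is assumed to run once before any allocating thread starts.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical route kinds and class count (not hakmem's real definitions). */
enum route_kind { ROUTE_LEGACY = 0, ROUTE_TINY_V3 = 1 };
#define NUM_TINY_CLASSES 8

/* Placeholder for the per-call policy lookup that static routing bypasses. */
static enum route_kind policy_lookup(int cls) {
    return (cls < 7) ? ROUTE_TINY_V3 : ROUTE_LEGACY;
}

static enum route_kind g_route_table[NUM_TINY_CLASSES];
static int g_static_route = -1; /* -1 until route_init() has run */

/* Read the env gate once and freeze one route per class. */
void route_init(void) {
    const char *e = getenv("HAKMEM_TINY_STATIC_ROUTE");
    g_static_route = (e != NULL && strcmp(e, "1") == 0);
    for (int c = 0; c < NUM_TINY_CLASSES; c++) {
        g_route_table[c] = policy_lookup(c);
    }
}

/* Hot path: a single table load when the gate is on. */
enum route_kind route_for_class(int cls) {
    assert(g_static_route >= 0);
    assert(cls >= 0 && cls < NUM_TINY_CLASSES);
    return g_static_route ? g_route_table[cls] : policy_lookup(cls);
}
```

When the gate is off, `route_for_class` falls back to the per-call lookup. The A/B comparison therefore comes down to a single env toggle.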
### C1: TLS Prefetch 🔬 NEUTRAL / FREEZE
- `HAKMEM_TINY_PREFETCH=1` did not raise the Mixed mean (within ±1%) → keep default OFF
- Design notes: `docs/analysis/PHASE3_C1_TLS_PREFETCH_1_DESIGN.md`
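For context, the sketch below shows roughly what a generic TLS freelist prefetch looks like. This document does not say what `HAKMEM_TINY_PREFETCH` actually prefetches, so the freelist layout, `tls_push`/`tls_pop` and the prefetch target are all assumptions. `__builtin_prefetch` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive freelist: each free block stores the next pointer. */
typedef struct free_node {
    struct free_node *next;
} free_node;

/* Per-thread freelist head for one size class (illustrative only). */
static _Thread_local free_node *tls_free_head;

void tls_push(free_node *n) {
    assert(n != NULL);
    n->next = tls_free_head;
    tls_free_head = n;
}

/* Pop one block. With prefetch on, hint the CPU to start loading the new
 * head's cache line, since the next pop will read its `next` field. */
void *tls_pop(int prefetch) {
    free_node *n = tls_free_head;
    if (n == NULL) {
        return NULL;
    }
    tls_free_head = n->next;
    if (prefetch && tls_free_head != NULL) {
        __builtin_prefetch(tls_free_head, 0 /* read */, 3 /* high locality */);
    }
    return n;
}
```

If the next node is usually already in L1 (a hot per-thread freelist), the hint costs an instruction without avoiding a miss. That is one possible reading of a ±1% result, not a measured explanation.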
### C2: Metadata Cache 🔬 NEUTRAL / FREEZE
- `HAKMEM_TINY_METADATA_CACHE=1` did not raise the Mixed mean (within ±1%) → keep default OFF
- Design notes: `docs/analysis/PHASE3_C2_METADATA_CACHE_1_DESIGN.md`
### C4: MIXED MID_V3 Routing Fix ✅ ADOPT (big win)
- Changed the `MIXED_TINYV3_C7_SAFE` default to **MID_V3 OFF** (C6 goes back to the LEGACY path)
- A/B (Mixed, ws=400, 20M iters, 10-run) confirmed **+13%**
### D1: Free Path Route Cache ✅ ADOPT (20-run validated, PROMOTED TO DEFAULT)
- Promoted `HAKMEM_FREE_STATIC_ROUTE=1` to the `MIXED_TINYV3_C7_SAFE` default
- 20-run validation results:
  - Baseline (ROUTE=0): Mean 46.30M ops/s, Median 46.30M ops/s, StdDev 0.10M
  - Optimized (ROUTE=1): Mean 47.32M ops/s, Median 47.39M ops/s, StdDev 0.11M
  - Gain: Mean +2.19%, Median +2.37% (both criteria met)
- Design notes: `docs/analysis/PHASE3_D1_FREE_ROUTE_CACHE_1_DESIGN.md`
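The GO/NO-GO rule used for D1 (mean gain ≥ +1.0% and median gain ≥ +0.0%) comes down to two relative-change checks. Below is a minimal C sketch of it. `gain_pct` and `ab_go` are illustrative names, not hakmem APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Relative change from baseline to candidate, in percent. */
double gain_pct(double baseline, double candidate) {
    assert(baseline > 0.0);
    return (candidate - baseline) / baseline * 100.0;
}

/* D1-style rule: GO if mean gain >= +1.0% and median gain >= +0.0%. */
bool ab_go(double base_mean, double base_median,
           double opt_mean, double opt_median) {
    return gain_pct(base_mean, opt_mean) >= 1.0 &&
           gain_pct(base_median, opt_median) >= 0.0;
}
```

With the rounded D1 figures, `ab_go(46.30, 46.30, 47.32, 47.39)` returns true. The rounded values give about +2.20% mean and +2.35% median; the reported +2.19% / +2.37% were presumably computed from unrounded data. The D2 figures, `ab_go(46.52, 46.47, 45.85, 45.98)`, return false.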
2025-12-13 18:46:11 +09:00
### D2: Wrapper Env Cache ❌ NO-GO / FROZEN
- `HAKMEM_WRAP_ENV_CACHE=1` showed a -1.44% regression
- Root cause: TLS overhead > benefit in Mixed workload
- Status: Research box frozen (default OFF, do not pursue)
- Design notes: `docs/analysis/PHASE3_D2_WRAPPER_ENV_CACHE_1_DESIGN.md`
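The shape of the D2 change can be sketched as follows. Only the names `wrapper_env_cfg`, `wrapper_env_cfg_fast` and `wrapper_env_cfg_t` come from this document. The function bodies are hypothetical and single-threaded for brevity. The point is structural: the TLS layer adds a thread-local load and an extra branch in front of a function that already returns a pointer after one simple check, so the cache has little left to save.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical config; the real wrapper_env_cfg_t holds more fields. */
typedef struct {
    int enabled;
} wrapper_env_cfg_t;

static wrapper_env_cfg_t g_cfg;
static int g_cfg_ready; /* not thread-safe; fine for this sketch */

/* Baseline accessor: one well-predicted branch, then a pointer return. */
const wrapper_env_cfg_t *wrapper_env_cfg(void) {
    if (!g_cfg_ready) {
        g_cfg.enabled = 1; /* stands in for parsing environment variables */
        g_cfg_ready = 1;
    }
    return &g_cfg;
}

/* D2-style layer: cache the same pointer in TLS. Every call now pays a
 * thread-local load plus its own branch before reaching the pointer. */
static _Thread_local const wrapper_env_cfg_t *tls_cfg;

const wrapper_env_cfg_t *wrapper_env_cfg_fast(void) {
    const wrapper_env_cfg_t *p = tls_cfg;
    if (p == NULL) {
        p = wrapper_env_cfg();
        tls_cfg = p;
    }
    assert(p != NULL);
    return p;
}
```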
## Next Instructions (Decisive)
### Step 0: Pin the baseline (Mixed)
1. First, take a 10-run with the current defaults (the reference baseline for comparison).
```bash
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE ./bench_random_mixed_hakmem 20000000 400 1
```
2. For A/B comparisons, strictly keep **the same iter / ws / threads**.
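A small guard can refuse to compare runs whose workload parameters differ. The sketch assumes that the three positional arguments of `bench_random_mixed_hakmem` are iterations, working-set size and thread count, matching the "iter / ws / threads" wording. `bench_params` and `ab_comparable` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical workload parameters; the mapping to the benchmark's
 * positional arguments (iters, ws, threads) is an assumption. */
typedef struct {
    long iters;
    int ws;
    int threads;
} bench_params;

/* Two runs are comparable only if every workload knob matches; the ENV
 * toggle under test should be the only difference between A and B. */
bool ab_comparable(const bench_params *a, const bench_params *b) {
    assert(a != NULL && b != NULL);
    return a->iters == b->iters && a->ws == b->ws && a->threads == b->threads;
}
```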
### Step 1: Make MID_V3 handling the mainline SSOT
Goal: on the Mixed mainline, keep **MID_V3 always OFF**; keep it ON only for C6-heavy.
- `core/bench_profile.h`(プリセット)
|
|
|
|
|
|
- Mixed: `HAKMEM_MID_V3_ENABLED=0`, `HAKMEM_MID_V3_CLASSES=0x0`
- C6-heavy: unchanged, `HAKMEM_MID_V3_ENABLED=1`, `HAKMEM_MID_V3_CLASSES=0x40`
- `docs/analysis/ENV_PROFILE_PRESETS.md` (human-facing SSOT)
- Mixed mainline: state explicitly that MID_V3 is OFF
- C6-heavy: state explicitly that MID_V3 ON is recommended
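The shell sketch below shows how the two presets' MID_V3 settings relate. It assumes that bit *i* of `HAKMEM_MID_V3_CLASSES` enables MID_V3 for size class C*i*. That reading is consistent with C6-heavy using `0x40` (= `1 << 6`), but this document does not confirm it. `mid_v3_mask` and `mid_v3_env` are hypothetical helpers for manual A/B runs. The authoritative presets remain in `core/bench_profile.h`.

```bash
# Hypothetical helper. Assumption: bit i of HAKMEM_MID_V3_CLASSES selects size class Ci.
# Builds the class mask from a list of class indices, e.g. "6" -> 0x40.
mid_v3_mask() {
  mask=0
  for c in "$@"; do
    mask=$(( $mask | (1 << $c) ))
  done
  printf '0x%x\n' "$mask"
}

# Hypothetical helper: print the MID_V3 settings each preset is meant to pin.
mid_v3_env() {
  case "$1" in
    mixed)   echo "HAKMEM_MID_V3_ENABLED=0 HAKMEM_MID_V3_CLASSES=$(mid_v3_mask)" ;;
    c6heavy) echo "HAKMEM_MID_V3_ENABLED=1 HAKMEM_MID_V3_CLASSES=$(mid_v3_mask 6)" ;;
    *)       echo "unknown preset: $1" >&2; return 1 ;;
  esac
}
```

For example, `env $(mid_v3_env c6heavy) ./bench_random_mixed_hakmem 20000000 400 1` runs the benchmark with the C6-heavy MID_V3 settings. The command substitution is left unquoted on purpose so that it splits into two `VAR=value` arguments for `env`.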
2025-12-13 19:01:57 +09:00
GO/NO-GO:
- Mixed (10-run mean): **GO at +1.0% or more** (+13% already observed)
- C6-heavy: reference only (Mixed takes priority)
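The GO/NO-GO rule comes down to a little arithmetic over the per-run ops/s values. The sketch below assumes the results have already been collected as whitespace-separated numbers. It does not cover how to extract them from the benchmark output, and the helper names are hypothetical. `median` is included because the Phase 3 finalization of D1 also required a median gain of at least +0.0%.

```bash
# Arithmetic mean of all whitespace-separated numbers on stdin.
mean() {
  awk '{ for (i = 1; i <= NF; i++) { s += $i; n++ } }
       END { if (n > 0) printf "%.4f\n", s / n }'
}

# Median of all whitespace-separated numbers on stdin.
median() {
  tr -s ' \t' '\n\n' | grep -v '^$' | sort -n |
    awk '{ a[NR] = $1 }
         END {
           if (NR == 0) exit
           if (NR % 2) { printf "%.4f\n", a[(NR + 1) / 2] } else { printf "%.4f\n", (a[NR / 2] + a[NR / 2 + 1]) / 2 }
         }'
}

# Relative gain of optimized ($2) over baseline ($1), in percent.
gain_pct() {
  awk -v b="$1" -v o="$2" 'BEGIN { printf "%.2f\n", (o - b) / b * 100 }'
}

# Mixed rule from this document: GO when the mean gain is >= +1.0%.
decide() {
  awk -v g="$1" 'BEGIN { if (g + 0 >= 1.0) { print "GO" } else { print "NO-GO" } }'
}
```

For instance, the D2 A/B means give `gain_pct 46.52 45.85` → `-1.44`, and `decide -1.44` → `NO-GO`.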
### Step 2: Choose the next bottleneck by the numbers
After turning MID_V3 off, take a fresh perf profile and use it to decide the next core target.
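```bash
# Set the preset in the environment so the benchmark inherits it.
# Placing it after `--` would make perf try to execute "HAKMEM_PROFILE=..." as a command.
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
  perf record -F 99 --call-graph dwarf -- ./bench_random_mixed_hakmem 20000000 400 1
perf report --stdio
```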
Decision rule:
- Any box with self% **below 5% is NO-GO (deferred)**
- Only functions/boxes at 5% or above become candidates for the next phase
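The 5% rule can be applied mechanically to a perf report. This is a minimal sketch. It assumes the report prints one symbol per line with self overhead in the first column, for example via `perf report --stdio --no-children --sort symbol -g none`; check those flags against your perf version. `hot_boxes` is a hypothetical helper, and mapping symbols back to boxes is still a manual step.

```bash
# Hypothetical helper: print "<self%>\t<symbol>" for every line whose first column
# is a percentage at or above the threshold ($1, default 5).
# Assumed input line shape: "    12.50%  [.] tiny_alloc_gate_fast"
hot_boxes() {
  awk -v t="${1:-5}" '
    $1 ~ /^[0-9.]+%$/ {
      v = $1
      sub(/%$/, "", v)
      if (v + 0 >= t + 0) printf "%s%%\t%s\n", v, $NF
    }'
}
```

Piping the report through `hot_boxes 5` then lists only the symbols that clear the threshold. Header lines and call-graph lines are skipped because their first column is not a bare percentage.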
2025-12-13 23:47:19 +09:00
### Step 3: Phase 3 D2 is NO-GO (frozen)
`HAKMEM_WRAP_ENV_CACHE=1` caused a **-1.44% regression**, so it is frozen as a research box (default OFF).
Next: either move on to D3 (alloc side), or wrap up Phase 3 and proceed to the next phase.
### Step 4: Start Phase 3 D3 (Alloc Gate Specialization) only once perf shows it above 5%
Goal: specialize the alloc gate for the fixed configuration of the Mixed mainline and remove branches to gain 1–2%.
- Implementation instructions: `docs/analysis/PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md`
2025-12-14 00:05:11 +09:00
- Design notes (latest): `docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
- ENV: `HAKMEM_ALLOC_GATE_SHAPE=0/1` (default 0)
- Caution: always include a "safe enable" check so that no ENV combination can break it
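The real safe-enable check belongs in the allocator's C initialization path. The shell sketch below only illustrates the decision logic: the specialized gate is used only when every precondition holds, and otherwise it falls back to the generic gate (reported here as `0`). The preconditions are hypothetical. Tying the gate to `MIXED_TINYV3_C7_SAFE` and to MID_V3 being OFF reflects the "fixed Mixed mainline configuration" goal above, not a confirmed list from the design notes.

```bash
# Hypothetical sketch: prints 1 if the specialized alloc gate may be used, else 0.
alloc_gate_shape_effective() {
  # Opt-in flag (default 0, per the design notes).
  if [ "${HAKMEM_ALLOC_GATE_SHAPE:-0}" != 1 ]; then echo 0; return 0; fi
  # Hypothetical precondition: the specialized gate assumes the Mixed mainline preset.
  if [ "${HAKMEM_PROFILE:-}" != MIXED_TINYV3_C7_SAFE ]; then echo 0; return 0; fi
  # Hypothetical precondition: MID_V3 must be OFF, as on the Mixed mainline.
  if [ "${HAKMEM_MID_V3_ENABLED:-0}" != 0 ]; then echo 0; return 0; fi
  echo 1
}
```

Failing closed like this means an unexpected ENV combination only costs the 1–2% the specialization would have gained, instead of breaking correctness.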
## Next candidates (start only if perf shows more than 5%)
1. `tiny_alloc_gate_fast` / `malloc_tiny_fast_for_class` (alloc-side shape optimization)
2. `free_tiny_fast_*` (additional short-circuiting for the second free-side hotspot)
3. Further shortening of the wrapper (`malloc`/`free`) hot entry points (the follow-up to B4)