
Moe Charm (CI) e9b97e9d8e Phase 74-1/74-2: UnifiedCache LOCALIZE optimization (P1 frozen, NEUTRAL -0.87%)

Phase 74-1 (ENV-gated LOCALIZE):
- Result: +0.50% (NEUTRAL)
- Runtime branch overhead caused instructions/branches to increase
- Diagnosed: Branch tax dominates intended optimization

Phase 74-2 (compile-time LOCALIZE):
- Result: -0.87% (NEUTRAL, P1 frozen)
- Removed runtime branch → instructions -0.6%, branches -2.3% ✓
- But cache-misses +86% (register pressure/spill) → net loss
- Conclusion: LOCALIZE itself works, but it is fragile to cache effects

Key finding:
- Dependency chain reduction (LOCALIZE) has low ROI due to cache-miss sensitivity
- P1 (LOCALIZE) frozen at default OFF
- Next: Phase 74-3 (P0: FASTAPI) - move branches outside hot loop

Files:
- core/hakmem_build_flags.h: HAKMEM_TINY_UC_LOCALIZE_COMPILED flag
- core/box/tiny_unified_cache_hitpath_env_box.h: ENV gate (frozen)
- core/front/tiny_unified_cache.h: compile-time #if blocks
- docs/analysis/PHASE74_*: Design, instructions, results
- CURRENT_TASK.md: P1 frozen, P0 next instructions

Also includes:
- Phase 69 refill tuning results (archived docs)
- PERFORMANCE_TARGETS_SCORECARD.md: Phase 69 baseline update
- PHASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md: Route banner docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2025-12-18 07:47:44 +09:00

6.6 KiB


Phase 69-1: Refill Tuning Parameter Sweeps - Results

Date: 2025-12-17
Baseline: Phase 68 PGO (bench_random_mixed_hakmem_minimal_pgo)
Benchmark: scripts/run_mixed_10_cleanenv.sh (RUNS=10)
Goal: Find +3-6% optimization for M2 milestone (55% of mimalloc)

Executive Summary

Winner Identified: Warm Pool Size=16 achieves +3.26% (Strong GO) with ENV-only change.

No code changes required - Deploy via HAKMEM_WARM_POOL_SIZE=16 environment variable
Meets the +3.0% Strong GO criterion (the lower end of the +3-6% M2 goal)
Single strongest improvement among all tested parameters
Combined optimizations are non-additive - Warm Pool Size=16 alone outperforms the tested combination (Warm Pool=16 + Unified Cache=256)

⚠️ Important correction (2025-12 audit): The previously reported “Refill Batch Size sweep” based on TINY_REFILL_BATCH_SIZE was not measuring a real knob. That macro currently has zero call sites (it is defined but not referenced in the active Tiny front path), so any observed deltas were layout/drift noise, not an algorithmic effect.

Full Sweep Results

Baseline (Phase 68 PGO)

Metric	Value
Mean	60.65M ops/s
Median	60.68M ops/s
CV	1.68%
% of mimalloc	50.93%

Runs: 10
Binary: bench_random_mixed_hakmem_minimal_pgo (PGO optimized)
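The CV column is the run-to-run standard deviation divided by the mean. The minimal shell sketch below shows that calculation over per-run throughputs. The function name mean_cv is hypothetical, and the sketch assumes the sample standard deviation (n-1). That may not match what scripts/run_mixed_10_cleanenv.sh actually computes.

```shell
# Hypothetical helper: mean and coefficient of variation (CV) of per-run
# throughputs (M ops/s), one value per argument.
# ASSUMPTION: sample standard deviation (n-1); the real script may differ.
mean_cv() {
  printf '%s\n' "$@" | awk '
    { x[NR] = $1; sum += $1 }
    END {
      if (NR < 2) { print "need at least 2 runs" > "/dev/stderr"; exit 1 }
      mean = sum / NR
      for (i = 1; i <= NR; i++) ss += (x[i] - mean) ^ 2
      sd = sqrt(ss / (NR - 1))
      printf "mean=%.2f cv=%.2f%%\n", mean, sd / mean * 100
    }'
}
```

For example, mean_cv 1 2 3 prints mean=2.00 cv=50.00%.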

1. Warm Pool Size Sweep (ENV-only, no recompile)

Parameter: HAKMEM_WARM_POOL_SIZE (default: 12 SuperSlabs/class)

Size	Mean (M ops/s)	Median (M ops/s)	CV	vs Baseline	Decision
16	62.63	63.38	2.43%	+3.26%	Strong GO ✓✓✓
24	62.37	62.35	1.99%	+2.84%	GO ✓

Winner: Size=16 (+3.26%)

Analysis:

Size=16 exceeds +3.0% Strong GO threshold
Size=24 shows diminishing returns (+2.84% vs +3.26%)
Size=16 appears to be the sweet spot among the tested sizes, presumably balancing cache hit rate against memory overhead (memory use is not reported in this sweep)
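The "vs Baseline" and Decision columns can be reproduced from the means. The sketch below is hypothetical: vs_baseline and decide are not existing project scripts. Only the +3.0% Strong GO threshold comes from this report. The ±1.0% GO/NEUTRAL/NO-GO bands are an assumption, chosen to agree with the results quoted here: +1.89% counted as GO, while +0.50% and -0.87% counted as NEUTRAL in the Phase 74 notes.

```shell
# Percent change of a candidate mean vs the baseline mean, signed, 2 decimals.
vs_baseline() {
  awk -v n="$1" -v b="$2" 'BEGIN { printf "%+.2f\n", (n / b - 1) * 100 }'
}

# Map a percent delta to a decision label.
# From this report: Strong GO at >= +3.0%.
# ASSUMPTION: GO at >= +1.0%, NEUTRAL inside (-1.0%, +1.0%), NO-GO at <= -1.0%.
decide() {
  awk -v d="$1" 'BEGIN {
    d += 0   # force numeric so inputs like "+3.26" compare correctly
    if (d >= 3.0)      print "Strong GO"
    else if (d >= 1.0) print "GO"
    else if (d > -1.0) print "NEUTRAL"
    else               print "NO-GO"
  }'
}
```

For example, vs_baseline 62.63 60.65 prints +3.26, and decide "$(vs_baseline 62.63 60.65)" prints Strong GO.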

Commands Used:

# Size=16
HAKMEM_WARM_POOL_SIZE=16 RUNS=10 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh

# Size=24
HAKMEM_WARM_POOL_SIZE=24 RUNS=10 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh
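Both sizes can be driven from one loop. The following is a minimal sketch, assuming scripts/run_mixed_10_cleanenv.sh prints its summary to stdout/stderr. The function name sweep_warm_pool and the DRY_RUN switch are hypothetical. The log names simply mirror the files listed under Raw Data Files.

```shell
# Hypothetical sweep wrapper around the existing benchmark script.
# DRY_RUN=1 prints each command instead of running it.
# ASSUMPTION: run_mixed_10_cleanenv.sh reports its results on stdout/stderr.
sweep_warm_pool() {
  local bin=./bench_random_mixed_hakmem_minimal_pgo size cmd
  for size in "$@"; do
    cmd="HAKMEM_WARM_POOL_SIZE=$size RUNS=10 BENCH_BIN=$bin scripts/run_mixed_10_cleanenv.sh"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$cmd"
    else
      # Same invocation as the commands above; output kept per size.
      HAKMEM_WARM_POOL_SIZE="$size" RUNS=10 BENCH_BIN="$bin" \
        scripts/run_mixed_10_cleanenv.sh > "/tmp/phase69_warm${size}.log" 2>&1
    fi
  done
}
```

Usage: DRY_RUN=1 sweep_warm_pool 16 24 prints the two commands above. Drop DRY_RUN to run them.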

2. Unified Cache C5-C7 Sweep (ENV-only, no recompile)

Parameter: HAKMEM_TINY_UNIFIED_C5, HAKMEM_TINY_UNIFIED_C6, HAKMEM_TINY_UNIFIED_C7 (default: 128 slots)

Cache Size	Mean (M ops/s)	Median (M ops/s)	CV	vs Baseline	Decision
256	61.92	61.70	1.49%	+2.09%	GO ✓
512	61.80	62.00	1.21%	+1.89%	GO ✓

Winner: Cache=256 (+2.09%)

Analysis:

Cache=256 shows a +2.09% improvement (meets the GO threshold)
Cache=512 shows diminishing returns (+1.89% vs +2.09%)
Larger caches provide marginal gains while increasing memory overhead
CV of 1.49% (below the 1.68% baseline) indicates stable performance

Commands Used:

# Cache=256
HAKMEM_TINY_UNIFIED_C5=256 HAKMEM_TINY_UNIFIED_C6=256 HAKMEM_TINY_UNIFIED_C7=256 RUNS=10 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh

# Cache=512
HAKMEM_TINY_UNIFIED_C5=512 HAKMEM_TINY_UNIFIED_C6=512 HAKMEM_TINY_UNIFIED_C7=512 RUNS=10 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh
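In this sweep C5, C6 and C7 are always set to the same value, so a small helper can keep the three variables from drifting apart. The helper name unified_c5_c7_env is hypothetical and not part of the repository.

```shell
# Hypothetical helper: print the three Unified Cache overrides (C5-C7)
# with one shared slot count, ready to pass to env(1).
unified_c5_c7_env() {
  local slots=$1
  case $slots in
    ''|*[!0-9]*) echo "slots must be a decimal integer" >&2; return 1 ;;
  esac
  printf 'HAKMEM_TINY_UNIFIED_C5=%s HAKMEM_TINY_UNIFIED_C6=%s HAKMEM_TINY_UNIFIED_C7=%s\n' \
    "$slots" "$slots" "$slots"
}
```

Usage: env $(unified_c5_c7_env 256) RUNS=10 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh. The unquoted substitution relies on word splitting, which is safe here because the values contain no spaces.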

3. Combined Optimization Check

Configuration: Warm Pool Size=16 + Unified Cache C5-C7=256

Mean (M ops/s)	Median (M ops/s)	CV	vs Baseline	Decision
62.35	62.32	1.91%	+2.81%	GO (non-additive)

Analysis:

Combined result (+2.81%) is LESS than Warm Pool Size=16 alone (+3.26%)
Non-additive behavior indicates parameters are not orthogonal
Likely explanation: Warm pool optimization reduces unified cache miss rate, making cache capacity increase redundant
Recommendation: Use Warm Pool Size=16 alone for maximum benefit
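Under a simple additive model, the combined gain would equal the sum of the individual gains. Using this report's deltas, 3.26% plus 2.09% would predict about +5.35%, versus the observed +2.81%. The arithmetic can be made explicit with a hypothetical helper, additivity_gap:

```shell
# Compare an observed combined delta (percent) with the sum of two individual
# deltas, i.e. what a purely additive model would predict.
additivity_gap() {
  awk -v a="$1" -v b="$2" -v c="$3" 'BEGIN {
    expected = a + b
    printf "expected=%+.2f observed=%+.2f shortfall=%.2f\n", expected, c + 0, expected - c
  }'
}
```

additivity_gap 3.26 2.09 2.81 prints expected=+5.35 observed=+2.81 shortfall=2.54.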

Command Used:

HAKMEM_WARM_POOL_SIZE=16 HAKMEM_TINY_UNIFIED_C5=256 HAKMEM_TINY_UNIFIED_C6=256 HAKMEM_TINY_UNIFIED_C7=256 RUNS=10 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh

4. Refill Batch Size Sweep (invalid — macro not wired)

The TINY_REFILL_BATCH_SIZE macro is currently define-only:

rg -n "TINY_REFILL_BATCH_SIZE" core
# -> core/hakmem_tiny_config.h only

So we do not treat it as a tuning parameter until it is actually connected to refill logic.
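The rg check above lists every match, including the definition itself. A hypothetical helper, macro_uses, gives a direct "is it wired?" answer instead. It counts matching lines other than #define/#ifdef/#ifndef/#undef lines, and it uses grep, so it also works where rg is not installed. Mentions in comments still count, so a non-zero result needs a manual look.

```shell
# Hypothetical check: count lines under DIR that mention MACRO as a whole word,
# excluding its own #define/#ifdef/#ifndef/#undef lines. 0 means define-only.
# Caveat: mentions in comments still count.
# Exit status follows grep -c (1 when the count is 0).
macro_uses() {
  local macro=$1 dir=$2
  grep -rnw -- "$macro" "$dir" |
    grep -vcE "#[[:space:]]*(define|ifdef|ifndef|undef)[[:space:]]+${macro}"'([^A-Za-z0-9_]|$)'
}
```

Usage: macro_uses TINY_REFILL_BATCH_SIZE core. Per the audit above, this would be expected to print 0 unless the header also mentions the macro in a comment.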

If we want to tune refill frequency, use the real knobs:

HAKMEM_TINY_REFILL_COUNT_HOT
HAKMEM_TINY_REFILL_COUNT_MID
HAKMEM_TINY_REFILL_COUNT / HAKMEM_TINY_REFILL_COUNT_C{0..7}
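For a per-class sweep, the HAKMEM_TINY_REFILL_COUNT_C{0..7} variables can be generated rather than typed. The helper name refill_count_env and the example count of 32 are hypothetical. This report does not document how the per-class counts interact with HAKMEM_TINY_REFILL_COUNT_HOT / _MID or the global HAKMEM_TINY_REFILL_COUNT.

```shell
# Hypothetical helper: print HAKMEM_TINY_REFILL_COUNT_C<k>=COUNT for classes
# FIRST..LAST (defaults 0..7, the range named in the knob list).
refill_count_env() {
  local count=$1 first=${2:-0} last=${3:-7} c out=""
  case "$count:$first:$last" in
    :*|*[!0-9:]*) echo "usage: refill_count_env COUNT [FIRST [LAST]]" >&2; return 1 ;;
  esac
  if [ "$last" -gt 7 ] || [ "$first" -gt "$last" ]; then
    echo "class range must be within 0..7" >&2
    return 1
  fi
  c=$first
  while [ "$c" -le "$last" ]; do
    out="$out HAKMEM_TINY_REFILL_COUNT_C$c=$count"
    c=$((c + 1))
  done
  printf '%s\n' "${out# }"   # drop the leading space
}
```

Usage: env $(refill_count_env 32 5 7) RUNS=10 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh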

Recommendations

Phase 69-2 (Baseline Promotion)

Primary Recommendation: Deploy Warm Pool Size=16 (ENV-only)

Rationale:

Strongest single improvement (+3.26%, Strong GO)
No code changes required - Zero risk of layout tax
Immediate deployment via environment variable
Meets the +3.0% Strong GO criterion (the lower end of the +3-6% M2 goal)

Deployment:

# Add to PGO training environment and benchmark scripts
export HAKMEM_WARM_POOL_SIZE=16
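Below is a variant of the export above. It is an assumption, not what the scripts currently do. The variant applies 16 only as a default, so a single run can still override it for A/B comparisons, and it records the effective value in the log. The function name apply_phase69_defaults is hypothetical.

```shell
# Hypothetical variant: default to the Phase 69-1 winner (16) but keep any
# value the caller already set, then log the effective setting to stderr.
apply_phase69_defaults() {
  export HAKMEM_WARM_POOL_SIZE="${HAKMEM_WARM_POOL_SIZE:-16}"
  echo "[phase69] HAKMEM_WARM_POOL_SIZE=${HAKMEM_WARM_POOL_SIZE}" >&2
}
```

This keeps the change reversible via ENV, as noted under Option B. For example, HAKMEM_WARM_POOL_SIZE=12 would restore the old default for one run.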

Secondary Options (for Phase 69-3+)

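Option A: Warm Pool Size=16 + Refill Batch=32 (invalid until TINY_REFILL_BATCH_SIZE is wired, see section 4)

Combined potential: Unknown (requires testing, may be non-additive like unified cache)
Complexity: Requires PGO rebuild for Batch=32; today that rebuild would change only code layout, not refill behavior
Risk: Layout tax from code change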

Option B: Warm Pool Size=16 alone (recommended)

Gain: +3.26% (measured over 10 runs; CV 2.43%)
Complexity: ENV-only, zero code changes
Risk: Minimal (no code change; reversible via ENV)

Raw Data Files

All 10-run logs saved to:

/tmp/phase69_baseline.log - Phase 68 PGO baseline
/tmp/phase69_warm16.log - Warm Pool Size=16
/tmp/phase69_warm24.log - Warm Pool Size=24
/tmp/phase69_cache256.log - Unified Cache C5-C7=256
/tmp/phase69_cache512.log - Unified Cache C5-C7=512
/tmp/phase69_combined.log - Combined (Warm=16 + Cache=256)
/tmp/phase69_batch32.log - Refill Batch=32 (invalid: macro not wired, see section 4)

Next Steps

Awaiting User Instructions for Phase 69-2:

Confirm Warm Pool Size=16 as baseline promotion candidate
Decide whether to:
- Update ENV defaults in hakmem_tiny_config.h (preferred for SSOT)
- Document as recommended ENV setting in README/docs
- Add to PGO training scripts
Re-run make pgo-fast-full with HAKMEM_WARM_POOL_SIZE=16 in training environment
Update PERFORMANCE_TARGETS_SCORECARD.md with new baseline (projected: 62.63M ops/s, ~52.6% of mimalloc)
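The projected percentage comes from scaling the baseline's 50.93% by the throughput ratio, assuming the mimalloc reference throughput stays fixed. The baseline figures imply a mimalloc reference of about 119.1M ops/s. The helper below is hypothetical.

```shell
# Hypothetical helper: project "% of mimalloc" for a new mean.
#   $1 = new mean, $2 = baseline mean (M ops/s), $3 = baseline % of mimalloc
# ASSUMPTION: the mimalloc reference throughput is unchanged.
project_pct_of_mimalloc() {
  awk -v n="$1" -v b="$2" -v p="$3" 'BEGIN {
    ref = b / (p / 100)          # implied mimalloc throughput (M ops/s)
    printf "mimalloc_ref=%.2f projected_pct=%.2f\n", ref, n / ref * 100
  }'
}
```

project_pct_of_mimalloc 62.63 60.65 50.93 prints mimalloc_ref=119.09 projected_pct=52.59, matching the ~52.6% projection.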

Phase 69-1 Status: ✅ COMPLETE
Winner: Warm Pool Size=16 (+3.26%, Strong GO, ENV-only)
