# Phase 6.11.4: Implementation Guide

**Quick Reference**: Step-by-step implementation for hak_alloc optimization

---

## 🎯 Goal

**Reduce `hak_alloc` overhead**: 126,479 cycles (39.6%) → <70,000 cycles (<22%)

**Target improvement**: **-45% reduction in 2-3 hours**

---

## 📋 Implementation Checklist

### ✅ Phase 6.11.4 (P0-1): Atomic Operation Elimination (30 minutes)

**Expected gain**: -30,000 cycles (-24%)

#### Step 1: Modify hakmem.c

**File**: `apps/experiments/hakmem-poc/hakmem.c:362-369`

```diff
 void* hak_alloc_at(size_t size, hak_callsite_t site) {
     HKM_TIME_START(t0);
     if (!g_initialized) hak_init();

-    // Phase 6.8: Feature-gated evolution tick (every 1024 allocs)
-    if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_EVOLUTION)) {
+    // Phase 6.11.4 (P0-1): Compile-time guard for atomic operation
+    #if HAKMEM_FEATURE_EVOLUTION
         static _Atomic uint64_t tick_counter = 0;
         if ((atomic_fetch_add(&tick_counter, 1) & 0x3FF) == 0) {
-            struct timespec now;
-            clock_gettime(CLOCK_MONOTONIC, &now);
-            uint64_t now_ns = now.tv_sec * 1000000000ULL + now.tv_nsec;
-            hak_evo_tick(now_ns);
+            hak_evo_tick(get_time_ns());
         }
-    }
+    #endif
```

**Key changes**:
1. Replace the runtime check `if (HAK_ENABLED_LEARNING(...))` with the compile-time guard `#if HAKMEM_FEATURE_EVOLUTION`
2. Use the `get_time_ns()` helper instead of inline `clock_gettime` (minor cleanup)
#### Step 2: Add helper function (optional cleanup)

**File**: `apps/experiments/hakmem-poc/hakmem_evo.c`

```c
// Public helper (expose in hakmem_evo.h)
uint64_t get_time_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```

**File**: `apps/experiments/hakmem-poc/hakmem_evo.h`

```c
// Add to public API
uint64_t get_time_ns(void);  // Helper for external callers
```

#### Step 3: Test with Evolution disabled

```bash
# Baseline (with atomic)
cd apps/experiments/hakmem-poc
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem

# Modify hakmem_config.h temporarily
# Change: #define HAKMEM_FEATURE_EVOLUTION 0

# Rebuild and benchmark
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem
```

**Expected output**:

```
Before: hak_alloc: 126,479 cycles (39.6%)
After:  hak_alloc:  96,000 cycles (30.0%)  ← -24% reduction ✅
```

---

### ✅ Phase 6.11.4 (P0-2): Cached Strategy (1-2 hours)

**Expected gain**: -26,000 cycles (-27% additional)

#### Step 1: Add global cache variables

**File**: `apps/experiments/hakmem-poc/hakmem.c:52-60`

```diff
 static int g_initialized = 0;

 // Statistics
 static uint64_t g_malloc_count = 0;  // Used for optimization stats display

-// Phase 6.11: ELO Sampling Rate reduction (1/100 sampling)
-static uint64_t g_elo_call_count = 0;  // Total calls to ELO path
-static int g_cached_strategy_id = -1;  // Cached strategy ID (updated every 100 calls)
+// Phase 6.11.4 (P0-2): Async ELO strategy cache
+static _Atomic int g_cached_strategy_id = 2;   // Default: 2MB threshold (strategy_id=4)
+static _Atomic uint64_t g_elo_generation = 0;  // Invalidation counter
```
#### Step 2: Update hak_alloc logic

**File**: `apps/experiments/hakmem-poc/hakmem.c:377-417`

```diff
 if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_ELO)) {
     // ELO enabled: use strategy selection
     int strategy_id;
     if (hak_evo_is_frozen()) {
         // FROZEN: Use confirmed best strategy (zero overhead)
         strategy_id = hak_evo_get_confirmed_strategy();
         threshold = hak_elo_get_threshold(strategy_id);
     } else if (hak_evo_is_canary()) {
         // CANARY: 5% trial with candidate, 95% with confirmed
         if (hak_evo_should_use_candidate()) {
             strategy_id = hak_evo_get_candidate_strategy();
         } else {
             strategy_id = hak_evo_get_confirmed_strategy();
         }
         threshold = hak_elo_get_threshold(strategy_id);
     } else {
-        // LEARN: ELO operation with 1/100 sampling (Phase 6.11 optimization)
-        g_elo_call_count++;
-
-        // Update strategy every 100 calls (99% overhead reduction)
-        if (g_elo_call_count % 100 == 0 || g_cached_strategy_id == -1) {
-            // Sample: Select strategy using epsilon-greedy (10% exploration, 90% exploitation)
-            strategy_id = hak_elo_select_strategy();
-            g_cached_strategy_id = strategy_id;
-
-            // Record allocation for ELO learning (simplified: no timing yet)
-            hak_elo_record_alloc(strategy_id, size, 0);
-        } else {
-            // Use cached strategy (fast path, no ELO overhead)
-            strategy_id = g_cached_strategy_id;
-        }
+        // Phase 6.11.4 (P0-2): LEARN mode uses cached strategy (updated async)
+        strategy_id = atomic_load(&g_cached_strategy_id);
         threshold = hak_elo_get_threshold(strategy_id);
     }
 } else {
     // ELO disabled: use default threshold (2MB - mimalloc's large threshold)
     threshold = 2097152;  // 2MB
 }
```

#### Step 3: Add async recompute in evo_tick

**File**: `apps/experiments/hakmem-poc/hakmem_evo.c`

**Add new function**:

```c
// Phase 6.11.4 (P0-2): Async ELO strategy recomputation
void hak_elo_async_recompute(void) {
    if (!hak_elo_is_initialized()) return;

    // Re-select best strategy (epsilon-greedy)
    int new_strategy = hak_elo_select_strategy();

    // Update cached strategy
    extern _Atomic int g_cached_strategy_id;  // From hakmem.c
    extern _Atomic uint64_t g_elo_generation;
    int old_strategy = atomic_load(&g_cached_strategy_id);  // Capture BEFORE the store
    atomic_store(&g_cached_strategy_id, new_strategy);
    atomic_fetch_add(&g_elo_generation, 1);  // Invalidate

    fprintf(stderr, "[ELO] Async strategy update: %d → %d (gen=%lu)\n",
            old_strategy, new_strategy,
            atomic_load(&g_elo_generation));
}
```

Note: the old strategy must be loaded before the store; reading `g_cached_strategy_id` after the store would log the new value twice.
"[ELO] Async strategy update: %d → %d (gen=%lu)\n", atomic_load(&g_cached_strategy_id), new_strategy, atomic_load(&g_elo_generation)); } ``` **Call from hak_evo_tick**: ```diff void hak_evo_tick(uint64_t now_ns) { // ... existing logic ... // Close window if conditions met if (should_close) { // ... existing window closure logic ... + // Phase 6.11.4 (P0-2): Recompute ELO strategy (every window) + if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_ELO)) { + hak_elo_async_recompute(); + } // Reset window g_window_ops_count = 0; g_window_start_ns = now_ns; } } ``` **Expose in header**: **File**: `apps/experiments/hakmem-poc/hakmem_evo.h` ```c // Phase 6.11.4 (P0-2): Async ELO update void hak_elo_async_recompute(void); int hak_elo_is_initialized(void); // Helper ``` **File**: `apps/experiments/hakmem-poc/hakmem_elo.c` ```c int hak_elo_is_initialized(void) { return g_initialized; } ``` #### Step 4: Test with Evolution enabled ```bash # Restore HAKMEM_FEATURE_EVOLUTION=1 in hakmem_config.h cd apps/experiments/hakmem-poc HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem HAKMEM_TIMING=1 ./bench_allocators_hakmem ``` **Expected output**: ``` Before (P0-1): hak_alloc: 96,000 cycles (30.0%) After (P0-2): hak_alloc: 70,000 cycles (21.9%) ← -27% additional reduction ✅ Total: 126,479 → 70,000 cycles (-45% total) 🎉 ``` --- ## 🔧 Troubleshooting ### Issue 1: Undefined reference to `g_cached_strategy_id` **Cause**: External variable not declared in header **Fix**: Add to `hakmem_evo.h` or make variables accessible via getter: ```c // Option 1: Getter function (safer) int hak_elo_get_cached_strategy(void); // Option 2: Extern declaration (faster) extern _Atomic int g_cached_strategy_id; ``` ### Issue 2: ELO strategy not updating **Cause**: `hak_elo_async_recompute()` not called **Debug**: ```bash # Add debug prints fprintf(stderr, "[DEBUG] hak_evo_tick called, should_close=%d\n", should_close); ``` ### Issue 3: Race condition on g_elo_generation **Not a problem**: Read-only in hot-path, 
---

## 📊 Validation

### Benchmark all scenarios

```bash
cd apps/experiments/hakmem-poc
./bench_allocators_hakmem
```

**Expected improvements**:

| Scenario | Before (ns) | After (ns) | Reduction |
|----------|-------------|------------|-----------|
| json (64KB) | 298 | **~220** | **-26%** |
| mir (256KB) | 1,698 | **~1,250** | **-26%** |
| vm (2MB) | 15,021 | **~11,000** | **-27%** |

### Profiling validation

```bash
HAKMEM_TIMING=1 ./bench_allocators_hakmem
```

**Expected cycle distribution**:

```
Before:
  hak_alloc:      126,479 cycles (39.6%)  ← Bottleneck
  syscall_munmap: 131,666 cycles (41.3%)

After:
  hak_alloc:       70,000 cycles (27.5%)  ← Reduced! ✅
  syscall_munmap: 131,666 cycles (51.7%)  ← Now #1 bottleneck
```

**Success criterion**: `hak_alloc` < 75,000 cycles (40% reduction)

---

## 🎯 Next Steps After P0-2

### Option A: Stop here (RECOMMENDED)

**Rationale**:
- 45% reduction achieved (126,479 → 70,000 cycles)
- 2-3 hours total investment
- Excellent ROI

**Decision**: Move to **Phase 6.13 (L2.5 Pool mir scenario optimization)**

### Option B: Continue to P2 (Hash Optimization)

**Expected gain**: Additional 10,000 cycles (-14%)
**Time investment**: 2-3 hours
**Priority**: Medium

**Implementation**: See `PHASE_6.11.4_THREADING_COST_ANALYSIS.md` Section 3

---

## 📝 Documentation Updates

After completion, update:

1. **CURRENT_TASK.md**:

   ```markdown
   ## ✅ Phase 6.11.4 complete! (YYYY-MM-DD)

   **Implemented**: hak_alloc optimization (-45% reduction)
   **P0-1**: Atomic operation elimination (-24%)
   **P0-2**: Cached strategy (-27%)
   **Result**: 126,479 → 70,000 cycles (-45%)
   ```

2. **PHASE_6.11.4_COMPLETION_REPORT.md**:
   - Copy template from `PHASE_6.11.3_COMPLETION_REPORT.md`
   - Fill in actual benchmark results
   - Add profiling comparison

---

## 🚀 Quick Start Commands

```bash
# 1. Implement P0-1 (30 min)
vim apps/experiments/hakmem-poc/hakmem.c   # Edit lines 362-369
make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem

# 2. Implement P0-2 (1-2 hrs)
vim apps/experiments/hakmem-poc/hakmem.c       # Edit lines 52-60, 377-417
vim apps/experiments/hakmem-poc/hakmem_evo.c   # Add hak_elo_async_recompute
make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem

# 3. Validate
./bench_allocators_hakmem | tee results_p0.txt
python3 quick_analyze.py results_p0.txt
```
**Total time**: 2-3 hours for **-45% reduction** 🎉