# Phase 6.11.4: Implementation Guide

**Quick Reference**: Step-by-step implementation for hak_alloc optimization

---

## 🎯 Goal

**Reduce `hak_alloc` overhead**: 126,479 cycles (39.6%) → <70,000 cycles (<22%)

**Target improvement**: **45% reduction for 2-3 hours of work**

---

## 📋 Implementation Checklist

### ✅ Phase 6.11.4 (P0-1): Atomic Operation Elimination (30 minutes)

**Expected gain**: -30,000 cycles (-24%)

#### Step 1: Modify hakmem.c

**File**: `apps/experiments/hakmem-poc/hakmem.c:362-369`

```diff
void* hak_alloc_at(size_t size, hak_callsite_t site) {
    HKM_TIME_START(t0);

    if (!g_initialized) hak_init();

-   // Phase 6.8: Feature-gated evolution tick (every 1024 allocs)
-   if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_EVOLUTION)) {
+   // Phase 6.11.4 (P0-1): Compile-time guard for atomic operation
+   #if HAKMEM_FEATURE_EVOLUTION
        static _Atomic uint64_t tick_counter = 0;
        if ((atomic_fetch_add(&tick_counter, 1) & 0x3FF) == 0) {
-           struct timespec now;
-           clock_gettime(CLOCK_MONOTONIC, &now);
-           uint64_t now_ns = now.tv_sec * 1000000000ULL + now.tv_nsec;
-           hak_evo_tick(now_ns);
+           hak_evo_tick(get_time_ns());
        }
-   }
+   #endif
```

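**Key changes**:
1. Replace the runtime check `if (HAK_ENABLED_LEARNING(...))` with the compile-time guard `#if HAKMEM_FEATURE_EVOLUTION`, so the atomic increment is compiled out entirely when the feature is disabled. When the feature is compiled in, the increment now runs on every call regardless of the runtime setting.
2. Use the `get_time_ns()` helper instead of the inline `clock_gettime` call (minor cleanup)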
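
The effect of the guard can be reproduced in isolation. The sketch below is a minimal stand-in, not the hakmem source: `FEATURE_EVOLUTION`, `TICK_MASK`, `on_alloc`, and `g_ticks_fired` are hypothetical names, and the tick body only counts how often it would fire.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

// Hypothetical stand-in for HAKMEM_FEATURE_EVOLUTION; set to 0 to compile
// the whole block (including the atomic increment) out of on_alloc().
#ifndef FEATURE_EVOLUTION
#define FEATURE_EVOLUTION 1
#endif

// Tick every 1024 calls. The mask trick only works for 2^n - 1.
#define TICK_MASK 0x3FFu
static_assert((TICK_MASK & (TICK_MASK + 1u)) == 0, "mask must be 2^n - 1");

// Counts how many times the slow "tick" path ran (illustration only).
static uint64_t g_ticks_fired = 0;

// Called once per allocation. With the feature compiled in, the counter is
// bumped on every call and the tick fires whenever the previous counter
// value has its low 10 bits clear.
static void on_alloc(void) {
#if FEATURE_EVOLUTION
    static _Atomic uint64_t tick_counter = 0;
    if ((atomic_fetch_add(&tick_counter, 1) & TICK_MASK) == 0) {
        g_ticks_fired++;  // stand-in for hak_evo_tick(get_time_ns())
    }
#endif
}
```

Building the same file with `-DFEATURE_EVOLUTION=0` leaves `on_alloc()` empty, which is the case Step 3 measures.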

#### Step 2: Add helper function (optional cleanup)

**File**: `apps/experiments/hakmem-poc/hakmem_evo.c`

```c
// Public helper (expose in hakmem_evo.h)
uint64_t get_time_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```

**File**: `apps/experiments/hakmem-poc/hakmem_evo.h`

```c
// Add to public API
uint64_t get_time_ns(void);  // Helper for external callers
```
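
If the time conversion needs a deterministic test, it can be split from the clock read. This is only a sketch: `timespec_to_ns` and `elapsed_ns` are hypothetical helpers that are not part of the guide's API, and they use the same arithmetic as `get_time_ns()`.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

// Hypothetical helper: convert a timespec to nanoseconds without touching
// the clock, so the arithmetic can be checked with fixed inputs.
static uint64_t timespec_to_ns(struct timespec ts) {
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);  // normalized input
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

// Elapsed nanoseconds between two monotonic timestamps.
// The caller must ensure that `end` does not precede `start`.
static uint64_t elapsed_ns(struct timespec start, struct timespec end) {
    return timespec_to_ns(end) - timespec_to_ns(start);
}
```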

#### Step 3: Test with Evolution disabled

```bash
# Baseline (with atomic)
cd apps/experiments/hakmem-poc
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem

# Modify hakmem_config.h temporarily
# Change: #define HAKMEM_FEATURE_EVOLUTION 0

# Rebuild and benchmark
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem
```

**Expected output**:
```
Before:
  hak_alloc: 126,479 cycles (39.6%)

After:
  hak_alloc: 96,000 cycles (30.0%) ← -24% reduction ✅
```

---

### ✅ Phase 6.11.4 (P0-2): Cached Strategy (1-2 hours)

**Expected gain**: -26,000 cycles (-27% additional)

#### Step 1: Add global cache variables

**File**: `apps/experiments/hakmem-poc/hakmem.c:52-60`

```diff
static int g_initialized = 0;

// Statistics
static uint64_t g_malloc_count = 0;  // Used for optimization stats display

-// Phase 6.11: ELO Sampling Rate reduction (1/100 sampling)
-static uint64_t g_elo_call_count = 0;      // Total calls to ELO path
-static int g_cached_strategy_id = -1;       // Cached strategy ID (updated every 100 calls)
+// Phase 6.11.4 (P0-2): Async ELO strategy cache
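+_Atomic int g_cached_strategy_id = 2;      // Default strategy (intended: 2MB threshold; confirm the index against the ELO strategy table). Not `static`: hakmem_evo.c references it via `extern`
+_Atomic uint64_t g_elo_generation = 0;     // Invalidation counter. Not `static` for the same reason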
```

#### Step 2: Update hak_alloc logic

**File**: `apps/experiments/hakmem-poc/hakmem.c:377-417`

```diff
    if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_ELO)) {
        // ELO enabled: use strategy selection
        int strategy_id;

        if (hak_evo_is_frozen()) {
            // FROZEN: Use confirmed best strategy (zero overhead)
            strategy_id = hak_evo_get_confirmed_strategy();
            threshold = hak_elo_get_threshold(strategy_id);
        } else if (hak_evo_is_canary()) {
            // CANARY: 5% trial with candidate, 95% with confirmed
            if (hak_evo_should_use_candidate()) {
                strategy_id = hak_evo_get_candidate_strategy();
            } else {
                strategy_id = hak_evo_get_confirmed_strategy();
            }
            threshold = hak_elo_get_threshold(strategy_id);
        } else {
-           // LEARN: ELO operation with 1/100 sampling (Phase 6.11 optimization)
-           g_elo_call_count++;
-
-           // Update strategy every 100 calls (99% overhead reduction)
-           if (g_elo_call_count % 100 == 0 || g_cached_strategy_id == -1) {
-               // Sample: Select strategy using epsilon-greedy (10% exploration, 90% exploitation)
-               strategy_id = hak_elo_select_strategy();
-               g_cached_strategy_id = strategy_id;
-
-               // Record allocation for ELO learning (simplified: no timing yet)
-               hak_elo_record_alloc(strategy_id, size, 0);
-           } else {
-               // Use cached strategy (fast path, no ELO overhead)
-               strategy_id = g_cached_strategy_id;
-           }
+           // Phase 6.11.4 (P0-2): LEARN mode uses cached strategy (updated async)
+           strategy_id = atomic_load(&g_cached_strategy_id);
            threshold = hak_elo_get_threshold(strategy_id);
        }
    } else {
        // ELO disabled: use default threshold (2MB - mimalloc's large threshold)
        threshold = 2097152;  // 2MB
    }
```
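
The LEARN-mode hot path now reduces to one atomic load and a table lookup. The sketch below illustrates that shape under stated assumptions: `k_thresholds`, `NUM_STRATEGIES`, `learn_mode_threshold`, and `publish_strategy` are hypothetical, and the table values are placeholders for whatever `hak_elo_get_threshold()` returns. Like the diff above, it no longer calls `hak_elo_record_alloc`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

// Hypothetical threshold table; the real values sit behind
// hak_elo_get_threshold() and are not shown in this guide.
#define NUM_STRATEGIES 5
static const size_t k_thresholds[NUM_STRATEGIES] = {
    512u * 1024u, 1024u * 1024u, 2048u * 1024u, 4096u * 1024u, 8192u * 1024u
};

// Cached strategy index, written only by the cold path.
static _Atomic int g_cached_strategy_id = 2;

// Hot path: one atomic load plus a table lookup. There is no call counter
// and no sampling-interval check.
static size_t learn_mode_threshold(void) {
    int id = atomic_load(&g_cached_strategy_id);
    assert(id >= 0 && id < NUM_STRATEGIES);  // guard the table index
    return k_thresholds[id];
}

// Cold path: publish a newly selected strategy index.
static void publish_strategy(int id) {
    atomic_store(&g_cached_strategy_id, id);
}
```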

#### Step 3: Add async recompute in evo_tick

**File**: `apps/experiments/hakmem-poc/hakmem_evo.c`

**Add new function**:

```c
// Phase 6.11.4 (P0-2): Async ELO strategy recomputation
// Requires <inttypes.h>, <stdatomic.h>, <stdint.h>, and <stdio.h>.
void hak_elo_async_recompute(void) {
    if (!hak_elo_is_initialized()) return;

    // Re-select best strategy (epsilon-greedy)
    int new_strategy = hak_elo_select_strategy();

    // Defined in hakmem.c; they must not be `static` there or this will not link
    extern _Atomic int g_cached_strategy_id;
    extern _Atomic uint64_t g_elo_generation;

    // Publish the new strategy and keep the previous value for the log line
    int old_strategy = atomic_exchange(&g_cached_strategy_id, new_strategy);
    uint64_t gen = atomic_fetch_add(&g_elo_generation, 1) + 1;  // Invalidate

    // PRIu64 is the portable format for uint64_t (%lu is wrong on some targets)
    fprintf(stderr, "[ELO] Async strategy update: %d → %d (gen=%" PRIu64 ")\n",
            old_strategy, new_strategy, gen);
}
```

**Call from hak_evo_tick**:

```diff
void hak_evo_tick(uint64_t now_ns) {
    // ... existing logic ...

    // Close window if conditions met
    if (should_close) {
        // ... existing window closure logic ...

+       // Phase 6.11.4 (P0-2): Recompute ELO strategy (every window)
+       if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_ELO)) {
+           hak_elo_async_recompute();
+       }

        // Reset window
        g_window_ops_count = 0;
        g_window_start_ns = now_ns;
    }
}
```
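
The window-close trigger can be sketched on its own to see how often the recompute runs. The closing conditions are elided in the guide, so the limits here (`WINDOW_MAX_OPS`, `WINDOW_MAX_NS`) are assumptions. `recompute_stub` and `evo_tick_sketch` are also hypothetical, and the stub rotates the index instead of running epsilon-greedy selection.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

// Assumed window limits; the real closing conditions are not in this guide.
#define WINDOW_MAX_OPS 4u
#define WINDOW_MAX_NS  1000000000ULL  // 1 second

static uint64_t g_window_ops_count = 0;
static uint64_t g_window_start_ns = 0;
static _Atomic int g_cached_strategy_id = 2;
static int g_recompute_calls = 0;  // illustration only

// Stand-in for hak_elo_async_recompute(): rotates through five indices
// instead of selecting a strategy from ELO ratings.
static void recompute_stub(void) {
    int next = (atomic_load(&g_cached_strategy_id) + 1) % 5;
    atomic_store(&g_cached_strategy_id, next);
    g_recompute_calls++;
}

// One tick: count the operation, close the window when either limit is hit,
// recompute once per closed window, then reset the window state.
static void evo_tick_sketch(uint64_t now_ns) {
    assert(now_ns >= g_window_start_ns);  // monotonic clock expected
    g_window_ops_count++;
    int should_close = g_window_ops_count >= WINDOW_MAX_OPS ||
                       now_ns - g_window_start_ns >= WINDOW_MAX_NS;
    if (should_close) {
        recompute_stub();
        g_window_ops_count = 0;
        g_window_start_ns = now_ns;
    }
}
```

Because the tick itself fires only once per 1024 allocations, the recompute cost is spread over many calls to `hak_alloc_at`.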

**Expose in header**:

**File**: `apps/experiments/hakmem-poc/hakmem_evo.h`

```c
// Phase 6.11.4 (P0-2): Async ELO update
void hak_elo_async_recompute(void);
int hak_elo_is_initialized(void);  // Helper
```

**File**: `apps/experiments/hakmem-poc/hakmem_elo.c`

```c
int hak_elo_is_initialized(void) {
    return g_initialized;
}
```

#### Step 4: Test with Evolution enabled

```bash
# Restore HAKMEM_FEATURE_EVOLUTION=1 in hakmem_config.h
cd apps/experiments/hakmem-poc
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem
```

**Expected output**:
```
Before (P0-1):
  hak_alloc: 96,000 cycles (30.0%)

After (P0-2):
  hak_alloc: 70,000 cycles (21.9%) ← -27% additional reduction ✅

Total:
  126,479 → 70,000 cycles (-45% total) 🎉
```
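
The -24% and -27% figures are each relative to their own starting point, which is why they combine to about -45% rather than -51%. The arithmetic can be checked with a small sketch (`reduction_pct` and `combined_pct` are hypothetical helper names):

```c
#include <assert.h>

// Percentage reduction from `before` to `after` (both in cycles).
static double reduction_pct(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

// Combined reduction of two successive stages, each given in percent.
// Successive stages compose multiplicatively, not additively.
static double combined_pct(double first_pct, double second_pct) {
    double remaining = (1.0 - first_pct / 100.0) * (1.0 - second_pct / 100.0);
    return (1.0 - remaining) * 100.0;
}
```

With the expected figures, `reduction_pct(126479, 96000)` is about 24.1, `reduction_pct(96000, 70000)` is about 27.1, and the two stages combined come to about 44.7.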

---

## 🔧 Troubleshooting

### Issue 1: Undefined reference to `g_cached_strategy_id`

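**Cause**: The variable is defined `static` in `hakmem.c` (internal linkage), so the `extern` declaration in `hakmem_evo.c` cannot be resolved at link time

**Fix**: Either keep the variable `static` and expose it through accessor functions, or drop `static` from the definition and declare it in `hakmem_evo.h`: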

```c
// Option 1: Getter function (safer)
int hak_elo_get_cached_strategy(void);

// Option 2: Extern declaration (faster)
extern _Atomic int g_cached_strategy_id;
```
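
A minimal sketch of Option 1, assuming the variable stays private to one translation unit. The getter name comes from the snippet above. The setter `hak_elo_set_cached_strategy` and the bound `HAK_ELO_NUM_STRATEGIES` are hypothetical additions, needed so that the recompute path can still write the cache.

```c
#include <assert.h>
#include <stdatomic.h>

// The number of strategies is an assumption made for this sketch.
#define HAK_ELO_NUM_STRATEGIES 5

// Stays `static`: only this translation unit touches the variable directly.
static _Atomic int g_cached_strategy_id = 2;

// Getter named as in Option 1; used by the allocation hot path.
int hak_elo_get_cached_strategy(void) {
    return atomic_load(&g_cached_strategy_id);
}

// Hypothetical setter (not in the guide) for the cold path to call.
void hak_elo_set_cached_strategy(int strategy_id) {
    assert(strategy_id >= 0 && strategy_id < HAK_ELO_NUM_STRATEGIES);
    atomic_store(&g_cached_strategy_id, strategy_id);
}
```

The trade-off is a function call on the hot path unless the compiler can inline across translation units, which is why the guide labels the `extern` route as faster.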

### Issue 2: ELO strategy not updating

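**Cause**: `hak_elo_async_recompute()` is not being called, for example because `HAKMEM_FEATURE_EVOLUTION` is still set to 0 from the P0-1 test (the tick is then compiled out) or because the window never closes

**Debug**:
```c
// Temporary debug print: add inside hak_evo_tick(), after should_close is computed
fprintf(stderr, "[DEBUG] hak_evo_tick called, should_close=%d\n", should_close);
```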

### Issue 3: Race condition on g_elo_generation

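**Not a problem**: every access is atomic. The hot path only performs an atomic load of `g_cached_strategy_id`, and the cold path updates `g_elo_generation` with `atomic_fetch_add`.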

---

## 📊 Validation

### Benchmark all scenarios

```bash
cd apps/experiments/hakmem-poc
./bench_allocators_hakmem
```

**Expected improvements**:

| Scenario | Before (ns) | After (ns) | Reduction |
|----------|-------------|------------|-----------|
| json (64KB) | 298 | **~220** | **-26%** |
| mir (256KB) | 1,698 | **~1,250** | **-26%** |
| vm (2MB) | 15,021 | **~11,000** | **-27%** |

### Profiling validation

```bash
HAKMEM_TIMING=1 ./bench_allocators_hakmem
```

**Expected cycle distribution**:

```
Before:
  hak_alloc:       126,479 cycles (39.6%)  ← Bottleneck
  syscall_munmap:  131,666 cycles (41.3%)

After:
  hak_alloc:        70,000 cycles (27.5%)  ← Reduced! ✅
  syscall_munmap:  131,666 cycles (51.7%)  ← Now #1 bottleneck
```

**Success criterion**: `hak_alloc` < 75,000 cycles (40% reduction)

---

## 🎯 Next Steps After P0-2

### Option A: Stop here (RECOMMENDED)

**Rationale**:
- 45% reduction achieved (126,479 → 70,000 cycles)
- 2-3 hours total investment
- Excellent ROI

**Decision**: Move to **Phase 6.13 (L2.5 Pool mir scenario optimization)**

### Option B: Continue to P2 (Hash Optimization)

**Expected gain**: Additional 10,000 cycles (-14%)
**Time investment**: 2-3 hours
**Priority**: Medium

**Implementation**: See `PHASE_6.11.4_THREADING_COST_ANALYSIS.md` Section 3

---

## 📝 Documentation Updates

After completion, update:

1. **CURRENT_TASK.md**:
   ```markdown
   ## ✅ Phase 6.11.4 Complete! (YYYY-MM-DD)

   **Implementation complete**: hak_alloc optimization (-45% reduction)

   **P0-1**: Atomic operation elimination (-24%)
   **P0-2**: Cached strategy (-27%)

   **Result**: 126,479 → 70,000 cycles (-45%)
   ```

2. **PHASE_6.11.4_COMPLETION_REPORT.md**:
   - Copy template from `PHASE_6.11.3_COMPLETION_REPORT.md`
   - Fill in actual benchmark results
   - Add profiling comparison

---

## 🚀 Quick Start Commands

```bash
# 1. Implement P0-1 (30 min)
vim apps/experiments/hakmem-poc/hakmem.c      # Edit lines 362-369
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem

# 2. Implement P0-2 (1-2 hrs)
vim apps/experiments/hakmem-poc/hakmem.c      # Edit lines 52-60, 377-417
vim apps/experiments/hakmem-poc/hakmem_evo.c  # Add hak_elo_async_recompute
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem

# 3. Validate
./bench_allocators_hakmem | tee results_p0.txt
python3 quick_analyze.py results_p0.txt
```

**Total time**: 2-3 hours for a **45% reduction** 🎉