hakmem/docs/design/PHASE_6.11.4_IMPLEMENTATION_GUIDE.md
Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History
2025-11-05 12:31:14 +09:00

# Phase 6.11.4: Implementation Guide
**Quick Reference**: Step-by-step implementation of the `hak_alloc` optimization

---
## 🎯 Goal
**Reduce `hak_alloc` overhead**: 126,479 cycles (39.6%) → <70,000 cycles (<22%)

**Target improvement**: **45% reduction in 2-3 hours**

---
## 📋 Implementation Checklist
### ✅ Phase 6.11.4 (P0-1): Atomic Operation Elimination (30 minutes)
**Expected gain**: -30,000 cycles (-24%)
#### Step 1: Modify hakmem.c
**File**: `apps/experiments/hakmem-poc/hakmem.c:362-369`
```diff
void* hak_alloc_at(size_t size, hak_callsite_t site) {
HKM_TIME_START(t0);
if (!g_initialized) hak_init();
- // Phase 6.8: Feature-gated evolution tick (every 1024 allocs)
- if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_EVOLUTION)) {
+ // Phase 6.11.4 (P0-1): Compile-time guard for atomic operation
+ #if HAKMEM_FEATURE_EVOLUTION
static _Atomic uint64_t tick_counter = 0;
if ((atomic_fetch_add(&tick_counter, 1) & 0x3FF) == 0) {
- struct timespec now;
- clock_gettime(CLOCK_MONOTONIC, &now);
- uint64_t now_ns = now.tv_sec * 1000000000ULL + now.tv_nsec;
- hak_evo_tick(now_ns);
+ hak_evo_tick(get_time_ns());
}
- }
+ #endif
```
**Key changes**:
1. Replace runtime check `if (HAK_ENABLED_LEARNING(...))` with compile-time `#if HAKMEM_FEATURE_EVOLUTION`
2. Use `get_time_ns()` helper instead of inline `clock_gettime` (minor cleanup)
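
The effect of the first change can be sketched in isolation. The block below is a minimal illustration, not the project's code: `DEMO_FEATURE_EVOLUTION` and `demo_should_tick` are hypothetical stand-ins for `HAKMEM_FEATURE_EVOLUTION` and the tick gate inside `hak_alloc_at`. It shows how the `& 0x3FF` mask fires once every 1024 calls, and how a build with the flag set to 0 contains no counter at all.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for HAKMEM_FEATURE_EVOLUTION.
 * Build with -DDEMO_FEATURE_EVOLUTION=0 to compile the counter out. */
#ifndef DEMO_FEATURE_EVOLUTION
#define DEMO_FEATURE_EVOLUTION 1
#endif

/* Returns 1 when the caller should run the periodic tick, 0 otherwise. */
int demo_should_tick(void) {
#if DEMO_FEATURE_EVOLUTION
    static _Atomic uint64_t tick_counter = 0;
    /* atomic_fetch_add returns the previous value, so the gate opens on
     * calls 1, 1025, 2049, ... (whenever the low 10 bits were zero). */
    return (atomic_fetch_add(&tick_counter, 1) & 0x3FF) == 0;
#else
    /* Feature disabled at build time: no counter, no atomic instruction. */
    return 0;
#endif
}
```

With the flag enabled, the atomic increment still runs on every call; only a build with the flag disabled removes it, which is what the Step 3 comparison measures.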
#### Step 2: Add helper function (optional cleanup)
**File**: `apps/experiments/hakmem-poc/hakmem_evo.c`
```c
// Public helper (expose in hakmem_evo.h)
uint64_t get_time_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```
**File**: `apps/experiments/hakmem-poc/hakmem_evo.h`
```c
// Add to public API
uint64_t get_time_ns(void); // Helper for external callers
```
#### Step 3: Test with Evolution disabled
```bash
# Baseline (with atomic)
cd apps/experiments/hakmem-poc
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem
# Modify hakmem_config.h temporarily
# Change: #define HAKMEM_FEATURE_EVOLUTION 0
# Rebuild and benchmark
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem
```
**Expected output**:
```
Before:
hak_alloc: 126,479 cycles (39.6%)
After:
hak_alloc: 96,000 cycles (30.0%) ← -24% reduction ✅
```
---
### ✅ Phase 6.11.4 (P0-2): Cached Strategy (1-2 hours)
**Expected gain**: -26,000 cycles (-27% additional)
#### Step 1: Add global cache variables
**File**: `apps/experiments/hakmem-poc/hakmem.c:52-60`
```diff
static int g_initialized = 0;
// Statistics
static uint64_t g_malloc_count = 0; // Used for optimization stats display
-// Phase 6.11: ELO Sampling Rate reduction (1/100 sampling)
-static uint64_t g_elo_call_count = 0; // Total calls to ELO path
-static int g_cached_strategy_id = -1; // Cached strategy ID (updated every 100 calls)
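+// Phase 6.11.4 (P0-2): Async ELO strategy cache
+// Not `static`: hakmem_evo.c references both variables via `extern`.
+_Atomic int g_cached_strategy_id = 2;  // Default: strategy for the 2MB threshold (verify the index against the ELO strategy table)
+_Atomic uint64_t g_elo_generation = 0; // Invalidation counter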
```
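
This guide does not show a reader of the invalidation counter, so the following is only a sketch of one way such a counter could be used. It assumes a hypothetical strategy table (`demo_thresholds`) and hypothetical names throughout (`demo_publish`, `demo_threshold`). The cold path publishes a new strategy and bumps the generation. The hot path keeps a thread-local copy of the derived threshold and refreshes it only when the generation has moved.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Shared state, mirroring the two globals added in Step 1 (names are hypothetical). */
_Atomic int demo_cached_strategy_id = 2;
_Atomic uint64_t demo_generation = 0;

/* Hypothetical strategy table: index -> size threshold in bytes. */
static const size_t demo_thresholds[] = { 524288, 1048576, 2097152, 4194304 };

/* Cold path: store the new strategy first, then bump the generation.
 * The caller must pass an index that is valid for demo_thresholds. */
void demo_publish(int new_id) {
    atomic_store(&demo_cached_strategy_id, new_id);
    atomic_fetch_add(&demo_generation, 1);
}

/* Hot path: refresh the thread-local threshold only when the generation changed. */
size_t demo_threshold(void) {
    static _Thread_local uint64_t seen_gen = UINT64_MAX; /* forces the first refresh */
    static _Thread_local size_t local_threshold = 0;

    uint64_t gen = atomic_load(&demo_generation);
    if (gen != seen_gen) {
        local_threshold = demo_thresholds[atomic_load(&demo_cached_strategy_id)];
        seen_gen = gen;
    }
    return local_threshold;
}
```

Because the strategy is stored before the generation is incremented, a reader that observes the new generation also observes the new strategy. A reader that still sees the old generation simply keeps its previous threshold until its next call.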
#### Step 2: Update hak_alloc logic
**File**: `apps/experiments/hakmem-poc/hakmem.c:377-417`
```diff
if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_ELO)) {
// ELO enabled: use strategy selection
int strategy_id;
if (hak_evo_is_frozen()) {
// FROZEN: Use confirmed best strategy (zero overhead)
strategy_id = hak_evo_get_confirmed_strategy();
threshold = hak_elo_get_threshold(strategy_id);
} else if (hak_evo_is_canary()) {
// CANARY: 5% trial with candidate, 95% with confirmed
if (hak_evo_should_use_candidate()) {
strategy_id = hak_evo_get_candidate_strategy();
} else {
strategy_id = hak_evo_get_confirmed_strategy();
}
threshold = hak_elo_get_threshold(strategy_id);
} else {
- // LEARN: ELO operation with 1/100 sampling (Phase 6.11 optimization)
- g_elo_call_count++;
-
- // Update strategy every 100 calls (99% overhead reduction)
- if (g_elo_call_count % 100 == 0 || g_cached_strategy_id == -1) {
- // Sample: Select strategy using epsilon-greedy (10% exploration, 90% exploitation)
- strategy_id = hak_elo_select_strategy();
- g_cached_strategy_id = strategy_id;
-
- // Record allocation for ELO learning (simplified: no timing yet)
- hak_elo_record_alloc(strategy_id, size, 0);
- } else {
- // Use cached strategy (fast path, no ELO overhead)
- strategy_id = g_cached_strategy_id;
- }
+ // Phase 6.11.4 (P0-2): LEARN mode uses cached strategy (updated async)
+ strategy_id = atomic_load(&g_cached_strategy_id);
threshold = hak_elo_get_threshold(strategy_id);
}
} else {
// ELO disabled: use default threshold (2MB - mimalloc's large threshold)
threshold = 2097152; // 2MB
}
```
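
After this change the strategy choice is a three-way decision with no learning work on the allocation path. The sketch below restates that decision as a pure function so it can be checked in isolation. All names are hypothetical, and the 1-in-20 rule is only an assumption about how a 5% canary trial might be implemented; the real `hak_evo_should_use_candidate()` is not shown in this guide.

```c
#include <stdint.h>

/* Hypothetical mirror of the three evolution modes used in hak_alloc. */
typedef enum { DEMO_LEARN, DEMO_FROZEN, DEMO_CANARY } demo_mode_t;

/* Pick a strategy ID without touching any learning state.
 *   confirmed: best strategy confirmed so far
 *   candidate: strategy under canary trial
 *   cached:    value the hot path would read from g_cached_strategy_id
 *   call_no:   running call number, used only for the assumed 5% split */
int demo_pick_strategy(demo_mode_t mode, int confirmed, int candidate,
                       int cached, uint64_t call_no) {
    switch (mode) {
    case DEMO_FROZEN:
        return confirmed;                          /* always the confirmed best */
    case DEMO_CANARY:
        /* Assumption: every 20th call (5%) tries the candidate. */
        return (call_no % 20 == 0) ? candidate : confirmed;
    case DEMO_LEARN:
    default:
        return cached;                             /* refreshed asynchronously */
    }
}
```

For example, `demo_pick_strategy(DEMO_LEARN, 3, 5, 2, 7)` returns the cached value 2 regardless of the call number, which is the property the LEARN branch gains in this step.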
#### Step 3: Add async recompute in evo_tick
**File**: `apps/experiments/hakmem-poc/hakmem_evo.c`
**Add new function**:
```c
// Phase 6.11.4 (P0-2): Async ELO strategy recomputation
// Requires <inttypes.h>, <stdatomic.h>, and <stdio.h>.
extern _Atomic int g_cached_strategy_id;   // Defined in hakmem.c
extern _Atomic uint64_t g_elo_generation;  // Defined in hakmem.c

void hak_elo_async_recompute(void) {
    if (!hak_elo_is_initialized()) return;

    // Re-select best strategy (epsilon-greedy)
    int new_strategy = hak_elo_select_strategy();

    // Publish the new strategy; keep the old value for the log line
    int old_strategy = atomic_exchange(&g_cached_strategy_id, new_strategy);
    uint64_t gen = atomic_fetch_add(&g_elo_generation, 1) + 1;  // Invalidate

    fprintf(stderr, "[ELO] Async strategy update: %d → %d (gen=%" PRIu64 ")\n",
            old_strategy, new_strategy, gen);
}
```
**Call from hak_evo_tick**:
```diff
void hak_evo_tick(uint64_t now_ns) {
// ... existing logic ...
// Close window if conditions met
if (should_close) {
// ... existing window closure logic ...
+ // Phase 6.11.4 (P0-2): Recompute ELO strategy (every window)
+ if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_ELO)) {
+ hak_elo_async_recompute();
+ }
// Reset window
g_window_ops_count = 0;
g_window_start_ns = now_ns;
}
}
```
**Expose in header**:
**File**: `apps/experiments/hakmem-poc/hakmem_evo.h`
```c
// Phase 6.11.4 (P0-2): Async ELO update
void hak_elo_async_recompute(void);
int hak_elo_is_initialized(void); // Helper
```
**File**: `apps/experiments/hakmem-poc/hakmem_elo.c`
```c
int hak_elo_is_initialized(void) {
    return g_initialized;
}
```
#### Step 4: Test with Evolution enabled
```bash
# Restore HAKMEM_FEATURE_EVOLUTION=1 in hakmem_config.h
cd apps/experiments/hakmem-poc
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem
```
**Expected output**:
```
Before (P0-1):
hak_alloc: 96,000 cycles (30.0%)
After (P0-2):
hak_alloc: 70,000 cycles (21.9%) ← -27% additional reduction ✅
Total:
126,479 → 70,000 cycles (-45% total) 🎉
```
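
The two stage figures (-24% and -27%) do not add up to -45% by simple addition, because the second reduction applies to the already reduced cycle count. The helper functions below are purely illustrative (the `demo_` names are not part of the project) and reproduce the arithmetic from the expected numbers above: 126,479 → 96,000 is about 24.1%, 96,000 → 70,000 is about 27.1%, and the two compound to about 44.7%, which the guide rounds to 45%.

```c
/* Fractional reduction going from `before` to `after` (0.24 means -24%). */
double demo_reduction(double before, double after) {
    return (before - after) / before;
}

/* Two successive reductions compound multiplicatively, not additively:
 * the remaining fraction is (1 - r1) * (1 - r2). */
double demo_combined_reduction(double r1, double r2) {
    return 1.0 - (1.0 - r1) * (1.0 - r2);
}
```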
---
## 🔧 Troubleshooting
### Issue 1: Undefined reference to `g_cached_strategy_id`
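**Cause**: The variable is defined `static` in `hakmem.c` (internal linkage), or it is not declared where `hakmem_evo.c` can see it
**Fix**: Remove `static` from the definition and declare the variable in `hakmem_evo.h`, or expose it through a getter: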
```c
// Option 1: Getter function (safer)
int hak_elo_get_cached_strategy(void);
// Option 2: Extern declaration (faster)
extern _Atomic int g_cached_strategy_id;
```
### Issue 2: ELO strategy not updating
**Cause**: `hak_elo_async_recompute()` not called
**Debug**:
```c
// Add a debug print inside hak_evo_tick()
fprintf(stderr, "[DEBUG] hak_evo_tick called, should_close=%d\n", should_close);
```
### Issue 3: Race condition on g_elo_generation
**Not a problem**: `g_elo_generation` is at most read on the hot path and is only modified by an atomic increment on the cold path, so no data race occurs.

---
## 📊 Validation
### Benchmark all scenarios
```bash
cd apps/experiments/hakmem-poc
./bench_allocators_hakmem
```
**Expected improvements**:
| Scenario | Before (ns) | After (ns) | Reduction |
|----------|-------------|------------|-----------|
| json (64KB) | 298 | **~220** | **-26%** |
| mir (256KB) | 1,698 | **~1,250** | **-26%** |
| vm (2MB) | 15,021 | **~11,000** | **-27%** |
### Profiling validation
```bash
HAKMEM_TIMING=1 ./bench_allocators_hakmem
```
**Expected cycle distribution**:
```
Before:
hak_alloc: 126,479 cycles (39.6%) ← Bottleneck
syscall_munmap: 131,666 cycles (41.3%)
After:
hak_alloc: 70,000 cycles (27.5%) ← Reduced! ✅
syscall_munmap: 131,666 cycles (51.7%) ← Now #1 bottleneck
```
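**Success criterion**: `hak_alloc` < 75,000 cycles (roughly a 40% reduction from the 126,479-cycle baseline). The shares above are relative to the reduced total, which is why 70,000 cycles shows as 27.5% here rather than the 21.9% quoted against the baseline total in P0-2.

---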
## 🎯 Next Steps After P0-2
### Option A: Stop here (RECOMMENDED)
**Rationale**:
- 45% reduction achieved (126,479 → 70,000 cycles)
- 2-3 hours total investment
- Excellent ROI
**Decision**: Move to **Phase 6.13 (L2.5 Pool mir scenario optimization)**
### Option B: Continue to P2 (Hash Optimization)
**Expected gain**: Additional 10,000 cycles (-14%)
**Time investment**: 2-3 hours
**Priority**: Medium
**Implementation**: See `PHASE_6.11.4_THREADING_COST_ANALYSIS.md` Section 3

---
## 📝 Documentation Updates
After completion, update:
1. **CURRENT_TASK.md**:
```markdown
## ✅ Phase 6.11.4 Complete (YYYY-MM-DD)
**Implementation complete**: hak_alloc optimization (45% reduction)
**P0-1**: Atomic operation elimination (-24%)
**P0-2**: Cached strategy (-27%)
**Result**: 126,479 → 70,000 cycles (-45%)
```
2. **PHASE_6.11.4_COMPLETION_REPORT.md**:
- Copy template from `PHASE_6.11.3_COMPLETION_REPORT.md`
- Fill in actual benchmark results
- Add profiling comparison
---
## 🚀 Quick Start Commands
```bash
# 1. Implement P0-1 (30 min)
vim apps/experiments/hakmem-poc/hakmem.c # Edit lines 362-369
make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem
# 2. Implement P0-2 (1-2 hrs)
vim apps/experiments/hakmem-poc/hakmem.c # Edit lines 52-60, 377-417
vim apps/experiments/hakmem-poc/hakmem_evo.c # Add hak_elo_async_recompute
make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem
# 3. Validate
./bench_allocators_hakmem | tee results_p0.txt
python3 quick_analyze.py results_p0.txt
```
**Total time**: 2-3 hours for a **45% reduction** 🎉