# Phase 6.11.4: Implementation Guide

**Quick Reference**: Step-by-step implementation for hak_alloc optimization

---

## 🎯 Goal

**Reduce `hak_alloc` overhead**: 126,479 cycles (39.6%) → <70,000 cycles (<22%)

**Target improvement**: **-45% reduction in 2-3 hours**

---

## 📋 Implementation Checklist

### ✅ Phase 6.11.4 (P0-1): Atomic Operation Elimination (30 minutes)

**Expected gain**: -30,000 cycles (-24%)

#### Step 1: Modify hakmem.c

**File**: `apps/experiments/hakmem-poc/hakmem.c:362-369`

```diff
 void* hak_alloc_at(size_t size, hak_callsite_t site) {
     HKM_TIME_START(t0);

     if (!g_initialized) hak_init();

-    // Phase 6.8: Feature-gated evolution tick (every 1024 allocs)
-    if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_EVOLUTION)) {
+    // Phase 6.11.4 (P0-1): Compile-time guard for atomic operation
+    #if HAKMEM_FEATURE_EVOLUTION
         static _Atomic uint64_t tick_counter = 0;
         if ((atomic_fetch_add(&tick_counter, 1) & 0x3FF) == 0) {
-            struct timespec now;
-            clock_gettime(CLOCK_MONOTONIC, &now);
-            uint64_t now_ns = now.tv_sec * 1000000000ULL + now.tv_nsec;
-            hak_evo_tick(now_ns);
+            hak_evo_tick(get_time_ns());
         }
-    }
+    #endif
```

**Key changes**:

1. Replace runtime check `if (HAK_ENABLED_LEARNING(...))` with compile-time `#if HAKMEM_FEATURE_EVOLUTION`, so the `atomic_fetch_add` is compiled out entirely when the feature is disabled
2. Use `get_time_ns()` helper instead of inline `clock_gettime` (minor cleanup)

#### Step 2: Add helper function (required by the Step 1 diff)

The Step 1 diff calls `get_time_ns()`, so apply this step together with Step 1. If you skip it, keep the inline `clock_gettime` code in `hak_alloc_at` instead.

**File**: `apps/experiments/hakmem-poc/hakmem_evo.c`

```c
#include <stdint.h>
#include <time.h>

// Public helper (expose in hakmem_evo.h)
uint64_t get_time_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```

**File**: `apps/experiments/hakmem-poc/hakmem_evo.h`

```c
// Add to public API
uint64_t get_time_ns(void);  // Helper for external callers
```

#### Step 3: Test with Evolution disabled

```bash
# Baseline (with atomic)
cd apps/experiments/hakmem-poc
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem

# Modify hakmem_config.h temporarily
# Change: #define HAKMEM_FEATURE_EVOLUTION 0

# Rebuild and benchmark
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem
```

**Expected output** (projected; the 30.0% share is computed against the baseline total cycle count):

```
Before:
  hak_alloc: 126,479 cycles (39.6%)

After:
  hak_alloc: 96,000 cycles (30.0%)  ← -24% reduction ✅
```

---

### ✅ Phase 6.11.4 (P0-2): Cached Strategy (1-2 hours)

**Expected gain**: -26,000 cycles (-27% additional)

#### Step 1: Add global cache variables

**File**: `apps/experiments/hakmem-poc/hakmem.c:52-60`

```diff
 static int g_initialized = 0;

 // Statistics
 static uint64_t g_malloc_count = 0;  // Used for optimization stats display

-// Phase 6.11: ELO Sampling Rate reduction (1/100 sampling)
-static uint64_t g_elo_call_count = 0;  // Total calls to ELO path
-static int g_cached_strategy_id = -1;  // Cached strategy ID (updated every 100 calls)
+// Phase 6.11.4 (P0-2): Async ELO strategy cache
+// Not `static`: hakmem_evo.c refers to both variables via `extern` (Step 3)
+_Atomic int g_cached_strategy_id = 2;   // Default: 2MB threshold (strategy_id=4)
+_Atomic uint64_t g_elo_generation = 0;  // Invalidation counter
```

The initializer `2` and the comment (`strategy_id=4`) disagree. Check `hak_elo_get_threshold()` to confirm which ID maps to the 2MB threshold before committing this default.

#### Step 2: Update hak_alloc logic

**File**: `apps/experiments/hakmem-poc/hakmem.c:377-417`

```diff
     if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_ELO)) {
         // ELO enabled: use strategy selection
         int strategy_id;

         if (hak_evo_is_frozen()) {
             // FROZEN: Use confirmed best strategy (zero overhead)
             strategy_id = hak_evo_get_confirmed_strategy();
             threshold = hak_elo_get_threshold(strategy_id);
         } else if (hak_evo_is_canary()) {
             // CANARY: 5% trial with candidate, 95% with confirmed
             if (hak_evo_should_use_candidate()) {
                 strategy_id = hak_evo_get_candidate_strategy();
             } else {
                 strategy_id = hak_evo_get_confirmed_strategy();
             }
             threshold = hak_elo_get_threshold(strategy_id);
         } else {
-            // LEARN: ELO operation with 1/100 sampling (Phase 6.11 optimization)
-            g_elo_call_count++;
-
-            // Update strategy every 100 calls (99% overhead reduction)
-            if (g_elo_call_count % 100 == 0 || g_cached_strategy_id == -1) {
-                // Sample: Select strategy using epsilon-greedy (10% exploration, 90% exploitation)
-                strategy_id = hak_elo_select_strategy();
-                g_cached_strategy_id = strategy_id;
-
-                // Record allocation for ELO learning (simplified: no timing yet)
-                hak_elo_record_alloc(strategy_id, size, 0);
-            } else {
-                // Use cached strategy (fast path, no ELO overhead)
-                strategy_id = g_cached_strategy_id;
-            }
+            // Phase 6.11.4 (P0-2): LEARN mode uses cached strategy (updated async)
+            strategy_id = atomic_load(&g_cached_strategy_id);
             threshold = hak_elo_get_threshold(strategy_id);
         }
     } else {
         // ELO disabled: use default threshold (2MB - mimalloc's large threshold)
         threshold = 2097152;  // 2MB
     }
```

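
The LEARN-mode hot path above reduces to one atomic load plus a threshold lookup. The sketch below shows that shape in a self-contained form. Everything prefixed `demo_` is hypothetical, the three-entry threshold table is invented for illustration (the real mapping sits behind `hak_elo_get_threshold()`), and the relaxed memory ordering is an assumption of this sketch: the guide itself uses the default sequentially consistent `atomic_load`.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical threshold table, for illustration only. */
static const size_t demo_thresholds[] = {
    512 * 1024,       /* strategy 0: 512KB */
    1024 * 1024,      /* strategy 1: 1MB   */
    2 * 1024 * 1024   /* strategy 2: 2MB   */
};
#define DEMO_STRATEGY_COUNT (sizeof demo_thresholds / sizeof demo_thresholds[0])

/* Cached strategy: written rarely by the cold path, read on every allocation. */
static _Atomic int demo_cached_strategy_id = 2;

/* Cold path: publish a newly selected strategy. */
static void demo_publish_strategy(int id) {
    atomic_store_explicit(&demo_cached_strategy_id, id, memory_order_relaxed);
}

/* Hot path: one atomic load and a bounds-checked table lookup. */
static size_t demo_current_threshold(void) {
    int id = atomic_load_explicit(&demo_cached_strategy_id, memory_order_relaxed);
    if (id < 0 || (size_t)id >= DEMO_STRATEGY_COUNT) {
        return 2 * 1024 * 1024;  /* out-of-range ID: fall back to the 2MB default */
    }
    return demo_thresholds[id];
}
```

Relaxed ordering is a candidate here only because the cached ID is a standalone integer that guards no other shared data. If a reader had to observe other state written alongside the ID, acquire/release ordering would be the safer choice.
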
#### Step 3: Add async recompute in evo_tick

**File**: `apps/experiments/hakmem-poc/hakmem_evo.c`

**Add new function**:

```c
// Requires <inttypes.h>, <stdatomic.h>, <stdint.h>, <stdio.h>

// Phase 6.11.4 (P0-2): Async ELO strategy recomputation
void hak_elo_async_recompute(void) {
    if (!hak_elo_is_initialized()) return;

    // Re-select best strategy (epsilon-greedy)
    int new_strategy = hak_elo_select_strategy();

    // Update cached strategy
    extern _Atomic int g_cached_strategy_id;  // From hakmem.c
    extern _Atomic uint64_t g_elo_generation;

    // atomic_exchange returns the previous value, so the log shows old → new
    // (loading after the store would print the new value twice)
    int old_strategy = atomic_exchange(&g_cached_strategy_id, new_strategy);
    uint64_t gen = atomic_fetch_add(&g_elo_generation, 1) + 1;  // Invalidate

    // PRIu64 is the portable format for uint64_t (%lu is wrong on some ABIs)
    fprintf(stderr, "[ELO] Async strategy update: %d → %d (gen=%" PRIu64 ")\n",
            old_strategy, new_strategy, gen);
}
```

**Call from hak_evo_tick**:

```diff
 void hak_evo_tick(uint64_t now_ns) {
     // ... existing logic ...

     // Close window if conditions met
     if (should_close) {
         // ... existing window closure logic ...

+        // Phase 6.11.4 (P0-2): Recompute ELO strategy (every window)
+        if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_ELO)) {
+            hak_elo_async_recompute();
+        }

         // Reset window
         g_window_ops_count = 0;
         g_window_start_ns = now_ns;
     }
 }
```

**Expose in header**:

**File**: `apps/experiments/hakmem-poc/hakmem_evo.h`

```c
// Phase 6.11.4 (P0-2): Async ELO update
void hak_elo_async_recompute(void);
int hak_elo_is_initialized(void);  // Helper (defined in hakmem_elo.c)
```

**File**: `apps/experiments/hakmem-poc/hakmem_elo.c`

```c
int hak_elo_is_initialized(void) {
    return g_initialized;
}
```

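
The guide increments `g_elo_generation` on every recompute but does not show a reader for it. One way such an invalidation counter could be consumed is sketched below. This is an assumption about intended use, not the project's code, and all `demo_` names are hypothetical: the writer swaps in the new strategy and bumps the generation, and a reader compares the generation with the last value it saw to detect that an update happened.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic int      demo_strategy_id = 2;
static _Atomic uint64_t demo_generation  = 0;

/* Cold path: swap in the new strategy and bump the generation.
 * Returns the strategy that was cached before the update. */
static int demo_recompute(int new_strategy) {
    int old = atomic_exchange(&demo_strategy_id, new_strategy);
    atomic_fetch_add(&demo_generation, 1);
    return old;
}

/* Reader side: returns true if the generation moved since *last_seen,
 * and records the new generation in *last_seen. */
static bool demo_strategy_changed(uint64_t *last_seen) {
    uint64_t now = atomic_load(&demo_generation);
    if (now == *last_seen) {
        return false;
    }
    *last_seen = now;
    return true;
}
```

A caller that keeps derived state (for example a precomputed threshold) would refresh it only when `demo_strategy_changed` returns true.
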
#### Step 4: Test with Evolution enabled

```bash
# Restore HAKMEM_FEATURE_EVOLUTION=1 in hakmem_config.h
cd apps/experiments/hakmem-poc
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem
```

**Expected output** (projected; percentage shares are computed against the baseline total cycle count):

```
Before (P0-1):
  hak_alloc: 96,000 cycles (30.0%)

After (P0-2):
  hak_alloc: 70,000 cycles (21.9%)  ← -27% additional reduction ✅

Total:
  126,479 → 70,000 cycles (-45% total) 🎉
```

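
The two stage figures (-24% and -27%) do not add up to the -45% total because each percentage is taken against a different starting point. The remaining fractions multiply instead. A small sketch of that arithmetic using the cycle counts quoted above (the function names are illustrative only):

```c
/* Fractional reduction going from `before` to `after`
 * (0.25 means 25% fewer cycles). */
static double reduction(double before, double after) {
    return (before - after) / before;
}

/* Combined reduction of two successive stages: the fractions that
 * remain after each stage multiply, so the reductions do not add. */
static double combined_reduction(double r1, double r2) {
    return 1.0 - (1.0 - r1) * (1.0 - r2);
}
```

With the guide's numbers, `reduction(126479, 96000)` is about 0.241, `reduction(96000, 70000)` is about 0.271, and combining them gives about 0.447, which matches `reduction(126479, 70000)`.
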

---

## 🔧 Troubleshooting

### Issue 1: Undefined reference to `g_cached_strategy_id`

**Cause**: The variable is still defined `static` in `hakmem.c` (internal linkage), so the `extern` declaration in `hakmem_evo.c` has nothing to link against

**Fix**: Either remove `static` from the definition and declare the variable in a shared header, or keep it private to `hakmem.c` and expose accessor functions:

```c
// Option 1: Getter function (safer)
int hak_elo_get_cached_strategy(void);

// Option 2: Extern declaration (faster)
extern _Atomic int g_cached_strategy_id;
```

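
Option 1 can be sketched as follows. Only `hak_elo_get_cached_strategy` is named in this guide. The setter `hak_elo_set_cached_strategy` is a hypothetical addition, included because `hak_elo_async_recompute()` needs to write the value as well as read it. The default of 2 mirrors the initializer shown in P0-2 Step 1.

```c
#include <stdatomic.h>

/* Stays private to the defining file (hakmem.c in this guide). */
static _Atomic int g_cached_strategy_id = 2;

/* Getter named in the guide: a single atomic load. */
int hak_elo_get_cached_strategy(void) {
    return atomic_load(&g_cached_strategy_id);
}

/* Hypothetical setter for the cold path; returns the previous value
 * so the caller can log the old → new transition. */
int hak_elo_set_cached_strategy(int new_id) {
    return atomic_exchange(&g_cached_strategy_id, new_id);
}
```

The extra function call on the hot path is the cost of this option. Whether it matters depends on whether the compiler can inline across translation units (for example with LTO).
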
### Issue 2: ELO strategy not updating

**Cause**: `hak_elo_async_recompute()` not called. Either no window is closing in `hak_evo_tick`, or `HAKMEM_FEATURE_EVOLUTION` is still 0 from the P0-1 test, in which case the tick itself is compiled out

**Debug**: add a temporary print inside `hak_evo_tick`:

```c
// Add debug prints
fprintf(stderr, "[DEBUG] hak_evo_tick called, should_close=%d\n", should_close);
```

### Issue 3: Race condition on g_elo_generation

**Not a problem**: Both shared variables are `_Atomic`. The hot path only performs an atomic load; the cold path updates them with atomic operations

---

## 📊 Validation

### Benchmark all scenarios

```bash
cd apps/experiments/hakmem-poc
./bench_allocators_hakmem
```

**Expected improvements**:

| Scenario | Before (ns) | After (ns) | Reduction |
|----------|-------------|------------|-----------|
| json (64KB) | 298 | **~220** | **-26%** |
| mir (256KB) | 1,698 | **~1,250** | **-26%** |
| vm (2MB) | 15,021 | **~11,000** | **-27%** |

### Profiling validation

```bash
HAKMEM_TIMING=1 ./bench_allocators_hakmem
```

**Expected cycle distribution** (each share is relative to that run's own total, so `hak_alloc` appears as 27.5% here rather than the 21.9% of baseline quoted in Step 4):

```
Before:
  hak_alloc:       126,479 cycles (39.6%)  ← Bottleneck
  syscall_munmap:  131,666 cycles (41.3%)

After:
  hak_alloc:        70,000 cycles (27.5%)  ← Reduced! ✅
  syscall_munmap:  131,666 cycles (51.7%)  ← Now #1 bottleneck
```

**Success criterion**: `hak_alloc` < 75,000 cycles (at least a 40% reduction from 126,479)

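
The success criterion can be checked mechanically once a measured cycle count is available. The sketch below is illustrative only. The two constants come from this guide, while the function names are hypothetical and nothing here parses the benchmark's actual output.

```c
#include <stdbool.h>
#include <stdint.h>

#define BASELINE_CYCLES 126479u  /* hak_alloc in the "Before" profile */
#define TARGET_CYCLES    75000u  /* success criterion from this guide */

/* Percentage reduction of a measured cycle count relative to the baseline. */
static double reduction_pct(uint64_t measured) {
    return 100.0 * ((double)BASELINE_CYCLES - (double)measured)
                 / (double)BASELINE_CYCLES;
}

/* True when the measured hak_alloc cost is strictly below the target. */
static bool meets_target(uint64_t measured) {
    return measured < TARGET_CYCLES;
}
```

A measurement of exactly 75,000 cycles corresponds to a reduction of roughly 40.7%, so the cycle threshold is slightly stricter than a flat 40%.
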

---

## 🎯 Next Steps After P0-2

### Option A: Stop here (RECOMMENDED)

**Rationale**:
- 45% reduction achieved (126,479 → 70,000 cycles)
- 2-3 hours total investment
- Excellent ROI

**Decision**: Move to **Phase 6.13 (L2.5 Pool mir scenario optimization)**

### Option B: Continue to P2 (Hash Optimization)

**Expected gain**: Additional 10,000 cycles (-14%)
**Time investment**: 2-3 hours
**Priority**: Medium

**Implementation**: See `PHASE_6.11.4_THREADING_COST_ANALYSIS.md` Section 3

---

## 📝 Documentation Updates

After completion, update:

1. **CURRENT_TASK.md**:
   ```markdown
   ## ✅ Phase 6.11.4 Complete! (YYYY-MM-DD)

   **Implementation complete**: hak_alloc optimization (-45% reduction)

   **P0-1**: Atomic operation elimination (-24%)
   **P0-2**: Cached strategy (-27%)

   **Result**: 126,479 → 70,000 cycles (-45%)
   ```

2. **PHASE_6.11.4_COMPLETION_REPORT.md**:
   - Copy template from `PHASE_6.11.3_COMPLETION_REPORT.md`
   - Fill in actual benchmark results
   - Add profiling comparison

---

## 🚀 Quick Start Commands

```bash
cd apps/experiments/hakmem-poc

# 1. Implement P0-1 (30 min)
vim hakmem.c                  # Edit lines 362-369
vim hakmem_evo.c hakmem_evo.h # Expose get_time_ns
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem

# 2. Implement P0-2 (1-2 hrs)
vim hakmem.c                  # Edit lines 52-60, 377-417
vim hakmem_evo.c hakmem_evo.h # Add hak_elo_async_recompute
vim hakmem_elo.c              # Add hak_elo_is_initialized
HAKMEM_DEBUG_TIMING=1 make bench_allocators_hakmem
HAKMEM_TIMING=1 ./bench_allocators_hakmem

# 3. Validate
./bench_allocators_hakmem | tee results_p0.txt
python3 quick_analyze.py results_p0.txt
```

**Total time**: 2-3 hours for **-45% reduction** 🎉