Files
hakmem/docs/analysis/PHASE19_FASTLANE_INSTRUCTION_REDUCTION_3_REVISED_INSTRUCTIONS.md

407 lines
11 KiB
Markdown
Raw Normal View History

# Phase 19-3: ENV Snapshot Consolidation — Revised Implementation Instructions
## 0. Design Corrections (from initial analysis)
**Key mistakes in initial design**:
1. **Wrong**: Create new TLS ctx
**Right**: Use existing `HakmemEnvSnapshot*` from `core/box/hakmem_env_snapshot_box.h`
2. **Wrong**: Option A with static `__thread g_init` (doesn't respect `hakmem_env_snapshot_refresh_from_env()`)
**Right**: Pass `const HakmemEnvSnapshot* env` down the call stack (refresh works automatically)
3. **Wrong**: Modify `core/wrapper.c`
**Right**: Modify `core/box/hak_wrappers.inc.h` (actual integration point)
4. **Wrong**: Keep `__builtin_expect(hakmem_env_snapshot_enabled(), 0)`
**Right**: Remove UNLIKELY hint (Phase 19-1 trap: snapshot is now ON by default, hint is backwards)
---
## 1. Strategy: Simplest Path (3 micro-phases)
### Phase 19-3a: Remove UNLIKELY hint from ENV checks
**Problem**: `__builtin_expect(hakmem_env_snapshot_enabled(), 0)` appears in:
- `core/front/malloc_tiny_fast.h:236` (C7 ULTRA alloc)
- `core/front/malloc_tiny_fast.h:403` (Front V3 free hotcold)
- `core/front/malloc_tiny_fast.h:624` (C7 ULTRA free)
- `core/front/malloc_tiny_fast.h:830` (C7 ULTRA free larson)
- `core/front/malloc_tiny_fast.h:910` (Front V3 free larson)
**Issue**: Snapshot is now ON in presets → UNLIKELY hint is backwards (same trap as Phase 19-1 NO-GO)
**Fix**: Replace `__builtin_expect(hakmem_env_snapshot_enabled(), 0)` with `hakmem_env_snapshot_enabled()`
**Expected**: +0-2% from correct branch prediction
**A/B Test**:
```sh
# Before (with UNLIKELY hint)
scripts/run_mixed_10_cleanenv.sh
# After (without hint)
scripts/run_mixed_10_cleanenv.sh
```
**GO Criteria**: No regression (±1%)
---
### Phase 19-3b: Pass snapshot down from wrapper entry
**Current state**: Each callee calls `hakmem_env_snapshot_enabled()` independently
- 5 calls × 2 TLS reads each = **10 TLS reads/op**
**Proposed**:
```c
// core/box/hak_wrappers.inc.h (malloc wrapper)
void* malloc(size_t size) {
// Entry-point: read ENV snapshot once
const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL;
if (fastlane_direct_enabled()) {
// NOTE: Keep FastLane safety rule: do not use fast paths before init.
if (g_initialized) {
void* ptr = malloc_tiny_fast_with_env(size, env); // NEW: pass env
if (ptr != NULL) return ptr;
// IMPORTANT: malloc miss must fall through to existing wrapper path
// (do NOT call malloc_cold() directly; it expects lock_depth to be incremented).
}
}
void* ptr = front_fastlane_try_malloc(size);
if (__builtin_expect(ptr != NULL, 1)) return ptr;
// Not handled → continue to existing wrapper path below (wrap_shape / lock_depth / init waits / malloc_cold(...)).
// (Do not duplicate the full wrapper here; only the env pass-down is new.)
/* existing wrapper path */
}
// core/box/hak_wrappers.inc.h (free wrapper)
void free(void* ptr) {
if (__builtin_expect(!ptr, 0)) return;
// Entry-point: read ENV snapshot once
const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL;
if (fastlane_direct_enabled()) {
// NOTE: Keep FastLane safety rule: do not use fast paths before init.
if (g_initialized) {
if (free_tiny_fast_with_env(ptr, env)) return; // NEW: pass env
free_cold(ptr);
return;
}
}
if (front_fastlane_try_free(ptr)) return;
free_cold(ptr, wrapper_env_cfg_fast());
}
```
**Propagate env down**:
```c
// core/front/malloc_tiny_fast.h (example)
static inline void* malloc_tiny_fast_with_env(size_t size, const HakmemEnvSnapshot* env) {
// ... existing logic ...
// OLD: if (__builtin_expect(hakmem_env_snapshot_enabled(), 0)) {
// NEW: if (env) {
if (env) {
// Use snapshot from env (NO additional TLS read)
if (tiny_c7_ultra_enabled_cached(env)) { // NEW: use cached check
// C7 ULTRA path
}
}
// ... rest of logic ...
}
```
**Helper functions** (optional, if needed):
```c
// core/box/hakmem_env_snapshot_box.h
static inline bool tiny_c7_ultra_enabled_cached(const HakmemEnvSnapshot* env) {
if (!env) return 0;
// Read from snapshot (already in cache, no TLS read)
return env->tiny_c7_ultra_enabled;
}
```
**Expected**:
- TLS reads: 10/op → 2/op (just wrapper entry check)
- Instructions: -8.0 to -10.0/op
- Throughput: **+3-5%**
**ENV Gate** (optional, if conservative rollout needed):
```c
// core/box/env_snapshot_consolidation_env_box.h (NEW, if needed)
extern _Atomic int g_env_snapshot_consolidation_enabled;
static inline bool env_snapshot_consolidation_enabled(void) {
return atomic_load_explicit(&g_env_snapshot_consolidation_enabled, memory_order_relaxed);
}
```
**Alternative** (no new ENV gate):
- Just always pass env down when HAKMEM_FASTLANE_DIRECT=1
- Rely on existing FASTLANE_DIRECT gate for rollback
---
### Phase 19-3c: Propagate to all callees (if 19-3b is GO)
**Targets**:
- `tiny_legacy_fallback_box.h:28` (ENV snapshot check)
- `tiny_metadata_cache_hot_box.h:64` (metadata cache check)
- Other helper functions that currently call `hakmem_env_snapshot_enabled()`
**Expected cumulative**: **+5-8%**
---
## 2. Box Theory Compliance
**Boundary**:
- L0 (ENV gate): Optional `HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=0/1` OR reuse `HAKMEM_FASTLANE_DIRECT=1`
- L1 (Hot inline): `const HakmemEnvSnapshot* env` parameter (NULL-safe)
- L2 (Fallback): If `env == NULL`, use old path (call `hakmem_env_snapshot_enabled()`)
**Rollback**:
- Runtime: `HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=0` (or `HAKMEM_FASTLANE_DIRECT=0`)
- Compile-time: `#if` guards on env parameter propagation
- NULL-safe: `if (!env) { /* old path */ }`
**Observability**:
- Perf stat: TLS read count reduction
- `hakmem_env_snapshot_enabled()` samples should drop from 7% → <1%
**Refresh compatibility**:
- ✅ Works: Wrapper reads fresh snapshot on each operation
-`bench_profile` putenv() + refresh works (no static cache)
- ✅ No version tracking needed
---
## 3. Implementation Steps
### Step 1: Phase 19-3a (UNLIKELY hint removal)
**Files to modify**:
```sh
# Find all instances
grep -n "__builtin_expect(hakmem_env_snapshot_enabled(), 0)" core/front/malloc_tiny_fast.h
# Replace with
# hakmem_env_snapshot_enabled()
```
**Lines to change**:
- `malloc_tiny_fast.h:236`
- `malloc_tiny_fast.h:403`
- `malloc_tiny_fast.h:624`
- `malloc_tiny_fast.h:830`
- `malloc_tiny_fast.h:910`
**A/B Test**:
```sh
# Baseline (before)
scripts/run_mixed_10_cleanenv.sh
# Optimized (after)
scripts/run_mixed_10_cleanenv.sh
```
**GO Criteria**: No regression (±1%)
---
### Step 2: Phase 19-3b (Pass env from wrapper)
**2.1 Update wrappers** (`core/box/hak_wrappers.inc.h`)
Find malloc wrapper (around line 674):
```c
void* malloc(size_t size) {
// ADD: Read snapshot once at entry
const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL;
// ... rest of existing logic ...
// Replace malloc_tiny_fast(size) with malloc_tiny_fast_with_env(size, env)
}
```
Find free wrapper (around line 188):
```c
void free(void* ptr) {
// ADD: Read snapshot once at entry
const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL;
// ... rest of existing logic ...
// Replace free_tiny_fast(ptr) with free_tiny_fast_with_env(ptr, env)
}
```
**2.2 Add _with_env variants** (`core/front/malloc_tiny_fast.h`)
Option A: Rename existing functions (breaking change, requires full propagation)
Option B: Add new `_with_env` variants, keep old functions as wrappers (safer)
**Recommended: Option B** (incremental migration)
```c
// malloc_tiny_fast.h (add new variant)
static inline void* malloc_tiny_fast_with_env(size_t size, const HakmemEnvSnapshot* env) {
// Replace hakmem_env_snapshot_enabled() checks with (env && ...)
// ... existing logic ...
}
// Keep old function as wrapper (for gradual migration)
static inline void* malloc_tiny_fast(size_t size) {
const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL;
return malloc_tiny_fast_with_env(size, env);
}
```
**2.3 Update ENV checks inside _with_env**
Replace:
```c
if (__builtin_expect(hakmem_env_snapshot_enabled(), 0)) {
// C7 ULTRA check
}
```
With:
```c
if (env) {
// Use env snapshot (NO TLS read)
if (env->tiny_c7_ultra_enabled) {
// C7 ULTRA path
}
}
```
**2.4 Build & Verify**
```sh
make clean && make -j bench_random_mixed_hakmem
# Should compile without errors
```
**2.5 A/B Test**
```sh
# Baseline (19-3a only)
scripts/run_mixed_10_cleanenv.sh
# Optimized (19-3b with env passing)
scripts/run_mixed_10_cleanenv.sh
```
**GO Criteria**: +3% minimum
---
### Step 3: Phase 19-3c (Propagate to helpers)
**If 19-3b is GO**, propagate env to:
- `tiny_legacy_fallback_box.h`
- `tiny_metadata_cache_hot_box.h`
- Other helper functions
**Expected cumulative**: +5-8%
---
## 4. Safety Checklist
- [ ] NULL-safe: All `if (env)` checks handle NULL correctly
- [ ] Refresh works: Wrapper reads fresh snapshot each call
- [ ] Rollback: Can disable via ENV or compile flag
- [ ] No static cache: No version tracking needed
- [ ] Existing code paths preserved: Old functions still work
---
## 5. Expected Performance (Cumulative)
| Phase | TLS reads/op | Instructions/op | Throughput |
|-------|--------------|-----------------|------------|
| Baseline (19-1b) | 10 | 169.45 | 52.06M ops/s |
| 19-3a (hint fix) | 10 | ~169 | +0-2% |
| 19-3b (env pass) | 2 | ~159-161 | +3-5% |
| 19-3c (helpers) | 2 | ~157-159 | +5-8% |
**Target after Phase 19-3c**:
- Throughput: **54.7-56.2M ops/s** (vs 52.06M baseline)
- Instructions/op: **157-159** (vs 169.45 baseline)
- Gap to libc (135.92): **+15-17%** (vs +24.6% before 19-3)
---
## 6. Perf Validation
**Before Phase 19-3**:
```sh
perf stat -e cycles,instructions -- ./bench_random_mixed_hakmem 200000000 400 1
# Instructions: ~169.45/op
```
**After Phase 19-3c**:
```sh
perf stat -e cycles,instructions -- ./bench_random_mixed_hakmem 200000000 400 1
# Instructions: ~157-159/op (target: -10.0 reduction)
```
**Perf record validation**:
```sh
perf record -g -- ./bench_random_mixed_hakmem 50000000 400 1
perf report --stdio --no-children | grep hakmem_env_snapshot_enabled
# Should show <1% samples (down from 7%)
```
---
## 7. Risk Assessment
**Phase 19-3a**: **LOW**
- Simple search-replace
- No algorithmic changes
- Worst case: ±0% (no-op)
**Phase 19-3b**: **MEDIUM**
- API signature changes (add env parameter)
- Mechanical changes (low semantic risk)
- Rollback: Keep old functions as wrappers
**Phase 19-3c**: **MEDIUM**
- Wider propagation
- More functions to update
- Same rollback strategy
---
## 8. Timeline
- **Phase 19-3a**: 1-2 hours (search-replace + A/B)
- **Phase 19-3b**: 4-6 hours (wrapper + _with_env variants + A/B)
- **Phase 19-3c**: 3-4 hours (helper propagation + A/B)
**Total**: 8-12 hours (1-2 days part-time)
---
## 9. Next Steps After This Phase
If Phase 19-3 achieves +5-8%:
- Current: 52.06M ops/s
- After 19-3: ~54.7-56.2M ops/s
- Gap to libc (79.72M): ~+42-46%
**Remaining candidates** (from Phase 19-1 Design):
- Candidate C: Stats removal (+3-5%, already done in BENCH_MINIMAL)
- Candidate D: Header inline (+2-3%)
- Candidate E: Route fast path (+2-3%)
**Or**: Re-profile with `perf record` to find next hot path (self% ≥ 5%)