# Phase 85: Free Path Commit-Once (LEGACY-only) Implementation Plan

## 1. Objective & Scope

**Goal**: Eliminate per-operation policy/route/mono ceremony overhead in `free_tiny_fast()` for the LEGACY route by applying the Phase 78-1 "commit-once" pattern.

**Target**: +2.0% improvement (GO threshold)

**Scope**:
- LEGACY route only (classes C4-C7, size 129-256 bytes)
- Does NOT apply to ULTRA/MID/V7 routes
- Must coexist with the existing Phase 9 (MONO DUALHOT) and Phase 10 (MONO LEGACY DIRECT) optimizations
- Fail-fast if `HAKMEM_TINY_LARSON_FIX` is enabled (owner_tid validation is incompatible with commit-once)

**Strategy**: Cache the route + handler mapping at init time (bench_profile refresh boundary) and skip 12-20 branches per `free()` in the hot path.

---

## 2. Architecture & Design

### 2.1 Core Pattern (Phase 78-1 Adaptation)

Following the successful Phase 78-1 pattern:

```
┌─────────────────────────────────────────────────────┐
│  Init-time (bench_profile refresh boundary)         │
│  ─────────────────────────────────────────────────  │
│  free_path_commit_once_refresh_from_env()           │
│  ├─ Read ENV: HAKMEM_FREE_PATH_COMMIT_ONCE=0/1      │
│  ├─ Fail-fast: if LARSON_FIX enabled → disable      │
│  ├─ For C4-C7 (LEGACY classes):                     │
│  │  ├─ Compute: route_kind, handler function        │
│  │  └─ Store: g_free_path_commit_once_fixed[4]      │
│  └─ Set: g_free_path_commit_once_enabled = true     │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│  Hot path (every free)                              │
│  ─────────────────────────────────────────────────  │
│  free_tiny_fast()                                   │
│    if (free_path_commit_once_enabled_fast()) {      │
│      // NEW: Direct dispatch, skip all ceremony     │
│      entry = &g_free_path_commit_once_fixed[        │
│          class_idx - TINY_C4];                      │
│      entry->handler(ptr, class_idx, heap);          │
│      return;                                        │
│    }                                                │
│    // Fallback: existing Phase 9/10/policy/route    │
│    ...                                              │
└─────────────────────────────────────────────────────┘
```

### 2.2 Cached State Structure

```c
typedef void (*FreeTinyHandler)(void* ptr, unsigned class_idx, TinyHeap* heap);

struct FreePatchCommitOnceEntry {
    TinyRouteKind   route_kind;  // LEGACY, ULTRA, MID, V7 (validation only)
    FreeTinyHandler handler;     // Direct function pointer
    uint8_t         valid;       // Safety flag
};

// Global state (4 entries for C4-C7)
extern struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
extern bool g_free_path_commit_once_enabled;
```

### 2.3 What Gets Cached

For each LEGACY class (C4-C7):
- **route_kind**: Expected to be `TINY_ROUTE_LEGACY`
- **handler**: Function pointer to `tiny_legacy_fallback_free_base_with_env` or another appropriate handler
- **valid**: Safety flag (1 if the cache entry is valid)

### 2.4 Eliminated Overhead

**Before** (16-28 branches per free, summing the items below):
1. Phase 9 MONO DUALHOT check (3-5 branches)
2. Phase 10 MONO LEGACY DIRECT check (4-6 branches)
3. Policy snapshot call `small_policy_v7_snapshot()` (5-10 branches, potential getenv)
4. Route computation `tiny_route_for_class()` (3-5 branches)
5. Switch on route_kind (1-2 branches)

**After** (commit-once enabled, LEGACY classes):
1. Master gate check `free_path_commit_once_enabled_fast()` (1 branch, well predicted once the gate is on)
2. Class index range check (1 branch, predicted taken)
3. Cached entry lookup (direct memory load) with `valid` check (1 branch, predicted taken)
4. Direct handler dispatch (1 indirect call)

**Branch reduction**: roughly 12-20 branches per LEGACY free → **Estimated +2-3% improvement**

---

## 3. Files to Create/Modify

### 3.1 New Files (Box Pattern)

#### `core/box/free_path_commit_once_fixed_box.h`

```c
#ifndef HAKMEM_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H
#define HAKMEM_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H

#include <stdbool.h>
#include <stdint.h>
#include "core/hakmem_tiny_defs.h"

typedef void (*FreeTinyHandler)(void* ptr, unsigned class_idx, TinyHeap* heap);

struct FreePatchCommitOnceEntry {
    TinyRouteKind   route_kind;
    FreeTinyHandler handler;
    uint8_t         valid;
};

// Global cache (4 entries for C4-C7)
extern struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
extern bool g_free_path_commit_once_enabled;

// Fast-path API (inlined, no fallback needed).
// Hinted as unlikely because the gate defaults to off.
static inline bool free_path_commit_once_enabled_fast(void) {
    return __builtin_expect(g_free_path_commit_once_enabled, 0);
}

// Refresh (called once at bench_profile boundary)
void free_path_commit_once_refresh_from_env(void);

#endif
```

#### `core/box/free_path_commit_once_fixed_box.c`

```c
#include "free_path_commit_once_fixed_box.h"
#include "core/box/tiny_env_box.h"
#include "core/box/tiny_larson_fix_env_box.h"
#include "core/hakmem_tiny.h"
#include <stdio.h>   // fprintf
#include <stdlib.h>  // getenv, atoi

struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
bool g_free_path_commit_once_enabled = false;

void free_path_commit_once_refresh_from_env(void) {
    // Read master ENV gate
    const char* env_val = getenv("HAKMEM_FREE_PATH_COMMIT_ONCE");
    bool requested = (env_val && atoi(env_val) == 1);
    if (!requested) {
        g_free_path_commit_once_enabled = false;
        return;
    }

    // Fail-fast: LARSON_FIX incompatible with commit-once
    if (tiny_larson_fix_enabled()) {
        fprintf(stderr, "[FREE_COMMIT_ONCE] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible, disabling\n");
        g_free_path_commit_once_enabled = false;
        return;
    }

    // Pre-compute route + handler for C4-C7 (LEGACY)
    for (unsigned i = 0; i < 4; i++) {
        unsigned class_idx = TINY_C4 + i;

        // Route determination (expect LEGACY for C4-C7)
        TinyRouteKind route = tiny_route_for_class(class_idx);

        // Handler selection (simplified, matches free_tiny_fast logic)
        FreeTinyHandler handler = NULL;
        if (route == TINY_ROUTE_LEGACY) {
            handler = tiny_legacy_fallback_free_base_with_env;
        } else {
            // Unexpected route, fail-fast
            fprintf(stderr, "[FREE_COMMIT_ONCE] FAIL-FAST: C%u route=%d not LEGACY, disabling\n",
                    class_idx, (int)route);
            g_free_path_commit_once_enabled = false;
            return;
        }

        g_free_path_commit_once_fixed[i].route_kind = route;
        g_free_path_commit_once_fixed[i].handler = handler;
        g_free_path_commit_once_fixed[i].valid = 1;
    }

    // Publish the gate only after all four entries are populated
    g_free_path_commit_once_enabled = true;
}
```

### 3.2 Modified Files

#### `core/front/malloc_tiny_fast.h` (free_tiny_fast function)

**Insertion point**: Line ~950, before the Phase 9/10 checks

```c
static void free_tiny_fast(void* ptr, unsigned class_idx, TinyHeap* heap, ...) {
    // NEW: Phase 85 commit-once fast path (LEGACY classes only)
#if HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED
    if (free_path_commit_once_enabled_fast()) {
        if (class_idx >= TINY_C4 && class_idx <= TINY_C7) {
            const unsigned cache_idx = class_idx - TINY_C4;
            const struct FreePatchCommitOnceEntry* entry =
                &g_free_path_commit_once_fixed[cache_idx];
            if (__builtin_expect(entry->valid, 1)) {
                entry->handler(ptr, class_idx, heap);
                return;
            }
        }
    }
#endif

    // Existing Phase 9/10/policy/route ceremony (fallback)
    ...
}
```

#### `core/bench_profile.h` (refresh function integration)

Add to `refresh_all_env_caches()`:

```c
void refresh_all_env_caches(void) {
    // ... existing refreshes ...
#if HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED
    free_path_commit_once_refresh_from_env();
#endif
}
```

#### `Makefile` (box flag)

Add a new box flag:

```makefile
BOX_FREE_PATH_COMMIT_ONCE_FIXED ?= 1
CFLAGS += -DHAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED=$(BOX_FREE_PATH_COMMIT_ONCE_FIXED)
```

---

## 4. Implementation Stages

### Stage 1: Box Infrastructure (1-2 hours)

1. Create `free_path_commit_once_fixed_box.h` with the struct definition, global declarations, and fast-path API
2. Create `free_path_commit_once_fixed_box.c` with the refresh implementation
3. Add the Makefile box flag
4. Integrate the refresh call into `core/bench_profile.h`
5. **Validation**: Compile, verify no build errors

### Stage 2: Hot Path Integration (1 hour)

1. Modify `core/front/malloc_tiny_fast.h` to add the Phase 85 fast path at line ~950
2. Add the class range check (C4-C7) and cache lookup
3. Add handler dispatch with validity check
4. **Validation**: Compile, verify no build errors, run a basic functionality test

### Stage 3: Fail-Fast Safety (30 min)

1. Test the LARSON_FIX=1 scenario, verify commit-once is disabled
2. Test the invalid route scenario (C4-C7 with a non-LEGACY route)
3. **Validation**: Both scenarios should log the fail-fast message and fall back to the standard path

### Stage 4: A/B Testing (2-3 hours)

1. Build a single binary with the box flag enabled
2. Baseline test: `HAKMEM_FREE_PATH_COMMIT_ONCE=0 RUNS=10 scripts/run_mixed_10_cleanenv.sh`
3. Treatment test: `HAKMEM_FREE_PATH_COMMIT_ONCE=1 RUNS=10 scripts/run_mixed_10_cleanenv.sh`
4. Compare mean/median/CV, calculate the delta
5. **GO criteria**: +2.0% or better

---

## 5. Test Plan

### 5.1 SSOT Baseline (10-run)

```bash
# Control (commit-once disabled)
HAKMEM_FREE_PATH_COMMIT_ONCE=0 RUNS=10 scripts/run_mixed_10_cleanenv.sh > /tmp/phase85_control.txt

# Treatment (commit-once enabled)
HAKMEM_FREE_PATH_COMMIT_ONCE=1 RUNS=10 scripts/run_mixed_10_cleanenv.sh > /tmp/phase85_treatment.txt
```

**Expected baseline**: 55.53M ops/s (from recent allocator matrix)

**GO threshold**: 55.53M × 1.02 = **56.64M ops/s** (treatment mean)

### 5.2 Safety Tests

```bash
# Test 1: LARSON_FIX incompatibility
HAKMEM_TINY_LARSON_FIX=1 HAKMEM_FREE_PATH_COMMIT_ONCE=1 ./bench_random_mixed_hakmem 1000000 400 1
# Expected: Log "[FREE_COMMIT_ONCE] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible"

# Test 2: Invalid route scenario (manually inject via debugging)
# Expected: Log "[FREE_COMMIT_ONCE] FAIL-FAST: C4 route=X not LEGACY"
```

### 5.3 Performance Profile

Optional (if time permits):

```bash
# Perf stat comparison
HAKMEM_FREE_PATH_COMMIT_ONCE=0 perf stat -e branches,branch-misses ./bench_random_mixed_hakmem 20000000 400 1
HAKMEM_FREE_PATH_COMMIT_ONCE=1 perf stat -e branches,branch-misses ./bench_random_mixed_hakmem 20000000 400 1
```

**Expected**: 8-12% reduction in branches, <1% change in branch misses

---

## 6. Rollback Strategy

### Immediate Rollback (No Recompile)

```bash
export HAKMEM_FREE_PATH_COMMIT_ONCE=0
```

### Box Removal (Recompile)

```bash
make clean
BOX_FREE_PATH_COMMIT_ONCE_FIXED=0 make bench_random_mixed_hakmem
```

### File Reversions

- Remove: `core/box/free_path_commit_once_fixed_box.{h,c}`
- Revert: `core/front/malloc_tiny_fast.h` (remove Phase 85 block)
- Revert: `core/bench_profile.h` (remove refresh call)
- Revert: `Makefile` (remove box flag)

---

## 7. Expected Results

### 7.1 Performance Target

| Metric | Control | Treatment | Delta | Status |
|--------|---------|-----------|-------|--------|
| Mean (M ops/s) | 55.53 | 56.64+ | +2.0%+ | GO threshold |
| CV (%) | 1.5-2.0 | 1.5-2.0 | stable | required |
| Branch count | baseline | 8-12% lower | ~10% | expected |

### 7.2 GO/NO-GO Decision

**GO if**:
- Treatment mean ≥ 56.64M ops/s (+2.0%)
- CV remains stable (<3%)
- No regressions in other scenarios (json/mir/vm)
- Fail-fast tests pass

**NO-GO if**:
- Treatment mean < 56.64M ops/s
- CV increases significantly (>3%)
- Regressions observed
- Fail-fast mechanisms fail

### 7.3 Risk Assessment

**Low Risk**:
- Scope limited to the LEGACY route (C4-C7, 129-256 bytes)
- ENV gate allows instant rollback
- Fail-fast for LARSON_FIX ensures safety
- Phase 9/10 MONO optimizations unaffected (the hot path falls through to them on a cache miss)

**Potential Issues**:
- Layout tax: The new code path may add I-cache/register pressure (mitigated by early placement at line ~950)
- Indirect call overhead: The cached function pointer may incur a misprediction cost (likely negligible compared with the branch reduction)
- Route dynamics: If a route changes at runtime (unlikely), the committed entries become stale until the next bench_profile refresh

---

## 8. Success Criteria Summary

1. ✅ Build completes without errors
2. ✅ Fail-fast tests pass (LARSON_FIX=1, invalid route)
3. ✅ SSOT 10-run treatment ≥ 56.64M ops/s (+2.0%)
4. ✅ CV remains stable (<3%)
5. ✅ No regressions in other scenarios

**If all criteria met**: Merge to master, update CURRENT_TASK.md, record in PERFORMANCE_TARGETS_SCORECARD.md

**If NO-GO**: Keep as research box, document findings, archive plan.

---

## 9. References

- Phase 78-1 pattern: `core/box/tiny_inline_slots_fixed_mode_box.{h,c}`
- Free path implementation: `core/front/malloc_tiny_fast.h:919-1221`
- LARSON_FIX constraint: `core/box/tiny_larson_fix_env_box.h`
- Route snapshot: `core/hakmem_tiny.c:64-65` (g_tiny_route_class, g_tiny_route_snapshot_done)
- SSOT validation: `scripts/run_mixed_10_cleanenv.sh`
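
---

## 10. Appendix: Standalone Commit-Once Sketch

The init-time/hot-path split from Section 2 can be exercised in isolation before touching the allocator. The following is a minimal sketch under stated assumptions, not the hakmem implementation: every `demo_`/`DEMO_` name is hypothetical, the route lookup is a stub that always reports LEGACY, and the handlers only bump counters so the chosen path is observable. The ENV gate and the LARSON_FIX check are passed in as plain booleans instead of being read from the environment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in the hakmem sources. */
#define DEMO_C4 4u
#define DEMO_C7 7u
#define DEMO_LEGACY_CLASSES 4u

typedef enum { DEMO_ROUTE_LEGACY, DEMO_ROUTE_OTHER } DemoRoute;
typedef void (*DemoFreeHandler)(void* ptr, unsigned class_idx);

typedef struct {
    DemoRoute       route;    /* kept for validation only */
    DemoFreeHandler handler;  /* resolved once at refresh time */
    bool            valid;
} DemoCommitOnceEntry;

static DemoCommitOnceEntry g_demo_fixed[DEMO_LEGACY_CLASSES];
static bool g_demo_enabled = false;
static unsigned g_demo_fast_hits = 0;  /* frees served via the cached handler */
static unsigned g_demo_slow_hits = 0;  /* frees that took the fallback path */

/* Stub route lookup (assumption): every class resolves to LEGACY. */
static DemoRoute demo_route_for_class(unsigned class_idx) {
    (void)class_idx;
    return DEMO_ROUTE_LEGACY;
}

static void demo_legacy_free(void* ptr, unsigned class_idx) {
    (void)ptr; (void)class_idx;
    g_demo_fast_hits++;
}

static void demo_slow_path_free(void* ptr, unsigned class_idx) {
    (void)ptr; (void)class_idx;
    g_demo_slow_hits++;
}

/* Init-time: resolve route + handler once; disable entirely on any surprise. */
static void demo_refresh(bool requested, bool larson_fix) {
    g_demo_enabled = false;
    for (unsigned i = 0; i < DEMO_LEGACY_CLASSES; i++) {
        g_demo_fixed[i].valid = false;
    }
    if (!requested || larson_fix) {
        return;  /* gate off, or incompatible mode: keep the standard path */
    }
    for (unsigned i = 0; i < DEMO_LEGACY_CLASSES; i++) {
        DemoRoute route = demo_route_for_class(DEMO_C4 + i);
        if (route != DEMO_ROUTE_LEGACY) {
            return;  /* fail-fast: gate stays off */
        }
        g_demo_fixed[i] = (DemoCommitOnceEntry){ route, demo_legacy_free, true };
    }
    g_demo_enabled = true;  /* publish only after every entry is filled */
}

/* Hot path: gate check, range check, validity check, one indirect call. */
static void demo_free(void* ptr, unsigned class_idx) {
    if (g_demo_enabled && class_idx >= DEMO_C4 && class_idx <= DEMO_C7) {
        const DemoCommitOnceEntry* e = &g_demo_fixed[class_idx - DEMO_C4];
        if (e->valid) {
            assert(e->handler != NULL);
            e->handler(ptr, class_idx);
            return;
        }
    }
    demo_slow_path_free(ptr, class_idx);  /* stands in for the Phase 9/10/policy ceremony */
}
```

Calling `demo_refresh(true, false)` followed by `demo_free(NULL, 5)` goes through the cached handler, while a class outside C4-C7, or any free after `demo_refresh(true, true)`, lands on the fallback. Clearing `valid` at the top of the refresh is a choice made for this sketch so that a later disable cannot leave stale entries behind; the plan above relies on the master gate alone.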