## Summary

Implemented Phase 86 "mask-only commit" optimization for free path:
- Bitset mask (0x7f for C0-C6) to identify LEGACY classes
- Direct call to tiny_legacy_fallback_free_base_with_env()
- No indirect function pointers (avoids Phase 85's -0.86% regression)
- Fail-fast on LARSON_FIX=1 (cross-thread validation incompatibility)

## Results (10-run SSOT)

**NO-GO**: +0.25% improvement (threshold: +1.0%)
- Control: 51,750,467 ops/s (CV: 2.26%)
- Treatment: 51,881,055 ops/s (CV: 2.32%)
- Delta: +0.25% (mean), -0.15% (median)

## Root Cause

Competing optimizations plateau:
1. Phase 9/10 MONO LEGACY (+1.89%) already capture most free path benefit
2. Remaining margin insufficient to overcome:
   - Two branch checks (mask_enabled + has_class)
   - I-cache layout tax in hot path
   - Direct function call overhead

## Phase 85 vs Phase 86

| Metric | Phase 85 | Phase 86 |
|--------|----------|----------|
| Approach | Indirect calls + table | Bitset mask + direct call |
| Result | -0.86% | +0.25% |
| Verdict | NO-GO (regression) | NO-GO (insufficient) |

Phase 86 correctly avoided indirect call penalties but revealed architectural limit: can't escape Phase 9/10 overlay without restructuring.

## Recommendation

Free path optimization layer has reached practical ceiling:
- Phase 9/10 +1.89% + Phase 6/19/FASTLANE +16-27% ≈ 18-29% total
- Further attempts on ceremony elimination face same constraints
- Recommend focus on different optimization layers (malloc, etc.)

## Files Changed

### New
- core/box/free_path_legacy_mask_box.h (API + globals)
- core/box/free_path_legacy_mask_box.c (refresh logic)

### Modified
- core/bench_profile.h (added refresh call)
- core/front/malloc_tiny_fast.h (added Phase 86 fast path check)
- Makefile (added object files)
- CURRENT_TASK.md (documented result)

All changes conditional on HAKMEM_FREE_PATH_LEGACY_MASK=1 (default OFF).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Phase 85: Free Path Commit-Once (LEGACY-only) Implementation Plan
1. Objective & Scope
Goal: Eliminate per-operation policy/route/mono ceremony overhead in free_tiny_fast() for LEGACY route by applying Phase 78-1 "commit-once" pattern.
Target: +2.0% improvement (GO threshold)
Scope:
- LEGACY route only (classes C4-C7, size 129-256 bytes)
- Does NOT apply to ULTRA/MID/V7 routes
- Must coexist with existing Phase 9 (MONO DUALHOT) and Phase 10 (MONO LEGACY DIRECT) optimizations
- Fail-fast if HAKMEM_TINY_LARSON_FIX enabled (owner_tid validation incompatible with commit-once)
Strategy: Cache Route + Handler mapping at init-time (bench_profile refresh boundary), skip 12-20 branches per free() in hot path.
2. Architecture & Design
2.1 Core Pattern (Phase 78-1 Adaptation)
Following Phase 78-1 successful pattern:
┌─────────────────────────────────────────────────────┐
│  Init-time (bench_profile refresh boundary)         │
│  ─────────────────────────────────────────────────  │
│  free_path_commit_once_refresh_from_env()           │
│    ├─ Read ENV: HAKMEM_FREE_PATH_COMMIT_ONCE=0/1    │
│    ├─ Fail-fast: if LARSON_FIX enabled → disable    │
│    ├─ For C4-C7 (LEGACY classes):                   │
│    │   └─ Compute: route_kind, handler function     │
│    │   └─ Store: g_free_path_commit_once_fixed[4]   │
│    └─ Set: g_free_path_commit_once_enabled = true   │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│  Hot path (every free)                              │
│  ─────────────────────────────────────────────────  │
│  free_tiny_fast()                                   │
│    if (free_path_commit_once_enabled_fast()) {      │
│      // NEW: Direct dispatch, skip all ceremony     │
│      auto& cached = g_free_path_commit_once_fixed[  │
│          class_idx - TINY_C4];                      │
│      return cached.handler(ptr, class_idx, heap);   │
│    }                                                │
│    // Fallback: existing Phase 9/10/policy/route    │
│    ...                                              │
└─────────────────────────────────────────────────────┘
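The two boxes above can be modelled in a few lines of standalone C. This is only a sketch of the commit-once shape: every `toy_` / `TOY_` name, the class index values, and the two-argument handler signature are assumptions made for illustration and do not come from the hakmem sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: the real class indices and handler signature
 * live in the hakmem headers and may differ. */
enum { TOY_C4 = 4, TOY_C7 = 7, TOY_NUM_LEGACY = TOY_C7 - TOY_C4 + 1 };

typedef void (*toy_handler_fn)(void *ptr, unsigned class_idx);

struct toy_entry {
    toy_handler_fn handler; /* resolved once at init time */
    bool valid;             /* false until a refresh commits the entry */
};

static struct toy_entry g_toy_table[TOY_NUM_LEGACY];
static bool g_toy_enabled = false;
static unsigned g_toy_legacy_calls = 0; /* fast-path dispatches */
static unsigned g_toy_slow_calls = 0;   /* fallback-path frees */

static void toy_legacy_free(void *ptr, unsigned class_idx) {
    (void)ptr;
    (void)class_idx;
    g_toy_legacy_calls++;
}

/* Init-time: resolve the handler per class and commit it to the table. */
static void toy_refresh(bool requested, bool larson_fix) {
    g_toy_enabled = false;
    if (!requested || larson_fix) return; /* gate off, or fail-fast */
    for (unsigned i = 0; i < TOY_NUM_LEGACY; i++) {
        g_toy_table[i].handler = toy_legacy_free;
        g_toy_table[i].valid = true;
    }
    g_toy_enabled = true;
}

/* Hot path: gate check, range check, validity check, one indirect call. */
static void toy_free(void *ptr, unsigned class_idx) {
    if (g_toy_enabled && class_idx >= TOY_C4 && class_idx <= TOY_C7) {
        const struct toy_entry *e = &g_toy_table[class_idx - TOY_C4];
        if (e->valid) {
            assert(e->handler != NULL);
            e->handler(ptr, class_idx);
            return;
        }
    }
    g_toy_slow_calls++; /* stands in for the existing ceremony */
}
```

Calling `toy_refresh(true, false)` and then `toy_free(p, 5)` takes the committed path, while a class outside C4-C7, or a refresh with `larson_fix` set, falls through to the counter that stands in for the existing ceremony.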
2.2 Cached State Structure
typedef void (*FreeTinyHandler)(void* ptr, unsigned class_idx, TinyHeap* heap);
struct FreePatchCommitOnceEntry {
    TinyRouteKind route_kind;  // LEGACY, ULTRA, MID, V7 (validation only)
    FreeTinyHandler handler;   // Direct function pointer
    uint8_t valid;             // Safety flag
};
// Global state (4 entries for C4-C7)
extern struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
extern bool g_free_path_commit_once_enabled;
2.3 What Gets Cached
For each LEGACY class (C4-C7):
- route_kind: Expected to be TINY_ROUTE_LEGACY
- handler: Function pointer to tiny_legacy_fallback_free_base_with_env or appropriate handler
- valid: Safety flag (1 if cache entry is valid)
2.4 Eliminated Overhead
Before (15-26 branches per free):
- Phase 9 MONO DUALHOT check (3-5 branches)
- Phase 10 MONO LEGACY DIRECT check (4-6 branches)
- Policy snapshot call small_policy_v7_snapshot() (5-10 branches, potential getenv)
- Route computation tiny_route_for_class() (3-5 branches)
- Switch on route_kind (1-2 branches)
After (commit-once enabled, LEGACY classes):
- Master gate check free_path_commit_once_enabled_fast() (1 branch, predicted taken)
- Class index range check (1 branch, predicted taken)
- Cached entry lookup (0 branches, direct memory load)
- Direct handler dispatch (1 indirect call)
Branch reduction: 12-20 branches per LEGACY free → Estimated +2-3% improvement
3. Files to Create/Modify
3.1 New Files (Box Pattern)
core/box/free_path_commit_once_fixed_box.h
#ifndef HAKMEM_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H
#define HAKMEM_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H
#include <stdbool.h>
#include <stdint.h>
#include "core/hakmem_tiny_defs.h"
typedef void (*FreeTinyHandler)(void* ptr, unsigned class_idx, TinyHeap* heap);
struct FreePatchCommitOnceEntry {
    TinyRouteKind route_kind;
    FreeTinyHandler handler;
    uint8_t valid;
};
// Global cache (4 entries for C4-C7)
extern struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
extern bool g_free_path_commit_once_enabled;
// Fast-path API (inlined, no fallback needed)
static inline bool free_path_commit_once_enabled_fast(void) {
    return __builtin_expect(g_free_path_commit_once_enabled, 0);
}
// Refresh (called once at bench_profile boundary)
void free_path_commit_once_refresh_from_env(void);
#endif
core/box/free_path_commit_once_fixed_box.c
#include "free_path_commit_once_fixed_box.h"
#include "core/box/tiny_env_box.h"
#include "core/box/tiny_larson_fix_env_box.h"
#include "core/hakmem_tiny.h"
#include <stdio.h>   // fprintf, stderr
#include <stdlib.h>
#include <string.h>
struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
bool g_free_path_commit_once_enabled = false;
void free_path_commit_once_refresh_from_env(void) {
    // Read master ENV gate
    const char* env_val = getenv("HAKMEM_FREE_PATH_COMMIT_ONCE");
    bool requested = (env_val && atoi(env_val) == 1);
    if (!requested) {
        g_free_path_commit_once_enabled = false;
        return;
    }
    // Fail-fast: LARSON_FIX incompatible with commit-once
    if (tiny_larson_fix_enabled()) {
        fprintf(stderr, "[FREE_COMMIT_ONCE] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible, disabling\n");
        g_free_path_commit_once_enabled = false;
        return;
    }
    // Pre-compute route + handler for C4-C7 (LEGACY)
    for (unsigned i = 0; i < 4; i++) {
        unsigned class_idx = TINY_C4 + i;
        // Route determination (expect LEGACY for C4-C7)
        TinyRouteKind route = tiny_route_for_class(class_idx);
        // Handler selection (simplified, matches free_tiny_fast logic)
        FreeTinyHandler handler = NULL;
        if (route == TINY_ROUTE_LEGACY) {
            handler = tiny_legacy_fallback_free_base_with_env;
        } else {
            // Unexpected route, fail-fast
            fprintf(stderr, "[FREE_COMMIT_ONCE] FAIL-FAST: C%u route=%d not LEGACY, disabling\n",
                    class_idx, (int)route);
            g_free_path_commit_once_enabled = false;
            return;
        }
        g_free_path_commit_once_fixed[i].route_kind = route;
        g_free_path_commit_once_fixed[i].handler = handler;
        g_free_path_commit_once_fixed[i].valid = 1;
    }
    g_free_path_commit_once_enabled = true;
}
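One detail of the gate read above: `atoi` stops at the first non-digit, so a value such as `1abc` also enables the box. If a stricter gate is wanted, a `strtol`-based check is one option. The sketch below is an assumption about how that could look; `env_value_is_one` and `env_flag_is_one` are hypothetical helper names, not existing hakmem functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: true only when the whole string parses as the
 * integer 1 (strtol still skips leading whitespace). */
static bool env_value_is_one(const char *s) {
    if (s == NULL || *s == '\0') return false;
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0') return false; /* overflow or trailing junk */
    return v == 1;
}

/* Hypothetical wrapper: reads the named variable and applies the check. */
static bool env_flag_is_one(const char *name) {
    return env_value_is_one(getenv(name));
}
```

With this helper the gate line would read `bool requested = env_flag_is_one("HAKMEM_FREE_PATH_COMMIT_ONCE");`. Whether the extra strictness is worth it depends on how the other ENV boxes parse their flags, which this plan does not specify.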
3.2 Modified Files
core/front/malloc_tiny_fast.h (free_tiny_fast function)
Insertion point: Line ~950, before Phase 9/10 checks
static void free_tiny_fast(void* ptr, unsigned class_idx, TinyHeap* heap, ...) {
    // NEW: Phase 85 commit-once fast path (LEGACY classes only)
#if HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED
    if (free_path_commit_once_enabled_fast()) {
        if (class_idx >= TINY_C4 && class_idx <= TINY_C7) {
            const unsigned cache_idx = class_idx - TINY_C4;
            const struct FreePatchCommitOnceEntry* entry =
                &g_free_path_commit_once_fixed[cache_idx];
            if (__builtin_expect(entry->valid, 1)) {
                entry->handler(ptr, class_idx, heap);
                return;
            }
        }
    }
#endif
    // Existing Phase 9/10/policy/route ceremony (fallback)
    ...
}
core/bench_profile.h (refresh function integration)
Add to refresh_all_env_caches():
void refresh_all_env_caches(void) {
    // ... existing refreshes ...
#if HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED
    free_path_commit_once_refresh_from_env();
#endif
}
Makefile (box flag)
Add new box flag:
BOX_FREE_PATH_COMMIT_ONCE_FIXED ?= 1
CFLAGS += -DHAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED=$(BOX_FREE_PATH_COMMIT_ONCE_FIXED)
4. Implementation Stages
Stage 1: Box Infrastructure (1-2 hours)
- Create free_path_commit_once_fixed_box.h with struct definition, global declarations, fast-path API
- Create free_path_commit_once_fixed_box.c with refresh implementation
- Add Makefile box flag
- Integrate refresh call into core/bench_profile.h
- Validation: Compile, verify no build errors
Stage 2: Hot Path Integration (1 hour)
- Modify core/front/malloc_tiny_fast.h to add Phase 85 fast path at line ~950
- Add class range check (C4-C7) and cache lookup
- Add handler dispatch with validity check
- Validation: Compile, verify no build errors, run basic functionality test
Stage 3: Fail-Fast Safety (30 min)
- Test LARSON_FIX=1 scenario, verify commit-once disabled
- Test invalid route scenario (C4-C7 with non-LEGACY route)
- Validation: Both scenarios should log fail-fast message and fall back to standard path
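The Stage 3 scenarios amount to a small decision table, which can be checked without the allocator by expressing the gate as a pure function. The sketch below mirrors the order of checks in the refresh routine; `commit_once_gate` and the `GATE_*` codes are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reason codes, used only in this sketch. */
enum gate_result {
    GATE_ENABLED,
    GATE_OFF_NOT_REQUESTED,
    GATE_OFF_LARSON_FIX,
    GATE_OFF_BAD_ROUTE
};

/* routes_are_legacy[i] says whether class C4+i resolved to the LEGACY route.
 * The checks run in the same order as the refresh function in the plan. */
static enum gate_result commit_once_gate(bool requested, bool larson_fix,
                                         const bool *routes_are_legacy,
                                         size_t n_classes) {
    assert(routes_are_legacy != NULL || n_classes == 0);
    if (!requested) return GATE_OFF_NOT_REQUESTED;
    if (larson_fix) return GATE_OFF_LARSON_FIX;
    for (size_t i = 0; i < n_classes; i++) {
        if (!routes_are_legacy[i]) return GATE_OFF_BAD_ROUTE;
    }
    return GATE_ENABLED;
}
```

Each disabled outcome corresponds to a fall-back to the standard path. The two fail-fast outcomes (`GATE_OFF_LARSON_FIX`, `GATE_OFF_BAD_ROUTE`) are the ones that should also produce a log line.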
Stage 4: A/B Testing (2-3 hours)
- Build single binary with box flag enabled
- Baseline test: HAKMEM_FREE_PATH_COMMIT_ONCE=0 RUNS=10 scripts/run_mixed_10_cleanenv.sh
- Treatment test: HAKMEM_FREE_PATH_COMMIT_ONCE=1 RUNS=10 scripts/run_mixed_10_cleanenv.sh
- Compare mean/median/CV, calculate delta
- GO criteria: +2.0% or better
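The comparison step can be sketched as three small helpers operating on per-run throughput samples. This assumes the ops/s value of each run has already been extracted into an array of doubles; the output format of run_mixed_10_cleanenv.sh is not described here, so parsing is left out, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Arithmetic mean of n samples (n must be non-zero). */
static double stats_mean(const double *v, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    return sum / (double)n;
}

/* Median computed on a sorted copy, so the caller's run order is kept.
 * Returns 0.0 if n is zero or the copy cannot be allocated. */
static double stats_median(const double *v, size_t n) {
    if (n == 0) return 0.0;
    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) return 0.0;
    memcpy(tmp, v, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_double);
    double m = (n % 2) ? tmp[n / 2] : 0.5 * (tmp[n / 2 - 1] + tmp[n / 2]);
    free(tmp);
    return m;
}

/* Relative change of treatment over control, in percent. */
static double delta_percent(double control, double treatment) {
    return (treatment - control) / control * 100.0;
}
```

The delta is taken on both the mean and the median because the two can disagree in sign when the effect is small relative to run-to-run noise. CV would additionally need the sample standard deviation divided by the mean.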
5. Test Plan
5.1 SSOT Baseline (10-run)
# Control (commit-once disabled)
HAKMEM_FREE_PATH_COMMIT_ONCE=0 RUNS=10 scripts/run_mixed_10_cleanenv.sh > /tmp/phase85_control.txt
# Treatment (commit-once enabled)
HAKMEM_FREE_PATH_COMMIT_ONCE=1 RUNS=10 scripts/run_mixed_10_cleanenv.sh > /tmp/phase85_treatment.txt
Expected baseline: 55.53M ops/s (from recent allocator matrix)
GO threshold: 55.53M × 1.02 = 56.64M ops/s (treatment mean)
5.2 Safety Tests
# Test 1: LARSON_FIX incompatibility
HAKMEM_TINY_LARSON_FIX=1 HAKMEM_FREE_PATH_COMMIT_ONCE=1 ./bench_random_mixed_hakmem 1000000 400 1
# Expected: Log "[FREE_COMMIT_ONCE] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible"
# Test 2: Invalid route scenario (manually inject via debugging)
# Expected: Log "[FREE_COMMIT_ONCE] FAIL-FAST: C4 route=X not LEGACY"
5.3 Performance Profile
Optional (if time permits):
# Perf stat comparison
HAKMEM_FREE_PATH_COMMIT_ONCE=0 perf stat -e branches,branch-misses ./bench_random_mixed_hakmem 20000000 400 1
HAKMEM_FREE_PATH_COMMIT_ONCE=1 perf stat -e branches,branch-misses ./bench_random_mixed_hakmem 20000000 400 1
Expected: 8-12% reduction in branches, <1% change in branch misses
6. Rollback Strategy
Immediate Rollback (No Recompile)
export HAKMEM_FREE_PATH_COMMIT_ONCE=0
Box Removal (Recompile)
make clean
BOX_FREE_PATH_COMMIT_ONCE_FIXED=0 make bench_random_mixed_hakmem
File Reversions
- Remove: core/box/free_path_commit_once_fixed_box.{h,c}
- Revert: core/front/malloc_tiny_fast.h (remove Phase 85 block)
- Revert: core/bench_profile.h (remove refresh call)
- Revert: Makefile (remove box flag)
7. Expected Results
7.1 Performance Target
| Metric | Control | Treatment | Delta | Status |
|---|---|---|---|---|
| Mean (M ops/s) | 55.53 | 56.64+ | +2.0%+ | GO threshold |
| CV (%) | 1.5-2.0 | 1.5-2.0 | stable | required |
| Branch reduction | baseline | -8-12% | ~10% | expected |
7.2 GO/NO-GO Decision
GO if:
- Treatment mean ≥ 56.64M ops/s (+2.0%)
- CV remains stable (<3%)
- No regressions in other scenarios (json/mir/vm)
- Fail-fast tests pass
NO-GO if:
- Treatment mean < 56.64M ops/s
- CV increases significantly (>3%)
- Regressions observed
- Fail-fast mechanisms fail
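The criteria above can be folded into a single predicate so the verdict is computed rather than read off by eye. This is a sketch under one stated assumption: the gain is measured against the control mean of the same A/B run, rather than against the fixed 55.53M baseline. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Thresholds taken from the plan: +2.0% mean gain, CV below 3%. */
#define GO_MIN_GAIN_PCT 2.0
#define GO_MAX_CV_PCT   3.0

/* Hypothetical container for one A/B outcome. */
struct ab_result {
    double control_mean;       /* ops/s */
    double treatment_mean;     /* ops/s */
    double treatment_cv_pct;   /* coefficient of variation, percent */
    bool   regressions_seen;   /* in other scenarios (json/mir/vm) */
    bool   fail_fast_tests_ok; /* both Stage 3 scenarios passed */
};

/* GO only if every criterion holds; anything else is NO-GO. */
static bool is_go(const struct ab_result *r) {
    assert(r->control_mean > 0.0);
    double gain_pct =
        (r->treatment_mean - r->control_mean) / r->control_mean * 100.0;
    return gain_pct >= GO_MIN_GAIN_PCT
        && r->treatment_cv_pct < GO_MAX_CV_PCT
        && !r->regressions_seen
        && r->fail_fast_tests_ok;
}
```

Because the conditions are joined with AND, the NO-GO list is simply the negation of the GO list, matching the two lists above.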
7.3 Risk Assessment
Low Risk:
- Scope limited to LEGACY route (C4-C7, 129-256 bytes)
- ENV gate allows instant rollback
- Fail-fast for LARSON_FIX ensures safety
- Phase 9/10 MONO optimizations unaffected (fall through on cache miss)
Potential Issues:
- Layout tax: New code path may cause I-cache/register pressure (mitigated by early placement at line ~950)
- Indirect call overhead: Cached function pointer may have misprediction cost (likely negligible vs branch reduction)
- Route dynamics: If route changes at runtime (unlikely), commit-once becomes stale (requires bench_profile refresh)
8. Success Criteria Summary
- ✅ Build completes without errors
- ✅ Fail-fast tests pass (LARSON_FIX=1, invalid route)
- ✅ SSOT 10-run treatment ≥ 56.64M ops/s (+2.0%)
- ✅ CV remains stable (<3%)
- ✅ No regressions in other scenarios
If all criteria met: Merge to master, update CURRENT_TASK.md, record in PERFORMANCE_TARGETS_SCORECARD.md
If NO-GO: Keep as research box, document findings, archive plan.
9. References
- Phase 78-1 pattern: core/box/tiny_inline_slots_fixed_mode_box.{h,c}
- Free path implementation: core/front/malloc_tiny_fast.h:919-1221
- LARSON_FIX constraint: core/box/tiny_larson_fix_env_box.h
- Route snapshot: core/hakmem_tiny.c:64-65 (g_tiny_route_class, g_tiny_route_snapshot_done)
- SSOT validation: scripts/run_mixed_10_cleanenv.sh