hakmem/docs/analysis/PHASE85_FREE_PATH_COMMIT_ONCE_PLAN.md
Moe Charm (CI) e4c5f05355 Phase 86: Free Path Legacy Mask (NO-GO, +0.25%)
## Summary

Implemented Phase 86 "mask-only commit" optimization for free path:
- Bitset mask (0x7f for C0-C6) to identify LEGACY classes
- Direct call to tiny_legacy_fallback_free_base_with_env()
- No indirect function pointers (avoids Phase 85's -0.86% regression)
- Fail-fast on LARSON_FIX=1 (cross-thread validation incompatibility)
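
The mask test in the first bullet amounts to a single shift-and-test. The following is a minimal sketch, assuming class indices 0-7 and the 0x7f value quoted above; `LEGACY_MASK_C0_C6` and `legacy_mask_has_class` are hypothetical names, not the API declared in free_path_legacy_mask_box.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: bits 0..6 set, one bit per class C0-C6. */
#define LEGACY_MASK_C0_C6 ((uint8_t)0x7f)

/* Returns true when class_idx has its bit set in mask.
 * The range check keeps the shift well defined for out-of-range input. */
static inline bool legacy_mask_has_class(uint8_t mask, unsigned class_idx) {
    return class_idx < 8u && ((mask >> class_idx) & 1u) != 0u;
}
```

With this mask, C0 through C6 pass the test and C7 does not, so C7 frees would fall through to the existing path.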

## Results (10-run SSOT)

**NO-GO**: +0.25% improvement (threshold: +1.0%)
- Control:    51,750,467 ops/s (CV: 2.26%)
- Treatment:  51,881,055 ops/s (CV: 2.32%)
- Delta:      +0.25% (mean), -0.15% (median)

## Root Cause

Competing optimizations plateau:
1. Phase 9/10 MONO LEGACY (+1.89%) already capture most free path benefit
2. Remaining margin insufficient to overcome:
   - Two branch checks (mask_enabled + has_class)
   - I-cache layout tax in hot path
   - Direct function call overhead

## Phase 85 vs Phase 86

| Metric | Phase 85 | Phase 86 |
|--------|----------|----------|
| Approach | Indirect calls + table | Bitset mask + direct call |
| Result | -0.86% | +0.25% |
| Verdict | NO-GO (regression) | NO-GO (insufficient) |

Phase 86 correctly avoided the indirect-call penalty but revealed an
architectural limit: the free path cannot escape the Phase 9/10 overlay
without restructuring.

## Recommendation

Free path optimization layer has reached practical ceiling:
- Phase 9/10 +1.89% + Phase 6/19/FASTLANE +16-27% ≈ 18-29% total
- Further attempts on ceremony elimination face same constraints
- Recommend focus on different optimization layers (malloc, etc.)

## Files Changed

### New
- core/box/free_path_legacy_mask_box.h (API + globals)
- core/box/free_path_legacy_mask_box.c (refresh logic)

### Modified
- core/bench_profile.h (added refresh call)
- core/front/malloc_tiny_fast.h (added Phase 86 fast path check)
- Makefile (added object files)
- CURRENT_TASK.md (documented result)

All changes conditional on HAKMEM_FREE_PATH_LEGACY_MASK=1 (default OFF).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-18 22:05:34 +09:00

Phase 85: Free Path Commit-Once (LEGACY-only) Implementation Plan

1. Objective & Scope

Goal: Eliminate per-operation policy/route/mono ceremony overhead in free_tiny_fast() for LEGACY route by applying Phase 78-1 "commit-once" pattern.

Target: +2.0% improvement (GO threshold)

Scope:

  • LEGACY route only (classes C4-C7, size 129-256 bytes)
  • Does NOT apply to ULTRA/MID/V7 routes
  • Must coexist with existing Phase 9 (MONO DUALHOT) and Phase 10 (MONO LEGACY DIRECT) optimizations
  • Fail-fast if HAKMEM_TINY_LARSON_FIX enabled (owner_tid validation incompatible with commit-once)

Strategy: Cache the Route + Handler mapping at init time (the bench_profile refresh boundary) and skip 12-20 branches per free() in the hot path.


2. Architecture & Design

2.1 Core Pattern (Phase 78-1 Adaptation)

Following the successful Phase 78-1 pattern:

┌─────────────────────────────────────────────────────┐
│ Init-time (bench_profile refresh boundary)         │
│ ─────────────────────────────────────────────────   │
│ free_path_commit_once_refresh_from_env()            │
│   ├─ Read ENV: HAKMEM_FREE_PATH_COMMIT_ONCE=0/1    │
│   ├─ Fail-fast: if LARSON_FIX enabled → disable    │
│   ├─ For C4-C7 (LEGACY classes):                   │
│   │    └─ Compute: route_kind, handler function    │
│   │    └─ Store: g_free_path_commit_once_fixed[4]  │
│   └─ Set: g_free_path_commit_once_enabled = true   │
└─────────────────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│ Hot path (every free)                               │
│ ─────────────────────────────────────────────────   │
│ free_tiny_fast()                                    │
│   if (free_path_commit_once_enabled_fast()) {      │
│     // NEW: Direct dispatch, skip all ceremony     │
│     auto& cached = g_free_path_commit_once_fixed[  │
│                      class_idx - TINY_C4];          │
│     return cached.handler(ptr, class_idx, heap);   │
│   }                                                 │
│   // Fallback: existing Phase 9/10/policy/route    │
│   ...                                               │
└─────────────────────────────────────────────────────┘

2.2 Cached State Structure

typedef void (*FreeTinyHandler)(void* ptr, unsigned class_idx, TinyHeap* heap);

struct FreePatchCommitOnceEntry {
    TinyRouteKind route_kind;  // LEGACY, ULTRA, MID, V7 (validation only)
    FreeTinyHandler handler;   // Direct function pointer
    uint8_t valid;             // Safety flag
};

// Global state (4 entries for C4-C7)
extern struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
extern bool g_free_path_commit_once_enabled;

2.3 What Gets Cached

For each LEGACY class (C4-C7):

  • route_kind: Expected to be TINY_ROUTE_LEGACY
  • handler: Function pointer to tiny_legacy_fallback_free_base_with_env or appropriate handler
  • valid: Safety flag (1 if cache entry is valid)

2.4 Eliminated Overhead

Before (16-28 branches per free, summing the per-step ranges below):

  1. Phase 9 MONO DUALHOT check (3-5 branches)
  2. Phase 10 MONO LEGACY DIRECT check (4-6 branches)
  3. Policy snapshot call small_policy_v7_snapshot() (5-10 branches, potential getenv)
  4. Route computation tiny_route_for_class() (3-5 branches)
  5. Switch on route_kind (1-2 branches)

After (commit-once enabled, LEGACY classes):

  1. Master gate check free_path_commit_once_enabled_fast() (1 branch, predicted taken once the gate is enabled)
  2. Class index range check (1 branch, predicted taken)
  3. Cached entry lookup (0 branches, direct memory load)
  4. Handler dispatch through the cached function pointer (1 indirect call)

Branch reduction: 12-20 branches per LEGACY free → Estimated +2-3% improvement


3. Files to Create/Modify

3.1 New Files (Box Pattern)

core/box/free_path_commit_once_fixed_box.h

#ifndef HAKMEM_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H
#define HAKMEM_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H

#include <stdbool.h>
#include <stdint.h>
#include "core/hakmem_tiny_defs.h"

typedef void (*FreeTinyHandler)(void* ptr, unsigned class_idx, TinyHeap* heap);

struct FreePatchCommitOnceEntry {
    TinyRouteKind route_kind;
    FreeTinyHandler handler;
    uint8_t valid;
};

// Global cache (4 entries for C4-C7)
extern struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
extern bool g_free_path_commit_once_enabled;

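// Fast-path API (inlined, no fallback needed).
// The hint is 0 (unlikely) because the gate stays OFF unless
// HAKMEM_FREE_PATH_COMMIT_ONCE=1 is set.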
static inline bool free_path_commit_once_enabled_fast(void) {
    return __builtin_expect(g_free_path_commit_once_enabled, 0);
}

// Refresh (called once at bench_profile boundary)
void free_path_commit_once_refresh_from_env(void);

#endif

core/box/free_path_commit_once_fixed_box.c

#include "free_path_commit_once_fixed_box.h"
#include "core/box/tiny_env_box.h"
#include "core/box/tiny_larson_fix_env_box.h"
#include "core/hakmem_tiny.h"
#include <stdio.h>   // fprintf, stderr (fail-fast messages)
#include <stdlib.h>
#include <string.h>

struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
bool g_free_path_commit_once_enabled = false;

void free_path_commit_once_refresh_from_env(void) {
    // Read master ENV gate
    const char* env_val = getenv("HAKMEM_FREE_PATH_COMMIT_ONCE");
    bool requested = (env_val && atoi(env_val) == 1);

    if (!requested) {
        g_free_path_commit_once_enabled = false;
        return;
    }

    // Fail-fast: LARSON_FIX incompatible with commit-once
    if (tiny_larson_fix_enabled()) {
        fprintf(stderr, "[FREE_COMMIT_ONCE] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible, disabling\n");
        g_free_path_commit_once_enabled = false;
        return;
    }

    // Pre-compute route + handler for C4-C7 (LEGACY)
    for (unsigned i = 0; i < 4; i++) {
        unsigned class_idx = TINY_C4 + i;

        // Route determination (expect LEGACY for C4-C7)
        TinyRouteKind route = tiny_route_for_class(class_idx);

        // Handler selection (simplified, matches free_tiny_fast logic)
        FreeTinyHandler handler = NULL;

        if (route == TINY_ROUTE_LEGACY) {
            handler = tiny_legacy_fallback_free_base_with_env;
        } else {
            // Unexpected route, fail-fast
            fprintf(stderr, "[FREE_COMMIT_ONCE] FAIL-FAST: C%u route=%d not LEGACY, disabling\n",
                    class_idx, (int)route);
            g_free_path_commit_once_enabled = false;
            return;
        }

        g_free_path_commit_once_fixed[i].route_kind = route;
        g_free_path_commit_once_fixed[i].handler = handler;
        g_free_path_commit_once_fixed[i].valid = 1;
    }

    g_free_path_commit_once_enabled = true;
}
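
The decision order of the refresh function can be exercised in isolation. The following is a minimal sketch, not the project's code. It replaces the real dependencies with hypothetical stand-ins (`stub_larson_fix`, `stub_route`, `demo_refresh`, and the `Demo*` types). It also takes the ENV request as a parameter, so the three outcomes (disabled, fail-fast, committed) are easy to observe. Unlike the plan's version, it also clears the `valid` flags whenever it disables the box. That is an added assumption about what a conservative implementation might want.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real hakmem types and helpers. */
typedef enum { DEMO_ROUTE_LEGACY = 0, DEMO_ROUTE_OTHER = 1 } DemoRouteKind;
typedef void (*DemoHandler)(void* ptr, unsigned class_idx);

enum { DEMO_C4 = 4, DEMO_N = 4 };  /* assumed: C4..C7 */

struct DemoEntry {
    DemoRouteKind route_kind;
    DemoHandler handler;
    uint8_t valid;
};

static struct DemoEntry g_demo_fixed[DEMO_N];
static bool g_demo_enabled = false;

/* Test knobs replacing tiny_larson_fix_enabled() and tiny_route_for_class(). */
static bool stub_larson_fix = false;
static DemoRouteKind stub_route[8];  /* zero-initialized: every class LEGACY */

static void demo_legacy_free(void* ptr, unsigned class_idx) {
    (void)ptr;
    (void)class_idx;  /* no-op handler for the sketch */
}

/* Same decision order as the plan: ENV gate, LARSON_FIX, per-class route.
 * Returns the final enabled state. */
static bool demo_refresh(bool requested) {
    g_demo_enabled = false;
    for (unsigned i = 0; i < DEMO_N; i++) g_demo_fixed[i].valid = 0;

    if (!requested || stub_larson_fix) return false;

    for (unsigned i = 0; i < DEMO_N; i++) {
        DemoRouteKind route = stub_route[DEMO_C4 + i];
        if (route != DEMO_ROUTE_LEGACY) {
            /* Unexpected route: invalidate entries written so far. */
            for (unsigned j = 0; j < i; j++) g_demo_fixed[j].valid = 0;
            return false;
        }
        g_demo_fixed[i].route_kind = route;
        g_demo_fixed[i].handler = demo_legacy_free;
        g_demo_fixed[i].valid = 1;
    }
    g_demo_enabled = true;
    return true;
}
```

Calling `demo_refresh(true)` with the default stubs commits all four entries. Setting `stub_larson_fix = true` or marking any of C4-C7 as `DEMO_ROUTE_OTHER` leaves the box disabled.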

3.2 Modified Files

core/front/malloc_tiny_fast.h (free_tiny_fast function)

Insertion point: Line ~950, before Phase 9/10 checks

static void free_tiny_fast(void* ptr, unsigned class_idx, TinyHeap* heap, ...) {
    // NEW: Phase 85 commit-once fast path (LEGACY classes only)
    #if HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED
    if (free_path_commit_once_enabled_fast()) {
        if (class_idx >= TINY_C4 && class_idx <= TINY_C7) {
            const unsigned cache_idx = class_idx - TINY_C4;
            const struct FreePatchCommitOnceEntry* entry =
                &g_free_path_commit_once_fixed[cache_idx];

            if (__builtin_expect(entry->valid, 1)) {
                entry->handler(ptr, class_idx, heap);
                return;
            }
        }
    }
    #endif

    // Existing Phase 9/10/policy/route ceremony (fallback)
    ...
}
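
The two-sided class check above is counted as one branch in Section 2.4. Compilers usually fuse such a pair, but the same test can be written explicitly as a single unsigned comparison. This is a sketch under the assumption that C4-C7 are four consecutive indices starting at 4; `DEMO_TINY_C4`, `DEMO_LEGACY_COUNT`, and `in_commit_once_range` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum { DEMO_TINY_C4 = 4, DEMO_LEGACY_COUNT = 4 };  /* assumed: C4..C7 */

/* Unsigned wrap-around turns any index below C4 into a very large value,
 * so one comparison covers both the lower and the upper bound. */
static inline bool in_commit_once_range(unsigned class_idx) {
    return (class_idx - (unsigned)DEMO_TINY_C4) < (unsigned)DEMO_LEGACY_COUNT;
}
```

The subtraction result doubles as the cache index, so the hot path would not need to compute `class_idx - TINY_C4` a second time.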

core/bench_profile.h (refresh function integration)

Add to refresh_all_env_caches():

void refresh_all_env_caches(void) {
    // ... existing refreshes ...

    #if HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED
    free_path_commit_once_refresh_from_env();
    #endif
}

Makefile (box flag)

Add new box flag:

BOX_FREE_PATH_COMMIT_ONCE_FIXED ?= 1
CFLAGS += -DHAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED=$(BOX_FREE_PATH_COMMIT_ONCE_FIXED)

4. Implementation Stages

Stage 1: Box Infrastructure (1-2 hours)

  1. Create free_path_commit_once_fixed_box.h with struct definition, global declarations, fast-path API
  2. Create free_path_commit_once_fixed_box.c with refresh implementation
  3. Add Makefile box flag
  4. Integrate refresh call into core/bench_profile.h
  5. Validation: Compile, verify no build errors

Stage 2: Hot Path Integration (1 hour)

  1. Modify core/front/malloc_tiny_fast.h to add Phase 85 fast path at line ~950
  2. Add class range check (C4-C7) and cache lookup
  3. Add handler dispatch with validity check
  4. Validation: Compile, verify no build errors, run basic functionality test

Stage 3: Fail-Fast Safety (30 min)

  1. Test LARSON_FIX=1 scenario, verify commit-once disabled
  2. Test invalid route scenario (C4-C7 with non-LEGACY route)
  3. Validation: Both scenarios should log fail-fast message and fall back to standard path

Stage 4: A/B Testing (2-3 hours)

  1. Build single binary with box flag enabled
  2. Baseline test: HAKMEM_FREE_PATH_COMMIT_ONCE=0 RUNS=10 scripts/run_mixed_10_cleanenv.sh
  3. Treatment test: HAKMEM_FREE_PATH_COMMIT_ONCE=1 RUNS=10 scripts/run_mixed_10_cleanenv.sh
  4. Compare mean/median/CV, calculate delta
  5. GO criteria: +2.0% or better

5. Test Plan

5.1 SSOT Baseline (10-run)

# Control (commit-once disabled)
HAKMEM_FREE_PATH_COMMIT_ONCE=0 RUNS=10 scripts/run_mixed_10_cleanenv.sh > /tmp/phase85_control.txt

# Treatment (commit-once enabled)
HAKMEM_FREE_PATH_COMMIT_ONCE=1 RUNS=10 scripts/run_mixed_10_cleanenv.sh > /tmp/phase85_treatment.txt

Expected baseline: 55.53M ops/s (from recent allocator matrix)

GO threshold: 55.53M × 1.02 = 56.64M ops/s (treatment mean)

5.2 Safety Tests

# Test 1: LARSON_FIX incompatibility
HAKMEM_TINY_LARSON_FIX=1 HAKMEM_FREE_PATH_COMMIT_ONCE=1 ./bench_random_mixed_hakmem 1000000 400 1
# Expected: Log "[FREE_COMMIT_ONCE] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible"

# Test 2: Invalid route scenario (manually inject via debugging)
# Expected: Log "[FREE_COMMIT_ONCE] FAIL-FAST: C4 route=X not LEGACY"

5.3 Performance Profile

Optional (if time permits):

# Perf stat comparison
HAKMEM_FREE_PATH_COMMIT_ONCE=0 perf stat -e branches,branch-misses ./bench_random_mixed_hakmem 20000000 400 1
HAKMEM_FREE_PATH_COMMIT_ONCE=1 perf stat -e branches,branch-misses ./bench_random_mixed_hakmem 20000000 400 1

Expected: 8-12% reduction in branches, <1% change in branch misses
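
The expected reduction can be checked from the two `branches` counts that perf stat reports. The helper below is a minimal sketch with hypothetical names (`pct_change`, `branch_reduction_in_expected_range`). It assumes the counts have already been copied out of the perf output by hand or by a script.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Percentage change from control to treatment; negative means a reduction.
 * control must be non-zero. */
static double pct_change(uint64_t control, uint64_t treatment) {
    return 100.0 * ((double)treatment - (double)control) / (double)control;
}

/* True when the branch count dropped by 8-12%, the range the plan expects. */
static bool branch_reduction_in_expected_range(uint64_t control,
                                               uint64_t treatment) {
    double change = pct_change(control, treatment);
    return change <= -8.0 && change >= -12.0;
}
```

The same `pct_change` helper applies to the ops/s means when computing the A/B delta.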


6. Rollback Strategy

Immediate Rollback (No Recompile)

export HAKMEM_FREE_PATH_COMMIT_ONCE=0

Box Removal (Recompile)

make clean
BOX_FREE_PATH_COMMIT_ONCE_FIXED=0 make bench_random_mixed_hakmem

File Reversions

  • Remove: core/box/free_path_commit_once_fixed_box.{h,c}
  • Revert: core/front/malloc_tiny_fast.h (remove Phase 85 block)
  • Revert: core/bench_profile.h (remove refresh call)
  • Revert: Makefile (remove box flag)

7. Expected Results

7.1 Performance Target

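| Metric | Control | Treatment | Delta | Status |
|--------|---------|-----------|-------|--------|
| Mean (M ops/s) | 55.53 | 56.64+ | +2.0%+ | GO threshold |
| CV (%) | 1.5-2.0 | 1.5-2.0 | stable | required |
| Branch reduction | baseline | -8% to -12% | ~10% | expected |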

7.2 GO/NO-GO Decision

GO if:

  • Treatment mean ≥ 56.64M ops/s (+2.0%)
  • CV remains stable (<3%)
  • No regressions in other scenarios (json/mir/vm)
  • Fail-fast tests pass

NO-GO if:

  • Treatment mean < 56.64M ops/s
  • CV increases significantly (>3%)
  • Regressions observed
  • Fail-fast mechanisms fail
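
The two numeric criteria (mean gain and CV stability) can be evaluated mechanically from the per-run throughput values. The sketch below uses hypothetical helper names and assumes that CV means sample standard deviation divided by mean. The SSOT script may define it differently. It compares variances instead of standard deviations, so no `sqrt()` call is needed. The regression and fail-fast criteria still need separate checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Arithmetic mean of n samples (n > 0). */
static double mean_of(const double* x, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += x[i];
    return sum / (double)n;
}

/* Sample variance (n - 1 denominator); returns 0 for n < 2. */
static double sample_variance(const double* x, size_t n) {
    if (n < 2) return 0.0;
    double m = mean_of(x, n), acc = 0.0;
    for (size_t i = 0; i < n; i++) acc += (x[i] - m) * (x[i] - m);
    return acc / (double)(n - 1);
}

/* For a positive mean: CV < limit  <=>  variance < (limit * mean)^2. */
static bool cv_below(const double* x, size_t n, double limit) {
    double bound = limit * mean_of(x, n);
    return sample_variance(x, n) < bound * bound;
}

/* GO when the treatment mean reaches control_mean * (1 + min_gain)
 * and the treatment CV stays under cv_limit. */
static bool phase85_go(const double* control, const double* treatment,
                       size_t n, double min_gain, double cv_limit) {
    double threshold = mean_of(control, n) * (1.0 + min_gain);
    return mean_of(treatment, n) >= threshold &&
           cv_below(treatment, n, cv_limit);
}
```

For this plan the call would be `phase85_go(control, treatment, 10, 0.02, 0.03)`, matching the +2.0% threshold and the 3% CV bound stated above.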

7.3 Risk Assessment

Low Risk:

  • Scope limited to LEGACY route (C4-C7, 129-256 bytes)
  • ENV gate allows instant rollback
  • Fail-fast for LARSON_FIX ensures safety
  • Phase 9/10 MONO optimizations unaffected (fall through on cache miss)

Potential Issues:

  • Layout tax: New code path may cause I-cache/register pressure (mitigated by early placement at line ~950)
  • Indirect call overhead: Cached function pointer may have misprediction cost (likely negligible vs branch reduction)
  • Route dynamics: If route changes at runtime (unlikely), commit-once becomes stale (requires bench_profile refresh)

8. Success Criteria Summary

  1. Build completes without errors
  2. Fail-fast tests pass (LARSON_FIX=1, invalid route)
  3. SSOT 10-run treatment ≥ 56.64M ops/s (+2.0%)
  4. CV remains stable (<3%)
  5. No regressions in other scenarios

If all criteria met: Merge to master, update CURRENT_TASK.md, record in PERFORMANCE_TARGETS_SCORECARD.md

If NO-GO: Keep as research box, document findings, archive plan.


9. References

  • Phase 78-1 pattern: core/box/tiny_inline_slots_fixed_mode_box.{h,c}
  • Free path implementation: core/front/malloc_tiny_fast.h:919-1221
  • LARSON_FIX constraint: core/box/tiny_larson_fix_env_box.h
  • Route snapshot: core/hakmem_tiny.c:64-65 (g_tiny_route_class, g_tiny_route_snapshot_done)
  • SSOT validation: scripts/run_mixed_10_cleanenv.sh