# Phase 85: Free Path Commit-Once (LEGACY-only) Implementation Plan

## 1. Objective & Scope

**Goal**: Eliminate per-operation policy/route/mono ceremony overhead in `free_tiny_fast()` for the LEGACY route by applying the Phase 78-1 "commit-once" pattern.

**Target**: +2.0% improvement (GO threshold)

**Scope**:
- LEGACY route only (classes C4-C7, size 129-256 bytes)
- Does NOT apply to ULTRA/MID/V7 routes
- Must coexist with existing Phase 9 (MONO DUALHOT) and Phase 10 (MONO LEGACY DIRECT) optimizations
- Fail-fast if `HAKMEM_TINY_LARSON_FIX` is enabled (owner_tid validation is incompatible with commit-once)

**Strategy**: Cache the Route + Handler mapping at init-time (bench_profile refresh boundary) and skip 12-20 branches per free() in the hot path.

---

## 2. Architecture & Design

### 2.1 Core Pattern (Phase 78-1 Adaptation)

Following the successful Phase 78-1 pattern:

```
┌──────────────────────────────────────────────────────┐
│ Init-time (bench_profile refresh boundary)           │
│ ──────────────────────────────────────────────────── │
│ free_path_commit_once_refresh_from_env()             │
│  ├─ Read ENV: HAKMEM_FREE_PATH_COMMIT_ONCE=0/1       │
│  ├─ Fail-fast: if LARSON_FIX enabled → disable       │
│  ├─ For C4-C7 (LEGACY classes):                      │
│  │   ├─ Compute: route_kind, handler function        │
│  │   └─ Store: g_free_path_commit_once_fixed[4]      │
│  └─ Set: g_free_path_commit_once_enabled = true      │
└──────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────┐
│ Hot path (every free)                                │
│ ──────────────────────────────────────────────────── │
│ free_tiny_fast()                                     │
│   if (free_path_commit_once_enabled_fast()) {        │
│     // NEW: Direct dispatch, skip all ceremony       │
│     entry = &g_free_path_commit_once_fixed[          │
│         class_idx - TINY_C4];                        │
│     return entry->handler(ptr, class_idx, heap);     │
│   }                                                  │
│   // Fallback: existing Phase 9/10/policy/route      │
│   ...                                                │
└──────────────────────────────────────────────────────┘
```
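
The two boxes above can be reduced to a small, self-contained sketch. This is not the hakmem implementation: every `demo_*` / `DEMO_*` name is hypothetical, the handler signature is simplified (no heap argument), and the "slow path" is a stub standing in for the existing Phase 9/10/policy/route ceremony.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real class indices (illustration only). */
enum { DEMO_C4 = 4, DEMO_C7 = 7, DEMO_SLOTS = DEMO_C7 - DEMO_C4 + 1 };

typedef void (*demo_free_handler)(void *ptr, unsigned class_idx);

struct demo_entry {
    demo_free_handler handler; /* resolved once at init time */
    bool valid;                /* safety flag, checked on the hot path */
};

static struct demo_entry g_demo_cache[DEMO_SLOTS];
static bool g_demo_enabled = false;
static unsigned g_demo_fast_hits = 0; /* counters exist only for the demo */
static unsigned g_demo_slow_hits = 0;

/* Stub for the cached LEGACY handler. */
static void demo_legacy_free(void *ptr, unsigned class_idx) {
    (void)ptr; (void)class_idx;
    g_demo_fast_hits++;
}

/* Stub for the existing per-free ceremony (fallback path). */
static void demo_slow_free(void *ptr, unsigned class_idx) {
    (void)ptr; (void)class_idx;
    g_demo_slow_hits++;
}

/* Init time: resolve every handler first, publish the gate last. */
static void demo_commit_once(bool requested) {
    g_demo_enabled = false;
    if (!requested) return;
    for (unsigned i = 0; i < DEMO_SLOTS; i++) {
        g_demo_cache[i].handler = demo_legacy_free;
        g_demo_cache[i].valid = true;
    }
    g_demo_enabled = true;
}

/* Hot path: gate check, range check, cached lookup, indirect call. */
static void demo_free(void *ptr, unsigned class_idx) {
    if (g_demo_enabled && class_idx >= DEMO_C4 && class_idx <= DEMO_C7) {
        const struct demo_entry *e = &g_demo_cache[class_idx - DEMO_C4];
        if (e->valid) {
            assert(e->handler != NULL);
            e->handler(ptr, class_idx);
            return;
        }
    }
    demo_slow_free(ptr, class_idx); /* classes outside C4-C7, or gate off */
}
```

Setting the gate only after all entries are filled means the hot path never observes a half-initialized cache, assuming (as in this plan) that the refresh runs at a single-threaded boundary.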

### 2.2 Cached State Structure

```c
typedef void (*FreeTinyHandler)(void* ptr, unsigned class_idx, TinyHeap* heap);

struct FreePatchCommitOnceEntry {
    TinyRouteKind route_kind;  // LEGACY, ULTRA, MID, V7 (validation only)
    FreeTinyHandler handler;   // Direct function pointer
    uint8_t valid;             // Safety flag
};

// Global state (4 entries for C4-C7)
extern struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
extern bool g_free_path_commit_once_enabled;
```

### 2.3 What Gets Cached

For each LEGACY class (C4-C7):
- **route_kind**: Expected to be `TINY_ROUTE_LEGACY`
- **handler**: Function pointer to `tiny_legacy_fallback_free_base_with_env` or appropriate handler
- **valid**: Safety flag (1 if cache entry is valid)

### 2.4 Eliminated Overhead

**Before** (16-28 branches per free, summing the ranges below):
1. Phase 9 MONO DUALHOT check (3-5 branches)
2. Phase 10 MONO LEGACY DIRECT check (4-6 branches)
3. Policy snapshot call `small_policy_v7_snapshot()` (5-10 branches, potential getenv)
4. Route computation `tiny_route_for_class()` (3-5 branches)
5. Switch on route_kind (1-2 branches)

**After** (commit-once enabled, LEGACY classes):
1. Master gate check `free_path_commit_once_enabled_fast()` (1 branch; the flag is constant after init, so it predicts well)
2. Class index range check (1 branch, predicted taken)
3. Cached entry lookup (0 branches, direct memory load)
4. Direct handler dispatch (1 indirect call)

**Branch reduction**: 12-20 branches per LEGACY free → **Estimated +2-3% improvement**

---

## 3. Files to Create/Modify

### 3.1 New Files (Box Pattern)

#### `core/box/free_path_commit_once_fixed_box.h`
```c
#ifndef HAKMEM_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H
#define HAKMEM_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H

#include <stdbool.h>
#include <stdint.h>
#include "core/hakmem_tiny_defs.h"

typedef void (*FreeTinyHandler)(void* ptr, unsigned class_idx, TinyHeap* heap);

struct FreePatchCommitOnceEntry {
    TinyRouteKind route_kind;
    FreeTinyHandler handler;
    uint8_t valid;
};

// Global cache (4 entries for C4-C7)
extern struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
extern bool g_free_path_commit_once_enabled;

// Fast-path API (inlined, no fallback needed)
static inline bool free_path_commit_once_enabled_fast(void) {
    return __builtin_expect(g_free_path_commit_once_enabled, 0);
}

// Refresh (called once at bench_profile boundary)
void free_path_commit_once_refresh_from_env(void);

#endif
```

#### `core/box/free_path_commit_once_fixed_box.c`
```c
#include "free_path_commit_once_fixed_box.h"
#include "core/box/tiny_env_box.h"
#include "core/box/tiny_larson_fix_env_box.h"
#include "core/hakmem_tiny.h"
#include <stdio.h>   // fprintf, stderr
#include <stdlib.h>  // getenv, atoi
#include <string.h>

struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
bool g_free_path_commit_once_enabled = false;

void free_path_commit_once_refresh_from_env(void) {
    // Read master ENV gate
    const char* env_val = getenv("HAKMEM_FREE_PATH_COMMIT_ONCE");
    bool requested = (env_val && atoi(env_val) == 1);

    if (!requested) {
        g_free_path_commit_once_enabled = false;
        return;
    }

    // Fail-fast: LARSON_FIX incompatible with commit-once
    if (tiny_larson_fix_enabled()) {
        fprintf(stderr, "[FREE_COMMIT_ONCE] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible, disabling\n");
        g_free_path_commit_once_enabled = false;
        return;
    }

    // Pre-compute route + handler for C4-C7 (LEGACY)
    for (unsigned i = 0; i < 4; i++) {
        unsigned class_idx = TINY_C4 + i;

        // Route determination (expect LEGACY for C4-C7)
        TinyRouteKind route = tiny_route_for_class(class_idx);

        // Handler selection (simplified, matches free_tiny_fast logic)
        FreeTinyHandler handler = NULL;

        if (route == TINY_ROUTE_LEGACY) {
            handler = tiny_legacy_fallback_free_base_with_env;
        } else {
            // Unexpected route, fail-fast
            fprintf(stderr, "[FREE_COMMIT_ONCE] FAIL-FAST: C%u route=%d not LEGACY, disabling\n",
                    class_idx, (int)route);
            g_free_path_commit_once_enabled = false;
            return;
        }

        g_free_path_commit_once_fixed[i].route_kind = route;
        g_free_path_commit_once_fixed[i].handler = handler;
        g_free_path_commit_once_fixed[i].valid = 1;
    }

    g_free_path_commit_once_enabled = true;
}
```
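
One detail of the refresh above: `atoi()` stops at the first non-digit, so a value such as `1abc` also enables the gate. If stricter parsing is wanted, a helper along these lines could be used instead. This is only a sketch; `demo_env_flag_is_one` is a hypothetical name and is not part of hakmem.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: returns true only when the variable is set and its
 * value parses as the integer 1 with no trailing characters.
 * strtol() still accepts leading whitespace and an optional sign. */
static bool demo_env_flag_is_one(const char *name) {
    const char *val = getenv(name);
    if (val == NULL || *val == '\0') {
        return false;              /* unset or empty: treat as disabled */
    }
    char *end = NULL;
    long parsed = strtol(val, &end, 10);
    return *end == '\0' && parsed == 1;
}
```

With this helper, `HAKMEM_FREE_PATH_COMMIT_ONCE=1` enables the gate, while unset, empty, `0`, or malformed values leave it disabled.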

### 3.2 Modified Files

#### `core/front/malloc_tiny_fast.h` (free_tiny_fast function)

**Insertion point**: Line ~950, before Phase 9/10 checks

```c
static void free_tiny_fast(void* ptr, unsigned class_idx, TinyHeap* heap, ...) {
    // NEW: Phase 85 commit-once fast path (LEGACY classes only)
#if HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED
    if (free_path_commit_once_enabled_fast()) {
        if (class_idx >= TINY_C4 && class_idx <= TINY_C7) {
            const unsigned cache_idx = class_idx - TINY_C4;
            const struct FreePatchCommitOnceEntry* entry =
                &g_free_path_commit_once_fixed[cache_idx];

            if (__builtin_expect(entry->valid, 1)) {
                entry->handler(ptr, class_idx, heap);
                return;
            }
        }
    }
#endif

    // Existing Phase 9/10/policy/route ceremony (fallback)
    ...
}
```

#### `core/bench_profile.h` (refresh function integration)

Add to `refresh_all_env_caches()`:

```c
void refresh_all_env_caches(void) {
    // ... existing refreshes ...

#if HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED
    free_path_commit_once_refresh_from_env();
#endif
}
```

#### `Makefile` (box flag)

Add new box flag:

```makefile
BOX_FREE_PATH_COMMIT_ONCE_FIXED ?= 1
CFLAGS += -DHAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED=$(BOX_FREE_PATH_COMMIT_ONCE_FIXED)
```
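
The `#if HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED` guards rely on the macro always being defined by the build. The C preprocessor treats an undefined identifier in `#if` as 0, so a translation unit built without the `-D` flag silently compiles the box out. A defensive default can make that explicit; the sketch below assumes no shared config header already provides one, and `demo_box_compiled_in` is a hypothetical helper used only to make the effect observable.

```c
/* Treat a missing -D flag as "box compiled out". */
#ifndef HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED
#define HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED 0
#endif

/* Hypothetical helper: reports whether the Phase 85 block is compiled in. */
static inline int demo_box_compiled_in(void) {
#if HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED
    return 1;
#else
    return 0;
#endif
}
```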

---

## 4. Implementation Stages

### Stage 1: Box Infrastructure (1-2 hours)
1. Create `free_path_commit_once_fixed_box.h` with struct definition, global declarations, fast-path API
2. Create `free_path_commit_once_fixed_box.c` with refresh implementation
3. Add Makefile box flag
4. Integrate refresh call into `core/bench_profile.h`
5. **Validation**: Compile, verify no build errors

### Stage 2: Hot Path Integration (1 hour)
1. Modify `core/front/malloc_tiny_fast.h` to add Phase 85 fast path at line ~950
2. Add class range check (C4-C7) and cache lookup
3. Add handler dispatch with validity check
4. **Validation**: Compile, verify no build errors, run basic functionality test

### Stage 3: Fail-Fast Safety (30 min)
1. Test LARSON_FIX=1 scenario, verify commit-once disabled
2. Test invalid route scenario (C4-C7 with non-LEGACY route)
3. **Validation**: Both scenarios should log fail-fast message and fall back to standard path

### Stage 4: A/B Testing (2-3 hours)
1. Build single binary with box flag enabled
2. Baseline test: `HAKMEM_FREE_PATH_COMMIT_ONCE=0 RUNS=10 scripts/run_mixed_10_cleanenv.sh`
3. Treatment test: `HAKMEM_FREE_PATH_COMMIT_ONCE=1 RUNS=10 scripts/run_mixed_10_cleanenv.sh`
4. Compare mean/median/CV, calculate delta
5. **GO criteria**: +2.0% or better

---

## 5. Test Plan

### 5.1 SSOT Baseline (10-run)

```bash
# Control (commit-once disabled)
HAKMEM_FREE_PATH_COMMIT_ONCE=0 RUNS=10 scripts/run_mixed_10_cleanenv.sh > /tmp/phase85_control.txt

# Treatment (commit-once enabled)
HAKMEM_FREE_PATH_COMMIT_ONCE=1 RUNS=10 scripts/run_mixed_10_cleanenv.sh > /tmp/phase85_treatment.txt
```

**Expected baseline**: 55.53M ops/s (from recent allocator matrix)

**GO threshold**: 55.53M × 1.02 = **56.64M ops/s** (treatment mean)
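
The mean/median/CV/delta comparison could be computed with a few small helpers. The following is a minimal sketch: the `demo_*` names are hypothetical, and the inputs are assumed to be the per-run ops/s values already parsed from the two output files. The CV check compares squared quantities so that no `sqrt()` (and no `-lm`) is needed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int demo_cmp_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Arithmetic mean; n must be > 0. */
static double demo_mean(const double *v, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    return sum / (double)n;
}

/* Median; sorts v in place. n must be > 0. */
static double demo_median(double *v, size_t n) {
    qsort(v, n, sizeof v[0], demo_cmp_double);
    return (n % 2) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

/* True when CV = (sample stddev / mean) * 100 is below limit_percent.
 * Equivalent to variance < (limit_percent / 100 * mean)^2.
 * Assumes n > 1 and a positive mean. */
static bool demo_cv_below(const double *v, size_t n, double limit_percent) {
    double m = demo_mean(v, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) ss += (v[i] - m) * (v[i] - m);
    double variance = ss / (double)(n - 1);
    double limit = limit_percent / 100.0 * m;
    return variance < limit * limit;
}

/* Relative change of treatment vs control, in percent. */
static double demo_delta_percent(double control_mean, double treatment_mean) {
    return (treatment_mean - control_mean) / control_mean * 100.0;
}
```

For example, runs of 98, 100, and 102 M ops/s have a mean of 100 and a sample standard deviation of 2, so `demo_cv_below(v, 3, 3.0)` holds while `demo_cv_below(v, 3, 1.0)` does not.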

### 5.2 Safety Tests

```bash
# Test 1: LARSON_FIX incompatibility
HAKMEM_TINY_LARSON_FIX=1 HAKMEM_FREE_PATH_COMMIT_ONCE=1 ./bench_random_mixed_hakmem 1000000 400 1
# Expected: Log "[FREE_COMMIT_ONCE] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible"

# Test 2: Invalid route scenario (manually inject via debugging)
# Expected: Log "[FREE_COMMIT_ONCE] FAIL-FAST: C4 route=X not LEGACY"
```
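
Test 2 depends on injecting a non-LEGACY route by hand. One way to make both fail-fast paths testable without a debugger is to pass the route lookup in as a function pointer. The sketch below models that idea in isolation; it is an assumption about how a test harness might look, not the hakmem API, and all `demo_*` / `DEMO_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

typedef enum { DEMO_ROUTE_LEGACY = 0, DEMO_ROUTE_ULTRA = 1 } demo_route;
typedef demo_route (*demo_route_fn)(unsigned class_idx);

static bool g_demo_gate = false;

/* Normal configuration: every class in C4-C7 routes to LEGACY. */
static demo_route demo_all_legacy(unsigned class_idx) {
    (void)class_idx;
    return DEMO_ROUTE_LEGACY;
}

/* Injected fault: C5 reports a non-LEGACY route. */
static demo_route demo_c5_ultra(unsigned class_idx) {
    return class_idx == 5 ? DEMO_ROUTE_ULTRA : DEMO_ROUTE_LEGACY;
}

/* Mirrors the refresh logic: returns true only if the gate ends up enabled. */
static bool demo_refresh(bool requested, bool larson_fix,
                         demo_route_fn route_for_class) {
    g_demo_gate = false;
    if (!requested) return false;
    if (larson_fix) {
        fprintf(stderr, "[DEMO] FAIL-FAST: LARSON_FIX incompatible, disabling\n");
        return false;
    }
    for (unsigned c = 4; c <= 7; c++) {
        demo_route r = route_for_class(c);
        if (r != DEMO_ROUTE_LEGACY) {
            fprintf(stderr, "[DEMO] FAIL-FAST: C%u route=%d not LEGACY, disabling\n",
                    c, (int)r);
            return false;
        }
    }
    g_demo_gate = true;
    return true;
}
```

In both failure cases the gate stays off, so the hot path would fall through to the standard ceremony.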

### 5.3 Performance Profile

Optional (if time permits):

```bash
# Perf stat comparison
HAKMEM_FREE_PATH_COMMIT_ONCE=0 perf stat -e branches,branch-misses ./bench_random_mixed_hakmem 20000000 400 1
HAKMEM_FREE_PATH_COMMIT_ONCE=1 perf stat -e branches,branch-misses ./bench_random_mixed_hakmem 20000000 400 1
```

**Expected**: 8-12% reduction in branches, <1% change in branch misses

---

## 6. Rollback Strategy

### Immediate Rollback (No Recompile)
```bash
export HAKMEM_FREE_PATH_COMMIT_ONCE=0
```

### Box Removal (Recompile)
```bash
make clean
BOX_FREE_PATH_COMMIT_ONCE_FIXED=0 make bench_random_mixed_hakmem
```

### File Reversions
- Remove: `core/box/free_path_commit_once_fixed_box.{h,c}`
- Revert: `core/front/malloc_tiny_fast.h` (remove Phase 85 block)
- Revert: `core/bench_profile.h` (remove refresh call)
- Revert: `Makefile` (remove box flag)

---

## 7. Expected Results

### 7.1 Performance Target

| Metric | Control | Treatment | Delta | Status |
|--------|---------|-----------|-------|--------|
| Mean (M ops/s) | 55.53 | ≥ 56.64 | ≥ +2.0% | GO threshold |
| CV (%) | 1.5-2.0 | 1.5-2.0 | stable | required |
| Branch count | baseline | 8-12% lower | about -10% | expected |

### 7.2 GO/NO-GO Decision

**GO if**:
- Treatment mean ≥ 56.64M ops/s (+2.0%)
- CV remains stable (<3%)
- No regressions in other scenarios (json/mir/vm)
- Fail-fast tests pass

**NO-GO if**:
- Treatment mean < 56.64M ops/s
- CV increases significantly (>3%)
- Regressions observed
- Fail-fast mechanisms fail
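
The decision rule can be written as a single predicate, which keeps the GO call mechanical once the A/B numbers are in. This is a sketch with hypothetical names (`demo_ab_result`, `demo_is_go`); the thresholds are the ones stated in this plan.

```c
#include <stdbool.h>

/* Hypothetical summary of one A/B comparison. */
struct demo_ab_result {
    double control_mean;          /* M ops/s */
    double treatment_mean;        /* M ops/s */
    double treatment_cv_percent;  /* coefficient of variation, in percent */
    bool regressions_seen;        /* any regression in other scenarios */
    bool fail_fast_tests_passed;
};

/* GO only when every criterion from the plan holds. */
static bool demo_is_go(const struct demo_ab_result *r) {
    const double min_gain = 1.02;      /* +2.0% GO threshold */
    const double max_cv_percent = 3.0; /* CV must stay below 3% */
    return r->treatment_mean >= r->control_mean * min_gain
        && r->treatment_cv_percent < max_cv_percent
        && !r->regressions_seen
        && r->fail_fast_tests_passed;
}
```

Note that 55.53 × 1.02 is 56.6406, so a ratio-based check like this one is marginally stricter than the rounded 56.64M figure.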

### 7.3 Risk Assessment

**Low Risk**:
- Scope limited to LEGACY route (C4-C7, 129-256 bytes)
- ENV gate allows instant rollback
- Fail-fast for LARSON_FIX ensures safety
- Phase 9/10 MONO optimizations unaffected (fall through on cache miss)

**Potential Issues**:
- Layout tax: The new code path may add I-cache/register pressure (mitigated by early placement at line ~950)
- Indirect call overhead: The cached function pointer may incur a misprediction cost (likely negligible vs. the branch reduction)
- Route dynamics: If a route changes at runtime (unlikely), the cached entry stays stale until the next bench_profile refresh

---

## 8. Success Criteria Summary

1. ✅ Build completes without errors
2. ✅ Fail-fast tests pass (LARSON_FIX=1, invalid route)
3. ✅ SSOT 10-run treatment ≥ 56.64M ops/s (+2.0%)
4. ✅ CV remains stable (<3%)
5. ✅ No regressions in other scenarios

**If all criteria are met**: Merge to master, update CURRENT_TASK.md, record in PERFORMANCE_TARGETS_SCORECARD.md

**If NO-GO**: Keep as research box, document findings, archive plan.

---

## 9. References

- Phase 78-1 pattern: `core/box/tiny_inline_slots_fixed_mode_box.{h,c}`
- Free path implementation: `core/front/malloc_tiny_fast.h:919-1221`
- LARSON_FIX constraint: `core/box/tiny_larson_fix_env_box.h`
- Route snapshot: `core/hakmem_tiny.c:64-65` (g_tiny_route_class, g_tiny_route_snapshot_done)
- SSOT validation: `scripts/run_mixed_10_cleanenv.sh`