# Phase 60: Alloc Pass-Down SSOT - Implementation Guide

**Date**: 2025-12-17
**Status**: **Implemented, NO-GO** (kept as research box)

## Overview

Phase 60 implements a Single Source of Truth (SSOT) pattern for the allocation path, computing ENV snapshot, route kind, C7 ULTRA, and DUALHOT flags once at the entry point and passing them down to the allocation logic.

**Goal**: Reduce redundant computations (ENV snapshot, route determination, etc.) by computing them once at the entry point.

**Result**: NO-GO (-0.46% regression). The implementation is kept as a research box with default OFF (`HAKMEM_ALLOC_PASSDOWN_SSOT=0`).

---

## Files Modified

### 1. New ENV Box

**File**: `/mnt/workdisk/public_share/hakmem/core/box/alloc_passdown_ssot_env_box.h`

**Purpose**: Provides the ENV gate for enabling/disabling the SSOT path.

**Key Functions**:
```c
// ENV gate (compile-time constant in HAKMEM_BENCH_MINIMAL)
static inline int alloc_passdown_ssot_enabled(void);
```

**ENV Variable**: `HAKMEM_ALLOC_PASSDOWN_SSOT` (default: 0, OFF)
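
Only the declaration of the gate is shown above. As a rough illustration of how a gate of this kind is often written, the sketch below caches a single `getenv` lookup and offers a compile-time override. Everything prefixed `demo_`, the `DEMO_BENCH_MINIMAL` macro, and the parsing rule (unset, empty, or a leading `0` means OFF) are assumptions made for illustration; they are not the contents of `alloc_passdown_ssot_env_box.h`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the real hakmem header. */

/* Pure parsing step: unset, empty, or a leading '0' all mean OFF. */
static inline int demo_parse_gate(const char* v) {
    return (v != NULL && v[0] != '\0' && v[0] != '0') ? 1 : 0;
}

#ifdef DEMO_BENCH_MINIMAL
/* Minimal bench build: the gate is a constant, so the compiler can
 * fold the branch at the call site away. */
static inline int demo_passdown_ssot_enabled(void) { return 0; }
#else
/* Runtime build: read the variable once, then serve the cached answer.
 * -1 means "not read yet". Not thread-safe as written; a real gate
 * would need an atomic or a one-time initializer. */
static int demo_gate_cache = -1;

static inline int demo_passdown_ssot_enabled(void) {
    if (demo_gate_cache < 0) {
        demo_gate_cache = demo_parse_gate(getenv("HAKMEM_ALLOC_PASSDOWN_SSOT"));
    }
    assert(demo_gate_cache == 0 || demo_gate_cache == 1);
    return demo_gate_cache;
}
#endif
```

In this sketch the runtime variant still costs one load and one compare per call after the first, which is the kind of residual branch the guide later identifies as measurable in a hot path.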

---

### 2. Core Implementation

**File**: `/mnt/workdisk/public_share/hakmem/core/front/malloc_tiny_fast.h`

**Key Changes**:

#### a. Context Structure (Lines 92-97)
```c
// Alloc context: computed once at entry, passed down
typedef struct {
    const HakmemEnvSnapshot* env;  // ENV snapshot (NULL if snapshot disabled)
    SmallRouteKind route_kind;     // Route kind (LEGACY/ULTRA/MID/V7)
    bool c7_ultra_on;              // C7 ULTRA enabled
    bool alloc_dualhot_on;         // Alloc DUALHOT enabled (C0-C3 direct path)
} alloc_passdown_context_t;
```

#### b. Context Computation (Lines 200-220)
```c
// Phase 60: Compute context once at entry point
__attribute__((always_inline))
static inline alloc_passdown_context_t alloc_passdown_context_compute(int class_idx) {
    alloc_passdown_context_t ctx;

    // 1. ENV snapshot (once)
    ctx.env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL;

    // 2. C7 ULTRA enabled (once)
    ctx.c7_ultra_on = ctx.env ? ctx.env->tiny_c7_ultra_enabled : tiny_c7_ultra_enabled_env();

    // 3. Alloc DUALHOT enabled (once)
    ctx.alloc_dualhot_on = alloc_dualhot_enabled();

    // 4. Route kind (once)
    if (tiny_static_route_ready_fast()) {
        ctx.route_kind = tiny_static_route_get_kind_fast(class_idx);
    } else {
        ctx.route_kind = tiny_policy_hot_get_route_with_env((uint32_t)class_idx, ctx.env);
    }

    return ctx;
}
```

#### c. SSOT Allocation Path (Lines 286-392)
```c
// Phase 60: SSOT mode allocation (uses pre-computed context)
__attribute__((always_inline))
static inline void* malloc_tiny_fast_for_class_ssot(size_t size, int class_idx,
                                                     const alloc_passdown_context_t* ctx) {
    // Stats
    tiny_front_alloc_stat_inc(class_idx);
    ALLOC_GATE_STAT_INC_CLASS(class_idx);

    // C7 ULTRA early-exit (uses ctx->c7_ultra_on)
    if (class_idx == 7 && ctx->c7_ultra_on) {
        void* ultra_p = tiny_c7_ultra_alloc(size);
        if (TINY_HOT_LIKELY(ultra_p != NULL)) {
            return ultra_p;
        }
    }

    // C0-C3 DUALHOT direct path (uses ctx->alloc_dualhot_on)
    if ((unsigned)class_idx <= 3u) {
        if (ctx->alloc_dualhot_on) {
            void* ptr = tiny_hot_alloc_fast(class_idx);
            if (TINY_HOT_LIKELY(ptr != NULL)) {
                return ptr;
            }
            return tiny_cold_refill_and_alloc(class_idx);
        }
    }

    // Routing dispatch (uses ctx->route_kind)
    const tiny_env_cfg_t* env_cfg = tiny_env_cfg();
    if (TINY_HOT_LIKELY(env_cfg->alloc_route_shape)) {
        if (TINY_HOT_LIKELY(ctx->route_kind == SMALL_ROUTE_LEGACY)) {
            void* ptr = tiny_hot_alloc_fast(class_idx);
            if (TINY_HOT_LIKELY(ptr != NULL)) {
                return ptr;
            }
            return tiny_cold_refill_and_alloc(class_idx);
        }
        return tiny_alloc_route_cold(ctx->route_kind, class_idx, size);
    }

    // Original dispatch (backward compatible)
    switch (ctx->route_kind) {
        case SMALL_ROUTE_ULTRA:
            // ... ULTRA path
            break;
        case SMALL_ROUTE_MID_V35:
            // ... MID v3.5 path
            break;
        case SMALL_ROUTE_V7:
            // ... V7 path
            break;
        case SMALL_ROUTE_LEGACY:
        default:
            break;
    }

    // LEGACY fallback
    void* ptr = tiny_hot_alloc_fast(class_idx);
    if (TINY_HOT_LIKELY(ptr != NULL)) {
        return ptr;
    }
    return tiny_cold_refill_and_alloc(class_idx);
}
```

#### d. Entry Point Dispatch (Lines 396-402)
```c
// Phase 60: Entry point dispatch
__attribute__((always_inline))
static inline void* malloc_tiny_fast_for_class(size_t size, int class_idx) {
    // Phase 60: SSOT mode (ENV gated)
    if (alloc_passdown_ssot_enabled()) {
        alloc_passdown_context_t ctx = alloc_passdown_context_compute(class_idx);
        return malloc_tiny_fast_for_class_ssot(size, class_idx, &ctx);
    }

    // Original path (backward compatible, default)
    // ... existing implementation ...
}
```

---

## Design Patterns

### 1. SSOT (Single Source of Truth)

**Principle**: Compute expensive values once at the entry point, then pass them down.

**Benefits** (intended):
- Avoid redundant ENV snapshot calls
- Avoid redundant route kind computations
- Reduce branch mispredictions

**Actual Result**: The original path already has early exits that avoid expensive computations. The SSOT approach forces upfront computation, negating the benefit of early exits.
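
The effect can be reproduced in miniature. The sketch below is a toy model, not hakmem code: `demo_expensive_route()` is a hypothetical stand-in for route determination, and a counter records how often it runs. The lazy variant mirrors a path with early exits; the eager variant mirrors the SSOT path, which computes the route before it knows whether the route is needed.

```c
#include <assert.h>
#include <stdbool.h>

static int demo_route_calls = 0;  /* how many times the "expensive" step ran */

/* Hypothetical stand-in for route determination. */
static int demo_expensive_route(int class_idx) {
    demo_route_calls++;
    return class_idx & 3;
}

/* Lazy: the early exit returns before the route is ever computed. */
int demo_alloc_lazy(int class_idx, bool fast_flag) {
    if (class_idx <= 3 && fast_flag) {
        return 100 + class_idx;  /* fast path, no route needed */
    }
    return demo_expensive_route(class_idx);
}

/* Eager (SSOT-style): the route is computed up front, then the same
 * early exit is taken and the computed value goes unused. */
int demo_alloc_eager(int class_idx, bool fast_flag) {
    int route = demo_expensive_route(class_idx);
    if (class_idx <= 3 && fast_flag) {
        return 100 + class_idx;
    }
    return route;
}

/* Runs n fast-path calls through one variant and reports how many
 * route computations that cost. */
int demo_count_route_calls(int (*alloc_fn)(int, bool), int n) {
    demo_route_calls = 0;
    for (int i = 0; i < n; i++) {
        (void)alloc_fn(i & 3, true);
    }
    return demo_route_calls;
}
```

Both variants return the same values, but `demo_count_route_calls(demo_alloc_lazy, 1000)` reports 0 route computations while `demo_count_route_calls(demo_alloc_eager, 1000)` reports 1000. The toy exaggerates the gap because every call takes the fast path; in the real allocator the cost depends on how often the early exits fire.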

### 2. Pass-Down Pattern

**Principle**: Pass context via struct pointer to downstream functions.

**Benefits** (intended):
- Clear API boundary
- Avoid global state

**Actual Result**: Struct pass-down introduces ABI overhead (register pressure, stack spills), especially when combined with the upfront computation overhead.

### 3. Always Inline

**Principle**: Use `__attribute__((always_inline))` to ensure the context computation is inlined.

**Benefits**:
- Reduce function call overhead
- Allow compiler to optimize across boundaries

**Actual Result**: Inlining works as expected, but the upfront computation overhead remains.

---

## Rollback Procedure

### Option 1: ENV Variable (Runtime)

Set `HAKMEM_ALLOC_PASSDOWN_SSOT=0` (default).

### Option 2: Compile-Time (Build-Time)

Build without `-DHAKMEM_ALLOC_PASSDOWN_SSOT=1`:
```bash
make bench_random_mixed_hakmem_minimal
```

### Option 3: Code Removal (Permanent)

If the research box is no longer needed, remove:
1. `/mnt/workdisk/public_share/hakmem/core/box/alloc_passdown_ssot_env_box.h`
2. The SSOT dispatch code in `malloc_tiny_fast_for_class()` (lines 397-401)
3. The `alloc_passdown_context_t` struct (lines 92-97) and `alloc_passdown_context_compute()` (lines 200-220)
4. `malloc_tiny_fast_for_class_ssot()` (lines 286-392), which depends on the struct

---

## Lessons Learned

### 1. Early Exits Are Powerful

The original allocation path has early exits (C7 ULTRA, DUALHOT) that avoid expensive computations in the common case. Forcing upfront computation negates these benefits.

### 2. Branch Cost

Even a single branch check (`if (alloc_passdown_ssot_enabled())`) can introduce measurable overhead in a hot path.

### 3. Pass-Down Overhead

Passing a struct by pointer introduces ABI overhead (register pressure, stack spills), especially when the struct contains multiple fields.

### 4. SSOT Is Not Always Better

The SSOT pattern works well when there are **many redundant computations** across multiple code paths (e.g., Free-side Phase 19-6C). It **fails** when the original path already has **efficient early exits**.

---

## Future Work

### Alternative Approaches

1. **Inline Critical Functions**: Ensure `tiny_c7_ultra_alloc`, `tiny_region_id_write_header`, and `unified_cache_push` are always inlined.
2. **Branch Reduction**: Remove branches from the hot path (e.g., combine `if (class_idx == 7 && c7_ultra_on)` into a single check).
3. **Profile-Guided Optimization (PGO)**: Use PGO to optimize branch prediction.
4. **Direct Dispatch**: For common class indices (C0-C3, C7), use direct dispatch instead of switch statements.
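
One possible reading of the branch-reduction item is to fold the class test and the feature flag into a precomputed per-feature bitmask, so the hot path does one shift-and-test instead of a compare plus a flag load. The sketch below works under that assumption; `demo_ultra_mask`, `demo_dualhot_mask`, `demo_masks_init`, and `demo_class_in` are hypothetical names, and whether this helps on the real allocator would have to be measured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit i set => class i takes that feature's direct path.
 * Both masks are built once at init time, never on the hot path. */
static uint32_t demo_ultra_mask = 0;    /* bit 7 set iff C7 ULTRA is on */
static uint32_t demo_dualhot_mask = 0;  /* bits 0-3 set iff DUALHOT is on */

void demo_masks_init(bool c7_ultra_on, bool dualhot_on) {
    demo_ultra_mask = c7_ultra_on ? (1u << 7) : 0u;
    demo_dualhot_mask = dualhot_on ? 0xFu : 0u;
}

/* Hot-path test: replaces `class_idx == 7 && c7_ultra_on` (or
 * `class_idx <= 3 && dualhot_on`) with a single shift and mask.
 * Assumes 0 <= class_idx < 32. */
static inline bool demo_class_in(uint32_t mask, int class_idx) {
    return ((mask >> (unsigned)class_idx) & 1u) != 0u;
}
```

After `demo_masks_init(true, false)`, `demo_class_in(demo_ultra_mask, 7)` is true and every DUALHOT test is false, so a disabled feature costs the same single test as an enabled one.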

### Related Phases

- **Phase 19-6C** (Free-side SSOT): Successful (+1.5%) due to many redundant computations.
- **Phase 43** (Branch vs Store): Branch cost is higher than store cost in hot paths.
- **Phase 40/41** (ASM Analysis): Focus on functions that are actually executed at runtime.

---

## Box Theory Compliance

| Principle                | Compliant? | Notes                                                                 |
|--------------------------|------------|-----------------------------------------------------------------------|
| Single Conversion Point  | Yes        | Entry point computes context once                                     |
| Clear Boundaries         | Yes        | `alloc_passdown_context_t` defines the boundary                       |
| Reversible               | Yes        | ENV gate allows rollback                                              |
| No Side Effects          | Yes        | Context is immutable after computation                                |
| Performance              | **No**     | **-0.46% regression** (NO-GO)                                         |

**Overall**: Box Theory compliant, but **performance non-compliant** (NO-GO).