hakmem/PHASE19_FRONTEND_METRICS_FINDINGS.md

# Phase 19: Frontend Layer Metrics Analysis

## Phase 19-1: Box FrontMetrics Implementation ✅

**Status**: COMPLETE (2025-11-16)

**Implementation**:
- Created `core/box/front_metrics_box.h` - Per-class hit/miss counters
- Created `core/box/front_metrics_box.c` - CSV reporting with percentage analysis
- Added instrumentation to all frontend layers in `tiny_alloc_fast.inc.h`
- ENV controls: `HAKMEM_TINY_FRONT_METRICS=1`, `HAKMEM_TINY_FRONT_DUMP=1`
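Here is a minimal C sketch of what per-class hit counters behind an ENV gate can look like. Everything in it is hypothetical: the `front_layer_t` enum, `g_front_hits`, `front_metrics_hit()`, and the assumed 8 tiny classes. Only the `HAKMEM_TINY_FRONT_METRICS` variable comes from this phase, and the real `front_metrics_box.h` may be organized differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TINY_NUM_CLASSES 8   /* assumption: number of tiny size classes */

/* Hypothetical layer IDs, one per hit column in the CSV report. */
typedef enum {
    FRONT_UH = 0,   /* UltraHot */
    FRONT_HV2,      /* HeapV2 */
    FRONT_C5,       /* Class5 dedicated path */
    FRONT_FC,       /* FastCache */
    FRONT_SFC,      /* SFC */
    FRONT_SLL,      /* TLS SLL */
    FRONT_NUM_LAYERS
} front_layer_t;

/* Per-thread hit counters indexed by [layer][class]. No atomics are needed
   because each thread only touches its own copy; a report would have to
   sum the per-thread copies (not shown). */
static _Thread_local uint64_t g_front_hits[FRONT_NUM_LAYERS][TINY_NUM_CLASSES];

/* Per-thread cached gate: -1 = ENV not read yet, 0 = off, 1 = on. */
static _Thread_local int g_front_metrics_on = -1;

/* Returns 1 only for the exact string "1", so unset or "0" means off. */
static int front_flag_is_on(const char *value) {
    return value != NULL && strcmp(value, "1") == 0;
}

static inline int front_metrics_enabled(void) {
    if (g_front_metrics_on < 0)
        g_front_metrics_on = front_flag_is_on(getenv("HAKMEM_TINY_FRONT_METRICS"));
    return g_front_metrics_on;
}

/* Called by a layer right after it successfully serves an allocation. */
static inline void front_metrics_hit(front_layer_t layer, int cls) {
    assert(layer < FRONT_NUM_LAYERS && cls >= 0 && cls < TINY_NUM_CLASSES);
    if (front_metrics_enabled())
        g_front_hits[layer][cls]++;
}
```

Caching the ENV lookup keeps `getenv()` off the hot path after the first call in each thread.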

**Build fix**: Added missing `hakmem_smallmid_superslab.o` to Makefile

---

## Phase 19-2: Benchmark Results and Analysis ✅

**Benchmark**: `bench_random_mixed_hakmem 500000 4096 42`
**Workload**: Random allocations 16-1040 bytes, 500K iterations

### Layer Hit Rates (Classes C2/C3)

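```
Class     UH_hit   HV2_hit    C5_hit    FC_hit   SFC_hit   SLL_hit     Total
----------------------------------------------------------------------------
C2           455     3,450         0         0         0         0     3,905
C3            13     7,585         0         0         0         0     7,598

Percentages:
C2:  UltraHot=11.7%, HeapV2=88.3%
C3:  UltraHot=0.2%,  HeapV2=99.8%
```

Column key: UH = UltraHot, HV2 = HeapV2, C5 = Class5 dedicated path, FC = FastCache, SFC = SFC, SLL = TLS SLL. `Total` is the sum of the hit columns, so each percentage is a layer's share of that class's recorded frontend hits.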

### Key Findings

1. **HeapV2 Dominates (>80% hit rate)**
   - C2: 88.3% hit rate (3,450 / 3,905 allocations)
   - C3: 99.8% hit rate (7,585 / 7,598 allocations)
   - **Recommendation**: ✅ Keep and optimize (hot path)

2. **UltraHot Marginal (<12% hit rate)**
   - C2: 11.7% hit rate (455 / 3,905 allocations)
   - C3: 0.2% hit rate (13 / 7,598 allocations)
   - **Recommendation**: ⚠️ Consider pruning (low value, adds branch overhead)

3. **FastCache DISABLED**
   - Gated by `g_fastcache_enable=0` (default)
   - 0% hit rate across all classes
   - **Status**: Not in use (OFF by default)

4. **SFC DISABLED**
   - Gated by `g_sfc_enabled=0` (default)
   - 0% hit rate across all classes
   - **Status**: Not in use (OFF by default)

5. **Class5 Dedicated Path Inactive**
   - `g_front_class5_hit[]=0` for all classes
   - **Status**: Not in use. This run cannot tell whether the path is OFF by default or whether this workload simply does not exercise C5.

6. **TLS SLL Not Reached**
   - 0% hit rate for C2/C3: UltraHot and HeapV2 account for 100% of the recorded frontend hits
   - **Status**: Enabled but bypassed (earlier layers are effective)

### Layer Execution Order

```
FastCache (C0-C3) [DISABLED]
    ↓
SFC (all classes) [DISABLED]
    ↓
UltraHot (C2-C5) [ENABLED] → 0.2-11.7% hit rate
    ↓
HeapV2 (C0-C3) [ENABLED] → 88-99% hit rate ✅
    ↓
Class5 (C5 only) [DISABLED or N/A]
    ↓
TLS SLL (all classes) [ENABLED but not reached]
    ↓
SuperSlab (fallback)
```
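The order above is a first-hit-wins cascade. Each call walks the layers from top to bottom, skips layers that are disabled or do not serve the class, and stops at the first hit. The C sketch below models that behavior with stub layers. The `front_layer` struct, the stubs, and the reading of "all classes" as C0-C7 are illustrative assumptions. Because HeapV2 always hits for C2 in this model, TLS SLL is never consulted, which matches the 0 in the SLL_hit column.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one frontend layer. */
typedef struct {
    const char *name;
    bool enabled;                 /* runtime gate, e.g. g_fastcache_enable */
    int min_cls, max_cls;         /* inclusive range of classes served */
    void *(*try_pop)(int cls);    /* returns NULL on a miss */
} front_layer;

/* Stub blocks and hooks standing in for real magazines. */
static char g_hit_block, g_slow_block;
static void *demo_hit(int cls)       { (void)cls; return &g_hit_block; }
static void *demo_miss(int cls)      { (void)cls; return NULL; }
static void *demo_slow_path(int cls) { (void)cls; return &g_slow_block; }

/* Layer order from the diagram. UltraHot is stubbed to always miss and
   HeapV2 to always hit, roughly approximating the measured C2/C3 behavior. */
static const front_layer g_demo_layers[] = {
    { "FastCache", false, 0, 3, demo_hit  },
    { "SFC",       false, 0, 7, demo_hit  },
    { "UltraHot",  true,  2, 5, demo_miss },
    { "HeapV2",    true,  0, 3, demo_hit  },
    { "Class5",    false, 5, 5, demo_hit  },
    { "TLS SLL",   true,  0, 7, demo_miss },
};
#define DEMO_NUM_LAYERS (sizeof g_demo_layers / sizeof g_demo_layers[0])

/* The first enabled layer that serves `cls` and hits wins; later layers are
   never consulted. *served_by receives the layer index, or -1 for SuperSlab. */
static void *front_alloc(const front_layer *layers, size_t n, int cls,
                         void *(*slow_path)(int), int *served_by) {
    assert(layers != NULL && slow_path != NULL && served_by != NULL);
    for (size_t i = 0; i < n; i++) {
        const front_layer *l = &layers[i];
        if (!l->enabled || cls < l->min_cls || cls > l->max_cls)
            continue;
        void *p = l->try_pop(cls);
        if (p != NULL) {
            *served_by = (int)i;
            return p;
        }
    }
    *served_by = -1;              /* every layer missed: SuperSlab fallback */
    return slow_path(cls);
}
```

In this model, a C5 request misses UltraHot, falls outside HeapV2's C0-C3 range, skips the disabled Class5 path, and misses TLS SLL, so it ends up on the SuperSlab fallback.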

---

## Analysis Recommendations (from Box FrontMetrics)

1. **Layers with >80% hit rate**: ✅ Keep and optimize (hot path)
   - **HeapV2**: 88-99% hit rate → Primary workhorse for C2/C3

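2. **Layers with <5% hit rate**: ⚠️ Consider pruning (dead weight)
   - **FastCache**: 0% (disabled)
   - **SFC**: 0% (disabled)
   - **Class5**: 0% (disabled or N/A)
   - **TLS SLL**: 0% (not reached)
   - These zeros come from configuration (FastCache, SFC, possibly Class5) or from layer ordering (TLS SLL). They do not measure how useful these layers are.

3. **Layers with 5-20% hit rate**: ⚠️ Potential redundancy, test pruning
   - **UltraHot**: 11.7% on C2 (its 0.2% on C3 falls in the <5% band) → Adds branch overhead for minimal benefit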
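The banding rules above can be written as a small classifier. This is a hypothetical helper and is not part of Box FrontMetrics. The listed rules leave the 20-80% range unassigned, so the sketch assumes those shares are reported as `REC_REVIEW`.

```c
#include <assert.h>

typedef enum { REC_KEEP, REC_TEST_PRUNE, REC_PRUNE, REC_REVIEW } front_rec_t;

/* Map a layer's hit share (percent) onto the recommendation bands:
   >80% keep, <5% prune candidate, 5-20% test pruning.
   Shares in (20%, 80%] are not covered by the rules (assumed: review). */
static front_rec_t classify_layer(double hit_pct) {
    assert(hit_pct >= 0.0 && hit_pct <= 100.0);
    if (hit_pct > 80.0)  return REC_KEEP;
    if (hit_pct < 5.0)   return REC_PRUNE;
    if (hit_pct <= 20.0) return REC_TEST_PRUNE;
    return REC_REVIEW;
}
```

With the measured shares as input, HeapV2 (88.3%, 99.8%) lands in `REC_KEEP`. UltraHot lands in `REC_TEST_PRUNE` for C2 (11.7%) but in `REC_PRUNE` for C3 (0.2%).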

---

## Phase 19-3: Next Steps (Box FrontPrune)

**Goal**: Add ENV switches to selectively enable or disable layers for A/B testing

**Proposed ENV Controls**:
```bash
HAKMEM_TINY_FRONT_DISABLE_ULTRAHOT=1   # Disable UltraHot magazine
HAKMEM_TINY_FRONT_DISABLE_HEAPV2=1     # Disable HeapV2 magazine
HAKMEM_TINY_FRONT_DISABLE_CLASS5=1     # Disable Class5 dedicated path
HAKMEM_TINY_FRONT_ENABLE_FC=1          # Enable FastCache (currently OFF)
HAKMEM_TINY_FRONT_ENABLE_SFC=1         # Enable SFC (currently OFF)
```
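The sketch below shows one way these switches could be read at startup. The struct and field names are hypothetical. Going by the variable names, the sketch assumes that a DISABLE switch vetoes a layer (otherwise the layer's existing gate still applies), and that an ENABLE switch requests FastCache or SFC on top of their `g_fastcache_enable`/`g_sfc_enabled` defaults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical FrontPrune configuration. allow_* = false vetoes the layer;
   force_* = true requests a layer that is OFF by default. */
typedef struct {
    bool allow_ultrahot, allow_heapv2, allow_class5;
    bool force_fc, force_sfc;
} front_prune_cfg;

/* A switch counts as set only when its value is exactly "1". */
static bool env_flag(const char *name) {
    assert(name != NULL);
    const char *v = getenv(name);
    return v != NULL && strcmp(v, "1") == 0;
}

/* With no variables set this yields the current baseline:
   UltraHot/HeapV2/Class5 not vetoed, FastCache/SFC not forced on. */
static front_prune_cfg front_prune_from_env(void) {
    front_prune_cfg c;
    c.allow_ultrahot = !env_flag("HAKMEM_TINY_FRONT_DISABLE_ULTRAHOT");
    c.allow_heapv2   = !env_flag("HAKMEM_TINY_FRONT_DISABLE_HEAPV2");
    c.allow_class5   = !env_flag("HAKMEM_TINY_FRONT_DISABLE_CLASS5");
    c.force_fc       =  env_flag("HAKMEM_TINY_FRONT_ENABLE_FC");
    c.force_sfc      =  env_flag("HAKMEM_TINY_FRONT_ENABLE_SFC");
    return c;
}
```

For example, Test 1 in the scenario list that follows would run with `HAKMEM_TINY_FRONT_DISABLE_ULTRAHOT=1`. That yields `allow_ultrahot == false` and leaves every other field at baseline.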

**A/B Test Scenarios**:
1. **Baseline**: Current state (UltraHot + HeapV2)
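2. **Test 1**: HeapV2 only (disable UltraHot) → Expected: minimal perf loss, since UltraHot's 0.2-11.7% share of C2/C3 hits should fall through to HeapV2
3. **Test 2**: UltraHot only (disable HeapV2) → Expected: major perf loss, since HeapV2's 88.3-99.8% share of C2/C3 hits would fall through to TLS SLL or SuperSlab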
4. **Test 3**: Enable FC + SFC, disable UltraHot/HeapV2 → Test classic TLS cache layers
5. **Test 4**: HeapV2 + FC + SFC (disable UltraHot) → Test hybrid approach

**Expected Outcome**: Identify minimal effective layer set (maximize hit rate, minimize overhead)

---

## Performance Impact

**Benchmark Throughput**: 10.8M ops/s (500K iterations)

**Layer Overhead Estimate**:
- Each layer check: ~2-4 instructions (branch + state access)
- Current active layers: UltraHot (2-4 inst) + HeapV2 (2-4 inst) = 4-8 inst overhead
- If UltraHot is removed: 2-4 fewer instructions per allocation, estimated at +5-10% throughput (unverified until the Phase 19-3 A/B tests)
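Here is a back-of-envelope check of this estimate. It assumes a single-threaded run, a 3 GHz core, and roughly one cycle per removed instruction when the branch predicts well. These are assumptions, not measurements.

```math
\begin{aligned}
t_{\mathrm{op}} &= \frac{1}{10.8 \times 10^{6}\ \mathrm{ops/s}} \approx 92.6\ \mathrm{ns} \approx 278\ \mathrm{cycles\ at\ 3\ GHz} \\
\frac{2 \text{ to } 4\ \mathrm{cycles}}{278\ \mathrm{cycles}} &\approx 0.7\% \text{ to } 1.4\% \\
(5\% \text{ to } 10\%) \times 278\ \mathrm{cycles} &\approx 14 \text{ to } 28\ \mathrm{cycles}
\end{aligned}
```

Under these assumptions, the raw instruction count explains only about 1%. A +5-10% gain would require UltraHot to cost 14-28 cycles per operation, for example through branch mispredictions or cache misses on its state. The A/B tests would show which figure holds.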

**Risk Assessment**:
- Removing HeapV2: HIGH RISK (the 88.3-99.8% of C2/C3 hits it serves would fall through to slower layers)
- Removing UltraHot: LOW RISK (the 0.2-11.7% of C2/C3 hits it serves would be left for HeapV2; likely <5% perf regression)

---

## Files Modified (Phase 19-1)

1. `core/box/front_metrics_box.h` - NEW (metrics API + inline helpers)
2. `core/box/front_metrics_box.c` - NEW (CSV reporting)
3. `core/tiny_alloc_fast.inc.h` - Added metrics collection calls
4. `Makefile` - Added `front_metrics_box.o` + `hakmem_smallmid_superslab.o`

**Build Command**:
```bash
make clean && make HAKMEM_DEBUG_COUNTERS=1 bench_random_mixed_hakmem
```

**Test Command**:
```bash
HAKMEM_TINY_FRONT_METRICS=1 HAKMEM_TINY_FRONT_DUMP=1 \
./bench_random_mixed_hakmem 500000 4096 42
```

---

## Conclusion

**Phase 19-2 successfully identified**:
- HeapV2 as the dominant effective layer (>80% hit rate)
- UltraHot as a low-value layer (<12% hit rate)
- FC/SFC as currently unused (disabled by default)

**Next Phase**: Implement Box FrontPrune ENV switches for A/B testing layer removal.