# Phase 61: C7 ULTRA Header-Light Implementation

**Date**: 2025-12-17

**Objective**: Skip the header write on the C7 ULTRA alloc hit path to reduce instruction count and I-cache pressure.

---

## Background

- By default, `tiny_c7_ultra_alloc()` calls `tiny_region_id_write_header()` on every alloc hit
- Phase 42 profiling measured the header write as a 4.56% hotspot (2.32% in Phase 61 profiling)
- `HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=1` enables header-light mode:
  - The header is written once during refill (carve phase)
  - An alloc hit returns `base+1` directly (no header write)
  - This removes ~5-7 instructions from each alloc hit

---

## Runtime Profiling (Phase 61 Step 0)

**Command**:

```bash
make bench_random_mixed_hakmem_minimal
perf record -F 99 -g -- ./bench_random_mixed_hakmem_minimal 200000000 400 1
perf report --no-children | head -60
```

**Results**:

- `free`: 30.92% (rank 1)
- `malloc`: 24.77% (rank 2)
- `tiny_region_id_write_header`: 2.32% (rank 6, within the `free` backtrace)
- `tiny_c7_ultra_alloc`: 1.90% (rank 7)

**Observation**:

- The header write is a visible hotspot (2.32%)
- C7 ULTRA alloc is in the top 10 (1.90%)
- Combined: ~4.22% of total cycles (2.32% + 1.90%)

---

## Implementation Status

**The implementation already exists** (discovered during Step 1 analysis).

### File: `/mnt/workdisk/public_share/hakmem/core/tiny_c7_ultra.c`

**Location**: Lines 36-72 (`tiny_c7_ultra_alloc()`)

**Pattern**:

```c
void* tiny_c7_ultra_alloc(size_t size) {
    (void)size;  // C7 dedicated, size unused
    tiny_c7_ultra_tls_t* tls = &g_tiny_c7_ultra_tls;
    const bool header_light = tiny_front_v3_c7_ultra_header_light_enabled();

    // Hot path: TLS cache hit (single branch)
    uint16_t n = tls->count;
    if (__builtin_expect(n > 0, 1)) {
        void* base = tls->freelist[n - 1];
        tls->count = n - 1;

        // Convert BASE -> USER pointer
        if (header_light) {
            return (uint8_t*)base + 1;  // Header already written
        }
        return tiny_region_id_write_header(base, 7);
    }

    // Cold path: Refill TLS cache from segment
    // ...
}
```

**Refill phase** (Lines 127-133):

```c
// Carve blocks into TLS cache (fill from end to preserve order)
uint16_t n = 0;
for (uint32_t i = 0; i < capacity && n < TINY_C7_ULTRA_CAP; i++) {
    uint8_t* blk = base + ((size_t)i * block_sz);
    if (header_light) {
        tiny_region_id_write_header(blk, 7);  // Write header once
    }
    tls->freelist[n++] = blk;
}
```

**ENV Control**:

- File: `/mnt/workdisk/public_share/hakmem/core/box/tiny_front_v3_env_box.h`
- Function: `tiny_c7_ultra_header_light_enabled_env()` (lines 145-152)
- ENV variable: `HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT`
- Default: OFF (research box, line 149)
- Snapshot: cached in `TinyFrontV3Snapshot.c7_ultra_header_light` (line 17)

**Safety**:

- Invariant: every C7 block obtained from the pool or from refill already carries a valid header
- Alloc hit: returns `base+1` directly (assumes the header is present)
- Refill: writes each header once during the carve phase (only when header-light is enabled)

---

## Rollback Procedure

If Phase 61 shows NO-GO (-1.0% or worse):

1. **Runtime rollback** (immediate, no rebuild):

   ```bash
   export HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=0
   ```

2. **Code rollback** (if needed):
   - Nothing to revert: no code was changed (the implementation pre-existed)
   - The ENV gate defaults to OFF (safe)

3. **Verification**:
   - Confirm `HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=0` in the cleanenv script
   - Re-run the baseline and confirm performance matches the pre-test numbers

---

## Next Steps

- Phase 61 Step 2: A/B test (HEADER_LIGHT=0 vs 1)
- Phase 61 Step 3: Document results
- Target: +1.0% or better for a GO decision
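For the Step 2 A/B comparison, the GO/NO-GO rule reduces to a percentage delta between the two arms. The sketch below is a minimal illustration, assuming each benchmark run yields a single throughput number where higher is better. The helper names `ab_mean`, `ab_delta_pct` and `ab_verdict` are hypothetical (not part of hakmem), and the `AB_NEUTRAL` label for results between -1.0% and +1.0% is an assumption, since this document only defines GO (+1.0% or better) and NO-GO (-1.0% or worse).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical A/B helpers for Phase 61 Step 2 (not part of hakmem).
 * Baseline arm:  HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=0
 * Treatment arm: HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=1
 * Inputs are per-run throughput figures; higher is assumed to be better. */

typedef enum {
    AB_NO_GO,   /* -1.0% or worse: apply the rollback procedure */
    AB_NEUTRAL, /* assumed label for the band between the two thresholds */
    AB_GO       /* +1.0% or better: target met */
} ab_verdict_t;

/* Arithmetic mean of n throughput samples from one arm. */
static double ab_mean(const double* samples, size_t n) {
    assert(samples != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
    }
    return sum / (double)n;
}

/* Relative change of the treatment mean vs the baseline mean, in percent. */
static double ab_delta_pct(double baseline_mean, double treatment_mean) {
    assert(baseline_mean > 0.0);
    return (treatment_mean - baseline_mean) / baseline_mean * 100.0;
}

/* Map a percentage delta onto the GO / NO-GO thresholds of this phase. */
static ab_verdict_t ab_verdict(double delta_pct) {
    if (delta_pct >= 1.0) {
        return AB_GO;
    }
    if (delta_pct <= -1.0) {
        return AB_NO_GO;
    }
    return AB_NEUTRAL;
}
```

A typical call is `ab_verdict(ab_delta_pct(ab_mean(base, n), ab_mean(light, m)))`. Averaging several runs per arm before taking the delta keeps one noisy run from flipping the verdict; results that land close to either threshold are worth re-running, because a ±1.0% band is narrow relative to typical run-to-run variance.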