hakmem/docs/analysis/PHASE61_C7_ULTRA_HEADER_LIGHT_IMPLEMENTATION.md
Moe Charm (CI) ef8e2ab9b5 Phase 59b & 61: Speed-first Rebase + C7 ULTRA Header-Light Optimization
Phase 59b: Speed-first Mode Baseline Rebase
- Rebase on MIXED_TINYV3_C7_SAFE profile (Speed-first, no prewarm suppression)
- hakmem: 58.478 M ops/s (CV 2.52%)
- mimalloc: 120.979 M ops/s (CV 0.90%)
- Ratio: 48.34% of mimalloc (down from 49.13% Balanced mode in Phase 59)
- Reason for difference: Profile selection (Speed-first vs Balanced) and mimalloc environment variance
- Status: COMPLETE (measurement-only, zero code changes)

Phase 61: C7 ULTRA Header-Light Optimization Attempt
- Objective: Skip header write on C7 ULTRA alloc hit (write only on refill)
- Implementation: ENV gate HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT (default OFF)
- Result: +0.31% (NEUTRAL, below +1.0% GO threshold)
  - Baseline: 59.543 M ops/s (CV 1.53%)
  - Treatment: 59.729 M ops/s (CV 2.66%)
- Root cause analysis:
  - tiny_region_id_write_header accounts for only 2.32% of samples (lower than the Phase 42 estimate of 4.56%)
  - Header-light mode adds a branch to the hot path, offsetting the write savings
  - The mixed workload dilutes the effect of a C7-specific optimization
  - Variance increased (CV 1.53% → 2.66%), attributed to branch prediction variability
- Decision: Kept as research box with ENV gate (default OFF)
- Lesson: Workload-specific optimizations need careful verification with full workloads

Updated Documentation:
- PHASE59B_SPEED_FIRST_REBASE_RESULTS.md: Full measurement results and analysis
- PHASE61_C7_ULTRA_HEADER_LIGHT_RESULTS.md: A/B test results and root cause analysis
- PHASE61_C7_ULTRA_HEADER_LIGHT_IMPLEMENTATION.md: Implementation details and design
- CURRENT_TASK.md: Updated status and next phase planning (Phase 62)
- PERFORMANCE_TARGETS_SCORECARD.md: Updated baseline and M1 milestone status

M1 (50%) Milestone Status:
- Current: 48.34% (Speed-first profile)
- Gap: -1.66 percentage points to the 50% target (within measurement noise)
- Profile recommendation: Speed-first as canonical default for throughput focus

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-17 16:25:26 +09:00


Phase 61: C7 ULTRA Header-Light Implementation

Date: 2025-12-17
Objective: Skip the header write in the C7 ULTRA alloc hit path to reduce instruction count and I-cache pressure.


Background

  • tiny_c7_ultra_alloc() calls tiny_region_id_write_header() on alloc hit
  • Phase 42 profiling measured the header write as a 4.56% hotspot (2.32% in Phase 61 profiling)
  • HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=1 enables header-light mode:
    • Header written once during refill (carve phase)
    • Alloc hit returns base+1 directly (no header write)
    • Reduces instruction count by ~5-7 instructions per alloc hit

Runtime Profiling (Phase 61 Step 0)

Command:

make bench_random_mixed_hakmem_minimal
perf record -F 99 -g -- ./bench_random_mixed_hakmem_minimal 200000000 400 1
perf report --no-children | head -60

Results:

  • free: 30.92% (top 1)
  • malloc: 24.77% (top 2)
  • tiny_region_id_write_header: 2.32% (top 6, within free backtrace)
  • tiny_c7_ultra_alloc: 1.90% (top 7)

Observation:

  • Header write is a visible hotspot (2.32%)
  • C7 ULTRA alloc is in the top 10 (1.90%)
  • Combined: ~4.22% of total cycles

Implementation Status

Implementation already exists (discovered during Step 1 analysis):

File: /mnt/workdisk/public_share/hakmem/core/tiny_c7_ultra.c

Location: Lines 36-72 (tiny_c7_ultra_alloc())

Pattern:

void* tiny_c7_ultra_alloc(size_t size) {
    (void)size;  // C7 dedicated, size unused
    tiny_c7_ultra_tls_t* tls = &g_tiny_c7_ultra_tls;
    const bool header_light = tiny_front_v3_c7_ultra_header_light_enabled();

    // Hot path: TLS cache hit (single branch)
    uint16_t n = tls->count;
    if (__builtin_expect(n > 0, 1)) {
        void* base = tls->freelist[n - 1];
        tls->count = n - 1;

        // Convert BASE -> USER pointer
        if (header_light) {
            return (uint8_t*)base + 1;  // Header already written
        }
        return tiny_region_id_write_header(base, 7);
    }

    // Cold path: Refill TLS cache from segment
    // ...
}

Refill phase (Lines 127-133):

// Carve blocks into TLS cache (fill from end to preserve order)
uint16_t n = 0;
for (uint32_t i = 0; i < capacity && n < TINY_C7_ULTRA_CAP; i++) {
    uint8_t* blk = base + ((size_t)i * block_sz);
    if (header_light) {
        tiny_region_id_write_header(blk, 7);  // Write header once
    }
    tls->freelist[n++] = blk;
}

ENV Control:

  • File: /mnt/workdisk/public_share/hakmem/core/box/tiny_front_v3_env_box.h
  • Function: tiny_c7_ultra_header_light_enabled_env() (lines 145-152)
  • ENV Variable: HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT
  • Default: OFF (research box, line 149)
  • Snapshot: Cached in TinyFrontV3Snapshot.c7_ultra_header_light (line 17)

Safety:

  • Invariant: C7 blocks from pool/refill always have valid headers
  • Alloc hit: Returns base+1 directly (assumes header present)
  • Refill: Writes headers once during carve phase (if header_light enabled)

Rollback Procedure

If Phase 61 shows NO-GO (-1.0% or worse):

  1. Runtime Rollback (immediate, no rebuild):

    export HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=0
    
  2. Code Rollback (if needed):

    • No code changes to revert (implementation pre-existed)
    • ENV gate defaults to OFF (safe)
  3. Verification:

    • Confirm HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=0 (or unset) in the cleanenv script
    • Re-run the baseline and confirm performance is unchanged within noise

Next Steps

  • Phase 61 Step 2: A/B test (HEADER_LIGHT=0 vs 1)
  • Phase 61 Step 3: Results documentation
  • Target: +1.0% or better for GO decision