Phase 59b: Speed-first Mode Baseline Rebase
- Rebase on MIXED_TINYV3_C7_SAFE profile (Speed-first, no prewarm suppression)
- hakmem: 58.478 M ops/s (CV 2.52%)
- mimalloc: 120.979 M ops/s (CV 0.90%)
- Ratio: 48.34% of mimalloc (down from 49.13% Balanced mode in Phase 59)
- Reason for difference: Profile selection (Speed-first vs Balanced) and mimalloc environment variance
- Status: COMPLETE (measurement-only, zero code changes)

Phase 61: C7 ULTRA Header-Light Optimization Attempt
- Objective: Skip header write on C7 ULTRA alloc hit (write only on refill)
- Implementation: ENV gate HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT (default OFF)
- Result: +0.31% (NEUTRAL, below +1.0% GO threshold)
  - Baseline: 59.543 M ops/s (CV 1.53%)
  - Treatment: 59.729 M ops/s (CV 2.66%)
- Root cause analysis:
  - tiny_region_id_write_header only 2.32% of time (lower than Phase 42 estimate 4.56%)
  - Header-light mode adds branch to hot path, negating write savings
  - Mixed workload dilutes C7-specific optimization effectiveness
  - Variance increased due to branch prediction variability
- Decision: Kept as research box with ENV gate (default OFF)
- Lesson: Workload-specific optimizations need careful verification with full workloads

Updated Documentation:
- PHASE59B_SPEED_FIRST_REBASE_RESULTS.md: Full measurement results and analysis
- PHASE61_C7_ULTRA_HEADER_LIGHT_RESULTS.md: A/B test results and root cause analysis
- PHASE61_C7_ULTRA_HEADER_LIGHT_IMPLEMENTATION.md: Implementation details and design
- CURRENT_TASK.md: Updated status and next phase planning (Phase 62)
- PERFORMANCE_TARGETS_SCORECARD.md: Updated baseline and M1 milestone status

M1 (50%) Milestone Status:
- Current: 48.34% (Speed-first profile)
- Gap: -1.66% (within measurement noise)
- Profile recommendation: Speed-first as canonical default for throughput focus

🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Phase 61: C7 ULTRA Header-Light Implementation
Date: 2025-12-17
Objective: Skip header write in C7 ULTRA alloc hit path to reduce instruction count and I-cache pressure.
Background
- tiny_c7_ultra_alloc() calls tiny_region_id_write_header() on alloc hit
- Phase 42 profiling: header write is 4.56% hotspot (2.32% in Phase 61 profiling)
- HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=1 enables header-light mode:
  - Header written once during refill (carve phase)
  - Alloc hit returns base+1 directly (no header write)
  - Reduces instruction count by ~5-7 instructions per alloc
Runtime Profiling (Phase 61 Step 0)
Command:
make bench_random_mixed_hakmem_minimal
perf record -F 99 -g -- ./bench_random_mixed_hakmem_minimal 200000000 400 1
perf report --no-children | head -60
Results:
- free: 30.92% (top 1)
- malloc: 24.77% (top 2)
- tiny_region_id_write_header: 2.32% (top 6, within free backtrace)
- tiny_c7_ultra_alloc: 1.90% (top 7)
Observation:
- Header write is a visible hotspot (2.32%)
- C7 ULTRA alloc is in top 10 (1.90%)
- Combined overhead: ~4.22% of total cycles
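The profile shares above put a ceiling on what this optimization can deliver. A minimal sketch of that bound, assuming Amdahl's law with the targeted code removed entirely; the function name `max_gain_pct` is hypothetical and not part of hakmem:

```c
#include <assert.h>

/* Upper bound on throughput gain (in percent) if a region that accounts
 * for `share_pct` percent of cycles were removed entirely.
 * Amdahl's law with an infinite local speedup: 1 / (1 - f) - 1.
 * Hypothetical helper for illustration only. */
double max_gain_pct(double share_pct) {
    assert(share_pct >= 0.0 && share_pct < 100.0);
    double f = share_pct / 100.0;
    return (1.0 / (1.0 - f) - 1.0) * 100.0;
}
```

With the 2.32% share measured for tiny_region_id_write_header, the bound is roughly 2.4%. The achievable gain should be smaller, since header-light mode only skips the write on C7 ULTRA alloc hits and still pays for the write during refill.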
Implementation Status
Implementation already exists (discovered during Step 1 analysis):
File: /mnt/workdisk/public_share/hakmem/core/tiny_c7_ultra.c
Location: Line 36-72 (tiny_c7_ultra_alloc())
Pattern:
void* tiny_c7_ultra_alloc(size_t size) {
    (void)size; // C7 dedicated, size unused
    tiny_c7_ultra_tls_t* tls = &g_tiny_c7_ultra_tls;
    const bool header_light = tiny_front_v3_c7_ultra_header_light_enabled();
    // Hot path: TLS cache hit (single branch)
    uint16_t n = tls->count;
    if (__builtin_expect(n > 0, 1)) {
        void* base = tls->freelist[n - 1];
        tls->count = n - 1;
        // Convert BASE -> USER pointer
        if (header_light) {
            return (uint8_t*)base + 1; // Header already written
        }
        return tiny_region_id_write_header(base, 7);
    }
    // Cold path: Refill TLS cache from segment
    // ...
}
Refill phase (Line 127-133):
// Carve blocks into TLS cache (fill from end to preserve order)
uint16_t n = 0;
for (uint32_t i = 0; i < capacity && n < TINY_C7_ULTRA_CAP; i++) {
    uint8_t* blk = base + ((size_t)i * block_sz);
    if (header_light) {
        tiny_region_id_write_header(blk, 7); // Write header once
    }
    tls->freelist[n++] = blk;
}
ENV Control:
- File: /mnt/workdisk/public_share/hakmem/core/box/tiny_front_v3_env_box.h
- Function: tiny_c7_ultra_header_light_enabled_env() (line 145-152)
- ENV Variable: HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT
- Default: OFF (research box, line 149)
- Snapshot: Cached in TinyFrontV3Snapshot.c7_ultra_header_light (line 17)
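A default-OFF ENV gate with a cached result can be sketched as follows. This is not the code in tiny_front_v3_env_box.h: the helper names `env_gate_parse` and `header_light_enabled_cached` and the parsing rule (any value not starting with `0` means ON) are assumptions made for illustration. Only the variable name comes from this document.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical parser: NULL, empty, or a value starting with '0' is OFF;
 * anything else is ON. The real parsing rule may differ. */
int env_gate_parse(const char *value) {
    return (value != NULL && value[0] != '\0' && value[0] != '0') ? 1 : 0;
}

/* Hypothetical cached gate: calls getenv() on first use only, so later
 * calls cost one load and one compare. -1 means "not read yet".
 * Not thread-safe as written; a real allocator would take the snapshot
 * during single-threaded initialization. */
int header_light_enabled_cached(void) {
    static int cached = -1;
    if (cached < 0) {
        cached = env_gate_parse(getenv("HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT"));
    }
    assert(cached == 0 || cached == 1);
    return cached;
}
```

Caching matters here because the gate is consulted on the alloc hit path, where a `getenv()` call per allocation would cost far more than the header write it is meant to avoid.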
Safety:
- Invariant: C7 blocks from pool/refill always have valid headers
- Alloc hit: Returns base+1 directly (assumes header present)
- Refill: Writes headers once during carve phase (if header_light enabled)
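The invariant can be illustrated with a self-contained toy model. It assumes a one-byte header at offset 0 of each block; the `TOY_*` constants, `toy_tls_t`, `toy_refill`, and `toy_alloc` are all hypothetical, and the real header encoding and TLS layout are not shown in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CAP      8      /* hypothetical TLS cache capacity */
#define TOY_BLOCK_SZ 16     /* hypothetical block size in bytes */
#define TOY_HEADER   0xA7u  /* hypothetical one-byte header value */

typedef struct {
    uint8_t *freelist[TOY_CAP];
    uint16_t count;
} toy_tls_t;

/* Refill: carve blocks out of `segment` and write each header exactly once. */
uint16_t toy_refill(toy_tls_t *tls, uint8_t *segment, size_t segment_sz) {
    size_t capacity = segment_sz / TOY_BLOCK_SZ;
    uint16_t n = 0;
    for (size_t i = 0; i < capacity && n < TOY_CAP; i++) {
        uint8_t *blk = segment + i * TOY_BLOCK_SZ;
        blk[0] = TOY_HEADER;
        tls->freelist[n++] = blk;
    }
    tls->count = n;
    return n;
}

/* Alloc hit: pop a block and return base+1 without rewriting the header. */
void *toy_alloc(toy_tls_t *tls) {
    if (tls->count == 0) {
        return NULL; /* a real allocator would refill here */
    }
    uint8_t *base = tls->freelist[--tls->count];
    assert(base[0] == TOY_HEADER); /* the invariant the fast path relies on */
    return base + 1;
}
```

The assertion in `toy_alloc` marks the point where the fast path trusts the header. In this model it holds because user data starts at base+1 and the freelist lives in a separate array, so nothing overwrites byte 0 between refill and alloc.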
Rollback Procedure
If Phase 61 shows NO-GO (-1.0% or worse):
- Runtime Rollback (immediate, no rebuild):
  export HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=0
- Code Rollback (if needed):
  - No changes made (implementation pre-existed)
  - ENV gate defaults to OFF (safe)
- Verification:
  - Confirm ENV=0 in cleanenv script
  - Re-run baseline to confirm identical performance
Next Steps
- Phase 61 Step 2: A/B test (HEADER_LIGHT=0 vs 1)
- Phase 61 Step 3: Results documentation
- Target: +1.0% or better for GO decision
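The decision rule for the A/B test can be sketched as a small classifier. The thresholds (+1.0% for GO, -1.0% for NO-GO) come from this document; the type and function names are hypothetical, and treating everything in between as NEUTRAL is inferred from the Phase 61 summary rather than from any script in the repository.

```c
#include <assert.h>

/* Hypothetical verdict type for an A/B throughput comparison. */
typedef enum { AB_NO_GO = -1, AB_NEUTRAL = 0, AB_GO = 1 } ab_verdict_t;

/* Relative throughput change of treatment vs baseline, in percent. */
double ab_delta_pct(double baseline, double treatment) {
    assert(baseline > 0.0);
    return (treatment - baseline) / baseline * 100.0;
}

/* GO at +1.0% or better, NO-GO at -1.0% or worse, NEUTRAL otherwise. */
ab_verdict_t ab_classify(double delta_pct) {
    if (delta_pct >= 1.0) {
        return AB_GO;
    }
    if (delta_pct <= -1.0) {
        return AB_NO_GO;
    }
    return AB_NEUTRAL;
}
```

Applied to the figures in the Phase 61 summary (baseline 59.543 M ops/s, treatment 59.729 M ops/s), the delta is about +0.31%, which falls in the NEUTRAL band. A rule like this ignores run-to-run variance, so the reported CV values still need to be read alongside the verdict.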