# Phase 61: C7 ULTRA Header-Light Implementation

**Date**: 2025-12-17

**Objective**: Skip header write in C7 ULTRA alloc hit path to reduce instruction count and I-cache pressure.

---
## Background
- `tiny_c7_ultra_alloc()` calls `tiny_region_id_write_header()` on every alloc hit.
- Phase 42 profiling attributed 4.56% of time to the header write; Phase 61 profiling measures 2.32%.
- `HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=1` enables header-light mode:
  - The header is written once during refill (carve phase).
  - An alloc hit returns `base+1` directly, with no header write.
  - Expected saving: ~5-7 instructions per alloc.
---
## Runtime Profiling (Phase 61 Step 0)
**Command**:
```bash
make bench_random_mixed_hakmem_minimal
perf record -F 99 -g -- ./bench_random_mixed_hakmem_minimal 200000000 400 1
perf report --no-children | head -60
```
**Results**:
- `free`: 30.92% (top 1)
- `malloc`: 24.77% (top 2)
- `tiny_region_id_write_header`: 2.32% (top 6, within `free` backtrace)
- `tiny_c7_ultra_alloc`: 1.90% (top 7)
**Observation**:
- The header write is a visible hotspot (2.32%).
- C7 ULTRA alloc is in the top 10 (1.90%).
- Combined overhead: ~4.22% of total cycles (2.32% + 1.90%).
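
As a rough sanity check on how much headroom the 2.32% figure leaves, one can bound the throughput gain from removing a function entirely: if it accounts for a fraction f of runtime, throughput can rise by at most 1/(1-f) - 1. The sketch below is illustrative only; `max_gain_percent` is a hypothetical helper and not part of hakmem.

```c
#include <assert.h>

/* Upper bound on throughput gain (in percent) if a function that takes
 * `share_percent` of total runtime were eliminated entirely.
 * Hypothetical helper for illustration; not part of hakmem. */
static double max_gain_percent(double share_percent) {
    assert(share_percent >= 0.0 && share_percent < 100.0);
    double f = share_percent / 100.0;
    return (1.0 / (1.0 - f) - 1.0) * 100.0;
}
```

For a 2.32% share the bound comes out at roughly 2.4%. Header-light mode only targets the header writes on the C7 ULTRA alloc-hit path, so the achievable gain is presumably smaller than this bound.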
---
## Implementation Status
**Implementation already exists** (discovered during Step 1 analysis):
### File: `/mnt/workdisk/public_share/hakmem/core/tiny_c7_ultra.c`
**Location**: Lines 36-72 (`tiny_c7_ultra_alloc()`)

**Pattern**:
```c
void* tiny_c7_ultra_alloc(size_t size) {
    (void)size;  // C7 dedicated, size unused
    tiny_c7_ultra_tls_t* tls = &g_tiny_c7_ultra_tls;
    const bool header_light = tiny_front_v3_c7_ultra_header_light_enabled();

    // Hot path: TLS cache hit (single branch)
    uint16_t n = tls->count;
    if (__builtin_expect(n > 0, 1)) {
        void* base = tls->freelist[n - 1];
        tls->count = n - 1;

        // Convert BASE -> USER pointer
        if (header_light) {
            return (uint8_t*)base + 1;  // Header already written
        }
        return tiny_region_id_write_header(base, 7);
    }

    // Cold path: Refill TLS cache from segment
    // ...
}
```
**Refill phase** (Lines 127-133):
```c
// Carve blocks into TLS cache (fill from end to preserve order)
uint16_t n = 0;
for (uint32_t i = 0; i < capacity && n < TINY_C7_ULTRA_CAP; i++) {
    uint8_t* blk = base + ((size_t)i * block_sz);
    if (header_light) {
        tiny_region_id_write_header(blk, 7);  // Write header once
    }
    tls->freelist[n++] = blk;
}
```
**ENV Control**:
- File: `/mnt/workdisk/public_share/hakmem/core/box/tiny_front_v3_env_box.h`
- Function: `tiny_c7_ultra_header_light_enabled_env()` (lines 145-152)
- ENV Variable: `HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT`
- Default: OFF (research box, line 149)
- Snapshot: Cached in `TinyFrontV3Snapshot.c7_ultra_header_light` (line 17)
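
The gate described above can be pictured as a read-once, cached `getenv()` lookup. The following is a minimal sketch under assumptions: `example_parse_gate` and `example_header_light_enabled` are hypothetical names, and the parsing rule (only a value beginning with `1` enables the mode) is assumed rather than taken from `tiny_front_v3_env_box.h`.

```c
#include <stdlib.h>

/* Assumed parsing rule: unset or anything not starting with '1' means OFF. */
static int example_parse_gate(const char* value) {
    return (value != NULL && value[0] == '1') ? 1 : 0;
}

/* Hypothetical cached gate: the environment is read on the first call and
 * the result is reused afterwards, so later changes to the variable have
 * no effect. Not thread-safe as written; the real code may differ. */
static int example_header_light_enabled(void) {
    static int cached = -1;  /* -1 = not read yet */
    if (cached < 0) {
        cached = example_parse_gate(getenv("HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT"));
    }
    return cached;
}
```

Caching the value keeps `getenv()` off the hot path, which is presumably the reason the real code stores the flag in a snapshot structure.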
**Safety**:
- Invariant: with header-light enabled, C7 blocks handed out from pool/refill always carry a valid header.
- Alloc hit: returns `base+1` directly and relies on that invariant (no header write).
- Refill: writes each header once during the carve phase (only when header-light is enabled).
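
The invariant can be exercised in a small toy model. This is a sketch under stated assumptions: the `ex_` names, the capacity and block size, and the one-byte header value `0xA7` are all invented for illustration and do not reflect the actual hakmem definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: names, sizes, and header encoding are assumptions. */
enum { EX_CAP = 4, EX_BLOCK_SZ = 64, EX_C7_HEADER = 0xA7 };

typedef struct {
    uint8_t* freelist[EX_CAP];
    uint16_t count;
} ex_tls_t;

/* Stand-in for tiny_region_id_write_header(): store a 1-byte tag at BASE
 * and return the USER pointer (BASE + 1). */
static void* ex_write_header(uint8_t* base) {
    base[0] = EX_C7_HEADER;
    return base + 1;
}

/* Refill: carve blocks out of `slab`; in header-light mode the header is
 * written here, once per block. */
static void ex_refill(ex_tls_t* tls, uint8_t* slab, bool header_light) {
    tls->count = 0;
    for (size_t i = 0; i < EX_CAP; i++) {
        uint8_t* blk = slab + i * EX_BLOCK_SZ;
        if (header_light) {
            (void)ex_write_header(blk);
        }
        tls->freelist[tls->count++] = blk;
    }
}

/* Alloc hit: header-light mode trusts the header written at refill;
 * otherwise the header is written on every hit. */
static void* ex_alloc_hit(ex_tls_t* tls, bool header_light) {
    assert(tls->count > 0);
    uint8_t* base = tls->freelist[--tls->count];
    return header_light ? (void*)(base + 1) : ex_write_header(base);
}
```

In both modes the byte just before the returned pointer holds the header; the modes differ only in when it is written. The model does not cover blocks returning through `free`, which a full check of the invariant would also need to consider.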
---
## Rollback Procedure
If Phase 61 shows NO-GO (-1.0% or worse):
1. **Runtime Rollback** (immediate, no rebuild):
   ```bash
   export HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=0
   ```
2. **Code Rollback** (if needed):
   - No changes made (implementation pre-existed)
   - ENV gate defaults to OFF (safe)
3. **Verification**:
   - Confirm ENV=0 in cleanenv script
   - Re-run baseline to confirm identical performance
---
## Next Steps
- Phase 61 Step 2: A/B test (HEADER_LIGHT=0 vs 1)
- Phase 61 Step 3: Results documentation
- Target: +1.0% or better for GO decision
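
The decision rule used in this document (+1.0% or better for GO, -1.0% or worse for NO-GO) can be written down as a small helper for the A/B step. This is a sketch, not the project's tooling: the `ex_` names are hypothetical, and treating everything between the two thresholds as neutral is an assumption.

```c
#include <assert.h>

typedef enum { EX_NO_GO = -1, EX_NEUTRAL = 0, EX_GO = 1 } ex_verdict_t;

/* Relative throughput change of treatment vs. baseline, in percent. */
static double ex_delta_percent(double baseline, double treatment) {
    assert(baseline > 0.0);
    return (treatment - baseline) / baseline * 100.0;
}

/* GO at +1.0% or better, NO-GO at -1.0% or worse, neutral in between
 * (the neutral band is an assumed interpretation). */
static ex_verdict_t ex_classify(double delta_percent) {
    if (delta_percent >= 1.0)  return EX_GO;
    if (delta_percent <= -1.0) return EX_NO_GO;
    return EX_NEUTRAL;
}
```

Both inputs should be mean throughputs from repeated runs of the same benchmark binary, with only `HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT` differing between the two arms.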