# Phase 61: C7 ULTRA Header-Light Implementation

**Date**: 2025-12-17
**Objective**: Skip header write in C7 ULTRA alloc hit path to reduce instruction count and I-cache pressure.

---

## Background

- `tiny_c7_ultra_alloc()` calls `tiny_region_id_write_header()` on alloc hit
- Phase 42 profiling: header write was a 4.56% hotspot (2.32% in Phase 61 profiling)
- `HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=1` enables header-light mode:
  - Header written once during refill (carve phase)
  - Alloc hit returns `base+1` directly (no header write)
  - Reduces instruction count by ~5-7 instructions per alloc
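
The header-light idea above can be sketched as a small standalone model. This is an illustration under stated assumptions, not hakmem's code: the `demo_*` names, the 64-byte block size, the cache capacity of 8, and the `0xA0 | class` header encoding are all hypothetical. It shows the property the mode relies on: if the free path pushes the BASE pointer back without touching byte 0, the header written at carve time is still valid when the block is handed out again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names or values come from hakmem. */
#define DEMO_BLOCK_SZ 64u                         /* assumed block size in bytes */
#define DEMO_CAP      8u                          /* assumed TLS cache capacity */
#define DEMO_HDR(cls) ((uint8_t)(0xA0u | (cls)))  /* assumed 1-byte header encoding */

typedef struct {
    uint8_t* freelist[DEMO_CAP];  /* stack of BASE pointers */
    uint16_t count;
} demo_tls_t;

/* Refill: carve blocks out of a slab and write each header exactly once. */
static void demo_refill(demo_tls_t* tls, uint8_t* slab, uint32_t capacity) {
    uint16_t n = 0;
    for (uint32_t i = 0; i < capacity && n < DEMO_CAP; i++) {
        uint8_t* blk = slab + (size_t)i * DEMO_BLOCK_SZ;
        blk[0] = DEMO_HDR(7);
        tls->freelist[n++] = blk;
    }
    tls->count = n;
}

/* Alloc hit: pop a BASE pointer and return USER = BASE + 1, with no header store. */
static void* demo_alloc(demo_tls_t* tls) {
    uint16_t n = tls->count;
    if (n == 0) return NULL;  /* a real cold path would refill here */
    tls->count = (uint16_t)(n - 1);
    return tls->freelist[n - 1] + 1;
}

/* Free: push BASE back without touching byte 0, so the header survives reuse. */
static void demo_free(demo_tls_t* tls, void* user) {
    assert(tls->count < DEMO_CAP);
    tls->freelist[tls->count++] = (uint8_t*)user - 1;
}
```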

---

## Runtime Profiling (Phase 61 Step 0)

**Command**:
```bash
make bench_random_mixed_hakmem_minimal
perf record -F 99 -g -- ./bench_random_mixed_hakmem_minimal 200000000 400 1
perf report --no-children | head -60
```

**Results**:
- `free`: 30.92% (top 1)
- `malloc`: 24.77% (top 2)
- `tiny_region_id_write_header`: 2.32% (top 6, within `free` backtrace)
- `tiny_c7_ultra_alloc`: 1.90% (top 7)

**Observation**:
- Header write is a visible hotspot (2.32%)
- C7 ULTRA alloc is in the top 10 (1.90%)
- Combined overhead: ~4.22% of total cycles (2.32% + 1.90%)

---

## Implementation Status

**Implementation already exists** (discovered during Step 1 analysis):

### File: `/mnt/workdisk/public_share/hakmem/core/tiny_c7_ultra.c`

**Location**: Lines 36-72 (`tiny_c7_ultra_alloc()`)

**Pattern**:
```c
void* tiny_c7_ultra_alloc(size_t size) {
    (void)size;  // C7 dedicated, size unused
    tiny_c7_ultra_tls_t* tls = &g_tiny_c7_ultra_tls;
    const bool header_light = tiny_front_v3_c7_ultra_header_light_enabled();

    // Hot path: TLS cache hit (single branch)
    uint16_t n = tls->count;
    if (__builtin_expect(n > 0, 1)) {
        void* base = tls->freelist[n - 1];
        tls->count = n - 1;

        // Convert BASE -> USER pointer
        if (header_light) {
            return (uint8_t*)base + 1;  // Header already written
        }
        return tiny_region_id_write_header(base, 7);
    }

    // Cold path: Refill TLS cache from segment
    // ...
}
```

**Refill phase** (Line 127-133):
```c
// Carve blocks into TLS cache (fill from end to preserve order)
uint16_t n = 0;
for (uint32_t i = 0; i < capacity && n < TINY_C7_ULTRA_CAP; i++) {
    uint8_t* blk = base + ((size_t)i * block_sz);
    if (header_light) {
        tiny_region_id_write_header(blk, 7);  // Write header once
    }
    tls->freelist[n++] = blk;
}
```

**ENV Control**:
- File: `/mnt/workdisk/public_share/hakmem/core/box/tiny_front_v3_env_box.h`
- Function: `tiny_c7_ultra_header_light_enabled_env()` (line 145-152)
- ENV Variable: `HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT`
- Default: OFF (research box, line 149)
- Snapshot: Cached in `TinyFrontV3Snapshot.c7_ultra_header_light` (line 17)
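
For readers unfamiliar with this kind of gate, the following is a minimal sketch of a default-OFF environment flag that is read once and then cached. It is not the code in `tiny_front_v3_env_box.h`: the `demo_*` names and the parsing rule (unset, empty, or a leading `0` means OFF) are assumptions made for illustration. The unsynchronized cache is also a simplification; concurrent first calls would compute the same value, but a real implementation may prefer an atomic or a snapshot taken at init time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: unset, empty, or a leading '0' counts as OFF. */
static bool demo_env_flag(const char* name) {
    assert(name != NULL);
    const char* v = getenv(name);
    return v != NULL && v[0] != '\0' && v[0] != '0';
}

/* Cached gate: -1 = not yet read, 0 = OFF, 1 = ON.
 * Once read, later changes to the environment are ignored. */
static bool demo_header_light_enabled(void) {
    static int cached = -1;
    if (cached < 0) {
        cached = demo_env_flag("HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT") ? 1 : 0;
    }
    return cached != 0;
}
```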

**Safety**:
- Invariant: C7 blocks from pool/refill always have valid headers
- Alloc hit: Returns `base+1` directly (assumes header present)
- Refill: Writes headers once during carve phase (if header_light enabled)

---

## Rollback Procedure

If Phase 61 shows NO-GO (-1.0% or worse):

1. **Runtime Rollback** (immediate, no rebuild):
   ```bash
   export HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=0
   ```

2. **Code Rollback** (if needed):
   - No changes made (implementation pre-existed)
   - ENV gate defaults to OFF (safe)

3. **Verification**:
   - Confirm `HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT=0` in the cleanenv script
   - Re-run baseline to confirm identical performance

---

## Next Steps

- Phase 61 Step 2: A/B test (HEADER_LIGHT=0 vs 1)
- Phase 61 Step 3: Results documentation
- Target: +1.0% or better for GO decision
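
The decision rule for the A/B test can be written down as a small helper. The +1.0% GO and -1.0% NO-GO thresholds come from this document; the `demo_*` names, the NEUTRAL band in between, and the assumption that the metric is a higher-is-better throughput figure are illustrative choices, not part of the existing tooling.

```c
#include <assert.h>

/* Hypothetical verdict type; NEUTRAL covers the band between the two thresholds. */
typedef enum { DEMO_NO_GO = -1, DEMO_NEUTRAL = 0, DEMO_GO = 1 } demo_verdict_t;

/* Relative change of the HEADER_LIGHT=1 run versus the HEADER_LIGHT=0 baseline,
 * in percent. Assumes a higher-is-better metric such as ops/sec. */
static double demo_delta_pct(double baseline, double treatment) {
    assert(baseline > 0.0);
    return (treatment - baseline) / baseline * 100.0;
}

/* GO at +1.0% or better, NO-GO at -1.0% or worse, otherwise NEUTRAL. */
static demo_verdict_t demo_verdict(double delta_pct) {
    if (delta_pct >= 1.0)  return DEMO_GO;
    if (delta_pct <= -1.0) return DEMO_NO_GO;
    return DEMO_NEUTRAL;
}
```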