Phase 5 E5-2: Header Write-Once (NEUTRAL, FROZEN)
Target: tiny_region_id_write_header (3.35% self%)
Hypothesis: Headers redundant for reused blocks
Strategy: Write headers ONCE at refill boundary, skip in hot alloc

Implementation:
- ENV gate: HAKMEM_TINY_HEADER_WRITE_ONCE=0/1 (default 0)
- core/box/tiny_header_write_once_env_box.h: ENV gate
- core/box/tiny_header_write_once_stats_box.h: Stats counters
- core/box/tiny_header_box.h: Added tiny_header_finalize_alloc()
- core/front/tiny_unified_cache.c: Prefill at 3 refill sites
- core/box/tiny_front_hot_box.h: Use finalize function

A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (WRITE_ONCE=0): 44.22M ops/s (mean), 44.53M ops/s (median)
- Optimized (WRITE_ONCE=1): 44.42M ops/s (mean), 44.36M ops/s (median)
- Improvement: +0.45% mean, -0.38% median

Decision: NEUTRAL (within ±1.0% threshold)
- Action: FREEZE as research box (default OFF, do not promote)

Root Cause Analysis:
- Header writes are NOT redundant - existing code writes only when needed
- Branch overhead (~4 cycles) cancels savings (~3-5 cycles)
- perf self% ≠ optimization ROI (3.35% target → +0.45% gain)

Key Lessons:
1. Verify assumptions before optimizing (inspect code paths)
2. Hot spot self% measures time IN a function, not the savings from REMOVING it
3. Branch overhead matters (even "simple" checks add cycles)

Positive Outcome:
- StdDev reduced 50% (0.96M → 0.48M) - more stable performance

Health Check: PASS (all profiles)

Next Candidates:
- free_tiny_fast_cold: 7.14% self%
- unified_cache_push: 3.39% self%
- hakmem_env_snapshot_enabled: 2.97% self%

Deliverables:
- docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_DESIGN.md
- docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_AB_TEST_RESULTS.md
- CURRENT_TASK.md (E5-2 complete, FROZEN)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@@ -28,6 +28,8 @@
 #define WARM_POOL_DBG_DEFINE
 #include "../box/warm_pool_dbg_box.h"  // Box: Warm Pool C7 debug counters
 #undef WARM_POOL_DBG_DEFINE
+#include "../box/tiny_header_write_once_env_box.h"  // Phase 5 E5-2: Header write-once optimization
+#include "../box/tiny_header_box.h"                 // Phase 5 E5-2: Header class preservation logic
 #include <stdlib.h>
 #include <string.h>
 #include <stdatomic.h>
@@ -507,6 +509,45 @@ static inline int unified_refill_validate_base(int class_idx,
 // Warm Pool Enhanced: Direct carve from warm SuperSlab (bypass superslab_refill)
 // ============================================================================
 
+// ============================================================================
+// Phase 5 E5-2: Header Prefill at Refill Boundary
+// ============================================================================
+// Prefill headers for C1-C6 blocks stored in unified cache.
+// Called after blocks are placed in cache->slots[] during refill.
+//
+// Strategy:
+// - C1-C6: Write headers ONCE at refill (preserved in freelist)
+// - C0, C7: Skip (headers will be overwritten by next pointer anyway)
+//
+// This eliminates redundant header writes in the hot allocation path.
+static inline void unified_cache_prefill_headers(int class_idx, TinyUnifiedCache* cache, int start_tail, int count) {
+#if HAKMEM_TINY_HEADER_CLASSIDX
+    // Only prefill if write-once optimization is enabled
+    if (!tiny_header_write_once_enabled()) return;
+
+    // Only prefill for C1-C6 (classes that preserve headers)
+    if (!tiny_class_preserves_header(class_idx)) return;
+
+    // Prefill header byte (constant for this class)
+    const uint8_t header_byte = HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK);
+
+    // Prefill headers in cache slots (circular buffer)
+    int tail_idx = start_tail;
+    for (int i = 0; i < count; i++) {
+        void* base = cache->slots[tail_idx];
+        if (base) {  // Safety: skip NULL slots
+            *(uint8_t*)base = header_byte;
+        }
+        tail_idx = (tail_idx + 1) & cache->mask;
+    }
+#else
+    (void)class_idx;
+    (void)cache;
+    (void)start_tail;
+    (void)count;
+#endif
+}
+
 // ============================================================================
 // Batch refill from SuperSlab (called on cache miss)
 // ============================================================================
@@ -582,11 +623,15 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
     if (page_produced > 0) {
         // Store blocks into cache and return first
         void* first = out[0];
+        int start_tail = cache->tail;  // E5-2: Save tail position for header prefill
         for (int i = 1; i < page_produced; i++) {
             cache->slots[cache->tail] = out[i];
             cache->tail = (cache->tail + 1) & cache->mask;
         }
 
+        // E5-2: Prefill headers for C1-C6 (write-once optimization)
+        unified_cache_prefill_headers(class_idx, cache, start_tail, page_produced - 1);
+
 #if !HAKMEM_BUILD_RELEASE
         g_unified_cache_miss[class_idx]++;
 #endif
@@ -750,11 +795,15 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
 
     // Store blocks into cache and return first
     void* first = out[0];
+    int start_tail = cache->tail;  // E5-2: Save tail position for header prefill
     for (int i = 1; i < produced; i++) {
         cache->slots[cache->tail] = out[i];
         cache->tail = (cache->tail + 1) & cache->mask;
     }
 
+    // E5-2: Prefill headers for C1-C6 (write-once optimization)
+    unified_cache_prefill_headers(class_idx, cache, start_tail, produced - 1);
+
 #if !HAKMEM_BUILD_RELEASE
     g_unified_cache_miss[class_idx]++;
 #endif
@@ -891,11 +940,15 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
 
     // Step 5: Store blocks into unified cache (skip first, return it)
     void* first = out[0];
+    int start_tail = cache->tail;  // E5-2: Save tail position for header prefill
     for (int i = 1; i < produced; i++) {
         cache->slots[cache->tail] = out[i];
         cache->tail = (cache->tail + 1) & cache->mask;
     }
 
+    // E5-2: Prefill headers for C1-C6 (write-once optimization)
+    unified_cache_prefill_headers(class_idx, cache, start_tail, produced - 1);
+
 #if !HAKMEM_BUILD_RELEASE
     if (class_idx == 7) {
         warm_pool_dbg_c7_uc_miss_shared();