Phase 62A: C7 ULTRA Alloc Dependency Chain Trim - NEUTRAL (-0.71%)
Implemented the C7 ULTRA allocation hot-path optimization attempt per the Phase 62A instructions.

Objective: reduce the dependency chain in tiny_c7_ultra_alloc() by:
1. Eliminating per-call tiny_front_v3_c7_ultra_header_light_enabled() checks
2. Using a TLS headers_initialized flag set during refill
3. Reducing branch count and register pressure

Implementation:
- New ENV box: core/box/c7_ultra_alloc_depchain_opt_box.h
- HAKMEM_C7_ULTRA_ALLOC_DEPCHAIN_OPT=0/1 gate (default OFF)
- Modified tiny_c7_ultra_alloc() with an optimized path
- Preserved the original path for compatibility

Results (Mixed benchmark, 10-run):
- Baseline (OPT=0): 59.300 M ops/s (CV 1.98%)
- Treatment (OPT=1): 58.879 M ops/s (CV 1.83%)
- Delta: -0.71% (NEUTRAL: within the ±1.0% threshold, but negative)
- Status: NEUTRAL → research box (default OFF)

Root Cause Analysis:
1. LTO already inlines the header_light function (call cost = 0)
2. A TLS access (memory load + offset) is not cheaper than the function call
3. Layout tax from the added code (the I-cache disruption pattern seen in Phases 43/46A/47)
4. The 5.18% (stack %) is not an optimizable hotspot (already well optimized)

Key Lessons:
- LTO-optimized function calls can be cheaper than TLS field access
- Micro-optimizations on already-optimized paths show diminishing or negative returns
- The 48.34% gap to mimalloc is likely algorithmic, not micro-architectural
- Layout tax remains a consistent pattern across the attempted micro-optimizations

Decision:
- NEUTRAL verdict → kept as a research box with an ENV gate (default OFF)
- Not adopted as the production default
- Next phases: Option B (production-readiness pivot) is likely higher ROI than further micro-optimizations

Box Theory Compliance: ✅ Compliant (single point, reversible, clear boundary)
Performance Compliance: ❌ No (-0.71% regression)

Documentation:
- PHASE62A_C7_ULTRA_DEPCHAIN_OPT_RESULTS.md: full A/B test analysis
- CURRENT_TASK.md: updated with results and next-phase options

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
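The Delta and verdict in the Results list follow from simple arithmetic on the two throughput means. The sketch below reproduces it. The function and enum names are hypothetical, and while the GO threshold (+1.0% or better) comes from the header comment, treating deltas at or below -1.0% as NO-GO is an assumption, since the source only says "NEUTRAL/NO-GO" for the failing case.

```c
#include <assert.h>

/* Hypothetical names; verdict labels are the ones used in Phase 62A. */
typedef enum { VERDICT_NO_GO, VERDICT_NEUTRAL, VERDICT_GO } ab_verdict;

/* Relative change of treatment vs. baseline throughput, in percent. */
static double ab_delta_pct(double baseline_mops, double treatment_mops) {
    assert(baseline_mops > 0.0); /* a zero baseline has no defined ratio */
    return (treatment_mops - baseline_mops) / baseline_mops * 100.0;
}

/* GO at +1.0% or better (stated in the source).
 * ASSUMPTION: deltas at or below -1.0% are NO-GO; everything in between
 * is NEUTRAL. */
static ab_verdict ab_classify(double delta_pct) {
    if (delta_pct >= 1.0) return VERDICT_GO;
    if (delta_pct > -1.0) return VERDICT_NEUTRAL;
    return VERDICT_NO_GO;
}
```

With the Phase 62A means, (58.879 - 59.300) / 59.300 × 100 ≈ -0.71%, which lands in the NEUTRAL band. Note that the CVs (about 1.8 to 2.0%) are larger than the delta itself, which is consistent with reading the result as noise-level rather than a clear regression.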
core/box/c7_ultra_alloc_depchain_opt_box.h (new file, 45 lines)
@@ -0,0 +1,45 @@
#ifndef C7_ULTRA_ALLOC_DEPCHAIN_OPT_BOX_H
#define C7_ULTRA_ALLOC_DEPCHAIN_OPT_BOX_H

// Phase 62A: C7 ULTRA Alloc Dependency Chain Trim
//
// Goals:
// - Shorten the dependency chain on the hot hit path of tiny_c7_ultra_alloc()
// - Remove the per-call header_light check; use the TLS headers_initialized flag instead
// - Minimize calls to tiny_region_id_write_header()
// - GO at +1.0% on Mixed 10-run; otherwise NEUTRAL/NO-GO and kept as a research box
//
// Optimizations:
// 1. Move the header_light check off the per-call path; fix it via TLS headers_initialized
// 2. Call tiny_region_id_write_header() only when needed (skip if already initialized)
// 3. Share the same logic with the post-refill retry block (better register usage)
//
// ENV:
// - HAKMEM_C7_ULTRA_ALLOC_DEPCHAIN_OPT=0/1 (default: 0, OFF)
//
// Box Theory:
// - Single conversion point: the tiny_c7_ultra_alloc() function
// - Reversible: switch back OFF via the ENV gate
// - No side effects: pure optimization, no new data structures

#ifndef HAKMEM_C7_ULTRA_ALLOC_DEPCHAIN_OPT
#define HAKMEM_C7_ULTRA_ALLOC_DEPCHAIN_OPT 0
#endif

#include <stdlib.h>

// ENV gate (compile-time constant in BENCH_MINIMAL, runtime otherwise)
static inline int c7_ultra_alloc_depchain_opt_enabled(void) {
#if HAKMEM_BENCH_MINIMAL
    return HAKMEM_C7_ULTRA_ALLOC_DEPCHAIN_OPT;  // FAST: compile-time constant
#else
    static int g_enable = -1;
    if (__builtin_expect(g_enable == -1, 0)) {
        const char* e = getenv("HAKMEM_C7_ULTRA_ALLOC_DEPCHAIN_OPT");
        g_enable = (e && *e && *e != '0') ? 1 : 0;  // default OFF
    }
    return g_enable;
#endif
}

#endif // C7_ULTRA_ALLOC_DEPCHAIN_OPT_BOX_H
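To make optimizations 1 and 2 from the header comment concrete, here is a minimal sketch of the two alloc paths side by side. Everything in it is hypothetical: c7_tls_cache, refill(), header_light_enabled(), write_header(), and c7_alloc() are simplified stand-ins for the real TLS state, tiny_front_v3_c7_ultra_header_light_enabled(), tiny_region_id_write_header(), and tiny_c7_ultra_alloc(), whose definitions are not part of this diff. It only models where the header decision is made: on every hit in the original path, once per refill in the OPT path.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL sketch: every name and size here is illustrative, not hakmem's. */
enum { BLOCK_SIZE = 64, BATCH = 8, ARENA_BLOCKS = 64, HEADER_TAG = 0xC7 };

typedef struct {
    void *slots[BATCH];      /* LIFO of free blocks filled by refill() */
    int count;               /* valid entries in slots */
    int headers_initialized; /* decided once per refill, read by the OPT path */
} c7_tls_cache;

static _Thread_local c7_tls_cache tls_cache;
static unsigned char arena[ARENA_BLOCKS][BLOCK_SIZE];
static int arena_next;     /* next unused arena block */
static int light_mode = 1; /* stand-in for the header_light setting; assumed
                              constant between a refill and the allocs it serves */
static int light_checks;   /* how many times the predicate was evaluated */

/* Stand-in for the per-call header_light predicate. */
static int header_light_enabled(void) {
    light_checks++;
    return light_mode;
}

/* Stand-in for the header writer: tags byte 0 of the block. */
static void write_header(void *p) { *(unsigned char *)p = HEADER_TAG; }

/* Called only when the cache is empty. In light mode headers are written here
 * in bulk, and the decision is recorded in TLS for the OPT path. */
static int refill(void) {
    int light = header_light_enabled();
    while (tls_cache.count < BATCH && arena_next < ARENA_BLOCKS) {
        void *p = arena[arena_next++];
        if (light) write_header(p);
        tls_cache.slots[tls_cache.count++] = p;
    }
    tls_cache.headers_initialized = light;
    return tls_cache.count > 0;
}

/* Original path: re-evaluates the predicate on every hit. */
static void *alloc_original(void) {
    if (tls_cache.count == 0 && !refill()) return NULL;
    void *p = tls_cache.slots[--tls_cache.count];
    if (!header_light_enabled()) write_header(p);
    assert(*(unsigned char *)p == HEADER_TAG);
    return p;
}

/* OPT path: a hit only loads the TLS flag that refill() already set. */
static void *alloc_depchain_opt(void) {
    if (tls_cache.count == 0 && !refill()) return NULL;
    void *p = tls_cache.slots[--tls_cache.count];
    if (!tls_cache.headers_initialized) write_header(p);
    assert(*(unsigned char *)p == HEADER_TAG);
    return p;
}

/* Gate that keeps the original path available, chosen at runtime. */
static void *c7_alloc(int opt_enabled) {
    return opt_enabled ? alloc_depchain_opt() : alloc_original();
}
```

The sketch counts predicate evaluations, not cycles. As the root-cause analysis in the commit message notes, LTO already inlines the real header_light function, so swapping it for a TLS load is not guaranteed to shorten the real hot path.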