Files
hakmem/core/box/fastlane_direct_env_box.h

47 lines
1.8 KiB
C
Raw Normal View History

Phase 17 v2 (FORCE_LIBC fix) + Phase 19-1b (FastLane Direct) — GO (+5.88%) ## Phase 17 v2: FORCE_LIBC Gap Validation Fix **Critical bug fix**: Phase 17 v1 の測定が壊れていた **Problem**: HAKMEM_FORCE_LIBC_ALLOC=1 が FastLane より後でしか見えず、 same-binary A/B が実質 "hakmem vs hakmem" になっていた(+0.39% 誤測定) **Fix**: core/box/hak_wrappers.inc.h:171 と :645 に g_force_libc_alloc==1 の early bypass を追加、__libc_malloc/__libc_free に最初に直行 **Result**: 正しい同一バイナリ A/B 測定 - hakmem (FORCE_LIBC=0): 48.99M ops/s - libc (FORCE_LIBC=1): 79.72M ops/s (+62.7%) - system binary: 88.06M ops/s (+10.5% vs libc) **Gap 分解**: - Allocator 差: +62.7% (主戦場) - Layout penalty: +10.5% (副次的) **Conclusion**: Case A 確定 (allocator dominant, NOT layout) Phase 17 v1 の Case B 判定は誤り。 Files: - docs/analysis/PHASE17_FORCE_LIBC_GAP_VALIDATION_1_AB_TEST_RESULTS.md (v2) - docs/analysis/PHASE17_FORCE_LIBC_GAP_VALIDATION_1_NEXT_INSTRUCTIONS.md (updated) --- ## Phase 19: FastLane Instruction Reduction Analysis **Goal**: libc との instruction gap (-35% instructions, -56% branches) を削減 **perf stat 分析** (FORCE_LIBC=0 vs 1, 200M ops): - hakmem: 209.09 instructions/op, 52.33 branches/op - libc: 135.92 instructions/op, 22.93 branches/op - Delta: +73.17 instructions/op (+53.8%), +29.40 branches/op (+128.2%) **Hot path** (perf report): - front_fastlane_try_free: 23.97% cycles - malloc wrapper: 23.84% cycles - free wrapper: 6.82% cycles - **Wrapper overhead: ~55% of all cycles** **Reduction candidates**: - A: Wrapper layer 削除 (-17.5 inst/op, +10-15% 期待) - B: ENV snapshot 統合 (-10.0 inst/op, +5-8%) - C: Stats 削除 (-5.0 inst/op, +3-5%) - D: Header inline (-4.0 inst/op, +2-3%) - E: Route fast path (-3.5 inst/op, +2-3%) Files: - docs/analysis/PHASE19_FASTLANE_INSTRUCTION_REDUCTION_1_DESIGN.md - docs/analysis/PHASE19_FASTLANE_INSTRUCTION_REDUCTION_2_NEXT_INSTRUCTIONS.md --- ## Phase 19-1b: FastLane Direct — GO (+5.88%) **Strategy**: Wrapper layer を bypass し、core allocator を直接呼ぶ - free() → free_tiny_fast() (not free_tiny_fast_hot) - malloc() → malloc_tiny_fast() **Phase 19-1 が NO-GO (-3.81%) だった原因**: 1. __builtin_expect(fastlane_direct_enabled(), 0) が逆効果(A/B 不公平) 2. free_tiny_fast_hot() が誤選択(free_tiny_fast() が勝ち筋) **Phase 19-1b の修正**: 1. __builtin_expect() 削除 2. free_tiny_fast() を直接呼び出し **Result** (Mixed, 10-run, 20M iters, ws=400): - Baseline (FASTLANE_DIRECT=0): 49.17M ops/s - Optimized (FASTLANE_DIRECT=1): 52.06M ops/s - **Delta: +5.88%** (GO 基準 +5% クリア) **perf stat** (200M iters): - Instructions/op: 199.90 → 169.45 (-30.45, -15.23%) - Branches/op: 51.49 → 41.52 (-9.97, -19.36%) - Cycles/op: 88.88 → 84.37 (-4.51, -5.07%) - I-cache miss: 111K → 98K (-11.79%) **Trade-offs** (acceptable): - iTLB miss: +41.46% (front-end cost) - dTLB miss: +29.15% (backend cost) - Overall gain (+5.88%) outweighs costs **Implementation**: 1. **ENV gate**: core/box/fastlane_direct_env_box.{h,c} - HAKMEM_FASTLANE_DIRECT=0/1 (default: 0, opt-in) - Single _Atomic global (wrapper キャッシュ問題を解決) 2. **Wrapper 修正**: core/box/hak_wrappers.inc.h - malloc: direct call to malloc_tiny_fast() when FASTLANE_DIRECT=1 - free: direct call to free_tiny_fast() when FASTLANE_DIRECT=1 - Safety: !g_initialized では direct 使わない、fallback 維持 3. **Preset 昇格**: core/bench_profile.h:88 - bench_setenv_default("HAKMEM_FASTLANE_DIRECT", "1") - Comment: +5.88% proven on Mixed, 10-run 4. **cleanenv 更新**: scripts/run_mixed_10_cleanenv.sh:22 - HAKMEM_FASTLANE_DIRECT=${HAKMEM_FASTLANE_DIRECT:-1} - Phase 9/10 と同様に昇格 **Verdict**: GO — 本線採用、プリセット昇格完了 **Rollback**: HAKMEM_FASTLANE_DIRECT=0 で既存 FastLane path に戻る Files: - core/box/fastlane_direct_env_box.{h,c} (new) - core/box/hak_wrappers.inc.h (modified) - core/bench_profile.h (preset promotion) - scripts/run_mixed_10_cleanenv.sh (ENV default aligned) - Makefile (new obj) - docs/analysis/PHASE19_1B_FASTLANE_DIRECT_REVISED_AB_TEST_RESULTS.md --- ## Cumulative Performance - Baseline (all optimizations OFF): ~40M ops/s (estimated) - Current (Phase 19-1b): 52.06M ops/s - **Cumulative gain: ~+30% from baseline** Remaining gap to libc (79.72M): - Current: 52.06M ops/s - Target: 79.72M ops/s - **Gap: +53.2%** (was +62.7% before Phase 19-1b) Next: Phase 19-2 (ENV snapshot consolidation, +5-8% expected) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-15 11:28:40 +09:00
// fastlane_direct_env_box.h - Phase 19-1: FastLane Direct Path ENV Control
//
// Goal: Remove wrapper layer overhead (30.79% of cycles) by calling core allocator directly
// Strategy: Compile-time + runtime gate to bypass front_fastlane_try_*() wrapper
//
// Box Theory:
// - Boundary: HAKMEM_FASTLANE_DIRECT=0/1 (default: 0, opt-in)
// - Rollback: ENV=0 reverts to existing FastLane wrapper path
// - Observability: perf stat shows instruction/branch reduction
//
// Expected Performance:
// - Reduction: -17.5 instructions/op, -6.0 branches/op
// - Impact: +10-15% throughput (remove 30% wrapper overhead)
//
// ENV Variables:
// HAKMEM_FASTLANE_DIRECT=0/1 # Enable direct path (default: 0, research box)
#pragma once
#include <stdatomic.h>
#include <stdlib.h>
// ENV control: cached flag for fastlane_direct_enabled()
// -1: uninitialized, 0: disabled, 1: enabled
// NOTE: Must be a single global (not header-static) so bench_profile refresh can
// update the same cache used by malloc/free wrappers.
extern _Atomic int g_fastlane_direct_enabled;
// Runtime check: Is FastLane Direct path enabled?
// Returns: 1 if enabled, 0 if disabled
// Hot path: Single atomic load (after first call)
static inline int fastlane_direct_enabled(void) {
int val = atomic_load_explicit(&g_fastlane_direct_enabled, memory_order_relaxed);
if (__builtin_expect(val == -1, 0)) {
// Cold path: Initialize from ENV
const char* e = getenv("HAKMEM_FASTLANE_DIRECT");
int enable = (e && *e && *e != '0') ? 1 : 0;
atomic_store_explicit(&g_fastlane_direct_enabled, enable, memory_order_relaxed);
return enable;
}
return val;
}
// Refresh from ENV: Called during benchmark ENV reloads
// Allows runtime toggle without recompilation
void fastlane_direct_env_refresh_from_env(void);