Files
hakmem/scripts/run_mixed_10_cleanenv.sh
Moe Charm (CI) ec87025da6 Phase 17 v2 (FORCE_LIBC fix) + Phase 19-1b (FastLane Direct) — GO (+5.88%)
## Phase 17 v2: FORCE_LIBC Gap Validation Fix

**Critical bug fix**: Phase 17 v1 の測定が壊れていた

**Problem**: HAKMEM_FORCE_LIBC_ALLOC=1 が FastLane より後でしか見えず、
same-binary A/B が実質 "hakmem vs hakmem" になっていた(+0.39% 誤測定)

**Fix**: core/box/hak_wrappers.inc.h:171 と :645 に g_force_libc_alloc==1 の
early bypass を追加、__libc_malloc/__libc_free に最初に直行

**Result**: 正しい同一バイナリ A/B 測定
- hakmem (FORCE_LIBC=0): 48.99M ops/s
- libc (FORCE_LIBC=1): 79.72M ops/s (+62.7%)
- system binary: 88.06M ops/s (+10.5% vs libc)

**Gap 分解**:
- Allocator 差: +62.7% (主戦場)
- Layout penalty: +10.5% (副次的)

**Conclusion**: Case A 確定 (allocator dominant, NOT layout)
Phase 17 v1 の Case B 判定は誤り。

Files:
- docs/analysis/PHASE17_FORCE_LIBC_GAP_VALIDATION_1_AB_TEST_RESULTS.md (v2)
- docs/analysis/PHASE17_FORCE_LIBC_GAP_VALIDATION_1_NEXT_INSTRUCTIONS.md (updated)

---

## Phase 19: FastLane Instruction Reduction Analysis

**Goal**: libc との instruction gap (-35% instructions, -56% branches) を削減

**perf stat 分析** (FORCE_LIBC=0 vs 1, 200M ops):
- hakmem: 209.09 instructions/op, 52.33 branches/op
- libc: 135.92 instructions/op, 22.93 branches/op
- Delta: +73.17 instructions/op (+53.8%), +29.40 branches/op (+128.2%)

**Hot path** (perf report):
- front_fastlane_try_free: 23.97% cycles
- malloc wrapper: 23.84% cycles
- free wrapper: 6.82% cycles
- **Wrapper overhead: ~55% of all cycles**

**Reduction candidates**:
- A: Wrapper layer 削除 (-17.5 inst/op, +10-15% 期待)
- B: ENV snapshot 統合 (-10.0 inst/op, +5-8%)
- C: Stats 削除 (-5.0 inst/op, +3-5%)
- D: Header inline (-4.0 inst/op, +2-3%)
- E: Route fast path (-3.5 inst/op, +2-3%)

Files:
- docs/analysis/PHASE19_FASTLANE_INSTRUCTION_REDUCTION_1_DESIGN.md
- docs/analysis/PHASE19_FASTLANE_INSTRUCTION_REDUCTION_2_NEXT_INSTRUCTIONS.md

---

## Phase 19-1b: FastLane Direct — GO (+5.88%)

**Strategy**: Wrapper layer を bypass し、core allocator を直接呼ぶ
- free() → free_tiny_fast() (not free_tiny_fast_hot)
- malloc() → malloc_tiny_fast()

**Phase 19-1 が NO-GO (-3.81%) だった原因**:
1. __builtin_expect(fastlane_direct_enabled(), 0) が逆効果(A/B 不公平)
2. free_tiny_fast_hot() が誤選択(free_tiny_fast() が勝ち筋)

**Phase 19-1b の修正**:
1. __builtin_expect() 削除
2. free_tiny_fast() を直接呼び出し

**Result** (Mixed, 10-run, 20M iters, ws=400):
- Baseline (FASTLANE_DIRECT=0): 49.17M ops/s
- Optimized (FASTLANE_DIRECT=1): 52.06M ops/s
- **Delta: +5.88%** (GO 基準 +5% クリア)

**perf stat** (200M iters):
- Instructions/op: 199.90 → 169.45 (-30.45, -15.23%)
- Branches/op: 51.49 → 41.52 (-9.97, -19.36%)
- Cycles/op: 88.88 → 84.37 (-4.51, -5.07%)
- I-cache miss: 111K → 98K (-11.79%)

**Trade-offs** (acceptable):
- iTLB miss: +41.46% (front-end cost)
- dTLB miss: +29.15% (backend cost)
- Overall gain (+5.88%) outweighs costs

**Implementation**:
1. **ENV gate**: core/box/fastlane_direct_env_box.{h,c}
   - HAKMEM_FASTLANE_DIRECT=0/1 (default: 0, opt-in)
   - Single _Atomic global (wrapper キャッシュ問題を解決)

2. **Wrapper 修正**: core/box/hak_wrappers.inc.h
   - malloc: direct call to malloc_tiny_fast() when FASTLANE_DIRECT=1
   - free: direct call to free_tiny_fast() when FASTLANE_DIRECT=1
   - Safety: !g_initialized では direct 使わない、fallback 維持

3. **Preset 昇格**: core/bench_profile.h:88
   - bench_setenv_default("HAKMEM_FASTLANE_DIRECT", "1")
   - Comment: +5.88% proven on Mixed, 10-run

4. **cleanenv 更新**: scripts/run_mixed_10_cleanenv.sh:22
   - HAKMEM_FASTLANE_DIRECT=${HAKMEM_FASTLANE_DIRECT:-1}
   - Phase 9/10 と同様に昇格

**Verdict**: GO — 本線採用、プリセット昇格完了

**Rollback**: HAKMEM_FASTLANE_DIRECT=0 で既存 FastLane path に戻る

Files:
- core/box/fastlane_direct_env_box.{h,c} (new)
- core/box/hak_wrappers.inc.h (modified)
- core/bench_profile.h (preset promotion)
- scripts/run_mixed_10_cleanenv.sh (ENV default aligned)
- Makefile (new obj)
- docs/analysis/PHASE19_1B_FASTLANE_DIRECT_REVISED_AB_TEST_RESULTS.md

---

## Cumulative Performance

- Baseline (all optimizations OFF): ~40M ops/s (estimated)
- Current (Phase 19-1b): 52.06M ops/s
- **Cumulative gain: ~+30% from baseline**

Remaining gap to libc (79.72M):
- Current: 52.06M ops/s
- Target: 79.72M ops/s
- **Gap: +53.2%** (was +62.7% before Phase 19-1b)

Next: Phase 19-2 (ENV snapshot consolidation, +5-8% expected)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-15 11:28:40 +09:00

31 lines
1.5 KiB
Bash
Executable File

#!/usr/bin/env bash
set -euo pipefail
# 10-run Mixed benchmark with explicit clean ENV for frozen/research knobs.
# Purpose: avoid "bench_setenv_default() does not override exported ENV" drift.
profile=${HAKMEM_PROFILE:-MIXED_TINYV3_C7_SAFE}
iters=${ITERS:-20000000}
ws=${WS:-400}
runs=${RUNS:-10}
# Force known research knobs OFF to avoid accidental carry-over.
export HAKMEM_TINY_HEADER_WRITE_ONCE=${HAKMEM_TINY_HEADER_WRITE_ONCE:-0}
export HAKMEM_TINY_C7_PRESERVE_HEADER=${HAKMEM_TINY_C7_PRESERVE_HEADER:-0}
export HAKMEM_TINY_TCACHE=${HAKMEM_TINY_TCACHE:-0}
export HAKMEM_TINY_TCACHE_CAP=${HAKMEM_TINY_TCACHE_CAP:-64}
export HAKMEM_MALLOC_TINY_DIRECT=${HAKMEM_MALLOC_TINY_DIRECT:-0}
export HAKMEM_FRONT_FASTLANE_ALLOC_LEGACY_DIRECT=${HAKMEM_FRONT_FASTLANE_ALLOC_LEGACY_DIRECT:-0}
export HAKMEM_FORCE_LIBC_ALLOC=${HAKMEM_FORCE_LIBC_ALLOC:-0}
export HAKMEM_ENV_SNAPSHOT_SHAPE=${HAKMEM_ENV_SNAPSHOT_SHAPE:-0}
# NOTE: Phase 19-1b is promoted in presets. Keep cleanenv aligned by default.
export HAKMEM_FASTLANE_DIRECT=${HAKMEM_FASTLANE_DIRECT:-1}
# NOTE: Phase 9/10 are promoted (bench_profile defaults to 1). Keep cleanenv aligned by default.
export HAKMEM_FREE_TINY_FAST_MONO_DUALHOT=${HAKMEM_FREE_TINY_FAST_MONO_DUALHOT:-1}
export HAKMEM_FREE_TINY_FAST_MONO_LEGACY_DIRECT=${HAKMEM_FREE_TINY_FAST_MONO_LEGACY_DIRECT:-1}
for i in $(seq 1 "${runs}"); do
echo "=== Run ${i}/${runs} ==="
HAKMEM_PROFILE="${profile}" ./bench_random_mixed_hakmem "${iters}" "${ws}" 1 2>&1 | rg "Throughput" || true
done