2025-12-10 09:08:18 +09:00

# ENV Profile Presets (HAKMEM)

The most commonly used configurations are collected into three presets. Start by copying one of them, then add only the ENV variables you need. v2-series and LEGACY-only options are treated as explicit opt-ins.

In the bench binaries, setting `HAKMEM_PROFILE=<name>` automatically injects the ENV variables defined here (ENV variables that are already set are not overwritten).

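The "already-set ENV is not overwritten" rule can be pictured with a small shell sketch. This is only an illustration under stated assumptions: `set_default` is a hypothetical helper, the three variables are a partial stand-in for the preset, and the bench binary's real injection code is not shown in this document.

```sh
# Hypothetical helper (not part of HAKMEM): export NAME=VALUE only when NAME
# is not already set, mirroring the "do not overwrite" rule described above.
set_default() {
  # $1 = variable name, $2 = default value
  if eval "[ -z \"\${$1+set}\" ]"; then
    export "$1=$2"
  fi
}

# The user opts out of one knob before the preset is applied...
export HAKMEM_FREE_TINY_DIRECT=0

# ...then an illustrative subset of MIXED_TINYV3_C7_SAFE is injected.
set_default HAKMEM_TINY_HEAP_PROFILE C7_SAFE
set_default HAKMEM_TINY_C7_HOT 1
set_default HAKMEM_FREE_TINY_DIRECT 1

echo "HAKMEM_TINY_C7_HOT=$HAKMEM_TINY_C7_HOT"
echo "HAKMEM_FREE_TINY_DIRECT=$HAKMEM_FREE_TINY_DIRECT"
```

Because the user's value wins, opting out of a promoted default only requires exporting the `=0` value alongside `HAKMEM_PROFILE`.
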
---

## Profile 1: MIXED_TINYV3_C7_SAFE (standard Mixed 16–1024B)

### Purpose

- For the standard Mixed 16–1024B benchmark.
- C7-only SmallObject v3 + Tiny front v3 + LUT + fast classify ON.
- All v4-series features (C6/C7 v4, fast classify v4, small segment v4) are OFF.
- Tiny/Pool v2 are also all OFF.
- **v7 (C5/C6) is OFF** (Phase v10: v7 is disabled on the Mixed mainline; see the C5/C6-dedicated preset).
- On the Mixed mainline, **MID v3 (C6)** is fixed to default OFF and C6 is also handled by Tiny LEGACY (unified cache), because MID v3 is much slower on Mixed.

### Minimal ENV set (Release)

```sh
# Set everything at once through the preset
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE

# Size range assumed by Mixed 16–1024B (set explicitly if needed)
HAKMEM_BENCH_MIN_SIZE=16
HAKMEM_BENCH_MAX_SIZE=1024
```

Main ENV variables that the preset sets automatically:

- `HAKMEM_TINY_HEAP_PROFILE=C7_SAFE`
- `HAKMEM_TINY_C7_HOT=1`
- `HAKMEM_TINY_HOTHEAP_V2=0`
- `HAKMEM_SMALL_HEAP_V3_ENABLED=1`
- `HAKMEM_SMALL_HEAP_V3_CLASSES=0x80` (C7-only v3)
- `HAKMEM_SMALL_HEAP_V4_ENABLED=0` / `HAKMEM_SMALL_HEAP_V4_CLASSES=0x0`
- `HAKMEM_TINY_PTR_FAST_CLASSIFY_ENABLED=1`
- `HAKMEM_TINY_PTR_FAST_CLASSIFY_V4_ENABLED=0`
- `HAKMEM_SMALL_SEGMENT_V4_ENABLED=0`
- `HAKMEM_POOL_V2_ENABLED=0`
- `HAKMEM_TINY_FRONT_V3_ENABLED=1`
- `HAKMEM_TINY_FRONT_V3_LUT_ENABLED=1`
- `HAKMEM_FREE_TINY_FAST_HOTCOLD=1` (Phase FREE-TINY-FAST-DUALHOT-1: sends the second hot set of free (C0-C3) on a direct path)
- `HAKMEM_WRAP_SHAPE=1` (Phase 2 B4: wrapper hot/cold split, default ON)
- `HAKMEM_TINY_ALLOC_ROUTE_SHAPE=1` (Phase 2 B3: optimizes the shape of the alloc route dispatch)
- `HAKMEM_TINY_STATIC_ROUTE=1` (Phase 3 C3: policy_snapshot bypass, default ON)
- `HAKMEM_FREE_STATIC_ROUTE=1` (Phase 3 D1: free path route cache, default ON)
- `HAKMEM_MID_V3_ENABLED=0` (OFF on the Mixed mainline; recommended ON only for C6-heavy)
- `HAKMEM_MID_V3_CLASSES=0x0` (unused on the Mixed mainline)
- `HAKMEM_MID_V35_ENABLED=0` (Phase v11a-5: MID v3.5 OFF is fastest on Mixed)
- `HAKMEM_FREE_TINY_DIRECT=1` (Phase 5 E5-1: free() routes Tiny directly and removes duplicated checks)

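The `*_CLASSES` values read as bitmasks in which bit N selects class CN, which is consistent with `0x80` meaning C7-only and `0x40` meaning C6 (bit 6) elsewhere in this document. A minimal sketch for building such a mask follows; `class_mask` is a hypothetical helper for illustration, not a HAKMEM tool.

```sh
# Hypothetical helper: build a class bitmask from class indices,
# assuming bit N corresponds to class CN.
class_mask() {
  mask=0
  for c in "$@"; do
    mask=$(( mask | (1 << c) ))
  done
  printf '0x%x\n' "$mask"
}

class_mask 7      # C7 only
class_mask 6      # C6 only
class_mask 6 7    # C6 and C7 together
```

Calling it with no arguments yields `0x0`, the value the preset uses to leave a feature with no classes attached.
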
### Optional settings

- To view stats:
  ```sh
  HAKMEM_TINY_HEAP_STATS=1
  HAKMEM_TINY_HEAP_STATS_DUMP=1
  HAKMEM_SMALL_HEAP_V3_STATS=1
  ```
- **Phase MID-V35-HOTPATH-OPT-1** (FROZEN - research only):
  ```sh
  HAKMEM_MID_V35_HEADER_PREFILL=1   # write headers ahead of time at the refill boundary
  HAKMEM_MID_V35_HOT_COUNTS=0       # drop alloc_count
  HAKMEM_MID_V35_C6_FASTPATH=1      # C6-specialized fast path
  ```
  - **Status**: Default OFF, FROZEN (all 3 knobs)
  - **Actual Results** (Phase MID-V35-HOTPATH-OPT-1 Mixed A/B):
    - Mixed (16–1024B, MID_V35_OFF): **-0.2%** (within noise, inside ±2%) ✓
    - C6-heavy (257–768B, MID_V35_ON): **+7.3%** improvement ✅
  - **Finding**: On Mixed, with MID_V3 (C6-only) fixed, the effect is negligible; the gain is large only on C6-heavy
  - **Recommendation**: Recommended ON in the C6_HEAVY_LEGACY_POOLV1 preset
  - **NOT recommended for**: MIXED_TINYV3_C7_SAFE mainline (keep all defaults OFF)
- **Phase POLICY-FAST-PATH-V2** (FROZEN - research only):
  ```sh
  HAKMEM_TINY_FREE_POLICY_FAST_V2=1 # Fast-path free optimization
  ```
  - **Status**: Default OFF, FROZEN (merge complete)
  - **Actual Results** (Phase POLICY-FAST-PATH-V2 A/B):
    - Mixed (ws=400): **-1.6%** regression ❌ (added branch cost > skip benefit)
    - C6-heavy (ws=200): **+5.4%** improvement ✅
  - **Finding**: Large working set (ws>300) causes branch misprediction cost to dominate
  - **Recommendation**: Use only for C6-heavy or ws<300 research benchmarks
  - **NOT recommended for**: MIXED_TINYV3_C7_SAFE mainline (keep OFF)
  - **Requirement**: Only effective when v7 Learner is disabled
- **Phase 3 D1 (Free Path Route Cache)** ✅ ADOPT (PROMOTED TO DEFAULT):
  ```sh
  HAKMEM_FREE_STATIC_ROUTE=1
  ```
  - **Status**: ✅ Promoted to MIXED_TINYV3_C7_SAFE preset default (2025-12-13)
  - **Actual Results** (Mixed 20-run validation):
    - Baseline (ROUTE=0): Mean 46.30M ops/s, Median 46.30M ops/s
    - Optimized (ROUTE=1): Mean 47.32M ops/s, Median 47.39M ops/s
    - Gain: Mean **+2.19%** ✓, Median **+2.37%** ✓
  - **Decision**: Both criteria met (mean >= +1.0%, median >= +0.0%)
  - **Implementation**: TLS cache for free path routing, bypasses tiny_route_for_class() call
- **Phase 3 D2 (Wrapper Env Cache)** ❌ NO-GO (FROZEN):
  ```sh
  HAKMEM_WRAP_ENV_CACHE=1
  ```
  - **Status**: ❌ FROZEN (Mixed **-1.44%** regression) → default OFF, do not pursue
  - **Reason**: TLS overhead > benefit in Mixed workload
- **Phase 4 D3 (Alloc Gate Shape)** 🔬 NEUTRAL (research only):
  ```sh
  HAKMEM_ALLOC_GATE_SHAPE=1
  ```
  - **Status**: NEUTRAL (Mixed 10-run: Mean **+0.56%** / Median **-0.5%**) → default OFF
  - **Effect**: Simplifies the branch shape of `tiny_alloc_gate_fast()` (avoids `tiny_route_get()` and the release logging branch)
- **Phase 4 E1 (ENV Snapshot Consolidation)** ✅ GO (opt-in):
  ```sh
  HAKMEM_ENV_SNAPSHOT=1
  ```
  - **Status**: ✅ GO (Mixed 10-run: **+3.92% avg / +4.01% median**) → ✅ Promoted to `MIXED_TINYV3_C7_SAFE` preset default (opt-out possible)
  - **Effect**: Consolidates the hot ENV gates `tiny_c7_ultra_enabled_env/tiny_front_v3_enabled/tiny_metadata_cache_enabled` into a single snapshot
  - **Rollback**: `HAKMEM_ENV_SNAPSHOT=0`
- **Phase 4 E3-4 (ENV Constructor Init)** ❌ NO-GO (FROZEN):
  ```sh
  # Requires E1
  HAKMEM_ENV_SNAPSHOT=1
  HAKMEM_ENV_SNAPSHOT_CTOR=1
  ```
  - **Status**: ❌ NO-GO (Mixed 10-run: **-1.44% mean / -1.03% median**) → default OFF (frozen)
  - **Reason**: In constructor mode the gate check becomes an "extra branch/load", which does not pay off on the current hot path
  - **Rollback**: `HAKMEM_ENV_SNAPSHOT_CTOR=0`
- **Phase 5 E4-1 (Free Wrapper ENV Snapshot)** ✅ GO (PROMOTION READY):
  ```sh
  HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=1
  ```
  - **Status**: ✅ GO (Mixed 10-run: **+3.51% mean / +4.07% median**) → ✅ Promoted to `MIXED_TINYV3_C7_SAFE` preset default (opt-out possible)
  - **Effect**: Consolidates the ENV checks in the `free()` wrapper (multiple TLS reads) into a single TLS snapshot and short-circuits the early gate
  - **Rollback**: `HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0`
- **Phase 5 E4-2 (Malloc Wrapper ENV Snapshot)** ✅ GO (PROMOTION READY):
  ```sh
  HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1
  ```
  - **Status**: ✅ GO (Mixed 10-run: **+21.83% mean / +22.86% median**) → ✅ Promoted to `MIXED_TINYV3_C7_SAFE` preset default (opt-out possible)
  - **Effect**: Short-circuits the tiny fast check in the `malloc()` wrapper with a TLS snapshot, reducing function calls and checks on the hot path (notably `tiny_get_max_size()`)
  - **Rollback**: `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0`
- **Phase 5 E5-1 (Free Tiny Direct Path)** ✅ GO (PROMOTED TO DEFAULT):
  ```sh
  HAKMEM_FREE_TINY_DIRECT=1
  ```
  - **Status**: ✅ GO (Mixed 10-run: **+3.35% mean / +3.36% median**) → ✅ Promoted to `MIXED_TINYV3_C7_SAFE` preset default (opt-out possible)
  - **Effect**: The free wrapper validates the Tiny header only once and routes Tiny frees directly (avoiding duplicate and cold branches)
  - **Rollback**: `HAKMEM_FREE_TINY_DIRECT=0`
- Leave the v2 series alone (under C7_SAFE, Pool v2 / Tiny v2 are always OFF).
- Example experiments that touch FREE_POLICY/THP (not required on the current HEAD, and some combinations can be slightly negative):
  ```sh
  HAKMEM_FREE_POLICY=keep
  HAKMEM_DISABLE_BATCH=1
  HAKMEM_SS_MADVISE_STRICT=0
  # or
  HAKMEM_FREE_POLICY=batch
  HAKMEM_THP=auto
  ```
- Reference (current state of the v4 research box):
  - C7/C6 v4 + fast classify v4 ON (v3 OFF, segment OFF): **≈32.0–32.5M ops/s** (MIXED 1M/ws=400, Release).
  - C7-only v4 (C6 v1, v3 OFF): **≈33.0M ops/s**.
  - The v3 configuration is currently the fastest, so the standard profile fixes all v4-series features to OFF.

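The GO criteria quoted for Phase 3 D1 (mean >= +1.0% and median >= +0.0%) can be checked with a few lines of shell and `awk`. This is a sketch under assumptions: `gain_pct` and `decide` are hypothetical helpers, only the two-criteria GO test is modeled (statuses such as NEUTRAL are not), and because the inputs are the rounded throughput figures from the list above, the computed percentages can differ slightly from the reported ones.

```sh
# gain_pct BASE OPT: percentage change of OPT relative to BASE, two decimals.
gain_pct() {
  awk -v b="$1" -v o="$2" 'BEGIN { printf "%.2f\n", (o - b) / b * 100 }'
}

# decide MEAN_GAIN MEDIAN_GAIN: GO when mean >= +1.0% and median >= +0.0%.
decide() {
  awk -v m="$1" -v d="$2" 'BEGIN { print ((m >= 1.0 && d >= 0.0) ? "GO" : "NO-GO") }'
}

# D1 figures (M ops/s): baseline vs optimized, mean then median.
mean=$(gain_pct 46.30 47.32)
median=$(gain_pct 46.30 47.39)
echo "mean ${mean}% / median ${median}% -> $(decide "$mean" "$median")"
```

Feeding the D2 figures (`decide -1.44 -1.05`) through the same check gives NO-GO, matching the frozen status recorded for that knob.
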
---

## Profile 2: C6_HEAVY_LEGACY_POOLV1 (mid/smallmid C6-heavy benchmark)

### Purpose

- For C6-heavy mid/smallmid benchmarks.
- C6 is fixed to v1 (C6 v3/v4/ULTRA are research boxes only). Pool v2 is OFF. Pool v1 flatten is opt-in for benchmarking.

### ENV (v1 baseline + MID v3)

```sh
HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1
# Or specify directly:
HAKMEM_BENCH_MIN_SIZE=257
HAKMEM_BENCH_MAX_SIZE=768
HAKMEM_TINY_HEAP_PROFILE=C7_SAFE
HAKMEM_TINY_C6_HOT=0
HAKMEM_TINY_HOTHEAP_V2=0
HAKMEM_SMALL_HEAP_V3_ENABLED=1
HAKMEM_SMALL_HEAP_V3_CLASSES=0x80      # C7-only v3; C6 v3 is OFF
HAKMEM_POOL_V2_ENABLED=0
HAKMEM_POOL_V1_FLATTEN_ENABLED=0       # flatten is OFF at first
HAKMEM_MID_V3_ENABLED=1                # Phase MID-V3: 257-768B, C6 only
HAKMEM_MID_V3_CLASSES=0x40             # C6 only (+11% on C6-heavy)
HAKMEM_MID_V35_ENABLED=1               # Phase v11a-5: +8% on C6-heavy
HAKMEM_MID_V35_CLASSES=0x40            # C6 only (53.1M ops/s)
HAKMEM_TINY_ALLOC_ROUTE_SHAPE=1        # Phase 2 B3: alloc route dispatch shape (ADOPT)

# Phase MID-V35-HOTPATH-OPT-1: fastest set for C6-heavy (recommended ON)
# Features: header prefill + hot counts removal + C6 fast path (+7.3% combined)
HAKMEM_MID_V35_HEADER_PREFILL=1        # write headers ahead of time at the refill boundary
HAKMEM_MID_V35_HOT_COUNTS=0            # drop alloc_count (free_count/retire are kept)
HAKMEM_MID_V35_C6_FASTPATH=1           # C6-specialized fast path (constant slot_size=512)
```

- Only when trying the mid_desc_lookup TLS cache: add `HAKMEM_MID_DESC_CACHE_ENABLED=1` on top (default is OFF).

### Pool v1 flatten A/B (LEGACY only)

```sh
# LEGACY + flatten ON (research/bench only)
HAKMEM_TINY_HEAP_PROFILE=LEGACY
HAKMEM_POOL_V2_ENABLED=0
HAKMEM_POOL_V1_FLATTEN_ENABLED=1
HAKMEM_POOL_V1_FLATTEN_STATS=1
```

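An A/B comparison such as the flatten one above amounts to running the same command with one ENV variable set to 0 and then to 1. A minimal runner could look like the sketch below; `ab_run` is a hypothetical helper, and the stand-in command only echoes the variable because the bench binaries' output format is not described here.

```sh
# ab_run VAR RUNS CMD...: run CMD RUNS times with VAR=0, then RUNS times with
# VAR=1, prefixing every output line with the variant and run number.
ab_run() {
  var=$1; runs=$2; shift 2
  for val in 0 1; do
    i=1
    while [ "$i" -le "$runs" ]; do
      env "$var=$val" "$@" | sed "s/^/$var=$val run=$i: /"
      i=$((i + 1))
    done
  done
}

# Stand-in command so the sketch runs anywhere; swap in a real bench
# invocation (with the LEGACY settings exported) for actual measurements.
ab_run HAKMEM_POOL_V1_FLATTEN_ENABLED 2 \
  sh -c 'echo "flatten=$HAKMEM_POOL_V1_FLATTEN_ENABLED"'
```

Running all of one variant before the other keeps the sketch short; alternating the two variants between runs is a common refinement when machine state drifts during a session.
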
## Profile 2b: C6_HEAVY_LEGACY_POOLV1_FLATTEN (mid/smallmid LEGACY flatten, bench only)

### Purpose

- A bench-only set that opts in to both mid/smallmid flatten and header-only zeroing under the LEGACY profile.
- Do not use it with C7_SAFE (for stability, C7_SAFE keeps flatten always OFF).

### ENV (bench only)

```sh
HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1   # reuse the base preset
HAKMEM_POOL_V1_FLATTEN_ENABLED=1
HAKMEM_POOL_ZERO_MODE=header
HAKMEM_POOL_V1_FLATTEN_STATS=1
```

Note: LEGACY only. Do not use this preset with C7_SAFE / C7_ULTRA_BENCH.

- flatten is LEGACY only. Under C7_SAFE / C7_ULTRA_BENCH it is expected to be forced OFF on the code side.

### C6 research presets (must not affect the standard line)

- C6 v3 research (only when placing C6 on Tiny/SmallObject)
  ```sh
  HAKMEM_PROFILE=C6_SMALL_HEAP_V3_EXPERIMENT
  HAKMEM_BENCH_MIN_SIZE=257
  HAKMEM_BENCH_MAX_SIZE=768
  # bench_profile injects the following automatically (existing ENV is not overwritten):
  # HAKMEM_TINY_C6_HOT=1
  # HAKMEM_SMALL_HEAP_V3_ENABLED=1
  # HAKMEM_SMALL_HEAP_V3_CLASSES=0x40  # C6 only v3
  ```

- C6 v4 research (only when placing C6 on v4)
  ```sh
  HAKMEM_PROFILE=C6_SMALL_HEAP_V4_EXPERIMENT
  HAKMEM_BENCH_MIN_SIZE=257
  HAKMEM_BENCH_MAX_SIZE=768
  # bench_profile injects the following automatically (existing ENV is not overwritten):
  # HAKMEM_TINY_C6_HOT=1
  # HAKMEM_SMALL_HEAP_V3_ENABLED=0
  # HAKMEM_SMALL_HEAP_V4_ENABLED=1
  # HAKMEM_SMALL_HEAP_V4_CLASSES=0x40  # C6 only v4
  ```

Note: both are "research boxes". Do not use them for standard Mixed/C6-heavy evaluation; opt in explicitly only when regressions or segfaults are acceptable.

---

## Research Profile 0: C6_SMALL_HEAP_V5_STUB (SmallObject v5 C6-only route stub, Phase v5-1)

### Purpose

- Bench-only profile that puts C6 alone on the SmallObject v5 route (at the v5-1 stage the behavior is still the v1/pool fallback).
- The route is controlled by the ENV gate (HAKMEM_SMALL_HEAP_V5_ENABLED=1, HAKMEM_SMALL_HEAP_V5_CLASSES=0x40).
- Confirms through the front that v5 is wired up, and runs sanity tests (the implementation comes in v5-2 and later).
- Expected to show the same perf as the v1 baseline on Mixed/C6-heavy.

### ENV (v5 C6-only opt-in)

```sh
HAKMEM_BENCH_MIN_SIZE=257
HAKMEM_BENCH_MAX_SIZE=768
HAKMEM_TINY_HEAP_PROFILE=C7_SAFE
HAKMEM_TINY_C6_HOT=0
HAKMEM_TINY_HOTHEAP_V2=0
HAKMEM_SMALL_HEAP_V3_ENABLED=1
HAKMEM_SMALL_HEAP_V3_CLASSES=0x80      # C7-only v3 (C6 is OFF for v3)
HAKMEM_SMALL_HEAP_V5_ENABLED=1         # ★ v5 ON
HAKMEM_SMALL_HEAP_V5_CLASSES=0x40      # ★ only C6 (bit6) goes to the v5 route
HAKMEM_POOL_V2_ENABLED=0
HAKMEM_POOL_V1_FLATTEN_ENABLED=0
```

### Test commands

```sh
export HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1
export HAKMEM_SMALL_HEAP_V5_ENABLED=1
export HAKMEM_SMALL_HEAP_V5_CLASSES=0x40
./bench_random_mixed_hakmem 256 512 100      # C6 size range
./bench_mid_large_mt_hakmem 1 1000000 400 1  # pool baseline comparison
```

### 期待値
|
|
|
|
|
|
- Throughput ≈ v1 baseline(変化なし、v1 fallback の為)
|
|
|
|
|
|
- segv/assert なし
|
|
|
|
|
|
- route snapshot で C6 → TINY_ROUTE_SMALL_HEAP_V5 に分岐確認
|
|
|
|
|
|
|
|
|
|
|
|
### 注意
|
|
|
|
|
|
- v5-1 では中身は v1/pool fallback のまま(実装は v5-2)
|
|
|
|
|
|
- 本線には載せない、研究箱扱い
|
|
|
|
|
|
|
|
|
|
|
|
---

## Research Profile 1: C6_ONLY_SMALLOBJECT_V4 (SmallObject v4 C6-only trial run)

### Purpose
- Puts C6-only on the SmallObject v4 route as a trial run of page_meta_of().
- The behavior is still the pool v1 fallback, so perf matches the v1-pinned configuration.
- Phase v4-mid-1: a research bench that checks page_meta_of() does not crash and that no segv/assert occurs.

### ENV (v4 C6-only opt-in)
```sh
HAKMEM_BENCH_MIN_SIZE=257
HAKMEM_BENCH_MAX_SIZE=768
HAKMEM_TINY_HEAP_PROFILE=C7_SAFE
HAKMEM_TINY_C6_HOT=0
HAKMEM_TINY_HOTHEAP_V2=0
HAKMEM_SMALL_HEAP_V3_ENABLED=1
HAKMEM_SMALL_HEAP_V3_CLASSES=0x80   # C7-only v3 (C6 is OFF for v3)
HAKMEM_SMALL_HEAP_V4_ENABLED=1      # ★ v4 ON
HAKMEM_SMALL_HEAP_V4_CLASSES=0x40   # ★ route only C6 (bit 6) to v4
HAKMEM_POOL_V2_ENABLED=0
HAKMEM_POOL_V1_FLATTEN_ENABLED=0
```

### Test commands
```sh
export HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1
export HAKMEM_SMALL_HEAP_V4_ENABLED=1
export HAKMEM_SMALL_HEAP_V4_CLASSES=0x40
./bench_mid_large_mt_hakmem 1000000 400 1
```

### Expected results
- Throughput ≈ **28–29M ops/s** (same as the v1 baseline of ≈28M)
- No segv/assert
- small_segment_v4_page_meta_of(ptr) works (can be confirmed via debug output)

### Notes
- Actual alloc/free still goes through pool v1 (the v4 freelist is not used)
- To be replaced by the full implementation in Phase v4-mid-2

---

## Research Profile 2: C7_C6_V4_EXPERIMENT (C7+C6 v4 combined research)

### Purpose
- Reference preset for a later phase in which both C7 and C6 are routed through v4.
- Not used in the current phase (v4-mid-1 is C6-only).

### ENV (for reference)
```sh
HAKMEM_BENCH_MIN_SIZE=16
HAKMEM_BENCH_MAX_SIZE=1024
HAKMEM_SMALL_HEAP_V4_ENABLED=1
HAKMEM_SMALL_HEAP_V4_CLASSES=0xC0  # C6(0x40) + C7(0x80)
```
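
The `*_CLASSES` values used in these presets are bitmasks. Going by the comments above (C6 is bit 6, so `0x40`; C7 is `0x80`; `0xC0` is both), a small sketch like the following can decode a mask before a run. `decode_classes` is a hypothetical helper, and the assumption that class index N maps to bit N for all of C0 through C7 is extrapolated from those comments.

```sh
# Hypothetical helper: list the size classes selected by a *_CLASSES mask.
# Assumes class CN corresponds to bit N (C6 = 0x40, C7 = 0x80).
decode_classes() {
    mask=$(( $1 ))                 # accepts hex such as 0xC0
    out=""
    c=0
    while [ "$c" -le 7 ]; do
        if [ $(( ($mask >> $c) & 1 )) -eq 1 ]; then
            out="${out:+$out }C$c" # append with a separating space
        fi
        c=$(( $c + 1 ))
    done
    printf '%s\n' "$out"
}

decode_classes 0x40   # prints: C6
decode_classes 0xC0   # prints: C6 C7
decode_classes 0x60   # prints: C5 C6
```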
---

## Profile 3: DEBUG_TINY_FRONT_PERF (DEBUG profile for perf)

### Purpose
- For perf record of Tiny front v3 (including C7 v3).
- Measures with symbols: -O0 / -g / LTO OFF.

### Build example
```sh
make clean
CFLAGS='-O0 -g' USE_LTO=0 OPT_LEVEL=0 NATIVE=0 \
make bench_random_mixed_hakmem -j4
```

### ENV
```sh
HAKMEM_BENCH_MIN_SIZE=16
HAKMEM_BENCH_MAX_SIZE=1024
HAKMEM_TINY_HEAP_PROFILE=C7_SAFE
HAKMEM_TINY_C7_HOT=1
HAKMEM_TINY_HOTHEAP_V2=0
HAKMEM_SMALL_HEAP_V3_ENABLED=1
HAKMEM_SMALL_HEAP_V3_CLASSES=0x80
HAKMEM_POOL_V2_ENABLED=0
HAKMEM_TINY_FRONT_V3_ENABLED=1
HAKMEM_TINY_FRONT_V3_LUT_ENABLED=1
HAKMEM_TINY_PTR_FAST_CLASSIFY_ENABLED=1
```

### perf example
```sh
perf record -F 5000 --call-graph dwarf -e cycles:u \
-o perf.data.tiny_front_tf3 \
./bench_random_mixed_hakmem 1000000 400 1
```
- When measuring with perf, turn logging off as much as possible and base the ENV on MIXED_TINYV3_C7_SAFE.

---

## Research Profile 3: C5_C6_SMALL_HEAP_V7_LEARNER (SmallObject v7 C5/C6-only preset, Phase v10)

### Purpose
- **Dedicated to the C5/C6 size range (not used on the Mixed mainline)**
- Validates dynamic route optimization with SmallObject v7 + Learner.
- The Learner monitors the workload ratio and automatically switches the C5 route between v7 and MID_v3.
- Phase v10 freezes v7 as the "C5/C6-only preset", and it is managed entirely through this preset.

### Measured performance
- C5/C6 mixed (200-500B, 300K iter):
  - **v7 + Learner: 38.7M ops/s** (+127% vs Legacy)
  - Learner route switch: C5 ratio 28% < threshold 30% → automatic switch to MID_v3
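
The route switch reported above (a C5 ratio of 28% under a 30% threshold, so C5 moves to MID_v3) amounts to a simple threshold rule. The sketch below only illustrates that rule: `choose_c5_route` and its inputs are hypothetical, what exactly the Learner counts to obtain the C5 ratio is not stated here, and the behavior at exactly 30% is an assumption.

```sh
# Hypothetical illustration of the Learner's C5 decision, not HAKMEM code.
# $1 = operations observed for C5, $2 = total operations observed (non-zero).
choose_c5_route() {
    ratio=$(( $1 * 100 / $2 ))     # integer percentage of C5 traffic
    if [ "$ratio" -lt 30 ]; then   # threshold taken from the result above
        echo MID_V3
    else
        echo V7                    # assumed: at or above the threshold, stay on v7
    fi
}

choose_c5_route 28 100   # prints MID_V3
choose_c5_route 50 100   # prints V7
```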

### ENV (v7 C5/C6 opt-in)
```sh
HAKMEM_BENCH_MIN_SIZE=200
HAKMEM_BENCH_MAX_SIZE=500
HAKMEM_SMALL_HEAP_V7_ENABLED=1      # ★ v7 ON
HAKMEM_SMALL_HEAP_V7_CLASSES=0x60   # ★ C5(0x20) + C6(0x40)
# The Learner is ON by default when v7 is enabled (Phase v10)
# Set HAKMEM_SMALL_LEARNER_V7_ENABLED=0 to disable it individually
```

### Test commands
```sh
# C5/C6 mixed (50/50)
HAKMEM_BENCH_MIN_SIZE=200 HAKMEM_BENCH_MAX_SIZE=500 \
HAKMEM_SMALL_HEAP_V7_ENABLED=1 HAKMEM_SMALL_HEAP_V7_CLASSES=0x60 \
./bench_random_mixed_hakmem 300000 400 1

# C6 heavy (90% C6, 10% C5) → triggers the Learner route switch
HAKMEM_BENCH_MIN_SIZE=200 HAKMEM_BENCH_MAX_SIZE=500 \
HAKMEM_SMALL_HEAP_V7_ENABLED=1 HAKMEM_SMALL_HEAP_V7_CLASSES=0x60 \
./bench_allocators --allocator hakmem --scenario c6heavy --iterations 10
```

### Expected results
- Throughput: **38-39M ops/s** (C5/C6 mixed)
- Learner log: `[LEARNER_V7] C5 route switch: V7 → MID_V3` (C5 ratio < 30%)
- Route decision: `[POLICY_V7_INIT] C5: v7, C6: v7`

### Notes
- **v7 is OFF on the Mixed mainline (16-1040B)** (HAKMEM_SMALL_HEAP_V7_ENABLED=0)
- v7 is dedicated to C5/C6 and does not affect other size ranges
- The Learner only optimizes C5 dynamically (C6 stays fixed on v7)
- v3/v4/v5 were removed in Phase v10 (old ENV variables are ignored)

---

## Research Profile 4: C6_ULTRA_INTRUSIVE_EXPERIMENT_V12 (C6 ULTRA intrusive LIFO vs array magazine, Phase TLS-UNIFY-3)

**FROZEN - Research Only**: Phase TLS-UNIFY-3 validation complete.
Findings:
- C6-heavy (257-512B): +3.8% improvement ✅
- Mixed (16-1024B): -12% to -14% regression ❌ (policy overhead + TLS contention)
- Recommendation: Use only for C6-heavy workloads or research/debugging
- Default: OFF (MID v3/v3.5 faster for Mixed)

### Purpose
- **Phase TLS-UNIFY-3 validation**: compares the C6 ULTRA intrusive LIFO freelist against the array magazine.
- Routes C6 to the ULTRA path and A/B-tests only the LIFO representation inside TLS.
- ULTRA routing overrides MID v3/v3.5, so use it only in a research context.

### Measured performance
- C6-heavy (257-512B, 1M iter, ws=200, 5-run mean):
  - Baseline (C6=MID v3.5): 55.3M ops/s
  - ULTRA+array (intrusive OFF): 57.4M ops/s (+3.79% vs Baseline)
  - ULTRA+intrusive (intrusive ON): 54.5M ops/s (-1.44% vs Baseline, ✅ PASS)
  - IFL stats: push=265,890 / pop=265,815 / fallback=0 (perfect LIFO behavior)
- Mixed 16–1024B (standard mainline):
  - **ULTRA+intrusive shows a regression of about -14%.**
  - Root cause:
    - The 8 classes (C0–C7) compete within a single TLS, and the ULTRA TLS for C4/C5/C6/C7 (about 2KB) ends up being fought over.
    - ULTRA misses increase, and the Legacy fallback reaches about 24%.
  - **Conclusion**: do not use C6 ULTRA on the Mixed mainline (as designed in `MIXED_TINYV3_C7_SAFE`).

### ENV (ULTRA intrusive opt-in)
```sh
HAKMEM_BENCH_MIN_SIZE=257
HAKMEM_BENCH_MAX_SIZE=512
HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1   # ★ route C6 to the ULTRA path
HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1   # ★ enable the intrusive LIFO freelist
HAKMEM_FREE_PATH_STATS=1              # for collecting stats
```

### Test commands
```sh
# Baseline: C6=MID v3.5 (no ULTRA routing)
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
./bench_random_mixed_hakmem 1000000 200 1

# ULTRA+array: array magazine (intrusive OFF)
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=0 \
HAKMEM_FREE_PATH_STATS=1 ./bench_random_mixed_hakmem 1000000 200 1

# ULTRA+intrusive: intrusive LIFO freelist
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1 \
HAKMEM_FREE_PATH_STATS=1 ./bench_random_mixed_hakmem 1000000 200 1
```

### Expected results
- ULTRA+intrusive >= Baseline (or a small regression of < 5%)
- c6_ifl_fallback ≈ 0 (the intrusive LIFO works correctly)
- c6_ultra_free/alloc > 0 (the ULTRA path is active)
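
To apply the first criterion above to the recorded C6-heavy means (55.3M baseline, 57.4M ULTRA+array, 54.5M ULTRA+intrusive), the relative change can be recomputed with awk. This is a sketch with hypothetical helper names; because it starts from the rounded means, the last digit can differ slightly from the percentages quoted in the results.

```sh
# Relative change of NEW versus BASE, in percent with one decimal place.
pct_delta() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Succeeds when the regression is smaller than 5% (or there is a gain).
within_budget() {
    awk -v base="$1" -v new="$2" 'BEGIN { exit !((new - base) / base * 100 > -5) }'
}

pct_delta 55.3 57.4                     # ULTRA+array vs Baseline: prints 3.8
pct_delta 55.3 54.5                     # ULTRA+intrusive vs Baseline: prints -1.4
within_budget 55.3 54.5 && echo PASS    # prints PASS
```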

### Notes
- **WARNING**: ULTRA routing overrides MID v3/v3.5 - use only in research context.
- **Usage**: use as a research box dedicated to C6-heavy workloads (not recommended on the Mixed mainline; it regresses).
- Not for the mainline; treat as a research box.

---

### Common notes
- Stacking one-off ENV variables on top of a preset makes results hard to reproduce, so start from one of the presets above and always record what you changed.
- v2 variants (Pool v2 / Tiny v2) are opt-in per benchmark. Keep them at 0 unless needed.
- **Phase v10**: v3/v4/v5 have been removed (old ENV variables are ignored). For C5/C6 optimization, use the C5_C6_SMALL_HEAP_V7_LEARNER preset (v7 + Learner).