Commit Graph

117 Commits

Author SHA1 Message Date
fe70e3baf5 Phase MID-V35-HOTPATH-OPT-1 complete: +7.3% on C6-heavy
Step 0: Geometry SSOT
  - New: core/box/smallobject_mid_v35_geom_box.h (L1/L2 consistency)
  - Fix: C6 slots/page 102→128 in L2 (smallobject_cold_iface_mid_v3.c)
  - Applied: smallobject_mid_v35.c, smallobject_segment_mid_v3.c

Step 1-3: ENV gates for hotpath optimizations
  - New: core/box/mid_v35_hotpath_env_box.h
    * HAKMEM_MID_V35_HEADER_PREFILL (default 0)
    * HAKMEM_MID_V35_HOT_COUNTS (default 1)
    * HAKMEM_MID_V35_C6_FASTPATH (default 0)
  - Implementation: smallobject_mid_v35.c
    * Header prefill at refill boundary (Step 1)
    * Gated alloc_count++ in hot path (Step 2)
    * C6 specialized fast path with constant slot_size (Step 3)

A/B Results:
  C6-heavy (257–768B): 8.75M→9.39M ops/s (+7.3%, 5-run mean) 
  Mixed (16–1024B): 9.98M→9.96M ops/s (-0.2%, within noise) ✓

Decision: FROZEN - defaults OFF, C6-heavy推奨ON, Mixed現状維持
Documentation: ENV_PROFILE_PRESETS.md updated

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-12 19:19:25 +09:00
e95e61f0ff Phase POLICY-FAST-PATH-V2 complete + MID-V35-HOTPATH-OPT-1 design
## Phase POLICY-FAST-PATH-V2 (FROZEN)
- Implementation complete: free_policy_fast_v2_box.h + malloc_tiny_fast.h integration
- A/B Results:
  - Mixed (ws=400): -1.6% regression  (branch cost > skip benefit)
  - C6-heavy (ws=200): +5.4% improvement 
- Decision: Default OFF, FROZEN (ws<300 / C6-heavy research only)
- Learning: Large WS causes branch misprediction to dominate

## Phase 3-GRADUATE + ENV probe fix
- 64-probe retry for getenv() stability during bench_profile putenv()
- C6 ULTRA intrusive freelist: FROZEN (research box)

## Phase MID-V35-HOTPATH-OPT-1-DESIGN
- Design doc for next optimization target
- Target: MID v3.5 alloc/free hot path (C5-C6)
- Boxes: Stats Gate, TLS Layout, Boundary Check elimination
- Expected: +3-9% on Mixed mainline

Files:
- core/box/free_policy_fast_v2_box.h (new)
- core/box/free_path_stats_box.h/c (policy_fast_v2_skip counter)
- core/front/malloc_tiny_fast.h (fast-path integration)
- docs/analysis/MID_V35_HOTPATH_OPT_1_DESIGN.md (new)
- docs/analysis/PHASE_3_GRADUATE_*.md (new)
- CURRENT_TASK.md (phase status update)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-12 18:40:08 +09:00
1a8652a91a Phase TLS-UNIFY-3: C6 intrusive freelist implementation (完成)
Implement C6 ULTRA intrusive LIFO freelist with ENV gating:
- Single-linked LIFO using next pointer at USER+1 offset
- tiny_next_store/tiny_next_load for pointer access (single source of truth)
- Segment learning via ss_fast_lookup (per-class seg_base/seg_end)
- ENV gate: HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL (default OFF)
- Counters: c6_ifl_push/pop/fallback in FREE_PATH_STATS

Files:
- core/box/tiny_ultra_tls_box.h: Added c6_head field for intrusive LIFO
- core/box/tiny_ultra_tls_box.c: Pop/push with intrusive branching (case 6)
- core/box/tiny_c6_ultra_intrusive_env_box.h: ENV gate (new)
- core/box/tiny_c6_intrusive_freelist_box.h: L1 pure LIFO (new)
- core/tiny_debug_ring.h: C6_IFL events
- core/box/free_path_stats_box.h/c: c6_ifl_* counters

A/B Test Results (1M iterations, ws=200, 257-512B):
- ENV_OFF (array): 56.6 Mop/s avg
- ENV_ON (intrusive): 57.6 Mop/s avg (+1.8%, within noise)
- Counters verified: c6_ifl_push=265890, c6_ifl_pop=265815, fallback=0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-12 16:26:42 +09:00
bf83612b97 Phase v11a-4: Mixed本線ベンチマーク結果追加
Results:
- C6-heavy (257-512B): +5.1% (34.0M → 35.8M ops/s)
- Mixed 16-1024B:      +4.4% (38.6M → 40.3M ops/s)

Conclusion: Mixed本線で C6→MID v3.5 は採用候補。
予測(+1-3%)を上回る +4-5% の改善を確認。

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 07:17:52 +09:00
212739607a Phase v11a-3: MID v3.5 Activation (Build Complete)
Integrated MID v3.5 into active code path, making it available for C5/C6/C7 routing.

Key Changes:
- Policy Box: Added SMALL_ROUTE_MID_V35 with ENV gates (HAKMEM_MID_V35_ENABLED, HAKMEM_MID_V35_CLASSES)
- HotBox: Implemented small_mid_v35_alloc/free with TLS-cached page allocation
- Front Gate: Wired MID_V35 routing into malloc_tiny_fast.h (priority: ULTRA > MID_V35 > V7)
- Build: Added core/smallobject_mid_v35.o to all object lists

Architecture:
- Slot sizes: C5=384B, C6=512B, C7=1024B
- Page size: 64KB (170/128/64 slots)
- Integration: ColdIface v2 (refill/retire), Stats v2 (observation), Learner v2 (dormant)

Status: Build successful, ready for A/B benchmarking
Next: Performance validation (C6-heavy, C5+C6-only, Mixed benchmarks)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 06:52:14 +09:00
79674c9390 Phase v10: Remove legacy v3/v4/v5 implementations
Removal strategy: Deprecate routes by disabling ENV-based routing
- v3/v4/v5 enum types kept for binary compatibility
- small_heap_v3/v4/v5_enabled() always return 0
- small_heap_v3/v4/v5_class_enabled() always return 0
- Any v3/v4/v5 ENVs are silently ignored, routes to LEGACY

Changes:
- core/box/smallobject_hotbox_v3_env_box.h: stub functions
- core/box/smallobject_hotbox_v4_env_box.h: stub functions
- core/box/smallobject_v5_env_box.h: stub functions
- core/front/malloc_tiny_fast.h: remove alloc/free cases (20+ lines)

Benefits:
- Cleaner routing logic (v6/v7 only for SmallObject)
- 20+ lines deleted from hot path validation
- No behavioral change (routes were rarely used)

Performance: No regression expected (v3/v4/v5 already disabled by default)

Next: Set Learner v7 default ON, production testing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 06:09:12 +09:00
ea905b2ccb docs: HAKMEM v2 generation summary and Phase v7-4 completion
- Add HAKMEM_V2_GENERATION_SUMMARY.md: comprehensive overview of v2 generation
- Update CURRENT_TASK.md: 'v2 generation complete' section
- Update SMALLOBJECT_V7_DESIGN.md: Phase v7-4 completion notes + v7-5 candidates

v2 generation freeze: ULTRA (FROZEN) / MID_v3 (stable) / v7 (research, code freeze)
Next: HakORune / JoinIR priority, HAKMEM resumes at v7-5 (multi-class expansion)

Layer structure (L0-L3) established, Box Theory implementation patterns confirmed.
Design documents serve as maps for future v7 second chapter.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-12 04:00:55 +09:00
8143e8b797 Phase v7-4: Policy Box 導入 (L3 層の明確化とフロント芯の作り直し)
- SmallPolicyV7 Box: L3 Policy layer に配置、route 決定を一元化
- Route kind enum: SMALL_ROUTE_ULTRA / V7 / MID_V3 / LEGACY
- ENV priority (fixed): ULTRA > v7 > MID_v3 > LEGACY
- Frontend integration: v7 routing を Policy Box 経由に変更 (段階移行)
- Legacy compatibility: 既存の tiny_route_env_box.h は併用維持

Box Theory layer structure:
- L0: ULTRA (C4-C7, FROZEN)
- L1: SmallObject v7 (research box)
- L1': MID_v3 / LEGACY (fallback)
- L2: Segment / RegionId
- L3: Policy / Stats / Learner ← Policy Box added here

Frontend now follows clean "size→class→route_kind→switch" pattern.
ENV variables read once at Policy init, not scattered across frontend.

Future: ULTRA/MID_v3/LEGACY consolidation, Learner integration, flexible priority.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 03:50:58 +09:00
2bdf29a9ed Phase v7-3: TLS segment fast path optimization (RegionIdBox overhead reduction)
- SmallHeapCtx_v7: Add TLS segment hints (tls_seg_base/end) for fast bounds check
- free fast path: TLS segment hit → skip RegionIdBox binary search
- Simplified control flow: removed same-page cache (negligible benefit vs branch cost)
- Optimization: O(1) page_idx calculation via bit shift vs O(log N) RegionIdBox lookup

Performance improvement:
- Phase v7-2: 54.5M ops/s (-7.0% vs 58.6M legacy)
- Phase v7-3: 56.3M ops/s (-4.3% vs legacy)
- Overhead reduction: 38% (from -7.0% to -4.3%)

TLS segment hit path bypasses RegionIdBox for most C6 frees.
Remaining -4.3% overhead acceptable for modular v7 architecture.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 03:38:39 +09:00
0af409260d docs: Phase v7-2 results + Phase v7-3 design (TLS fast path + page_meta cache) 2025-12-12 03:13:13 +09:00
39a3c53dbc Phase v7-2: SmallObject v7 C6-only implementation with RegionIdBox integration
- SmallSegment_v7: 2MiB segment with TLS slot and free page stack
- ColdIface_v7: Page refill/retire between HotBox and SegmentBox
- HotBox_v7: Full C6-only alloc/free with header writing (HEADER_MAGIC|class_idx)
- Free path early-exit: Check v7 route BEFORE ss_fast_lookup (separate mmap segment)
- RegionIdBox: Register v7 segment for ptr->region lookup
- Benchmark: v7 ON ~54.5M ops/s (-7% overhead vs 58.6M legacy baseline)

v7 correctly balances alloc/free counts and page lifecycle.
RegionIdBox overhead identified as primary cost driver.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-12 03:12:28 +09:00
a8d0ab06fc MID-V3: Specialize to 257-768B, exclude C7 (ULTRA handles 1KB)
Role separation based on ultrathink analysis:
- MID v3: 257-768B専用 (C6 only, HAKMEM_MID_V3_CLASSES=0x40)
- C7 ULTRA: 769-1024B専用 (existing optimized path)

Changes:
- core/box/hak_alloc_api.inc.h: Remove C7 route, restrict to 257-768B
- core/box/mid_hotbox_v3_env_box.h: Update ENV comments
- docs/analysis/MID_POOL_V3_DESIGN.md: Add performance results & role
- CURRENT_TASK.md: Document MID-V3 completion & role separation

Verified:
- 257-768B with v3 ON: 1,199,526 ops/s (+1.7% vs baseline)
- 769-1024B with v3 ON: 1,181,254 ops/s (same as baseline, C7 excluded)
- C7 correctly routes to ULTRA instead of MID v3

Rationale: C7-only showed -11% regression, but C6/mixed showed +11-19%
improvement. Specializing to mid-range (257-768B) leverages v3 strengths
while keeping C7 on the proven ULTRA path.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 01:14:13 +09:00
fbaaf232ae Phase V6-HDR 総括: ドキュメント整備 + v6 凍結宣言
## ドキュメント更新内容

1. CURRENT_TASK.md
   - V6-HDR-0~4 を 1 ブロックに集約(実装完了)
   - 性能推移サマリー(-3.5%~-8.3% → ±0% に回復)
   - 最終ベンチマーク結果(C6-heavy + Mixed)
   - 凍結宣言: v6 は研究箱として OFF がデフォルト

2. AGENTS.md
   - 「研究箱ポリシー: SmallObject v6」セクション追加
   - v6 の現在地・凍結ルール・ハンドリング条件を明示
   - 「基本的な設計目標達成 → 今後リソースは mid/pool へ」の方針を宣言

## 成果総括

### Headerless 設計検証
- RegionIdBox (分類のみ) + TLS-scope cache で ±数% baseline 相当
- 複数フェーズでボトルネック除去(P0: double validation → P1: page_meta cache)
- 実装可能性が実証された

### 設計成果物(参考価値あり)
- RegionIdBox 薄層設計(ptr→(kind, page_meta) のみ)
- Same-page TLS cache(64KiB page level の最適化)
- TLS-scope segment registration(マルチセグメント対応時の基盤)

### 凍結方針
- デフォルト OFF(ENV opt-in)
- バグ修正・基盤伝播以外は触らない
- mid/pool v3 による C6-heavy 改善に注力

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-12 00:23:54 +09:00
969170c0fb Doc: Update CURRENT_TASK.md with Phase V6-HDR-3 completion summary 2025-12-11 23:52:39 +09:00
df216b6901 Phase V6-HDR-3: SmallSegmentV6 実割り当て & RegionIdBox Registration
実装内容:
1. SmallSegmentV6のmmap割り当ては既に v6-0で実装済み
2. small_heap_ctx_v6() で segment 取得時に region_id_register_v6_segment() 呼び出し
3. region_id_v6.c に TLS スコープのセグメント登録ロジック実装:
   - 4つの static __thread 変数でセグメント情報をキャッシュ
   - region_id_register_v6_segment(): セグメント base/end を TLS に記録
   - region_id_lookup_v6(): TLS segment の range check を最初に実行
   - TLS cache 更新で O(1) lookup 実現
4. region_id_v6_box.h に SmallSegmentV6 type include & function 宣言追加
5. small_v6_region_observe_validate() に region_id_observe_lookup() 呼び出し追加

効果:
- HeaderlessデザインでRegionIdBoxが正式にSMALL_V6分類を返せるように
- TLS-scopedな簡潔な登録メカニズム (マルチスレッド対応)
- Fast path: TLS segment range check -> page_meta lookup
- Fall back path: 従来の small_page_meta_v6_of() による動的検出
- Latency: O(1) TLS cache hit rate がv6 alloc/free の大部分をカバー

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 23:51:48 +09:00
406835feb3 Phase V6-HDR-0: C6-only headerless core 設計確定
- CURRENT_TASK.md: V6-HDR-0 セクション追加(4層 Box Theory)
- SMALLOBJECT_CORE_V6_DESIGN.md: V6-HDR-0 設計方針追加
- REGIONID_V6_DESIGN.md: RegionIdBox 設計書新規作成
- smallobject_core_v6_box.h: SmallTlsLaneV6 型+TLS API 追加
- smallobject_core_v6.c: OBSERVE モード追加
- region_id_v6_box.h: RegionIdBox 型スケルトン
- page_stats_v6_box.h: PageStatsV6 箱スケルトン
- AGENTS.md: v6 研究箱ルールセクション追加

サニティベンチ: Mixed 42.1M, C6-heavy 25.0M(挙動不変確認)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 23:07:26 +09:00
2d684ffd25 Phase SO-BACKEND-OPT-1: v3 backend 分解&Tiny/ULTRA 完成世代宣言
=== 実装内容 ===

1. v3 backend 詳細計測
   - ENV: HAKMEM_SO_V3_STATS で alloc/free パス内訳計測
   - 追加 stats: alloc_current_hit, alloc_partial_hit, free_current, free_partial, free_retire
   - so_alloc_fast / so_free_fast に埋め込み
   - デストラクタで [ALLOC_DETAIL] / [FREE_DETAIL] 出力

2. v3 backend ボトルネック分析完了
   - C7-only: alloc_current_hit=99.99%, alloc_refill=0.9%, free_retire=0.1%, page_of_fail=0
   - Mixed: alloc_current_hit=100%, alloc_refill=0.85%, free_retire=0.07%, page_of_fail=0
   - 結論: v3 ロジック部分(ページ選択・retire)は完全最適化済み
   - 残り 5% overhead は内部コスト(header write, memcpy, 分岐)

3. Tiny/ULTRA 層「完成世代」宣言
   - 総括ドキュメント作成: docs/analysis/PERF_EXEC_SUMMARY_ULTRA_PHASE_20251211.md
   - CURRENT_TASK.md に Phase ULTRA 総括セクション追加
   - AGENTS.md に Tiny/ULTRA 完成世代宣言追加
   - 最終成果: Mixed 16–1024B = 43.9M ops/s (baseline 30.6M → +43.5%)

=== ボトルネック地図 ===

| 層 | 関数 | overhead |
|-----|------|----------|
| Front | malloc/free dispatcher | ~40–45% |
| ULTRA | C4–C7 alloc/free/refill | ~12% |
| v3 backend | so_alloc/so_free | ~5% |
| mid/pool | hak_super_lookup | 3–5% |

=== フェーズ履歴(Phase ULTRA cycle) ===

- Phase PERF-ULTRA-FREE-OPT-1: C4–C7 ULTRA統合 → +9.3%
- Phase REFACTOR: Code quality (60行削減)
- Phase PERF-ULTRA-REFILL-OPT-1a/1b: C7 ULTRA refill最適化 → +11.1%
- Phase SO-BACKEND-OPT-1: v3 backend分解 → 設計限界確認

=== 次フェーズ(独立ライン) ===

1. Phase SO-BACKEND-OPT-2: v3 header write削減 (1-2%)
2. Headerless/v6系: out-of-band header (1-2%)
3. mid/pool v3新設計: C6-heavy 10M → 20–25M

本フェーズでTiny/ULTRA層は「完成世代」として基盤固定。
今後の大きい変更はHeaderless/mid系の独立ラインで検討。

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 22:45:14 +09:00
022ba56033 Document Phase PERF-ULTRA-REFILL-OPT-1a/1b completion
実装完了・成功:
- Phase 1a: Page size macro化(division → bit shift)
- Phase 1b: Segment learning移動(free初回削除)
- 合算: +11.1% throughput improvement (39.5M → 43.9M ops/s)

このフェーズで C7 ULTRA refill パス最適化は完了。
次のボトルネック: so_alloc/so_free (v3 backend, 合計 ~5%)
新規ボトルネック発見時は Option A (v3 最適化) を推奨。
2025-12-11 22:16:27 +09:00
9fb2240319 Fix: Add alloc_gate_stats_box.o to BENCH_HAKMEM_OBJS_BASE; Document PERF-ULTRA-REBASE-4 findings
Phase PERF-ULTRA-REBASE-4 confirmed:
- dispatcher (25.48%) and alloc gate (21.13%) already heavily optimized via snapshot
- New bottleneck: C7 ULTRA refill path (tiny_c7_ultra_page_of at 1.78%)
- Recommendation: Next optimize C7 ULTRA refill for +1-2% overall gain
2025-12-11 21:36:58 +09:00
0f15adae4e Phase ALLOC-GATE-OPT-1: tiny_alloc_gate_fast 統計計測
- AllocGateStats 構造体追加(size2class/route/env/class分布)
- malloc_tiny_fast にカウンタ埋め込み
- ENV: HAKMEM_ALLOC_GATE_STATS (default 0)
- 挙動変更なし(計測のみ)

計測結果:
- Mixed: total=542k, size2class=0, route_calls=0, env_checks=275k, C4-C7=95.2%
  - size_to_class/route_for_class は完全削減済み(LUT 効果)
  - C4-C7 が 95% → ULTRA fast path が有効
  - env_checks ≈ c7_calls → C7 ULTRA の ENV gate が毎回呼ばれる
- C6-heavy: total=11 → malloc_tiny_fast はほぼ通らない(mid/pool 主体)

結論:
- alloc gate は既に十分最適化済み(LUT + ULTRA で削減済み)
- さらなる最適化余地は小さい(env_checks は軽量化済み、数%以下の効果)
- 次フェーズでは free dispatcher (29%) や C7 ULTRA refill (7%) など、他のボトルネックを狙う

詳細: docs/analysis/ALLOC_GATE_ANALYSIS.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 21:32:40 +09:00
118c0e4857 Phase FREE-DISPATCHER-OPT-1: free dispatcher 統計計測
**目的**: free dispatcher(29%)の内訳を細分化して計測。

**実装内容**:
- FreeDispatchStats 構造体追加(ENV: HAKMEM_FREE_DISPATCH_STATS, default 0)
- カウンタ: total_calls / domain (tiny/mid/large) / route (ultra/legacy/pool/v6) / env_checks / route_for_class_calls
- hak_free_at / tiny_route_for_class / tiny_route_snapshot_init にカウンタ埋め込み
- 挙動変更なし(計測のみ、ENV OFF 時は overhead ゼロ)

**計測結果**:

Mixed 16-1024B (1M iter, ws=400):
- total=8,081, route_calls=267,967, env_checks=9
- BENCH_FAST_FRONT により大半は早期リターン
- route_for_class は主に alloc 側で呼ばれる(267k calls vs 8k frees)
- ENV check は初期化時の 9回のみ(snapshot 効果)

C6-heavy (257-768B, 1M iter, ws=400):
- total=500,099, route_calls=1,034, env_checks=9
- fg_classify_domain に到達する free が多い
- route_for_class 呼び出しは極小(snapshot 効果)

**結論**:
- ENV check は既に十分最適化されている(初期化時のみ)
- route_for_class は alloc 側での呼び出しが主で、free 側は snapshot で O(1)
- 次フェーズ(OPT-2)では別のアプローチを検討

**ドキュメント追加**:
- docs/analysis/FREE_DISPATCHER_ANALYSIS.md(新規)
- CURRENT_TASK.md に Phase FREE-DISPATCHER-OPT-1 セクション追加

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-11 21:21:40 +09:00
11dc9d390a Phase PERF-ULTRA-FREE-OPT-1: C4-C7 ULTRA free 薄型化
- C4-C7 ULTRA free を pure TLS push + cold segment learning に統一
- C7 ULTRA free を同じパターンに整列(likely/unlikely + FREE_PATH_STAT_INC)
- C4/C5/C6 ULTRA は既に最適化済み(統一 legacy fallback 経由)
- base/user 変換を tiny_ptr_convert_box.h マクロで統一

実測値 (Mixed 16-1024B, 1M iter, ws=400):
- Baseline (C7 のみ): 42.0M ops/s, legacy=266,943 (49.2%)
- Optimized (C4-C7): 46.5M ops/s, legacy=26,025 (4.8%)
- 改善: +9.3% (+4M ops/s)

FREE_PATH_STATS:
- C6 ULTRA: 137,319 free + 137,241 alloc (100% カバー)
- C5 ULTRA: 68,871 free + 68,827 alloc (100% カバー)
- C4 ULTRA: 34,727 free + 34,696 alloc (100% カバー)
- Legacy: 266,943 → 26,025 (−90.2%, C2/C3 のみ)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 20:49:39 +09:00
753909fa4d Phase PERF-ULTRA-ALLOC-OPT-1 (改訂版): C7 ULTRA 内部最適化
設計判断:
- 寄生型 C7 ULTRA_FREE_BOX を削除(設計的に不整合)
- C7 ULTRA は C4/C5/C6 と異なり専用 segment + TLS を持つ独立サブシステム
- tiny_c7_ultra.c 内部で直接最適化する方針に統一

実装内容:
1. 寄生型パスの削除
   - core/box/tiny_c7_ultra_free_box.{h,c} 削除
   - core/box/tiny_c7_ultra_free_env_box.h 削除
   - Makefile から tiny_c7_ultra_free_box.o 削除
   - malloc_tiny_fast.h を元の tiny_c7_ultra_alloc/free 呼び出しに戻す

2. TLS 構造の最適化 (tiny_c7_ultra_box.h)
   - count を struct 先頭に移動(L1 cache locality 向上)
   - 配列ベース TLS キャッシュに変更(cap=128, C6 同等)
   - freelist: linked-list → BASE pointer 配列
   - cold フィールド(seg_base/seg_end/meta)を後方配置

3. alloc の純 TLS pop 化 (tiny_c7_ultra.c)
   - hot path: 1 分岐のみ(count > 0)
   - TLS access は 1 回のみ(ctx に cache)
   - ENV check を呼び出し側に移動
   - segment/page_meta アクセスは refill 時(cold path)のみ

4. free の UF-3 segment learning 維持
   - 最初の free で segment 学習(seg_base/seg_end を TLS に記憶)
   - 以降は範囲チェック → TLS push
   - 範囲外は v3 free にフォールバック

実測値 (Mixed 16-1024B, 1M iter, ws=400):
- tiny_c7_ultra_alloc self%: 7.66% (維持 - 既に最適化済み)
- tiny_c7_ultra_free self%: 3.50%
- Throughput: 43.5M ops/s

評価: 部分達成
- 設計一貫性の回復: 成功
- Array-based TLS cache 移行: 成功
- pure TLS pop パターン統一: 成功
- perf self% 削減(7.66% → 5-6%): 未達成(既に最適)

C7 ULTRA は独立サブシステムとして tiny_c7_ultra.c に閉じる設計を維持。
次は refill path 最適化または C4-C7 ULTRA free 群の軽量化へ。

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-11 20:39:46 +09:00
b381219a68 Phase PERF-ULTRA-REBASE-1 計測完了 + PERF-ULTRA-ALLOC-OPT-1 計画策定
## Phase PERF-ULTRA-REBASE-1 実施
- C4-C7 ULTRA 全て ON 状態での CPU ホットパス計測
- Mixed 16-1024B, 10M cycles での perf 分析
- **発見**: C7 ULTRA alloc が新しい最大ボトルネック(7.66% self%)

## ホットパス分析結果
| 順位 | 関数 | self% |
|------|------|-------|
| #1 | C7 ULTRA alloc | **7.66%** ← 最大ボトルネック |
| #2 | C4-C7 ULTRA free群 | 5.41% |
| #3 | gate/front前段 | 2.51% ← 既に十分薄い |
| #4 | header | < 0.17% ← ULTRA で削減済み |

## 戦略転換(重要)
これまで: 新しい箱や世代(v4/v5/v6)を追加
→ 今後: 既に当たりが出ている ULTRA 内部を細かく削る

理由:
- v6/v5 拡張は -12〜33% の大幅回帰
- gate/front や header はもう改善の余地が少ない
- C7 ULTRA alloc の 7.66% → 5-6% 削減で全体効果 2-3%

## Phase PERF-ULTRA-ALLOC-OPT-1 計画策定
- ターゲット: tiny_c7_ultra_alloc() の hot path を直線化
- 施策:
  1. TLS ヒットパスの直線化(env check/snapshot 削除)
  2. TLS freelist レイアウト最適化(L1 キャッシュ親和性)
  3. segment/page_meta アクセスの確認(slow path 確認)
- 計測: C7-only + Mixed での A/B テスト
- 期待: 7.66% → 5-6%、全体で +2-3M ops/s

## ドキュメント更新
- CURRENT_TASK.md: PERF-ULTRA-REBASE-1 結果と ALLOC-OPT-1 計画を追記
- TINY_C7_ULTRA_DESIGN.md: Phase PERF-ULTRA-ALLOC-OPT-1 セクション追加
- NEW: docs/analysis/PERF_ULTRA_ALLOC_OPT_1_PLAN.md - 詳細な実装計画書

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 20:05:09 +09:00
224cc8d1ca Docs: Phase FREE-LEGACY-OPT-4-4 completion summary + design notes
Phase 4-4 で C6 ULTRA free+alloc 統合(寄生型 TLS キャッシュ)が完了。
- Mixed 16-1024B: 40.2M → 42.3M ops/s (+5.2%)
- C6 legacy 完全排除: 137,319 → 0 (-100%)
- Legacy 半減: 266,942 → 129,623 (-51.4%)
- 次ターゲット: C5 (残存 Legacy の 53.1%)

設計メモを FREE_LEGACY_PATH_ANALYSIS.md に追記:
- Free-only TLS が失敗する理由
- Free+alloc 統合で成功する理由
- 寄生型 TLS キャッシュの設計原理
- C5/C4 への一般化可能性

次フェーズ候補を CURRENT_TASK.md に追記:
- 選択肢 1: C5 ULTRA への展開
- 選択肢 2: Tiny Alloc Hotpath 最適化
- 判断ポイント: stats 確認後に決定

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 19:11:44 +09:00
c60199182e Phase v6-1/2/3/4: SmallObject Core v6 - C6-only implementation + refactor
Phase v6-1: C6-only route stub (v1/pool fallback)
Phase v6-2: Segment v6 + ColdIface v6 + Core v6 HotPath implementation
  - 2MiB segment / 64KiB page allocation
  - O(1) ptr→page_meta lookup with segment masking
  - C6-heavy A/B: SEGV-free but -44% performance (15.3M ops/s)

Phase v6-3: Thin-layer optimization (TLS ownership check + batch header + refill batching)
  - TLS ownership fast-path skip page_meta for 90%+ of frees
  - Batch header writes during refill (32 allocs = 1 header write)
  - TLS batch refill (1/32 refill frequency)
  - C6-heavy A/B: v6-2 15.3M → v6-3 27.1M ops/s (±0% vs baseline) 

Phase v6-4: Mixed hang fix (segment metadata lookup correction)
  - Root cause: metadata lookup was reading mmap region instead of TLS slot
  - Fix: use TLS slot descriptor with in_use validation
  - Mixed health: 5M iterations SEGV-free, 35.8M ops/s 

Phase v6-refactor: Code quality improvements (macro unification + inline + docs)
  - Add SMALL_V6_* prefix macros (header, pointer conversion, page index)
  - Extract inline validation functions (small_page_v6_valid, small_ptr_in_segment_v6)
  - Doxygen-style comments for all public functions
  - Result: 0 compiler warnings, maintained +1.2% performance

Files:
- core/box/smallobject_core_v6_box.h (new, type & API definitions)
- core/box/smallobject_cold_iface_v6.h (new, cold iface API)
- core/box/smallsegment_v6_box.h (new, segment type definitions)
- core/smallobject_core_v6.c (new, C6 alloc/free implementation)
- core/smallobject_cold_iface_v6.c (new, refill/retire logic)
- core/smallsegment_v6.c (new, segment allocator)
- docs/analysis/SMALLOBJECT_CORE_V6_DESIGN.md (new, design document)
- core/box/tiny_route_env_box.h (modified, v6 route added)
- core/front/malloc_tiny_fast.h (modified, v6 case in route switch)
- Makefile (modified, v6 objects added)
- CURRENT_TASK.md (modified, v6 status added)

Status:
- C6-heavy: v6 OFF 27.1M → v6-3 ON 27.1M ops/s (±0%) 
- Mixed: v6 ON 35.8M ops/s (C6-only, other classes via v1) 
- Build: 0 warnings, fully documented 

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 15:29:59 +09:00
2f5d53fd6d Phase v5-5: TLS cache for C6 v5
Add 1-slot TLS cache to C6 v5 to reduce page_meta access overhead.

Implementation:
- Add HAKMEM_SMALL_HEAP_V5_TLS_CACHE_ENABLED ENV (default: 0)
- SmallHeapCtxV5: add c6_cached_block field for TLS cache
- alloc: cache hit bypasses page_meta lookup, returns immediately
- free: empty cache stores block, full cache evicts old block first

Results (1M iter, ws=400, HEADER_MODE=full):
- C6-heavy (257-768B): 35.53M → 37.02M ops/s (+4.2%)
- Mixed 16-1024B: 38.04M → 37.93M ops/s (-0.3%, noise)

Known issue: header_mode=light has infinite loop bug
(freelist pointer/header collision). Full mode only for now.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 07:40:22 +09:00
e0fb7d550a Phase v5-2: SmallObject v5 C6-only 本実装 (WIP - header fix)
本実装修正:
- tiny_region_id_write_header() を追加: USER pointer を正しく返す
- TLS slot からの segment 探索 (page_meta_of)
- Page-level allocation で segment 再利用
- 2MiB alignment 保証 (4MiB 確保 + alignment)
- free パスの route 修正 (v4 から v5 への fallthrough 削除)

動作確認:
- SEGV 消失: alloc/free 基本動作 OK
- 性能: ~18-20M ops/s (baseline 43-47M の約 40-45%)
- 回帰原因: TLS slot 線形探索 O(n)、find_page O(n)

残タスク:
- O(1) segment lookup 最適化 (hash または array 直接参照)
- find_page 除去 (segment lookup 成功時)
- partial_count/list 管理の最適化

ENV デフォルト OFF なので本線影響なし。

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 04:14:51 +09:00
9c24bebf08 Phase v5-1: SmallObject v5 C6-only route stub 接続
- tiny_route_env_box.h: TINY_ROUTE_SMALL_HEAP_V5 enum 追加、route snapshot で C6→v5 分岐
- malloc_tiny_fast.h: alloc/free switch に v5 case 追加(v1/pool fallback)
- smallobject_hotbox_v5.c: stub 実装(alloc は NULL 返却、free は no-op)
- smallobject_hotbox_v5_box.h: 関数 signature に ctx パラメータ追加
- Makefile: core/smallobject_hotbox_v5.o をリンクリストに追加
- ENV_PROFILE_PRESETS.md: v5-1 プリセット追記
- CURRENT_TASK.md: Phase v5-1 完了記録

**特性**:
- ENV: HAKMEM_SMALL_HEAP_V5_ENABLED=1 / HAKMEM_SMALL_HEAP_V5_CLASSES=0x40 で opt-in
- テスト結果: C6-heavy (v5 OFF 15.5M → v5 ON 16.4M ops/s, 正常), Mixed 47.2M ops/s, SEGV/assert なし
- 挙動は v1/pool fallback と同じ(実装は v5-2)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 03:25:37 +09:00
dedfea27d5 Phase v5-0 refactor: ENV統一・マクロ化・構造体最適化
- ENV initialization を sentinel パターンで統一
  - ENV_UNINIT/ENABLED/DISABLED 定数追加
  - __builtin_expect で初期化チェックを最適化
  - small_heap_v5_enabled/class_mask を統一パターンに変更

- ポインタマクロ化(O(1) segment/page 計算)
  - SMALL_SEGMENT_V5_BASE_FROM_PTR: ptr から segment base を mask で計算
  - SMALL_SEGMENT_V5_PAGE_IDX: segment 内の page_idx を shift で計算
  - SMALL_SEGMENT_V5_PAGE_META: page_meta への O(1) access(bounds check付き)
  - SMALL_SEGMENT_V5_VALIDATE_MAGIC: magic 検証
  - SMALL_SEGMENT_V5_VALIDATE_PTR: Fail-Fast validation pipeline

- SmallClassHeapV5 に partial_count 追加
  - partial ページリストのカウンタを追加(refill/retire 最適化用)

- SmallPageMetaV5 の field 再配置(L1 cache 最適化)
  - hot fields (free_list, used, capacity) を先頭に集約
  - metadata (class_idx, flags, page_idx, segment) を後方配置
  - total 24B、offset コメント追加

- route priority ENV 追加
  - HAKMEM_ROUTE_PRIORITY={v4|v5|auto}(default: v4)
  - enum small_route_priority 定義
  - small_route_priority() 関数追加

- segment_size override ENV 追加
  - HAKMEM_SMALL_HEAP_V5_SEGMENT_SIZE(default: 2MiB)
  - power of 2 & >= 64KiB validation

挙動: 完全不変(v5 route は呼ばれない、ENV default OFF)
テスト: Mixed 16–1024B で 43.0–43.8M ops/s(変化なし)、SEGV/assert なし

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 03:19:18 +09:00
83d4096fbc Phase v5-0: SmallObject v5 の設計・型/IF/ENV スケルトン追加
設計ドキュメント:
- docs/analysis/SMALLOBJECT_V5_DESIGN.md: v5 アーキテクチャ全体設計

新規ファイル (v5 スケルトン):
- core/box/smallobject_hotbox_v5_box.h: HotBox v5 型定義
- core/box/smallsegment_v5_box.h: Segment v5 型定義
- core/box/smallobject_cold_iface_v5.h: ColdIface v5 IF宣言
- core/box/smallobject_v5_env_box.h: ENV ゲート
- core/smallobject_hotbox_v5.c: 実装 stub (完全 fallback)

特徴:
 型とインターフェースのみ定義(v5-0 は機能なし)
 ENV デフォルト OFF(HAKMEM_SMALL_HEAP_V5_ENABLED=0)
 挙動完全不変(Mixed/C6 benchmark 確認済み)
 v4 との区別を明確化 (*_v5 suffix)
 v5-1 (stub) → v5-2 (本実装) → v5-3 (Mixed) への段階実装準備完了

フェーズ:
- v5-0: 型定義のみ(現在)
- v5-1: C6-only stub route 追加
- v5-2: Segment/HotBox 本実装 (C6-only bench A/B)
- v5-3: Mixed での段階昇格 (C6 → C5 → ...)

目標性能: Mixed 16–1024B で 50–60M ops/s (mimalloc の 5割)

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 03:09:57 +09:00
bdfa32d869 Phase v4-mid-SEGV: C6 v4 を SmallSegment 専用に切り替え、TinyHeap SEGV を解決
問題: C6 v4 が TinyHeap のページを共有することで iters >= 800k で freelist 破壊 → SEGV 発生

修正内容:
- c6_segment_alloc_page_direct(): C6 専用ページ割当 (SmallSegment v4 経由, TinyHeap 非共有)
- c6_segment_release_page_direct(): C6 専用ページ返却
- cold_refill_page_v4() で C6 を分岐: SmallSegment 直接使用
- cold_retire_page_v4() で C6 を分岐: SmallSegment に直接返却
- fastlist state reset 処理追加 (L392-399)

結果:
 iters=1M, ws <= 390 で SEGV 消失
 C6-only: v4 OFF ~47M → v4 ON ~43M ops/s (−8.5%, 安定)
 Mixed: v4 ON で SEGV なし (小幅回帰許容)

方針: C6 v4 は研究箱として安定化完了。本線には載せない (既存 mid/pool v1 使用)。

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 02:39:32 +09:00
e486dd2c55 Phase v4-mid-6: Implement C6 v4 TLS Fastlist (Gated)
- Implemented TLS fastlist logic for C6 in smallobject_hotbox_v4.c (alloc/free).
- Added SmallC6FastState struct and g_small_c6_fast TLS variable.
- Gated the fastlist logic with HAKMEM_SMALL_HEAP_V4_FASTLIST (default OFF) due to observed instability in mixed workloads.
- Fixed a memory leak in small_heap_free_fast_v4 fallback path by calling hak_pool_free.
- Updated CURRENT_TASK.md with phase report.
2025-12-11 01:44:08 +09:00
dd974b49c5 Phase v4-mid-2, v4-mid-3, v4-mid-5: SmallObject HotBox v4 implementation and docs update
Implementation:
- SmallObject HotBox v4 (core/smallobject_hotbox_v4.c) now fully implements C6-only allocations and frees, including current/partial management and freelist operations.
- Cold Iface (tiny_heap based) for page refill/retire is integrated.
- Stats instrumentation (v4-mid-5) added to small_heap_alloc_fast_v4 and small_heap_free_fast_v4, with a new header file core/box/smallobject_hotbox_v4_stats_box.h and atexit dump function.

Updates:
- CURRENT_TASK.md has been condensed and updated with summaries of Phase v4-mid-2 (C6-only v4), Phase v4-mid-3 (C5-only v4 pilot), and the stats implementation (v4-mid-5).
- docs/analysis/SMALLOBJECT_V4_BOX_DESIGN.md updated with A/B results and conclusions for C6-only and C5-only v4 implementations.
- The previous CURRENT_TASK.md content has been archived to CURRENT_TASK_ARCHIVE_20251210.md.
2025-12-11 01:01:15 +09:00
3b4449d773 Phase v4-mid-1: C6-only v4 route + page_meta_of() Fail-Fast validation
Implementation:
- SMALL_SEGMENT_V4_* constants (SIZE=2MiB, PAGE_SIZE=64KiB, MAGIC=0xDEADBEEF)
- smallsegment_v4_page_meta_of(): O(1) mask+shift lookup with magic validation
  - Computes segment base: addr & ~(2MiB - 1)
  - Verifies SmallSegment magic number
  - Calculates page_idx: (addr - seg_base) >> PAGE_SHIFT (16)
  - Returns non-NULL sentinel for now (full page_meta[] in Phase v4-mid-2)

Stubs for C6-only phase:
- small_heap_alloc_fast_v4(): C6 returns NULL → pool v1 fallback
- small_heap_free_fast_v4(): C6 calls page_meta_of() for Fail-Fast, then pool v1 fallback

Documentation:
- ENV_PROFILE_PRESETS.md: Add "C6_ONLY_SMALLOBJECT_V4" research profile
  - HAKMEM_SMALL_HEAP_V4_ENABLED=1, HAKMEM_SMALL_HEAP_V4_CLASSES=0x40
  - Expected: Throughput ≈ 28–29M ops/s (same as v1)

Build:
- ビルド成功(警告のみ)
- Backward compatible, alloc/free stubs fall back to pool v1

Sanity:
- C6-heavy with v4 opt-in: segv/assert なし
- page_meta_of() lookup working correctly
- Performance unchanged (expected for stub phase)

Status:
- C6-only v4 route now available via ENV opt-in
- Phase v4-mid-2: SmallHeapCtx v4 full implementation with A/B

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-10 23:37:45 +09:00
52c65da783 Phase v4-mid-0: Small-object v4 型・IF 足場(箱化モジュール化)
- SmallHeapCtx/SmallPageMeta/SmallClassHeap typedef alias 追加
- SmallSegment struct (base/num_pages/owner_tid/magic) を smallsegment_v4_box.h に定義
- SmallColdIface_v4 direct function prototypes (refill/retire/remote_push/drain)
- smallobject_hotbox_v4.c の internal/public API 分離(small_segment_v4_internal)
- direct function stubs 実装(SmallColdIfaceV4 delegate 形式)
- ENV OFF デフォルト(ENABLED=0/CLASSES=0)で既存挙動 100% 不変
- ビルド成功・sanity 確認(mixed/C6-heavy、segv/assert なし)
- CURRENT_TASK.md に Phase v4-mid-0 記録

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-10 23:23:07 +09:00
2a13478dc7 Optimize C6 heavy and C7 ultra performance analysis with refined design refinements
- Update environment profile presets and visibility analysis
- Enhance small object and tiny segment v4 box implementations
- Refine C7 ultra and C6 heavy allocation strategies
- Add comprehensive performance metrics and design documentation

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-10 22:57:26 +09:00
9460785bd6 Enable C7 ULTRA segment path by default 2025-12-10 22:25:24 +09:00
bbb55b018a Add C7 ULTRA segment skeleton and TLS freelist 2025-12-10 22:19:32 +09:00
f2ce7256cd Add v4 C7/C6 fast classify and small-segment v4 scaffolding 2025-12-10 19:14:38 +09:00
3261025995 Phase v4-4: pilot C6 v4 route with opt-in gate 2025-12-10 18:18:05 +09:00
cbd33511eb Phase v4-3.1: reuse C7 v4 pages and record prep calls 2025-12-10 17:58:42 +09:00
677030d699 Document new Mixed baseline and C7 header dedup A/B 2025-12-10 14:38:49 +09:00
d576116484 Document current Mixed baseline throughput and ENV profile 2025-12-10 14:12:13 +09:00
406a2f4d26 Incremental improvements: mid_desc cache, pool hotpath optimization, and doc updates
**Changes:**
- core/box/pool_api.inc.h: Code organization and micro-optimizations
- CURRENT_TASK.md: Updated Phase MD1 (mid_desc TLS cache: +3.2% for C6-heavy)
- docs/analysis files: Various analysis and documentation updates
- AGENTS.md: Agent role clarifications
- TINY_FRONT_V3_FLATTENING_GUIDE.md: Flattening strategy documentation

**Verification:**
- random_mixed_hakmem: 44.8M ops/s (1M iterations, 400 working set)
- No segfaults or assertions across all benchmark variants
- Stable performance across multiple runs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-10 14:00:57 +09:00
0e5a2634bc Phase 82 Final: Documentation of mid_desc race fix and comprehensive A/B results
**Implementation Summary:**
- Early `mid_desc_init_once()` in `hak_pool_init_impl()` prevents uninitialized mutex crash
- Eliminates race condition that caused C7_SAFE + flatten crashes
- Enables safe operation across all profiles (C7_SAFE, LEGACY)

**Benchmark Results (C6_HEAVY_LEGACY_POOLV1, Release):**
- Phase 1 (Baseline): 3.03M / 14.86M / 26.67M ops/s (10K/100K/1M)
- Phase 2 (Zero Mode): +5.0% / -2.7% / -0.2%
- Phase 3 (Flatten): +3.7% / +6.1% / -5.0%
- Phase 4 (Combined): -5.1% / +8.8% / +2.0% (best at 100K: +8.8%)
- Phase 5 (C7_SAFE Safety): NO CRASH  (all iterations stable)

**Mainline Policy:**
- mid_desc initialization: Always enabled (crash prevention)
- Flatten: Default OFF (bench opt-in via HAKMEM_POOL_V1_FLATTEN_ENABLED=1)
- Zero Mode: Default FULL (bench opt-in via HAKMEM_POOL_ZERO_MODE=header)
- Workload-specific: Medium (100K) benefits most (+8.8%)

**Documentation Updated:**
- CURRENT_TASK.md: Added Phase 82 conclusions with benchmark table
- MID_LARGE_CPU_HOTPATH_ANALYSIS.md: Added Phase 82 Final with workload analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-10 09:35:18 +09:00
acc64f2438 Phase ML1: Pool v1 memset 89.73% overhead 軽量化 (+15.34% improvement)
## Summary
- ChatGPT により bench_profile.h の setenv segfault を修正(RTLD_NEXT 経由に切り替え)
- core/box/pool_zero_mode_box.h 新設:ENV キャッシュ経由で ZERO_MODE を統一管理
- core/hakmem_pool.c で zero mode に応じた memset 制御(FULL/header/off)
- A/B テスト結果:ZERO_MODE=header で +15.34% improvement(1M iterations, C6-heavy)

## Files Modified
- core/box/pool_api.inc.h: pool_zero_mode_box.h include
- core/bench_profile.h: glibc setenv → malloc+putenv(segfault 回避)
- core/hakmem_pool.c: zero mode 参照・制御ロジック
- core/box/pool_zero_mode_box.h (新設): enum/getter
- CURRENT_TASK.md: Phase ML1 結果記載

## Test Results
| Iterations | ZERO_MODE=full | ZERO_MODE=header | Improvement |
|-----------|----------------|-----------------|------------|
| 10K       | 3.06 M ops/s   | 3.17 M ops/s    | +3.65%     |
| 1M        | 23.71 M ops/s  | 27.34 M ops/s   | **+15.34%** |

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-10 09:08:18 +09:00
a905e0ffdd Guard madvise ENOMEM and stabilize pool/tiny front v3 2025-12-09 21:50:15 +09:00
e274d5f6a9 pool v1 flatten: break down free fallback causes and normalize mid_desc keys 2025-12-09 19:34:54 +09:00
8f18963ad5 Phase 36-37: TinyHotHeap v2 HotBox redesign and C7 current_page policy fixes
- Redefine TinyHotHeap v2 as per-thread Hot Box with clear boundaries
- Add comprehensive OS statistics tracking for SS allocations
- Implement route-based free handling for TinyHeap v2
- Add C6/C7 debugging and statistics improvements
- Update documentation with implementation guidelines and analysis
- Add new box headers for stats, routing, and front-end management
2025-12-08 21:30:21 +09:00