fe70e3baf5
Phase MID-V35-HOTPATH-OPT-1 complete: +7.3% on C6-heavy
...
Step 0: Geometry SSOT
- New: core/box/smallobject_mid_v35_geom_box.h (L1/L2 consistency)
- Fix: C6 slots/page 102→128 in L2 (smallobject_cold_iface_mid_v3.c)
- Applied: smallobject_mid_v35.c, smallobject_segment_mid_v3.c
Step 1-3: ENV gates for hotpath optimizations
- New: core/box/mid_v35_hotpath_env_box.h
* HAKMEM_MID_V35_HEADER_PREFILL (default 0)
* HAKMEM_MID_V35_HOT_COUNTS (default 1)
* HAKMEM_MID_V35_C6_FASTPATH (default 0)
- Implementation: smallobject_mid_v35.c
* Header prefill at refill boundary (Step 1)
* Gated alloc_count++ in hot path (Step 2)
* C6 specialized fast path with constant slot_size (Step 3)
A/B Results:
C6-heavy (257–768B): 8.75M→9.39M ops/s (+7.3%, 5-run mean) ✅
Mixed (16–1024B): 9.98M→9.96M ops/s (-0.2%, within noise) ✓
Decision: FROZEN - defaults OFF, C6-heavy推奨ON, Mixed現状維持
Documentation: ENV_PROFILE_PRESETS.md updated
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com >
2025-12-12 19:19:25 +09:00
e95e61f0ff
Phase POLICY-FAST-PATH-V2 complete + MID-V35-HOTPATH-OPT-1 design
...
## Phase POLICY-FAST-PATH-V2 (FROZEN)
- Implementation complete: free_policy_fast_v2_box.h + malloc_tiny_fast.h integration
- A/B Results:
- Mixed (ws=400): -1.6% regression ❌ (branch cost > skip benefit)
- C6-heavy (ws=200): +5.4% improvement ✅
- Decision: Default OFF, FROZEN (ws<300 / C6-heavy research only)
- Learning: Large WS causes branch misprediction to dominate
## Phase 3-GRADUATE + ENV probe fix
- 64-probe retry for getenv() stability during bench_profile putenv()
- C6 ULTRA intrusive freelist: FROZEN (research box)
## Phase MID-V35-HOTPATH-OPT-1-DESIGN
- Design doc for next optimization target
- Target: MID v3.5 alloc/free hot path (C5-C6)
- Boxes: Stats Gate, TLS Layout, Boundary Check elimination
- Expected: +3-9% on Mixed mainline
Files:
- core/box/free_policy_fast_v2_box.h (new)
- core/box/free_path_stats_box.h/c (policy_fast_v2_skip counter)
- core/front/malloc_tiny_fast.h (fast-path integration)
- docs/analysis/MID_V35_HOTPATH_OPT_1_DESIGN.md (new)
- docs/analysis/PHASE_3_GRADUATE_*.md (new)
- CURRENT_TASK.md (phase status update)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-12-12 18:40:08 +09:00
1a8652a91a
Phase TLS-UNIFY-3: C6 intrusive freelist implementation (完成)
...
Implement C6 ULTRA intrusive LIFO freelist with ENV gating:
- Single-linked LIFO using next pointer at USER+1 offset
- tiny_next_store/tiny_next_load for pointer access (single source of truth)
- Segment learning via ss_fast_lookup (per-class seg_base/seg_end)
- ENV gate: HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL (default OFF)
- Counters: c6_ifl_push/pop/fallback in FREE_PATH_STATS
Files:
- core/box/tiny_ultra_tls_box.h: Added c6_head field for intrusive LIFO
- core/box/tiny_ultra_tls_box.c: Pop/push with intrusive branching (case 6)
- core/box/tiny_c6_ultra_intrusive_env_box.h: ENV gate (new)
- core/box/tiny_c6_intrusive_freelist_box.h: L1 pure LIFO (new)
- core/tiny_debug_ring.h: C6_IFL events
- core/box/free_path_stats_box.h/c: c6_ifl_* counters
A/B Test Results (1M iterations, ws=200, 257-512B):
- ENV_OFF (array): 56.6 Mop/s avg
- ENV_ON (intrusive): 57.6 Mop/s avg (+1.8%, within noise)
- Counters verified: c6_ifl_push=265890, c6_ifl_pop=265815, fallback=0
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com >
2025-12-12 16:26:42 +09:00
397aea0131
Phase v10: Freeze v7 as C5/C6-only research preset
...
Documentation: Baseline fixed per Phase v10
- HAKMEM_V2_GENERATION_SUMMARY.md:
- v7 repositioned as 「C5/C6 專用研究箱」
- Mixed baseline: HAKMEM_SMALL_HEAP_V7_ENABLED=0 (OFF)
- Added Phase v7-7 (Learner), Phase v10 (legacy removal)
- Learner performance: +127% on C5/C6 workload
- Size class table: segregated Mixed (v7 OFF) vs C5/C6 preset (v7 ON)
- ENV_PROFILE_PRESETS.md:
- MIXED_TINYV3_C7_SAFE: explicitly v7 OFF (Mixed baseline)
- NEW: C5_C6_SMALL_HEAP_V7_LEARNER profile
- Learner dynamic route switching documentation
- Test commands and expected performance (38-39M ops/s)
- Phase v10 deprecation notice (v3/v4/v5 removed)
Purpose:
- Set clear baseline: v7 OFF for Mixed, ON for C5/C6 benchmarks
- Document Learner preset for future reference
- No code changes (docs-only checkpoint)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2025-12-12 06:13:15 +09:00
39a3c53dbc
Phase v7-2: SmallObject v7 C6-only implementation with RegionIdBox integration
...
- SmallSegment_v7: 2MiB segment with TLS slot and free page stack
- ColdIface_v7: Page refill/retire between HotBox and SegmentBox
- HotBox_v7: Full C6-only alloc/free with header writing (HEADER_MAGIC|class_idx)
- Free path early-exit: Check v7 route BEFORE ss_fast_lookup (separate mmap segment)
- RegionIdBox: Register v7 segment for ptr->region lookup
- Benchmark: v7 ON ~54.5M ops/s (-7% overhead vs 58.6M legacy baseline)
v7 correctly balances alloc/free counts and page lifecycle.
RegionIdBox overhead identified as primary cost driver.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com >
2025-12-12 03:12:28 +09:00
9c24bebf08
Phase v5-1: SmallObject v5 C6-only route stub 接続
...
- tiny_route_env_box.h: TINY_ROUTE_SMALL_HEAP_V5 enum 追加、route snapshot で C6→v5 分岐
- malloc_tiny_fast.h: alloc/free switch に v5 case 追加(v1/pool fallback)
- smallobject_hotbox_v5.c: stub 実装(alloc は NULL 返却、free は no-op)
- smallobject_hotbox_v5_box.h: 関数 signature に ctx パラメータ追加
- Makefile: core/smallobject_hotbox_v5.o をリンクリストに追加
- ENV_PROFILE_PRESETS.md: v5-1 プリセット追記
- CURRENT_TASK.md: Phase v5-1 完了記録
**特性**:
- ENV: HAKMEM_SMALL_HEAP_V5_ENABLED=1 / HAKMEM_SMALL_HEAP_V5_CLASSES=0x40 で opt-in
- テスト結果: C6-heavy (v5 OFF 15.5M → v5 ON 16.4M ops/s, 正常), Mixed 47.2M ops/s, SEGV/assert なし
- 挙動は v1/pool fallback と同じ(実装は v5-2)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com >
2025-12-11 03:25:37 +09:00
3b4449d773
Phase v4-mid-1: C6-only v4 route + page_meta_of() Fail-Fast validation
...
Implementation:
- SMALL_SEGMENT_V4_* constants (SIZE=2MiB, PAGE_SIZE=64KiB, MAGIC=0xDEADBEEF)
- smallsegment_v4_page_meta_of(): O(1) mask+shift lookup with magic validation
- Computes segment base: addr & ~(2MiB - 1)
- Verifies SmallSegment magic number
- Calculates page_idx: (addr - seg_base) >> PAGE_SHIFT (16)
- Returns non-NULL sentinel for now (full page_meta[] in Phase v4-mid-2)
Stubs for C6-only phase:
- small_heap_alloc_fast_v4(): C6 returns NULL → pool v1 fallback
- small_heap_free_fast_v4(): C6 calls page_meta_of() for Fail-Fast, then pool v1 fallback
Documentation:
- ENV_PROFILE_PRESETS.md: Add "C6_ONLY_SMALLOBJECT_V4" research profile
- HAKMEM_SMALL_HEAP_V4_ENABLED=1, HAKMEM_SMALL_HEAP_V4_CLASSES=0x40
- Expected: Throughput ≈ 28–29M ops/s (same as v1)
Build:
- ビルド成功(警告のみ)
- Backward compatible, alloc/free stubs fall back to pool v1
Sanity:
- C6-heavy with v4 opt-in: segv/assert なし
- page_meta_of() lookup working correctly
- Performance unchanged (expected for stub phase)
Status:
- C6-only v4 route now available via ENV opt-in
- Phase v4-mid-2: SmallHeapCtx v4 full implementation with A/B
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com >
2025-12-10 23:37:45 +09:00
2a13478dc7
Optimize C6 heavy and C7 ultra performance analysis with refined design refinements
...
- Update environment profile presets and visibility analysis
- Enhance small object and tiny segment v4 box implementations
- Refine C7 ultra and C6 heavy allocation strategies
- Add comprehensive performance metrics and design documentation
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com >
2025-12-10 22:57:26 +09:00
d576116484
Document current Mixed baseline throughput and ENV profile
2025-12-10 14:12:13 +09:00
406a2f4d26
Incremental improvements: mid_desc cache, pool hotpath optimization, and doc updates
...
**Changes:**
- core/box/pool_api.inc.h: Code organization and micro-optimizations
- CURRENT_TASK.md: Updated Phase MD1 (mid_desc TLS cache: +3.2% for C6-heavy)
- docs/analysis files: Various analysis and documentation updates
- AGENTS.md: Agent role clarifications
- TINY_FRONT_V3_FLATTENING_GUIDE.md: Flattening strategy documentation
**Verification:**
- random_mixed_hakmem: 44.8M ops/s (1M iterations, 400 working set)
- No segfaults or assertions across all benchmark variants
- Stable performance across multiple runs
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com >
2025-12-10 14:00:57 +09:00
acc64f2438
Phase ML1: Pool v1 memset 89.73% overhead 軽量化 (+15.34% improvement)
...
## Summary
- ChatGPT により bench_profile.h の setenv segfault を修正(RTLD_NEXT 経由に切り替え)
- core/box/pool_zero_mode_box.h 新設:ENV キャッシュ経由で ZERO_MODE を統一管理
- core/hakmem_pool.c で zero mode に応じた memset 制御(FULL/header/off)
- A/B テスト結果:ZERO_MODE=header で +15.34% improvement(1M iterations, C6-heavy)
## Files Modified
- core/box/pool_api.inc.h: pool_zero_mode_box.h include
- core/bench_profile.h: glibc setenv → malloc+putenv(segfault 回避)
- core/hakmem_pool.c: zero mode 参照・制御ロジック
- core/box/pool_zero_mode_box.h (新設): enum/getter
- CURRENT_TASK.md: Phase ML1 結果記載
## Test Results
| Iterations | ZERO_MODE=full | ZERO_MODE=header | Improvement |
|-----------|----------------|-----------------|------------|
| 10K | 3.06 M ops/s | 3.17 M ops/s | +3.65% |
| 1M | 23.71 M ops/s | 27.34 M ops/s | **+15.34%** |
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com >
2025-12-10 09:08:18 +09:00