Commit Graph

7 Commits

Author SHA1 Message Date
acc64f2438 Phase ML1: Pool v1 memset 89.73% overhead 軽量化 (+15.34% improvement)
## Summary
- ChatGPT により bench_profile.h の setenv segfault を修正(RTLD_NEXT 経由に切り替え)
- core/box/pool_zero_mode_box.h 新設:ENV キャッシュ経由で ZERO_MODE を統一管理
- core/hakmem_pool.c で zero mode に応じた memset 制御(FULL/header/off)
- A/B テスト結果:ZERO_MODE=header で +15.34% improvement(1M iterations, C6-heavy)

## Files Modified
- core/box/pool_api.inc.h: pool_zero_mode_box.h include
- core/bench_profile.h: glibc setenv → malloc+putenv(segfault 回避)
- core/hakmem_pool.c: zero mode 参照・制御ロジック
- core/box/pool_zero_mode_box.h (新設): enum/getter
- CURRENT_TASK.md: Phase ML1 結果記載

## Test Results
| Iterations | ZERO_MODE=full | ZERO_MODE=header | Improvement |
|-----------|----------------|-----------------|------------|
| 10K       | 3.06 M ops/s   | 3.17 M ops/s    | +3.65%     |
| 1M        | 23.71 M ops/s  | 27.34 M ops/s   | **+15.34%** |

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-10 09:08:18 +09:00
d17ec46628 Fix C7 warm/TLS Release path and unify debug instrumentation 2025-12-05 23:41:01 +09:00
6e2552e654 Bugfix: Add Header Box and fix Class 0/7 header handling (crash rate -50%)
Root Cause Analysis:
- tls_sll_box.h had hardcoded `class_idx != 7` checks
- This incorrectly assumed only C7 uses offset=0
- But C0 (8B) also uses offset=0 (header overwritten by next pointer)
- Result: C0 blocks had corrupted headers in TLS SLL → crash

Architecture Fix: Header Box (Single Source of Truth)
- Created core/box/tiny_header_box.h
- Encapsulates "which classes preserve headers" logic
- Delegates to tiny_nextptr.h (0x7E bitmask: C0=0, C1-C6=1, C7=0)
- API:
  * tiny_class_preserves_header() - C1-C6 only
  * tiny_header_write_if_preserved() - Conditional write
  * tiny_header_validate() - Conditional validation
  * tiny_header_write_for_alloc() - Unconditional (alloc path)

Bug Fixes (6 locations):
- tls_sll_box.h:366 - push header restore (C1-C6 only; skip C0/C7)
- tls_sll_box.h:560 - pop header validate (C1-C6 only; skip C0/C7)
- tls_sll_box.h:700 - splice header restore head (C1-C6 only)
- tls_sll_box.h:722 - splice header restore next (C1-C6 only)
- carve_push_box.c:198 - freelist→TLS SLL header restore
- hakmem_tiny_free.inc:78 - drain freelist header restore

Impact:
- Before: 23.8% crash rate (bench_random_mixed_hakmem)
- After: 12% crash rate
- Improvement: 49.6% reduction in crashes
- Test: 88/100 runs successful (vs 76/100 before)

Design Principles:
- Eliminates hardcoded class_idx checks (class_idx != 7)
- Single Source of Truth (tiny_nextptr.h → Header Box)
- Type-safe API prevents future bugs
- Future: Add lint to forbid direct header manipulation

Remaining Work:
- 12% crash rate still exists (likely different root cause)
- Next: Investigate with core dump analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 07:57:49 +09:00
3c6c76cb11 Fix: Restore headers in box_carve_and_push_with_freelist()
Root cause identified by Task exploration agent:
- box_carve_and_push_with_freelist() pops blocks from slab
  freelist without restoring headers before pushing to TLS SLL
- Freelist blocks have stale data at offset 0
- When popped from TLS SLL, header validation fails
- Error: [TLS_SLL_HDR_RESET] cls=1 got=0x00 expect=0xa1

Fix applied:
1. Added HEADER_MAGIC restoration before tls_sll_push()
   in box_carve_and_push_with_freelist() (carve_push_box.c:193-198)
2. Added tiny_region_id.h include for HEADER_MAGIC definition

Results:
- 20 threads: Header corruption ELIMINATED ✓
- 4 threads: Still shows 1 corruption (partial fix)
- Suggests multiple freelist pop paths exist

Additional work needed:
- Check hakmem_tiny_alloc_new.inc freelist pops
- Verify all freelist → TLS SLL paths write headers

Reference:
Same pattern as tiny_superslab_alloc.inc.h:159-169 (correct impl)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 05:44:13 +09:00
3d341a8b3f Fix: TLS SLL double-free diagnostics - Add error handling and detection improvements
Problem:
workset=8192 crashes at 240K iterations with TLS SLL double-free:
[TLS_SLL_PUSH] FATAL double-free: cls=5 ptr=... already in SLL

Investigation (Task agent):
Identified 8 tls_sll_push() call sites and 3 high-risk areas:
1. HIGH: Carve-Push Rollback pop failures (carve_push_box.c)
2. MEDIUM: Splice partial orphaned nodes (tiny_refill_opt.h)
3. MEDIUM: Incomplete double-free scan - only 64 nodes (tls_sll_box.h)

Fixes Applied:

1. core/box/carve_push_box.c (Lines 115-139)
   - Track pop_failed count during rollback
   - Log orphaned blocks: [BOX_CARVE_PUSH_ROLLBACK] warning
   - Helps identify when rollback leaves blocks in SLL

2. core/box/tls_sll_box.h (Lines 347-370)
   - Increase double-free scan: 64 → 256 nodes
   - Add scanned count to error: (scanned=%u/%u)
   - Catches orphaned blocks deeper in chain

3. core/tiny_refill_opt.h (Lines 135-166)
   - Enhanced splice partial logging
   - Abort in debug builds on orphaned nodes
   - Prevents silent memory leaks

Test Results:
Before: SEGV at 220K iterations
After:  SEGV at 240K iterations (improved detection)
        [TLS_SLL_PUSH] FATAL double-free: cls=5 ptr=... (scanned=2/71)

Impact:
 Early detection working (catches at position 2)
 Diagnostic capability greatly improved
⚠️  Root cause not yet resolved (deeper investigation needed)

Status: Diagnostic improvements committed for further analysis

Credit: Root cause analysis by Task agent (Explore)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 08:43:18 +09:00
9b0d746407 Phase 3d-B: TLS Cache Merge - Unified g_tls_sll[] structure (+12-18% expected)
Merge separate g_tls_sll_head[] and g_tls_sll_count[] arrays into unified
TinyTLSSLL struct to improve L1D cache locality. Expected performance gain:
+12-18% from reducing cache line splits (2 loads → 1 load per operation).

Changes:
- core/hakmem_tiny.h: Add TinyTLSSLL type (16B aligned, head+count+pad)
- core/hakmem_tiny.c: Replace separate arrays with g_tls_sll[8]
- core/box/tls_sll_box.h: Update Box API (13 sites) for unified access
- Updated 32+ files: All g_tls_sll_head[i] → g_tls_sll[i].head
- Updated 32+ files: All g_tls_sll_count[i] → g_tls_sll[i].count
- core/hakmem_tiny_integrity.h: Unified canary guards
- core/box/integrity_box.c: Simplified canary validation
- Makefile: Added core/box/tiny_sizeclass_hist_box.o to link

Build:  PASS (10K ops sanity test)
Warnings: Only pre-existing LTO type mismatches (unrelated)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 07:32:30 +09:00
c7616fd161 Box API Phase 1-3: Capacity Manager, Carve-Push, Prewarm 実装
Priority 1-3のBox Modulesを実装し、安全なpre-warming APIを提供。
既存の複雑なprewarmコードを1行のBox API呼び出しに置き換え。

## 新規Box Modules

1. **Box Capacity Manager** (capacity_box.h/c)
   - TLS SLL容量の一元管理
   - adaptive_sizing初期化保証
   - Double-free バグ防止

2. **Box Carve-And-Push** (carve_push_box.h/c)
   - アトミックなblock carve + TLS SLL push
   - All-or-nothing semantics
   - Rollback保証(partial failure防止)

3. **Box Prewarm** (prewarm_box.h/c)
   - 安全なTLS cache pre-warming
   - 初期化依存性を隠蔽
   - シンプルなAPI (1関数呼び出し)

## コード簡略化

hakmem_tiny_init.inc: 20行 → 1行
```c
// BEFORE: 複雑なP0分岐とエラー処理
adaptive_sizing_init();
if (prewarm > 0) {
    #if HAKMEM_TINY_P0_BATCH_REFILL
        int taken = sll_refill_batch_from_ss(5, prewarm);
    #else
        int taken = sll_refill_small_from_ss(5, prewarm);
    #endif
}

// AFTER: Box API 1行
int taken = box_prewarm_tls(5, prewarm);
```

## シンボルExport修正

hakmem_tiny.c: 5つのシンボルをstatic → non-static
- g_tls_slabs[] (TLS slab配列)
- g_sll_multiplier (SLL容量乗数)
- g_sll_cap_override[] (容量オーバーライド)
- superslab_refill() (SuperSlab再充填)
- ss_active_add() (アクティブカウンタ)

## ビルドシステム

Makefile: TINY_BENCH_OBJS_BASEに3つのBox modules追加
- core/box/capacity_box.o
- core/box/carve_push_box.o
- core/box/prewarm_box.o

## 動作確認

 Debug build成功
 Box Prewarm API動作確認
   [PREWARM] class=5 requested=128 taken=32

## 次のステップ

- Box Refill Manager (Priority 4)
- Box SuperSlab Allocator (Priority 5)
- Release build修正(tiny_debug_ring_record)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 01:45:30 +09:00