b7021061b8
Fix: CRITICAL double-allocation bug in trc_linear_carve()
...
Root Cause:
trc_linear_carve() used meta->used as cursor, but meta->used decrements
on free, causing already-allocated blocks to be re-carved.
Evidence:
- [LINEAR_CARVE] used=61 batch=1 → block 61 created
- (blocks freed, used decrements 62→59)
- [LINEAR_CARVE] used=59 batch=3 → blocks 59,60,61 RE-CREATED!
- Result: double-allocation → memory corruption → SEGV
Fix Implementation:
1. Added TinySlabMeta.carved (monotonic counter, never decrements)
2. Changed trc_linear_carve() to use carved instead of used
3. carved tracks carve progress, used tracks active count
Files Modified:
- core/superslab/superslab_types.h: Add carved field
- core/tiny_refill_opt.h: Use carved in trc_linear_carve()
- core/hakmem_tiny_superslab.c: Initialize carved=0
- core/tiny_alloc_fast.inc.h: Add next pointer validation
- core/hakmem_tiny_free.inc: Add drain/free validation
Test Results:
✅ bench_random_mixed: 950,037 ops/s (no crash)
✅ Fail-fast mode: 651,627 ops/s (with diagnostic logs)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-08 01:18:37 +09:00
d2f0d84584
Phase 6-2.5: Fix SuperSlab alignment bug + refactor constants
...
## Problem: 53-byte misalignment mystery
**Symptom:** All SuperSlab allocations misaligned by exactly 53 bytes
```
[TRC_FAILFAST_PTR] stage=alloc_ret_align cls=7 ptr=0x..f835
offset=63541 (expected: 63488)
Diff: 63541 - 63488 = 53 bytes
```
## Root Cause (Ultrathink investigation)
**sizeof(SuperSlab) != hardcoded offset:**
- `sizeof(SuperSlab)` = 1088 bytes (actual struct size)
- `tiny_slab_base_for()` used: 1024 (hardcoded)
- `superslab_init_slab()` assumed: 2048 (in capacity calc)
**Impact:**
1. Memory corruption: 64-byte overlap with SuperSlab metadata
2. Misalignment: 1088 % 1024 = 64 (violates class 7 alignment)
3. Inconsistency: Init assumed 2048, but runtime used 1024
## Solution
### 1. Centralize constants (NEW)
**File:** `core/hakmem_tiny_superslab_constants.h`
- `SLAB_SIZE` = 64KB
- `SUPERSLAB_HEADER_SIZE` = 1088
- `SUPERSLAB_SLAB0_DATA_OFFSET` = 2048 (aligned to 1024)
- `SUPERSLAB_SLAB0_USABLE_SIZE` = 63488 (64KB - 2048)
- Compile-time validation checks
**Why 2048?**
- Round up 1088 to next 1024-byte boundary
- Ensures proper alignment for class 7 (1024-byte blocks)
- Previous: (1088 + 1023) & ~1023 = 2048
### 2. Update all code to use constants
- `hakmem_tiny_superslab.h`: `tiny_slab_base_for()` → use `SUPERSLAB_SLAB0_DATA_OFFSET`
- `hakmem_tiny_superslab.c`: `superslab_init_slab()` → use `SUPERSLAB_SLAB0_USABLE_SIZE`
- Removed hardcoded 1024, 2048 magic numbers
### 3. Add class consistency check
**File:** `core/tiny_superslab_alloc.inc.h:433-449`
- Verify `tls->ss->size_class == class_idx` before allocation
- Unbind TLS if mismatch detected
- Prevents using wrong block_size for calculations
## Status
⚠️ **INCOMPLETE - New issue discovered**
After fix, benchmark hits different error:
```
[TRC_FAILFAST] stage=freelist_next cls=7 node=0x...d474
```
Freelist corruption detected. Likely caused by:
- 2048 offset change affects free() path
- Block addresses no longer match freelist expectations
- Needs further investigation
## Files Modified
- `core/hakmem_tiny_superslab_constants.h` - NEW: Centralized constants
- `core/hakmem_tiny_superslab.h` - Use SUPERSLAB_SLAB0_DATA_OFFSET
- `core/hakmem_tiny_superslab.c` - Use SUPERSLAB_SLAB0_USABLE_SIZE
- `core/tiny_superslab_alloc.inc.h` - Add class consistency check
- `core/hakmem_tiny_init.inc` - Remove diet mode override (Phase 6-2.5)
- `core/hakmem_super_registry.h` - Remove debug output (cleaned)
- `PERFORMANCE_INVESTIGATION_REPORT.md` - Task agent analysis
## Next Steps
1. Investigate freelist corruption with 2048 offset
2. Verify free() path uses tiny_slab_base_for() correctly
3. Consider reverting to 1024 and fixing capacity calculation instead
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-07 21:45:20 +09:00
25a81713b4
Fix: Move g_hakmem_lock_depth++ to function start (27% → 70% success)
...
**Problem**: After previous fixes, 4T Larson success rate dropped 27% (4/15)
**Root Cause**:
In `log_superslab_oom_once()`, `g_hakmem_lock_depth++` was placed AFTER
`getrlimit()` call. However, the function was already called from within
malloc wrapper context where `g_hakmem_lock_depth = 1`.
When `getrlimit()` or other LIBC functions call `malloc()` internally,
they enter the wrapper with lock_depth=1, but the increment to 2 hasn't
happened yet, so getenv() in wrapper can trigger recursion.
**Fix**:
Move `g_hakmem_lock_depth++` to the VERY FIRST line after early return check.
This ensures ALL subsequent LIBC calls (getrlimit, fopen, fclose, fprintf)
bypass HAKMEM wrapper.
**Result**: 4T Larson success rate improved 27% → 70% (14/20 runs) ✅
+43% improvement, but 30% crash rate remains (continuing investigation)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-07 03:03:07 +09:00
77ed72fcf6
Fix: LIBC/HAKMEM mixed allocation crashes (0% → 80% success)
...
**Problem**: 4T Larson crashed 100% due to "free(): invalid pointer"
**Root Causes** (6 bugs found via Task Agent ultrathink):
1. **Invalid magic fallback** (`hak_free_api.inc.h:87`)
- When `hdr->magic != HAKMEM_MAGIC`, ptr came from LIBC (no header)
- Was calling `free(raw)` where `raw = ptr - HEADER_SIZE` (garbage!)
- Fixed: Use `__libc_free(ptr)` instead
2. **BigCache eviction** (`hakmem.c:230`)
- Same issue: invalid magic means LIBC allocation
- Fixed: Use `__libc_free(ptr)` directly
3. **Malloc wrapper recursion** (`hakmem_internal.h:209`)
- `hak_alloc_malloc_impl()` called `malloc()` → wrapper recursion
- Fixed: Use `__libc_malloc()` directly
4. **ALLOC_METHOD_MALLOC free** (`hak_free_api.inc.h:106`)
- Was calling `free(raw)` → wrapper recursion
- Fixed: Use `__libc_free(raw)` directly
5. **fopen/fclose crash** (`hakmem_tiny_superslab.c:131`)
- `log_superslab_oom_once()` used `fopen()` → FILE buffer via wrapper
- `fclose()` calls `__libc_free()` on HAKMEM-allocated buffer → crash
- Fixed: Wrap with `g_hakmem_lock_depth++/--` to force LIBC path
6. **g_hakmem_lock_depth visibility** (`hakmem.c:163`)
- Was `static`, needed by hakmem_tiny_superslab.c
- Fixed: Remove `static` keyword
**Result**: 4T Larson success rate improved 0% → 80% (8/10 runs) ✅
**Remaining**: 20% crash rate still needs investigation
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-07 02:48:20 +09:00
1da8754d45
CRITICAL FIX: TLS 未初期化による 4T SEGV を完全解消
...
**問題:**
- Larson 4T で 100% SEGV (1T は 2.09M ops/s で完走)
- System/mimalloc は 4T で 33.52M ops/s 正常動作
- SS OFF + Remote OFF でも 4T で SEGV
**根本原因: (Task agent ultrathink 調査結果)**
```
CRASH: mov (%r15),%r13
R15 = 0x6261 ← ASCII "ba" (ゴミ値、未初期化TLS)
```
Worker スレッドの TLS 変数が未初期化:
- `__thread void* g_tls_sll_head[TINY_NUM_CLASSES];` ← 初期化なし
- pthread_create() で生成されたスレッドでゼロ初期化されない
- NULL チェックが通過 (0x6261 != NULL) → dereference → SEGV
**修正内容:**
全 TLS 配列に明示的初期化子 `= {0}` を追加:
1. **core/hakmem_tiny.c:**
- `g_tls_sll_head[TINY_NUM_CLASSES] = {0}`
- `g_tls_sll_count[TINY_NUM_CLASSES] = {0}`
- `g_tls_live_ss[TINY_NUM_CLASSES] = {0}`
- `g_tls_bcur[TINY_NUM_CLASSES] = {0}`
- `g_tls_bend[TINY_NUM_CLASSES] = {0}`
2. **core/tiny_fastcache.c:**
- `g_tiny_fast_cache[TINY_FAST_CLASS_COUNT] = {0}`
- `g_tiny_fast_count[TINY_FAST_CLASS_COUNT] = {0}`
- `g_tiny_fast_free_head[TINY_FAST_CLASS_COUNT] = {0}`
- `g_tiny_fast_free_count[TINY_FAST_CLASS_COUNT] = {0}`
3. **core/hakmem_tiny_magazine.c:**
- `g_tls_mags[TINY_NUM_CLASSES] = {0}`
4. **core/tiny_sticky.c:**
- `g_tls_sticky_ss[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}`
- `g_tls_sticky_idx[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}`
- `g_tls_sticky_pos[TINY_NUM_CLASSES] = {0}`
**効果:**
```
Before: 1T: 2.09M ✅ | 4T: SEGV 💀
After: 1T: 2.41M ✅ | 4T: 4.19M ✅ (+15% 1T, SEGV解消)
```
**テスト:**
```bash
# 1 thread: 完走
./larson_hakmem 2 8 128 1024 1 12345 1
→ Throughput = 2,407,597 ops/s ✅
# 4 threads: 完走(以前は SEGV)
./larson_hakmem 2 8 128 1024 1 12345 4
→ Throughput = 4,192,155 ops/s ✅
```
**調査協力:** Task agent (ultrathink mode) による完璧な根本原因特定
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-07 01:27:04 +09:00
cd6507468e
Fix critical SuperSlab accounting bug + ACE improvements
...
Critical Bug Fix (OOM Root Cause):
- ss_remote_push() was missing ss_active_dec_one() call
- Cross-thread frees did not decrement total_active_blocks
- SuperSlabs appeared "full" even when empty
- hak_tiny_trim() could never free SuperSlabs → OOM
- Result: alloc=49,123 freed=0 bytes=103GB
One-Line Fix (core/hakmem_tiny_superslab.h:360):
+ ss_active_dec_one(ss); // Decrement on cross-thread free
Impact:
- OOM eliminated (167GB VmSize → clean exit)
- SuperSlabs now properly freed
- Performance maintained: 4.19M ops/s (±0%)
- Memory leak fixed (freed: 0 → expected ~45,000+)
ACE Improvements:
- Set SUPERSLAB_LG_DEFAULT = 21 (2MB, was 1MB)
- g_ss_min_lg_env now uses SUPERSLAB_LG_DEFAULT
- hak_tiny_superslab_next_lg() fallback to default if uninitialized
- Centralized ACE constants in .h for easier tuning
Verification:
- Larson benchmark: Clean completion, no OOM
- Throughput: 4,192,124 ops/s (baseline maintained)
Root cause analysis by Task agent: Larson 50%+ cross-thread frees
triggered accounting leak, preventing SuperSlab reclamation.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-06 22:26:58 +09:00
52386401b3
Debug Counters Implementation - Clean History
...
Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation
Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files
Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)
This is a clean repository without large log files.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-05 12:31:14 +09:00