8b67718bf2
Fix C7 TLS SLL corruption: Protect next pointer from user data overwrites
...
## Root Cause
C7 (1024B allocations, 2048B stride) used offset=1 for freelist next
pointers, storing them at `base[1..8]`. Since the user pointer is `base+1`,
user writes could overwrite the next pointer, corrupting the TLS SLL freelist.
## The Bug Sequence
1. Block freed → TLS SLL push stores next at `base[1..8]`
2. Block allocated → User gets `base+1`, can modify `base[1..2047]`
3. User writes data → Overwrites `base[1..8]` (the next pointer area; see the sketch after this list)
4. Block freed again → tiny_next_load() reads garbage from `base[1..8]`
5. TLS SLL head becomes invalid (0xfe, 0xdb, 0x58, etc.)
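A minimal sketch of steps 1-3 (illustration only: the push and write details are paraphrased, and the 0xAB fill pattern is hypothetical):
```c
#include <string.h>

/* With offset=1, the freelist next pointer (base[1..8]) and the first
 * 8 user bytes (user = base+1) are the same memory. */
static void overlap_demo(unsigned char *base, void *next) {
    memcpy(base + 1, &next, sizeof next);  /* step 1: TLS SLL push stores next  */
    unsigned char *user = base + 1;        /* step 2: allocation returns base+1 */
    memset(user, 0xAB, 8);                 /* step 3: user write clobbers the
                                              stored next pointer bytes         */
}
```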
## Why This Was Reverted
The previous fix (C7 offset=0) was reverted with the comment (translated
from Japanese): "For C7 as well, prioritize keeping the header so class
identification is not broken."
This reasoning was FLAWED because:
- The header IS restored during allocation (HAK_RET_ALLOC), not during freelist ops
- Class identification at free time reads ptr-1 = base[0] (after restoration)
- While a block sits on the freelist, its header CAN be sacrificed (it is not visible to the user)
- The revert CREATED the race condition by exposing base[1..8] to the user
## Fix Applied
### 1. Revert C7 offset to 0 (tiny_nextptr.h:54)
```c
// BEFORE (BROKEN):
return (class_idx == 0) ? 0u : 1u;
// AFTER (FIXED):
return (class_idx == 0 || class_idx == 7) ? 0u : 1u;
```
### 2. Remove C7 header restoration in freelist (tiny_nextptr.h:84)
```c
// BEFORE (BROKEN):
if (class_idx != 0) { // Restores header for all classes including C7
// AFTER (FIXED):
if (class_idx != 0 && class_idx != 7) { // Only C1-C6 restore headers
```
### 3. Bonus: Remove premature slab release (tls_sll_drain_box.h:182-189)
Removed the `shared_pool_release_slab()` call from the drain path; it could
cause a use-after-free when blocks from the same slab remain in the TLS SLL.
## Why This Fix Works
**Memory Layout** (C7 while in the freelist):
```
Address:  base      base+1                      base+2048
          ┌─────────┬───────────────────────────┐
Content:  │ next    │     (user accessible)     │
          └─────────┴───────────────────────────┘
           8B ptr     user cannot touch base[0]
```
- **Next pointer at base[0]**: Protected from user modification ✓
- **User pointer at base+1**: User sees base[1..2047] only ✓
- **Header restored during allocation**: HAK_RET_ALLOC writes 0xa7 at base[0] ✓
- **Class ID preserved**: tiny_region_id_read_header(ptr) reads ptr-1 = base[0] ✓ (both steps sketched below)
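A minimal sketch of the last two bullets, assuming only what the text states (C7's header byte is 0xa7 and the class byte is read at ptr-1); the function bodies are paraphrases, not the real HAK_RET_ALLOC / tiny_region_id_read_header implementations:
```c
#include <stdint.h>

/* Allocation side (paraphrase of HAK_RET_ALLOC for C7): rewrite the
 * header byte the freelist next pointer occupied, then return base+1. */
static inline void *c7_ret_alloc_sketch(uint8_t *base) {
    base[0] = 0xa7;      /* header restored; next pointer no longer needed */
    return base + 1;     /* user never receives a pointer to base[0]       */
}

/* Free side (paraphrase): the class-identifying byte sits at ptr-1. */
static inline uint8_t c7_read_header_sketch(const void *ptr) {
    return ((const uint8_t *)ptr)[-1];   /* ptr-1 == base[0] */
}
```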
## Verification Results
### Before Fix
- **Errors**: 33 TLS_SLL_POP_INVALID per 100K iterations (0.033%)
- **Performance**: 1.8M ops/s (corruption caused slow path fallback)
- **Symptoms**: Invalid TLS SLL heads (0xfe, 0xdb, 0x58, 0x80, 0xc2, etc.)
### After Fix
- **Errors**: 0 per 200K iterations ✅
- **Performance**: 10.0M ops/s (+456%!) ✅
- **C7 direct test**: 5.5M ops/s, 100K iterations, 0 errors ✅
## Files Modified
- core/tiny_nextptr.h (lines 49-54, 82-84) - C7 offset=0, no header restoration
- core/box/tls_sll_drain_box.h (lines 182-189) - Remove premature slab release
## Architectural Lesson
**Design Principle**: Freelist metadata MUST be stored in memory NOT accessible to user.
| Class | Offset | Next Storage | User Access | Result |
|-------|--------|--------------|-------------|--------|
| C0 | 0 | base[0] | base[1..7] | Safe ✓ |
| C1-C6 | 1 | base[1..8] | base[1..N] | Safe (header at base[0]) ✓ |
| C7 (broken) | 1 | base[1..8] | base[1..2047] | **CORRUPTED** ✗ |
| C7 (fixed) | 0 | base[0] | base[1..2047] | Safe ✓ |
🧹 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 23:42:43 +09:00
9b0d746407
Phase 3d-B: TLS Cache Merge - Unified g_tls_sll[] structure (+12-18% expected)
...
Merge separate g_tls_sll_head[] and g_tls_sll_count[] arrays into unified
TinyTLSSLL struct to improve L1D cache locality. Expected performance gain:
+12-18% from reducing cache line splits (2 loads → 1 load per operation).
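A minimal sketch of the unified structure, assuming the layout implied by the change list below (head + count + pad, 16B aligned); the exact definition in core/hakmem_tiny.h may differ:
```c
#include <stdint.h>

/* head (8B) + count (4B) + pad (4B) = 16B: one size class's hot fields
 * now share a cache line instead of splitting across two arrays. */
typedef struct __attribute__((aligned(16))) TinyTLSSLL {
    void    *head;    /* freelist top                      */
    uint32_t count;   /* blocks currently cached           */
    uint32_t pad;     /* keeps the struct exactly 16 bytes */
} TinyTLSSLL;

static __thread TinyTLSSLL g_tls_sll[8];   /* one entry per size class */
```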
Changes:
- core/hakmem_tiny.h: Add TinyTLSSLL type (16B aligned, head+count+pad)
- core/hakmem_tiny.c: Replace separate arrays with g_tls_sll[8]
- core/box/tls_sll_box.h: Update Box API (13 sites) for unified access
- Updated 32+ files: All g_tls_sll_head[i] → g_tls_sll[i].head
- Updated 32+ files: All g_tls_sll_count[i] → g_tls_sll[i].count
- core/hakmem_tiny_integrity.h: Unified canary guards
- core/box/integrity_box.c: Simplified canary validation
- Makefile: Added core/box/tiny_sizeclass_hist_box.o to link
Build: ✅ PASS (10K ops sanity test)
Warnings: Only pre-existing LTO type mismatches (unrelated)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 07:32:30 +09:00
82ba74933a
Tiny Step 2: drain interval optimization (default 1024→2048)
...
Completed A/B testing of the TLS SLL drain interval and set the new
default based on the empirical results.
Changes:
- core/box/tls_sll_drain_box.h: Default drain interval 1024 → 2048 (see the sketch after this list)
- TINY_DRAIN_INTERVAL_AB_REPORT.md: Complete A/B analysis report
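A minimal sketch of how the new default plus the ENV override might be wired together (the HAKMEM_TINY_SLL_DRAIN_INTERVAL variable comes from the Option B commit below; the accessor name and its caching are assumptions):
```c
#include <stdlib.h>

/* Hypothetical accessor: resolve the drain interval once, then reuse it. */
static unsigned tiny_drain_interval_sketch(void) {
    static unsigned cached;   /* 0 until first call */
    if (cached == 0) {
        const char *s = getenv("HAKMEM_TINY_SLL_DRAIN_INTERVAL");
        long v = s ? strtol(s, NULL, 10) : 0;
        cached = (v > 0) ? (unsigned)v : 2048u;   /* new default */
    }
    return cached;
}
```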
Results (100K iterations):
- 256B: 7.68M ops/s (+4.9% vs baseline 7.32M)
- 128B: 8.76M ops/s (+13.6% vs baseline 7.71M)
- Syscalls: Unchanged (2410) - drain affects frontend only
Key Findings:
- Size-dependent optimal intervals discovered (128B→512, 256B→2048)
- Prioritized 256B critical path (classify_ptr 3.65% in perf profile)
- No regression observed; both classes improved
Methodology:
- ENV-only testing (no code changes during A/B)
- Tested intervals: 512, 1024 (baseline), 2048
- Workload: bench_random_mixed_hakmem
- Metrics: Throughput, syscall count (strace -c)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 17:41:26 +09:00
dd613bc93a
Drain optimization: Drain ALL blocks to maximize empty detection
...
Issue:
- Previous drain: only 32 blocks/trigger → slabs partially empty
- Shared pool SuperSlabs mix multiple classes (C0-C7)
- active_slabs only reaches 0 when ALL classes empty
- Result: superslab_free() rarely called, LRU cache unused
Fix:
- Change drain batch_size: 32 → 0 (drain all available; see the sketch after this list)
- Added active_slabs logging in shared_pool_release_slab
- Maximizes chance of SuperSlab becoming completely empty
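A minimal sketch of the batch semantics (my reading of this commit: batch_size == 0 means "drain everything"; the loop is a paraphrase, with helper names borrowed from neighboring commits and their prototypes assumed):
```c
/* Assumed prototypes; real code: core/tiny_nextptr.h and the free-path box. */
void *tiny_next_load(void *blk);
void  tiny_free_local_box(void *blk);

/* Drain up to batch_size blocks, or the whole TLS SLL when batch_size == 0. */
static unsigned drain_batch_sketch(void **head, unsigned batch_size) {
    unsigned drained = 0;
    while (*head && (batch_size == 0 || drained < batch_size)) {
        void *blk = *head;
        *head = tiny_next_load(blk);   /* advance the SLL         */
        tiny_free_local_box(blk);      /* return to slab freelist */
        drained++;
    }
    return drained;
}
```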
Performance Impact (ws=4096, 200K iterations):
- Before (batch=32): 5.9M ops/s
- After (batch=all): 6.1M ops/s (+3.4%)
- Baseline improvement: 563K → 6.1M ops/s (+980%!)
Known Issue:
- LRU cache still unused due to Shared Pool design
- SuperSlabs rarely become completely empty (multi-class mixing)
- Requires Shared Pool architecture optimization (Phase 12)
Next: Investigate Shared Pool optimization strategies
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:55:51 +09:00
4ffdaae2fc
Add empty slab detection to drain: call shared_pool_release_slab
...
Issue:
- Drain was detecting meta->used==0 but not releasing slabs
- Logic missing: shared_pool_release_slab() call after empty detection
- Result: SuperSlabs not freed, LRU cache not populated
Fix:
- Added shared_pool_release_slab() call when meta->used==0 (line 194; see the sketch after this list)
- Mirrors logic in tiny_superslab_free.inc.h:223-236
- Empty slabs now released to shared pool
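A minimal sketch of the added check (the meta->used==0 condition and the release call are from this commit; the struct shape and parameter types are assumptions):
```c
/* Assumed shapes, for illustration only. */
typedef struct { unsigned used; } SlabMetaSketch;
void shared_pool_release_slab(void *slab);   /* real signature may differ */

static void drain_release_sketch(SlabMetaSketch *meta, void *slab) {
    if (meta->used == 0)                  /* slab fully drained...       */
        shared_pool_release_slab(slab);   /* ...hand it back to the pool */
}
```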
Performance Impact (ws=4096, 200K iterations):
- Before (baseline): 563K ops/s
- After this fix: 5.9M ops/s (+950% improvement!)
Note: LRU cache still not populated (investigating next)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:13:00 +09:00
2ef28ee5ab
Fix drain box compilation: Use pthread_self() directly
...
Issue:
- tiny_self_u32() is static inline, cannot be linked from drain box
- Link error: undefined reference to 'tiny_self_u32'
Fix:
- Use pthread_self() directly like hakmem_tiny_superslab.c:917 (see the sketch after this list)
- Added <pthread.h> include
- Changed extern declaration from size_t to const size_t
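A minimal sketch of the replacement (casting pthread_t to an integer is only valid where pthread_t is an integral type, as on Linux; the function name is illustrative):
```c
#include <pthread.h>
#include <stdint.h>

/* Thread id without the static-inline tiny_self_u32() helper. */
static uint32_t drain_self_u32_sketch(void) {
    return (uint32_t)(uintptr_t)pthread_self();
}
```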
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:10:46 +09:00
88f3592ef6
Option B: Periodic TLS SLL Drain - Fix Phase 9 LRU Architecture Issue
...
Root Cause:
- TLS SLL fast path (95-99% of frees) does NOT decrement meta->used
- Slabs never appear empty → SuperSlabs never freed → LRU never used
- Impact: 6,455 mmap/munmap calls per 200K iterations (74.8% of runtime)
- Performance: -94% regression (9.38M → 563K ops/s)
Solution:
- Periodic drain every N frees (default: 1024) per size class
- Drain path: TLS SLL → slab freelist via tiny_free_local_box()
- This properly decrements meta->used and enables empty detection
Implementation:
1. core/box/tls_sll_drain_box.h - New drain box function
- tiny_tls_sll_drain(): Pop from TLS SLL, push to slab freelist
- tiny_tls_sll_try_drain(): Drain trigger with counter
- ENV: HAKMEM_TINY_SLL_DRAIN_ENABLE=1/0 (default: 1)
- ENV: HAKMEM_TINY_SLL_DRAIN_INTERVAL=N (default: 1024)
- ENV: HAKMEM_TINY_SLL_DRAIN_DEBUG=1 (debug logging)
2. core/tiny_free_fast_v2.inc.h - Integrated drain trigger
- Added drain call after successful TLS SLL push (line 145)
- Cost: 2-3 cycles per free (counter increment + comparison)
- Drain triggered every 1024 frees (0.1% overhead; see the sketch after this list)
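A minimal sketch of the trigger described in item 2 (the per-class counter name is hypothetical; the real functions live in core/box/tls_sll_drain_box.h):
```c
void tiny_tls_sll_drain(int class_idx);        /* declared for the sketch */

static __thread unsigned g_sll_free_ticks[8];  /* hypothetical counter    */

/* Runs after each successful TLS SLL push: ~2-3 cycles on the common path,
 * and a full drain every 1024th free of that class. */
static inline void tiny_tls_sll_try_drain_sketch(int class_idx) {
    if (++g_sll_free_ticks[class_idx] >= 1024u) {
        g_sll_free_ticks[class_idx] = 0;
        tiny_tls_sll_drain(class_idx);         /* TLS SLL → slab freelist */
    }
}
```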
Expected Impact:
- mmap/munmap: 6,455 → ~100 calls (-96-97%)
- Throughput: 563K → 8-10M ops/s (+1,300-1,700%)
- LRU utilization: 0% → >90% (functional)
Reference: PHASE9_LRU_ARCHITECTURE_ISSUE.md
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:09:18 +09:00