
7 Commits

8b67718bf2 Fix C7 TLS SLL corruption: Protect next pointer from user data overwrites
## Root Cause
C7 (1024B allocations, 2048B stride) was using offset=1 for freelist next
pointers, storing them at `base[1..8]`. Since the user pointer is `base+1`, user
writes could overwrite the next pointer area, corrupting the TLS SLL freelist.

## The Bug Sequence
1. Block freed → TLS SLL push stores next at `base[1..8]`
2. Block allocated → User gets `base+1`, can modify `base[1..2047]`
3. User writes data → Overwrites `base[1..8]` (next pointer area!)
4. Block freed again → tiny_next_load() reads garbage from `base[1..8]`
5. TLS SLL head becomes invalid (0xfe, 0xdb, 0x58, etc.)
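Below is a minimal C sketch of the overlap behind steps 1-4 for a single C7 block. It copies the next pointer with memcpy because `base+1` is unaligned. `next_store` and `next_load` are hypothetical helpers, not the real tiny_next_load()/store API. The sketch shows only one thing: ordinary user writes through `base+1` land on the same bytes that hold the offset-1 next pointer. The 0xfe fill matches one of the invalid head values listed in step 5.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers (not the hakmem API): a freelist next pointer kept
 * at base + off, copied with memcpy since base + 1 is not pointer-aligned. */
static void next_store(uint8_t *base, size_t off, void *next) {
    memcpy(base + off, &next, sizeof next);
}

static void *next_load(const uint8_t *base, size_t off) {
    void *next;
    memcpy(&next, base + off, sizeof next);
    return next;
}

/* One 2048-byte C7 block with offset 1: the user region starts at base + 1,
 * exactly where the next pointer was written. */
static void *c7_offset1_sequence(uint8_t block[2048], void *sentinel) {
    next_store(block, 1, sentinel);   /* freelist push: next in base[1..8] */
    uint8_t *user = block + 1;        /* allocation hands out base + 1 */
    memset(user, 0xfe, 16);           /* ordinary user writes */
    return next_load(block, 1);       /* a later load sees user bytes */
}
```

After the 0xfe fill, every byte of the loaded next pointer is 0xfe, which is the kind of head value the pop path then rejects.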

## Why This Was Reverted
Previous fix (C7 offset=0) was reverted with comment:
  "C7も header を保持して class 判別を壊さないことを優先"
  (Prioritize preserving C7 header to avoid breaking class identification)

This reasoning was FLAWED because:
- Header IS restored during allocation (HAK_RET_ALLOC), not freelist ops
- Class identification at free time reads from ptr-1 = base[0] (after restoration)
- During freelist, header CAN be sacrificed (not visible to user)
- The revert CREATED the corruption by moving the next pointer back into base[1..8], which the user can write

## Fix Applied

### 1. Revert C7 offset to 0 (tiny_nextptr.h:54)
```c
// BEFORE (BROKEN):
return (class_idx == 0) ? 0u : 1u;

// AFTER (FIXED):
return (class_idx == 0 || class_idx == 7) ? 0u : 1u;
```

### 2. Remove C7 header restoration in freelist (tiny_nextptr.h:84)
```c
// BEFORE (BROKEN):
if (class_idx != 0) {  // Restores header for all classes including C7

// AFTER (FIXED):
if (class_idx != 0 && class_idx != 7) {  // Only C1-C6 restore headers
```
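The two changes together can be sketched as below. This is an illustration, not the contents of tiny_nextptr.h. `next_off`, `next_store` and `next_load` are hypothetical names. The header byte encoding `0xa0 | class_idx` is an assumption inferred from the 0xa7 value that C7 uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offset rule after the fix: C0 and C7 keep next at base[0]. */
static inline size_t next_off(int class_idx) {
    return (class_idx == 0 || class_idx == 7) ? 0u : 1u;
}

/* Hypothetical freelist store: C1-C6 restore the header byte at base[0]
 * (assumed encoding 0xa0 | class_idx) and put next at base[1..8];
 * C0 and C7 write next starting at base[0] and skip the header. */
static void next_store(uint8_t *base, int class_idx, void *next) {
    assert(class_idx >= 0 && class_idx <= 7);
    if (class_idx != 0 && class_idx != 7) {
        base[0] = (uint8_t)(0xa0 | class_idx);   /* header restore, C1-C6 only */
    }
    memcpy(base + next_off(class_idx), &next, sizeof next);
}

static void *next_load(const uint8_t *base, int class_idx) {
    void *next;
    memcpy(&next, base + next_off(class_idx), sizeof next);
    return next;
}
```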

### 3. Bonus: Remove premature slab release (tls_sll_drain_box.h:182-189)
Removed `shared_pool_release_slab()` call from drain path that could cause
use-after-free when blocks from same slab remain in TLS SLL.

## Why This Fix Works

**Memory Layout** (C7 in freelist):
```
Address:     base      base+1        base+2048
            ┌────┬──────────────────────┐
Content:    │next│  (user accessible)  │
            └────┴──────────────────────┘
            8B ptr  ← USER CANNOT TOUCH base[0]
```

- **Next pointer at base[0]**: Protected from user modification ✓
- **User pointer at base+1**: User sees base[1..2047] only ✓
- **Header restored during allocation**: HAK_RET_ALLOC writes 0xa7 at base[0] ✓
- **Class ID preserved**: tiny_region_id_read_header(ptr) reads ptr-1 = base[0] ✓
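The allocation-time restoration and the free-time class lookup can be sketched as follows, again assuming the `0xa0 | class_idx` header encoding. `ret_alloc` and `read_class` are hypothetical stand-ins for HAK_RET_ALLOC and tiny_region_id_read_header(). The point of the sketch is that base[0] is rewritten before the user pointer is returned, so whatever the freelist left there is discarded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for HAK_RET_ALLOC: write the header byte at
 * base[0] (assumed 0xa0 | class_idx) and return base + 1 to the user. */
static void *ret_alloc(uint8_t *base, int class_idx) {
    assert(class_idx >= 0 && class_idx <= 7);
    base[0] = (uint8_t)(0xa0 | class_idx);
    return base + 1;
}

/* Hypothetical stand-in for tiny_region_id_read_header(): ptr - 1 is base[0].
 * Returns -1 when the byte does not look like a header. */
static int read_class(const void *ptr) {
    uint8_t h = ((const uint8_t *)ptr)[-1];
    return ((h & 0xf0) == 0xa0) ? (h & 0x0f) : -1;
}
```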

## Verification Results

### Before Fix
- **Errors**: 33 TLS_SLL_POP_INVALID per 100K iterations (0.033%)
- **Performance**: 1.8M ops/s (corruption caused slow path fallback)
- **Symptoms**: Invalid TLS SLL heads (0xfe, 0xdb, 0x58, 0x80, 0xc2, etc.)

### After Fix
- **Errors**: 0 per 200K iterations 
- **Performance**: 10.0M ops/s (+456%!) 
- **C7 direct test**: 5.5M ops/s, 100K iterations, 0 errors 

## Files Modified
- core/tiny_nextptr.h (lines 49-54, 82-84) - C7 offset=0, no header restoration
- core/box/tls_sll_drain_box.h (lines 182-189) - Remove premature slab release

## Architectural Lesson

**Design Principle**: Freelist metadata MUST be stored in memory NOT accessible to user.

| Class | Offset | Next Storage | User Access | Result |
|-------|--------|--------------|-------------|--------|
| C0 | 0 | base[0] | base[1..7] | Safe ✓ |
| C1-C6 | 1 | base[1..8] | base[1..N] | Safe (header at base[0]) ✓ |
| C7 (broken) | 1 | base[1..8] | base[1..2047] | **CORRUPTED** ✗ |
| C7 (fixed) | 0 | base[0] | base[1..2047] | Safe ✓ |

🧹 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 23:42:43 +09:00
9b0d746407 Phase 3d-B: TLS Cache Merge - Unified g_tls_sll[] structure (+12-18% expected)
Merge separate g_tls_sll_head[] and g_tls_sll_count[] arrays into unified
TinyTLSSLL struct to improve L1D cache locality. Expected performance gain:
+12-18% from keeping each class's head and count in one cache line (2 cache-line loads → 1 per operation).

Changes:
- core/hakmem_tiny.h: Add TinyTLSSLL type (16B aligned, head+count+pad)
- core/hakmem_tiny.c: Replace separate arrays with g_tls_sll[8]
- core/box/tls_sll_box.h: Update Box API (13 sites) for unified access
- Updated 32+ files: All g_tls_sll_head[i] → g_tls_sll[i].head
- Updated 32+ files: All g_tls_sll_count[i] → g_tls_sll[i].count
- core/hakmem_tiny_integrity.h: Unified canary guards
- core/box/integrity_box.c: Simplified canary validation
- Makefile: Added core/box/tiny_sizeclass_hist_box.o to link

Build: PASS (10K ops sanity test)
Warnings: Only pre-existing LTO type mismatches (unrelated)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 07:32:30 +09:00
82ba74933a Tiny Step 2: drain interval optimization (default 1024→2048)
Completed A/B testing for TLS SLL drain interval and implemented
optimal default value based on empirical results.

Changes:
- core/box/tls_sll_drain_box.h: Default drain interval 1024 → 2048
- TINY_DRAIN_INTERVAL_AB_REPORT.md: Complete A/B analysis report

Results (100K iterations):
- 256B: 7.68M ops/s (+4.9% vs baseline 7.32M)
- 128B: 8.76M ops/s (+13.6% vs baseline 7.71M)
- Syscalls: Unchanged (2410) - drain affects frontend only

Key Findings:
- Size-dependent optimal intervals discovered (128B→512, 256B→2048)
- Prioritized 256B critical path (classify_ptr 3.65% in perf profile)
- No regression observed; both classes improved

Methodology:
- ENV-only testing (no code changes during A/B)
- Tested intervals: 512, 1024 (baseline), 2048
- Workload: bench_random_mixed_hakmem
- Metrics: Throughput, syscall count (strace -c)
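Because the A/B runs changed only the environment, the interval lookup reduces to something like the sketch below. The variable name HAKMEM_TINY_SLL_DRAIN_INTERVAL comes from the drain box. The parsing rules and the function names are assumptions: empty, malformed or non-positive values fall back to 2048.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TINY_DRAIN_INTERVAL_DEFAULT 2048L   /* default chosen by this commit */

/* Parse an interval string; fall back to the default on missing,
 * malformed, or non-positive input. */
static long parse_drain_interval(const char *s) {
    if (s == NULL || *s == '\0') return TINY_DRAIN_INTERVAL_DEFAULT;
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v <= 0) return TINY_DRAIN_INTERVAL_DEFAULT;
    return v;
}

static long drain_interval_from_env(void) {
    return parse_drain_interval(getenv("HAKMEM_TINY_SLL_DRAIN_INTERVAL"));
}
```

With this shape, an A/B variant only needs `HAKMEM_TINY_SLL_DRAIN_INTERVAL=512` (or 1024) set when running bench_random_mixed_hakmem. No rebuild is required.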

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 17:41:26 +09:00
dd613bc93a Drain optimization: Drain ALL blocks to maximize empty detection
Issue:
- Previous drain: only 32 blocks/trigger → slabs partially empty
- Shared pool SuperSlabs mix multiple classes (C0-C7)
- active_slabs only reaches 0 when ALL classes empty
- Result: superslab_free() rarely called, LRU cache unused

Fix:
- Change drain batch_size: 32 → 0 (drain all available)
- Added active_slabs logging in shared_pool_release_slab
- Maximizes chance of SuperSlab becoming completely empty

Performance Impact (ws=4096, 200K iterations):
- Before (batch=32): 5.9M ops/s
- After (batch=all): 6.1M ops/s (+3.4%)
- Baseline improvement: 563K → 6.1M ops/s (+980%!)

Known Issue:
- LRU cache still unused due to Shared Pool design
- SuperSlabs rarely become completely empty (multi-class mixing)
- Requires Shared Pool architecture optimization (Phase 12)

Next: Investigate Shared Pool optimization strategies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:55:51 +09:00
4ffdaae2fc Add empty slab detection to drain: call shared_pool_release_slab
Issue:
- Drain was detecting meta->used==0 but not releasing slabs
- Logic missing: shared_pool_release_slab() call after empty detection
- Result: SuperSlabs not freed, LRU cache not populated

Fix:
- Added shared_pool_release_slab() call when meta->used==0 (line 194)
- Mirrors logic in tiny_superslab_free.inc.h:223-236
- Empty slabs now released to shared pool
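The added empty-detection step amounts to releasing the slab when its used count reaches zero. A minimal sketch follows. `SlabMeta`, `slab_free_block` and `release_slab` are hypothetical stand-ins, the last one for shared_pool_release_slab().

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t used;   /* live blocks in the slab */
    int released;    /* demo flag set by release_slab() */
} SlabMeta;

/* Hypothetical stand-in for shared_pool_release_slab(). */
static void release_slab(SlabMeta *m) { m->released = 1; }

/* Return one drained block to its slab; release the slab once empty. */
static void slab_free_block(SlabMeta *m) {
    assert(m->used > 0);
    if (--m->used == 0) release_slab(m);
}
```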

Performance Impact (ws=4096, 200K iterations):
- Before (baseline): 563K ops/s
- After this fix: 5.9M ops/s (+950% improvement!)

Note: LRU cache still not populated (investigating next)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:13:00 +09:00
2ef28ee5ab Fix drain box compilation: Use pthread_self() directly
Issue:
- tiny_self_u32() is static inline, so it cannot be linked from the drain box
- Link error: undefined reference to 'tiny_self_u32'

Fix:
- Use pthread_self() directly like hakmem_tiny_superslab.c:917
- Added <pthread.h> include
- Changed extern declaration from size_t to const size_t
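The cause of the link failure and the shape of the workaround can be shown in a short sketch. `drain_self_u32` is a hypothetical local helper. The cast assumes pthread_t is an integer or pointer type. POSIX does not guarantee this, but glibc on Linux provides it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* A `static inline` function has internal linkage: every translation unit
 * that calls it needs its own definition, so a call from another object
 * file leaves an undefined reference. A local helper calling
 * pthread_self() directly avoids the cross-file dependency. */
static inline uint32_t drain_self_u32(void) {
    /* Assumes pthread_t converts to an integer (true on glibc/Linux). */
    return (uint32_t)(uintptr_t)pthread_self();
}
```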

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:10:46 +09:00
88f3592ef6 Option B: Periodic TLS SLL Drain - Fix Phase 9 LRU Architecture Issue
Root Cause:
- TLS SLL fast path (95-99% of frees) does NOT decrement meta->used
- Slabs never appear empty → SuperSlabs never freed → LRU never used
- Impact: 6,455 mmap/munmap calls per 200K iterations (74.8% time)
- Performance: -94% regression (9.38M → 563K ops/s)

Solution:
- Periodic drain every N frees (default: 1024) per size class
- Drain path: TLS SLL → slab freelist via tiny_free_local_box()
- This properly decrements meta->used and enables empty detection
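The per-class trigger (one increment and one compare per free) can be sketched as follows. `try_drain`, `drain_class` and the counter arrays are hypothetical names. The real entry points are tiny_tls_sll_try_drain() and tiny_tls_sll_drain(), whose signatures are not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define TINY_NUM_CLASSES 8

static _Thread_local uint32_t g_drain_counter[TINY_NUM_CLASSES];
static uint32_t g_drain_interval = 1024;          /* default in this commit */
static unsigned g_drain_calls[TINY_NUM_CLASSES];  /* demo bookkeeping only */

/* Hypothetical stand-in for the real drain, which pops the class's TLS SLL
 * and pushes each block back to its slab freelist (decrementing used). */
static void drain_class(int cls) { g_drain_calls[cls]++; }

/* Called after each successful TLS SLL push. */
static inline void try_drain(int cls) {
    if (++g_drain_counter[cls] >= g_drain_interval) {
        g_drain_counter[cls] = 0;
        drain_class(cls);
    }
}
```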

Implementation:
1. core/box/tls_sll_drain_box.h - New drain box function
   - tiny_tls_sll_drain(): Pop from TLS SLL, push to slab freelist
   - tiny_tls_sll_try_drain(): Drain trigger with counter
   - ENV: HAKMEM_TINY_SLL_DRAIN_ENABLE=1/0 (default: 1)
   - ENV: HAKMEM_TINY_SLL_DRAIN_INTERVAL=N (default: 1024)
   - ENV: HAKMEM_TINY_SLL_DRAIN_DEBUG=1 (debug logging)

2. core/tiny_free_fast_v2.inc.h - Integrated drain trigger
   - Added drain call after successful TLS SLL push (line 145)
   - Cost: 2-3 cycles per free (counter increment + comparison)
   - Drain triggered every 1024 frees (0.1% overhead)

Expected Impact:
- mmap/munmap: 6,455 → ~100 calls (~-98%)
- Throughput: 563K → 8-10M ops/s (+1,300-1,700%)
- LRU utilization: 0% → >90% (functional)

Reference: PHASE9_LRU_ARCHITECTURE_ISSUE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:09:18 +09:00