hakmem

Author	SHA1	Message	Date
Moe Charm (CI)	8b67718bf2	Fix C7 TLS SLL corruption: Protect next pointer from user data overwrites

## Root Cause

C7 (1024B allocations, 2048B stride) was using offset=1 for freelist next pointers, storing them at `base[1..8]`. Since user pointer is `base+1`, users could overwrite the next pointer area, corrupting the TLS SLL freelist.

## The Bug Sequence

1. Block freed → TLS SLL push stores next at `base[1..8]`
2. Block allocated → User gets `base+1`, can modify `base[1..2047]`
3. User writes data → Overwrites `base[1..8]` (next pointer area!)
4. Block freed again → tiny_next_load() reads garbage from `base[1..8]`
5. TLS SLL head becomes invalid (0xfe, 0xdb, 0x58, etc.)

## Why This Was Reverted

Previous fix (C7 offset=0) was reverted with comment: "C7も header を保持して class 判別を壊さないことを優先" (Prioritize preserving C7 header to avoid breaking class identification)

This reasoning was FLAWED because:
- Header IS restored during allocation (HAK_RET_ALLOC), not freelist ops
- Class identification at free time reads from ptr-1 = base[0] (after restoration)
- During freelist, header CAN be sacrificed (not visible to user)
- The revert CREATED the race condition by exposing base[1..8] to user

## Fix Applied

### 1. Revert C7 offset to 0 (tiny_nextptr.h:54)

```c
// BEFORE (BROKEN):
return (class_idx == 0) ? 0u : 1u;
// AFTER (FIXED):
return (class_idx == 0 || class_idx == 7) ? 0u : 1u;
```

### 2. Remove C7 header restoration in freelist (tiny_nextptr.h:84)

```c
// BEFORE (BROKEN):
if (class_idx != 0) { // Restores header for all classes including C7
// AFTER (FIXED):
if (class_idx != 0 && class_idx != 7) { // Only C1-C6 restore headers
```

### 3. Bonus: Remove premature slab release (tls_sll_drain_box.h:182-189)

Removed `shared_pool_release_slab()` call from drain path that could cause use-after-free when blocks from same slab remain in TLS SLL.

## Why This Fix Works

Memory Layout (C7 in freelist):

```
Address:  base  base+1                 base+2048
          ┌────┬──────────────────────┐
Content:  │next│  (user accessible)   │
          └────┴──────────────────────┘
          8B ptr ← USER CANNOT TOUCH base[0]
```

- Next pointer at base[0]: Protected from user modification ✓
- User pointer at base+1: User sees base[1..2047] only ✓
- Header restored during allocation: HAK_RET_ALLOC writes 0xa7 at base[0] ✓
- Class ID preserved: tiny_region_id_read_header(ptr) reads ptr-1 = base[0] ✓

## Verification Results

### Before Fix

- Errors: 33 TLS_SLL_POP_INVALID per 100K iterations (0.033%)
- Performance: 1.8M ops/s (corruption caused slow path fallback)
- Symptoms: Invalid TLS SLL heads (0xfe, 0xdb, 0x58, 0x80, 0xc2, etc.)

### After Fix

- Errors: 0 per 200K iterations ✅
- Performance: 10.0M ops/s (+456%!) ✅
- C7 direct test: 5.5M ops/s, 100K iterations, 0 errors ✅

## Files Modified

- core/tiny_nextptr.h (lines 49-54, 82-84) - C7 offset=0, no header restoration
- core/box/tls_sll_drain_box.h (lines 182-189) - Remove premature slab release

## Architectural Lesson

Design Principle: Freelist metadata MUST be stored in memory NOT accessible to user.

| Class | Offset | Next Storage | User Access | Result |
|-------|--------|--------------|-------------|--------|
| C0 | 0 | base[0] | base[1..7] | Safe ✓ |
| C1-C6 | 1 | base[1..8] | base[1..N] | Safe (header at base[0]) ✓ |
| C7 (broken) | 1 | base[1..8] | base[1..2047] | CORRUPTED ✗ |
| C7 (fixed) | 0 | base[0] | base[1..2047] | Safe ✓ |

🧹 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-21 23:42:43 +09:00
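The per-class offset rule from this commit can be looked at in isolation. The following is a minimal sketch, not the contents of `core/tiny_nextptr.h`: the names `sketch_next_off`, `sketch_next_store`, and `sketch_next_load` are hypothetical, and the sketch assumes 8 size classes, a 1-byte header at `base[0]`, and a pointer-sized next field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte offset of the freelist next pointer in a block.
 * Classes 0 and 7 store it at base[0]; classes 1-6 keep the 1-byte header
 * at base[0] and store the pointer immediately after it. */
static inline size_t sketch_next_off(int class_idx) {
    assert(class_idx >= 0 && class_idx < 8);
    return (class_idx == 0 || class_idx == 7) ? 0u : 1u;
}

/* Store the next pointer. memcpy is used because offset 1 is not
 * pointer-aligned, so a direct pointer store would be undefined behavior. */
static inline void sketch_next_store(uint8_t *base, int class_idx, void *next) {
    memcpy(base + sketch_next_off(class_idx), &next, sizeof next);
}

/* Load the next pointer from the same offset the store used. */
static inline void *sketch_next_load(const uint8_t *base, int class_idx) {
    void *next;
    memcpy(&next, base + sketch_next_off(class_idx), sizeof next);
    return next;
}
```

With offset 1 the header byte at `base[0]` survives a store. With offset 0 the header byte is overwritten and has to be rewritten on allocation, which is the role the commit message assigns to `HAK_RET_ALLOC`.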
Moe Charm (CI)	9b0d746407	Phase 3d-B: TLS Cache Merge - Unified g_tls_sll[] structure (+12-18% expected)

Merge separate g_tls_sll_head[] and g_tls_sll_count[] arrays into unified TinyTLSSLL struct to improve L1D cache locality. Expected performance gain: +12-18% from reducing cache line splits (2 loads → 1 load per operation).

Changes:
- core/hakmem_tiny.h: Add TinyTLSSLL type (16B aligned, head+count+pad)
- core/hakmem_tiny.c: Replace separate arrays with g_tls_sll[8]
- core/box/tls_sll_box.h: Update Box API (13 sites) for unified access
- Updated 32+ files: All g_tls_sll_head[i] → g_tls_sll[i].head
- Updated 32+ files: All g_tls_sll_count[i] → g_tls_sll[i].count
- core/hakmem_tiny_integrity.h: Unified canary guards
- core/box/integrity_box.c: Simplified canary validation
- Makefile: Added core/box/tiny_sizeclass_hist_box.o to link

Build: ✅ PASS (10K ops sanity test)
Warnings: Only pre-existing LTO type mismatches (unrelated)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 07:32:30 +09:00
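To make the "head+count+pad, 16B aligned" layout concrete, here is a rough sketch of what a merged per-class entry could look like. The field order and the names `SketchTLSSLL`, `g_sketch_sll`, `sketch_push`, and `sketch_pop` are assumptions for illustration; the real `TinyTLSSLL` definition in `core/hakmem_tiny.h` may differ. The next pointer is stored in the first word of each block for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_CLASSES 8

/* Hypothetical merged entry: head and count live in the same 16-byte slot,
 * so a push or pop touches a single cache line for both fields. */
typedef struct {
    _Alignas(16) void *head;  /* top of the singly linked freelist */
    uint32_t count;           /* number of blocks currently on the list */
    uint32_t pad;             /* explicit padding up to 16 bytes on 64-bit */
} SketchTLSSLL;

static_assert(sizeof(SketchTLSSLL) == 16, "entry should fill exactly 16 bytes");

static _Thread_local SketchTLSSLL g_sketch_sll[SKETCH_NUM_CLASSES];

/* Push a block; its first word is reused as the next pointer (simplified). */
static inline void sketch_push(int cls, void *block) {
    assert(cls >= 0 && cls < SKETCH_NUM_CLASSES);
    SketchTLSSLL *s = &g_sketch_sll[cls];
    *(void **)block = s->head;
    s->head = block;
    s->count++;
}

/* Pop the most recently pushed block, or NULL if the list is empty. */
static inline void *sketch_pop(int cls) {
    assert(cls >= 0 && cls < SKETCH_NUM_CLASSES);
    SketchTLSSLL *s = &g_sketch_sll[cls];
    void *block = s->head;
    if (block) {
        s->head = *(void **)block;
        s->count--;
    }
    return block;
}
```

Compared with two parallel arrays, the access pattern `g_sketch_sll[i].head` / `g_sketch_sll[i].count` keeps both fields of a class adjacent, which is the locality argument the commit message makes.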
Moe Charm (CI)	82ba74933a	Tiny Step 2: drain interval optimization (default 1024→2048)

Completed A/B testing for TLS SLL drain interval and implemented optimal default value based on empirical results.

Changes:
- core/box/tls_sll_drain_box.h: Default drain interval 1024 → 2048
- TINY_DRAIN_INTERVAL_AB_REPORT.md: Complete A/B analysis report

Results (100K iterations):
- 256B: 7.68M ops/s (+4.9% vs baseline 7.32M)
- 128B: 8.76M ops/s (+13.6% vs baseline 7.71M)
- Syscalls: Unchanged (2410) - drain affects frontend only

Key Findings:
- Size-dependent optimal intervals discovered (128B→512, 256B→2048)
- Prioritized 256B critical path (classify_ptr 3.65% in perf profile)
- No regression observed; both classes improved

Methodology:
- ENV-only testing (no code changes during A/B)
- Tested intervals: 512, 1024 (baseline), 2048
- Workload: bench_random_mixed_hakmem
- Metrics: Throughput, syscall count (strace -c)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-14 17:41:26 +09:00
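ENV-only A/B testing relies on the drain interval being read from the environment with a compiled-in default. A minimal sketch of that pattern follows, using the `HAKMEM_TINY_SLL_DRAIN_INTERVAL` variable named in the log and the new default of 2048. The helper names and the fallback behavior for invalid values are assumptions, not taken from `tls_sll_drain_box.h`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_DEFAULT_DRAIN_INTERVAL 2048u

static_assert(SKETCH_DEFAULT_DRAIN_INTERVAL > 0, "interval must be positive");

/* Parse an interval string. Falls back to the default when the string is
 * missing, empty, not a plain decimal number, zero, or out of range
 * (this fallback policy is an assumption of the sketch). */
static inline unsigned sketch_parse_interval(const char *s) {
    if (!s || !*s) return SKETCH_DEFAULT_DRAIN_INTERVAL;
    char *end = NULL;
    errno = 0;
    unsigned long v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v == 0 || v > 0xFFFFFFFFul)
        return SKETCH_DEFAULT_DRAIN_INTERVAL;
    return (unsigned)v;
}

/* Read the interval from the environment. */
static inline unsigned sketch_drain_interval(void) {
    return sketch_parse_interval(getenv("HAKMEM_TINY_SLL_DRAIN_INTERVAL"));
}
```

With a reader of this shape, the three tested intervals (512, 1024, 2048) can be compared by changing only the environment of the benchmark process, without rebuilding.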
Moe Charm (CI)	dd613bc93a	Drain optimization: Drain ALL blocks to maximize empty detection

Issue:
- Previous drain: only 32 blocks/trigger → slabs partially empty
- Shared pool SuperSlabs mix multiple classes (C0-C7)
- active_slabs only reaches 0 when ALL classes empty
- Result: superslab_free() rarely called, LRU cache unused

Fix:
- Change drain batch_size: 32 → 0 (drain all available)
- Added active_slabs logging in shared_pool_release_slab
- Maximizes chance of SuperSlab becoming completely empty

Performance Impact (ws=4096, 200K iterations):
- Before (batch=32): 5.9M ops/s
- After (batch=all): 6.1M ops/s (+3.4%)
- Baseline improvement: 563K → 6.1M ops/s (+980%!)

Known Issue:
- LRU cache still unused due to Shared Pool design
- SuperSlabs rarely become completely empty (multi-class mixing)
- Requires Shared Pool architecture optimization (Phase 12)

Next: Investigate Shared Pool optimization strategies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-14 07:55:51 +09:00
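The "batch_size 0 means drain everything" convention can be sketched with two plain linked lists. This is an illustration only: `SketchNode`, `SketchList`, and `sketch_drain` are hypothetical names, and the destination list stands in for the slab freelist that the real drain path reaches through `tiny_free_local_box()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SketchNode { struct SketchNode *next; } SketchNode;

typedef struct {
    SketchNode *head;
    uint32_t count;
} SketchList;

/* Move up to batch_size nodes from src to dst and return how many moved.
 * batch_size == 0 is treated as "no limit": drain everything available. */
static inline uint32_t sketch_drain(SketchList *src, SketchList *dst,
                                    uint32_t batch_size) {
    assert(src && dst);
    uint32_t limit = (batch_size == 0 || batch_size > src->count)
                         ? src->count
                         : batch_size;
    uint32_t moved = 0;
    while (moved < limit && src->head) {
        SketchNode *n = src->head;
        src->head = n->next;   /* pop from the source list */
        src->count--;
        n->next = dst->head;   /* push onto the destination list */
        dst->head = n;
        dst->count++;
        moved++;
    }
    return moved;
}
```

A bounded batch leaves blocks behind in the source list, which is why, per the commit message, slabs stayed partially empty with a batch of 32.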
Moe Charm (CI)	4ffdaae2fc	Add empty slab detection to drain: call shared_pool_release_slab

Issue:
- Drain was detecting meta->used==0 but not releasing slabs
- Logic missing: shared_pool_release_slab() call after empty detection
- Result: SuperSlabs not freed, LRU cache not populated

Fix:
- Added shared_pool_release_slab() call when meta->used==0 (line 194)
- Mirrors logic in tiny_superslab_free.inc.h:223-236
- Empty slabs now released to shared pool

Performance Impact (ws=4096, 200K iterations):
- Before (baseline): 563K ops/s
- After this fix: 5.9M ops/s (+950% improvement!)

Note: LRU cache still not populated (investigating next)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-14 07:13:00 +09:00
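The empty-detection step amounts to "decrement the live-block counter, and release the slab when it reaches zero". A small sketch under stated assumptions: `SketchSlabMeta`, `sketch_release_slab`, and `sketch_slab_free_one` are hypothetical, and the release hook only sets a flag where the real code calls `shared_pool_release_slab()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-slab metadata: only the fields this sketch needs. */
typedef struct {
    uint16_t used;      /* blocks currently handed out from this slab */
    uint16_t capacity;  /* total blocks in the slab */
    int released;       /* set once the slab has been given back */
} SketchSlabMeta;

/* Stand-in for the release call; it just records that release happened. */
static inline void sketch_release_slab(SketchSlabMeta *m) {
    m->released = 1;
}

/* Return one block to its slab. Returns 1 if this free emptied the slab
 * and triggered a release, 0 otherwise. */
static inline int sketch_slab_free_one(SketchSlabMeta *m) {
    assert(m->used > 0);
    m->used--;
    if (m->used == 0) {
        sketch_release_slab(m);
        return 1;
    }
    return 0;
}
```

Commit 8b67718bf2 in this log later removed the release call from the drain path, citing a possible use-after-free when blocks from the same slab remain in the TLS SLL, so a counter-based check like this one is only safe when no other list still references the slab.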
Moe Charm (CI)	2ef28ee5ab	Fix drain box compilation: Use pthread_self() directly

Issue:
- tiny_self_u32() is static inline, cannot be linked from drain box
- Link error: undefined reference to 'tiny_self_u32'

Fix:
- Use pthread_self() directly like hakmem_tiny_superslab.c:917
- Added <pthread.h> include
- Changed extern declaration from size_t to const size_t

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-14 07:10:46 +09:00
Moe Charm (CI)	88f3592ef6	Option B: Periodic TLS SLL Drain - Fix Phase 9 LRU Architecture Issue

Root Cause:
- TLS SLL fast path (95-99% of frees) does NOT decrement meta->used
- Slabs never appear empty → SuperSlabs never freed → LRU never used
- Impact: 6,455 mmap/munmap calls per 200K iterations (74.8% time)
- Performance: -94% regression (9.38M → 563K ops/s)

Solution:
- Periodic drain every N frees (default: 1024) per size class
- Drain path: TLS SLL → slab freelist via tiny_free_local_box()
- This properly decrements meta->used and enables empty detection

Implementation:
1. core/box/tls_sll_drain_box.h - New drain box function
   - tiny_tls_sll_drain(): Pop from TLS SLL, push to slab freelist
   - tiny_tls_sll_try_drain(): Drain trigger with counter
   - ENV: HAKMEM_TINY_SLL_DRAIN_ENABLE=1/0 (default: 1)
   - ENV: HAKMEM_TINY_SLL_DRAIN_INTERVAL=N (default: 1024)
   - ENV: HAKMEM_TINY_SLL_DRAIN_DEBUG=1 (debug logging)
2. core/tiny_free_fast_v2.inc.h - Integrated drain trigger
   - Added drain call after successful TLS SLL push (line 145)
   - Cost: 2-3 cycles per free (counter increment + comparison)
   - Drain triggered every 1024 frees (0.1% overhead)

Expected Impact:
- mmap/munmap: 6,455 → ~100 calls (-96-97%)
- Throughput: 563K → 8-10M ops/s (+1,300-1,700%)
- LRU utilization: 0% → >90% (functional)

Reference: PHASE9_LRU_ARCHITECTURE_ISSUE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-14 07:09:18 +09:00
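The "counter increment + comparison" trigger described in this commit can be sketched as a per-class counter that fires once every N frees. The names `SketchDrainState` and `sketch_try_drain` are hypothetical, the state is passed explicitly here rather than held in thread-local storage, and the actual drain work is reduced to a tally.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_CLASSES 8

/* Hypothetical per-thread drain bookkeeping. */
typedef struct {
    uint32_t interval;                        /* drain every N frees */
    uint32_t free_count[SKETCH_NUM_CLASSES];  /* frees since the last drain */
    uint32_t drains[SKETCH_NUM_CLASSES];      /* drains triggered so far */
} SketchDrainState;

/* Intended to be called after each successful freelist push.
 * The common case costs one increment and one comparison.
 * Returns 1 when this call triggered a drain, 0 otherwise. */
static inline int sketch_try_drain(SketchDrainState *st, int cls) {
    assert(cls >= 0 && cls < SKETCH_NUM_CLASSES);
    if (++st->free_count[cls] < st->interval)
        return 0;
    st->free_count[cls] = 0;
    st->drains[cls]++;  /* a real drain would move blocks to the slab freelist here */
    return 1;
}
```

With an interval of 1024, one call in 1024 takes the slow branch, which matches the roughly 0.1% figure quoted in the commit message.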

7 Commits