Files
hakmem/core/tiny_refill_opt.h

379 lines
16 KiB
C
Raw Normal View History

// tiny_refill_opt.h - Inline helpers to batch and splice refill chains
// Box: Refill Boundary optimization helpers (kept header-only)
#pragma once
#include <stdint.h>
#include <stdio.h>
#include <stdatomic.h>
#include <stdlib.h>
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
#include "tiny_region_id.h" // For HEADER_MAGIC, HEADER_CLASS_MASK (Fix #6)
#include "ptr_track.h" // Pointer tracking for debugging header corruption
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
#include "box/tiny_next_ptr_box.h" // Box API: Next pointer read/write
Phase 1: Atomic Freelist Implementation - MT Safety Foundation PROBLEM: - Larson crashes with 3+ threads (SEGV in freelist operations) - Root cause: Non-atomic TinySlabMeta.freelist access under contention - Race condition: Multiple threads pop/push freelist concurrently SOLUTION: - Made TinySlabMeta.freelist and .used _Atomic for MT safety - Created lock-free accessor API (slab_freelist_atomic.h) - Converted 5 critical hot path sites to use atomic operations IMPLEMENTATION: 1. superslab_types.h:12-13 - Made freelist and used _Atomic 2. slab_freelist_atomic.h (NEW) - Lock-free CAS operations - slab_freelist_pop_lockfree() - Atomic pop with CAS loop - slab_freelist_push_lockfree() - Atomic push (template) - Relaxed load/store for non-critical paths 3. ss_slab_meta_box.h - Box API now uses atomic accessor 4. hakmem_tiny_superslab.c - Atomic init (store_relaxed) 5. tiny_refill_opt.h - trc_pop_from_freelist() uses lock-free CAS 6. hakmem_tiny_refill_p0.inc.h - Atomic used increment + prefetch PERFORMANCE: Single-Threaded (Random Mixed 256B): Before: 25.1M ops/s (Phase 3d-C baseline) After: 16.7M ops/s (-34%, atomic overhead expected) Multi-Threaded (Larson): 1T: 47.9M ops/s ✅ 2T: 48.1M ops/s ✅ 3T: 46.5M ops/s ✅ (was SEGV before) 4T: 48.1M ops/s ✅ 8T: 48.8M ops/s ✅ (stable, no crashes) MT STABILITY: Before: SEGV at 3+ threads (100% crash rate) After: Zero crashes (100% stable at 8 threads) DESIGN: - Lock-free CAS: 6-10 cycles overhead (vs 20-30 for mutex) - Relaxed ordering: 0 cycles overhead (same as non-atomic) - Memory ordering: acquire/release for CAS, relaxed for checks - Expected regression: <3% single-threaded, +MT stability NEXT STEPS: - Phase 2: Convert 40 important sites (TLS-related freelist ops) - Phase 3: Convert 25 cleanup sites (remaining + documentation) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 02:46:57 +09:00
#include "box/slab_freelist_atomic.h" // Phase 1: Atomic freelist accessor
Cleanup: Consolidate debug ENV vars to HAKMEM_DEBUG_LEVEL Integrated 4 new debug environment variables added during bug fixes into the existing unified HAKMEM_DEBUG_LEVEL system (expanded to 0-5 levels). Changes: 1. Expanded HAKMEM_DEBUG_LEVEL from 0-3 to 0-5 levels: - 0 = OFF (production) - 1 = ERROR (critical errors) - 2 = WARN (warnings) - 3 = INFO (allocation paths, header validation, stats) - 4 = DEBUG (guard instrumentation, failfast) - 5 = TRACE (verbose tracing) 2. Integrated 4 environment variables: - HAKMEM_ALLOC_PATH_TRACE → HAKMEM_DEBUG_LEVEL >= 3 (INFO) - HAKMEM_TINY_SLL_VALIDATE_HDR → HAKMEM_DEBUG_LEVEL >= 3 (INFO) - HAKMEM_TINY_REFILL_FAILFAST → HAKMEM_DEBUG_LEVEL >= 4 (DEBUG) - HAKMEM_TINY_GUARD → HAKMEM_DEBUG_LEVEL >= 4 (DEBUG) 3. Kept 2 special-purpose variables (fine-grained control): - HAKMEM_TINY_GUARD_CLASS (target class for guard) - HAKMEM_TINY_GUARD_MAX (max guard events) 4. Backward compatibility: - Legacy ENV vars still work via hak_debug_check_level() - New code uses unified system - No behavior changes for existing users Updated files: - core/hakmem_debug_master.h (level 0-5 expansion) - core/hakmem_tiny_superslab_internal.h (alloc path trace) - core/box/tls_sll_box.h (header validation) - core/tiny_failfast.c (failfast level) - core/tiny_refill_opt.h (failfast guard) - core/hakmem_tiny_ace_guard_box.inc (guard enable) - core/hakmem_tiny.c (include hakmem_debug_master.h) Impact: - Simpler debug control: HAKMEM_DEBUG_LEVEL=3 instead of 4 separate ENVs - Easier to discover/use - Consistent debug levels across codebase - Reduces ENV variable proliferation (43+ vars surveyed) Future work: - Consolidate remaining 39+ debug variables (documented in survey) - Gradual migration over 2-3 releases 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 06:57:03 +09:00
#include "hakmem_debug_master.h" // For unified debug level control
#ifndef HAKMEM_TINY_REFILL_OPT
#define HAKMEM_TINY_REFILL_OPT 1
#endif
// Local chain structure (head/tail pointers)
typedef struct TinyRefillChain {
void* head;
void* tail;
uint32_t count;
} TinyRefillChain;
static inline void trc_init(TinyRefillChain* c) {
c->head = NULL; c->tail = NULL; c->count = 0;
}
static inline void refill_opt_dbg(const char* stage, int class_idx, uint32_t n) {
#if HAKMEM_TINY_REFILL_OPT && !HAKMEM_BUILD_RELEASE
static int en = -1;
static _Atomic int printed = 0;
if (__builtin_expect(en == -1, 0)) {
const char* e = getenv("HAKMEM_TINY_REFILL_OPT_DEBUG");
en = (e && *e && *e != '0') ? 1 : 0;
}
if (!en) return;
int exp = 0;
if (atomic_compare_exchange_strong(&printed, &exp, 1)) {
fprintf(stderr, "[REFILL_OPT] stage=%s cls=%d n=%u\n", stage ? stage : "(null)", class_idx, (unsigned)n);
fflush(stderr);
}
#else
(void)stage; (void)class_idx; (void)n;
#endif
}
// Phase 7 header-aware push_front: link using base+1 for C0-C6 (C7 not used here)
static inline void trc_push_front(TinyRefillChain* c, void* node, int class_idx) {
if (c->head == NULL) {
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
c->head = node; c->tail = node; tiny_next_write(class_idx, node, NULL); c->count = 1;
} else {
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
tiny_next_write(class_idx, node, c->head); c->head = node; c->count++;
}
}
// Forward declaration of guard function
static inline int trc_refill_guard_enabled(void);
// Forward declare Box TLS-SLL API
#include "box/tls_sll_box.h"
// Splice local chain into TLS SLL using Box TLS-SLL API (C7-safe)
static inline void trc_splice_to_sll(int class_idx, TinyRefillChain* c,
void** sll_head, uint32_t* sll_count) {
if (!c || c->head == NULL) return;
// CORRUPTION DEBUG: Log chain splice (alignment check removed - false positive)
// NOTE: Blocks are stride-aligned from slab base, not absolutely aligned
// A slab at 0x1000 with 513B blocks is valid: 0x1000, 0x1201, 0x1402, etc.
if (__builtin_expect(trc_refill_guard_enabled(), 0)) {
fprintf(stderr, "[SPLICE_TO_SLL] cls=%d head=%p tail=%p count=%u\n",
class_idx, c->head, c->tail, c->count);
}
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
// DEBUG: Validate chain is properly NULL-terminated BEFORE splicing
static _Atomic uint64_t g_splice_count = 0;
uint64_t splice_num = atomic_fetch_add(&g_splice_count, 1);
if (splice_num > 40 && splice_num < 80 && class_idx == 0) {
fprintf(stderr, "[SPLICE_DEBUG] splice=%lu cls=%d head=%p tail=%p count=%u\n",
splice_num, class_idx, c->head, c->tail, c->count);
// Walk chain to verify NULL termination
void* cursor = c->head;
uint32_t walked = 0;
while (cursor && walked < c->count + 5) {
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
void* next = tiny_next_read(class_idx, cursor);
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
fprintf(stderr, "[SPLICE_WALK] node=%p next=%p walked=%u/%u\n",
cursor, next, walked, c->count);
if (walked == c->count - 1 && next != NULL) {
fprintf(stderr, "[SPLICE_ERROR] Tail not NULL-terminated! tail=%p next=%p\n",
cursor, next);
abort();
}
cursor = next;
walked++;
}
fflush(stderr);
}
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
// 🐛 DEBUG: Log splice call BEFORE calling tls_sll_splice()
#if !HAKMEM_BUILD_RELEASE
{
static _Atomic uint64_t g_splice_call_count = 0;
uint64_t call_num = atomic_fetch_add(&g_splice_call_count, 1);
if (call_num < 10) { // Log first 10 calls
fprintf(stderr, "[TRC_SPLICE #%lu] BEFORE: cls=%d count=%u sll_count_before=%u\n",
call_num, class_idx, c->count, g_tls_sll[class_idx].count);
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
fflush(stderr);
}
}
#endif
// CRITICAL: Use Box TLS-SLL API for splice (C7-safe, no race)
// Note: tls_sll_splice() requires capacity parameter (use large value for refill)
uint32_t moved = tls_sll_splice(class_idx, c->head, c->count, 4096);
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
// 🐛 DEBUG: Log splice result AFTER calling tls_sll_splice()
#if !HAKMEM_BUILD_RELEASE
{
static _Atomic uint64_t g_splice_result_count = 0;
uint64_t result_num = atomic_fetch_add(&g_splice_result_count, 1);
if (result_num < 10) { // Log first 10 results
fprintf(stderr, "[TRC_SPLICE #%lu] AFTER: cls=%d moved=%u/%u sll_count_after=%u\n",
result_num, class_idx, moved, c->count, g_tls_sll[class_idx].count);
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
fflush(stderr);
}
}
#endif
// Update sll_count if provided (Box API already updated g_tls_sll internally)
// Note: sll_count parameter is typically &g_tls_sll[class_idx].count, already updated
(void)sll_count; // Suppress unused warning
(void)sll_head; // Suppress unused warning
Fix: TLS SLL double-free diagnostics - Add error handling and detection improvements Problem: workset=8192 crashes at 240K iterations with TLS SLL double-free: [TLS_SLL_PUSH] FATAL double-free: cls=5 ptr=... already in SLL Investigation (Task agent): Identified 8 tls_sll_push() call sites and 3 high-risk areas: 1. HIGH: Carve-Push Rollback pop failures (carve_push_box.c) 2. MEDIUM: Splice partial orphaned nodes (tiny_refill_opt.h) 3. MEDIUM: Incomplete double-free scan - only 64 nodes (tls_sll_box.h) Fixes Applied: 1. core/box/carve_push_box.c (Lines 115-139) - Track pop_failed count during rollback - Log orphaned blocks: [BOX_CARVE_PUSH_ROLLBACK] warning - Helps identify when rollback leaves blocks in SLL 2. core/box/tls_sll_box.h (Lines 347-370) - Increase double-free scan: 64 → 256 nodes - Add scanned count to error: (scanned=%u/%u) - Catches orphaned blocks deeper in chain 3. core/tiny_refill_opt.h (Lines 135-166) - Enhanced splice partial logging - Abort in debug builds on orphaned nodes - Prevents silent memory leaks Test Results: Before: SEGV at 220K iterations After: SEGV at 240K iterations (improved detection) [TLS_SLL_PUSH] FATAL double-free: cls=5 ptr=... (scanned=2/71) Impact: ✅ Early detection working (catches at position 2) ✅ Diagnostic capability greatly improved ⚠️ Root cause not yet resolved (deeper investigation needed) Status: Diagnostic improvements committed for further analysis Credit: Root cause analysis by Task agent (Explore) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 08:43:18 +09:00
// If splice was partial, warn about orphaned nodes
// Note: Orphaned nodes remain in chain but are not in SLL.
// Caller must handle cleanup if needed (typically by returning to freelist).
if (__builtin_expect(moved < c->count, 0)) {
Fix: TLS SLL double-free diagnostics - Add error handling and detection improvements Problem: workset=8192 crashes at 240K iterations with TLS SLL double-free: [TLS_SLL_PUSH] FATAL double-free: cls=5 ptr=... already in SLL Investigation (Task agent): Identified 8 tls_sll_push() call sites and 3 high-risk areas: 1. HIGH: Carve-Push Rollback pop failures (carve_push_box.c) 2. MEDIUM: Splice partial orphaned nodes (tiny_refill_opt.h) 3. MEDIUM: Incomplete double-free scan - only 64 nodes (tls_sll_box.h) Fixes Applied: 1. core/box/carve_push_box.c (Lines 115-139) - Track pop_failed count during rollback - Log orphaned blocks: [BOX_CARVE_PUSH_ROLLBACK] warning - Helps identify when rollback leaves blocks in SLL 2. core/box/tls_sll_box.h (Lines 347-370) - Increase double-free scan: 64 → 256 nodes - Add scanned count to error: (scanned=%u/%u) - Catches orphaned blocks deeper in chain 3. core/tiny_refill_opt.h (Lines 135-166) - Enhanced splice partial logging - Abort in debug builds on orphaned nodes - Prevents silent memory leaks Test Results: Before: SEGV at 220K iterations After: SEGV at 240K iterations (improved detection) [TLS_SLL_PUSH] FATAL double-free: cls=5 ptr=... (scanned=2/71) Impact: ✅ Early detection working (catches at position 2) ✅ Diagnostic capability greatly improved ⚠️ Root cause not yet resolved (deeper investigation needed) Status: Diagnostic improvements committed for further analysis Credit: Root cause analysis by Task agent (Explore) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 08:43:18 +09:00
uint32_t orphaned = c->count - moved;
fprintf(stderr, "[SPLICE_PARTIAL] CRITICAL: Only moved %u/%u blocks (cls=%d)\n",
moved, c->count, class_idx);
fprintf(stderr, "[SPLICE_PARTIAL] %u blocks orphaned - potential memory leak!\n",
orphaned);
fprintf(stderr, "[SPLICE_PARTIAL] Orphan chain starts at node %u in original chain\n",
moved);
// Log orphan chain head for debugging
void* scan = c->head;
for (uint32_t i = 0; i < moved && scan; i++) {
void* next = tiny_next_read(class_idx, scan);
if (i == moved - 1) {
void* orphan_head = tiny_next_read(class_idx, scan);
fprintf(stderr, "[SPLICE_PARTIAL] Orphan chain head: %p (after node %u)\n",
orphan_head, i);
break;
}
scan = next;
}
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
fflush(stderr);
Fix: TLS SLL double-free diagnostics - Add error handling and detection improvements Problem: workset=8192 crashes at 240K iterations with TLS SLL double-free: [TLS_SLL_PUSH] FATAL double-free: cls=5 ptr=... already in SLL Investigation (Task agent): Identified 8 tls_sll_push() call sites and 3 high-risk areas: 1. HIGH: Carve-Push Rollback pop failures (carve_push_box.c) 2. MEDIUM: Splice partial orphaned nodes (tiny_refill_opt.h) 3. MEDIUM: Incomplete double-free scan - only 64 nodes (tls_sll_box.h) Fixes Applied: 1. core/box/carve_push_box.c (Lines 115-139) - Track pop_failed count during rollback - Log orphaned blocks: [BOX_CARVE_PUSH_ROLLBACK] warning - Helps identify when rollback leaves blocks in SLL 2. core/box/tls_sll_box.h (Lines 347-370) - Increase double-free scan: 64 → 256 nodes - Add scanned count to error: (scanned=%u/%u) - Catches orphaned blocks deeper in chain 3. core/tiny_refill_opt.h (Lines 135-166) - Enhanced splice partial logging - Abort in debug builds on orphaned nodes - Prevents silent memory leaks Test Results: Before: SEGV at 220K iterations After: SEGV at 240K iterations (improved detection) [TLS_SLL_PUSH] FATAL double-free: cls=5 ptr=... (scanned=2/71) Impact: ✅ Early detection working (catches at position 2) ✅ Diagnostic capability greatly improved ⚠️ Root cause not yet resolved (deeper investigation needed) Status: Diagnostic improvements committed for further analysis Credit: Root cause analysis by Task agent (Explore) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 08:43:18 +09:00
// In debug builds, consider aborting on orphaned nodes
#if !HAKMEM_BUILD_RELEASE
fprintf(stderr, "[SPLICE_PARTIAL] Aborting to prevent memory corruption\n");
abort();
#endif
}
}
Phase 6-2.3~6-2.5: Critical bug fixes + SuperSlab optimization (WIP) ## Phase 6-2.3: Fix 4T Larson crash (active counter bug) ✅ **Problem:** 4T Larson crashed with "free(): invalid pointer", OOM errors **Root cause:** core/hakmem_tiny_refill_p0.inc.h:103 - P0 batch refill moved freelist blocks to TLS cache - Active counter NOT incremented → double-decrement on free - Counter underflows → SuperSlab appears full → OOM → crash **Fix:** Added ss_active_add(tls->ss, from_freelist); **Result:** 4T stable at 838K ops/s ✅ ## Phase 6-2.4: Fix SEGV in random_mixed/mid_large_mt benchmarks ✅ **Problem:** bench_random_mixed_hakmem, bench_mid_large_mt_hakmem → immediate SEGV **Root cause #1:** core/box/hak_free_api.inc.h:92-95 - "Guess loop" dereferenced unmapped memory when registry lookup failed **Root cause #2:** core/box/hak_free_api.inc.h:115 - Header magic check dereferenced unmapped memory **Fix:** 1. Removed dangerous guess loop (lines 92-95) 2. Added hak_is_memory_readable() check before dereferencing header (core/hakmem_internal.h:277-294 - uses mincore() syscall) **Result:** - random_mixed (2KB): SEGV → 2.22M ops/s ✅ - random_mixed (4KB): SEGV → 2.58M ops/s ✅ - Larson 4T: no regression (838K ops/s) ✅ ## Phase 6-2.5: Performance investigation + SuperSlab fix (WIP) ⚠️ **Problem:** Severe performance gaps (19-26x slower than system malloc) **Investigation:** Task agent identified root cause - hak_is_memory_readable() syscall overhead (100-300 cycles per free) - ALL frees hit unmapped_header_fallback path - SuperSlab lookup NEVER called - Why? g_use_superslab = 0 (disabled by diet mode) **Root cause:** core/hakmem_tiny_init.inc:104-105 - Diet mode (default ON) disables SuperSlab - SuperSlab defaults to 1 (hakmem_config.c:334) - BUT diet mode overrides it to 0 during init **Fix:** Separate SuperSlab from diet mode - SuperSlab: Performance-critical (fast alloc/free) - Diet mode: Memory efficiency (magazine capacity limits only) - Both are independent features, should not interfere **Status:** ⚠️ INCOMPLETE - New SEGV discovered after fix - SuperSlab lookup now works (confirmed via debug output) - But benchmark crashes (Exit 139) after ~20 lookups - Needs further investigation **Files modified:** - core/hakmem_tiny_init.inc:99-109 - Removed diet mode override - PERFORMANCE_INVESTIGATION_REPORT.md - Task agent analysis (303x instruction gap) **Next steps:** - Investigate new SEGV (likely SuperSlab free path bug) - OR: Revert Phase 6-2.5 changes if blocking progress 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 20:31:01 +09:00
static inline int trc_refill_guard_enabled(void) {
// FIX: Allow runtime override even in release builds for debugging
Cleanup: Consolidate debug ENV vars to HAKMEM_DEBUG_LEVEL Integrated 4 new debug environment variables added during bug fixes into the existing unified HAKMEM_DEBUG_LEVEL system (expanded to 0-5 levels). Changes: 1. Expanded HAKMEM_DEBUG_LEVEL from 0-3 to 0-5 levels: - 0 = OFF (production) - 1 = ERROR (critical errors) - 2 = WARN (warnings) - 3 = INFO (allocation paths, header validation, stats) - 4 = DEBUG (guard instrumentation, failfast) - 5 = TRACE (verbose tracing) 2. Integrated 4 environment variables: - HAKMEM_ALLOC_PATH_TRACE → HAKMEM_DEBUG_LEVEL >= 3 (INFO) - HAKMEM_TINY_SLL_VALIDATE_HDR → HAKMEM_DEBUG_LEVEL >= 3 (INFO) - HAKMEM_TINY_REFILL_FAILFAST → HAKMEM_DEBUG_LEVEL >= 4 (DEBUG) - HAKMEM_TINY_GUARD → HAKMEM_DEBUG_LEVEL >= 4 (DEBUG) 3. Kept 2 special-purpose variables (fine-grained control): - HAKMEM_TINY_GUARD_CLASS (target class for guard) - HAKMEM_TINY_GUARD_MAX (max guard events) 4. Backward compatibility: - Legacy ENV vars still work via hak_debug_check_level() - New code uses unified system - No behavior changes for existing users Updated files: - core/hakmem_debug_master.h (level 0-5 expansion) - core/hakmem_tiny_superslab_internal.h (alloc path trace) - core/box/tls_sll_box.h (header validation) - core/tiny_failfast.c (failfast level) - core/tiny_refill_opt.h (failfast guard) - core/hakmem_tiny_ace_guard_box.inc (guard enable) - core/hakmem_tiny.c (include hakmem_debug_master.h) Impact: - Simpler debug control: HAKMEM_DEBUG_LEVEL=3 instead of 4 separate ENVs - Easier to discover/use - Consistent debug levels across codebase - Reduces ENV variable proliferation (43+ vars surveyed) Future work: - Consolidate remaining 39+ debug variables (documented in survey) - Gradual migration over 2-3 releases 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 06:57:03 +09:00
// Enabled via HAKMEM_DEBUG_LEVEL >= 4 (DEBUG level)
// Legacy: HAKMEM_TINY_REFILL_FAILFAST still works for compatibility
Phase 6-2.3~6-2.5: Critical bug fixes + SuperSlab optimization (WIP) ## Phase 6-2.3: Fix 4T Larson crash (active counter bug) ✅ **Problem:** 4T Larson crashed with "free(): invalid pointer", OOM errors **Root cause:** core/hakmem_tiny_refill_p0.inc.h:103 - P0 batch refill moved freelist blocks to TLS cache - Active counter NOT incremented → double-decrement on free - Counter underflows → SuperSlab appears full → OOM → crash **Fix:** Added ss_active_add(tls->ss, from_freelist); **Result:** 4T stable at 838K ops/s ✅ ## Phase 6-2.4: Fix SEGV in random_mixed/mid_large_mt benchmarks ✅ **Problem:** bench_random_mixed_hakmem, bench_mid_large_mt_hakmem → immediate SEGV **Root cause #1:** core/box/hak_free_api.inc.h:92-95 - "Guess loop" dereferenced unmapped memory when registry lookup failed **Root cause #2:** core/box/hak_free_api.inc.h:115 - Header magic check dereferenced unmapped memory **Fix:** 1. Removed dangerous guess loop (lines 92-95) 2. Added hak_is_memory_readable() check before dereferencing header (core/hakmem_internal.h:277-294 - uses mincore() syscall) **Result:** - random_mixed (2KB): SEGV → 2.22M ops/s ✅ - random_mixed (4KB): SEGV → 2.58M ops/s ✅ - Larson 4T: no regression (838K ops/s) ✅ ## Phase 6-2.5: Performance investigation + SuperSlab fix (WIP) ⚠️ **Problem:** Severe performance gaps (19-26x slower than system malloc) **Investigation:** Task agent identified root cause - hak_is_memory_readable() syscall overhead (100-300 cycles per free) - ALL frees hit unmapped_header_fallback path - SuperSlab lookup NEVER called - Why? g_use_superslab = 0 (disabled by diet mode) **Root cause:** core/hakmem_tiny_init.inc:104-105 - Diet mode (default ON) disables SuperSlab - SuperSlab defaults to 1 (hakmem_config.c:334) - BUT diet mode overrides it to 0 during init **Fix:** Separate SuperSlab from diet mode - SuperSlab: Performance-critical (fast alloc/free) - Diet mode: Memory efficiency (magazine capacity limits only) - Both are independent features, should not interfere **Status:** ⚠️ INCOMPLETE - New SEGV discovered after fix - SuperSlab lookup now works (confirmed via debug output) - But benchmark crashes (Exit 139) after ~20 lookups - Needs further investigation **Files modified:** - core/hakmem_tiny_init.inc:99-109 - Removed diet mode override - PERFORMANCE_INVESTIGATION_REPORT.md - Task agent analysis (303x instruction gap) **Next steps:** - Investigate new SEGV (likely SuperSlab free path bug) - OR: Revert Phase 6-2.5 changes if blocking progress 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 20:31:01 +09:00
static int g_trc_guard = -1;
if (__builtin_expect(g_trc_guard == -1, 0)) {
#if HAKMEM_BUILD_RELEASE
Cleanup: Consolidate debug ENV vars to HAKMEM_DEBUG_LEVEL Integrated 4 new debug environment variables added during bug fixes into the existing unified HAKMEM_DEBUG_LEVEL system (expanded to 0-5 levels). Changes: 1. Expanded HAKMEM_DEBUG_LEVEL from 0-3 to 0-5 levels: - 0 = OFF (production) - 1 = ERROR (critical errors) - 2 = WARN (warnings) - 3 = INFO (allocation paths, header validation, stats) - 4 = DEBUG (guard instrumentation, failfast) - 5 = TRACE (verbose tracing) 2. Integrated 4 environment variables: - HAKMEM_ALLOC_PATH_TRACE → HAKMEM_DEBUG_LEVEL >= 3 (INFO) - HAKMEM_TINY_SLL_VALIDATE_HDR → HAKMEM_DEBUG_LEVEL >= 3 (INFO) - HAKMEM_TINY_REFILL_FAILFAST → HAKMEM_DEBUG_LEVEL >= 4 (DEBUG) - HAKMEM_TINY_GUARD → HAKMEM_DEBUG_LEVEL >= 4 (DEBUG) 3. Kept 2 special-purpose variables (fine-grained control): - HAKMEM_TINY_GUARD_CLASS (target class for guard) - HAKMEM_TINY_GUARD_MAX (max guard events) 4. Backward compatibility: - Legacy ENV vars still work via hak_debug_check_level() - New code uses unified system - No behavior changes for existing users Updated files: - core/hakmem_debug_master.h (level 0-5 expansion) - core/hakmem_tiny_superslab_internal.h (alloc path trace) - core/box/tls_sll_box.h (header validation) - core/tiny_failfast.c (failfast level) - core/tiny_refill_opt.h (failfast guard) - core/hakmem_tiny_ace_guard_box.inc (guard enable) - core/hakmem_tiny.c (include hakmem_debug_master.h) Impact: - Simpler debug control: HAKMEM_DEBUG_LEVEL=3 instead of 4 separate ENVs - Easier to discover/use - Consistent debug levels across codebase - Reduces ENV variable proliferation (43+ vars surveyed) Future work: - Consolidate remaining 39+ debug variables (documented in survey) - Gradual migration over 2-3 releases 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 06:57:03 +09:00
// Release: Use unified debug level
g_trc_guard = hak_debug_check_level("HAKMEM_TINY_REFILL_FAILFAST", 4);
#else
Cleanup: Consolidate debug ENV vars to HAKMEM_DEBUG_LEVEL Integrated 4 new debug environment variables added during bug fixes into the existing unified HAKMEM_DEBUG_LEVEL system (expanded to 0-5 levels). Changes: 1. Expanded HAKMEM_DEBUG_LEVEL from 0-3 to 0-5 levels: - 0 = OFF (production) - 1 = ERROR (critical errors) - 2 = WARN (warnings) - 3 = INFO (allocation paths, header validation, stats) - 4 = DEBUG (guard instrumentation, failfast) - 5 = TRACE (verbose tracing) 2. Integrated 4 environment variables: - HAKMEM_ALLOC_PATH_TRACE → HAKMEM_DEBUG_LEVEL >= 3 (INFO) - HAKMEM_TINY_SLL_VALIDATE_HDR → HAKMEM_DEBUG_LEVEL >= 3 (INFO) - HAKMEM_TINY_REFILL_FAILFAST → HAKMEM_DEBUG_LEVEL >= 4 (DEBUG) - HAKMEM_TINY_GUARD → HAKMEM_DEBUG_LEVEL >= 4 (DEBUG) 3. Kept 2 special-purpose variables (fine-grained control): - HAKMEM_TINY_GUARD_CLASS (target class for guard) - HAKMEM_TINY_GUARD_MAX (max guard events) 4. Backward compatibility: - Legacy ENV vars still work via hak_debug_check_level() - New code uses unified system - No behavior changes for existing users Updated files: - core/hakmem_debug_master.h (level 0-5 expansion) - core/hakmem_tiny_superslab_internal.h (alloc path trace) - core/box/tls_sll_box.h (header validation) - core/tiny_failfast.c (failfast level) - core/tiny_refill_opt.h (failfast guard) - core/hakmem_tiny_ace_guard_box.inc (guard enable) - core/hakmem_tiny.c (include hakmem_debug_master.h) Impact: - Simpler debug control: HAKMEM_DEBUG_LEVEL=3 instead of 4 separate ENVs - Easier to discover/use - Consistent debug levels across codebase - Reduces ENV variable proliferation (43+ vars surveyed) Future work: - Consolidate remaining 39+ debug variables (documented in survey) - Gradual migration over 2-3 releases 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 06:57:03 +09:00
// Debug: Default ON
g_trc_guard = 1;
#endif
#if !HAKMEM_BUILD_RELEASE
Cleanup: Consolidate debug ENV vars to HAKMEM_DEBUG_LEVEL Integrated 4 new debug environment variables added during bug fixes into the existing unified HAKMEM_DEBUG_LEVEL system (expanded to 0-5 levels). Changes: 1. Expanded HAKMEM_DEBUG_LEVEL from 0-3 to 0-5 levels: - 0 = OFF (production) - 1 = ERROR (critical errors) - 2 = WARN (warnings) - 3 = INFO (allocation paths, header validation, stats) - 4 = DEBUG (guard instrumentation, failfast) - 5 = TRACE (verbose tracing) 2. Integrated 4 environment variables: - HAKMEM_ALLOC_PATH_TRACE → HAKMEM_DEBUG_LEVEL >= 3 (INFO) - HAKMEM_TINY_SLL_VALIDATE_HDR → HAKMEM_DEBUG_LEVEL >= 3 (INFO) - HAKMEM_TINY_REFILL_FAILFAST → HAKMEM_DEBUG_LEVEL >= 4 (DEBUG) - HAKMEM_TINY_GUARD → HAKMEM_DEBUG_LEVEL >= 4 (DEBUG) 3. Kept 2 special-purpose variables (fine-grained control): - HAKMEM_TINY_GUARD_CLASS (target class for guard) - HAKMEM_TINY_GUARD_MAX (max guard events) 4. Backward compatibility: - Legacy ENV vars still work via hak_debug_check_level() - New code uses unified system - No behavior changes for existing users Updated files: - core/hakmem_debug_master.h (level 0-5 expansion) - core/hakmem_tiny_superslab_internal.h (alloc path trace) - core/box/tls_sll_box.h (header validation) - core/tiny_failfast.c (failfast level) - core/tiny_refill_opt.h (failfast guard) - core/hakmem_tiny_ace_guard_box.inc (guard enable) - core/hakmem_tiny.c (include hakmem_debug_master.h) Impact: - Simpler debug control: HAKMEM_DEBUG_LEVEL=3 instead of 4 separate ENVs - Easier to discover/use - Consistent debug levels across codebase - Reduces ENV variable proliferation (43+ vars surveyed) Future work: - Consolidate remaining 39+ debug variables (documented in survey) - Gradual migration over 2-3 releases 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 06:57:03 +09:00
fprintf(stderr, "[TRC_GUARD] failfast=%d mode=%s\n",
g_trc_guard, HAKMEM_BUILD_RELEASE ? "release" : "debug");
Phase 6-2.3~6-2.5: Critical bug fixes + SuperSlab optimization (WIP) ## Phase 6-2.3: Fix 4T Larson crash (active counter bug) ✅ **Problem:** 4T Larson crashed with "free(): invalid pointer", OOM errors **Root cause:** core/hakmem_tiny_refill_p0.inc.h:103 - P0 batch refill moved freelist blocks to TLS cache - Active counter NOT incremented → double-decrement on free - Counter underflows → SuperSlab appears full → OOM → crash **Fix:** Added ss_active_add(tls->ss, from_freelist); **Result:** 4T stable at 838K ops/s ✅ ## Phase 6-2.4: Fix SEGV in random_mixed/mid_large_mt benchmarks ✅ **Problem:** bench_random_mixed_hakmem, bench_mid_large_mt_hakmem → immediate SEGV **Root cause #1:** core/box/hak_free_api.inc.h:92-95 - "Guess loop" dereferenced unmapped memory when registry lookup failed **Root cause #2:** core/box/hak_free_api.inc.h:115 - Header magic check dereferenced unmapped memory **Fix:** 1. Removed dangerous guess loop (lines 92-95) 2. Added hak_is_memory_readable() check before dereferencing header (core/hakmem_internal.h:277-294 - uses mincore() syscall) **Result:** - random_mixed (2KB): SEGV → 2.22M ops/s ✅ - random_mixed (4KB): SEGV → 2.58M ops/s ✅ - Larson 4T: no regression (838K ops/s) ✅ ## Phase 6-2.5: Performance investigation + SuperSlab fix (WIP) ⚠️ **Problem:** Severe performance gaps (19-26x slower than system malloc) **Investigation:** Task agent identified root cause - hak_is_memory_readable() syscall overhead (100-300 cycles per free) - ALL frees hit unmapped_header_fallback path - SuperSlab lookup NEVER called - Why? g_use_superslab = 0 (disabled by diet mode) **Root cause:** core/hakmem_tiny_init.inc:104-105 - Diet mode (default ON) disables SuperSlab - SuperSlab defaults to 1 (hakmem_config.c:334) - BUT diet mode overrides it to 0 during init **Fix:** Separate SuperSlab from diet mode - SuperSlab: Performance-critical (fast alloc/free) - Diet mode: Memory efficiency (magazine capacity limits only) - Both are independent features, should not interfere **Status:** ⚠️ INCOMPLETE - New SEGV discovered after fix - SuperSlab lookup now works (confirmed via debug output) - But benchmark crashes (Exit 139) after ~20 lookups - Needs further investigation **Files modified:** - core/hakmem_tiny_init.inc:99-109 - Removed diet mode override - PERFORMANCE_INVESTIGATION_REPORT.md - Task agent analysis (303x instruction gap) **Next steps:** - Investigate new SEGV (likely SuperSlab free path bug) - OR: Revert Phase 6-2.5 changes if blocking progress 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 20:31:01 +09:00
fflush(stderr);
#endif
Phase 6-2.3~6-2.5: Critical bug fixes + SuperSlab optimization (WIP) ## Phase 6-2.3: Fix 4T Larson crash (active counter bug) ✅ **Problem:** 4T Larson crashed with "free(): invalid pointer", OOM errors **Root cause:** core/hakmem_tiny_refill_p0.inc.h:103 - P0 batch refill moved freelist blocks to TLS cache - Active counter NOT incremented → double-decrement on free - Counter underflows → SuperSlab appears full → OOM → crash **Fix:** Added ss_active_add(tls->ss, from_freelist); **Result:** 4T stable at 838K ops/s ✅ ## Phase 6-2.4: Fix SEGV in random_mixed/mid_large_mt benchmarks ✅ **Problem:** bench_random_mixed_hakmem, bench_mid_large_mt_hakmem → immediate SEGV **Root cause #1:** core/box/hak_free_api.inc.h:92-95 - "Guess loop" dereferenced unmapped memory when registry lookup failed **Root cause #2:** core/box/hak_free_api.inc.h:115 - Header magic check dereferenced unmapped memory **Fix:** 1. Removed dangerous guess loop (lines 92-95) 2. Added hak_is_memory_readable() check before dereferencing header (core/hakmem_internal.h:277-294 - uses mincore() syscall) **Result:** - random_mixed (2KB): SEGV → 2.22M ops/s ✅ - random_mixed (4KB): SEGV → 2.58M ops/s ✅ - Larson 4T: no regression (838K ops/s) ✅ ## Phase 6-2.5: Performance investigation + SuperSlab fix (WIP) ⚠️ **Problem:** Severe performance gaps (19-26x slower than system malloc) **Investigation:** Task agent identified root cause - hak_is_memory_readable() syscall overhead (100-300 cycles per free) - ALL frees hit unmapped_header_fallback path - SuperSlab lookup NEVER called - Why? g_use_superslab = 0 (disabled by diet mode) **Root cause:** core/hakmem_tiny_init.inc:104-105 - Diet mode (default ON) disables SuperSlab - SuperSlab defaults to 1 (hakmem_config.c:334) - BUT diet mode overrides it to 0 during init **Fix:** Separate SuperSlab from diet mode - SuperSlab: Performance-critical (fast alloc/free) - Diet mode: Memory efficiency (magazine capacity limits only) - Both are independent features, should not interfere **Status:** ⚠️ INCOMPLETE - New SEGV discovered after fix - SuperSlab lookup now works (confirmed via debug output) - But benchmark crashes (Exit 139) after ~20 lookups - Needs further investigation **Files modified:** - core/hakmem_tiny_init.inc:99-109 - Removed diet mode override - PERFORMANCE_INVESTIGATION_REPORT.md - Task agent analysis (303x instruction gap) **Next steps:** - Investigate new SEGV (likely SuperSlab free path bug) - OR: Revert Phase 6-2.5 changes if blocking progress 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 20:31:01 +09:00
}
return g_trc_guard;
}
static inline int trc_ptr_is_valid(uintptr_t base, uintptr_t limit, size_t blk, const void* node) {
if (!node || limit <= base) return 1;
uintptr_t addr = (uintptr_t)node;
if (addr < base || addr >= limit) return 0;
if (blk == 0) return 1;
return ((addr - base) % blk) == 0;
}
static inline void trc_failfast_abort(const char* stage,
int class_idx,
uintptr_t base,
uintptr_t limit,
const void* node) {
fprintf(stderr,
"[TRC_FAILFAST] stage=%s cls=%d node=%p base=%p limit=%p\n",
stage ? stage : "(null)",
class_idx,
node,
(void*)base,
(void*)limit);
fflush(stderr);
abort();
}
// Pop up to 'want' nodes from freelist into local chain
static inline uint32_t trc_pop_from_freelist(struct TinySlabMeta* meta,
Phase 6-2.3~6-2.5: Critical bug fixes + SuperSlab optimization (WIP) ## Phase 6-2.3: Fix 4T Larson crash (active counter bug) ✅ **Problem:** 4T Larson crashed with "free(): invalid pointer", OOM errors **Root cause:** core/hakmem_tiny_refill_p0.inc.h:103 - P0 batch refill moved freelist blocks to TLS cache - Active counter NOT incremented → double-decrement on free - Counter underflows → SuperSlab appears full → OOM → crash **Fix:** Added ss_active_add(tls->ss, from_freelist); **Result:** 4T stable at 838K ops/s ✅ ## Phase 6-2.4: Fix SEGV in random_mixed/mid_large_mt benchmarks ✅ **Problem:** bench_random_mixed_hakmem, bench_mid_large_mt_hakmem → immediate SEGV **Root cause #1:** core/box/hak_free_api.inc.h:92-95 - "Guess loop" dereferenced unmapped memory when registry lookup failed **Root cause #2:** core/box/hak_free_api.inc.h:115 - Header magic check dereferenced unmapped memory **Fix:** 1. Removed dangerous guess loop (lines 92-95) 2. Added hak_is_memory_readable() check before dereferencing header (core/hakmem_internal.h:277-294 - uses mincore() syscall) **Result:** - random_mixed (2KB): SEGV → 2.22M ops/s ✅ - random_mixed (4KB): SEGV → 2.58M ops/s ✅ - Larson 4T: no regression (838K ops/s) ✅ ## Phase 6-2.5: Performance investigation + SuperSlab fix (WIP) ⚠️ **Problem:** Severe performance gaps (19-26x slower than system malloc) **Investigation:** Task agent identified root cause - hak_is_memory_readable() syscall overhead (100-300 cycles per free) - ALL frees hit unmapped_header_fallback path - SuperSlab lookup NEVER called - Why? g_use_superslab = 0 (disabled by diet mode) **Root cause:** core/hakmem_tiny_init.inc:104-105 - Diet mode (default ON) disables SuperSlab - SuperSlab defaults to 1 (hakmem_config.c:334) - BUT diet mode overrides it to 0 during init **Fix:** Separate SuperSlab from diet mode - SuperSlab: Performance-critical (fast alloc/free) - Diet mode: Memory efficiency (magazine capacity limits only) - Both are independent features, should not interfere **Status:** ⚠️ INCOMPLETE - New SEGV discovered after fix - SuperSlab lookup now works (confirmed via debug output) - But benchmark crashes (Exit 139) after ~20 lookups - Needs further investigation **Files modified:** - core/hakmem_tiny_init.inc:99-109 - Removed diet mode override - PERFORMANCE_INVESTIGATION_REPORT.md - Task agent analysis (303x instruction gap) **Next steps:** - Investigate new SEGV (likely SuperSlab free path bug) - OR: Revert Phase 6-2.5 changes if blocking progress 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 20:31:01 +09:00
int class_idx,
uintptr_t ss_base,
uintptr_t ss_limit,
size_t block_size,
uint32_t want,
TinyRefillChain* out) {
if (!out || want == 0) return 0;
trc_init(out);
uint32_t taken = 0;
Phase 1: Atomic Freelist Implementation - MT Safety Foundation PROBLEM: - Larson crashes with 3+ threads (SEGV in freelist operations) - Root cause: Non-atomic TinySlabMeta.freelist access under contention - Race condition: Multiple threads pop/push freelist concurrently SOLUTION: - Made TinySlabMeta.freelist and .used _Atomic for MT safety - Created lock-free accessor API (slab_freelist_atomic.h) - Converted 5 critical hot path sites to use atomic operations IMPLEMENTATION: 1. superslab_types.h:12-13 - Made freelist and used _Atomic 2. slab_freelist_atomic.h (NEW) - Lock-free CAS operations - slab_freelist_pop_lockfree() - Atomic pop with CAS loop - slab_freelist_push_lockfree() - Atomic push (template) - Relaxed load/store for non-critical paths 3. ss_slab_meta_box.h - Box API now uses atomic accessor 4. hakmem_tiny_superslab.c - Atomic init (store_relaxed) 5. tiny_refill_opt.h - trc_pop_from_freelist() uses lock-free CAS 6. hakmem_tiny_refill_p0.inc.h - Atomic used increment + prefetch PERFORMANCE: Single-Threaded (Random Mixed 256B): Before: 25.1M ops/s (Phase 3d-C baseline) After: 16.7M ops/s (-34%, atomic overhead expected) Multi-Threaded (Larson): 1T: 47.9M ops/s ✅ 2T: 48.1M ops/s ✅ 3T: 46.5M ops/s ✅ (was SEGV before) 4T: 48.1M ops/s ✅ 8T: 48.8M ops/s ✅ (stable, no crashes) MT STABILITY: Before: SEGV at 3+ threads (100% crash rate) After: Zero crashes (100% stable at 8 threads) DESIGN: - Lock-free CAS: 6-10 cycles overhead (vs 20-30 for mutex) - Relaxed ordering: 0 cycles overhead (same as non-atomic) - Memory ordering: acquire/release for CAS, relaxed for checks - Expected regression: <3% single-threaded, +MT stability NEXT STEPS: - Phase 2: Convert 40 important sites (TLS-related freelist ops) - Phase 3: Convert 25 cleanup sites (remaining + documentation) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 02:46:57 +09:00
// Phase 1: Use lock-free atomic POP (MT-safe)
while (taken < want) {
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) break; // Freelist empty or CAS race lost
Phase 6-2.3~6-2.5: Critical bug fixes + SuperSlab optimization (WIP) ## Phase 6-2.3: Fix 4T Larson crash (active counter bug) ✅ **Problem:** 4T Larson crashed with "free(): invalid pointer", OOM errors **Root cause:** core/hakmem_tiny_refill_p0.inc.h:103 - P0 batch refill moved freelist blocks to TLS cache - Active counter NOT incremented → double-decrement on free - Counter underflows → SuperSlab appears full → OOM → crash **Fix:** Added ss_active_add(tls->ss, from_freelist); **Result:** 4T stable at 838K ops/s ✅ ## Phase 6-2.4: Fix SEGV in random_mixed/mid_large_mt benchmarks ✅ **Problem:** bench_random_mixed_hakmem, bench_mid_large_mt_hakmem → immediate SEGV **Root cause #1:** core/box/hak_free_api.inc.h:92-95 - "Guess loop" dereferenced unmapped memory when registry lookup failed **Root cause #2:** core/box/hak_free_api.inc.h:115 - Header magic check dereferenced unmapped memory **Fix:** 1. Removed dangerous guess loop (lines 92-95) 2. Added hak_is_memory_readable() check before dereferencing header (core/hakmem_internal.h:277-294 - uses mincore() syscall) **Result:** - random_mixed (2KB): SEGV → 2.22M ops/s ✅ - random_mixed (4KB): SEGV → 2.58M ops/s ✅ - Larson 4T: no regression (838K ops/s) ✅ ## Phase 6-2.5: Performance investigation + SuperSlab fix (WIP) ⚠️ **Problem:** Severe performance gaps (19-26x slower than system malloc) **Investigation:** Task agent identified root cause - hak_is_memory_readable() syscall overhead (100-300 cycles per free) - ALL frees hit unmapped_header_fallback path - SuperSlab lookup NEVER called - Why? g_use_superslab = 0 (disabled by diet mode) **Root cause:** core/hakmem_tiny_init.inc:104-105 - Diet mode (default ON) disables SuperSlab - SuperSlab defaults to 1 (hakmem_config.c:334) - BUT diet mode overrides it to 0 during init **Fix:** Separate SuperSlab from diet mode - SuperSlab: Performance-critical (fast alloc/free) - Diet mode: Memory efficiency (magazine capacity limits only) - Both are independent features, should not interfere **Status:** ⚠️ INCOMPLETE - New SEGV discovered after fix - SuperSlab lookup now works (confirmed via debug output) - But benchmark crashes (Exit 139) after ~20 lookups - Needs further investigation **Files modified:** - core/hakmem_tiny_init.inc:99-109 - Removed diet mode override - PERFORMANCE_INVESTIGATION_REPORT.md - Task agent analysis (303x instruction gap) **Next steps:** - Investigate new SEGV (likely SuperSlab free path bug) - OR: Revert Phase 6-2.5 changes if blocking progress 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 20:31:01 +09:00
if (__builtin_expect(trc_refill_guard_enabled() &&
!trc_ptr_is_valid(ss_base, ss_limit, block_size, p),
0)) {
fprintf(stderr, "[FREELIST_CORRUPT] Reading freelist head: p=%p (ss_base=%p ss_limit=%p blk=%zu)\n",
p, (void*)ss_base, (void*)ss_limit, block_size);
fprintf(stderr, "[FREELIST_CORRUPT] Head pointer is corrupted (invalid range/alignment)\n");
Phase 6-2.3~6-2.5: Critical bug fixes + SuperSlab optimization (WIP) ## Phase 6-2.3: Fix 4T Larson crash (active counter bug) ✅ **Problem:** 4T Larson crashed with "free(): invalid pointer", OOM errors **Root cause:** core/hakmem_tiny_refill_p0.inc.h:103 - P0 batch refill moved freelist blocks to TLS cache - Active counter NOT incremented → double-decrement on free - Counter underflows → SuperSlab appears full → OOM → crash **Fix:** Added ss_active_add(tls->ss, from_freelist); **Result:** 4T stable at 838K ops/s ✅ ## Phase 6-2.4: Fix SEGV in random_mixed/mid_large_mt benchmarks ✅ **Problem:** bench_random_mixed_hakmem, bench_mid_large_mt_hakmem → immediate SEGV **Root cause #1:** core/box/hak_free_api.inc.h:92-95 - "Guess loop" dereferenced unmapped memory when registry lookup failed **Root cause #2:** core/box/hak_free_api.inc.h:115 - Header magic check dereferenced unmapped memory **Fix:** 1. Removed dangerous guess loop (lines 92-95) 2. Added hak_is_memory_readable() check before dereferencing header (core/hakmem_internal.h:277-294 - uses mincore() syscall) **Result:** - random_mixed (2KB): SEGV → 2.22M ops/s ✅ - random_mixed (4KB): SEGV → 2.58M ops/s ✅ - Larson 4T: no regression (838K ops/s) ✅ ## Phase 6-2.5: Performance investigation + SuperSlab fix (WIP) ⚠️ **Problem:** Severe performance gaps (19-26x slower than system malloc) **Investigation:** Task agent identified root cause - hak_is_memory_readable() syscall overhead (100-300 cycles per free) - ALL frees hit unmapped_header_fallback path - SuperSlab lookup NEVER called - Why? g_use_superslab = 0 (disabled by diet mode) **Root cause:** core/hakmem_tiny_init.inc:104-105 - Diet mode (default ON) disables SuperSlab - SuperSlab defaults to 1 (hakmem_config.c:334) - BUT diet mode overrides it to 0 during init **Fix:** Separate SuperSlab from diet mode - SuperSlab: Performance-critical (fast alloc/free) - Diet mode: Memory efficiency (magazine capacity limits only) - Both are independent features, should not interfere **Status:** ⚠️ INCOMPLETE - New SEGV discovered after fix - SuperSlab lookup now works (confirmed via debug output) - But benchmark crashes (Exit 139) after ~20 lookups - Needs further investigation **Files modified:** - core/hakmem_tiny_init.inc:99-109 - Removed diet mode override - PERFORMANCE_INVESTIGATION_REPORT.md - Task agent analysis (303x instruction gap) **Next steps:** - Investigate new SEGV (likely SuperSlab free path bug) - OR: Revert Phase 6-2.5 changes if blocking progress 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 20:31:01 +09:00
trc_failfast_abort("freelist_head", class_idx, ss_base, ss_limit, p);
}
Phase 1: Atomic Freelist Implementation - MT Safety Foundation PROBLEM: - Larson crashes with 3+ threads (SEGV in freelist operations) - Root cause: Non-atomic TinySlabMeta.freelist access under contention - Race condition: Multiple threads pop/push freelist concurrently SOLUTION: - Made TinySlabMeta.freelist and .used _Atomic for MT safety - Created lock-free accessor API (slab_freelist_atomic.h) - Converted 5 critical hot path sites to use atomic operations IMPLEMENTATION: 1. superslab_types.h:12-13 - Made freelist and used _Atomic 2. slab_freelist_atomic.h (NEW) - Lock-free CAS operations - slab_freelist_pop_lockfree() - Atomic pop with CAS loop - slab_freelist_push_lockfree() - Atomic push (template) - Relaxed load/store for non-critical paths 3. ss_slab_meta_box.h - Box API now uses atomic accessor 4. hakmem_tiny_superslab.c - Atomic init (store_relaxed) 5. tiny_refill_opt.h - trc_pop_from_freelist() uses lock-free CAS 6. hakmem_tiny_refill_p0.inc.h - Atomic used increment + prefetch PERFORMANCE: Single-Threaded (Random Mixed 256B): Before: 25.1M ops/s (Phase 3d-C baseline) After: 16.7M ops/s (-34%, atomic overhead expected) Multi-Threaded (Larson): 1T: 47.9M ops/s ✅ 2T: 48.1M ops/s ✅ 3T: 46.5M ops/s ✅ (was SEGV before) 4T: 48.1M ops/s ✅ 8T: 48.8M ops/s ✅ (stable, no crashes) MT STABILITY: Before: SEGV at 3+ threads (100% crash rate) After: Zero crashes (100% stable at 8 threads) DESIGN: - Lock-free CAS: 6-10 cycles overhead (vs 20-30 for mutex) - Relaxed ordering: 0 cycles overhead (same as non-atomic) - Memory ordering: acquire/release for CAS, relaxed for checks - Expected regression: <3% single-threaded, +MT stability NEXT STEPS: - Phase 2: Convert 40 important sites (TLS-related freelist ops) - Phase 3: Convert 25 cleanup sites (remaining + documentation) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 02:46:57 +09:00
// Phase 1: slab_freelist_pop_lockfree() already unlinked the node internally
// No need to manually update meta->freelist (already done atomically)
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
// Phase E1-CORRECT: Restore header BEFORE trc_push_front
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
// ROOT CAUSE: Freelist stores next at base (offset 0), overwriting header.
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
// trc_push_front() uses offset=1 for ALL classes, expecting header at base.
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
// Without restoration, offset=1 contains garbage → chain corruption → SEGV!
//
// SOLUTION: Restore header AFTER reading freelist next, BEFORE chain push.
// Cost: 1 byte write per freelist block (~1-2 cycles, negligible).
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
// ALL classes (C0-C7) need header restoration!
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
#if HAKMEM_TINY_HEADER_CLASSIDX
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
// DEBUG: Log header restoration for class 2
uint8_t before = *(uint8_t*)p;
PTR_TRACK_FREELIST_POP(p, class_idx);
*(uint8_t*)p = HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK);
PTR_TRACK_HEADER_WRITE(p, HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
static _Atomic uint64_t g_freelist_count_c2 = 0;
if (class_idx == 2) {
uint64_t fl_num = atomic_fetch_add(&g_freelist_count_c2, 1);
if (fl_num < 100) { // Log first 100 freelist pops
extern _Atomic uint64_t malloc_count;
uint64_t call_num = atomic_load(&malloc_count);
fprintf(stderr, "[FREELIST_HEADER_RESTORE] fl#%lu call=%lu cls=%d ptr=%p before=0x%02x after=0x%02x\n",
fl_num, call_num, class_idx, p, before, HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
fflush(stderr);
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
}
}
#endif
trc_push_front(out, p, class_idx);
taken++;
}
// DEBUG REMOVED: refill_opt_dbg causes -26% regression (atomic CAS overhead)
return taken;
}
// Carve a contiguous batch of size 'batch' from linear area, return as chain
// Phase 7 header-aware carve: link chain using header-safe next location
// class_idx is required to decide headerless (C7) vs headered (C0-C6)
static inline uint32_t trc_linear_carve(uint8_t* base, size_t bs,
struct TinySlabMeta* meta,
uint32_t batch,
int class_idx,
TinyRefillChain* out) {
if (!out || batch == 0) return 0;
trc_init(out);
// FIX: Use carved (monotonic) instead of used (decrements on free)
// CORRUPTION DEBUG: Validate capacity before carving
if (__builtin_expect(trc_refill_guard_enabled(), 0)) {
if (meta->carved + batch > meta->capacity) {
fprintf(stderr, "[LINEAR_CARVE_CORRUPT] Carving beyond capacity!\n");
fprintf(stderr, "[LINEAR_CARVE_CORRUPT] carved=%u batch=%u capacity=%u (would be %u)\n",
meta->carved, batch, meta->capacity, meta->carved + batch);
fprintf(stderr, "[LINEAR_CARVE_CORRUPT] base=%p bs=%zu\n", (void*)base, bs);
abort();
}
}
// FIX: Use carved counter (monotonic) instead of used (which decrements on free)
// Caller passes bs as the effective stride already (includes header when enabled)
size_t stride = bs;
uint8_t* cursor = base + ((size_t)meta->carved * stride);
void* head = (void*)cursor;
// CORRUPTION DEBUG: Log carve operation
if (__builtin_expect(trc_refill_guard_enabled(), 0)) {
fprintf(stderr, "[LINEAR_CARVE] base=%p carved=%u batch=%u cursor=%p\n",
(void*)base, meta->carved, batch, (void*)cursor);
}
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
// Phase E1-CORRECT: Write headers to carved blocks BEFORE linking
// ALL classes (C0-C7) have 1-byte headers now
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
// ROOT CAUSE: tls_sll_splice() checks byte 0 for header magic to determine
// next_offset. Without headers, it finds 0x00 and uses next_offset=0 (WRONG!),
// reading garbage pointers from wrong offset, causing SEGV.
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
// SOLUTION: Write headers to ALL carved blocks (including C7) so splice detection works correctly.
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
#if HAKMEM_TINY_HEADER_CLASSIDX
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
// Write headers to all batch blocks (ALL classes C0-C7)
static _Atomic uint64_t g_carve_count = 0;
for (uint32_t i = 0; i < batch; i++) {
uint8_t* block = cursor + (i * stride);
PTR_TRACK_CARVE((void*)block, class_idx);
*block = HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK);
PTR_TRACK_HEADER_WRITE((void*)block, HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
2025-11-17 05:29:08 +09:00
#if !HAKMEM_BUILD_RELEASE
// ✅ Option C: Class 2 inline logs - CARVE operation (DEBUG ONLY)
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
if (class_idx == 2) {
uint64_t carve_id = atomic_fetch_add(&g_carve_count, 1);
extern _Atomic uint64_t malloc_count;
uint64_t call = atomic_load(&malloc_count);
fprintf(stderr, "[C2_CARVE] ptr=%p header=0xa2 batch_idx=%u/%u carve_id=%lu call=%lu\n",
(void*)block, i+1, batch, carve_id, call);
fflush(stderr);
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
}
2025-11-17 05:29:08 +09:00
#endif
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
}
#endif
// CRITICAL FIX (Phase 7): header-aware next pointer placement
// For header classes (C0-C6), the first byte at base is the 1-byte header.
// Store the SLL next pointer at base+1 to avoid clobbering the header.
// For C7 (headerless), store at base.
for (uint32_t i = 1; i < batch; i++) {
uint8_t* next = cursor + stride;
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
tiny_next_write(class_idx, (void*)cursor, (void*)next);
cursor = next;
}
void* tail = (void*)cursor;
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
// ✅ FIX #2: NULL-terminate the tail to prevent garbage pointer traversal
// ROOT CAUSE: Without this, tail's next pointer contains GARBAGE from previous
// allocation, causing SEGV when TLS SLL is traversed (crash at iteration 38,985).
// The loop above only links blocks 0→1, 1→2, ..., (batch-2)→(batch-1).
// It does NOT write to tail's next pointer, leaving stale data!
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
tiny_next_write(class_idx, tail, NULL);
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
// Debug: validate first link
#if !HAKMEM_BUILD_RELEASE
if (batch >= 2) {
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
void* first_next = tiny_next_read(class_idx, head);
fprintf(stderr, "[LINEAR_LINK] cls=%d head=%p next=%p tail=%p\n",
class_idx, head, first_next, tail);
} else {
Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00
fprintf(stderr, "[LINEAR_LINK] cls=%d head=%p next=%p tail=%p\n",
class_idx, head, (void*)0, tail);
}
#endif
// FIX: Update both carved (monotonic) and used (active count)
meta->carved += batch;
Phase 1: Atomic Freelist Implementation - MT Safety Foundation PROBLEM: - Larson crashes with 3+ threads (SEGV in freelist operations) - Root cause: Non-atomic TinySlabMeta.freelist access under contention - Race condition: Multiple threads pop/push freelist concurrently SOLUTION: - Made TinySlabMeta.freelist and .used _Atomic for MT safety - Created lock-free accessor API (slab_freelist_atomic.h) - Converted 5 critical hot path sites to use atomic operations IMPLEMENTATION: 1. superslab_types.h:12-13 - Made freelist and used _Atomic 2. slab_freelist_atomic.h (NEW) - Lock-free CAS operations - slab_freelist_pop_lockfree() - Atomic pop with CAS loop - slab_freelist_push_lockfree() - Atomic push (template) - Relaxed load/store for non-critical paths 3. ss_slab_meta_box.h - Box API now uses atomic accessor 4. hakmem_tiny_superslab.c - Atomic init (store_relaxed) 5. tiny_refill_opt.h - trc_pop_from_freelist() uses lock-free CAS 6. hakmem_tiny_refill_p0.inc.h - Atomic used increment + prefetch PERFORMANCE: Single-Threaded (Random Mixed 256B): Before: 25.1M ops/s (Phase 3d-C baseline) After: 16.7M ops/s (-34%, atomic overhead expected) Multi-Threaded (Larson): 1T: 47.9M ops/s ✅ 2T: 48.1M ops/s ✅ 3T: 46.5M ops/s ✅ (was SEGV before) 4T: 48.1M ops/s ✅ 8T: 48.8M ops/s ✅ (stable, no crashes) MT STABILITY: Before: SEGV at 3+ threads (100% crash rate) After: Zero crashes (100% stable at 8 threads) DESIGN: - Lock-free CAS: 6-10 cycles overhead (vs 20-30 for mutex) - Relaxed ordering: 0 cycles overhead (same as non-atomic) - Memory ordering: acquire/release for CAS, relaxed for checks - Expected regression: <3% single-threaded, +MT stability NEXT STEPS: - Phase 2: Convert 40 important sites (TLS-related freelist ops) - Phase 3: Convert 25 cleanup sites (remaining + documentation) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 02:46:57 +09:00
// Phase 1: Atomic increment for MT safety
atomic_fetch_add_explicit(&meta->used, batch, memory_order_relaxed);
out->head = head;
out->tail = tail;
out->count = batch;
// DEBUG REMOVED: refill_opt_dbg causes -26% regression (atomic CAS overhead)
return batch;
}