Fix #16: Resolve double BASE→USER conversion causing header corruption

🎯 ROOT CAUSE: Internal allocation helpers were converting BASE → USER
pointers prematurely, before returning to the caller. The caller then applied
HAK_RET_ALLOC/tiny_region_id_write_header, which performed ANOTHER BASE→USER
conversion. The result was a double offset (BASE+2) and a header written at
the wrong location.
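
A minimal sketch of the failure mode, assuming a 1-byte header at BASE so that USER = BASE + 1 (consistent with the `+ 1` in the removed code). The names `write_header_and_convert`, `helper_buggy`, `helper_fixed`, and the `0xA0` header encoding are hypothetical stand-ins, not the actual hakmem functions or header format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 1-byte header sits at BASE, so USER = BASE + 1. */
enum { HEADER_SIZE = 1 };

/* Hypothetical stand-in for the boundary function: treats its argument as
 * BASE, writes the header there, and returns the USER pointer. */
static void *write_header_and_convert(void *base, uint8_t class_idx) {
    uint8_t *p = (uint8_t *)base;
    p[0] = (uint8_t)(0xA0u | class_idx); /* hypothetical header encoding */
    return p + HEADER_SIZE;
}

/* Pre-fix helper: already converts BASE -> USER before returning. */
static void *helper_buggy(void *base) { return (uint8_t *)base + HEADER_SIZE; }

/* Post-fix helper: returns BASE untouched. */
static void *helper_fixed(void *base) { return base; }

/* Offset of the pointer handed to the user, relative to the true BASE. */
static ptrdiff_t alloc_offset(uint8_t *block, void *(*helper)(void *),
                              uint8_t class_idx) {
    void *user = write_header_and_convert(helper(block), class_idx);
    return (uint8_t *)user - block;
}

static void demo_double_conversion(void) {
    uint8_t a[16] = {0}, b[16] = {0};
    /* Buggy path: header lands at BASE+1, user pointer at BASE+2. */
    assert(alloc_offset(a, helper_buggy, 2) == 2);
    assert(a[0] == 0 && a[1] == 0xA2);
    /* Fixed path: header at BASE, user pointer at BASE+1. */
    assert(alloc_offset(b, helper_fixed, 2) == 1);
    assert(b[0] == 0xA2);
}
```

In the buggy path the byte at BASE is never written, so a later pop that validates the header at BASE would see stale data.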

📦 BOX THEORY SOLUTION: Establish a clean pointer-conversion boundary at
tiny_region_id_write_header, making it the single source of truth for
BASE → USER conversion.
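
One way such a boundary could be made hard to violate is to give BASE and USER pointers distinct types, so a second conversion fails to compile. This is an illustrative sketch only, not what this commit does: `base_ptr_t`, `user_ptr_t`, `pop_block`, `region_write_header`, and `user_to_base` are hypothetical names, and the 1-byte header is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper types: distinct structs, so a BASE pointer cannot be
 * passed where a USER pointer is expected (or vice versa) without an error. */
typedef struct { uint8_t *p; } base_ptr_t;
typedef struct { uint8_t *p; } user_ptr_t;

enum { HEADER_SIZE = 1 }; /* assumption: 1-byte header at BASE */

/* Internal helper: hands out a block as BASE, never as USER. */
static base_ptr_t pop_block(uint8_t *storage) {
    base_ptr_t b = { storage };
    return b;
}

/* The single place where BASE becomes USER: write the header, then offset. */
static user_ptr_t region_write_header(base_ptr_t base, uint8_t class_idx) {
    base.p[0] = class_idx; /* hypothetical header encoding */
    user_ptr_t u = { base.p + HEADER_SIZE };
    return u;
}

/* Inverse conversion for the free path. */
static base_ptr_t user_to_base(user_ptr_t user) {
    base_ptr_t b = { user.p - HEADER_SIZE };
    return b;
}

static void demo_boundary(void) {
    uint8_t storage[8] = {0};
    user_ptr_t u = region_write_header(pop_block(storage), 3);
    assert(u.p == storage + 1);            /* exactly one offset applied */
    assert(storage[0] == 3);               /* header written at BASE */
    assert(user_to_base(u).p == storage);  /* round trip recovers BASE */
}
```

With these types, calling `region_write_header` on a value that has already been converted is a type error rather than a silent BASE+2.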

🔧 CHANGES:
- Fix #16: Remove premature BASE→USER conversions (6 locations)
  * core/tiny_alloc_fast.inc.h (3 fixes)
  * core/hakmem_tiny_refill.inc.h (2 fixes)
  * core/hakmem_tiny_fastcache.inc.h (1 fix)

- Fix #12: Add header validation in tls_sll_pop (detect corruption)
- Fix #14: Defense-in-depth header restoration in tls_sll_splice
- Fix #15: USER pointer detection (for debugging)
- Fix #13: Bump window header restoration
- Fix #2, #6, #7, #8: Various header restoration & NULL termination
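
The validate-on-pop idea behind Fix #12 can be sketched roughly as follows. This is a simplified model under stated assumptions, not the real `tls_sll_pop`: the list type `sll_t`, the helpers `expected_header`, `sll_push`, `sll_pop_checked`, the `0xA0` magic, and the return-code convention are all hypothetical. It assumes a 1-byte header at BASE with the next pointer stored right after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { HEADER_SIZE = 1, HEADER_MAGIC = 0xA0 }; /* hypothetical values */

/* Simplified per-class free list holding BASE pointers. */
typedef struct { void *head; unsigned count; } sll_t;

static uint8_t expected_header(int class_idx) {
    return (uint8_t)(HEADER_MAGIC | (class_idx & 0x0F));
}

/* Push a block: restore its header, then link it in. The next pointer lives
 * just after the header; memcpy avoids an unaligned pointer store. */
static void sll_push(sll_t *l, void *base, int class_idx) {
    uint8_t *p = base;
    p[0] = expected_header(class_idx);
    memcpy(p + HEADER_SIZE, &l->head, sizeof l->head);
    l->head = base;
    l->count++;
}

/* Pop with validation: 1 = popped, 0 = list empty, -1 = header corrupted
 * (the list is left untouched so the bad block can be inspected). */
static int sll_pop_checked(sll_t *l, int class_idx, void **out) {
    uint8_t *p = l->head;
    if (p == NULL) return 0;
    if (p[0] != expected_header(class_idx)) return -1;
    memcpy(&l->head, p + HEADER_SIZE, sizeof l->head);
    l->count--;
    *out = p;
    return 1;
}

static void demo_validation(void) {
    sll_t list = { NULL, 0 };
    uint8_t blk[2][32];
    void *out = NULL;
    memset(blk, 0, sizeof blk);
    sll_push(&list, blk[0], 2);
    sll_push(&list, blk[1], 2);
    blk[1][0] = 0x00; /* simulate a clobbered header on the list head */
    assert(sll_pop_checked(&list, 2, &out) == -1);
    assert(list.count == 2);
}
```

Checking at pop time reports the corruption at the first allocation that touches the bad block, rather than at some later, unrelated crash.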

🧪 TEST RESULTS: 100% SUCCESS
- 10K-500K iterations: All passed
- 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161)
- Performance: ~630K ops/s average (stable)
- Header corruption: ZERO

📋 FIXES SUMMARY:
Fix #1-8:   Initial header restoration & chain fixes (chatgpt-san)
Fix #9-10:  USER pointer auto-fix (later disabled)
Fix #12:    Validation system (caught corruption at call 14209)
Fix #13:    Bump window header writes
Fix #14:    Splice defense-in-depth
Fix #15:    USER pointer detection (debugging tool)
Fix #16:    Double conversion fix (FINAL SOLUTION) 

🎓 LESSONS LEARNED:
1. Validation catches bugs early (Fix #12 was critical)
2. Class-specific inline logging reveals patterns (Option C)
3. Box Theory provides clean architectural boundaries
4. Multiple investigation approaches pay off (Task/chatgpt-san collaboration)

📄 DOCUMENTATION:
- P0_BUG_STATUS.md: Complete bug tracking timeline
- C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis
- FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Task Agent <task@anthropic.com>
Co-Authored-By: ChatGPT <chatgpt@openai.com>
Moe Charm (CI)
2025-11-12 10:33:57 +09:00
parent af589c7169
commit 84dbd97fe9
13 changed files with 1270 additions and 72 deletions

@@ -91,8 +91,9 @@ static inline void* tiny_class5_minirefill_take(void) {
     // Fast pop if available
     void* base = tls_list_pop_fast(tls5, 5);
     if (base) {
-        // CRITICAL FIX: Convert base -> user pointer for class 5
-        return (void*)((uint8_t*)base + 1);
+        // ✅ FIX #16: Return BASE pointer (not USER)
+        // Caller will apply HAK_RET_ALLOC which does BASE → USER conversion
+        return base;
     }
     // Robust refill via generic helper (header-aware, bounds-validated)
     return tiny_fast_refill_and_take(5, tls5);
@@ -189,6 +190,15 @@ static inline void* tiny_alloc_fast_pop(int class_idx) {
     HAK_CHECK_CLASS_IDX(class_idx, "tiny_alloc_fast_pop");
     atomic_fetch_add(&g_integrity_check_class_bounds, 1);
+    // DEBUG: Log class 2 pops (DISABLED for performance)
+    static _Atomic uint64_t g_fast_pop_count = 0;
+    uint64_t pop_call = atomic_fetch_add(&g_fast_pop_count, 1);
+    if (0 && class_idx == 2 && pop_call > 5840 && pop_call < 5900) {
+        fprintf(stderr, "[FAST_POP_C2] call=%lu cls=%d head=%p count=%u\n",
+                pop_call, class_idx, g_tls_sll_head[class_idx], g_tls_sll_count[class_idx]);
+        fflush(stderr);
+    }
     // CRITICAL: C7 (1KB) is headerless - delegate to slow path completely
     // Reason: Fast path uses SLL which stores next pointer in user data area
     // C7's headerless design is incompatible with fast path assumptions
@@ -246,9 +256,10 @@ static inline void* tiny_alloc_fast_pop(int class_idx) {
                 g_tiny_alloc_hits++;
             }
 #endif
-            // CRITICAL FIX: Convert base -> user pointer for classes 0-6
-            void* user_ptr = (class_idx == 7) ? base : (void*)((uint8_t*)base + 1);
-            return user_ptr;
+            // ✅ FIX #16: Return BASE pointer (not USER)
+            // Caller (tiny_alloc_fast) will call HAK_RET_ALLOC → tiny_region_id_write_header
+            // which does the BASE → USER conversion. Double conversion was causing corruption!
+            return base;
         }
         // SFC miss → try SLL (Layer 1)
     }
@@ -277,9 +288,10 @@ static inline void* tiny_alloc_fast_pop(int class_idx) {
                 g_tiny_alloc_hits++;
             }
 #endif
-            // CRITICAL FIX: Convert base -> user pointer for classes 0-6
-            void* user_ptr = (class_idx == 7) ? base : (void*)((uint8_t*)base + 1);
-            return user_ptr;
+            // ✅ FIX #16: Return BASE pointer (not USER)
+            // Caller (tiny_alloc_fast) will call HAK_RET_ALLOC → tiny_region_id_write_header
+            // which does the BASE → USER conversion. Double conversion was causing corruption!
+            return base;
         }
     }
@@ -535,9 +547,11 @@ static inline void* tiny_alloc_fast(size_t size) {
         abort();
     }
-    // Debug logging near crash point
-    if (call_num > 14250 && call_num < 14280) {
-        fprintf(stderr, "[TINY_ALLOC] call=%lu size=%zu class=%d\n", call_num, size, class_idx);
+    // Debug logging (DISABLED for performance)
+    if (0 && call_num > 14250 && call_num < 14280) {
+        fprintf(stderr, "[TINY_ALLOC] call=%lu size=%zu class=%d sll_head[%d]=%p count=%u\n",
+                call_num, size, class_idx, class_idx,
+                g_tls_sll_head[class_idx], g_tls_sll_count[class_idx]);
         fflush(stderr);
     }
@@ -563,12 +577,12 @@ static inline void* tiny_alloc_fast(size_t size) {
     }
     // Generic front (FastCache/SFC/SLL)
-    if (call_num > 14250 && call_num < 14280) {
+    if (0 && call_num > 14250 && call_num < 14280) {
         fprintf(stderr, "[TINY_ALLOC] call=%lu before fast_pop\n", call_num);
         fflush(stderr);
     }
     ptr = tiny_alloc_fast_pop(class_idx);
-    if (call_num > 14250 && call_num < 14280) {
+    if (0 && call_num > 14250 && call_num < 14280) {
         fprintf(stderr, "[TINY_ALLOC] call=%lu after fast_pop ptr=%p\n", call_num, ptr);
         fflush(stderr);
     }