6b86c60a20
P1.3: Add meta->active for TLS SLL tracking
...
Add active field to TinySlabMeta to track blocks currently held by
users (not in TLS SLL or freelist caches). This enables accurate
empty slab detection that accounts for TLS SLL cached blocks.
Changes:
- superslab_types.h: Add _Atomic uint16_t active field
- ss_allocation_box.c, hakmem_tiny_superslab.c: Initialize active=0
- tiny_free_fast_v2.inc.h: Decrement active on TLS SLL push
- tiny_alloc_fast.inc.h: Add tiny_active_track_alloc() helper,
increment active on TLS SLL pop (all code paths)
- ss_hot_cold_box.h: ss_is_slab_empty() uses active when enabled
All tracking is ENV-gated: HAKMEM_TINY_ACTIVE_TRACK=1 to enable.
Default is off for zero performance impact.
Invariant: active = used - tls_cached (active <= used)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 13:53:45 +09:00
dc9e650db3
Tiny Pool redesign: P0.1, P0.3, P1.1, P1.2 - Out-of-band class_idx lookup
...
This commit implements the first phase of Tiny Pool redesign based on
ChatGPT architecture review. The goal is to eliminate Header/Next pointer
conflicts by moving class_idx lookup out-of-band (to SuperSlab metadata).
## P0.1: C0(8B) class upgraded to 16B
- Size table changed: {16,32,64,128,256,512,1024,2048} (8 classes)
- LUT updated: 1..16 → class 0, 17..32 → class 1, etc.
- tiny_next_off: C0 now uses offset 1 (header preserved)
- Eliminates edge cases for 8B allocations
## P0.3: Slab reuse guard Box (tls_slab_reuse_guard_box.h)
- New Box for draining TLS SLL before slab reuse
- ENV gate: HAKMEM_TINY_SLAB_REUSE_GUARD=1
- Prevents stale pointers when slabs are recycled
- Follows Box theory: single responsibility, minimal API
## P1.1: SuperSlab class_map addition
- Added uint8_t class_map[SLABS_PER_SUPERSLAB_MAX] to SuperSlab
- Maps slab_idx → class_idx for out-of-band lookup
- Initialized to 255 (UNASSIGNED) on SuperSlab creation
- Set correctly on slab initialization in all backends
## P1.2: Free fast path uses class_map
- ENV gate: HAKMEM_TINY_USE_CLASS_MAP=1
- Free path can now get class_idx from class_map instead of Header
- Falls back to Header read if class_map returns invalid value
- Fixed Legacy Backend dynamic slab initialization bug
## Documentation added
- HAKMEM_ARCHITECTURE_OVERVIEW.md: 4-layer architecture analysis
- TLS_SLL_ARCHITECTURE_INVESTIGATION.md: Root cause analysis
- PTR_LIFECYCLE_TRACE_AND_ROOT_CAUSE_ANALYSIS.md: Pointer tracking
- TINY_REDESIGN_CHECKLIST.md: Implementation roadmap (P0-P3)
## Test results
- Baseline: 70% success rate (30% crash - pre-existing issue)
- class_map enabled: 70% success rate (same as baseline)
- Performance: ~30.5M ops/s (unchanged)
## Next steps (P1.3, P2, P3)
- P1.3: Add meta->active for accurate TLS/freelist sync
- P2: TLS SLL redesign with Box-based counting
- P3: Complete Header out-of-band migration
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 13:42:39 +09:00
d8e3971dc2
Fix cross-thread ownership check: Use bits 8-15 for owner_tid_low
...
Problem:
- TLS_SLL_PUSH_DUP crash in Larson multi-threaded benchmark
- Cross-thread frees incorrectly routed to same-thread TLS path
- Root cause: pthread_t on glibc is 256-byte aligned (TCB base)
so lower 8 bits are ALWAYS 0x00 for ALL threads
Fix:
- Change owner_tid_low from (tid & 0xFF) to ((tid >> 8) & 0xFF)
- Bits 8-15 actually vary between threads, enabling correct detection
- Applied consistently across all ownership check locations:
- superslab_inline.h: ss_owner_try_acquire/release/is_mine
- slab_handle.h: slab_try_acquire
- tiny_free_fast.inc.h: tiny_free_is_same_thread_ss
- tiny_free_fast_v2.inc.h: cross-thread detection
- tiny_superslab_free.inc.h: same-thread check
- ss_allocation_box.c: slab initialization
- hakmem_tiny_superslab.c: ownership handling
Also added:
- Address watcher debug infrastructure (tiny_region_id.h)
- Cross-thread detection in malloc_tiny_fast.h Front Gate
Test results:
- Larson 1T/2T/4T: PASS (no TLS_SLL_PUSH_DUP crash)
- random_mixed: PASS
- Performance: ~20M ops/s (regression from 48M, needs optimization)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-27 11:52:11 +09:00
43015725af
ENV cleanup: Add RELEASE guards to DEBUG ENV variables (14 vars)
...
Added compile-time guards (#if HAKMEM_BUILD_RELEASE) to eliminate
DEBUG ENV variable overhead in RELEASE builds.
Variables guarded (14 total):
- HAKMEM_TINY_TRACE_RING, HAKMEM_TINY_DUMP_RING_ATEXIT
- HAKMEM_TINY_RF_TRACE, HAKMEM_TINY_MAILBOX_TRACE
- HAKMEM_TINY_MAILBOX_TRACE_LIMIT, HAKMEM_TINY_MAILBOX_SLOWDISC
- HAKMEM_TINY_MAILBOX_SLOWDISC_PERIOD
- HAKMEM_SS_PREWARM_DEBUG, HAKMEM_SS_FREE_DEBUG
- HAKMEM_TINY_FRONT_METRICS, HAKMEM_TINY_FRONT_DUMP
- HAKMEM_TINY_COUNTERS_DUMP, HAKMEM_TINY_REFILL_DUMP
- HAKMEM_PTR_TRACE_DUMP, HAKMEM_PTR_TRACE_VERBOSE
Files modified (9 core files):
- core/tiny_debug_ring.c (ring trace/dump)
- core/box/mailbox_box.c (mailbox trace + slowdisc)
- core/tiny_refill.h (refill trace)
- core/hakmem_tiny_superslab.c (superslab debug)
- core/box/ss_allocation_box.c (allocation debug)
- core/tiny_superslab_free.inc.h (free debug)
- core/box/front_metrics_box.c (frontend metrics)
- core/hakmem_tiny_stats.c (stats dump)
- core/ptr_trace.h (pointer trace)
Bug fixes during implementation:
1. mailbox_box.c - Fixed variable scope (moved 'used' outside guard)
2. hakmem_tiny_stats.c - Fixed incomplete declarations (on1, on2)
Impact:
- Binary size: -85KB total
- bench_random_mixed_hakmem: 319K → 305K (-14K, -4.4%)
- larson_hakmem: 380K → 309K (-71K, -18.7%)
- Performance: No regression (16.9-17.9M ops/s maintained)
- Functional: All tests pass (Random Mixed + Larson)
- Behavior: DEBUG ENV vars correctly ignored in RELEASE builds
Testing:
- Build: Clean compilation (warnings only, pre-existing)
- 100K Random Mixed: 16.9-17.9M ops/s (PASS)
- 10K Larson: 25.9M ops/s (PASS)
- DEBUG ENV verification: Correctly ignored (PASS)
Result: 14 DEBUG ENV variables now have zero overhead in RELEASE builds.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-27 03:41:07 +09:00
a78224123e
Fix C0/C7 class confusion: Upgrade C7 stride to 2048B and fix meta->class_idx initialization
...
Root Cause:
1. C7 stride was 1024B, unable to serve 1024B user requests (need 1025B with header)
2. New SuperSlabs start with meta->class_idx=0 (mmap zero-init)
3. superslab_init_slab() only sets class_idx if meta->class_idx==255
4. Multiple code paths used conditional assignment (if class_idx==255), leaving C7 slabs with class_idx=0
5. This caused C7 blocks to be misidentified as C0, leading to HDR_META_MISMATCH errors
Changes:
1. Upgrade C7 stride: 1024B → 2048B (can now serve 1024B requests)
2. Update blocks_per_slab[7]: 64 → 32 (2048B stride / 64KB slab)
3. Update size-to-class LUT: entries 513-2048 now map to C7
4. Fix superslab_init_slab() fail-safe: only reinitialize if class_idx==255 (not 0)
5. Add explicit class_idx assignment in 6 initialization paths:
- tiny_superslab_alloc.inc.h: superslab_refill() after init
- hakmem_tiny_superslab.c: backend_shared after init (main path)
- ss_unified_backend_box.c: unconditional assignment
- ss_legacy_backend_box.c: explicit assignment
- superslab_expansion_box.c: explicit assignment
- ss_allocation_box.c: fail-safe condition fix
Fix P0 refill bug:
- Update obsolete array access after Phase 3d-B TLS SLL unification
- g_tls_sll_head[cls] → g_tls_sll[cls].head
- g_tls_sll_count[cls] → g_tls_sll[cls].count
Results:
- HDR_META_MISMATCH: eliminated (0 errors in 100K iterations)
- 1024B allocations now routed to C7 (Tiny fast path)
- NXT_MISALIGN warnings remain (legacy 1024B SuperSlabs, separate issue)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-21 13:44:05 +09:00
38552c3f39
Phase 3d-A: SlabMeta Box boundary - Encapsulate SuperSlab metadata access
...
ChatGPT-guided Box theory refactoring (Phase A: Boundary only).
Changes:
- Created ss_slab_meta_box.h with 15 inline accessor functions
- HOT fields (8): freelist, used, capacity (fast path)
- COLD fields (6): class_idx, carved, owner_tid_low (init/debug)
- Legacy (1): ss_slab_meta_ptr() for atomic ops
- Migrated 14 direct slabs[] access sites across 6 files
- hakmem_shared_pool.c (4 sites)
- tiny_free_fast_v2.inc.h (1 site)
- hakmem_tiny.c (3 sites)
- external_guard_box.h (1 site)
- hakmem_tiny_lifecycle.inc (1 site)
- ss_allocation_box.c (4 sites)
Architecture:
- Zero overhead (static inline wrappers)
- Single point of change for future layout optimizations
- Enables Hot/Cold split (Phase C) without touching call sites
- A/B testing support via compile-time flags
Verification:
- Build: ✅ Success (no errors)
- Stability: ✅ All sizes pass (128B-1KB, 22-24M ops/s)
- Behavior: Unchanged (thin wrapper, no logic changes)
Next: Phase B (TLS Cache Merge, +12-18% expected)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-20 02:01:52 +09:00