c2716f5c01
Implement Phase 2: Headerless Allocator Support (Partial)
...
- Feature: Added HAKMEM_TINY_HEADERLESS toggle (A/B testing)
- Feature: Implemented Headerless layout logic (Offset=0)
- Refactor: Centralized layout definitions in tiny_layout_box.h
- Refactor: Abstracted pointer arithmetic in free path via ptr_conversion_box.h
- Verification: sh8bench passes in Headerless mode (No TLS_SLL_HDR_RESET)
- Known Issue: Regression in Phase 1 mode due to blind pointer conversion logic
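A minimal sketch of how a compile-time toggle could select the user-pointer offset, and why conversions that ignore the toggle would break one of the two modes. Only HAKMEM_TINY_HEADERLESS comes from this commit; the `sketch_*` helpers and the 1-byte header size are assumptions made for illustration, not the contents of tiny_layout_box.h or ptr_conversion_box.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the toggle is a 0/1 compile-time macro; default to header mode. */
#ifndef HAKMEM_TINY_HEADERLESS
#define HAKMEM_TINY_HEADERLESS 0
#endif

/* Hypothetical header size for the Phase 1 (header) layout. */
#define SKETCH_HEADER_BYTES 1u

/* Offset from block base to the pointer handed to the user:
 * 0 in Headerless mode, the header size otherwise. */
static inline size_t sketch_user_offset(void) {
    return HAKMEM_TINY_HEADERLESS ? 0u : SKETCH_HEADER_BYTES;
}

/* Both conversions go through the same offset, so they cannot disagree. */
static inline void *sketch_base_to_user(void *base) {
    assert(base != NULL);
    return (uint8_t *)base + sketch_user_offset();
}

static inline void *sketch_user_to_base(void *user) {
    assert(user != NULL);
    return (uint8_t *)user - sketch_user_offset();
}
```

Routing every conversion through one offset function keeps the alloc and free paths symmetric in both modes, which is the property the Known Issue above suggests is currently violated in Phase 1 mode.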
2025-12-03 12:11:27 +09:00
a6aeeb7a4e
Phase 1 Refactoring Complete: Box-based Logic Consolidation ✅
...
Summary:
- Task 1.1 ✅ : Created tiny_layout_box.h for centralized class/header definitions
- Task 1.2 ✅ : Updated tiny_nextptr.h to use layout Box (bitmasking optimization)
- Task 1.3 ✅ : Enhanced ptr_conversion_box.h with Phantom Types support
- Task 1.4 ✅ : Implemented test_phantom.c for Debug-mode type checking
Verification Results (by Task Agent):
- Box Pattern Compliance: ⭐⭐⭐⭐⭐ (5/5) - MISSION/DESIGN documented
- Type Safety: ⭐⭐⭐⭐⭐ (5/5) - Phantom Types working as designed
- Test Coverage: ⭐⭐⭐☆☆ (3/5) - Compile-time tests OK, runtime tests planned
- Performance: 0 bytes, 0 cycles overhead in Release build
- Build Status: ✅ Success (526KB libhakmem.so, zero warnings)
Key Achievements:
1. Single Source of Truth principle fully implemented
2. Circular dependency eliminated (layout→header→nextptr→conversion)
3. Release build: 100% inlining, zero overhead
4. Debug build: Full type checking with Phantom Types
5. HAK_RET_ALLOC macro migrated to Box API
Known Issues (unrelated to Phase 1):
- TLS_SLL_HDR_RESET from sh8bench (existing, will be resolved in Phase 2)
Next Steps:
- Phase 2 readiness: ✅ READY
- Recommended: Create migration guide + runtime test suite
- Alignment guarantee will be addressed in Phase 2 (Headerless layout)
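The Phantom Types idea from Task 1.3 can be sketched in C by wrapping base and user pointers in distinct single-member structs. Everything below (`sketch_base_ptr`, `sketch_user_ptr`, the helpers, and the 1-byte header) is hypothetical and only illustrates the technique; it is not the actual ptr_conversion_box.h API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header size; an assumption for this sketch. */
#define SKETCH_HEADER_BYTES 1u

/* Same representation, but distinct types as far as the compiler is concerned. */
typedef struct { void *p; } sketch_base_ptr;
typedef struct { void *p; } sketch_user_ptr;

/* The only sanctioned way to go from a block base to a user pointer. */
static inline sketch_user_ptr sketch_to_user(sketch_base_ptr b) {
    assert(b.p != NULL);
    sketch_user_ptr u = { (uint8_t *)b.p + SKETCH_HEADER_BYTES };
    return u;
}

/* The inverse conversion, used on the free path. */
static inline sketch_base_ptr sketch_to_base(sketch_user_ptr u) {
    assert(u.p != NULL);
    sketch_base_ptr b = { (uint8_t *)u.p - SKETCH_HEADER_BYTES };
    return b;
}

/* Accepts only a base pointer; passing a sketch_user_ptr fails to compile. */
static inline uint8_t sketch_read_header(sketch_base_ptr b) {
    return *(const uint8_t *)b.p;
}
```

Because each wrapper holds a single pointer, optimizing compilers typically generate the same code as for a raw `void *`, which is consistent with the zero-overhead claim for Release builds.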
🤖 Generated with Claude Code + Gemini (implementation) + Task Agent (verification)
Co-Authored-By: Gemini <gemini@example.com>
Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 11:38:11 +09:00
8af9123bcc
Larson double-free investigation: Add full operation lifecycle logging
...
**Diagnostic Enhancement**: Complete malloc/free/pop operation tracing for debugging
**Problem**: Larson crashes with TLS_SLL_DUP at count=18. The exact pointer
lifecycle must be traced to determine whether the allocator returns duplicate
addresses or the benchmark has a double-free bug.
**Implementation** (ChatGPT + Claude + Task collaboration):
1. **Global Operation Counter** (core/hakmem_tiny_config_box.inc:9):
- Single atomic counter for all operations (malloc/free/pop)
- Chronological ordering across all paths
2. **Allocation Logging** (core/hakmem_tiny_config_box.inc:148-161):
- HAK_RET_ALLOC macro enhanced with operation logging
- Logs first 50 class=1 allocations with ptr/base/tls_count
3. **Free Logging** (core/tiny_free_fast_v2.inc.h:222-235):
- Added before tls_sll_push() call (line 221)
- Logs first 50 class=1 frees with ptr/base/tls_count_before
4. **Pop Logging** (core/box/tls_sll_box.h:587-597):
- Added in tls_sll_pop_impl() after successful pop
- Logs first 50 class=1 pops with base/tls_count_after
5. **Drain Debug Logging** (core/box/tls_sll_drain_box.h:143-151):
- Enhanced drain loop with detailed logging
- Tracks pop failures and drained block counts
**Initial Findings**:
- First 19 operations: ALL frees, ZERO allocations, ZERO pops
- OP#0006: First free of 0x...430
- OP#0018: Duplicate free of 0x...430 → TLS_SLL_DUP detected
- Suggests either: (a) the allocations happened before logging started, or (b) a double-free bug in Larson
**Debug-only**: All logging gated by !HAKMEM_BUILD_RELEASE (zero cost in release)
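A minimal sketch of the global operation counter and the debug-only gating described above. HAKMEM_BUILD_RELEASE and the limit of 50 come from this commit; `sketch_log_op`, `sketch_op_counter`, and the log format are hypothetical. As a simplification, this sketch limits logging by the global operation number rather than keeping a separate "first 50" window per path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: the release flag is a 0/1 macro; default to a debug build. */
#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 0
#endif

#define SKETCH_LOG_LIMIT 50u

/* One counter shared by malloc/free/pop gives a single chronological order. */
static _Atomic uint64_t sketch_op_counter;

/* Returns the operation number, or UINT64_MAX when logging is compiled out. */
static inline uint64_t sketch_log_op(const char *kind, int class_idx,
                                     const void *ptr) {
#if !HAKMEM_BUILD_RELEASE
    assert(kind != NULL);
    uint64_t op = atomic_fetch_add_explicit(&sketch_op_counter, 1,
                                            memory_order_relaxed);
    if (class_idx == 1 && op < SKETCH_LOG_LIMIT) {
        fprintf(stderr, "[OP#%04llu %s] cls=%d ptr=%p\n",
                (unsigned long long)op, kind, class_idx, (void *)ptr);
    }
    return op;
#else
    (void)kind; (void)class_idx; (void)ptr;
    return UINT64_MAX;
#endif
}
```

A relaxed fetch-add is enough here because the counter only needs to hand out unique, increasing numbers; it does not order any other memory accesses.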
**Next Steps**:
- Expand logging window to 200 operations
- Log initialization phase allocations
- Cross-check with Larson benchmark source
**Status**: Ready for extended testing
2025-11-27 08:18:01 +09:00
a2e65716b3
Port: Optimize tiny_get_max_size inline (e81fe783d)
...
- Move tiny_get_max_size to header for inlining
- Use cached static variable to avoid repeated env lookup
- Larson: 51.99M ops/s (stable)
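A minimal sketch of the cached-static pattern mentioned above, assuming the value is read once from the environment and then reused. The function name `sketch_get_max_size`, the environment variable `SKETCH_TINY_MAX_SIZE`, and the default of 1024 are all hypothetical; the real tiny_get_max_size may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical default used when the variable is unset or invalid. */
#define SKETCH_DEFAULT_MAX_SIZE 1024u

/* Defined in a header as static inline so each caller can inline it. */
static inline size_t sketch_get_max_size(void) {
    static size_t cached = 0;            /* 0 means "not resolved yet" */
    if (cached == 0) {
        /* Slow path, taken once: consult the environment. */
        const char *s = getenv("SKETCH_TINY_MAX_SIZE");
        unsigned long v = s ? strtoul(s, NULL, 10) : 0;
        cached = (v > 0) ? (size_t)v : SKETCH_DEFAULT_MAX_SIZE;
    }
    assert(cached != 0);
    return cached;                       /* fast path: one load and compare */
}
```

This sketch does not synchronize the first call; a multi-threaded allocator would need an atomic or a one-time initialization step around the cache.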
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 15:05:03 +09:00
a78224123e
Fix C0/C7 class confusion: Upgrade C7 stride to 2048B and fix meta->class_idx initialization
...
Root Cause:
1. C7 stride was 1024B, unable to serve 1024B user requests (need 1025B with header)
2. New SuperSlabs start with meta->class_idx=0 (mmap zero-init)
3. superslab_init_slab() only sets class_idx if meta->class_idx==255
4. Multiple code paths used conditional assignment (if class_idx==255), leaving C7 slabs with class_idx=0
5. This caused C7 blocks to be misidentified as C0, leading to HDR_META_MISMATCH errors
Changes:
1. Upgrade C7 stride: 1024B → 2048B (can now serve 1024B requests)
2. Update blocks_per_slab[7]: 64 → 32 (64KB slab / 2048B stride)
3. Update size-to-class LUT: entries 513-2048 now map to C7
4. Fix superslab_init_slab() fail-safe: only reinitialize if class_idx==255 (not 0)
5. Add explicit class_idx assignment in 6 initialization paths:
- tiny_superslab_alloc.inc.h: superslab_refill() after init
- hakmem_tiny_superslab.c: backend_shared after init (main path)
- ss_unified_backend_box.c: unconditional assignment
- ss_legacy_backend_box.c: explicit assignment
- superslab_expansion_box.c: explicit assignment
- ss_allocation_box.c: fail-safe condition fix
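The interaction between zero-initialized metadata and the 255 sentinel can be sketched as follows, together with the blocks-per-slab arithmetic. The struct and function names are hypothetical and heavily simplified; only the values 0, 255, 64KB, 1024B, and 2048B are taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CLASS_UNSET 255u            /* sentinel meaning "no class yet" */
#define SKETCH_SLAB_BYTES  (64u * 1024u)   /* 64KB slab */

/* Hypothetical, reduced stand-in for the slab metadata. */
typedef struct { uint8_t class_idx; } sketch_slab_meta;

/* Conditional assignment: a zero-filled meta is never updated,
 * because 0 is a valid class index (C0), not the sentinel. */
static inline void sketch_init_conditional(sketch_slab_meta *m, uint8_t cls) {
    if (m->class_idx == SKETCH_CLASS_UNSET) m->class_idx = cls;
}

/* Explicit assignment: correct regardless of the previous contents. */
static inline void sketch_init_explicit(sketch_slab_meta *m, uint8_t cls) {
    m->class_idx = cls;
}

/* Number of blocks that fit in one slab for a given stride. */
static inline uint32_t sketch_blocks_per_slab(uint32_t stride) {
    assert(stride != 0);
    return SKETCH_SLAB_BYTES / stride;
}
```

With a zero-filled `sketch_slab_meta`, the conditional variant leaves `class_idx` at 0 when asked to set class 7, which mirrors the C7-as-C0 misidentification described in the root cause.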
Fix P0 refill bug:
- Update obsolete array access after Phase 3d-B TLS SLL unification
- g_tls_sll_head[cls] → g_tls_sll[cls].head
- g_tls_sll_count[cls] → g_tls_sll[cls].count
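A sketch of what the unified per-class TLS structure might look like after the rename above, with head and count kept together. The type `sketch_tls_sll`, the array `sketch_g_tls_sll`, the push/pop helpers, and the class count of 8 (C0 to C7) are assumptions for illustration, not the project's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_CLASSES 8   /* assumption: classes C0..C7 */

/* Head and count live side by side instead of in two parallel arrays. */
typedef struct {
    void    *head;
    uint32_t count;
} sketch_tls_sll;

static _Thread_local sketch_tls_sll sketch_g_tls_sll[SKETCH_NUM_CLASSES];

/* Push a free block; the next pointer is stored in the block itself. */
static inline void sketch_sll_push(int cls, void *block) {
    assert(cls >= 0 && cls < SKETCH_NUM_CLASSES && block != NULL);
    *(void **)block = sketch_g_tls_sll[cls].head;
    sketch_g_tls_sll[cls].head = block;
    sketch_g_tls_sll[cls].count++;
}

/* Pop the most recently pushed block, or NULL if the list is empty. */
static inline void *sketch_sll_pop(int cls) {
    assert(cls >= 0 && cls < SKETCH_NUM_CLASSES);
    void *block = sketch_g_tls_sll[cls].head;
    if (block == NULL) return NULL;
    sketch_g_tls_sll[cls].head = *(void **)block;
    sketch_g_tls_sll[cls].count--;
    return block;
}
```

Keeping both fields in one struct means a stale reference to the old parallel arrays fails to compile instead of silently reading the wrong data.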
Results:
- HDR_META_MISMATCH: eliminated (0 errors in 100K iterations)
- 1024B allocations now routed to C7 (Tiny fast path)
- NXT_MISALIGN warnings remain (legacy 1024B SuperSlabs, separate issue)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 13:44:05 +09:00
6b6ad69aca
Refactor: Extract 5 Box modules from hakmem_tiny.c (52% size reduction)
...
Split hakmem_tiny.c (2081 lines) into focused modules for better maintainability.
## Changes
**hakmem_tiny.c**: 2081 → 995 lines (-1086 lines, 52% reduction)
## Extracted Modules (5 boxes)
1. **config_box** (211 lines)
- Size class tables, integrity counters
- Debug flags, benchmark macros
- HAK_RET_ALLOC/HAK_STAT_FREE instrumentation
2. **publish_box** (419 lines)
- Publish/Adopt counters and statistics
- Bench mailbox, partial ring
- Live cap/Hot slot management
- TLS helper functions (tiny_tls_default_*)
3. **globals_box** (256 lines)
- Global variable declarations (~70 variables)
- TinyPool instance and initialization flag
- TLS variables (g_tls_lists, g_fast_head, g_fast_count)
- SuperSlab configuration (partial ring, empty reserves)
- Adopt gate functions
4. **phase6_wrappers_box** (122 lines)
- Phase 6 Box Theory wrapper layer
- hak_tiny_alloc_fast_wrapper()
- hak_tiny_free_fast_wrapper()
- Diagnostic instrumentation
5. **ace_guard_box** (100 lines)
- ACE Learning Layer (hkm_ace_set_drain_threshold)
- FastCache API (tiny_fc_room, tiny_fc_push_bulk)
- Tiny Guard debugging system (5 functions)
## Benefits
- **Readability**: Giant 2k file → focused 1k core + 5 coherent modules
- **Maintainability**: Each box has clear responsibility and boundaries
- **Build**: All modules compile successfully ✅
## Technical Details
- Phase 1: ChatGPT extracted config_box + publish_box (-625 lines)
- Phase 2-4: Claude extracted globals_box + phase6_wrappers_box + ace_guard_box (-461 lines)
- All extractions use .inc files (same translation unit, preserves static/TLS linkage)
- Fixed Makefile: Added tiny_sizeclass_hist_box.o to OBJS_BASE and BENCH_HAKMEM_OBJS_BASE
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 01:16:45 +09:00