|
|
1a2c5dca0d
|
TinyHeapV2: Enable by default with optimized settings
Changes:
- Default: TinyHeapV2 ON (was OFF, now enabled without ENV)
- Default CLASS_MASK: 0xE (C1-C3 only, skip C0 8B due to -5% regression)
- Remove debug fprintf overhead in Release builds (HAKMEM_BUILD_RELEASE guard)
Performance (100K iterations, workset=128, default settings):
- 16B: 43.9M → 47.7M ops/s (+8.7%)
- 32B: 41.9M → 44.8M ops/s (+6.9%)
- 64B: 51.2M → 50.9M ops/s (-0.6%, within noise)
Key fix: Debug fprintf in tiny_heap_v2_enabled() caused 20-30% overhead
- Before: 36-42M ops/s (with debug log)
- After: 44-48M ops/s (debug log disabled in Release)
ENV override:
- HAKMEM_TINY_HEAP_V2=0 to disable
- HAKMEM_TINY_HEAP_V2_CLASS_MASK=0xF to enable all C0-C3
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2025-11-15 16:33:38 +09:00 |
|
|
|
bb70d422dc
|
Phase 13-B: TinyHeapV2 supply path with dual-mode A/B framework (Stealing vs Leftover)
Summary:
- Implemented free path supply with ENV-gated A/B modes (HAKMEM_TINY_HEAP_V2_LEFTOVER_MODE)
- Mode 0 (Stealing, default): L0 gets freed blocks first → +18% @ 32B
- Mode 1 (Leftover): L1 primary owner, L0 gets leftovers → Box-clean but -5% @ 16B
- Decision: Default to Stealing for performance (ChatGPT analysis: L0 doesn't corrupt learning layer signals)
Performance (100K iterations, workset=128):
- 16B: 43.9M → 45.6M ops/s (+3.9%)
- 32B: 41.9M → 49.6M ops/s (+18.4%) ✅
- 64B: 51.2M → 51.5M ops/s (+0.6%)
- 100% magazine hit rate (supply from free path working correctly)
Implementation:
- tiny_free_fast_v2.inc.h: Dual-mode supply (lines 134-166)
- tiny_heap_v2.h: Add tiny_heap_v2_leftover_mode() flag + rationale doc
- tiny_alloc_fast.inc.h: Alloc hook with tiny_heap_v2_alloc_by_class()
- CURRENT_TASK.md: Updated Phase 13-B status (complete) with A/B results
ENV flags:
- HAKMEM_TINY_HEAP_V2=1 # Enable TinyHeapV2
- HAKMEM_TINY_HEAP_V2_LEFTOVER_MODE=0 # Mode 0 (Stealing, default)
- HAKMEM_TINY_HEAP_V2_CLASS_MASK=0xE # C1-C3 only (skip C0 -5% regression)
- HAKMEM_TINY_HEAP_V2_STATS=1 # Print statistics
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2025-11-15 16:28:40 +09:00 |
|
|
|
d72a700948
|
Phase 13-B: TinyHeapV2 free path supply hook (magazine population)
Implement magazine supply from free path to enable TinyHeapV2 L0 cache
Changes:
1. core/tiny_free_fast_v2.inc.h (Line 24, 134-143):
- Include tiny_heap_v2.h for magazine API
- Add supply hook after BASE pointer conversion (Line 134-143)
- Try to push freed block to TinyHeapV2 magazine (C0-C3 only)
- Falls back to TLS SLL if magazine full (existing behavior)
2. core/front/tiny_heap_v2.h (Line 24-46):
- Move TinyHeapV2Mag / TinyHeapV2Stats typedef from hakmem_tiny.c
- Add extern declarations for TLS variables
- Define TINY_HEAP_V2_MAG_CAP (16 slots)
- Enables use from tiny_free_fast_v2.inc.h
3. core/hakmem_tiny.c (Line 1270-1276, 1766-1768):
- Remove duplicate typedef definitions
- Move TLS storage declarations after tiny_heap_v2.h include
- Reason: tiny_heap_v2.h must be included AFTER tiny_alloc_fast.inc.h
- Forward declarations remain for early reference
Supply Hook Flow:
```
hak_free_at(ptr) → hak_tiny_free_fast_v2(ptr)
→ class_idx = read_header(ptr)
→ base = ptr - 1
→ if (class_idx <= 3 && tiny_heap_v2_enabled())
→ tiny_heap_v2_try_push(class_idx, base)
→ success: return (magazine supplied)
→ full: fall through to TLS SLL
→ tls_sll_push(class_idx, base) # existing path
```
Benefits:
- Magazine gets populated from freed blocks (L0 cache warm-up)
- Next allocation hits magazine (fast L0 path, no backend refill)
- Expected: 70-90% hit rate for fixed-size workloads
- Expected: +200-500% performance for C0-C3 classes
Build & Smoke Test:
- ✅ Build successful
- ✅ bench_fixed_size 256B workset=50: 33M ops/s (stable)
- ✅ bench_fixed_size 16B workset=60: 30M ops/s (stable)
- 🔜 A/B test (hit rate measurement) deferred to next commit
Implementation Status:
- ✅ Phase 13-A: Alloc hook + stats (completed, committed)
- ✅ Phase 13-B: Free path supply (THIS COMMIT)
- 🔜 Phase 13-C: Evaluation & tuning
Notes:
- Supply hook is C0-C3 only (TinyHeapV2 target range)
- Magazine capacity=16 (same as Phase 13-A)
- No performance regression (hook is ENV-gated: HAKMEM_TINY_HEAP_V2=1)
🤝 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2025-11-15 13:39:37 +09:00 |
|
|
|
0836d62ff4
|
Phase 13-A Step 1 COMPLETE: TinyHeapV2 alloc hook + stats + supply infrastructure
Phase 13-A Status: ✅ COMPLETE
- Alloc hook working (hak_tiny_alloc via hakmem_tiny_alloc_new.inc)
- Statistics accurate (alloc_calls, mag_hits tracked correctly)
- NO-REFILL L0 cache stable (zero performance overhead)
- A/B tests: C1 +0.76%, C2 +0.42%, C3 -0.26% (all within noise)
Changes:
- Added tiny_heap_v2_try_push() infrastructure for Phase 13-B (free path supply)
- Currently unused but provides clean API for magazine supply from free path
Verification:
- Modified bench_fixed_size.c to use hak_alloc_at/hak_free_at (HAKMEM routing)
- Verified HAKMEM routing works: workset=10-127 ✅
- Found separate bug: workset=128 hangs (power-of-2 edge case, not HeapV2 related)
Phase 13-B: Free path supply deferred
- Actual free path: hak_free_at → hak_tiny_free_fast_v2
- Not tiny_free_fast (wrapper-only path)
- Requires hak_tiny_free_fast_v2 integration work
Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2025-11-15 02:28:26 +09:00 |
|
|
|
5cc1f93622
|
Phase 13-A Step 1: TinyHeapV2 NO-REFILL L0 cache implementation
Implement TinyHeapV2 as a minimal "lucky hit" L0 cache that avoids
circular dependency with FastCache by eliminating self-refill.
Key Changes:
- New: core/front/tiny_heap_v2.h - NO-REFILL L0 cache implementation
- tiny_heap_v2_alloc(): Pop from magazine if available, else return NULL
- tiny_heap_v2_refill_mag(): No-op stub (no backend refill)
- ENV: HAKMEM_TINY_HEAP_V2=1 to enable
- ENV: HAKMEM_TINY_HEAP_V2_CLASS_MASK=bitmask (C0-C3 control)
- ENV: HAKMEM_TINY_HEAP_V2_STATS=1 to print statistics
- Modified: core/hakmem_tiny_alloc_new.inc - Add TinyHeapV2 hook
- Hook at entry point (after class_idx calculation)
- Fallback to existing front if TinyHeapV2 returns NULL
- Modified: core/hakmem_tiny_alloc.inc - Add hook for legacy path
- Modified: core/hakmem_tiny.c - Add TLS variables and stats wrapper
- TinyHeapV2Mag: Per-class magazine (capacity=16)
- TinyHeapV2Stats: Per-class counters (alloc_calls, mag_hits, etc.)
- tiny_heap_v2_print_stats(): Statistics output at exit
- New: TINY_HEAP_V2_TASK_SPEC.md - Phase 13 specification
Root Cause Fixed:
- BEFORE: TinyHeapV2 refilled from FastCache → circular dependency
- TinyHeapV2 intercepted all allocs → FastCache never populated
- Result: 100% backend OOM, 0% hit rate, 99% slowdown
- AFTER: TinyHeapV2 is passive L0 cache (no refill)
- Magazine empty → return NULL → existing front handles it
- Result: 0% overhead, stable baseline performance
A/B Test Results (100K iterations, fixed-size bench):
- C1 (8B): Baseline 9,688 ops/s → HeapV2 ON 9,762 ops/s (+0.76%)
- C2 (16B): Baseline 9,804 ops/s → HeapV2 ON 9,845 ops/s (+0.42%)
- C3 (32B): Baseline 9,840 ops/s → HeapV2 ON 9,814 ops/s (-0.26%)
- All within noise range: NO PERFORMANCE REGRESSION ✅
Statistics (HeapV2 ON, C1-C3):
- alloc_calls: 200K (hook works correctly)
- mag_hits: 0 (0%) - Magazine empty as expected
- refill_calls: 0 - No refill executed (circular dependency avoided)
- backend_oom: 0 - No backend access
Next Steps (Phase 13-A Step 2):
- Implement magazine supply strategy (from existing front or free path)
- Goal: Populate magazine with "leftover" blocks from existing pipeline
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2025-11-15 01:42:57 +09:00 |
|