acc64f2438
Phase ML1: Pool v1 memset 89.73% overhead 軽量化 (+15.34% improvement)
...
## Summary
- ChatGPT により bench_profile.h の setenv segfault を修正(RTLD_NEXT 経由に切り替え)
- core/box/pool_zero_mode_box.h 新設:ENV キャッシュ経由で ZERO_MODE を統一管理
- core/hakmem_pool.c で zero mode に応じた memset 制御(FULL/header/off)
- A/B テスト結果:ZERO_MODE=header で +15.34% improvement(1M iterations, C6-heavy)
## Files Modified
- core/box/pool_api.inc.h: pool_zero_mode_box.h include
- core/bench_profile.h: glibc setenv → malloc+putenv(segfault 回避)
- core/hakmem_pool.c: zero mode 参照・制御ロジック
- core/box/pool_zero_mode_box.h (新設): enum/getter
- CURRENT_TASK.md: Phase ML1 結果記載
## Test Results
| Iterations | ZERO_MODE=full | ZERO_MODE=header | Improvement |
|-----------|----------------|-----------------|------------|
| 10K | 3.06 M ops/s | 3.17 M ops/s | +3.65% |
| 1M | 23.71 M ops/s | 27.34 M ops/s | **+15.34%** |
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com >
2025-12-10 09:08:18 +09:00
679c821573
ENV Cleanup Step 14: Gate HAKMEM_TINY_HEAP_V2_DEBUG
...
Gate the HeapV2 push debug logging behind #if !HAKMEM_BUILD_RELEASE:
- HAKMEM_TINY_HEAP_V2_DEBUG: Controls magazine push event tracing
- File: core/front/tiny_heap_v2.h:117-130
Wraps the ENV check and debug output that logs the first 5 push
operations per size class for HeapV2 magazine diagnostics.
Performance: 29.6M ops/s (within baseline range)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 04:33:39 +09:00
25d963a4aa
Code Cleanup: Remove false positives, redundant validations, and reduce verbose logging
...
Following the C7 stride upgrade fix (commit 23c0d9541 ), this commit performs
comprehensive cleanup to improve code quality and reduce debug noise.
## Changes
### 1. Disable False Positive Checks (tiny_nextptr.h)
- **Disabled**: NXT_MISALIGN validation block with `#if 0`
- **Reason**: Produces false positives due to slab base offsets (2048, 65536)
not being stride-aligned, causing all blocks to appear "misaligned"
- **TODO**: Reimplement to check stride DISTANCE between consecutive blocks
instead of absolute alignment to stride boundaries
### 2. Remove Redundant Geometry Validations
**hakmem_tiny_refill_p0.inc.h (P0 batch refill)**
- Removed 25-line CARVE_GEOMETRY_FIX validation block
- Replaced with NOTE explaining redundancy
- **Reason**: Stride table is now correct in tiny_block_stride_for_class(),
defense-in-depth validation adds overhead without benefit
**ss_legacy_backend_box.c (legacy backend)**
- Removed 18-line LEGACY_FIX_GEOMETRY validation block
- Replaced with NOTE explaining redundancy
- **Reason**: Shared_pool validates geometry at acquisition time
### 3. Reduce Verbose Logging
**hakmem_shared_pool.c (sp_fix_geometry_if_needed)**
- Made SP_FIX_GEOMETRY logging conditional on `!HAKMEM_BUILD_RELEASE`
- **Reason**: Geometry fixes are expected during stride upgrades,
no need to log in release builds
### 4. Verification
- Build: ✅ Successful (LTO warnings expected)
- Test: ✅ 10K iterations (1.87M ops/s, no crashes)
- NXT_MISALIGN false positives: ✅ Eliminated
## Files Modified
- core/tiny_nextptr.h - Disabled false positive NXT_MISALIGN check
- core/hakmem_tiny_refill_p0.inc.h - Removed redundant CARVE validation
- core/box/ss_legacy_backend_box.c - Removed redundant LEGACY validation
- core/hakmem_shared_pool.c - Made SP_FIX_GEOMETRY logging debug-only
## Impact
- **Code clarity**: Removed 43 lines of redundant validation code
- **Debug noise**: Reduced false positive diagnostics
- **Performance**: Eliminated overhead from redundant geometry checks
- **Maintainability**: Single source of truth for geometry validation
🧹 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-21 23:00:24 +09:00
1a2c5dca0d
TinyHeapV2: Enable by default with optimized settings
...
Changes:
- Default: TinyHeapV2 ON (was OFF, now enabled without ENV)
- Default CLASS_MASK: 0xE (C1-C3 only, skip C0 8B due to -5% regression)
- Remove debug fprintf overhead in Release builds (HAKMEM_BUILD_RELEASE guard)
Performance (100K iterations, workset=128, default settings):
- 16B: 43.9M → 47.7M ops/s (+8.7%)
- 32B: 41.9M → 44.8M ops/s (+6.9%)
- 64B: 51.2M → 50.9M ops/s (-0.6%, within noise)
Key fix: Debug fprintf in tiny_heap_v2_enabled() caused 20-30% overhead
- Before: 36-42M ops/s (with debug log)
- After: 44-48M ops/s (debug log disabled in Release)
ENV override:
- HAKMEM_TINY_HEAP_V2=0 to disable
- HAKMEM_TINY_HEAP_V2_CLASS_MASK=0xF to enable all C0-C3
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-15 16:33:38 +09:00
bb70d422dc
Phase 13-B: TinyHeapV2 supply path with dual-mode A/B framework (Stealing vs Leftover)
...
Summary:
- Implemented free path supply with ENV-gated A/B modes (HAKMEM_TINY_HEAP_V2_LEFTOVER_MODE)
- Mode 0 (Stealing, default): L0 gets freed blocks first → +18% @ 32B
- Mode 1 (Leftover): L1 primary owner, L0 gets leftovers → Box-clean but -5% @ 16B
- Decision: Default to Stealing for performance (ChatGPT analysis: L0 doesn't corrupt learning layer signals)
Performance (100K iterations, workset=128):
- 16B: 43.9M → 45.6M ops/s (+3.9%)
- 32B: 41.9M → 49.6M ops/s (+18.4%) ✅
- 64B: 51.2M → 51.5M ops/s (+0.6%)
- 100% magazine hit rate (supply from free path working correctly)
Implementation:
- tiny_free_fast_v2.inc.h: Dual-mode supply (lines 134-166)
- tiny_heap_v2.h: Add tiny_heap_v2_leftover_mode() flag + rationale doc
- tiny_alloc_fast.inc.h: Alloc hook with tiny_heap_v2_alloc_by_class()
- CURRENT_TASK.md: Updated Phase 13-B status (complete) with A/B results
ENV flags:
- HAKMEM_TINY_HEAP_V2=1 # Enable TinyHeapV2
- HAKMEM_TINY_HEAP_V2_LEFTOVER_MODE=0 # Mode 0 (Stealing, default)
- HAKMEM_TINY_HEAP_V2_CLASS_MASK=0xE # C1-C3 only (skip C0 -5% regression)
- HAKMEM_TINY_HEAP_V2_STATS=1 # Print statistics
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-15 16:28:40 +09:00
d72a700948
Phase 13-B: TinyHeapV2 free path supply hook (magazine population)
...
Implement magazine supply from free path to enable TinyHeapV2 L0 cache
Changes:
1. core/tiny_free_fast_v2.inc.h (Line 24, 134-143):
- Include tiny_heap_v2.h for magazine API
- Add supply hook after BASE pointer conversion (Line 134-143)
- Try to push freed block to TinyHeapV2 magazine (C0-C3 only)
- Falls back to TLS SLL if magazine full (existing behavior)
2. core/front/tiny_heap_v2.h (Line 24-46):
- Move TinyHeapV2Mag / TinyHeapV2Stats typedef from hakmem_tiny.c
- Add extern declarations for TLS variables
- Define TINY_HEAP_V2_MAG_CAP (16 slots)
- Enables use from tiny_free_fast_v2.inc.h
3. core/hakmem_tiny.c (Line 1270-1276, 1766-1768):
- Remove duplicate typedef definitions
- Move TLS storage declarations after tiny_heap_v2.h include
- Reason: tiny_heap_v2.h must be included AFTER tiny_alloc_fast.inc.h
- Forward declarations remain for early reference
Supply Hook Flow:
```
hak_free_at(ptr) → hak_tiny_free_fast_v2(ptr)
→ class_idx = read_header(ptr)
→ base = ptr - 1
→ if (class_idx <= 3 && tiny_heap_v2_enabled())
→ tiny_heap_v2_try_push(class_idx, base)
→ success: return (magazine supplied)
→ full: fall through to TLS SLL
→ tls_sll_push(class_idx, base) # existing path
```
Benefits:
- Magazine gets populated from freed blocks (L0 cache warm-up)
- Next allocation hits magazine (fast L0 path, no backend refill)
- Expected: 70-90% hit rate for fixed-size workloads
- Expected: +200-500% performance for C0-C3 classes
Build & Smoke Test:
- ✅ Build successful
- ✅ bench_fixed_size 256B workset=50: 33M ops/s (stable)
- ✅ bench_fixed_size 16B workset=60: 30M ops/s (stable)
- 🔜 A/B test (hit rate measurement) deferred to next commit
Implementation Status:
- ✅ Phase 13-A: Alloc hook + stats (completed, committed)
- ✅ Phase 13-B: Free path supply (THIS COMMIT)
- 🔜 Phase 13-C: Evaluation & tuning
Notes:
- Supply hook is C0-C3 only (TinyHeapV2 target range)
- Magazine capacity=16 (same as Phase 13-A)
- No performance regression (hook is ENV-gated: HAKMEM_TINY_HEAP_V2=1)
🤝 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-15 13:39:37 +09:00
0836d62ff4
Phase 13-A Step 1 COMPLETE: TinyHeapV2 alloc hook + stats + supply infrastructure
...
Phase 13-A Status: ✅ COMPLETE
- Alloc hook working (hak_tiny_alloc via hakmem_tiny_alloc_new.inc)
- Statistics accurate (alloc_calls, mag_hits tracked correctly)
- NO-REFILL L0 cache stable (zero performance overhead)
- A/B tests: C1 +0.76%, C2 +0.42%, C3 -0.26% (all within noise)
Changes:
- Added tiny_heap_v2_try_push() infrastructure for Phase 13-B (free path supply)
- Currently unused but provides clean API for magazine supply from free path
Verification:
- Modified bench_fixed_size.c to use hak_alloc_at/hak_free_at (HAKMEM routing)
- Verified HAKMEM routing works: workset=10-127 ✅
- Found separate bug: workset=128 hangs (power-of-2 edge case, not HeapV2 related)
Phase 13-B: Free path supply deferred
- Actual free path: hak_free_at → hak_tiny_free_fast_v2
- Not tiny_free_fast (wrapper-only path)
- Requires hak_tiny_free_fast_v2 integration work
Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-15 02:28:26 +09:00
5cc1f93622
Phase 13-A Step 1: TinyHeapV2 NO-REFILL L0 cache implementation
...
Implement TinyHeapV2 as a minimal "lucky hit" L0 cache that avoids
circular dependency with FastCache by eliminating self-refill.
Key Changes:
- New: core/front/tiny_heap_v2.h - NO-REFILL L0 cache implementation
- tiny_heap_v2_alloc(): Pop from magazine if available, else return NULL
- tiny_heap_v2_refill_mag(): No-op stub (no backend refill)
- ENV: HAKMEM_TINY_HEAP_V2=1 to enable
- ENV: HAKMEM_TINY_HEAP_V2_CLASS_MASK=bitmask (C0-C3 control)
- ENV: HAKMEM_TINY_HEAP_V2_STATS=1 to print statistics
- Modified: core/hakmem_tiny_alloc_new.inc - Add TinyHeapV2 hook
- Hook at entry point (after class_idx calculation)
- Fallback to existing front if TinyHeapV2 returns NULL
- Modified: core/hakmem_tiny_alloc.inc - Add hook for legacy path
- Modified: core/hakmem_tiny.c - Add TLS variables and stats wrapper
- TinyHeapV2Mag: Per-class magazine (capacity=16)
- TinyHeapV2Stats: Per-class counters (alloc_calls, mag_hits, etc.)
- tiny_heap_v2_print_stats(): Statistics output at exit
- New: TINY_HEAP_V2_TASK_SPEC.md - Phase 13 specification
Root Cause Fixed:
- BEFORE: TinyHeapV2 refilled from FastCache → circular dependency
- TinyHeapV2 intercepted all allocs → FastCache never populated
- Result: 100% backend OOM, 0% hit rate, 99% slowdown
- AFTER: TinyHeapV2 is passive L0 cache (no refill)
- Magazine empty → return NULL → existing front handles it
- Result: 0% overhead, stable baseline performance
A/B Test Results (100K iterations, fixed-size bench):
- C1 (8B): Baseline 9,688 ops/s → HeapV2 ON 9,762 ops/s (+0.76%)
- C2 (16B): Baseline 9,804 ops/s → HeapV2 ON 9,845 ops/s (+0.42%)
- C3 (32B): Baseline 9,840 ops/s → HeapV2 ON 9,814 ops/s (-0.26%)
- All within noise range: NO PERFORMANCE REGRESSION ✅
Statistics (HeapV2 ON, C1-C3):
- alloc_calls: 200K (hook works correctly)
- mag_hits: 0 (0%) - Magazine empty as expected
- refill_calls: 0 - No refill executed (circular dependency avoided)
- backend_oom: 0 - No backend access
Next Steps (Phase 13-A Step 2):
- Implement magazine supply strategy (from existing front or free path)
- Goal: Populate magazine with "leftover" blocks from existing pipeline
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-15 01:42:57 +09:00