Phase 7 Comprehensive Benchmark Results
Date: 2025-11-08
Build Configuration: HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1
Status: CRITICAL BUGS FOUND - NOT PRODUCTION READY
Executive Summary
Production Readiness: FAILED
Critical Issues Found:
- Multi-threaded crash: Larson 2T/4T fail with free(): invalid pointer (Exit 134)
- 64B allocation crash: Bus error (Exit 135) on 64-byte allocations
- Debug output in production: "Phase 7: tiny_alloc(1024) rejected" messages indicate incomplete implementation
Performance (Single-threaded, working sizes):
- Single-thread performance is excellent (76-120% of System malloc)
- But crashes make this unusable in production
Key Findings
| Category | Result | Status |
|---|---|---|
| Larson 1T | 2.76M ops/s | ✅ PASS |
| Larson 2T/4T | CRASH (Exit 134) | ❌ CRITICAL FAIL |
| Random Mixed (most sizes) | 60-72M ops/s | ✅ PASS |
| Random Mixed 64B | CRASH (Bus Error 135) | ❌ CRITICAL FAIL |
| Stability (1M iterations) | Stable scores | ✅ PASS |
| Overall Production Ready | NO | ❌ FAIL |
Detailed Benchmark Results
1. Larson Multi-Thread Stress Test
| Threads | HAKMEM Result | System Result | Status |
|---|---|---|---|
| 1T | 2,758,490 ops/s | ~3.3M ops/s (est.) | ✅ 84% of System |
| 2T | CRASH (Exit 134) | N/A | ❌ CRITICAL |
| 4T | CRASH (Exit 134) | N/A | ❌ CRITICAL |
Crash Details:
[DEBUG] Phase 7: tiny_alloc(1024) rejected, using malloc fallback
free(): invalid pointer
Exit code: 134 (SIGABRT - double free or corruption)
Root Cause: Unknown - likely race condition in multi-threaded free path or malloc fallback integration issue.
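The exit codes quoted in this report follow the usual shell convention that a process killed by signal N is reported as status 128 + N. The helper below is a minimal sketch of that decoding; the function name is hypothetical and not part of HAKMEM.

```c
#include <assert.h>

/* Shells report "killed by signal N" as exit status 128 + N.
 * Returns N, or 0 when the status does not encode a signal. */
int demo_signal_from_exit_status(int status)
{
    assert(status >= 0 && status <= 255);
    return status > 128 ? status - 128 : 0;
}
```

With this, 134 decodes to signal 6 (SIGABRT) and 135 to signal 7, which is SIGBUS on Linux x86-64; the SIGBUS number differs on some other platforms.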
2. Random Mixed Allocation Benchmark
Test: 100,000 iterations of mixed malloc/free patterns
| Size | HAKMEM (ops/s) | System (ops/s) | HAKMEM % | Status |
|---|---|---|---|---|
| 16B | 66,878,359 | 87,810,575 | 76.1% | ✅ |
| 32B | 69,730,339 | 64,490,458 | 108.1% | ✅ |
| 64B | CRASH (Bus Error 135) | 78,147,467 | N/A | ❌ CRITICAL |
| 128B | 72,090,413 | 65,960,798 | 109.2% | ✅ |
| 256B | 71,363,681 | 71,688,134 | 99.5% | ✅ |
| 512B | 60,501,851 | 62,967,613 | 96.0% | ✅ |
| 1024B | 63,229,630 | 67,220,203 | 94.0% | ✅ |
| 2048B | 55,868,013 | 46,557,492 | 119.9% | ✅ |
| 4096B | 40,585,997 | 45,157,552 | 89.8% | ✅ |
| 8192B | 35,442,103 | 33,984,326 | 104.2% | ✅ |
Performance Highlights (working sizes):
- 32B: +8% faster than System (108.1%)
- 128B: +9% faster than System (109.2%)
- 2048B: +20% faster than System (119.9%)
- 8192B: +4% faster than System (104.2%)
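The "HAKMEM %" column can be reproduced from the two throughput columns. The sketch below shows the arithmetic; the function names are hypothetical, and the truncation to one decimal is an inference from the table values, not something the benchmark scripts are known to do.

```c
#include <assert.h>

/* HAKMEM throughput as a percentage of System malloc throughput. */
double demo_relative_pct(double hakmem_ops, double system_ops)
{
    assert(system_ops > 0.0);
    return hakmem_ops / system_ops * 100.0;
}

/* Same value in tenths of a percent, truncated rather than rounded.
 * The table values appear to be truncated this way (assumption). */
long demo_relative_pct_tenths(double hakmem_ops, double system_ops)
{
    return (long)(demo_relative_pct(hakmem_ops, system_ops) * 10.0);
}
```

For the 16B row, 66,878,359 / 87,810,575 is about 76.16%, which truncates to the 76.1% shown in the table.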
64B Crash Details:
Exit code: 135 (SIGBUS - unaligned memory access or invalid pointer)
Crash during allocation, not free
Root Cause: Unknown - possibly alignment issue or class index calculation error for 64B size class.
3. Long-Run Stability Tests
Test: 1,000,000 iterations (10x normal) to check for memory leaks and variance
| Size | Throughput (ops/s) | Variance vs 100K | Status |
|---|---|---|---|
| 128B | 72,829,711 | +1.0% | ✅ Stable |
| 256B | 72,305,587 | +1.3% | ✅ Stable |
| 1024B | 64,240,186 | +1.6% | ✅ Stable |
Analysis:
- Variance <2% indicates stable performance
- No throughput degradation over 1M iterations, which is consistent with no memory leaks, although throughput alone cannot confirm it
- Scores slightly higher in long runs (likely cache warmup effects)
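The "Variance vs 100K" column is the relative change between the 100K-iteration and 1M-iteration throughput. A minimal sketch of that check, using the 2% threshold from the analysis above, might look like this; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Relative change of the long run versus the short run, in percent. */
double demo_drift_pct(double short_run_ops, double long_run_ops)
{
    assert(short_run_ops > 0.0);
    return (long_run_ops - short_run_ops) / short_run_ops * 100.0;
}

/* True when the absolute drift stays below limit_pct. */
bool demo_is_stable(double short_run_ops, double long_run_ops, double limit_pct)
{
    double drift = demo_drift_pct(short_run_ops, long_run_ops);
    if (drift < 0.0) {
        drift = -drift;
    }
    return drift < limit_pct;
}
```

For 128B, `demo_drift_pct(72090413.0, 72829711.0)` comes out at roughly +1.0%, so `demo_is_stable(..., 2.0)` holds.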
4. Comparison vs Phase 6 Baseline
Phase 6 Baseline (from CLAUDE.md):
- Tiny: 52.59 M/s (38.7% of System 135.94 M/s)
- Phase 6 Goal: 85-92% of System
Phase 7 Results (working sizes):
- Tiny (128B): 72.09 M/s (109% of System 65.96 M/s) → +37% improvement
- Tiny (256B): 71.36 M/s (99.5% of System) → +36% improvement
- Mid (2048B): 55.87 M/s (120% of System) → Exceeds System by +20%
Goal Achievement:
- Target: 85-92% of System → Achieved 76-120% on working sizes (only 16B, at 76.1%, falls below the target range)
- But: Critical crashes make this irrelevant
5. Comprehensive Benchmark (Phase 8 features)
Status: Could not run - linking errors
Issue: bench_comprehensive.c calls Phase 8 functions:
- hak_tiny_print_memory_profile()
- hkm_learner_init()
- superslab_ace_print_stats()
These functions are not available in the Phase 7 build. Resolving this would require one of:
- Remove Phase 8 dependencies, OR
- Build with Phase 8 flags, OR
- Use simpler benchmark suite
Root Cause Analysis
Issue 1: Multi-threaded Crash (Larson 2T/4T)
Symptoms:
- Single-threaded works perfectly (2.76M ops/s)
- 2+ threads crash immediately with "free(): invalid pointer"
- Consistent across 2T and 4T tests
Debug Output:
[DEBUG] Phase 7: tiny_alloc(1024) rejected, using malloc fallback
Hypotheses:
- Race condition in TLS initialization: Multiple threads accessing uninitialized TLS
- Malloc fallback bug: Mixed HAKMEM/libc allocations causing double-free
- Free path ownership bug: Wrong allocator freeing blocks from the other
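The second and third hypotheses both come down to free() being unable to tell which allocator owns a pointer. One common way to make that decision explicit is a tagged header byte, sketched below. The layout here (a 4-bit magic plus a 4-bit class index) is purely hypothetical and is not HAKMEM's actual header format; a 4-bit tag can also collide with whatever libc stores before its own blocks, so a real free path would need a stronger ownership test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 1-byte header: high nibble = magic, low nibble = class. */
#define DEMO_HDR_MAGIC      0xA0u
#define DEMO_HDR_MAGIC_MASK 0xF0u
#define DEMO_HDR_CLASS_MASK 0x0Fu

/* Build the header byte for a block of the given size class. */
uint8_t demo_hdr_encode(unsigned class_idx)
{
    assert(class_idx <= DEMO_HDR_CLASS_MASK);
    return (uint8_t)(DEMO_HDR_MAGIC | class_idx);
}

/* True when the header carries our magic tag. */
bool demo_hdr_is_ours(uint8_t hdr)
{
    return (hdr & DEMO_HDR_MAGIC_MASK) == DEMO_HDR_MAGIC;
}

/* Class index for our blocks, or -1 so the caller can route the
 * pointer to the fallback allocator's free() instead. */
int demo_hdr_class(uint8_t hdr)
{
    return demo_hdr_is_ours(hdr) ? (int)(hdr & DEMO_HDR_CLASS_MASK) : -1;
}
```

A free path built on this would call the tiny free routine only when `demo_hdr_class()` returns a non-negative class, and hand everything else to the libc free().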
Priority: CRITICAL - must fix before any production use
Issue 2: 64B Bus Error Crash
Symptoms:
- Bus error (SIGBUS) on 64-byte allocations
- All other sizes (16, 32, 128, 256, ..., 8192) work fine
- Crash happens during allocation, not free
Hypotheses:
- Class index calculation error: 64B might map to wrong class
- Alignment issue: 64B blocks not aligned to required boundary
- Header corruption: Class index stored in header (HEADER_CLASSIDX=1) might overflow for 64B
Clue: Debug message shows "tiny_alloc(1024) rejected" even for 64B allocations, suggesting routing logic is broken.
Priority: CRITICAL - 64B is a common allocation size
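To make the class index hypothesis concrete, the sketch below maps a request size to a size class. It assumes power-of-two classes starting at 8 bytes, so that class 7 is 1024 bytes as in the SUPERSLAB_INIT log line in the appendix; the real HAKMEM class table may differ, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NUM_CLASSES 8      /* assumed: 8, 16, 32, ..., 1024 bytes */
#define DEMO_MIN_BLOCK   8u
#define DEMO_MAX_BLOCK   1024u

/* Smallest class whose block size fits the request, or -1 when the
 * request is zero or too large for the tiny allocator. */
int demo_size_to_class(size_t size)
{
    if (size == 0 || size > DEMO_MAX_BLOCK) {
        return -1;
    }
    int class_idx = 0;
    size_t block = DEMO_MIN_BLOCK;
    while (block < size) {
        block <<= 1;
        class_idx++;
    }
    return class_idx;
}

/* Block size for a class index under the same assumed layout. */
size_t demo_class_block_size(int class_idx)
{
    assert(class_idx >= 0 && class_idx < DEMO_NUM_CLASSES);
    return (size_t)DEMO_MIN_BLOCK << class_idx;
}
```

Under this layout a 64-byte request lands in class 3, so a 64B benchmark that initializes a 1024-byte class would point at a routing problem rather than an alignment one.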
Issue 3: Debug Output in Production Build
Symptom:
[DEBUG] Phase 7: tiny_alloc(1024) rejected, using malloc fallback
Impact:
- Performance overhead (fprintf in hot path)
- Indicates incomplete implementation (rejections shouldn't happen in production)
- Suggests Phase 7 optimizations have broken size routing
Priority: HIGH - indicates deeper implementation issues
Production Readiness Assessment
Success Criteria (from CURRENT_TASK.md)
| Criterion | Result | Status |
|---|---|---|
| All benchmarks complete without crashes | ❌ 2T/4T Larson crash, 64B crash | FAIL |
| Tiny performance: 85-92% of System | ✅ 76-120% (working sizes; 16B is below target at 76.1%) | PASS |
| Mid-Large performance: maintained | ✅ 120% of System | PASS |
| Multi-thread stability: no regression | ❌ Complete crash | FAIL |
| Fragmentation stress: acceptable | ⚠️ Not tested (build issues) | SKIP |
| Comprehensive report generated | ✅ This document | PASS |
Overall: FAIL - 2 critical crashes
Recommended Next Steps
Immediate Actions (Critical Bugs)
1. Fix Multi-threaded Crash (Highest Priority)
# Debug with ASan
make clean
make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 \
ASAN=1 larson_hakmem
./larson_hakmem 2 8 128 1024 1 12345 2
# Check TLS initialization
grep -r "PREWARM_TLS" core/
# Verify all TLS variables are initialized before thread spawn
Suspected Root Cause: TLS prewarm not actually executing, or a race in initialization.
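If prewarm only runs on the main thread, worker threads would start with untouched TLS state. One defensive pattern is a per-thread "ready" flag checked on every access, sketched below with C11 `_Thread_local`. All names and the cache structure are hypothetical stand-ins, not HAKMEM's real TLS layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NUM_CLASSES 8

/* Hypothetical per-class, per-thread cache. */
typedef struct {
    void    *freelist;
    unsigned count;
} demo_tls_cache_t;

static _Thread_local demo_tls_cache_t demo_tls_cache[DEMO_NUM_CLASSES];
static _Thread_local bool demo_tls_ready = false;

/* Initialize this thread's caches once; later calls are a cheap branch. */
void demo_tls_ensure_init(void)
{
    if (demo_tls_ready) {
        return;
    }
    for (int i = 0; i < DEMO_NUM_CLASSES; i++) {
        demo_tls_cache[i].freelist = NULL;
        demo_tls_cache[i].count = 0;
    }
    demo_tls_ready = true;
}

/* Every access goes through the guard, so a thread that was never
 * prewarmed still sees initialized state. */
demo_tls_cache_t *demo_tls_get(int class_idx)
{
    assert(class_idx >= 0 && class_idx < DEMO_NUM_CLASSES);
    demo_tls_ensure_init();
    return &demo_tls_cache[class_idx];
}
```

Because the flag is itself thread-local, no locking is needed: each thread only ever initializes its own copy.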
2. Fix 64B Bus Error (High Priority)
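// Add temporary debug output to the class index calculation
// File: core/box/hak_alloc_api.inc.h or similar
// (stderr is used so the trace is not held in a stdio buffer when the crash hits)
fprintf(stderr, "tiny_alloc(%zu) -> class %d\n", size, class_idx);
// Check alignment
// File: core/hakmem_tiny_superslab.c
assert((uintptr_t)ptr % 64 == 0); // Assumes the 64B class is expected to be 64-byte aligned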
Suspected Root Cause: HEADER_CLASSIDX=1 storing the wrong class index for 64B.
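The alignment check above can be wrapped in a small reusable helper so the same test applies to every size class. This is a sketch with a hypothetical name; which alignment each class must actually satisfy is an assumption to confirm against the allocator's design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when ptr is a multiple of alignment.
 * alignment must be a nonzero power of two. */
bool demo_is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

Note that if a 1-byte header precedes each user pointer, the block start and the user pointer cannot both be 64-byte aligned, so the check should target whichever address the design promises to align.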
3. Remove Debug Output
# Find and remove/disable debug prints
grep -r "DEBUG.*Phase 7" core/
# Should be gated by #ifdef HAKMEM_DEBUG
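A compile-time gate along the lines suggested above could look like the following sketch. `HAKMEM_DEBUG` is the flag named in this report; the `DEMO_DBG` macro, the counter, and the function are hypothetical. When the flag is not defined, the macro expands to nothing, so no fprintf call remains in the hot path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Debug print that disappears entirely unless HAKMEM_DEBUG is defined. */
#ifdef HAKMEM_DEBUG
#define DEMO_DBG(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEMO_DBG(...) ((void)0)
#endif

static unsigned demo_fallback_count;

/* Record a rejected tiny allocation; the message is only emitted in
 * debug builds, while the counter is kept in all builds.
 * Returns the number of fallbacks seen so far. */
unsigned demo_note_fallback(size_t size)
{
    assert(size > 0);
    (void)size; /* unused when DEMO_DBG expands to nothing */
    demo_fallback_count++;
    DEMO_DBG("[DEBUG] tiny_alloc(%zu) rejected, using malloc fallback\n", size);
    return demo_fallback_count;
}
```

Keeping a cheap counter alongside the gated message preserves the signal that rejections are happening without paying for I/O in release builds.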
Phase 7 Feature Regression Test
Before deploying any fix, verify:
- All single-threaded benchmarks still pass
- Performance doesn't regress to Phase 6 levels
- No new crashes introduced
Test Suite:
# Single-thread (must pass)
./larson_hakmem 1 1 128 1024 1 12345 1 # Expect: 2.76M ops/s
./bench_random_mixed_hakmem 100000 128 1234567 # Expect: 72M ops/s
# Multi-thread (currently fails, must fix)
./larson_hakmem 2 8 128 1024 1 12345 2 # Expect: no crash
./larson_hakmem 4 8 128 1024 1 12345 4 # Expect: no crash
# 64B (currently fails, must fix)
./bench_random_mixed_hakmem 100000 64 1234567 # Expect: no crash, ~70M ops/s
Alternate Path: Revert Phase 7 Optimizations
If bugs are too complex to fix quickly:
# Revert to Phase 6
git checkout HEAD~3 # Or specific Phase 6 commit
# Verify Phase 6 still works
make clean && make larson_hakmem
./larson_hakmem 4 8 128 1024 1 12345 4 # Should work
# Incrementally re-apply Phase 7 optimizations
git cherry-pick <HEADER_CLASSIDX commit> # Test
git cherry-pick <AGGRESSIVE_INLINE commit> # Test
git cherry-pick <PREWARM_TLS commit> # Test
# Identify which commit introduced the bugs
Build Information
Compiler: gcc with LTO
Flags:
-O3 -flto -march=native -mtune=native
-DHAKMEM_TINY_PHASE6_BOX_REFACTOR=1
-DHAKMEM_TINY_FAST_PATH=1
-DHAKMEM_TINY_HEADER_CLASSIDX=1
-DHAKMEM_TINY_AGGRESSIVE_INLINE=1
-DHAKMEM_TINY_PREWARM_TLS=1
Known Issues:
- bench_comprehensive won't link (Phase 8 dependencies)
- bench_fragment_stress not tested (same issue)
- Debug output leaking into production builds
Appendix: Full Benchmark Output Samples
Larson 1T (Success)
=== LARSON 1T BASELINE ===
Throughput = 2758490 operations per second, relative time: 362.517s.
Done sleeping...
[ELO] Initialized 12 strategies (thresholds: 512KB-32MB)
[Batch] Initialized (threshold=8 MB, min_size=64 KB, bg=on)
[ACE] ACE disabled (HAKMEM_ACE_ENABLED=0)
Larson 2T (Crash)
[DEBUG] Phase 7: tiny_alloc(1024) rejected, using malloc fallback
free(): invalid pointer
Exit code: 134
64B Crash
[SUPERSLAB_INIT] class 7 slab 0: usable_size=63488 block_size=1024 capacity=62
[SUPERSLAB_INIT] Expected: 63488 / 1024 = 62 blocks
Exit code: 135 (SIGBUS)
Conclusion
Phase 7 achieved strong single-threaded performance (76-120% of System malloc on working sizes), but introduced critical bugs:
- Multi-threaded crash: Unusable with 2+ threads
- 64B crash: Unusable for common allocation size
- Incomplete implementation: Debug fallbacks in production code
Recommendation: DO NOT DEPLOY to production. Revert to Phase 6 or fix critical bugs before proceeding to Phase 7 Tasks 6-9.
Next Steps (in priority order):
- Fix multi-threaded crash (blocker for all production use)
- Fix 64B bus error (blocker for most workloads)
- Remove debug output (quality/performance issue)
- Re-run comprehensive validation
- Only then proceed to Phase 7 Tasks 6-9
Generated: 2025-11-08
Test Duration: ~2 hours
Total Benchmarks: 16 tests (10 sizes × random mixed, 3 × Larson, 3 × stability)
Crashes Found: 2 critical (Larson MT, 64B)
Production Ready: ❌ NO