
# Phase 7 Comprehensive Benchmark Results
**Date**: 2025-11-08
**Build Configuration**: `HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1`
**Status**: CRITICAL BUGS FOUND - NOT PRODUCTION READY

---
## Executive Summary
### Production Readiness: FAILED
**Critical Issues Found:**
1. **Multi-threaded crash**: Larson 2T/4T fail with `free(): invalid pointer` (Exit 134)
2. **64B allocation crash**: Bus error (Exit 135) on 64-byte allocations
3. **Debug output in production**: "Phase 7: tiny_alloc(1024) rejected" messages indicate incomplete implementation
**Performance (Single-threaded, working sizes):**
- Single-thread performance is excellent (76-120% of System malloc)
- But crashes make this unusable in production
### Key Findings
| Category | Result | Status |
|----------|--------|--------|
| Larson 1T | 2.76M ops/s | ✅ PASS |
| Larson 2T/4T | CRASH (Exit 134) | ❌ CRITICAL FAIL |
| Random Mixed (most sizes) | 60-72M ops/s | ✅ PASS |
| Random Mixed 64B | CRASH (Bus Error 135) | ❌ CRITICAL FAIL |
| Stability (1M iterations) | Stable scores | ✅ PASS |
| Overall Production Ready | NO | ❌ FAIL |
---
## Detailed Benchmark Results
### 1. Larson Multi-Thread Stress Test
| Threads | HAKMEM Result | System Result | Status |
|---------|---------------|---------------|--------|
| 1T | 2,758,490 ops/s | ~3.3M ops/s (est.) | ✅ 84% of System |
| 2T | **CRASH (Exit 134)** | N/A | ❌ CRITICAL |
| 4T | **CRASH (Exit 134)** | N/A | ❌ CRITICAL |
**Crash Details:**
```
[DEBUG] Phase 7: tiny_alloc(1024) rejected, using malloc fallback
free(): invalid pointer
Exit code: 134 (SIGABRT - double free or corruption)
```
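
The crash is reported as a shell exit code. By the usual shell convention, a process killed by signal N exits with code 128 + N, which is how 134 maps to SIGABRT (signal 6). The sketch below shows that decoding. The function names `signal_number` and `signal_name` are illustrative and not part of the HAKMEM tooling, and signal numbers other than SIGABRT can differ between platforms.

```bash
# Hypothetical helpers (not part of HAKMEM): decode a shell exit code.
# Exit codes above 128 conventionally mean "killed by signal (code - 128)".
signal_number() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    else
        echo 0   # 0 here means "not killed by a signal"
    fi
}

# Translate the signal number into a name using the shell's kill command.
signal_name() {
    num=$(signal_number "$1")
    if [ "$num" -eq 0 ]; then
        echo "none"
    else
        kill -l "$num"
    fi
}

signal_name 134   # prints ABRT
```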
**Root Cause**: Unknown. The most likely candidates are a race condition in the multi-threaded free path or a bug in the malloc fallback integration.

---
### 2. Random Mixed Allocation Benchmark
**Test**: 100,000 iterations of mixed malloc/free patterns
| Size | HAKMEM (ops/s) | System (ops/s) | HAKMEM % | Status |
|------|----------------|----------------|----------|--------|
| 16B | 66,878,359 | 87,810,575 | 76.1% | ✅ |
| 32B | 69,730,339 | 64,490,458 | **108.1%** | ✅ |
| **64B** | **CRASH (Bus Error 135)** | 78,147,467 | N/A | ❌ CRITICAL |
| 128B | 72,090,413 | 65,960,798 | **109.2%** | ✅ |
| 256B | 71,363,681 | 71,688,134 | 99.5% | ✅ |
| 512B | 60,501,851 | 62,967,613 | 96.0% | ✅ |
| 1024B | 63,229,630 | 67,220,203 | 94.0% | ✅ |
| 2048B | 55,868,013 | 46,557,492 | **119.9%** | ✅ |
| 4096B | 40,585,997 | 45,157,552 | 89.8% | ✅ |
| 8192B | 35,442,103 | 33,984,326 | **104.2%** | ✅ |
**Performance Highlights (working sizes):**
- **32B: +8% faster than System** (108.1%)
- **128B: +9% faster than System** (109.2%)
- **2048B: +20% faster than System** (119.9%)
- **8192B: +4% faster than System** (104.2%)
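
The HAKMEM % column is HAKMEM throughput divided by System throughput. A minimal sketch for recomputing it with integer shell arithmetic follows. The name `pct_of_system` is illustrative, and truncating (rather than rounding) to one decimal place is an assumption that happens to reproduce the values in the table.

```bash
# Hypothetical helper: HAKMEM throughput as a percentage of System throughput,
# truncated (not rounded) to one decimal place using integer arithmetic only.
# Usage: pct_of_system <hakmem_ops_per_s> <system_ops_per_s>
pct_of_system() {
    permille=$(( $1 * 1000 / $2 ))
    echo "$(( permille / 10 )).$(( permille % 10 ))"
}

pct_of_system 69730339 64490458   # 32B row: prints 108.1
pct_of_system 55868013 46557492   # 2048B row: prints 119.9
```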
**64B Crash Details:**
```
Exit code: 135 (SIGBUS - unaligned memory access or invalid pointer)
Crash during allocation, not free
```
**Root Cause**: Unknown. Possibly an alignment issue or a class index calculation error for the 64B size class.

---
### 3. Long-Run Stability Tests
**Test**: 1,000,000 iterations (10x normal) to check for memory leaks and variance
| Size | Throughput (ops/s) | Change vs 100K run | Status |
|------|-------------------|------------------|--------|
| 128B | 72,829,711 | +1.0% | ✅ Stable |
| 256B | 72,305,587 | +1.3% | ✅ Stable |
| 1024B | 64,240,186 | +1.6% | ✅ Stable |
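**Analysis**:
- All three sizes are within 2% of their 100K-iteration results, which indicates stable throughput
- Throughput does not degrade over the longer run, which suggests no significant leak (indirect evidence, not a dedicated leak check)
- Scores are slightly higher in long runs (likely cache warmup effects)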
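
The 2% stability criterion can be checked mechanically. The sketch below compares a long-run throughput against the matching 100K-iteration result. `within_pct` is an illustrative helper, not part of the HAKMEM benchmark scripts.

```bash
# Hypothetical helper: succeed if the long-run throughput is within <limit>
# percent of the short-run throughput (both in ops/s, integer arithmetic).
# Usage: within_pct <long_run_ops> <short_run_ops> <limit_percent>
within_pct() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    [ $(( diff * 100 )) -lt $(( $2 * $3 )) ]
}

# 128B row: 1M-iteration result vs 100K-iteration result, 2% limit
within_pct 72829711 72090413 2 && echo "128B: stable"
```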
---
### 4. Comparison vs Phase 6 Baseline
**Phase 6 Baseline** (from CLAUDE.md):
- Tiny: 52.59 M/s (38.7% of System 135.94 M/s)
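- Phase 6 Goal: 85-92% of System
- The System figure in this baseline (135.94 M/s) is well above the System throughput measured in this report (34-88 M/s), so the baseline was probably taken under a different benchmark configuration and the comparison below is approximate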
**Phase 7 Results** (working sizes):
- Tiny (128B): 72.09 M/s (109% of System 65.96 M/s), **+37% vs Phase 6**
- Tiny (256B): 71.36 M/s (99.5% of System), **+36% vs Phase 6**
- Mid (2048B): 55.87 M/s (120% of System), exceeding System by 20%
**Goal Achievement**:
- Target: 85-92% of System. **Achieved: 76-120%** across working sizes (only 16B, at 76.1%, falls below the target)
- But: **Critical crashes make this irrelevant**
---
### 5. Comprehensive Benchmark (Phase 8 features)
**Status**: Could not run - linking errors
**Issue**: `bench_comprehensive.c` calls Phase 8 functions:
- `hak_tiny_print_memory_profile()`
- `hkm_learner_init()`
- `superslab_ace_print_stats()`
These functions are not available in the Phase 7 build. Running this benchmark would require one of the following:
- Remove the Phase 8 dependencies from `bench_comprehensive.c`
- Build with Phase 8 flags
- Use a simpler benchmark suite
---
## Root Cause Analysis
### Issue 1: Multi-threaded Crash (Larson 2T/4T)
**Symptoms**:
- Single-threaded runs complete without errors (2.76M ops/s)
- 2+ threads crash immediately with "free(): invalid pointer"
- Consistent across 2T and 4T tests
**Debug Output**:
```
[DEBUG] Phase 7: tiny_alloc(1024) rejected, using malloc fallback
```
**Hypotheses**:
1. **Race condition in TLS initialization**: Multiple threads accessing uninitialized TLS
2. **Malloc fallback bug**: Mixed HAKMEM/libc allocations causing double-free
3. **Free path ownership bug**: Wrong allocator freeing blocks from the other
**Priority**: CRITICAL - must fix before any production use

---
### Issue 2: 64B Bus Error Crash
**Symptoms**:
- Bus error (SIGBUS) on 64-byte allocations
- All other sizes (16, 32, 128, 256, ..., 8192) work fine
- Crash happens during allocation, not free
**Hypotheses**:
1. **Class index calculation error**: 64B might map to wrong class
2. **Alignment issue**: 64B blocks not aligned to required boundary
3. **Header corruption**: Class index stored in header (HEADER_CLASSIDX=1) might overflow for 64B
**Clue**: Debug message shows "tiny_alloc(1024) rejected" even for 64B allocations, suggesting routing logic is broken.
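
This clue can be checked systematically by filtering the benchmark output for rejection messages whose size differs from the size actually requested. The sketch below assumes the debug line format shown in this report. `unexpected_rejections` is an illustrative helper, not an existing HAKMEM script.

```bash
# Hypothetical log filter: read benchmark output on stdin and list the distinct
# sizes reported in "tiny_alloc(N) rejected" lines that differ from the size
# the benchmark actually requested.
# Usage: <benchmark command> 2>&1 | unexpected_rejections <requested_size>
unexpected_rejections() {
    requested=$1
    sed -n 's/.*tiny_alloc(\([0-9][0-9]*\)) rejected.*/\1/p' |
        while read -r size; do
            if [ "$size" -ne "$requested" ]; then
                echo "$size"
            fi
        done | sort -nu
}

# Example (command line taken from the regression suite in this report):
#   ./bench_random_mixed_hakmem 100000 64 1234567 2>&1 | unexpected_rejections 64
```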
**Priority**: CRITICAL - 64B is a common allocation size

---
### Issue 3: Debug Output in Production Build
**Symptom**:
```
[DEBUG] Phase 7: tiny_alloc(1024) rejected, using malloc fallback
```
**Impact**:
- Performance overhead (fprintf in hot path)
- Indicates incomplete implementation (rejections shouldn't happen in production)
- Suggests Phase 7 optimizations have broken size routing
**Priority**: HIGH - indicates deeper implementation issues

---
## Production Readiness Assessment
### Success Criteria (from CURRENT_TASK.md)
| Criterion | Result | Status |
|-----------|--------|--------|
| All benchmarks complete without crashes | 2T/4T Larson crash, 64B crash | FAIL |
| Tiny performance: 85-92% of System | 76-120% (working sizes; only 16B is below target, at 76.1%) | PASS |
| Mid-Large performance: maintained | 120% of System | PASS |
| Multi-thread stability: no regression | Complete crash | FAIL |
| Fragmentation stress: acceptable | Not tested (build issues) | SKIP |
| Comprehensive report generated | This document | PASS |
**Overall**: **FAIL - 2 critical crashes**

---
## Recommended Next Steps
### Immediate Actions (Critical Bugs)
**1. Fix Multi-threaded Crash (Highest Priority)**
```bash
# Debug with ASan
make clean
make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 \
ASAN=1 larson_hakmem
./larson_hakmem 2 8 128 1024 1 12345 2
# Check TLS initialization
grep -r "PREWARM_TLS" core/
# Verify all TLS variables are initialized before thread spawn
```
**Suspected Root Cause**: TLS prewarm is not actually executing, or there is a race in initialization.
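
If the root cause is a race, a fix that survives one run may still fail intermittently, so the reproduction is worth repeating. The sketch below reruns a command a fixed number of times and stops at the first failure. `run_repeated` is an illustrative helper, not part of the HAKMEM build, and the run count is arbitrary.

```bash
# Hypothetical helper: run a command up to <runs> times, discarding its output,
# and stop at the first non-zero exit code.
# Usage: run_repeated <runs> <command> [args...]
run_repeated() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        "$@" >/dev/null 2>&1
        rc=$?
        if [ "$rc" -ne 0 ]; then
            echo "run $i failed with exit code $rc"
            return "$rc"
        fi
        i=$(( i + 1 ))
    done
    echo "all $runs runs passed"
}

# Example (2-thread Larson command line from this report, 20 runs):
#   run_repeated 20 ./larson_hakmem 2 8 128 1024 1 12345 2
```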
**2. Fix 64B Bus Error (High Priority)**
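```c
// Add temporary debug output to the class index calculation.
// File: core/box/hak_alloc_api.inc.h or similar (needs <stdio.h>)
// Written to stderr, which is unbuffered, so the line is not lost in a crash.
fprintf(stderr, "tiny_alloc(%zu) -> class %d\n", size, class_idx);

// Check alignment.
// File: core/hakmem_tiny_superslab.c (needs <assert.h> and <stdint.h>)
assert((uintptr_t)ptr % 64 == 0); // 64B must be 64-byte aligned
```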
**Suspected Root Cause**: `HEADER_CLASSIDX=1` stores the wrong class index for 64B allocations.
**3. Remove Debug Output**
```bash
# Find and remove/disable debug prints
grep -r "DEBUG.*Phase 7" core/
# Should be gated by #ifdef HAKMEM_DEBUG
```
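
Once the prints are gated, the absence of debug output can be verified at runtime instead of by inspection. The sketch below runs a command and fails if any output line starts with `[DEBUG]`, the prefix seen in this report. `no_debug_output` is an illustrative name, not an existing HAKMEM script.

```bash
# Hypothetical check: run a command and fail if its combined stdout/stderr
# contains any line starting with "[DEBUG]".
# Usage: no_debug_output <command> [args...]
no_debug_output() {
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the
    # assignment from aborting scripts that run under "set -e".
    count=$("$@" 2>&1 | grep -c '^\[DEBUG\]' || true)
    if [ "$count" -gt 0 ]; then
        echo "found $count [DEBUG] line(s)"
        return 1
    fi
    echo "no [DEBUG] output"
}

# Example (single-thread Larson command line from this report):
#   no_debug_output ./larson_hakmem 1 1 128 1024 1 12345 1
```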
---
### Phase 7 Feature Regression Test
**Before deploying any fix, verify**:
1. All single-threaded benchmarks still pass
2. Performance doesn't regress to Phase 6 levels
3. No new crashes introduced
**Test Suite**:
```bash
# Single-thread (must pass)
./larson_hakmem 1 1 128 1024 1 12345 1 # Expect: 2.76M ops/s
./bench_random_mixed_hakmem 100000 128 1234567 # Expect: 72M ops/s
# Multi-thread (currently fails, must fix)
./larson_hakmem 2 8 128 1024 1 12345 2 # Expect: no crash
./larson_hakmem 4 8 128 1024 1 12345 4 # Expect: no crash
# 64B (currently fails, must fix)
./bench_random_mixed_hakmem 100000 64 1234567 # Expect: no crash, ~70M ops/s
```
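
The throughput expectations in the suite can be turned into pass/fail checks. The sketch below parses the `Throughput = N operations per second` line that Larson prints (format taken from the appendix of this report) and compares N against a minimum. The helper names and the 2.5M ops/s floor are illustrative assumptions, and the output format of `bench_random_mixed_hakmem` is not covered.

```bash
# Hypothetical helpers: extract the ops/s figure from Larson output on stdin
# and compare it against a minimum acceptable value.
larson_throughput() {
    sed -n 's/^Throughput = \([0-9][0-9]*\) operations per second.*/\1/p' | head -n 1
}

# Usage: <larson command> 2>&1 | meets_floor <min_ops_per_s>
# Fails if no throughput line is found (for example, after a crash).
meets_floor() {
    ops=$(larson_throughput)
    [ -n "$ops" ] && [ "$ops" -ge "$1" ]
}

# Example, with an arbitrary floor somewhat below the 2.76M ops/s recorded here:
#   ./larson_hakmem 1 1 128 1024 1 12345 1 2>&1 | meets_floor 2500000
```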
---
### Alternate Path: Revert Phase 7 Optimizations
If bugs are too complex to fix quickly:
```bash
# Revert to Phase 6
git checkout HEAD~3 # Or specific Phase 6 commit
# Verify Phase 6 still works
make clean && make larson_hakmem
./larson_hakmem 4 8 128 1024 1 12345 4 # Should work
# Incrementally re-apply Phase 7 optimizations
git cherry-pick <HEADER_CLASSIDX commit> # Test
git cherry-pick <AGGRESSIVE_INLINE commit> # Test
git cherry-pick <PREWARM_TLS commit> # Test
# Identify which commit introduced the bugs
```
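
The search for the offending commit can also be automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", any other code from 1 to 127 as bad, and aborts on anything else. Because the crashes in this report exit with 134 and 135, a test script has to map them into the "bad" range first. The sketch below shows one such mapping. `to_bisect_code` and the script name `bisect_test.sh` are hypothetical, and the build flags may not exist at every commit in the range.

```bash
# Hypothetical helper for a `git bisect run` test script: map a benchmark's
# exit code to what git bisect expects (0 = good, 1 = bad).
# Signal exits such as 134 (SIGABRT) or 135 (SIGBUS) are above 127 and would
# abort the bisect if passed through unchanged.
to_bisect_code() {
    if [ "$1" -eq 0 ]; then
        echo 0
    else
        echo 1
    fi
}

# Sketch of bisect_test.sh (build and run commands taken from this report):
#   make clean && make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 \
#       larson_hakmem || exit 125      # cannot build: skip this commit
#   ./larson_hakmem 2 8 128 1024 1 12345 2
#   exit "$(to_bisect_code $?)"
#
# Then: git bisect start <bad-commit> <good-commit>
#       git bisect run ./bisect_test.sh
```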
---
## Build Information
**Compiler**: gcc with LTO
**Flags**:
```
-O3 -flto -march=native -mtune=native
-DHAKMEM_TINY_PHASE6_BOX_REFACTOR=1
-DHAKMEM_TINY_FAST_PATH=1
-DHAKMEM_TINY_HEADER_CLASSIDX=1
-DHAKMEM_TINY_AGGRESSIVE_INLINE=1
-DHAKMEM_TINY_PREWARM_TLS=1
```
**Known Issues**:
- `bench_comprehensive` won't link (Phase 8 dependencies)
- `bench_fragment_stress` not tested (same issue)
- Debug output leaking into production builds
---
## Appendix: Full Benchmark Output Samples
### Larson 1T (Success)
```
=== LARSON 1T BASELINE ===
Throughput = 2758490 operations per second, relative time: 362.517s.
Done sleeping...
[ELO] Initialized 12 strategies (thresholds: 512KB-32MB)
[Batch] Initialized (threshold=8 MB, min_size=64 KB, bg=on)
[ACE] ACE disabled (HAKMEM_ACE_ENABLED=0)
```
### Larson 2T (Crash)
```
[DEBUG] Phase 7: tiny_alloc(1024) rejected, using malloc fallback
free(): invalid pointer
Exit code: 134
```
### 64B Crash
```
[SUPERSLAB_INIT] class 7 slab 0: usable_size=63488 block_size=1024 capacity=62
[SUPERSLAB_INIT] Expected: 63488 / 1024 = 62 blocks
Exit code: 135 (SIGBUS)
```
---
## Conclusion
**Phase 7 achieved strong single-threaded performance** (76-120% of System malloc across working sizes), **but introduced critical bugs**:
1. **Multi-threaded crash**: Unusable with 2+ threads
2. **64B crash**: Unusable for common allocation size
3. **Incomplete implementation**: Debug fallbacks in production code
**Recommendation**: **DO NOT DEPLOY** to production. Revert to Phase 6 or fix critical bugs before proceeding to Phase 7 Tasks 6-9.
**Next Steps** (in priority order):
1. Fix multi-threaded crash (blocker for all production use)
2. Fix 64B bus error (blocker for most workloads)
3. Remove debug output (quality/performance issue)
4. Re-run comprehensive validation
5. Only then proceed to Phase 7 Tasks 6-9
---
**Generated**: 2025-11-08
**Test Duration**: ~2 hours
**Total Benchmarks**: 16 tests (10 sizes × random mixed, 3 × Larson, 3 × stability)
**Crashes Found**: 2 critical (Larson MT, 64B)
**Production Ready**: NO