# Phase 76-0: C7 Per-Class Statistics Analysis (SSOT Conversion)
## Executive Summary
**Definitive C7 Statistics from Mixed SSOT Workload:**
- **C7 Hit Count: 0** (ZERO allocations)
- **C7 Percentage: 0.00%** of C4-C7 operations
- **Verdict: NO-GO for C7 P2 (inline slots optimization)**
---
## Test Configuration
**Binary**: `bench_random_mixed_hakmem_observe` (with HAKMEM_MEASURE_UNIFIED_CACHE=1)
**Environment Variables**:
```bash
HAKMEM_WARM_POOL_SIZE=16
HAKMEM_TINY_C5_INLINE_SLOTS=1
HAKMEM_TINY_C6_INLINE_SLOTS=1
```
**Benchmark Parameters**:
- Iterations: 20,000,000
- Working Set Size: 400
- Runs: 1 (per-class stats are cumulative)
**Unified Cache Initialization**:
```
C4 capacity = 64 (power of 2)
C5 capacity = 128 (power of 2)
C6 capacity = 128 (power of 2)
C7 capacity = 128 (power of 2)
```
---
## Results: Per-Class Statistics
### C7 Statistics (CRITICAL FINDING)
| Metric | Value |
|--------|-------|
| Hit Count | 0 |
| Miss Count | 0 |
| Push Count | 0 |
| Full Count | 0 |
| **Total Allocations** | **0** |
| **Occupied Slots** | **0/128** |
| Hit Rate | N/A |
| Full Rate | N/A |
**Status**: C7 received **ZERO allocations** in the Mixed SSOT workload.
### C4-C7 Ranking (Cumulative)
| Class | Hit Count | Miss Count | Capacity | Hit Rate | Share of C4-C7 Hits |
|-------|-----------|-----------|----------|-------|---------------------|
| C6 | 2,750,854 | 1 | 128 | 100.0% | **57.17%** |
| C5 | 1,373,604 | 1 | 128 | 100.0% | **28.55%** |
| C4 | 687,563 | 1 | 64 | 100.0% | **14.29%** |
| C7 | 0 | 0 | 128 | N/A | **0.00%** |
| **TOTAL** | **4,812,021** | **3** | — | — | **100.00%** |
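
The share column can be re-derived from the hit counts alone. The following is a minimal sketch, not part of the hakmem tooling; the names `HITS` and `class_shares` are hypothetical and the counts are copied from the table above.

```python
# Hit counts copied from the C4-C7 ranking table (hypothetical variable name).
HITS = {"C4": 687_563, "C5": 1_373_604, "C6": 2_750_854, "C7": 0}


def class_shares(hits):
    """Return each class's share of total hits in percent, largest first."""
    total = sum(hits.values())
    if total == 0:
        return {}  # no operations recorded, so no share is defined
    shares = {name: 100.0 * count / total for name, count in hits.items()}
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    for name, pct in class_shares(HITS).items():
        print(f"{name}: {pct:.2f}%")
```

Sorting by share reproduces the table's row order (C6, C5, C4, C7), which makes the sketch usable as a quick check when the counts are refreshed from a new run.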
### Coverage Analysis
| Cumulative Classes | Operations | Percentage |
|--------------------|------------|-----------|
| C6 alone | 2,750,854 | 57.17% |
| C5+C6 | 4,124,458 | 85.71% |
| **C4+C5+C6** | **4,812,021** | **100.00%** |
| C4+C5+C6+C7 | 4,812,021 | 100.00% (no change) |
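
The cumulative rows are running sums over the classes in rank order. A minimal sketch of that calculation follows, assuming the same hit counts as the ranking table; `RANKED` and `cumulative_coverage` are hypothetical names introduced for illustration.

```python
from itertools import accumulate

# Classes in descending order of hit count, as in the ranking table.
RANKED = [("C6", 2_750_854), ("C5", 1_373_604), ("C4", 687_563), ("C7", 0)]


def cumulative_coverage(ranked):
    """Return (class added, cumulative operations, cumulative percent) rows."""
    total = sum(count for _, count in ranked)
    if total == 0:
        return []  # nothing to cover
    running = accumulate(count for _, count in ranked)
    return [
        (name, ops, 100.0 * ops / total)
        for (name, _), ops in zip(ranked, running)
    ]


if __name__ == "__main__":
    for name, ops, pct in cumulative_coverage(RANKED):
        print(f"up to {name}: {ops:,} ops ({pct:.2f}%)")
```

Computing the percentage from the raw cumulative count, rather than adding already-rounded per-class percentages, avoids small rounding drift in the last digit. Adding C7 leaves the cumulative total unchanged, which is the point of the final table row.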
---
## Decision Analysis
### Threshold Criteria
- **GO for C7 P2**: C7 > 20% of C4-C7 operations
- **NEUTRAL**: 15% < C7 ≤ 20% of C4-C7 operations
- **CONSIDER C4 redesign**: C7 ≤ 15% of C4-C7 operations
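
The decision gate can be expressed as a small function. This is a sketch under stated assumptions: the function name `c7_verdict` and the verdict strings are hypothetical, and the band boundaries are assumed to be inclusive on the lower side (exactly 20% counts as NEUTRAL, exactly 15% falls in the lowest band).

```python
def c7_verdict(c7_ops, c4_to_c7_ops):
    """Classify the C7 share of C4-C7 operations against the gate thresholds.

    Assumption: 20% exactly is NEUTRAL and 15% exactly is the lowest band.
    """
    if c4_to_c7_ops <= 0:
        raise ValueError("need at least one C4-C7 operation to compute a share")
    share = 100.0 * c7_ops / c4_to_c7_ops
    if share > 20.0:
        return "GO"
    if share > 15.0:
        return "NEUTRAL"
    return "CONSIDER_C4_REDESIGN"


if __name__ == "__main__":
    # Counts from this run: 0 C7 operations out of 4,812,021 C4-C7 hits.
    print(c7_verdict(0, 4_812_021))
```

With the measured counts, the C7 share is 0%, so the gate lands in the lowest band and C7 P2 is not pursued.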
### Verdict: **NO-GO for C7 P2**
**C7: 0.00%** - Falls in the lowest band of the criteria above, far below the 20% GO threshold
**Explanation:**
1. **Zero Volume**: The Mixed SSOT workload (128-1024B allocations) does NOT generate any C7 (1024-2048B) allocations.
2. **Workload Mismatch**: The benchmark parameters (400 working set size, 20M iterations) are tuned to exercise C4-C6 intensively but avoid C7 entirely.
3. **No Optimization Benefit**: Any C7 P2 (inline slots) optimization would provide 0% improvement for this specific workload.
4. **Resource Opportunity Cost**: Engineering effort for C7 P2 would be better spent on C4 (14.29%) or investigating alternative workloads.
---
## Recommended Next Phase
### Phase 76-1: C4 Per-Class Deep Dive
**Objective**: Analyze C4 (14.3% of total operations) as the next optimization target
**Rationale**:
- C4 is the **largest remaining bottleneck** after C5+C6 inline slots
- C4 (256-512B) represents a significant portion of tiny allocations
- After C5/C6 optimizations (85.7%), C4 becomes critical for overall performance
**Investigation Areas**:
1. **C4 Hit Rate**: Currently 100.0% (1 miss against 687,563 hits) - is there any room left for miss reduction?
2. **C4 Cache Occupancy**: 63/64 slots occupied (near full)
3. **C4 Allocation Pattern**: Is there temporal locality opportunity?
4. **Alternative**: Investigate workloads that DO use C7 (system-level, long-lived objects)
**Suggested Implementation Options**:
- C4 LIFO optimization (vs current FIFO-like behavior)
- C4 spatial locality improvements
- C4 refill batching (similar to C5/C6)
- Hybrid C4-C5 inline slots strategy
---
## Artifacts
### Raw Log
Location: `/tmp/phase76_0_c7_stats.log`
Key excerpts:
```
[Unified-STATS] Unified Cache Metrics:
[Unified-STATS] Consistency Check:
[Unified-STATS] total_allocs (hit+miss) = 5327287
[Unified-STATS] total_frees (push+full) = 1202827
C2: 128/2048 slots occupied, hit=172530 miss=1 (100.0% hit), push=172531 full=0 (0.0% full)
C3: 128/2048 slots occupied, hit=342731 miss=1 (100.0% hit), push=342732 full=0 (0.0% full)
C4: 63/64 slots occupied, hit=687563 miss=1 (100.0% hit), push=687564 full=0 (0.0% full)
C5: 75/128 slots occupied, hit=1373604 miss=1 (100.0% hit), push=0 full=0 (0.0% full)
C6: 42/128 slots occupied, hit=2750854 miss=1 (100.0% hit), push=0 full=0 (0.0% full)
[C7 MISSING - 0 operations]
Throughput = 46152700 ops/s [iter=20000000 ws=400] time=0.433s
```
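
The consistency totals in the excerpt can be cross-checked against the per-class lines. The following is a minimal sketch that assumes the log line format is exactly as shown in the excerpt; the regular expression and the names `LINE_RE`, `parse_classes`, and `totals` are hypothetical and not part of hakmem.

```python
import re

# Matches per-class lines in the format shown in the excerpt above.
LINE_RE = re.compile(
    r"^C(?P<cls>\d+): (?P<used>\d+)/(?P<cap>\d+) slots occupied, "
    r"hit=(?P<hit>\d+) miss=(?P<miss>\d+) \([\d.]+% hit\), "
    r"push=(?P<push>\d+) full=(?P<full>\d+)"
)

LOG = """\
[Unified-STATS] total_allocs (hit+miss) = 5327287
[Unified-STATS] total_frees (push+full) = 1202827
C2: 128/2048 slots occupied, hit=172530 miss=1 (100.0% hit), push=172531 full=0 (0.0% full)
C3: 128/2048 slots occupied, hit=342731 miss=1 (100.0% hit), push=342732 full=0 (0.0% full)
C4: 63/64 slots occupied, hit=687563 miss=1 (100.0% hit), push=687564 full=0 (0.0% full)
C5: 75/128 slots occupied, hit=1373604 miss=1 (100.0% hit), push=0 full=0 (0.0% full)
C6: 42/128 slots occupied, hit=2750854 miss=1 (100.0% hit), push=0 full=0 (0.0% full)
"""


def parse_classes(text):
    """Return {class index: counters} for every per-class line in the log."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            fields = {key: int(value) for key, value in match.groupdict().items()}
            stats[fields.pop("cls")] = fields
    return stats


def totals(stats):
    """Sum hit+miss and push+full across all parsed classes."""
    allocs = sum(s["hit"] + s["miss"] for s in stats.values())
    frees = sum(s["push"] + s["full"] for s in stats.values())
    return allocs, frees


if __name__ == "__main__":
    parsed = parse_classes(LOG)
    print(totals(parsed))          # compare with the [Unified-STATS] totals
    print(7 in parsed)             # whether a C7 line was present at all
```

Summing the per-class counters reproduces both `total_allocs` and `total_frees` from the excerpt, and no C7 line is parsed, consistent with the zero-operation finding.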
### Verification Output
```
C7 Initialization: ✓ Capacity=128 allocated
C7 Route Assignment: ✓ LEGACY route configured
C7 Operations: ✗ ZERO allocations
C7 Carve Attempts: 0 (no operations triggered)
C7 Warm Pool: 0 pops, 0 pushes
C7 Meta Used Counter: 0 total operations
```
---
## Key Insights
1. **Workload Characterization**: The Mixed SSOT benchmark is optimized for C4-C6 (128-1024B). This is intentional and appropriate for most mixed workloads.
2. **Where C7 Appears**: C7 (1024-2048B) allocations appear in:
   - Long-lived data structures (hash tables, trees)
   - System-level workloads (networking buffers)
   - Specialized benchmarks (not representative of general use)
3. **Optimization Priority**:
   - C6 (57.2%): Already optimized with inline slots
   - C5 (28.5%): Already optimized with inline slots
   - C4 (14.3%): **Next optimization target**
   - C7 (0.0%): No presence in mixed workload
4. **Engineering Trade-offs**:
   - C7 P2 would add complexity for 0% mixed-workload benefit
   - C4 redesign could improve 14.3% of operations
   - Consider dropping C7 optimization work if isolated workloads don't justify it
---
## Conclusion
**Phase 76-0 Complete**: C7 is definitively measured at 0.00% of Mixed SSOT operations.
**Next Action**: Proceed to **Phase 76-1: C4 Analysis** to evaluate the largest remaining optimization opportunity (14.29% of total operations).
**File**: `/tmp/phase76_0_c7_stats.log`
**Date**: 2025-12-18
**Status**: Decision gate established