hakmem/docs/analysis/PHASE76_0_C7_STATISTICS_ANALYSIS.md

Phase 76-0: C7 Per-Class Statistics Analysis (Establishing the SSOT)

Executive Summary

Definitive C7 Statistics from Mixed SSOT Workload:

  • C7 Hit Count: 0 (ZERO allocations)
  • C7 Percentage: 0.00% of C4-C7 operations
  • Verdict: NO-GO for C7 P2 (inline slots optimization)

Test Configuration

Binary: bench_random_mixed_hakmem_observe (with HAKMEM_MEASURE_UNIFIED_CACHE=1)

Environment Variables:

HAKMEM_WARM_POOL_SIZE=16
HAKMEM_TINY_C5_INLINE_SLOTS=1
HAKMEM_TINY_C6_INLINE_SLOTS=1

Benchmark Parameters:

  • Iterations: 20,000,000
  • Working Set Size: 400
  • Runs: 1 (per-class stats are cumulative)

Unified Cache Initialization:

C4 capacity = 64  (power of 2)
C5 capacity = 128 (power of 2)
C6 capacity = 128 (power of 2)
C7 capacity = 128 (power of 2)
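The "power of 2" annotation matters because a ring buffer with a power-of-two capacity can wrap its indices with a bit mask instead of a modulo. The sketch below illustrates only that property. Whether hakmem's unified cache indexes its slots this way is an assumption, and `is_power_of_two` and `wrap_index` are hypothetical helper names.

```python
# Capacities reported at unified cache initialization (copied from above).
CAPACITIES = {"C4": 64, "C5": 128, "C6": 128, "C7": 128}


def is_power_of_two(n: int) -> bool:
    """True when n is positive and has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def wrap_index(index: int, capacity: int) -> int:
    """Wrap a ring-buffer index with a mask (power-of-two capacities only)."""
    if not is_power_of_two(capacity):
        raise ValueError("mask wrapping requires a power-of-two capacity")
    return index & (capacity - 1)


for name, cap in CAPACITIES.items():
    assert is_power_of_two(cap), name
    # For non-negative indices the mask agrees with the modulo.
    assert all(wrap_index(i, cap) == i % cap for i in range(4 * cap))
```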

Results: Per-Class Statistics

C7 Statistics (CRITICAL FINDING)

| Metric            | Value |
|-------------------|-------|
| Hit Count         | 0     |
| Miss Count        | 0     |
| Push Count        | 0     |
| Full Count        | 0     |
| Total Allocations | 0     |
| Occupied Slots    | 0/128 |
| Hit Rate          | N/A   |
| Full Rate         | N/A   |

Status: C7 received ZERO allocations in the Mixed SSOT workload.

C4-C7 Ranking (Cumulative)

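| Class | Hit Count | Miss Count | Capacity | Hit Rate | Share of C4-C7 Hits |
|-------|-----------|------------|----------|----------|---------------------|
| C6    | 2,750,854 | 1          | 128      | 100.0%   | 57.17%              |
| C5    | 1,373,604 | 1          | 128      | 100.0%   | 28.55%              |
| C4    | 687,563   | 1          | 64       | 100.0%   | 14.29%              |
| C7    | 0         | 0          | 128      | N/A      | 0.00%               |
| TOTAL | 4,812,021 | 3          |          |          | 100.00%             |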

Coverage Analysis

| Cumulative Classes | Operations | Percentage          |
|--------------------|------------|---------------------|
| C6 alone           | 2,750,854  | 57.17%              |
| C5+C6              | 4,124,458  | 85.71%              |
| C4+C5+C6           | 4,812,021  | 100.00%             |
| C4+C5+C6+C7        | 4,812,021  | 100.00% (no change) |
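The shares and cumulative coverage can be recomputed directly from the raw hit counts, which avoids the rounding drift that comes from adding percentages that were already rounded. This is a minimal sketch; the function names are illustrative and not part of hakmem.

```python
# Hit counts copied from the C4-C7 ranking.
HITS = {"C6": 2_750_854, "C5": 1_373_604, "C4": 687_563, "C7": 0}


def shares(hits):
    """Percentage share of each class within the group."""
    total = sum(hits.values())
    return {name: 100.0 * count / total for name, count in hits.items()}


def cumulative_coverage(hits):
    """Running hit total and percentage, classes ordered largest first."""
    total = sum(hits.values())
    running = 0
    rows = []
    for name, count in sorted(hits.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((name, running, 100.0 * running / total))
    return rows


for name, pct in shares(HITS).items():
    print(f"{name}: {pct:.2f}%")
for name, running, pct in cumulative_coverage(HITS):
    print(f"up to {name}: {running:,} ({pct:.2f}%)")
```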

Decision Analysis

Threshold Criteria

  • GO for C7 P2: C7 > 20% of C4-C7 operations
  • NEUTRAL: 15% < C7 ≤ 20% of C4-C7 operations
  • NO-GO (consider C4 redesign instead): C7 ≤ 15% of C4-C7 operations
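The three bands above can be written as a small decision function. This is a sketch of the gate as stated in this document; `classify_c7` is a hypothetical name, and the input is C7's share of C4-C7 operations in percent.

```python
def classify_c7(c7_share_pct: float) -> str:
    """Map C7's share of C4-C7 operations (percent) to a decision band."""
    if c7_share_pct > 20.0:
        return "GO"
    if c7_share_pct > 15.0:
        return "NEUTRAL"
    # Third band of the criteria: no C7 P2, consider C4 redesign instead.
    return "NO-GO"


# Measured C7 share in the Mixed SSOT workload.
print(classify_c7(0.0))
```

Both boundaries are inclusive on the lower band: exactly 20% is NEUTRAL and exactly 15% is NO-GO.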

Verdict: NO-GO for C7 P2

C7 accounts for 0.00% of C4-C7 operations, far below the 15% floor of the NEUTRAL band.

Explanation:

  1. Zero Volume: The Mixed SSOT workload generates no C7 (1024-2048B) allocations; all of its C4-C7 traffic falls in the 128-1024B range covered by C4-C6.
  2. Workload Mismatch: The benchmark configuration (working set size 400, 20M iterations, 128-1024B sizes) exercises C4-C6 intensively and never reaches C7.
  3. No Optimization Benefit: A C7 P2 (inline slots) optimization cannot improve this workload, because no operation would ever take the optimized path.
  4. Resource Opportunity Cost: Engineering effort for C7 P2 would be better spent on C4 (14.29% of C4-C7 operations) or on investigating workloads that actually exercise C7.
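The zero-volume argument follows from the size-to-class mapping alone. The sketch below assumes power-of-two classes with inclusive upper bounds, inferred from the ranges quoted in this document (C4-C6 = 128-1024B, C7 = 1024-2048B). The exact boundaries, the uniform size distribution, and the names `size_to_class` and `histogram` are all assumptions for illustration, not confirmed hakmem details.

```python
import random
from typing import Optional

# Hypothetical inclusive upper bounds in bytes, inferred from this document.
CLASS_UPPER_BOUNDS = {"C4": 256, "C5": 512, "C6": 1024, "C7": 2048}
GROUP_LOWER_BOUND = 128  # sizes at or below this belong to classes below C4


def size_to_class(size: int) -> Optional[str]:
    """Return the C4-C7 class for a request size, or None if outside the group."""
    if size <= GROUP_LOWER_BOUND:
        return None
    for name, upper in CLASS_UPPER_BOUNDS.items():
        if size <= upper:
            return name
    return None


def histogram(sizes):
    """Count requests per C4-C7 class, ignoring sizes outside the group."""
    counts = {name: 0 for name in CLASS_UPPER_BOUNDS}
    for size in sizes:
        name = size_to_class(size)
        if name is not None:
            counts[name] += 1
    return counts


rng = random.Random(0)
# Stand-in workload: no request exceeds 1024B, so C7 can never be selected.
mixed = [rng.randint(16, 1024) for _ in range(100_000)]
print(histogram(mixed))
# Widening the size range is what makes C7 traffic appear.
wide = [rng.randint(16, 2048) for _ in range(100_000)]
print(histogram(wide))
```

Under these assumptions, working set size and iteration count have no influence on which classes are hit; only the request size range does.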

Phase 76-1: C4 Per-Class Deep Dive

Objective: Analyze C4 (14.3% of C4-C7 operations) as the next optimization target

Rationale:

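  • C4 holds the largest share of C4-C7 operations not yet covered by inline slots (C5 and C6 already are)
  • C4 (128-256B, given the 128-1024B span of C4-C6) represents a significant portion of tiny allocations
  • With C5 and C6 (85.7% of C4-C7 operations) already optimized, C4 is the only remaining class in this group that carries traffic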

Investigation Areas:

  1. C4 Hit Rate: Currently 100.0% (687,563 hits, 1 miss), which leaves essentially no room for miss reduction; any gain must come from the cost of the hit path itself
  2. C4 Cache Occupancy: 63/64 slots occupied (near full)
  3. C4 Allocation Pattern: Is there temporal locality opportunity?
  4. Alternative: Investigate workloads that DO use C7 (system-level, long-lived objects)

Suggested Implementation Options:

  • C4 LIFO optimization (vs current FIFO-like behavior)
  • C4 spatial locality improvements
  • C4 refill batching (similar to C5/C6)
  • Hybrid C4-C5 inline slots strategy

Artifacts

Raw Log

Location: /tmp/phase76_0_c7_stats.log

Key excerpts:

[Unified-STATS] Unified Cache Metrics:
[Unified-STATS] Consistency Check:
[Unified-STATS]   total_allocs (hit+miss) = 5327287
[Unified-STATS]   total_frees (push+full) = 1202827

  C2: 128/2048 slots occupied, hit=172530 miss=1 (100.0% hit), push=172531 full=0 (0.0% full)
  C3: 128/2048 slots occupied, hit=342731 miss=1 (100.0% hit), push=342732 full=0 (0.0% full)
  C4: 63/64 slots occupied, hit=687563 miss=1 (100.0% hit), push=687564 full=0 (0.0% full)
  C5: 75/128 slots occupied, hit=1373604 miss=1 (100.0% hit), push=0 full=0 (0.0% full)
  C6: 42/128 slots occupied, hit=2750854 miss=1 (100.0% hit), push=0 full=0 (0.0% full)
  [C7 MISSING - 0 operations]

Throughput =  46152700 ops/s [iter=20000000 ws=400] time=0.433s
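The consistency check in the excerpt can be reproduced by parsing the per-class lines and summing them. This is a sketch written against the log format exactly as shown above; the regular expressions and `parse_unified_stats` are illustrative and not part of hakmem's tooling.

```python
import re

# Excerpt of the unified cache statistics, copied from the raw log.
LOG = """\
[Unified-STATS]   total_allocs (hit+miss) = 5327287
[Unified-STATS]   total_frees (push+full) = 1202827
  C2: 128/2048 slots occupied, hit=172530 miss=1 (100.0% hit), push=172531 full=0 (0.0% full)
  C3: 128/2048 slots occupied, hit=342731 miss=1 (100.0% hit), push=342732 full=0 (0.0% full)
  C4: 63/64 slots occupied, hit=687563 miss=1 (100.0% hit), push=687564 full=0 (0.0% full)
  C5: 75/128 slots occupied, hit=1373604 miss=1 (100.0% hit), push=0 full=0 (0.0% full)
  C6: 42/128 slots occupied, hit=2750854 miss=1 (100.0% hit), push=0 full=0 (0.0% full)
"""

CLASS_RE = re.compile(
    r"^\s*(C\d+): (\d+)/(\d+) slots occupied, "
    r"hit=(\d+) miss=(\d+) .*?push=(\d+) full=(\d+)",
    re.MULTILINE,
)
TOTAL_RE = re.compile(r"total_(allocs|frees) \([^)]*\) = (\d+)")
FIELDS = ("occupied", "capacity", "hit", "miss", "push", "full")


def parse_unified_stats(log):
    """Return ({class: {field: int}}, {"allocs": int, "frees": int})."""
    classes = {
        m.group(1): dict(zip(FIELDS, map(int, m.groups()[1:])))
        for m in CLASS_RE.finditer(log)
    }
    totals = {kind: int(value) for kind, value in TOTAL_RE.findall(log)}
    return classes, totals


classes, totals = parse_unified_stats(LOG)
allocs = sum(c["hit"] + c["miss"] for c in classes.values())
frees = sum(c["push"] + c["full"] for c in classes.values())
print("allocs consistent:", allocs == totals["allocs"])
print("frees consistent:", frees == totals["frees"])
```

For the excerpt shown, both sums match the reported totals, and no C7 line is present to parse.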

Verification Output

C7 Initialization: ✓ Capacity=128 allocated
C7 Route Assignment: ✓ LEGACY route configured
C7 Operations: ✗ ZERO allocations
C7 Carve Attempts: 0 (no operations triggered)
C7 Warm Pool: 0 pops, 0 pushes
C7 Meta Used Counter: 0 total operations

Key Insights

  1. Workload Characterization: The Mixed SSOT benchmark concentrates its C4-C7 traffic in C4-C6 (128-1024B). This is intentional and appropriate for most mixed workloads.

  2. Where C7 Matters: C7 (1024-2048B) allocations are more likely to appear in:

    • Long-lived data structures (hash tables, trees)
    • System-level workloads (networking buffers)
    • Specialized benchmarks (not representative of general use)

  3. Optimization Priority:

    • C6 (57.2%): ✓ Already optimized with inline slots
    • C5 (28.5%): ✓ Already optimized with inline slots
    • C4 (14.3%): ← Next optimization target
    • C7 (0.0%): ✗ No presence in mixed workload

  4. Engineering Trade-offs:

    • C7 P2 would add complexity for 0% mixed-workload benefit
    • C4 redesign could improve 14.3% of C4-C7 operations
    • Consider dropping C7 optimization work if workloads measured in isolation do not justify it either

Conclusion

Phase 76-0 Complete: C7 is definitively measured at 0.00% of Mixed SSOT operations.

Next Action: Proceed to Phase 76-1: C4 Analysis to evaluate the largest remaining optimization opportunity (14.29% of C4-C7 operations).

File: /tmp/phase76_0_c7_stats.log
Date: 2025-12-18
Status: ✓ Decision gate established