hakmem/docs/analysis/PHASE75_PERCLASS_ANALYSIS_0_SSOT.md

# Phase 75 Per-Class Analysis - Mixed SSOT Unified-STATS

**Status**: ANALYSIS COMPLETE, ready for Phase 75 (P2: Hot-class Inline Slots) targeting decision

**Workload**: Mixed SSOT (WS=400, ITERS=20000000, WarmPool=16)

**Measurement**: `HAKMEM_MEASURE_UNIFIED_CACHE=1` OBSERVE run

---

## 1. Per-Class Unified-STATS (Ranked by Volume)

### Data Summary

| Class | Capacity | Occupied | Hit Count | Push Count | Total Ops | Hit Rate | % of Total |
|-------|----------|----------|-----------|------------|-----------|----------|-----------|
| **C6** | 128 | 127 | 2,750,854 | 2,750,855 | **5,501,709** | 100.0% | **57.2%** |
| **C5** | 128 | 127 | 1,373,604 | 1,373,605 | **2,747,209** | 100.0% | **28.5%** |
| **C4** | 64  | 63  | 687,563   | 687,564   | **1,375,127** | 100.0% | **14.3%** |
| **C7** | ? | ? | ? | ? | **?** | ? | **?** |

**Total C4-C6**: 9,624,045 operations (100% hit rate across all three classes)

**Observation**: C7 statistics not visible in current OBSERVE output (may require additional diagnostics)

---

## 2. Ranking & Key Findings

### Volume Ranking (Descending)

1. **C6: 57.2% of C4-C7 volume** (2.75M hits, 2.75M pushes)
   - Highest operational density
   - Cache occupancy: 127/128 (99.2%)
   - Perfect 100% hit rate

2. **C5: 28.5% of C4-C7 volume** (1.37M hits, 1.37M pushes)
   - Second-highest operational density
   - Cache occupancy: 127/128 (99.2%)
   - Perfect 100% hit rate

3. **C4: 14.3% of C4-C7 volume** (687K hits, 687K pushes)
   - Lower operational density
   - Cache occupancy: 63/64 (98.4%)
   - Perfect 100% hit rate

4. **C7: UNKNOWN**
   - Statistics not yet captured
   - Requires separate analysis run with explicit C7 flags

---

## 3. Unified-STATS Interpretation

### Perfect Hit Rates (100% across all observed classes)

All observed classes (C4, C5, C6) achieve **100% hit rate** in Mixed SSOT workload:
- Zero refill events (`push == hit`)
- All allocations sourced from unified_cache (no fallback to backend)
- Cache capacity is **never exhausted** (0% full events)

**Implication**: UnifiedCache **sufficiently sized** for Mixed SSOT; refill path not active during benchmark.

### Cache Occupancy Patterns

```
C4: 63/64  slots occupied (98.4%) - 1 free slot
C5: 127/128 slots occupied (99.2%) - 1 free slot
C6: 127/128 slots occupied (99.2%) - 1 free slot
```

**Finding**: All classes operate at **near-capacity** (98-99%), indicating:
- Steady-state working set matches cache capacity
- Minimal fragmentation
- High cache efficiency

---

## 4. P2 (Hot-class Inline Slots) Targeting Strategy

### Recommendation: PRIMARY TARGET = C6

**Rationale**:
1. **Highest ROI**: C6 dominates with 57.2% of operations
   - ~2.75M hit operations = highest branch reduction opportunity
   - Any optimization on C6 provides 57% proportional benefit across all C4-C7 ops

2. **Secondary Target**: C5 (28.5%)
   - Significant volume, second-priority optimization
   - Compound benefit: C6 + C5 = 85.7% of C4-C7 operations

3. **Low Priority**: C4 (14.3%)
   - Lowest volume, lower ROI
   - Defer unless C6/C5 optimization requires it

4. **Unknown**: C7
   - Statistics not yet available
   - Recommend gathering C7 stats before deciding C6/C5/C4 vs C7 targeting

---

## 5. Inline Slots Design Impact Analysis

### Estimated Branch Reduction (per optimization)

Assuming **inline fast-path** placement (TLS-direct, zero-branch):

**Per-class impact** (based on Phase 74 lessons):
- Instruction count reduction per hit: ~2-4 instructions (push/pop branch elimination)
- Expected throughput gain per 1M hits: +0.05-0.10% (conservative estimate)

**C6 standalone**: 2.75M hits × 0.05-0.10%/M = **+0.14-0.27%** (projected, if branch overhead dominates)

**C6 + C5 combined**: 4.12M hits × 0.05-0.10%/M = **+0.21-0.41%** (projected)

**Risk factors**:
- Cache-miss sensitivity (Phase 74-2 showed +86% cache-misses from register pressure)
- TLS struct bloat (each inline slot = ~8-16 bytes × capacity per class)
- Memory hierarchy effects (L1-dcache pressure from TLS expansion)

---

## 6. Before/After Unified-STATS Baseline

### FAST PGO Baseline Reference (Phase 69: WarmPool=16)

**Important (SSOT)**:
- This baseline is from the FAST PGO scorecard and is the correct reference for mimalloc ratio tracking.
- If you run `scripts/run_mixed_10_cleanenv.sh` without setting `BENCH_BIN`, it defaults to the Standard binary (`./bench_random_mixed_hakmem`).
- To measure Phase 75 on FAST PGO, set:
  - `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh`

```
FAST Mixed SSOT Throughput: 62.63 M ops/s (51.77% of mimalloc)
Target M2: 55% of mimalloc (~65.1 M ops/s baseline)
Remaining gap: +3.23pp
```

### Phase 75 (P2) Success Criteria (measured vs FAST PGO baseline)

| Scenario | Throughput | vs Baseline | Status |
|----------|-----------|-----------|--------|
| **GO** | ≥ 64.1 M ops/s | +2.4% | +0.8pp toward M2 |
| **NEUTRAL** | 61.6-64.1 M ops/s | ±1.5% | freeze, continue Phase 76 |
| **NO-GO** | ≤ 61.6 M ops/s | -1.6% | revert immediately |

**Strict gate**: +2.0% for structural change (TLS bloat risk)

---

## 7. Risk Assessment: TLS Expansion vs Benefit

### TLS Struct Bloat Analysis

**Current TLS size** (estimated from Phase 69):
- UnifiedCache entries: minimal (backend pointers only)
- WarmPool SLL: ~2KB (Phase 69-71)
- **Total TINY_MEM TLS: ~2-4KB per thread**

**Proposed P2 expansion** (inline slots for C4-C7):
- C4 inline: 64 slots × 8 bytes = 512 bytes
- C5 inline: 128 slots × 8 bytes = 1,024 bytes
- C6 inline: 128 slots × 8 bytes = 1,024 bytes
- C7 inline: ??? slots × 8 bytes = ???
- **Total P2 expansion: ~2.5-3.5KB per class (selective) or ~4-5KB (all C4-C7)**

**TLS Memory Trade-off**:
- 10 threads × 4KB = **40KB system-wide** (negligible)
- But **per-thread L1-dcache footprint** increases
  - L1-dcache pressure → potential cache evictions
  - Phase 74-2 showed this can dominate (cache-misses +86%)

### Decision Gate

**Before proceeding with P2**:
1. Gather C7 statistics (currently missing)
2. Validate C6 > C5 > C4 > C7 ordering
3. Decide: C6-only, C6+C5, or full C4-C7?
4. Benchmark single-class inline (C6) first to validate ROI before expanding

---

## 8. Next Steps (User Decision Required)

### Option A: Proceed with C6-only P2 (Recommended - Lowest Risk)

**Approach**:
- Implement inline slots for C6 only (highest volume, 57.2%)
- Measure impact: target +1.5-2.5% throughput
- If successful, expand to C5 in Phase 75-2

**Pros**: Lowest TLS bloat, highest ROI/risk ratio
**Cons**: Multi-phase approach, requires two A/B cycles

### Option B: Proceed with C6+C5 P2 (Moderate Risk)

**Approach**:
- Implement inline slots for C6 + C5 (combined 85.7% of C4-C7 ops)
- Measure impact: target +2.0-3.0% throughput
- If successful, consolidate as Phase 75 final

**Pros**: Single A/B cycle, captures 85.7% of optimization opportunity
**Cons**: Higher TLS bloat (~2KB), higher register pressure risk

### Option C: Defer P2 Until C7 Analysis

**Approach**:
- Gather C7 statistics from separate OBSERVE run
- Rank all four classes before targeting
- Decide on C6/C5/C4/C7 balance based on full data

**Pros**: Data-driven decision, reduces risk of targeting wrong class
**Cons**: Adds diagnostic cycle before implementation

---

## 9. Recommendation Summary

**PRIMARY RECOMMENDATION**: **Option A - Start with C6-only**

**Rationale**:
1. C6 is clearly dominant (57.2% volume)
2. Lowest TLS bloat (~1KB) reduces register pressure risk
3. Conservative approach aligns with Phase 74 learnings (register pressure matters)
4. Fail-fast: if C6 shows positive ROI, expand to C5; if NO-GO, iterate differently

**Secondary**: Gather C7 stats in parallel to validate completeness

**Decision**: **User choice** - provide approach preference before proceeding to Phase 75 implementation

---

## Artifacts

- **Baseline**: Mixed SSOT OBSERVE run: `./bench_random_mixed_hakmem_observe 20000000 400 1`
- **Measurement**: Per-class Unified-STATS with `HAKMEM_MEASURE_UNIFIED_CACHE=1`
- **Analysis**: This document (PHASE75_PERCLASS_ANALYSIS_0_SSOT.md)

---

## Timeline

- Phase 74 (P1/P0): UnifiedCache hit-path optimization → FROZEN (NEUTRAL)
- Phase 75 (P2): Hot-class Inline Slots → **PENDING USER DECISION** (targeting strategy)
- Phase 75-1: Implement selected class(es) → (next)
- Phase 75-2: A/B test & results → (next)
-												Phase 75-1: C6-only Inline Slots (P2) - GO (+2.87%)

Modular implementation of hot-class inline slots optimization:
- Created 5 new boxes: env_box, tls_box, fast_path_api, integration_box, test_script
- Single decision point at TLS init (ENV gate: HAKMEM_TINY_C6_INLINE_SLOTS=0/1)
- Integration: 2 minimal boundary points (alloc/free paths for C6 class)
- Default OFF: zero overhead when disabled (full backward compatibility)

Results (10-run Mixed SSOT, WS=400):
- Baseline (C6 inline OFF):  44.24 M ops/s
- Treatment (C6 inline ON):  45.51 M ops/s
- Delta: +1.27 M ops/s (+2.87%)

Status: ✅ GO - Strong improvement via C6 ring buffer fast-path
Mechanism: Branch elimination on unified_cache_push/pop for C6 allocations
Next: Phase 75-2 (add C5 inline slots, target 85% C4-C7 coverage)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

											
										
										
											2025-12-18 08:22:09 +09:00
+								# Phase 75 Per-Class Analysis - Mixed SSOT Unified-STATS
 								**Status**: ANALYSIS COMPLETE, ready for Phase 75 (P2: Hot-class Inline Slots) targeting decision
 								**Workload**: Mixed SSOT (WS=400, ITERS=20000000, WarmPool=16)
 								**Measurement**: `HAKMEM_MEASURE_UNIFIED_CACHE=1` OBSERVE run
 								---
 								## 1. Per-Class Unified-STATS (Ranked by Volume)
 								### Data Summary
 								| Class | Capacity | Occupied | Hit Count | Push Count | Total Ops | Hit Rate | % of Total |
 								|-------|----------|----------|-----------|------------|-----------|----------|-----------|
 								| **C6** | 128 | 127 | 2,750,854 | 2,750,855 | **5,501,709** | 100.0% | **57.2%** |
 								| **C5** | 128 | 127 | 1,373,604 | 1,373,605 | **2,747,209** | 100.0% | **28.5%** |
 								| **C4** | 64  | 63  | 687,563   | 687,564   | **1,375,127** | 100.0% | **14.3%** |
 								| **C7** | ? | ? | ? | ? | **?** | ? | **?** |
 								**Total C4-C6**: 9,624,045 operations (100% hit rate across all three classes)
 								**Observation**: C7 statistics not visible in current OBSERVE output (may require additional diagnostics)
 								---
 								## 2. Ranking & Key Findings
 								### Volume Ranking (Descending)
 . **C6: 57.2% of C4-C7 volume** (2.75M hits, 2.75M pushes)
 								   - Highest operational density
 								   - Cache occupancy: 127/128 (99.2%)
 								   - Perfect 100% hit rate
 . **C5: 28.5% of C4-C7 volume** (1.37M hits, 1.37M pushes)
 								   - Second-highest operational density
 								   - Cache occupancy: 127/128 (99.2%)
 								   - Perfect 100% hit rate
 . **C4: 14.3% of C4-C7 volume** (687K hits, 687K pushes)
 								   - Lower operational density
 								   - Cache occupancy: 63/64 (98.4%)
 								   - Perfect 100% hit rate
 . **C7: UNKNOWN**
 								   - Statistics not yet captured
 								   - Requires separate analysis run with explicit C7 flags
 								---
 								## 3. Unified-STATS Interpretation
 								### Perfect Hit Rates (100% across all observed classes)
 								All observed classes (C4, C5, C6) achieve **100% hit rate** in Mixed SSOT workload:
 								- Zero refill events (`push == hit`)
 								- All allocations sourced from unified_cache (no fallback to backend)
 								- Cache capacity is **never exhausted** (0% full events)
 								**Implication**: UnifiedCache **sufficiently sized** for Mixed SSOT; refill path not active during benchmark.
 								### Cache Occupancy Patterns
 								```
 								C4: 63/64  slots occupied (98.4%) - 1 free slot
 								C5: 127/128 slots occupied (99.2%) - 1 free slot
 								C6: 127/128 slots occupied (99.2%) - 1 free slot
 								```
 								**Finding**: All classes operate at **near-capacity** (98-99%), indicating:
 								- Steady-state working set matches cache capacity
 								- Minimal fragmentation
 								- High cache efficiency
 								---
 								## 4. P2 (Hot-class Inline Slots) Targeting Strategy
 								### Recommendation: PRIMARY TARGET = C6
 								**Rationale**:
 . **Highest ROI**: C6 dominates with 57.2% of operations
 								   - ~2.75M hit operations = highest branch reduction opportunity
 								   - Any optimization on C6 provides 57% proportional benefit across all C4-C7 ops
 . **Secondary Target**: C5 (28.5%)
 								   - Significant volume, second-priority optimization
 								   - Compound benefit: C6 + C5 = 85.7% of C4-C7 operations
 . **Low Priority**: C4 (14.3%)
 								   - Lowest volume, lower ROI
 								   - Defer unless C6/C5 optimization requires it
 . **Unknown**: C7
 								   - Statistics not yet available
 								   - Recommend gathering C7 stats before deciding C6/C5/C4 vs C7 targeting
 								---
 								## 5. Inline Slots Design Impact Analysis
 								### Estimated Branch Reduction (per optimization)
 								Assuming **inline fast-path** placement (TLS-direct, zero-branch):
 								**Per-class impact** (based on Phase 74 lessons):
 								- Instruction count reduction per hit: ~2-4 instructions (push/pop branch elimination)
 								- Expected throughput gain per 1M hits: +0.05-0.10% (conservative estimate)
 								**C6 standalone**: 2.75M hits × 0.05-0.10%/M = **+0.14-0.27%** (projected, if branch overhead dominates)
 								**C6 + C5 combined**: 4.12M hits × 0.05-0.10%/M = **+0.21-0.41%** (projected)
 								**Risk factors**:
 								- Cache-miss sensitivity (Phase 74-2 showed +86% cache-misses from register pressure)
 								- TLS struct bloat (each inline slot = ~8-16 bytes × capacity per class)
 								- Memory hierarchy effects (L1-dcache pressure from TLS expansion)
 								---
 								## 6. Before/After Unified-STATS Baseline
-												docs: clarify Phase 75 vs FAST PGO SSOT

											
										
										
											2025-12-18 09:11:56 +09:00
+								### FAST PGO Baseline Reference (Phase 69: WarmPool=16)
 								**Important (SSOT)**:
 								- This baseline is from the FAST PGO scorecard and is the correct reference for mimalloc ratio tracking.
 								- If you run `scripts/run_mixed_10_cleanenv.sh` without setting `BENCH_BIN`, it defaults to the Standard binary (`./bench_random_mixed_hakmem`).
 								- To measure Phase 75 on FAST PGO, set:
 								  - `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh`
-												Phase 75-1: C6-only Inline Slots (P2) - GO (+2.87%)

Modular implementation of hot-class inline slots optimization:
- Created 5 new boxes: env_box, tls_box, fast_path_api, integration_box, test_script
- Single decision point at TLS init (ENV gate: HAKMEM_TINY_C6_INLINE_SLOTS=0/1)
- Integration: 2 minimal boundary points (alloc/free paths for C6 class)
- Default OFF: zero overhead when disabled (full backward compatibility)

Results (10-run Mixed SSOT, WS=400):
- Baseline (C6 inline OFF):  44.24 M ops/s
- Treatment (C6 inline ON):  45.51 M ops/s
- Delta: +1.27 M ops/s (+2.87%)

Status: ✅ GO - Strong improvement via C6 ring buffer fast-path
Mechanism: Branch elimination on unified_cache_push/pop for C6 allocations
Next: Phase 75-2 (add C5 inline slots, target 85% C4-C7 coverage)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

											
										
										
											2025-12-18 08:22:09 +09:00
 								```
-												docs: clarify Phase 75 vs FAST PGO SSOT

											
										
										
											2025-12-18 09:11:56 +09:00
+								FAST Mixed SSOT Throughput: 62.63 M ops/s (51.77% of mimalloc)
-												Phase 75-1: C6-only Inline Slots (P2) - GO (+2.87%)

Modular implementation of hot-class inline slots optimization:
- Created 5 new boxes: env_box, tls_box, fast_path_api, integration_box, test_script
- Single decision point at TLS init (ENV gate: HAKMEM_TINY_C6_INLINE_SLOTS=0/1)
- Integration: 2 minimal boundary points (alloc/free paths for C6 class)
- Default OFF: zero overhead when disabled (full backward compatibility)

Results (10-run Mixed SSOT, WS=400):
- Baseline (C6 inline OFF):  44.24 M ops/s
- Treatment (C6 inline ON):  45.51 M ops/s
- Delta: +1.27 M ops/s (+2.87%)

Status: ✅ GO - Strong improvement via C6 ring buffer fast-path
Mechanism: Branch elimination on unified_cache_push/pop for C6 allocations
Next: Phase 75-2 (add C5 inline slots, target 85% C4-C7 coverage)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

											
										
										
											2025-12-18 08:22:09 +09:00
+								Target M2: 55% of mimalloc (~65.1 M ops/s baseline)
 								Remaining gap: +3.23pp
 								```
-												docs: clarify Phase 75 vs FAST PGO SSOT

											
										
										
											2025-12-18 09:11:56 +09:00
+								### Phase 75 (P2) Success Criteria (measured vs FAST PGO baseline)
-												Phase 75-1: C6-only Inline Slots (P2) - GO (+2.87%)

Modular implementation of hot-class inline slots optimization:
- Created 5 new boxes: env_box, tls_box, fast_path_api, integration_box, test_script
- Single decision point at TLS init (ENV gate: HAKMEM_TINY_C6_INLINE_SLOTS=0/1)
- Integration: 2 minimal boundary points (alloc/free paths for C6 class)
- Default OFF: zero overhead when disabled (full backward compatibility)

Results (10-run Mixed SSOT, WS=400):
- Baseline (C6 inline OFF):  44.24 M ops/s
- Treatment (C6 inline ON):  45.51 M ops/s
- Delta: +1.27 M ops/s (+2.87%)

Status: ✅ GO - Strong improvement via C6 ring buffer fast-path
Mechanism: Branch elimination on unified_cache_push/pop for C6 allocations
Next: Phase 75-2 (add C5 inline slots, target 85% C4-C7 coverage)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

											
										
										
											2025-12-18 08:22:09 +09:00
 								| Scenario | Throughput | vs Baseline | Status |
 								|----------|-----------|-----------|--------|
 								| **GO** | ≥ 64.1 M ops/s | +2.4% | +0.8pp toward M2 |
 								| **NEUTRAL** | 61.6-64.1 M ops/s | ±1.5% | freeze, continue Phase 76 |
 								| **NO-GO** | ≤ 61.6 M ops/s | -1.6% | revert immediately |
 								**Strict gate**: +2.0% for structural change (TLS bloat risk)
 								---
 								## 7. Risk Assessment: TLS Expansion vs Benefit
 								### TLS Struct Bloat Analysis
 								**Current TLS size** (estimated from Phase 69):
 								- UnifiedCache entries: minimal (backend pointers only)
 								- WarmPool SLL: ~2KB (Phase 69-71)
 								- **Total TINY_MEM TLS: ~2-4KB per thread**
 								**Proposed P2 expansion** (inline slots for C4-C7):
 								- C4 inline: 64 slots × 8 bytes = 512 bytes
 								- C5 inline: 128 slots × 8 bytes = 1,024 bytes
 								- C6 inline: 128 slots × 8 bytes = 1,024 bytes
 								- C7 inline: ??? slots × 8 bytes = ???
 								- **Total P2 expansion: ~2.5-3.5KB per class (selective) or ~4-5KB (all C4-C7)**
 								**TLS Memory Trade-off**:
 								- 10 threads × 4KB = **40KB system-wide** (negligible)
 								- But **per-thread L1-dcache footprint** increases
 								  - L1-dcache pressure → potential cache evictions
 								  - Phase 74-2 showed this can dominate (cache-misses +86%)
 								### Decision Gate
 								**Before proceeding with P2**:
 . Gather C7 statistics (currently missing)
 . Validate C6 > C5 > C4 > C7 ordering
 . Decide: C6-only, C6+C5, or full C4-C7?
 . Benchmark single-class inline (C6) first to validate ROI before expanding
 								---
 								## 8. Next Steps (User Decision Required)
 								### Option A: Proceed with C6-only P2 (Recommended - Lowest Risk)
 								**Approach**:
 								- Implement inline slots for C6 only (highest volume, 57.2%)
 								- Measure impact: target +1.5-2.5% throughput
 								- If successful, expand to C5 in Phase 75-2
 								**Pros**: Lowest TLS bloat, highest ROI/risk ratio
 								**Cons**: Multi-phase approach, requires two A/B cycles
 								### Option B: Proceed with C6+C5 P2 (Moderate Risk)
 								**Approach**:
 								- Implement inline slots for C6 + C5 (combined 85.7% of C4-C7 ops)
 								- Measure impact: target +2.0-3.0% throughput
 								- If successful, consolidate as Phase 75 final
 								**Pros**: Single A/B cycle, captures 85.7% of optimization opportunity
 								**Cons**: Higher TLS bloat (~2KB), higher register pressure risk
 								### Option C: Defer P2 Until C7 Analysis
 								**Approach**:
 								- Gather C7 statistics from separate OBSERVE run
 								- Rank all four classes before targeting
 								- Decide on C6/C5/C4/C7 balance based on full data
 								**Pros**: Data-driven decision, reduces risk of targeting wrong class
 								**Cons**: Adds diagnostic cycle before implementation
 								---
 								## 9. Recommendation Summary
 								**PRIMARY RECOMMENDATION**: **Option A - Start with C6-only**
 								**Rationale**:
 . C6 is clearly dominant (57.2% volume)
 . Lowest TLS bloat (~1KB) reduces register pressure risk
 . Conservative approach aligns with Phase 74 learnings (register pressure matters)
 . Fail-fast: if C6 shows positive ROI, expand to C5; if NO-GO, iterate differently
 								**Secondary**: Gather C7 stats in parallel to validate completeness
 								**Decision**: **User choice** - provide approach preference before proceeding to Phase 75 implementation
 								---
 								## Artifacts
 								- **Baseline**: Mixed SSOT OBSERVE run: `./bench_random_mixed_hakmem_observe 20000000 400 1`
 								- **Measurement**: Per-class Unified-STATS with `HAKMEM_MEASURE_UNIFIED_CACHE=1`
 								- **Analysis**: This document (PHASE75_PERCLASS_ANALYSIS_0_SSOT.md)
 								---
 								## Timeline
 								- Phase 74 (P1/P0): UnifiedCache hit-path optimization → FROZEN (NEUTRAL)
 								- Phase 75 (P2): Hot-class Inline Slots → **PENDING USER DECISION** (targeting strategy)
 								- Phase 75-1: Implement selected class(es) → (next)
 								- Phase 75-2: A/B test & results → (next)