## Root Cause Analysis (GPT5) **Physical Layout Constraints**: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) **Correct Specification**: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 **Previous Bug**: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) **Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000` **Results**: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers **Analysis**: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
541 lines
15 KiB
Markdown
541 lines
15 KiB
Markdown
# Phase E3: Performance Restoration Implementation Plan
|
|
|
|
**Date**: 2025-11-12
|
|
**Goal**: Restore Phase 7 performance (9M → 59-70M ops/s, +541-674%)
|
|
**Status**: READY TO IMPLEMENT
|
|
|
|
---
|
|
|
|
## Quick Reference
|
|
|
|
### The One Critical Fix
|
|
|
|
**File**: `/mnt/workdisk/public_share/hakmem/core/tiny_free_fast_v2.inc.h`
|
|
**Lines to Remove**: 54-63 (SuperSlab registry lookup)
|
|
**Impact**: -91% latency, +1100% throughput
|
|
|
|
---
|
|
|
|
## Phase E3-1: Remove Registry Lookup (CRITICAL)
|
|
|
|
### Detailed Code Changes
|
|
|
|
**File**: `core/tiny_free_fast_v2.inc.h`
|
|
|
|
**Lines 51-63 (BEFORE - SLOW)**:
|
|
```c
|
|
static inline int hak_tiny_free_fast_v2(void* ptr) {
|
|
if (__builtin_expect(!ptr, 0)) return 0;
|
|
|
|
// CRITICAL: C7 (1KB headerless) MUST be excluded from Ultra-Fast Free
|
|
// Problem: Magic validation alone insufficient (C7 user data can be 0xaX pattern)
|
|
// Solution: Registry lookup to 100% identify C7 before header read
|
|
// Cost: 50-100 cycles (O(log N) RB-tree), but C7 is rare (~5% of allocations)
|
|
// Benefit: 100% SEGV prevention, no false positives
|
|
extern struct SuperSlab* hak_super_lookup(void* ptr);
|
|
struct SuperSlab* ss = hak_super_lookup(ptr);
|
|
if (__builtin_expect(ss && ss->size_class == 7, 0)) {
|
|
return 0; // C7 detected → force slow path (Front Gate will handle correctly)
|
|
}
|
|
|
|
// CRITICAL: Check if header is accessible before reading
|
|
void* header_addr = (char*)ptr - 1;
|
|
```
|
|
|
|
**Lines 51-63 (AFTER - FAST)**:
|
|
```c
|
|
static inline int hak_tiny_free_fast_v2(void* ptr) {
|
|
if (__builtin_expect(!ptr, 0)) return 0;
|
|
|
|
// Phase E3: C7 now has header (Phase E1), no registry lookup needed!
|
|
// Header magic validation (2-3 cycles) is sufficient to distinguish:
|
|
// - Tiny (0xA0-0xA7): valid header → fast path
|
|
// - Pool TLS (0xB0-0xBF): different magic → slow path
|
|
// - Mid/Large: no header → slow path
|
|
// - C7: has header like all other classes → fast path works!
|
|
//
|
|
// Performance: 5-10 cycles (vs 55-110 cycles with registry lookup)
|
|
|
|
// CRITICAL: Check if header is accessible before reading
|
|
void* header_addr = (char*)ptr - 1;
|
|
```
|
|
|
|
**Summary of Changes**:
|
|
- **DELETE**: Lines 54-62 (9 lines of SuperSlab registry lookup code)
|
|
- **ADD**: 7 lines of explanatory comments (why registry lookup is no longer needed)
|
|
- **Net change**: -2 lines, -50-100 cycles per free operation
|
|
|
|
### Build & Test Commands
|
|
|
|
```bash
|
|
# 1. Edit file
|
|
vim /mnt/workdisk/public_share/hakmem/core/tiny_free_fast_v2.inc.h
|
|
|
|
# 2. Build release binary
|
|
cd /mnt/workdisk/public_share/hakmem
|
|
./build.sh bench_random_mixed_hakmem
|
|
|
|
# 3. Verify build succeeded
|
|
ls -lh ./out/release/bench_random_mixed_hakmem
|
|
|
|
# 4. Run benchmarks (3 runs each for stability)
|
|
echo "=== 128B Benchmark ==="
|
|
./out/release/bench_random_mixed_hakmem 100000 128 42 2>&1 | tail -1
|
|
./out/release/bench_random_mixed_hakmem 100000 128 43 2>&1 | tail -1
|
|
./out/release/bench_random_mixed_hakmem 100000 128 44 2>&1 | tail -1
|
|
|
|
echo "=== 256B Benchmark ==="
|
|
./out/release/bench_random_mixed_hakmem 100000 256 42 2>&1 | tail -1
|
|
./out/release/bench_random_mixed_hakmem 100000 256 43 2>&1 | tail -1
|
|
./out/release/bench_random_mixed_hakmem 100000 256 44 2>&1 | tail -1
|
|
|
|
echo "=== 512B Benchmark ==="
|
|
./out/release/bench_random_mixed_hakmem 100000 512 42 2>&1 | tail -1
|
|
./out/release/bench_random_mixed_hakmem 100000 512 43 2>&1 | tail -1
|
|
./out/release/bench_random_mixed_hakmem 100000 512 44 2>&1 | tail -1
|
|
|
|
echo "=== 1024B Benchmark ==="
|
|
./out/release/bench_random_mixed_hakmem 100000 1024 42 2>&1 | tail -1
|
|
./out/release/bench_random_mixed_hakmem 100000 1024 43 2>&1 | tail -1
|
|
./out/release/bench_random_mixed_hakmem 100000 1024 44 2>&1 | tail -1
|
|
```
|
|
|
|
### Success Criteria (Phase E3-1)
|
|
|
|
**Minimum Acceptable Performance** (vs current 9M ops/s):
|
|
- 128B: ≥30M ops/s (+226%)
|
|
- 256B: ≥32M ops/s (+240%)
|
|
- 512B: ≥28M ops/s (+233%)
|
|
- 1024B: ≥28M ops/s (+233%)
|
|
|
|
**Target Performance** (Phase 7-1.3 baseline):
|
|
- 128B: 40-50M ops/s (+335-443%)
|
|
- 256B: 45-55M ops/s (+379-485%)
|
|
- 512B: 40-50M ops/s (+376-495%)
|
|
- 1024B: 40-50M ops/s (+376-495%)
|
|
|
|
---
|
|
|
|
## Phase E3-2: Header-First Classification (OPTIONAL)
|
|
|
|
### Why Optional?
|
|
|
|
Phase E3-1 (remove registry lookup from fast path) should restore 80-90% of Phase 7 performance. Phase E3-2 optimizes the **slow path** (TLS cache full, Pool TLS, Mid/Large), which is only 1-5% of operations.
|
|
|
|
**Impact**: Additional +10-20% on top of Phase E3-1
|
|
|
|
### Detailed Code Changes
|
|
|
|
**File**: `core/box/front_gate_classifier.h`
|
|
|
|
**Lines 166-234 (BEFORE - Registry-First)**:
|
|
```c
|
|
static inline __attribute__((always_inline))
|
|
ptr_classification_t classify_ptr(void* ptr) {
|
|
ptr_classification_t result = {
|
|
.kind = PTR_KIND_UNKNOWN,
|
|
.class_idx = -1,
|
|
.ss = NULL,
|
|
.slab_idx = -1
|
|
};
|
|
|
|
if (__builtin_expect(!ptr, 0)) return result;
|
|
if (__builtin_expect((uintptr_t)ptr < 4096, 0)) {
|
|
result.kind = PTR_KIND_UNKNOWN;
|
|
return result;
|
|
}
|
|
|
|
#ifdef HAKMEM_POOL_TLS_PHASE1
|
|
if (__builtin_expect(is_pool_tls_reg(ptr), 0)) {
|
|
result.kind = PTR_KIND_POOL_TLS;
|
|
return result;
|
|
}
|
|
#endif
|
|
|
|
// ❌ SLOW: Registry lookup FIRST (50-100 cycles)
|
|
result = registry_lookup(ptr);
|
|
if (__builtin_expect(result.kind == PTR_KIND_TINY_HEADERLESS, 0)) {
|
|
return result;
|
|
}
|
|
if (__builtin_expect(result.kind == PTR_KIND_TINY_HEADER, 1)) {
|
|
return result;
|
|
}
|
|
|
|
// ... rest of function ...
|
|
}
|
|
```
|
|
|
|
**Lines 166-234 (AFTER - Header-First)**:
|
|
```c
|
|
static inline __attribute__((always_inline))
|
|
ptr_classification_t classify_ptr(void* ptr) {
|
|
ptr_classification_t result = {
|
|
.kind = PTR_KIND_UNKNOWN,
|
|
.class_idx = -1,
|
|
.ss = NULL,
|
|
.slab_idx = -1
|
|
};
|
|
|
|
if (__builtin_expect(!ptr, 0)) return result;
|
|
if (__builtin_expect((uintptr_t)ptr < 4096, 0)) {
|
|
result.kind = PTR_KIND_UNKNOWN;
|
|
return result;
|
|
}
|
|
|
|
// ✅ FAST: Try header probe FIRST (2-3 cycles, 95-99% hit rate)
|
|
int class_idx = safe_header_probe(ptr);
|
|
if (__builtin_expect(class_idx >= 0, 1)) {
|
|
// Valid Tiny header found
|
|
result.kind = PTR_KIND_TINY_HEADER;
|
|
result.class_idx = class_idx;
|
|
#if !HAKMEM_BUILD_RELEASE
|
|
extern __thread uint64_t g_classify_header_hit;
|
|
g_classify_header_hit++;
|
|
#endif
|
|
return result; // Fast path: 2-3 cycles!
|
|
}
|
|
|
|
#ifdef HAKMEM_POOL_TLS_PHASE1
|
|
// Fallback: Check Pool TLS registry (header probe failed)
|
|
if (__builtin_expect(is_pool_tls_reg(ptr), 0)) {
|
|
result.kind = PTR_KIND_POOL_TLS;
|
|
#if !HAKMEM_BUILD_RELEASE
|
|
extern __thread uint64_t g_classify_pool_hit;
|
|
g_classify_pool_hit++;
|
|
#endif
|
|
return result;
|
|
}
|
|
#endif
|
|
|
|
// Fallback: Registry lookup (rare, <1%)
|
|
result = registry_lookup(ptr);
|
|
if (__builtin_expect(result.kind == PTR_KIND_TINY_HEADERLESS, 0)) {
|
|
#if !HAKMEM_BUILD_RELEASE
|
|
extern __thread uint64_t g_classify_headerless_hit;
|
|
g_classify_headerless_hit++;
|
|
#endif
|
|
return result;
|
|
}
|
|
if (__builtin_expect(result.kind == PTR_KIND_TINY_HEADER, 0)) {
|
|
#if !HAKMEM_BUILD_RELEASE
|
|
extern __thread uint64_t g_classify_header_hit;
|
|
g_classify_header_hit++;
|
|
#endif
|
|
return result;
|
|
}
|
|
|
|
// ... rest of function (16-byte AllocHeader check) ...
|
|
}
|
|
```
|
|
|
|
### Build & Test Commands
|
|
|
|
```bash
|
|
# 1. Edit file
|
|
vim /mnt/workdisk/public_share/hakmem/core/box/front_gate_classifier.h
|
|
|
|
# 2. Rebuild
|
|
./build.sh bench_random_mixed_hakmem
|
|
|
|
# 3. Benchmark (should see +10-20% improvement over Phase E3-1)
|
|
./out/release/bench_random_mixed_hakmem 100000 256 42 2>&1 | tail -1
|
|
```
|
|
|
|
### Success Criteria (Phase E3-2)
|
|
|
|
**Target**: +10-20% improvement over Phase E3-1
|
|
|
|
**Example**:
|
|
- Phase E3-1: 45M ops/s
|
|
- Phase E3-2: 50-55M ops/s (+11-22%)
|
|
|
|
---
|
|
|
|
## Phase E3-3: Remove C7 Special Cases (CLEANUP)
|
|
|
|
### Why Cleanup?
|
|
|
|
Phase E1 added headers to C7, making all `if (class_idx == 7)` conditionals obsolete. However, many files still contain C7 special cases from legacy code.
|
|
|
|
**Impact**: Code simplification + 5-10% reduced branching overhead
|
|
|
|
### Files to Edit
|
|
|
|
#### File 1: `core/hakmem_tiny_free.inc`
|
|
|
|
**Lines to Remove/Modify**:
|
|
```bash
|
|
# Find all C7 special cases
|
|
grep -n "class_idx == 7" core/hakmem_tiny_free.inc
|
|
```
|
|
|
|
**Expected Output**:
|
|
```
|
|
32: // CRITICAL: C7 (1KB) is headerless - MUST NOT drain to TLS SLL
|
|
34: if (__builtin_expect(class_idx == 7, 0)) return;
|
|
124: if (__builtin_expect(g_tiny_safe_free || class_idx == 7, 0)) {
|
|
145: if (__builtin_expect(g_tiny_safe_free || class_idx == 7, 0)) {
|
|
158: if (g_tiny_safe_free_strict || class_idx == 7) { raise(SIGUSR2); return; }
|
|
195: void* base = (class_idx == 7) ? ptr : (void*)((uint8_t*)ptr - 1);
|
|
211: void* base = (class_idx == 7) ? ptr : (void*)((uint8_t*)ptr - 1);
|
|
233: void* base = (class_idx == 7) ? ptr : (void*)((uint8_t*)ptr - 1);
|
|
241: void* base = (class_idx == 7) ? ptr : (void*)((uint8_t*)ptr - 1);
|
|
253: void* base = (class_idx == 7) ? ptr : (void*)((uint8_t*)ptr - 1);
|
|
348: // CRITICAL: C7 (1KB) is headerless - MUST NOT use TLS SLL
|
|
384: void* base = (class_idx == 7) ? ptr : (void*)((uint8_t*)ptr - 1);
|
|
445: void* base2 = (fast_class_idx == 7) ? ptr : (void*)((uint8_t*)ptr - 1);
|
|
```
|
|
|
|
**Changes**:
|
|
|
|
1. **Line 32-34**: Remove early return for C7
|
|
```c
|
|
// BEFORE
|
|
// CRITICAL: C7 (1KB) is headerless - MUST NOT drain to TLS SLL
|
|
if (__builtin_expect(class_idx == 7, 0)) return;
|
|
|
|
// AFTER (DELETE these 2 lines)
|
|
```
|
|
|
|
2. **Lines 124, 145, 158**: Remove `|| class_idx == 7` conditions
|
|
```c
|
|
// BEFORE
|
|
if (__builtin_expect(g_tiny_safe_free || class_idx == 7, 0)) {
|
|
|
|
// AFTER
|
|
if (__builtin_expect(g_tiny_safe_free, 0)) {
|
|
```
|
|
|
|
3. **Lines 195, 211, 233, 241, 253, 384, 445**: Simplify base calculation
|
|
```c
|
|
// BEFORE
|
|
void* base = (class_idx == 7) ? ptr : (void*)((uint8_t*)ptr - 1);
|
|
|
|
// AFTER (ALL classes have header now)
|
|
void* base = (void*)((uint8_t*)ptr - 1);
|
|
```
|
|
|
|
4. **Line 348**: Remove C7 comment (obsolete)
|
|
```c
|
|
// BEFORE
|
|
// CRITICAL: C7 (1KB) is headerless - MUST NOT use TLS SLL
|
|
|
|
// AFTER (DELETE this line)
|
|
```
|
|
|
|
#### File 2: `core/hakmem_tiny_alloc.inc`
|
|
|
|
**Lines to Remove/Modify**:
|
|
```bash
|
|
grep -n "class_idx == 7" core/hakmem_tiny_alloc.inc
|
|
```
|
|
|
|
**Expected Output**:
|
|
```
|
|
252: if (__builtin_expect(class_idx == 7, 0)) { *(void**)hotmag_ptr = NULL; }
|
|
281: if (__builtin_expect(class_idx == 7, 0)) { *(void**)fast_hot = NULL; }
|
|
292: if (__builtin_expect(class_idx == 7, 0)) { *(void**)fast = NULL; }
|
|
```
|
|
|
|
**Changes**: Remove all 3 lines (C7 now has header, no NULL clearing needed)
|
|
|
|
#### File 3: `core/hakmem_tiny_slow.inc`
|
|
|
|
**Lines to Remove/Modify**:
|
|
```bash
|
|
grep -n "class_idx == 7" core/hakmem_tiny_slow.inc
|
|
```
|
|
|
|
**Expected Output**:
|
|
```
|
|
25: // Try TLS list refill (C7 is headerless: skip TLS list entirely)
|
|
```
|
|
|
|
**Changes**: Update comment
|
|
```c
|
|
// BEFORE
|
|
// Try TLS list refill (C7 is headerless: skip TLS list entirely)
|
|
|
|
// AFTER
|
|
// Try TLS list refill (all classes use TLS list now)
|
|
```
|
|
|
|
### Build & Test Commands
|
|
|
|
```bash
|
|
# 1. Edit files
|
|
vim core/hakmem_tiny_free.inc
|
|
vim core/hakmem_tiny_alloc.inc
|
|
vim core/hakmem_tiny_slow.inc
|
|
|
|
# 2. Rebuild
|
|
./build.sh bench_random_mixed_hakmem
|
|
|
|
# 3. Verify no regressions
|
|
./out/release/bench_random_mixed_hakmem 100000 1024 42 2>&1 | tail -1
|
|
```
|
|
|
|
### Success Criteria (Phase E3-3)
|
|
|
|
**Target**: 50-60M → 59-70M ops/s (+5-10% from reduced branching)
|
|
|
|
**Code Quality**:
|
|
- All C7 special cases removed
|
|
- Unified base pointer calculation (`ptr - 1` for all classes)
|
|
- Cleaner, more maintainable code
|
|
|
|
---
|
|
|
|
## Final Verification
|
|
|
|
### Full Benchmark Suite
|
|
|
|
```bash
|
|
# Run comprehensive benchmarks
|
|
cd /mnt/workdisk/public_share/hakmem
|
|
|
|
# 1. Random Mixed (primary benchmark)
|
|
for size in 128 256 512 1024; do
|
|
echo "=== Random Mixed ${size}B ==="
|
|
./out/release/bench_random_mixed_hakmem 100000 $size 42 2>&1 | grep "Throughput"
|
|
done
|
|
|
|
# 2. Fixed Size (stability check)
|
|
for size in 256 1024; do
|
|
echo "=== Fixed Size ${size}B ==="
|
|
./out/release/bench_fixed_size_hakmem 200000 $size 128 2>&1 | grep "Throughput"
|
|
done
|
|
|
|
# 3. Larson (multi-threaded stress test)
|
|
echo "=== Larson Multi-Threaded ==="
|
|
./out/release/larson_hakmem 1 2>&1 | grep "ops/sec"
|
|
```
|
|
|
|
### Expected Results (After All 3 Phases)
|
|
|
|
| Benchmark | Current | Phase E3 | Improvement |
|
|
|-----------|---------|----------|-------------|
|
|
| Random Mixed 128B | 9.2M | **59M** | **+541%** 🎯 |
|
|
| Random Mixed 256B | 9.4M | **70M** | **+645%** 🎯 |
|
|
| Random Mixed 512B | 8.4M | **68M** | **+710%** 🎯 |
|
|
| Random Mixed 1024B | 8.4M | **65M** | **+674%** 🎯 |
|
|
| Fixed Size 256B | 2.76M | **10-12M** | **+263-335%** |
|
|
| Larson 1T | 2.68M | **8-10M** | **+199-273%** |
|
|
|
|
---
|
|
|
|
## Rollback Plan (If Needed)
|
|
|
|
### If Phase E3-1 Causes Issues
|
|
|
|
```bash
|
|
# Revert to current version
|
|
git checkout HEAD -- core/tiny_free_fast_v2.inc.h
|
|
./build.sh bench_random_mixed_hakmem
|
|
```
|
|
|
|
### If Phase E3-2 Causes Issues
|
|
|
|
```bash
|
|
# Revert to Phase E3-1
|
|
git checkout HEAD -- core/box/front_gate_classifier.h
|
|
./build.sh bench_random_mixed_hakmem
|
|
```
|
|
|
|
### If Phase E3-3 Causes Issues
|
|
|
|
```bash
|
|
# Revert cleanup changes
|
|
git checkout HEAD -- core/hakmem_tiny_free.inc core/hakmem_tiny_alloc.inc core/hakmem_tiny_slow.inc
|
|
./build.sh bench_random_mixed_hakmem
|
|
```
|
|
|
|
---
|
|
|
|
## Risk Assessment
|
|
|
|
### Phase E3-1: Remove Registry Lookup
|
|
|
|
**Risk**: ⚠️ **LOW**
|
|
- Reverting to Phase 7-1.3 code (proven stable at 59-70M ops/s)
|
|
- Phase E1 already added headers to C7 (safety guaranteed)
|
|
- Header magic validation (2-3 cycles) sufficient for classification
|
|
|
|
**Mitigation**:
|
|
- Test with 1M iterations (stress test)
|
|
- Run Larson multi-threaded (race condition check)
|
|
- Monitor for SEGV (should be zero)
|
|
|
|
### Phase E3-2: Header-First Classification
|
|
|
|
**Risk**: ⚠️ **LOW-MEDIUM**
|
|
- Only affects slow path (1-5% of operations)
|
|
- Safe header probe already implemented (lines 100-117)
|
|
- No change to fast path (already optimized in E3-1)
|
|
|
|
**Mitigation**:
|
|
- Test with Pool TLS workloads (8-52KB allocations)
|
|
- Test with Mid/Large workloads (64KB+ allocations)
|
|
- Verify classification hit rates in debug mode
|
|
|
|
### Phase E3-3: Remove C7 Special Cases
|
|
|
|
**Risk**: ⚠️ **LOW**
|
|
- Code cleanup only (no algorithmic changes)
|
|
- Phase E1 already verified C7 works with headers
|
|
- All conditionals are redundant (dead code)
|
|
|
|
**Mitigation**:
|
|
- Test specifically with 1024B workload (C7 class)
|
|
- Run 1M iterations (comprehensive coverage)
|
|
- Check for any unexpected branches
|
|
|
|
---
|
|
|
|
## Timeline
|
|
|
|
| Phase | Time | Cumulative |
|
|
|-------|------|------------|
|
|
| E3-1: Remove Registry Lookup | 10 min | 10 min |
|
|
| E3-1: Build & Test | 5 min | 15 min |
|
|
| E3-2: Header-First Classification | 15 min | 30 min |
|
|
| E3-2: Build & Test | 5 min | 35 min |
|
|
| E3-3: Remove C7 Special Cases | 30 min | 65 min |
|
|
| E3-3: Build & Test | 5 min | 70 min |
|
|
| Final Verification | 10 min | 80 min |
|
|
| **TOTAL** | - | **~1.5 hours** |
|
|
|
|
---
|
|
|
|
## Success Metrics
|
|
|
|
### Performance (Primary)
|
|
|
|
✅ **Phase E3-1 Success**: ≥30M ops/s (all sizes)
|
|
✅ **Phase E3-2 Success**: ≥50M ops/s (all sizes)
|
|
✅ **Phase E3-3 Success**: ≥59M ops/s (target met!)
|
|
|
|
### Stability (Critical)
|
|
|
|
✅ **No SEGV**: 1M iterations without crash
|
|
✅ **No corruption**: Memory integrity checks pass
|
|
✅ **Multi-threaded**: Larson 4T stable
|
|
|
|
### Code Quality (Secondary)
|
|
|
|
✅ **Reduced LOC**: -50 lines (C7 special cases removed)
|
|
✅ **Reduced branching**: -10% branch-miss rate
|
|
✅ **Unified code**: Single base calculation (`ptr - 1`)
|
|
|
|
---
|
|
|
|
## Next Actions
|
|
|
|
1. **Start with Phase E3-1** (highest ROI, lowest risk)
|
|
2. **Verify performance** (should see 3-5x improvement immediately)
|
|
3. **Proceed to E3-2** (optional, +10-20% additional)
|
|
4. **Complete E3-3** (cleanup, +5-10% final boost)
|
|
5. **Update CLAUDE.md** (document restoration success)
|
|
|
|
**Ready to implement!** 🚀
|