MID-V3: Specialize to 257-768B, exclude C7 (ULTRA handles 1KB)

Role separation based on ultrathink analysis:
- MID v3: 257-768B専用 (C6 only, HAKMEM_MID_V3_CLASSES=0x40)
- C7 ULTRA: 769-1024B専用 (existing optimized path)

Changes:
- core/box/hak_alloc_api.inc.h: Remove C7 route, restrict to 257-768B
- core/box/mid_hotbox_v3_env_box.h: Update ENV comments
- docs/analysis/MID_POOL_V3_DESIGN.md: Add performance results & role
- CURRENT_TASK.md: Document MID-V3 completion & role separation

Verified:
- 257-768B with v3 ON: 1,199,526 ops/s (+1.7% vs baseline)
- 769-1024B with v3 ON: 1,181,254 ops/s (same as baseline, C7 excluded)
- C7 correctly routes to ULTRA instead of MID v3

Rationale: C7-only showed -11% regression, but C6/mixed showed +11-19%
improvement. Specializing to mid-range (257-768B) leverages v3 strengths
while keeping C7 on the proven ULTRA path.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
Moe Charm (CI)
2025-12-12 01:14:13 +09:00
parent 7bb179df6c
commit a8d0ab06fc
4 changed files with 63 additions and 11 deletions

View File

@ -71,25 +71,25 @@ inline void* hak_alloc_at(size_t size, hak_callsite_t site) {
}
// =========================================================================
// Phase MID-V3: Mid/Pool HotBox v3 (256B-1KB, opt-in via ENV)
// Phase MID-V3: Mid/Pool HotBox v3 (257-768B ONLY, opt-in via ENV)
// =========================================================================
// Role separation: MID v3 handles 257-768B, C7 ULTRA handles 769-1024B
// Priority: v6 → v3 → v4 → pool (ENV-controlled routing)
// ENV: HAKMEM_MID_V3_ENABLED=1 HAKMEM_MID_V3_CLASSES=0x40 (C6 only)
// ENV: HAKMEM_MID_V3_ENABLED=1 HAKMEM_MID_V3_CLASSES=0x40 (C6 only, default)
// Design: TLS lane cache with page-based allocation, RegionIdBox integration
// NOTE: Must come BEFORE Tiny to intercept specific size classes
if (__builtin_expect(mid_v3_enabled() && size >= 256 && size <= 1024, 0)) {
// PERF: C6 shows +11% improvement, Mixed (257-768B) shows +19.8% improvement
if (__builtin_expect(mid_v3_enabled() && size >= 257 && size <= 768, 0)) {
static _Atomic int entry_log_count = 0;
if (mid_v3_debug_enabled() && atomic_fetch_add(&entry_log_count, 1) < 3) {
fprintf(stderr, "[MID_V3] Entered v3 path: size=%zu\n", size);
}
int class_idx = -1;
// C6: 256B class handles sizes up to 256B (145-256B range)
// C7: 1024B class handles sizes up to 1024B (769-1024B range)
if (size > 144 && size <= 256 && mid_v3_class_enabled(6)) {
// C6: 256B class handles 257-768B range (mid-size allocations)
// NOTE: C7 (1024B) is intentionally EXCLUDED - handled by C7 ULTRA instead
if (size >= 257 && size <= 768 && mid_v3_class_enabled(6)) {
class_idx = 6; // C6: 256B
} else if (size > 768 && size <= 1024 && mid_v3_class_enabled(7)) {
class_idx = 7; // C7: 1024B
}
if (mid_v3_debug_enabled() && class_idx >= 0) {