Commit Graph

3 Commits

Author SHA1 Message Date
84f5034e45 Phase 68: PGO training set diversification (seed/WS expansion)
Changes:
- scripts/box/pgo_fast_profile_config.sh: Expanded WS patterns (3→5) and seeds (1→3)
  to reduce overfitting and better represent production workloads
- PERFORMANCE_TARGETS_SCORECARD.md: Phase 68 baseline promoted (61.614M ops/s = 50.93% of mimalloc)
- CURRENT_TASK.md: Phase 68 marked complete, Phase 67a (layout tax forensics) set Active

Results:
- 10-run verification: +1.19% vs Phase 66 baseline (GO, >+1.0% threshold)
- M1 milestone: 50.93% of mimalloc (target 50%, exceeded by +0.93pp)
- Stability: 10-run mean/median with <2.1% CV
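
The GO/NO-GO check above can be sketched roughly as follows. This is a minimal illustration, not the project's benchmark script: the names `run_stats`, `summarize_runs`, `cv_below`, and `is_go` are hypothetical, and it assumes the decision compares run means, which the commit message does not state. The CV test compares variance against the squared limit so no square root is needed.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RUNS 64  /* hypothetical upper bound on samples per verification */

typedef struct {
    double mean;
    double median;
    double variance;  /* sample variance (n - 1 denominator) */
} run_stats;

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Summarise n throughput samples (2 <= n <= MAX_RUNS). Returns 0 on success. */
int summarize_runs(const double *samples, size_t n, run_stats *out) {
    double sorted[MAX_RUNS];
    double sum = 0.0, ss = 0.0;
    size_t i;
    if (!samples || !out || n < 2 || n > MAX_RUNS) return -1;
    memcpy(sorted, samples, n * sizeof *samples);
    qsort(sorted, n, sizeof *sorted, cmp_double);
    for (i = 0; i < n; i++) sum += sorted[i];
    out->mean = sum / (double)n;
    for (i = 0; i < n; i++) {
        double d = sorted[i] - out->mean;
        ss += d * d;
    }
    out->variance = ss / (double)(n - 1);
    out->median = (n % 2) ? sorted[n / 2]
                          : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    return 0;
}

/* True when the coefficient of variation is below limit_percent.
 * Assumes a positive mean; compares squares to avoid sqrt(). */
int cv_below(const run_stats *s, double limit_percent) {
    double limit = s->mean * limit_percent / 100.0;
    return s->variance < limit * limit;
}

/* GO when the candidate beats the baseline by more than threshold_percent. */
int is_go(double baseline_mean, double candidate_mean, double threshold_percent) {
    double delta = 100.0 * (candidate_mean - baseline_mean) / baseline_mean;
    return delta > threshold_percent;
}
```

With a +1.0% threshold, a candidate measured at +1.19% over its baseline yields GO, while one at +0.5% does not.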

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-17 21:08:17 +09:00
b7085c47e1 Phase 35-39: FAST build optimization complete (+7.13% cumulative)
Phase 35-A: BENCH_MINIMAL gate function elimination (GO +4.39%)
- tiny_front_v3_enabled() → constant true
- tiny_metadata_cache_enabled() → constant 0
- learner_v7_enabled() → constant false
- small_learner_v2_enabled() → constant false
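
Gate constantization of this kind might look like the sketch below. It is an assumption-laden illustration: `BENCH_MINIMAL_SKETCH`, the `_sketch` function names, and the `SKETCH_TINY_FRONT_V3` environment variable are hypothetical stand-ins, and the real gates may read their configuration differently. The point is that the benchmark build replaces a lazily cached runtime check with a compile-time constant, so the compiler can fold the branch away.

```c
#include <stdbool.h>
#include <stdlib.h>

#ifdef BENCH_MINIMAL_SKETCH
/* Benchmark build: the gate is a constant, so callers' branches fold away. */
static inline bool tiny_front_v3_enabled_sketch(void) { return true; }
#else
/* Standard build: read a (hypothetical) environment variable once and
 * cache the answer. Not thread-safe; a real gate would need an atomic
 * or a one-time initialiser. */
static inline bool tiny_front_v3_enabled_sketch(void) {
    static int cached = -1;  /* -1 = not yet read */
    if (cached < 0) {
        const char *v = getenv("SKETCH_TINY_FRONT_V3");
        cached = (v == NULL || v[0] != '0');  /* default: enabled */
    }
    return cached != 0;
}
#endif

/* Hot-path caller: returns 3 for the v3 front, 1 for the legacy path. */
int alloc_path_sketch(void) {
    return tiny_front_v3_enabled_sketch() ? 3 : 1;
}
```

Compiling with `-DBENCH_MINIMAL_SKETCH` selects the constant version; without it, the cached runtime check is used.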

Phase 36: Policy snapshot init-once (GO +0.71%)
- small_policy_v7_snapshot() version check skip in BENCH_MINIMAL
- TLS cache for policy snapshot
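
One way to picture the init-once snapshot is the sketch below. All names ending in `_sketch`, the struct layout, and the `BENCH_MINIMAL_SKETCH` macro are hypothetical; the actual `small_policy_v7_snapshot()` is not shown in this log. It assumes each thread keeps a private copy of a global policy and, outside the benchmark build, re-copies when the global version changes. The global is updated without synchronisation here, so treat it as a single-threaded illustration.

```c
#include <stdint.h>

typedef struct {
    uint32_t version;
    uint8_t  route[8];  /* hypothetical per-class routing byte */
} policy_snapshot_sketch;

/* Global policy; a real implementation would publish it atomically. */
static policy_snapshot_sketch g_policy_sketch = { 1, {0} };

/* Per-thread cached copy plus bookkeeping. */
static _Thread_local policy_snapshot_sketch t_policy;
static _Thread_local int      t_policy_ready;
static _Thread_local unsigned t_policy_copies;  /* counts refreshes, for inspection */

/* Publish a new route for one class and bump the version. */
void policy_publish_sketch(int class_idx, uint8_t route) {
    if (class_idx < 0 || class_idx >= 8) return;
    g_policy_sketch.route[class_idx] = route;
    g_policy_sketch.version++;
}

const policy_snapshot_sketch *policy_snapshot_sketch_get(void) {
#ifdef BENCH_MINIMAL_SKETCH
    /* Init-once: copy on first use, never check the version again. */
    if (!t_policy_ready) {
#else
    /* Standard: also refresh whenever the global version has moved on. */
    if (!t_policy_ready || t_policy.version != g_policy_sketch.version) {
#endif
        t_policy = g_policy_sketch;
        t_policy_ready = 1;
        t_policy_copies++;
    }
    return &t_policy;
}
```

In the benchmark variant the hot path pays for a single flag test; the trade-off is that policy changes published after the first call are not observed by that thread.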

Phase 37: Standard TLS cache (NO-GO -0.07%)
- TLS cache for Standard build attempted
- Runtime gate overhead negates benefit

Phase 38: FAST/OBSERVE/Standard workflow established
- make perf_fast, make perf_observe targets
- Scorecard and documentation updates

Phase 39: Hot path gate constantization (GO +1.98%)
- front_gate_unified_enabled() → constant 1
- alloc_dualhot_enabled() → constant 0
- g_bench_fast_front, g_v3_enabled blocks → compile-out
- free_dispatch_stats_enabled() → constant false

Results:
- FAST v3: 56.04M ops/s (47.4% of mimalloc)
- Standard: 53.50M ops/s (45.3% of mimalloc)
- M1 target (50%): 2.6pp remaining (about +5.5% throughput over FAST v3)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 15:01:56 +09:00
0dba67ba9d Phase v11a-2: Core MID v3.5 implementation - segment, cold iface, stats, learner
Implement 5-layer infrastructure for multi-class MID v3.5 (C5-C7, 257-1KiB):

1. SegmentBox_mid_v3 (L2 Physical)
   - core/smallobject_segment_mid_v3.c (9.5 KB)
   - 2MiB segments, 64KiB pages (32 per segment)
   - Per-class free page stacks (LIFO)
   - RegionIdBox registration
   - Slots: C5→170, C6→102, C7→64
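
The segment geometry above can be checked with a short sketch. The block sizes used here (384 B, 640 B, 1024 B for C5, C6, C7) are an inference: they are the sizes that reproduce the listed slot counts when a 64KiB page is divided with no per-page header, and the commit message does not state them. The `mid_` names and the intrusive stack layout are likewise hypothetical.

```c
#include <stddef.h>

enum {
    MID_SEGMENT_BYTES     = 2 * 1024 * 1024,  /* 2MiB segment */
    MID_PAGE_BYTES        = 64 * 1024,        /* 64KiB page   */
    MID_PAGES_PER_SEGMENT = MID_SEGMENT_BYTES / MID_PAGE_BYTES,
    MID_NUM_CLASSES       = 8
};

/* Assumed block sizes; only C5..C7 belong to this path. */
size_t mid_class_block_size(int class_idx) {
    switch (class_idx) {
    case 5:  return 384;
    case 6:  return 640;
    case 7:  return 1024;
    default: return 0;
    }
}

/* Slots per page, assuming the whole page is carved into blocks. */
size_t mid_slots_per_page(int class_idx) {
    size_t block = mid_class_block_size(class_idx);
    return block ? (size_t)MID_PAGE_BYTES / block : 0;
}

/* Intrusive per-class LIFO stack of free pages. */
typedef struct mid_page {
    struct mid_page *next;
    int class_idx;
} mid_page;

typedef struct {
    mid_page *top[MID_NUM_CLASSES];
} mid_free_page_stacks;

void mid_free_page_push(mid_free_page_stacks *s, mid_page *p) {
    p->next = s->top[p->class_idx];
    s->top[p->class_idx] = p;
}

/* Returns the most recently pushed page of that class, or NULL. */
mid_page *mid_free_page_pop(mid_free_page_stacks *s, int class_idx) {
    mid_page *p = s->top[class_idx];
    if (p) s->top[class_idx] = p->next;
    return p;
}
```

Keeping one stack per class means a page retired by C5 is only ever handed back to C5, which matches the multi-class isolation described in this commit.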

2. ColdIface_mid_v3 (L2→L1)
   - core/box/smallobject_cold_iface_mid_v3_box.h (NEW)
   - core/smallobject_cold_iface_mid_v3.c (3.5 KB)
   - refill: get page from free stack or new segment
   - retire: calculate free_hit_ratio, publish stats, return to stack
   - Clean separation: TLS cache for hot path, ColdIface for cold path

3. StatsBox_mid_v3 (L2→L3)
   - core/smallobject_stats_mid_v3.c (7.2 KB)
   - Circular buffer history (1000 events)
   - Per-page metrics: class_idx, allocs, frees, free_hit_ratio_bps
   - Periodic aggregation (every 100 retires)
   - Learner notification callback
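
A rough sketch of such a stats box follows. The field names mirror the per-page metrics listed above, but the types, the `stats_box` layout, and the callback signature are assumptions; the real `smallobject_stats_mid_v3.c` may differ. The ring keeps the latest 1000 events and the callback fires on every 100th retire.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STATS_HISTORY    1000  /* circular buffer capacity */
#define STATS_AGG_PERIOD 100   /* notify every 100 retires */

typedef struct {
    uint8_t  class_idx;
    uint32_t allocs;
    uint32_t frees;
    uint32_t free_hit_ratio_bps;  /* basis points, 10000 = 100% */
} page_event;

struct stats_box;
typedef void (*learner_cb)(const struct stats_box *box, void *ctx);

typedef struct stats_box {
    page_event ring[STATS_HISTORY];
    size_t     head;     /* index of the next write */
    size_t     count;    /* valid events, capped at STATS_HISTORY */
    uint64_t   retires;  /* total retires seen */
    learner_cb cb;
    void      *ctx;
} stats_box;

void stats_box_init(stats_box *s, learner_cb cb, void *ctx) {
    memset(s, 0, sizeof *s);
    s->cb = cb;
    s->ctx = ctx;
}

/* Record one retired page; overwrite the oldest event once full. */
void stats_box_on_retire(stats_box *s, page_event ev) {
    s->ring[s->head] = ev;
    s->head = (s->head + 1) % STATS_HISTORY;
    if (s->count < STATS_HISTORY) s->count++;
    s->retires++;
    if (s->cb && s->retires % STATS_AGG_PERIOD == 0)
        s->cb(s, s->ctx);
}

/* Example callback: counts notifications through ctx (an unsigned *). */
void count_notifications(const stats_box *box, void *ctx) {
    (void)box;
    ++*(unsigned *)ctx;
}
```

Because recording happens only at page retire, this work stays on the cold path and the alloc/free fast path never touches the ring.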

4. Learner v2 (L3)
   - core/smallobject_learner_v2.c (11 KB)
   - Multi-class aggregation: allocs[8], retire_count[8], avg_free_hit_bps[8]
   - Exponential smoothing (90% history + 10% new)
   - Per-class efficiency tracking
   - Stats snapshot API
   - Route decision disabled for v11a-2 (v11b feature)
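
The 90/10 exponential smoothing can be sketched in integer basis points as below. The array names follow the aggregation fields listed above; the struct name, the function name, the rounding, and the choice to seed the average with the first sample are assumptions made for illustration.

```c
#include <stdint.h>

#define LEARNER_CLASSES 8

typedef struct {
    uint64_t allocs[LEARNER_CLASSES];
    uint32_t retire_count[LEARNER_CLASSES];
    uint32_t avg_free_hit_bps[LEARNER_CLASSES];  /* 10000 = 100% */
} learner_stats_sketch;

/* Fold one retired page into the per-class aggregates.
 * new_avg = 0.9 * old_avg + 0.1 * sample, rounded to nearest bps. */
void learner_observe_sketch(learner_stats_sketch *l, int class_idx,
                            uint32_t allocs, uint32_t free_hit_bps) {
    if (class_idx < 0 || class_idx >= LEARNER_CLASSES) return;
    l->allocs[class_idx] += allocs;
    if (l->retire_count[class_idx]++ == 0) {
        /* Assumed seeding: the first sample becomes the average. */
        l->avg_free_hit_bps[class_idx] = free_hit_bps;
    } else {
        uint32_t old = l->avg_free_hit_bps[class_idx];
        l->avg_free_hit_bps[class_idx] = (old * 9u + free_hit_bps + 5u) / 10u;
    }
}
```

Starting from 8000 bps, a sample of 9000 bps moves the average to 8100, and a following sample of 10000 bps moves it to 8290, so a single outlier page shifts the estimate by at most a tenth of its distance from the history.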

5. Build Integration
   - Modified Makefile: added 4 new .o files (segment, cold_iface, stats, learner)
   - Updated box header prototypes
   - Clean compilation, all dependencies resolved

Architecture Decision Implementation:
- v7 remains frozen (C5/C6 research preset)
- MID v3.5 becomes unified 257-1KiB main path
- Multi-class isolation: per-class free stacks
- Dormant infrastructure: linked but not active (zero overhead)

Performance:
- Build: clean compilation
- Sanity benchmark: 27.3M ops/s (no regression vs v10)
- Memory: ~30MB RSS (baseline maintained)

Design Compliance:
- Layer separation: L2 (segment) → L2 (cold iface) → L3 (stats) → L3 (learner)
- Hot path clean: alloc/free never touch stats/learner
- Backward compatible: existing MID v3 routes unchanged
- Transparent: v11a-2 is dormant (no behavior change)

Next Phase (v11a-3):
- Activate C5/C6/C7 routing through MID v3.5
- Connect TLS cache to segment refill
- Verify performance under load
- Then Phase v11a-4: dynamic C5 ratio routing

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-12 06:37:06 +09:00