Commit Graph

11 Commits

Author SHA1 Message Date
8fdbc6d07e Phase 70-73: Route banner + observe stats consistency + WarmPool analysis SSOT
Observability infrastructure:
- Route Banner (ENV: HAKMEM_ROUTE_BANNER=1) for runtime configuration display
- Unified Cache consistency check (total_allocs vs total_frees)
- Verified counters are balanced (5.3M allocs = 5.3M frees)

WarmPool=16 comprehensive analysis:
- Phase 71: A/B test confirmed +1.31% throughput, 2.4x stability improvement
- Phase 73: Hardware profiling identified instruction reduction as root cause
  * -17.4M instructions (-0.38%)
  * -3.7M branches (-0.30%)
  * Trade-off: dTLB/cache misses increased, but instruction savings dominate
- Phase 72-0: Function-level perf record pinpointed unified_cache_push
  * Branches: -0.86% overhead (largest single-function improvement)
  * Instructions: -0.22% overhead

Key finding: WarmPool=16 optimization is control-flow based, not memory-hierarchy based.
Full analysis: docs/analysis/PHASE70_71_WARMPOOL16_ANALYSIS.md
2025-12-18 05:55:27 +09:00
84f5034e45 Phase 68: PGO training set diversification (seed/WS expansion)
Changes:
- scripts/box/pgo_fast_profile_config.sh: Expanded WS patterns (3→5) and seeds (1→3)
  for reduced overfitting and better production workload representativeness
- PERFORMANCE_TARGETS_SCORECARD.md: Phase 68 baseline promoted (61.614M = 50.93%)
- CURRENT_TASK.md: Phase 68 marked complete, Phase 67a (layout tax forensics) set Active

Results:
- 10-run verification: +1.19% vs Phase 66 baseline (GO, >+1.0% threshold)
- M1 milestone: 50.93% of mimalloc (target 50%, exceeded by +0.93pp)
- Stability: 10-run mean/median with <2.1% CV

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-17 21:08:17 +09:00
b7085c47e1 Phase 35-39: FAST build optimization complete (+7.13% cumulative)
Phase 35-A: BENCH_MINIMAL gate function elimination (GO +4.39%)
- tiny_front_v3_enabled() → constant true
- tiny_metadata_cache_enabled() → constant 0
- learner_v7_enabled() → constant false
- small_learner_v2_enabled() → constant false

Phase 36: Policy snapshot init-once (GO +0.71%)
- small_policy_v7_snapshot() version check skip in BENCH_MINIMAL
- TLS cache for policy snapshot

Phase 37: Standard TLS cache (NO-GO -0.07%)
- TLS cache for Standard build attempted
- Runtime gate overhead negates benefit

Phase 38: FAST/OBSERVE/Standard workflow established
- make perf_fast, make perf_observe targets
- Scorecard and documentation updates

Phase 39: Hot path gate constantization (GO +1.98%)
- front_gate_unified_enabled() → constant 1
- alloc_dualhot_enabled() → constant 0
- g_bench_fast_front, g_v3_enabled blocks → compile-out
- free_dispatch_stats_enabled() → constant false

Results:
- FAST v3: 56.04M ops/s (47.4% of mimalloc)
- Standard: 53.50M ops/s (45.3% of mimalloc)
- M1 target (50%): 5.5% remaining

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 15:01:56 +09:00
d54893ea1d Phase 3 C3: Static Routing A/B Test ADOPT (+2.20% Mixed gain)
Step 2 & 3 Complete:
- A/B test (Mixed 10-run): STATIC_ROUTE=0 (38.91M) → =1 (39.77M) = +2.20% avg
  - Median gain: +1.98%
  - Result:  GO (exceeds +1.0% threshold)

- Decision:  ADOPT into MIXED_TINYV3_C7_SAFE preset
  - bench_profile.h line 77: HAKMEM_TINY_STATIC_ROUTE=1 default
  - Learner auto-disables static route when HAKMEM_SMALL_LEARNER_V7_ENABLED=1

Implementation Summary:
- core/box/tiny_static_route_box.{h,c}: Research box (Step 1A)
- core/front/malloc_tiny_fast.h: Route lookup integration (Step 1B, lines 249-256)
- core/bench_profile.h: Bench sync + preset adoption

Cumulative Phase 2-3 Gains:
- B3 (Routing shape): +2.89%
- B4 (Wrapper split): +1.47%
- C3 (Static routing): +2.20%
- Total: ~6.8% (35.2M → ~39.8M ops/s)

Next: Phase 3 C1 (TLS Prefetch, expected +2-4%)

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 18:46:11 +09:00
1a8652a91a Phase TLS-UNIFY-3: C6 intrusive freelist implementation (完成)
Implement C6 ULTRA intrusive LIFO freelist with ENV gating:
- Single-linked LIFO using next pointer at USER+1 offset
- tiny_next_store/tiny_next_load for pointer access (single source of truth)
- Segment learning via ss_fast_lookup (per-class seg_base/seg_end)
- ENV gate: HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL (default OFF)
- Counters: c6_ifl_push/pop/fallback in FREE_PATH_STATS

Files:
- core/box/tiny_ultra_tls_box.h: Added c6_head field for intrusive LIFO
- core/box/tiny_ultra_tls_box.c: Pop/push with intrusive branching (case 6)
- core/box/tiny_c6_ultra_intrusive_env_box.h: ENV gate (new)
- core/box/tiny_c6_intrusive_freelist_box.h: L1 pure LIFO (new)
- core/tiny_debug_ring.h: C6_IFL events
- core/box/free_path_stats_box.h/c: c6_ifl_* counters

A/B Test Results (1M iterations, ws=200, 257-512B):
- ENV_OFF (array): 56.6 Mop/s avg
- ENV_ON (intrusive): 57.6 Mop/s avg (+1.8%, within noise)
- Counters verified: c6_ifl_push=265890, c6_ifl_pop=265815, fallback=0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-12 16:26:42 +09:00
d5ffb3eeb2 Fix MID v3.5 activation bugs: policy loop + malloc recursion
Two critical bugs fixed:

1. Policy snapshot infinite loop (smallobject_policy_v7.c):
   - Condition `g_policy_v7_version == 0` caused reinit on every call
   - Fixed via CAS to set global version to 1 after first init

2. Malloc recursion (smallobject_segment_mid_v3.c):
   - Internal malloc() routed back through hakmem → MID v3.5 → segment
     creation → malloc → infinite recursion / stack overflow
   - Fixed by using mmap() directly for internal allocations:
     - Segment struct, pages array, page metadata block

Performance results (bench_random_mixed 257-512B):
- Baseline (LEGACY): 34.0M ops/s
- MID_V35 ON (C6):   35.8M ops/s
- Improvement:       +5.1% ✓

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 07:12:24 +09:00
212739607a Phase v11a-3: MID v3.5 Activation (Build Complete)
Integrated MID v3.5 into active code path, making it available for C5/C6/C7 routing.

Key Changes:
- Policy Box: Added SMALL_ROUTE_MID_V35 with ENV gates (HAKMEM_MID_V35_ENABLED, HAKMEM_MID_V35_CLASSES)
- HotBox: Implemented small_mid_v35_alloc/free with TLS-cached page allocation
- Front Gate: Wired MID_V35 routing into malloc_tiny_fast.h (priority: ULTRA > MID_V35 > V7)
- Build: Added core/smallobject_mid_v35.o to all object lists

Architecture:
- Slot sizes: C5=384B, C6=512B, C7=1024B
- Page size: 64KB (170/128/64 slots)
- Integration: ColdIface v2 (refill/retire), Stats v2 (observation), Learner v2 (dormant)

Status: Build successful, ready for A/B benchmarking
Next: Performance validation (C6-heavy, C5+C6-only, Mixed benchmarks)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 06:52:14 +09:00
bbc4b66a22 Phase v10: Enable Learner v7 by default
Change: Learner now defaults to ON (when v7 is enabled)
- Old behavior: Learner only enabled if explicitly requested
- New behavior: Learner always ON (can disable with ENV=0)
- Learner is optional dependency of v7 (not intrusive)

Configuration:
- HAKMEM_SMALL_HEAP_V7_ENABLED=1: enables v7 + Learner
- HAKMEM_SMALL_LEARNER_V7_ENABLED=0: disable Learner only (keeps v7)

Benefits:
- Automatic workload detection without user configuration
- C5 allocation ratio monitored by default
- Route optimization happens transparently

Performance: v7+Learner C5/C6 workload = 39M ops/s (maintained)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 06:09:53 +09:00
540230c301 v7-7: Modularize Learner into separate box
Refactoring: Separate Learner API and types from Policy Box
- New: core/box/smallobject_learner_v7_box.h
  - SmallLearnerStatsV7 type definition
  - Learner recording API (record_refill, record_retire)
  - Learner evaluation and stats snapshot
  - Learner configuration constants
- Updated: core/box/smallobject_policy_v7_box.h
  - Removed Learner API (moved to Learner Box)
  - Removed SmallLearnerStatsV7 type (moved to Learner Box)
  - Added include of smallobject_learner_v7_box.h
  - Kept small_policy_v7_update_from_learner() (L3 integration)
- Updated: core/smallobject_policy_v7.c
  - Added include of smallobject_learner_v7_box.h

Benefits:
- Clearer module boundaries (Policy vs Learner)
- Easier testing and debugging (stats isolation)
- Reduced coupling between components

Performance: No regression (v7+Learner: 41M ops/s on C5/C6)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 06:06:44 +09:00
6f559e1a1d v7-7: Implement Learner for dynamic C5 route switching
- Add SmallLearnerStatsV7 type + API to policy box
- Hook ColdIface refill/retire to collect stats (capacity-based)
- Implement C5 route switching: if C5 ratio < 30%, switch to MID_V3
- Version-based TLS cache invalidation for policy updates
- Evaluation interval: every 100 refills

Tested with c6heavy scenario: C5 ratio=12% triggers V7 → MID_V3 switch

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 05:51:27 +09:00
8143e8b797 Phase v7-4: Policy Box 導入 (L3 層の明確化とフロント芯の作り直し)
- SmallPolicyV7 Box: L3 Policy layer に配置、route 決定を一元化
- Route kind enum: SMALL_ROUTE_ULTRA / V7 / MID_V3 / LEGACY
- ENV priority (fixed): ULTRA > v7 > MID_v3 > LEGACY
- Frontend integration: v7 routing を Policy Box 経由に変更 (段階移行)
- Legacy compatibility: 既存の tiny_route_env_box.h は併用維持

Box Theory layer structure:
- L0: ULTRA (C4-C7, FROZEN)
- L1: SmallObject v7 (research box)
- L1': MID_v3 / LEGACY (fallback)
- L2: Segment / RegionId
- L3: Policy / Stats / Learner ← Policy Box added here

Frontend now follows clean "size→class→route_kind→switch" pattern.
ENV variables read once at Policy init, not scattered across frontend.

Future: ULTRA/MID_v3/LEGACY consolidation, Learner integration, flexible priority.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 03:50:58 +09:00