hakmem

Author	SHA1	Message	Date
Moe Charm (CI)	8fdbc6d07e	Phase 70-73: Route banner + observe stats consistency + WarmPool analysis SSOT Observability infrastructure: - Route Banner (ENV: HAKMEM_ROUTE_BANNER=1) for runtime configuration display - Unified Cache consistency check (total_allocs vs total_frees) - Verified counters are balanced (5.3M allocs = 5.3M frees) WarmPool=16 comprehensive analysis: - Phase 71: A/B test confirmed +1.31% throughput, 2.4x stability improvement - Phase 73: Hardware profiling identified instruction reduction as root cause * -17.4M instructions (-0.38%) * -3.7M branches (-0.30%) * Trade-off: dTLB/cache misses increased, but instruction savings dominate - Phase 72-0: Function-level perf record pinpointed unified_cache_push * Branches: -0.86% overhead (largest single-function improvement) * Instructions: -0.22% overhead Key finding: WarmPool=16 optimization is control-flow based, not memory-hierarchy based. Full analysis: docs/analysis/PHASE70_71_WARMPOOL16_ANALYSIS.md	2025-12-18 05:55:27 +09:00
Moe Charm (CI)	84f5034e45	Phase 68: PGO training set diversification (seed/WS expansion) Changes: - scripts/box/pgo_fast_profile_config.sh: Expanded WS patterns (3→5) and seeds (1→3) for reduced overfitting and better production workload representativeness - PERFORMANCE_TARGETS_SCORECARD.md: Phase 68 baseline promoted (61.614M = 50.93%) - CURRENT_TASK.md: Phase 68 marked complete, Phase 67a (layout tax forensics) set Active Results: - 10-run verification: +1.19% vs Phase 66 baseline (GO, >+1.0% threshold) - M1 milestone: 50.93% of mimalloc (target 50%, exceeded by +0.93pp) - Stability: 10-run mean/median with <2.1% CV 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-17 21:08:17 +09:00
Moe Charm (CI)	b7085c47e1	Phase 35-39: FAST build optimization complete (+7.13% cumulative) Phase 35-A: BENCH_MINIMAL gate function elimination (GO +4.39%) - tiny_front_v3_enabled() → constant true - tiny_metadata_cache_enabled() → constant 0 - learner_v7_enabled() → constant false - small_learner_v2_enabled() → constant false Phase 36: Policy snapshot init-once (GO +0.71%) - small_policy_v7_snapshot() version check skip in BENCH_MINIMAL - TLS cache for policy snapshot Phase 37: Standard TLS cache (NO-GO -0.07%) - TLS cache for Standard build attempted - Runtime gate overhead negates benefit Phase 38: FAST/OBSERVE/Standard workflow established - make perf_fast, make perf_observe targets - Scorecard and documentation updates Phase 39: Hot path gate constantization (GO +1.98%) - front_gate_unified_enabled() → constant 1 - alloc_dualhot_enabled() → constant 0 - g_bench_fast_front, g_v3_enabled blocks → compile-out - free_dispatch_stats_enabled() → constant false Results: - FAST v3: 56.04M ops/s (47.4% of mimalloc) - Standard: 53.50M ops/s (45.3% of mimalloc) - M1 target (50%): 5.5% remaining 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-16 15:01:56 +09:00
Moe Charm (CI)	d54893ea1d	Phase 3 C3: Static Routing A/B Test ADOPT (+2.20% Mixed gain) Step 2 & 3 Complete: - A/B test (Mixed 10-run): STATIC_ROUTE=0 (38.91M) → =1 (39.77M) = +2.20% avg - Median gain: +1.98% - Result: ✅ GO (exceeds +1.0% threshold) - Decision: ✅ ADOPT into MIXED_TINYV3_C7_SAFE preset - bench_profile.h line 77: HAKMEM_TINY_STATIC_ROUTE=1 default - Learner auto-disables static route when HAKMEM_SMALL_LEARNER_V7_ENABLED=1 Implementation Summary: - core/box/tiny_static_route_box.{h,c}: Research box (Step 1A) - core/front/malloc_tiny_fast.h: Route lookup integration (Step 1B, lines 249-256) - core/bench_profile.h: Bench sync + preset adoption Cumulative Phase 2-3 Gains: - B3 (Routing shape): +2.89% - B4 (Wrapper split): +1.47% - C3 (Static routing): +2.20% - Total: ~6.8% (35.2M → ~39.8M ops/s) Next: Phase 3 C1 (TLS Prefetch, expected +2-4%) 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-13 18:46:11 +09:00
Moe Charm (CI)	1a8652a91a	Phase TLS-UNIFY-3: C6 intrusive freelist implementation (complete) Implement C6 ULTRA intrusive LIFO freelist with ENV gating: - Singly linked LIFO using next pointer at USER+1 offset - tiny_next_store/tiny_next_load for pointer access (single source of truth) - Segment learning via ss_fast_lookup (per-class seg_base/seg_end) - ENV gate: HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL (default OFF) - Counters: c6_ifl_push/pop/fallback in FREE_PATH_STATS Files: - core/box/tiny_ultra_tls_box.h: Added c6_head field for intrusive LIFO - core/box/tiny_ultra_tls_box.c: Pop/push with intrusive branching (case 6) - core/box/tiny_c6_ultra_intrusive_env_box.h: ENV gate (new) - core/box/tiny_c6_intrusive_freelist_box.h: L1 pure LIFO (new) - core/tiny_debug_ring.h: C6_IFL events - core/box/free_path_stats_box.h/c: c6_ifl_* counters A/B Test Results (1M iterations, ws=200, 257-512B): - ENV_OFF (array): 56.6 Mop/s avg - ENV_ON (intrusive): 57.6 Mop/s avg (+1.8%, within noise) - Counters verified: c6_ifl_push=265890, c6_ifl_pop=265815, fallback=0 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-12 16:26:42 +09:00
Moe Charm (CI)	d5ffb3eeb2	Fix MID v3.5 activation bugs: policy loop + malloc recursion Two critical bugs fixed: 1. Policy snapshot infinite loop (smallobject_policy_v7.c): - Condition `g_policy_v7_version == 0` caused reinit on every call - Fixed via CAS to set global version to 1 after first init 2. Malloc recursion (smallobject_segment_mid_v3.c): - Internal malloc() routed back through hakmem → MID v3.5 → segment creation → malloc → infinite recursion / stack overflow - Fixed by using mmap() directly for internal allocations: - Segment struct, pages array, page metadata block Performance results (bench_random_mixed 257-512B): - Baseline (LEGACY): 34.0M ops/s - MID_V35 ON (C6): 35.8M ops/s - Improvement: +5.1% ✓ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 07:12:24 +09:00
Moe Charm (CI)	212739607a	Phase v11a-3: MID v3.5 Activation (Build Complete) Integrated MID v3.5 into active code path, making it available for C5/C6/C7 routing. Key Changes: - Policy Box: Added SMALL_ROUTE_MID_V35 with ENV gates (HAKMEM_MID_V35_ENABLED, HAKMEM_MID_V35_CLASSES) - HotBox: Implemented small_mid_v35_alloc/free with TLS-cached page allocation - Front Gate: Wired MID_V35 routing into malloc_tiny_fast.h (priority: ULTRA > MID_V35 > V7) - Build: Added core/smallobject_mid_v35.o to all object lists Architecture: - Slot sizes: C5=384B, C6=512B, C7=1024B - Page size: 64KB (170/128/64 slots) - Integration: ColdIface v2 (refill/retire), Stats v2 (observation), Learner v2 (dormant) Status: Build successful, ready for A/B benchmarking Next: Performance validation (C6-heavy, C5+C6-only, Mixed benchmarks) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 06:52:14 +09:00
Moe Charm (CI)	bbc4b66a22	Phase v10: Enable Learner v7 by default Change: Learner now defaults to ON (when v7 is enabled) - Old behavior: Learner only enabled if explicitly requested - New behavior: Learner always ON (can disable with ENV=0) - Learner is optional dependency of v7 (not intrusive) Configuration: - HAKMEM_SMALL_HEAP_V7_ENABLED=1: enables v7 + Learner - HAKMEM_SMALL_LEARNER_V7_ENABLED=0: disable Learner only (keeps v7) Benefits: - Automatic workload detection without user configuration - C5 allocation ratio monitored by default - Route optimization happens transparently Performance: v7+Learner C5/C6 workload = 39M ops/s (maintained) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 06:09:53 +09:00
Moe Charm (CI)	540230c301	v7-7: Modularize Learner into separate box Refactoring: Separate Learner API and types from Policy Box - New: core/box/smallobject_learner_v7_box.h - SmallLearnerStatsV7 type definition - Learner recording API (record_refill, record_retire) - Learner evaluation and stats snapshot - Learner configuration constants - Updated: core/box/smallobject_policy_v7_box.h - Removed Learner API (moved to Learner Box) - Removed SmallLearnerStatsV7 type (moved to Learner Box) - Added include of smallobject_learner_v7_box.h - Kept small_policy_v7_update_from_learner() (L3 integration) - Updated: core/smallobject_policy_v7.c - Added include of smallobject_learner_v7_box.h Benefits: - Clearer module boundaries (Policy vs Learner) - Easier testing and debugging (stats isolation) - Reduced coupling between components Performance: No regression (v7+Learner: 41M ops/s on C5/C6) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 06:06:44 +09:00
Moe Charm (CI)	6f559e1a1d	v7-7: Implement Learner for dynamic C5 route switching - Add SmallLearnerStatsV7 type + API to policy box - Hook ColdIface refill/retire to collect stats (capacity-based) - Implement C5 route switching: if C5 ratio < 30%, switch to MID_V3 - Version-based TLS cache invalidation for policy updates - Evaluation interval: every 100 refills Tested with c6heavy scenario: C5 ratio=12% triggers V7 → MID_V3 switch 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 05:51:27 +09:00
Moe Charm (CI)	8143e8b797	Phase v7-4: Introduce Policy Box (clarify the L3 layer and rebuild the frontend core) - SmallPolicyV7 Box: placed in the L3 Policy layer; centralizes route decisions - Route kind enum: SMALL_ROUTE_ULTRA / V7 / MID_V3 / LEGACY - ENV priority (fixed): ULTRA > v7 > MID_v3 > LEGACY - Frontend integration: v7 routing now goes through the Policy Box (staged migration) - Legacy compatibility: the existing tiny_route_env_box.h remains in use alongside it Box Theory layer structure: - L0: ULTRA (C4-C7, FROZEN) - L1: SmallObject v7 (research box) - L1': MID_v3 / LEGACY (fallback) - L2: Segment / RegionId - L3: Policy / Stats / Learner ← Policy Box added here Frontend now follows clean "size→class→route_kind→switch" pattern. ENV variables read once at Policy init, not scattered across frontend. Future: ULTRA/MID_v3/LEGACY consolidation, Learner integration, flexible priority. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-12 03:50:58 +09:00
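
The routing rules described in commits 8143e8b797 and 6f559e1a1d (fixed priority ULTRA > v7 > MID_v3 > LEGACY, and the Learner switching C5 to MID_V3 when its ratio falls below 30%, evaluated every 100 refills) can be sketched roughly as follows. This is an illustration only, not the hakmem implementation: the type names `small_route_kind_t` and `route_gates_t`, the functions `resolve_route` and `learner_eval_c5`, and the reading of "C5 ratio" as C5 refills divided by total refills are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Route kinds, named after the enum values listed in the commit message.
 * The type name itself is hypothetical. */
typedef enum {
    SMALL_ROUTE_LEGACY,
    SMALL_ROUTE_MID_V3,
    SMALL_ROUTE_V7,
    SMALL_ROUTE_ULTRA
} small_route_kind_t;

/* Hypothetical per-class gate flags; in the commit these come from ENV
 * variables that are read once at policy init. */
typedef struct {
    bool ultra_enabled;
    bool v7_enabled;
    bool mid_v3_enabled;
} route_gates_t;

/* Fixed priority: ULTRA > v7 > MID_v3 > LEGACY. */
static small_route_kind_t resolve_route(route_gates_t g)
{
    if (g.ultra_enabled)  return SMALL_ROUTE_ULTRA;
    if (g.v7_enabled)     return SMALL_ROUTE_V7;
    if (g.mid_v3_enabled) return SMALL_ROUTE_MID_V3;
    return SMALL_ROUTE_LEGACY;
}

/* Learner rule for class C5 (assumed interpretation): evaluate only on
 * every 100th refill; if C5 accounts for less than 30% of refills, move
 * C5 to MID_V3, otherwise keep the current route. */
static small_route_kind_t learner_eval_c5(unsigned c5_refills,
                                          unsigned total_refills,
                                          small_route_kind_t current)
{
    assert(c5_refills <= total_refills);
    if (total_refills == 0 || total_refills % 100 != 0)
        return current;                      /* not an evaluation point */
    if (c5_refills * 100u < total_refills * 30u)
        return SMALL_ROUTE_MID_V3;           /* C5 ratio below 30% */
    return current;
}
```

With these definitions, a C5 share of 12% at the 100th refill (the c6heavy scenario mentioned in the log) moves C5 from V7 to MID_V3, while a share of 30% or more leaves the route unchanged.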

11 Commits
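
For readers following the policy-loop fix in commit d5ffb3eeb2, the sketch below shows one way an init-once check can use a compare-and-swap so that a version of 0 no longer triggers reinitialization on every call. It is a minimal C11 illustration under stated assumptions, not the code in smallobject_policy_v7.c: the names `g_version`, `g_init_count`, `policy_init`, and `policy_snapshot_version` are hypothetical, and `policy_init` is assumed to be idempotent, since two threads racing on the first call may both run it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* 0 means "not yet initialized"; any non-zero value is a valid version. */
static _Atomic uint32_t g_version = 0;

/* Counts how often init ran; present only to make the behavior visible. */
static int g_init_count = 0;

/* Stand-in for building the policy snapshot (assumed idempotent). */
static void policy_init(void)
{
    g_init_count++;
}

/* Returns the current policy version, initializing on first use.
 * After init, a CAS publishes version 1 so later calls skip the init
 * branch instead of seeing version 0 again. */
static uint32_t policy_snapshot_version(void)
{
    if (atomic_load_explicit(&g_version, memory_order_acquire) == 0) {
        policy_init();
        uint32_t expected = 0;
        /* Only the first thread to get here moves 0 -> 1; losers leave
         * the already published version untouched. */
        (void)atomic_compare_exchange_strong(&g_version, &expected, 1);
    }
    uint32_t v = atomic_load_explicit(&g_version, memory_order_acquire);
    assert(v != 0);   /* version is published before returning */
    return v;
}
```

In single-threaded use, repeated calls run `policy_init` once and keep returning version 1. A stricter design would add an "initializing" state so concurrent first callers wait rather than initialize twice.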