hakmem

Author	SHA1	Message	Date
Moe Charm (CI)	7a3702e069	docs: Phase 7 FastLane free hot/cold alignment instructions	2025-12-14 17:52:55 +09:00
Moe Charm (CI)	dcc1d42e7f	Phase 6-2: Promote Front FastLane Free DeDup (default ON) Results: - A/B test: +5.18% on Mixed (10-run, clean env) - Baseline: 46.68M ops/s - Optimized: 49.10M ops/s - Improvement: +2.42M ops/s (+5.18%) Strategy: - Eliminate duplicate header validation in front_fastlane_try_free() - Direct call to free_tiny_fast() when dedup enabled - Single validation path (no redundant checks) Success factors: 1. Complete duplicate elimination (free path optimization) 2. Free path importance (50% of Mixed workload) 3. Improved execution stability (CV: 1.00% → 0.58%) Phase 6 cumulative: - Phase 6-1 FastLane: +11.13% - Phase 6-2 Free DeDup: +5.18% - Total: ~+16-17% from baseline (multiplicative effect) Promotion: - Default: HAKMEM_FRONT_FASTLANE_FREE_DEDUP=1 (opt-out) - Added to MIXED_TINYV3_C7_SAFE preset - Added to C6_HEAVY_LEGACY_POOLV1 preset - Rollback: HAKMEM_FRONT_FASTLANE_FREE_DEDUP=0 Files modified: - core/box/front_fastlane_env_box.h: default 0 → 1 - core/bench_profile.h: added to presets - CURRENT_TASK.md: Phase 6-2 GO result Health check: PASSED (all profiles) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-14 17:38:21 +09:00
Moe Charm (CI)	c0d2f47f7d	docs: Phase 6-2 FastLane free dedup instructions	2025-12-14 17:09:57 +09:00
Moe Charm (CI)	ea221d057a	Phase 6: promote Front FastLane (default ON)	2025-12-14 16:28:23 +09:00
Moe Charm (CI)	e48cbff4b9	Phase 5 Complete: E7 NO-GO confirmed + ChatGPT Pro questionnaire Summary: - E7 frozen box prune: -3.20% regression (NO-GO) with clean ENV - Keep E5-2/E5-4 (NEUTRAL) + E6 (NO-GO) as research boxes - Regression due to build differences (LTO/layout/alignment), not logic Results: - Winning boxes: E4-1 (+3.51%), E4-2 (+21.83%), E5-1 (+3.35%) → adopted - Frozen boxes: E5-2, E5-4, E6, E7 → kept with ENV gates (doc as assets) - Phase 5 cumulative progress: +6.43% on MIXED profile Documentation updates: - PHASE5_E7_FROZEN_BOX_PRUNE_AB_TEST_RESULTS.md: Final NO-GO record - PHASE5_E7_FROZEN_BOX_PRUNE_NEXT_INSTRUCTIONS.md: E7 conclusion Next phase planning: - PHASE_ML2_CHATGPT_QUESTIONNAIRE_FASTLANE.md: Design consultation template - Candidates: dedup new boundaries, PGO/layout optimization feasibility 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-14 08:56:09 +09:00
Moe Charm (CI)	6d38511318	Phase 5: E7 prune no-go (keep frozen boxes); add clean-env runner	2025-12-14 08:11:20 +09:00
Moe Charm (CI)	f92be5f541	Phase 5: freeze E6 env snapshot shape (no-go)	2025-12-14 07:18:59 +09:00
Moe Charm (CI)	4124c86d99	Phase 5: freeze E5-4 malloc tiny direct (neutral)	2025-12-14 06:59:35 +09:00
Moe Charm (CI)	580e7f4fa3	Phase 5 E5-3: Candidate Analysis (All DEFERRED) + E5-4 Instructions E5-3 Analysis Results: - free_tiny_fast_cold (7.14%): DEFER - cold path, low ROI - unified_cache_push (3.39%): DEFER - already optimized - hakmem_env_snapshot_enabled (2.97%): DEFER - low headroom Key Insight: perf self% is time-weighted, not frequency-weighted. Cold paths appear hot but have low total impact. Next: E5-4 (Malloc Tiny Direct Path) - Apply E5-1 winning pattern to malloc side - Target: tiny_alloc_gate_fast() gate tax elimination - ENV gate: HAKMEM_MALLOC_TINY_DIRECT=0/1 Files added: - docs/analysis/PHASE5_E5_3_ANALYSIS_AND_RECOMMENDATIONS.md - docs/analysis/PHASE5_E5_4_MALLOC_TINY_DIRECT_NEXT_INSTRUCTIONS.md - core/box/free_cold_shape_env_box.{h,c} (research box, not tested) - core/box/free_cold_shape_stats_box.{h,c} (research box, not tested) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-14 06:44:04 +09:00
Moe Charm (CI)	f7b18aaf13	Phase 5 E5-2: Header Write-Once (NEUTRAL, FROZEN) Target: tiny_region_id_write_header (3.35% self%) - Hypothesis: Headers redundant for reused blocks - Strategy: Write headers ONCE at refill boundary, skip in hot alloc Implementation: - ENV gate: HAKMEM_TINY_HEADER_WRITE_ONCE=0/1 (default 0) - core/box/tiny_header_write_once_env_box.h: ENV gate - core/box/tiny_header_write_once_stats_box.h: Stats counters - core/box/tiny_header_box.h: Added tiny_header_finalize_alloc() - core/front/tiny_unified_cache.c: Prefill at 3 refill sites - core/box/tiny_front_hot_box.h: Use finalize function A/B Test Results (Mixed, 10-run, 20M iters): - Baseline (WRITE_ONCE=0): 44.22M ops/s (mean), 44.53M ops/s (median) - Optimized (WRITE_ONCE=1): 44.42M ops/s (mean), 44.36M ops/s (median) - Improvement: +0.45% mean, -0.38% median Decision: NEUTRAL (within ±1.0% threshold) - Action: FREEZE as research box (default OFF, do not promote) Root Cause Analysis: - Header writes are NOT redundant - existing code writes only when needed - Branch overhead (~4 cycles) cancels savings (~3-5 cycles) - perf self% ≠ optimization ROI (3.35% target → +0.45% gain) Key Lessons: 1. Verify assumptions before optimizing (inspect code paths) 2. Hot spot self% measures time IN function, not savings from REMOVING it 3. Branch overhead matters (even "simple" checks add cycles) Positive Outcome: - StdDev reduced 50% (0.96M → 0.48M) - more stable performance Health Check: PASS (all profiles) Next Candidates: - free_tiny_fast_cold: 7.14% self% - unified_cache_push: 3.39% self% - hakmem_env_snapshot_enabled: 2.97% self% Deliverables: - docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_DESIGN.md - docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_AB_TEST_RESULTS.md - CURRENT_TASK.md (E5-2 complete, FROZEN) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-14 06:22:25 +09:00
Moe Charm (CI)	75e20b29cc	Phase 5 E5-1: Promote to preset + next target instructions E5-1 Promotion: - Added HAKMEM_FREE_TINY_DIRECT=1 to MIXED_TINYV3_C7_SAFE preset - Updated ENV_PROFILE_PRESETS.md with rollback instructions - Rollback: HAKMEM_FREE_TINY_DIRECT=0 A/B Test Clarification: - Documented bench_setenv_default vs export ENV=0 interaction - bench_setenv_default only sets if ENV is unset - To force OFF in A/B: use value that differs from default Next Target Selection (E5-2 vs E5-3): - E5-2: Header write reduction (tiny_region_id_write_header) - E5-3: ENV snapshot gate shape optimization - Decision requires fresh perf profile on new baseline Deliverables: - docs/analysis/PHASE5_E5_1_FREE_TINY_DIRECT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE5_E5_NEXT_INSTRUCTIONS.md (updated) - docs/analysis/ENV_PROFILE_PRESETS.md (E5-1 added) - docs/analysis/PHASE5_E5_1_FREE_TINY_DIRECT_1_AB_TEST_RESULTS.md (clarified) - CURRENT_TASK.md (progress links) - docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md (progress links) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-14 05:59:43 +09:00
Moe Charm (CI)	8875132134	Phase 5 E5-1: Free Tiny Direct Path (+3.35% GO) Target: Consolidate free() wrapper overhead (29.56% combined) - free() wrapper: 21.67% self% - free_tiny_fast_cold(): 7.89% self% Strategy: Single header check in wrapper → direct call to free_tiny_fast() - Eliminates redundant header validation (validated twice before) - Bypasses cold path routing for Tiny allocations - High coverage: 48% of frees in Mixed workload are Tiny Implementation: - ENV gate: HAKMEM_FREE_TINY_DIRECT=0/1 (default 0) - core/box/free_tiny_direct_env_box.h: ENV gate - core/box/free_tiny_direct_stats_box.h: Stats counters - core/box/hak_wrappers.inc.h: Wrapper integration (lines 593-625) Safety gates: - Page boundary guard ((ptr & 0xFFF) != 0) - Tiny magic validation ((header & 0xF0) == 0xA0) - Class bounds check (class_idx < 8) - Fail-fast fallback to existing paths A/B Test Results (Mixed, 10-run, 20M iters): - Baseline (DIRECT=0): 44.38M ops/s (mean), 44.45M ops/s (median) - Optimized (DIRECT=1): 45.87M ops/s (mean), 45.95M ops/s (median) - Improvement: +3.35% mean, +3.36% median Decision: GO (+3.35% >= +1.0% threshold) - 3rd consecutive success with consolidation/deduplication pattern - E4-1: +3.51%, E4-2: +21.83%, E5-1: +3.35% - Health check: PASS (all profiles) Phase 5 Cumulative: - E4 Combined: +6.43% - E5-1: +3.35% - Estimated total: ~+10% Deliverables: - docs/analysis/PHASE5_E5_COMPREHENSIVE_ANALYSIS.md - docs/analysis/PHASE5_E5_1_FREE_TINY_DIRECT_1_DESIGN.md - docs/analysis/PHASE5_E5_1_FREE_TINY_DIRECT_1_AB_TEST_RESULTS.md - CURRENT_TASK.md (E5-1 complete) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-14 05:52:32 +09:00
Moe Charm (CI)	6cdbd815ab	Phase 5 E4 Combined: E4-1 + E4-2 (+6.43% GO, baseline consolidated) Combined A/B Test Results (10-run Mixed): - Baseline (both OFF): 44.48M ops/s (mean), 44.39M ops/s (median) - Optimized (both ON): 47.34M ops/s (mean), 47.38M ops/s (median) - Improvement: +6.43% mean, +6.74% median Interaction Analysis: - E4-1 alone: +3.51% (measured in separate session) - E4-2 alone: +21.83% (measured in separate session) - Combined: +6.43% (measured in same binary) - Pattern: SUBADDITIVE (overlapping bottlenecks) Key Finding: Single-binary incremental gain is the accurate metric - E4-1 and E4-2 target overlapping TLS/branch resources - Individual measurements were from different baselines/sessions - Combined measurement (same binary, both flags) shows true progress Phase 5 Total Progress: - Original baseline (session start): 35.74M ops/s - Combined optimized: 47.34M ops/s - Total gain: +32.4% (cross-session, reference only) - Same-binary gain: +6.43% (E4-1+E4-2 both ON vs both OFF) New Baseline Perf Profile (47.0M ops/s): - free: 37.56% self% (still top hotspot) - tiny_alloc_gate_fast: 13.73% (reduced from 19.50%) - malloc: 12.95% (reduced from 16.13%) - tiny_region_id_write_header: 6.97% (header write tax) - hakmem_env_snapshot_enabled: 4.29% (ENV overhead visible) Health Check: PASS - MIXED_TINYV3_C7_SAFE: 42.3M ops/s - C6_HEAVY_LEGACY_POOLV1: 20.9M ops/s Phase 5 E5 Candidates (from perf profile): - E5-1: free() path internals (37.56% self%) - E5-2: Header write reduction (6.97% self%) - E5-3: ENV snapshot overhead (4.29% self%) Deliverables: - docs/analysis/PHASE5_E4_COMBINED_AB_TEST_RESULTS.md - docs/analysis/PHASE5_E5_NEXT_INSTRUCTIONS.md - CURRENT_TASK.md (E4 combined complete, E5 candidates) - docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md (E5 pointer) - perf.data.e4combined (perf profile data) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-14 05:36:57 +09:00
Moe Charm (CI)	5528612f2a	Phase 5 E4-2: Malloc Wrapper ENV Snapshot (+21.83% GO, ADOPTED) Target: Consolidate malloc wrapper TLS reads + eliminate function calls - malloc (16.13%) + tiny_alloc_gate_fast (19.50%) = 35.63% combined - Strategy: E4-1 success pattern + function call elimination Implementation: - ENV gate: HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0/1 (default 0) - core/box/malloc_wrapper_env_snapshot_box.{h,c}: New box - Consolidates multiple TLS reads → 1 TLS read - Pre-caches tiny_max_size() == 256 (eliminates function call) - Lazy init with probe window (bench_profile putenv sync) - core/box/hak_wrappers.inc.h: Integration in malloc() wrapper - Makefile: Add malloc_wrapper_env_snapshot_box.o to all targets A/B Test Results (Mixed, 10-run, 20M iters): - Baseline (SNAPSHOT=0): 35.74M ops/s (mean), 35.75M ops/s (median) - Optimized (SNAPSHOT=1): 43.54M ops/s (mean), 43.92M ops/s (median) - Improvement: +21.83% mean, +22.86% median (+7.80M ops/s) Decision: GO (+21.83% >> +1.0% threshold, 21.8x over) - Why 6.2x better than E4-1 (+3.51%)? - Higher malloc call frequency (allocation-heavy workload) - Function call elimination (tiny_max_size pre-cached) - Larger target: 35.63% vs free's 25.26% - Health check: PASS (all profiles) - Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset Phase 5 Cumulative (estimated): - E1 (ENV Snapshot): +3.92% - E4-1 (Free Wrapper Snapshot): +3.51% - E4-2 (Malloc Wrapper Snapshot): +21.83% - Estimated combined: ~+30% (needs validation) Next Steps: - Combined A/B test (E4-1 + E4-2 simultaneously) - Measure actual cumulative effect - Profile new baseline for next optimization targets Deliverables: - docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_DESIGN.md - docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_AB_TEST_RESULTS.md - docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md (next) - docs/analysis/ENV_PROFILE_PRESETS.md (E4-2 added) - CURRENT_TASK.md (E4-2 complete) - core/bench_profile.h (E4-2 promoted to default) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-14 05:13:29 +09:00
Moe Charm (CI)	4a070d8a14	Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED) Target: Consolidate free wrapper TLS reads (2→1) - free() is 25.26% self% (top hot spot) - Strategy: Apply E1 success pattern (ENV snapshot) to free path Implementation: - ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0) - core/box/free_wrapper_env_snapshot_box.{h,c}: New box - Consolidates 2 TLS reads → 1 TLS read (50% reduction) - Reduces 4 branches → 3 branches (25% reduction) - Lazy init with probe window (bench_profile putenv sync) - core/box/hak_wrappers.inc.h: Integration in free() wrapper - Makefile: Add free_wrapper_env_snapshot_box.o to all targets A/B Test Results (Mixed, 10-run, 20M iters): - Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median) - Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median) - Improvement: +3.51% mean, +4.07% median Decision: GO (+3.51% >= +1.0% threshold) - Exceeded conservative estimate (+1.5% → +3.51%) - Similar efficiency to E1 (+3.92%) - Health check: PASS (all profiles) - Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset Phase 5 Cumulative: - E1 (ENV Snapshot): +3.92% - E4-1 (Free Wrapper Snapshot): +3.51% - Total Phase 4-5: ~+7.5% E3-4 Correction: - Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN - Initial A/B showed +4.75%, but investigation revealed: - Branch prediction hint mismatch (UNLIKELY with always-true) - Retest confirmed -1.78% regression - Root cause: __builtin_expect(..., 0) with ctor_mode==1 - Decision: Freeze as research box (default OFF) - Learning: Branch hints need careful tuning, TLS consolidation safer Deliverables: - docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md - docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next) - docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md - docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected) - CURRENT_TASK.md (E4-1 complete, E3-4 frozen) - core/bench_profile.h (E4-1 promoted to default) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-14 04:24:34 +09:00
Moe Charm (CI)	21e2e4ac2b	Phase 4 E3-4: ENV Constructor Init (+4.75% GO) Target: Eliminate E1 lazy init check overhead (3.22% self%) - E1 consolidated ENV gates but lazy check remained in hot path - Strategy: __attribute__((constructor(101))) for pre-main init Implementation: - ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box) - core/box/hakmem_env_snapshot_box.c: Constructor function added - Reads ENV before main() when CTOR=1 - Refresh also syncs gate state for bench_profile putenv - core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check - CTOR=1 fast path: direct global read (no lazy branch) - CTOR=0 fallback: legacy lazy init (rollback safe) - Branch hints adjusted for default OFF baseline A/B Test Results (Mixed, 10-run, 20M iters, E1=1): - Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median) - Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median) - Improvement: +4.75% mean, +4.35% median Decision: GO (+4.75% >> +0.5% threshold) - Expected +0.5-1.5%, achieved +4.75% - Lazy init branch overhead was larger than expected - Action: Keep as research box (default OFF), evaluate promotion Phase 4 Cumulative: - E1 (ENV Snapshot): +3.92% - E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen) - E3-4 (Constructor Init): +4.75% - Total Phase 4: ~+8.5% Deliverables: - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md - docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md - scripts/verify_health_profiles.sh (sanity check script) - CURRENT_TASK.md (E3-4 complete, next instructions) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-14 02:57:35 +09:00
Moe Charm (CI)	6a6744d065	Phase 4 E2: Alloc Per-Class FastPath - NEUTRAL (-0.21%) A/B Test Results (Mixed, 10-run, 20M iters): - Baseline (DUALHOT=0): 45.40M ops/s (mean), 45.51M ops/s (median) - Optimized (DUALHOT=1): 45.30M ops/s (mean), 45.22M ops/s (median) - Improvement: -0.21% mean, -0.62% median Decision: NEUTRAL (within ±1.0% noise threshold) Action: FREEZE as research box (default OFF, no promotion) Key Findings: - C0-C3 fast path adds branch overhead without measurable benefit - Unlike FREE path (+13%), ALLOC path already has optimized route caching - Phase 3 C3 static routing eliminated route lookup overhead - Additional per-class specialization doesn't reduce existing cost Root Cause: - Free DUALHOT skips expensive policy_snapshot() + tiny_route_for_class() - Alloc DUALHOT adds C0-C3 branch but route already cached (Phase 3 C3) - Net effect: Branch cost ≈ Route savings → neutral Conclusion: Alloc route optimization has reached diminishing returns Cumulative Status: - Phase 4 E1: +3.92% (GO, research box) - Phase 4 E2: -0.21% (NEUTRAL, frozen) Files: - CURRENT_TASK.md: Updated with E2 results - docs/analysis/PHASE4_E2_ALLOC_PER_CLASS_FASTPATH_AB_TEST_RESULTS.md: Full A/B test report 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-14 01:54:21 +09:00
Moe Charm (CI)	7f3ff6c7e6	Phase 4: E1 docs + E2 next instructions	2025-12-14 01:46:18 +09:00
Moe Charm (CI)	42ba23fbd0	Phase 4 E1: env snapshot consolidation docs	2025-12-14 00:48:03 +09:00
Moe Charm (CI)	11b0e3f32b	Phase 4 D3: alloc gate shape (env-gated)	2025-12-14 00:26:57 +09:00
Moe Charm (CI)	b40aff290e	Phase 4 D3 Design: Alloc Gate Shape	2025-12-14 00:05:11 +09:00
Moe Charm (CI)	141cd8a5be	Phase 3 Closure & Phase 4 Preparation Summary: - Phase 3 optimization complete (cumulative +8.93%) - D1 promoted to default (HAKMEM_FREE_STATIC_ROUTE=1, +2.19%) - D2 frozen (NO-GO, -1.44% regression) - Phase 4 instructions prepared (D3/Alloc Gate Specialization) Results: B3 (Routing shape): +2.89% B4 (Wrapper split): +1.47% C3 (Static routing): +2.20% C1 (TLS prefetch): NEUTRAL (-0.34%, research box) C2 (Metadata cache): NEUTRAL (-0.45%, research box) D1 (Free route cache): +2.19% (now default) D2 (Wrapper env cache): NO-GO (-1.44%, frozen) MID_V3 fix: +13% (structural) Total Phase 2-3 gain: ~8.93% (37.5M → 51M ops/s) Updated: - CURRENT_TASK.md: Phase 3 final results + D3 conditions - ENV_PROFILE_PRESETS.md: Active optimizations listed - PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md: Phase 3→4 transition - PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md: D3 execution plan - PHASE3_BASELINE_AND_CANDIDATES.md: Post-validation status Next phase: Phase 4 D3 - Alloc Gate Specialization - Requires: tiny_alloc_gate_fast self% ≥5% from perf - Design SSOT: PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md - Execution: PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-13 23:47:19 +09:00
Moe Charm (CI)	50bded8c85	Phase 3 Finalization: D1 20-run validation, D2 frozen, baseline established Summary: - D1 (Free route cache): 20-run validation → PROMOTED TO DEFAULT - Baseline (20-run, ROUTE=0): 46.30M ops/s (mean), 46.30M (median) - Optimized (20-run, ROUTE=1): 47.32M ops/s (mean), 47.39M (median) - Mean gain: +2.19%, Median gain: +2.37% - Decision: GO (both criteria met: mean >= +1.0%, median >= +0.0%) - Implementation: Added HAKMEM_FREE_STATIC_ROUTE=1 to MIXED preset - D2 (Wrapper env cache): FROZEN - Previous result: -1.44% regression (TLS overhead > benefit) - Status: Research box (do not pursue further) - Default: OFF (not included in MIXED_TINYV3_C7_SAFE preset) - Baseline Phase 3: 46.04M ops/s (Mixed, 10-run, 2025-12-13) Cumulative Gains (Phase 2-3): B3: +2.89%, B4: +1.47%, C3: +2.20%, D1: +2.19% Total: ~7.6-8.9% (conservative: 7.6%, multiplicative: 8.93%) MID_V3 fix: +13% (structural change, Mixed OFF by default) Documentation Updates: - PHASE3_FINALIZATION_SUMMARY.md: Comprehensive Phase 3 report - PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md: D1/D2 final status - PHASE3_D1_FREE_ROUTE_CACHE_1_DESIGN.md: 20-run validation results - PHASE3_D2_WRAPPER_ENV_CACHE_1_DESIGN.md: FROZEN status - ENV_PROFILE_PRESETS.md: D1 ADOPT, D2 FROZEN - PHASE3_BASELINE_AND_CANDIDATES.md: Post-D1/D2 status - CURRENT_TASK.md: Phase 3 complete summary Next: - D3 requires perf validation (tiny_alloc_gate_fast self% ≥5%) - Or Phase 4 planning if no more D3-class targets - Current active optimizations: B3, B4, C3, D1, MID_V3 fix Files Changed: - docs/analysis/PHASE3_FINALIZATION_SUMMARY.md (new, 580+ lines) - docs/analysis/*.md (6 files updated with D1/D2 results) - CURRENT_TASK.md (Phase 3 status update) - analyze_d1_results.py (statistical analysis script) - core/bench_profile.h (D1 promoted to default in MIXED preset) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-13 22:42:22 +09:00
Moe Charm (CI)	19056282b6	Phase 3 D2: Wrapper Env Cache - [DECISION: NO-GO] Target: Reduce wrapper_env_cfg() overhead in malloc/free hot path - Strategy: Cache wrapper env configuration pointer in TLS - Approach: Fast pointer cache (TLS caches const wrapper_env_cfg_t*) Implementation: - core/box/wrapper_env_cache_env_box.h: ENV gate (HAKMEM_WRAP_ENV_CACHE) - core/box/wrapper_env_cache_box.h: TLS cache layer (wrapper_env_cfg_fast) - core/box/hak_wrappers.inc.h: Integration into malloc/free hot paths - ENV gate: HAKMEM_WRAP_ENV_CACHE=0/1 (default OFF) A/B Test Results (Mixed, 10-run, 20M iters): - Baseline (D2=0): 46.52M ops/s (avg), 46.47M ops/s (median) - Optimized (D2=1): 45.85M ops/s (avg), 45.98M ops/s (median) - Improvement: avg -1.44%, median -1.05% (DECISION: NO-GO) Analysis: - Regression cause: TLS cache adds overhead (branch + TLS access) - wrapper_env_cfg() is already minimal (pointer return after simple check) - Adding TLS caching layer makes it worse, not better - Branch prediction penalty outweighs any potential savings Cumulative Phase 2-3: - B3: +2.89%, B4: +1.47%, C3: +2.20% - D1: +1.06% (opt-in), D2: -1.44% (NO-GO) - Total: ~7.2% (excluding D2) Decision: FREEZE as research box (default OFF, regression confirmed) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-13 22:03:27 +09:00
Moe Charm (CI)	f059c0ec83	Phase 3 D1: Free Path Route Cache - DECISION: GO (+1.06%) Target: Eliminate tiny_route_for_class() overhead in free path - Perf finding: 4.39% self + 24.78% children (free bottleneck) - Approach: Use cached route_kind (like Phase 3 C3 for alloc) Implementation: - core/box/tiny_free_route_cache_env_box.h (new) * ENV gate: HAKMEM_FREE_STATIC_ROUTE=0/1 (default OFF) * Lazy initialization with sentinel value - core/front/malloc_tiny_fast.h (modified) * Two call sites: free_tiny_fast_cold() + legacy_fallback path * Direct route lookup: g_tiny_route_class[class_idx] * Fallback safety: Check g_tiny_route_snapshot_done A/B Test Results (Mixed, 10-run): - Baseline (D1=0): 45.13 M ops/s (avg), 45.76 M ops/s (median) - Optimized (D1=1): 45.61 M ops/s (avg), 45.40 M ops/s (median) - Improvement: +1.06% (avg), -0.77% (median) - DECISION: GO (avg gain meets +1.0% threshold) Cumulative Phase 2-3: - B3: +2.89%, B4: +1.47%, C3: +2.20% - D1: +1.06% - Total: ~7.2% cumulative gain 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-13 21:44:00 +09:00
Moe Charm (CI)	d0b931b197	Phase 3 C1: TLS Prefetch Implementation - NEUTRAL Result (Research Box) Step 1 & 2 Complete: - Implemented: core/front/malloc_tiny_fast.h prefetch (lines 264-267, 331-334) - LEGACY path prefetch of g_unified_cache[class_idx] to L1 - ENV gate: HAKMEM_TINY_PREFETCH=0/1 (default OFF) - Conditional: only when prefetch enabled + route_kind == LEGACY - A/B test (Mixed 10-run): PREFETCH=0 (39.33M) → =1 (39.20M) = -0.34% avg - Median: +1.28% (within ±1.0% neutral range) - Result: 🔬 NEUTRAL (research box, default OFF) Decision: FREEZE as research box - Average -0.34% suggests prefetch overhead > benefit - Prefetch timing too late (after route_kind selection) - TLS cache access is already fast (head/tail indices) - Actual memory wait happens at slots[] array access (after prefetch) Technical Learning: - Prefetch effectiveness depends on L1 miss rate at access time - Inserting prefetch after route selection may be too late - Future approach: move prefetch earlier or use different target Next: Phase 3 C2 (Metadata Cache Optimization, expected +5-10%) 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-13 19:01:57 +09:00
Moe Charm (CI)	d54893ea1d	Phase 3 C3: Static Routing A/B Test ADOPT (+2.20% Mixed gain) Step 2 & 3 Complete: - A/B test (Mixed 10-run): STATIC_ROUTE=0 (38.91M) → =1 (39.77M) = +2.20% avg - Median gain: +1.98% - Result: ✅ GO (exceeds +1.0% threshold) - Decision: ✅ ADOPT into MIXED_TINYV3_C7_SAFE preset - bench_profile.h line 77: HAKMEM_TINY_STATIC_ROUTE=1 default - Learner auto-disables static route when HAKMEM_SMALL_LEARNER_V7_ENABLED=1 Implementation Summary: - core/box/tiny_static_route_box.{h,c}: Research box (Step 1A) - core/front/malloc_tiny_fast.h: Route lookup integration (Step 1B, lines 249-256) - core/bench_profile.h: Bench sync + preset adoption Cumulative Phase 2-3 Gains: - B3 (Routing shape): +2.89% - B4 (Wrapper split): +1.47% - C3 (Static routing): +2.20% - Total: ~6.8% (35.2M → ~39.8M ops/s) Next: Phase 3 C1 (TLS Prefetch, expected +2-4%) 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-13 18:46:11 +09:00
Moe Charm (CI)	4c4796a1f8	Phase 2 B4: Documentation & Instruction Creation (Phase 2→3 Transition) Documentation Created: - docs/analysis/PHASE2_STRUCTURAL_CHANGES_NEXT_INSTRUCTIONS.md: Phase 2 completion report (B3+B4 cumulative +4.4%) - docs/analysis/PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md: Phase 3 kickoff instructions (C3 Static Routing prioritized) Verification Completed: - ✅ HAKMEM_WRAP_SHAPE=1 promoted to preset (core/bench_profile.h:67) - ✅ wrapper_env_refresh_from_env() implemented (core/box/wrapper_env_box.c:49-64) - ✅ malloc_cold() lock_depth symmetry confirmed (g_hakmem_lock_depth-- on every return path) - ✅ A/B test result: Mixed +1.47% (≥+1.0% GO threshold) Summary: B3 routing shape: +2.89% B4 wrapper shape: +1.47% ───────────────── Estimated total: ~+4.4% Next Phase: Phase 3 (Cache Locality, +12-22%) - Priority: C3 (Static Routing) - bypass policy_snapshot, +5-8% expected - Profile: recommended to identify malloc/policy_snapshot hot spots with perf top 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-13 17:32:34 +09:00
Moe Charm (CI)	d9991f39ff	Phase ALLOC-TINY-FAST-DUALHOT-1 & Optimization Roadmap Update Add comprehensive design docs and research boxes: - docs/analysis/ALLOC_TINY_FAST_DUALHOT_1_DESIGN.md: ALLOC DUALHOT investigation - docs/analysis/FREE_TINY_FAST_DUALHOT_1_DESIGN.md: FREE DUALHOT final specs - docs/analysis/FREE_TINY_FAST_HOTCOLD_OPT_1_DESIGN.md: Hot/Cold split research - docs/analysis/POOL_MID_INUSE_DEFERRED_DN_BATCH_DESIGN.md: Deferred batching design - docs/analysis/POOL_MID_INUSE_DEFERRED_REGRESSION_ANALYSIS.md: Stats overhead findings - docs/analysis/MID_DESC_CACHE_BENCHMARK_2025-12-12.md: Cache measurement results - docs/analysis/LAST_MATCH_CACHE_IMPLEMENTATION.md: TLS cache investigation Research boxes (SS page table): - core/box/ss_pt_env_box.h: HAKMEM_SS_LOOKUP_KIND gate - core/box/ss_pt_types_box.h: 2-level page table structures - core/box/ss_pt_lookup_box.h: ss_pt_lookup() implementation - core/box/ss_pt_register_box.h: Page table registration - core/box/ss_pt_impl.c: Global definitions Updates: - docs/specs/ENV_VARS_COMPLETE.md: HOTCOLD, DEFERRED, SS_LOOKUP env vars - core/box/hak_free_api.inc.h: FREE-DISPATCH-SSOT integration - core/box/pool_mid_inuse_deferred_box.h: Deferred API updates - core/box/pool_mid_inuse_deferred_stats_box.h: Stats collection - core/hakmem_super_registry: SS page table integration Current Status: - FREE-TINY-FAST-DUALHOT-1: +13% improvement, ready for adoption - ALLOC-TINY-FAST-DUALHOT-1: -2% regression, frozen as research box - Next: Optimization roadmap per ROI (mimalloc gap 2.5x) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-13 05:35:46 +09:00
Moe Charm (CI)	b2724e6f5d	Phase ALLOC-TINY-FAST-DUALHOT-1: WIP (regression), FREE DUALHOT confirmed +13% ALLOC-TINY-FAST-DUALHOT-1 (this phase): - Implementation: malloc_tiny_fast() C0-C3 early-exit with policy snapshot skip - ENV: HAKMEM_TINY_ALLOC_DUALHOT=0/1 (default OFF) - A/B Result: -1.17% median regression (Mixed, 10-run) - Root Cause: Branch prediction penalty on C4-C7 outweighs policy skip benefit - Decision: Freeze as research box (default OFF) - Difference from FREE: ALLOC requires structural changes (per-class paths) FREE-TINY-FAST-DUALHOT-1 (verified): - A/B Confirmation: +13.00% improvement (42.08M → 47.81M ops/s, Mixed, 10-run) - Success Criteria: +2% target ACHIEVED - Health Check: PASS (verify_health_profiles.sh, ENV OFF/ON) - Safety: HAKMEM_TINY_LARSON_FIX guard in place - Decision: Promotion to MIXED_TINYV3_C7_SAFE profile candidate Next Steps: - Profile adoption of FREE DUALHOT for MIXED workload - No further deep-dive on ALLOC optimization (deferred to future phases) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-13 05:10:45 +09:00
Moe Charm (CI)	0a7400d7d3	Phase ALLOC-TINY-FAST-DUALHOT-1: C0-C3 alloc direct path (WIP, -2% regression) Add C0-C3 early-exit optimization to malloc_tiny_fast() similar to FREE-TINY-FAST-DUALHOT-1. Skip policy snapshot for C0-C3 classes. A/B Result (10-run, Mixed TINYV3_C7_SAFE): - Baseline: 47.27M ops/s (median) - Optimized: 46.10M ops/s (median) - Result: -2.00% (regression, needs investigation) ENV: HAKMEM_TINY_ALLOC_DUALHOT=0/1 (default OFF) Implementation: - core/front/malloc_tiny_fast.h: alloc_dualhot_enabled() + early-exit - Design: docs/analysis/ALLOC_TINY_FAST_DUALHOT_1_DESIGN.md Status: Research box (default OFF), needs root cause analysis 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-13 04:28:52 +09:00
Moe Charm (CI)	fe70e3baf5	Phase MID-V35-HOTPATH-OPT-1 complete: +7.3% on C6-heavy Step 0: Geometry SSOT - New: core/box/smallobject_mid_v35_geom_box.h (L1/L2 consistency) - Fix: C6 slots/page 102→128 in L2 (smallobject_cold_iface_mid_v3.c) - Applied: smallobject_mid_v35.c, smallobject_segment_mid_v3.c Step 1-3: ENV gates for hotpath optimizations - New: core/box/mid_v35_hotpath_env_box.h * HAKMEM_MID_V35_HEADER_PREFILL (default 0) * HAKMEM_MID_V35_HOT_COUNTS (default 1) * HAKMEM_MID_V35_C6_FASTPATH (default 0) - Implementation: smallobject_mid_v35.c * Header prefill at refill boundary (Step 1) * Gated alloc_count++ in hot path (Step 2) * C6 specialized fast path with constant slot_size (Step 3) A/B Results: C6-heavy (257–768B): 8.75M→9.39M ops/s (+7.3%, 5-run mean) ✅ Mixed (16–1024B): 9.98M→9.96M ops/s (-0.2%, within noise) ✓ Decision: FROZEN - defaults OFF, ON recommended for C6-heavy, Mixed kept as-is Documentation: ENV_PROFILE_PRESETS.md updated 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-12 19:19:25 +09:00
Moe Charm (CI)	e95e61f0ff	Phase POLICY-FAST-PATH-V2 complete + MID-V35-HOTPATH-OPT-1 design ## Phase POLICY-FAST-PATH-V2 (FROZEN) - Implementation complete: free_policy_fast_v2_box.h + malloc_tiny_fast.h integration - A/B Results: - Mixed (ws=400): -1.6% regression ❌ (branch cost > skip benefit) - C6-heavy (ws=200): +5.4% improvement ✅ - Decision: Default OFF, FROZEN (ws<300 / C6-heavy research only) - Learning: Large WS causes branch misprediction to dominate ## Phase 3-GRADUATE + ENV probe fix - 64-probe retry for getenv() stability during bench_profile putenv() - C6 ULTRA intrusive freelist: FROZEN (research box) ## Phase MID-V35-HOTPATH-OPT-1-DESIGN - Design doc for next optimization target - Target: MID v3.5 alloc/free hot path (C5-C6) - Boxes: Stats Gate, TLS Layout, Boundary Check elimination - Expected: +3-9% on Mixed mainline Files: - core/box/free_policy_fast_v2_box.h (new) - core/box/free_path_stats_box.h/c (policy_fast_v2_skip counter) - core/front/malloc_tiny_fast.h (fast-path integration) - docs/analysis/MID_V35_HOTPATH_OPT_1_DESIGN.md (new) - docs/analysis/PHASE_3_GRADUATE_*.md (new) - CURRENT_TASK.md (phase status update) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-12 18:40:08 +09:00
Moe Charm (CI)	1a8652a91a	Phase TLS-UNIFY-3: C6 intrusive freelist implementation (complete) Implement C6 ULTRA intrusive LIFO freelist with ENV gating: - Single-linked LIFO using next pointer at USER+1 offset - tiny_next_store/tiny_next_load for pointer access (single source of truth) - Segment learning via ss_fast_lookup (per-class seg_base/seg_end) - ENV gate: HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL (default OFF) - Counters: c6_ifl_push/pop/fallback in FREE_PATH_STATS Files: - core/box/tiny_ultra_tls_box.h: Added c6_head field for intrusive LIFO - core/box/tiny_ultra_tls_box.c: Pop/push with intrusive branching (case 6) - core/box/tiny_c6_ultra_intrusive_env_box.h: ENV gate (new) - core/box/tiny_c6_intrusive_freelist_box.h: L1 pure LIFO (new) - core/tiny_debug_ring.h: C6_IFL events - core/box/free_path_stats_box.h/c: c6_ifl_* counters A/B Test Results (1M iterations, ws=200, 257-512B): - ENV_OFF (array): 56.6 Mop/s avg - ENV_ON (intrusive): 57.6 Mop/s avg (+1.8%, within noise) - Counters verified: c6_ifl_push=265890, c6_ifl_pop=265815, fallback=0 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-12 16:26:42 +09:00
Moe Charm (CI)	bf83612b97	Phase v11a-4: Add Mixed mainline benchmark results Results: - C6-heavy (257-512B): +5.1% (34.0M → 35.8M ops/s) - Mixed 16-1024B: +4.4% (38.6M → 40.3M ops/s) Conclusion: C6→MID v3.5 is an adoption candidate for the Mixed mainline. Confirmed a +4-5% improvement, exceeding the predicted +1-3%. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 07:17:52 +09:00
Moe Charm (CI)	212739607a	Phase v11a-3: MID v3.5 Activation (Build Complete) Integrated MID v3.5 into active code path, making it available for C5/C6/C7 routing. Key Changes: - Policy Box: Added SMALL_ROUTE_MID_V35 with ENV gates (HAKMEM_MID_V35_ENABLED, HAKMEM_MID_V35_CLASSES) - HotBox: Implemented small_mid_v35_alloc/free with TLS-cached page allocation - Front Gate: Wired MID_V35 routing into malloc_tiny_fast.h (priority: ULTRA > MID_V35 > V7) - Build: Added core/smallobject_mid_v35.o to all object lists Architecture: - Slot sizes: C5=384B, C6=512B, C7=1024B - Page size: 64KB (170/128/64 slots) - Integration: ColdIface v2 (refill/retire), Stats v2 (observation), Learner v2 (dormant) Status: Build successful, ready for A/B benchmarking Next: Performance validation (C6-heavy, C5+C6-only, Mixed benchmarks) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 06:52:14 +09:00
Moe Charm (CI)	57313f7822	Phase v11a: Architecture design and implementation roadmap documents Create comprehensive design specifications for Phase v11a (MID v3.5): 1. PHASE_V11A_DESIGN_MID_V3.5.md - Decision rationale: Option A chosen (consolidation vs expansion) - MID v3.5 architecture: unified 257-1KiB box - Role clarification: v7 frozen as research preset - Learner v2 scope: multi-class tracking, C5 ratio primary decision - Segment design decision: shared segment (Design B) vs separate segments - Stats expansion: per-class efficiency metrics - API changes: minimal, backward compatible 2. PHASE_V11A_IMPLEMENTATION_ROADMAP.md - Detailed task breakdown for v11a-1, v11a-2, v11a-3 - File structure: new boxes, implementation files, modified files - Concrete function signatures and integration points - Benchmark commands and expected performance - Dependency graph and implementation order - Build/Makefile changes needed - Testing strategy and regression checks Key Design Decisions: - Multi-class segment uses shared 2MiB segment (not separate) - Per-class free page stacks for efficient refill - Stats published per-page retire (for Learner ingestion) - TLS version-based cache invalidation (atomic policy updates) - Backward compatibility: Policy v2 extends v1 interface Next Step: Phase v11a-2 (Core Implementation) - Implement segment creation/alloc/free - Add C7 support to existing MID_v3 - Stats recording during page retire - Learner aggregation logic 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-12 06:20:14 +09:00
Moe Charm (CI)	397aea0131	Phase v10: Freeze v7 as C5/C6-only research preset Documentation: Baseline fixed per Phase v10 - HAKMEM_V2_GENERATION_SUMMARY.md: - v7 repositioned as "C5/C6-only research box" - Mixed baseline: HAKMEM_SMALL_HEAP_V7_ENABLED=0 (OFF) - Added Phase v7-7 (Learner), Phase v10 (legacy removal) - Learner performance: +127% on C5/C6 workload - Size class table: segregated Mixed (v7 OFF) vs C5/C6 preset (v7 ON) - ENV_PROFILE_PRESETS.md: - MIXED_TINYV3_C7_SAFE: explicitly v7 OFF (Mixed baseline) - NEW: C5_C6_SMALL_HEAP_V7_LEARNER profile - Learner dynamic route switching documentation - Test commands and expected performance (38-39M ops/s) - Phase v10 deprecation notice (v3/v4/v5 removed) Purpose: - Set clear baseline: v7 OFF for Mixed, ON for C5/C6 benchmarks - Document Learner preset for future reference - No code changes (docs-only checkpoint) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 06:13:15 +09:00
Moe Charm (CI)	79674c9390	Phase v10: Remove legacy v3/v4/v5 implementations Removal strategy: Deprecate routes by disabling ENV-based routing - v3/v4/v5 enum types kept for binary compatibility - small_heap_v3/v4/v5_enabled() always return 0 - small_heap_v3/v4/v5_class_enabled() always return 0 - Any v3/v4/v5 ENVs are silently ignored, routes to LEGACY Changes: - core/box/smallobject_hotbox_v3_env_box.h: stub functions - core/box/smallobject_hotbox_v4_env_box.h: stub functions - core/box/smallobject_v5_env_box.h: stub functions - core/front/malloc_tiny_fast.h: remove alloc/free cases (20+ lines) Benefits: - Cleaner routing logic (v6/v7 only for SmallObject) - 20+ lines deleted from hot path validation - No behavioral change (routes were rarely used) Performance: No regression expected (v3/v4/v5 already disabled by default) Next: Set Learner v7 default ON, production testing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 06:09:12 +09:00
Moe Charm (CI)	ed7e1285eb	Phase v7-6: Mixed A/B + Learner design (workload-dependent routes) Mixed 16-1024B A/B results: - v7 OFF: 41.3M ops/s (baseline) - v7 C6-only: 41.5M ops/s (+0.5%) - v7 C5+C6: 38.0M ops/s (-8.0%) ← C5 hurts in Mixed! Key finding: C5 route is workload-dependent - C5+C6 heavy (257-768B): C5+C6 v7 is +4.3% faster - Mixed 16-1024B: C5+C6 v7 is -8.0% slower Learner design: - SmallLearnerStatsV7 aggregate structure - small_policy_v7_update_from_learner() API - L3 updates snapshot, L1/L0 reads only C4 v7 and Intrusive LIFO marked as on-hold. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 05:18:44 +09:00
Moe Charm (CI)	d5aa3110c6	Phase v7-5b: C5+C6 multi-class expansion (+4.3% improvement) - Add C5 (256B blocks) support alongside C6 (512B blocks) - Same segment shared between C5/C6 (page_meta.class_idx distinguishes) - SMALL_V7_CLASS_SUPPORTED() macro for class validation - Extend small_v7_block_size() for C5 (switch statement) A/B Result: C6-only v7 avg 7.64M ops/s → C5+C6 v7 avg 7.97M ops/s (+4.3%) Criteria: C6 protected ✅, C5 net positive ✅, TLS bloat none ✅ ENV: HAKMEM_SMALL_HEAP_V7_CLASSES=0x60 (bit5+bit6) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 05:11:02 +09:00
Moe Charm (CI)	17ceed619c	Phase v7-5a: Hot path stats removal (C6 v7 extreme optimization) - Remove per-page stats from hot path (alloc_count, free_count, live_current) - Add ENV-gated global atomic stats (HAKMEM_V7_HOT_STATS) - Stats now collected only at retire time (cold path) - Header write kept at alloc time (freelist overlaps block[0]) A/B Result: -4.3% overhead → ±0% (target: legacy ±2%) v7 OFF avg: 9.26M ops/s, v7 ON avg: 9.27M ops/s (+0.15%) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 04:51:17 +09:00
Moe Charm (CI)	580e8f57f7	docs: V7 Architecture Decision Matrix (mimalloc competitiveness evaluation) - mimalloc vs HAKMEM v7 feature-by-feature comparison table - v7-5a vs v7-5b decision criteria framework - Intrusive LIFO adoption study - TLS cache hit rate targets - Measurement plan for the overhead breakdown Conclusion: do v7-5a (C6 extreme optimization) first Goal: mimalloc-equivalent performance with Intrusive LIFO + Headerless 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 04:36:37 +09:00
Moe Charm (CI)	ea905b2ccb	docs: HAKMEM v2 generation summary and Phase v7-4 completion - Add HAKMEM_V2_GENERATION_SUMMARY.md: comprehensive overview of v2 generation - Update CURRENT_TASK.md: 'v2 generation complete' section - Update SMALLOBJECT_V7_DESIGN.md: Phase v7-4 completion notes + v7-5 candidates v2 generation freeze: ULTRA (FROZEN) / MID_v3 (stable) / v7 (research, code freeze) Next: HakORune / JoinIR priority, HAKMEM resumes at v7-5 (multi-class expansion) Layer structure (L0-L3) established, Box Theory implementation patterns confirmed. Design documents serve as maps for future v7 second chapter. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-12 04:00:55 +09:00
Moe Charm (CI)	8143e8b797	Phase v7-4: Introduce Policy Box (clarify the L3 layer and rebuild the frontend core) - SmallPolicyV7 Box: placed in the L3 Policy layer, centralizes route decisions - Route kind enum: SMALL_ROUTE_ULTRA / V7 / MID_V3 / LEGACY - ENV priority (fixed): ULTRA > v7 > MID_v3 > LEGACY - Frontend integration: v7 routing now goes through the Policy Box (staged migration) - Legacy compatibility: the existing tiny_route_env_box.h remains in use alongside it Box Theory layer structure: - L0: ULTRA (C4-C7, FROZEN) - L1: SmallObject v7 (research box) - L1': MID_v3 / LEGACY (fallback) - L2: Segment / RegionId - L3: Policy / Stats / Learner ← Policy Box added here Frontend now follows clean "size→class→route_kind→switch" pattern. ENV variables read once at Policy init, not scattered across frontend. Future: ULTRA/MID_v3/LEGACY consolidation, Learner integration, flexible priority. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-12 03:50:58 +09:00
Moe Charm (CI)	0af409260d	docs: Phase v7-2 results + Phase v7-3 design (TLS fast path + page_meta cache)	2025-12-12 03:13:13 +09:00
Moe Charm (CI)	39a3c53dbc	Phase v7-2: SmallObject v7 C6-only implementation with RegionIdBox integration - SmallSegment_v7: 2MiB segment with TLS slot and free page stack - ColdIface_v7: Page refill/retire between HotBox and SegmentBox - HotBox_v7: Full C6-only alloc/free with header writing (HEADER_MAGIC|class_idx) - Free path early-exit: Check v7 route BEFORE ss_fast_lookup (separate mmap segment) - RegionIdBox: Register v7 segment for ptr->region lookup - Benchmark: v7 ON ~54.5M ops/s (-7% overhead vs 58.6M legacy baseline) v7 correctly balances alloc/free counts and page lifecycle. RegionIdBox overhead identified as primary cost driver. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-12 03:12:28 +09:00
Moe Charm (CI)	a8d0ab06fc	MID-V3: Specialize to 257-768B, exclude C7 (ULTRA handles 1KB) Role separation based on ultrathink analysis: - MID v3: 257-768B only (C6 only, HAKMEM_MID_V3_CLASSES=0x40) - C7 ULTRA: 769-1024B only (existing optimized path) Changes: - core/box/hak_alloc_api.inc.h: Remove C7 route, restrict to 257-768B - core/box/mid_hotbox_v3_env_box.h: Update ENV comments - docs/analysis/MID_POOL_V3_DESIGN.md: Add performance results & role - CURRENT_TASK.md: Document MID-V3 completion & role separation Verified: - 257-768B with v3 ON: 1,199,526 ops/s (+1.7% vs baseline) - 769-1024B with v3 ON: 1,181,254 ops/s (same as baseline, C7 excluded) - C7 correctly routes to ULTRA instead of MID v3 Rationale: C7-only showed -11% regression, but C6/mixed showed +11-19% improvement. Specializing to mid-range (257-768B) leverages v3 strengths while keeping C7 on the proven ULTRA path. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-12 01:14:13 +09:00
Moe Charm (CI)	2b35de2123	MID-V3 Phase 0-2: Design doc, type skeleton, and RegionIdBox API - MID-V3-0: Create design doc (docs/analysis/MID_POOL_V3_DESIGN.md) - Lane vs Page role clarification - Phase plan and checklist - MID-V3-1: Type skeleton + ENV - MidHotBoxV3, MidLaneV3, MidPageDescV3 structures - ENV controls (HAKMEM_MID_V3_ENABLED, HAKMEM_MID_V3_CLASSES) - Cold interface declarations - MID-V3-2 (V6-HDR-2): RegionIdBox Registration API completion - RegionEntry structure with sorted array storage - Binary search lookup implementation - region_id_register_v6() / region_id_unregister_v6() - REGION_KIND_MID_V3 added to enum 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 00:46:25 +09:00
Moe Charm (CI)	df216b6901	Phase V6-HDR-3: SmallSegmentV6 actual allocation & RegionIdBox Registration Implementation: 1. mmap allocation for SmallSegmentV6 was already implemented in v6-0 2. small_heap_ctx_v6() calls region_id_register_v6_segment() when it obtains a segment 3. Implemented TLS-scoped segment registration logic in region_id_v6.c: - Cache segment info in four static __thread variables - region_id_register_v6_segment(): records the segment base/end in TLS - region_id_lookup_v6(): performs the TLS segment range check first - O(1) lookup achieved by updating the TLS cache 4. Added the SmallSegmentV6 type include & function declarations to region_id_v6_box.h 5. Added a region_id_observe_lookup() call to small_v6_region_observe_validate() Effects: - RegionIdBox can now formally return the SMALL_V6 classification under the Headerless design - Simple TLS-scoped registration mechanism (supports multithreading) - Fast path: TLS segment range check -> page_meta lookup - Fallback path: dynamic detection via the existing small_page_meta_v6_of() - Latency: O(1) TLS cache hits cover most v6 alloc/free calls 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 23:51:48 +09:00


100 Commits