Commit Graph

22 Commits

Author SHA1 Message Date
75e20b29cc Phase 5 E5-1: Promote to preset + next target instructions
E5-1 Promotion:
- Added HAKMEM_FREE_TINY_DIRECT=1 to MIXED_TINYV3_C7_SAFE preset
- Updated ENV_PROFILE_PRESETS.md with rollback instructions
- Rollback: HAKMEM_FREE_TINY_DIRECT=0

A/B Test Clarification:
- Documented bench_setenv_default vs export ENV=0 interaction
- bench_setenv_default only sets if ENV is unset
- To force OFF in A/B: use value that differs from default

Next Target Selection (E5-2 vs E5-3):
- E5-2: Header write reduction (tiny_region_id_write_header)
- E5-3: ENV snapshot gate shape optimization
- Decision requires fresh perf profile on new baseline

Deliverables:
- docs/analysis/PHASE5_E5_1_FREE_TINY_DIRECT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E5_NEXT_INSTRUCTIONS.md (updated)
- docs/analysis/ENV_PROFILE_PRESETS.md (E5-1 added)
- docs/analysis/PHASE5_E5_1_FREE_TINY_DIRECT_1_AB_TEST_RESULTS.md (clarified)
- CURRENT_TASK.md (progress links)
- docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md (progress links)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 05:59:43 +09:00
5528612f2a Phase 5 E4-2: Malloc Wrapper ENV Snapshot (+21.83% GO, ADOPTED)
Target: Consolidate malloc wrapper TLS reads + eliminate function calls
- malloc (16.13%) + tiny_alloc_gate_fast (19.50%) = 35.63% combined
- Strategy: E4-1 success pattern + function call elimination

Implementation:
- ENV gate: HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/malloc_wrapper_env_snapshot_box.{h,c}: New box
  - Consolidates multiple TLS reads → 1 TLS read
  - Pre-caches tiny_max_size() == 256 (eliminates function call)
  - Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in malloc() wrapper
- Makefile: Add malloc_wrapper_env_snapshot_box.o to all targets

A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 35.74M ops/s (mean), 35.75M ops/s (median)
- Optimized (SNAPSHOT=1): 43.54M ops/s (mean), 43.92M ops/s (median)
- Improvement: +21.83% mean, +22.86% median (+7.80M ops/s)

Decision: GO (+21.83% >> +1.0% threshold, 21.8x over)
- Why 6.2x better than E4-1 (+3.51%)?
  - Higher malloc call frequency (allocation-heavy workload)
  - Function call elimination (tiny_max_size pre-cached)
  - Larger target: 35.63% vs free's 25.26%
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset

Phase 5 Cumulative (estimated):
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- E4-2 (Malloc Wrapper Snapshot): +21.83%
- Estimated combined: ~+30% (needs validation)

Next Steps:
- Combined A/B test (E4-1 + E4-2 simultaneously)
- Measure actual cumulative effect
- Profile new baseline for next optimization targets

Deliverables:
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_DESIGN.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_AB_TEST_RESULTS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-2 added)
- CURRENT_TASK.md (E4-2 complete)
- core/bench_profile.h (E4-2 promoted to default)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 05:13:29 +09:00
4a070d8a14 Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED)
Target: Consolidate free wrapper TLS reads (2→1)
- free() is 25.26% self% (top hot spot)
- Strategy: Apply E1 success pattern (ENV snapshot) to free path

Implementation:
- ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/free_wrapper_env_snapshot_box.{h,c}: New box
  - Consolidates 2 TLS reads → 1 TLS read (50% reduction)
  - Reduces 4 branches → 3 branches (25% reduction)
  - Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in free() wrapper
- Makefile: Add free_wrapper_env_snapshot_box.o to all targets

A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median)
- Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median)
- Improvement: +3.51% mean, +4.07% median

Decision: GO (+3.51% >= +1.0% threshold)
- Exceeded conservative estimate (+1.5% → +3.51%)
- Similar efficiency to E1 (+3.92%)
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset

Phase 5 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- Total Phase 4-5: ~+7.5%

E3-4 Correction:
- Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN
- Initial A/B showed +4.75%, but investigation revealed:
  - Branch prediction hint mismatch (UNLIKELY with always-true)
  - Retest confirmed -1.78% regression
  - Root cause: __builtin_expect(..., 0) with ctor_mode==1
- Decision: Freeze as research box (default OFF)
- Learning: Branch hints need careful tuning, TLS consolidation safer

Deliverables:
- docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md
- docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected)
- CURRENT_TASK.md (E4-1 complete, E3-4 frozen)
- core/bench_profile.h (E4-1 promoted to default)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 04:24:34 +09:00
21e2e4ac2b Phase 4 E3-4: ENV Constructor Init (+4.75% GO)
Target: Eliminate E1 lazy init check overhead (3.22% self%)
- E1 consolidated ENV gates but lazy check remained in hot path
- Strategy: __attribute__((constructor(101))) for pre-main init

Implementation:
- ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box)
- core/box/hakmem_env_snapshot_box.c: Constructor function added
  - Reads ENV before main() when CTOR=1
  - Refresh also syncs gate state for bench_profile putenv
- core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check
  - CTOR=1 fast path: direct global read (no lazy branch)
  - CTOR=0 fallback: legacy lazy init (rollback safe)
  - Branch hints adjusted for default OFF baseline

A/B Test Results (Mixed, 10-run, 20M iters, E1=1):
- Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median)
- Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median)
- Improvement: +4.75% mean, +4.35% median

Decision: GO (+4.75% >> +0.5% threshold)
- Expected +0.5-1.5%, achieved +4.75%
- Lazy init branch overhead was larger than expected
- Action: Keep as research box (default OFF), evaluate promotion

Phase 4 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen)
- E3-4 (Constructor Init): +4.75%
- Total Phase 4: ~+8.5%

Deliverables:
- docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md
- docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md
- docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md
- scripts/verify_health_profiles.sh (sanity check script)
- CURRENT_TASK.md (E3-4 complete, next instructions)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 02:57:35 +09:00
7f3ff6c7e6 Phase 4: E1 docs + E2 next instructions 2025-12-14 01:46:18 +09:00
11b0e3f32b Phase 4 D3: alloc gate shape (env-gated) 2025-12-14 00:26:57 +09:00
141cd8a5be Phase 3 Closure & Phase 4 Preparation
Summary:
- Phase 3 optimization complete (cumulative +8.93%)
- D1 promoted to default (HAKMEM_FREE_STATIC_ROUTE=1, +2.19%)
- D2 frozen (NO-GO, -1.44% regression)
- Phase 4 instructions prepared (D3/Alloc Gate Specialization)

Results:
  B3 (Routing shape): +2.89%
  B4 (Wrapper split): +1.47%
  C3 (Static routing): +2.20%
  C1 (TLS prefetch): NEUTRAL (-0.34%, research box)
  C2 (Metadata cache): NEUTRAL (-0.45%, research box)
  D1 (Free route cache): +2.19% (now default)
  D2 (Wrapper env cache): NO-GO (-1.44%, frozen)
  MID_V3 fix: +13% (structural)

Total Phase 2-3 gain: ~8.93% (37.5M → 51M ops/s)

Updated:
- CURRENT_TASK.md: Phase 3 final results + D3 conditions
- ENV_PROFILE_PRESETS.md: Active optimizations listed
- PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md: Phase 3→4 transition
- PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md: D3 execution plan
- PHASE3_BASELINE_AND_CANDIDATES.md: Post-validation status

Next phase: Phase 4 D3 - Alloc Gate Specialization
- Requires: tiny_alloc_gate_fast self% ≥5% from perf
- Design SSOT: PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md
- Execution: PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 23:47:19 +09:00
50bded8c85 Phase 3 Finalization: D1 20-run validation, D2 frozen, baseline established
Summary:
- D1 (Free route cache): 20-run validation → PROMOTED TO DEFAULT
  - Baseline (20-run, ROUTE=0): 46.30M ops/s (mean), 46.30M (median)
  - Optimized (20-run, ROUTE=1): 47.32M ops/s (mean), 47.39M (median)
  - Mean gain: +2.19%, Median gain: +2.37%
  - Decision: GO (both criteria met: mean >= +1.0%, median >= +0.0%)
  - Implementation: Added HAKMEM_FREE_STATIC_ROUTE=1 to MIXED preset

- D2 (Wrapper env cache): FROZEN
  - Previous result: -1.44% regression (TLS overhead > benefit)
  - Status: Research box (do not pursue further)
  - Default: OFF (not included in MIXED_TINYV3_C7_SAFE preset)

- Baseline Phase 3: 46.04M ops/s (Mixed, 10-run, 2025-12-13)

Cumulative Gains (Phase 2-3):
  B3: +2.89%, B4: +1.47%, C3: +2.20%, D1: +2.19%
  Total: ~7.6-8.9% (conservative: 7.6%, multiplicative: 8.93%)
  MID_V3 fix: +13% (structural change, Mixed OFF by default)

Documentation Updates:
  - PHASE3_FINALIZATION_SUMMARY.md: Comprehensive Phase 3 report
  - PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md: D1/D2 final status
  - PHASE3_D1_FREE_ROUTE_CACHE_1_DESIGN.md: 20-run validation results
  - PHASE3_D2_WRAPPER_ENV_CACHE_1_DESIGN.md: FROZEN status
  - ENV_PROFILE_PRESETS.md: D1 ADOPT, D2 FROZEN
  - PHASE3_BASELINE_AND_CANDIDATES.md: Post-D1/D2 status
  - CURRENT_TASK.md: Phase 3 complete summary

Next:
  - D3 requires perf validation (tiny_alloc_gate_fast self% ≥5%)
  - Or Phase 4 planning if no more D3-class targets
  - Current active optimizations: B3, B4, C3, D1, MID_V3 fix

Files Changed:
  - docs/analysis/PHASE3_FINALIZATION_SUMMARY.md (new, 580+ lines)
  - docs/analysis/*.md (6 files updated with D1/D2 results)
  - CURRENT_TASK.md (Phase 3 status update)
  - analyze_d1_results.py (statistical analysis script)
  - core/bench_profile.h (D1 promoted to default in MIXED preset)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 22:42:22 +09:00
19056282b6 Phase 3 D2: Wrapper Env Cache - [DECISION: NO-GO]
Target: Reduce wrapper_env_cfg() overhead in malloc/free hot path
- Strategy: Cache wrapper env configuration pointer in TLS
- Approach: Fast pointer cache (TLS caches const wrapper_env_cfg_t*)

Implementation:
- core/box/wrapper_env_cache_env_box.h: ENV gate (HAKMEM_WRAP_ENV_CACHE)
- core/box/wrapper_env_cache_box.h: TLS cache layer (wrapper_env_cfg_fast)
- core/box/hak_wrappers.inc.h: Integration into malloc/free hot paths
- ENV gate: HAKMEM_WRAP_ENV_CACHE=0/1 (default OFF)

A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (D2=0): 46.52M ops/s (avg), 46.47M ops/s (median)
- Optimized (D2=1): 45.85M ops/s (avg), 45.98M ops/s (median)
- Improvement: avg -1.44%, median -1.05% (DECISION: NO-GO)

Analysis:
- Regression cause: TLS cache adds overhead (branch + TLS access)
- wrapper_env_cfg() is already minimal (pointer return after simple check)
- Adding TLS caching layer makes it worse, not better
- Branch prediction penalty outweighs any potential savings

Cumulative Phase 2-3:
- B3: +2.89%, B4: +1.47%, C3: +2.20%
- D1: +1.06% (opt-in), D2: -1.44% (NO-GO)
- Total: ~7.2% (excluding D2)

Decision: FREEZE as research box (default OFF, regression confirmed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 22:03:27 +09:00
d0b931b197 Phase 3 C1: TLS Prefetch Implementation - NEUTRAL Result (Research Box)
Step 1 & 2 Complete:
- Implemented: core/front/malloc_tiny_fast.h prefetch (lines 264-267, 331-334)
  - LEGACY path prefetch of g_unified_cache[class_idx] to L1
  - ENV gate: HAKMEM_TINY_PREFETCH=0/1 (default OFF)
  - Conditional: only when prefetch enabled + route_kind == LEGACY

- A/B test (Mixed 10-run): PREFETCH=0 (39.33M) → =1 (39.20M) = -0.34% avg
  - Median: +1.28% (within ±1.0% neutral range)
  - Result: 🔬 NEUTRAL (research box, default OFF)

Decision: FREEZE as research box
- Average -0.34% suggests prefetch overhead > benefit
- Prefetch timing too late (after route_kind selection)
- TLS cache access is already fast (head/tail indices)
- Actual memory wait happens at slots[] array access (after prefetch)

Technical Learning:
- Prefetch effectiveness depends on L1 miss rate at access time
- Inserting prefetch after route selection may be too late
- Future approach: move prefetch earlier or use different target

Next: Phase 3 C2 (Metadata Cache Optimization, expected +5-10%)

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 19:01:57 +09:00
d54893ea1d Phase 3 C3: Static Routing A/B Test ADOPT (+2.20% Mixed gain)
Step 2 & 3 Complete:
- A/B test (Mixed 10-run): STATIC_ROUTE=0 (38.91M) → =1 (39.77M) = +2.20% avg
  - Median gain: +1.98%
  - Result:  GO (exceeds +1.0% threshold)

- Decision:  ADOPT into MIXED_TINYV3_C7_SAFE preset
  - bench_profile.h line 77: HAKMEM_TINY_STATIC_ROUTE=1 default
  - Learner auto-disables static route when HAKMEM_SMALL_LEARNER_V7_ENABLED=1

Implementation Summary:
- core/box/tiny_static_route_box.{h,c}: Research box (Step 1A)
- core/front/malloc_tiny_fast.h: Route lookup integration (Step 1B, lines 249-256)
- core/bench_profile.h: Bench sync + preset adoption

Cumulative Phase 2-3 Gains:
- B3 (Routing shape): +2.89%
- B4 (Wrapper split): +1.47%
- C3 (Static routing): +2.20%
- Total: ~6.8% (35.2M → ~39.8M ops/s)

Next: Phase 3 C1 (TLS Prefetch, expected +2-4%)

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 18:46:11 +09:00
fe70e3baf5 Phase MID-V35-HOTPATH-OPT-1 complete: +7.3% on C6-heavy
Step 0: Geometry SSOT
  - New: core/box/smallobject_mid_v35_geom_box.h (L1/L2 consistency)
  - Fix: C6 slots/page 102→128 in L2 (smallobject_cold_iface_mid_v3.c)
  - Applied: smallobject_mid_v35.c, smallobject_segment_mid_v3.c

Step 1-3: ENV gates for hotpath optimizations
  - New: core/box/mid_v35_hotpath_env_box.h
    * HAKMEM_MID_V35_HEADER_PREFILL (default 0)
    * HAKMEM_MID_V35_HOT_COUNTS (default 1)
    * HAKMEM_MID_V35_C6_FASTPATH (default 0)
  - Implementation: smallobject_mid_v35.c
    * Header prefill at refill boundary (Step 1)
    * Gated alloc_count++ in hot path (Step 2)
    * C6 specialized fast path with constant slot_size (Step 3)

A/B Results:
  C6-heavy (257–768B): 8.75M→9.39M ops/s (+7.3%, 5-run mean) 
  Mixed (16–1024B): 9.98M→9.96M ops/s (-0.2%, within noise) ✓

Decision: FROZEN - defaults OFF, C6-heavy推奨ON, Mixed現状維持
Documentation: ENV_PROFILE_PRESETS.md updated

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-12 19:19:25 +09:00
e95e61f0ff Phase POLICY-FAST-PATH-V2 complete + MID-V35-HOTPATH-OPT-1 design
## Phase POLICY-FAST-PATH-V2 (FROZEN)
- Implementation complete: free_policy_fast_v2_box.h + malloc_tiny_fast.h integration
- A/B Results:
  - Mixed (ws=400): -1.6% regression  (branch cost > skip benefit)
  - C6-heavy (ws=200): +5.4% improvement 
- Decision: Default OFF, FROZEN (ws<300 / C6-heavy research only)
- Learning: Large WS causes branch misprediction to dominate

## Phase 3-GRADUATE + ENV probe fix
- 64-probe retry for getenv() stability during bench_profile putenv()
- C6 ULTRA intrusive freelist: FROZEN (research box)

## Phase MID-V35-HOTPATH-OPT-1-DESIGN
- Design doc for next optimization target
- Target: MID v3.5 alloc/free hot path (C5-C6)
- Boxes: Stats Gate, TLS Layout, Boundary Check elimination
- Expected: +3-9% on Mixed mainline

Files:
- core/box/free_policy_fast_v2_box.h (new)
- core/box/free_path_stats_box.h/c (policy_fast_v2_skip counter)
- core/front/malloc_tiny_fast.h (fast-path integration)
- docs/analysis/MID_V35_HOTPATH_OPT_1_DESIGN.md (new)
- docs/analysis/PHASE_3_GRADUATE_*.md (new)
- CURRENT_TASK.md (phase status update)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-12 18:40:08 +09:00
1a8652a91a Phase TLS-UNIFY-3: C6 intrusive freelist implementation (完成)
Implement C6 ULTRA intrusive LIFO freelist with ENV gating:
- Single-linked LIFO using next pointer at USER+1 offset
- tiny_next_store/tiny_next_load for pointer access (single source of truth)
- Segment learning via ss_fast_lookup (per-class seg_base/seg_end)
- ENV gate: HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL (default OFF)
- Counters: c6_ifl_push/pop/fallback in FREE_PATH_STATS

Files:
- core/box/tiny_ultra_tls_box.h: Added c6_head field for intrusive LIFO
- core/box/tiny_ultra_tls_box.c: Pop/push with intrusive branching (case 6)
- core/box/tiny_c6_ultra_intrusive_env_box.h: ENV gate (new)
- core/box/tiny_c6_intrusive_freelist_box.h: L1 pure LIFO (new)
- core/tiny_debug_ring.h: C6_IFL events
- core/box/free_path_stats_box.h/c: c6_ifl_* counters

A/B Test Results (1M iterations, ws=200, 257-512B):
- ENV_OFF (array): 56.6 Mop/s avg
- ENV_ON (intrusive): 57.6 Mop/s avg (+1.8%, within noise)
- Counters verified: c6_ifl_push=265890, c6_ifl_pop=265815, fallback=0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-12 16:26:42 +09:00
397aea0131 Phase v10: Freeze v7 as C5/C6-only research preset
Documentation: Baseline fixed per Phase v10
- HAKMEM_V2_GENERATION_SUMMARY.md:
  - v7 repositioned as 「C5/C6 專用研究箱」
  - Mixed baseline: HAKMEM_SMALL_HEAP_V7_ENABLED=0 (OFF)
  - Added Phase v7-7 (Learner), Phase v10 (legacy removal)
  - Learner performance: +127% on C5/C6 workload
  - Size class table: segregated Mixed (v7 OFF) vs C5/C6 preset (v7 ON)

- ENV_PROFILE_PRESETS.md:
  - MIXED_TINYV3_C7_SAFE: explicitly v7 OFF (Mixed baseline)
  - NEW: C5_C6_SMALL_HEAP_V7_LEARNER profile
    - Learner dynamic route switching documentation
    - Test commands and expected performance (38-39M ops/s)
    - Phase v10 deprecation notice (v3/v4/v5 removed)

Purpose:
- Set clear baseline: v7 OFF for Mixed, ON for C5/C6 benchmarks
- Document Learner preset for future reference
- No code changes (docs-only checkpoint)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 06:13:15 +09:00
39a3c53dbc Phase v7-2: SmallObject v7 C6-only implementation with RegionIdBox integration
- SmallSegment_v7: 2MiB segment with TLS slot and free page stack
- ColdIface_v7: Page refill/retire between HotBox and SegmentBox
- HotBox_v7: Full C6-only alloc/free with header writing (HEADER_MAGIC|class_idx)
- Free path early-exit: Check v7 route BEFORE ss_fast_lookup (separate mmap segment)
- RegionIdBox: Register v7 segment for ptr->region lookup
- Benchmark: v7 ON ~54.5M ops/s (-7% overhead vs 58.6M legacy baseline)

v7 correctly balances alloc/free counts and page lifecycle.
RegionIdBox overhead identified as primary cost driver.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-12 03:12:28 +09:00
9c24bebf08 Phase v5-1: SmallObject v5 C6-only route stub 接続
- tiny_route_env_box.h: TINY_ROUTE_SMALL_HEAP_V5 enum 追加、route snapshot で C6→v5 分岐
- malloc_tiny_fast.h: alloc/free switch に v5 case 追加(v1/pool fallback)
- smallobject_hotbox_v5.c: stub 実装(alloc は NULL 返却、free は no-op)
- smallobject_hotbox_v5_box.h: 関数 signature に ctx パラメータ追加
- Makefile: core/smallobject_hotbox_v5.o をリンクリストに追加
- ENV_PROFILE_PRESETS.md: v5-1 プリセット追記
- CURRENT_TASK.md: Phase v5-1 完了記録

**特性**:
- ENV: HAKMEM_SMALL_HEAP_V5_ENABLED=1 / HAKMEM_SMALL_HEAP_V5_CLASSES=0x40 で opt-in
- テスト結果: C6-heavy (v5 OFF 15.5M → v5 ON 16.4M ops/s, 正常), Mixed 47.2M ops/s, SEGV/assert なし
- 挙動は v1/pool fallback と同じ(実装は v5-2)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 03:25:37 +09:00
3b4449d773 Phase v4-mid-1: C6-only v4 route + page_meta_of() Fail-Fast validation
Implementation:
- SMALL_SEGMENT_V4_* constants (SIZE=2MiB, PAGE_SIZE=64KiB, MAGIC=0xDEADBEEF)
- smallsegment_v4_page_meta_of(): O(1) mask+shift lookup with magic validation
  - Computes segment base: addr & ~(2MiB - 1)
  - Verifies SmallSegment magic number
  - Calculates page_idx: (addr - seg_base) >> PAGE_SHIFT (16)
  - Returns non-NULL sentinel for now (full page_meta[] in Phase v4-mid-2)

Stubs for C6-only phase:
- small_heap_alloc_fast_v4(): C6 returns NULL → pool v1 fallback
- small_heap_free_fast_v4(): C6 calls page_meta_of() for Fail-Fast, then pool v1 fallback

Documentation:
- ENV_PROFILE_PRESETS.md: Add "C6_ONLY_SMALLOBJECT_V4" research profile
  - HAKMEM_SMALL_HEAP_V4_ENABLED=1, HAKMEM_SMALL_HEAP_V4_CLASSES=0x40
  - Expected: Throughput ≈ 28–29M ops/s (same as v1)

Build:
- ビルド成功(警告のみ)
- Backward compatible, alloc/free stubs fall back to pool v1

Sanity:
- C6-heavy with v4 opt-in: segv/assert なし
- page_meta_of() lookup working correctly
- Performance unchanged (expected for stub phase)

Status:
- C6-only v4 route now available via ENV opt-in
- Phase v4-mid-2: SmallHeapCtx v4 full implementation with A/B

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-10 23:37:45 +09:00
2a13478dc7 Optimize C6 heavy and C7 ultra performance analysis with refined design refinements
- Update environment profile presets and visibility analysis
- Enhance small object and tiny segment v4 box implementations
- Refine C7 ultra and C6 heavy allocation strategies
- Add comprehensive performance metrics and design documentation

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-10 22:57:26 +09:00
d576116484 Document current Mixed baseline throughput and ENV profile 2025-12-10 14:12:13 +09:00
406a2f4d26 Incremental improvements: mid_desc cache, pool hotpath optimization, and doc updates
**Changes:**
- core/box/pool_api.inc.h: Code organization and micro-optimizations
- CURRENT_TASK.md: Updated Phase MD1 (mid_desc TLS cache: +3.2% for C6-heavy)
- docs/analysis files: Various analysis and documentation updates
- AGENTS.md: Agent role clarifications
- TINY_FRONT_V3_FLATTENING_GUIDE.md: Flattening strategy documentation

**Verification:**
- random_mixed_hakmem: 44.8M ops/s (1M iterations, 400 working set)
- No segfaults or assertions across all benchmark variants
- Stable performance across multiple runs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-10 14:00:57 +09:00
acc64f2438 Phase ML1: Pool v1 memset 89.73% overhead 軽量化 (+15.34% improvement)
## Summary
- ChatGPT により bench_profile.h の setenv segfault を修正(RTLD_NEXT 経由に切り替え)
- core/box/pool_zero_mode_box.h 新設:ENV キャッシュ経由で ZERO_MODE を統一管理
- core/hakmem_pool.c で zero mode に応じた memset 制御(FULL/header/off)
- A/B テスト結果:ZERO_MODE=header で +15.34% improvement(1M iterations, C6-heavy)

## Files Modified
- core/box/pool_api.inc.h: pool_zero_mode_box.h include
- core/bench_profile.h: glibc setenv → malloc+putenv(segfault 回避)
- core/hakmem_pool.c: zero mode 参照・制御ロジック
- core/box/pool_zero_mode_box.h (新設): enum/getter
- CURRENT_TASK.md: Phase ML1 結果記載

## Test Results
| Iterations | ZERO_MODE=full | ZERO_MODE=header | Improvement |
|-----------|----------------|-----------------|------------|
| 10K       | 3.06 M ops/s   | 3.17 M ops/s    | +3.65%     |
| 1M        | 23.71 M ops/s  | 27.34 M ops/s   | **+15.34%** |

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-10 09:08:18 +09:00