HAKMEM status memo (compact edition, 2025-12-11)
This file briefly records two things only: what we currently use as the A/B baseline, and which boxes are on the mainline.
Detailed logs of past phases are kept in CURRENT_TASK_ARCHIVE_20251210.md and in the individual docs/analysis/* files.
Current core tasks (notes toward HAKMEM v3 / v7)
- The HAKMEM v2 generation is the "finished Chapter 1" (ULTRA + MID v3 + v6/v7 research boxes). From here on, v2 is the baseline and v3/v7 is where we push further.
- From now on, the "v3 / v7 core tasks" are only these three:
  - Redesign SmallObjectHotBox_v7 as the shared small/mid core
    - Re-read the existing C6-only v7 implementation and SMALLOBJECT_V7_DESIGN.md, and produce a design in which the whole small-to-mid range is handled by a single SmallHeapCtx_v7 (ULTRA stays at L0 as is).
  - Extend PolicyBox v7 to all classes
    - Today only the C6 v7 stub goes through Policy. Eventually all of C2–C7 should be decided by route_kind[class], and tiny_route_env_box.h should be phased out step by step.
  - Decide how to roll the shared "physical layer" (RegionId / Segment / PageStats) out to small/mid/pool
    - Write a design note on how the RegionIdBox/SmallSegment/PageStats patterns built for v6/v7 can be reused for mid/pool v3 (implementation is a separate phase).
- Every other optimization (fine-tuning ULTRA, small tweaks to MID v3 / v7) is a side task. Until the three items above are done, side tasks must not displace the core Gen-2 (v3 core) work.
Phase V7-5a: C6 v7 extreme optimization (hot-path stats removal, 2025-12-12)
Goal
- Bring the -4.3% overhead of C6-only SmallObjectHotBox_v7 (as of Phase v7-3) back to about ±0%, solely by removing stats updates from the hot path.
What was done
- Removed per-page stats updates from the v7 C6 hot path.
  - alloc_count++ / free_count++ / live_current++/-- were moved to the ColdIface path (at refill/retire time).
  - Hot-path stats are enabled only through the ENV gate HAKMEM_V7_HOT_STATS=1 (default OFF).
- Full migration to header-at-carve-time was postponed.
  - The freelist uses block[0] as the next pointer, so "write the header only at carve time" cannot be done cleanly with the current v7 layout. For now we keep writing 1 byte at alloc time.
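The conflict between carve-time header writes and the v7 freelist can be shown in a few lines. This is a minimal sketch under two assumptions: there is a 1-byte header at block[0] whose low nibble is the class index (consistent with the front's `header & 0x0F` read), and the intrusive freelist stores its next pointer starting at block[0]. `HDR_MAGIC`, `blk_push`, `blk_pop` and `alloc_with_header` are hypothetical names, not v7 code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header encoding: high nibble = magic, low nibble = class. */
#define HDR_MAGIC 0xA0u

/* Intrusive freelist: the next pointer is stored in the first bytes of the
 * block, so it overlaps block[0], where the header byte also lives. */
static void *blk_push(void *head, void *blk) {
    memcpy(blk, &head, sizeof head);          /* overwrites block[0] */
    return blk;
}

static void *blk_pop(void **head) {
    void *blk = *head;
    if (blk) memcpy(head, blk, sizeof *head); /* head = blk->next */
    return blk;
}

/* The header is written right after the pop: the first moment block[0]
 * no longer holds a freelist link. */
static void *alloc_with_header(void **head, unsigned class_idx) {
    uint8_t *p = blk_pop(head);
    if (p) p[0] = (uint8_t)(HDR_MAGIC | (class_idx & 0x0Fu));
    return p;
}
```

If the pushed block had a non-NULL successor, block[0] would hold an address byte instead of zero. That byte is just as useless as a header, which is why the write stays on the alloc path.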
Results (C6-heavy bench)
| Metric | v7 OFF (MID v3) | v7 ON (v7-5a) |
|---|---|---|
| Throughput (avg) | 9.26M ops/s | 9.27M ops/s |
| Delta | baseline | +0.15% |
- Met the target (from -4.3% to within ±2%): C6-only v7 is now roughly on par with MID v3 (±0%).
- The v7 C6 core is now "a research box that can compete with MID v3 on performance". It is the foundation for the next phases: multi-class extension, revisiting headerless, and Learner integration.
Phase V7-5b: C5+C6 multi-class extension (2025-12-12)
Goal
- Keep the near-±0% performance reached with C6-only v7 while also putting C5 on the v7 small/mid core, lifting performance in the C5 size band.
What was done
- Extended SmallSegment_v7 / ColdIface_v7 / HotBox_v7 to support C5+C6.
  - Added C5 to the SMALL_V7_CLASS_SUPPORTED() macro.
  - Extended small_v7_block_size() into a switch that handles both C5 and C6.
  - Updated the class validation in the HotBox alloc/free paths to accept both C5 and C6.
- The TLS layout keeps the C6 lane. C5 rides on the v7 small core in a way that causes no TLS bloat, which protects the C6 fast path.
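The macro and switch changes can be pictured as follows. This is a sketch only. The block sizes (256B for C5, 512B for C6) are assumptions chosen for illustration, since the memo does not state v7's real values. `small_v7_class_ok` is a hypothetical helper standing in for the HotBox class validation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block sizes for illustration only; the real v7 values are
 * not stated in this memo. */
#define SMALL_V7_C5_BLOCK_SIZE 256u
#define SMALL_V7_C6_BLOCK_SIZE 512u

/* V7-5b: C5 joins C6 as a supported class. */
#define SMALL_V7_CLASS_SUPPORTED(c) ((c) == 5u || (c) == 6u)

static inline size_t small_v7_block_size(unsigned class_idx) {
    switch (class_idx) {
    case 5: return SMALL_V7_C5_BLOCK_SIZE;
    case 6: return SMALL_V7_C6_BLOCK_SIZE;
    default: return 0;  /* unsupported: caller must take another route */
    }
}

/* Class validation at the HotBox alloc/free entry (hypothetical helper). */
static inline int small_v7_class_ok(unsigned class_idx) {
    return SMALL_V7_CLASS_SUPPORTED(class_idx) && small_v7_block_size(class_idx) != 0;
}
```

With the support check in one macro and the size in one switch, adding a class is a two-line change. Unsupported classes fail closed by returning size 0.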
Results (C6-heavy / C5+C6 v7 profiles)
| Config | Avg Throughput | Delta |
|---|---|---|
| C6-only v7 | 7.64M ops/s | baseline |
| C5+C6 v7 | 7.97M ops/s | +4.3% |
Adoption criteria:
- C6 performance maintained: ✅ (no degradation vs C6-only v7)
- C5 net positive: ✅ (+4.3%)
- TLS bloat: ✅ none (no negative impact on the C6 lane cache hit rate)
→ The v7 small/mid core holds up with two classes (C5+C6) and gains a few percent in the C5 band. Next: carefully evaluate whether to put C4 on v7, while moving on to Learner integration and A/B runs on Mixed 16–1024B.
Phase V7-6 plan: Mixed A/B + Learner integration design
Direction (do this before v7-5c)
- The v4/v5 generations already ran into TLS bloat. The decision on extending v7 to C4 therefore waits until the C5+C6 v7 results have been measured properly on Mixed.
- To that end, the following two items form the core of v7-6:
  - Run an A/B on Mixed 16–1024B, v7 OFF (MID v3 + ULTRA) vs v7 C5+C6 ON, to see how much v7 actually gains on the mainline Mixed profile.
  - Design the interface between SmallPolicyV7 and Stats/Learner, and document the "Stats → Learner → Policy.route_kind[] update" data flow.
Concrete tasks (design phase)
- Mixed A/B:
  - Profile: starting from HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE, run the 16–1024B bench in two variants, C5+C6 v7 ON and OFF.
  - Record: ops/s, the free/alloc path breakdown (ULTRA / v7 / MID v3 / LEGACY) and the C5/C6 hit rates, appended under docs/analysis.
- Learner integration design:
  - Decide the signature and responsibilities of the update API for SmallPolicyV7 (e.g. small_policy_v7_update_from_learner(...)).
  - Add to SMALLOBJECT_V7_DESIGN.md which fields the Learner should read from SmallPageStatsV7 (or the existing Stats Box), and when the snapshot is swapped (L3 only; L1/L0 only read the snapshot).
This phase is mainly about design and organizing the A/B, not about code changes. Larger structural changes, such as the C4 v7 extension or an intrusive LIFO, are decided after seeing these results.
Phase V7-6: Mixed A/B results and C5 routing policy (2025-12-12)
A/B results: Mixed vs C5+C6 only
| Workload | C6-only v7 | C5+C6 v7 | Recommended setting |
|---|---|---|---|
| C5+C6 only (257–768B) | baseline | +4.3% | CLASSES=0x60 |
| Mixed 16–1024B | +0.5% | -8.0% | CLASSES=0x40 |
- On the dedicated C5+C6 workload (a bench that only runs 257–768B), C5+C6 v7 gives a +4.3% improvement.
- On Mixed 16–1024B, however, putting C5 on v7 as well causes an overall -8% regression, so C6-only v7 is the safer choice.
Conclusion: the C5 route is workload-dependent
- For C5-heavy workloads v7 wins, but on Mixed, MID v3 + ULTRA is faster overall.
- Fixing the C5 route through ENV alone is not sustainable. The natural design is to let the Learner switch it dynamically.
Learner integration direction (seed for v7-7 onward)
- Data flow:
  SmallPageStatsV7 (ColdIface inventory)
  → aggregated into SmallLearnerStatsV7 (per-class alloc/free/remote, etc.)
  → PolicyLearner (decides whether the C5 band is heavy)
  → updates SmallPolicyV7.route_kind[C5] to either SMALL_ROUTE_V7 or SMALL_ROUTE_MID_V3.
- Minimal configuration:
  - Target C5 only, and start with a simple rule: if the C5 share of allocations exceeds a fixed threshold (e.g. 30%), route C5 to v7; otherwise route it to MID v3.
  - Keep to the Box Theory rule "L3 swaps the snapshot; L1/L0 only read it".
This phase makes C5's role clear: it is "a class the Learner should choose a route for, not a fixed route". The next big decision is whether to go ahead in v7-7 with an end-to-end design and implementation of this Learner path.
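The minimal rule can be written as a pure decision function that only L3 calls. This is a sketch under stated assumptions: the aggregated stats are a plain per-class allocation array, and "no data yet" keeps C5 on v7. `LearnerStats` and `learner_decide_c5` are hypothetical names. The 30% threshold is the example value from the text.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SMALL_ROUTE_MID_V3 = 0, SMALL_ROUTE_V7 = 1 } SmallRouteKind;

#define LEARNER_NUM_CLASSES 8
#define C5_THRESHOLD_PCT    30u  /* example threshold from the minimal rule */

/* Hypothetical aggregate: per-class allocation counts collected by L3. */
typedef struct {
    uint64_t allocs[LEARNER_NUM_CLASSES];
} LearnerStats;

/* L3 only: derive the C5 route from the C5 share of allocations. */
static SmallRouteKind learner_decide_c5(const LearnerStats *s) {
    uint64_t total = 0;
    for (int c = 0; c < LEARNER_NUM_CLASSES; c++) total += s->allocs[c];
    if (total == 0) return SMALL_ROUTE_V7;  /* no data yet: keep v7 */
    uint64_t pct = s->allocs[5] * 100u / total;
    return pct >= C5_THRESHOLD_PCT ? SMALL_ROUTE_V7 : SMALL_ROUTE_MID_V3;
}
```

Because the function has no side effects, L3 can run it on a frozen snapshot and publish only the resulting route, so L1/L0 never see half-updated stats.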
Phase V7-8: C5 Learner Mixed A/B (2025-12-11)
Goal
- A/B how v7 with the C5 Learner behaves on a dedicated C5/C6 workload and on Mixed 16–1040B. Then decide whether to put C5 on v7 in the mainline or keep the Learner as a research box.
Bench results (summary)
- C5/C6-focused profile (200–500B band)
  - v7 OFF: about 19M ops/s
  - v7+Learner: about 43M ops/s (+126%)
  - The Learner judged the workload C5-heavy and kept C5 on the V7 route.
- Full-range Mixed (16–1040B)
  - v7 OFF: about 27M ops/s
  - v7+Learner: about 25M ops/s (about -7%)
  - C5 share ≈ 28% < threshold 30%, so the Learner switched the C5 route from V7 to MID_V3.
Observations / current decision
- On the dedicated C5/C6 workload, v7+Learner is very strong (2.2×). It is worth keeping "C5+C6 v7 + Learner ON" as a research preset for C5/C6-only benches.
- On full-range Mixed, however, it stays at about -7% even after the route switch. For now we will not turn v7+Learner on permanently in the mainline MIXED_TINYV3_C7_SAFE profile.
- Mainline for now:
  - Mixed: ULTRA + MID v3 as the baseline (v7 OFF, including C6-only; C5 Learner OFF).
  - C5/C6-only profile: HAKMEM_SMALL_HEAP_V7_ENABLED=1, HAKMEM_SMALL_HEAP_V7_CLASSES=0x60 + Learner ON, as an opt-in research box.
Phase V7-7: C5 Learner implementation (dynamic route switching)
What was implemented
- Added the SmallLearnerStatsV7 type and API (smallobject_policy_v7_box.h):
  - SmallLearnerClassStatsV7: per-class v7_allocs, v7_retires, sample_count, etc.
  - SmallLearnerStatsV7: an array of the per-class stats above.
  - API: record_refill(), record_retire(), evaluate(), stats_snapshot() and so on; this interface is called from L3.
- Added stats hooks to ColdIface_v7 (smallobject_cold_iface_v7.c):
  - refill_page() calls record_refill(class_idx, capacity) to count v7 refill events.
  - retire_page() calls record_retire(class_idx, capacity) to count v7 retire events.
- PolicyBox v7 switches the C5 route dynamically (smallobject_policy_v7.c):
  - It receives the C5 v7 usage ratio (C5 v7 allocs / total) from the Learner and compares it against a threshold.
  - Threshold: when the C5 ratio is below 30%, C5 is switched to SMALL_ROUTE_MID_V3; otherwise it stays on SMALL_ROUTE_V7.
  - Evaluation interval: evaluate() runs once every 100 v7 refills.
  - route_kind updates are version-based, so the TLS-side caches stay consistent.
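The counting, periodic evaluation and version-based publication described above fit together as in the sketch below. It makes several assumptions. The refill hook feeds `capacity` into a per-class allocation counter. Counters are cumulative and updated from one thread or under a lock. The route is a single atomic int published together with a version counter. All names are hypothetical and do not match the actual smallobject_policy_v7 code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sketch of the V7-7 flow; not the actual HAKMEM API. */
enum { ROUTE_MID_V3 = 0, ROUTE_V7 = 1 };
#define NUM_CLASSES      8
#define EVAL_INTERVAL    100u  /* evaluate() every 100 v7 refills */
#define C5_THRESHOLD_PCT 30u   /* C5 ratio < 30% -> MID_V3 */

/* L3-owned counters. Plain on purpose: this sketch assumes they are
 * updated from one thread or under the ColdIface lock. */
static uint64_t v7_allocs[NUM_CLASSES];
static uint32_t refills_since_eval;

static _Atomic int      c5_route      = ROUTE_V7;
static _Atomic uint32_t route_version = 0;

static void evaluate(void) {
    uint64_t total = 0;
    for (int c = 0; c < NUM_CLASSES; c++) total += v7_allocs[c];
    if (total == 0) return;
    int next = (v7_allocs[5] * 100u / total) < C5_THRESHOLD_PCT ? ROUTE_MID_V3 : ROUTE_V7;
    if (next != atomic_load(&c5_route)) {
        atomic_store(&c5_route, next);
        atomic_fetch_add(&route_version, 1u);  /* publish: TLS caches reload */
    }
}

/* ColdIface hook: refill_page() reports how many blocks it handed out. */
static void record_refill(int class_idx, uint32_t capacity) {
    v7_allocs[class_idx] += capacity;
    if (++refills_since_eval >= EVAL_INTERVAL) {
        refills_since_eval = 0;
        evaluate();
    }
}

/* Hot path (L1): re-read the route only when the version has moved. */
static _Thread_local uint32_t tls_route_version = UINT32_MAX;
static _Thread_local int      tls_c5_route;

static int c5_route_cached(void) {
    uint32_t v = atomic_load_explicit(&route_version, memory_order_acquire);
    if (v != tls_route_version) {
        tls_c5_route = atomic_load_explicit(&c5_route, memory_order_relaxed);
        tls_route_version = v;
    }
    return tls_c5_route;
}
```

On the hot path the only cost is one acquire load and a compare. Because the counters here are cumulative, a long C6-heavy history delays a switch back to v7; a real Learner might window or decay them.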
Behavior check
- C5+C6-only scenario (50/50 mix):
  - C5 ratio ≈ 50% → C5 stays on v7 (as expected).
- C6-heavy scenario (roughly C6 90%, C5 10%):
  - C5 ratio ≈ 12% → the Learner switches the C5 route from V7 to MID_V3.
  - Example log:
    [LEARNER_V7] C5 route switch: V7 → MID_V3 (C5 ratio=12%, threshold=30%)
This phase wires up the switch "v7 when C5-heavy, MID v3 when only a little C5 is mixed in" as an automatic, end-to-end path. The next phase uses A/B runs and tuning to find out how far this Learner can help on the mainline Mixed 16–1024B profile.
Phase ULTRA wrap-up (2025-12-11)
The Tiny/ULTRA layer is frozen as the "finished generation"
Final result: Mixed 16–1024B = 43.9M ops/s (baseline 30.6M → +43.5%)
Current mainline configuration:
- C4–C7 ULTRA (parasitic TLS cache) cut the share of traffic on the legacy path from 49% to 4.8%
- The v3 backend (alloc_current_hit=100%, free_retire=0.1%) made the backend robust
- Dispatcher/gate snapshots removed ENV/route checks from the hot path
- C7 ULTRA refill: replacing a division with a bit shift gave +11%
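The division-to-shift change works because 1024B C7 blocks are a power of two. The compiler cannot make this substitution by itself when the block size is a runtime variable. The sketch below compares both forms for page capacity and block index. The memo does not say which division in the C7 refill was replaced, so the helper names and the 64KiB page in the usage are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Division form: works for any block size, but costs a hardware divide
 * when block_size is only known at run time. */
static inline uint32_t capacity_div(uint32_t page_size, uint32_t block_size) {
    return page_size / block_size;
}
static inline uint32_t block_idx_div(uintptr_t page_base, uintptr_t p, uint32_t block_size) {
    return (uint32_t)((p - page_base) / block_size);
}

/* Shift form: identical results when block_size == 1u << block_shift
 * (1024B C7 blocks -> shift 10). */
static inline uint32_t capacity_shift(uint32_t page_size, uint32_t block_shift) {
    return page_size >> block_shift;
}
static inline uint32_t block_idx_shift(uintptr_t page_base, uintptr_t p, uint32_t block_shift) {
    return (uint32_t)((p - page_base) >> block_shift);
}
```

For a 64KiB page, both forms give a capacity of 64, and a pointer 5 blocks plus 17 bytes into the page maps to index 5.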
Design maturity:
- Small objects (C2–C7) = ULTRA-optimized (both fast path and slow path)
- v3 backend = logic fully optimized (the remaining 5% is internal cost such as header writes and memcpy)
- Research boxes (v4/v5/v6) are OFF and do not affect the standard profiles
Future large changes go on separate lines:
- Headerless/v6 line: move the header out of band to remove the per-alloc write (1–2%)
- mid/pool v3: a new design to raise C6-heavy from 10M to 20–25M
- Both are to be studied as independent lines that do not affect the Tiny/ULTRA layer
Details: see docs/analysis/PERF_EXEC_SUMMARY_ULTRA_PHASE_20251211.md
Phase MID-V3: Mid/Pool HotBox v3 complete → adopted on the mainline (2025-12-12)
Clarified division of roles
MID v3: 257-768B only (uses C6 only)
C7 ULTRA: 769-1024B only (existing ULTRA path)
With this split, each layer has its own optimized path:
Size Range | Allocator | Performance
---------------|---------------|------------------
0-256B | Tiny/ULTRA | Optimized (frozen)
257-768B | MID v3 | +19.8% (mixed)
769-1024B | C7 ULTRA | Optimized (frozen)
1025B-52KB | Pool | Existing path
52KB+ | Large mmap | Existing path
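The size bands in the table can be read as a plain dispatch function. The sketch only encodes the table's boundaries. It assumes "52KB" means 52 × 1024 bytes and that exactly 52KB still goes to Pool. The real front end dispatches by class index and route snapshots rather than by this chain of comparisons, and the enum names are made up.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ROUTE_TINY_ULTRA,   /* 0-256B     */
    ROUTE_MID_V3,       /* 257-768B   */
    ROUTE_C7_ULTRA,     /* 769-1024B  */
    ROUTE_POOL,         /* 1025B-52KB */
    ROUTE_LARGE_MMAP    /* above 52KB */
} SizeRoute;

/* Assumes "52KB" = 52 * 1024 bytes and that exactly 52KB still goes to Pool. */
static SizeRoute route_for_size(size_t n) {
    if (n <= 256) return ROUTE_TINY_ULTRA;
    if (n <= 768) return ROUTE_MID_V3;
    if (n <= 1024) return ROUTE_C7_ULTRA;
    if (n <= (size_t)52 * 1024) return ROUTE_POOL;
    return ROUTE_LARGE_MMAP;
}
```

Every band boundary is inclusive at its upper end, so 768B goes to MID v3 and 769B to C7 ULTRA, matching the split described above.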
Implementation complete → adopted in the mainline profiles
- ✅ MID-V3-0 to 5: type definitions, RegionIdBox integration, alloc/free implementation
- ✅ MID-V3-6: integration into the main hakmem.c path (boxed as a module)
- ✅ Performance: C6 +11.1%, Mixed (257-768B) +19.8%
- ✅ Role separation: C7 removed from MID v3 and handled entirely by ULTRA
- ✅ Mainline adoption: ON by default in the C6_HEAVY_LEGACY_POOLV1 and MIXED_TINYV3_C7_SAFE profiles
ENV settings (ON by default in the mainline profiles)
# Enabled automatically through a profile:
HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1
# or
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE
# To set explicitly:
HAKMEM_MID_V3_ENABLED=1 # Master switch (ON by default in the profiles)
HAKMEM_MID_V3_CLASSES=0x40 # C6 only (set by default in the profiles)
HAKMEM_MID_V3_DEBUG=1 # Debug logging (opt-in)
Design doc: docs/analysis/MID_POOL_V3_DESIGN.md
Profile doc: docs/analysis/ENV_PROFILE_PRESETS.md
Phase V6-HDR-0: C6-only headerless core design finalized (frozen)
Goal
Now that Tiny/ULTRA is complete, build a minimal core (v6) that demonstrates a headerless design for C6 only. C7 ULTRA is already complete and frozen; v6 is kept separate as a C6-only research line.
4-layer Box Theory (design principles)
┌──────────────────────────────────────────────────────────────┐
│ L0: ULTRA lanes (TinyC7UltraBox, etc.)                       │
│   - C7 ULTRA is frozen / independent of v6                   │
│   - v6 ULTRA (future) is a separate C6-only design           │
├──────────────────────────────────────────────────────────────┤
│ L1: TLS Box (SmallTlsLaneV6 / SmallHeapCtxV6)                │
│   - per-class TLS freelist + current page ptr                │
│   - Role: fast alloc/free (no header writes)                 │
├──────────────────────────────────────────────────────────────┤
│ L2: Segment / ColdIface (SmallSegmentV6 / ColdIfaceV6)       │
│   - page_meta[], segment base/end management                 │
│   - page lifecycle management (refill / retire)              │
├──────────────────────────────────────────────────────────────┤
│ L3: Policy / RegionIdBox / Stats                             │
│   - RegionIdBox: ptr→(region_kind, region_id, page_meta)     │
│   - PageStatsV6: page lifetime summary                       │
│     (occupancy, retire frequency)                            │
│   - Policy: GC / compaction decisions (future)               │
└──────────────────────────────────────────────────────────────┘
Design points
- C7 ULTRA is an independent, frozen box
  - TinyC7UltraBox / C7UltraSegmentBox are kept as they are
  - v6 does not touch C7 (C6-only)
- v6 is a C6-only small core (headerless research)
  - No header byte is written at alloc time (out-of-band metadata)
  - On free, RegionIdBox classifies the ptr → direct access to page_meta
- ptr classification is consolidated in RegionIdBox
  - Before: scattered across classify_ptr / hak_super_lookup / ss_fast_lookup, etc.
  - v6: region_id_lookup_v6(ptr) returns (region_kind, region_id, page_meta*)
  - region_kind: SMALL_V6 / POOL / LARGE / UNKNOWN
- Stats/Learning passes only page lifetime summaries to L3
  - L1/L2 do not collect per-block stats
  - On page retire, a summary (total_allocs, avg_lifetime_ns) is pushed to L3
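A RegionIdBox lookup that only classifies a pointer can be sketched as a mask-and-compare against a registered segment. The sketch makes these assumptions:
- 2MiB segments aligned to 2MiB, split into 64KiB pages.
- One segment registered per thread (in the spirit of the TLS-scope registration added later in HDR-3).
- Only the SMALL_V6 kind is recognized; POOL/LARGE would need their own registries.

All type and function names here are hypothetical, not the real region_id_lookup_v6().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_SIZE      ((uintptr_t)2 << 20)      /* 2 MiB segment (assumed) */
#define PAGE_SHIFT    16                        /* 64 KiB pages (assumed)  */
#define PAGES_PER_SEG (SEG_SIZE >> PAGE_SHIFT)  /* 32 */

typedef enum { REGION_UNKNOWN, REGION_SMALL_V6, REGION_POOL, REGION_LARGE } RegionKind;

typedef struct { uint8_t class_idx; uint16_t capacity; } PageMeta;

typedef struct {
    uintptr_t base;                      /* SEG_SIZE-aligned */
    uint32_t  region_id;
    PageMeta  page_meta[PAGES_PER_SEG];
} Segment;

typedef struct { RegionKind kind; uint32_t region_id; PageMeta *page_meta; } RegionLookupResult;

/* Hypothetical: the segment registered by this thread. */
static _Thread_local Segment *tls_seg;

/* Classification only: no stats, no header access. */
static RegionLookupResult region_lookup(const void *ptr) {
    RegionLookupResult r = { REGION_UNKNOWN, 0, NULL };
    uintptr_t p = (uintptr_t)ptr;
    Segment *s = tls_seg;
    if (s && (p & ~(SEG_SIZE - 1)) == s->base) {
        r.kind = REGION_SMALL_V6;
        r.region_id = s->region_id;
        r.page_meta = &s->page_meta[(p - s->base) >> PAGE_SHIFT];
    }
    /* POOL / LARGE would be resolved by other registries in a full version. */
    return r;
}
```

The free path would then read class_idx from the returned page_meta instead of from a header byte, which is what makes the alloc side headerless.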
Implementation tasks (Phase V6-HDR-0: done)
| No | Task | Status |
|---|---|---|
| 1-1 | Reorganize CURRENT_TASK.md (add this section) | ✅ |
| 1-2 | Create SMALLOBJECT_CORE_V6_DESIGN.md | ✅ |
| 1-3 | Create REGIONID_V6_DESIGN.md | ✅ |
| 2-1 | SmallTlsLaneV6 / SmallHeapCtxV6 type skeleton | ✅ |
| 2-2 | v6 TLS API (small_v6_tls_alloc/free) | ✅ |
| 3-1 | RegionIdBox type and lookup API skeleton | ✅ |
| 3-2 | OBSERVE mode (logging at the v6 free entry) | ✅ |
| 4-1 | PageStatsV6 box (not yet wired) | ✅ |
| 5-1 | Add v6 research-box rules to AGENTS.md | ✅ |
| 5-2 | Smoke bench (Mixed / C6-heavy) | ✅ |
ENV
- HAKMEM_SMALL_CORE_V6_ENABLED=0 (default OFF)
- HAKMEM_REGION_ID_V6_OBSERVE=0 (default OFF, for log output)
- HAKMEM_PAGE_STATS_V6_ENABLED=0 (default OFF)
Phase V6-HDR-2: C6 headerless free experiment ON (in progress)
Goal
Using the boxes and RegionId wiring built in V6-HDR-0/1, actually switch on headerless free for C6 only. Initially this is changed only in the dedicated C6-heavy profile.
Implementation tasks
| No | Task | Status |
|---|---|---|
| 1 | Create smallobject_v6_env_box.h (ENV gate management) | ✅ |
| 2 | Wire the v6 free route from the front | ✅ |
| 3 | Full implementation of small_v6_headerless_free | ✅ |
| 4 | Wire the v6 alloc route from the front | ✅ |
| 5 | Full implementation of small_v6_headerless_alloc | ✅ |
| 6 | Re-verify that the header is written only once | ✅ |
| 7 | Update documentation | ✅ |
| 8 | C6-heavy bench test | ✅ |
Implementation details
- smallobject_v6_env_box.h (new)
  - small_heap_v6_headerless_enabled(): ENV HAKMEM_SMALL_HEAP_V6_HEADERLESS
  - small_v6_region_observe_enabled(): ENV HAKMEM_SMALL_V6_REGION_OBSERVE
  - small_v6_headerless_route_enabled(class_idx): combined gate
- smallobject_core_v6.c (modified)
  - small_v6_headerless_free(): RegionIdBox lookup → get class_idx → TLS push
  - small_v6_headerless_alloc(): TLS pop (no header write) + refill
- malloc_tiny_fast.h (modified)
  - The TINY_ROUTE_SMALL_HEAP_V6 case calls headerless free/alloc
  - ENV-gated (default OFF)
Header write policy
- Only at refill: the header is written when carve/refill moves blocks from the page into TLS
- Never on alloc/free: the v6 headerless route never touches the header
- Front unchanged: the class_idx hint is still read from the header byte (front-side reads are kept)
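The refill-only header policy works when the TLS lane tracks free blocks outside the blocks themselves. This sketch assumes an array stack, in the style of the per-class TLS freelist/count pairs; with an intrusive list, the header would be overwritten again. The header encoding, the lane capacity and every name are hypothetical. The offset between block base and user pointer is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_CAP   32
#define HDR_MAGIC 0xA0u  /* hypothetical header encoding: magic | class */

/* TLS lane as an array stack: free blocks are tracked outside the blocks,
 * so nothing is stored in block[0] while a block sits in the lane. */
typedef struct { void *slots[TLS_CAP]; uint32_t count; } TlsLane;

/* Refill: carve blocks from [*cursor, end) into the lane. This is the only
 * place the header byte is written. Returns the number of blocks moved. */
static uint32_t lane_refill(TlsLane *l, uint8_t **cursor, uint8_t *end,
                            size_t block_size, unsigned class_idx) {
    uint32_t moved = 0;
    while (l->count < TLS_CAP && (size_t)(end - *cursor) >= block_size) {
        uint8_t *b = *cursor;
        *cursor += block_size;
        b[0] = (uint8_t)(HDR_MAGIC | (class_idx & 0x0Fu));
        l->slots[l->count++] = b;
        moved++;
    }
    return moved;
}

/* Alloc/free only move pointers; the header byte is never touched. */
static void *lane_alloc(TlsLane *l) {
    return l->count ? l->slots[--l->count] : NULL;
}

static int lane_free(TlsLane *l, void *b) {
    if (l->count == TLS_CAP) return 0;  /* full: caller takes the slow path */
    l->slots[l->count++] = b;
    return 1;
}
```

Because free never writes into the block, the header written at refill survives any number of alloc/free round trips. The front can therefore keep reading its class_idx hint from the header.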
ENV variables
| ENV | Default | Description |
|---|---|---|
| HAKMEM_SMALL_HEAP_V6_ENABLED | 0 | Enable the v6 route |
| HAKMEM_SMALL_HEAP_V6_CLASSES | 0x40 | v6 target class mask (0x40 = C6) |
| HAKMEM_SMALL_HEAP_V6_HEADERLESS | 0 | Enable headerless mode |
| HAKMEM_SMALL_V6_REGION_OBSERVE | 0 | class_idx verification log |
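The combined gate small_v6_headerless_route_enabled(class_idx) can be modeled as a config struct plus a mask test. The sketch assumes three things:
- A flag counts as ON when its value starts with '1'.
- CLASSES is parsed as a C integer literal (so "0x40" works).
- Bit N of the mask selects class N, which is consistent with 0x40 = C6 and 0x60 = C5+C6.

`V6Config` and the helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct { bool enabled; bool headerless; unsigned classes; } V6Config;

/* Assumption: "1..." means ON; unset falls back to the default. */
static bool env_flag(const char *name, bool dflt) {
    const char *s = getenv(name);
    return s ? (s[0] == '1') : dflt;
}

static V6Config v6_config_from_env(void) {
    V6Config c;
    const char *m = getenv("HAKMEM_SMALL_HEAP_V6_CLASSES");
    c.enabled    = env_flag("HAKMEM_SMALL_HEAP_V6_ENABLED", false);
    c.headerless = env_flag("HAKMEM_SMALL_HEAP_V6_HEADERLESS", false);
    c.classes    = m ? (unsigned)strtoul(m, NULL, 0) : 0x40u;  /* 0x40 = C6 */
    return c;
}

/* Combined gate: every switch ON and the class bit set in the mask. */
static bool v6_headerless_route_enabled(const V6Config *c, unsigned class_idx) {
    return c->enabled && c->headerless && class_idx < 8 &&
           ((c->classes >> class_idx) & 1u);
}
```

Reading the environment once into a struct keeps getenv() off the hot path. The per-call check is then a few loads and one shift.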
Benchmark results
# Baseline (v6 OFF)
C6-heavy 257-768B: 26.8M ops/s
# v6 headerless ON
C6-heavy 257-768B: 26.7M ops/s
# v6 headerless ON + OBSERVE
C6-heavy 257-768B: 26.1M ops/s (no MISMATCH)
Phase V6-HDR wrap-up: C6-only headerless core design finalized (done)
Implementation complete (V6-HDR-0 to 4)
| Phase | Task | Outcome |
|---|---|---|
| HDR-0 | Type skeleton + OBSERVE | Basic RegionIdBox and SmallSegmentV6 implementation |
| HDR-1 | Wire up RegionIdBox | Verified ptr→(kind, page_meta) classification |
| HDR-2 | Wire v6 free/alloc routes | Headerless free/alloc path enabled |
| HDR-3 | SmallSegmentV6 TLS registration | TLS-scope segment registration implemented |
| HDR-4 | Performance optimization (P0+P1) | Removed double validation + page_meta TLS cache |
Performance over time (C6-heavy 257-768B)
V6-HDR-2: Region lookup overhead → -3.5% to -8.3% regression
V6-HDR-3: Segment registration → lookup now returns SMALL_V6
V6-HDR-4: P0 (removed double validation) + P1 (page_meta cache)
→ +2.7% to +12% improvement (in some runs)
Measured (average of several runs):
- Baseline (v6 OFF): 9.1M ops/s
- V6 HDR-4 (after optimization): ~9.0M ops/s (about ±0%)
Design outcomes
- RegionIdBox stayed thin: it only classifies pointers, and metadata computation was moved to the TLS side
- Same-page TLS cache: accesses within the same page skip the page_meta lookup entirely
- Headerless is feasible: performance within a few percent of the baseline
- Multi-class support: also stable on mixed C4/C5/C6 (research box)
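The same-page TLS cache can be sketched as a one-entry memo keyed by page number. The sketch assumes 64KiB pages. A counting stub stands in for the real RegionIdBox lookup so the skip is visible. Names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 16  /* 64 KiB pages (assumed) */

typedef struct { uint8_t class_idx; } PageMeta;

/* Stand-in for the slow RegionIdBox query; counts calls for illustration. */
static unsigned slow_lookups;
static PageMeta demo_meta = { 6 };
static PageMeta *page_meta_slow(uintptr_t page) {
    (void)page;
    slow_lookups++;
    return &demo_meta;
}

/* UINTPTR_MAX can never be a real page number (ptr >> 16 is smaller). */
static _Thread_local uintptr_t tls_last_page = UINTPTR_MAX;
static _Thread_local PageMeta *tls_last_meta;

static PageMeta *page_meta_cached(const void *ptr) {
    uintptr_t page = (uintptr_t)ptr >> PAGE_SHIFT;
    if (page == tls_last_page) return tls_last_meta;  /* same page: no lookup */
    tls_last_meta = page_meta_slow(page);
    tls_last_page = page;
    return tls_last_meta;
}
```

A real version must also clear the entry when a page is retired, or a stale page_meta could be returned for a recycled address.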
Current status
- Frozen as a research box: C6-only headerless v6 is an ENV opt-in research box (default OFF)
- Mainline unchanged: C7 ULTRA + v3 backend remain the baseline
- Next: focus on improving C6-heavy through mid/pool v3; keep v6 as a reference design
Final benchmark (2025-12-12)
# C6-heavy (257-768B)
Run 1: Baseline 9.48M → V6 8.56M (-9.7%)
Run 2: Baseline 8.50M → V6 9.21M (+8.3%) ← closest to a steady-state run
Run 3: Baseline 6.74M → V6 9.16M (+35.8%, baseline run was off)
Average: V6 and Baseline roughly equal (within a few %)
# Mixed (16-1024B, v6 OFF)
Run 1: 9.14M ops/s
Run 2: 9.11M ops/s
Run 3: 7.09M ops/s
Average: ~8.4M ops/s (mainline baseline)
ENV variables (research)
# C6-only headerless v6 (research box)
HAKMEM_SMALL_HEAP_V6_ENABLED=1
HAKMEM_SMALL_HEAP_V6_CLASSES=0x40 # C6 only
HAKMEM_SMALL_HEAP_V6_HEADERLESS=1
HAKMEM_SMALL_V6_REGION_OBSERVE=0 # for debugging
HAKMEM_REGION_ID_V6_OBSERVE=0 # for debugging
Freeze declaration
- v6 is frozen as a research box (default OFF, ENV opt-in)
- Performance: within a few % of the baseline, which shows a headerless design is feasible; the basic design goal is met
- Next: focus on a substantial C6-heavy improvement through mid/pool v3
- Reference design: RegionIdBox (classification only) + TLS-scope cache, to be reused when multi-region support is built
Phase V7-0: SmallObjectHeap v7 / HAKMEM v3 core design skeleton (new, design only)
Goal
Close out the ULTRA + MID v3 + V6 C6-only generation as "Chapter 1 complete". Then put out, ahead of any implementation, the design of the new core SmallObjectHotBox_v7 (= the HAKMEM v3 small/mid core), which treats small-to-mid as one unit. This phase adds only types and documentation; behavior does not change at all.
What was done (design level)
- New document docs/analysis/SMALLOBJECT_V7_DESIGN.md:
  - Spells out the 4-layer structure:
    - L0: ULTRA (C4–C7, FROZEN)
    - L1: SmallObjectHotBox_v7 (small/mid core)
    - L2: SegmentBox_v7 / ColdIface_v7
    - L3: PolicyBox_v7 / RegionIdBox / PageStatsBox
  - Defines struct templates for SmallPageMeta_v7 / SmallClassHeap_v7 / SmallHeapCtx_v7 / SmallSegment_v7 (hot/cold fields separated).
  - Organizes the RegionIdBox v7 API (RegionLookupResult_v7 / region_id_lookup_v7()) and how the header is handled (kept, but touched as little as possible on the fast path).
  - Describes the relationship between small v7 / mid v7 / pool v3: parallel HotBoxes on top of a shared RegionId/Segment/PageStats layer.
- Summarized the phase split for v7-0/1/2 (add types → C6-only stub → C6-only full implementation).
Assumptions and rules so far
- The ULTRA generation (C4–C7 ULTRA / Tiny front v3) stays FROZEN (mainline).
- MID v3 stays as the mainline box dedicated to 257–768B.
- V6 C6-only headerless is frozen as a research box (a reference for the v7 physical-layer design).
- v7 is designed as a separate chapter (the HAKMEM v3 generation) and is never called from front/gate until it is opted in through ENV.
Next-phase candidates (implementation is for a different AI)
- Phase v7-1: C6-only v7 stub
  - Add TINY_ROUTE_SMALL_HEAP_V7 to the route kinds, and add a profile that returns the v7 route for class C6 only.
  - Implement small_heap_alloc_fast_v7_stub / small_heap_free_fast_v7_stub; for now they all fall back immediately to MID v3 / V6 / pool v1.
  - RegionIdBox_v7 calls region_id_lookup_v7(ptr) in OBSERVE mode and only collects REGION_SMALL_V7 statistics (behavior unchanged).
- Phase v7-2: C6-only v7 full implementation (small band only)
  - Implement SegmentBox_v7 / ColdIface_v7 and route refill/retire of C6 pages through Segment v7.
  - Implement small_heap_alloc_fast_v7 / small_heap_free_fast_v7 so that the C6-only small band really runs on v7 TLS + Segment.
  - A/B v7 vs MID v3 vs V6 vs the v2 mainline on C6-heavy / Mixed to evaluate the value of SmallObjectHotBox_v7.
Phase V6-HDR-1: RegionIdBox wiring and OBSERVE (done)
Goal
Wire the RegionIdBox created in V6-HDR-0 into the C6 segment, and verify in OBSERVE mode that ptr → (kind, page_meta.class_idx) is correct. No behavior change.
Implementation tasks
| No | Task | Status |
|---|---|---|
| 1 | Strengthen the RegionIdBox implementation (C6 segment lookup) | ✅ |
| 2 | Flesh out the REGION_OBSERVE logic in v6 free | ✅ |
| 3 | Verify the front-side class_idx hint wiring | ✅ |
| 4 | Update documentation (CURRENT_TASK.md, REGIONID_V6_DESIGN.md) | ✅ |
| 5 | Test (run the bench with OBSERVE ON) | ✅ |
Implementation details
- RegionIdBox implementation (core/region_id_v6.c)
  - region_id_lookup_v6(ptr): uses small_page_meta_v6_of(ptr) to decide whether ptr belongs to a C6 segment
  - Fast version with a TLS cache: region_id_lookup_cached_v6(ptr)
  - OBSERVE log: enabled with HAKMEM_REGION_ID_V6_OBSERVE=1
- REGION_OBSERVE logic (core/smallobject_core_v6.c)
  - ENV: HAKMEM_SMALL_V6_REGION_OBSERVE=1 (new, V6-HDR-1 only)
  - At the free entry, calls region_id_lookup_v6(ptr) and verifies class_idx
  - On mismatch: [V6_REGION_OBSERVE] MISMATCH ptr=... hint=X actual=Y
- Front wiring check results
  - The class_idx hint is taken from the header byte (header & 0x0F) inside free_tiny_fast()
  - The v6 route (TINY_ROUTE_SMALL_HEAP_V6) is currently not wired (skipped with break)
  - This matches the "OBSERVE only, no behavior change" spec.
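The OBSERVE check compares two sources of truth without acting on the result. This sketch assumes the caller passes in the header byte and the class_idx found through the region lookup. The log format is the one quoted above. `region_observe` and the mismatch counter are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static unsigned observe_mismatches;

/* Compare the front-end hint (low nibble of the header byte) with the class
 * recorded in page_meta. Log only; the free path itself is unchanged. */
static void region_observe(const void *ptr, uint8_t header, uint8_t actual_class) {
    unsigned hint = header & 0x0Fu;
    if (hint != actual_class) {
        observe_mismatches++;
        fprintf(stderr, "[V6_REGION_OBSERVE] MISMATCH ptr=%p hint=%u actual=%u\n",
                (void *)ptr, hint, (unsigned)actual_class);
    }
}
```

Keeping the check log-only means OBSERVE runs can be compared directly with the baseline. A zero mismatch count is the evidence needed before the header read can be dropped.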
ENV
- HAKMEM_SMALL_V6_REGION_OBSERVE=1: output class_idx verification logs on the free path
1. Baseline (1 thread, ws=400, iters=1M, seed=1)
- Mixed 16–1024B (mainline)
  - Command: HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE ./bench_random_mixed_hakmem 1000000 400 1
  - Main ENV (via bench_profile):
    - HAKMEM_TINY_HEAP_PROFILE=C7_SAFE
    - HAKMEM_TINY_C7_HOT=1
    - HAKMEM_SMALL_HEAP_V3_ENABLED=1 / HAKMEM_SMALL_HEAP_V3_CLASSES=0x80 (C7-only v3)
    - HAKMEM_TINY_C7_ULTRA_ENABLED=1 (UF-3 segment version, 2MiB/64KiB)
    - HAKMEM_TINY_FRONT_V3_ENABLED=1 / HAKMEM_TINY_FRONT_V3_LUT_ENABLED=1
    - HAKMEM_POOL_V2_ENABLED=0
  - Throughput (current HEAD, Release): about 44–45M ops/s
  - Competitors:
    - mimalloc: ~110–120M ops/s
    - system: ~90M ops/s
- C7-only (fixed 1024B, C7 v3 + ULTRA)
  - C7 ULTRA OFF: ~38M ops/s
  - C7 ULTRA ON: ~57M ops/s (about +50% or more)
  - The C7 design (ULTRA segment + TLS freelist + mask free) is treated as a proven pattern and will be extended to small-object v4/mid.
- C6-heavy mid/smallmid (257–768B; C6 goes through the mid/pool path)
  - Command: HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1 ./bench_mid_large_mt_hakmem 1 1000000 400 1
  - Current throughput: about 10M ops/s
  - Phase82 previously recorded 23–27M ops/s with LEGACY + flatten, but on the current HEAD the lookup layer (hak_super_lookup/mid_desc_lookup, etc.) has become the bottleneck.
2. Boxes currently active on the mainline
- C7 v3 + C7 ULTRA (UF-3 segment version)
  - Hot: TinyC7UltraBox (TLS freelist + 2MiB Segment / 64KiB Page, mask-based check).
  - Cold: C7UltraSegmentBox (manages page/class/used/capacity through page_meta[]).
  - Characteristics:
    - C7-only goes from ~38M to ~57M ops/s; Mixed is also lifted from 35M to 44–45M ops/s.
    - Pointers not managed by C7 ULTRA always fall back to C7 v3 free (keeping the header-based fail-fast path).
  - ENV: HAKMEM_TINY_C7_ULTRA_ENABLED=1 (default ON); HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT is a research box (default 0).
- SmallObject v3 (C7-only mainline)
  - Per-page freelist for C7 + current/partial management. ColdIface touches Superslab/Warm/Stats through Tiny v1.
  - With C7 ULTRA ON, the basic structure is "ULTRA takes pointers inside its own segment first; everything else goes to v3 free".
- mid/pool v1 (C6 is pinned here for now, Phase C6-FREEZE)
  - C6 gets no special treatment in Tiny/SmallObject/ULTRA.
  - C6-specific smallheap v3/v4/ULTRA and pool flatten are all ENV opt-in research boxes.
  - C6-heavy currently runs at ~10M ops/s and is the redesign target.
3. Status and direction for small-object v4 / mid
- SmallObjectHotBox_v4 box structure (designed, partially implemented)
  - SmallPageMeta: free_list / used / capacity / class_idx / flags / page_idx / segment.
  - SmallClassHeap: current / partial_head / full_head.
  - SmallHeapCtx: per-thread, holds SmallClassHeap cls[NUM_SMALL_CLASSES].
  - SmallSegment (v4): assumes a 2MiB Segment / 64KiB Page and holds page_meta[].
  - ColdIface_v4: a single box covering small_cold_v4_refill_page / small_cold_v4_retire_page / small_cold_v4_remote_push/drain.
- C6-only v4 implementation (Phase v4-mid-2, research box)
  - The path that handles C6 alloc/free through SmallHeapCtx v4 and refills/retires from Segment v4 is implemented.
  - C6-heavy A/B (C6 v1 vs v4):
    - v4 OFF: ~9.4M ops/s
    - v4 ON : ~10.1M ops/s (about +8–9%)
  - On Mixed, turning on C6-only v4 gives about +1% (essentially within noise) with no regression.
  - By default HAKMEM_SMALL_HEAP_V4_ENABLED=0 / CLASSES=0x0, so standard profiles are unaffected.
- Goals for mid/smallmid going forward
  - Current state: C6-heavy ~10M ops/s; lookups (hak_super_lookup / mid_desc_lookup / classify_ptr / ss_map_lookup) account for ~40%.
  - Direction:
    - Extend the pattern that worked for C7 ULTRA (Segment + Page + TLS freelist + mask free) to small-object v4, making ptr→page→class O(1).
    - Take lookup layers such as mid_desc_lookup / hak_super_lookup off the small-object v4 route.
    - Move C6/C5 to v4 step by step as "hot mid classes"; handle other mid/smallmid with SmallHeap v4 or pool v1.
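The SmallObjectHotBox_v4 structures above imply a simple page rotation inside SmallClassHeap. When current runs dry, it moves to the full list and the next partial page takes its place. The sketch follows the listed fields. The `next` link and the rotation function are hypothetical additions, and real code would also handle remote frees moving pages back to partial.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SmallPageMeta {
    void     *free_list;
    uint16_t  used;
    uint16_t  capacity;
    uint8_t   class_idx;
    uint8_t   flags;
    uint16_t  page_idx;
    void     *segment;
    struct SmallPageMeta *next;  /* hypothetical link for partial/full lists */
} SmallPageMeta;

typedef struct {
    SmallPageMeta *current;
    SmallPageMeta *partial_head;
    SmallPageMeta *full_head;
} SmallClassHeap;

/* Called when current has no free blocks left: park it on the full list and
 * promote the first partial page. NULL means "ask ColdIface for a refill". */
static SmallPageMeta *small_class_heap_rotate(SmallClassHeap *h) {
    SmallPageMeta *cur = h->current;
    if (cur) {
        cur->next = h->full_head;
        h->full_head = cur;
    }
    h->current = h->partial_head;
    if (h->current) {
        h->partial_head = h->current->next;
        h->current->next = NULL;
    }
    return h->current;
}
```

The hot path only ever touches `current`. List manipulation happens once per page exhaustion, and ColdIface is reached only when both current and partial are empty.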
4. Upcoming phases (TODO overview)
- Phase v4-mid-3 (C5-only v4 research box) → done
  - ENV: HAKMEM_SMALL_HEAP_V4_ENABLED=1 / HAKMEM_SMALL_HEAP_V4_CLASSES=0x20 puts C5 on the SmallHeap v4 route.
  - A/B results:
    - C5-heavy (129–256B): v4 OFF 54.4M → v4 ON 48.7M ops/s (−10–11% regression). The existing Tiny/front v3 path is faster.
    - Mixed 16–1024B (C6+C5 v4): C6-only 28.3M → C5+C6 28.9M ops/s (+2%, noise to slight gain). No regression.
  - Direction: v4 loses on C5-heavy, so C5 v4 stays a research box and is not added to standard profiles. On Mixed the impact is small, so C5+C6 v4 (0x60) is available as a research box.
- Phase v4-mid-4/5/6 (C6/C5 v4 diagnosis and temporary freeze) → done
  - C5 v4:
    - C5-heavy (129–256B): v4 OFF 54.4M → v4 ON 48.7M ops/s (−10–11% regression). The existing Tiny/front v3 path is faster.
    - On Mixed 16–1024B, C5+C6 v4 ON gives only a slight +2–3% gain, which is not enough to justify mainline adoption.
  - C6 v4:
    - On the correct C6-only bench (MIN=256 MAX=510): v4 OFF ~58–67M → v4 ON ~48–50M ops/s (−15–28% regression).
    - Stats confirmed that 100% of C6 alloc/free goes through the v4 path, so the slowness is in the v4 implementation itself, not in routing/fallback.
    - Increasing ws/iters still triggers crashes caused by the design sharing pages with TinyHeap, so putting C6 v4 on the mainline with the current design is difficult.
  - TLS fastlist:
    - A TLS fastlist for C6 was added, but C6-heavy throughput with v4 ON barely changed (48–49M ops/s). It does not offset the fundamental regression in v4's page management/structure.
  - Direction:
    - SmallObject v4 (for C5/C6) stays frozen as a research box for now. Mainline mid/smallmid improvement will be pursued as a separate design (small-object v5 / mid-ULTRA / pool redesign).
    - Mixed/C7 keeps A/B-testing against "C7 v3 + C7 ULTRA"; mid/pool keeps the current v1 as the reference line.
- Phase v5-2/3 (C6-only v5 bring-up and slimming) → done (research box)
  - Phase v5-2: full implementation of C6-only small-object v5 on a Segment+Page basis, fully separated from Tiny/Pool and managing C6 pages on 2MiB Segments / 64KiB Pages. The first version ran at ~14–20M ops/s, far slower than v1.
  - Phase v5-3: slimmed the C6 v5 hot path (a single TLS segment + O(1) page_meta_of + bitmap-based free-page search). C6-heavy 1M/400: v5 OFF ~44.9M → v5 ON ~38.5M ops/s (+162% vs v5-2, about -14% vs baseline). Mixed also runs at 36–39M ops/s with no SEGV.
  - Direction: v5 is structurally better than v4, but even C6-only it is still below v1, so it stays a research box for now. Mainline mid/smallmid continues to be judged against pool v1, while the v5 design keeps moving toward the C7 ULTRA pattern.
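The bitmap-based free-page search in v5-3 can be sketched with one 64-bit word per segment. Each set bit marks a free page, and a count-trailing-zeros instruction finds the lowest one in O(1). The sketch assumes 32 pages per segment (2MiB / 64KiB) and uses the GCC/Clang builtin `__builtin_ctzll`. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGES_PER_SEG 32  /* 2 MiB / 64 KiB */

/* Bit i set = page i is free. Returns the lowest free page, or -1. */
static int find_free_page(uint64_t bitmap) {
    if (bitmap == 0) return -1;         /* ctz of 0 is undefined: check first */
    return __builtin_ctzll(bitmap);     /* GCC/Clang: index of lowest set bit */
}

static void mark_used(uint64_t *bm, int i) { *bm &= ~(UINT64_C(1) << i); }
static void mark_free(uint64_t *bm, int i) { *bm |= UINT64_C(1) << i; }
```

Compared with walking page_meta[] linearly, the search cost no longer depends on how many pages are in use. The zero check matters because `__builtin_ctzll(0)` is undefined.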
- Phase v4-mid-SEGV (C6 v4 SEGV fix, research-box stabilization) → done
  - Problem: C6 v4 shared pages with TinyHeap → freelist corruption at iters >= 800k → SEGV
  - Fix: switched the C6-specific refill/retire to SmallSegment v4, removing the TinyHeap dependency entirely
  - Results:
    - iters=1M, ws <= 390: SEGV gone ✅
    - C6-only (MIN=257 MAX=768): v4 OFF ~47M → v4 ON ~43M ops/s (only a −8.5% regression, stable)
    - Mixed 16–1024B: no SEGV with v4 ON (small regression accepted)
  - Direction: C6 v4 is stabilized as a research box. It does not go onto the mainline (the existing mid/pool v1 is used).
- Phase v5-0 (SmallObject v5 refactor: unified ENV handling, macros, struct optimization) → done
  - Scope: improvements and optimization of the v5 foundation (behavior completely unchanged)
  - Improvements:
    - Unified ENV initialization on a sentinel pattern (ENV_UNINIT/ENABLED/DISABLED + __builtin_expect)
    - Pointer macros: BASE_FROM_PTR, PAGE_IDX, PAGE_META, VALIDATE_MAGIC, VALIDATE_PTR
    - Added partial_count to SmallClassHeapV5
    - Reordered SmallPageMetaV5 fields (hot fields grouped at the front for L1 cache locality, 24B)
    - Added a route priority ENV: HAKMEM_ROUTE_PRIORITY={v4|v5|auto}
    - Added a segment_size override ENV: HAKMEM_SMALL_HEAP_V5_SEGMENT_SIZE
  - Behavior: completely unchanged (the v5 route is not called; ENV defaults to OFF)
  - Testing: Mixed 16–1024B at 43.0–43.8M ops/s (no change); no SEGV/assert
  - Plan: v5-1 C6-only stub → v5-2 full implementation → v5-3 staged promotion to Mixed
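The sentinel pattern for ENV initialization can be sketched as a cached tri-state. The environment is read on the first call only, and `__builtin_expect` marks that branch as cold. This is a sketch under two assumptions: a value starting with '1' means ON, and a relaxed atomic is used so concurrent first calls are well-defined (they all compute the same result). The function and variable names are hypothetical, modeled on the ENV name used in this memo.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

enum { ENV_UNINIT = -1, ENV_DISABLED = 0, ENV_ENABLED = 1 };

static _Atomic int g_v5_enabled = ENV_UNINIT;

static int small_heap_v5_enabled(void) {
    int v = atomic_load_explicit(&g_v5_enabled, memory_order_relaxed);
    if (__builtin_expect(v == ENV_UNINIT, 0)) {   /* cold: first call only */
        const char *s = getenv("HAKMEM_SMALL_HEAP_V5_ENABLED");
        v = (s && s[0] == '1') ? ENV_ENABLED : ENV_DISABLED;
        atomic_store_explicit(&g_v5_enabled, v, memory_order_relaxed);
    }
    return v == ENV_ENABLED;
}
```

After the first call, the check costs one relaxed load and a predictable branch. That cost is why the same pattern can sit on allocation paths.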
- Phase v5-1 (SmallObject v5 C6-only route stub wired) → done
  - Scope: connect C6 to the v5 route (the body still falls back to v1/pool)
  - Implementation:
    - tiny_route_env_box.h: for C6, branch to TINY_ROUTE_SMALL_HEAP_V5 when HAKMEM_SMALL_HEAP_V5_ENABLED=1
    - malloc_tiny_fast.h: added a v5 case to the alloc/free switch (falls through to v1/pool)
    - smallobject_hotbox_v5.c: stub implementation (alloc returns NULL, free is a no-op)
  - ENV: opt in with HAKMEM_SMALL_HEAP_V5_ENABLED=1 / HAKMEM_SMALL_HEAP_V5_CLASSES=0x40
  - Test results:
    - C6-heavy: v5 OFF ~15.5M → v5 ON ~16.4M ops/s (no meaningful change, healthy)
    - Mixed: 47.2M ops/s (no change)
    - No SEGV/assert ✅
  - Direction: in v5-1 the behavior is the same as the v1/pool fallback. An ENV preset (C6_SMALL_HEAP_V5_STUB) was added to docs/analysis/ENV_PROFILE_PRESETS.md as a research box; the full implementation comes in v5-2.
- Phase v5-2 / v5-3 (SmallObject v5 C6-only implementation, slimmed, research box) → done
  - Scope: implemented SmallObjectHotBox v5 for C6 on a Segment + Page + TLS basis. v5-3 slimmed the hot path with a single TLS segment (O(1) page_meta_of), bitmap-based free-page search, etc.
  - C6-heavy 1M/400:
    - v5 OFF (pool v1): about 44.9M ops/s
    - v5-3 ON: about 38.5M ops/s (+162% over v5-2's ~14.7M, but about -14% vs baseline)
  - Mixed 16–1024B:
    - v5 ON (only C6 on the v5 route) runs at 36–39M ops/s with no SEGV (v5 is OFF by default in the mainline Mixed profile).
  - Direction: C6 v5 is structurally better and more stable than v4, but still below v1, so it stays a research box. Mainline mid/smallmid continues to be judged against pool v1.
- Phase v5-4 (C6 v5 header light / freelist optimization) → done (research box)
  - Goal: narrow the regression with v5 ON for C6-heavy (target: -5–7% vs baseline).
  - Implementation:
    - Added the HAKMEM_SMALL_HEAP_V5_HEADER_MODE=full|light ENV (default full)
    - light mode: initialize the headers of all blocks when a page is carved; skip the header write on alloc
    - full mode: write the header on every alloc as before (standard behavior)
    - Added a header_mode field to SmallHeapCtxV5 (the ENV is read once and cached in TLS)
  - Measurements (1M iter, ws=400):
    - C6-heavy (257-768B): v5 OFF 47.95M / v5 full 38.97M (-18.7%) / v5 light 39.25M (+0.7% vs full, -18.1% vs baseline)
    - Mixed 16-1024B: v5 OFF 43.59M / v5 full 36.53M (-16.2%) / v5 light 38.04M (+4.1% vs full, -12.7% vs OFF)
  - Conclusion: header light gives a small gain (+0.7–4.1%) but misses the -5–7% target (currently -18.1%). There are hot-path costs beyond the header write (freelist operations, metadata access, etc.). v5-5 onward will tighten the hot path with a TLS cache / batching.
  - Operation: standard profiles keep HAKMEM_SMALL_HEAP_V5_ENABLED=0 (v5 OFF). C6 v5 is research-only and is turned ON explicitly only for A/B runs.
- Phase v5-5 (C6 v5 TLS cache) → done (research box)
  - Goal: reduce page_meta accesses on the C6 v5 hot path, aiming for a +1–2% gain.
  - Implementation:
    - Added the HAKMEM_SMALL_HEAP_V5_TLS_CACHE_ENABLED=0|1 ENV (default 0)
    - Added a c6_cached_block field to SmallHeapCtxV5 (1-slot TLS cache)
    - alloc: on a cache hit, return immediately without touching page_meta (handled according to the header mode)
    - free: if the cache is empty, store the block there (skipping the freelist push); if it is full, evict the cached block and cache the new one
  - Measurements (1M iter, ws=400, HEADER_MODE=full):
    - C6-heavy (257-768B): cache OFF 35.53M → cache ON 37.02M ops/s (+4.2%)
    - Mixed 16-1024B: cache OFF 38.04M → cache ON 37.93M ops/s (-0.3%, within noise)
  - Conclusion: the TLS cache gives +4.2% on C6-heavy (beating the +1–2% target) with almost no effect on Mixed. Reducing page_meta access pays off.
  - Known issue: with header_mode=light an infinite loop occurs (an edge case where the freelist pointer collides with the header). Currently only full mode is verified to work.
  - Operation: standard profiles keep HAKMEM_SMALL_HEAP_V5_TLS_CACHE_ENABLED=0 (OFF). For C6 research, turning the cache ON recovers part of v5's performance.
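The 1-slot TLS cache behaves as described in v5-5. A hit on alloc skips the freelist entirely. A free into an empty slot skips the push. A free into a full slot evicts the older block to the freelist. In this sketch, a trivial intrusive freelist with an operation counter stands in for the page_meta path, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the page freelist path; counts operations for illustration. */
static void    *freelist_head;
static unsigned freelist_ops;

static void *freelist_pop(void) {
    freelist_ops++;
    void *b = freelist_head;
    if (b) freelist_head = *(void **)b;
    return b;
}

static void freelist_push(void *b) {
    freelist_ops++;
    *(void **)b = freelist_head;
    freelist_head = b;
}

static _Thread_local void *c6_cached_block;  /* the single cache slot */

static void *c6_alloc(void) {
    void *b = c6_cached_block;
    if (b) {                        /* hit: no freelist / page_meta access */
        c6_cached_block = NULL;
        return b;
    }
    return freelist_pop();
}

static void c6_free(void *b) {
    if (c6_cached_block) freelist_push(c6_cached_block);  /* evict old block */
    c6_cached_block = b;                                   /* keep the newest */
}
```

A free immediately followed by an alloc, common in C6-heavy loops, never reaches the freelist. That pattern is where the measured +4.2% would come from, while mixed traffic sees little change.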
- Phase v5-6 (C6 v5 TLS batching) → done (research box)
  - Goal: reduce refill frequency and squeeze out a further gain over v5 full+cache on C6-heavy.
  - Implementation: added HAKMEM_SMALL_HEAP_V5_BATCH_ENABLED / HAKMEM_SMALL_HEAP_V5_BATCH_SIZE, gave SmallHeapCtxV5 a SmallV5Batch c6_batch (slots[4] + count), and made C6 v5 alloc/free use the TLS batch first.
  - Measurements (1M/400, HEADER_MODE=full, TLS cache=ON, v5 ON):
    - C6-heavy: batch OFF 36.71M → batch ON 37.78M ops/s (+2.9%)
    - Mixed 16–1024B: batch OFF 38.25M → batch ON 37.09M ops/s (about -3%, accepted as a C6-heavy-only option)
  - Direction: on C6-heavy, batching adds another few percent on top of the cache, but v5 as a whole is still slower than the baseline (v1/pool). C6 v5 stays a research box; mainline mid/smallmid is judged against pool v1.

Phase v6-0 (SmallObject Core v6 design and type skeleton) ✅ Done (design)
- Goal: for 16 B to 2 KiB small-object/mid sizes, define a four-layer structure (L0 ULTRA / L1 Core / L2 Segment+ColdIface / L3 Policy) and a HotBox that assumes header-less blocks. The aim is a "core" design that will not be broken again.
- Contents:
  - Added `docs/analysis/SMALLOBJECT_CORE_V6_DESIGN.md`. It documents SmallHeapCtxV6 / SmallClassHeapV6 / SmallPageMetaV6 / SmallSegmentV6, the O(1) ptr→page→class rule, and the responsibilities the HotBox must never take on (header writes, lookups, stats, etc.).
  - v6 does not touch any code yet. This stage only collects the design-level spec and sketches of the types. v5 stays as the C6 research box, and the v6 layer structure is adopted as the backbone for rebuilding small-object support later.
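
The O(1) ptr→page→class rule can be illustrated with size-aligned segments:
- masking the pointer gives the segment header;
- a shift gives the page index;
- one load gives the class.

This is a sketch under assumed geometry: 2 MiB segments, 64 KiB pages, and metadata at the segment base. The note does not give v6's actual values, and all `Sk*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical geometry; the note does not state v6's segment/page sizes. */
#define SK_SEG_SHIFT   21                       /* 2 MiB segments, aligned to their size */
#define SK_PAGE_SHIFT  16                       /* 64 KiB pages */
#define SK_SEG_SIZE    ((uintptr_t)1 << SK_SEG_SHIFT)
#define SK_PAGES       (1u << (SK_SEG_SHIFT - SK_PAGE_SHIFT))

typedef struct SkPageMeta { uint8_t class_idx; } SkPageMeta;

/* Lives at the aligned base of the segment; a real layout would reserve
 * page 0 for metadata and never hand out blocks from it. */
typedef struct SkSegment {
    SkPageMeta pages[SK_PAGES];
} SkSegment;

static inline SkSegment* sk_segment_of(const void* p) {
    return (SkSegment*)((uintptr_t)p & ~(SK_SEG_SIZE - 1));   /* mask: O(1) */
}

static inline SkPageMeta* sk_page_meta_of(const void* p) {
    SkSegment* seg = sk_segment_of(p);
    uintptr_t off = (uintptr_t)p - (uintptr_t)seg;
    return &seg->pages[off >> SK_PAGE_SHIFT];                  /* shift: O(1) */
}

static inline int sk_class_of(const void* p) {
    return sk_page_meta_of(p)->class_idx;                      /* one load */
}

static SkSegment* sk_segment_new(void) {
    /* C11 aligned_alloc: size must be a multiple of the alignment. */
    SkSegment* seg = aligned_alloc(SK_SEG_SIZE, SK_SEG_SIZE);
    if (seg)
        for (unsigned i = 0; i < SK_PAGES; i++) seg->pages[i].class_idx = 0;
    return seg;
}
```

The key design choice is aligning each segment to its own size. That alignment is what lets a mask replace any table search.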

Phase v6-1 to v6-4 (SmallObject Core v6 C6-only implementation + slimming + Mixed stabilization) ✅ Done (research box)
- v6-1: wired up the route stub (behavior still falls back to v1/pool).
- v6-2: minimal implementation of Segment v6 + ColdIface v6 + Core v6 HotPath. Confirmed that the v6 path runs to completion on C6-heavy without SEGV (initially about -44%).
- v6-3: slimming (TLS ownership check + batch header write + TLS batch refill) brought C6-heavy to v6 OFF ≈27.1M / v6-3 ON ≈27.1M (±0%, on par with baseline).
- v6-4: v6 stabilization on Mixed. Fixed a bug where `small_page_meta_v6_of` looked at the mmap region instead of the TLS metadata. Mixed now runs to completion with v6 ON; because v6 is C6-only, Mixed gives v6 ON ≈35.8M vs v6 OFF ≈44M.
- Current state:
  - C6-heavy: v6 OFF ≈27.1M / v6 ON ≈27.4M (C6 Core v6 is on par with baseline and stable).
  - Mixed: because v6 is C6-only, the overall result is still about -19%. v6 is kept as an experimental box for C6-heavy, and mainline Mixed continues to use v6 OFF as its baseline.

Phase v6-5 (SmallObject Core v6 C5 extension, research box) ✅ Done
- Goal: extend Core v6 to the C5 size range (129–256B) as groundwork for covering more classes with v6 on the free hot path.
- Implementation:
  - Added a C5 TLS freelist (`tls_freelist_c5` / `tls_count_c5`) to `SmallHeapCtxV6`. C5 now also uses the TLS → refill/slow pattern in `small_alloc_fast_v6` / `small_free_fast_v6`.
  - Generalized ColdIface v6 refill/retire so block_size and capacity vary with class_idx (C5/C6).
- Measured (1M/400, v6 ON for C5 only, C6 v6 OFF):
  - C5-heavy (129–256B): v6 OFF ≈53.6M → v6 ON (C5) ≈41.0M (about -23%)
  - Mixed 16–1024B: v6 OFF ≈44.0M → v6 ON (C5) ≈36.2M (about -18%)
- Direction: C5 Core v6 runs stably but is much slower than Tiny front v3 + v1/pool, so it stays off the mainline as a research box. Cutting the C5-heavy/Mixed free hot path further would need either more slimming on the v6 side or a redesign of a different box (front/gate or pool).

Phase v6-6 (SmallObject Core v6 C4 extension, research box) ✅ Done
- Goal: extend Core v6 to the C4 size range (65–128B). This widens free hot path coverage and reduces dependence on `ss_fast_lookup` / `slab_index_for`.
- Implementation:
  - Added a C4 TLS freelist (`tls_freelist_c4` / `tls_count_c4`) to `SmallHeapCtxV6`.
  - Added a C4 fast/cold refill path to `small_alloc_fast_v6` (`small_alloc_c4_hot_v6` / `small_alloc_cold_v6` handle C4).
  - Added a C4 TLS push path to `small_free_fast_v6` (`small_free_c4_hot_v6`).
  - Added C4 cases to the alloc/free dispatcher in `malloc_tiny_fast.h`.
  - ColdIface v6 refill now supports C4 (128B blocks).
- Bug fix: `small_alloc_cold_v6` was missing the C4 refill logic. Because the cold path never refilled C4, every C4 allocation fell back to pool; this is now fixed.
- Measured (100k iter, v6 ON, mixed size workload):
  - C4-only (80B, class 4): v6 OFF ≈47.4M → v6 ON ≈39.4M (−17% regression)
  - C5+C6 (mixed 200/400B): v6 OFF ≈43.5M → v6 ON ≈26.8M (−38% regression)
  - Mixed (500B): v6 OFF ≈40.8M → v6 ON ≈27.5M (−33% regression)
- Assessment:
  - Target: v6-6 aimed for ±0 to a few percent (the user-specified acceptable range). Even with the C4 implementation, the large regression did not go away (C4-only: −17%).
  - Root cause: the overhead of the v6 implementation itself (TLS ownership check + page refill + cold path) has persisted since v5. Extending to C4 did not fix it structurally.
  - Safety threshold exceeded: −33% on Mixed far exceeds the user-specified rule "keep it in the research box if it drops by 10% or more".
- Direction: Phase v6-6 stays a research box and does not go to mainline. v6-6 (C4 extension) is ENV opt-in only. To avoid mixed-configuration risk, enabling v6-5 (C5) and v6-6 (C4) at the same time is not recommended (−33% on Mixed).
- Next steps:
  - The v6 line reached "on par with the C6 baseline" (v6-3: ±0% for C6-only), but extending it to C5/C4 costs too much overhead.
  - Two possible next approaches:
    - a root-cause investigation of the v6 architecture (cost of the TLS ownership check, page refill overhead, cold path cost, etc.);
    - a different design (pool v2 redesign, a slimmer front gate, an extended ULTRA segment).

6. Free path optimization policy (Phase FREE-LEGACY-BREAKDOWN series)
Current understanding:
- perf breakdown for Mixed 16–1024B: free ≈ 24%, tiny_alloc_gate_fast ≈ 22%
- v6 (C5/C4 extension) regressed −33%, free-front v3 regressed −4%
- Instead of adding new generations, the policy switches to showing "which box eats what percent" of the existing free path and cutting at pinpointed spots.
Mainline assumptions (fixed):
- Mixed 16–1024B: Tiny front v3 + C7 ULTRA + pool v1 (about 44–45M ops/s)
- v4/v5/v6 (C5/C4) / free-front v3 are research boxes, default OFF
- v6 is a C6-only mid-oriented core (ON only in the C6-heavy profile; ±0% achieved)
- `HAKMEM_SMALL_HEAP_V6_ENABLED=0` / `HAKMEM_TINY_FREE_FRONT_V3_ENABLED=0` is the baseline
Phase FREE-LEGACY-BREAKDOWN-1 ✅ Done
- Goal: count the free hot path per box and make the breakdown visible
- Implementation:
  - ENV: `HAKMEM_FREE_PATH_STATS=1` enables per-box counters on the free path (default 0)
  - A FreePathStats struct counts c7_ultra / v3 / v6 / pool v1 / super_lookup / remote_free, etc.
  - A destructor prints a `[FREE_PATH_STATS]` section
- Measurement results: recorded in `docs/analysis/FREE_LEGACY_PATH_ANALYSIS.md`
- Next phase: decide which of FREE-LEGACY-OPT-1/2/3 to implement based on the measurements
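
Below is a minimal sketch of ENV-gated counters with an exit-time dump, in the spirit of FreePathStats. Only the ENV name comes from the note; the struct, fields, helper and macro are hypothetical. The counters are plain globals and not thread-safe. `__attribute__((destructor))` and `__builtin_expect` are GCC/Clang extensions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch in the spirit of FreePathStats. Counters are plain
 * globals (not thread-safe); a real version would use per-thread or
 * relaxed-atomic counters. */
typedef struct SkFreePathStats {
    unsigned long c7_ultra, legacy_fallback, pool_v1;
} SkFreePathStats;

static SkFreePathStats g_fps;
static int g_fps_enabled = -1;          /* -1 = ENV not read yet */

static int sk_fps_parse(const char* v) { return v != NULL && v[0] == '1'; }

static int sk_fps_enabled(void) {
    if (__builtin_expect(g_fps_enabled < 0, 0))   /* read the ENV once, lazily */
        g_fps_enabled = sk_fps_parse(getenv("HAKMEM_FREE_PATH_STATS"));
    return g_fps_enabled;
}

/* When disabled, each call site costs one well-predicted branch. */
#define SK_FPS_INC(field) do { if (sk_fps_enabled()) g_fps.field++; } while (0)

__attribute__((destructor))
static void sk_fps_dump(void) {         /* runs at process exit (GCC/Clang) */
    if (g_fps_enabled != 1) return;
    unsigned long total = g_fps.c7_ultra + g_fps.legacy_fallback + g_fps.pool_v1;
    fprintf(stderr, "[FREE_PATH_STATS] total=%lu c7_ultra=%lu legacy=%lu pool_v1=%lu\n",
            total, g_fps.c7_ultra, g_fps.legacy_fallback, g_fps.pool_v1);
}
```

A call site would then read `SK_FPS_INC(legacy_fallback);` on the branch being counted.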
Phase FREE-LEGACY-BREAKDOWN-1 measurement results ✅ Done
- Free path breakdown for Mixed 16–1024B:
  - C7 ULTRA fast: 50.7% (275,089 / 542,031 calls)
  - Legacy fallback: 49.2% (266,942 / 542,031 calls)
  - pool_v1_fast: 1.5% (8,081 / 542,031 calls)
  - Others (v3/v6/tiny_v1/super_lookup/remote): 0.0%
- Free path breakdown for C6-heavy:
  - pool_v1_fast: 100% (500,099 / 500,108 calls)
  - Others: 0.0%
- Key findings:
  - Mixed is fully bimodal (C7 ULTRA 50.7% vs Legacy 49.2%)
  - C6-heavy uses only the pool_v1 path (a clear optimization target)
- Details: see `docs/analysis/FREE_LEGACY_PATH_ANALYSIS.md`
Phase FREE-LEGACY-OPT-4 series: reducing Legacy fallback (planned)
- Goal: reduce Mixed's 49.2% Legacy fallback by extending the C7 ULTRA pattern to other classes
- Approach:
  - 4-0: documentation cleanup ✅
  - 4-1: per-class analysis of Legacy (identify which class uses Legacy the most)
  - 4-2: design and implement an ULTRA-Free lane limited to a single class
    - Target: the class with the largest share found in 4-1 (tentatively C5)
    - Implementation: add only a TLS free cache (alloc side unchanged)
    - ENV: `HAKMEM_TINY_C5_ULTRA_FREE_ENABLED=0` (research box)
  - 4-3: A/B test (measure the effect on Mixed; promote to mainline or keep as a research box depending on the result)
- Expected effect: Legacy 49% → 35-40%, a 5-10% improvement across free, +2-4M ops/s on Mixed
5. Health-check runs (always run these two first)
- Tiny/Mixed:
```sh
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
  ./bench_random_mixed_hakmem 1000000 400 1
# Expected: 44±1M ops/s, no segv/assert
```
- mid/smallmid C6:
```sh
HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1 \
  ./bench_mid_large_mt_hakmem 1 1000000 400 1
# Current: ≈10M ops/s, no segv/assert (redesign target)
```
- To run both at once, use `scripts/verify_health_profiles.sh` (if it exists).
- For detailed perf/phase logs, see `CURRENT_TASK_ARCHIVE_20251210.md` and the individual `docs/analysis/*` files.

Phase FREE-LEGACY-OPT-4-4: C6 ULTRA free+alloc integration (parasitic TLS cache) ✅ Done
Goal
Phase 4-3 showed that a free-only TLS cache was not effective. A TLS pop was therefore added on the alloc side to integrate the two and close the full alloc/free cycle.
Implementation
- malloc_tiny_fast.h: C6 ULTRA alloc pop (L191-202)
- FreePathStats: added a c6_ultra_alloc_hit counter
- ENV: HAKMEM_TINY_C6_ULTRA_FREE_ENABLED (default: OFF)
Measurements
Mixed 16–1024B (1M iter, ws=400):
- OFF (baseline): 40.2M ops/s
- ON (integrated): 42.2M ops/s
- Improvement: +4.9% ✅ expectation met
C6-heavy (257-768B, 1M iter, ws=400):
- OFF: 40.7M ops/s
- ON: 43.8M ops/s
- Improvement: +7.6% ✅ larger effect than on Mixed
Effect analysis
Dramatic Legacy reduction:
- Legacy fallback: 266,942 → 129,623 (-51.4%)
- Legacy by class[6]: 137,319 → 0 (100% eliminated)
The TLS cycle works:
- C6 allocs: 137,241 served directly by TLS pop
- C6 frees: 137,319 registered by TLS push
- The cache does not overfill (allocs drain it)
Design pattern
Parasitic TLS cache:
- No dedicated segment management like Core v6
- "Parasitic" on the existing allocator (minimal overhead)
- Controlling both free and alloc closes the cycle
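
The parasitic pattern can be sketched as a TLS stack of C6 blocks that the legacy allocator still owns:
- free pushes into the stack, and alloc pops from it;
- both fall back to the legacy calls when the stack is full or empty.

In this sketch, `malloc`/`free` stand in for the legacy allocator. The capacity and all names are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_C6_TLS_CAP 32   /* hypothetical capacity */

typedef struct SkParasiticTLS {
    unsigned count;                  /* hot field first */
    void*    slots[SK_C6_TLS_CAP];
} SkParasiticTLS;

static _Thread_local SkParasiticTLS g_c6;
static unsigned long g_legacy_allocs, g_legacy_frees;

/* Stand-ins for the existing allocator that actually owns the memory. */
static void* sk_legacy_alloc(size_t n) { g_legacy_allocs++; return malloc(n); }
static void  sk_legacy_free(void* p)   { g_legacy_frees++;  free(p); }

/* Callers are assumed to pass only C6-sized requests, so any cached block fits. */
static void* sk_c6_alloc(size_t n) {
    if (g_c6.count > 0)
        return g_c6.slots[--g_c6.count];   /* TLS pop: no legacy call */
    return sk_legacy_alloc(n);
}

static void sk_c6_free(void* p) {
    assert(p != NULL);
    if (g_c6.count < SK_C6_TLS_CAP) {
        g_c6.slots[g_c6.count++] = p;      /* TLS push: memory stays with legacy */
        return;
    }
    sk_legacy_free(p);                     /* cache full: return it for real */
}
```

Suppose only the free half existed. The stack would fill after `SK_C6_TLS_CAP` frees, and every later free would fall through to the legacy path. That is one plausible reading of why the free-only cache had little effect.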
Verdict
- ✅ Expectation met: target +3-5%, achieved +4.9%
- ✅ C6 legacy 100% eliminated: design validated
- ✅ Promoted to leading candidate; the ENV default stays OFF

Phase REFACTOR-1/2/3: Code Quality Improvement ✅ Done
What was done
- REFACTOR-1: Magic Number → Named Constants
  - New file: tiny_ultra_classes_box.h
  - Defines TINY_CLASS_C6/C7 and the tiny_class_is_c6/c7() macros
  - malloc_tiny_fast.h: `== 6`, `== 7` → semantic macros
- REFACTOR-2: unified Legacy Fallback logic
  - New file: tiny_legacy_fallback_box.h
  - Unified function tiny_legacy_fallback_free_base()
  - Removed 60 duplicated lines (malloc_tiny_fast.h and tiny_c6_ultra_free_box.c)
- REFACTOR-3: centralized Inline Pointer Macros
  - New file: tiny_ptr_convert_box.h
  - tiny_base_to_user_inline(), tiny_user_to_base_inline()
  - The 1-byte offset is defined in one central place
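
The sketch below shows what a centralized base/user conversion looks like with a 1-byte header in front of the user pointer. These are illustrative stand-ins, not the actual `tiny_base_to_user_inline()` / `tiny_user_to_base_inline()` definitions. The `sk_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the real tiny_*_inline() helpers:
 * a 1-byte header sits at base, and the user pointer starts right after it. */
#define SK_TINY_HEADER_BYTES 1

static inline void* sk_base_to_user(void* base) {
    return (uint8_t*)base + SK_TINY_HEADER_BYTES;
}

static inline void* sk_user_to_base(void* user) {
    return (uint8_t*)user - SK_TINY_HEADER_BYTES;
}

/* Reads the header byte that precedes a user pointer. */
static inline uint8_t sk_header_of_user(void* user) {
    return *(uint8_t*)sk_user_to_base(user);
}
```

Keeping the offset in one macro means a header-size change touches one line instead of every call site.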
广
- â DRY åå: Code duplication åæžïŒ60è¡ïŒ
- â å¯èªæ§: Magic number â semantic macro
- â ä¿å®æ§: offset, logic ã1ç®æã§å®çŸ©
- â Performance: Zero regressionïŒinline preservedïŒ
çŽ¯ç©æ¹åïŒPhase 4-0 â Refactor-3ïŒ
| Phase | æ¹å | çŽ¯ç© | ç¹åŸŽ |
|---|---|---|---|
| 4-1 | - | - | Legacy per-class åæ |
| 4-2 | +0% | 0% | Free-only TLSïŒå¹æãªãïŒ |
| 4-3 | +1-3% | 1-3% | Segment åŠç¿ïŒéå®çïŒ |
| 4-4 | +4.9% | +4.9% | Free+alloc çµ±åïŒæ¬åœïŒ |
| REFACTOR | +0% | +4.9% | Code qualityïŒoverhead ãªãïŒ |

Phase FREE-FRONT-V3-1 implementation done (2025-12-11)
Goal: insert a "v3 snapshot box" at the front of free, as groundwork for consolidating the route decision and the ENV checks into one place. Behavior is unchanged.
Implementation:
- New file: `core/box/free_front_v3_env_box.h`
  - free_route_kind_t enum (FREE_ROUTE_LEGACY, FREE_ROUTE_TINY_V3, FREE_ROUTE_CORE_V6_C6, FREE_ROUTE_POOL_V1)
  - FreeRouteSnapshotV3 struct (route_kind[NUM_SMALL_CLASSES])
  - 3 APIs: free_front_v3_enabled(), free_front_v3_snapshot_get(), free_front_v3_snapshot_init()
  - ENV: HAKMEM_TINY_FREE_FRONT_V3_ENABLED (default 0 = OFF)
- Implementation file: `core/box/free_front_v3_env_box.c`
  - free_front_v3_enabled(): lazy ENV init (default OFF)
  - free_front_v3_snapshot_get(): TLS snapshot access
  - free_front_v3_snapshot_init(): initializes the route_kind table
  - Uses the current tiny_route_for_class(), so existing behavior is preserved
- Modified file: `core/box/hak_free_api.inc.h`
  - Added v3 snapshot routing logic inside FG_DOMAIN_TINY
  - With v3 OFF (default), the existing path is kept (no behavior change)
  - With v3 ON, the route is decided via the snapshot (v6 c6, v3, pool v1)
- Makefile update
  - Added free_front_v3_env_box.o to OBJS_BASE, BENCH_HAKMEM_OBJS_BASE, SHARED_OBJS
Build results:
- ✅ Compiles (free_front_v3_env_box.o generated)
- ✅ Links (free_front_v3_enabled, free_front_v3_snapshot_get symbols resolved)
- Link errors related to the existing v3/v4/v5/v6 code are a pre-existing issue
Next phase (FREE-FRONT-V3-2):
- Fewer route_for_class calls
- Remove ENV checks (already folded into the snapshot)
- Optimize snapshot initialization
Phase FREE-FRONT-V3-2 implementation done (2025-12-11)
Goal: remove tiny_route_for_class() calls and redundant ENV checks from the free path to speed up free.
Implementation:
- Added small_heap_v3_class_mask() to smallobject_hotbox_v3_env_box.h
  - New function returning a bitmask of the classes handled by v3 (same API shape as v6)
  - small_heap_v3_class_enabled() rewritten to go through the mask
- Optimized free_front_v3_snapshot_init() (core/box/free_front_v3_env_box.c)
  - Removed the tiny_route_for_class() calls entirely
  - Decides by reading the ENV masks directly (v6_mask, v3_mask)
  - Route decided in priority order: v6 > v3 > pool/legacy
- Optimized the hak_free_at() v3 path (core/box/hak_free_api.inc.h)
  - v6 hot paths are called inline (small_free_c6_hot_v6, c5, c4)
  - No ENV check; the snapshot alone decides
  - The v3 path (C7) is delegated to so_free() (ss_fast_lookup is handled inside v3)
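
Here is a sketch of the snapshot idea. The route table is built once from per-feature class bitmasks, in the priority order v6 > v3 > pool/legacy, and the free hot path only indexes the array. The enum names follow the note. Their numbering, the pool mask, and the helper names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define SK_NUM_SMALL_CLASSES 8

/* Enum names follow the note; the numbering is an assumption. */
typedef enum {
    FREE_ROUTE_LEGACY = 0,
    FREE_ROUTE_TINY_V3,
    FREE_ROUTE_CORE_V6_C6,
    FREE_ROUTE_POOL_V1,
} free_route_kind_t;

typedef struct {
    uint8_t route_kind[SK_NUM_SMALL_CLASSES];
} SkFreeRouteSnapshot;

/* Built once from per-feature class bitmasks (bit i = class Ci enabled). */
static void sk_snapshot_init(SkFreeRouteSnapshot* s,
                             uint32_t v6_mask, uint32_t v3_mask, uint32_t pool_mask) {
    for (int c = 0; c < SK_NUM_SMALL_CLASSES; c++) {
        uint32_t bit = 1u << c;
        free_route_kind_t r = FREE_ROUTE_LEGACY;     /* lowest priority */
        if (v6_mask & bit)        r = FREE_ROUTE_CORE_V6_C6;
        else if (v3_mask & bit)   r = FREE_ROUTE_TINY_V3;
        else if (pool_mask & bit) r = FREE_ROUTE_POOL_V1;
        s->route_kind[c] = (uint8_t)r;
    }
}

/* Hot path: one array load, no ENV check, no tiny_route_for_class() call. */
static inline free_route_kind_t sk_route_for(const SkFreeRouteSnapshot* s, unsigned cls) {
    assert(cls < SK_NUM_SMALL_CLASSES);
    return (free_route_kind_t)s->route_kind[cls];
}
```

When a class is enabled in both the v6 and v3 masks, v6 wins. This resolution happens once at init rather than on every free.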
Benchmark results:
Mixed 16-1024B (bench_random_mixed_hakmem 100000 400 1):
- v3 OFF (baseline): 42.6M, 41.6M, 45.2M ops/s → average 43.1M ops/s
- v3 ON (optimized): 41.1M, 39.9M, 43.0M ops/s → average 41.3M ops/s
- Result: −4.2% (slight regression)
C6-heavy mid/smallmid (bench_mid_large_mt_hakmem 1 100000 400 1):
- v3 OFF (baseline): 13.8M, 15.2M, 14.5M ops/s → average 14.5M ops/s
- v3 ON (optimized): 15.5M, 15.2M, 14.0M ops/s → average 14.9M ops/s
- Result: +2.8% (noise to slight improvement)
Stability:
- ✅ Compiles and links
- ✅ No SEGV/assert
- ✅ With v3 OFF, the existing path is kept (completely unchanged)
Conclusion:
- Mixed shows a slight regression (−4%), so v3 stays a research box (default OFF)
- C6-heavy shows a slight improvement (+3%), but within noise
- The snapshot infrastructure works correctly and is useful groundwork for future optimization
- Phase v3-3 will consider inlining the v6 hot path and optimizing route dispatch

Phase PERF-ULTRA-REBASE-1 done (2025-12-11)
Goal: measure CPU hot paths with C4-C7 ULTRA all enabled
Measurement conditions:
- ENV: HAKMEM_TINY_C4/C5/C6/C7_ULTRA_FREE_ENABLED=1 (all ON)
- v6/v5/v4/free-front-v3 are OFF (research boxes)
- Workload: Mixed 16-1024B, 10M cycles, ws=8192
- Throughput: 31.61M ops/s
Hot path analysis (inside the allocator, self%):

| Rank | Function/path | self% | Category |
|---|---|---|---|
| 🔴 #1 | C7 ULTRA alloc | 7.66% | New largest bottleneck |
| #2 | C4-C7 ULTRA free group | 5.41% | alloc-free cycle |
| #3 | so_alloc family (v3 backend) | 3.60% | mid-size alloc |
| #4 | page_of/segment lookup | 2.74% | pointer resolution |
| #5 | gate/front prologue | 2.51% | Already improved |
| #6 | so_free family | 2.47% | - |
| #7 | ss_map_lookup | 0.79% | Already greatly improved |

Key findings:
- C7 ULTRA alloc is clearly the largest bottleneck; gate/front and header handling are already thin enough
- Header writes are invisible (< 0.17%): the savings on the ULTRA path are showing
- gate/front is already within the acceptable range (2.51%), improved in earlier phases
Strategic conclusion:
- Stop adding new generations like v6/v5/v4. Switch to a phase of "thinning the internals of the C7/C4/C5/C6 ULTRA boxes that already pay off".
- Cutting C7 ULTRA alloc from 7.66% to 5-6% should yield roughly 2-3% overall
Details: see docs/analysis/TINY_CPU_HOTPATH_USERLAND_ANALYSIS.md
Phase PERF-ULTRA-ALLOC-OPT-1 plan (to be implemented)
Goal: speed up the alloc path by optimizing the internals of C7 ULTRA alloc (7.66%)
Target: straighten the hot path of tiny_c7_ultra_alloc()
Implementation strategy:
- Straighten the TLS hit path
  - Check that no env check / snapshot fetch remains
  - Make the fast path fully straight-line (minimal branching)
- Optimize the TLS freelist layout
  - Check that it fits in one cache line
  - Optimize the placement of alloc hot data (freelist[], count)
- Review segment/page_meta accesses
  - Confirm that segment learning / page_meta access really happens only on the slow path
  - Confirm that the hot path has no extra memory accesses
Measurement strategy:
- A/B tests on both C7-only and Mixed (enabler: HAKMEM_TINY_C7_ULTRA_FREE_ENABLED=1)
- Use perf to check whether self% drops from 7.66% to 5-6%
- Measure the throughput gain
Expected: 5-10% reduction on the alloc path
Next step: after implementation, verify the effect with a fresh perf measurement
次ãã§ãŒãºåè£ïŒæ±ºå®ä¿çäžïŒ
å®è£ äºå®ãã§ãŒãº
-
Phase PERF-ULTRA-ALLOC-OPT-1 (å³åº§å®è£ )
- C7 ULTRA alloc å éšæé©å
- ç®æš: 7.66% â 5-6%
- æåŸ : å šäœã§ 2-3% æ¹å
-
Phase PERF-ULTRA-ALLOC-OPT-2 (åŸç¶)
- C4-C7 ULTRA free矀ïŒ5.41%ïŒã®è»œéå
- page_of / segmentå€å®ãšã®é£æºæé©å
ç ç©¶ç®±ïŒåŸåããåœé¢ã¯ OFFïŒ
- C3/C2 ULTRA: legacy å°ããïŒ4% æªæºïŒã®ã« TLS å¢å ã§ L1 æ±æãªã¹ã¯
- v6/v5/v4 æ¡åŒµ: æ¢å v1/pool ããå€§å¹ ã«é ããæ°äžä»£è¿œå ã¯çŸæ®µéã§ã¯ååž°èªçº
- FREE-FRONT-V3-2: 以å -4% ååž°ããã£ããããULTRA æŽååŸã«åæ€èš
å®è£ ããªã·ãŒå€æïŒéèŠïŒ
ãããŸã§ïŒãã§ãŒãº 4-4 ãŸã§ïŒ
- æ°ããç®±ãäžä»£ïŒv4/v5/v6/free-front-v3 çïŒãå¢ãã
- åœãããåºããæ¬ç·åãã
ä»åŸïŒPERF-ULTRA-ALLOC-OPT 以éïŒ
- æ¢ã«åœãããåºãŠããç®±ïŒC4-C7 ULTRAïŒã®äžèº«ã现ããåã
- æ°äžä»£è¿œå ã¯é¿ããïŒL1 ãã£ãã·ã¥æ±æãè€é床å¢å ã®ãªã¹ã¯ïŒ
- hotpath åæ â ãã³ãã€ã³ãæé©åã®ãµã€ã¯ã«ãåã
Phase PERF-ULTRA-ALLOC-OPT-1 implementation attempt (2025-12-11)
Goal: straighten the hot path of C7 ULTRA alloc (currently 7.66% self%) and bring it down to 5-6%
Implementation:
- New files:
  - `core/box/tiny_c7_ultra_free_env_box.h`: ENV gate (HAKMEM_TINY_C7_ULTRA_FREE_ENABLED, default ON)
  - `core/box/tiny_c7_ultra_free_box.h`: TLS structure (TinyC7UltraFreeTLS) with optimized layout (count first)
  - `core/box/tiny_c7_ultra_free_box.c`: tiny_c7_ultra_alloc_fast() / tiny_c7_ultra_free_fast() implementation
- Modified files:
  - `core/front/malloc_tiny_fast.h`: integrates the new C7 ULTRA alloc/free fast path
  - `core/box/free_path_stats_box.h`: adds c7_ultra_free_fast / c7_ultra_alloc_hit counters
  - `Makefile`: adds tiny_c7_ultra_free_box.o
Design intent:
- Apply the same "parasitic TLS cache" pattern used by the C4/C5/C6 ULTRA free paths to C7
- Speed up alloc/free with a TLS freelist (128 slots)
- Put the hot field (count) at the start of the struct for better L1 cache locality
- Keep the existing C7 ULTRA (UF-3) as the cold path
Implementation problems:
- Segment lookup problem:
  - tiny_c7_ultra_segment_from_ptr() was observed to always return NULL
  - Tried both the BASE pointer and the USER pointer; neither fixed it
  - Possibly an initialization-timing or visibility problem with g_ultra_seg (a TLS variable)
- TLS cache not working:
  - The c7_ultra_free_fast counter in FREE_PATH_STAT is always 0
  - The c7_ultra_alloc_hit counter is also always 0
  - Bypassing the segment check completely did not help
  - Not a single push/pop to the TLS cache succeeded
- Integration complexity:
  - The existing C7 ULTRA (UF-1/UF-2/UF-3) and the new implementation use different ENV variables
  - HAKMEM_TINY_C7_ULTRA_ENABLED (existing) vs HAKMEM_TINY_C7_ULTRA_FREE_ENABLED (new)
  - The existing implementation in tiny_c7_ultra.c has its own TLS freelist
Measurements:
- Build: succeeds (warnings only)
- Sanity test: passes (no SEGV/assert)
- Throughput: ~44M ops/s (same as baseline, no improvement)
- perf self%: 7.66% (unchanged; the optimization is effectively not applied)
Analysis and discussion:
- Possible root causes:
  - The existing C7 ULTRA implementation (tiny_c7_ultra.c) has its own TLS state and segment management
  - The newly created TLS cache is not integrated with the existing implementation
  - Segment lookup does not behave as expected (g_ultra_seg initialization/visibility problem)
- The approach needs revisiting:
  - Current: a separate parallel system next to the existing C7 ULTRA (the C4/C5/C6 pattern)
  - Proposal: optimize tiny_c7_ultra_alloc() in the existing tiny_c7_ultra.c directly
  - Reason: C7 ULTRA already has a dedicated segment and TLS, i.e. it is a self-contained subsystem
- Recommended next steps:
  - Option A: optimize the internals of tiny_c7_ultra_alloc() in tiny_c7_ultra.c directly
    - Hoist the ENV check out of the hot path
    - Straighten TLS freelist access
    - Remove unnecessary branches
  - Option B: fix the segment lookup problem in the current implementation
    - Investigate the g_ultra_seg initialization timing
    - Detailed tracing in a debug build
    - Check integration with the segment registry
Status: not done (needs redesign)
Lessons:
- Unlike C4/C5/C6, C7 ULTRA is a self-contained system that already has its own segment management and TLS
- The "parasitic" pattern assumes it can piggyback on an existing allocator, which does not fit the independent C7 ULTRA
- Direct optimization (hoisting the ENV check, fewer branches) is likely the better approach
Implications for the next phase:
- Put Phase PERF-ULTRA-ALLOC-OPT-1 on hold and reconsider the approach
- Profile tiny_c7_ultra_alloc() in tiny_c7_ultra.c directly to pinpoint the hot path
- Do the ENV-check hoisting / branch reduction / TLS-access minimization inside the existing code
Phase PERF-ULTRA-ALLOC-OPT-1 implementation done (2025-12-11)
Optimizing C7 ULTRA alloc inside tiny_c7_ultra.c left both self% and throughput nearly unchanged. Going further involves the refill/path design, so this line of work is stopped here for now.
Phase PERF-ULTRA-FREE-OPT-1 implementation done (2025-12-11)
Implementation:
- Unified C4–C7 ULTRA free into pure TLS push + cold segment learning
- C4/C5/C6 ULTRA were already optimized (through the unified legacy fallback)
- Aligned C7 ULTRA free with the same pattern (likely/unlikely + added FREE_PATH_STAT_INC)
- base/user conversion is already unified through the tiny_ptr_convert_box.h macros
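
Here is a sketch of "pure TLS push + cold segment learning". The free fast path is one range check plus a push, and the segment bounds are recorded on a cold path such as refill. Unlike the 1b change described later, this sketch does not assume the bounds are already set: when nothing has been learned yet, frees go to the cold path. The names and the stub cold path are hypothetical. The capacity of 128 is borrowed from the C7 note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define sk_likely(x) __builtin_expect(!!(x), 1)   /* GCC/Clang builtin */

#define SK_TLS_CAP 128   /* borrowed from the 128-slot C7 note; an assumption here */

typedef struct SkUltraTLS {
    unsigned  count;               /* hot field first */
    uintptr_t seg_base, seg_end;   /* learned once, on the cold (refill) path */
    void*     freelist[SK_TLS_CAP];
} SkUltraTLS;

static _Thread_local SkUltraTLS g_tls;
static unsigned long g_cold_frees;

/* Cold path, called from refill: remember which segment this thread serves. */
static void sk_learn_segment(void* seg, size_t seg_size) {
    g_tls.seg_base = (uintptr_t)seg;
    g_tls.seg_end  = (uintptr_t)seg + seg_size;
}

static void sk_cold_free(void* p) { (void)p; g_cold_frees++; }  /* stub for the legacy fallback */

static void sk_ultra_free(void* p) {
    assert(p != NULL);
    uintptr_t a = (uintptr_t)p;
    /* One range check; before learning, base == end == 0 so the check fails. */
    if (sk_likely(a >= g_tls.seg_base && a < g_tls.seg_end && g_tls.count < SK_TLS_CAP)) {
        g_tls.freelist[g_tls.count++] = p;   /* pure TLS push */
        return;
    }
    sk_cold_free(p);
}
```

The design choice is to keep all segment discovery off the free path. Free then touches only the thread's own TLS block.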
宿ž¬å€ (Mixed 16-1024B, 1M iter, ws=400):
- Baseline (C7 ULTRA ã®ã¿): 42.0-42.1M ops/s, legacy_fb=266,943 (49.2%)
- Optimized (C4-C7 ULTRA å šæå¹): 45.7-47.0M ops/s, legacy_fb=26,025 (4.8%)
- æ¹å: +8.8-11.7% (å¹³å +9.3%, çŽ +4M ops/s)
FREE_PATH_STATS åæ:
- C7 ULTRA: 275,057 (50.7%, äžå€)
- C6 ULTRA: 0 â 137,319 free + 137,241 alloc (100% ã«ããŒ, legacy C6 å®å šæé€)
- C5 ULTRA: 0 â 68,871 free + 68,827 alloc (100% ã«ããŒ, legacy C5 å®å šæé€)
- C4 ULTRA: 0 â 34,727 free + 34,696 alloc (100% ã«ããŒ, legacy C4 å®å šæé€)
- Legacy fallback: 266,943 â 26,025 (-90.2%, C2/C3 ã®ã¿æ®å)
C4/C5/C6-heavy å®å®æ§ç¢ºèª:
- C4-heavy (65-128B): 55.0M ops/s, SEGV/assert ãªã
- C5-heavy (129-256B): 56.5M ops/s, SEGV/assert ãªã
- C6-heavy (257-768B): 16.9M ops/s, SEGV/assert ãªã
è©äŸ¡: ç®æšéæ
- Legacy 49% â 5% ã«åæžïŒâ90%ïŒ
- C4/C5/C6 ULTRA ã«ãã Mixed throughput +9.3%
- å šã¯ã©ã¹ïŒC4-C7ïŒã§çµ±äžããã TLS push ãã¿ãŒã³ç¢ºç«

Phase PERF-ULTRA-REBASE-3: re-measurement with correct parameters (2025-12-11)
Problem: in Phase REBASE-2, iters=1M, ws=400 was too light, so the ULTRA functions were invisible (only 238 samples). This phase reruns with correct parameters.
Fix: iters=10M, ws=8192 (the same parameters as Phase REBASE-1)
Mixed 16-1024B hot path (top self%, 1890 samples)

| Rank | Function | self% | Category |
|---|---|---|---|
| #1 | free | 29.22% | free dispatcher |
| #2 | main | 19.27% | benchmark overhead |
| #3 | tiny_alloc_gate_fast | 18.17% | alloc gate |
| #4 | tiny_c7_ultra_refill | 6.92% | C7 ULTRA refill |
| #5 | malloc | 5.00% | malloc dispatcher |
| #6 | tiny_region_id_write_header (lto_priv) | 4.29% | header write |
| #7 | hak_super_lookup | 2.90% | segment lookup |
| #8 | hak_free_at | 2.36% | free routing |
| #9 | so_free | 2.60% | v3 free |
| #10 | so_alloc_fast | 2.46% | v3 alloc |

Throughput:
- Mixed 16-1024B: 30.6M ops/s (10M iter, ws=8192)
- C6-heavy 257-768B: 17.0M ops/s (10M iter, ws=8192)
C6-heavy hot path (top self%, 3027 samples)

| Rank | Function | self% | Category |
|---|---|---|---|
| #1 | worker_run | 10.66% | benchmark loop |
| #2 | free | 25.13% | free dispatcher |
| #3 | hak_free_at | 19.89% | free routing |
| #4 | hak_pool_free_v1_impl | 10.16% | pool v1 free |
| #5 | hak_pool_try_alloc_v1_impl | 10.95% | pool v1 alloc |
| #6 | pthread_once | 5.94% | initialization |
| #7 | hak_pool_free_fast_v2_impl | 3.94% | pool v2 fallback |
| #8 | hak_super_lookup | 4.39% | segment lookup |
| #9 | malloc | 3.77% | malloc dispatcher |
| #10 | hak_pool_try_alloc (part) | 0.66% | pool alloc slow |

Analysis
Changes on Mixed 16-1024B:
- free: 29.22% (the dispatcher part outside the benchmark)
- tiny_alloc_gate_fast: 18.17% (matches the earlier REBASE-1 measurement)
- C7 ULTRA refill: 6.92% (7.66% in REBASE-1, within workload-dependent variation)
- C4-C7 ULTRA free group: individually invisible (< 1% each), a few percent in total
- so_alloc family: 2.46% (so_alloc_fast) + 1.16% (so_alloc) = about 3.62%
- page_of/segment: hak_super_lookup 2.90%
Situation on C6-heavy:
- The pool v1 path dominates: hak_pool_free_v1_impl (10.16%) + hak_pool_try_alloc_v1_impl (10.95%)
- hak_free_at: 19.89% (large free routing overhead)
- hak_super_lookup: 4.39% (segment lookup)
- C6-heavy uses the pool v1 path exclusively (consistent with the earlier FREE_PATH_STATS analysis)
Next bottlenecks confirmed
On Mixed:
- The free dispatcher as a whole (29.22%) is the largest
- tiny_alloc_gate_fast (18.17%) is second
- C7 ULTRA refill (6.92%) is already comparatively thin
On C6-heavy:
- hak_free_at (19.89%) is the largest bottleneck inside the allocator
- pool v1 alloc/free (about 10% each) are structural costs
- hak_super_lookup (4.39%) also has room for reduction
Next phase candidates
- Option A: free dispatcher optimization (for Mixed)
  - Optimize the routing logic inside free()
  - Reduce branches in hak_free_at
  - Expected: free on Mixed from 29% to about 25% (+1-2M ops/s)
- Option B: alloc gate optimization (for Mixed)
  - Optimize tiny_alloc_gate_fast (18.17%) internally
  - Straighten class determination and routing
  - Expected: alloc gate on Mixed from 18% to about 15% (+1-2M ops/s)
- Option C: C6-heavy mid/pool redesign (for C6)
  - Add a C6-dedicated fast path to hak_free_at (19.89%)
  - Reduce pool v1 lookup overhead
  - Expected: C6-heavy from 17M to 20-25M ops/s
Recommendation: Option A or B (Mixed is the mainline). C6-heavy will be handled in a separate mid redesign phase.
Next phase decision:
- Mixed: free dispatcher ≈29%, alloc gate ≈18%, C7 ULTRA refill ≈6.9%
- Next, FREE-DISPATCHER-OPT-1 will thin out the hak_free_at routing layer.
Generated files
- `/mnt/workdisk/public_share/hakmem/perf_ultra_mixed_v3.txt`: complete perf report for Mixed 16-1024B (1890 samples)
- `/mnt/workdisk/public_share/hakmem/perf_ultra_c6_v3.txt`: complete perf report for C6-heavy (3027 samples)
- `/mnt/workdisk/public_share/hakmem/CURRENT_TASK_PERF_REBASE3.md`: detailed report

Phase SO-BACKEND-OPT-1: v3 backend (so_alloc/so_free) breakdown phase ✅ Done (2025-12-11)
Goal
PERF-ULTRA-REFILL-OPT-1a/1b sped up C7 ULTRA refill by +11%. After that, the next bottleneck turned out to be the v3 backend (so_alloc/so_free), at roughly 6%.
- On Mixed 16-1024B: so_alloc_fast (2.46%) + so_free (2.47%) + so_alloc (1.21%) = about 6.1% in total
- Break this down further and identify what to optimize in the next phase (per-class hot paths, memory accesses, branches)
Implementation (done)
✅ Task 1: documentation update
- Added a Phase SO-BACKEND-OPT-1 section to CURRENT_TASK.md
- Added a "Phase SO-BACKEND-OPT-1: v3 Backend bottleneck analysis" section to docs/analysis/SMALLOBJECT_HOTBOX_V3_DESIGN.md
  - Current understanding: perf breakdown of the v3 backend (alloc 2.46%, free 2.47%, alloc_slow 1.21% = 6.14% in total)
  - Implementation plan: define a detailed stats struct
✅ Task 2: stats implementation for the v3 backend
- ENV: uses `HAKMEM_SO_V3_STATS` (existing, default 0)
- New fields added to core/box/smallobject_hotbox_v3_box.h:
  - `alloc_current_hit`: pop from the current page
  - `alloc_partial_hit`: pop from a partial page
  - `free_current`: push to the current page
  - `free_partial`: push to a partial page
  - `free_retire`: page retire
- Helper functions implemented in core/smallobject_hotbox_v3.c (6 of them):
  - `so_v3_record_alloc_current_hit()`, `so_v3_record_alloc_partial_hit()`, `so_v3_record_free_current()`, `so_v3_record_free_partial()`, `so_v3_record_free_retire()`, etc.
- Hooked into so_alloc_fast / so_free_fast
- Added `[ALLOC_DETAIL]` / `[FREE_DETAIL]` sections to the destructor output
✅ Task 3: measurements on Mixed / C7-only
- C7-only (1024B, 1M iter, ws=400, ULTRA disabled):
  - alloc_current_hit=550095 (99.99%), alloc_partial_hit=5 (0.001%)
  - alloc_refill=5045 (0.9%), fallback=0
  - free_retire=349 (0.09%), fallback=0, page_of_fail=0 (perfect)
  - Throughput: 42.4M ops/s (baseline 62.9M with ULTRA)
- Mixed 16–1024B (1M iter, ws=400, ULTRA disabled):
  - alloc_current_hit=275089 (100%), alloc_partial_hit=0
  - alloc_refill=2340 (0.85%), fallback=0
  - free_retire=142 (0.07%), fallback=0, page_of_fail=0 (perfect)
  - Throughput: 35.9M ops/s (baseline 43.4M with ULTRA)
✅ Task 4: measurement analysis and next-phase candidates
- Alloc path: alloc_current_hit ≈100%, already optimized → page locality is ideal
- Free path: free_retire ≈0.1%, already optimized → little page churn
- Page lookup: page_of_fail = 0, robust → no corner cases
- Conclusion: the logic in the v3 backend (page selection, retire) is already optimized
- Bottleneck identified: the "internal cost" of so_alloc/so_free (header writes, memcpy, branches) is the main source of the ~6% overhead
Phase SO-BACKEND-OPT-2 candidates (next phase)
Implementation options based on the measurements (by priority):

| Candidate | Contents | Expected effect | Difficulty |
|---|---|---|---|
| Fewer header writes | initialize in bulk at carve time (light mode) | 1-2% | Low |
| Freelist carve optimization | return a pre-carved freelist from the Cold IF | <1% | Medium |
| Fewer branches | straighten the hot path, use unlikely() | 0.5-1% | Medium |
| Less memcpy | optimize 8-byte stores with inline asm or atomics | 0.5-1% | High |

Recommendation: before Phase SO-BACKEND-OPT-2, profile so_alloc_fast/so_free_fast in detail with perf (cycles:u). Ideally this should be part of the existing CPU hot path analysis.
Build and test results
- ✅ Release build succeeds (the "unused variable `front_snap`" warning is pre-existing)
- ✅ Mixed 16-1024B test passes (no SEGV/assert)
- ✅ C7-only test passes
- ✅ Stats output verified

Phase FREE-DISPATCHER-OPT-1: free dispatcher statistics (2025-12-11)
Goal: break down the free dispatcher (29%) in more detail
- Share of domain decisions (tiny/mid/large)
- Share of route decisions (ULTRA/legacy/pool/v6)
- Number of ENV checks / route_for_class calls
Policy: add statistics counters without changing behavior. Decide on an optimization in the next phase (OPT-2).
Implementation:
- Added a FreeDispatchStats struct (ENV-gated, default OFF)
- Counters hooked into hak_free_at / fg_classify_domain / tiny_free_gate
- No behavior change (measurement only)
ENV: enabled with HAKMEM_FREE_DISPATCH_STATS=1 (default 0)
Measurements:
- Mixed: total=8,081, route_calls=267,967, env_checks=9
  - Most calls return early because of BENCH_FAST_FRONT
  - route_for_class is called mainly from the alloc side
  - ENV checks happen only 9 times, at initialization
- C6-heavy: total=500,099, route_calls=1,034, env_checks=9
  - Many frees reach fg_classify_domain
  - route_for_class calls are minimal (snapshot effect)
Conclusion:
- ENV checks are already well optimized (initialization only)
- route_for_class is called mainly on the alloc side; the free side is O(1) through the snapshot
- The next phase (OPT-2) will consider a different approach (e.g. deciding the domain earlier)
Finding: FREE_DISPATCH_STATS shows that, on the free side, ENV checks and route lookups happen only at initialization. The route_calls=267,967 come almost entirely from the alloc side.
Phase ALLOC-GATE-OPT-1: tiny_alloc_gate_fast statistics (2025-12-11)
Goal: break down the alloc gate (18%) in more detail
- Number of size→class conversions
- Number of route_for_class calls
- Number of alloc-side ENV checks
- Per-class distribution (C0–C7)
Policy: add statistics counters without changing behavior. Decide on an optimization in the next phase (OPT-1B).
Implementation:
- Added an AllocGateStats struct (size2class/route/env/class distribution)
- Counters hooked into malloc_tiny_fast
- ENV: HAKMEM_ALLOC_GATE_STATS (default 0)
- No behavior change (measurement only)
Measurements:
- Mixed: total=542,033, size2class=0, route_calls=0, env_checks=275,089, C4-C7=95.2%
  - → size_to_class / route_for_class are completely eliminated (LUT effect)
  - → C4-C7 account for 95% → the ULTRA fast path is effective
  - env_checks ≈ c7_calls → the C7 ULTRA ENV gate is evaluated on every call (structural cost)
- C6-heavy: total=11 → almost nothing goes through malloc_tiny_fast (mid/pool dominate)
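
size2class=0 means no size→class computation runs on the hot path any more. A common way to get there is a lookup table built at init, which the sketch below illustrates. It assumes 8-byte granularity and hypothetical class upper bounds (8, 16, ..., 1024 B), which need not match HAKMEM's real table. The `sk_*` names are also hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical class upper bounds (bytes); HAKMEM's real table may differ. */
static const uint16_t k_class_max[8] = {8, 16, 32, 64, 128, 256, 512, 1024};

#define SK_MAX_SMALL 1024
#define SK_LUT_SHIFT 3                        /* 8-byte granularity */
static uint8_t g_size2class[(SK_MAX_SMALL >> SK_LUT_SHIFT) + 1];

/* Built once at init. Slot i covers sizes (8*(i-1), 8*i]; every bound is a
 * multiple of 8, so all sizes in one slot share a class. */
static void sk_lut_init(void) {
    for (size_t i = 0; i < sizeof g_size2class; i++) {
        size_t slot_max = i << SK_LUT_SHIFT;  /* largest size mapped to slot i */
        uint8_t c = 0;
        while (k_class_max[c] < slot_max) c++;
        g_size2class[i] = c;
    }
}

/* Hot path: one compare and one load, no loop. */
static inline int sk_size_to_class(size_t size) {
    if (size > SK_MAX_SMALL) return -1;       /* not a small size */
    return g_size2class[(size + 7) >> SK_LUT_SHIFT];
}
```

With these assumed bounds, an 80-byte request maps to class 4, in line with "C4-only (80B, class 4)" above.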
Conclusion:
- ✅ The alloc gate is already well optimized (reduced by LUT + ULTRA)
- ✅ Little room is left for further optimization (env_checks are already lightweight; a few percent at most)
- The next phase targets other bottlenecks such as the free dispatcher (29%) and C7 ULTRA refill (7%)
Details: see docs/analysis/ALLOC_GATE_ANALYSIS.md

Phase PERF-ULTRA-REBASE-4: re-measurement and confirmation (2025-12-11)
Goal: after confirming that the dispatcher and alloc gate are already optimized, take a fresh perf profile
Measurement conditions:
- ENV: all OFF (defaults; no stats, baseline)
- Workload: Mixed 16-1024B, 10M iter, ws=8192
- perf record: -e cycles:u, -F 5000, --call-graph dwarf
Hot path analysis (self%, 1K samples)

| Rank | Function/path | self% | Change |
|---|---|---|---|
| #1 | free | 25.48% | ↓3.74 pt vs REBASE-3 (29.22%) |
| #2 | malloc | 21.13% | ±0% (unchanged) |
| #3 | tiny_c7_ultra_alloc | 7.66% | ±0% (unchanged) |
| #4 | tiny_c7_ultra_free | 3.50% | ↓0.6% (optimization effect) |
| #5 | so_free | 2.47% | (newly visible) |
| #6 | so_alloc_fast | 2.39% | (newly visible) |
| #7 | tiny_c7_ultra_page_of | 1.78% | NEW: refill path |
| #8 | so_alloc | 1.21% | (newly visible) |
| #9 | classify_ptr | 1.15% | (newly visible) |

Statistics (Mixed 1M iter, ws=400)
```
Alloc Gate Stats:
  total=542,019 calls
  size2class=0 calls ✅ (fully eliminated)
  route_calls=0 calls ✅ (fully eliminated)
  env_checks=275,089 (structural cost)
  class distribution: C7=50.8%, C6=25.3%, C5=12.7%, C4=6.4%, C2-C3=4.8%
Free Dispatcher Stats:
  total=8,081 calls
  tiny=0, mid=8,081, large=0 (all on the mid path)
  ultra=0 (ULTRA bypasses the free dispatcher)
  tiny_legacy=7, pool=0, v6=0
  route_calls=267,954 (mostly called from the alloc side)
  env_checks=9 (initialization only)
```
Analysis
Confirmed:
- The dispatcher (25.48%) is already optimized
  - ENV checks run only 9 times (at initialization); the route_for_class calls come almost entirely from the alloc side
  - The 25% is the cost of the function calls themselves (architecture level)
- The alloc gate (21.13%) is already optimized
  - size_to_class = 0 calls (LUT)
  - route_for_class = 0 calls (ULTRA enabled)
  - env_checks = 275K is the C7 ULTRA enable check (unavoidable)
- New bottlenecks:
  - C7 ULTRA refill (tiny_c7_ultra_page_of) is newly visible at 1.78%
  - so_alloc/so_free total about 6% (2.47% + 2.39% + 1.21%)
  - classify_ptr at 1.15%
Throughput
- Mixed 16-1024B: 39.5M ops/s (iters=1M, ws=400)
- Comparison: this is a different workload from REBASE-3's 30.6M ops/s (iters=10M, ws=8192)
次ãã§ãŒãºåè£
Option A: C7 ULTRA refill æé©å
- tiny_c7_ultra_page_of ã 1.78%
- Segment learning / page lookup ã® refill ãã¹ãæé©å
- æåŸ : refill ãã¹åæžã§å šäœ 1-2%
Option B: Architectural Level ã®æé©å
- free dispatcher (25%) + malloc dispatcher (21%) = 46%
- çŸç¶ã¯ C API (malloc/free) ã®åŒã³åºãã³ã¹ã
- äŸ: ããããã¹å šäœã inlined dispatcher ã§åèšèš
- ãªã¹ã¯: å€§èŠæš¡ãªèšèšå€æŽ
Option C: so_alloc/so_free ç³» (~5%) ã®åæž
- v3 backend ã®æé©å
- classify_ptr (1.15%) ã®åæž
- æåŸ : 1-2M ops/s
æšå¥š: Option AïŒC7 ULTRA refillïŒããçæãdispatcher/gate ã® 46% 㯠architecture çãªå¿ èŠã³ã¹ãã§ãé£æåºŠ vs 广ã®èгç¹ããçŸç¶ã¯åãå ¥ããã¹ãã
çµè«
- dispatcher + gate: èš 46% â æ¢ã«æé©åæžã¿ïŒENV/route snapshot åå®äºïŒ
- C7 ULTRA å éš: alloc 7.66% + free 3.50% + refill 1.78% = 12.94%
- 次ã®ã¿ãŒã²ãã: C7 ULTRA refill ãã¹ïŒ1.78%ïŒããã®åæžéå§

Phase PERF-ULTRA-REFILL-OPT-1a/1b implementation done (2025-12-11)
Goal
Optimize the C7 ULTRA refill path (the 1.78% in tiny_c7_ultra_page_of) to raise overall throughput
Implementation
Phase 1a: page size as a macro
```c
// Added to tiny_c7_ultra_segment.c
#define TINY_C7_ULTRA_PAGE_SHIFT 16  // 64KiB = 2^16

// Changed: division -> shift in tiny_c7_ultra_page_of
uint32_t idx = (uint32_t)(offset >> TINY_C7_ULTRA_PAGE_SHIFT);

// Changed: multiplication -> shift in refill/free
tls->seg_end = tls->seg_base + ((size_t)seg->num_pages << TINY_C7_ULTRA_PAGE_SHIFT);
uint8_t* base = (uint8_t*)seg->base + ((size_t)chosen << TINY_C7_ULTRA_PAGE_SHIFT);
```
Phase 1b: moving segment learning
```c
// Before: free called segment_from_ptr() at its start to learn the segment
if (unlikely(tls->seg_base == 0)) {
    seg = tiny_c7_ultra_segment_from_ptr(ptr);  // <- deleted
    ...
}
// After: segment learning moved to alloc refill time
// free assumes seg_base/seg_end are already populated
// (the normal pattern is alloc -> free, so this is safe)
```
Benchmark results
Mixed 16-1024B (1M iter, ws=400):

| Phase | Throughput | Improvement |
|---|---|---|
| Baseline | 39.5M ops/s | baseline |
| Phase 1a | 39.5M ops/s | ±0% (noise) |
| Phase 1b | 42.3M ops/s | +7.1% |
| 3-run average | 43.9M ops/s | +11.1% |

Measured:
- Run 1: 42.9M ops/s
- Run 2: 45.0M ops/s
- Run 3: 43.7M ops/s
Optimization details
1. Effect of division → bit shift
- In tiny_c7_ultra_page_of, `offset / seg->page_size` became `offset >> 16`
- In refill/free, `num_pages * page_size` became a bit shift
- Each division saves ~2-3 cycles × many calls = cumulative effect
2. Effect of removing segment learning from free
- Removed the tiny_c7_ultra_segment_from_ptr() call at the start of free
- Segment learning is already done at alloc refill time
- No effect at all on the normal pattern (alloc → free)
- Saves one segment_from_ptr() call per thread plus one pointer comparison
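
Here is a small check of why the shift rewrite in item 1 is safe. For an unsigned offset and a page size that is a compile-time power of two, `offset >> 16` equals `offset / 65536`, and `idx << 16` equals `idx * 65536`. The function names are hypothetical; only `TINY_C7_ULTRA_PAGE_SHIFT` and its value come from the note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TINY_C7_ULTRA_PAGE_SHIFT 16   /* from the note: 64 KiB = 2^16 */

/* Division form: page_size is a runtime value, so the compiler must emit a divide. */
static uint32_t sk_page_of_div(uintptr_t base, uintptr_t p, size_t page_size) {
    return (uint32_t)((p - base) / page_size);
}

/* Shift form: valid only because the page size is a compile-time power of two
 * and the offset is unsigned. */
static uint32_t sk_page_of_shift(uintptr_t base, uintptr_t p) {
    return (uint32_t)((p - base) >> TINY_C7_ULTRA_PAGE_SHIFT);
}

/* Page base address: the shift replaces idx * page_size. */
static uintptr_t sk_page_base(uintptr_t seg_base, uint32_t idx) {
    return seg_base + ((uintptr_t)idx << TINY_C7_ULTRA_PAGE_SHIFT);
}
```

The trade-off is that the page size becomes a build-time constant. A segment with a different page size would need its own shift.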
Combined effect
- Phase 1a: a few percent (hard to see, but cumulative)
- Phase 1b: a visible reduction (the unlikely cold path is gone entirely)
- Total: +11.1%, the largest improvement after the dispatch/gate work (46%)
Next phase
Current results:
- +11% from work inside C7 ULTRA
- dispatcher/gate (46%) is already optimized
- New bottleneck: so_alloc/so_free (about 6% in total)
Candidates:
- Option A: optimize so_alloc/so_free → v3 backend
- Option B: reduce classify_ptr (1.15%)
- Option C: new size classes (C3/C2 ULTRA) → risk of TLS/L1 pollution
Recommendation: consider Option A (v3 backend optimization)
Phase v7-2: SmallObject v7 C6-only Implementation ✅ Complete
- SmallSegment_v7: 2MiB segment with TLS slot, free page stack
- ColdIface_v7: Page refill/retire with stat publishing (future Learner)
- HotBox_v7: Full C6-only alloc/free with proper header format (HEADER_MAGIC | class_idx)
- RegionIdBox integration: v7 segment registration for ptr->region lookup
- Free path fix: Early-exit v7 check BEFORE ss_fast_lookup (separate mmap segment)
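The header format named above (HEADER_MAGIC | class_idx) packs a magic value and the class index into a single byte. The minimal sketch below makes three assumptions: a 1-byte header placed immediately before the user pointer, a magic value of 0xA0, and a 4-bit class field. These constants and the helper names are hypothetical and are not taken from the HAKMEM sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the real HEADER_MAGIC may differ. */
#define HEADER_MAGIC_SKETCH      0xA0u
#define HEADER_MAGIC_MASK_SKETCH 0xF0u
#define HEADER_CLASS_MASK_SKETCH 0x0Fu

/* Write the 1-byte header at the block start and return the user pointer just past it. */
static void* header_write_sketch(void* block, uint8_t class_idx) {
    uint8_t* b = (uint8_t*)block;
    b[0] = (uint8_t)(HEADER_MAGIC_SKETCH | (class_idx & HEADER_CLASS_MASK_SKETCH));
    return b + 1;
}

/* Read the class back from a user pointer; returns -1 if the magic bits do not match. */
static int header_read_class_sketch(const void* user) {
    uint8_t h = ((const uint8_t*)user)[-1];
    if ((h & HEADER_MAGIC_MASK_SKETCH) != HEADER_MAGIC_SKETCH) return -1;
    return (int)(h & HEADER_CLASS_MASK_SKETCH);
}
```

In this sketch, the magic check lets a free path reject a pointer with a foreign header before it trusts the class bits.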
Benchmark results (C6, 400-510B, 500K iter)
| Mode | Throughput | Cost |
|---|---|---|
| v7 OFF (legacy) | 58.6M ops/s | baseline |
| v7 ON (C6-only) | 54.5M ops/s | -7% overhead |
v7 stats: alloc=275104 free=275104 refill=1360 retire=1360 (perfect balance)
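The "perfect balance" on the stats line reduces to two equalities. The sketch below uses C11 atomics with one counter per event. V7StatsSketch and its helpers are hypothetical, and the real ColdIface_v7 stat publishing may be structured differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counters mirroring the "v7 stats" line. */
typedef struct {
    atomic_ullong allocs, frees, refills, retires;
} V7StatsSketch;

/* Relaxed is enough: each counter is independent and only compared at report time. */
static void v7_stat_inc(atomic_ullong* c) {
    atomic_fetch_add_explicit(c, 1, memory_order_relaxed);
}

/* Every allocated block was freed and every refilled page was retired.
 * Only meaningful once all threads have stopped allocating (e.g. at exit). */
static bool v7_stats_balanced(V7StatsSketch* s) {
    return atomic_load(&s->allocs) == atomic_load(&s->frees) &&
           atomic_load(&s->refills) == atomic_load(&s->retires);
}
```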
Analysis
- The -7% overhead comes mainly from the RegionIdBox binary search plus segment validation
- v7 stays OFF as a research box (not used in the benchmark profiles)
- Phase v7-3 will cut the RegionIdBox overhead with a TLS fast-path cache
Next: Phase v7-3: C6 TLS Fast Path + Page Metadata Cache
Goal: reduce the RegionIdBox overhead to improve performance with v7 ON
Approach:
- Add TLS segment base/end/ptr to SmallHeapCtx_v7 → "most" frees fall inside the TLS range
- same-page page_meta TLS cache → expected 1-2% gain
- Restrict RegionIdBox to pointers outside the TLS range → used only to classify POOL/LEGACY/ULTRA
- Keep C6-only (C5/C4 to be considered later)
Phase v7-3: SmallObject v7 TLS Fast Path Optimization ✅ Complete
Implementation locations:
- /mnt/workdisk/public_share/hakmem/core/box/smallobject_cold_iface_v7_box.h: SmallHeapCtx_v7 struct change
- /mnt/workdisk/public_share/hakmem/core/smallobject_cold_iface_v7.c: TLS hint initialization
- /mnt/workdisk/public_share/hakmem/core/box/smallobject_hotbox_v7_box.h: TLS optimization of the free fast path
倿Žç¹:
-
SmallHeapCtx_v7 æ¡åŒµ:
typedef struct SmallHeapCtx_v7 { SmallClassHeap_v7 cls[HAK_SMALL_NUM_CLASSES_V7]; SmallSegment_v7* segment; // Phase v7-3: TLS segment fast hint uintptr_t tls_seg_base; uintptr_t tls_seg_end; // Phase v7-3: same-page cache (removed - not effective) // uintptr_t last_page_base/end/meta; } SmallHeapCtx_v7; -
TLS segment hint èšå® (
cold_v7_ensure_segment()):ctx->segment = seg; ctx->tls_seg_base = seg->base; ctx->tls_seg_end = seg->base + SMALL_SEGMENT_V7_SIZE; -
free fast path æé©å (
small_heap_free_fast_v7()):// Path 1: TLS segment hit (most common) if (addr >= ctx->tls_seg_base && addr < ctx->tls_seg_end) { // Direct page_idx calculation (skip RegionIdBox) size_t page_idx = (addr - ctx->tls_seg_base) >> SMALL_PAGE_V7_SHIFT; // ... fast path ... } // Path 2: RegionIdBox fallback (only for non-TLS pointers) regionid_fallback: RegionLookupV6 lk = region_id_lookup_v6(ptr); // ... cold path ...
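The two-path structure above can be sketched on its own. The sketch stands in for RegionIdBox with a sorted array of non-overlapping regions and assumes a 64 KiB page (SMALL_PAGE_V7_SHIFT_SKETCH = 16); the real SMALL_PAGE_V7_SHIFT value is not given here. All *_sketch and *Sketch names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SMALL_PAGE_V7_SHIFT_SKETCH 16  /* assumed 64 KiB pages */

typedef struct { uintptr_t base, end; int kind; } RegionSketch;  /* hypothetical registry entry */

typedef struct {
    uintptr_t tls_seg_base, tls_seg_end;  /* mirrors the Phase v7-3 TLS hint */
    const RegionSketch* regions;          /* sorted by base, non-overlapping */
    size_t nregions;
} HeapCtxSketch;

/* Stand-in for RegionIdBox: binary search over the sorted regions, O(log N). */
static const RegionSketch* region_lookup_sketch(const HeapCtxSketch* ctx, uintptr_t a) {
    size_t lo = 0, hi = ctx->nregions;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a < ctx->regions[mid].base)      hi = mid;
        else if (a >= ctx->regions[mid].end) lo = mid + 1;
        else return &ctx->regions[mid];
    }
    return NULL;
}

/* Path 1 (TLS hit): returns true and sets *page_idx.
 * Path 2 (fallback): returns false and sets *kind to the region kind, or -1 if unknown. */
static bool free_classify_sketch(const HeapCtxSketch* ctx, const void* ptr,
                                 size_t* page_idx, int* kind) {
    uintptr_t addr = (uintptr_t)ptr;
    if (addr >= ctx->tls_seg_base && addr < ctx->tls_seg_end) {  /* 2 comparisons */
        *page_idx = (addr - ctx->tls_seg_base) >> SMALL_PAGE_V7_SHIFT_SKETCH;
        return true;
    }
    const RegionSketch* r = region_lookup_sketch(ctx, addr);     /* cold path */
    *kind = r ? r->kind : -1;
    return false;
}
```

Path 1 costs two comparisons and a shift. Path 2 is an O(log N) search, so keeping most frees inside the TLS range is where the savings come from.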
Benchmark results (C6, 400-510B, 500K iter)
v7 OFF baseline:
- Run 1: 58.5M ops/s
- Run 2: 58.0M ops/s
- Run 3: 60.0M ops/s
- Average: 58.8M ops/s
v7 ON (Phase v7-3 optimized):
- Run 1: 57.5M ops/s
- Run 2: 57.2M ops/s
- Run 3: 54.3M ops/s
- Average: 56.3M ops/s
Result: -4.3% overhead (vs -7% in Phase v7-2)
Analysis
Improvement:
- Phase v7-2: -7.0% overhead (54.5M ops/s vs 58.6M baseline)
- Phase v7-3: -4.3% overhead (56.3M ops/s vs 58.8M baseline)
- Overhead reduced by ≈39% (7.0% → 4.3%)
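Worked from the runs above (figures in M ops/s):

$$
\bar{T}_{\text{off}} = \frac{58.5 + 58.0 + 60.0}{3} \approx 58.8,\qquad
\bar{T}_{\text{on}} = \frac{57.5 + 57.2 + 54.3}{3} \approx 56.3
$$

$$
\frac{56.3}{58.8} - 1 \approx -4.3\%,\qquad
\frac{7.0 - 4.3}{7.0} \approx 0.386
$$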
Technical details:
- TLS segment bounds check:
  - Most freed blocks were allocated from the TLS segment → high hit rate
  - Simple range check (2 comparisons) vs RegionIdBox binary search (O(log N))
  - Page index calculation: bit shift (fast) vs segment traversal
- Same-page cache removed:
  - The initial implementation included a last_page_meta cache
  - Profiling showed negligible benefit (< 1%)
  - Removed to reduce branch complexity and TLS cache pressure
- Remaining overhead:
  - The 4.3% overhead comes primarily from:
    - Extra validation (capacity, class_idx checks)
    - Page metadata access (vs direct SuperSlab metadata)
    - RegionIdBox fallback on TLS miss (rare but present)
Next steps
Decision on v7 usage:
- A -4.3% overhead is within the acceptable range (a success for a research box)
- The production profile keeps v7 OFF (legacy SuperSlab is used)
- Future: adding the C5/C4 classes widens coverage and dilutes the overhead
Phase v7-4 candidates:
- C5 (256B) / C4 (128B) support → wider coverage reduces the relative overhead
- Page metadata layout optimization → cache line alignment
- Remote free support → preparation for multi-threaded workloads
Phase v7-4: Introducing the Policy Box (reworking the front end) ✅ Complete
Implementation locations:
- /mnt/workdisk/public_share/hakmem/core/box/smallobject_policy_v7_box.h: Policy Box header (new)
- /mnt/workdisk/public_share/hakmem/core/smallobject_policy_v7.c: Policy Box implementation (new)
- /mnt/workdisk/public_share/hakmem/core/front/malloc_tiny_fast.h: v7 routing now goes through the Policy Box
- /mnt/workdisk/public_share/hakmem/Makefile: added core/smallobject_policy_v7.o to the OBJS list
Design intent: reduce the front end to a single layer (size → class → route_kind → switch) and consolidate route decisions in the Policy Box. SmallPolicyV7 sits at L3 of the Box Theory layering and centralizes the choice among ULTRA/v7/MID_v3/LEGACY.
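The sketch below shows the single-layer front (size → class → route_kind → switch). It assumes a hypothetical size-to-class mapping in which class k serves blocks of 8 << k bytes, including a 1-byte header, so C6 covers up to 511 B and C7 up to 1023 B. The returned strings stand in for calls into the real ULTRA/v7/MID_v3/LEGACY allocators, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    SMALL_ROUTE_ULTRA,
    SMALL_ROUTE_V7,
    SMALL_ROUTE_MID_V3,
    SMALL_ROUTE_LEGACY,
} SmallRouteKind;

typedef struct SmallPolicyV7 {
    SmallRouteKind route_kind[8];
} SmallPolicyV7;

/* Hypothetical mapping: class k holds (8 << k)-byte blocks, one byte of which is header.
 * Returns -1 for sizes that are not small objects. */
static int size_to_class_sketch(size_t size) {
    if (size >= 1024) return -1;  /* C7 tops out at 1023 user bytes here */
    size_t need = size + 1;       /* user bytes + 1-byte header */
    size_t cap = 8;
    int cls = 0;
    while (cap < need) { cap <<= 1; cls++; }
    return cls;
}

/* The whole front: size -> class -> route_kind -> switch. */
static const char* front_route_sketch(const SmallPolicyV7* pol, size_t size) {
    int cls = size_to_class_sketch(size);
    if (cls < 0) return "large";  /* handled outside the small-object front */
    switch (pol->route_kind[cls]) {
    case SMALL_ROUTE_ULTRA:  return "ultra";
    case SMALL_ROUTE_V7:     return "v7";
    case SMALL_ROUTE_MID_V3: return "mid_v3";
    case SMALL_ROUTE_LEGACY: return "legacy";
    }
    return "legacy";  /* unreachable with a valid policy */
}
```

All per-class decisions live in the policy table, so the front itself contains no ENV checks.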
Policy Box implementation
1. Route Kind Enum (L0/L1/L1' layer selection):
```c
typedef enum {
    SMALL_ROUTE_ULTRA,   // L0: C4-C7 ULTRA (FROZEN)
    SMALL_ROUTE_V7,      // L1: SmallObject v7 (research box)
    SMALL_ROUTE_MID_V3,  // L1': MID v3 (257-768B mid/small)
    SMALL_ROUTE_LEGACY,  // L1': TinyHeap v1 / Pool v1 (fallback)
} SmallRouteKind;
```
2. Policy Snapshot Structure:
```c
typedef struct SmallPolicyV7 {
    SmallRouteKind route_kind[8];  // C0-C7 routing decision
} SmallPolicyV7;
```
3. Policy API:
```c
/// Get policy snapshot (read-only, TLS cached)
const SmallPolicyV7* small_policy_v7_snapshot(void);

/// Initialize policy from ENV variables (called once at startup)
/// Priority: ULTRA > v7 > MID_v3 > LEGACY
void small_policy_v7_init_from_env(SmallPolicyV7* policy);

/// Get route kind name for debugging
const char* small_route_kind_name(SmallRouteKind kind);
```
ENV priority (fixed)
Priority 1: ULTRA (highest)
- HAKMEM_TINY_C7_ULTRA_ENABLED (default ON) → C7 ULTRA
- HAKMEM_TINY_C6_ULTRA_FREE_ENABLED → C6 ULTRA (free-only, reserved for future extension)
- Future: HAKMEM_TINY_C4_ULTRA_ENABLED/C5_ULTRA_ENABLED
Priority 2: SmallObject v7 (research box)
- HAKMEM_SMALL_HEAP_V7_ENABLED → enables v7
- HAKMEM_SMALL_HEAP_V7_CLASSES (default 0x40 = C6) → class bitmask routed to v7
Priority 3: MID_v3 (mid/small range)
- HAKMEM_MID_V3_ENABLED → enables MID_v3
- HAKMEM_MID_V3_CLASSES (default 0x60 = C5-C6) → class bitmask routed to MID_v3
Priority 4: LEGACY (fallback)
- Default for every class not covered above
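The sketch below applies the fixed priority per class. It assumes the ENV values are parsed once into a small struct and that the result is cached per thread. SmallPolicyEnvV7, small_policy_env_read, small_policy_apply and small_policy_snapshot_sketch are hypothetical names, and the enum and struct repeat the definitions above. The free-only C6 ULTRA knob is left out. Unlike the current C6-only v7, the sketch honors any bit in the class masks.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum {
    SMALL_ROUTE_ULTRA,
    SMALL_ROUTE_V7,
    SMALL_ROUTE_MID_V3,
    SMALL_ROUTE_LEGACY,
} SmallRouteKind;

typedef struct SmallPolicyV7 {
    SmallRouteKind route_kind[8];
} SmallPolicyV7;

/* Hypothetical parsed view of the ENV knobs listed above. */
typedef struct SmallPolicyEnvV7 {
    int     c7_ultra;        /* HAKMEM_TINY_C7_ULTRA_ENABLED, default 1 */
    int     v7_enabled;      /* HAKMEM_SMALL_HEAP_V7_ENABLED, default 0 */
    uint8_t v7_classes;      /* HAKMEM_SMALL_HEAP_V7_CLASSES, default 0x40 */
    int     mid_v3_enabled;  /* HAKMEM_MID_V3_ENABLED, default 0 */
    uint8_t mid_v3_classes;  /* HAKMEM_MID_V3_CLASSES, default 0x60 */
} SmallPolicyEnvV7;

static long env_long(const char* name, long dflt) {
    const char* s = getenv(name);
    return (s && *s) ? strtol(s, NULL, 0) : dflt;  /* base 0 accepts "0x40" */
}

static SmallPolicyEnvV7 small_policy_env_read(void) {
    SmallPolicyEnvV7 e;
    e.c7_ultra       = env_long("HAKMEM_TINY_C7_ULTRA_ENABLED", 1) != 0;
    e.v7_enabled     = env_long("HAKMEM_SMALL_HEAP_V7_ENABLED", 0) != 0;
    e.v7_classes     = (uint8_t)env_long("HAKMEM_SMALL_HEAP_V7_CLASSES", 0x40);
    e.mid_v3_enabled = env_long("HAKMEM_MID_V3_ENABLED", 0) != 0;
    e.mid_v3_classes = (uint8_t)env_long("HAKMEM_MID_V3_CLASSES", 0x60);
    return e;
}

/* Fixed priority per class: ULTRA > v7 > MID_v3 > LEGACY. */
static void small_policy_apply(SmallPolicyV7* p, const SmallPolicyEnvV7* e) {
    for (int c = 0; c < 8; c++) {
        uint8_t bit = (uint8_t)(1u << c);
        if (c == 7 && e->c7_ultra)                               p->route_kind[c] = SMALL_ROUTE_ULTRA;
        else if (e->v7_enabled && (e->v7_classes & bit))         p->route_kind[c] = SMALL_ROUTE_V7;
        else if (e->mid_v3_enabled && (e->mid_v3_classes & bit)) p->route_kind[c] = SMALL_ROUTE_MID_V3;
        else                                                     p->route_kind[c] = SMALL_ROUTE_LEGACY;
    }
}

/* Read-only snapshot, built lazily once per thread. */
static const SmallPolicyV7* small_policy_snapshot_sketch(void) {
    static _Thread_local SmallPolicyV7 tls_policy;
    static _Thread_local int tls_ready;
    if (!tls_ready) {
        SmallPolicyEnvV7 e = small_policy_env_read();
        small_policy_apply(&tls_policy, &e);
        tls_ready = 1;
    }
    return &tls_policy;
}
```

Because the snapshot is cached per thread in this sketch, ENV changes made after a thread's first lookup are not seen by that thread.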
Staged front-end migration
alloc path (malloc_tiny_fast.h, lines 227-235):
```c
// Phase v7-4: Check Policy Box for v7 routing (before switch)
const SmallPolicyV7* policy = small_policy_v7_snapshot();
if (policy->route_kind[class_idx] == SMALL_ROUTE_V7) {
    void* v7p = small_heap_alloc_fast_v7_stub(size, (uint8_t)class_idx);
    if (TINY_HOT_LIKELY(v7p != NULL)) {
        return v7p;
    }
    // v7 stub returned NULL -> fallback to legacy
}
```
free path (malloc_tiny_fast.h, lines 408-416):
```c
// Phase v7-4: Check Policy Box for v7 routing (before route lookup)
const SmallPolicyV7* policy = small_policy_v7_snapshot();
if (class_idx == 6 && policy->route_kind[class_idx] == SMALL_ROUTE_V7) {
    if (small_heap_free_fast_v7_stub(ptr, (uint8_t)class_idx)) {
        FREE_PATH_STAT_INC(smallheap_v7_fast);
        return 1;
    }
    // v7 returned false (ptr not in v7 segment) -> fallback to legacy below
}
```
Box Theory layer structure
L0: ULTRA (frozen, C4-C7)
- C7 ULTRA: Phase ULTRA-1~6 (production)
- C6/C5/C4 ULTRA: Phase ULTRA-7~9 (future)
L1: SmallObject v7 (research box)
- C6-only (Phase v7-1~4)
- Future: C5/C4 expansion
L1': MID_v3 / LEGACY (fallback)
- MID_v3: 257-768B (C5-C6 range)
- LEGACY: TinyHeap v1 / Pool v1
L2: Segment / RegionId
- SmallSegment_v7 (64MB mmap region)
- RegionIdBox v6 (ptr â segment lookup)
L3: Policy / Stats / Learner
- SmallPolicyV7 (this phase): Route decision
- Stats: FreePathStatsBox / AllocGateStatsBox
- Learner: (future) dynamic route selection
段éç§»è¡æŠç¥
Phase v7-4 çŸç¶:
- v7 é¢é£ã®ã¿ Policy box çµç±ã«å€æŽ
- ULTRA/MID_v3/LEGACY ã¯æ¢åã®
tiny_route_env_box.hã䜵çšïŒåŸã§çµ±åäºå®ïŒ
å°æ¥ã®çµ±äž:
tiny_route_env_box.hã® ULTRA/MID_v3/LEGACY ã«ãŒãå€å®ã Policy box ã«çµ±å- ã¯ã©ã¹ããšã®æè»ãªåªå é äœèšå®
- Learner 飿ºã«ããåçã«ãŒãéžæ (ENV override)
Debug Output
Printed to stderr on the first TLS initialization:
```
[POLICY_V7_INIT] Route assignments:
C0: LEGACY
C1: LEGACY
C2: LEGACY
C3: LEGACY
C4: LEGACY
C5: LEGACY
C6: V7 (if HAKMEM_SMALL_HEAP_V7_ENABLED=1)
C7: ULTRA (if HAKMEM_TINY_C7_ULTRA_ENABLED=1, default ON)
```
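The sketch below formats the dump above without the parenthetical notes. It reuses the SmallRouteKind/SmallPolicyV7 definitions from the Policy API section. small_policy_v7_format is a hypothetical helper that writes into a caller-supplied buffer, so the same text can go to stderr or into a test. small_route_kind_name is one possible body for the declared API, not the actual implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum {
    SMALL_ROUTE_ULTRA,
    SMALL_ROUTE_V7,
    SMALL_ROUTE_MID_V3,
    SMALL_ROUTE_LEGACY,
} SmallRouteKind;

typedef struct SmallPolicyV7 {
    SmallRouteKind route_kind[8];
} SmallPolicyV7;

static const char* small_route_kind_name(SmallRouteKind kind) {
    switch (kind) {
    case SMALL_ROUTE_ULTRA:  return "ULTRA";
    case SMALL_ROUTE_V7:     return "V7";
    case SMALL_ROUTE_MID_V3: return "MID_V3";
    case SMALL_ROUTE_LEGACY: return "LEGACY";
    }
    return "UNKNOWN";
}

/* Writes the route table into buf (must be non-NULL) and returns the full length,
 * like snprintf: a return value >= cap means the output was truncated. */
static int small_policy_v7_format(const SmallPolicyV7* p, char* buf, size_t cap) {
    int n = snprintf(buf, cap, "[POLICY_V7_INIT] Route assignments:\n");
    if (n < 0) return n;
    size_t len = (size_t)n;
    for (int c = 0; c < 8; c++) {
        size_t off = len < cap ? len : cap;  /* never write past the buffer */
        n = snprintf(buf + off, cap - off, "C%d: %s\n", c,
                     small_route_kind_name(p->route_kind[c]));
        if (n < 0) return n;
        len += (size_t)n;
    }
    return (int)len;
}
```

Usage: small_policy_v7_format(policy, buf, sizeof buf) followed by fputs(buf, stderr).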
Build check
Compile: OK
```sh
gcc -O3 ... -c -o core/smallobject_policy_v7.o core/smallobject_policy_v7.c
gcc -o bench_tiny_hot_hakmem ... core/smallobject_policy_v7.o ... -lm -lpthread -flto
```
Link: OK (all object lists updated):
- OBJS_BASE
- BENCH_HAKMEM_OBJS_BASE
- TINY_BENCH_OBJS_BASE
Next extensions
Phase v7-5 candidates:
- ULTRA/MID_v3/LEGACY unification: migrate tiny_route_env_box.h → Policy box
- Learner integration: ENV defaults + runtime learning override
- Flexible per-class priority: let ENV swap the order of ULTRA vs v7
- Multi-class v7: add C5/C4 → wider coverage
========================================================================
SECTION: HAKMEM v2 generation: completion declaration (Chapter 1)
========================================================================
Summary as of Phase v7-4 completion
With the Policy Box in place, the L0-L3 layer structure is established, the design exercise is accomplished, and the v2 generation is complete for now.
Highlights
| Phase | Goal | Result |
|---|---|---|
| ULTRA (C7) | +10% target | ✅ +11% achieved |
| MID v3 | Promote 257-768B to mainline | ✅ Done |
| v7 (C6-only) | Build the research box | ✅ Done, -4.3% overhead |
| Policy Box | Centralize routing | ✅ Done, ENV consolidated |
次äžä»£ãžã®èª²é¡äžèЧ
v7 第2ç« ïŒPhase v7-5 åè£ïŒ:
- Multi-class æ¡åŒµïŒC5/C4ïŒâ overhead åæ
- Learner 飿º â åç route éžæ
- HeaderLess çµ±äž â v6/v7 mode çµ±å
éçºå鿡件:
- HakORune / JoinIR normalization work takes priority
- The v2 generation document (HAKMEM_V2_GENERATION_SUMMARY.md) can be re-read in its frozen state
Freeze policy
- ULTRA: no modifications (FROZEN)
- MID v3: bug fixes only
- v7: code freeze (kept as a research box)
- HAKMEM: complete for now
Next development prioritizes HakORune / JoinIR.