
Moe Charm (CI) 1a8652a91a Phase TLS-UNIFY-3: C6 intrusive freelist implementation (complete)

Implement C6 ULTRA intrusive LIFO freelist with ENV gating:
- Singly linked LIFO with the next pointer stored at offset USER+1
- tiny_next_store/tiny_next_load for pointer access (single source of truth)
- Segment learning via ss_fast_lookup (per-class seg_base/seg_end)
- ENV gate: HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL (default OFF)
- Counters: c6_ifl_push/pop/fallback in FREE_PATH_STATS
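The intrusive LIFO described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the committed code:
- The next pointer sits at a hypothetical byte offset `IFL_NEXT_OFF` inside each free block.
- `ifl_next_store`/`ifl_next_load` are illustrative stand-ins for `tiny_next_store`/`tiny_next_load`, whose real signatures are not shown here.
- `ifl_tls_t` is a hypothetical holder for the `c6_head` field.

`memcpy` is used because a 1-byte offset is not pointer-aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: byte offset of the embedded next pointer in a free block.
 * The commit says "USER+1"; what USER is relative to is an assumption. */
#define IFL_NEXT_OFF 1

/* Illustrative stand-ins for tiny_next_store/tiny_next_load.
 * memcpy keeps the access well-defined at an unaligned offset. */
static inline void ifl_next_store(void *blk, void *next) {
    memcpy((char *)blk + IFL_NEXT_OFF, &next, sizeof next);
}

static inline void *ifl_next_load(const void *blk) {
    void *next;
    memcpy(&next, (const char *)blk + IFL_NEXT_OFF, sizeof next);
    return next;
}

/* Hypothetical per-thread state holding the class-6 LIFO head. */
typedef struct {
    void *c6_head;
} ifl_tls_t;

/* Push a freed block: link it to the old head and make it the new head. */
static inline void ifl_push(ifl_tls_t *tls, void *blk) {
    assert(blk != NULL);
    ifl_next_store(blk, tls->c6_head);
    tls->c6_head = blk;
}

/* Pop the most recently freed block, or NULL when empty
 * (a caller would then take its fallback path). */
static inline void *ifl_pop(ifl_tls_t *tls) {
    void *blk = tls->c6_head;
    if (blk != NULL) {
        tls->c6_head = ifl_next_load(blk);
    }
    return blk;
}
```

No separate node array is needed because the link lives inside the free block itself. The block must therefore be at least `IFL_NEXT_OFF + sizeof(void *)` bytes, which the 257-512B class easily satisfies.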

Files:
- core/box/tiny_ultra_tls_box.h: Added c6_head field for intrusive LIFO
- core/box/tiny_ultra_tls_box.c: Pop/push with intrusive branching (case 6)
- core/box/tiny_c6_ultra_intrusive_env_box.h: ENV gate (new)
- core/box/tiny_c6_intrusive_freelist_box.h: L1 pure LIFO (new)
- core/tiny_debug_ring.h: C6_IFL events
- core/box/free_path_stats_box.h/c: c6_ifl_* counters
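The ENV gate in `tiny_c6_ultra_intrusive_env_box.h` is only named above, so the following is a hedged sketch of a typical pattern: read `HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL` once and cache the result. The parsing rule and the helper names `c6_ifl_parse`/`c6_ifl_enabled` are assumptions. The sketch keeps the documented default (unset means OFF).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Assumed parsing rule: unset, empty, or a leading '0' means OFF (default). */
static int c6_ifl_parse(const char *v) {
    return v != NULL && v[0] != '\0' && v[0] != '0';
}

/* Cached gate, -1 = not read yet. Relaxed atomics avoid a data race when
 * several threads make the first call at once; each computes the same value
 * as long as the environment is not being modified concurrently. */
static _Atomic int g_c6_ifl_gate = -1;

static int c6_ifl_enabled(void) {
    int v = atomic_load_explicit(&g_c6_ifl_gate, memory_order_relaxed);
    if (v < 0) {
        v = c6_ifl_parse(getenv("HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL"));
        atomic_store_explicit(&g_c6_ifl_gate, v, memory_order_relaxed);
    }
    return v;
}
```

Caching keeps `getenv` off the hot path. After the first call, each push or pop pays only one relaxed load and a branch.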

A/B Test Results (1M iterations, ws=200, 257-512B):
- ENV_OFF (array): 56.6 Mop/s avg
- ENV_ON (intrusive): 57.6 Mop/s avg (+1.8%, within noise)
- Counters verified: c6_ifl_push=265890, c6_ifl_pop=265815, fallback=0
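The A/B figures can be checked directly:
- (57.6 - 56.6) / 56.6 ≈ 1.77%, which rounds to the reported +1.8%.
- push minus pop is 265890 - 265815 = 75.

If the list starts empty and every pop consumes an earlier push, those 75 blocks are still parked on the intrusive list when the run ends. The helper names below are illustrative only.

```c
#include <assert.h>

/* Relative change of b over a, in percent. */
static double pct_change(double a, double b) {
    return (b - a) / a * 100.0;
}

/* Blocks left on the list at the end of a run, assuming the list
 * starts empty and each pop consumes an earlier push. */
static long ifl_resident(long pushes, long pops) {
    assert(pops <= pushes);
    return pushes - pops;
}
```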

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2025-12-12 16:26:42 +09:00


mimalloc Gap Summary (Phase v11b-1)

Current Status: 2025-12-12

Throughput Comparison (Mixed 16-1024B, ws=400, 10M iter)

Allocator	Throughput	vs mimalloc
mimalloc	65.5M ops/s	1.00x
hakmem v11b-1	50.7M ops/s	0.77x

Progress Summary

Phase	Throughput (ops/s)	vs mimalloc	Key Change
v11a-4	38.6M	0.59x	baseline
v11a-5	45.4M	0.69x	alloc path: single switch + C7 early-exit
v11b-1	50.7M	0.77x	free path: single switch + C7 early-exit
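The "vs mimalloc" column is each phase's throughput divided by mimalloc's 65.5M ops/s. The per-phase gains follow from the same numbers: v11a-4 → v11a-5 is about +17.6%, v11a-5 → v11b-1 about +11.7%, and v11a-4 → v11b-1 about +31.3%. The sketch below reproduces them. Constant and function names are illustrative.

```c
#include <assert.h>

/* Throughput in M ops/s, copied from the tables above. */
static const double TP_MIMALLOC = 65.5;
static const double TP_V11A4 = 38.6;
static const double TP_V11A5 = 45.4;
static const double TP_V11B1 = 50.7;

/* "vs mimalloc" column: hakmem throughput / mimalloc throughput. */
static double vs_mimalloc(double tp) {
    assert(TP_MIMALLOC > 0.0);
    return tp / TP_MIMALLOC;
}

/* Percentage gain from one phase to another. */
static double gain_pct(double from, double to) {
    return (to / from - 1.0) * 100.0;
}
```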

perf stat Comparison (Mixed 16-1024B, v11a-5 data)

Metric	mimalloc	hakmem	Ratio (hakmem/mimalloc)
cycles	~500M	1.04B	2.1x
instructions	~920M	2.2B	2.4x
cache-misses	~90K	408K	4.5x
branch-misses	~6.3M	14.5M	2.3x
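Because cycles = instructions / IPC, the table rows can be cross-checked. The cycle ratio equals the instruction ratio times IPC_mimalloc / IPC_hakmem.

With these figures, IPC is about 1.84 for mimalloc and about 2.12 for hakmem. That is why the 2.1x cycle gap is smaller than the 2.4x instruction gap. The mimalloc column is approximate, so treat this as a rough consistency check rather than a measurement. The struct and function names are illustrative.

```c
#include <assert.h>

typedef struct {
    double cycles, instructions, cache_misses, branch_misses;
} perf_sample_t;

/* Values from the table; the mimalloc column is approximate ("~"). */
static const perf_sample_t PERF_MIMALLOC = { 500e6, 920e6, 90e3, 6.3e6 };
static const perf_sample_t PERF_HAKMEM   = { 1.04e9, 2.2e9, 408e3, 14.5e6 };

/* Instructions per cycle. */
static double ipc(const perf_sample_t *p) {
    assert(p->cycles > 0.0);
    return p->instructions / p->cycles;
}

/* cycles = instructions / IPC, so the cycle ratio factors into
 * (instruction ratio) * (IPC of mi / IPC of hk). */
static double cycle_ratio_from_parts(const perf_sample_t *hk,
                                     const perf_sample_t *mi) {
    return (hk->instructions / mi->instructions) * (ipc(mi) / ipc(hk));
}
```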

Next Target

Front-end optimization of both alloc and free is complete. Next: the backend core or cache-locality improvements.

Candidates:

- cache locality: cache-misses (4.5x) is the largest gap → TLS page prefetch, hot page reuse
- instruction reduction: 2.4x → inlining, macro expansion
- small-object v7: design for the small band (C2-C3)
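One possible reading of the "TLS page prefetch" candidate is sketched below: on each freelist pop, prefetch the new head so the next pop's load of its next pointer is more likely to hit cache. This interpretation is an assumption, not a planned design. The block layout is simplified, with the next pointer at offset 0, and the type and function names are hypothetical. `__builtin_prefetch` is a GCC/Clang builtin and only a hint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free block, next pointer at offset 0 for simplicity. */
typedef struct free_blk {
    struct free_blk *next;
} free_blk_t;

/* Hypothetical thread-local freelist. */
typedef struct {
    free_blk_t *head;
} tls_freelist_t;

/* Pop the head, then prefetch the new head, whose next pointer the
 * following pop will load. rw=0 (read), locality=3 (keep in all levels). */
static inline void *pop_with_prefetch(tls_freelist_t *fl) {
    free_blk_t *b = fl->head;
    if (b == NULL)
        return NULL;
    fl->head = b->next;
    if (fl->head != NULL)
        __builtin_prefetch(fl->head, 0, 3);
    return b;
}
```

Whether this helps depends on the time between pops. If the next allocation follows immediately, the prefetch may not complete in time, so it would need the same A/B treatment as the intrusive freelist.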

Key Insight

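- Converting both the alloc and free paths to a single switch (jump table) was effective
- Front-layer optimization alone improved throughput by +31% from v11a-4 to v11b-1 (38.6M → 50.7M)
- The remaining gap to mimalloc is mainly cache-misses (4.5x) and instructions (2.4x)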
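The "single switch + C7 early-exit" shape used on both paths can be sketched as a class-indexed dispatch. The route names and the class-to-route mapping below are hypothetical. Only the structure follows the text: one test for C7 first, then one dense `switch` over the remaining classes. A compiler may lower that switch to a jump table or lookup rather than a compare chain.

```c
#include <assert.h>

/* Hypothetical routes; the real per-class routing is not shown. */
typedef enum { ROUTE_C7, ROUTE_C6, ROUTE_SMALL, ROUTE_LEGACY } route_t;

static route_t alloc_route(unsigned cls) {
    /* Early exit: C7 is tested before the switch. */
    if (cls == 7)
        return ROUTE_C7;
    /* One dense switch over the remaining classes. */
    switch (cls) {
    case 0: case 1: case 2: case 3:
        return ROUTE_SMALL;
    case 4: case 5:
        return ROUTE_LEGACY;
    case 6:
        return ROUTE_C6;
    default:
        return ROUTE_LEGACY;
    }
}
```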

Date: 2025-12-12 Phase: v11b-1 complete
