2025-11-16 02:37:24 +09:00
hakmem_smallmid.o: core/hakmem_smallmid.c core/hakmem_smallmid.h \
Phase 17-2: Small-Mid Dedicated SuperSlab Backend (experimental result: 70% page faults, no performance improvement)
Summary:
========
Phase 17-2 implements a dedicated SuperSlab backend for the Small-Mid allocator (256B-1KB).
Result: No performance improvement (-0.9% vs. baseline), worse than Phase 17-1 (+0.3%).
Root cause: 70% of CPU time is spent in page faults (ChatGPT analysis + perf profiling).
Conclusion: The dedicated Small-Mid layer strategy failed. Tiny SuperSlab optimization is needed.
Implementation:
===============
1. Dedicated Small-Mid SuperSlab pool (1MB, 16 slabs/SS)
- Separate from Tiny SuperSlab (no competition)
- Batch refill (8-16 blocks per TLS refill)
- Direct 0xb0 header writes (no Tiny delegation)
2. Backend architecture
- SmallMidSuperSlab: 1MB aligned region, fast ptr→SS lookup
- SmallMidSlabMeta: per-slab metadata (capacity/used/carved/freelist)
- SmallMidSSHead: per-class pool with LRU tracking
3. Batch refill implementation
- smallmid_refill_batch(): 8-16 blocks/call (vs 1 in Phase 17-1)
- Freelist priority → bump allocation fallback
- Auto SuperSlab expansion when exhausted
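The "freelist priority → bump allocation fallback" order and the "fast ptr→SS lookup" can be sketched as follows. This is a minimal illustration, not the code in core/hakmem_smallmid_superslab.c: only the field names capacity/used/carved/freelist and the 1MB / 16-slab geometry come from the text above, while `SlabMetaSketch`, `ss_base_of`, and `refill_batch_sketch` are hypothetical names, and the mask-based lookup assumes each SuperSlab starts on a 1MB boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SS_SIZE   ((uintptr_t)1 << 20)   /* 1MB SuperSlab (from the text above) */
#define SS_SLABS  16                     /* 16 slabs per SuperSlab */
#define SLAB_SIZE (SS_SIZE / SS_SLABS)   /* derived: 64KB per slab */

/* Hypothetical per-slab metadata; only four field names come from the text. */
typedef struct {
    uint8_t *base;       /* start of the slab's block area (assumption) */
    void    *freelist;   /* singly linked list of freed blocks */
    uint32_t block_size; /* bytes per block (assumption) */
    uint32_t capacity;   /* total blocks the slab can hold */
    uint32_t carved;     /* blocks handed out by bump allocation so far */
    uint32_t used;       /* blocks currently live */
} SlabMetaSketch;

/* With 1MB alignment, the owning SuperSlab base is the pointer with its
 * low 20 bits cleared. */
static inline uintptr_t ss_base_of(const void *p) {
    return (uintptr_t)p & ~(SS_SIZE - 1);
}

/* Fill out[] with up to `want` blocks: freelist first, then bump allocation.
 * Returns the number of blocks produced (0 means the slab is exhausted). */
static int refill_batch_sketch(SlabMetaSketch *m, void **out, int want) {
    assert(want >= 0);
    int n = 0;
    while (n < want && m->freelist != NULL) {
        void *blk = m->freelist;
        m->freelist = *(void **)blk;   /* next pointer lives inside the block */
        out[n++] = blk;
    }
    while (n < want && m->carved < m->capacity) {
        out[n++] = m->base + (size_t)m->carved * m->block_size;
        m->carved++;
    }
    m->used += (uint32_t)n;
    return n;
}
```

A TLS refill would call `refill_batch_sketch(meta, tls_buf, 16)` and, on a return of 0, move to another slab or trigger the SuperSlab expansion described above.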
Files Added:
============
- core/hakmem_smallmid_superslab.h: SuperSlab metadata structures
- core/hakmem_smallmid_superslab.c: Backend implementation (~450 lines)
Files Modified:
===============
- core/hakmem_smallmid.c: Removed Tiny delegation, added batch refill
- Makefile: Added hakmem_smallmid_superslab.o to build
- CURRENT_TASK.md: Phase 17 completion record + Phase 18 plan
A/B Benchmark Results:
======================
| Size | Phase 17-1 (ON) | Phase 17-2 (ON) | Delta | vs Baseline |
|--------|-----------------|-----------------|----------|-------------|
| 256B | 6.06M ops/s | 5.84M ops/s | -3.6% | -4.1% |
| 512B | 5.91M ops/s | 5.86M ops/s | -0.8% | +1.2% |
| 1024B | 5.54M ops/s | 5.44M ops/s | -1.8% | +0.4% |
| Avg | 5.84M ops/s | 5.71M ops/s | -2.2% | -0.9% |
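The Delta column can be re-derived from the two throughput columns as (Phase 17-2 - Phase 17-1) / Phase 17-1. The helper below is a small sketch for checking that arithmetic; `BenchRow`, `pct_delta`, and `print_deltas` are illustrative names, not part of the benchmark harness. The "vs Baseline" column cannot be recomputed this way because the baseline throughput is not listed in the table.

```c
#include <assert.h>
#include <stdio.h>

/* One table row: size label and throughput in M ops/s for each phase. */
typedef struct {
    const char *size;
    double p17_1;
    double p17_2;
} BenchRow;

/* Relative change of `after` versus `before`, in percent. */
static double pct_delta(double before, double after) {
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* Print one "size delta%" line per row, rounded to one decimal place. */
static void print_deltas(const BenchRow *rows, int n) {
    for (int i = 0; i < n; i++) {
        printf("%-6s %+.1f%%\n", rows[i].size,
               pct_delta(rows[i].p17_1, rows[i].p17_2));
    }
}
```

Feeding it the four rows above (6.06/5.84, 5.91/5.86, 5.54/5.44, 5.84/5.71) reproduces the Delta column to one decimal place.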
Performance Analysis (ChatGPT + perf):
======================================
✅ Frontend (TLS/batch refill): OK
- Only 30% CPU time
- Batch refill logic is efficient
- Direct 0xb0 header writes work correctly
❌ Backend (SuperSlab allocation): BOTTLENECK
- 70% CPU time in asm_exc_page_fault
- mmap(1MB) → kernel page allocation → very slow
- New SuperSlab allocation per benchmark run
- No warm SuperSlab reuse (used counter never decrements)
Root Cause:
===========
Small-Mid allocates new SuperSlabs frequently:
alloc → TLS miss → refill → new SuperSlab → mmap(1MB) → page fault (70%)
Tiny reuses warm SuperSlabs:
alloc → TLS miss → refill → existing warm SuperSlab → no page fault
Key Finding: the 70% page-fault share shows that the SuperSlab layer needs optimization,
NOT the frontend layer (the TLS/batch refill design is correct).
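The difference between the two paths above (fresh mmap(1MB) versus a warm region) can be observed directly with a small POSIX/Linux-style experiment. The sketch below is not part of hakmem: it maps 1MB, touches every page twice, and reports the minor-fault count for each pass via getrusage(). The function names are illustrative, and the exact counts depend on the kernel, page size, and whether the environment reports `ru_minflt` at all.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

#define REGION_SIZE ((size_t)1 << 20)   /* same 1MB size as a SuperSlab */

/* Minor page faults taken by this process so far, or -1 on error. */
static long minor_faults_now(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) return -1;
    return ru.ru_minflt;
}

/* Write one byte per page so the kernel has to back every page. */
static void touch_pages(volatile unsigned char *p, size_t len, size_t page) {
    for (size_t off = 0; off < len; off += page) p[off] = 1;
}

/* Touch a fresh 1MB mapping twice. *cold receives the faults of the first
 * pass (new mapping), *warm those of the second pass (pages already
 * resident). Returns 0 on success, -1 on failure. */
static int measure_cold_vs_warm(long *cold, long *warm) {
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0) return -1;
    void *p = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    long f0 = minor_faults_now();
    touch_pages(p, REGION_SIZE, (size_t)ps);
    long f1 = minor_faults_now();
    touch_pages(p, REGION_SIZE, (size_t)ps);
    long f2 = minor_faults_now();
    munmap(p, REGION_SIZE);
    if (f0 < 0 || f1 < 0 || f2 < 0) return -1;
    *cold = f1 - f0;
    *warm = f2 - f1;
    return 0;
}
```

On a typical Linux system with 4KB pages, the cold pass is expected to fault roughly once per page while the warm pass faults rarely or not at all, which is the gap the Small-Mid path pays on every new SuperSlab.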
Lessons Learned:
================
1. ❌ Dedicated Small-Mid layer strategy failed (Phase 17-1: +0.3%, Phase 17-2: -0.9%)
2. ✅ Frontend implementation succeeded (30% CPU, batch refill works)
3. 🔥 70% page fault = SuperSlab allocation bottleneck
4. ✅ Tiny (6.08M ops/s) is already well-optimized, hard to beat
5. ✅ Layer separation doesn't improve performance - backend optimization needed
Next Steps (Phase 18):
======================
ChatGPT recommendation: Optimize Tiny SuperSlab (NOT Small-Mid specific layer)
Box SS-Reuse (Priority 1):
- Implement meta->freelist reuse (currently bump-only)
- Detect slab empty → return to shared_pool
- Reuse same SuperSlab for longer (reduce page faults)
- Target: 70% page fault → 5-10%, 2-4x improvement
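One way the SS-Reuse steps above could fit together is sketched below: the free path pushes the block onto the slab freelist, decrements `used`, and hands the slab back to a pool once it is empty, so the next refill picks up already-faulted memory instead of mapping a new SuperSlab. This is an assumption-laden sketch, not the planned Phase 18 code; `SlabSketch`, `SharedPoolSketch`, and both functions are hypothetical, and locking and remote frees are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slab metadata with an intrusive link for the pool. */
typedef struct SlabSketch {
    void    *freelist;            /* freed blocks, linked through the blocks */
    uint32_t capacity;            /* total blocks in the slab */
    uint32_t carved;              /* blocks handed out by bump allocation */
    uint32_t used;                /* blocks currently live */
    struct SlabSketch *next_free; /* next empty slab in the pool */
} SlabSketch;

/* Hypothetical stand-in for the shared pool: a stack of empty slabs. */
typedef struct {
    SlabSketch *empty_slabs;
} SharedPoolSketch;

/* Free one block. Returns 1 if the slab became empty and went back to the
 * pool, 0 otherwise. */
static int slab_free_sketch(SharedPoolSketch *pool, SlabSketch *s, void *blk) {
    assert(s->used > 0);
    *(void **)blk = s->freelist;   /* push block onto the slab freelist */
    s->freelist = blk;
    s->used--;                     /* the decrement Phase 17-2 was missing */
    if (s->used != 0) return 0;
    /* Every block is free: reset the slab and return it to the pool, so its
     * already-faulted pages are reused. */
    s->freelist = NULL;
    s->carved = 0;
    s->next_free = pool->empty_slabs;
    pool->empty_slabs = s;
    return 1;
}

/* Prefer a warm slab from the pool; NULL means a new SuperSlab is needed. */
static SlabSketch *pool_acquire_sketch(SharedPoolSketch *pool) {
    SlabSketch *s = pool->empty_slabs;
    if (s != NULL) {
        pool->empty_slabs = s->next_free;
        s->next_free = NULL;
    }
    return s;
}
```

Resetting `carved` to 0 when the slab empties is a design choice of this sketch: once every block is free, bump allocation can simply restart from the beginning of the slab.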
Box SS-Prewarm (Priority 2):
- Pre-allocate SuperSlabs per class (Phase 11: +6.4%)
- Concentrate page faults at benchmark start
- Benchmark-only optimization
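Prewarming could look roughly like the sketch below: map each region up front and touch every page, so the faults happen before the timed section rather than inside it. The names `map_aligned_superslab` and `prewarm_superslabs` are hypothetical, the over-map-and-trim alignment trick is one common approach rather than necessarily what hakmem does, and the code assumes a POSIX/Linux environment with anonymous mmap.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define SS_SIZE ((size_t)1 << 20)   /* 1MB SuperSlab */

/* Map a 1MB region aligned to 1MB by over-mapping 2MB and trimming the
 * unaligned head and the leftover tail. Returns NULL on failure. */
static void *map_aligned_superslab(void) {
    size_t span = SS_SIZE * 2;
    uint8_t *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) return NULL;
    uintptr_t start = ((uintptr_t)raw + SS_SIZE - 1) & ~((uintptr_t)SS_SIZE - 1);
    uint8_t *aligned = (uint8_t *)start;
    size_t head = (size_t)(aligned - raw);
    size_t tail = span - head - SS_SIZE;
    if (head != 0) munmap(raw, head);
    if (tail != 0) munmap(aligned + SS_SIZE, tail);
    return aligned;
}

/* Map `count` SuperSlabs and touch every page of each one, so the page
 * faults are taken here. Returns how many were prepared; out[] receives
 * their base addresses. */
static int prewarm_superslabs(void **out, int count) {
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0) return 0;
    int n = 0;
    for (; n < count; n++) {
        volatile uint8_t *ss = map_aligned_superslab();
        if (ss == NULL) break;
        for (size_t off = 0; off < SS_SIZE; off += (size_t)ps) ss[off] = 0;
        out[n] = (void *)ss;
    }
    return n;
}
```

As the list above notes, this only moves the faults to startup; it does not reduce their number, which is why it is ranked below SS-Reuse.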
Small-Mid Implementation Status:
=================================
- Disabled by default (ENV=0): zero overhead in practice, because the branch predictor learns the disabled path
- Complete separation from Tiny (no interference)
- Valuable as experimental record ("why dedicated layer failed")
- Can be removed later if needed (not blocking Tiny optimization)
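A default-off environment gate of the kind described above is often written as a cached check, sketched here. The variable name `EXAMPLE_SMALLMID_ENABLE` and the function name are placeholders (the real variable name is not given in this message), and the lazy initialization shown is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if the (hypothetical) gate variable is set to a value starting
 * with '1', else 0. The result is cached after the first call, so later
 * calls cost one load and one well-predicted branch. */
static int smallmid_enabled_sketch(void) {
    static int cached = -1;            /* -1 = environment not read yet */
    if (cached < 0) {
        const char *v = getenv("EXAMPLE_SMALLMID_ENABLE");
        cached = (v != NULL && v[0] == '1') ? 1 : 0;
    }
    return cached;
}
```

The allocator entry point would test `smallmid_enabled_sketch()` once per call and fall through to the existing Tiny path when it returns 0.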
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 03:21:13 +09:00
core/hakmem_build_flags.h core/hakmem_smallmid_superslab.h \
core/tiny_region_id.h core/tiny_box_geometry.h \
core/hakmem_tiny_superslab_constants.h core/hakmem_tiny_config.h \
2025-11-21 23:00:24 +09:00
core/ptr_track.h core/hakmem_super_registry.h \
core/hakmem_tiny_superslab.h core/superslab/superslab_types.h \
core/hakmem_tiny_superslab_constants.h core/superslab/superslab_inline.h \
2025-11-26 12:33:49 +09:00
core/superslab/superslab_types.h core/superslab/../tiny_box_geometry.h \
2025-11-28 01:45:45 +09:00
core/tiny_debug_ring.h core/tiny_remote.h core/hakmem_tiny.h \
2025-11-29 06:47:13 +09:00
core/hakmem_trace.h core/hakmem_tiny_mini_mag.h core/tiny_debug_api.h
2025-11-16 02:37:24 +09:00
core/hakmem_smallmid.h:
core/hakmem_build_flags.h:
2025-11-16 03:21:13 +09:00
core/hakmem_smallmid_superslab.h:
2025-11-16 02:37:24 +09:00
core/tiny_region_id.h:
core/tiny_box_geometry.h:
core/hakmem_tiny_superslab_constants.h:
core/hakmem_tiny_config.h:
core/ptr_track.h:
2025-11-21 23:00:24 +09:00
core/hakmem_super_registry.h:
core/hakmem_tiny_superslab.h:
core/superslab/superslab_types.h:
core/hakmem_tiny_superslab_constants.h:
core/superslab/superslab_inline.h:
core/superslab/superslab_types.h:
2025-11-26 12:33:49 +09:00
core/superslab/../tiny_box_geometry.h:
2025-11-21 23:00:24 +09:00
core/tiny_debug_ring.h:
core/tiny_remote.h:
2025-11-28 01:45:45 +09:00
core/hakmem_tiny.h:
core/hakmem_trace.h:
core/hakmem_tiny_mini_mag.h:
2025-11-29 06:47:13 +09:00
core/tiny_debug_api.h: