acc64f2438
Phase ML1: Pool v1 memset 89.73% overhead 軽量化 (+15.34% improvement)
...
## Summary
- ChatGPT により bench_profile.h の setenv segfault を修正(RTLD_NEXT 経由に切り替え)
- core/box/pool_zero_mode_box.h 新設:ENV キャッシュ経由で ZERO_MODE を統一管理
- core/hakmem_pool.c で zero mode に応じた memset 制御(FULL/header/off)
- A/B テスト結果:ZERO_MODE=header で +15.34% improvement(1M iterations, C6-heavy)
## Files Modified
- core/box/pool_api.inc.h: pool_zero_mode_box.h include
- core/bench_profile.h: glibc setenv → malloc+putenv(segfault 回避)
- core/hakmem_pool.c: zero mode 参照・制御ロジック
- core/box/pool_zero_mode_box.h (新設): enum/getter
- CURRENT_TASK.md: Phase ML1 結果記載
## Test Results
| Iterations | ZERO_MODE=full | ZERO_MODE=header | Improvement |
|-----------|----------------|-----------------|------------|
| 10K | 3.06 M ops/s | 3.17 M ops/s | +3.65% |
| 1M | 23.71 M ops/s | 27.34 M ops/s | **+15.34%** |
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com >
2025-12-10 09:08:18 +09:00
5685c2f4c9
Implement Warm Pool Secondary Prefill Optimization (Phase B-2c Complete)
...
Problem: Warm pool had 0% hit rate (only 1 hit per 3976 misses) despite being
implemented, causing all cache misses to go through expensive superslab_refill
registry scans.
Root Cause Analysis:
- Warm pool was initialized once and pushed a single slab after each refill
- When that slab was exhausted, it was discarded (not pushed back)
- Next refill would push another single slab, which was immediately exhausted
- Pool would oscillate between 0 and 1 items, yielding 0% hit rate
Solution: Secondary Prefill on Cache Miss
When warm pool becomes empty, we now do multiple superslab_refills and prefill
the pool with 3 additional HOT superlslabs before attempting to carve. This
builds a working set of slabs that can sustain allocation pressure.
Implementation Details:
- Modified unified_cache_refill() cold path to detect empty pool
- Added prefill loop: when pool count == 0, load 3 extra superlslabs
- Store extra slabs in warm pool, keep 1 in TLS for immediate carving
- Track prefill events in g_warm_pool_stats[].prefilled counter
Results (1M Random Mixed 256B allocations):
- Before: C7 hits=1, misses=3976, hit_rate=0.0%
- After: C7 hits=3929, misses=3143, hit_rate=55.6%
- Throughput: 4.055M ops/s (maintained vs 4.07M baseline)
- Stability: Consistent 55.6% hit rate at 5M allocations (4.102M ops/s)
Performance Impact:
- No regression: throughput remained stable at ~4.1M ops/s
- Registry scan avoided in 55.6% of cache misses (significant savings)
- Warm pool now functioning as intended with strong locality
Configuration:
- TINY_WARM_POOL_MAX_PER_CLASS increased from 4 to 16 to support prefill
- Prefill budget hardcoded to 3 (tunable via env var if needed later)
- All statistics always compiled, ENV-gated printing via HAKMEM_WARM_POOL_STATS=1
Next Steps:
- Monitor for further optimization opportunities (prefill budget tuning)
- Consider adaptive prefill budget based on class-specific hit rates
- Validate at larger allocation counts (10M+ pending registry size fix)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-12-04 23:31:54 +09:00
d5e6ed535c
P-Tier + Tiny Route Policy: Aggressive Superslab Management + Safe Routing
...
## Phase 1: Utilization-Aware Superslab Tiering (案B実装済)
- Add ss_tier_box.h: Classify SuperSlabs into HOT/DRAINING/FREE based on utilization
- HOT (>25%): Accept new allocations
- DRAINING (≤25%): Drain only, no new allocs
- FREE (0%): Ready for eager munmap
- Enhanced shared_pool_release_slab():
- Check tier transition after each slab release
- If tier→FREE: Force remaining slots to EMPTY and call superslab_free() immediately
- Bypasses LRU cache to prevent registry bloat from accumulating DRAINING SuperSlabs
- Test results (bench_random_mixed_hakmem):
- 1M iterations: ✅ ~1.03M ops/s (previously passed)
- 10M iterations: ✅ ~1.15M ops/s (previously: registry full error)
- 50M iterations: ✅ ~1.08M ops/s (stress test)
## Phase 2: Tiny Front Routing Policy (新規Box)
- Add tiny_route_box.h/c: Single 8-byte table for class→routing decisions
- ROUTE_TINY_ONLY: Tiny front exclusive (no fallback)
- ROUTE_TINY_FIRST: Try Tiny, fallback to Pool if fails
- ROUTE_POOL_ONLY: Skip Tiny entirely
- Profiles via HAKMEM_TINY_PROFILE ENV:
- "hot": C0-C3=TINY_ONLY, C4-C6=TINY_FIRST, C7=POOL_ONLY
- "conservative" (default): All TINY_FIRST
- "off": All POOL_ONLY (disable Tiny)
- "full": All TINY_ONLY (microbench mode)
- A/B test results (ws=256, 100k ops random_mixed):
- Default (conservative): ~2.90M ops/s
- hot: ~2.65M ops/s (more conservative)
- off: ~2.86M ops/s
- full: ~2.98M ops/s (slightly best)
## Design Rationale
### Registry Pressure Fix (案B)
- Problem: DRAINING tier SS occupied registry indefinitely
- Solution: When total_active_blocks→0, immediately free to clear registry slot
- Result: No more "registry full" errors under stress
### Routing Policy Box (新)
- Problem: Tiny front optimization scattered across ENV/branches
- Solution: Centralize routing in single table, select profiles via ENV
- Benefit: Safe A/B testing without touching hot path code
- Future: Integrate with RSS budget/learning layers for dynamic profile switching
## Next Steps (性能最適化)
- Profile Tiny front internals (TLS SLL, FastCache, Superslab backend latency)
- Identify bottleneck between current ~2.9M ops/s and mimalloc ~100M ops/s
- Consider:
- Reduce shared pool lock contention
- Optimize unified cache hit rate
- Streamline Superslab carving logic
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-12-04 18:01:25 +09:00
3a2e466af1
Add lightweight Fail-Fast layer to Gatekeeper Boxes
...
Core Changes:
- Modified: core/box/tiny_free_gate_box.h
* Added address range check in tiny_free_gate_try_fast() (line 142)
* Catches obviously invalid pointers (addr < 4096)
* Rejects fast path for garbage pointers, delegates to slow path
* Logs [TINY_FREE_GATE_RANGE_INVALID] (debug-only, max 8 messages)
* Cost: ~1 cycle (comparison + unlikely branch)
* Behavior: Fails safe by delegating to hak_tiny_free() slow path
- Modified: core/box/tiny_alloc_gate_box.h
* Added range check for malloc_tiny_fast() return value (line 143)
* Debug-only: Checks if returned user_ptr has addr < 4096
* On failure: Logs [TINY_ALLOC_GATE_RANGE_INVALID] and calls abort()
* Release build: Entire check compiled out (zero overhead)
* Rationale: Invalid allocator return is catastrophic - fail immediately
Design Rationale:
- Early detection of memory corruption/undefined behavior
- Conservative threshold (4096) captures NULL and kernel space
- Free path: Graceful degradation (delegate to slow path)
- Alloc path: Hard fail (allocator corruption is non-recoverable)
- Zero performance impact in production (Release) builds
- Debug-only diagnostic output prevents log spam
Fail-Fast Strategy:
- Layer 3a: Address range sanity check (always enabled)
* Rejects addr < 4096 (NULL, low memory garbage)
* Free: delegates to slow path (safe fallback)
* Alloc: aborts (corruption indicator)
- Layer 3b: Detailed Bridge/Header validation (ENV-controlled)
* Traditional HAKMEM_TINY_FREE_GATE_DIAG / HAKMEM_TINY_ALLOC_GATE_DIAG
* For advanced debugging and observability
Testing:
- Compilation: RELEASE=0 and RELEASE=1 both successful
- Smoke tests: 3/3 passed (simple_alloc, loop 10M, pool_tls)
- Performance: No regressions detected
- Address threshold (4096): Conservative, minimizes false positives
- Verified via Task agent (PASS verdict)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-12-04 12:36:32 +09:00
291c84a1a7
Add Tiny Alloc Gatekeeper Box for unified malloc entry point
...
Core Changes:
- New file: core/box/tiny_alloc_gate_box.h
* Thin wrapper around malloc_tiny_fast() with diagnostic hooks
* TinyAllocGateContext structure for size/class_idx/user/base/bridge information
* tiny_alloc_gate_diag_enabled() - ENV-controlled diagnostic mode
* tiny_alloc_gate_validate() - Validates class_idx/header/meta consistency
* tiny_alloc_gate_fast() - Main gatekeeper function
* Zero performance impact when diagnostics disabled
- Modified: core/box/hak_wrappers.inc.h
* Added #include "tiny_alloc_gate_box.h" (line 35)
* Integrated gatekeeper into malloc wrapper (lines 198-200)
* Diagnostic mode via HAKMEM_TINY_ALLOC_GATE_DIAG env var
Design Rationale:
- Complements Free Gatekeeper Box: Together they provide entry/exit hooks
- Validates allocation consistency at malloc time
- Enables Bridge + BASE/USER conversion validation in debug mode
- Maintains backward compatibility: existing behavior unchanged
Validation Features:
- tiny_ptr_bridge_classify_raw() - Verifies Superslab/Slab/meta lookup
- Header vs meta class consistency check (rate-limited, 8 msgs max)
- class_idx validation via hak_tiny_size_to_class()
- All validation logged but non-blocking (observation points for Guard)
Testing:
- All smoke tests pass (10M malloc/free cycles, pool TLS, real programs)
- Diagnostic mode validated with HAKMEM_TINY_ALLOC_GATE_DIAG=1
- No regressions in existing functionality
- Verified via Task agent (PASS verdict)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-12-04 12:06:14 +09:00