# Current Task: Phase 7 + Pool TLS (Step 4.x Integration & Validation)

**Date**: 2025-11-09
**Status**: 🚀 In Progress (Step 4.x)
**Priority**: HIGH

---

## 🎯 Goal

Following Box Theory, push "syscall dilution" (fewer syscalls per allocation) and "single-point boundaries" (each OS call made from one place), centered on Pool TLS, so that Tiny/Mid/Larson become both stable and fast.

### **Why This Works**

Phase 7 Task 3 achieved a **+180-280% improvement** by pre-warming:
- **Before**: First allocation → TLS miss → SuperSlab refill (100+ cycles)
- **After**: First allocation → TLS hit (15 cycles, pre-populated cache)

**Same bottleneck exists in Pool TLS**:
- First 8KB allocation → TLS miss → Arena carve → mmap (1000+ cycles)
- Pre-warm eliminates this cold-start penalty
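
The hit/miss difference above can be illustrated with a toy model. This is only a sketch under stated assumptions: all `toy_*` names are hypothetical, there is a single size class, and `malloc()` stands in for the expensive arena carve + mmap path. It is not the hakmem implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: one size class, malloc() standing in for the arena carve. */
typedef struct toy_block { struct toy_block* next; } toy_block;

enum { TOY_BLOCK_SIZE = 8 * 1024, TOY_PREWARM_MAX = 64 };

static _Thread_local toy_block* toy_tls_head = NULL; /* per-thread freelist */
static _Thread_local int toy_miss_count = 0;         /* TLS misses so far   */

int toy_misses(void) { return toy_miss_count; }

void* toy_alloc(void) {
    toy_block* b = toy_tls_head;
    if (b != NULL) {             /* fast path: pop from the TLS freelist */
        toy_tls_head = b->next;
        return b;
    }
    toy_miss_count++;            /* slow path: the expensive refill */
    return malloc(TOY_BLOCK_SIZE);
}

void toy_free(void* p) {         /* push back onto the TLS freelist */
    toy_block* b = (toy_block*)p;
    b->next = toy_tls_head;
    toy_tls_head = b;
}

/* Pre-warm: take `count` distinct blocks first, then release them all.
 * Returns the number of blocks that ended up cached. */
int toy_prewarm(int count) {
    void* held[TOY_PREWARM_MAX];
    int got = 0;
    assert(count >= 0 && count <= TOY_PREWARM_MAX);
    while (got < count) {
        void* p = toy_alloc();
        if (p == NULL) break;    /* out of memory: keep what we have */
        held[got++] = p;
    }
    for (int i = 0; i < got; i++) toy_free(held[i]);
    toy_miss_count = 0;          /* callers then see only post-warm misses */
    return got;
}
```

After `toy_prewarm(4)`, the next four `toy_alloc()` calls are served from the thread-local list without a miss, and only the fifth one falls through to the slow path. The same idea is what the pre-warm step aims for with the larger Pool TLS blocks.
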

---

## 📊 Current Status (main progress through Step 4)

### Implementation Summary
- ✅ Tiny 1024B special case (headerless) + lightweight adaptive refill for class7 (cuts off the main cause of frequent mmap calls)
- ✅ OS descent boundary (`hak_os_map_boundary()`): mmap calls consolidated in one place
- ✅ Pool TLS Arena (exponential growth 1→2→4→8MB, tunable via ENV): mmap consolidated into arenas
- ✅ Page Registry (chunk registration/lookup resolves the owner)
- ✅ Remote Queue (for Pool, mutex-bucket version) + lightweight drain wired in before alloc
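
The 1→2→4→8MB arena growth can be sketched as a small pure function. This is an illustration, not the actual arena code: `arena_chunk_mb` and its parameters are hypothetical names, and the meaning of "growth levels" (number of doublings before the size stops growing) is an assumption inferred from the default of 3 levels producing 1→2→4→8MB.

```c
#include <assert.h>
#include <stddef.h>

/* Chunk size in MB for the step-th arena chunk (step 0 is the first chunk).
 * Assumed semantics: the size doubles once per step, for at most
 * `growth_levels` doublings, and never exceeds `max_mb`. */
size_t arena_chunk_mb(size_t init_mb, size_t max_mb,
                      unsigned growth_levels, unsigned step) {
    assert(init_mb > 0 && max_mb >= init_mb);
    unsigned level = step < growth_levels ? step : growth_levels;
    size_t mb = init_mb;
    for (unsigned i = 0; i < level && mb < max_mb; i++) {
        mb *= 2;
    }
    return mb < max_mb ? mb : max_mb;
}
```

With the defaults (`init_mb = 1`, `max_mb = 8`, `growth_levels = 3`) successive chunks are 1, 2, 4, 8, 8, ... MB, so the number of mmap calls grows only logarithmically with the arena footprint until the cap is reached.
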

---

## 🚀 Next Steps (Actions)

1) Integrate the Remote Queue drain with the Pool TLS refill boundary as well (at low water: drain → refill → bind)
   - Current: a drain at `pool_alloc` entry and an extra drain at low-water after pop are already implemented
   - To add: also attempt a drain on the refill path (just before calling `pool_refill_and_alloc`), and skip the refill when the drain succeeds

2) Confirm the syscall reduction with strace (turn it into a metric)
   - RandomMixed: 256 / 1024B, `mmap/madvise/munmap` counts for each (`-c` totals)
   - PoolTLS: compare the `mmap/madvise/munmap` reduction at 1T/4T (before vs after the Arena was introduced)

3) Performance A/B (ENV: INIT/MAX/GROWTH) to find the tuning sweet spots
   - Evaluate combinations of `HAKMEM_POOL_TLS_ARENA_MB_INIT`, `HAKMEM_POOL_TLS_ARENA_MB_MAX`, `HAKMEM_POOL_TLS_ARENA_GROWTH_LEVELS`
   - Goal: reduce syscalls while keeping memory usage within an acceptable range

4) Speed up the Remote Queue (next phase)
   - Start with mutex → split locks / lightweight spinning, then per-class queues if needed
   - Make the Page Registry O(1) (per-page table), later per-arena IDs
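
The drain-before-refill ordering from item 1 can be sketched with a small counting model. Everything here is hypothetical (`pool_class_model`, `model_drain`, `model_refill`, `model_alloc`, and the batch size of 8); the real drain and refill functions are not shown in this document, so the sketch only demonstrates the control flow.

```c
#include <assert.h>

/* Counting model of one size class on one thread (hypothetical). */
typedef struct {
    int tls_count;      /* blocks currently on this thread's freelist     */
    int remote_pending; /* blocks freed by other threads, not yet drained */
    int refill_calls;   /* how many times the expensive refill path ran   */
} pool_class_model;

enum { MODEL_REFILL_BATCH = 8 };

/* Move remotely freed blocks onto the local freelist; returns how many moved. */
int model_drain(pool_class_model* c) {
    int moved = c->remote_pending;
    c->tls_count += moved;
    c->remote_pending = 0;
    return moved;
}

/* Expensive path: carve a fresh batch from the arena. */
void model_refill(pool_class_model* c) {
    c->tls_count += MODEL_REFILL_BATCH;
    c->refill_calls++;
}

/* Slow-path ordering: drain first, refill only if the drain yielded nothing. */
void model_alloc(pool_class_model* c) {
    if (c->tls_count == 0 && model_drain(c) == 0) {
        model_refill(c);
    }
    assert(c->tls_count > 0);
    c->tls_count--;  /* "bind": hand one block to the caller */
}
```

In this model, three remotely freed blocks satisfy three allocations without any refill, and only the fourth allocation reaches the arena. Whether the real drain is cheap enough to try on every refill is exactly what the strace and A/B measurements in items 2 and 3 would need to confirm.
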

**Challenge**: Pool blocks are LARGE (8KB-52KB) vs Tiny (128B-1KB)

**Memory Budget Analysis**:
```
Phase 7 Tiny:
- 16 blocks × 1KB = 16KB per class
- 7 classes × 16KB = 112KB total ✅ Acceptable

Pool TLS (Naive):
- 16 blocks × 8KB = 128KB (class 0)
- 16 blocks × 52KB = 832KB (class 6)
- Total: ~3.4MB (16 blocks × 220KB summed over all 7 classes) ❌ Too much!
```

**Smart Strategy**: Variable pre-warm counts based on expected usage
```
// Hot classes (8-24KB) - common in real workloads
Class 0 (8KB):  16 blocks = 128KB
Class 1 (16KB): 16 blocks = 256KB
Class 2 (24KB): 12 blocks = 288KB

// Warm classes (32-40KB)
Class 3 (32KB): 8 blocks = 256KB
Class 4 (40KB): 8 blocks = 320KB

// Cold classes (48-52KB) - rare
Class 5 (48KB): 4 blocks = 192KB
Class 6 (52KB): 4 blocks = 208KB

Total: ~1.6MB ✅ Acceptable
```
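
The budget arithmetic can be double-checked with a few lines of C. This is a sketch for verification only: `prewarm_budget_kb` and the `BUDGET_*` names are hypothetical, and the class sizes are taken from the 8KB-52KB table in this section.

```c
#include <assert.h>
#include <stddef.h>

enum { BUDGET_CLASSES = 7 };

/* Block size of each Pool class in KB, as listed in this section. */
static const size_t BUDGET_CLASS_KB[BUDGET_CLASSES] = {8, 16, 24, 32, 40, 48, 52};

/* Total pre-warm footprint in KB for a given per-class block count. */
size_t prewarm_budget_kb(const int counts[BUDGET_CLASSES]) {
    size_t total = 0;
    for (int i = 0; i < BUDGET_CLASSES; i++) {
        assert(counts[i] >= 0);
        total += (size_t)counts[i] * BUDGET_CLASS_KB[i];
    }
    return total;
}
```

Feeding it 16 blocks for every class gives 3520KB, while the variable counts 16/16/12/8/8/4/4 give 1648KB, which is the "~1.6MB" figure used in the rest of this plan.
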

**Rationale**:
1. Smaller classes are expected to be used more frequently (Pareto-style assumption, to be confirmed by the benchmarks)
2. Total memory: about 1.6MB per thread (reasonable for 8-52KB allocations)
3. Covers most real-world workload patterns

---

## ENV (Arena-related)
```
# Initial chunk size in MB (default: 1)
export HAKMEM_POOL_TLS_ARENA_MB_INIT=2

# Maximum chunk size in MB (default: 8)
export HAKMEM_POOL_TLS_ARENA_MB_MAX=16

# Number of growth levels (default: 3 → 1→2→4→8MB)
export HAKMEM_POOL_TLS_ARENA_GROWTH_LEVELS=4
```
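
One way such variables could be read defensively is sketched below. This is an assumption about a reasonable approach, not the existing hakmem parser: `parse_long_clamped` and `env_long_clamped` are hypothetical helpers, and any clamp bounds passed in are illustrative. Only standard `getenv` and `strtol` are used.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal integer from `s`; on NULL, empty, or malformed input
 * fall back to `def`. The result is always clamped to [lo, hi]. */
long parse_long_clamped(const char* s, long def, long lo, long hi) {
    assert(lo <= hi);
    long v = def;
    if (s != NULL && *s != '\0') {
        char* end = NULL;
        errno = 0;
        long parsed = strtol(s, &end, 10);
        /* Accept only if the whole string was a number and it did not overflow. */
        if (errno == 0 && end != s && *end == '\0') {
            v = parsed;
        }
    }
    if (v < lo) v = lo;
    if (v > hi) v = hi;
    return v;
}

/* Convenience wrapper: read the value from the environment. */
long env_long_clamped(const char* name, long def, long lo, long hi) {
    return parse_long_clamped(getenv(name), def, lo, hi);
}
```

For example, `env_long_clamped("HAKMEM_POOL_TLS_ARENA_MB_INIT", 1, 1, 64)` would return 2 with the export above, and the default of 1 when the variable is unset or not a clean number. Clamping keeps a typo such as `9999` from turning into a multi-gigabyte first chunk during A/B runs.
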

**Location**: `core/pool_tls.c`

**Code**:
```c
// Pre-warm counts optimized for memory usage
static const int PREWARM_COUNTS[POOL_SIZE_CLASSES] = {
    16, 16, 12,  // Hot:  8KB, 16KB, 24KB
    8, 8,        // Warm: 32KB, 40KB
    4, 4         // Cold: 48KB, 52KB
};

#define PREWARM_MAX 16  // Must be >= the largest entry in PREWARM_COUNTS

void pool_tls_prewarm(void) {
    for (int class_idx = 0; class_idx < POOL_SIZE_CLASSES; class_idx++) {
        int count = PREWARM_COUNTS[class_idx];
        size_t size = POOL_CLASS_SIZES[class_idx];
        void* held[PREWARM_MAX];
        int got = 0;

        // Allocate all blocks first. Freeing inside this loop would hand the
        // same block straight back on the next pool_alloc(), so the cache
        // would not grow beyond what a single refill provides.
        while (got < count) {
            void* ptr = pool_alloc(size);
            if (!ptr) {
                // OOM during pre-warm (rare, but handle gracefully)
                break;
            }
            held[got++] = ptr;
        }

        // Now release them so they all land on the TLS freelist
        for (int i = 0; i < got; i++) {
            pool_free(held[i]);
        }
    }
}
```

**Header Addition** (`core/pool_tls.h`):
```c
// Pre-warm TLS cache (call once at thread init)
void pool_tls_prewarm(void);
```

---

## Quick Check (Recommended)
```
# PoolTLS
./build.sh bench_pool_tls_hakmem
./bench_pool_tls_hakmem 1 100000 256 42
./bench_pool_tls_hakmem 4 50000 256 42

# syscall measurement (check that the mmap/madvise/munmap total has gone down)
strace -e trace=mmap,madvise,munmap -c ./bench_pool_tls_hakmem 1 100000 256 42
strace -e trace=mmap,madvise,munmap -c ./bench_random_mixed_hakmem 100000 256 42
strace -e trace=mmap,madvise,munmap -c ./bench_random_mixed_hakmem 100000 1024 42
```

**Location**: `core/hakmem.c` (or wherever Pool TLS init happens)

**Code**:
```c
#ifdef HAKMEM_POOL_TLS_PHASE1
    // Initialize Pool TLS
    pool_thread_init();

    // Pre-warm cache (Phase 1.5b optimization)
    #ifdef HAKMEM_POOL_TLS_PREWARM
    pool_tls_prewarm();
    #endif
#endif
```

**Makefile Addition**:
```makefile
# Pool TLS Phase 1.5b - Pre-warm optimization
ifeq ($(POOL_TLS_PREWARM),1)
CFLAGS += -DHAKMEM_POOL_TLS_PREWARM=1
endif
```

**Update `build.sh`**:
```bash
# POOL_TLS_PREWARM=1 is the new flag. Keep comments off the continued lines:
# anything after a trailing "\" breaks the line continuation.
make \
  POOL_TLS_PHASE1=1 \
  POOL_TLS_PREWARM=1 \
  HEADER_CLASSIDX=1 \
  AGGRESSIVE_INLINE=1 \
  PREWARM_TLS=1 \
  "${TARGET}"
```

---

### **Step 4: Build & Smoke Test** ⏳ 10 min

```bash
# Build with pre-warm enabled
./build_pool_tls.sh bench_mid_large_mt_hakmem

# Quick smoke test
./dev_pool_tls.sh test

# Expected: No crashes, similar or better performance
```

---

### **Step 5: Benchmark** ⏳ 15 min

```bash
# Full benchmark vs System malloc
./run_pool_bench.sh

# Expected results:
#   Before (1.5a): 1.79M ops/s
#   After  (1.5b): 5-15M ops/s (about 3-8x)
```

**Additional benchmarks**:
```bash
# Different sizes
./bench_mid_large_mt_hakmem 1 100000 256 42   # 8-32KB mixed
./bench_mid_large_mt_hakmem 1 100000 1024 42  # Larger workset

# Multi-threaded
./bench_mid_large_mt_hakmem 4 100000 256 42   # 4T
```

---

### **Step 6: Measure & Analyze** ⏳ 10 min

**Metrics to collect**:
1. ops/s improvement (target: about 3-8x)
2. Memory overhead (should be ~1.6MB per thread)
3. Cold-start penalty reduction (first allocation latency)

**Success Criteria**:
- ✅ No crashes or stability issues
- ✅ +180% or better improvement (5M ops/s minimum, from the 1.79M ops/s baseline)
- ✅ Memory overhead < 2MB per thread
- ✅ No performance regression on small workloads
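
The numeric criteria can be turned into a small check so that benchmark results are judged the same way every time. This is a sketch with hypothetical names (`improvement_pct`, `meets_numeric_targets`); the thresholds are the 5M ops/s floor and the 2MB per-thread limit stated in this step, and the stability and regression criteria still need to be checked separately.

```c
#include <assert.h>
#include <stdbool.h>

/* Relative improvement in percent, e.g. 1.79 -> 5.0 is roughly +179%. */
double improvement_pct(double before_mops, double after_mops) {
    assert(before_mops > 0.0);
    return (after_mops / before_mops - 1.0) * 100.0;
}

/* Numeric success criteria only: throughput floor and memory ceiling. */
bool meets_numeric_targets(double after_mops, double overhead_mb) {
    const double min_mops = 5.0;        /* 5M ops/s minimum       */
    const double max_overhead_mb = 2.0; /* < 2MB per-thread limit */
    return after_mops >= min_mops && overhead_mb < max_overhead_mb;
}
```

Against the 1.79M ops/s baseline, 5M ops/s works out to roughly +179% and 15M ops/s to roughly +738%, which matches the conservative and optimistic ends of the target range.
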

---

### **Step 7: Tune (if needed)** ⏳ 15 min (optional)

**If results are suboptimal**, adjust pre-warm counts:

**Too slow** (< 5M ops/s):
- Increase hot class pre-warm (16 → 24)
- More aggressive: Pre-warm all classes to 16

**Memory too high** (> 2MB):
- Reduce cold class pre-warm (4 → 2)
- Lazy pre-warm: Only hot classes initially

**Adaptive approach**:
```c
// Pre-warm based on runtime heuristics
void pool_tls_prewarm_adaptive(void) {
    // Start with minimal pre-warm
    static const int MIN_PREWARM[7] = {8, 8, 4, 4, 2, 2, 2};

    // TODO: Track usage patterns and adjust dynamically
}
```
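
One possible shape for the "adjust dynamically" TODO is sketched below. It is purely illustrative: the `adapt_*` names, the doubling/halving policy, the cap of 32 blocks, and the miss threshold are all assumptions, not a design the project has committed to. The starting counts reuse the `MIN_PREWARM` values from the stub above.

```c
#include <assert.h>

enum { ADAPT_CLASSES = 7, ADAPT_MAX = 32 };

/* Floor per class, same values as MIN_PREWARM in the stub above. */
static const int ADAPT_MIN[ADAPT_CLASSES] = {8, 8, 4, 4, 2, 2, 2};

typedef struct {
    int count[ADAPT_CLASSES];  /* current pre-warm target per class      */
    int misses[ADAPT_CLASSES]; /* TLS misses seen since the last adjust  */
} adapt_state;

void adapt_init(adapt_state* s) {
    for (int i = 0; i < ADAPT_CLASSES; i++) {
        s->count[i] = ADAPT_MIN[i];
        s->misses[i] = 0;
    }
}

/* Called periodically: grow classes that miss a lot, shrink idle ones. */
void adapt_adjust(adapt_state* s, int miss_threshold) {
    assert(miss_threshold > 0);
    for (int i = 0; i < ADAPT_CLASSES; i++) {
        if (s->misses[i] >= miss_threshold) {
            s->count[i] *= 2;                       /* hot: double the target */
            if (s->count[i] > ADAPT_MAX) s->count[i] = ADAPT_MAX;
        } else if (s->misses[i] == 0) {
            s->count[i] /= 2;                       /* idle: decay to floor   */
            if (s->count[i] < ADAPT_MIN[i]) s->count[i] = ADAPT_MIN[i];
        }
        s->misses[i] = 0;                           /* start a new window     */
    }
}
```

A class that keeps missing grows 8 → 16 → 32 and then stays capped, while a class with no misses decays back to its floor. The cap bounds the worst-case memory, so the 2MB per-thread budget would need to be re-derived if `ADAPT_MAX` is raised.
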

---

## 📋 **Implementation Checklist**

### **Phase 1.5b: Pre-warm Optimization**

- [ ] **Step 1**: Design pre-warm strategy (15 min)
  - [ ] Analyze memory budget
  - [ ] Decide pre-warm counts per class
  - [ ] Document rationale

- [ ] **Step 2**: Implement `pool_tls_prewarm()` (20 min)
  - [ ] Add PREWARM_COUNTS array
  - [ ] Write pre-warm function
  - [ ] Add to pool_tls.h

- [ ] **Step 3**: Integrate with init (10 min)
  - [ ] Add call to hakmem.c init
  - [ ] Add Makefile flag
  - [ ] Update build.sh

- [ ] **Step 4**: Build & smoke test (10 min)
  - [ ] Build with pre-warm enabled
  - [ ] Run dev_pool_tls.sh test
  - [ ] Verify no crashes

- [ ] **Step 5**: Benchmark (15 min)
  - [ ] Run run_pool_bench.sh
  - [ ] Test different sizes
  - [ ] Test multi-threaded

- [ ] **Step 6**: Measure & analyze (10 min)
  - [ ] Record performance improvement
  - [ ] Measure memory overhead
  - [ ] Validate success criteria

- [ ] **Step 7**: Tune (optional, 15 min)
  - [ ] Adjust pre-warm counts if needed
  - [ ] Re-benchmark
  - [ ] Document final configuration

**Total Estimated Time**: about 80 minutes for Steps 1-6, or 95 minutes (roughly 1.5 hours) including the optional Step 7

---

## 🎯 **Expected Outcomes**

### **Performance Targets**
```
Phase 1.5a (current): 1.79M ops/s
Phase 1.5b (target):  5-15M ops/s (about 3-8x)

Conservative: 5M ops/s  (+180%)
Expected:     8M ops/s  (+350%)
Optimistic:   15M ops/s (+740%)
```

### **Comparison to Phase 7**
```
Phase 7 Task 3 (Tiny):
  Before: 21M → After: 59M ops/s (+181%)

Phase 1.5b (Pool):
  Before: 1.79M → After: 5-15M ops/s (+180-740%)

Similar or better improvement expected!
```

### **Risk Assessment**
- **Technical Risk**: LOW (proven pattern from Phase 7)
- **Stability Risk**: LOW (simple, non-invasive change)
- **Memory Risk**: LOW (1.6MB per thread is negligible for Pool workloads)
- **Complexity Risk**: LOW (< 50 LOC change)

---

## 📁 **Related Documents**

- `CLAUDE.md` - Development history (Phase 1.5a documented)
- `POOL_TLS_QUICKSTART.md` - Quick start guide
- `POOL_TLS_INVESTIGATION_FINAL.md` - Phase 1.5a debugging journey
- `PHASE7_TASK3_RESULTS.md` - Pre-warm success pattern (Tiny)

---

## 🚀 **Next Actions**

**NOW**: Start Step 1 - Design pre-warm strategy
**NEXT**: Implement the `pool_tls_prewarm()` function
**THEN**: Build, test, benchmark

**Estimated Completion**: about 1.5 hours from start
**Success Probability**: estimated at 90% (the same technique already worked for Tiny in Phase 7)

---

**Status**: Ready to implement - awaiting user confirmation to proceed! 🚀