ENV cleanup: Remove BG/HotMag vars & guard fprintf (Larson 52.3M ops/s)
Phase 1 complete: environment variable cleanup + fprintf debug guards

Removed ENV variables (BG/HotMag family):
- core/hakmem_tiny_init.inc: removed HotMag ENV (~131 lines)
- core/hakmem_tiny_bg_spill.c: removed BG spill ENV
- core/tiny_refill.h: BG remote hard-coded to fixed values
- core/hakmem_tiny_slow.inc: removed BG references

fprintf debug guards (#if !HAKMEM_BUILD_RELEASE):
- core/hakmem_shared_pool.c: lock stats (~18 fprintf)
- core/page_arena.c: Init/Shutdown/Stats (~27 fprintf)
- core/hakmem.c: SIGSEGV init message

Documentation cleanup:
- Deleted 328 markdown files (old reports, duplicate docs)

Performance check:
- Larson: 52.35M ops/s (previous run: 52.8M; stable ✅)
- No functional impact from the ENV cleanup
- Some debug output still remains (to be handled in the next phase)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
516
docs/specs/ATOMIC_FREELIST_INDEX.md
Normal file
@ -0,0 +1,516 @@
# Atomic Freelist Implementation - Documentation Index

## Overview

This directory contains documentation and tooling for converting `TinySlabMeta.freelist` operations to atomics, making the HAKMEM memory allocator safe for multi-threaded use.

**Status**: Ready for implementation
**Estimated Effort**: 5-8 hours of conversion work across 3 phases (see Estimated Timeline for totals including testing)
**Expected Impact**: 2-3% single-threaded slowdown; in return, multi-threaded stability and scalability

---
## Quick Start

**New to this task?** Start here:

1. **Read**: `ATOMIC_FREELIST_QUICK_START.md` (15 min)
2. **Run**: `./scripts/analyze_freelist_sites.sh` (5 min)
3. **Create**: Accessor header from template (30 min)
4. **Begin**: Phase 1 conversion (2-3 hours)

---
## Documentation Files

### 1. Executive Summary
**File**: `ATOMIC_FREELIST_SUMMARY.md`
**Purpose**: High-level overview of the entire implementation
**Contents**:
- Investigation results (90 sites, not 589)
- Implementation strategy (hybrid approach)
- Performance analysis (2-3% regression expected)
- Risk assessment (low risk, high benefit)
- Timeline and success metrics

**Read this first** for a complete picture.

---
### 2. Implementation Strategy
**File**: `ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md`
**Purpose**: Detailed technical strategy and design decisions
**Contents**:
- Accessor function API design (lock-free CAS + relaxed atomics)
- Critical site list (top 20 sites to convert)
- Non-critical site strategy (skip or use relaxed)
- Phased implementation plan (3 phases)
- Performance projections (single/multi-threaded)
- Memory ordering rationale (acquire/release/relaxed)
- Alternative approaches (mutex, global lock, etc.)

**Use this** when designing the accessor API and planning conversion phases.

---
### 3. Site-by-Site Conversion Guide
**File**: `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md`
**Purpose**: Line-by-line conversion instructions for all 90 sites
**Contents**:
- Phase 1: 5 files, 25 sites (hot paths)
  - File 1: `core/box/slab_freelist_atomic.h` (CREATE)
  - File 2: `core/tiny_superslab_alloc.inc.h` (8 sites)
  - File 3: `core/hakmem_tiny_refill_p0.inc.h` (3 sites)
  - File 4: `core/box/carve_push_box.c` (10 sites)
  - File 5: `core/hakmem_tiny_tls_ops.h` (4 sites)
- Phase 2: 10 files, 40 sites (warm paths)
- Phase 3: 5 files, 25 sites (cold paths)
- Common pitfalls (double-POP, missing NULL check, etc.)
- Testing checklist per file
- Quick reference card (conversion patterns)

**Use this** during actual code conversion (your primary reference).

---
### 4. Quick Start Guide
**File**: `ATOMIC_FREELIST_QUICK_START.md`
**Purpose**: Step-by-step implementation instructions
**Contents**:
- Step 1: Read documentation (15 min)
- Step 2: Create accessor header (30 min)
- Step 3: Phase 1 conversion (2-3 hours)
- Step 4: Phase 2 conversion (2-3 hours)
- Step 5: Phase 3 cleanup (1-2 hours)
- Common pitfalls and solutions
- Performance expectations
- Rollback plan
- Success criteria

**Use this** as your daily task list during implementation.

---
### 5. Accessor Header Template
**File**: `core/box/slab_freelist_atomic.h.TEMPLATE`
**Purpose**: Complete implementation of atomic accessor API
**Contents**:
- Lock-free CAS operations (`slab_freelist_pop_lockfree`, `slab_freelist_push_lockfree`)
- Relaxed load/store operations (`slab_freelist_load_relaxed`, `slab_freelist_store_relaxed`)
- NULL check helpers (`slab_freelist_is_empty`, `slab_freelist_is_nonempty`)
- Debug macro (`SLAB_FREELIST_DEBUG_PTR`)
- Extensive comments (80+ lines of documentation)
- Conversion examples
- Performance notes
- Testing strategy

**Copy this** to `core/box/slab_freelist_atomic.h` to get started.

---
## Tool Scripts

### 1. Site Analysis Script
**File**: `scripts/analyze_freelist_sites.sh`
**Purpose**: Analyze freelist access patterns in codebase
**Output**:
- Total site count (90 sites)
- Operation breakdown (POP, PUSH, NULL checks, etc.)
- Files with freelist usage (21 files)
- Phase 1/2/3 file lists
- Lock-protected sites check
- Conversion effort estimates

**Run this** before starting conversion to validate site counts.

```bash
./scripts/analyze_freelist_sites.sh
```

---
### 2. Conversion Verification Script
**File**: `scripts/verify_atomic_freelist_conversion.sh`
**Purpose**: Track conversion progress and detect potential bugs
**Output**:
- Accessor header check (exists, functions defined)
- Direct access count (remaining unconverted sites)
- Converted operations count (by type)
- Conversion progress (0-100%)
- Phase 1/2/3 file check (which files converted)
- Potential bug detection (double-POP, double-PUSH, missing NULL check)
- Compile status
- Recommendations for next steps

**Run this** frequently during conversion to track progress and catch bugs early.

```bash
./scripts/verify_atomic_freelist_conversion.sh
```

**Example output**:
```
Progress: 30% (27/90 sites)
[============----------------------------]
Currently working on: Phase 1 (Critical Hot Paths)

✅ No double-POP bugs detected
✅ No double-PUSH bugs detected
✅ Compilation succeeded
```
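The progress line is plain arithmetic over the 90-site total. The C sketch below reproduces it under two assumptions, because the real script is shell and its rounding is not documented here: the percentage truncates toward zero, and the bar is 40 cells wide, as in the sample output. With 27 of 90 sites converted, both give 30% and 12 filled cells; 65 of 90 (end of Phase 2) gives 72%. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Integer percentage of converted sites, truncated toward zero. */
static int progress_percent(int converted, int total) {
    assert(total > 0 && converted >= 0 && converted <= total);
    return converted * 100 / total;
}

/* Writes "[===---]" with `width` cells into `out`. `out_size` must be at
 * least width + 3 bytes: two brackets plus the terminating NUL. */
static void progress_bar(int converted, int total, int width, char* out, size_t out_size) {
    assert(total > 0 && converted >= 0 && converted <= total);
    assert(width > 0 && out_size >= (size_t)width + 3);
    (void)out_size;  /* only checked by the assert above */
    int filled = converted * width / total;  /* truncated, like the percentage */
    out[0] = '[';
    memset(out + 1, '=', (size_t)filled);
    memset(out + 1 + filled, '-', (size_t)(width - filled));
    out[width + 1] = ']';
    out[width + 2] = '\0';
}
```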
---

## Implementation Phases

### Phase 1: Critical Hot Paths (2-3 hours)
**Goal**: Fix Larson 8T crash with minimal changes
**Scope**: 5 files, 25 sites
**Files**:
- `core/box/slab_freelist_atomic.h` (CREATE)
- `core/tiny_superslab_alloc.inc.h`
- `core/hakmem_tiny_refill_p0.inc.h`
- `core/box/carve_push_box.c`
- `core/hakmem_tiny_tls_ops.h`

**Success Criteria**:
- ✅ Larson 8T stable (no crashes)
- ✅ Regression <5% (>24.0M ops/s)
- ✅ No TSan warnings

---
### Phase 2: Important Paths (2-3 hours)
**Goal**: Full MT safety for all allocation paths
**Scope**: 10 files, 40 sites
**Files**:
- `core/tiny_refill_opt.h`
- `core/tiny_free_magazine.inc.h`
- `core/refill/ss_refill_fc.h`
- `core/slab_handle.h`
- 6 additional files

**Success Criteria**:
- ✅ All MT tests pass (1T-16T)
- ✅ Regression <3% (>24.4M ops/s)
- ✅ MT scaling 70%+

---
### Phase 3: Cleanup (1-2 hours)
**Goal**: Convert or document the remaining sites
**Scope**: 5 files, 25 sites
**Files**:
- Debug/stats files
- Init/cleanup files
- Verification files

**Success Criteria**:
- ✅ All 90 sites converted or documented
- ✅ Zero direct accesses (except inside `slab_freelist_atomic.h`)
- ✅ Full test suite passes

---
## Testing Strategy

### Per-File Testing
After converting each file:
```bash
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000 256 42
```

### Phase 1 Testing
```bash
# Single-threaded baseline
./out/release/bench_random_mixed_hakmem 10000000 256 42

# Multi-threaded stability (PRIMARY TEST)
./out/release/larson_hakmem 8 100000 256

# Race detection
./build.sh tsan larson_hakmem
./out/tsan/larson_hakmem 4 10000 256
```

### Phase 2 Testing
```bash
# All sizes
for size in 128 256 512 1024; do
  ./out/release/bench_random_mixed_hakmem 1000000 $size 42
done

# MT scaling
for threads in 1 2 4 8 16; do
  ./out/release/larson_hakmem $threads 100000 256
done
```

### Phase 3 Testing
```bash
# Full test suite
make clean && make all
./run_all_tests.sh

# ASan check
./build.sh asan bench_random_mixed_hakmem
./out/asan/bench_random_mixed_hakmem 100000 256 42
```

---
## Performance Expectations

### Single-Threaded

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Random Mixed 256B | 25.1M ops/s | 24.4-24.8M ops/s | -1.2% to -2.8% ✅ |
| Larson 1T | 2.76M ops/s | 2.68-2.73M ops/s | -1.1% to -2.9% ✅ |

**Acceptable**: <5% regression

### Multi-Threaded

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Larson 8T | **CRASH** | ~18-20M ops/s | **FIXED** ✅ |
| MT Scaling (8T) | 0% (crashes) | 70-80% | **NEW** ✅ |

**Benefit**: Stability and MT scalability far outweigh the 2-3% single-threaded cost

---
## Common Patterns

### NULL Check Conversion
```c
// BEFORE:
if (meta->freelist) { ... }

// AFTER:
if (slab_freelist_is_nonempty(meta)) { ... }
```

### POP Operation Conversion
```c
// BEFORE:
void* block = meta->freelist;
meta->freelist = tiny_next_read(class_idx, block);

// AFTER:
void* block = slab_freelist_pop_lockfree(meta, class_idx);
if (!block) goto fallback;  // List was empty (possibly emptied by a concurrent pop)
```

### PUSH Operation Conversion
```c
// BEFORE:
tiny_next_write(class_idx, node, meta->freelist);
meta->freelist = node;

// AFTER:
slab_freelist_push_lockfree(meta, class_idx, node);
```
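The POP and PUSH conversions above call the template's lock-free helpers. As an illustration only (not the template's actual code), here is a minimal Treiber-stack sketch of those two functions using C11 `<stdatomic.h>`. Assumptions: `TinySlabMetaSketch` and `sketch_next_read`/`sketch_next_write` are hypothetical stand-ins for `TinySlabMeta` and `tiny_next_read`/`tiny_next_write`, and the next link is assumed to live in the first word of each free block. This naive version is not ABA-safe: if other threads pop the head, recycle it, and push it back between this thread's load and its CAS, the CAS can succeed with a stale `next`. Reading `next` from a block another thread has just popped is also a data race that TSan may report. A production version needs a tagged/versioned head or a guarantee that rules these cases out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta: only the freelist head is modeled. */
typedef struct {
    _Atomic(void*) freelist;
} TinySlabMetaSketch;

/* Hypothetical link helpers: the next pointer is assumed to sit in the
 * first word of each free block. class_idx is accepted but unused here. */
static inline void* sketch_next_read(int class_idx, void* block) {
    (void)class_idx;
    return *(void**)block;
}

static inline void sketch_next_write(int class_idx, void* block, void* next) {
    (void)class_idx;
    *(void**)block = next;
}

/* Pop the head block; returns NULL if the list is (or just became) empty.
 * NOTE: not ABA-safe (no tag/version on the head pointer). */
static inline void* slab_freelist_pop_lockfree(TinySlabMetaSketch* meta, int class_idx) {
    void* head = atomic_load_explicit(&meta->freelist, memory_order_acquire);
    while (head != NULL) {
        void* next = sketch_next_read(class_idx, head);
        /* On failure, `head` is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak_explicit(&meta->freelist, &head, next,
                                                  memory_order_acquire,
                                                  memory_order_acquire)) {
            return head;
        }
    }
    return NULL;
}

/* Push `node` as the new head. The release CAS publishes the link written
 * into `node` to any thread that later pops it with acquire ordering. */
static inline void slab_freelist_push_lockfree(TinySlabMetaSketch* meta, int class_idx, void* node) {
    assert(node != NULL);
    void* head = atomic_load_explicit(&meta->freelist, memory_order_relaxed);
    do {
        sketch_next_write(class_idx, node, head);
    } while (!atomic_compare_exchange_weak_explicit(&meta->freelist, &head, node,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

Because pop returns NULL both for an empty list and for a list emptied by another thread mid-retry, call sites keep the `if (!block) goto fallback;` check shown in the POP conversion.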
### Initialization Conversion
```c
// BEFORE:
meta->freelist = NULL;

// AFTER:
slab_freelist_store_relaxed(meta, NULL);
```

### Debug Print Conversion
```c
// BEFORE:
fprintf(stderr, "freelist=%p", meta->freelist);

// AFTER:
fprintf(stderr, "freelist=%p", SLAB_FREELIST_DEBUG_PTR(meta));
```
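The NULL-check, initialization, and debug-print conversions rely on relaxed accessors. A minimal sketch of how they could be written with C11 atomics follows; `TinySlabMetaSketch` is a hypothetical stand-in for `TinySlabMeta`, and defining `SLAB_FREELIST_DEBUG_PTR` as a relaxed load is an assumption about the template, not a quote from it. Relaxed ordering suffices for initialization before the slab is published and for debug output, but an emptiness check is only a snapshot under concurrency: the list can change right after the check, so a later pop must still handle `NULL`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for TinySlabMeta: only the freelist head is modeled. */
typedef struct {
    _Atomic(void*) freelist;
} TinySlabMetaSketch;

static inline void* slab_freelist_load_relaxed(TinySlabMetaSketch* meta) {
    return atomic_load_explicit(&meta->freelist, memory_order_relaxed);
}

/* For init/reset paths, before the slab is visible to other threads. */
static inline void slab_freelist_store_relaxed(TinySlabMetaSketch* meta, void* value) {
    assert(meta != NULL);
    atomic_store_explicit(&meta->freelist, value, memory_order_relaxed);
}

/* Snapshot checks: the result may be stale as soon as it is returned. */
static inline bool slab_freelist_is_empty(TinySlabMetaSketch* meta) {
    return slab_freelist_load_relaxed(meta) == NULL;
}

static inline bool slab_freelist_is_nonempty(TinySlabMetaSketch* meta) {
    return slab_freelist_load_relaxed(meta) != NULL;
}

/* Debug-only view of the head pointer, suitable for "%p". */
#define SLAB_FREELIST_DEBUG_PTR(meta) slab_freelist_load_relaxed(meta)

/* The converted debug print from the pattern above. */
static void slab_freelist_debug_dump(FILE* out, TinySlabMetaSketch* meta) {
    fprintf(out, "freelist=%p\n", SLAB_FREELIST_DEBUG_PTR(meta));
}
```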
---

## Troubleshooting

### Issue: Compilation Fails
```bash
# Check if accessor header exists
ls -la core/box/slab_freelist_atomic.h

# Check for missing includes
grep -n "#include.*slab_freelist_atomic.h" core/tiny_superslab_alloc.inc.h

# Rebuild from clean state
make clean && make bench_random_mixed_hakmem
```

### Issue: Larson 8T Still Crashes
```bash
# Check conversion progress
./scripts/verify_atomic_freelist_conversion.sh

# Run with TSan to detect data races
./build.sh tsan larson_hakmem
./out/tsan/larson_hakmem 4 10000 256 2>&1 | grep -A5 "WARNING"

# Check for double-POP/PUSH bugs (a leftover manual next-pointer
# update right next to a converted call)
grep -A1 "slab_freelist_pop_lockfree" core/ -r | grep "tiny_next_read"
grep -B1 "slab_freelist_push_lockfree" core/ -r | grep "tiny_next_write"
```

### Issue: Performance Regression >5%
```bash
# Verify baseline (before conversion); rebuild after switching branches
git stash
git checkout master
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Record: 25.1M ops/s

# Check converted version (rebuild again)
git checkout atomic-freelist-phase1
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Should be: >24.0M ops/s

# If regression >5%, profile hot paths
perf record ./out/release/bench_random_mixed_hakmem 1000000 256 42
perf report
# Look for CAS retry loops or overly strong memory ordering
```
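To judge a run against the 5% line consistently, it helps to compute the change relative to the baseline the same way every time. The helpers below are an illustrative C sketch (the names are hypothetical, not HAKMEM functions). For example, `percent_change(25.1, 24.4)` is about -2.79%, the lower end of the single-threaded table. Note that a strict 5% budget on the 25.1M ops/s baseline works out to about 23.8M ops/s, so the >24.0M ops/s target used in this document is slightly tighter (about 4.4%).

```c
#include <assert.h>

/* Percent change from `before` to `after`; negative values are regressions. */
static double percent_change(double before, double after) {
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* Lowest throughput that stays within `max_regression_pct` of `baseline`. */
static double min_acceptable(double baseline, double max_regression_pct) {
    assert(baseline > 0.0 && max_regression_pct >= 0.0);
    return baseline * (1.0 - max_regression_pct / 100.0);
}
```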
---

## Rollback Procedures

### Quick Rollback (if Phase 1 fails)
```bash
git stash
git checkout master
git branch -D atomic-freelist-phase1  # Deletes the branch, including unmerged commits
# Review issues and retry
```

### Alternative Approach (Spinlock)
If lock-free proves too complex:
```c
// Option: Use 1-byte spinlock instead
// Add to TinySlabMeta: uint8_t freelist_lock;
// Lock with __sync_lock_test_and_set(), unlock with __sync_lock_release()
// Expected overhead: 5-10% (vs 2-3% for lock-free)
```
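A minimal sketch of the spinlock fallback, assuming GCC/Clang builtins and a freelist that stores the next pointer in each block's first word; `TinySlabMetaSpin` and the `*_locked` helper names are hypothetical. `__sync_lock_test_and_set` acts as an acquire barrier and `__sync_lock_release` as a release barrier, so freelist updates made while holding the lock are visible to the next holder. The inner relaxed load implements test-and-test-and-set, which keeps waiting threads from repeatedly writing the contended cache line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TinySlabMeta variant guarded by a 1-byte spinlock. */
typedef struct {
    void* freelist;         /* next pointer stored in each free block's first word */
    uint8_t freelist_lock;  /* 0 = unlocked, 1 = locked */
} TinySlabMetaSpin;

static inline void slab_freelist_lock(TinySlabMetaSpin* meta) {
    /* Returns the previous value; nonzero means another thread holds the lock. */
    while (__sync_lock_test_and_set(&meta->freelist_lock, 1)) {
        /* Test-and-test-and-set: wait with relaxed atomic loads until it looks free. */
        while (__atomic_load_n(&meta->freelist_lock, __ATOMIC_RELAXED)) {
            /* spin */
        }
    }
}

static inline void slab_freelist_unlock(TinySlabMetaSpin* meta) {
    __sync_lock_release(&meta->freelist_lock);  /* stores 0 with release semantics */
}

static inline void* slab_freelist_pop_locked(TinySlabMetaSpin* meta) {
    slab_freelist_lock(meta);
    void* block = meta->freelist;
    if (block != NULL) {
        meta->freelist = *(void**)block;
    }
    slab_freelist_unlock(meta);
    return block;
}

static inline void slab_freelist_push_locked(TinySlabMetaSpin* meta, void* node) {
    assert(node != NULL);
    slab_freelist_lock(meta);
    *(void**)node = meta->freelist;
    meta->freelist = node;
    slab_freelist_unlock(meta);
}
```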
---

## Progress Tracking

Use the verification script to track progress:

```bash
./scripts/verify_atomic_freelist_conversion.sh
```

**Output example**:
```
Progress: 30% (27/90 sites)
[============----------------------------]

Phase 1 files converted: 2/4
Remaining sites: 63

Currently working on: Phase 1 (Critical Hot Paths)
Next step: Convert core/box/carve_push_box.c
```

---
## Success Criteria

### Phase 1 Complete
- [ ] 5 files converted (25 sites)
- [ ] Larson 8T runs 100K iterations without crash
- [ ] Single-threaded regression <5%
- [ ] No TSan warnings
- [ ] Verification script shows ~28% progress (25/90 sites)

### Phase 2 Complete
- [ ] 15 files converted (65 sites)
- [ ] All MT tests pass (1T-16T)
- [ ] Single-threaded regression <3%
- [ ] MT scaling 70%+
- [ ] Verification script shows 72% progress (65/90 sites)

### Phase 3 Complete
- [ ] 21 files converted (90 sites)
- [ ] Zero direct `meta->freelist` accesses
- [ ] Full test suite passes
- [ ] Documentation updated (CLAUDE.md)
- [ ] Verification script shows 100% progress

---
## File Checklist

### Documentation
- [x] `ATOMIC_FREELIST_SUMMARY.md` - Executive summary
- [x] `ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md` - Technical strategy
- [x] `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md` - Conversion guide
- [x] `ATOMIC_FREELIST_QUICK_START.md` - Quick start instructions
- [x] `ATOMIC_FREELIST_INDEX.md` - This file

### Templates
- [x] `core/box/slab_freelist_atomic.h.TEMPLATE` - Accessor API

### Tools
- [x] `scripts/analyze_freelist_sites.sh` - Site analysis
- [x] `scripts/verify_atomic_freelist_conversion.sh` - Progress tracker

### Implementation (to be created)
- [ ] `core/box/slab_freelist_atomic.h` - Working accessor API

---
## Contact and Support

If you encounter issues during implementation:

1. **Check documentation**: Review the relevant guide for your current phase
2. **Run verification**: `./scripts/verify_atomic_freelist_conversion.sh`
3. **Review common pitfalls**: See the "Common pitfalls" section of `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md`
4. **Roll back if needed**: `git checkout master`

---
## Estimated Timeline

| Milestone | Duration | Cumulative |
|-----------|----------|------------|
| **Preparation** | 15 min | 0.25h |
| **Create accessor header** | 30 min | 0.75h |
| **Phase 1 conversion** | 2-3h | 2.75-3.75h |
| **Phase 1 testing** | 30 min | 3.25-4.25h |
| **Phase 2 conversion** | 2-3h | 5.25-7.25h |
| **Phase 2 testing** | 1h | 6.25-8.25h |
| **Phase 3 conversion** | 1-2h | 7.25-10.25h |
| **Phase 3 testing** | 1h | 8.25-11.25h |
| **Total** | | **8.25-11.25h** |

**Minimal viable**: 3.25-4.25 hours (Phase 1 only, fixes Larson crash)
**Full implementation**: 8.25-11.25 hours (all 3 phases, complete MT safety)

---
## Next Steps

**Ready to start?**

1. Read `ATOMIC_FREELIST_QUICK_START.md` (15 min)
2. Run `./scripts/analyze_freelist_sites.sh` (5 min)
3. Copy the template: `cp core/box/slab_freelist_atomic.h.TEMPLATE core/box/slab_freelist_atomic.h` (5 min)
4. Edit the copied header to add the required includes (20 min)
5. Test compile: `make bench_random_mixed_hakmem` (5 min)
6. Begin Phase 1 conversion using `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md` (2-3 hours)

**Good luck!** 🚀
392
docs/specs/CONFIGURATION.md
Normal file
@ -0,0 +1,392 @@
# HAKMEM Configuration Guide

**Last Updated**: 2025-11-26 (After Phase 2.2 - Learning Systems Consolidation)

This guide documents all canonical HAKMEM environment variables after Phase 0-2 cleanup.

---

## 📋 Quick Reference

Use the validation tool to check your configuration:

```bash
# Validate current environment
./scripts/validate_config.sh

# Strict mode (treat warnings as errors)
./scripts/validate_config.sh --strict

# Quiet mode (errors only)
./scripts/validate_config.sh --quiet
```

**Deprecated variables?** See [DEPRECATED.md](DEPRECATED.md) for migration guide.

---
## 🎯 Core Configuration

### Allocator Path Selection

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_WRAP_TINY` | 0, 1 | 1 | Enable TINY allocator (1-2048B) |
| `HAKMEM_WRAP_POOL` | 0, 1 | 1 | Enable POOL allocator (2-8KB) |
| `HAKMEM_WRAP_MID` | 0, 1 | 1 | Enable MID allocator (8-32KB) |
| `HAKMEM_WRAP_LARGE` | 0, 1 | 1 | Enable LARGE allocator (>32KB) |

**Example**:
```bash
# Disable all HAKMEM allocators (use system malloc)
export HAKMEM_WRAP_TINY=0 HAKMEM_WRAP_POOL=0 HAKMEM_WRAP_MID=0 HAKMEM_WRAP_LARGE=0
```
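To make the size bands concrete, here is a hypothetical C sketch of how a front end could route a request using the four `HAKMEM_WRAP_*` flags. It is not HAKMEM's dispatcher: the exact boundary handling (for example, whether 2048 bytes counts as TINY and 8KB as POOL) is assumed, and a disabled band is assumed to fall through to the system allocator, as the example above suggests. With all flags on, 2048 bytes routes to TINY and 2049 bytes to POOL; with only `HAKMEM_WRAP_TINY=0`, a 16-byte request goes to the system allocator.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ROUTE_TINY, ROUTE_POOL, ROUTE_MID, ROUTE_LARGE, ROUTE_SYSTEM } AllocRoute;

/* Parsed values of HAKMEM_WRAP_TINY/POOL/MID/LARGE (1 = enabled). */
typedef struct {
    int wrap_tiny;
    int wrap_pool;
    int wrap_mid;
    int wrap_large;
} WrapConfig;

/* Assumed bands: TINY 1-2048B, POOL up to 8KB, MID up to 32KB, LARGE above. */
static AllocRoute route_for_size(size_t size, const WrapConfig* cfg) {
    assert(size > 0 && cfg != NULL);
    if (size <= 2048) {
        return cfg->wrap_tiny ? ROUTE_TINY : ROUTE_SYSTEM;
    }
    if (size <= 8 * 1024) {
        return cfg->wrap_pool ? ROUTE_POOL : ROUTE_SYSTEM;
    }
    if (size <= 32 * 1024) {
        return cfg->wrap_mid ? ROUTE_MID : ROUTE_SYSTEM;
    }
    return cfg->wrap_large ? ROUTE_LARGE : ROUTE_SYSTEM;
}
```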
---

## 🐛 Debug & Diagnostics

**Canonical Variables** (After P0.4 - Debug Consolidation):

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_DEBUG_LEVEL` | 0-3 | 0 | Verbosity (0=none, 1=errors, 2=info, 3=verbose) |
| `HAKMEM_DEBUG_TINY` | 0, 1 | 0 | Enable TINY allocator debug output |
| `HAKMEM_TRACE_ALLOCATIONS` | 0, 1 | 0 | Trace every alloc/free (expensive!) |
| `HAKMEM_INTEGRITY_CHECKS` | 0, 1 | 1 | Enable integrity validation (canary checks) |

**Examples**:
```bash
# Production (quiet, integrity only)
export HAKMEM_DEBUG_LEVEL=0
export HAKMEM_INTEGRITY_CHECKS=1

# Debug session (verbose + TINY debug + tracing)
export HAKMEM_DEBUG_LEVEL=3
export HAKMEM_DEBUG_TINY=1
export HAKMEM_TRACE_ALLOCATIONS=1
export HAKMEM_INTEGRITY_CHECKS=1

# Performance testing (all checks OFF)
export HAKMEM_DEBUG_LEVEL=0
export HAKMEM_INTEGRITY_CHECKS=0
```
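This commit also wraps allocator `fprintf` calls in `#if !HAKMEM_BUILD_RELEASE`. One possible way to combine that compile-time guard with the runtime `HAKMEM_DEBUG_LEVEL` is sketched below. The `HAK_DEBUG_LOG` macro and `hak_debug_enabled` helper are hypothetical, defaulting `HAKMEM_BUILD_RELEASE` to 0 is an assumption, and re-reading the environment on every call is a simplification (real code would more likely cache the level at init). Under these assumptions, with `HAKMEM_DEBUG_LEVEL=2`, level-1 and level-2 messages print and level-3 messages do not; unset, unparsable, or out-of-range values behave like 0. In a release build the macro compiles to nothing and its arguments are not evaluated.

```c
#define _POSIX_C_SOURCE 200112L  /* request POSIX declarations such as setenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 0  /* assumption: debug build unless told otherwise */
#endif

/* Nonzero if HAKMEM_DEBUG_LEVEL (0-3, default 0) is at least `level` (1-3).
 * Unset, unparsable, or out-of-range values are treated as 0. */
static int hak_debug_enabled(int level) {
    assert(level >= 1 && level <= 3);
    long v = 0;
    const char* s = getenv("HAKMEM_DEBUG_LEVEL");
    if (s != NULL) {
        char* end = NULL;
        long parsed = strtol(s, &end, 10);
        if (end != s && *end == '\0' && parsed >= 0 && parsed <= 3) {
            v = parsed;
        }
    }
    return v >= level;
}

#if !HAKMEM_BUILD_RELEASE
#define HAK_DEBUG_LOG(level, ...) \
    do { if (hak_debug_enabled(level)) fprintf(stderr, __VA_ARGS__); } while (0)
#else
#define HAK_DEBUG_LOG(level, ...) do { (void)(level); } while (0)
#endif
```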
---

## 🏗️ SuperSlab Management

**Canonical Variables** (After P0.1 - SuperSlab Unification):

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_SUPERSLAB_REUSE` | 0, 1 | 0 | Reuse empty slabs (reduces mmap/munmap syscalls) |
| `HAKMEM_SUPERSLAB_LAZY` | 0, 1 | 1 | Lazy deallocation (Phase 9, keep slabs cached) |
| `HAKMEM_SUPERSLAB_PREWARM` | 0-128 | 0 | Preallocate N SuperSlabs at startup |
| `HAKMEM_SUPERSLAB_LRU_CAP` | 0-1024 | 256 | Max cached SuperSlabs (LRU eviction) |
| `HAKMEM_SUPERSLAB_SOFT_CAP` | 0-1024 | 128 | Soft cap for SuperSlab pool (before eviction) |

**Examples**:
```bash
# High performance (aggressive reuse + large cache)
export HAKMEM_SUPERSLAB_REUSE=1
export HAKMEM_SUPERSLAB_LAZY=1
export HAKMEM_SUPERSLAB_PREWARM=16
export HAKMEM_SUPERSLAB_LRU_CAP=512

# Low memory footprint (minimal caching)
export HAKMEM_SUPERSLAB_REUSE=0
export HAKMEM_SUPERSLAB_LAZY=0
export HAKMEM_SUPERSLAB_LRU_CAP=32
export HAKMEM_SUPERSLAB_SOFT_CAP=16
```

**Note**: Phase 12 (Shared SuperSlab Pool) removed per-class registry population, making `SUPERSLAB_REUSE` less effective. Default is OFF.

---
## 🧠 Learning Systems

**Canonical Variables** (After P2.2 - Learning Consolidation, 18→6 variables):

### Allocation Learning
Controls adaptive sizing for allocator caches (TLS, SFC, capacity tuning).

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_ALLOC_LEARN` | 0, 1 | 0 | Enable allocation pattern learning |
| `HAKMEM_ALLOC_LEARN_WINDOW` | 1-1000000 | 10000 | Learning window size (operations) |
| `HAKMEM_ALLOC_LEARN_RATE` | 0.0-1.0 | 0.1 | Learning rate (lower = slower adaptation) |

### Memory Learning
Controls THP (Transparent Huge Pages), RSS optimization, and max-size learning.

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_MEM_LEARN` | 0, 1 | 0 | Enable memory pattern learning (THP/RSS/WMAX) |
| `HAKMEM_MEM_LEARN_WINDOW` | 1-1000000 | 5000 | Learning window size (operations) |
| `HAKMEM_MEM_LEARN_THRESHOLD` | 0.0-1.0 | 0.8 | Activation threshold (80% confidence) |

### Advanced Overrides
**For troubleshooting only**: enables legacy advanced knobs that are auto-tuned by default.

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_LEARN_ADVANCED` | 0, 1 | 0 | Enable advanced override knobs (see DEPRECATED.md) |

**Examples**:
```bash
# Production (learning disabled, use static tuning)
export HAKMEM_ALLOC_LEARN=0
export HAKMEM_MEM_LEARN=0

# Adaptive workload (enable both learners)
export HAKMEM_ALLOC_LEARN=1
export HAKMEM_ALLOC_LEARN_WINDOW=20000
export HAKMEM_ALLOC_LEARN_RATE=0.05
export HAKMEM_MEM_LEARN=1
export HAKMEM_MEM_LEARN_WINDOW=10000
export HAKMEM_MEM_LEARN_THRESHOLD=0.75

# Migration troubleshooting (enable advanced overrides)
export HAKMEM_LEARN_ADVANCED=1
export HAKMEM_LEARN_DECAY=0.95  # Override auto-tuned decay
```

**Migration Note**: See [DEPRECATED.md](DEPRECATED.md) for mapping of 18 legacy variables → 6 canonical variables.

---
## 🎯 TINY Allocator (1-2048B)

### TLS Cache Configuration

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_TINY_TLS_CAP` | 16-1024 | 64 | Per-class TLS cache capacity |
| `HAKMEM_TINY_TLS_REFILL` | 4-256 | 16 | Batch refill size |
| `HAKMEM_TINY_DRAIN_THRESH` | 0-1024 | 128 | Remote free drain threshold |

### Super Front Cache (SFC)
**Note**: SFC is **ACTIVE** and provides 95%+ hit rate for hot allocations.

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_TINY_SFC_ENABLE` | 0, 1 | 1 | Enable Super Front Cache (ultra-fast TLS cache) |
| `HAKMEM_TINY_SFC_CAPACITY` | 32-512 | 128 | SFC slot count |
| `HAKMEM_TINY_SFC_HOT_CLASSES` | 1-16 | 8 | Number of hot classes to cache |

### P0 Batch Optimization

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_TINY_P0_ENABLE` | 0, 1 | 1 | Enable P0 batch refill (O(1) freelist pop) |
| `HAKMEM_TINY_P0_BATCH` | 4-128 | 16 | P0 batch size |
| `HAKMEM_TINY_P0_NO_DRAIN` | 0, 1 | 0 | Disable remote drain (debug only) |
| `HAKMEM_TINY_P0_LOG` | 0, 1 | 0 | Enable P0 counter validation logging |

### Header Configuration

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_TINY_HEADER_CLASSIDX` | 0, 1 | 1 | Store class_idx in header (Phase 7, enables fast free) |

**Examples**:
```bash
# High-throughput (large caches, aggressive batching)
export HAKMEM_TINY_TLS_CAP=256
export HAKMEM_TINY_TLS_REFILL=32
export HAKMEM_TINY_SFC_CAPACITY=256
export HAKMEM_TINY_P0_ENABLE=1
export HAKMEM_TINY_P0_BATCH=32

# Low-latency (small caches, fine-grained refill)
export HAKMEM_TINY_TLS_CAP=32
export HAKMEM_TINY_TLS_REFILL=4
export HAKMEM_TINY_SFC_CAPACITY=64
export HAKMEM_TINY_P0_BATCH=8

# Debug P0 issues
export HAKMEM_TINY_P0_LOG=1
export HAKMEM_TINY_P0_NO_DRAIN=1  # Isolate batch refill from remote free
```

---
## 🏊 Pool TLS Allocator (2-8KB)

### Arena Management

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_POOL_TLS_ARENA_MB_INIT` | 1-64 | 1 | Initial arena size (MB) |
| `HAKMEM_POOL_TLS_ARENA_MB_MAX` | 1-64 | 8 | Maximum arena size (MB) |
| `HAKMEM_POOL_TLS_ARENA_GROWTH_LEVELS` | 1-8 | 3 | Growth levels (1MB→2MB→4MB→8MB) |

**Example**:
```bash
# Large arena for high-throughput 8KB allocations
export HAKMEM_POOL_TLS_ARENA_MB_INIT=4
export HAKMEM_POOL_TLS_ARENA_MB_MAX=32
export HAKMEM_POOL_TLS_ARENA_GROWTH_LEVELS=5  # 4MB→8MB→16MB→32MB
```
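One reading of the growth setting is "number of doubling steps from `MB_INIT`, capped at `MB_MAX`". That reading matches the default row (1MB with 3 levels reaches 8MB), but both the doubling and the cap are assumptions about the implementation. Under them, the example above (4MB initial, 5 levels, 32MB max) reaches 32MB after three steps and stays there, which is consistent with its comment. A small C sketch of the arithmetic, with a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>

/* Arena size (MB) after `level` growth steps, assuming each step doubles
 * the previous size and the result never exceeds max_mb. */
static size_t arena_mb_at_level(size_t init_mb, size_t max_mb, unsigned level) {
    assert(init_mb > 0 && init_mb <= max_mb);
    size_t mb = init_mb;
    for (unsigned i = 0; i < level && mb < max_mb; i++) {
        mb *= 2;
    }
    return mb < max_mb ? mb : max_mb;
}
```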
---

## 📊 Statistics & Profiling

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_STATS_ENABLE` | 0, 1 | 0 | Enable statistics collection |
| `HAKMEM_STATS_VERBOSE` | 0, 1 | 0 | Verbose stats output |
| `HAKMEM_STATS_INTERVAL_SEC` | 1-3600 | 10 | Stats reporting interval (seconds) |
| `HAKMEM_PROFILE_SYSCALLS` | 0, 1 | 0 | Profile syscall counts (mmap/munmap/madvise) |

**Example**:
```bash
# Enable stats for performance analysis
export HAKMEM_STATS_ENABLE=1
export HAKMEM_STATS_VERBOSE=1
export HAKMEM_STATS_INTERVAL_SEC=5
export HAKMEM_PROFILE_SYSCALLS=1
```

---

## 🧪 Experimental Features

**Warning**: These features are experimental and may change or be removed.

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_EXPERIMENTAL_ADAPTIVE_DRAIN` | 0, 1 | 0 | Adaptive remote free drain threshold |
| `HAKMEM_EXPERIMENTAL_CACHE_TUNING` | 0, 1 | 0 | Runtime cache capacity tuning |

---
## 🚀 Quick Start Examples

### 1. Production (Default Recommended)
```bash
# High performance, stable, integrity checks enabled
export HAKMEM_SUPERSLAB_LAZY=1
export HAKMEM_SUPERSLAB_LRU_CAP=256
export HAKMEM_TINY_P0_ENABLE=1
export HAKMEM_INTEGRITY_CHECKS=1
```

### 2. Debug Session
```bash
# Verbose logging, tracing, integrity checks
export HAKMEM_DEBUG_LEVEL=3
export HAKMEM_DEBUG_TINY=1
export HAKMEM_TRACE_ALLOCATIONS=1
export HAKMEM_INTEGRITY_CHECKS=1
export HAKMEM_TINY_P0_LOG=1
```

### 3. Low-Latency Workload
```bash
# Small caches, fine-grained batching, minimal syscalls
export HAKMEM_TINY_TLS_CAP=32
export HAKMEM_TINY_TLS_REFILL=4
export HAKMEM_TINY_SFC_CAPACITY=64
export HAKMEM_SUPERSLAB_LAZY=1
export HAKMEM_SUPERSLAB_LRU_CAP=128
```

### 4. High-Throughput Workload
```bash
# Large caches, aggressive batching, prewarm
export HAKMEM_TINY_TLS_CAP=256
export HAKMEM_TINY_TLS_REFILL=32
export HAKMEM_TINY_SFC_CAPACITY=256
export HAKMEM_TINY_P0_BATCH=32
export HAKMEM_SUPERSLAB_PREWARM=16
export HAKMEM_SUPERSLAB_LRU_CAP=512
```

### 5. Memory-Efficient (Low RSS)
```bash
# Minimal caching, eager deallocation
export HAKMEM_SUPERSLAB_LAZY=0
export HAKMEM_SUPERSLAB_LRU_CAP=32
export HAKMEM_SUPERSLAB_SOFT_CAP=16
export HAKMEM_TINY_TLS_CAP=32
export HAKMEM_TINY_SFC_CAPACITY=64
export HAKMEM_POOL_TLS_ARENA_MB_MAX=2
```

---
## ✅ Validation & Testing

### Validate Configuration
```bash
# Check for deprecated/invalid variables
./scripts/validate_config.sh

# Example output:
# [DEPRECATED] HAKMEM_LEARN is deprecated, use HAKMEM_ALLOC_LEARN instead
# Sunset date: 2026-05-26 (6 months from 2025-11-26)
# See DEPRECATED.md for migration guide
#
# [WARN] HAKMEM_TINY_TLS_CAP=2048 is outside typical range (16-1024)
#
# [OK] HAKMEM_DEBUG_LEVEL=2
# [OK] HAKMEM_SUPERSLAB_LAZY=1
```
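The `[WARN]` line above flags a value outside the documented range. How the allocator itself reacts to such a value is not specified in this guide; the sketch below assumes one plausible policy, falling back to the default when a variable is unset, empty, unparsable, or out of range. `hakmem_env_long` is a hypothetical helper, not a HAKMEM function.

```c
#define _POSIX_C_SOURCE 200112L  /* request POSIX declarations such as setenv() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Read integer env var `name`; return `def` if it is unset, empty,
 * not a complete base-10 integer, or outside [min, max]. */
static long hakmem_env_long(const char* name, long def, long min, long max) {
    assert(min <= def && def <= max);
    const char* s = getenv(name);
    if (s == NULL || *s == '\0') {
        return def;
    }
    errno = 0;
    char* end = NULL;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < min || v > max) {
        return def;
    }
    return v;
}
```

Under this policy, `HAKMEM_TINY_TLS_CAP=2048` makes `hakmem_env_long("HAKMEM_TINY_TLS_CAP", 64, 16, 1024)` return the default 64, whereas `validate_config.sh` only warns about that value.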
### Test Performance
```bash
# Baseline (10M iterations, 10 runs recommended)
./out/release/bench_random_mixed_hakmem

# Custom workload
./out/release/bench_random_mixed_hakmem 10000000 256 42

# Multi-threaded (Larson benchmark)
./out/release/larson_hakmem 8  # 8 threads
```

---

## ❓ FAQ

### Q: What's the difference between ALLOC_LEARN and MEM_LEARN?
**A**:
- `HAKMEM_ALLOC_LEARN`: Tunes **allocator behavior** (cache sizes, refill batches) based on allocation patterns
- `HAKMEM_MEM_LEARN`: Tunes **memory management** (THP usage, RSS optimization, max-size detection)

### Q: Should I enable learning in production?
**A**: **Generally no**. Learning adds overhead (~5-10%) and is best suited for:
- Adaptive workloads with unpredictable patterns
- Benchmarking different configurations
- An initial tuning phase (then bake the learned values into a static config)

For production, use static tuning based on profiling.

### Q: Why is SUPERSLAB_REUSE OFF by default?
**A**: Phase 12 (Shared SuperSlab Pool) removed per-class registry population, so reuse is now less effective and can cause fragmentation. Use `SUPERSLAB_LAZY=1` (the default) instead to reduce syscalls.

### Q: What's the performance impact of INTEGRITY_CHECKS?
**A**: About 2-5% overhead. Recommended for production (default ON) to catch memory corruption early. Disable only for performance testing.

### Q: How do I migrate from deprecated learning variables?
**A**: See the "Learning Systems (P2.2 Consolidation)" section of [DEPRECATED.md](DEPRECATED.md) for the complete mapping of 18→6 variables. The 6-month deprecation period provides backward compatibility.

### Q: What's SFC and why is it still active?
**A**: SFC (Super Front Cache) is an ultra-fast TLS cache (95%+ hit rate, 3-4 instructions). A Unified Cache was tested in Phase 3d-B but was slower than SFC, so SFC remains the active implementation.

---
## 📚 See Also

- [DEPRECATED.md](DEPRECATED.md) - Deprecated variables and migration guide
- [BUILDING_QUICKSTART.md](BUILDING_QUICKSTART.md) - Build instructions
- [CLAUDE.md](CLAUDE.md) - Development history and performance benchmarks
- [hakmem_cleanup_proposal.txt](hakmem_cleanup_proposal.txt) - Cleanup roadmap

---

**Generated**: 2025-11-26 (Phase 2.2 - Learning Systems Consolidation)
150
docs/specs/DOCS_INDEX.md
Normal file
@ -0,0 +1,150 @@
HAKMEM Docs Index (2025-10-29)

Purpose
- One‑page map for current work: how to build, run, compare, and tune.
- Focus on Tiny fast‑path tuning vs system/mimalloc, with safe LD guidance.

Quick Build
- Direct link (recommended for perf tuning)
  - `make bench_fast`
  - Run: `HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem`
- PGO (direct link)
  - `./build_pgo.sh` (profile+build)
  - Run: `HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem`
- Shared (LD_PRELOAD) PGO
  - `make pgo-profile-shared && make pgo-build-shared`
  - Run: `HAKMEM_WRAP_TINY=1 LD_PRELOAD=./libhakmem.so ./bench_comprehensive_system`

Direct‑Link Comparisons (CSV)
- Pair (HAKMEM vs mimalloc): `bash scripts/run_comprehensive_pair.sh`
  - CSV: `bench_results/comp_pair_YYYYMMDD_HHMMSS/summary.csv`
- Tiny hot triad (HAKMEM/System/mimalloc): `bash scripts/run_tiny_hot_triad.sh 80000`
  - CSV: `bench_results/tiny_hot_triad_YYYYMMDD_HHMMSS/results.csv`
- Random mixed triad: `bash scripts/run_random_mixed_matrix.sh 120000`
  - CSV: `bench_results/random_mixed_YYYYMMDD_HHMMSS/results.csv`

Perf‑Main preset (safe, mainline‑oriented)
- Build + run triad: `bash scripts/run_perf_main_triad.sh 60000`
- Applies the recommended Tiny env (TLS_SLL=1, REFILL_MAX=96, HOT=192, HYST=16) without bench‑only macros.

Tiny param sweeps
- Basic: `bash scripts/sweep_tiny_params.sh 100000`
- Advanced (SLL multiplier, refill, per-class MAG, etc.): `bash scripts/sweep_tiny_advanced.sh 80000 --mag64-512`

LD_PRELOAD Apps (opt‑in)
- Script: `bash scripts/run_apps_with_hakmem.sh`
- Default safety: the script sets `HAKMEM_LD_SAFE=2` (pass‑through), then turns `LD_PRELOAD` on per case.
- Recommendation: use direct‑link for perf; LD runs are for stability sampling only.

Tiny Modes and Knobs
- Normal (default): TLS magazine + TLS SLL (≤256B)
  - `HAKMEM_TINY_TLS_SLL=1` (default)
  - `HAKMEM_TINY_MAG_CAP=128` (good tiny bench preset; 64B may prefer 512)
- TinyQuickSlot (minimal front; experimental)
  - `HAKMEM_TINY_QUICK=1`
  - Keeps items[6] in a single cache line. On a miss, refills a few items from SLL/Mag and returns immediately.
- Ultra (SLL‑only, experimental):
  - `HAKMEM_TINY_ULTRA=1` (opt‑in)
  - `HAKMEM_TINY_ULTRA_VALIDATE=0/1` (perf vs safety)
  - Per‑class overrides: `HAKMEM_TINY_ULTRA_BATCH_C{0..7}`, `HAKMEM_TINY_ULTRA_SLL_CAP_C{0..7}`
- FLINT (Fast Lightweight INTelligence): Frontend + deferred Intelligence (experimental)
  - `HAKMEM_TINY_FRONTEND=1` (enable array FastCache; miss falls back)
  - `HAKMEM_TINY_FASTCACHE=1` (low‑level switch; keep OFF unless A/B)
  - `HAKMEM_INT_ENGINE=1` (event ring + BG thread adjusts fill targets)
  - Event extension (internal): timestamp/tier/flags/site_id/thread are accumulated in the ring, off the hot path, for use in future adaptation.

Best‑Known Presets (direct link)
- Tiny hot focus
  - `export HAKMEM_WRAP_TINY=1`
  - `export HAKMEM_TINY_TLS_SLL=1`
  - `export HAKMEM_TINY_MAG_CAP=128` (64B: try 512)
  - `export HAKMEM_TINY_REMOTE_DRAIN_TRYRATE=0`
  - `export HAKMEM_TINY_REMOTE_DRAIN_THRESHOLD=1000000`
- Memory efficiency A/B
  - `export HAKMEM_TINY_FLUSH_ON_EXIT=1`
  - Run bench/app; compare steady‑state RSS with/without.

Refill Batch (A/B)
- `HAKMEM_TINY_REFILL_MAX_HOT` (default 192) / `HAKMEM_TINY_REFILL_MAX` (default 64)
  - Search for the peak in the small-size range (8/16/32B). In the current environment, values near the defaults perform best.

Current Results (high level)
- Tiny hot triad (Perf‑Main, 60–80k cycles, safe):
  - 16–64B: System ≈ 300–335 M; HAKMEM ≈ 250–300 M; mimalloc 535–620 M.
  - 128B: HAKMEM ≈ 250–270 M; System 170–176 M; mimalloc 575–586 M.
- Comprehensive (direct link): mimalloc ≈ 0.9–1.0B; HAKMEM ≈ 0.25–0.27B.
- Random mixed: all three are close; mimalloc slightly ahead; HAKMEM ≈ System ± a few %.

Bench‑only highlight (reference values, dedicated build)
- SLL‑only + warmup + PGO (≤64B): 8–24B exceed 400M; 32B/b100 peaks at 429.18M (System 312.55M).
- Run: `bash scripts/run_tiny_sllonly_triad.sh 30000` (not included in the safe normal build)

Open Focus
- Close the 16–64B gap (cap/batch tuning; shave SLL/mini‑mag overhead).
- Ultra (opt‑in) stabilization; A/B vs normal.
- Frontend refill heuristics; BG engine stop/join wiring (added).

Mid Range MT (8-32KB, mimalloc-style)
- **Status**: COMPLETE (2025-11-01) - 110M ops/sec achieved ✅
- Quick benchmark: `bash benchmarks/scripts/mid/run_mid_mt_bench.sh`
- Comparison: `bash benchmarks/scripts/mid/compare_mid_mt_allocators.sh`
- Full report: `MID_MT_COMPLETION_REPORT.md`
- Implementation: `core/hakmem_mid_mt.{c,h}`
- Results: 110M ops/sec (100-101% of mimalloc, 2.12x faster than glibc)

ACE Learning Layer (Adaptive Control Engine)
- **Status**: Phase 1 COMPLETE ✅ (2025-11-01) - Infrastructure ready 🚀
- **Goal**: Fix weaknesses with adaptive learning (aiming to beat mimalloc!)
  - Fragmentation stress: 3.87 → 10-20 M ops/s (2.6-5.2x target)
  - Large WS: 22.15 → 30-45 M ops/s (1.4-2.0x target)
  - realloc: 277ns → 140-210ns (1.3-2.0x target)
- **Documentation**:
  - User guide: `docs/ACE_LEARNING_LAYER.md` ✅
  - Technical plan: `docs/ACE_LEARNING_LAYER_PLAN.md` ✅
  - Progress report: `ACE_PHASE1_PROGRESS.md` ✅
- **Phase 1 Deliverables** (COMPLETE ✅):
  - ✅ Metrics collection (`hakmem_ace_metrics.{c,h}`)
  - ✅ UCB1 learning algorithm (`hakmem_ace_ucb1.{c,h}`)
  - ✅ Dual-loop controller (`hakmem_ace_controller.{c,h}`)
  - ✅ Dynamic TLS capacity adjustment
  - ✅ Hot-path metrics integration (alloc/free tracking)
  - ✅ A/B benchmark script (`scripts/bench_ace_ab.sh`)
- **Usage**:
  - Enable: `HAKMEM_ACE_ENABLED=1 ./your_benchmark`
  - Debug: `HAKMEM_ACE_ENABLED=1 HAKMEM_ACE_LOG_LEVEL=2 ./your_benchmark`
  - A/B test: `./scripts/bench_ace_ab.sh`
- **Next**: Phase 2 - Extended benchmarking + learning convergence validation

Directory Structure (2025-11-01 Reorganization)
- **benchmarks/** - All benchmark-related files
  - `src/` - Benchmark source code (tiny/mid/comprehensive/stress)
  - `scripts/` - Benchmark scripts organized by category
  - `results/` - Benchmark results (formerly bench_results/)
  - `perf/` - Performance profiling data (formerly perf_data/)
- **tests/** - Test files (unit/integration/stress)
- **core/** - Core allocator implementation
- **docs/** - Documentation (benchmarks/, api/, guides/)
- **scripts/** - Development scripts (build/, apps/, maintenance/)
- **archive/** - Historical documents and analysis

Where to Read More
- **SlabHandle Box**: `docs/SLAB_HANDLE.md` (encapsulation of ownership + remote drain + metadata)
- **Free Safety**: `docs/FREE_SAFETY.md` (fail‑fast on double free / class mismatch, and ring operation)
- **Cleanup/Organization**: `CLEANUP_SUMMARY_2025_11_01.md` (latest)
- **Archive**: `archive/README.md` - Historical docs and analysis
- Bench mode: `BENCH_MODE.md`
- Env knobs: `ENV_VARS.md`
- Tiny hot microbench: `TINY_HOT_BENCH.md`
- Frontend/Backend split: `FRONTEND_BACKEND_PLAN.md`
- LD status/safety: `LD_PRELOAD_STATUS.md`
- Goals/Targets: `GOALS_2025_10_29.md`
- Latest results: `BENCH_RESULTS_2025_10_29.md` (today), `BENCH_RESULTS_2025_10_28.md` (yesterday)
- Mainline integration plan: `MAINLINE_INTEGRATION.md`
- FLINT Intelligence (events/adaptation): `FLINT_INTELLIGENCE.md`

Hako / MIR / FFI
- `HAKO_MIR_FFI_SPEC.md` — spec: type validation is completed in the front end, MIR only carries values through, and FFI is lowered mechanically

Notes
- LD mode: keep `HAKMEM_LD_SAFE=2` default for apps; prefer direct‑link for tuning.
- Ultra/Frontend are experimental; keep OFF by default and use scripts for A/B.
@ -1,106 +1,327 @@
# ENV Vars (Runtime Controls)
HAKMEM Environment Variables (Tiny focus)

A list of runtime controls covering learning, caching, wrapper behavior, and more.
Core toggles
- HAKMEM_WRAP_TINY=1
  - Enables the Tiny allocator (direct link)
- HAKMEM_TINY_USE_SUPERSLAB=0/1
  - Turns the SuperSlab path on or off (default ON)

## Learning (CAP / window / budget)
- `HAKMEM_LEARN=1` — enables CAP learning (runs in a separate thread)
- `HAKMEM_LEARN_WINDOW_MS` — learning window (default 1000 ms)
- `HAKMEM_TARGET_HIT_MID` / `HAKMEM_TARGET_HIT_LARGE` — target hit rates (defaults 0.65 / 0.55)
- `HAKMEM_CAP_STEP_MID` / `HAKMEM_CAP_STEP_LARGE` — CAP update step (defaults 4 / 1)
- `HAKMEM_BUDGET_MID` / `HAKMEM_BUDGET_LARGE` — upper limit on the total CAP (0 = disabled)
SFC (Super Front Cache) stats / A/B
- HAKMEM_SFC_ENABLE=0/1
  - Box 5‑NEW: enables the Super Front Cache (default OFF; for A/B).
- HAKMEM_SFC_CAPACITY=16..256 / HAKMEM_SFC_REFILL_COUNT=8..256
  - SFC capacity and refill count (e.g. 256/128).
- HAKMEM_SFC_STATS_DUMP=1
  - Dumps SFC statistics to stderr at process exit (alloc_hits/misses, refill_calls, etc.).
  - Usage: make CFLAGS+=" -DHAKMEM_DEBUG_COUNTERS=1" larson_hakmem; HAKMEM_SFC_ENABLE=1 HAKMEM_SFC_STATS_DUMP=1 ./larson_hakmem …

## Mid/Large CAP manual overrides
- `HAKMEM_CAP_MID=a,b,c,d,e` — CAPs for the 2/4/8/16/32 KiB classes (in pages)
- `HAKMEM_CAP_LARGE=a,b,c,d,e` — CAPs for the 64/128/256/512 KiB and 1 MiB classes (in bundles)
Larson defaults (publish→mail→adopt)
- `scripts/run_larson_defaults.sh` sets the easily forgotten required variables in one go.
- By default it exports the following (each can be overridden via the environment for A/B):
  - `HAKMEM_TINY_USE_SUPERSLAB=1` / `HAKMEM_TINY_MUST_ADOPT=1` / `HAKMEM_TINY_SS_ADOPT=1`
  - `HAKMEM_TINY_FAST_CAP=64`
  - `HAKMEM_TINY_FAST_SPARE_PERIOD=8` ← returns blocks from the fast tier to the SuperSlab, creating publish trigger points
  - `HAKMEM_TINY_TLS_LIST=1`
  - `HAKMEM_TINY_MAILBOX_SLOWDISC=1`
  - `HAKMEM_TINY_MAILBOX_SLOWDISC_PERIOD=256`

## Variable Mid class (DYN1)
- `HAKMEM_MID_DYN1=<bytes>` — enables one variable class slot (e.g. 14336)
- `HAKMEM_CAP_MID_DYN1=<pages>` — CAP dedicated to DYN1
- `HAKMEM_DYN1_AUTO=1` — assigns the class automatically from the peak of the size distribution (only if it does not collide with a fixed class)
- `HAKMEM_HIST_SAMPLE=N` — size-distribution sampling (1 in 2^N)
Front Gate (A/B for boxified fast path)
- `HAKMEM_TINY_FRONT_GATE_BOX=1` — Uses the Front Gate Box implementation (SFC→SLL) for fast-path pop/push/cascade. Default 0. Safe to toggle during builds via `make EXTRA_CFLAGS+=" -DHAKMEM_TINY_FRONT_GATE_BOX=1"`.
- Debug visibility (optional): `HAKMEM_TINY_RF_TRACE=1`
- Force-notify (optional, debugging aid): `HAKMEM_TINY_RF_FORCE_NOTIFY=1`
- Per mode (tput/pf), the SuperSlab size and cache/precharge are also set:
  - tput: `HAKMEM_TINY_SS_FORCE_LG=21`, `HAKMEM_TINY_SS_CACHE=0`, `HAKMEM_TINY_SS_PRECHARGE=0`
  - pf: `HAKMEM_TINY_SS_FORCE_LG=20`, `HAKMEM_TINY_SS_CACHE=4`, `HAKMEM_TINY_SS_PRECHARGE=1`

## Wrapper behavior (LD_PRELOAD)
- `HAKMEM_WRAP_L2=1` / `HAKMEM_WRAP_L25=1` — allows Mid/L2.5 use even inside the wrapper (mind the safety implications)
- `HAKMEM_POOL_TLS_FREE=0/1` — returns Mid frees to TLS (1 = default)
- `HAKMEM_POOL_MIN_BUNDLE=<n>` — minimum bundle size for Mid refills (default 2)
- `HAKMEM_POOL_REFILL_BATCH=1-4` — Phase 6.25: number of pages batched per Mid Pool refill (default 2; 1 = batching disabled)
- `HAKMEM_WRAP_TINY=1` — allows Tiny even inside the wrapper (magazine only; avoids locks)
- `HAKMEM_WRAP_TINY_REFILL=1` — allows small trylock-based refills inside the wrapper (default OFF, prioritizing safety)
Ultra Tiny (SLL-only, experimental)
- HAKMEM_TINY_ULTRA=0/1
  - Turns Ultra Tiny mode on or off (minimal, SLL-centric hot path)
- HAKMEM_TINY_ULTRA_VALIDATE=0/1
  - Validates the Ultra SLL head (1 when prioritizing safety; 0 recommended for performance measurement)
- HAKMEM_TINY_ULTRA_BATCH_C{0..7}=N
  - Per-class refill batch override (e.g. class 3 (64B) → C3)
- HAKMEM_TINY_ULTRA_SLL_CAP_C{0..7}=N
  - Per-class SLL cap override

## Rounding tolerance (W_MAX)
- `HAKMEM_WMAX_MID` / `HAKMEM_WMAX_LARGE` — rounding tolerance (e.g. 1.4)
- `HAKMEM_WMAX_LEARN=1` — enables W_MAX learning (simple round-robin)
- `HAKMEM_WMAX_CANDIDATES_MID` / `HAKMEM_WMAX_CANDIDATES_LARGE` — candidate values (e.g. "1.4,1.6,1.7")
- `HAKMEM_WMAX_DWELL_SEC` — minimum number of seconds to keep a candidate before switching (default 10)
SuperSlab adopt/publish (experimental)
- HAKMEM_TINY_SS_ADOPT=0/1
  - Enables SuperSlab publish/adopt + remote drain + owner transfer (default OFF).
  - Experimental switch for raising reuse density in workloads with many cross-thread frees, such as 4T Larson.
  - When ON, some single-thread (1T) performance may drop, so use it together with A/B testing.
  - Note: even when the variable is unset, it turns ON automatically once a cross-thread free is detected at runtime (auto-on).
- HAKMEM_TINY_SS_ADOPT_COOLDOWN=4
  - Cooldown before retrying adopt (per thread). 0 = disabled.
- HAKMEM_TINY_SS_ADOPT_BUDGET=8
  - Maximum number of adopt attempts inside superslab_refill() (0-32).
- HAKMEM_TINY_SS_ADOPT_BUDGET_C{0..7}
  - Per-class override of the adopt budget (0-32). When set, it takes precedence over `HAKMEM_TINY_SS_ADOPT_BUDGET`.
- HAKMEM_TINY_SS_REQTRACE=1
  - Writes a lightweight request trace of the harvest gate (guard), ENOMEM fallbacks, and slab/SS adoption to stderr.
- HAKMEM_TINY_RF_FORCE_NOTIFY=0/1 (debugging aid)
  - Forces a publish notification when `slab_listed==0`, even if the remote queue was already non-empty (old!=0).
  - Useful for exposing initial empty→non-empty notifications that may have been missed (A/B recommended).

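
A minimal C sketch of the precedence described for the adopt budget: the per-class `HAKMEM_TINY_SS_ADOPT_BUDGET_C{n}` wins, then the global `HAKMEM_TINY_SS_ADOPT_BUDGET`, then the default of 8. The function names are hypothetical, and clamping out-of-range input into 0-32 (rather than rejecting it) is an assumption, not confirmed hakmem behavior. The resolver takes raw strings so the caller decides where they come from.

```c
#include <stdlib.h>

/* Parse one budget string. Returns -1 when unset/empty so that an explicit 0
   stays distinguishable. Clamping into 0..32 is an assumption. */
static int parse_budget(const char *s) {
    if (s == NULL || *s == '\0') return -1;
    long v = strtol(s, NULL, 10);
    if (v < 0) v = 0;
    if (v > 32) v = 32;
    return (int)v;
}

/* Hypothetical resolver: per-class value, then global value, then default 8. */
static int resolve_adopt_budget(const char *per_class_env, const char *global_env) {
    int v = parse_budget(per_class_env);
    if (v >= 0) return v;
    v = parse_budget(global_env);
    return v >= 0 ? v : 8;
}
```

A caller would pass the environment values, e.g. `resolve_adopt_budget(getenv("HAKMEM_TINY_SS_ADOPT_BUDGET_C3"), getenv("HAKMEM_TINY_SS_ADOPT_BUDGET"))`. Keeping "unset" (-1) distinct from an explicit 0 lets one class disable adoption even when a global budget is set.
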
## Profiling
- `HAKMEM_PROF=1` / `HAKMEM_PROF_SAMPLE=N` — lightweight sampling profiler
- `HAKMEM_ACE_SAMPLE=N` — sampling rate for L1 hits/misses/L1 fallbacks
Ready List (the refill-optimization box)
- 2025-12 cleanup: the Ready-related ENV variables were removed. The Ready ring is always enabled, with fixed width/budget (width=TINY_READY_RING, budget=1).

## Counter sampling (reducing hot-path writes)
- `HAKMEM_POOL_COUNT_SAMPLE=N` — updates the Mid `hits/misses/frees` counters only once every 2^N events (default 10 = 1/1024)
- `HAKMEM_TINY_COUNT_SAMPLE=N` — updates the Tiny `alloc/free` counts only once every 2^N events (default 8 = 1/256)
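
Updating a counter once every 2^N events only needs a mask test on a cheap per-thread tick. The C sketch below illustrates the idea; the type and function names are hypothetical, and scaling the sampled value back up by 2^N when reading it is an assumption about how such a counter would be interpreted, not documented hakmem behavior. With the Mid default N=10 the mask is 1023 (1 update per 1024 events); with the Tiny default N=8 it is 255 (1 per 256).

```c
#include <stdint.h>

/* Hypothetical sampled counter: `tick` is a cheap per-thread count bumped on
   every event; `sampled` is the (possibly shared) counter written only on
   sampled events. lg is N from HAKMEM_*_COUNT_SAMPLE and must be < 64. */
typedef struct {
    uint64_t tick;
    uint64_t sampled;
    unsigned lg;
} SampledCounter;

static void sampled_inc(SampledCounter *c) {
    uint64_t mask = (UINT64_C(1) << c->lg) - 1;
    if ((c->tick++ & mask) == 0)  /* true once every 2^lg calls */
        c->sampled++;
}

/* Approximate total number of events, scaling the sample back up by 2^lg. */
static uint64_t sampled_estimate(const SampledCounter *c) {
    return c->sampled << c->lg;
}
```
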
Background Remote Drain (bundling box, lightweight steps)
- 2025-12 cleanup: the BG Remote ENV variables (HAKMEM_TINY_BG_REMOTE*) were removed. BG remote/aggregator is fixed OFF.

## Safety
- `HAKMEM_SAFE_FREE=1` — mincore guard on free (watch the overhead)
Ready Aggregator (BG, non-destructive peek)
- 2025-12 cleanup: the Ready Aggregator ENV variables were removed as well (fixed OFF).

## Mid TLS two-tier (ring + local LIFO)
- `HAKMEM_POOL_TLS_RING=0/1` — enables the TLS ring (default 1)
- `HAKMEM_TRYLOCK_PROBES=K` — number of trylock attempts on non-empty shards (default 3)
- `HAKMEM_RING_RETURN_DIV=2|3|4` — fraction returned when the ring is full (2 = 1/2, 3 = 1/3, 4 = 1/4)
- `HAKMEM_TLS_LO_MAX=<n>` — cap of the TLS-local LIFO (default 256)
- `HAKMEM_SHARD_MIX=1` — strengthens the site→shard distribution hash (splitmix64)
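
To make `HAKMEM_RING_RETURN_DIV` concrete, here is a minimal C sketch of a full TLS ring returning 1/div of its entries before accepting a new block. Everything here is an illustrative assumption: the array-backed `TlsRing`, spilling the oldest entries (rather than the newest), the `out` array standing in for the shared shard, and treating a div below 2 as 2.

```c
#include <stddef.h>
#include <string.h>

#define RING_CAP 16  /* illustrative; cf. the RING_CAP=<8|16|32> build knob */

typedef struct {
    void *items[RING_CAP];  /* items[0] is the oldest entry */
    size_t count;
} TlsRing;

/* Push a freed block. If the ring is full, first return RING_CAP/div of the
   oldest entries through `out` (stand-in for the shared shard; needs room for
   RING_CAP pointers). Returns how many entries were written to `out`. */
static size_t ring_push(TlsRing *r, void *block, unsigned div, void **out) {
    size_t spilled = 0;
    if (r->count == RING_CAP) {
        size_t n = RING_CAP / (div >= 2 ? div : 2);  /* 2 -> 1/2, 3 -> 1/3, 4 -> 1/4 */
        if (n == 0) n = 1;
        for (; spilled < n; spilled++)
            out[spilled] = r->items[spilled];
        memmove(r->items, r->items + n, (r->count - n) * sizeof r->items[0]);
        r->count -= n;
    }
    r->items[r->count++] = block;
    return spilled;
}
```

Spilling a fraction instead of a single entry keeps the ring from bouncing between full and almost-full on every free; a smaller divisor spills more per overflow but overflows less often.
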
Registry window (A/B for scan cost)
- HAKMEM_TINY_REG_SCAN_MAX=N
  - Maximum number of entries scanned in the registry's “small window” (default 256).
  - Smaller values reduce the scan cost in superslab_refill() and in the gate just before mmap, but may lower the adopt hit rate and increase OOM/new mmaps.
  - When the hit rate is high (e.g. Tiny‑Hot), A/B values such as 64/128.

## L2.5 (LargePool) only
- `HAKMEM_L25_RUN_BLOCKS=<n>` — overrides the number of blocks per bump-run (shared by all classes). By default each class uses about 2 MiB per run (64KB:32, 128KB:16, 256KB:8, 512KB:4, 1MB:2)
- `HAKMEM_L25_RUN_FACTOR=<n>` — run-length multiplier (1..8). Ignored when `RUN_BLOCKS` is set
- `HAKMEM_L25_PREF=remote|run` — order on a TLS miss: `remote` = remote drain first, `run` = bump-run first (default: remote)
- `HAKMEM_WRAP_L25=0/1` — allows L2.5 use even inside the wrapper (default 0)
- `HAKMEM_L25_TC_SPILL=<n>` — Transfer Cache spill threshold on free (default 32, 0 = disabled)
- `HAKMEM_L25_BG_DRAIN=0/1` — periodically drains remote→freelist on a BG thread (default 0)
- `HAKMEM_L25_BG_MS=<n>` — BG drain interval (milliseconds, default 5)
- `HAKMEM_L25_TC_CAP=<n>` — TC ring capacity (default 64, range 8..64)
- `HAKMEM_L25_RING_TRIGGER=<n>` — trigger for remote-first (only when n or fewer items remain in the ring; default 2)
- `HAKMEM_L25_OWNER_INBOUND=0/1` — owner-return mode (cross‑thread frees are pushed to the page owner's inbound queue). On alloc, a thread drains a small batch from its own inbound queue into TLS
- `HAKMEM_L25_INBOUND_SLOTS=<n>` — number of inbound slots (default 512; roughly 128..2048). Values larger than the build default are clamped
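
The default run sizes listed for `HAKMEM_L25_RUN_BLOCKS` all follow one rule: about 2 MiB per run divided by the block size (64KB × 32, 128KB × 16 and 1MB × 2 all equal 2 MiB). The C sketch below reproduces that table. Treating `HAKMEM_L25_RUN_FACTOR` as a plain multiplier on the default block count, and letting `RUN_BLOCKS` win outright, follows the description above; the function name and the zero-means-unset convention are assumptions.

```c
#include <stddef.h>

#define L25_RUN_BYTES ((size_t)2 << 20)  /* ~2 MiB per run */

/* run_blocks_env: HAKMEM_L25_RUN_BLOCKS (0 = unset)
   run_factor_env: HAKMEM_L25_RUN_FACTOR (0 = unset, treated as 1) */
static size_t l25_run_blocks(size_t block_size, size_t run_blocks_env, unsigned run_factor_env) {
    if (run_blocks_env > 0) return run_blocks_env;  /* explicit override; factor ignored */
    size_t blocks = L25_RUN_BYTES / block_size;     /* 64KB -> 32, ..., 1MB -> 2 */
    if (blocks == 0) blocks = 1;
    unsigned f = run_factor_env;
    if (f < 1) f = 1;
    if (f > 8) f = 8;                               /* documented range 1..8 */
    return blocks * f;
}
```
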
Simplified refill for Mid (fewer branches for 128–1024B)
- HAKMEM_TINY_MID_REFILL_SIMPLE=0/1
  - For classes >= 4 (128B and up), skips the multi-stage sticky/hot/mailbox/registry/adopt search and reduces it to:
    1) if the existing TLS SuperSlab has an unused slab, initialize it directly and bind it;
    2) otherwise, allocate a new SuperSlab and bind its first slab.
  - Goal: fewer branches and scans inside superslab_refill() (for throughput-oriented A/B).
  - Caution: adopt opportunities decrease, so page faults and memory efficiency will vary. A/B is mandatory before regular use.

## Log suppression
- `HAKMEM_INVALID_FREE_LOG=0/1` — turns invalid-free logging on or off (default 0 = suppressed)
Refill batch for Mid (SLL reinforcement)
- HAKMEM_TINY_REFILL_COUNT_MID=N
  - Overrides the number of blocks carved per SLL refill for classes >= 4 (128B and up) (default: max_take or the remaining headroom).
  - E.g. A/B with 32/64/96. The SLL runs dry less often, which may reduce refill frequency.

Note: the TLS/RING/PROBES/LO_MAX settings above also apply to L2.5 (LargePool), linked via the same ENV names.
Relaxed remote-head read on the alloc side (A/B)
- HAKMEM_TINY_ALLOC_REMOTE_RELAX=0/1
  - In hak_tiny_alloc_superslab(), performs the non-zero check of `remote_heads[slab_idx]` with a relaxed load (default: acquire).
  - Safe because the ownership-acquire → drain ordering is preserved. An A/B knob aimed at fewer branches and lower load pressure.
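
The relaxed-vs-acquire choice can be illustrated with C11 atomics. In this sketch (a hypothetical single remote stack, not hakmem's `remote_heads[]` code), the emptiness probe may use `memory_order_relaxed` because it only decides whether to try; the drain itself takes the whole list with an acquire exchange, which pairs with the remote freer's release CAS so that the blocks' `next` links are visible before they are walked.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Block { struct Block *next; } Block;

typedef struct {
    _Atomic(Block *) remote_head;  /* pushed to by other threads on cross-thread free */
} RemoteQueue;

/* Remote free: lock-free push; release publishes b->next together with b. */
static void remote_push(RemoteQueue *q, Block *b) {
    Block *old = atomic_load_explicit(&q->remote_head, memory_order_relaxed);
    do {
        b->next = old;
    } while (!atomic_compare_exchange_weak_explicit(&q->remote_head, &old, b,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* Owner-side probe: a hint only. A stale answer just delays or wastes one drain. */
static bool remote_nonempty(RemoteQueue *q, bool relax) {
    Block *h = atomic_load_explicit(&q->remote_head,
                                    relax ? memory_order_relaxed : memory_order_acquire);
    return h != NULL;
}

/* Owner-side drain: acquire pairs with the pushers' release, so walking ->next is safe. */
static Block *remote_drain_all(RemoteQueue *q) {
    return atomic_exchange_explicit(&q->remote_head, NULL, memory_order_acquire);
}
```

The correctness burden sits entirely on the exchange: as long as the drain that actually dereferences the list uses acquire, weakening the probe cannot expose a half-published block.
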
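
## Batching (moving madvise/munmap to the background)
- `HAKMEM_BATCH_BG=0/1` — flushes batches on a background thread (default 1 = ON)
  - Large frees (>=64KiB) are accumulated via `hak_batch_add()`; the BG thread flushes them when a threshold is reached or periodically
  - Takes madvise/munmap off the hot path, handing TLB flushes and system calls to the BG thread
Raising the front hit rate (splice at the adoption boundary)
- HAKMEM_TINY_DRAIN_TO_SLL=N (0 = disabled)
  - Right after the adoption boundary (drain→owner→bind), moves up to N blocks from the freelist to the TLS SLL (all classes).
  - Goal: lower the miss rate of the next tiny_alloc_fast_pop (steering cross‑thread supply toward the front).
  - Boundary discipline: this splice runs only inside the adoption boundary; the publish side never touches drain/owner.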
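
Moving up to N blocks from a slab freelist onto the thread's SLL, as `HAKMEM_TINY_DRAIN_TO_SLL=N` describes, is a short linked-list splice. This C sketch assumes both lists are singly linked through the first word of each free block; `FreeNode`, the function name, and the `sll_count`/`sll_cap` bound are hypothetical, and the real code runs only inside the adoption boundary described above.

```c
#include <stddef.h>

typedef struct FreeNode { struct FreeNode *next; } FreeNode;

/* Move up to n blocks from *freelist onto the front of *sll without exceeding
   sll_cap. Returns the number moved; n == 0 means the feature is disabled. */
static size_t splice_to_sll(FreeNode **freelist, FreeNode **sll, size_t *sll_count,
                            size_t sll_cap, size_t n) {
    size_t moved = 0;
    while (moved < n && *freelist != NULL && *sll_count < sll_cap) {
        FreeNode *b = *freelist;
        *freelist = b->next;  /* pop from the slab freelist */
        b->next = *sll;       /* push onto the TLS SLL */
        *sll = b;
        (*sll_count)++;
        moved++;
    }
    return moved;
}
```
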

## Timing measurement (Debug Timing)
- `HAKMEM_TIMING=1` — dumps per-category totals to stderr (at exit)
- Main categories (excerpt):
  - Mid(L2): `pool_lock`, `pool_refill`, `pool_tc_drain`, `pool_tls_ring_pop`, `pool_tls_lifo_pop`, `pool_remote_push`, `pool_alloc_tls_page`
  - L2.5: `l25_lock`, `l25_refill`, `l25_tls_ring_pop`, `l25_tls_lifo_pop`, `l25_remote_push`, `l25_alloc_tls_page`, `l25_shard_steal`
Front refill amount (A/B)
- HAKMEM_TINY_REFILL_COUNT=N (all classes)
- HAKMEM_TINY_REFILL_COUNT_HOT=N (class<=3)
- HAKMEM_TINY_REFILL_COUNT_MID=N (class>=4)
- HAKMEM_TINY_REFILL_COUNT_C{0..7}=N (per class)
- Controls the refill count of tiny_alloc_fast (default 16). Larger values reduce miss frequency but increase the cost of each refill.

Important: prerequisite for publish/adopt (SuperSlab ON)
- HAKMEM_TINY_USE_SUPERSLAB=1
  - The publish→mailbox→adopt pipeline works only when the SuperSlab path is ON.
  - For benchmarks, the default ON is recommended (it can also be turned OFF in A/B for memory-efficiency comparisons).
  - When OFF, [Publish Pipeline]/[Publish Hits] stay at 0.

SuperSlab cache / precharge (Phase 6.24+)
- HAKMEM_TINY_SS_CACHE=N
  - SuperSlab cache limit shared by all classes (number of SuperSlabs kept per class). 0 = unlimited, unset = disabled.
  - When the cache is enabled, `superslab_free()` does not munmap an empty SuperSlab immediately; it pushes it onto the cache for reuse.
- HAKMEM_TINY_SS_CACHE_C{0..7}=N
  - Per-class cache limit (individual setting). For classes where it is set, it takes precedence over `HAKMEM_TINY_SS_CACHE`.
- HAKMEM_TINY_SS_PRECHARGE=N
  - Preallocates N SuperSlabs per Tiny class and pools them in the cache. 0 = disabled.
  - Preallocated SuperSlabs are pre-faulted (equivalent to `MAP_POPULATE`), suppressing page faults on first access.
  - Setting it automatically enables the cache as well (to hold the precharged SuperSlabs).
- HAKMEM_TINY_SS_PRECHARGE_C{0..7}=N
  - Per-class precharge count (individual override). E.g. precharge 4 only for the 8B class → `HAKMEM_TINY_SS_PRECHARGE_C0=4`
- HAKMEM_TINY_SS_POPULATE_ONCE=1
  - Faults in the next SuperSlab obtained via `mmap` with `MAP_POPULATE`, once only (a one-shot pre-touch for A/B).

Harvest / Guard (harvest gate before mmap)
- HAKMEM_TINY_GUARD=0/1
  - Enables a gate that tries trim/adopt first, right before a new mmap (default ON).
- HAKMEM_TINY_SS_CAP=N
  - SuperSlab limit for each Tiny class (0 = unlimited).
- HAKMEM_TINY_SS_CAP_C{0..7}=N
  - Per-class limit (0 = unlimited).
- HAKMEM_TINY_GLOBAL_WATERMARK_MB=MB
  - Forces a harvest when the total allocated bytes exceed the threshold (MB) (0 = disabled).

Counters (dump)
- HAKMEM_TINY_COUNTERS_DUMP=1
  - Dumps extended counters to stderr (per class).
  - In addition to SS adopt/publish, prints slab adopt/publish/requeue/miss.
  - [Publish Pipeline]: notify_calls / same_empty_pubs / remote_transitions / mailbox_reg_calls / mailbox_slow_disc
  - [Free Pipeline]: ss_local / ss_remote / tls_sll / magazine

Safety (free validation)
- HAKMEM_SAFE_FREE=1
  - Enables extra validation at the free boundary (detects pointers outside the SuperSlab range, class mismatches, and dangerous double frees).
  - Recommended default when debugging; 0 is recommended for perf measurement.
- HAKMEM_SAFE_FREE_STRICT=1
  - Fails fast when an invalid free (class mismatch / unallocated / double free) is detected (ring dump → SIGUSR2).
  - Default is 0 (log only).

Frontend (mimalloc-inspired, experimental)
- HAKMEM_TINY_FRONTEND=0/1
  - Enables the frontend FastCache (minimal hot path; the backend is used only on a miss)
- HAKMEM_INT_ENGINE=0/1
  - Enables deferred intelligence (event collection + BG adaptation)
- HAKMEM_INT_ADAPT_REFILL=0/1
  - INT adjusts the refill limits (`HAKMEM_TINY_REFILL_MAX(_HOT)`) by ±16 per window (default ON)
- HAKMEM_INT_ADAPT_CAPS=0/1
  - INT lightly adjusts per-class MAG/SLL caps (±16/±32): hot classes get slightly larger caps, infrequently used ones are shrunk (default ON)
- HAKMEM_INT_EVENT_TS=0/1
  - Includes a timestamp (ns) in events (default OFF). When OFF, the clock_gettime call is avoided (lighter hot path)
- HAKMEM_INT_SAMPLE=N
  - Samples events with probability 1/2^N (default: N unset = record everything). E.g. N=5 → 1/32. Controls hot-path load while INT is enabled
- HAKMEM_TINY_FASTCACHE=0/1
  - Low-level FastCache switch (normally unnecessary; for A/B experiments)
- HAKMEM_TINY_QUICK=0/1
  - Enables TinyQuickSlot (an ultra-small 64B-per-class stack) as the very first stage.
  - Design: items[6] + top are packed into one cache line, so a hit returns after touching only that line.
  - On a miss: refills a few items via SLL→Quick or Magazine→Quick, then returns (existing structures are kept).
  - Recommended for small-size (≤256B) A/B. Consider making it default ON once stable.
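
The one-line layout described for TinyQuickSlot (`items[6]` plus `top` in a single 64B cache line) can be pinned down with a static assertion. The C sketch below is illustrative only: the `QuickSlot` type, the `uint8_t` top index, and the `backing` array that stands in for the SLL/Magazine refill source are assumptions, not hakmem's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUICK_CAP 6  /* items[6], as described above */

/* Hypothetical layout: 6 pointers + a 1-byte top index, aligned and padded to one 64B line. */
typedef struct {
    _Alignas(64) void *items[QUICK_CAP];
    uint8_t top;  /* number of cached blocks */
} QuickSlot;

static_assert(sizeof(QuickSlot) == 64, "QuickSlot must occupy exactly one 64B cache line");

/* Hit path: touches only the QuickSlot line. Returns NULL on a miss. */
static void *quick_pop(QuickSlot *q) {
    return q->top ? q->items[--q->top] : NULL;
}

static int quick_push(QuickSlot *q, void *p) {
    if (q->top >= QUICK_CAP) return 0;  /* full: caller falls back to SLL/Magazine */
    q->items[q->top++] = p;
    return 1;
}

/* Miss path: move up to `want` blocks from a backing stack (standing in for
   SLL/Magazine) into the QuickSlot, then pop one for the caller. */
static void *quick_refill_and_pop(QuickSlot *q, void **backing, size_t *backing_len, size_t want) {
    while (want-- > 0 && *backing_len > 0 && q->top < QUICK_CAP)
        q->items[q->top++] = backing[--*backing_len];
    return quick_pop(q);
}
```
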

FLINT naming (alias, conceptual)
- FLINT = FRONT(HAKMEM_TINY_FRONTEND) + INT(HAKMEM_INT_ENGINE)
- Alias environment variables for enabling everything at once (implementation planned):
  - HAKMEM_FLINT=1 → enables FRONT+INT (planned)
  - HAKMEM_FLINT_FRONT=1 → FRONT only (= HAKMEM_TINY_FRONTEND)
  - HAKMEM_FLINT_BG=1 → INT only (= HAKMEM_INT_ENGINE)

Other useful

New (debug isolation)
- HAKMEM_TINY_DISABLE_READY=0/1
  - Completely stops the Ready/Mailbox consumer path (default 0 = path ON). Used in TSan/ASan isolation experiments to route traffic only through SS + freelist.
- HAKMEM_DEBUG_SEGV=0/1
  - Registers an early SIGSEGV handler that prints a backtrace to stderr once (may print nothing in some environments).
- HAKMEM_FORCE_LIBC_ALLOC_INIT=0/1
  - Forces malloc/free to libc only from process start until hak_init() completes (avoiding dlsym→malloc recursion and uninitialized-TLS access during init). Once init completes, allocation returns to the normal path automatically (even if the env var is set, it has no effect after init).
- HAKMEM_TINY_MAG_CAP=N
  - TLS magazine cap (used for tuning the normal path)
- HAKMEM_TINY_MAG_CAP_C{0..7}=N
  - Per-class TLS magazine cap (normal path). When set, overrides that class's default (e.g. 512 for 64B = class 3)
- HAKMEM_TINY_TLS_SLL=0/1
  - Turns the normal-path SLL on or off
- HAKMEM_SLL_MULTIPLIER=N
  - Extends the SLL cap of the small classes (0..3, i.e. 8/16/32/64B) up to MAG_CAP×N (capped at TINY_TLS_MAG_CAP). Default 2; tune within 1..16
- HAKMEM_TINY_SLL_CAP_C{0..7}=N
  - Per-class SLL cap for the normal path (absolute value). When set, bypasses the multiplier calculation
- HAKMEM_TINY_REFILL_MAX=N
  - Upper limit for bulk refill when the magazine hits its low watermark (default 64). Larger values mean fewer refills but higher momentary memory pressure
- HAKMEM_TINY_REFILL_MAX_HOT=N
  - Higher limit for the 8/16/32/64B classes (class<=3) (default 192). For finding the peak in the small-size range
- HAKMEM_TINY_REFILL_MAX_C{0..7}=N (new)
  - Per-class refill limit (individual override). Only effective for classes where it is set (0 = unset)
- HAKMEM_TINY_REFILL_MAX_HOT_C{0..7}=N (new)
  - Individual override for the hot classes (0..3). When set, takes precedence over `REFILL_MAX_HOT`
- (removed) HAKMEM_TINY_BG_REMOTE*
  - 2025-12 cleanup: the BG Remote ENV variables were removed (BG remote is fixed OFF).
- HAKMEM_TINY_PREFETCH=0/1
  - Enables a lightweight prefetch of head/next on SLL pop (for fine tuning; default OFF)
- HAKMEM_TINY_REFILL_COUNT=N (for ULTRA_SIMPLE)
  - SLL refill count for ULTRA_SIMPLE (default 32, range 8–256).
- HAKMEM_TINY_FLUSH_ON_EXIT=0/1
  - Flushes and trims the Tiny magazines at exit (for RSS measurement)
- HAKMEM_TINY_RSS_BUDGET_KB=N (new)
  - Sets a Tiny RSS budget (kB) when the INT engine starts. When exceeded, per-class MAG/SLL caps are shrunk step by step (memory first).
- HAKMEM_TINY_INT_TIGHT=0/1 (new)
  - Biases INT adjustments toward shrinking (raises thresholds and moves the MAG/SLL minimums closer to the floor).
- HAKMEM_TINY_DIET_STEP=N (new, default 16)
  - Amount shrunk per step when over budget (MAG: step, SLL: step×2).
- HAKMEM_TINY_CAP_FLOOR_C{0..7}=N (new)
  - Per-class MAG floor (e.g. C0=64, C3=128). INT never shrinks a cap below this value.
- HAKMEM_DEBUG_COUNTERS=0/1
  - Compiles the path/Ultra debug counters into the build (default 0 = compiled out). When ON, they are dumped at exit (atexit) if `HAKMEM_TINY_PATH_DEBUG=1`.
- HAKMEM_ENABLE_STATS
  - Only when defined does the hot path call `stats_record_alloc/free`. When undefined, they are never called (minimal bench overhead).
- HAKMEM_TINY_TRACE_RING=1
  - Enables the Tiny Debug Ring. On `SIGUSR2` or a crash, dumps the most recent 4096 alloc/free/publish/remote events to stderr.
- HAKMEM_TINY_DEBUG_FAST0=1
  - Debug mode that forcibly bypasses the fast-tier/hot/TLS lists and runs only on the Slow/SS path (for isolating FrontGate boundary issues).
- HAKMEM_TINY_DEBUG_REMOTE_GUARD=1
  - Validates pointer bounds before and after pushes to the SuperSlab remote queue. On an anomaly, records `remote_invalid` in the Debug Ring and fails fast.
- HAKMEM_TINY_STAT_SAMPLING (build define, optional) / HAKMEM_TINY_STAT_RATE_LG (environment, optional)
  - Even when statistics are enabled, makes alloc-side statistics updates less frequent (e.g. RATE_LG=14 → once every 16384 calls).
  - Default OFF (no sampling = updated every time). Turn ON for benchmarks to reduce instruction count.
- HAKMEM_TINY_HOTMAG=0/1
  - Enables a small TLS magazine for the small classes (128 entries, classes 0..3). Default 0 (for A/B).
  - alloc: tries HotMag→SLL→Magazine in order. free: SLL first; on overflow, HotMag→Magazine.
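
The SLL cap rules above (per-class absolute override first, otherwise MAG_CAP×N for classes 0..3, bounded by TINY_TLS_MAG_CAP) can be sketched in C as follows. This is a minimal sketch under stated assumptions: the value 2048 for `TINY_TLS_MAG_CAP` is hypothetical, clamping the multiplier into 1..16 is assumed, and so is leaving classes 4..7 at their magazine cap.

```c
#include <stdlib.h>

#define TINY_TLS_MAG_CAP 2048  /* hypothetical value; the real ceiling is a build constant */

/* HAKMEM_SLL_MULTIPLIER text -> multiplier. Unset -> 2 (documented default);
   clamping into 1..16 is an assumption. */
static int parse_sll_multiplier(const char *s) {
    if (s == NULL || *s == '\0') return 2;
    long v = strtol(s, NULL, 10);
    if (v < 1) v = 1;
    if (v > 16) v = 16;
    return (int)v;
}

/* Effective SLL cap for one class on the normal path.
   sll_cap_override: value of HAKMEM_TINY_SLL_CAP_C{cls}, or 0 when unset. */
static int sll_cap_for_class(int cls, int mag_cap, int multiplier, int sll_cap_override) {
    if (sll_cap_override > 0) return sll_cap_override;  /* absolute value bypasses the multiplier */
    if (cls > 3) return mag_cap;                        /* assumption: multiplier only for classes 0..3 */
    long cap = (long)mag_cap * multiplier;              /* MAG_CAP x N */
    return cap > TINY_TLS_MAG_CAP ? TINY_TLS_MAG_CAP : (int)cap;
}
```

A caller would typically compute the multiplier once, e.g. `parse_sll_multiplier(getenv("HAKMEM_SLL_MULTIPLIER"))`, and reuse it for every class.
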

USDT/tracepoints (user-space static tracing for perf)
- Building with `CFLAGS+=-DHAKMEM_USDT=1` embeds USDT (DTrace-compatible) probes in the main branches.
- Dependency: `<sys/sdt.h>` (Debian/Ubuntu: `sudo apt-get install systemtap-sdt-dev`).
- Example probe names (provider=hakmem):
  - `sll_pop`, `mag_pop`, `front_pop` (alloc hot path)
  - `bump_hit` (TLS bump-shadow hit)
  - `slow_alloc` (entering the slow path)
- Usage (examples):
  - `HAKMEM_TIMING=1 LD_PRELOAD=./libhakmem.so mimalloc-bench/bench/larson/larson 10 65536 1048576 10000 1 12345 4`
  - List: `perf list 'sdt:hakmem:*'`
  - Count: `perf stat -e sdt:hakmem:front_pop,cycles ./bench_tiny_hot_hakmem 32 100 40000`
  - Record: `perf record -e sdt:hakmem:sll_pop -e sdt:hakmem:mag_pop ./bench_tiny_hot_hakmem 32 100 50000`
- Permission/environment notes:
  - `unknown tracepoint` → perf does not support USDT (sdt:), or the tools are old. `sudo apt-get install linux-tools-$(uname -r)` is recommended.
  - `can't access trace events` → insufficient tracefs permissions.
    - `sudo mount -t tracefs -o mode=755 nodev /sys/kernel/tracing`
    - `sudo sysctl kernel.perf_event_paranoid=1`
  - Some kernels, such as WSL, may have UPROBE/USDT disabled (falls back to PMU events only).

## Mid Transfer Cache (TC)
- `HAKMEM_TC_ENABLE=0/1` — enables TC (default 1)
- `HAKMEM_TC_UNBOUNDED=0/1` — removes the cap on the drain count (default 1)
- `HAKMEM_TC_DRAIN_MAX=<n>` — maximum number of items drained per alloc (default about 64; 0 = unlimited)
- `HAKMEM_TC_DRAIN_TRIGGER=<n>` — drains only when fewer than n items remain in the ring (default 2)
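
Read together, `HAKMEM_TC_DRAIN_TRIGGER` and `HAKMEM_TC_DRAIN_MAX` gate and bound a single drain. A minimal C sketch of that decision, assuming the drain also never overfills the TLS ring; `HAKMEM_TC_UNBOUNDED` is left out because its interaction with `DRAIN_MAX` is not specified here, and the function name is hypothetical.

```c
#include <stddef.h>

/* How many Transfer Cache items to move into the TLS ring on this alloc.
   trigger:   HAKMEM_TC_DRAIN_TRIGGER (drain only when ring_count < trigger)
   drain_max: HAKMEM_TC_DRAIN_MAX (0 = unlimited)
   Assumes ring_count <= ring_cap. */
static size_t tc_drain_count(size_t ring_count, size_t ring_cap, size_t tc_available,
                             size_t drain_max, size_t trigger) {
    if (ring_count >= trigger) return 0;  /* ring still has enough: skip the drain */
    size_t room = ring_cap - ring_count;  /* assumption: never overfill the ring */
    size_t n = tc_available < room ? tc_available : room;
    if (drain_max != 0 && n > drain_max) n = drain_max;
    return n;
}
```
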
Build preset (shortest Tiny‑Hot front)
- Compile-time flag: `-DHAKMEM_TINY_MINIMAL_FRONT=1`
  - Physically removes UltraFront/Quick/Frontend/HotMag/SuperSlab try/BumpShadow from the entry path
  - Remaining path: `SLL → TLS Magazine → SuperSlab → (subsequent slow path)`
- Makefile target: `make bench_tiny_front`
  - Removes branches that interact badly with the benchmark and shortens the instruction sequence (recommended together with PGO)
  - Added flag: `-DHAKMEM_TINY_MAG_OWNER=0` (skips writing the owner into magazine entries, reducing alloc/free write load)
- Runtime switch (lightweight A/B): `HAKMEM_TINY_MINIMAL_HOT=1`
  - Prefers the SuperSlab TLS bump → direct SuperSlab path at the entry (a runtime branch, not a build-time removal)
  - Generally a loss on Tiny‑Hot (more instructions and branches), so default OFF. For bench A/B only.

## MF2: Per-Page Sharding (Phase 7.2)
- `HAKMEM_MF2_ENABLE=0/1` — enables MF2 Per-Page Sharding (default 0 = disabled)
  - mimalloc style: each 64KB page keeps its own freelist; O(1) page lookup
  - Expected performance: Mid 4T +50% (13.78 → 20.7 M/s)
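
The O(1) page lookup behind per-page freelists works when every page is aligned to its own size: masking the low 16 bits of any block address yields the page start. This C sketch assumes 64KB-aligned pages obtained with C11 `aligned_alloc` and a small header at the page start; `MF2Page` and the helper names are hypothetical, and the sketch is single-threaded, ignoring the synchronization a real 4T allocator needs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MF2_PAGE_SIZE ((size_t)64 * 1024)

/* Hypothetical page header stored at the start of each 64KB-aligned page. */
typedef struct {
    void *freelist;  /* singly linked through the first word of each free block */
} MF2Page;

/* O(1): any block address masked down to the page boundary is its page header. */
static MF2Page *mf2_page_of(void *p) {
    return (MF2Page *)((uintptr_t)p & ~(uintptr_t)(MF2_PAGE_SIZE - 1));
}

static void mf2_free(void *p) {
    MF2Page *pg = mf2_page_of(p);
    *(void **)p = pg->freelist;  /* push onto this page's own freelist */
    pg->freelist = p;
}

static void *mf2_alloc(MF2Page *pg) {
    void *p = pg->freelist;
    if (p != NULL) pg->freelist = *(void **)p;
    return p;
}

/* Allocate one 64KB-aligned page and carve it into blocks, skipping the first
   block so the header does not overlap a block. */
static MF2Page *mf2_page_new(size_t block_size) {
    if (block_size < sizeof(MF2Page) || block_size % sizeof(void *) != 0) return NULL;
    unsigned char *base = aligned_alloc(MF2_PAGE_SIZE, MF2_PAGE_SIZE);
    if (base == NULL) return NULL;
    MF2Page *pg = (MF2Page *)base;
    pg->freelist = NULL;
    for (size_t off = block_size; off + block_size <= MF2_PAGE_SIZE; off += block_size)
        mf2_free(base + off);
    return pg;
}
```

Because the lookup is a single AND, `free` never has to search a page table, which is what makes per-page freelists cheap enough to shard on.
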
Scripts
- scripts/run_tiny_hot_triad.sh <cycles>
- scripts/run_tiny_benchfast_triad.sh <cycles> — bench-only fast path triad
- scripts/run_tiny_sllonly_triad.sh <cycles> — SLL-only + warmup + PGO triad
- scripts/run_tiny_sllonly_r12w192_triad.sh <cycles> — SLL-only tuned (32B: REFILL=12, WARMUP32=192)
- scripts/run_ultra_debug_sweep.sh <cycles> <batch>
- scripts/sweep_ultra_params.sh <cycles> <bench_batch>
- scripts/run_comprehensive_pair.sh
- scripts/run_random_mixed_matrix.sh <cycles>

## Build time (Makefile)
- `RING_CAP=<8|16|32>` — TLS ring capacity (Mid), e.g. `make shared RING_CAP=16`
Bench-only build flags (compile-time)
- HAKMEM_TINY_BENCH_FASTPATH=1 — fixes the entry path to SLL→Mag→tiny refill (shortest path)
- HAKMEM_TINY_BENCH_SLL_ONLY=1 — physically removes the Mag (SLL-only); free also pushes directly to the SLL
- HAKMEM_TINY_BENCH_TINY_CLASSES=3 — target classes (0..N; 3 → ≤64B)
- HAKMEM_TINY_BENCH_WARMUP8/16/32/64 — initial warmup counts (e.g. WARMUP32 = 160 to 192)
- HAKMEM_TINY_BENCH_REFILL/REFILL8/16/32/64 — refill counts (e.g. REFILL32=12)

## Thresholds (mmap)
- `HAKMEM_THP_LEARN=1` (future) / `thp_threshold` is held on the FrozenPolicy side (default 2MiB)
Makefile helpers
- bench_fastpath / pgo-benchfast-* — PGO for bench_fastpath
- bench_sll_only / pgo-benchsll-* — PGO for SLL-only
- pgo-benchsll-r12w192-* — PGO with REFILL/WARMUP tuned for 32B

## Header writes (Mid, experimental)
- `HAKMEM_HDR_LIGHT=0|1|2`
  - 0: full header (magic/method/size/alloc_site/class_bytes/owner_tid)
  - 1: minimal header (magic/method/size only; owner not set)
  - 2: skips header writes/validation (dangerous; assumes ownership is determined via the page descriptor)
Perf‑Main preset (mainline-oriented, safety-leaning, opt‑in)
- Recommended environment variables (example):
  - `HAKMEM_TINY_TLS_SLL=1`
  - `HAKMEM_TINY_REFILL_MAX=96`
  - `HAKMEM_TINY_REFILL_MAX_HOT=192`
  - `HAKMEM_TINY_SPILL_HYST=16`
- Example runs:
  - Tiny‑Hot triad: `HAKMEM_TINY_TLS_SLL=1 HAKMEM_TINY_REFILL_MAX=96 HAKMEM_TINY_REFILL_MAX_HOT=192 HAKMEM_TINY_SPILL_HYST=16 bash scripts/run_tiny_hot_triad.sh 60000`
  - Random‑Mixed: `HAKMEM_TINY_TLS_SLL=1 HAKMEM_TINY_REFILL_MAX=96 HAKMEM_TINY_REFILL_MAX_HOT=192 HAKMEM_TINY_SPILL_HYST=16 bash scripts/run_random_mixed_matrix.sh 100000`

LD safety (for apps/LD_PRELOAD runs)
- HAKMEM_LD_SAFE=0/1/2
  - 0: full (recommended for development only)
  - 1: Tiny only (non-Tiny requests are delegated to libc)
  - 2: pass-through (recommended default)
- HAKMEM_TINY_SPECIALIZE_8_16=0/1 (new)
  - Enables a specialized “mag-pop only” path for 8/16B (default OFF). For A/B.
- HAKMEM_TINY_SPECIALIZE_32_64=0/1
  - Enables a specialized “mag-pop only” path for 32/64B (default OFF). For A/B.
- HAKMEM_TINY_SPECIALIZE_MASK=<int> (new)
  - Bitmask enabling specialization per class (bit0=8B, bit1=16B, bit2=32B, bit3=64B).
  - E.g. 0x02 → specialize 16B only; 0x0C → specialize 32/64B.
- HAKMEM_TINY_BENCH_MODE=1
  - Enables a simplified, bench-only adoption path. It uses a single per-class publish slot, avoiding the superslab_refill scan and the multi-stage ring walk.
  - The OOM guard (harvest/trim) is kept. Restrict to A/B use.
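
A C sketch of testing `HAKMEM_TINY_SPECIALIZE_MASK`, using the bit-per-class mapping implied by the examples above (0x02 → 16B, 0x0C → 32B and 64B). Accepting hex input via `strtoul` with base 0 is an assumption about the real parser, the function names are hypothetical, and how the mask combines with `HAKMEM_TINY_SPECIALIZE_8_16`/`_32_64` is not shown.

```c
#include <stdlib.h>

/* Parse HAKMEM_TINY_SPECIALIZE_MASK text. Base 0 accepts both "12" and "0x0C"
   (an assumption about the real parser). Unset -> 0 (no class specialized). */
static unsigned parse_specialize_mask(const char *s) {
    if (s == NULL || *s == '\0') return 0;
    return (unsigned)(strtoul(s, NULL, 0) & 0xFFu);
}

/* bit i set -> class i specialized (0=8B, 1=16B, 2=32B, 3=64B). */
static int class_is_specialized(unsigned mask, int cls) {
    return cls >= 0 && cls < 8 && ((mask >> cls) & 1u) != 0;
}
```
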

Runner build knobs (scripts/run_larson_claude.sh)
- HAKMEM_BUILD_3LAYER=1
  - Builds and runs the 3-layer Tiny with `make larson_hakmem_3layer` (LTO=OFF/O1).
- HAKMEM_BUILD_ROUTE=1
  - Builds and runs with `make larson_hakmem_route`: 3-layer + Route fingerprint (enabled at build time).
  - At run time, combine with `HAKMEM_TINY_TRACE_RING=1 HAKMEM_ROUTE=1` to write the route to the ring.

@ -166,31 +166,17 @@ From `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_stats.h`:
|
||||
- **Purpose**: Probability (1/N) of attempting trylock drain
|
||||
- **Impact**: Lower = more aggressive draining
|
||||
|
||||
#### HAKMEM_TINY_BG_REMOTE
|
||||
- **Default**: 0
|
||||
- **Purpose**: Enable background thread for remote free draining
|
||||
- **Impact**: Offloads drain work from allocation path
|
||||
- **Warning**: Requires background thread
|
||||
#### HAKMEM_TINY_BG_REMOTE (削除済み)
|
||||
- 2025-12 cleanup: BG Remote系ENVは廃止。BGリモートドレインは固定OFF。
|
||||
|
||||
#### HAKMEM_TINY_BG_REMOTE_BATCH
|
||||
- **Default**: 32
|
||||
- **Purpose**: Number of target slabs processed per BG loop
|
||||
- **Impact**: Larger = more work per iteration
|
||||
#### HAKMEM_TINY_BG_REMOTE_BATCH (削除済み)
|
||||
- 2025-12 cleanup: BG Remote batch ENVは廃止(固定値32未使用)。
|
||||
|
||||
#### HAKMEM_TINY_BG_SPILL
|
||||
- **Default**: 0
|
||||
- **Purpose**: Enable background magazine spill queue
|
||||
- **Impact**: Deferred magazine overflow handling
|
||||
#### HAKMEM_TINY_BG_SPILL (削除済み)
|
||||
- 2025-12 cleanup: BG Spill系ENVは廃止。BG spillは固定OFF。
|
||||
|
||||
#### HAKMEM_TINY_BG_BIN
- **Default**: 0
- **Purpose**: Background bin index for spill target
- **Impact**: Controls which magazine bin gets background processing

#### HAKMEM_TINY_BG_TARGET
- **Default**: 512
- **Purpose**: Target magazine size for background trimming
- **Impact**: Trim magazines above this size
#### HAKMEM_TINY_BG_BIN / HAKMEM_TINY_BG_TARGET (removed)
- 2025-12 cleanup: the BG Bin/Target ENV variables are retired (BG bin processing is fixed OFF).

---

@@ -311,26 +297,17 @@ From `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_stats.h`:
- **Impact**: Ultra-fast path for ≤64B
- **Experimental**: Bench-only optimization

#### HAKMEM_TINY_HOTMAG
- **Default**: 0
- **Purpose**: Enable small TLS hot magazine (128 items, classes 0-3)
- **Impact**: Extra fast layer for 8-64B
- **Experimental**: A/B testing
#### HAKMEM_TINY_HOTMAG (removed)
- 2025-12 cleanup: the HotMag runtime ENV toggle is removed. HotMag is fixed OFF by default and can no longer be adjusted via ENV.

#### HAKMEM_TINY_HOTMAG_CAP
- **Default**: 128
- **Purpose**: HotMag capacity override
- **Impact**: Larger = more TLS memory
#### HAKMEM_TINY_HOTMAG_CAP (removed)
- 2025-12 cleanup: the HotMag capacity ENV variable is removed (fixed at 128).

#### HAKMEM_TINY_HOTMAG_REFILL
- **Default**: 64
- **Purpose**: HotMag refill batch size
- **Impact**: Batch size when refilling from backend
#### HAKMEM_TINY_HOTMAG_REFILL (removed)
- 2025-12 cleanup: the HotMag refill batch ENV variable is removed (fixed at 32).

#### HAKMEM_TINY_HOTMAG_C{0..7}
- **Default**: None
- **Purpose**: Per-class HotMag enable/disable
- **Example**: `HAKMEM_TINY_HOTMAG_C2=1` (enable for 32B)
#### HAKMEM_TINY_HOTMAG_C{0..7} (removed)
- 2025-12 cleanup: the per-class HotMag enable/disable ENV variables are removed (fixed OFF for all classes).

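Because these variables are no longer read, a stale shell profile or benchmark script can keep setting them without any effect. A minimal bash sketch to spot them: the function and array names are hypothetical, and the list simply mirrors the entries marked (removed) above. It prints each retired variable that is still set on stdout, with a warning on stderr; for example, `HAKMEM_TINY_HOTMAG_C2=1 list_retired_env` prints `HAKMEM_TINY_HOTMAG_C2`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report retired HAKMEM_TINY_* variables that are still set.
# The list mirrors the entries marked (removed) in this section.
RETIRED_HAKMEM_VARS=(
  HAKMEM_TINY_BG_REMOTE HAKMEM_TINY_BG_REMOTE_BATCH
  HAKMEM_TINY_BG_SPILL HAKMEM_TINY_BG_BIN HAKMEM_TINY_BG_TARGET
  HAKMEM_TINY_HOTMAG HAKMEM_TINY_HOTMAG_CAP HAKMEM_TINY_HOTMAG_REFILL
  HAKMEM_TINY_HOTMAG_C{0..7}   # brace expansion yields ..._C0 through ..._C7
)

list_retired_env() {
  local name
  for name in "${RETIRED_HAKMEM_VARS[@]}"; do
    # ${!name+x} expands to "x" when the variable named by $name is set, even if empty.
    if [ -n "${!name+x}" ]; then
      echo "warning: $name is set but no longer read (2025-12 cleanup)" >&2
      echo "$name"
    fi
  done
}
```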
---

141
docs/specs/POOL_TLS_QUICKSTART.md
Normal file
@@ -0,0 +1,141 @@
# Pool TLS Phase 1.5a - Quick Start Guide

Pool TLS Phase 1.5a is a TLS arena implementation that speeds up memory allocations in the 8KB-52KB range.

## 🚀 Quick Start

### 1. Development cycle (easiest!)

```bash
# Run build + verify + smoke test in one step
./dev_pool_tls.sh test

# Result:
# ✅ All checks passed!
```

### 2. Running the benchmark

```bash
# Compare Pool TLS against system malloc
./run_pool_bench.sh

# Example result:
# HAKMEM (Pool TLS): 1790000 ops/s
# System malloc: 189000 ops/s
# Performance ratio: 947% (9.47x)
# 🏆 HAKMEM WINS!
```

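The ratio line is HAKMEM throughput divided by system malloc throughput (1790000 / 189000 ≈ 9.47). A minimal sketch of that arithmetic with awk; the function name `perf_ratio` is hypothetical, and run_pool_bench.sh may compute or round it differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: format HAKMEM vs system malloc throughput as a ratio.
# $1 = HAKMEM ops/s, $2 = system malloc ops/s
perf_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    r = a / b
    printf "Performance ratio: %.0f%% (%.2fx)\n", r * 100, r
  }'
}

perf_ratio 1790000 189000   # prints: Performance ratio: 947% (9.47x)
```
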
### 3. Building individual targets

```bash
# Build with Pool TLS Phase 1.5a enabled
./build_pool_tls.sh bench_mid_large_mt_hakmem
./build_pool_tls.sh larson_hakmem
./build_pool_tls.sh bench_random_mixed_hakmem
```

## 📋 Scripts

| Script | Purpose | Usage |
|--------|---------|-------|
| `dev_pool_tls.sh` | Integrated development cycle | `./dev_pool_tls.sh test` |
| `build_pool_tls.sh` | Pool TLS build | `./build_pool_tls.sh <target>` |
| `run_pool_bench.sh` | Performance benchmark | `./run_pool_bench.sh` |
| `build.sh` | General-purpose build (written by ChatGPT) | `./build.sh <target>` |
| `verify_build.sh` | Build verification (written by ChatGPT) | `./verify_build.sh <binary>` |

## 🎯 Recommended workflow

### When changing code
```bash
# 1. Edit the code
vim core/pool_tls_arena.c

# 2. Quick test (5-10 seconds)
./dev_pool_tls.sh test

# 3. If that passes, run the full benchmark
./run_pool_bench.sh
```

### When debugging
```bash
# 1. Debug build
./build_debug.sh bench_mid_large_mt_hakmem gdb

# 2. Run under GDB
gdb ./bench_mid_large_mt_hakmem
(gdb) run 1 100 256 42
```

### Clean build
```bash
# Remove everything and rebuild
./dev_pool_tls.sh clean
./dev_pool_tls.sh build
```

## 🔧 Enabled features

Pool TLS builds automatically enable the following:

- ✅ `POOL_TLS_PHASE1=1` - Pool TLS Phase 1.5a (8-52KB)
- ✅ `HEADER_CLASSIDX=1` - Phase 7 header-based free
- ✅ `AGGRESSIVE_INLINE=1` - Phase 7 aggressive inlining
- ✅ `PREWARM_TLS=1` - Phase 7 TLS cache pre-warming

**No need to worry about forgetting a flag!** The scripts set all of them.

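For reference, a minimal sketch of what passing these flags by hand might look like. It assumes the four flags are Makefile variables given on the `make` command line, which this guide does not confirm; the function name `pool_tls_make_cmd` is hypothetical, and it only prints the command so it can be reviewed before running. The scripts above remain the recommended path.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: assemble a make invocation carrying the four Pool TLS flags.
# Assumption: the flags are Makefile variables; check build_pool_tls.sh for the real command.
POOL_TLS_FLAGS=(POOL_TLS_PHASE1=1 HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1)

pool_tls_make_cmd() {
  # Print (do not run) the command for the given target.
  echo make "${POOL_TLS_FLAGS[@]}" "$1"
}

pool_tls_make_cmd bench_mid_large_mt_hakmem
```
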
## 📊 Performance targets

| Phase | Target throughput | Current status |
|-------|-------------------|----------------|
| Phase 1.5a (baseline) | 1-2M ops/s | ✅ 1.79M ops/s |
| Phase 1.5b (optimized) | 5-15M ops/s | 🚧 In development |
| Phase 2 (learning) | 15-30M ops/s | 📅 Planned |

## ❓ Troubleshooting

### Build errors
```bash
# Check the flags
make print-flags

# Clean build
./dev_pool_tls.sh clean
./dev_pool_tls.sh build
```

### Poor performance
```bash
# Verify the build (make sure the binary is not stale)
./verify_build.sh bench_mid_large_mt_hakmem

# Rebuild
./build_pool_tls.sh bench_mid_large_mt_hakmem
```

### SEGV crashes
```bash
# Debug build
./build_debug.sh bench_mid_large_mt_hakmem gdb

# Run under gdb
gdb ./bench_mid_large_mt_hakmem
(gdb) run 1 100 256 42
(gdb) bt
```

## 📝 Development notes

- **Dependency tracking**: detected automatically via `-MMD -MP` (implemented by ChatGPT)
- **Flag mismatch check**: validated automatically by the Makefile (implemented by ChatGPT)
- **Build verification**: `verify_build.sh` checks timestamps (implemented by ChatGPT)

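A timestamp check of the kind verify_build.sh performs can be sketched with `find -newer`: the binary is treated as stale when any source file was modified after it. The function name `is_stale`, the default `core` source directory, and the `*.c`/`*.h` patterns are assumptions; the actual script may inspect a different set of files.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a verify_build.sh-style staleness check.
# $1 = binary path, $2 = source directory (defaults to "core", an assumption).
# Returns 0 if the binary is missing or older than any .c/.h file, 1 otherwise.
is_stale() {
  local bin=$1 src=${2:-core}
  [ -e "$bin" ] || return 0
  # find -newer lists files whose modification time is later than the binary's.
  [ -n "$(find "$src" \( -name '*.c' -o -name '*.h' \) -newer "$bin" | head -n 1)" ]
}

# Usage: rebuild only when needed.
# is_stale ./bench_mid_large_mt_hakmem && ./build_pool_tls.sh bench_mid_large_mt_hakmem
```
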
## 🎓 Further documentation

- `CLAUDE.md` - development history
- `POOL_TLS_INVESTIGATION_FINAL.md` - Phase 1.5a investigation report
- `Makefile` - build system details