HAKMEM Configuration Crisis - Executive Summary
Date: 2025-11-26
Status: 🔴 CRITICAL - Configuration complexity is hindering development
Reading Time: 10 minutes
🚨 The Crisis in Numbers
| Metric | Current | Target | Reduction |
|---|---|---|---|
| Runtime ENV variables | 236 | 80 | -66% |
| Build-time flags | 59+ | ~40 | -32% |
| Shell scripts | 30 files (3000 LOC) | 8 entry points | -73% |
| JSON presets | 1 file, 3 presets | 4+ files, organized | Better structure |
| Configuration guides | 0 | 3+ comprehensive | ∞% improvement |
| Deprecation tracking | None | Automated timeline | Needed |
Bottom Line: HAKMEM has grown from a research allocator to a production system, but configuration management hasn't scaled. We're at the point where even the original developers are losing track of features.
📊 Quick Facts
Environment Variables (236 total)
By Category:
TINY Allocator: 113 vars (48%) 🔴 BLOATED
Debug/Profiling: 31 vars (13%)
Learning Systems: 18 vars (8%) 🟡 6 independent systems
SuperSlab: 15 vars (6%)
Shared Pool: 12 vars (5%)
Mid-Large: 11 vars (5%)
Benchmarking: 10 vars (4%)
Others: 26 vars (11%)
By Status:
Active & Used: ~120 vars (51%)
Deprecated/Dead: ~60 vars (25%) 🔴 REMOVE
Research/Experimental: ~40 vars (17%)
Undocumented: ~16 vars (7%) 🔴 UNCLEAR
Build Flags (59+ total)
By Category:
Feature Toggles: 23 flags (39%)
Optimization: 15 flags (25%)
Debug/Instrumentation: 12 flags (20%)
Build Modes: 9 flags (15%)
Shell Scripts (30 files)
By Type:
Benchmarking: 14 scripts (47%) 🟡 Overlapping
ENV Setup: 6 scripts (20%) 🔴 Duplicated
Build Helpers: 5 scripts (17%)
Utilities: 5 scripts (17%)
Problem: No clear entry points, duplicated logic across 30 files, zero coordination.
🔥 Top 5 Critical Issues
1. TINY Allocator Configuration Explosion (113 vars)
The Problem: TINY allocator has evolved through multiple phases (v1 → v2 → ULTRA → SLIM → Unified), but old configuration layers were never removed. Result: 113 overlapping environment variables.
Examples of Chaos:
# Refill configuration (7 overlapping strategies!)
HAKMEM_TINY_REFILL_BATCH_SIZE=64
HAKMEM_TINY_P0_BATCH=32 # Same as above?
HAKMEM_TINY_SFC_REFILL=16 # SFC is deprecated!
HAKMEM_UNIFIED_REFILL_SIZE=64 # Unified path
HAKMEM_TINY_FAST_REFILL_COUNT=32 # Fast path
HAKMEM_TINY_ULTRA_REFILL=8 # Ultra path
HAKMEM_TINY_SLIM_REFILL_BATCH=16 # SLIM path
# Debug toggles (11 variants with overlapping names!)
HAKMEM_TINY_DEBUG=1
HAKMEM_DEBUG_TINY=1 # Same thing?
HAKMEM_TINY_VERBOSE=1
HAKMEM_TINY_DEBUG_VERBOSE=1 # Combined?
HAKMEM_TINY_LOG=1
... (6 more variants)
Impact:
- Developers don't know which variables to use
- Testing matrix is impossibly large (2^113 combinations even if every variable were a simple on/off toggle)
- Configuration bugs are common
- Onboarding new developers takes weeks
Recommendation: Consolidate to ~40 variables organized by architectural layer:
- Core allocation: 15 vars
- TLS caching: 8 vars
- Refill/drain: 6 vars
- Debug: 5 vars
- Learning: 6 vars
2. Dead Code Still Has Active Config (60+ vars)
The Problem: Features have been replaced or deprecated, but their configuration variables are still active, causing confusion.
Examples:
SFC (Single-Free-Cache) - REPLACED by Unified Cache:
HAKMEM_TINY_SFC_ENABLE=1 # 🔴 Dead (replaced Nov 2025)
HAKMEM_TINY_SFC_CAP=128 # 🔴 Dead
HAKMEM_TINY_SFC_REFILL=16 # 🔴 Dead
HAKMEM_TINY_SFC_SPILL_THRESH=96 # 🔴 Dead
HAKMEM_TINY_SFC_BATCH_POP=8 # 🔴 Dead
HAKMEM_TINY_SFC_STATS=1 # 🔴 Dead
Status: Unified Cache replaced SFC in Phase 3d-B (2025-11-20), but SFC vars still parsed.
PAGE_ARENA - Research artifact, never integrated:
HAKMEM_PAGE_ARENA_ENABLE=1 # 🔴 Research-only
HAKMEM_PAGE_ARENA_SIZE_MB=16 # 🔴 Research-only
HAKMEM_PAGE_ARENA_GROWTH=2 # 🔴 Research-only
HAKMEM_PAGE_ARENA_MAX_MB=128 # 🔴 Research-only
HAKMEM_PAGE_ARENA_THP=1 # 🔴 Research-only
Status: Experimental code from 2024-09, never productionized, still has active config.
Other Dead Features:
- EXTERNAL_GUARD (3 vars) - Purpose unclear, no documentation
- MF2 (3 vars) - Undocumented, possibly abandoned
- OLD_REFILL (5 vars) - Replaced by P0 batch refill
Impact:
- Users waste time trying dead features
- CI tests dead code paths
- Codebase appears larger than it is
Recommendation: Remove the dead code and deprecate its variables on a 6-month timeline.
3. Learning System Chaos (6 independent systems)
The Problem: HAKMEM has 6 separate learning/adaptive systems with unclear interaction semantics.
The 6 Systems:
1. HAKMEM_LEARN=1 # Global meta-learner?
2. HAKMEM_TINY_LEARN=1 # TINY-specific learning
3. HAKMEM_TINY_CAP_LEARN=1 # TLS capacity learning
4. HAKMEM_ADAPTIVE_SIZING=1 # Size class tuning
5. HAKMEM_THP_LEARN=1 # Transparent Huge Pages
6. HAKMEM_WMAX_LEARN=1 # Workload max size learning
Questions with No Answers:
- Can these be enabled together? Do they conflict?
- Which learning system owns TLS cache sizing?
- What happens if TINY_LEARN=1 but LEARN=0?
- Is there a master learning coordinator?
Additional Learning Vars (12 more):
HAKMEM_LEARN_RATE=0.1
HAKMEM_LEARN_DECAY=0.95
HAKMEM_LEARN_MIN_SAMPLES=1000
HAKMEM_TINY_LEARN_WINDOW=10000
HAKMEM_ADAPTIVE_SIZING_INTERVAL_MS=5000
... (7 more tuning parameters)
Impact:
- Unpredictable behavior when multiple systems enabled
- No documented interaction model
- Difficult to debug performance issues
- Unclear which system to tune
Recommendation: Consolidate to 2 learning systems with clear boundaries and documented interaction semantics:
- Allocation Learning: Size classes, TLS capacity, refill tuning
- Memory Learning: THP, RSS optimization, SuperSlab lifecycle
4. Scripts Anarchy (30 files, 3000 LOC, zero hierarchy)
The Problem: Scripts have accumulated organically with no organization. Multiple scripts do the same thing with subtle differences.
Examples:
Running Larson - 6 different ways:
scripts/run_larson.sh # Which one to use?
scripts/run_larson_1t.sh # 1 thread variant
scripts/run_larson_8t.sh # 8 thread variant
scripts/larson_benchmark.sh # Different from run_larson.sh?
scripts/bench_larson_preset.sh # Uses JSON presets
scripts/quick_larson.sh # Quick test variant
Which should I use? → Unclear.
Running Random Mixed - 3 different ways:
scripts/run_random_mixed.sh # Hardcoded params
scripts/bench_random_mixed_json.sh # Uses JSON preset
scripts/quick_random_mixed.sh # Different defaults
ENV Setup Duplication (copy-pasted across 12+ scripts):
# This block appears in 12+ scripts:
export HAKMEM_TINY_HEADER_CLASSIDX=1
export HAKMEM_TINY_AGGRESSIVE_INLINE=1
export HAKMEM_TINY_PREWARM_TLS=1
export HAKMEM_SS_EMPTY_REUSE=1
export HAKMEM_TINY_UNIFIED_CACHE=1
# ... (20 more vars duplicated everywhere)
Impact:
- New developers don't know where to start
- Bug fixes need to be applied to 6+ scripts
- Inconsistent behavior across scripts
- No single source of truth
Recommendation: Reorganize to 8 entry points:
scripts/
├── bench/ # Benchmarking entry points
│ ├── larson.sh # Single Larson entry (flags for 1T/8T)
│ ├── random_mixed.sh # Single Random Mixed entry
│ └── suite.sh # Full benchmark suite
├── config/ # Configuration presets
│ ├── production.env # Production defaults
│ ├── debug.env # Debug configuration
│ └── research.env # Research/experimental
├── lib/ # Shared utilities
│ ├── env_setup.sh # Single source of ENV setup
│ └── validation.sh # Config validation
└── README.md # Scripts guide
5. Zero Configuration Documentation
The Problem: 236 environment variables, 59 build flags, 30 scripts → ZERO master documentation.
What's Missing:
- ❌ Master list of all ENV variables
- ❌ Categorization of variables by purpose
- ❌ Default values documentation
- ❌ Interaction semantics (which vars conflict?)
- ❌ Preset selection guide
- ❌ Deprecation timeline
- ❌ Scripts coordination guide
- ❌ Configuration examples for common use cases
Current State: Configuration knowledge exists only in:
- Source code (scattered across 100+ files)
- Git commit messages (hard to search)
- Claude's memory (not accessible to others)
- Tribal knowledge (not written down)
Impact:
- 2+ weeks onboarding time for new developers
- Configuration bugs in production
- Wasted time experimenting with dead features
- Duplicate questions ("Which Larson script should I use?")
Recommendation: Create 3 comprehensive guides:
- CONFIGURATION.md - Master reference (all vars categorized)
- PRESET_GUIDE.md - How to choose presets
- SCRIPTS_GUIDE.md - Scripts hierarchy and usage
🎯 Proposed Cleanup Strategy
Phase 0: Immediate Wins (P0, 2 days effort, LOW risk)
Goal: Quick improvements that establish cleanup patterns.
P0.1: Unify SuperSlab Variables (5 vars → 3 vars)
- Merge duplicates: HAKMEM_SS_EMPTY_REUSE and HAKMEM_SUPERSLAB_REUSE do the same thing; retire HAKMEM_SS_EMPTY_REUSE
- Keep: HAKMEM_SUPERSLAB_REUSE, HAKMEM_SUPERSLAB_LAZY, HAKMEM_SUPERSLAB_PREWARM
- Effort: 1 hour (grep + replace + deprecation notice)
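The merge in P0.1 could keep the old name readable during the deprecation window. The following is a minimal sketch, not HAKMEM code: the helper names `hak_resolve_flag` and `hak_superslab_reuse_enabled` are hypothetical, and it assumes both variables are 0/1 toggles, that the canonical name wins when both are set, and that the default is on.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: resolve a 0/1 flag from the canonical variable's
 * value and a deprecated alias's value (either may be NULL when unset).
 * Assumption: the canonical name wins when both are set. */
int hak_resolve_flag(const char *canonical, const char *alias,
                     int default_value, int *used_alias)
{
    assert(used_alias != NULL);
    *used_alias = 0;
    if (canonical != NULL && canonical[0] != '\0')
        return atoi(canonical) != 0;
    if (alias != NULL && alias[0] != '\0') {
        *used_alias = 1;   /* lets the caller emit a deprecation notice */
        return atoi(alias) != 0;
    }
    return default_value;
}

/* Reads the process environment. The default of 1 is an assumption. */
int hak_superslab_reuse_enabled(void)
{
    int used_alias = 0;
    int on = hak_resolve_flag(getenv("HAKMEM_SUPERSLAB_REUSE"),
                              getenv("HAKMEM_SS_EMPTY_REUSE"),
                              1, &used_alias);
    if (used_alias)
        fprintf(stderr, "[hakmem] HAKMEM_SS_EMPTY_REUSE is deprecated; "
                        "use HAKMEM_SUPERSLAB_REUSE\n");
    return on;
}
```

Keeping the resolution logic in a function that takes plain strings makes it testable without touching the real environment.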
P0.2: Create Master Preset Registry (1 file → 4 files)
- presets/production.json - Recommended production config
- presets/debug.json - Full debugging enabled
- presets/research.json - Experimental features
- presets/minimal.json - Minimal feature set
- Effort: 2 hours (extract from current presets)
P0.3: Clean Up build.sh Pinned Flags
- Document all pinned flags in BUILD_FLAGS.md
- Remove obsolete flags (POOL_TLS_PHASE1=0, etc.)
- Effort: 2 hours
P0.4: Consolidate Debug Variables (11 vars → 4 vars)
- HAKMEM_DEBUG_LEVEL (0-3): 0=none, 1=errors, 2=info, 3=verbose
- HAKMEM_DEBUG_TINY (0/1): TINY allocator specific
- HAKMEM_DEBUG_POOL (0/1): Pool allocator specific
- HAKMEM_DEBUG_MID (0/1): Mid-Large allocator specific
- Effort: 3 hours (consolidate scattered debug toggles)
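One possible shape for the consolidated debug switch is sketched below. The level names come from the list above; everything else is an assumption for illustration: the function names are hypothetical, unset or non-numeric input is treated as level 0, and a per-subsystem toggle such as HAKMEM_DEBUG_TINY is assumed to raise that subsystem to the info level.

```c
#include <assert.h>
#include <stdlib.h>

/* Levels as listed in P0.4: 0=none, 1=errors, 2=info, 3=verbose. */
enum {
    HAK_DEBUG_NONE = 0,
    HAK_DEBUG_ERRORS = 1,
    HAK_DEBUG_INFO = 2,
    HAK_DEBUG_VERBOSE = 3
};

/* Hypothetical parser: clamp a HAKMEM_DEBUG_LEVEL string into 0..3.
 * Unset, empty, or non-numeric input yields 0 (assumed default). */
int hak_parse_debug_level(const char *text)
{
    if (text == NULL || text[0] == '\0')
        return HAK_DEBUG_NONE;
    char *end = NULL;
    long v = strtol(text, &end, 10);
    if (end == text)
        return HAK_DEBUG_NONE;
    if (v < HAK_DEBUG_NONE)
        return HAK_DEBUG_NONE;
    if (v > HAK_DEBUG_VERBOSE)
        return HAK_DEBUG_VERBOSE;
    return (int)v;
}

/* Decide whether a message at message_level should be printed.
 * Assumption: a set subsystem toggle lifts that subsystem to at least
 * the info level, independent of the global level. */
int hak_debug_enabled(int global_level, int subsystem_toggle, int message_level)
{
    assert(message_level >= HAK_DEBUG_ERRORS &&
           message_level <= HAK_DEBUG_VERBOSE);
    int effective = global_level;
    if (subsystem_toggle && effective < HAK_DEBUG_INFO)
        effective = HAK_DEBUG_INFO;
    return message_level <= effective;
}
```

At startup the allocator would call `hak_parse_debug_level(getenv("HAKMEM_DEBUG_LEVEL"))` once and cache the result, so the hot path only compares integers.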
P0.5: Create DEPRECATED.md
- List all deprecated variables with sunset dates
- Add deprecation warnings to code (TLS-cached, lightweight)
- Effort: 1 hour
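A table-driven warning is one way to keep P0.5 lightweight. This is a sketch under stated assumptions: the struct and function names are hypothetical, the replacement column is a guess based on the examples in this summary, the sunset date is the end of the deprecation period stated in this document, and a process-wide "done" flag stands in for the TLS-cached variant mentioned above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct hak_deprecated_var {
    const char *name;        /* deprecated variable */
    const char *replacement; /* NULL when there is no successor */
    const char *sunset;      /* end of the deprecation window */
};

/* Entries drawn from the dead-feature examples in this summary.
 * The replacement names are assumptions, not confirmed mappings. */
static const struct hak_deprecated_var hak_deprecated[] = {
    { "HAKMEM_TINY_SFC_ENABLE", "HAKMEM_TINY_UNIFIED_CACHE", "2026-05-26" },
    { "HAKMEM_TINY_SFC_CAP",    NULL,                        "2026-05-26" },
    { "HAKMEM_SS_EMPTY_REUSE",  "HAKMEM_SUPERSLAB_REUSE",    "2026-05-26" },
};

/* Returns the table entry for name, or NULL if it is not deprecated. */
const struct hak_deprecated_var *hak_find_deprecated(const char *name)
{
    assert(name != NULL);
    size_t n = sizeof hak_deprecated / sizeof hak_deprecated[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(hak_deprecated[i].name, name) == 0)
            return &hak_deprecated[i];
    return NULL;
}

/* Scan the environment once and print one line per deprecated variable
 * that is set. Returns the number of warnings; later calls return 0.
 * Not thread-safe: intended for single-threaded init. */
int hak_warn_deprecated_env(FILE *out)
{
    static int done = 0;
    assert(out != NULL);
    if (done)
        return 0;
    done = 1;
    int warned = 0;
    size_t n = sizeof hak_deprecated / sizeof hak_deprecated[0];
    for (size_t i = 0; i < n; i++) {
        const struct hak_deprecated_var *d = &hak_deprecated[i];
        if (getenv(d->name) == NULL)
            continue;
        fprintf(out, "[hakmem] %s is deprecated (sunset %s)",
                d->name, d->sunset);
        if (d->replacement != NULL)
            fprintf(out, "; use %s", d->replacement);
        fputc('\n', out);
        warned++;
    }
    return warned;
}
```

The same table could be the source for DEPRECATED.md, so the document and the runtime warnings cannot drift apart.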
Total Phase 0 Effort: 2 days
Risk: LOW (backward compatible with deprecation warnings)
Phase 1: Structural Improvements (P1, 3 days effort, MEDIUM risk)
Goal: Reorganize and document configuration system.
P1.1: Reorganize Scripts Hierarchy
- Move to scripts/{bench,config,lib}/ structure
- Consolidate 6 Larson scripts → 1 with flags
- Create shared lib/env_setup.sh
- Effort: 1 day
P1.2: Create CONFIGURATION.md
- Master reference for all 236 variables
- Categorize by allocator/feature
- Document defaults and interactions
- Effort: 1 day
P1.3: Create PRESET_GUIDE.md
- When to use each preset
- How to customize presets
- Common configuration patterns
- Effort: 4 hours
P1.4: Add Preset Versioning
- presets/v1/production.json (semantic versioning)
- Migration guide for preset changes
- Effort: 2 hours
P1.5: Add Configuration Validation
- Runtime check for conflicting vars
- Warning for deprecated vars (console + log)
- Effort: 4 hours
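The runtime check in P1.5 could start as a small rule table. The sketch below is illustrative only: the interaction semantics of the learning variables are undocumented (as noted in issue 3), so both dependency rules are hypothetical, as are all function names. It treats a rule as violated only when the child toggle is on and the parent is explicitly set to off.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical rule: a child toggle that is assumed to be meaningless
 * when its parent toggle is explicitly disabled. */
struct hak_dependency_rule {
    const char *child;
    const char *parent;
};

/* Both rules are assumptions made for illustration. */
static const struct hak_dependency_rule hak_rules[] = {
    { "HAKMEM_TINY_LEARN",     "HAKMEM_LEARN" },
    { "HAKMEM_TINY_CAP_LEARN", "HAKMEM_TINY_LEARN" },
};

/* Interpret an environment value as a 0/1 toggle (unset counts as off). */
int hak_toggle_on(const char *value)
{
    return value != NULL && atoi(value) != 0;
}

/* Violated when the child is on while the parent is set but off.
 * An unset parent is not flagged, since its default is unknown here. */
int hak_rule_violated(const char *child_value, const char *parent_value)
{
    return hak_toggle_on(child_value)
        && parent_value != NULL
        && !hak_toggle_on(parent_value);
}

/* Check every rule against the process environment.
 * Returns the number of conflicts reported on out. */
int hak_validate_env(FILE *out)
{
    assert(out != NULL);
    int conflicts = 0;
    size_t n = sizeof hak_rules / sizeof hak_rules[0];
    for (size_t i = 0; i < n; i++) {
        const struct hak_dependency_rule *r = &hak_rules[i];
        if (hak_rule_violated(getenv(r->child), getenv(r->parent))) {
            fprintf(out, "[hakmem] config conflict: %s is enabled but %s "
                         "is disabled\n", r->child, r->parent);
            conflicts++;
        }
    }
    return conflicts;
}
```

Calling `hak_validate_env(stderr)` once during initialization would surface a case like HAKMEM_TINY_LEARN=1 with HAKMEM_LEARN=0 as a warning instead of silent, undefined behavior.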
Total Phase 1 Effort: 3 days
Risk: MEDIUM (scripts reorganization may break workflows)
Phase 2: Deep Cleanup (P2, 4 days effort, MEDIUM risk)
Goal: Remove dead code and consolidate overlapping features.
P2.1: Remove Dead Code
- SFC (6 vars) → Remove
- PAGE_ARENA (5 vars) → Remove or document as research
- EXTERNAL_GUARD (3 vars) → Remove
- MF2 (3 vars) → Remove
- OLD_REFILL (5 vars) → Remove
- Effort: 1 day (with 6-month deprecation period)
P2.2: Consolidate Learning Systems (6 systems → 2 systems)
- Allocation Learning: size classes, TLS, refill
- Memory Learning: THP, RSS, SuperSlab lifecycle
- Document interaction semantics
- Effort: 2 days (complex refactoring)
P2.3: Reorganize TINY Allocator Config (113 vars → ~40 vars)
- Core allocation: 15 vars
- TLS caching: 8 vars
- Refill/drain: 6 vars
- Debug: 5 vars
- Learning: 6 vars
- Effort: 2 days (with 6-month migration)
P2.4: Unify Profiling/Stats (15 vars → 4 vars)
- HAKMEM_PROFILE_LEVEL (0-3)
- HAKMEM_STATS_INTERVAL_MS
- HAKMEM_STATS_OUTPUT_FILE
- HAKMEM_TRACE_ALLOCATIONS (0/1)
- Effort: 4 hours
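The four profiling variables map naturally onto one struct that is filled once at startup. This is a rough sketch, not the project's implementation: the struct, the helper names, the defaults, and the one-hour upper bound on the interval are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the four unified profiling variables. */
struct hak_profile_config {
    int profile_level;        /* HAKMEM_PROFILE_LEVEL, clamped to 0..3 */
    long stats_interval_ms;   /* HAKMEM_STATS_INTERVAL_MS, 0 = off (assumed) */
    const char *stats_output; /* HAKMEM_STATS_OUTPUT_FILE, NULL if unset */
    int trace_allocations;    /* HAKMEM_TRACE_ALLOCATIONS, 0/1 */
};

/* Parse a decimal integer and clamp it into [lo, hi].
 * Unset, empty, or non-numeric input yields fallback. */
long hak_parse_long(const char *text, long fallback, long lo, long hi)
{
    assert(lo <= hi);
    if (text == NULL || text[0] == '\0')
        return fallback;
    char *end = NULL;
    long v = strtol(text, &end, 10);
    if (end == text)
        return fallback;
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/* Read all four variables from the process environment in one place. */
struct hak_profile_config hak_profile_config_from_env(void)
{
    struct hak_profile_config c;
    c.profile_level =
        (int)hak_parse_long(getenv("HAKMEM_PROFILE_LEVEL"), 0, 0, 3);
    c.stats_interval_ms =
        hak_parse_long(getenv("HAKMEM_STATS_INTERVAL_MS"), 0, 0, 3600000L);
    c.stats_output = getenv("HAKMEM_STATS_OUTPUT_FILE");
    c.trace_allocations =
        (int)hak_parse_long(getenv("HAKMEM_TRACE_ALLOCATIONS"), 0, 0, 1);
    return c;
}
```

Centralizing the reads in one function also gives CONFIGURATION.md a single place to point at for defaults and valid ranges.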
P2.5: Remove Benchmark-Specific Hacks
- HAKMEM_BENCH_FAST_MODE - should be a preset, not ENV var
- HAKMEM_TINY_ULTRA_SIMPLE - merge into debug level
- Effort: 2 hours
Total Phase 2 Effort: 4 days
Risk: MEDIUM (requires careful migration planning)
📈 Success Metrics
Quantitative
ENV Variables: 236 → 80 (-66%)
Build Flags: 59 → 40 (-32%)
Shell Scripts: 30 → 8 (-73%)
Undocumented Vars: 16 → 0 (-100%)
Qualitative
- ✅ New developer onboarding: 2 weeks → 2 days
- ✅ Configuration bugs: Common → Rare
- ✅ Testing matrix: Intractable → Manageable
- ✅ Feature discovery: Trial-and-error → Documented
📅 Timeline
| Phase | Duration | Risk | Dependencies |
|---|---|---|---|
| Phase 0 | 2 days | LOW | None |
| Phase 1 | 3 days | MEDIUM | Phase 0 complete |
| Phase 2 | 4 days | MEDIUM | Phase 1 complete |
| Total | 9 days | Manageable | Incremental rollout |
Deprecation Period: 6 months (2025-11-26 → 2026-05-26)
🚀 Getting Started
Immediate Next Steps:
- ✅ Read this summary (you're done!)
- 📖 Review detailed analysis: hakmem_config_analysis.txt
- 🛠️ Review concrete proposal: hakmem_cleanup_proposal.txt
- 🎯 Start with P0.1 (SuperSlab unification) - lowest risk, sets pattern
- 📝 Track progress in CONFIG_CLEANUP_PROGRESS.md
Questions?
- Technical details → hakmem_config_analysis.txt
- Implementation plan → hakmem_cleanup_proposal.txt
- Quick reference → This document
📚 Related Documents
- hakmem_config_analysis.txt (30-min read)
  - Complete inventory of 236 ENV variables
  - Detailed categorization and pain points
  - Scripts analysis and configuration drift examples
- hakmem_cleanup_proposal.txt (30-min read)
  - Concrete implementation roadmap
  - Step-by-step instructions for each phase
  - Risk mitigation strategies
- CONFIGURATION.md (to be created in P1.2)
  - Master reference for all configuration
  - Will become single source of truth
Last Updated: 2025-11-26
Next Review: After Phase 0 completion (est. 2025-11-28)