Moe Charm (CI) a9ddb52ad4 ENV cleanup: Remove BG/HotMag vars & guard fprintf (Larson 52.3M ops/s)

Phase 1 complete: environment variable cleanup + fprintf debug guards

ENV variables removed (BG/HotMag family):
- core/hakmem_tiny_init.inc: HotMag ENV removed (~131 lines)
- core/hakmem_tiny_bg_spill.c: BG spill ENV removed
- core/tiny_refill.h: BG remote hardcoded to fixed values
- core/hakmem_tiny_slow.inc: BG refs removed

fprintf Debug Guards (#if !HAKMEM_BUILD_RELEASE):
- core/hakmem_shared_pool.c: Lock stats (~18 fprintf)
- core/page_arena.c: Init/Shutdown/Stats (~27 fprintf)
- core/hakmem.c: SIGSEGV init message

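Documentation cleanup:
- 328 markdown files removed (old reports, duplicate docs)

Performance check:
- Larson: 52.35M ops/s (previously 52.8M, stable ✅)
- No functional impact from the ENV cleanup
- Some debug output remains (to be addressed in the next phase)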

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-26 14:45:26 +09:00

HAKMEM Configuration Crisis - Executive Summary

Date: 2025-11-26
Status: 🔴 CRITICAL - Configuration complexity is hindering development
Reading Time: 10 minutes

🚨 The Crisis in Numbers

Metric	Current	Target	Reduction
Runtime ENV variables	236	80	-66%
Build-time flags	59+	~40	-32%
Shell scripts	30 files (3000 LOC)	8 entry points	-73%
JSON presets	1 file, 3 presets	4+ files, organized	Better structure
Configuration guides	0	3+ comprehensive	New (none exist today)
Deprecation tracking	None	Automated timeline	Needed

Bottom Line: HAKMEM has grown from a research allocator to a production system, but configuration management hasn't scaled. We're at the point where even the original developers are losing track of features.

📊 Quick Facts

Environment Variables (236 total)

By Category:

TINY Allocator:        113 vars (48%) 🔴 BLOATED
Debug/Profiling:        31 vars (13%)
Learning Systems:       18 vars (8%)  🟡 6 independent systems
SuperSlab:              15 vars (6%)
Shared Pool:            12 vars (5%)
Mid-Large:              11 vars (5%)
Benchmarking:           10 vars (4%)
Others:                 26 vars (11%)

By Status:

Active & Used:         ~120 vars (51%)
Deprecated/Dead:        ~60 vars (25%) 🔴 REMOVE
Research/Experimental:  ~40 vars (17%)
Undocumented:          ~16 vars (7%)  🔴 UNCLEAR

Build Flags (59+ total)

By Category:

Feature Toggles:        23 flags (39%)
Optimization:           15 flags (25%)
Debug/Instrumentation:  12 flags (20%)
Build Modes:             9 flags (15%)

Shell Scripts (30 files)

By Type:

Benchmarking:          14 scripts (47%) 🟡 Overlapping
ENV Setup:              6 scripts (20%) 🔴 Duplicated
Build Helpers:          5 scripts (17%)
Utilities:              5 scripts (17%)

Problem: No clear entry points, duplicated logic across 30 files, zero coordination.

🔥 Top 5 Critical Issues

1. TINY Allocator Configuration Explosion (113 vars)

The Problem: TINY allocator has evolved through multiple phases (v1 → v2 → ULTRA → SLIM → Unified), but old configuration layers were never removed. Result: 113 overlapping environment variables.

Examples of Chaos:

# Refill configuration (7 overlapping strategies!)
HAKMEM_TINY_REFILL_BATCH_SIZE=64
HAKMEM_TINY_P0_BATCH=32              # Same as above?
HAKMEM_TINY_SFC_REFILL=16            # SFC is deprecated!
HAKMEM_UNIFIED_REFILL_SIZE=64        # Unified path
HAKMEM_TINY_FAST_REFILL_COUNT=32     # Fast path
HAKMEM_TINY_ULTRA_REFILL=8           # Ultra path
HAKMEM_TINY_SLIM_REFILL_BATCH=16     # SLIM path

# Debug toggles (11 variants with overlapping names!)
HAKMEM_TINY_DEBUG=1
HAKMEM_DEBUG_TINY=1                  # Same thing?
HAKMEM_TINY_VERBOSE=1
HAKMEM_TINY_DEBUG_VERBOSE=1          # Combined?
HAKMEM_TINY_LOG=1
... (6 more variants)

Impact:

Developers don't know which variables to use
Testing matrix is intractable (2^113 combinations even if every variable were a plain on/off toggle)
Configuration bugs are common
Onboarding new developers takes weeks

Recommendation: Consolidate to ~40 variables organized by architectural layer:

Core allocation: 15 vars
TLS caching: 8 vars
Refill/drain: 6 vars
Debug: 5 vars
Learning: 6 vars

2. Dead Code Still Has Active Config (60+ vars)

The Problem: Features have been replaced or deprecated, but their configuration variables are still active, causing confusion.

Examples:

SFC (Single-Free-Cache) - REPLACED by Unified Cache:

HAKMEM_TINY_SFC_ENABLE=1         # 🔴 Dead (replaced Nov 2025)
HAKMEM_TINY_SFC_CAP=128          # 🔴 Dead
HAKMEM_TINY_SFC_REFILL=16        # 🔴 Dead
HAKMEM_TINY_SFC_SPILL_THRESH=96  # 🔴 Dead
HAKMEM_TINY_SFC_BATCH_POP=8      # 🔴 Dead
HAKMEM_TINY_SFC_STATS=1          # 🔴 Dead

Status: Unified Cache replaced SFC in Phase 3d-B (2025-11-20), but the SFC variables are still parsed.

PAGE_ARENA - Research artifact, never integrated:

HAKMEM_PAGE_ARENA_ENABLE=1       # 🔴 Research-only
HAKMEM_PAGE_ARENA_SIZE_MB=16     # 🔴 Research-only
HAKMEM_PAGE_ARENA_GROWTH=2       # 🔴 Research-only
HAKMEM_PAGE_ARENA_MAX_MB=128     # 🔴 Research-only
HAKMEM_PAGE_ARENA_THP=1          # 🔴 Research-only

Status: Experimental code from 2024-09, never productionized, still has active config.

Other Dead Features:

EXTERNAL_GUARD (3 vars) - Purpose unclear, no documentation
MF2 (3 vars) - Undocumented, possibly abandoned
OLD_REFILL (5 vars) - Replaced by P0 batch refill

Impact:

Users waste time trying dead features
CI tests dead code paths
Codebase appears larger than it is

Recommendation: Remove the dead code and deprecate its variables on a 6-month timeline.
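A deprecation window is only useful if users are told about it. The following is a minimal sketch, in POSIX shell, of a wrapper-side check that warns when a deprecated variable is still set. The table contents are illustrative: the sunset date reuses the 6-month window proposed in this document, and the suggested replacement for HAKMEM_TINY_SFC_ENABLE is an assumption, not a confirmed mapping. The function name warn_deprecated is hypothetical.

```shell
#!/bin/sh
# Sketch only: table entries and replacement names are assumptions.
# Format: name|replacement|sunset (one entry per line)
DEPRECATED_VARS='HAKMEM_TINY_SFC_ENABLE|HAKMEM_TINY_UNIFIED_CACHE|2026-05-26
HAKMEM_TINY_SFC_CAP|(none)|2026-05-26
HAKMEM_TINY_SFC_REFILL|(none)|2026-05-26'

# Print one warning per deprecated variable found in the environment.
# Returns 0 if none are set, 1 otherwise.
warn_deprecated() {
    found=0
    while IFS='|' read -r name replacement sunset; do
        [ -n "$name" ] || continue
        # Indirect lookup: is the variable named by $name set at all?
        if eval "[ -n \"\${$name+set}\" ]"; then
            echo "warning: $name is deprecated (removal after $sunset); replacement: $replacement" >&2
            found=1
        fi
    done <<EOF
$DEPRECATED_VARS
EOF
    return "$found"
}
```

A benchmark script could call warn_deprecated before launching the allocator and either continue or abort depending on how strict the project wants to be.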

3. Learning System Chaos (6 independent systems)

The Problem: HAKMEM has 6 separate learning/adaptive systems with unclear interaction semantics.

The 6 Systems:

1. HAKMEM_LEARN=1                    # Global meta-learner?
2. HAKMEM_TINY_LEARN=1               # TINY-specific learning
3. HAKMEM_TINY_CAP_LEARN=1           # TLS capacity learning
4. HAKMEM_ADAPTIVE_SIZING=1          # Size class tuning
5. HAKMEM_THP_LEARN=1                # Transparent Huge Pages
6. HAKMEM_WMAX_LEARN=1               # Workload max size learning

Questions with No Answers:

Can these be enabled together? Do they conflict?
Which learning system owns TLS cache sizing?
What happens if TINY_LEARN=1 but LEARN=0?
Is there a master learning coordinator?

Additional Learning Vars (12 more):

HAKMEM_LEARN_RATE=0.1
HAKMEM_LEARN_DECAY=0.95
HAKMEM_LEARN_MIN_SAMPLES=1000
HAKMEM_TINY_LEARN_WINDOW=10000
HAKMEM_ADAPTIVE_SIZING_INTERVAL_MS=5000
... (7 more tuning parameters)

Impact:

Unpredictable behavior when multiple systems enabled
No documented interaction model
Difficult to debug performance issues
Unclear which system to tune

Recommendation: Consolidate to 2 learning systems:

Allocation Learning: Size classes, TLS capacity, refill tuning
Memory Learning: THP, RSS optimization, SuperSlab lifecycle

With clear boundaries and documented interaction semantics.
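Until the interaction model is documented, a cheap guard is to flag runs where several learning toggles are enabled together. The sketch below uses the six toggle names listed above; treating the value "1" as "enabled" and the function names are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: assumes a toggle is "on" when its value is exactly 1.
LEARN_TOGGLES="HAKMEM_LEARN HAKMEM_TINY_LEARN HAKMEM_TINY_CAP_LEARN HAKMEM_ADAPTIVE_SIZING HAKMEM_THP_LEARN HAKMEM_WMAX_LEARN"

# Echo the name of every learning toggle that is currently enabled.
enabled_learning_systems() {
    for name in $LEARN_TOGGLES; do
        eval "value=\${$name:-0}"
        if [ "$value" = "1" ]; then
            printf '%s\n' "$name"
        fi
    done
    return 0
}

# Warn (and return 1) when more than one learning system is active.
check_learning_overlap() {
    active=$(enabled_learning_systems)
    count=$(printf '%s\n' "$active" | grep -c 'HAKMEM_' || true)
    if [ "$count" -gt 1 ]; then
        echo "warning: $count learning systems enabled together:" $active >&2
        return 1
    fi
    return 0
}
```

This does not resolve which system owns TLS cache sizing; it only makes overlapping configurations visible in benchmark logs.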

4. Scripts Anarchy (30 files, 3000 LOC, zero hierarchy)

The Problem: Scripts have accumulated organically with no organization. Multiple scripts do the same thing with subtle differences.

Examples:

Running Larson - 6 different ways:

scripts/run_larson.sh                # Which one to use?
scripts/run_larson_1t.sh             # 1 thread variant
scripts/run_larson_8t.sh             # 8 thread variant
scripts/larson_benchmark.sh          # Different from run_larson.sh?
scripts/bench_larson_preset.sh       # Uses JSON presets
scripts/quick_larson.sh              # Quick test variant

Which should I use? → Unclear.

Running Random Mixed - 3 different ways:

scripts/run_random_mixed.sh          # Hardcoded params
scripts/bench_random_mixed_json.sh   # Uses JSON preset
scripts/quick_random_mixed.sh        # Different defaults

ENV Setup Duplication (copy-pasted across 12+ scripts):

# This block appears in 12+ scripts:
export HAKMEM_TINY_HEADER_CLASSIDX=1
export HAKMEM_TINY_AGGRESSIVE_INLINE=1
export HAKMEM_TINY_PREWARM_TLS=1
export HAKMEM_SS_EMPTY_REUSE=1
export HAKMEM_TINY_UNIFIED_CACHE=1
# ... (20 more vars duplicated everywhere)

Impact:

New developers don't know where to start
Bug fixes need to be applied to 6+ scripts
Inconsistent behavior across scripts
No single source of truth

Recommendation: Reorganize to 8 entry points:

scripts/
├── bench/                     # Benchmarking entry points
│   ├── larson.sh             # Single Larson entry (flags for 1T/8T)
│   ├── random_mixed.sh       # Single Random Mixed entry
│   └── suite.sh              # Full benchmark suite
├── config/                    # Configuration presets
│   ├── production.env        # Production defaults
│   ├── debug.env             # Debug configuration
│   └── research.env          # Research/experimental
├── lib/                       # Shared utilities
│   ├── env_setup.sh          # Single source of ENV setup
│   └── validation.sh         # Config validation
└── README.md                  # Scripts guide
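As a rough illustration of what the shared lib/env_setup.sh could contain, the sketch below loads KEY=VALUE lines from one of the proposed .env presets. The function name load_env_preset, the file format (one assignment per line, full-line # comments, no inline comments) and the rule that caller-provided values win over preset defaults are all assumptions, not existing behavior.

```shell
#!/bin/sh
# Sketch of scripts/lib/env_setup.sh (hypothetical): load a preset file
# without overriding variables the caller has already set.
load_env_preset() {
    preset_file=$1
    if [ ! -r "$preset_file" ]; then
        echo "error: cannot read preset '$preset_file'" >&2
        return 1
    fi
    # The "|| [ -n "$key" ]" part handles a last line without a newline.
    while IFS='=' read -r key value || [ -n "$key" ]; do
        case "$key" in
            ''|\#*) continue ;;                       # blank line or comment
            *[!A-Za-z0-9_]*)                          # not a valid identifier
                echo "warning: skipping malformed key '$key'" >&2; continue ;;
            HAKMEM_*) ;;                              # accepted
            *) echo "warning: ignoring non-HAKMEM key '$key'" >&2; continue ;;
        esac
        # Caller-provided values win over preset defaults.
        if eval "[ -z \"\${$key+set}\" ]"; then
            export "$key=$value"
        fi
    done < "$preset_file"
    return 0
}
```

Each benchmark entry point would then source this file and call, for example, load_env_preset scripts/config/production.env, which replaces the export block currently copy-pasted across scripts.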

5. Zero Configuration Documentation

The Problem: 236 environment variables, 59+ build flags, 30 scripts → ZERO master documentation.

What's Missing:

❌ Master list of all ENV variables
❌ Categorization of variables by purpose
❌ Default values documentation
❌ Interaction semantics (which vars conflict?)
❌ Preset selection guide
❌ Deprecation timeline
❌ Scripts coordination guide
❌ Configuration examples for common use cases

Current State: Configuration knowledge exists only in:

Source code (scattered across 100+ files)
Git commit messages (hard to search)
Claude's memory (not accessible to others)
Tribal knowledge (not written down)

Impact:

2+ weeks onboarding time for new developers
Configuration bugs in production
Wasted time experimenting with dead features
Duplicate questions ("Which Larson script should I use?")

Recommendation: Create 3 comprehensive guides:

CONFIGURATION.md - Master reference (all vars categorized)
PRESET_GUIDE.md - How to choose presets
SCRIPTS_GUIDE.md - Scripts hierarchy and usage
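A master list is easier to keep honest if it can be regenerated from the source tree. The sketch below extracts variable names from getenv() calls. It assumes variables are read with a string literal such as getenv("HAKMEM_FOO") with no spaces inside the parentheses; names built at runtime or hidden behind wrapper macros would be missed. The function name list_env_vars and the default directory are hypothetical, and grep -r/-o are common extensions rather than strict POSIX.

```shell
#!/bin/sh
# Sketch only: prints a sorted, de-duplicated list of HAKMEM_* names
# found in getenv("...") calls under the given directory.
list_env_vars() {
    src_dir=${1:-core}
    grep -rhoE 'getenv\("HAKMEM_[A-Z0-9_]+"\)' "$src_dir" 2>/dev/null \
        | sed 's/^getenv("//; s/")$//' \
        | sort -u
}
```

Piping the result through wc -l gives a count that can be tracked against the 236 → 80 target, and diffing it against CONFIGURATION.md would reveal undocumented variables.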

🎯 Proposed Cleanup Strategy

Phase 0: Immediate Wins (P0, 2 days effort, LOW risk)

Goal: Quick improvements that establish cleanup patterns.

P0.1: Unify SuperSlab Variables (5 vars → 3 vars)

Remove: the duplicate reuse toggles, including HAKMEM_SS_EMPTY_REUSE (folded into HAKMEM_SUPERSLAB_REUSE)
Keep: HAKMEM_SUPERSLAB_REUSE, HAKMEM_SUPERSLAB_LAZY, HAKMEM_SUPERSLAB_PREWARM
Effort: 1 hour (grep + replace + deprecation notice)
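The "grep + replace + deprecation notice" step can stay backward compatible if the old name is honored as an alias for a while. A minimal sketch of that pattern at the script level follows; migrate_alias is a hypothetical helper, and the choice to let the canonical variable win when both are set is an assumption.

```shell
#!/bin/sh
# Sketch only: migrate_alias OLD NEW
# If OLD is set, warn; if NEW is not set, carry OLD's value forward.
migrate_alias() {
    old=$1
    new=$2
    eval "old_set=\${$old+set}; old_val=\${$old-}"
    eval "new_set=\${$new+set}"
    if [ -z "$old_set" ]; then
        return 0                      # alias not used: nothing to do
    fi
    echo "warning: $old is deprecated; use $new instead" >&2
    if [ -z "$new_set" ]; then
        export "$new=$old_val"        # canonical name inherits the old value
    fi
    return 0
}
```

For example, migrate_alias HAKMEM_SS_EMPTY_REUSE HAKMEM_SUPERSLAB_REUSE keeps existing scripts working while pointing users at the surviving name.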

P0.2: Create Master Preset Registry (1 file → 4 files)

presets/production.json - Recommended production config
presets/debug.json - Full debugging enabled
presets/research.json - Experimental features
presets/minimal.json - Minimal feature set
Effort: 2 hours (extract from current presets)

P0.3: Clean Up build.sh Pinned Flags

Document all pinned flags in BUILD_FLAGS.md
Remove obsolete flags (POOL_TLS_PHASE1=0, etc.)
Effort: 2 hours

P0.4: Consolidate Debug Variables (11 vars → 4 vars)

HAKMEM_DEBUG_LEVEL (0-3): 0=none, 1=errors, 2=info, 3=verbose
HAKMEM_DEBUG_TINY (0/1): TINY allocator specific
HAKMEM_DEBUG_POOL (0/1): Pool allocator specific
HAKMEM_DEBUG_MID (0/1): Mid-Large allocator specific
Effort: 3 hours (consolidate scattered debug toggles)
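One possible way to bridge the old toggles and the new level during migration is sketched below. The mapping itself (plain debug or log toggles map to level 2, any verbose toggle maps to level 3) is an assumption chosen for illustration, as is the function name resolve_debug_level; the real mapping would need to be decided per toggle.

```shell
#!/bin/sh
# Sketch only: print the effective debug level (0-3).
# An explicit HAKMEM_DEBUG_LEVEL always wins over legacy toggles.
resolve_debug_level() {
    if [ -n "${HAKMEM_DEBUG_LEVEL+set}" ]; then
        case "$HAKMEM_DEBUG_LEVEL" in
            0|1|2|3) echo "$HAKMEM_DEBUG_LEVEL"; return 0 ;;
            *) echo "error: HAKMEM_DEBUG_LEVEL must be 0-3" >&2; return 1 ;;
        esac
    fi
    level=0
    # Assumed mapping: legacy debug/log toggles behave like "info".
    if [ "${HAKMEM_TINY_DEBUG:-0}" = "1" ] || [ "${HAKMEM_TINY_LOG:-0}" = "1" ]; then
        level=2
    fi
    # Assumed mapping: any legacy verbose toggle behaves like "verbose".
    if [ "${HAKMEM_TINY_VERBOSE:-0}" = "1" ] || [ "${HAKMEM_TINY_DEBUG_VERBOSE:-0}" = "1" ]; then
        level=3
    fi
    echo "$level"
    return 0
}
```

A launcher could export HAKMEM_DEBUG_LEVEL="$(resolve_debug_level)" so the allocator only ever has to read the consolidated variable.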

P0.5: Create DEPRECATED.md

List all deprecated variables with sunset dates
Add deprecation warnings to code (TLS-cached, lightweight)
Effort: 1 hour

Total Phase 0 Effort: 2 days
Risk: LOW (backward compatible with deprecation warnings)

Phase 1: Structural Improvements (P1, 3 days effort, MEDIUM risk)

Goal: Reorganize and document configuration system.

P1.1: Reorganize Scripts Hierarchy

Move to scripts/{bench,config,lib}/ structure
Consolidate 6 Larson scripts → 1 with flags
Create shared lib/env_setup.sh
Effort: 1 day
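To make "1 with flags" concrete, here is a sketch of the argument handling a consolidated scripts/bench/larson.sh might use. The flag letters (-t for thread count, -p for preset name), the defaults and the function name parse_larson_args are assumptions; the actual benchmark invocation is deliberately left out.

```shell
#!/bin/sh
# Sketch only: parse "-t THREADS" and "-p PRESET" for a single Larson entry point.
# Results are left in LARSON_THREADS and LARSON_PRESET.
parse_larson_args() {
    LARSON_THREADS=1
    LARSON_PRESET=production
    OPTIND=1                          # reset so the function can be called again
    while getopts 't:p:' opt; do
        case "$opt" in
            t) LARSON_THREADS=$OPTARG ;;
            p) LARSON_PRESET=$OPTARG ;;
            *) echo "usage: larson.sh [-t threads] [-p preset]" >&2; return 2 ;;
        esac
    done
    # Reject empty, non-numeric, or zero thread counts.
    case "$LARSON_THREADS" in
        ''|*[!0-9]*|0) echo "error: -t expects a positive integer" >&2; return 2 ;;
    esac
    return 0
}
```

With this in place, run_larson_1t.sh and run_larson_8t.sh collapse into larson.sh -t 1 and larson.sh -t 8, and the preset name can be passed on to the shared ENV setup.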

P1.2: Create CONFIGURATION.md

Master reference for all 236 variables
Categorize by allocator/feature
Document defaults and interactions
Effort: 1 day

P1.3: Create PRESET_GUIDE.md

When to use each preset
How to customize presets
Common configuration patterns
Effort: 4 hours

P1.4: Add Preset Versioning

presets/v1/production.json (semantic versioning)
Migration guide for preset changes
Effort: 2 hours

P1.5: Add Configuration Validation

Runtime check for conflicting vars
Warning for deprecated vars (console + log)
Effort: 4 hours

Total Phase 1 Effort: 3 days
Risk: MEDIUM (scripts reorganization may break workflows)

Phase 2: Deep Cleanup (P2, 4 days effort, MEDIUM risk)

Goal: Remove dead code and consolidate overlapping features.

P2.1: Remove Dead Code

SFC (6 vars) → Remove
PAGE_ARENA (5 vars) → Remove or document as research
EXTERNAL_GUARD (3 vars) → Remove
MF2 (3 vars) → Remove
OLD_REFILL (5 vars) → Remove
Effort: 1 day (with 6-month deprecation period)

P2.2: Consolidate Learning Systems (6 systems → 2 systems)

Allocation Learning: size classes, TLS, refill
Memory Learning: THP, RSS, SuperSlab lifecycle
Document interaction semantics
Effort: 2 days (complex refactoring)

P2.3: Reorganize TINY Allocator Config (113 vars → ~40 vars)

Core allocation: 15 vars
TLS caching: 8 vars
Refill/drain: 6 vars
Debug: 5 vars
Learning: 6 vars
Effort: 2 days (with 6-month migration)

P2.4: Unify Profiling/Stats (15 vars → 4 vars)

HAKMEM_PROFILE_LEVEL (0-3)
HAKMEM_STATS_INTERVAL_MS
HAKMEM_STATS_OUTPUT_FILE
HAKMEM_TRACE_ALLOCATIONS (0/1)
Effort: 4 hours

P2.5: Remove Benchmark-Specific Hacks

HAKMEM_BENCH_FAST_MODE - should be a preset, not ENV var
HAKMEM_TINY_ULTRA_SIMPLE - merge into debug level
Effort: 2 hours

Total Phase 2 Effort: 4 days
Risk: MEDIUM (requires careful migration planning)

📈 Success Metrics

Quantitative

ENV Variables:     236 → 80  (-66%)
Build Flags:        59 → 40  (-32%)
Shell Scripts:      30 → 8   (-73%)
Undocumented Vars:  16 → 0   (-100%)

Qualitative

✅ New developer onboarding: 2 weeks → 2 days
✅ Configuration bugs: Common → Rare
✅ Testing matrix: Intractable → Manageable
✅ Feature discovery: Trial-and-error → Documented

📅 Timeline

Phase	Duration	Risk	Dependencies
Phase 0	2 days	LOW	None
Phase 1	3 days	MEDIUM	Phase 0 complete
Phase 2	4 days	MEDIUM	Phase 1 complete
Total	9 days	Manageable	Incremental rollout

Deprecation Period: 6 months (2025-11-26 → 2026-05-26)

🚀 Getting Started

Immediate Next Steps:

✅ Read this summary (you're done!)
📖 Review detailed analysis: hakmem_config_analysis.txt
🛠️ Review concrete proposal: hakmem_cleanup_proposal.txt
🎯 Start with P0.1 (SuperSlab unification) - lowest risk, sets pattern
📝 Track progress in CONFIG_CLEANUP_PROGRESS.md

Questions?

Technical details → hakmem_config_analysis.txt
Implementation plan → hakmem_cleanup_proposal.txt
Quick reference → This document

📚 Related Documents

hakmem_config_analysis.txt (30-min read)
- Complete inventory of 236 ENV variables
- Detailed categorization and pain points
- Scripts analysis and configuration drift examples

hakmem_cleanup_proposal.txt (30-min read)
- Concrete implementation roadmap
- Step-by-step instructions for each phase
- Risk mitigation strategies

CONFIGURATION.md (to be created in P1.2)
- Master reference for all configuration
- Will become single source of truth
Last Updated: 2025-11-26
Next Review: After Phase 0 completion (est. 2025-11-28)
