## Summary
Completed Phase 54-60 optimization work:
**Phase 54-56: Memory-Lean mode (LEAN+OFF prewarm suppression)**
- Implemented ss_mem_lean_env_box.h with ENV gates
- Balanced mode (LEAN+OFF) promoted as production default
- Result: +1.2% throughput, better stability, zero syscall overhead
- Added to bench_profile.h: MIXED_TINYV3_C7_BALANCED preset
**Phase 57: 60-min soak finalization**
- Balanced mode: 60-min soak, RSS drift 0%, CV 5.38%
- Speed-first mode: 60-min soak, RSS drift 0%, CV 1.58%
- Syscall budget: 1.25e-7/op (800× under target)
- Status: PRODUCTION-READY
**Phase 59: 50% recovery baseline rebase**
- hakmem FAST (Balanced): 59.184M ops/s, CV 1.31%
- mimalloc: 120.466M ops/s, CV 3.50%
- Ratio: 49.13% (M1 ACHIEVED within statistical noise)
- Superior stability: 2.68× better CV than mimalloc
**Phase 60: Alloc pass-down SSOT (NO-GO)**
- Implemented alloc_passdown_ssot_env_box.h
- Modified malloc_tiny_fast.h for SSOT pattern
- Result: -0.46% (NO-GO)
- Key lesson: SSOT not applicable where early-exit already optimized
## Key Metrics
- Performance: 49.13% of mimalloc (M1 effectively achieved)
- Stability: CV 1.31% (superior to mimalloc 3.50%)
- Syscall budget: 1.25e-7/op (excellent)
- RSS: 33MB stable, 0% drift over 60 minutes
## Files Added/Modified
New boxes:
- core/box/ss_mem_lean_env_box.h
- core/box/ss_release_policy_box.{h,c}
- core/box/alloc_passdown_ssot_env_box.h
Scripts:
- scripts/soak_mixed_single_process.sh
- scripts/analyze_epoch_tail_csv.py
- scripts/soak_mixed_rss.sh
- scripts/calculate_percentiles.py
- scripts/analyze_soak.py
Documentation: Phase 40-60 analysis documents
## Design Decisions
1. Profile separation (core/bench_profile.h):
- MIXED_TINYV3_C7_SAFE: Speed-first (no LEAN)
- MIXED_TINYV3_C7_BALANCED: Balanced mode (LEAN+OFF)
2. Box Theory compliance:
- All ENV gates reversible (HAKMEM_SS_MEM_LEAN, HAKMEM_ALLOC_PASSDOWN_SSOT)
- Single conversion points maintained
- No physical deletions (compile-out only)
3. Lessons learned:
- SSOT effective only where redundancy exists (Phase 60 showed limits)
- Branch prediction extremely effective (~0 cycles for well-predicted branches)
- Early-exit pattern valuable even when seemingly redundant
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Phase 54: Memory-Lean Mode Implementation
Overview
Phase 54 implements an opt-in Memory-Lean mode that aims to reduce peak RSS from ~33 MB (FAST baseline) to <10 MB, accepting a 5% to 10% throughput loss in exchange. The mode is separate from the speed-first FAST profile and does not affect the Standard/OBSERVE/FAST baselines.
Design Philosophy
- Opt-in (default OFF): Memory-Lean mode is disabled by default to preserve speed-first FAST profile
- ENV-gated A/B testing: Same binary can toggle between FAST and LEAN modes via environment variables
- Box Theory compliance: Single conversion point, clear boundaries, reversible changes
- Safety-first: Respects DSO guard and fail-fast rules (Phase 17 lessons)
Implementation
Box 1: ss_mem_lean_env_box.h (ENV Configuration)
Location: /mnt/workdisk/public_share/hakmem/core/box/ss_mem_lean_env_box.h
Purpose: Parse and provide ENV configuration for Memory-Lean mode
ENV Variables:
- HAKMEM_SS_MEM_LEAN=0/1 - Enable Memory-Lean mode [DEFAULT: 0]
- HAKMEM_SS_MEM_LEAN_TARGET_MB=N - Target peak RSS in MB [DEFAULT: 10]
- HAKMEM_SS_MEM_LEAN_DECOMMIT=FREE|DONTNEED|OFF - Decommit strategy [DEFAULT: FREE]
  - FREE: Use MADV_FREE (lazy kernel reclaim, fast)
  - DONTNEED: Use MADV_DONTNEED (eager kernel reclaim, slower)
  - OFF: No decommit (only suppress prewarm)
API:
int ss_mem_lean_enabled(void); // Check if lean mode enabled
int ss_mem_lean_target_mb(void); // Get target RSS in MB
ss_mem_lean_decommit_mode_t ss_mem_lean_decommit_mode(void); // Get decommit strategy
Design:
- Header-only with inline functions for zero overhead when disabled
- Lazy initialization with double-check pattern
- No dependencies (pure ENV parsing)
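The lazy, header-only ENV gate described above could look roughly like the following. This is a minimal sketch, not the contents of ss_mem_lean_env_box.h: every identifier here (`lean_parse_enabled`, `lean_parse_decommit`, `lean_enabled_cached`, the enum names) is hypothetical, and it uses an atomic sentinel cache rather than whatever double-check scheme the real header implements. Only the variable name HAKMEM_SS_MEM_LEAN and the documented defaults come from the text.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; the real header may differ. */
typedef enum {
    LEAN_DECOMMIT_FREE = 0,     /* MADV_FREE (documented default) */
    LEAN_DECOMMIT_DONTNEED = 1, /* MADV_DONTNEED */
    LEAN_DECOMMIT_OFF = 2       /* suppress prewarm only */
} lean_decommit_mode_t;

/* Pure parsers: no global state, so they are easy to test in isolation. */
static inline int lean_parse_enabled(const char *s) {
    return s != NULL && s[0] == '1' && s[1] == '\0';
}

static inline lean_decommit_mode_t lean_parse_decommit(const char *s) {
    if (s == NULL) return LEAN_DECOMMIT_FREE;
    if (strcmp(s, "DONTNEED") == 0) return LEAN_DECOMMIT_DONTNEED;
    if (strcmp(s, "OFF") == 0) return LEAN_DECOMMIT_OFF;
    return LEAN_DECOMMIT_FREE; /* assumption: unknown values use the default */
}

/* -1 = not read yet, 0 = disabled, 1 = enabled. */
static _Atomic int g_lean_enabled_cache = -1;

static inline int lean_enabled_cached(void) {
    int v = atomic_load_explicit(&g_lean_enabled_cache, memory_order_relaxed);
    if (v < 0) {
        /* Slow path: threads that race here all compute the same value,
         * so the duplicated getenv() call is harmless. */
        v = lean_parse_enabled(getenv("HAKMEM_SS_MEM_LEAN"));
        atomic_store_explicit(&g_lean_enabled_cache, v, memory_order_relaxed);
    }
    return v;
}
```

After the first call the hot path is a single relaxed load and a well-predicted branch, which is consistent with the "zero overhead when disabled" goal.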
Box 2: ss_release_policy_box.h/c (Release Policy)
Location: /mnt/workdisk/public_share/hakmem/core/box/ss_release_policy_box.{h,c}
Purpose: Single conversion point for superslab lifecycle decisions
API:
bool ss_should_keep_superslab(SuperSlab* ss, int class_idx); // Keep or release decision
int ss_maybe_decommit_superslab(void* ptr, size_t size); // Decommit memory (reduce RSS)
Design:
- In FAST mode (default): Returns true (keep all superslabs, persistent backend)
- In LEAN mode (opt-in): Returns false (allow release of empty superslabs)
- Decommit logic:
  - Uses DSO-guarded madvise() (respects Phase 17 safety rules)
  - Selects MADV_FREE or MADV_DONTNEED based on ENV
  - Updates lean_decommit counter on success
  - Falls back to munmap on failure
Boundary: All decommit operations flow through ss_os_madvise_guarded() (Superslab OS Box)
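To make the decommit step concrete, here is a small sketch of mapping the configured strategy to a `madvise()` advice value and reporting failure so the caller can fall back to `munmap`. It is an illustration under assumptions: `decommit_mode_t`, `decommit_advice`, and `decommit_region` are hypothetical names, the DSO guard is omitted, and the real ss_maybe_decommit_superslab() may behave differently.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MADV_* under strict -std= modes on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical mirror of the FREE | DONTNEED | OFF setting. */
typedef enum { DECOMMIT_FREE, DECOMMIT_DONTNEED, DECOMMIT_OFF } decommit_mode_t;

/* Map the configured mode to a madvise() advice value; -1 means "do nothing". */
static int decommit_advice(decommit_mode_t mode) {
    switch (mode) {
#ifdef MADV_FREE
    case DECOMMIT_FREE:     return MADV_FREE;
#else
    case DECOMMIT_FREE:     return MADV_DONTNEED; /* platform lacks MADV_FREE */
#endif
    case DECOMMIT_DONTNEED: return MADV_DONTNEED;
    default:                return -1;
    }
}

/* Returns 0 when the pages were handed back to the kernel,
 * -1 when the caller should fall back to munmap. */
static int decommit_region(void *ptr, size_t size, decommit_mode_t mode) {
    int advice = decommit_advice(mode);
    if (ptr == NULL || size == 0 || advice < 0) return -1;
    return madvise(ptr, size, advice) == 0 ? 0 : -1;
}
```

With `MADV_DONTNEED` on a private anonymous mapping the address range stays valid and reads back as zero-filled pages, which is why the VMA can be kept while RSS drops.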
Patch 1: Prewarm Suppression
File: /mnt/workdisk/public_share/hakmem/core/box/ss_hot_prewarm_box.c
Change: Added lean mode check in box_ss_hot_prewarm_all()
int box_ss_hot_prewarm_all(void) {
    // Phase 54: Memory-Lean mode suppresses prewarm (reduce RSS)
    if (ss_mem_lean_enabled()) {
        return 0; // No prewarm in lean mode
    }
    // ... existing prewarm logic ...
}
Impact: Prevents initial allocation of persistent superslabs (C0-C7 prewarm targets)
Patch 2: Decommit Logic
File: /mnt/workdisk/public_share/hakmem/core/box/ss_allocation_box.c
Change: Added decommit path in superslab_free() before munmap
void superslab_free(SuperSlab* ss) {
    // ... existing cache logic ...
    // Both caches full - try decommit before munmap
    if (ss_mem_lean_enabled()) {
        int decommit_ret = ss_maybe_decommit_superslab((void*)ss, ss_size);
        if (decommit_ret == 0) {
            // Decommit succeeded - record lean_retire and skip munmap
            // SuperSlab VMA is kept but pages are released to kernel
            ss_os_stats_record_lean_retire();
            ss->magic = 0; // Clear magic to prevent use-after-free
            // Update statistics...
            return; // Skip munmap, pages are decommitted
        }
        // Decommit failed (DSO overlap, madvise error) - fall through to munmap
    }
    // ... existing munmap logic ...
}
Impact: Empty superslabs are decommitted (RSS reduced, VMA kept) instead of being munmap'd
Patch 3: Stats Counters
Files:
- /mnt/workdisk/public_share/hakmem/core/box/ss_os_acquire_box.h
- /mnt/workdisk/public_share/hakmem/core/superslab_stats.c
Change: Added lean_decommit and lean_retire counters
extern _Atomic uint64_t g_ss_lean_decommit_calls; // Decommit operations
extern _Atomic uint64_t g_ss_lean_retire_calls; // Superslabs retired (decommit instead of munmap)
Reporting: Counters reported in SS_OS_STATS destructor output
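As an illustration of how such counters can be incremented cheaply and reported at exit, the sketch below uses relaxed atomic increments and a GCC/Clang destructor. The function names and the output format are assumptions made for this example; only the counter concept and the HAKMEM_SS_OS_STATS variable come from the text.

```c
#include <inttypes.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Static storage, so both counters start at zero. */
static _Atomic uint64_t g_lean_decommit_calls;
static _Atomic uint64_t g_lean_retire_calls;

/* Relaxed ordering is enough: the values are statistics, not synchronization. */
static inline void stats_record_lean_decommit(void) {
    atomic_fetch_add_explicit(&g_lean_decommit_calls, 1, memory_order_relaxed);
}

static inline void stats_record_lean_retire(void) {
    atomic_fetch_add_explicit(&g_lean_retire_calls, 1, memory_order_relaxed);
}

/* GCC/Clang extension: runs at process exit (or library unload).
 * The report line format below is hypothetical. */
__attribute__((destructor))
static void stats_dump(void) {
    const char *e = getenv("HAKMEM_SS_OS_STATS");
    if (e == NULL || e[0] != '1') return; /* silent unless explicitly enabled */
    fprintf(stderr,
            "[SS_OS_STATS] lean_decommit=%" PRIu64 " lean_retire=%" PRIu64 "\n",
            (uint64_t)atomic_load_explicit(&g_lean_decommit_calls, memory_order_relaxed),
            (uint64_t)atomic_load_explicit(&g_lean_retire_calls, memory_order_relaxed));
}
```

Keeping the report behind the ENV check means the counters add no output in normal runs.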
Makefile Integration
File: /mnt/workdisk/public_share/hakmem/Makefile
Change: Added core/box/ss_release_policy_box.o to all build targets
Usage
Enable Memory-Lean Mode
# Default decommit strategy (MADV_FREE, fast)
export HAKMEM_SS_MEM_LEAN=1
./bench_random_mixed_hakmem
# Eager decommit (MADV_DONTNEED, slower but universal)
export HAKMEM_SS_MEM_LEAN=1
export HAKMEM_SS_MEM_LEAN_DECOMMIT=DONTNEED
./bench_random_mixed_hakmem
# Suppress prewarm only (no decommit)
export HAKMEM_SS_MEM_LEAN=1
export HAKMEM_SS_MEM_LEAN_DECOMMIT=OFF
./bench_random_mixed_hakmem
# Monitor stats
export HAKMEM_SS_OS_STATS=1
export HAKMEM_SS_MEM_LEAN=1
./bench_random_mixed_hakmem
Disable Memory-Lean Mode (FAST baseline)
# Explicit disable
export HAKMEM_SS_MEM_LEAN=0
./bench_random_mixed_hakmem
# Or unset (default is OFF)
unset HAKMEM_SS_MEM_LEAN
./bench_random_mixed_hakmem
Safety Guarantees
DSO Guard (Phase 17 Lesson)
- All madvise() calls flow through ss_os_madvise_guarded()
- DSO addresses are skipped (prevents .fini_array corruption)
- Fail-fast on ENOMEM (disables future madvise calls)
Fail-Fast Rules
- Decommit failure → fall back to munmap (no silent errors)
- DSO overlap → skip decommit, use munmap
- ENOMEM → disable madvise globally, use munmap
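The three rules above can be sketched as one guarded wrapper. This is not the actual ss_os_madvise_guarded(): `madvise_guarded`, `g_madvise_disabled`, and `range_overlaps_dso` are hypothetical names, and the DSO check is a stub that always answers "no overlap" because the real detection logic is not shown in this document.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MADV_* under strict -std= modes on glibc */
#endif
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

/* Set once ENOMEM is observed; every later call is refused. */
static atomic_int g_madvise_disabled;

/* Hypothetical stand-in for the DSO overlap check (always "no overlap"). */
static int range_overlaps_dso(const void *ptr, size_t size) {
    (void)ptr;
    (void)size;
    return 0;
}

/* Returns 0 when pages were released, -1 when the caller must use munmap. */
static int madvise_guarded(void *ptr, size_t size, int advice) {
    if (atomic_load_explicit(&g_madvise_disabled, memory_order_relaxed)) {
        return -1; /* fail-fast: madvise was disabled globally */
    }
    if (range_overlaps_dso(ptr, size)) {
        return -1; /* never touch DSO-backed ranges */
    }
    if (madvise(ptr, size, advice) == 0) {
        return 0;
    }
    if (errno == ENOMEM) {
        atomic_store_explicit(&g_madvise_disabled, 1, memory_order_relaxed);
    }
    return -1; /* any failure is surfaced so the caller falls back to munmap */
}
```

Every failure path returns -1 rather than being swallowed, so the caller always has a definite signal to take the `munmap` route.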
Magic Number Protection
- SuperSlab magic is cleared after decommit/munmap
- Prevents use-after-free (same as FAST mode)
Trade-offs
| Metric | FAST (baseline) | LEAN (target) | Change |
|---|---|---|---|
| Peak RSS | ~33 MB | <10 MB | -70% |
| Throughput | 60M ops/s | 54-57M ops/s | -5% to -10% |
| Syscalls | 9e-8/op | Higher (acceptable) | +X% |
| Drift | 0% | 0% (required) | No change |
Dependencies
- ss_mem_lean_env_box.h (ENV configuration)
- ss_release_policy_box.h/c (release policy logic)
- madvise_guard_box.h (DSO-safe madvise wrapper)
- ss_os_acquire_box.h (stats counters)
Testing
- A/B test: Same binary, ENV toggle (HAKMEM_SS_MEM_LEAN=0 vs HAKMEM_SS_MEM_LEAN=1)
- Baseline: Phase 48 rebase (FAST mode, lean disabled)
- Treatment: Memory-Lean mode (lean enabled)
- Metrics: RSS/throughput/syscalls/drift (5-30 min soak tests)
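For comparing baseline and treatment runs, the A/B metrics reduce to a few ratios. The helpers below are a sketch under assumptions: the names are hypothetical, and drift is taken as the relative change between the first and last RSS sample, which may differ from how the project's soak scripts define it.

```c
#include <stddef.h>
#include <stdint.h>

/* Relative change of treatment vs. baseline, in percent.
 * Used for throughput deltas and for RSS drift (first vs. last sample). */
static double percent_change(double baseline, double treatment) {
    if (baseline == 0.0) return 0.0; /* avoid division by zero */
    return (treatment - baseline) / baseline * 100.0;
}

/* Syscall budget: syscalls issued per allocator operation. */
static double syscalls_per_op(uint64_t syscalls, uint64_t ops) {
    if (ops == 0) return 0.0;
    return (double)syscalls / (double)ops;
}

/* Largest sample in a series, e.g. peak RSS in MB over a soak run. */
static double peak_value(const double *samples, size_t n) {
    double peak = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] > peak) peak = samples[i];
    }
    return peak;
}
```

For example, going from 60M ops/s to 54M ops/s is a change of about -10%, the lower edge of the range listed in the trade-off table.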
Box Theory Compliance
- ✅ Single conversion point: All decommit operations flow through ss_maybe_decommit_superslab()
- ✅ Clear boundaries: ENV gate, release policy box, OS box (3 layers)
- ✅ Reversible: ENV toggle (A/B testing)
- ✅ Minimal visualization: Stats counters only (no new debug logs)
- ✅ Safety-first: DSO guard, fail-fast rules, magic number protection
License
MIT
Date
2025-12-17