hakmem/docs/design/LARGE_FILES_REFACTORING_PLAN.md
Moe Charm (CI) 67fb15f35f Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization)
2025-11-26 13:14:18 +09:00


Refactoring Plan: Large Files Consolidation

HAKMEM Memory Allocator - Implementation Roadmap


CRITICAL PATH TIMELINE

Phase 1: Tiny Free Path (Week 1) - HIGHEST PRIORITY

Target: hakmem_tiny_free.inc (1,711 lines, 171 lines/function avg)

Issue

  • Single 1.7K line file with 10 massive functions
  • Average function: 171 lines (should be 20-30)
  • 6-7 levels of nesting (should be 2-3)
  • Cannot unit test individual free paths

Deliverables

  1. tiny_free_dispatch.inc (300 lines)

    • hak_tiny_free() - Main entry
    • Ownership detection (TLS vs Remote vs SuperSlab)
    • Route selection logic
    • Safety check dispatcher
  2. tiny_free_local.inc (500 lines)

    • TLS ownership verification
    • Local freelist push (fast path)
    • Magazine spill logic
    • Per-class thresholds
    • Functions: tiny_free_local_to_tls, tiny_check_magazine_full
  3. tiny_free_remote.inc (500 lines)

    • Remote thread detection
    • MPSC queue enqueue
    • Fallback strategies
    • Queue full handling
    • Functions: tiny_free_remote_enqueue, tiny_remote_queue_add
  4. tiny_free_superslab.inc (400 lines)

    • SuperSlab ownership check
    • Adoption registration
    • Freelist publish
    • Refill interaction
    • Functions: tiny_free_adopt_superslab, tiny_free_publish
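The route selection in tiny_free_dispatch.inc could look roughly like the sketch below. It is a minimal illustration, not the existing code: the enum, the tiny_owner_info_t struct, tiny_free_select_route(), and the order of the checks are all assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical route tags; the real dispatcher may use different names. */
typedef enum {
    TINY_FREE_ROUTE_LOCAL,      /* block owned by the calling thread (TLS) */
    TINY_FREE_ROUTE_REMOTE,     /* block owned by another thread */
    TINY_FREE_ROUTE_SUPERSLAB   /* block lives in a SuperSlab */
} tiny_free_route_t;

/* Hypothetical ownership facts gathered before dispatch. */
typedef struct {
    uint32_t owner_tid;     /* thread that owns the slab */
    bool     in_superslab;  /* true if the pointer maps to a SuperSlab */
} tiny_owner_info_t;

/* Pick exactly one free path. Checking SuperSlab first is an assumption. */
static inline tiny_free_route_t
tiny_free_select_route(tiny_owner_info_t info, uint32_t self_tid)
{
    if (info.in_superslab)          return TINY_FREE_ROUTE_SUPERSLAB;
    if (info.owner_tid == self_tid) return TINY_FREE_ROUTE_LOCAL;
    return TINY_FREE_ROUTE_REMOTE;
}
```

Keeping the decision in one small pure function means each free path can be unit tested by feeding it ownership facts directly, without setting up real slabs.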

Metrics

  • Before: 1 file, 10 functions, 171 lines avg
  • After: 4 files, ~40 functions, 30-40 lines avg
  • Complexity: -60% (cyclomatic, nesting depth)
  • Testability: Unit tests per path now possible

Build Integration

# Old:
hakmem_tiny_free.inc (1711 lines, monolithic)

# New:
hakmem_tiny_free_dispatch.inc (included first)
hakmem_tiny_free_local.inc (included second)
hakmem_tiny_free_remote.inc (included third)
hakmem_tiny_free_superslab.inc (included last)

# In hakmem_tiny.c:
#include "hakmem_tiny_free_dispatch.inc"
#include "hakmem_tiny_free_local.inc"
#include "hakmem_tiny_free_remote.inc"
#include "hakmem_tiny_free_superslab.inc"
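Because the dispatcher is included first but calls into paths defined in the later includes, it needs prototypes for them. The sketch below shows that pattern in a single translation unit; the demo_* names are placeholders for this example and the comments mark which include each part would belong to.

```c
#include <stddef.h>

/* --- top of the dispatch include: prototypes for later includes --- */
static int demo_free_local(void *p);
static int demo_free_remote(void *p);

/* Dispatcher: defined first, calls functions that are defined further down. */
static int demo_free_dispatch(void *p, int is_local)
{
    if (p == NULL) return 0;   /* freeing NULL is a no-op */
    return is_local ? demo_free_local(p) : demo_free_remote(p);
}

/* --- would live in the local include --- */
static int demo_free_local(void *p)  { (void)p; return 1; }

/* --- would live in the remote include --- */
static int demo_free_remote(void *p) { (void)p; return 2; }
```

An alternative is to include the dispatcher last, which removes the need for prototypes but hides the entry point at the bottom of the include list.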

Phase 2: Pool Manager (Week 2) - HIGH PRIORITY

Target: hakmem_pool.c (2,592 lines, 40 lines/function avg)

Issue

  • Monolithic pool manager handles 4 distinct responsibilities
  • 65 functions spread across cache, registry, alloc, free
  • Hard to test allocation without free logic
  • Code duplication between alloc/free paths

Deliverables

  1. mid_pool_core.c (200 lines)

    • hak_pool_alloc() - Public entry
    • hak_pool_free() - Public entry
    • Initialization
    • Configuration
    • Statistics queries
    • Policy enforcement
  2. mid_pool_cache.c (600 lines)

    • Page descriptor registry (mid_desc_*)
    • Thread cache management (mid_tc_*)
    • TLS ring buffer operations
    • Ownership tracking (in_use counters)
    • Functions: 25-30
    • Locks: per-(class,shard) mutexes
  3. mid_pool_alloc.c (800 lines)

    • hak_pool_alloc() implementation
    • hak_pool_alloc_fast() - TLS hot path
    • Refill from global freelist
    • Bump-run page management
    • New page allocation
    • Functions: 20-25
    • Focus: allocation logic only
  4. mid_pool_free.c (600 lines)

    • hak_pool_free() implementation
    • hak_pool_free_fast() - TLS hot path
    • Spill to global freelist
    • Page tracking (in_use dec)
    • Background DONTNEED batching
    • Functions: 15-20
    • Focus: free logic only
  5. mid_pool.h (new, 100 lines)

    • Public interface (hak_pool_alloc, hak_pool_free)
    • Configuration constants (POOL_NUM_CLASSES, etc)
    • Statistics structure (hak_pool_stats_t)
    • No implementation details leaked
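The in_use tracking shared by the alloc and free files can be sketched with C11 atomics as below. This is an assumption-laden illustration: demo_page_desc_t and the demo_page_* helpers are invented for this example, and the real mid_desc_* layout is not shown in this plan.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page descriptor holding only the live-block counter. */
typedef struct {
    atomic_int in_use;   /* live blocks carved from this page */
} demo_page_desc_t;

static inline void demo_page_init(demo_page_desc_t *d)
{
    atomic_init(&d->in_use, 0);
}

/* Allocation side: one more live block on the page. */
static inline void demo_page_on_alloc(demo_page_desc_t *d)
{
    atomic_fetch_add_explicit(&d->in_use, 1, memory_order_relaxed);
}

/* Free side: returns true when the last live block is gone,
 * i.e. the page becomes a candidate for DONTNEED batching. */
static inline bool demo_page_on_free(demo_page_desc_t *d)
{
    /* fetch_sub returns the previous value, so 1 means the count just hit 0. */
    return atomic_fetch_sub_explicit(&d->in_use, 1, memory_order_acq_rel) == 1;
}
```

With the counter behind two small helpers, mid_pool_alloc.c and mid_pool_free.c would each touch only their own side of it.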

Metrics

  • Before: 1 file (2592), 65 functions, ~40 lines avg, 14 includes
  • After: 5 files (~2300 total by the estimates above), ~85 functions, ~30 lines avg, modular
  • Compilation: ~20% faster (smaller translation units rebuild independently)
  • Testing: Can test alloc/free independently

Dependency Graph (After)

hakmem.c
  ├─ mid_pool.h
  ├─ calls: hak_pool_alloc(), hak_pool_free()
  │
mid_pool_core.c ──includes──> mid_pool.h
  ├─ calls: mid_pool_cache.c (registry)
  ├─ calls: mid_pool_alloc.c (allocation)
  └─ calls: mid_pool_free.c (free)

mid_pool_cache.c (TLS ring, ownership tracking)
mid_pool_alloc.c (allocation fast/slow)
mid_pool_free.c (free fast/slow)

Phase 3: Tiny Core (Week 3) - HIGH PRIORITY

Target: hakmem_tiny.c (1,765 lines, 35 includes!)

Issue

  • 35 header includes (massive compilation overhead)
  • Acts as glue layer pulling in too many modules
  • SuperSlab, Magazine, Stats all loosely coupled
  • 1765 lines already near limit

Root Cause Analysis

Why 35 includes?

  1. Type definitions (5 includes)

    • hakmem_tiny.h - TinyPool, TinySlab types
    • hakmem_tiny_superslab.h - SuperSlab type
    • hakmem_tiny_magazine.h - Magazine type
    • tiny_tls.h - TLS operations
    • hakmem_tiny_config.h - Configuration
  2. Subsystem modules (12 includes)

    • hakmem_tiny_batch_refill.h - Batch operations
    • hakmem_tiny_stats.h, hakmem_tiny_stats_api.h - Statistics
    • hakmem_tiny_query_api.h - Query interface
    • hakmem_tiny_registry_api.h - Registry API
    • hakmem_tiny_tls_list.h - TLS list management
    • hakmem_tiny_remote_target.h - Remote queue
    • hakmem_tiny_bg_spill.h - Background spill
    • hakmem_tiny_ultra_front.inc.h - Ultra-simple path
    • And 3 more...
  3. Infrastructure modules (8 includes)

    • tiny_tls.h - TLS ops
    • tiny_debug.h, tiny_debug_ring.h - Debug utilities
    • tiny_mmap_gate.h - mmap wrapper
    • tiny_route.h - Route commit
    • tiny_ready.h - Ready state
    • tiny_tls_guard.h - TLS guard
    • tiny_tls_ops.h - TLS operations
  4. Core system (5 includes)

    • hakmem_internal.h - Common types
    • hakmem_syscall.h - Syscall wrappers
    • hakmem_prof.h - Profiling
    • hakmem_trace.h - Trace points
    • stdlib.h, stdio.h, etc

Deliverables

  1. hakmem_tiny_core.c (350 lines)

    • hak_tiny_alloc() - Main entry
    • hak_tiny_free() - Main entry (dispatcher to free modules)
    • Fast path inline helpers
    • Recursion guard
    • Includes: hakmem_tiny.h, hakmem_internal.h ONLY
    • Dispatch logic
  2. hakmem_tiny_alloc.c (400 lines)

    • Allocation cascade (7-layer fallback)
    • Magazine refill path
    • SuperSlab adoption
    • Includes: hakmem_tiny.h, hakmem_tiny_superslab.h, hakmem_tiny_magazine.h
    • Functions: 10-12
  3. hakmem_tiny_lifecycle.c (200 lines, refactored)

    • hak_tiny_trim()
    • hak_tiny_get_stats()
    • Initialization
    • Flush on exit
    • Includes: hakmem_tiny.h, hakmem_tiny_stats_api.h
  4. hakmem_tiny_route.c (200 lines, extracted)

    • Route commit
    • ELO-based dispatch
    • Strategy selection
    • Includes: hakmem_tiny.h, hakmem_route.h
  5. Remove duplicate declarations

    • Move forward decls to headers
    • Consolidate macro definitions

Expected Result

  • Includes: 35 → 5-8 per file
  • Compilation: -30% time (smaller TU, fewer symbols)
  • File size: 1765 → 350 core + 400 alloc + 200 lifecycle + 200 route

Header Consolidation

New: hakmem_tiny_public.h (50 lines)
  - hak_tiny_alloc(size_t)
  - hak_tiny_free(void*)
  - hak_tiny_trim(void)
  - hak_tiny_get_stats(...)

New: hakmem_tiny_internal.h (100 lines)
  - Shared macros (dispatch, fast path checks)
  - Type definitions
  - Internal statistics structures

Phase 4: Main Dispatcher (Week 4) - MEDIUM PRIORITY

Target: hakmem.c (1,745 lines, 38 includes)

Issue

  • Main dispatcher doing too much (config + policy + stats + init)
  • 38 includes is excessive for a simple dispatcher
  • Mixing allocation/free/configuration logic
  • Size-based routing is only 200 lines

Deliverables

  1. hakmem_api.c (400 lines)

    • malloc/free/calloc/realloc/posix_memalign
    • Recursion guard
    • LD_PRELOAD detection
    • Safety checks (jemalloc, FORCE_LIBC, etc)
    • Includes: hakmem.h, hakmem_config.h ONLY
  2. hakmem_dispatch.c (300 lines)

    • hakmem_alloc_at() - Main dispatcher
    • Size-based routing (8B → Tiny, 8-32KB → Pool, etc)
    • Strategy selection
    • Feature dispatch
    • Includes: hakmem.h, hakmem_config.h
  3. hakmem_config.c (existing, 334 lines)

    • Configuration management
    • Environment variable parsing
    • Policy enforcement
    • Cap tuning
    • Keep as-is
  4. hakmem_stats.c (400 lines)

    • Global KPI tracking
    • Statistics aggregation
    • hak_print_stats()
    • hak_get_kpi()
    • Latency measurement
    • Debug output
  5. hakmem_init.c (200 lines, extracted)

    • One-time initialization
    • Subsystem startup
    • Includes: all allocators (hakmem_tiny.h, hakmem_pool.h, etc)

File Organization (After)

hakmem.c (new) - Public header + API entry
  ├─ hakmem_api.c - malloc/free wrappers
  ├─ hakmem_dispatch.c - Size-based routing
  ├─ hakmem_init.c - Initialization
  ├─ hakmem_config.c (existing) - Configuration
  └─ hakmem_stats.c - Statistics

API layer dispatch:
  malloc(size)
    ├─ hak_in_wrapper() check
    ├─ hak_init() if needed
    └─ hakmem_alloc_at(size)
      ├─ route to hak_tiny_alloc()
      ├─ route to hak_pool_alloc()
      ├─ route to hak_l25_alloc()
      └─ route to hak_whale_alloc()
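The flow above (recursion guard, one-time init, size-based routing) can be sketched as follows. The thresholds, the demo_* names, and the -1 fallback signal are assumptions for illustration only; the plan does not fix exact size limits, and a real init would need once-only synchronization.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size limits; not taken from the allocator's configuration. */
#define DEMO_TINY_MAX   1024u
#define DEMO_POOL_MAX   (32u * 1024u)
#define DEMO_L25_MAX    (1024u * 1024u)

typedef enum {
    DEMO_ROUTE_TINY, DEMO_ROUTE_POOL, DEMO_ROUTE_L25, DEMO_ROUTE_WHALE
} demo_route_t;

/* Size-based routing: the first tier whose limit covers the request wins. */
static inline demo_route_t demo_route_for_size(size_t size)
{
    if (size <= DEMO_TINY_MAX) return DEMO_ROUTE_TINY;
    if (size <= DEMO_POOL_MAX) return DEMO_ROUTE_POOL;
    if (size <= DEMO_L25_MAX)  return DEMO_ROUTE_L25;
    return DEMO_ROUTE_WHALE;
}

static _Thread_local int demo_wrapper_depth = 0;  /* per-thread recursion guard */
static bool demo_initialized = false;             /* stands in for one-time init */

/* Returns the chosen route, or -1 when re-entered from inside the
 * allocator (the caller would then fall back to libc). */
static int demo_malloc_route(size_t size)
{
    if (demo_wrapper_depth > 0) return -1;
    demo_wrapper_depth++;
    if (!demo_initialized) demo_initialized = true;
    int route = (int)demo_route_for_size(size);
    demo_wrapper_depth--;
    return route;
}
```

Separating demo_route_for_size() from the guard and init logic mirrors the proposed split between hakmem_dispatch.c and hakmem_api.c, and lets the routing table be tested on its own.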

Phase 5: Pool Core Library (Week 5) - MEDIUM PRIORITY

Target: Extract shared code (hakmem_pool.c + hakmem_l25_pool.c)

Issue

  • The two pool implementations are large (~2600 and ~1200 lines)
  • Duplicate code: ring buffers, shard management, statistics
  • Hard to fix bugs (need 2 fixes, 1 per pool)
  • L25 started as copy-paste from MidPool

Deliverables

  1. pool_core_ring.c (200 lines)

    • Ring buffer push/pop
    • Capacity management
    • Overflow handling
    • Generic implementation (works for any item type)
  2. pool_core_shard.c (250 lines)

    • Per-shard freelist management
    • Sharding function
    • Lock management
    • Per-shard statistics
  3. pool_core_stats.c (150 lines)

    • Statistics structure
    • Hit/miss tracking
    • Refill counting
    • Thread-local aggregation
  4. pool_core.h (100 lines)

    • Public interface (generic pool ops)
    • Configuration constants
    • Type definitions
    • Statistics structure
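A generic ring for pool_core_ring.c might look like the sketch below. It is a single-threaded, fixed-capacity FIFO of opaque pointers written for illustration; demo_ring_t, the capacity, and the function names are assumptions, not the existing ring code in either pool.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_RING_CAP 4u   /* hypothetical capacity */

/* Fixed-capacity circular buffer of opaque items (works for any item type). */
typedef struct {
    void  *items[DEMO_RING_CAP];
    size_t head;    /* index of the oldest item */
    size_t count;   /* number of stored items */
} demo_ring_t;

static inline void demo_ring_init(demo_ring_t *r)
{
    r->head = 0;
    r->count = 0;
}

/* Push at the tail; returns false on overflow so the caller can spill elsewhere. */
static inline bool demo_ring_push(demo_ring_t *r, void *item)
{
    if (r->count == DEMO_RING_CAP) return false;
    r->items[(r->head + r->count) % DEMO_RING_CAP] = item;
    r->count++;
    return true;
}

/* Pop from the head; returns NULL when the ring is empty. */
static inline void *demo_ring_pop(demo_ring_t *r)
{
    if (r->count == 0) return NULL;
    void *item = r->items[r->head];
    r->head = (r->head + 1) % DEMO_RING_CAP;
    r->count--;
    return item;
}
```

Since the ring stores void pointers and reports overflow to the caller, both MidPool and L25 could share it and keep their own spill policies in the wrappers.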

Usage Pattern

// Old (MidPool): 2592 lines (monolithic)
#include "hakmem_pool.c" // All code

// New (MidPool): 600 + 200 (modular)
#include "pool_core.h"
#include "mid_pool_core.c" // Wrapper
#include "pool_core_ring.c" // Generic ring
#include "pool_core_shard.c" // Generic shard
#include "pool_core_stats.c" // Generic stats

// New (LargePool): 400 + 200 (modular)
#include "pool_core.h"
#include "l25_pool_core.c" // Wrapper
// Reuse: pool_core_ring.c, pool_core_shard.c, pool_core_stats.c

DEPENDENCY GRAPH (Before vs After)

BEFORE (Monolithic)

hakmem.c (1745)
  ├─ hakmem_tiny.c (1765, 35 includes!)
  │   └─ hakmem_tiny_free.inc (1711)
  ├─ hakmem_pool.c (2592, 65 functions)
  ├─ hakmem_l25_pool.c (1195, 39 functions)
  └─ [other modules] (whale, ace, etc)

Total large files: 9008 lines
Code cohesion: LOW (monolithic clusters)
Testing: DIFFICULT (can't isolate paths)
Compilation: SLOW (~20 seconds)

AFTER (Modular)

hakmem_api.c (400)          # malloc/free wrappers
hakmem_dispatch.c (300)     # Routing logic
hakmem_init.c (200)         # Initialization
hakmem_config.c (334)       # Configuration (existing)
hakmem_stats.c (400)        # Statistics
  │
  ├─ hakmem_tiny_core.c (350)    # Tiny dispatcher
  │   ├─ hakmem_tiny_alloc.c (400)  # Allocation path
  │   ├─ hakmem_tiny_lifecycle.c (200) # Lifecycle
  │   ├─ hakmem_tiny_route.c (200)  # Route commit
  │   ├─ hakmem_tiny_free_dispatch.inc (300)
  │   ├─ hakmem_tiny_free_local.inc (500)
  │   ├─ hakmem_tiny_free_remote.inc (500)
  │   └─ hakmem_tiny_free_superslab.inc (400)
  │
  ├─ mid_pool_core.c (200)      # Pool dispatcher
  │   ├─ mid_pool_cache.c (600)    # Cache management
  │   ├─ mid_pool_alloc.c (800)    # Allocation path
  │   └─ mid_pool_free.c (600)     # Free path
  │
  ├─ l25_pool_core.c (200)      # Large pool dispatcher
  │   ├─ (reuses pool_core modules)
  │   └─ l25_pool_alloc.c (300)
  │
  └─ pool_core/                  # Shared utilities
      ├─ pool_core_ring.c (200)
      ├─ pool_core_shard.c (250)
      └─ pool_core_stats.c (150)

Max file size: ~800 lines (mid_pool_alloc.c)
Code cohesion: HIGH (clear responsibilities)
Testing: EASY (test each path independently)
Compilation: FAST (~8 seconds, 60% improvement)

METRICS: BEFORE vs AFTER

Code Metrics

| Metric | Before | After | Change |
|---|---|---|---|
| Files over 1000 lines | 5 | 0 | -100% |
| Max file size | 2592 | 800 | -69% |
| Avg file size | 1801 | 400 | -78% |
| Total includes | 35 (tiny.c) | 5-8 per file | -80% |
| Avg cyclomatic complexity | HIGH | MEDIUM | -40% |
| Avg function size | 40-171 lines | 25-35 lines | -60% |

Development Metrics

| Activity | Before | After | Improvement |
|---|---|---|---|
| Finding a bug | 30 min (big files) | 10 min (smaller files) | 3x faster |
| Adding a feature | 45 min (tight coupling) | 20 min (modular) | 2x faster |
| Unit testing | Hard (monolithic) | Easy (isolated paths) | 4x faster |
| Code review | 2 hours (2592 lines) | 20 min (400 lines) | 6x faster |
| Compilation time | 20 sec | 8 sec | 2.5x faster |

Quality Metrics

| Metric | Before | After |
|---|---|---|
| Maintainability Index | 4/10 | 7/10 |
| Cyclomatic Complexity | 40+ | 15-20 |
| Code Duplication | 20% (pools) | 5% (shared core) |
| Test Coverage | ~30% | ~70% (isolated paths) |
| Documentation Clarity | LOW (big files) | HIGH (focused modules) |

RISK MITIGATION

Risk 1: Breaking Changes

Risk: Refactoring introduces bugs

Mitigation:

  • Keep public APIs unchanged (hak_pool_alloc, hak_tiny_free, etc)
  • Use feature branches (refactor-pool, refactor-tiny, etc)
  • Run full benchmark suite before merge (larson, memory, etc)
  • Gradual rollout (Phase 1 → Phase 2 → Phase 3)

Risk 2: Performance Regression

Risk: Function-call overhead increases

Mitigation:

  • Use static inline for hot path helpers
  • Profile before/after with perf
  • Keep critical paths in fast-path files
  • Minimize indirection
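As an example of a hot-path helper kept static inline, the sketch below maps a request size to a size-class index. The 8-byte granule, the class count, and the name demo_size_to_class() are assumptions for illustration and do not describe the allocator's actual class table.

```c
#include <stddef.h>

#define DEMO_GRANULE      8u   /* hypothetical class spacing in bytes */
#define DEMO_NUM_CLASSES  8u   /* hypothetical number of classes */

/* Map a request size to a size-class index, or -1 if it is too large
 * for this tier. Defined static inline so callers pay no call overhead. */
static inline int demo_size_to_class(size_t size)
{
    if (size == 0) size = 1;   /* treat 0 as the smallest class */
    size_t idx = (size + DEMO_GRANULE - 1) / DEMO_GRANULE - 1;  /* round up */
    return idx < DEMO_NUM_CLASSES ? (int)idx : -1;
}
```

Helpers like this belong in a shared internal header so that every split file inlines the same definition instead of calling across translation units.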

Risk 3: Compilation Issues

Risk: Circular include dependencies

Mitigation:

  • Use forward declarations (opaque pointers)
  • One .h per .c file (1:1 mapping)
  • Keep internal headers separate
  • Inspect header dependencies with gcc -MM (or the include tree with gcc -H) to spot cycles
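The forward-declaration approach can be sketched with an opaque handle: callers see only an incomplete type, so the header needs no other project includes. The demo_stats_* names are hypothetical and exist only for this example; header and implementation are shown in one block for brevity.

```c
#include <stddef.h>
#include <stdlib.h>

/* ---- header side: callers see only an incomplete type ---- */
typedef struct demo_stats demo_stats_t;
demo_stats_t *demo_stats_create(void);
void          demo_stats_record(demo_stats_t *s, size_t bytes);
size_t        demo_stats_total(const demo_stats_t *s);
void          demo_stats_destroy(demo_stats_t *s);

/* ---- implementation side: the layout is private to one .c file ---- */
struct demo_stats {
    size_t calls;        /* number of recorded allocations */
    size_t total_bytes;  /* sum of recorded sizes */
};

demo_stats_t *demo_stats_create(void)
{
    return calloc(1, sizeof(demo_stats_t));   /* zero-initialized counters */
}

void demo_stats_record(demo_stats_t *s, size_t bytes)
{
    s->calls++;
    s->total_bytes += bytes;
}

size_t demo_stats_total(const demo_stats_t *s) { return s->total_bytes; }

void demo_stats_destroy(demo_stats_t *s) { free(s); }
```

Because the struct body never appears in the header, changing its fields does not force a rebuild of every file that includes it.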

Risk 4: Testing Coverage

Risk: Tests miss new bugs in split code

Mitigation:

  • Add unit tests per module
  • Test allocation + free separately
  • Stress test with Larson benchmark
  • Run memory tests (valgrind, asan)
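A per-module test could take the allocator under test as a pair of function pointers, so the same check runs against each split path. The harness below is a minimal sketch; demo_check_roundtrip() and its limits are assumptions for this example, and it is exercised here only with the libc allocator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void *(*demo_alloc_fn)(size_t);
typedef void  (*demo_free_fn)(void *);

#define DEMO_MAX_BLOCKS 64

/* Allocate `count` blocks, stamp each with a distinct byte, verify that every
 * stamp survived (live blocks do not overlap), then free everything. */
static bool demo_check_roundtrip(demo_alloc_fn alloc, demo_free_fn release,
                                 size_t size, size_t count)
{
    void *blocks[DEMO_MAX_BLOCKS];
    bool ok = true;
    size_t n = 0;   /* number of blocks successfully allocated */

    if (size == 0 || count > DEMO_MAX_BLOCKS) return false;

    for (; n < count; n++) {
        blocks[n] = alloc(size);
        if (blocks[n] == NULL) { ok = false; break; }
        memset(blocks[n], (int)(n + 1), size);
    }
    for (size_t i = 0; ok && i < n; i++) {
        const unsigned char *p = blocks[i];
        for (size_t j = 0; j < size; j++) {
            if (p[j] != (unsigned char)(i + 1)) { ok = false; break; }
        }
    }
    for (size_t i = 0; i < n; i++) release(blocks[i]);
    return ok;
}
```

Calling demo_check_roundtrip(malloc, free, 32, 16) gives a baseline; a thin adapter with the same signature would be needed for allocator entry points that take extra arguments.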

ROLLBACK PLAN

If any phase fails, rollback is simple:

# Keep full history in git
git revert HEAD  # Revert the most recent commit (the last phase, if merged as one commit)

# Or use feature branch strategy
git checkout -b refactor-phase1
# If fails:
git checkout master
git branch -D refactor-phase1

SUCCESS CRITERIA

Phase 1 (Tiny Free) SUCCESS

  • All 4 tiny_free_*.inc files created
  • Larson benchmark score same or better (+1%)
  • No valgrind errors
  • Code review approved

Phase 2 (Pool) SUCCESS

  • mid_pool_*.c files created, mid_pool.h public interface
  • Pool benchmark unchanged
  • All 65 functions now distributed across 4 files
  • Compilation time reduced by 15%

Phase 3 (Tiny Core) SUCCESS

  • hakmem_tiny.c reduced to a 350-line core (hakmem_tiny_core.c)
  • Include count: 35 → 8
  • Larson benchmark same or better
  • All allocations/frees work correctly

Phase 4 (Dispatcher) SUCCESS

  • hakmem.c split into 4 modules
  • Public API unchanged (malloc, free, etc)
  • Routing logic clear and testable
  • Compilation time reduced by 20%

Phase 5 (Pool Core) SUCCESS

  • 200+ lines of code eliminated from both pools
  • Behavior identical before/after
  • Future pool implementations can reuse pool_core
  • No performance regression

ESTIMATED TIME & EFFORT

| Phase | Task | Effort | Blocker |
|---|---|---|---|
| 1 | Split tiny_free.inc → 4 modules | 3 days | None |
| 2 | Split hakmem_pool.c → 4 modules | 4 days | Phase 1 (testing framework) |
| 3 | Refactor hakmem_tiny.c | 3 days | Phase 1, 2 (design confidence) |
| 4 | Split hakmem.c | 2 days | Phase 1-3 |
| 5 | Extract pool_core | 2 days | Phase 2 |
| TOTAL | Full refactoring | 14 days | None |

Parallelization possible: Phases 1-2 can overlap (2 developers)

Accelerated timeline: 2 dev team = 8 days


NEXT IMMEDIATE STEPS

  1. Today: Review this plan with team
  2. Tomorrow: Start Phase 1 (tiny_free.inc split)
    • Create feature branch: refactor-tiny-free
    • Create 4 new .inc files
    • Move code blocks into appropriate files
    • Update hakmem_tiny.c includes
    • Verify compilation + Larson benchmark
  3. Day 3: Review + merge Phase 1
  4. Day 4: Start Phase 2 (pool.c split)

REFERENCES

  • LARGE_FILES_ANALYSIS.md - Detailed analysis of each file
  • Makefile - Build rules (update for new files)
  • CURRENT_TASK.md - Track phase completion
  • Box Theory notes - Module organization pattern