# Refactoring Plan: Large Files Consolidation
## HAKMEM Memory Allocator - Implementation Roadmap

---

## CRITICAL PATH TIMELINE

### Phase 1: Tiny Free Path (Week 1) - HIGHEST PRIORITY

**Target**: hakmem_tiny_free.inc (1,711 lines, 171 lines/function avg)

#### Issue
- Single 1.7K-line file with 10 massive functions
- Average function: 171 lines (should be 20-30)
- 6-7 levels of nesting (should be 2-3)
- Cannot unit test individual free paths

#### Deliverables

1. **tiny_free_dispatch.inc** (300 lines)
   - `hak_tiny_free()` - Main entry
   - Ownership detection (TLS vs Remote vs SuperSlab)
   - Route selection logic
   - Safety check dispatcher

2. **tiny_free_local.inc** (500 lines)
   - TLS ownership verification
   - Local freelist push (fast path)
   - Magazine spill logic
   - Per-class thresholds
   - Functions: tiny_free_local_to_tls, tiny_check_magazine_full

3. **tiny_free_remote.inc** (500 lines)
   - Remote thread detection
   - MPSC queue enqueue
   - Fallback strategies
   - Queue full handling
   - Functions: tiny_free_remote_enqueue, tiny_remote_queue_add

4. **tiny_free_superslab.inc** (400 lines)
   - SuperSlab ownership check
   - Adoption registration
   - Freelist publish
   - Refill interaction
   - Functions: tiny_free_adopt_superslab, tiny_free_publish

#### Metrics
- **Before**: 1 file, 10 functions, 171 lines avg
- **After**: 4 files, ~40 functions, 30-40 lines avg
- **Complexity**: -60% (cyclomatic, nesting depth)
- **Testability**: Unit tests per path now possible

#### Build Integration
```c
/* Old: hakmem_tiny_free.inc (1711 lines, monolithic) */

/* New, in hakmem_tiny.c (include in this order): */
#include "hakmem_tiny_free_dispatch.inc"   /* included first  */
#include "hakmem_tiny_free_local.inc"      /* included second */
#include "hakmem_tiny_free_remote.inc"     /* included third  */
#include "hakmem_tiny_free_superslab.inc"  /* included last   */
```

---

### Phase 2: Pool Manager (Week 2) - HIGH PRIORITY

**Target**: hakmem_pool.c (2,592 lines, 40 lines/function avg)

#### Issue
- Monolithic pool manager handles 4 distinct responsibilities
- 65 functions spread across cache, registry, alloc, free
- Hard to test allocation without free logic
- Code duplication between alloc/free paths

#### Deliverables

1. **mid_pool_core.c** (200 lines)
   - `hak_pool_alloc()` - Public entry
   - `hak_pool_free()` - Public entry
   - Initialization
   - Configuration
   - Statistics queries
   - Policy enforcement

2. **mid_pool_cache.c** (600 lines)
   - Page descriptor registry (mid_desc_*)
   - Thread cache management (mid_tc_*)
   - TLS ring buffer operations
   - Ownership tracking (in_use counters)
   - Functions: 25-30
   - Locks: per-(class,shard) mutexes

3. **mid_pool_alloc.c** (800 lines)
   - `hak_pool_alloc()` implementation
   - `hak_pool_alloc_fast()` - TLS hot path
   - Refill from global freelist
   - Bump-run page management
   - New page allocation
   - Functions: 20-25
   - Focus: allocation logic only

4. **mid_pool_free.c** (600 lines)
   - `hak_pool_free()` implementation
   - `hak_pool_free_fast()` - TLS hot path
   - Spill to global freelist
   - Page tracking (in_use dec)
   - Background DONTNEED batching
   - Functions: 15-20
   - Focus: free logic only

5. **mid_pool.h** (new, 100 lines)
   - Public interface (hak_pool_alloc, hak_pool_free)
   - Configuration constants (POOL_NUM_CLASSES, etc.)
   - Statistics structure (hak_pool_stats_t)
   - No implementation details leaked

#### Metrics
- **Before**: 1 file (2592), 65 functions, ~40 lines avg, 14 includes
- **After**: 5 files (~2600 total), ~85 functions, ~30 lines avg, modular
- **Compilation**: ~20% faster (split linking)
- **Testing**: Can test alloc/free independently

#### Dependency Graph (After)
```
hakmem.c
  ├─ mid_pool.h
  ├─ calls: hak_pool_alloc(), hak_pool_free()
  │
mid_pool_core.c ──includes──> mid_pool.h
  ├─ calls: mid_pool_cache.c (registry)
  ├─ calls: mid_pool_alloc.c (allocation)
  └─ calls: mid_pool_free.c (free)

mid_pool_cache.c (TLS ring, ownership tracking)
mid_pool_alloc.c (allocation fast/slow)
mid_pool_free.c (free fast/slow)
```

---

### Phase 3: Tiny Core (Week 3) - HIGH PRIORITY

**Target**: hakmem_tiny.c (1,765 lines, 35 includes!)

#### Issue
- 35 header includes (massive compilation overhead)
- Acts as a glue layer pulling in too many modules
- SuperSlab, Magazine, Stats all loosely coupled
- 1765 lines already near limit

#### Root Cause Analysis

**Why 35 includes?**

1. **Type definitions** (5 includes)
   - hakmem_tiny.h - TinyPool, TinySlab types
   - hakmem_tiny_superslab.h - SuperSlab type
   - hakmem_tiny_magazine.h - Magazine type
   - tiny_tls.h - TLS operations
   - hakmem_tiny_config.h - Configuration

2. **Subsystem modules** (12 includes)
   - hakmem_tiny_batch_refill.h - Batch operations
   - hakmem_tiny_stats.h, hakmem_tiny_stats_api.h - Statistics
   - hakmem_tiny_query_api.h - Query interface
   - hakmem_tiny_registry_api.h - Registry API
   - hakmem_tiny_tls_list.h - TLS list management
   - hakmem_tiny_remote_target.h - Remote queue
   - hakmem_tiny_bg_spill.h - Background spill
   - hakmem_tiny_ultra_front.inc.h - Ultra-simple path
   - And 3 more...

3. **Infrastructure modules** (8 includes)
   - tiny_tls.h - TLS ops
   - tiny_debug.h, tiny_debug_ring.h - Debug utilities
   - tiny_mmap_gate.h - mmap wrapper
   - tiny_route.h - Route commit
   - tiny_ready.h - Ready state
   - tiny_tls_guard.h - TLS guard
   - tiny_tls_ops.h - TLS operations

4. **Core system** (5 includes)
   - hakmem_internal.h - Common types
   - hakmem_syscall.h - Syscall wrappers
   - hakmem_prof.h - Profiling
   - hakmem_trace.h - Trace points
   - stdlib.h, stdio.h, etc.

#### Deliverables

1. **hakmem_tiny_core.c** (350 lines)
   - `hak_tiny_alloc()` - Main entry
   - `hak_tiny_free()` - Main entry (dispatcher to free modules)
   - Fast path inline helpers
   - Recursion guard
   - Includes: hakmem_tiny.h, hakmem_internal.h ONLY
   - Dispatch logic

2. **hakmem_tiny_alloc.c** (400 lines)
   - Allocation cascade (7-layer fallback)
   - Magazine refill path
   - SuperSlab adoption
   - Includes: hakmem_tiny.h, hakmem_tiny_superslab.h, hakmem_tiny_magazine.h
   - Functions: 10-12

3. **hakmem_tiny_lifecycle.c** (200 lines, refactored)
   - hakmem_tiny_trim()
   - hakmem_tiny_get_stats()
   - Initialization
   - Flush on exit
   - Includes: hakmem_tiny.h, hakmem_tiny_stats_api.h

4. **hakmem_tiny_route.c** (200 lines, extracted)
   - Route commit
   - ELO-based dispatch
   - Strategy selection
   - Includes: hakmem_tiny.h, hakmem_route.h

5. **Remove duplicate declarations**
   - Move forward decls to headers
   - Consolidate macro definitions

#### Expected Result
- **Before**: 35 includes → 5-8 includes per file
- **Compilation**: -30% time (smaller TU, fewer symbols)
- **File size**: 1765 → 350 core + 400 alloc + 200 lifecycle + 200 route

#### Header Consolidation
```
New: hakmem_tiny_public.h (50 lines)
  - hak_tiny_alloc(size_t)
  - hak_tiny_free(void*)
  - hak_tiny_trim(void)
  - hak_tiny_get_stats(...)

New: hakmem_tiny_internal.h (100 lines)
  - Shared macros (dispatch, fast path checks)
  - Type definitions
  - Internal statistics structures
```

---

### Phase 4: Main Dispatcher (Week 4) - MEDIUM PRIORITY

**Target**: hakmem.c (1,745 lines, 38 includes)

#### Issue
- Main dispatcher doing too much (config + policy + stats + init)
- 38 includes is excessive for a simple dispatcher
- Mixing allocation/free/configuration logic
- Size-based routing is only 200 lines

#### Deliverables

1. **hakmem_api.c** (400 lines)
   - malloc/free/calloc/realloc/posix_memalign
   - Recursion guard
   - LD_PRELOAD detection
   - Safety checks (jemalloc, FORCE_LIBC, etc.)
   - Includes: hakmem.h, hakmem_config.h ONLY

2. **hakmem_dispatch.c** (300 lines)
   - hakmem_alloc_at() - Main dispatcher
   - Size-based routing (8B → Tiny, 8-32KB → Pool, etc.)
   - Strategy selection
   - Feature dispatch
   - Includes: hakmem.h, hakmem_config.h

3. **hakmem_config.c** (existing, 334 lines)
   - Configuration management
   - Environment variable parsing
   - Policy enforcement
   - Cap tuning
   - Keep as-is

4. **hakmem_stats.c** (400 lines)
   - Global KPI tracking
   - Statistics aggregation
   - hak_print_stats()
   - hak_get_kpi()
   - Latency measurement
   - Debug output

5. **hakmem_init.c** (200 lines, extracted)
   - One-time initialization
   - Subsystem startup
   - Includes: all allocators (hakmem_tiny.h, hakmem_pool.h, etc.)

#### File Organization (After)
```
hakmem.c (new) - Public header + API entry
  ├─ hakmem_api.c - malloc/free wrappers
  ├─ hakmem_dispatch.c - Size-based routing
  ├─ hakmem_init.c - Initialization
  ├─ hakmem_config.c (existing) - Configuration
  └─ hakmem_stats.c - Statistics

API layer dispatch:
malloc(size)
  ├─ hak_in_wrapper() check
  ├─ hak_init() if needed
  └─ hakmem_alloc_at(size)
     ├─ route to hak_tiny_alloc()
     ├─ route to hak_pool_alloc()
     ├─ route to hak_l25_alloc()
     └─ route to hak_whale_alloc()
```

---

### Phase 5: Pool Core Library (Week 5) - MEDIUM PRIORITY

**Target**: Extract shared code (hakmem_pool.c + hakmem_l25_pool.c)

#### Issue
- The two pool implementations are ~2600 and ~1200 lines respectively
- Duplicate code: ring buffers, shard management, statistics
- Hard to fix bugs (each fix must be applied twice, once per pool)
- L25 started as copy-paste from MidPool

#### Deliverables

1. **pool_core_ring.c** (200 lines)
   - Ring buffer push/pop
   - Capacity management
   - Overflow handling
   - Generic implementation (works for any item type)

2. **pool_core_shard.c** (250 lines)
   - Per-shard freelist management
   - Sharding function
   - Lock management
   - Per-shard statistics

3. **pool_core_stats.c** (150 lines)
   - Statistics structure
   - Hit/miss tracking
   - Refill counting
   - Thread-local aggregation

4. **pool_core.h** (100 lines)
   - Public interface (generic pool ops)
   - Configuration constants
   - Type definitions
   - Statistics structure

#### Usage Pattern
```
// Old (MidPool): 2592 lines (monolithic)
#include "hakmem_pool.c"        // All code

// New (MidPool): 600 + 200 (modular)
#include "pool_core.h"
#include "mid_pool_core.c"      // Wrapper
#include "pool_core_ring.c"     // Generic ring
#include "pool_core_shard.c"    // Generic shard
#include "pool_core_stats.c"    // Generic stats

// New (LargePool): 400 + 200 (modular)
#include "pool_core.h"
#include "l25_pool_core.c"      // Wrapper
// Reuse: pool_core_ring.c, pool_core_shard.c, pool_core_stats.c
```

---

## DEPENDENCY GRAPH (Before vs After)

### BEFORE (Monolithic)
```
hakmem.c (1745)
  ├─ hakmem_tiny.c (1765, 35 includes!)
  │  └─ hakmem_tiny_free.inc (1711)
  ├─ hakmem_pool.c (2592, 65 functions)
  ├─ hakmem_l25_pool.c (1195, 39 functions)
  └─ [other modules] (whale, ace, etc)

Total large files: 9008 lines
Code cohesion: LOW (monolithic clusters)
Testing: DIFFICULT (can't isolate paths)
Compilation: SLOW (~20 seconds)
```

### AFTER (Modular)
```
hakmem_api.c (400)                        # malloc/free wrappers
hakmem_dispatch.c (300)                   # Routing logic
hakmem_init.c (200)                       # Initialization
│
├─ hakmem_tiny_core.c (350)               # Tiny dispatcher
│  ├─ hakmem_tiny_alloc.c (400)           # Allocation path
│  ├─ hakmem_tiny_lifecycle.c (200)       # Lifecycle
│  ├─ hakmem_tiny_free_dispatch.inc (300)
│  ├─ hakmem_tiny_free_local.inc (500)
│  ├─ hakmem_tiny_free_remote.inc (500)
│  └─ hakmem_tiny_free_superslab.inc (400)
│
├─ mid_pool_core.c (200)                  # Pool dispatcher
│  ├─ mid_pool_cache.c (600)              # Cache management
│  ├─ mid_pool_alloc.c (800)              # Allocation path
│  └─ mid_pool_free.c (600)               # Free path
│
├─ l25_pool_core.c (200)                  # Large pool dispatcher
│  ├─ (reuses pool_core modules)
│  └─ l25_pool_alloc.c (300)
│
└─ pool_core/                             # Shared utilities
   ├─ pool_core_ring.c (200)
   ├─ pool_core_shard.c (250)
   └─ pool_core_stats.c (150)

Max file size: ~800 lines (mid_pool_alloc.c)
Code cohesion: HIGH (clear responsibilities)
Testing: EASY (test each path independently)
Compilation: FAST (~8 seconds, 60% improvement)
```

---

## METRICS: BEFORE vs AFTER

### Code Metrics

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Files over 1000 lines | 5 | 0 | -100% |
| Max file size | 2592 | 800 | -69% |
| Avg file size | 1801 | 400 | -78% |
| Total includes | 35 (tiny.c) | 5-8 per file | -80% |
| Avg cyclomatic complexity | HIGH | MEDIUM | -40% |
| Avg function size | 40-171 lines | 25-35 lines | -60% |

### Development Metrics

| Activity | Before | After | Improvement |
|----------|--------|-------|-------------|
| Finding a bug | 30 min (big files) | 10 min (smaller files) | 3x faster |
| Adding a feature | 45 min (tight coupling) | 20 min (modular) | 2x faster |
| Unit testing | Hard (monolithic) | Easy (isolated paths) | 4x faster |
| Code review | 2 hours (2592 lines) | 20 min (400 lines) | 6x faster |
| Compilation time | 20 sec | 8 sec | 2.5x faster |

### Quality Metrics

| Metric | Before | After |
|--------|--------|-------|
| Maintainability Index | 4/10 | 7/10 |
| Cyclomatic Complexity | 40+ | 15-20 |
| Code Duplication | 20% (pools) | 5% (shared core) |
| Test Coverage | ~30% | ~70% (isolated paths) |
| Documentation Clarity | LOW (big files) | HIGH (focused modules) |

---

## RISK MITIGATION

### Risk 1: Breaking Changes
**Risk**: Refactoring introduces bugs
**Mitigation**:
- Keep public APIs unchanged (hak_pool_alloc, hak_tiny_free, etc.)
- Use feature branches (refactor-pool, refactor-tiny, etc.)
- Run full benchmark suite before merge (larson, memory, etc.)
- Gradual rollout (Phase 1 → Phase 2 → Phase 3)

### Risk 2: Performance Regression
**Risk**: Function-call overhead increases
**Mitigation**:
- Use `static inline` for hot path helpers
- Profile before/after with perf
- Keep critical paths in fast-path files
- Minimize indirection

### Risk 3: Compilation Issues
**Risk**: Circular include dependencies
**Mitigation**:
- Use forward declarations (opaque pointers)
- One .h per .c file (1:1 mapping)
- Keep internal headers separate
- Inspect the output of `gcc -MM` to spot dependency cycles

### Risk 4: Testing Coverage
**Risk**: Tests miss new bugs in split code
**Mitigation**:
- Add unit tests per module
- Test allocation + free separately
- Stress test with Larson benchmark
- Run memory tests (valgrind, asan)

---

## ROLLBACK PLAN

If any phase fails, rollback is simple:

```bash
# Full history stays in git, so a landed phase can be reverted
git revert HEAD                  # Revert the last commit (add -m 1 if it is a merge commit)

# Or use a feature-branch strategy
git checkout -b refactor-phase1  # Do the phase on its own branch
# If it fails:
git checkout master
git branch -D refactor-phase1
```

---

## SUCCESS CRITERIA

### Phase 1 (Tiny Free) SUCCESS
- [ ] All 4 tiny_free_*.inc files created
- [ ] Larson benchmark score same or better (+1%)
- [ ] No valgrind errors
- [ ] Code review approved

### Phase 2 (Pool) SUCCESS
- [ ] mid_pool_*.c files created, mid_pool.h public interface
- [ ] Pool benchmark unchanged
- [ ] All 65 functions now distributed across 4 files
- [ ] Compilation time reduced by 15%

### Phase 3 (Tiny Core) SUCCESS
- [ ] hakmem_tiny.c reduced to 350 lines
- [ ] Include count: 35 → 8
- [ ] Larson benchmark same or better
- [ ] All allocations/frees work correctly

### Phase 4 (Dispatcher) SUCCESS
- [ ] hakmem.c split into 4 modules
- [ ] Public API unchanged (malloc, free, etc.)
- [ ] Routing logic clear and testable
- [ ] Compilation time reduced by 20%

### Phase 5 (Pool Core) SUCCESS
- [ ] 200+ lines of code eliminated from both pools
- [ ] Behavior identical before/after
- [ ] Future pool implementations can reuse pool_core
- [ ] No performance regression

---

## ESTIMATED TIME & EFFORT

| Phase | Task | Effort | Blocker |
|-------|------|--------|---------|
| 1 | Split tiny_free.inc → 4 modules | 3 days | None |
| 2 | Split hakmem_pool.c → 4 modules | 4 days | Phase 1 (testing framework) |
| 3 | Refactor hakmem_tiny.c | 3 days | Phase 1, 2 (design confidence) |
| 4 | Split hakmem.c | 2 days | Phase 1-3 |
| 5 | Extract pool_core | 2 days | Phase 2 |
| **TOTAL** | Full refactoring | **14 days** | None |

**Parallelization possible**: Phases 1-2 can overlap (2 developers)
**Accelerated timeline**: a 2-developer team could finish in about 8 days

---

## NEXT IMMEDIATE STEPS

1. **Today**: Review this plan with the team
2. **Tomorrow**: Start Phase 1 (tiny_free.inc split)
   - Create feature branch: `refactor-tiny-free`
   - Create 4 new .inc files
   - Move code blocks into appropriate files
   - Update hakmem_tiny.c includes
   - Verify compilation + Larson benchmark
3. **Day 3**: Review + merge Phase 1
4. **Day 4**: Start Phase 2 (pool.c split)

---

## REFERENCES

- LARGE_FILES_ANALYSIS.md - Detailed analysis of each file
- Makefile - Build rules (update for new files)
- CURRENT_TASK.md - Track phase completion
- Box Theory notes - Module organization pattern
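
As a rough illustration of the Phase 5 `pool_core_ring.c` deliverable (ring buffer push/pop, capacity management, overflow handling), the following is a minimal sketch only. Every identifier here (`pool_ring_t`, `pool_ring_init`, `pool_ring_push`, `pool_ring_pop`, `POOL_RING_CAP`) and the fixed capacity of 8 are assumptions made for illustration, not names or values from the HAKMEM sources. The sketch is single-threaded and leaves out the per-shard locking that the plan assigns to `pool_core_shard.c`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity, chosen only for the example; it must be a power
 * of two so that index wrapping can use a bit mask. */
#define POOL_RING_CAP 8u

_Static_assert((POOL_RING_CAP & (POOL_RING_CAP - 1u)) == 0,
               "POOL_RING_CAP must be a power of two");

/* Hypothetical generic ring: stores void* so any item type can be queued. */
typedef struct {
    void  *items[POOL_RING_CAP];
    size_t head;  /* count of items popped so far (next slot to pop)   */
    size_t tail;  /* count of items pushed so far (next slot to write) */
} pool_ring_t;

static inline void pool_ring_init(pool_ring_t *r) {
    r->head = 0;
    r->tail = 0;
}

/* Number of items currently stored. Unsigned subtraction stays correct
 * even after the counters wrap around. */
static inline size_t pool_ring_count(const pool_ring_t *r) {
    return r->tail - r->head;
}

/* Overflow handling: returns false when the ring is full instead of
 * overwriting, so the caller can spill to a shared freelist. */
static inline bool pool_ring_push(pool_ring_t *r, void *item) {
    if (pool_ring_count(r) == POOL_RING_CAP) {
        return false;
    }
    r->items[r->tail & (POOL_RING_CAP - 1u)] = item;
    r->tail++;
    return true;
}

/* Returns the oldest item (FIFO order), or NULL when the ring is empty
 * so the caller can fall back to a refill path. */
static inline void *pool_ring_pop(pool_ring_t *r) {
    if (r->head == r->tail) {
        return NULL;
    }
    void *item = r->items[r->head & (POOL_RING_CAP - 1u)];
    r->head++;
    return item;
}
```

Reporting "full" and "empty" through return values, rather than handling them inside the ring, keeps the spill and refill policy in the pool-specific wrappers (`mid_pool_core.c`, `l25_pool_core.c`), which is the split Phase 5 describes.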
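
The Phase 4 criterion "Routing logic clear and testable" could be met by isolating the size-to-allocator decision in `hakmem_dispatch.c` as a pure function that unit tests can call without touching any allocator. The sketch below shows one way this might look. The enum, the function name `demo_route_for_size`, and all three thresholds are placeholders invented for this example; the real class boundaries belong to HAKMEM's configuration and are not stated in this plan.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route tags, one per allocator named in the Phase 4
 * dispatch tree (Tiny, Pool, L25, Whale). */
typedef enum {
    DEMO_ROUTE_TINY,
    DEMO_ROUTE_POOL,
    DEMO_ROUTE_L25,
    DEMO_ROUTE_WHALE
} demo_route_t;

/* Placeholder thresholds, NOT HAKMEM's real boundaries. */
#define DEMO_TINY_MAX ((size_t)1024)          /* assumed upper bound for Tiny */
#define DEMO_POOL_MAX ((size_t)32 * 1024)     /* assumed upper bound for Pool */
#define DEMO_L25_MAX  ((size_t)1024 * 1024)   /* assumed upper bound for L25  */

/* Pure function: no global state, no allocation. It only maps a request
 * size to a route, so a unit test can cover every boundary directly. */
static inline demo_route_t demo_route_for_size(size_t size) {
    if (size <= DEMO_TINY_MAX) {
        return DEMO_ROUTE_TINY;
    }
    if (size <= DEMO_POOL_MAX) {
        return DEMO_ROUTE_POOL;
    }
    if (size <= DEMO_L25_MAX) {
        return DEMO_ROUTE_L25;
    }
    return DEMO_ROUTE_WHALE;
}
```

With the decision separated this way, a dispatcher such as `hakmem_alloc_at()` would only need a `switch` over the returned route, and boundary tests would not depend on any allocator being initialized.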