Large Files Analysis Report (1000+ Lines)

HAKMEM Memory Allocator Codebase

Date: 2025-11-06


EXECUTIVE SUMMARY

Large Files Identified (1000+ lines)

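| Rank | File | Lines | Functions | Avg Lines/Func | Priority |
|------|------|-------|-----------|----------------|----------|
| 1 | hakmem_pool.c | 2,592 | 65 | 40 | CRITICAL |
| 2 | hakmem_tiny.c | 1,765 | 57 | 31 | CRITICAL |
| 3 | hakmem.c | 1,745 | 29 | 60 | HIGH |
| 4 | hakmem_tiny_free.inc | 1,711 | 10 | 171 | CRITICAL |
| 5 | hakmem_l25_pool.c | 1,195 | 39 | 31 | HIGH |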

Total Lines in Large Files: 9,008 / 32,175 (28% of codebase)


DETAILED ANALYSIS

1. hakmem_pool.c (2,592 lines) - L2 Hybrid Pool Implementation

Classification: Core Pool Manager | Refactoring Priority: CRITICAL

Primary Responsibilities

  • Size Classes: 2-32KB allocation (5 fixed classes + 2 dynamic)
  • TLS Caching: Ring buffer + bump-run pages (3 active pages per class)
  • Page Registry: MidPageDesc hash table (2048 buckets) for ownership tracking
  • Thread Cache: MidTC ring buffers per thread
  • Freelist Management: Per-class, per-shard global freelists
  • Background Tasks: DONTNEED batching, policy enforcement
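As a rough illustration of the page registry described above, the sketch below maps any pointer to its owning page descriptor through a chained hash table. Only the 2048-bucket count comes from this report; the type and function names (`PageDesc`, `reg_register`, `reg_lookup`), the 64 KiB page size, and the hash mixing are assumptions made for the example, and the sketch is single-threaded, whereas the real registry would need synchronization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: only the bucket count is taken from the report. */
#define REG_BUCKETS    2048u   /* bucket count quoted in the report */
#define REG_PAGE_SHIFT 16      /* assumption: 64 KiB pool pages */

typedef struct PageDesc {
    void            *page;      /* page base address, used as the key */
    int              class_idx; /* size class served by this page */
    struct PageDesc *next;      /* next descriptor in the same bucket */
} PageDesc;

static PageDesc *g_buckets[REG_BUCKETS];

/* Round a pointer down to the start of its page. */
static inline void *page_base(const void *p) {
    uintptr_t mask = ((uintptr_t)1 << REG_PAGE_SHIFT) - 1;
    return (void *)((uintptr_t)p & ~mask);
}

/* Hash the page number into a bucket index (REG_BUCKETS is a power of two). */
static inline unsigned bucket_of(const void *page) {
    uintptr_t x = (uintptr_t)page >> REG_PAGE_SHIFT;
    x ^= x >> 11;
    return (unsigned)(x & (REG_BUCKETS - 1u));
}

/* Insert a descriptor at the head of its bucket chain. */
static void reg_register(PageDesc *d) {
    assert(d != NULL && d->page == page_base(d->page));
    unsigned b = bucket_of(d->page);
    d->next = g_buckets[b];
    g_buckets[b] = d;
}

/* Find the descriptor that owns ptr, or NULL if the page is not registered. */
static PageDesc *reg_lookup(const void *ptr) {
    void *page = page_base(ptr);
    for (PageDesc *d = g_buckets[bucket_of(page)]; d != NULL; d = d->next) {
        if (d->page == page) return d;
    }
    return NULL;
}
```

A free path would call `reg_lookup(ptr)` first and fall through to another allocator tier when it returns NULL.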

Code Structure

Lines 1-45:      Header comments + config documentation (45 lines)
Lines 46-66:     Includes (14 headers)
Lines 67-200:    Internal data structures (TLS ring, page descriptors)
Lines 201-1100:  Page descriptor registry (hash, lookup, adopt)
Lines 1101-1800: Thread cache management (TLS operations)
Lines 1801-2500: Freelist operations (alloc, free, refill)
Lines 2501-2592: Public API + sizing functions (hak_pool_alloc, hak_pool_free)

Key Functions (65 total)

High-level (10):

  • hak_pool_alloc() - Main allocation entry point
  • hak_pool_free() - Main free entry point
  • hak_pool_alloc_fast() - TLS fast path
  • hak_pool_free_fast() - TLS fast path
  • hak_pool_set_cap() - Capacity tuning
  • hak_pool_get_stats() - Statistics
  • hak_pool_trim() - Memory reclamation
  • mid_desc_lookup() - Page ownership lookup
  • mid_tc_alloc_slow() - Refill from global
  • mid_tc_free_slow() - Spill to global

Hot path helpers (15):

  • mid_tc_alloc_fast() - Ring pop
  • mid_tc_free_fast() - Ring push
  • mid_desc_register() - Page ownership
  • mid_page_inuse_inc/dec() - Tracking
  • mid_batch_drain() - Background processing

Internal utilities (40):

  • Hash functions, initialization, thread local ops

Includes (14)

hakmem_pool.h, hakmem_config.h, hakmem_internal.h,
hakmem_syscall.h, hakmem_prof.h, hakmem_policy.h,
hakmem_debug.h + 7 system headers

Cross-File Dependencies

Called from (3 files):

  • hakmem.c - Main entry point, dispatches to pool
  • hakmem_ace.c - Metrics collection
  • hakmem_learner.c - Auto-tuning feedback

Called by hakmem.c to allocate:

  • 8-32KB size range
  • Mid-range allocation tier

Complexity Metrics

  • Cyclomatic Complexity: 40+ branches/loops (high)
  • Mutable State: 12+ global/thread-local variables
  • Lock Contention: per-(class,shard) mutexes (fine-grained, good)
  • Code Duplication: TLS ring buffer pattern repeated (alloc/free paths)

Refactoring Recommendations

HIGH PRIORITY - Split into 3 modules plus a thin core:

  1. mid_pool_cache.c (600 lines)

    • TLS ring buffer management
    • Page descriptor registry
    • Thread local state management
    • Functions: mid_tc_*, mid_desc_*
  2. mid_pool_alloc.c (800 lines)

    • Allocation fast/slow paths
    • Refill from global freelist
    • Bump-run page management
    • Functions: hak_pool_alloc*, mid_tc_alloc_slow, refill_*
  3. mid_pool_free.c (600 lines)

    • Free paths (fast/slow)
    • Spill to global freelist
    • Page tracking (in_use counters)
    • Functions: hak_pool_free*, mid_tc_free_slow, drain_*
  4. Keep in mid_pool_core.c (200 lines)

    • Public API (hak_pool_alloc/free)
    • Initialization
    • Statistics
    • Policy enforcement

Expected Benefits:

  • Per-module responsibility clarity
  • Easier testing of alloc vs. free paths
  • Reduced compilation time (modular linking)
  • Better code reuse with L25 pool (currently 1195 lines, similar structure)

2. hakmem_tiny.c (1,765 lines) - Tiny Pool Orchestrator

Classification: Core Allocator | Refactoring Priority: CRITICAL

Primary Responsibilities

  • Size Classes: 8-128B allocation (4 classes + overflow)
  • SuperSlab Management: Multi-slab owner tracking
  • Refill Orchestration: TLS → Magazine → SuperSlab cascading
  • Statistics: Per-class allocation/free tracking
  • Lifecycle: Initialization, trimming, flushing
  • Compatibility: Ultra-Simple, Metadata, Box-Refactor fast paths

Code Structure

Lines 1-50:      Includes (35 headers - HUGE dependency list)
Lines 51-200:    Configuration macros + debug counters
Lines 201-400:   Function declarations (forward refs)
Lines 401-1000:  Main allocation path (7 layers of fallback)
Lines 1001-1300: Free path implementations (SuperSlab + Magazine)
Lines 1301-1500: Helper functions (stats, lifecycle)
Lines 1501-1765: Include guards + module wrappers

High Dependencies

35 #include statements (unusual for a .c file):

  • hakmem_tiny.h, hakmem_tiny_config.h
  • hakmem_tiny_superslab.h, hakmem_super_registry.h
  • hakmem_tiny_magazine.h, hakmem_tiny_batch_refill.h
  • hakmem_tiny_stats.h, hakmem_tiny_stats_api.h
  • hakmem_tiny_query_api.h, hakmem_tiny_registry_api.h
  • tiny_tls.h, tiny_debug.h, tiny_mmap_gate.h
  • tiny_debug_ring.h, tiny_route.h, tiny_ready.h
  • hakmem_tiny_tls_list.h, hakmem_tiny_remote_target.h
  • hakmem_tiny_bg_spill.h + more

Problem: Acts as a "glue layer" pulling in 35 modules - indicates poor separation of concerns

Key Functions (57 total)

Top-level entry (4):

  • hak_tiny_alloc() - Main allocation
  • hak_tiny_free() - Main free
  • hak_tiny_trim() - Memory reclamation
  • hak_tiny_get_stats() - Statistics

Fast paths (8):

  • tiny_alloc_fast() - TLS pop (3-4 instructions)
  • tiny_free_fast() - TLS push (3-4 instructions)
  • superslab_tls_bump_fast() - Bump-run fast path
  • hak_tiny_alloc_ultra_simple() - Alignment-based fast path
  • hak_tiny_free_ultra_simple() - Alignment-based free

Slow paths (15):

  • tiny_slow_alloc_fast() - Magazine refill
  • tiny_alloc_superslab() - SuperSlab adoption
  • superslab_refill() - SuperSlab replenishment
  • hak_tiny_free_superslab() - SuperSlab free
  • Batch refill helpers

Helpers (30):

  • Magazine management
  • Registry lookups
  • Remote queue handling
  • Debug helpers

Includes Analysis

Problem Modules (should be in separate files):

  1. hakmem_tiny.h - Type definitions
  2. hakmem_tiny_config.h - Configuration macros
  3. hakmem_tiny_superslab.h - SuperSlab struct
  4. hakmem_tiny_magazine.h - Magazine type
  5. tiny_tls.h - TLS operations

Indicator: If hakmem_tiny.c needs 35 headers, it's coordinating too many subsystems.

Refactoring Recommendations

HIGH PRIORITY - Extract coordination layer:

The 1765 lines are organized as:

  1. Alloc path (400 lines) - 7-layer cascade
  2. Free path (400 lines) - Local/Remote/SuperSlab branches
  3. Magazine logic (300 lines) - Batch refill/spill
  4. SuperSlab glue (300 lines) - Adoption/lookup
  5. Misc helpers (365 lines) - Stats, lifecycle, debug

Recommended split:

hakmem_tiny_core.c (300 lines)
  - hak_tiny_alloc() dispatcher
  - hak_tiny_free() dispatcher
  - Fast path shortcuts (inlined)
  - Recursion guard

hakmem_tiny_alloc.c (350 lines)
  - Allocation cascade logic
  - Magazine refill path
  - SuperSlab adoption

hakmem_tiny_free.inc (already 1711 lines!)
  - Should be split into:
    * tiny_free_local.inc (500 lines)
    * tiny_free_remote.inc (500 lines)
    * tiny_free_superslab.inc (400 lines)
    * tiny_free_dispatch.inc (300 lines)

hakmem_tiny_stats.c (already 818 lines)
  - Keep separate (good design)

hakmem_tiny_superslab.c (already 821 lines)
  - Keep separate (good design)

Key Issue: The file at 1765 lines is already at the limit. The #include count (35!) suggests it should already be split.


3. hakmem.c (1,745 lines) - Main Allocator Dispatcher

Classification: API Layer | Refactoring Priority: HIGH

Primary Responsibilities

  • malloc/free interposition: Standard C malloc hooks
  • Dispatcher: Routes to Pool/Tiny/Whale/L25 based on size
  • Initialization: One-time setup, environment parsing
  • Configuration: Policy enforcement, cap tuning
  • Statistics: Global KPI tracking, debugging output

Code Structure

Lines 1-60:      Includes (38 headers)
Lines 61-200:    Configuration constants + globals
Lines 201-400:   Helper macros + initialization guards
Lines 401-600:   Feature detection (jemalloc, LD_PRELOAD)
Lines 601-1000:  Allocation dispatcher (hak_alloc_at)
Lines 1001-1300: malloc/calloc/realloc/posix_memalign wrappers
Lines 1301-1500: free wrapper
Lines 1501-1745: Shutdown + statistics + debugging

Routing Logic

malloc(size)
  ├─ size <= 128B → hak_tiny_alloc()
  ├─ 128B < size <= 32KB → hak_pool_alloc()
  ├─ 32KB < size <= 1MB → hak_l25_alloc()
  └─ size > 1MB → hak_whale_alloc() or libc_malloc
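The routing tree can be read as a pure function from request size to allocator tier. The sketch below is illustrative only: the `Route` enum, the threshold macros, and `route_for_size` are hypothetical names, and treating each upper bound as inclusive is an assumption, since the report does not say which tier owns the boundary sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tier identifiers, one per branch of the routing tree. */
typedef enum { ROUTE_TINY, ROUTE_MID, ROUTE_L25, ROUTE_WHALE } Route;

/* Thresholds from the routing tree; inclusive upper bounds are an assumption. */
#define TINY_MAX ((size_t)128)
#define MID_MAX  ((size_t)32 * 1024)
#define L25_MAX  ((size_t)1024 * 1024)

static_assert(TINY_MAX < MID_MAX && MID_MAX < L25_MAX,
              "tier thresholds must be strictly increasing");

/* Pick the tier for a request; the caller then invokes that tier's allocator. */
static Route route_for_size(size_t size) {
    if (size <= TINY_MAX) return ROUTE_TINY;   /* hak_tiny_alloc() */
    if (size <= MID_MAX)  return ROUTE_MID;    /* hak_pool_alloc() */
    if (size <= L25_MAX)  return ROUTE_L25;    /* hak_l25_alloc()  */
    return ROUTE_WHALE;                        /* hak_whale_alloc() or libc */
}
```

Keeping the size classification separate from the allocator calls makes the boundaries testable without initializing any pool.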

Key Functions (29 total)

Public API (10):

  • malloc() - Standard hook
  • free() - Standard hook
  • calloc() - Zeroed allocation
  • realloc() - Size change
  • posix_memalign() - Aligned allocation
  • hak_alloc_at() - Internal dispatcher
  • hak_free_at() - Internal free dispatcher
  • hak_init() - Initialization
  • hak_shutdown() - Cleanup
  • hak_get_kpi() - Metrics

Initialization (5):

  • Environment variable parsing
  • Feature detection (jemalloc, LD_PRELOAD)
  • One-time setup
  • Recursion guard initialization
  • Statistics initialization

Configuration (8):

  • Policy enforcement
  • Cap tuning
  • Strategy selection
  • Debug mode control

Statistics (6):

  • hak_print_stats() - Output summary
  • hak_get_kpi() - Query metrics
  • Latency measurement
  • Page fault tracking

Includes (38)

Problem areas:

  • Too many subsystem includes for a dispatcher
  • Should import via public headers only, not internals

Suggests: Dispatcher trying to manage too much state

Refactoring Recommendations

MEDIUM-HIGH PRIORITY - Extract dispatcher + config:

Split into:

  1. hakmem_api.c (400 lines)

    • malloc/free/calloc/realloc/memalign
    • Recursion guard
    • Initialization
    • LD_PRELOAD safety checks
  2. hakmem_dispatch.c (300 lines)

    • hak_alloc_at()
    • Size-based routing
    • Feature dispatch (strategy selection)
  3. hakmem_config.c (350 lines, already partially exists)

    • Configuration management
    • Environment parsing
    • Policy enforcement
  4. hakmem_stats.c (300 lines)

    • Statistics collection
    • KPI tracking
    • Debug output

Better organization:

  • hakmem.c should focus on being the dispatch frontend
  • Config management should be separate
  • Stats collection should be a module
  • Each allocator (pool, tiny, l25, whale) is responsible for its own stats

4. hakmem_tiny_free.inc (1,711 lines) - Free Path Orchestration

Classification: Core Free Path | Refactoring Priority: CRITICAL

Primary Responsibilities

  • Ownership Detection: Determine if pointer is TLS-owned
  • Local Free: Return to TLS freelist (TLS match)
  • Remote Free: Queue for owner thread (cross-thread)
  • SuperSlab Free: Adopt SuperSlab-owned blocks
  • Magazine Integration: Spill to magazine when TLS full
  • Safety Checks: Validation (debug mode only)

Code Structure

Lines 1-10:      Includes (7 headers)
Lines 11-100:    Helper functions (queue checks, validation)
Lines 101-400:   Local free path (TLS-owned)
Lines 401-700:   Remote free path (cross-thread)
Lines 701-1000:  SuperSlab free path (adoption)
Lines 1001-1400: Magazine integration (spill logic)
Lines 1401-1711: Utilities + validation helpers

Unique Feature: Included File (.inc)

  • NOT a standalone .c file
  • Included into hakmem_tiny.c
  • Suggests tight coupling with tiny allocator

Problem: .inc files at 1700+ lines should be split into multiple .inc files or converted to modular .c files with headers

Key Functions (10 total)

Main entry (3):

  • hak_tiny_free() - Dispatcher
  • hak_tiny_free_with_slab() - Pre-calculated slab
  • hak_tiny_free_ultra_simple() - Alignment-based

Fast paths (4):

  • Local free to TLS (most common)
  • Magazine spill (when TLS full)
  • Quick validation checks
  • Ownership detection

Slow paths (3):

  • Remote free (cross-thread queue)
  • SuperSlab adoption (TLS migrated)
  • Safety checks (debug mode)

Average Function Size: 171 lines

Problem indicators:

  • Functions are far too large (should average 20-30 lines)
  • Deepest nesting level: ~6-7 levels
  • Mixing of high-level control flow with low-level details

Complexity

Free path decision tree (simplified):
  if (local thread owner)
    → Free to TLS
      if (TLS full)
        → Spill to magazine
          if (magazine full)
            → Drain to SuperSlab
  else if (remote thread owner)
    → Queue for remote thread
      if (queue full)
        → Fallback strategy
  else if (SuperSlab-owned)
    → Adopt SuperSlab
      if (already adopted)
        → Free to SuperSlab freelist
      else
        → Register ownership
  else
    → Error/unknown pointer
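One way to flatten the nesting in this tree is to separate classification from action: a small pure function decides which route applies, and each route lives in its own short function. The sketch below is a hypothetical illustration, not the existing code; `Owner`, `FreeCtx`, `FreeAction`, and `classify_free` are invented names, and the boolean fields stand in for whatever state checks the real paths perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ownership classes, one per top-level branch of the tree. */
typedef enum { OWNER_LOCAL, OWNER_REMOTE, OWNER_SUPERSLAB, OWNER_UNKNOWN } Owner;

/* Hypothetical leaf actions of the decision tree. */
typedef enum {
    ACT_TLS_PUSH,           /* free to TLS */
    ACT_MAGAZINE_SPILL,     /* TLS full: spill to magazine */
    ACT_SUPERSLAB_DRAIN,    /* magazine full: drain to SuperSlab */
    ACT_REMOTE_QUEUE,       /* queue for the owning thread */
    ACT_REMOTE_FALLBACK,    /* remote queue full: fallback strategy */
    ACT_SUPERSLAB_FREE,     /* already adopted: free to SuperSlab freelist */
    ACT_SUPERSLAB_REGISTER, /* not yet adopted: register ownership */
    ACT_ERROR               /* unknown pointer */
} FreeAction;

/* Snapshot of the conditions the tree branches on (assumed inputs). */
typedef struct {
    Owner owner;
    bool  tls_full;
    bool  magazine_full;
    bool  remote_queue_full;
    bool  superslab_adopted;
} FreeCtx;

/* Map a context to exactly one leaf action, with no side effects. */
static FreeAction classify_free(const FreeCtx *c) {
    assert(c != NULL);
    switch (c->owner) {
    case OWNER_LOCAL:
        if (!c->tls_full)      return ACT_TLS_PUSH;
        if (!c->magazine_full) return ACT_MAGAZINE_SPILL;
        return ACT_SUPERSLAB_DRAIN;
    case OWNER_REMOTE:
        return c->remote_queue_full ? ACT_REMOTE_FALLBACK : ACT_REMOTE_QUEUE;
    case OWNER_SUPERSLAB:
        return c->superslab_adopted ? ACT_SUPERSLAB_FREE
                                    : ACT_SUPERSLAB_REGISTER;
    default:
        return ACT_ERROR;
    }
}
```

With this shape, nesting never exceeds two levels, and each leaf can be unit tested by constructing a `FreeCtx` directly.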

Refactoring Recommendations

CRITICAL PRIORITY - Split into 4 modules:

  1. tiny_free_local.inc (500 lines)

    • TLS ownership detection
    • Local freelist push
    • Quick validation
    • Magazine spill threshold
  2. tiny_free_remote.inc (500 lines)

    • Remote thread detection
    • Queue management
    • Fallback strategies
    • Cross-thread communication
  3. tiny_free_superslab.inc (400 lines)

    • SuperSlab ownership detection
    • Adoption logic
    • Freelist publishing
    • Superslab refill interaction
  4. tiny_free_dispatch.inc (300 lines, new)

    • Dispatcher logic
    • Ownership classification
    • Route selection
    • Safety checks

Expected benefits:

  • Each module ~300-500 lines (manageable)
  • Clear separation of concerns
  • Easier debugging (narrow down which path failed)
  • Better testability (unit test each path)
  • Reduced cyclomatic complexity per function

5. hakmem_l25_pool.c (1,195 lines) - Large Pool (64KB-1MB)

Classification: Core Pool Manager | Refactoring Priority: HIGH

Primary Responsibilities

  • Size Classes: 64KB-1MB allocation (5 classes)
  • Bundle Management: Multi-page bundles
  • TLS Caching: Ring buffer + active run (bump-run)
  • Freelist Sharding: Per-class, per-shard (64 shards/class)
  • MPSC Queues: Cross-thread free handling
  • Background Processing: Soft CAP guidance
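For readers unfamiliar with the MPSC pattern mentioned above, the following is a minimal sketch of one common form: many threads push freed blocks onto a lock-free list with a compare-and-swap, and the single owning thread takes the whole list at once with an atomic exchange. This is a generic C11 illustration under assumed names (`RemoteNode`, `RemoteQueue`, `remote_push`, `remote_drain`), not a description of how hakmem_l25_pool.c implements its queues.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* A freed block doubles as a list node (hypothetical layout). */
typedef struct RemoteNode {
    struct RemoteNode *next;
} RemoteNode;

/* One queue per owner: many producers push, only the owner drains. */
typedef struct {
    _Atomic(RemoteNode *) head;
} RemoteQueue;

static void remote_queue_init(RemoteQueue *q) {
    assert(q != NULL);
    atomic_init(&q->head, (RemoteNode *)NULL);
}

/* Producer side: link the node in front of the current head via CAS. */
static void remote_push(RemoteQueue *q, RemoteNode *n) {
    RemoteNode *old = atomic_load_explicit(&q->head, memory_order_relaxed);
    do {
        n->next = old; /* on CAS failure, old is refreshed and we retry */
    } while (!atomic_compare_exchange_weak_explicit(
                 &q->head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Consumer side: detach the whole list in one step (newest block first). */
static RemoteNode *remote_drain(RemoteQueue *q) {
    return atomic_exchange_explicit(&q->head, (RemoteNode *)NULL,
                                    memory_order_acquire);
}
```

Because the consumer only ever takes the entire list, this variant avoids the ABA hazard that a pop-one-node design would have to handle.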

Code Structure

Lines 1-48:      Header comments (docs)
Lines 49-80:     Includes (13 headers)
Lines 81-170:    Internal structures + TLS state
Lines 171-500:   Freelist management (per-shard)
Lines 501-900:   Allocation paths (fast/slow/refill)
Lines 901-1100:  Free paths (local/remote)
Lines 1101-1195: Public API + statistics

Key Functions (39 total)

High-level (8):

  • hak_l25_alloc() - Main allocation
  • hak_l25_free() - Main free
  • hak_l25_alloc_fast() - TLS fast path
  • hak_l25_free_fast() - TLS fast path
  • hak_l25_set_cap() - Capacity tuning
  • hak_l25_get_stats() - Statistics
  • hak_l25_trim() - Memory reclamation

Alloc paths (8):

  • Ring pop (fast)
  • Active run bump (fast)
  • Freelist refill (slow)
  • Bundle allocation (slowest)

Free paths (8):

  • Ring push (fast)
  • LIFO overflow (when ring full)
  • MPSC queue (remote)
  • Bundle return (slowest)

Internal utilities (15):

  • Ring management
  • Shard selection
  • Statistics
  • Initialization

Includes (13)

  • hakmem_l25_pool.h - Type definitions
  • hakmem_config.h - Configuration
  • hakmem_internal.h - Common types
  • hakmem_syscall.h - Syscall wrappers
  • hakmem_prof.h - Profiling
  • hakmem_policy.h - Policy enforcement
  • hakmem_debug.h - Debug utilities

Pattern: Similar to hakmem_pool.c (MidPool)

Comparison:

| Aspect | MidPool (2592) | LargePool (1195) |
|--------|----------------|------------------|
| Size Classes | 5 fixed + 2 dynamic | 5 fixed |
| TLS Structure | Ring + 3 active pages | Ring + active run |
| Sharding | Per-(class,shard) | Per-(class,shard) |
| Code Duplication | High (from L25) | Base for duplication |
| Functions | 65 | 39 |

Observation: L25 Pool is 54% smaller (1,195 vs. 2,592 lines, i.e. 46% of MidPool's size), suggesting good recent refactoring OR incomplete implementation

Refactoring Recommendations

MEDIUM PRIORITY - Extract shared patterns:

  1. Extract pool_core library (300 lines)

    • Ring buffer management
    • Sharded freelist operations
    • Statistics tracking
    • MPSC queue utilities
  2. Use for both MidPool and LargePool:

    • Reduces duplication (saves ~200 lines in each)
    • Standardizes behavior
    • Easier to fix bugs once, deploy everywhere
  3. Per-pool customization (600 lines per pool)

    • Size-specific logic
    • Bump-run vs. active pages
    • Class-specific policies
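To make the pool_core idea concrete, the sketch below shows the kind of small ring buffer that both pools could share instead of each carrying its own copy. The names (`PoolRing`, `ring_push`, `ring_pop`), the capacity of 16, and the FIFO ordering are assumptions for illustration; the existing rings may differ in size and ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAP 16u /* assumption; must be a power of two */

static_assert((RING_CAP & (RING_CAP - 1u)) == 0,
              "RING_CAP must be a power of two");

/* Fixed-capacity ring of free blocks, intended to live in thread-local state. */
typedef struct {
    void    *slot[RING_CAP];
    unsigned head; /* index of the next pop (monotonically increasing) */
    unsigned tail; /* index of the next push (monotonically increasing) */
} PoolRing;

/* Number of cached blocks; unsigned wraparound keeps the difference valid. */
static inline unsigned ring_count(const PoolRing *r) {
    return r->tail - r->head;
}

/* Cache a freed block; returns false when full so the caller can spill. */
static inline bool ring_push(PoolRing *r, void *blk) {
    if (ring_count(r) == RING_CAP) return false;
    r->slot[r->tail++ & (RING_CAP - 1u)] = blk;
    return true;
}

/* Take a cached block; returns NULL when empty so the caller can refill. */
static inline void *ring_pop(PoolRing *r) {
    if (ring_count(r) == 0) return NULL;
    return r->slot[r->head++ & (RING_CAP - 1u)];
}
```

Each pool would then keep only its size-specific refill and spill logic, calling into the shared ring for the fast path.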

SUMMARY TABLE: Refactoring Priority Matrix

| File | Lines | Functions | Avg/Func | Incohesion | Priority | Est. Effort | Benefit |
|------|-------|-----------|----------|------------|----------|-------------|---------|
| hakmem_tiny_free.inc | 1,711 | 10 | 171 | EXTREME | CRITICAL | HIGH | High (171→30 avg) |
| hakmem_pool.c | 2,592 | 65 | 40 | HIGH | CRITICAL | MEDIUM | Med (extract 3 modules) |
| hakmem_tiny.c | 1,765 | 57 | 31 | HIGH | CRITICAL | HIGH | High (35 includes→5) |
| hakmem.c | 1,745 | 29 | 60 | HIGH | HIGH | MEDIUM | High (dispatcher clarity) |
| hakmem_l25_pool.c | 1,195 | 39 | 31 | MEDIUM | HIGH | LOW | Med (extract pool_core) |

RECOMMENDATIONS BY PRIORITY

Tier 1: CRITICAL (do first)

  1. hakmem_tiny_free.inc - Split into 4 modules

    • Reduces average function from 171→~80 lines
    • Enables unit testing per path
    • Reduces cyclomatic complexity
  2. hakmem_pool.c - Extract 3 modules

    • Reduces responsibility from "all pool ops" to "cache management" + "alloc" + "free"
    • Easier to reason about
    • Enables parallel development
  3. hakmem_tiny.c - Reduce to 2-3 core modules

    • Cut 35 includes down to 5-8
    • Shrinks the core file from 1,765 lines to 400-500 lines
    • Leaves helpers in dedicated modules

Tier 2: HIGH (after Tier 1)

  1. hakmem.c - Extract dispatcher + config

    • Split into 4 modules (api, dispatch, config, stats)
    • Replaces one 1,745-line file with modules of 300-400 lines each
    • Better testability
  2. hakmem_l25_pool.c - Extract pool_core library

    • Shared code with MidPool
    • Reduces code duplication

Tier 3: MEDIUM (future)

  1. Extract pool_core library from MidPool/LargePool
  2. Create hakmem_tiny_alloc.c (currently split across files)
  3. Consolidate statistics collection into unified framework

ESTIMATED IMPACT

Code Metrics Improvement

Before:

  • 5 files over 1000 lines
  • 35 includes in hakmem_tiny.c
  • Average function in tiny_free.inc: 171 lines

After Tier 1:

  • 1 file over 1,500 lines (hakmem.c at 1,745, addressed in Tier 2)
  • Max function: ~80 lines
  • Cyclomatic complexity: -40%

Maintainability Score

  • Before: 4/10 (large monolithic files)
  • After Tier 1: 6.5/10 (clear module boundaries)
  • After Tier 2: 8/10 (modular, testable design)

Development Speed

  • Finding bugs: -50% time (smaller files to search)
  • Adding features: -30% time (clear extension points)
  • Testing: -40% time (unit tests per module)

BOX THEORY INTEGRATION

Current Box Modules (in core/box/):

  • free_local_box.c - Local thread free
  • free_publish_box.c - Publishing freelist
  • free_remote_box.c - Remote queue
  • front_gate_box.c - Fast path entry
  • mailbox_box.c - MPSC queue management

Recommended Box Alignment:

  1. Rename tiny_free_*.inc → Box 6A, 6B, 6C, 6D
  2. Create pool_core_box.c for shared functionality
  3. Add pool_cache_box.c for TLS management

NEXT STEPS

  1. Week 1: Extract tiny_free paths (4 modules)
  2. Week 2: Refactor pool.c (3 modules)
  3. Week 3: Consolidate tiny.c (reduce includes)
  4. Week 4: Split hakmem.c (dispatcher pattern)
  5. Week 5: Extract pool_core library

Estimated total effort: 5 weeks of focused refactoring

Expected outcome: 50% improvement in code maintainability