Branch Prediction Optimization Investigation Report

Date: 2025-11-09
Author: Claude Code
Analysis Context: HAKMEM Phase 7 + Pool TLS Performance Investigation


Executive Summary

Problem: HAKMEM has 10.89% branch-miss rate vs System malloc's 3.5-3.9% (3x worse)

Root Cause Discovery: The problem is NOT just misprediction rate, but TOTAL BRANCH COUNT:

  • HAKMEM: 17,098,340 branches (10.84% miss)
  • System malloc: 2,006,962 branches (4.56% miss)
  • HAKMEM executes 8.5x MORE branches than System malloc!

Impact:

  • Branch misprediction overhead: ~1.8M misses × 15-20 cycles = 27-36M cycles wasted
  • Total execution: 17M branches vs System's 2M → 8x more branch overhead
  • Potential gain: 40-60% performance improvement with recommended optimizations

Critical Finding: HAKMEM_BUILD_RELEASE is NOT defined → All debug code is running in production builds!


1. Performance Hotspot Analysis

1.1 Perf Statistics (256B allocations, 100K iterations)

| Metric            | HAKMEM     | System malloc | Ratio       |
|-------------------|------------|---------------|-------------|
| Branches          | 17,098,340 | 2,006,962     | 8.5x        |
| Branch-misses     | 1,854,018  | 91,497        | 20.3x       |
| Branch-miss rate  | 10.84%     | 4.56%         | 2.4x        |
| L1-dcache loads   | 31,307,492 | 4,610,155     | 6.8x        |
| L1-dcache misses  | 1,063,117  | 44,773        | 23.7x       |
| L1 miss rate      | 3.40%      | 0.97%         | 3.5x        |
| Cycles            | ~83M       | ~10M          | 8.3x        |
| Time              | 0.103s     | 0.003s        | 34x slower  |

Key insight: HAKMEM is not just suffering from poor branch prediction, but is executing 8.5x more branches than System malloc!

1.2 Branch Count by Component

Source file analysis:

| File                         | Branch Statements | Critical Issues                                         |
|------------------------------|-------------------|---------------------------------------------------------|
| tiny_alloc_fast.inc.h        | 79                | 8 debug guards, 3 getenv() calls, SFC/SLL dual-layer    |
| hak_free_api.inc.h           | 38                | Pool TLS + Phase 7 dual dispatch, multiple lookups      |
| hakmem_tiny_refill_p0.inc.h  | ~40               | Complex precedence logic, 2 getenv() calls, validation  |
| tiny_refill_opt.h            | ~20               | Corruption checks, guard functions                      |

Total: ~177 branch statements in hot path vs System malloc's ~5 branches


2. Branch Count Analysis: Allocation Path

2.1 Fast Path: tiny_alloc_fast() (lines 454-497)

Layer 0: SFC (Super Front Cache) - Lines 177-200

// Branch 1-2: Check if SFC enabled (TLS cache check)
if (!sfc_check_done) { /* getenv() + init */ }  // COLD
if (sfc_is_enabled) {                            // HOT
    // Branch 3: Try SFC
    void* ptr = sfc_alloc(class_idx);            // → 2 branches inside
    if (ptr != NULL) { /* hit */ }               // HOT
}

Branches: 5-6 (3 external + 2-3 in sfc_alloc)

Layer 1: SLL (TLS Freelist) - Lines 204-259

// Branch 4: Check if SLL enabled
if (g_tls_sll_enable) {                          // HOT
    // Branch 5: Try SLL pop
    void* head = g_tls_sll_head[class_idx];
    if (head != NULL) {                          // HOT
        // Branch 6-7: Corruption debug (ONLY if failfast ≥ 2)
        if (tiny_refill_failfast_level() >= 2) { // DEBUG
            /* alignment validation (2 branches) */
        }

        // Branch 8-9: Validate next pointer
        void* next = *(void**)head;
        if (tiny_refill_failfast_level() >= 2) { // DEBUG
            /* next pointer validation (2 branches) */
        }

        // Branch 10: Count update
        if (g_tls_sll_count[class_idx] > 0) {   // HOT
            g_tls_sll_count[class_idx]--;
        }

        // Branch 11: Profiling (DEBUG)
        #if !HAKMEM_BUILD_RELEASE
        if (start) { /* rdtsc tracking */ }      // DEBUG
        #endif

        return head;  // SUCCESS
    }
}

Branches: 11-15 (2 unconditional + 5-9 conditional debug)

Total allocation fast path: 16-21 branches vs System tcache's 1-2 branches

2.2 Refill Path: tiny_alloc_fast_refill() (lines 321-436)

Phase 2b capacity check:

// Branch 1: Check available capacity
int available_capacity = get_available_capacity(class_idx);
if (available_capacity <= 0) { return 0; }

Refill count precedence logic (lines 338-363):

// Branch 2: First-time init check
if (cnt == 0) {  // COLD (once per class per thread)
    // Branch 3-6: Complex precedence logic
    if (g_refill_count_class[class_idx] > 0) { /* ... */ }
    else if (class_idx <= 3 && g_refill_count_hot > 0) { /* ... */ }
    else if (class_idx >= 4 && g_refill_count_mid > 0) { /* ... */ }
    else if (g_refill_count_global > 0) { /* ... */ }

    // Branch 7-8: Clamping
    if (v < 8) v = 8;
    if (v > 256) v = 256;
}

Total refill path: 10-15 branches (one-time init + runtime checks)


3. Branch Count Analysis: Free Path

3.1 Free Path: hak_free_at() (hak_free_api.inc.h)

Pool TLS dispatch (lines 81-110):

#ifdef HAKMEM_POOL_TLS_PHASE1
    // Branch 1: Page boundary check
    #if !HAKMEM_TINY_SAFE_FREE
    if (((uintptr_t)header_addr & 0xFFF) == 0) {  // 0.1% frequency
        // Branch 2: Memory readable check (mincore syscall)
        if (!hak_is_memory_readable(header_addr)) { goto skip_pool_tls; }
    }
    #endif

    // Branch 3: Magic check
    if ((header & 0xF0) == POOL_MAGIC) {
        pool_free(ptr);
        goto done;
    }
#endif

Branches: 3 (optimized with hybrid mincore)
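
For reference, a minimal sketch of a mincore()-based readability probe (a hypothetical helper for illustration; the real hak_is_memory_readable() may differ, and mincore() only detects unmapped ranges, not PROT_NONE mappings):

#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>

// Returns 1 if the page containing addr belongs to a mapping.
// mincore() fails with ENOMEM when the range includes unmapped pages.
static int probe_page_mapped(const void* addr) {
    long pagesize = sysconf(_SC_PAGESIZE);
    uintptr_t page = (uintptr_t)addr & ~(uintptr_t)(pagesize - 1);
    unsigned char vec;
    return mincore((void*)page, (size_t)pagesize, &vec) == 0;
}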

Phase 7 dual-header dispatch (lines 112-167):

// Branch 4: Try 1-byte Tiny header
if (hak_tiny_free_fast_v2(ptr)) {  // → 3-5 branches inside
    goto done;
}

// Branch 5: Page boundary check for 16-byte header
if (offset_in_page < HEADER_SIZE) {
    // Branch 6: Memory readable check
    if (!hak_is_memory_readable(raw)) { goto slow_path; }
}

// Branch 7: 16-byte header magic check
if (hdr->magic == HAKMEM_MAGIC) {
    // Branch 8: Method dispatch
    if (hdr->method == ALLOC_METHOD_MALLOC) { /* ... */ }
}

Branches: 8-10 (including 3-5 inside hak_tiny_free_fast_v2)

Mid/L25 lookup (lines 196-206):

// Branch 9-10: Mid/L25 registry lookups
if (hak_pool_mid_lookup(ptr, &mid_sz)) { /* ... */ }
if (hak_l25_lookup(ptr, &l25_sz)) { /* ... */ }

Branches: 2

Total free path: 13-15 branches vs System tcache's 2-3 branches


4. Root Cause Analysis

4.1 CRITICAL: Debug Code in Production Builds

Finding: HAKMEM_BUILD_RELEASE is NOT defined anywhere in Makefile

Impact: All debug code runs in production:

| Debug Guard                  | Location                        | Frequency        | Overhead                  |
|------------------------------|---------------------------------|------------------|---------------------------|
| !HAKMEM_BUILD_RELEASE        | tiny_alloc_fast.inc.h:171       | Every allocation | 2-3 branches              |
| !HAKMEM_BUILD_RELEASE        | tiny_alloc_fast.inc.h:191-196   | Every allocation | 1 branch + rdtsc          |
| !HAKMEM_BUILD_RELEASE        | tiny_alloc_fast.inc.h:250-256   | Every allocation | 1 branch + rdtsc          |
| !HAKMEM_BUILD_RELEASE        | tiny_alloc_fast.inc.h:324-326   | Every refill     | 1 branch + rdtsc          |
| !HAKMEM_BUILD_RELEASE        | tiny_alloc_fast.inc.h:427-433   | Every refill     | 1 branch + rdtsc          |
| !HAKMEM_BUILD_RELEASE        | tiny_free_fast_v2.inc.h:99-104  | Every free       | 1 branch + capacity check |
| !HAKMEM_BUILD_RELEASE        | hak_free_api.inc.h:118-120      | Every free       | 1 function call           |
| trc_refill_guard_enabled()   | tiny_refill_opt.h:61-75         | Every splice     | 1 branch + getenv         |

Total overhead: 8-12 branches + 6 rdtsc calls + 2 getenv calls per allocation/free cycle

Expected impact of fixing: -40-50% total branches
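
For illustration, a minimal sketch (hypothetical macro and counter names, not the actual HAKMEM guards) of how an !HAKMEM_BUILD_RELEASE guard lets the rdtsc profiling compile away once the flag is defined:

#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>   // __rdtsc()

#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 0
#endif

#if !HAKMEM_BUILD_RELEASE
  // Debug build: rdtsc pair at every call site
  #define HAK_PROF_BEGIN()     uint64_t hak_prof_t0 = __rdtsc()
  #define HAK_PROF_END(accum)  ((accum) += __rdtsc() - hak_prof_t0)
#else
  // Release build: both macros expand to nothing (no branch, no rdtsc)
  #define HAK_PROF_BEGIN()     ((void)0)
  #define HAK_PROF_END(accum)  ((void)0)
#endif

static uint64_t g_alloc_cycles;  // hypothetical per-path counter

static inline void* profiled_pop(void** head_slot) {
    HAK_PROF_BEGIN();
    void* head = *head_slot;
    if (head != NULL) *head_slot = *(void**)head;  // freelist pop
    HAK_PROF_END(g_alloc_cycles);
    return head;
}

With -DHAKMEM_BUILD_RELEASE=1 the preprocessor removes the rdtsc reads (and any branch guarding them) entirely, which is the effect the table above attributes to each guard.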

4.2 HIGH: getenv() Calls in Hot Path

Finding: 3 lazy-initialized getenv() calls in hot path

| Location                       | Variable                      | Call Frequency            | Fix                          |
|--------------------------------|-------------------------------|---------------------------|------------------------------|
| tiny_alloc_fast.inc.h:104      | HAKMEM_TINY_PROFILE           | Every allocation (if -1)  | Cache in global var at init  |
| hakmem_tiny_refill_p0.inc.h:68 | HAKMEM_TINY_REFILL_COUNT_HOT  | Every refill (class ≤ 3)  | Pre-compute at init          |
| hakmem_tiny_refill_p0.inc.h:78 | HAKMEM_TINY_REFILL_COUNT_MID  | Every refill (class ≥ 4)  | Pre-compute at init          |

Impact:

  • getenv() is ~50-100 cycles (a linear search over the process environment strings, with no caching)
  • Adds 2-3 branches per call (null check, lazy init, result check)
  • Total: 6-9 branches + 150-300 cycles on first access per thread

Expected impact of fixing: -10-15% branches, -5-10% cycles

4.3 MEDIUM: Complex Multi-Layer Cache

Current architecture:

Allocation: Size check → SFC (Layer 0) → SLL (Layer 1) → SuperSlab → Refill
            1 branch     5-6 branches     11-15 branches   20-30 branches

System malloc tcache:

Allocation: Size check → TLS cache → ptmalloc2
            1 branch     1-2 branches

Problem: HAKMEM has 3 layers (SFC → SLL → SuperSlab) vs System's 1 layer (tcache)

Why SFC is redundant:

  • SLL already provides TLS freelist (same design as tcache)
  • SFC adds 5-6 branches with minimal benefit
  • Pre-warming (Phase 7 Task 3) already boosted SLL hit rate to 95%+

Expected impact of removing SFC: -5-10% branches, simpler code

4.4 MEDIUM: Excessive Validation in Hot Path

Corruption checks (lines 208-235 in tiny_alloc_fast.inc.h):

if (tiny_refill_failfast_level() >= 2) {  // getenv() call!
    // Alignment validation
    if (((uintptr_t)head % blk) != 0) {
        fprintf(stderr, "[TLS_SLL_CORRUPT] ...");
        abort();
    }

    // Next pointer validation
    if (next != NULL && ((uintptr_t)next % blk) != 0) {
        fprintf(stderr, "[ALLOC_POP_CORRUPT] ...");
        abort();
    }
}

Impact:

  • 1 getenv() call per thread (lazy init) = ~100 cycles
  • 5-7 branches per allocation when enabled
  • fprintf/abort paths confuse branch predictor

Solution: Move to a compile-time flag (e.g., HAKMEM_DEBUG_VALIDATION) instead of a runtime check

Expected impact: -5-10% branches when disabled
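
A minimal sketch of the proposed compile-time gate (HAKMEM_DEBUG_VALIDATION is the flag suggested above; the helper name and message format are hypothetical):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef HAKMEM_DEBUG_VALIDATION
// Debug builds: keep the alignment check and fail fast on corruption
static inline void hak_validate_block(void* head, size_t blk) {
    if (((uintptr_t)head % blk) != 0) {
        fprintf(stderr, "[TLS_SLL_CORRUPT] misaligned block %p (blk=%zu)\n", head, blk);
        abort();
    }
}
#else
// Release builds: compiles to nothing -- no getenv(), no branch
static inline void hak_validate_block(void* head, size_t blk) {
    (void)head; (void)blk;
}
#endif

The runtime tiny_refill_failfast_level() call and its lazy getenv() drop out of the hot path, while corruption checking stays one rebuild away via -DHAKMEM_DEBUG_VALIDATION.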


5. Optimization Recommendations (Ranked by Impact/Risk)

5.1 CRITICAL FIX: Enable Release Mode (0 risk, 40-50% impact)

Action: Add -DHAKMEM_BUILD_RELEASE=1 to production build flags

Implementation:

# Makefile
HAKMEM_RELEASE_FLAGS = -DHAKMEM_BUILD_RELEASE=1 -DNDEBUG -O3 -flto

release: CFLAGS += $(HAKMEM_RELEASE_FLAGS)
release: all

Changes enabled:

  • Removes 8 !HAKMEM_BUILD_RELEASE guards → -8-12 branches
  • Disables rdtsc profiling → -6 rdtsc calls
  • Disables corruption validation → -5-10 branches
  • Enables LTO and aggressive optimization

Expected result:

  • -40-50% total branches (17M → 8.5-10M)
  • -20-30% cycles (better inlining, constant folding)
  • +30-50% performance (overall)

A/B test command:

# Before
make bench_random_mixed_hakmem
./bench_random_mixed_hakmem 100000 256 42

# After
make HAKMEM_BUILD_RELEASE=1 bench_random_mixed_hakmem
./bench_random_mixed_hakmem 100000 256 42

5.2 HIGH PRIORITY: Pre-compute Env Vars at Init (Low risk, 10-15% impact)

Action: Move getenv() calls from hot path to global init

Current (lazy init in hot path):

// SLOW: Called on every allocation/refill
if (g_tiny_profile_enabled == -1) {
    const char* env = getenv("HAKMEM_TINY_PROFILE");  // 50-100 cycles!
    g_tiny_profile_enabled = (env && *env && *env != '0') ? 1 : 0;
}

Fixed (pre-compute at init):

// hakmem_init.c (runs once at startup)
void hakmem_tiny_init_config(void) {
    // Profile mode
    const char* env = getenv("HAKMEM_TINY_PROFILE");
    g_tiny_profile_enabled = (env && *env && *env != '0') ? 1 : 0;

    // Refill counts
    const char* hot_env = getenv("HAKMEM_TINY_REFILL_COUNT_HOT");
    g_refill_count_hot = hot_env ? atoi(hot_env) : HAKMEM_TINY_REFILL_DEFAULT;

    const char* mid_env = getenv("HAKMEM_TINY_REFILL_COUNT_MID");
    g_refill_count_mid = mid_env ? atoi(mid_env) : HAKMEM_TINY_REFILL_DEFAULT;
}

Expected result:

  • -6-9 branches (3 getenv lazy-init patterns)
  • -150-300 cycles on first access per thread
  • +5-10% performance (cleaner hot path)

Files to modify:

  • core/tiny_alloc_fast.inc.h:104 - Remove lazy init
  • core/hakmem_tiny_refill_p0.inc.h:66-84 - Remove lazy init
  • core/hakmem_init.c - Add global init function

5.3 MEDIUM PRIORITY: Simplify Cache Layers (Medium risk, 5-10% impact)

Option A: Remove SFC Layer (Recommended)

Rationale:

  • SFC adds 5-6 branches with minimal benefit
  • SLL already provides TLS freelist (same as System tcache)
  • Phase 7 Task 3 pre-warming gives SLL 95%+ hit rate
  • Three cache layers = unnecessary complexity

Implementation:

// Remove SFC entirely, use only SLL
static inline void* tiny_alloc_fast(size_t size) {
    int class_idx = hak_tiny_size_to_class(size);

    // Layer 1: TLS freelist (SLL) - DIRECT ACCESS
    void* head = g_tls_sll_head[class_idx];
    if (head != NULL) {
        g_tls_sll_head[class_idx] = *(void**)head;
        g_tls_sll_count[class_idx]--;
        return head;  // 3 instructions, 1-2 branches!
    }

    // Refill from SuperSlab
    if (tiny_alloc_fast_refill(class_idx) > 0) {
        head = g_tls_sll_head[class_idx];
        // ... retry pop
    }

    return hak_tiny_alloc_slow(size, class_idx);
}

Expected result:

  • -5-10% branches (remove SFC layer)
  • Simpler code (easier to debug/maintain)
  • Same or better performance (fewer layers = less overhead)

Option B: Unified TLS Cache (Higher risk, 10-20% impact)

Design: Single TLS cache with adaptive sizing (like mimalloc)

// Per-class TLS cache with adaptive capacity
struct TinyTLSCache {
    void* head;
    uint32_t count;
    uint32_t capacity;  // Adaptive: 16-256
};

static __thread TinyTLSCache g_tls_cache[TINY_NUM_CLASSES];
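
An allocation sketch against this structure (a sketch only: tiny_cache_refill() and tiny_alloc_slow() below are hypothetical stand-ins for the existing refill and slow paths):

int   tiny_cache_refill(int class_idx);   // hypothetical: bulk refill from SuperSlab
void* tiny_alloc_slow(int class_idx);     // hypothetical: existing slow path

static inline void* tiny_cache_alloc(int class_idx) {
    struct TinyTLSCache* c = &g_tls_cache[class_idx];
    void* head = c->head;
    if (head != NULL) {                       // branch 1: cache hit
        c->head = *(void**)head;              // pop intrusive freelist
        c->count--;
        return head;
    }
    if (tiny_cache_refill(class_idx) > 0) {   // branch 2: refill, then retry the pop
        head = c->head;
        c->head = *(void**)head;
        c->count--;
        return head;
    }
    return tiny_alloc_slow(class_idx);        // cold path
}

This keeps the hot path at two branches, matching the tcache/mimalloc shape shown in Section 8.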

Expected result:

  • -10-20% branches (unified design)
  • Better cache utilization (adaptive sizing)
  • Matches System malloc architecture

5.4 LOW PRIORITY: Branch Hint Tuning (Low risk, 2-5% impact)

Action: Optimize __builtin_expect hints based on profiling

Current issues:

  • Some hints are incorrect (e.g., SFC disabled in production)
  • Missing hints on hot branches

Recommended changes:

// Line 184: SFC is DISABLED in most production builds
if (__builtin_expect(sfc_is_enabled, 1)) {  // WRONG!
// Fix:
if (__builtin_expect(sfc_is_enabled, 0)) {  // Expect disabled

// Line 208: Corruption checks are rare in production
if (__builtin_expect(tiny_refill_failfast_level() >= 2, 0)) {  // CORRECT

// Line 457: Size > 1KB is common in mixed workloads
if (__builtin_expect(class_idx < 0, 0)) {  // May be wrong for some workloads
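
One low-risk way to keep these hints maintainable is to centralize them behind wrapper macros (HAK_LIKELY/HAK_UNLIKELY are hypothetical names, not existing HAKMEM identifiers):

#define HAK_LIKELY(x)    __builtin_expect(!!(x), 1)
#define HAK_UNLIKELY(x)  __builtin_expect(!!(x), 0)

// Example, matching the recommendation above: expect SFC to be disabled
// in production, so hint the branch as unlikely.
// if (HAK_UNLIKELY(sfc_is_enabled)) { ... }

Flipping a hint then means editing one call site rather than hunting for raw __builtin_expect() occurrences.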

Expected result:

  • -2-5% branch-misses (better prediction)
  • +2-5% performance (reduced pipeline stalls)

6. Expected Results Summary

6.1 Cumulative Impact (All Optimizations)

| Optimization          | Branch Reduction | Cycle Reduction | Risk   | Effort   |
|-----------------------|------------------|-----------------|--------|----------|
| Enable Release Mode   | -40-50%          | -20-30%         | None   | 1 line   |
| Pre-compute Env Vars  | -10-15%          | -5-10%          | Low    | 1 day    |
| Remove SFC Layer      | -5-10%           | -5-10%          | Medium | 2 days   |
| Branch Hint Tuning    | -2-5%            | -2-5%           | Low    | 1 day    |
| TOTAL                 | -50-65%          | -30-45%         | Low    | 4-5 days |

Projected final results:

  • Branches: 17M → 6-8.5M (vs System's 2M)
  • Branch-miss rate: 10.84% → 6-8% (vs System's 4.56%)
  • Throughput: Current → +40-80% improvement

Target: 70-90% of System malloc performance (currently ~3% of System)


6.2 Quick Win: Release Mode Only

Minimal change, maximum impact:

# Add one line to Makefile
CFLAGS += -DHAKMEM_BUILD_RELEASE=1

# Rebuild
make clean && make bench_random_mixed_hakmem

# Test
./bench_random_mixed_hakmem 100000 256 42

Expected:

  • -40-50% branches (17M → 8.5-10M)
  • +30-50% performance (immediate)
  • 0 code changes (just a flag)

7. A/B Test Plan

7.1 Baseline Measurement

# Measure current performance
perf stat -e branch-misses,branches,cycles,instructions \
  ./bench_random_mixed_hakmem 100000 256 42

# Output:
# branches:       17,098,340
# branch-misses:   1,854,018 (10.84%)
# cycles:         ~83M

7.2 Test 1: Release Mode

# Build with release flag
make clean
make CFLAGS="-DHAKMEM_BUILD_RELEASE=1 -O3" bench_random_mixed_hakmem

# Measure
perf stat -e branch-misses,branches,cycles,instructions \
  ./bench_random_mixed_hakmem 100000 256 42

# Expected:
# branches:       ~9M (-47%)
# branch-misses:  ~700K (7.8%)
# cycles:         ~60M (-27%)

7.3 Test 2: Release + Pre-compute Env

# Implement env var pre-computation (see 5.2)
make clean
make CFLAGS="-DHAKMEM_BUILD_RELEASE=1 -O3" bench_random_mixed_hakmem

# Expected:
# branches:       ~8M (-53%)
# branch-misses:  ~600K (7.5%)
# cycles:         ~55M (-33%)

7.4 Test 3: Release + Pre-compute + Remove SFC

# Remove SFC layer (see 5.3)
make clean
make CFLAGS="-DHAKMEM_BUILD_RELEASE=1 -O3" bench_random_mixed_hakmem

# Expected:
# branches:       ~7M (-59%)
# branch-misses:  ~500K (7.1%)
# cycles:         ~50M (-40%)

7.5 Success Criteria

| Metric            | Current      | Target      | Stretch Goal |
|-------------------|--------------|-------------|--------------|
| Branches          | 17M          | <10M        | <8M          |
| Branch-miss rate  | 10.84%       | <8%         | <7%          |
| vs System malloc  | 8.5x slower  | <5x slower  | <3x slower   |
| Throughput        | 1.07M ops/s  | >2M ops/s   | >3M ops/s    |

8. Comparison with System Malloc Strategy

8.1 System malloc tcache (glibc 2.27+)

Design:

// Allocation (2-3 instructions, 1-2 branches)
void* tcache_get(size_t size) {
    int tc_idx = csize2tidx(size);  // Size to index (no branch)
    tcache_entry* e = tcache->entries[tc_idx];
    if (e != NULL) {  // BRANCH 1
        tcache->entries[tc_idx] = e->next;
        return (void*)e;
    }
    return _int_malloc(av, bytes);  // Slow path
}

// Free (2 instructions, 1 branch)
void tcache_put(void* ptr, size_t size) {
    int tc_idx = csize2tidx(size);  // Size to index (no branch)
    if (tcache->counts[tc_idx] < mp_.tcache_count) {  // BRANCH 1 (per-bin limit, default 7)
        tcache_entry* e = (tcache_entry*)ptr;
        e->next = tcache->entries[tc_idx];
        tcache->entries[tc_idx] = e;
        tcache->counts[tc_idx]++;
    }
    // Else: fall back to _int_free
}

Key insights:

  • 1-2 branches total (vs HAKMEM's 16-21)
  • No validation in fast path
  • No debug guards in production
  • Single TLS cache layer (vs HAKMEM's 3 layers)
  • No getenv() calls (all config at compile-time)

8.2 mimalloc

Design:

// Allocation (3-4 instructions, 1-2 branches)
void* mi_malloc(size_t size) {
    mi_page_t* page = _mi_page_fast();  // TLS page cache
    if (mi_likely(page != NULL)) {  // BRANCH 1
        void* p = page->free;
        if (mi_likely(p != NULL)) {  // BRANCH 2
            page->free = mi_ptr_decode(p);
            return p;
        }
    }
    return mi_malloc_generic(NULL, size);  // Slow path
}

Key insights:

  • 2 branches total (vs HAKMEM's 16-21)
  • Inline header metadata (similar to HAKMEM Phase 7)
  • No debug overhead in release builds
  • Simple TLS structure (page + free pointer)

9. Conclusion

Root Cause: HAKMEM executes 8.5x more branches than System malloc due to:

  1. Debug code running in production (HAKMEM_BUILD_RELEASE not defined)
  2. Complex multi-layer cache (SFC → SLL → SuperSlab)
  3. Runtime env var checks in hot path
  4. Excessive validation and profiling

Immediate Action (1 line change):

CFLAGS += -DHAKMEM_BUILD_RELEASE=1  # Expected: +30-50% performance

Full Fix (4-5 days work):

  • Enable release mode
  • Pre-compute env vars at init
  • Remove redundant SFC layer
  • Optimize branch hints

Expected Result:

  • -50-65% branches (17M → 6-8.5M)
  • -30-45% cycles
  • +40-80% throughput
  • 70-90% of System malloc performance (vs current 3%)

Next Steps:

  1. Enable HAKMEM_BUILD_RELEASE=1 (immediate)
  2. Run A/B tests (measure impact)
  3. Implement env var pre-computation (1 day)
  4. Evaluate SFC removal (2 days)
  5. Re-measure and iterate

Appendix A: Detailed Branch Inventory

Allocation Path (tiny_alloc_fast.inc.h)

| Line     | Branch            | Frequency          | Type        | Fix               |
|----------|-------------------|--------------------|-------------|-------------------|
| 177-182  | SFC check done    | Cold (once/thread) | Init        | Pre-compute       |
| 184      | SFC enabled       | Hot                | Runtime     | Remove SFC        |
| 186      | SFC ptr != NULL   | Hot                | Fast path   | Keep (necessary)  |
| 204      | SLL enabled       | Hot                | Runtime     | Make compile-time |
| 206      | SLL head != NULL  | Hot                | Fast path   | Keep (necessary)  |
| 208      | Failfast ≥ 2      | Hot                | Debug       | Remove in release |
| 211-216  | Alignment check   | Hot                | Debug       | Remove in release |
| 225      | Failfast ≥ 2      | Hot                | Debug       | Remove in release |
| 227-234  | Next validation   | Hot                | Debug       | Remove in release |
| 241      | Count > 0         | Hot                | Unnecessary | Remove            |
| 171-173  | Profile enabled   | Hot                | Debug       | Remove in release |
| 250-256  | Profile rdtsc     | Hot                | Debug       | Remove in release |

Total: 16-21 branches
Target: 2-3 branches (~85-90% reduction)

Refill Path (hakmem_tiny_refill_p0.inc.h)

| Line     | Branch           | Frequency | Type         | Fix              |
|----------|------------------|-----------|--------------|------------------|
| 33       | !g_use_superslab | Cold      | Config       | Remove check     |
| 41       | !tls->ss         | Hot       | Refill       | Keep (necessary) |
| 46       | !meta            | Hot       | Refill       | Keep (necessary) |
| 56       | room <= 0        | Hot       | Capacity     | Keep (necessary) |
| 66-73    | Hot override     | Cold      | Env var      | Pre-compute      |
| 76-83    | Mid override     | Cold      | Env var      | Pre-compute      |
| 116-119  | Remote drain     | Hot       | Optimization | Keep             |
| 138      | Capacity check   | Hot       | Refill       | Keep (necessary) |
Total: 10-15 branches
Target: 5-8 branches (40-50% reduction)


End of Report