Phase 18 v2: BENCH_MINIMAL — NEUTRAL (+2.32% throughput, -5.06% instructions)

## Summary

Phase 18 v2 attempted instruction count reduction via conditional compilation:
- Stats collection → no-op
- ENV checks → constant propagation
- Binary size: 653K → 649K (-4K, -0.6%)
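
The two techniques listed above can be sketched in isolation. This is a minimal illustration, not the project's code: every `sketch_`/`SKETCH_` name is hypothetical, and the flag defaults to 1 here only so the compiled-out variant is the one shown.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag for this sketch; the real build passes -DHAKMEM_BENCH_MINIMAL=1. */
#ifndef SKETCH_BENCH_MINIMAL
#define SKETCH_BENCH_MINIMAL 1
#endif

atomic_ulong sketch_hits;   /* stand-in for a stats counter */
int sketch_env_gate;        /* stand-in for a runtime ENV gate */

#if SKETCH_BENCH_MINIMAL
/* Stats collection becomes a no-op; the gate becomes a compile-time constant. */
#define SKETCH_STAT_INC(counter) ((void)0)
static inline bool sketch_gate_enabled(void) { return true; }
#else
/* Normal build: relaxed atomic increment and a runtime gate read. */
#define SKETCH_STAT_INC(counter) \
    ((void)atomic_fetch_add_explicit(&(counter), 1, memory_order_relaxed))
static inline bool sketch_gate_enabled(void) { return sketch_env_gate != 0; }
#endif

/* Hot path: in the minimal build the compiler can drop the atomic
 * and fold the branch, because the gate is a known constant. */
static inline int sketch_alloc_path(void) {
    SKETCH_STAT_INC(sketch_hits);
    if (sketch_gate_enabled()) {
        return 1;   /* snapshot path */
    }
    return 0;       /* legacy path */
}
```

Compiling the same file with `-DSKETCH_BENCH_MINIMAL=0` restores the counter increment and the runtime gate check.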

Result: NEUTRAL (below GO threshold)
- Throughput: +2.32% (target: +5% minimum; not met)
- Instructions: -5.06% (target: -15% minimum; not met)
- Cycles: -3.26% (positive signal)
- Branches: -8.67% (positive signal)
- Cache-misses: +30% (unexpected; likely a code-layout effect)
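
The threshold check behind this result can be written down as a small helper. This is a sketch under one stated assumption: that GO requires both targets to be met. The function and type names are hypothetical, and a separate NO-GO category for regressions is not modeled.

```c
#include <stdbool.h>

/* Thresholds quoted in this commit message. */
#define GO_MIN_THROUGHPUT_GAIN_PCT   5.0   /* target: +5% minimum */
#define GO_MIN_INSTR_REDUCTION_PCT  15.0   /* target: -15% minimum */

typedef enum { VERDICT_NEUTRAL = 0, VERDICT_GO = 1 } verdict_t;

/* Assumption: GO only when BOTH the throughput and instruction targets are met.
 * Deltas are signed percentages, e.g. +2.32 and -5.06. */
static verdict_t classify_ab_result(double throughput_delta_pct,
                                    double instr_delta_pct) {
    bool throughput_ok = throughput_delta_pct >= GO_MIN_THROUGHPUT_GAIN_PCT;
    bool instr_ok = -instr_delta_pct >= GO_MIN_INSTR_REDUCTION_PCT;
    return (throughput_ok && instr_ok) ? VERDICT_GO : VERDICT_NEUTRAL;
}
```

Under that assumption, `classify_ab_result(2.32, -5.06)` yields `VERDICT_NEUTRAL`, matching the result reported above.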

## Analysis

Positive signals:
- Implementation behaves as intended (branches -8.67%, instructions -5.06%)
- Binary size reduced (-4K)
- Modest throughput gain (+2.32%)
- Cycles and branch overhead reduced

Negative signals:
- Instruction reduction insufficient (-5.06%, far short of the -15% "smoking gun" target)
- Throughput gain below +5% threshold
- Cache-misses increased (+30%; possibly layout noise)

## Verdict

Freeze Phase 18 v2 (weak positive, insufficient for production).

Per user guidance: "If instructions don't drop clearly, continuation value is thin."
-5.06% instruction reduction is marginal. Allocator micro-optimization plateau confirmed.

## Key Insight

Phase 17 showed:
- IPC = 2.30 (consistent, memory-bound)
- I-cache gap: 55% (Phase 17: 153K → 68K)
- Instruction gap: 48% (Phase 17: 41.3B → 21.5B)

Phase 18 v1/v2 results confirm:
- Layout tweaks are fragile (v1: I-cache +91%)
- Instruction removal yields only a modest benefit (v2: -5.06%)
- Allocator is NOT the bottleneck (IPC constant, memory-limited)
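
The "IPC constant" observation can be cross-checked against the v2 deltas, since IPC is instructions divided by cycles. The helper below is a sketch with a hypothetical name, and it assumes the instruction and cycle percentages come from the same A/B run.

```c
/* Relative IPC change implied by relative changes in instructions and cycles.
 * IPC = instructions / cycles, so
 *   IPC_new / IPC_old = (1 + instr_delta) / (1 + cycles_delta).
 * Deltas are fractions, e.g. -0.0506 for -5.06%. */
static double ipc_relative_change(double instr_delta, double cycles_delta) {
    return (1.0 + instr_delta) / (1.0 + cycles_delta) - 1.0;
}
```

With the v2 figures, `ipc_relative_change(-0.0506, -0.0326)` evaluates to roughly -0.019, an IPC shift of about -1.9%. That is small enough to be consistent with an essentially unchanged IPC.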

## Recommendation

Do NOT continue Phase 18 micro-optimizations.

The next frontier requires a different approach:
1. Architectural redesign (SIMD, lock-free, batching)
2. Memory layout optimization (cache-friendly structures)
3. Broader profiling (not allocator-focused)

Or: Accept that 48M → 85M (75% gap) is achievable with current architecture.

Files:
- docs/analysis/PHASE18_HOT_TEXT_ISOLATION_2_AB_TEST_RESULTS.md (results)
- CURRENT_TASK.md (Phase 18 complete status)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Moe Charm (CI)
2025-12-15 06:02:28 +09:00
parent ad346f7885
commit bc2c5ded76
4 changed files with 30 additions and 1 deletions


@@ -140,6 +140,15 @@ ifeq ($(HOT_TEXT_GC_SECTIONS),1)
LDFLAGS += -Wl,--gc-sections
endif
# Phase 18 v2: BENCH_MINIMAL (remove instrumentation for benchmark builds)
BENCH_MINIMAL ?= 0
ifeq ($(BENCH_MINIMAL),1)
CFLAGS += -DHAKMEM_BENCH_MINIMAL=1
CFLAGS_SHARED += -DHAKMEM_BENCH_MINIMAL=1
# Note: Both bench and shared lib will disable instrumentation
# Mainly impacts bench_* binaries (where BENCH_MINIMAL is intentionally enabled)
endif
# Default: enable Box Theory refactor for Tiny (Phase 6-1.7)
# This is the best performing option currently (4.19M ops/s)
# NOTE: Disabled while testing ULTRA_SIMPLE with SFC integration


@@ -60,8 +60,13 @@ typedef struct {
static FrontFastLaneStats g_front_fastlane_stats = {0};
// Increment macros (relaxed ordering - stats only)
// Phase 18 v2: BENCH_MINIMAL conditional (no-op when HAKMEM_BENCH_MINIMAL=1)
#if HAKMEM_BENCH_MINIMAL
#define FRONT_FASTLANE_STAT_INC(field) do { (void)0; } while(0)
#else
#define FRONT_FASTLANE_STAT_INC(field) \
atomic_fetch_add_explicit(&g_front_fastlane_stats.field, 1, memory_order_relaxed)
#endif
// Dump stats on exit (call from wrapper destructor or main)
static void front_fastlane_stats_dump(void) {


@@ -59,6 +59,14 @@ extern int g_hakmem_env_snapshot_ctor_mode;
// ENV gate: default OFF (research box, set =1 to enable)
// E3-4: Dual-mode - constructor init (fast) or legacy lazy init (fallback)
// Phase 18 v2: BENCH_MINIMAL conditional (constant return when HAKMEM_BENCH_MINIMAL=1)
#if HAKMEM_BENCH_MINIMAL
// In bench mode, snapshot is always enabled (one-time cost, compile-away benefit)
static inline bool hakmem_env_snapshot_enabled(void) {
return 1;
}
#else
// Normal mode: runtime check
static inline bool hakmem_env_snapshot_enabled(void) {
// E3-4 Fast path: constructor mode (no lazy check, just global read).
// Important: do not put a static LIKELY/UNLIKELY hint here.
@@ -81,5 +89,6 @@ static inline bool hakmem_env_snapshot_enabled(void) {
}
return g_hakmem_env_snapshot_gate != 0;
}
#endif
#endif // HAK_ENV_SNAPSHOT_BOX_H


@@ -176,7 +176,9 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \
core/box/malloc_tiny_direct_env_box.h \
core/box/malloc_tiny_direct_stats_box.h core/box/front_fastlane_box.h \
core/box/front_fastlane_env_box.h core/box/front_fastlane_stats_box.h \
core/box/../hakmem_internal.h
core/box/front_fastlane_alloc_legacy_direct_env_box.h \
core/box/tiny_front_hot_box.h core/box/tiny_front_cold_box.h \
core/box/smallobject_policy_v7_box.h core/box/../hakmem_internal.h
core/hakmem.h:
core/hakmem_build_flags.h:
core/hakmem_config.h:
@@ -435,4 +437,8 @@ core/box/malloc_tiny_direct_stats_box.h:
core/box/front_fastlane_box.h:
core/box/front_fastlane_env_box.h:
core/box/front_fastlane_stats_box.h:
core/box/front_fastlane_alloc_legacy_direct_env_box.h:
core/box/tiny_front_hot_box.h:
core/box/tiny_front_cold_box.h:
core/box/smallobject_policy_v7_box.h:
core/box/../hakmem_internal.h: