Phase 19-3b: pass down env snapshot in hot paths

Moe Charm (CI)
2025-12-15 12:50:16 +09:00
parent 8f4ada5bbd
commit e1a4561992
7 changed files with 207 additions and 126 deletions


@ -0,0 +1,76 @@
# Phase 19-3b: ENV Snapshot Pass-Down — A/B Test Results
## Summary
Verdict: ✅ **GO**
- Baseline mean: **55.56M ops/s**
- Optimized mean: **57.10M ops/s**
- Delta (mean): **+2.76%**
- Baseline median: **55.65M ops/s**
- Optimized median: **57.09M ops/s**
- Delta (median): **+2.57%**
## What Changed
- `core/front/malloc_tiny_fast.h`
- Capture `const HakmemEnvSnapshot* env` once per hot call and pass it down:
- `free_tiny_fast()` / `free_tiny_fast_hot()` capture once
- `free_tiny_fast_cold(..., env)` consumes it
- `tiny_legacy_fallback_free_base_with_env(..., env)` consumes it
- Reuse the same snapshot in the alloc route selection:
- `tiny_policy_hot_get_route_with_env(class_idx, env)`
- Remove dead `front_snap` computations (they were set but never used).
- `core/box/tiny_legacy_fallback_box.h`
- Add `tiny_legacy_fallback_free_base_with_env(...)` and keep the old wrapper for compatibility.
- `core/box/tiny_metadata_cache_hot_box.h`
- Add `tiny_policy_hot_get_route_with_env(...)` and keep the old wrapper for compatibility.
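The "add `_with_env` and keep the old wrapper" pattern listed above can be sketched as follows. This is a minimal illustration, not the code from the commit: `DemoEnvSnapshot`, its single field, the `demo_*` functions, and the route codes 1 and 2 are all hypothetical stand-ins, because the real `HakmemEnvSnapshot` layout and helper bodies are not shown in this document.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for HakmemEnvSnapshot (real fields are not shown here). */
typedef struct {
    int legacy_route;               /* route chosen from the snapshot */
} DemoEnvSnapshot;

static int g_demo_snapshot_enabled = 1;          /* models HAKMEM_ENV_SNAPSHOT=0/1 */
static DemoEnvSnapshot g_demo_snapshot = { 1 };
static int g_demo_gate_checks = 0;               /* counts gate evaluations */

/* Stand-in for the snapshot gate; counts how often it is evaluated. */
static int demo_env_snapshot_enabled(void) {
    g_demo_gate_checks++;
    return g_demo_snapshot_enabled;
}

static const DemoEnvSnapshot* demo_env_snapshot(void) {
    return &g_demo_snapshot;
}

/* Per-feature fallback used when no snapshot pointer is available. */
static int demo_legacy_feature_gate(void) {
    return 2;
}

/* New helper: consumes the pointer captured by the caller, never re-checks the gate. */
static int demo_fallback_free_with_env(void* base, const DemoEnvSnapshot* env) {
    assert(base != NULL);
    if (env) {
        return env->legacy_route;       /* snapshot-based path */
    }
    return demo_legacy_feature_gate();  /* env == NULL: unchanged fallback path */
}

/* Old entry point kept as a thin wrapper, so existing call sites still compile. */
static int demo_fallback_free(void* base) {
    const DemoEnvSnapshot* env =
        demo_env_snapshot_enabled() ? demo_env_snapshot() : NULL;
    return demo_fallback_free_with_env(base, env);
}
```

In this sketch, callers that already hold `env` call the `_with_env` variant directly and pay no gate check; callers that do not hold it keep using the wrapper, which captures once and forwards.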
## Bench Setup
- Command: `scripts/run_mixed_10_cleanenv.sh`
- Profile: `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE` (script default)
- Params: `iter=20000000 ws=400`
- Host: same machine / same build flags, back-to-back runs.
## Raw Results
### Baseline (Phase 19-3a)
```
56215204
55685609
55968309
55866150
54795835
55113419
55659129
55645869
55286223
55409488
```
Mean: 55,564,523.5
Median: 55,652,499.0
### Optimized (Phase 19-3b)
```
57413912
57291780
56913158
57044292
57219468
56609810
56995683
57027125
57350810
57126094
```
Mean: 57,099,213.2
Median: 57,085,193.0
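
The summary figures can be re-derived from the raw runs above. The helpers below are a small sketch for that purpose only; the `ab_*` names are made up for this example and are not part of the benchmark script. The two arrays are the ten baseline and ten optimized values copied from the raw results.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles, ascending. */
static int ab_cmp_double(const void* a, const void* b) {
    double x = *(const double*)a;
    double y = *(const double*)b;
    return (x > y) - (x < y);
}

/* Arithmetic mean of n samples. */
static double ab_mean(const double* v, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum / (double)n;
}

/* Median of n samples (sorts a local copy; n is capped at 64 for this sketch). */
static double ab_median(const double* v, size_t n) {
    assert(n > 0 && n <= 64);
    double tmp[64];
    memcpy(tmp, v, n * sizeof tmp[0]);
    qsort(tmp, n, sizeof tmp[0], ab_cmp_double);
    return (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
}

/* Relative change of opt versus base, in percent. */
static double ab_delta_pct(double base, double opt) {
    return (opt / base - 1.0) * 100.0;
}

/* Raw ops/s values from the Baseline (Phase 19-3a) runs. */
static const double ab_baseline[10] = {
    56215204, 55685609, 55968309, 55866150, 54795835,
    55113419, 55659129, 55645869, 55286223, 55409488
};

/* Raw ops/s values from the Optimized (Phase 19-3b) runs. */
static const double ab_optimized[10] = {
    57413912, 57291780, 56913158, 57044292, 57219468,
    56609810, 56995683, 57027125, 57350810, 57126094
};
```

Calling `ab_delta_pct(ab_mean(ab_baseline, 10), ab_mean(ab_optimized, 10))` yields roughly 2.76, and the same call with `ab_median` yields roughly 2.57, matching the deltas in the Summary.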


@ -3,7 +3,7 @@
## 0. Goal
**Objective**: Reduce ENV check overhead from per-operation 3+ TLS reads to 1 TLS read
**Expected Impact**: -10.0 instructions/op, -4.0 branches/op, +5-8% throughput
**Expected Impact (target)**: -10.0 instructions/op, -4.0 branches/op, +3-8% throughput
**Risk Level**: MEDIUM (ENV invalidation handling required)
**Box Name**: EnvSnapshotConsolidationBox (Phase 19-3)
@ -32,6 +32,12 @@ Phase 19-3a removed the call-site UNLIKELY hint:
Observed impact: **GO (+4.42% throughput)** on Mixed.
This validates that the remaining ENV work is dominated by branch/layout effects, not just raw "read cost".
### Phase 19-3b Result (validated)
Phase 19-3b consolidated snapshot reads by capturing `env` once per hot call and passing it down into nested helpers.
Observed impact: **GO (+2.76% mean / +2.57% median)** on Mixed 10-run (`scripts/run_mixed_10_cleanenv.sh`).
---
## 1. Current State Analysis
@ -235,19 +241,20 @@ if (class_idx == 7 && g_c7_ultra_enabled_fixed) {
**Files**:
- Modified: `core/front/malloc_tiny_fast.h`
- Phase 19-3a: remove backwards `__builtin_expect(..., 0)` hints (DONE, +4.42% GO).
- Phase 19-3b/c: thread `const HakmemEnvSnapshot* env` down to eliminate repeated `hakmem_env_snapshot_enabled()` checks.
- Modified: `core/box/hak_wrappers.inc.h`
- Compute `env` once per wrapper entry and pass it to hot helpers (especially when `HAKMEM_FASTLANE_DIRECT=1`).
- Optional (only if separate rollback gate is desired):
- New: `core/box/env_snapshot_consolidation_env_box.{h,c}` (cached gate + refresh hook)
- Phase 19-3b: thread `const HakmemEnvSnapshot* env` down to eliminate repeated `hakmem_env_snapshot_enabled()` checks (DONE, +2.76% GO).
- Modified: `core/box/tiny_legacy_fallback_box.h`
- Add `_with_env` helper (Phase 19-3b).
- Modified: `core/box/tiny_metadata_cache_hot_box.h`
- Add `_with_env` helper (Phase 19-3b).
- Optional: `core/box/hak_wrappers.inc.h`
- If needed, compute `env` once per wrapper entry and pass it down (removes the remaining alloc-side gate in `malloc_tiny_fast_for_class()`).
**ENV Gate**:
- Base: `HAKMEM_ENV_SNAPSHOT=0/1` (Phase 4 E1 gate; promoted ON in presets)
- Optional: `HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=0/1` (default 0, opt-in) — gates only the “pass-down” refactor for A/B safety.
**Rollback**:
- If using `HAKMEM_ENV_SNAPSHOT_CONSOLIDATION`: set it to `0`.
- Otherwise: `HAKMEM_FASTLANE_DIRECT=0` falls back to the wrapper/FastLane path (still safe).
- Snapshot behavior: set `HAKMEM_ENV_SNAPSHOT=0` to fall back to per-feature env gates.
- Pass-down refactor: revert the Phase 19-3b commit (or add a dedicated pass-down gate if future A/B is needed).
### 3.2 API Design
@ -268,29 +275,28 @@ int free_tiny_fast_with_env(void* ptr, const HakmemEnvSnapshot* env);
- `__builtin_expect(hakmem_env_snapshot_enabled(), 0)` → `hakmem_env_snapshot_enabled()`
- Measured: **GO (+4.42%)**
**Phase 19-3b (NEXT)**: wrapper entry computes `const HakmemEnvSnapshot* env` once and passes it down.
- Wrapper entry:
- `const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL;`
- Hot helpers:
- Replace repeated `hakmem_env_snapshot_enabled()` checks with `if (env) { ... } else { ... }`
- Keep `env==NULL` fallback path unchanged.
- Target: reduce repeated gate checks across hot helpers (especially inside `free_tiny_fast*`).
**Phase 19-3b (DONE)**: capture `env` once per hot call and pass it down into nested helpers.
- In `core/front/malloc_tiny_fast.h`:
- `free_tiny_fast()` / `free_tiny_fast_hot()` capture `env` once and pass it to cold + legacy helpers.
- `malloc_tiny_fast_for_class()` reuses the same snapshot for `tiny_policy_hot_get_route_with_env(...)`.
- In `core/box/tiny_legacy_fallback_box.h` and `core/box/tiny_metadata_cache_hot_box.h`:
- add `_with_env` helpers to consume the pass-down pointer.
- Measured: **GO (+2.76% mean / +2.57% median)** on Mixed 10-run.
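
To make the "capture once per hot call" idea concrete, here is a toy model that counts gate evaluations before and after pass-down. It is a sketch under stated assumptions: `SketchEnv`, the `sketch_*` functions, and the two-helper call shape are hypothetical and only mimic the structure described above; the capture expression has the same shape as the `hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL` line quoted earlier in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; the real HakmemEnvSnapshot layout is not shown here. */
typedef struct {
    int use_fast_route;
} SketchEnv;

static int       sketch_gate_on    = 1;       /* models HAKMEM_ENV_SNAPSHOT=1 */
static SketchEnv sketch_snap       = { 1 };
static int       sketch_gate_reads = 0;       /* how often the gate was evaluated */

static int sketch_enabled(void) {
    sketch_gate_reads++;
    return sketch_gate_on;
}

static const SketchEnv* sketch_get(void) {
    return &sketch_snap;
}

/* Capture expression: snapshot pointer when the gate is on, NULL otherwise. */
static const SketchEnv* sketch_capture(void) {
    return sketch_enabled() ? sketch_get() : NULL;
}

/* Before: every nested helper evaluates the gate on its own. */
static int sketch_cold_old(void) {
    const SketchEnv* env = sketch_capture();
    return env ? env->use_fast_route : 0;
}

static int sketch_legacy_old(void) {
    const SketchEnv* env = sketch_capture();
    return env ? env->use_fast_route : 0;
}

static int sketch_free_old(void) {
    return sketch_cold_old() + sketch_legacy_old();
}

/* After: helpers only consume the pointer handed to them. */
static int sketch_cold(const SketchEnv* env) {
    return env ? env->use_fast_route : 0;
}

static int sketch_legacy(const SketchEnv* env) {
    return env ? env->use_fast_route : 0;
}

static int sketch_free(void) {
    const SketchEnv* env = sketch_capture();  /* single capture per hot call */
    return sketch_cold(env) + sketch_legacy(env);
}
```

In this model, one call to `sketch_free_old()` evaluates the gate twice, while `sketch_free()` evaluates it once and returns the same result. With the gate off, `env` is `NULL` and both helpers take the fallback branch, which mirrors the "keep `env==NULL` fallback path unchanged" requirement.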
**Phase 19-3c (OPTIONAL)**: propagate `env` into legacy fallback + metadata cache helpers to eliminate the remaining call sites:
- `core/box/tiny_legacy_fallback_box.h`
- `core/box/tiny_metadata_cache_hot_box.h`
- (Already done in Phase 19-3b.) Optional next: pass `env` down from wrapper entry to remove the remaining alloc-side gate.
### 3.4 Files to Modify
1. `core/front/malloc_tiny_fast.h`
- Phase 19-3b: add `_with_env` helper variants or thread `env` through internal helpers.
- Replace the remaining repeated `hakmem_env_snapshot_enabled()` call sites with `env`-based checks.
2. `core/box/hak_wrappers.inc.h`
- Compute `env` once per entry and pass it down (especially for `HAKMEM_FASTLANE_DIRECT` path).
3. (Optional) `core/box/tiny_legacy_fallback_box.h`
- Thread `env` into the legacy fallback helper to eliminate an extra gate check.
4. (Optional) `core/box/tiny_metadata_cache_hot_box.h`
- Same for metadata-cache effective checks.
- Phase 19-3a: UNLIKELY hint removal.
- Phase 19-3b: pass-down `env` to cold + legacy helpers.
2. `core/box/tiny_legacy_fallback_box.h`
- Phase 19-3b: add `_with_env` helper + keep wrapper.
3. `core/box/tiny_metadata_cache_hot_box.h`
- Phase 19-3b: add `_with_env` helper + keep wrapper.
4. (Optional) `core/box/hak_wrappers.inc.h`
- Pass `env` down from wrapper entry (alloc-side; removes one remaining gate).
---
@ -299,9 +305,9 @@ int free_tiny_fast_with_env(void* ptr, const HakmemEnvSnapshot* env);
### 4.1 Boundary Preservation
**L0 (ENV gate)**:
- `HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=0` → `env==NULL` (or not passed down), fallback to existing path
- `HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=1` → `env` is passed down, new path
- Compile-time flag: `#if HAKMEM_ENV_SNAPSHOT_CONSOLIDATION` for complete removal
- `HAKMEM_ENV_SNAPSHOT=0` → `env==NULL` → fallback to per-feature env gates
- `HAKMEM_ENV_SNAPSHOT=1` → `env!=NULL` → snapshot-based checks
- (Optional) A dedicated “pass-down gate” can be introduced for A/B safety, but avoid adding a new hot-branch unless needed.
**L1 (Hot inline)**:
- No algorithmic changes, only ENV check consolidation
@ -333,23 +339,13 @@ int free_tiny_fast_with_env(void* ptr, const HakmemEnvSnapshot* env);
**Runtime rollback**:
```sh
HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=0 # Disable new path
```
**Compile-time rollback**:
```c
#if HAKMEM_ENV_SNAPSHOT_CONSOLIDATION
// New context path
#else
// Old scattered ENV checks (preserved)
#endif
HAKMEM_ENV_SNAPSHOT=0 # Disable snapshot path (falls back to per-feature env gates)
```
**Gradual rollout**:
1. Phase 19-3a: UNLIKELY hint removal (DONE, GO)
2. Phase 19-3b: wrapper pass-down to hot helpers (measure)
3. Phase 19-3c: legacy + metadata pass-down (measure)
4. Graduate: add to `MIXED_TINYV3_C7_SAFE` preset if GO
2. Phase 19-3b: hot helper pass-down (DONE, GO)
3. Phase 19-3c: optional wrapper-entry pass-down (alloc-side; measure)
### 4.4 Observability
@ -366,15 +362,12 @@ typedef struct {
**Perf validation**:
- Before: `perf record` shows `hakmem_env_snapshot_enabled` at ~7%
- After: `hakmem_env_snapshot_enabled` should drop to <1%
- Expected: deep helpers stop calling `hakmem_env_snapshot_enabled()` repeatedly (only wrapper entry remains)
- Expected: deep helpers stop calling `hakmem_env_snapshot_enabled()` repeatedly (single capture per hot call)
**A/B testing**:
```sh
# Baseline (Phase 19-2 state)
HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=0 ./bench_random_mixed_hakmem 200000000 400 1
# Optimized (Phase 19-3)
HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=1 ./bench_random_mixed_hakmem 200000000 400 1
# Recommended: compare baseline vs optimized commits with the same bench script
scripts/run_mixed_10_cleanenv.sh
```
---
@ -501,7 +494,7 @@ HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=1 ./bench_random_mixed_hakmem 200000000 400 1
- [ ] Add `malloc_tiny_fast_ctx()` variant (Phase 19-3a)
- [ ] Add `free_tiny_fast_ctx()` variant (Phase 19-3b)
- [ ] Propagate context to `tiny_legacy_fallback_box.h` (Phase 19-3c)
- [ ] Add ENV gate `HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=0/1`
- [ ] (Optional) Add a dedicated pass-down gate if A/B within a single binary is needed
- [ ] Add stats counters (debug builds)
### 7.3 Testing (Per Phase)
@ -528,13 +521,8 @@ HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=1 ./bench_random_mixed_hakmem 200000000 400 1
**Benchmark suite**:
```sh
# Baseline (Phase 19-2)
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=0 \
scripts/run_mixed_10_cleanenv.sh
# Optimized (Phase 19-3)
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=1 \
scripts/run_mixed_10_cleanenv.sh
# Run the same cleanenv script on baseline vs optimized commits
scripts/run_mixed_10_cleanenv.sh
```
**GO/NO-GO criteria**:
@ -568,7 +556,7 @@ perf stat -e cycles,instructions,branches,branch-misses,L1-icache-load-misses \
**Timeline**: 4-6 hours implementation + 2 hours testing
**Risk**: LOW (isolated to alloc path)
**Rollback**: `HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=0`
**Rollback**: revert Phase 19-3b commit (or set `HAKMEM_ENV_SNAPSHOT=0` to disable snapshot path)
### 8.2 Phase 19-3b: free Path (Week 1)
@ -580,7 +568,7 @@ perf stat -e cycles,instructions,branches,branch-misses,L1-icache-load-misses \
**Timeline**: 4-6 hours implementation + 2 hours testing
**Risk**: LOW-MEDIUM (more call sites than malloc)
**Rollback**: `HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=0`
**Rollback**: revert Phase 19-3b commit (or set `HAKMEM_ENV_SNAPSHOT=0` to disable snapshot path)
### 8.3 Phase 19-3c: Legacy + Metadata (Week 2)
@ -591,7 +579,7 @@ perf stat -e cycles,instructions,branches,branch-misses,L1-icache-load-misses \
**Timeline**: 3-4 hours implementation + 2 hours testing
**Risk**: MEDIUM (touches multiple boxes)
**Rollback**: `HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=0`
**Rollback**: revert Phase 19-3b commit (or set `HAKMEM_ENV_SNAPSHOT=0` to disable snapshot path)
### 8.4 Graduate (Week 2-3)
@ -602,15 +590,15 @@ perf stat -e cycles,instructions,branches,branch-misses,L1-icache-load-misses \
- Perf validation confirms instruction reduction
**Promotion actions**:
1. Add to `MIXED_TINYV3_C7_SAFE` preset: `HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=1`
1. Ensure the `MIXED_TINYV3_C7_SAFE` preset keeps `HAKMEM_ENV_SNAPSHOT=1` (already set)
2. Document in optimization roadmap
3. Update Box Theory index
4. Keep ENV default=0 (opt-in) until production validation
**Rollback strategy**:
- Preset level: Remove from preset, keep code
- Code level: `#if HAKMEM_ENV_SNAPSHOT_CONSOLIDATION` disable at compile time
- Emergency: `HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=0` global default
- Code level: revert the Phase 19-3b commit
- Emergency: set `HAKMEM_ENV_SNAPSHOT=0` (falls back to per-feature env gates)
---


@ -46,14 +46,33 @@ scripts/run_mixed_10_cleanenv.sh
**GO Criteria**: No regression (±1%)
**Status**: DONE ✅ (GO +4.42%)
---
### Phase 19-3b: Pass snapshot down from wrapper entry
### Phase 19-3b: Pass snapshot down (hot helper pass-down)
**Status**: DONE ✅ (GO +2.76% mean / +2.57% median)
Results: `docs/analysis/PHASE19_FASTLANE_INSTRUCTION_REDUCTION_3B_ENV_SNAPSHOT_PASSDOWN_AB_TEST_RESULTS.md`
**Current state**: Each callee calls `hakmem_env_snapshot_enabled()` independently
- 5 calls × 2 TLS reads each = **10 TLS reads/op**
**Proposed**:
**Implementation (landed)**:
- `core/front/malloc_tiny_fast.h`
- Capture `env` once per hot call and pass it down:
- `free_tiny_fast()` / `free_tiny_fast_hot()` capture `env` once
- `free_tiny_fast_cold(..., env)` consumes it
- `tiny_legacy_fallback_free_base_with_env(..., env)` consumes it
- Reuse the same snapshot for alloc route selection:
- `tiny_policy_hot_get_route_with_env(class_idx, env)`
- `core/box/tiny_legacy_fallback_box.h`: add `tiny_legacy_fallback_free_base_with_env(...)`
- `core/box/tiny_metadata_cache_hot_box.h`: add `tiny_policy_hot_get_route_with_env(...)`
**Optional extension (if chasing the last alloc-side gate)**:
Pass `env` down from `core/box/hak_wrappers.inc.h` entry. Keep the invariant: malloc miss must fall through (do not call `malloc_cold()` directly).
**Example (optional)**:
```c
// core/box/hak_wrappers.inc.h (malloc wrapper)
void* malloc(size_t size) {