Phase 19-7: LARSON_FIX TLS Consolidation — NO-GO (-1.34%)
Goal: Eliminate 5 duplicate getenv("HAKMEM_TINY_LARSON_FIX") calls
- Create unified TLS cache box: tiny_larson_fix_tls_box.h
- Replace 5 separate static __thread blocks with single helper
Result: -1.34% throughput (54.55M → 53.82M ops/s)
- Expected: +0.3-0.7%
- Actual: -1.34%
- Decision: NO-GO, reverted immediately
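
For reference, the -1.34% figure follows from the two throughput numbers above. A minimal sketch of the arithmetic, where `relative_change_pct` is a hypothetical helper name used only for illustration and not part of the codebase:

```c
/* Relative change in percent between a baseline and a candidate measurement.
   Hypothetical helper for illustration only; not part of hakmem. */
double relative_change_pct(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
}
```

Calling `relative_change_pct(54.55, 53.82)` gives roughly -1.34, which matches the reported result and sits well below the expected +0.3% to +0.7% range.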
Root cause: The compiler optimizes separate-scope TLS caches better than a shared helper
- Each scope is optimized independently
- The helper's function-call overhead outweighs the savings from removing duplication
- A rare case where duplication is optimal
Key learning: Not all code duplication is inefficient. Per-scope TLS
caching can outperform centralized caching when the compiler can optimize
each scope independently.
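
The two shapes being compared can be sketched roughly as follows. This is an illustration under assumptions, not the reverted code: the function names are hypothetical, the rule that only a leading `1` enables the flag is assumed, and `__thread` is the GCC/Clang extension that the commit message refers to.

```c
#include <stdlib.h>

/* Pattern A (kept): each call site owns its own thread-local cache.
   Function names and the parsing rule are assumptions for illustration. */
int site_one_larson_fix(void) {
    static __thread int cached = -1;          /* -1: not yet read on this thread */
    if (cached < 0) {
        const char *v = getenv("HAKMEM_TINY_LARSON_FIX");
        cached = (v != NULL && v[0] == '1');  /* assumed: "1" enables the fix */
    }
    return cached;
}

/* A second call site repeats the same block with its own cache variable. */
int site_two_larson_fix(void) {
    static __thread int cached = -1;
    if (cached < 0) {
        const char *v = getenv("HAKMEM_TINY_LARSON_FIX");
        cached = (v != NULL && v[0] == '1');
    }
    return cached;
}

/* Pattern B (tried in this phase, then reverted): one shared helper holds
   the only cache, and every call site goes through it. */
int larson_fix_enabled_shared(void) {
    static __thread int cached = -1;
    if (cached < 0) {
        const char *v = getenv("HAKMEM_TINY_LARSON_FIX");
        cached = (v != NULL && v[0] == '1');
    }
    return cached;
}

int site_one_via_helper(void) {
    return larson_fix_enabled_shared();
}
```

In pattern A the check sits directly in the caller's scope, so the compiler can schedule it together with the surrounding code. In pattern B the check lives behind a call boundary unless the helper is inlined, which is one plausible source of the overhead described above.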
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@@ -113,6 +113,13 @@

**Ref**:
- `docs/analysis/PHASE19_FASTLANE_INSTRUCTION_REDUCTION_6C_DUPLICATE_ROUTE_DEDUP_DESIGN.md`
- `docs/analysis/PHASE19_FASTLANE_INSTRUCTION_REDUCTION_6C_DUPLICATE_ROUTE_DEDUP_AB_TEST_RESULTS.md`

**Next**:
- Phase 19-7: LARSON_FIX TLS consolidation (consolidate the duplicate `getenv("HAKMEM_TINY_LARSON_FIX")` calls into a single place)
  - Ref: `docs/analysis/PHASE19_FASTLANE_INSTRUCTION_REDUCTION_7_LARSON_FIX_TLS_CONSOLIDATION_DESIGN.md`
- Phase 20 (proposal): WarmPool slab_idx hint (cut the O(cap) scan on a warm hit)
  - Ref: `docs/analysis/PHASE20_WARM_POOL_SLABIDX_HINT_1_DESIGN.md`

---