Commit Graph

5 Commits

Author SHA1 Message Date
0c0d9c8c0b Unify Unified Cache API to BASE-only pointer type with Phantom typing
Core Changes:
- Modified: core/front/tiny_unified_cache.h
  * API signatures changed to use hak_base_ptr_t (Phantom type)
  * unified_cache_pop() returns hak_base_ptr_t (was void*)
  * unified_cache_push() accepts hak_base_ptr_t base (was void*)
  * unified_cache_pop_or_refill() returns hak_base_ptr_t (was void*)
  * Added #include "../box/ptr_type_box.h" for Phantom types

- Modified: core/front/tiny_unified_cache.c
  * unified_cache_refill() return type changed to hak_base_ptr_t
  * Uses HAK_BASE_FROM_RAW() for wrapping return values
  * Uses HAK_BASE_TO_RAW() for unwrapping parameters
  * Maintains internal void* storage in slots array

- Modified: core/box/tiny_front_cold_box.h
  * Uses hak_base_ptr_t from unified_cache_refill()
  * Uses hak_base_is_null() for NULL checks
  * Maintains tiny_user_offset() for BASE→USER conversion
  * Cold path refill integration updated to Phantom types

- Modified: core/front/malloc_tiny_fast.h
  * Free path wraps BASE pointer with HAK_BASE_FROM_RAW()
  * When pushing to Unified Cache via unified_cache_push()

Design Rationale:
- Unified Cache API now exclusively handles BASE pointers (no USER mixing)
- Phantom types enforce type distinction at compile time (debug mode)
- Zero runtime overhead in Release mode (macros expand to identity)
- Hot paths (tiny_hot_alloc_fast, tiny_hot_free_fast) remain unchanged
- Layout consistency maintained via tiny_user_offset() Box

Validation:
- All 25 Phantom type usage sites verified (25/25 correct)
- HAK_BASE_FROM_RAW(): 5/5 correct wrappings
- HAK_BASE_TO_RAW(): 1/1 correct unwrapping
- hak_base_is_null(): 4/4 correct NULL checks
- Compilation: RELEASE=0 and RELEASE=1 both successful
- Smoke tests: 3/3 passed (simple_alloc, loop 10M, pool_tls)

Type Safety Benefits:
- Prevents USER/BASE pointer confusion at API boundaries
- Compile-time checking in debug builds via Phantom struct
- Zero cost abstraction in release builds
- Clear intent: Unified Cache exclusively stores BASE pointers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 12:20:21 +09:00
f90e261c57 Complete Phase 1.2: Centralize layout definitions in tiny_layout_box.h
Changes:
- Updated ptr_conversion_box.h: Use TINY_HEADER_SIZE instead of hardcoded -1
- Updated tiny_front_hot_box.h: Use tiny_user_offset() for BASE->USER conversion
- Updated tiny_front_cold_box.h: Use tiny_user_offset() for BASE->USER conversion
- Added tiny_layout_box.h includes to both front box headers

Box theory: Layout parameters now isolated in dedicated Box component.
All offset arithmetic centralized - no scattered +1/-1 arithmetic.

Verified: Build succeeds (make clean && make shared -j8)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 17:18:31 +09:00
c2716f5c01 Implement Phase 2: Headerless Allocator Support (Partial)
- Feature: Added HAKMEM_TINY_HEADERLESS toggle (A/B testing)
- Feature: Implemented Headerless layout logic (Offset=0)
- Refactor: Centralized layout definitions in tiny_layout_box.h
- Refactor: Abstracted pointer arithmetic in free path via ptr_conversion_box.h
- Verification: sh8bench passes in Headerless mode (No TLS_SLL_HDR_RESET)
- Known Issue: Regression in Phase 1 mode due to blind pointer conversion logic
2025-12-03 12:11:27 +09:00
6154e7656c 根治修正: unified_cache_refill SEGVAULT + コンパイラ最適化対策
問題:
  - リリース版sh8benchでunified_cache_refill+0x46fでSEGVAULT
  - コンパイラ最適化により、ヘッダー書き込みとtiny_next_read()の
    順序が入れ替わり、破損したポインタをout[]に格納

根本原因:
  - ヘッダー書き込みがtiny_next_read()の後にあった
  - volatile barrierがなく、コンパイラが自由に順序を変更
  - ASan版では最適化が制限されるため問題が隠蔽されていた

修正内容(P1-P3):

P1: unified_cache_refill SEGVAULT修正 (core/front/tiny_unified_cache.c:341-350)
  - ヘッダー書き込みをtiny_next_read()の前に移動
  - __atomic_thread_fence(__ATOMIC_RELEASE)追加
  - コンパイラ最適化による順序入れ替えを防止

P2: 二重書き込み削除 (core/box/tiny_front_cold_box.h:75-82)
  - tiny_region_id_write_header()削除
  - unified_cache_refillが既にヘッダー書き込み済み
  - 不要なメモリ操作を削除して効率化

P3: tiny_next_read()安全性強化 (core/tiny_nextptr.h:73-86)
  - __atomic_thread_fence(__ATOMIC_ACQUIRE)追加
  - メモリ操作の順序を保証

P4: ヘッダー書き込みデフォルトON (core/tiny_region_id.h - ChatGPT修正)
  - g_write_headerのデフォルトを1に変更
  - HAKMEM_TINY_WRITE_HEADER=0で旧挙動に戻せる

テスト結果:
   unified_cache_refill SEGVAULT: 解消(sh8bench実行可能に)
   TLS_SLL_HDR_RESET: まだ発生中(別の根本原因、調査継続)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 09:57:12 +09:00
04186341c1 Phase 4-Step2: Add Hot/Cold Path Box (+7.3% performance)
Implemented Hot/Cold Path separation using Box pattern for Tiny allocations:

Performance Improvement (without PGO):
- Baseline (Phase 26-A):     53.3 M ops/s
- Hot/Cold Box (Phase 4-Step2): 57.2 M ops/s
- Gain: +7.3% (+3.9 M ops/s)

Implementation:
1. core/box/tiny_front_hot_box.h - Ultra-fast hot path (1 branch)
   - Removed range check (caller guarantees valid class_idx)
   - Inline cache hit path with branch prediction hints
   - Debug metrics with zero overhead in Release builds

2. core/box/tiny_front_cold_box.h - Slow cold path (noinline, cold)
   - Refill logic (batch allocation from SuperSlab)
   - Drain logic (batch free to SuperSlab)
   - Error reporting and diagnostics

3. core/front/malloc_tiny_fast.h - Updated to use Hot/Cold Boxes
   - Hot path: tiny_hot_alloc_fast() (1 branch: cache empty check)
   - Cold path: tiny_cold_refill_and_alloc() (noinline, cold attribute)
   - Clear separation improves i-cache locality

Branch Analysis:
- Baseline: 4-5 branches in hot path (range check + cache check + refill logic mixed)
- Hot/Cold Box: 1 branch in hot path (cache empty check only)
- Reduction: 3-4 branches eliminated from hot path

Design Principles (Box Pattern):
 Single Responsibility: Hot path = cache hit only, Cold path = refill/errors
 Clear Contract: Hot returns NULL on miss, Cold handles miss
 Observable: Debug metrics (TINY_HOT_METRICS_*) gated by NDEBUG
 Safe: Branch prediction hints (TINY_HOT_LIKELY/UNLIKELY)
 Testable: Isolated hot/cold paths, easy A/B testing

PGO Status:
- Temporarily disabled (build issues with __gcov_merge_time_profile)
- Will re-enable PGO in future commit after resolving gcc/lto issues
- Current benchmarks are without PGO (fair A/B comparison)

Other Changes:
- .gitignore: Added *.d files (dependency files, auto-generated)
- Makefile: PGO targets temporarily disabled (show informational message)
- build_pgo.sh: Temporarily disabled (show "PGO paused" message)

Next: Phase 4-Step3 (Front Config Box, target +5-8%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 11:58:37 +09:00