Commit Graph

1 Commits

Author SHA1 Message Date
13e42b3ce6 Tiny: classify_ptr optimization via header-based fast path
Implemented header-based classification to reduce classify_ptr overhead
from 3.74% (registry lookup: 50-100 cycles) to 2-5 cycles (header read).

Changes:
- core/box/front_gate_classifier.c: Add header-based fast path
  - Step 1: Read header at ptr-1 (same-page safety check)
  - Step 2: Check magic byte (0xa0=Tiny, 0xb0=Pool TLS)
  - Step 3: Fall back to registry lookup if needed
- TINY_PERF_PROFILE_EXTENDED.md: Extended perf analysis (1M iterations)

Results (100K iterations, 3-run average):
- 256B: 7.68M → 8.66M ops/s (+12.8%) 
- 128B: 8.76M → 8.08M ops/s (-7.8%) ⚠️

Key Findings:
- classify_ptr overhead reduced (3.74% → estimated ~2%)
- 256B shows clear improvement
- 128B regression likely due to measurement variance or increased
  header read overhead (needs further investigation)

Design:
- Reuses existing magic byte infrastructure (0xa0/0xb0)
- Maintains safety with same-page boundary check
- Preserves fallback to registry for edge cases
- Zero changes to allocation/free paths (pure classification opt)

Performance Analysis:
- Fast path: 2-5 cycles (L1 hit, direct header read)
- Slow path: 50-100 cycles (registry lookup, unchanged)
- Expected fast path hit rate: >99% (most allocations on-page)

Next Steps:
- Phase B: TinyFrontC23Box for C2/C3 dedicated fast path
- Target: 8-9M → 15-20M ops/s

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 18:20:35 +09:00