Phase 6.16 Update (2025-10-23)

What changed (summary)

  • Tiny fast path simplified and sped up:
    • Removed owner pointer from TLS magazine items (smaller push/pop).
    • Added sampled counter updates for alloc/free (ENV: HAKMEM_TINY_COUNT_SAMPLE, default 1/256).
  • Mid overhead trimmed:
    • hits/misses/frees now updated with sampling (ENV: HAKMEM_POOL_COUNT_SAMPLE, default 1/256).
    • Extra metrics exposed for trylock attempts/success and ring underflow; learner writes CSV lines M,<ts>,<try>,<succ>,<under>,<rate> to HAKMEM_LOG_FILE.
  • Mid page-descriptor registry (baseline):
    • 64KiB page → {class_idx, owner_tid} mini hash.
    • TLS-owned pages registered with owner_tid; refill pages registered with owner_tid=0.
    • Free path consults descriptor when header is light (owner not set).
  • Header modes (Mid): HAKMEM_HDR_LIGHT now supports 0 (full), 1 (minimal), 2 (skip writes/validation; experimental).
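
The page-descriptor registry described above can be pictured as a small open-addressed hash keyed by the 64KiB-aligned page base. The following is a minimal single-threaded sketch, not hakmem's actual code: the names `page_desc_t`, `reg_insert`, `reg_lookup`, the table size `REG_SLOTS`, and the hash constant are all hypothetical, and a real registry would need synchronization and removal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 16u                                   /* 64KiB pages */
#define PAGE_MASK  (((uintptr_t)1 << PAGE_SHIFT) - 1)
#define REG_SLOTS  1024u                                 /* hypothetical size, power of two */

typedef struct {
    uintptr_t page_base;   /* 0 marks an empty slot */
    uint8_t   class_idx;   /* size class of every block in the page */
    uint32_t  owner_tid;   /* 0 = no owner (page came from refill) */
} page_desc_t;

static page_desc_t g_reg[REG_SLOTS];

/* Multiplicative hash of the page number; the constant is illustrative. */
static size_t reg_hash(uintptr_t base) {
    uint64_t x = (uint64_t)(base >> PAGE_SHIFT) * 0x9E3779B97F4A7C15ull;
    return (size_t)(x >> 32) & (REG_SLOTS - 1u);
}

/* Register (or update) the descriptor for the page containing `page`.
 * Returns 0 on success, -1 if the table is full. */
static int reg_insert(void *page, uint8_t class_idx, uint32_t owner_tid) {
    uintptr_t base = (uintptr_t)page & ~PAGE_MASK;
    assert(base != 0);                      /* 0 is reserved for "empty" */
    size_t h = reg_hash(base);
    for (size_t i = 0; i < REG_SLOTS; i++) {
        page_desc_t *d = &g_reg[(h + i) & (REG_SLOTS - 1u)];
        if (d->page_base == 0 || d->page_base == base) {
            d->page_base = base;
            d->class_idx = class_idx;
            d->owner_tid = owner_tid;
            return 0;
        }
    }
    return -1;
}

/* Find the descriptor for any pointer inside a registered page, or NULL.
 * This is what a free path could consult when the header carries no owner. */
static const page_desc_t *reg_lookup(const void *ptr) {
    uintptr_t base = (uintptr_t)ptr & ~PAGE_MASK;
    if (base == 0) return NULL;
    size_t h = reg_hash(base);
    for (size_t i = 0; i < REG_SLOTS; i++) {
        const page_desc_t *d = &g_reg[(h + i) & (REG_SLOTS - 1u)];
        if (d->page_base == base) return d;
        if (d->page_base == 0) return NULL; /* hit an empty slot: not registered */
    }
    return NULL;
}
```

With this shape, a free of any interior pointer masks off the low 16 bits and probes the table, so the block header does not need to store the owner.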

Key ENV additions

  • HAKMEM_TINY_COUNT_SAMPLE, HAKMEM_POOL_COUNT_SAMPLE — 2^N sampling for hot-path counters.
  • HAKMEM_HDR_LIGHT=0|1|2 — header write policy for Mid.
  • Mid TLS two-tier knobs: HAKMEM_TRYLOCK_PROBES, HAKMEM_RING_RETURN_DIV, HAKMEM_TLS_LO_MAX, HAKMEM_POOL_TLS_RING, HAKMEM_SHARD_MIX.
  • Tiny wrapper toggles: HAKMEM_WRAP_TINY=1, HAKMEM_WRAP_TINY_REFILL=1.
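
To illustrate the 2^N sampling idea behind the two COUNT_SAMPLE knobs, here is a minimal sketch. It assumes the variable holds the exponent N (so the default of 1/256 corresponds to N=8), which this note does not state explicitly. The type and function names are hypothetical, and a real counter shared between threads would need atomics.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t mask;      /* 2^N - 1 */
    uint32_t tick;      /* cheap per-thread event counter */
    uint64_t estimate;  /* scaled count, touched once per 2^N events */
} sampled_counter_t;

/* Read the exponent N from the environment; fall back to `dflt` when the
 * variable is unset, malformed, or unreasonably large. */
static uint32_t sample_shift_from_env(const char *name, uint32_t dflt) {
    const char *s = getenv(name);
    if (s == NULL || *s == '\0') return dflt;
    char *end = NULL;
    unsigned long v = strtoul(s, &end, 10);
    if (*end != '\0' || v > 16) return dflt;
    return (uint32_t)v;
}

static void sc_init(sampled_counter_t *c, uint32_t shift) {
    assert(shift <= 16);
    c->mask = (1u << shift) - 1u;
    c->tick = 0;
    c->estimate = 0;
}

/* Hot path: one increment and one mask test. Only every 2^N-th event pays
 * for the statistics update, which is credited with 2^N events at once. */
static inline void sc_hit(sampled_counter_t *c) {
    if ((++c->tick & c->mask) == 0)
        c->estimate += (uint64_t)c->mask + 1u;
}
```

A caller would initialize once with something like `sc_init(&c, sample_shift_from_env("HAKMEM_TINY_COUNT_SAMPLE", 8))` and call `sc_hit(&c)` on each alloc or free. The estimate lags the true count by at most 2^N - 1 events.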

Quick A/B results (10s, larson, BURST)

  • Tiny 8-64B
    • 1T: hakmem 17.93M (prev ~14.6M) / mimalloc ~34.3M / system ~37.4M
    • 4T: hakmem 56.10M (prev ~30.9M) / mimalloc ~52.0M / system ~82.3M
    • Note: hakmem > mimalloc in 4T here due to TLS/mag simplifications.
  • Mid 2-32KiB
    • 1T: hakmem ~4.5-4.8M vs mimalloc ~11-14.8M (needs TC + headerless hot path)
    • 4T: hakmem ~13.5-15.0M vs mimalloc ~29-30M

Logs

  • Tiny: docs/benchmarks/20251023_035642_TINY_AB/summary.txt
  • Mid A/B (fast): docs/benchmarks/20251023_033008_AB_FAST_MID/summary.txt
  • Head-to-head suites: docs/benchmarks/20251023_033358_HEAD2HEAD/summary.txt, ..._HEAD2HEAD_POST, ..._HDR2

Next steps (recommended)

  1. Mid Transfer Cache (minimal): (class,thread) MPSC + small budget drain in alloc; remove central freelist round trips on cross-thread frees.
  2. Page-descriptor routing: hold class/owner in descriptor, make Mid headerless on hot path (keep assertions in debug).
  3. Tune TLS parameters (ring cap 8/16, LIFO max 256/512) via scripts/ab_fast_mid.sh.
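
As a rough illustration of step 1, the sketch below shows one way an MPSC slot with a budgeted drain could look in C11: remote threads push freed blocks with a CAS loop, and the owning thread takes the whole chain with a single exchange and hands out at most `budget` blocks per call. This is an assumed design, not the planned implementation. The names `tc_slot_t`, `tc_push`, and `tc_drain` are hypothetical, and one slot per (class, thread) pair is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* A freed block is reused as its own list node. */
typedef struct tc_node { struct tc_node *next; } tc_node_t;

typedef struct {
    _Atomic(tc_node_t *) head;  /* shared: any thread may push here */
    tc_node_t *local;           /* owner-only: leftover from the last drain */
} tc_slot_t;

static void tc_init(tc_slot_t *s) {
    atomic_init(&s->head, NULL);
    s->local = NULL;
}

/* Producer side (cross-thread free): lock-free LIFO push. */
static void tc_push(tc_slot_t *s, void *block) {
    tc_node_t *n = (tc_node_t *)block;
    tc_node_t *old = atomic_load_explicit(&s->head, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &s->head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Consumer side (owner's alloc path): pop up to `budget` blocks into `out`.
 * The shared list is detached in one exchange, so the single consumer never
 * races with producers on individual nodes. */
static size_t tc_drain(tc_slot_t *s, void **out, size_t budget) {
    size_t n = 0;
    while (n < budget) {
        if (s->local == NULL) {
            s->local = atomic_exchange_explicit(&s->head, NULL,
                                                memory_order_acquire);
            if (s->local == NULL) break;    /* nothing pending */
        }
        tc_node_t *node = s->local;
        s->local = node->next;
        out[n++] = node;
    }
    return n;
}
```

Because the consumer detaches the entire chain at once instead of popping node by node, the usual ABA hazard of a lock-free stack pop does not arise, and a cross-thread free costs one CAS rather than a trip through the central freelist.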