TINY 1KB (class7) SEGV Triage Plan

Scope

  • Reproducible SEGV on fixed-size 1KB bench: ./bench_fixed_size_hakmem 200000 1024 128
  • The SEGV persists with Direct-FC OFF, so the fault is likely in the non-direct P0 path or the legacy refill path.
  • Goal: isolate failing path, capture backtrace, prove root cause, and patch with minimal deltas.

Quick Repro Matrix

  • Release (baseline):
    • ./build.sh release bench_fixed_size_hakmem
    • ./bench_fixed_size_hakmem 200000 1024 128 → SEGV
  • Disable P0 (all classes):
    • HAKMEM_TINY_P0_DISABLE=1 ./bench_fixed_size_hakmem 200000 1024 128 → check whether the SEGV persists
  • Disable remote drain:
    • HAKMEM_TINY_P0_NO_DRAIN=1 ./bench_fixed_size_hakmem 200000 1024 128 → check whether the SEGV persists
  • Assume single-threaded (1T; disables the remote side-table):
    • HAKMEM_TINY_ASSUME_1T=1 ./bench_fixed_size_hakmem 200000 1024 128 → check whether the SEGV persists
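
A small C harness can drive the matrix above so that each case is recorded the same way. This is a minimal sketch that assumes a POSIX system and a bench binary in the current directory. `run_case` and `run_repro_matrix` are hypothetical helpers and are not part of hakmem. `run_case` sets at most one extra variable in the child process and reports whether the child died from a signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv once, optionally with one extra environment variable set in the child.
 * Returns the terminating signal (e.g. SIGSEGV), 0 for exit status 0,
 * or -1 for a non-zero exit or a fork/exec/wait failure. */
static int run_case(const char *env_name, const char *env_val, char *const argv[]) {
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (env_name != NULL && setenv(env_name, env_val, 1) != 0) _exit(127);
        execvp(argv[0], argv);
        _exit(127);  /* only reached if exec failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (WIFSIGNALED(status)) return WTERMSIG(status);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Quick Repro Matrix: the baseline plus one env switch per case. */
static void run_repro_matrix(void) {
    char *bench[] = {"./bench_fixed_size_hakmem", "200000", "1024", "128", NULL};
    static const char *const cases[][2] = {
        {NULL, NULL},  /* release baseline */
        {"HAKMEM_TINY_P0_DISABLE", "1"},
        {"HAKMEM_TINY_P0_NO_DRAIN", "1"},
        {"HAKMEM_TINY_ASSUME_1T", "1"},
    };
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        int r = run_case(cases[i][0], cases[i][1], bench);
        printf("%-24s -> %s\n", cases[i][0] ? cases[i][0] : "(baseline)",
               r == SIGSEGV ? "SEGV" : r == 0 ? "ok" : r > 0 ? "other signal" : "failed");
        fflush(stdout);
    }
}
```

Call `run_repro_matrix()` from a throwaway `main` to get one line per case. If a row still shows SEGV, the switch in that row is probably not what gates the crash.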

Debug Build + Guards

  1. Build debug flavor
    • ./build.sh debug bench_fixed_size_hakmem
  2. Strong safety/guards
    • export HAKMEM_TINY_SAFE_FREE_STRICT=1
    • export HAKMEM_TINY_DEBUG_REMOTE_GUARD=1
    • export HAKMEM_INVALID_FREE_LOG=1
    • export HAKMEM_TINY_RF_FORCE_NOTIFY=1
  3. Run under gdb
    • gdb --args ./bench_fixed_size_hakmem 200000 1024 128
    • (gdb) run
    • On crash, run (gdb) bt and (gdb) frame 0, then inspect state: (gdb) p/x *meta, p tls->slab_idx, p tls->ss, p meta->used, p meta->carved, p meta->capacity

Hypotheses (ranked)

  1. Capacity/stride mismatch in class7 carve
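    • class7 uses stride=1024 (no 1-byte header). Any code that computes bs = class_size + 1 for class7 will overstep: if capacity is derived from a 1024-byte stride but blocks are carved at 1025 bytes, the last blocks can run past the slab's usable end.
    • Check that superslab_init_slab() computes capacity with the same stride that every linear carve helper uses.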
  2. TLS slab switch with stale pointers (already fixed for P0 direct path; check legacy/P0-general)
    • After superslab_refill(), ensure that tls = &g_tls_slabs[c] and meta = tls->meta are reloaded before touching counters or doing a linear carve.
  3. Remote drain corrupts freelist
    • Verify that the sentinel is cleared and that the drain happens before the freelist pop; check that the class7 path uses the same ordering.
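
Plain arithmetic is enough to check hypothesis 1. The sketch below is illustrative only. `tiny_stride` mirrors the stride formula given under Instrumentation, and `carve_overrun` is a hypothetical helper. The 64 KiB usable size in the usage note is an assumed figure, not hakmem's actual slab size.

```c
#include <assert.h>
#include <stddef.h>

/* Stride rule from this plan: 1-byte header for classes 0-6, none for class7. */
static size_t tiny_stride(int class_idx, size_t class_size) {
    return class_size + (class_idx != 7 ? 1 : 0);
}

/* Bytes by which the last carved block overruns the usable region when
 * capacity is computed with cap_stride but blocks are carved with carve_stride.
 * Returns 0 when every block fits. */
static size_t carve_overrun(size_t usable, size_t cap_stride, size_t carve_stride) {
    assert(cap_stride > 0);
    size_t capacity = usable / cap_stride;
    size_t end = capacity * carve_stride;  /* one past the last carved byte */
    return end > usable ? end - usable : 0;
}
```

Take an assumed 64 KiB usable region. A 1024-byte stride gives a capacity of 64 blocks. Carving those 64 blocks at 1025 bytes ends 64 bytes past the region, which is the kind of overstep that could fault on the last carve or corrupt the neighboring slab.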

Files to Inspect

  • core/tiny_superslab_alloc.inc.h (superslab_refill, adopt_bind_if_safe, stride/capacity)
  • core/hakmem_tiny_refill.inc.h (legacy SLL refill, carve/pop ordering, bounds checks)
  • core/hakmem_tiny_refill_p0.inc.h (P0 general path; C7 is currently guarded OFF for direct-FC, so confirm the P0 batch path is not entered for C7)
  • core/superslab/superslab_inline.h (remote drain, sentinel guard)

Instrumentation to Add (debug-only)

  • In superslab_init_slab(ss, idx, class_size, tid):
    • Compute stride = class_size + (class_idx != 7 ? 1 : 0); assert meta->capacity == usable/stride.
  • In linear carve path (legacy + P0-general):
    • Before the write: assert meta->carved < meta->capacity; compute the block address ptr and assert ptr + stride <= slab_base + usable (the whole block must fit, not just its first byte).
  • After superslab_refill() in any loop: rebind tls/meta unconditionally.
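
The debug-only checks above might look like the following sketch, which rests on several assumptions. `TinySlabMetaSketch` borrows only the field names from the gdb session (`used`, `carved`, `capacity`). Its layout and the carve formula `slab_base + carved * stride` are assumed. All helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the slab metadata; layout is assumed. */
typedef struct {
    uint16_t used;      /* blocks handed out */
    uint16_t carved;    /* blocks linearly carved so far */
    uint16_t capacity;  /* blocks that fit in the usable region */
} TinySlabMetaSketch;

/* Stride rule from this plan: 1-byte header for classes 0-6, none for class7. */
static size_t sketch_stride(int class_idx, size_t class_size) {
    return class_size + (class_idx != 7 ? 1 : 0);
}

/* Debug check for superslab_init_slab(): capacity must equal usable / stride. */
static void debug_check_capacity(const TinySlabMetaSketch *meta, int class_idx,
                                 size_t class_size, size_t usable) {
    assert(meta->capacity == usable / sketch_stride(class_idx, class_size));
}

/* Debug-checked linear carve: the whole block [off, off + stride) must lie
 * inside the usable region, not just its first byte. */
static void *debug_linear_carve(TinySlabMetaSketch *meta, uint8_t *slab_base,
                                size_t usable, size_t stride) {
    assert(meta->carved < meta->capacity);
    size_t off = (size_t)meta->carved * stride;
    assert(off + stride <= usable);
    meta->carved++;
    meta->used++;
    return slab_base + off;
}
```

The bound is checked with offsets rather than by forming an out-of-range pointer, so the check itself stays well-defined. Like the plan's other instrumentation, these asserts compile away under NDEBUG.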

Bisect Switches

  • Kill P0 entirely: HAKMEM_TINY_P0_DISABLE=1
  • Skip remote drain: HAKMEM_TINY_P0_NO_DRAIN=1
  • Assume ST mode: HAKMEM_TINY_ASSUME_1T=1
  • Disable simplified refills (if applicable): HAKMEM_TINY_SIMPLE_REFILL=0 (add if not present)
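
If HAKMEM_TINY_SIMPLE_REFILL has to be added, one option is to read all the switches once at init into a plain struct. This is a minimal sketch that assumes "0" means off and any other non-empty value means on. `env_flag`, `TinyBisectSwitches`, and `tiny_read_switches` are hypothetical names. The default-on value for simplified refills is also an assumption, based on the switch being documented as a way to disable them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: unset or empty -> default_on; "0" -> off; anything else -> on. */
static int env_flag(const char *name, int default_on) {
    assert(name != NULL);
    const char *v = getenv(name);
    if (v == NULL || v[0] == '\0') return default_on;
    return strcmp(v, "0") != 0;
}

typedef struct {
    int p0_disable;     /* HAKMEM_TINY_P0_DISABLE */
    int p0_no_drain;    /* HAKMEM_TINY_P0_NO_DRAIN */
    int assume_1t;      /* HAKMEM_TINY_ASSUME_1T */
    int simple_refill;  /* HAKMEM_TINY_SIMPLE_REFILL (proposed) */
} TinyBisectSwitches;

/* Read all bisect switches once, e.g. at allocator init before threads start. */
static TinyBisectSwitches tiny_read_switches(void) {
    TinyBisectSwitches s;
    s.p0_disable    = env_flag("HAKMEM_TINY_P0_DISABLE", 0);
    s.p0_no_drain   = env_flag("HAKMEM_TINY_P0_NO_DRAIN", 0);
    s.assume_1t     = env_flag("HAKMEM_TINY_ASSUME_1T", 0);
    s.simple_refill = env_flag("HAKMEM_TINY_SIMPLE_REFILL", 1);  /* assumed default: on */
    return s;
}
```

Reading the switches once at init keeps the hot path free of getenv calls and avoids racing on lazily cached flags.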

Patch Strategy (expected minimal fix)

  1. Make class7 stride consistently 1024 in all carve paths (no +1 header). Audit bs computations.
  2. Ensure tls/meta rebind after every superslab_refill() in non-direct paths.
  3. Enforce drain-before-pop ordering and clear the sentinel.
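
Fixes 2 and 3 amount to an ordering discipline in the refill loop. The sketch below is single-threaded and hypothetical:

  • `MetaSketch`, `g_tls_sketch`, `g_spare_sketch`, and `sketch_refill` stand in for the real TLS slab, metadata, and superslab_refill().
  • The remote list here is a plain linked list. The real remote queue is shared with other threads and presumably needs atomic operations.
  • Sentinel handling is left out because its encoding is not described in this plan.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NUM_CLASSES 8

typedef struct Block { struct Block *next; } Block;

/* Hypothetical per-slab metadata. */
typedef struct {
    Block *freelist;     /* blocks freed by the owning thread */
    Block *remote_head;  /* blocks freed by other threads, awaiting drain */
    unsigned used;       /* blocks handed out */
} MetaSketch;

/* Hypothetical per-class TLS binding. */
typedef struct { MetaSketch *meta; } TlsSlabSketch;

static TlsSlabSketch g_tls_sketch[SKETCH_NUM_CLASSES];
/* Stand-in for the superslab pool: at most one spare slab per class. */
static MetaSketch *g_spare_sketch[SKETCH_NUM_CLASSES];

/* Stand-in for superslab_refill(): rebinds the TLS slot to a new slab. */
static int sketch_refill(int c) {
    if (g_spare_sketch[c] == NULL) return 0;
    g_tls_sketch[c].meta = g_spare_sketch[c];
    g_spare_sketch[c] = NULL;
    return 1;
}

/* Move every remotely freed block onto the local freelist. */
static void sketch_drain_remote(MetaSketch *meta) {
    while (meta->remote_head != NULL) {
        Block *b = meta->remote_head;
        meta->remote_head = b->next;
        b->next = meta->freelist;
        meta->freelist = b;
    }
}

static void *pop_with_rebind(int c) {
    assert(c >= 0 && c < SKETCH_NUM_CLASSES);
    TlsSlabSketch *tls = &g_tls_sketch[c];
    MetaSketch *meta = tls->meta;
    for (;;) {
        if (meta != NULL) {
            sketch_drain_remote(meta);      /* drain first ... */
            if (meta->freelist != NULL) {   /* ... then pop */
                Block *b = meta->freelist;
                meta->freelist = b->next;
                meta->used++;
                return b;
            }
        }
        if (!sketch_refill(c)) return NULL;
        /* Refill may have swapped the slab: reload tls and meta unconditionally. */
        tls = &g_tls_sketch[c];
        meta = tls->meta;
    }
}
```

The reload after refill is the important step. Holding on to the pre-refill `meta` would pop from, or update counters on, a slab that is no longer bound to this thread's class slot.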

Acceptance Criteria

  • ./bench_fixed_size_hakmem 200000 1024 128 passes 3/3 without SEGV.
  • Debug counters show active_delta == taken (no mismatch).
  • No invalid-free logs under STRICT mode.

Notes

  • We already defaulted C7 DirectFC to OFF and guarded P0 entry for C7 unless explicitly enabled (HAKMEM_TINY_P0_C7_ENABLE=1).
  • Focus on legacy/P0-general carve paths for C7.