# Headerless Stability Debug Instructions (Root-Cause / Fail-Fast)

Quality bar for this playbook:

| Metric | Score | Notes |
| --- | --- | --- |
| Coverage | 9/10 | Seven root-cause candidates + multiple probes |
| Actionability | 9/10 | Copy/pasteable bash + gdb/asan commands |
| Time budget | 10-22h | Phased so we can stop after each milestone |
| Expected success | 85-90% | Parallel probes + bisect safety net |

Goal (Definition of Done)
- Reproduce, isolate, and permanently fix the headerless instability with a verified regression test.
- Fix must be A/B switchable and observable (Box Theory: isolate boxes, single boundary, backout flag).

Scope and signals
- Both Headerless OFF and Headerless ON crash: this suggests a shared path, not just the hint box.
- Observed symptoms: TLS_SLL integrity failures, invalid free() pointers, hangs in sh8bench/cfrac.

Box Theory anchors (work inside clear boxes, fail-fast, reversible)
- Box 2: Remote queue push/drain (no owner/publish side effects).
- Box 3: Ownership CAS (only at bind boundary).
- Box 4: Publish/Adopt boundary (single drain->bind->owner acquire point).
- Hint box: tls_ss_hint cache (guarded by `HAKMEM_TINY_SS_TLS_HINT`).
- Backouts: `HAKMEM_TINY_HEADERLESS`, `HAKMEM_TINY_SS_TLS_HINT`, `HAKMEM_TINY_SS_ADOPT`, `HAKMEM_TINY_RF_FORCE_NOTIFY`.

---

## Step-by-Step Flow

### 0) Pre-flight (15 min)
- `ulimit -c unlimited`; ensure the tree is clean enough to bisect (`git status -sb`).
- Use single-thread first: `export HAKMEM_TINY_THREADS=1`.
- Disable learn/ACE noise: `export HAKMEM_ACE_ENABLED=0 HAKMEM_LEARN=0`.
- Keep artifacts: `mkdir -p debug_artifacts/headerless`.

### 1) Test Case 1 — Headerless OFF (control)
```bash
cd /mnt/workdisk/public_share/hakmem
make clean && make shared -j8
LD_PRELOAD=./libhakmem.so timeout 30 ./mimalloc-bench/out/bench/sh8bench \
  2>&1 | tee debug_artifacts/headerless/tc1_off.log | tail -40
```
Expected: completes with "Total elapsed time".
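Every TC run ends in one of the report's verdicts (pass, hang, crash), and with `timeout` in front of the benchmark the exit status already encodes which one. A minimal sketch of that mapping, assuming GNU coreutils `timeout` without `--preserve-status`; the enum and function names are hypothetical helpers, not part of hakmem:

```c
#include <assert.h>

/* Verdicts from the report template (pass / hang / crash), plus "fail"
   for a plain nonzero exit that was not caused by a signal. */
typedef enum { TC_PASS, TC_HANG, TC_CRASH, TC_FAIL } tc_verdict;

/* Map the exit status of `timeout N ./sh8bench` to a verdict.
   Assumes GNU timeout without --preserve-status:
   124 = time limit hit, 128+N = command killed by signal N. */
static tc_verdict classify_timeout_rc(int rc, int *signo) {
    assert(rc >= 0 && rc <= 255);   /* shell exit statuses are 8-bit */
    *signo = 0;
    if (rc == 0)
        return TC_PASS;
    if (rc == 124)
        return TC_HANG;
    if (rc > 128 && rc <= 128 + 64) {
        *signo = rc - 128;          /* on Linux: 11 = SIGSEGV, 6 = SIGABRT */
        return TC_CRASH;
    }
    return TC_FAIL;                 /* nonzero exit, or 125-127 from timeout itself */
}

static const char *tc_verdict_name(tc_verdict v) {
    switch (v) {
    case TC_PASS:  return "pass";
    case TC_HANG:  return "hang";
    case TC_CRASH: return "crash";
    default:       return "fail";
    }
}
```

In bash, take the status from `${PIPESTATUS[0]}` immediately after the TC pipeline; plain `$?` reports the status of `tail`. Under the ASan build a detected error exits with status 1 by default, so `fail` there usually means an ASan report rather than a benchmark error.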
If TC1 crashes: the base path (non-headerless) is already broken -> focus on shared free/registry first.

### 2) Test Case 2 — Headerless ON, hint OFF
```bash
make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=0"
LD_PRELOAD=./libhakmem.so timeout 30 ./mimalloc-bench/out/bench/sh8bench \
  2>&1 | tee debug_artifacts/headerless/tc2_hdrless_nohint.log | tail -40
```
The outcome tells us whether the headerless core path (without the hint) is already unstable.

### 3) Test Case 3 — Headerless ON, hint ON (Phase 1 path)
```bash
make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
LD_PRELOAD=./libhakmem.so timeout 30 ./mimalloc-bench/out/bench/sh8bench \
  2>&1 | tee debug_artifacts/headerless/tc3_hdrless_hint.log | tail -40
```
If TC2 passes and TC3 fails, suspect the hint cache / adopt boundary; otherwise suspect a shared box.

### 4) ASan pass (pinpoint corruption early)
```bash
make clean && make asan-shared-alloc -j8 \
  EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
# Write the full log first: piping tee into head lets head's early exit
# kill tee (and then the benchmark) with SIGPIPE, truncating the log.
LD_PRELOAD=./libhakmem_asan.so timeout 20 ./mimalloc-bench/out/bench/sh8bench \
  > debug_artifacts/headerless/asan_hdrless.log 2>&1
head -200 debug_artifacts/headerless/asan_hdrless.log
```
If ASan is noisy, rebuild with `-DHAKMEM_TINY_SS_TLS_HINT=0` and rerun to see whether the corruption follows the hint box. If the loader aborts with "ASan runtime does not come first in initial library list", preload the runtime ahead of the library (GCC build shown): `LD_PRELOAD="$(gcc -print-file-name=libasan.so):./libhakmem_asan.so"`.

### 5) GDB capture (first crash)
```bash
make clean && make shared -j8 EXTRA_CFLAGS="-g -O1 -DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
gdb --args ./mimalloc-bench/out/bench/sh8bench
(gdb) set environment LD_PRELOAD ./libhakmem.so
(gdb) run
(gdb) bt
(gdb) frame 0
(gdb) info locals
(gdb) x/4gx ptr    # replace ptr with the crashing pointer
```
Save the output to `debug_artifacts/headerless/gdb_bt.txt`. For a non-interactive capture: `gdb -batch -ex 'set environment LD_PRELOAD ./libhakmem.so' -ex run -ex bt -ex 'info locals' --args ./mimalloc-bench/out/bench/sh8bench > debug_artifacts/headerless/gdb_bt.txt 2>&1`.
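If the crash is timing-sensitive and stops reproducing under gdb, a debug-only fatal-signal hook can still capture the first crash's backtrace from a plain `LD_PRELOAD` run. A minimal sketch using glibc's `backtrace()`/`backtrace_symbols_fd()`; the hook and its names are hypothetical, not an existing hakmem feature:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical debug-only hook. On the first SIGSEGV/SIGBUS/SIGABRT it
   writes a raw backtrace to stderr, then lets the default action run so
   a core file can still be produced. */
static void hdrless_crash_handler(int sig) {
    static const char msg[] = "[HDRLESS_CRASH] first fatal signal, backtrace:\n";
    void *frames[64];
    int n = backtrace(frames, 64);
    ssize_t w = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)w;
    backtrace_symbols_fd(frames, n, STDERR_FILENO);  /* writes directly, no malloc */
    /* SA_RESETHAND already restored SIG_DFL. The handler blocks sig, so the
       re-raised signal stays pending until we return, then terminates. */
    raise(sig);
}

/* Returns 0 on success, -1 if any sigaction() call fails. */
static int hdrless_install_crash_hook(void) {
    void *warm[1];
    backtrace(warm, 1);  /* load the unwinder now, not inside the handler */

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = hdrless_crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;  /* one-shot: a second fault takes the default action */

    const int sigs[] = { SIGSEGV, SIGBUS, SIGABRT };
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

Install it once, for example from a constructor compiled only under `DEBUG_HDRLESS`. Frames inside `libhakmem.so` without exported symbols print as `libhakmem.so(+0x...)`; resolve those offsets with `addr2line -e libhakmem.so`. The warm-up call matters because the first `backtrace()` call may allocate, which is risky inside a handler when the allocator itself is what crashed.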
### 6) Git bisect (only after TC1 result is known)
```bash
# Per-step check for `git bisect run`: exit 0 = good, 1 = bad, 125 = skip (build failed).
# Kept outside the work tree so checkouts during the bisect do not touch it.
cat > /tmp/hakmem_bisect_test.sh <<'EOF'
#!/usr/bin/env bash
make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1" || exit 125
LD_PRELOAD=./libhakmem.so timeout 15 ./mimalloc-bench/out/bench/sh8bench && exit 0 || exit 1
EOF
chmod +x /tmp/hakmem_bisect_test.sh

git bisect start
git bisect bad HEAD
git bisect good "$GOOD_COMMIT"   # e.g. GOOD_COMMIT=f3f75ba3d^ (just before f3f75ba3d), if that was stable
git bisect run /tmp/hakmem_bisect_test.sh
```
Record every verdict with `git bisect log > debug_artifacts/headerless/bisect_log.txt`, then reset with `git bisect reset`.

---

## Root-Cause Candidates (7) and Probes

### 1) TLS hint cache stale/dangling (Box: hint)
- Symptom: free() uses a cached ss that was recycled; remote-dangling or wrong class.
- Probe: log generation vs pointer range.
  ```c
  fprintf(stderr, "[HINT_LOOKUP] ptr=%p ss=%p gen=%llu magic=%llx\n",
          ptr, (void *)ss,
          ss ? (unsigned long long)ss->generation : 0ULL,
          ss ? (unsigned long long)ss->magic : 0ULL);
  ```
- A/B: `HAKMEM_TINY_SS_TLS_HINT=0` should fully remove this path.

### 2) TLS SLL normalize mismatch (Box: TLS SLL)
- Symptom: a headerless ptr hits a queue that expects a header offset.
- Probe: in `core/box/tls_sll_box.h` around normalize/mismatch detection, log once:
  ```c
  fprintf(stderr, "[TLS_SLL_MISMATCH] ptr=%p has_hdr=%d expect_hdr=%d q=%s\n",
          ptr, actual_has_header, expected_has_header, queue_name);
  ```
- Check that `TLS_SLL_NORMALIZE_USERPTR/RAWPTR` is invoked at every push/pop boundary.

### 3) SuperSlab registry stale or race (Box: registry boundary)
- Symptom: registry returns a freed slab; hint and registry disagree.
- Probe: add a generation/epoch to TinySuperSlab and compare on lookup; assert `SUPERSLAB_MAGIC`.
- A/B: force the registry path only by turning the hint off; compare crash locus.

### 4) Class index drift (Box: metadata)
- Symptom: slab->class_idx corrupt -> wrong free list math.
- Probe: after `slab_index_for()`, assert `class_idx < TINY_NUM_CLASSES`; log slab_idx/class_idx.
- A/B: run small vs 1024-byte classes; see if only one class fails.
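Candidates 1, 3 and 4 share one fail-fast pattern: validate a lookup result before trusting it. A minimal sketch, assuming a SuperSlab header with `magic`, `generation`, and an address range; the struct, field names, and constant values are hypothetical stand-ins for `TinySuperSlab`, `SUPERSLAB_MAGIC`, and `TINY_NUM_CLASSES`, not their real definitions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the real SUPERSLAB_MAGIC, TINY_NUM_CLASSES and
   TinySuperSlab layout differ; only the checking pattern matters here. */
#define SKETCH_SS_MAGIC    0x5353414cU
#define SKETCH_NUM_CLASSES 8u

typedef struct {
    uint32_t  magic;       /* overwritten when the SuperSlab is released */
    uint64_t  generation;  /* bumped each time the SuperSlab is recycled */
    uintptr_t base;        /* blocks live in [base, base + size) */
    size_t    size;
} ss_sketch_t;

typedef struct {
    ss_sketch_t *ss;       /* cached SuperSlab; may be stale */
    uint64_t     gen;      /* ss->generation observed when the hint was cached */
} tls_hint_sketch_t;

/* Candidates 1 and 3: trust the hint only if the SuperSlab is live, was not
   recycled since it was cached, and actually covers ptr. Otherwise drop the
   hint and return NULL so the caller falls back to the registry path. */
static ss_sketch_t *hint_lookup_checked(tls_hint_sketch_t *h, const void *ptr) {
    assert(h != NULL);
    ss_sketch_t *ss = h->ss;
    uintptr_t p = (uintptr_t)ptr;
    if (ss == NULL)
        return NULL;
    if (ss->magic != SKETCH_SS_MAGIC || ss->generation != h->gen ||
        p < ss->base || p - ss->base >= ss->size) {
        h->ss = NULL;
        return NULL;
    }
    return ss;
}

/* Candidate 4: abort on a corrupt class index instead of letting wrong
   size-class math reach the free list (active in release builds too). */
static unsigned checked_class_idx(unsigned class_idx, int slab_idx) {
    if (class_idx >= SKETCH_NUM_CLASSES) {
        fprintf(stderr, "[CLASS_IDX] slab_idx=%d class_idx=%u out of range\n",
                slab_idx, class_idx);
        abort();
    }
    return class_idx;
}
```

The generation comparison is what catches a SuperSlab that was recycled for another class at the same address, which a magic check alone would accept. Reading `magic`/`generation` through a stale hint is only safe if released SuperSlab memory stays mapped; if it is unmapped on release, the stale hint faults at this check instead, which still fails fast and points at the hint box.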
### 5) Magazine wrap/unwrap slip (Box: refill/magazine)
- Symptom: pointer stored raw, read as user (or vice versa) in refill spill.
- Probe: instrument `core/hakmem_tiny_refill.inc` around magazine push/pop; dump raw/user pointer deltas.
- A/B: force refill slow path only: `export HAKMEM_TINY_MUST_ADOPT=1`.

### 6) Remote queue drain boundary breach (Box 2->4 boundary)
- Symptom: remote drain merges the freelist twice or skips the owner check.
- Probe: ring events or one-shot logs at `ss_remote_drain_to_freelist()` and the adopt boundary:
  ```c
  fprintf(stderr, "[REMOTE_DRAIN] ss=%p slab=%d count_before=%u\n",
          (void *)ss, slab_idx, (unsigned)remote_counts[slab_idx]);
  ```
- A/B: `HAKMEM_TINY_SS_ADOPT=0` to see if the crash is tied to adopt boundary logic.

### 7) Pointer wrap/unwrap toggle confusion (Box: pointer bridge)
- Symptom: header offset applied twice or skipped.
- Probe: assert alignment and expected delta at every `user_to_raw`/`raw_to_user` site in the free path.
- A/B: run with `HAKMEM_TINY_HEADERLESS=0` vs `1` on the same workload; see if the delta shows up only in headerless.

---

## Data to Capture (single-pass, no log spam)
- Logs: last 400 lines from each TC run; grep for the probe prefixes `[TLS_SLL`, `[HINT`, `[REMOTE` with `grep -F` (without `-F`, `[...]` is parsed as a character class).
- GDB: full `bt`, `frame 0`, `info locals`, and pointer dump.
- ASan: first 150 lines including shadow/poison info.
- Minimal repro: smallest C snippet or shell script that crashes within 30s.
- Env stamp: `uname -a`, `lscpu | head -20`, `git rev-parse HEAD`.

Format when reporting:
```
=== TC1 (Headerless OFF) ===
Result: crash / hang / pass
Last log lines:
...

=== TC2 (Headerless ON, hint OFF) ===
Result: ...

=== TC3 (Headerless ON, hint ON) ===
Result: ...

=== ASan ===

=== GDB (first crash) ===
```

---

## Observability and Guardrails (Box Theory)
- One-shot logs only; no continuous debug spam. Use counters where possible.
- Keep boundary single: drain->bind->owner_acquire only inside refill/adopt; do not add side effects in remote push/publish.
- Toggleable fixes: wrap new checks with `#if defined(DEBUG_HDRLESS)` or env flags so we can A/B quickly.
- Fail-fast: `assert`/`abort` on invalid class_idx, magic, or out-of-range pointers instead of silently recovering.

---

## Decision Tree
- TC1 fails -> shared free/registry bug; ignore hint; inspect pointer normalize + registry first.
- TC1 passes, TC2 fails -> headerless core path bug; focus on pointer normalize and class_idx drift.
- TC2 passes, TC3 fails -> hint cache or adopt boundary; focus on stale hint + generation checks.
- ASan shows UAF/double-free -> instrument free path and magazine spill; gate hint off to see if corruption follows.
- Bisect isolates commit -> fix there, keep A/B flag, add regression test.

---

## Timeline (target 10-22h)
- 2-4h: run TC1-3, capture GDB/ASan, decide branch of decision tree.
- 4-8h: instrument relevant box (from candidates), build A/B toggles, derive minimal repro.
- 2-6h: root-cause confirmation with repro + ASan clean pass.
- 2-4h: implement fix, add regression test, verify all three test cases + baseline perf smoke.

---

## Quick Command Reference
```bash
# Clean builds
make clean && make shared -j8
make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=0"
make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
make clean && make asan-shared-alloc -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"

# Runs
LD_PRELOAD=./libhakmem.so timeout 30 ./mimalloc-bench/out/bench/sh8bench
LD_PRELOAD=./libhakmem_asan.so timeout 20 ./mimalloc-bench/out/bench/sh8bench

# GDB essentials
gdb --args ./mimalloc-bench/out/bench/sh8bench
(gdb) set environment LD_PRELOAD ./libhakmem.so
(gdb) run
(gdb) bt
(gdb) frame 0
(gdb) info locals

# Bisect skeleton
git bisect start
git bisect bad HEAD
git bisect good "$GOOD_COMMIT"   # last known-stable commit (a bare `git bisect good` marks HEAD)
# build/test, then mark: git bisect good | bad | skip
git bisect log > debug_artifacts/headerless/bisect_log.txt
git bisect reset
```
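The Observability guardrails (one-shot logs, counters, a runtime toggle) fit candidate 7's delta probe directly. A minimal sketch, assuming a hypothetical `HAKMEM_DEBUG_HDRLESS` environment variable and hypothetical helper names; it is not existing hakmem instrumentation:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Runtime toggle, read once and cached. HAKMEM_DEBUG_HDRLESS is a
   hypothetical variable name, not an existing hakmem flag. */
static int hdrless_debug_enabled(void) {
    static _Atomic int cached = -1;
    int v = atomic_load_explicit(&cached, memory_order_relaxed);
    if (v < 0) {
        const char *e = getenv("HAKMEM_DEBUG_HDRLESS");
        v = (e != NULL && strcmp(e, "1") == 0);
        atomic_store_explicit(&cached, v, memory_order_relaxed);
    }
    return v;
}

/* One-shot log: each call site prints at most once per process, but every
   hit is still counted so a total can be reported instead of spamming. */
#define HDRLESS_LOG_ONCE(counter, ...)                                  \
    do {                                                                \
        static atomic_flag printed_ = ATOMIC_FLAG_INIT;                 \
        atomic_fetch_add_explicit(&(counter), 1, memory_order_relaxed); \
        if (!atomic_flag_test_and_set(&printed_))                       \
            fprintf(stderr, __VA_ARGS__);                               \
    } while (0)

static _Atomic unsigned long g_ptr_delta_mismatch;

/* Candidate 7 probe for a user_to_raw/raw_to_user site: the observed
   user - raw delta must equal what this build expects (assumed 0 when
   headerless, the header size otherwise), and raw must stay aligned. */
static int ptr_delta_ok(const void *raw, const void *user,
                        intptr_t expect_delta, uintptr_t align) {
    intptr_t delta = (intptr_t)((uintptr_t)user - (uintptr_t)raw);
    int ok = (delta == expect_delta) && ((uintptr_t)raw % align == 0);
    if (!ok && hdrless_debug_enabled())
        HDRLESS_LOG_ONCE(g_ptr_delta_mismatch,
                         "[PTR_DELTA] raw=%p user=%p delta=%ld expect=%ld\n",
                         raw, user, (long)delta, (long)expect_delta);
    return ok;
}
```

A caller can `abort()` on a false return to fail fast, or keep running and report `g_ptr_delta_mismatch` at exit during soak runs. Because the expected delta is a per-build constant, comparing the counter between `HAKMEM_TINY_HEADERLESS=0` and `1` runs is the A/B that candidate 7 describes.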