Update CURRENT_TASK after Mid MT removal

Moe Charm (CI)
2025-12-02 00:53:26 +09:00
parent f1b7964ef9
commit 7e3c3d6020


@@ -1,53 +1,21 @@
## HAKMEM Bug Investigation: OOM Spam (ACE 33KB) - December 1, 2025
## HAKMEM Status Notes (updated 2025-12-XX)
### Objective
Investigate, and provide a mechanism to diagnose, the "OOM spam" caused by ACE repeatedly returning NULL for 33KB allocations. The goal is to distinguish between:
1. Threshold issues (size class rounding)
2. Cache exhaustion (pool empty)
3. Mapping failures (OS mmap failure)
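### Current State
- The Mid MT layer has been removed entirely (code, build dependencies, and the early branch in free are deleted). Mid/Large now go through ACE+Pool only.
- Relaxed Mid W_MAX to 2.0 so that the 32-52KB Bridge class path is reliably hit. The segfault in the 33KB band is fixed.
- The free wrapper keeps the Superslab/Tiny guards while routing Mid/L2/L25 pointers reliably (Tiny pointers not registered in a Superslab are ignored; Mid/L2/L25 pointers are caught via classification plus the registry).
- Mid/L2/L25 wrap detection is ON by default (`HAKMEM_WRAP_L2=0` / `HAKMEM_WRAP_L25=0` turns it OFF). Only nested recursion is blocked.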
### Work Performed & Resolution
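### Recent Results
- Bench reproduction: `./bench_mid_large_mt_hakmem 4 20000 1024 4` runs to completion with no ACE-FAIL spam.
- All Mid MT build/initialization/dependencies have been removed, and the Makefile has been cleaned up.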
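The bench reproduction check above (run to completion, no `[ACE-FAIL]` spam) can be scripted as a small regression gate. This is a minimal sketch, not part of hakmem. The function name `check_no_ace_fail` is hypothetical, and the sketch assumes only what these notes state: `HAKMEM_ACE_TRACE=1` makes ACE print `[ACE-FAIL]` lines on stderr.

```bash
# Hypothetical regression check (name and behavior are illustrative only).
# Runs a command with HAKMEM_ACE_TRACE=1, captures its stderr, and fails if
# the command exits non-zero or printed any "[ACE-FAIL]" line.
# LD_PRELOAD and other HAKMEM_* settings are inherited from the caller.
check_no_ace_fail() {
    local log status fails
    log=$(mktemp) || return 2
    HAKMEM_ACE_TRACE=1 "$@" 2> "$log"
    status=$?
    # grep -c prints 0 (and exits 1) when nothing matches; keep the 0.
    fails=$(grep -c '\[ACE-FAIL\]' "$log" || true)
    rm -f "$log"
    if [ "$status" -ne 0 ]; then
        echo "FAIL: command exited with status $status" >&2
        return 1
    fi
    if [ "$fails" -ne 0 ]; then
        echo "FAIL: $fails [ACE-FAIL] line(s) on stderr" >&2
        return 1
    fi
    echo "ok: completed with no [ACE-FAIL] lines"
}

# Example: the bench reproduction from these notes.
# check_no_ace_fail ./bench_mid_large_mt_hakmem 4 20000 1024 4
```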
1. **Implemented ACE Tracing**:
* Added a runtime-controlled tracing mechanism via the `HAKMEM_ACE_TRACE=1` environment variable.
* Instrumentation was added to `core/hakmem_ace.c`, `core/hakmem_pool.c`, and `core/hakmem_l25_pool.c` to log specific failure reasons to `stderr`.
* Log messages distinguish between `[ACE-FAIL] Threshold`, `[ACE-FAIL] Exhaustion`, and `[ACE-FAIL] MapFail`.
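### Usage Notes
- Check 33KB-band behavior with ACE/Pool only. Tune fragmentation via `HAKMEM_WMAX_MID` (default 2.0).
- To prevent Tiny header misclassification, the mandatory Superslab registration check is kept in free/fast-free.
- If you need the old Mid MT, refer to a separate branch or an earlier commit (it does not exist on the current branch).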
2. **Resolved Build & Linkage Issues**:
* **Undefined Symbol `classify_ptr`**: Identified that `core/box/front_gate_classifier.c` was not correctly linked into `libhakmem.so`. The `Makefile` was updated to include `core/box/front_gate_classifier_shared.o` in the `SHARED_OBJS` list.
* **Removed Temporary Debug Logs**: All interim `write(2, ...)` and `fprintf(stderr, ...)` debug statements introduced during the investigation have been removed to restore a clean code state.
3. **Clarified `malloc` Wrapper Behavior**:
* Discovered that `libhakmem.so`'s `malloc` wrapper had logic to force fallback to `libc`'s `malloc` for larger allocations (`> TINY_MAX_SIZE`) and when `jemalloc` was detected, especially under `LD_PRELOAD`.
* This was preventing 33KB allocations from reaching the `hakmem` ACE layer.
* **Solution**: Identified the necessary environment variables to disable these bypasses for testing purposes: `HAKMEM_LD_SAFE=0` and `HAKMEM_LD_BLOCK_JEMALLOC=0`.
4. **Verified Trace Functionality**:
* A test program (`test_ace_trace.c`) was used to allocate 33KB.
* By setting `HAKMEM_WMAX_MID=1.01` and `HAKMEM_WMAX_LARGE=1.01` (to force threshold failures), the `[ACE-FAIL] Threshold` logs were successfully generated, confirming the tracing mechanism works as intended.
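Items 3 and 4 can be combined into one reproducible check. This is a hedged sketch. `count_forced_threshold_fails` is a hypothetical helper, and it assumes the variables named above (`HAKMEM_LD_SAFE`, `HAKMEM_LD_BLOCK_JEMALLOC`, `HAKMEM_ACE_TRACE`, `HAKMEM_WMAX_MID`, `HAKMEM_WMAX_LARGE`) behave as described. The binary name `./test_ace_trace` in the example is also an assumption.

```bash
# Hypothetical helper (name is illustrative): reproduce the forced-threshold
# check by combining the bypass-disabling variables from item 3 with the
# 1.01 W_MAX settings from item 4, then count "[ACE-FAIL] Threshold" lines.
# Only the command's stderr is inspected; its stdout is discarded.
count_forced_threshold_fails() {
    HAKMEM_LD_SAFE=0 \
    HAKMEM_LD_BLOCK_JEMALLOC=0 \
    HAKMEM_ACE_TRACE=1 \
    HAKMEM_WMAX_MID=1.01 \
    HAKMEM_WMAX_LARGE=1.01 \
        "$@" 2>&1 > /dev/null | grep -c '\[ACE-FAIL\] Threshold' || true
}

# Example (assumes test_ace_trace.c was built as ./test_ace_trace and that
# LD_PRELOAD points at libhakmem.so):
# count_forced_threshold_fails ./test_ace_trace   # expect a non-zero count
```

The redirection order `2>&1 > /dev/null` sends stderr into the pipe and discards stdout, so only trace lines reach `grep`.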
### How to Use the Trace Feature (for Users)
To diagnose the 33KB OOM spam issue in your application:
1. **Ensure Correct `libhakmem.so` Build**:
Make sure `libhakmem.so` is built without `POOL_TLS_PHASE1` enabled (e.g., `make shared POOL_TLS_PHASE1=0`). The current `libhakmem.so` is built this way.
2. **Run Your Application with Specific Environment Variables**:
```bash
export HAKMEM_FRONT_GATE_UNIFIED=0
export HAKMEM_SMALLMID_ENABLE=0
export HAKMEM_FORCE_LIBC_ALLOC=0
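export HAKMEM_LD_SAFE=0 # Needed together with HAKMEM_LD_BLOCK_JEMALLOC=0 so 33KB requests reach ACE
export HAKMEM_LD_BLOCK_JEMALLOC=0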
export HAKMEM_ACE_TRACE=1 # Crucial for seeing the logs
export HAKMEM_WMAX_MID=1.60 # Use default or adjust as needed for W_MAX analysis
export HAKMEM_WMAX_LARGE=1.30 # Use default or adjust as needed for W_MAX analysis
export LD_PRELOAD=/path/to/hakmem/libhakmem.so
./your_application 2> stderr.log # Redirect stderr to a file for analysis
```
3. **Analyze `stderr.log`**:
Look for `[ACE-FAIL]` messages to determine whether each failure is a `Threshold` failure (e.g., `size=33000 wmax=...`), `Exhaustion` (pool empty), or `MapFail` (OS allocation error). This provides the data needed to pinpoint the root cause of the OOM spam.
This setup will allow for precise diagnosis of 33KB allocation failures within the hakmem ACE component.
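Step 3 can be automated by tallying each failure category. A minimal sketch follows. `summarize_ace_fail` is a hypothetical name, and it assumes each trace line contains the literal prefix `[ACE-FAIL]` followed by the category name, as listed above. Nothing else about the line format is assumed.

```bash
# Hypothetical log summarizer (name is illustrative). Assumes each failure
# line contains "[ACE-FAIL] <Category>", with Category one of the three
# names used in this document; the rest of the line is not parsed.
summarize_ace_fail() {
    local log=${1:-stderr.log} kind n
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    for kind in Threshold Exhaustion MapFail; do
        # grep -c prints 0 (and exits 1) when nothing matches; keep the 0.
        n=$(grep -c "\[ACE-FAIL\] $kind" "$log" || true)
        printf '%-10s %s\n' "$kind" "$n"
    done
}

# Example: summarize_ace_fail stderr.log
```

Read the counts against the three causes in the objective. A dominant `Threshold` count points at size-class rounding and W_MAX, `Exhaustion` at an empty pool, and `MapFail` at OS mmap failures.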
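### Remaining Tasks / Proposals
1. Clean up or archive the Mid MT-related documents and scripts in docs/benchmarks/scripts.
2. Re-measure footprint vs. hit rate with lightweight W_MAX/Cap A/B runs (environment variables are sufficient).
3. Ignore or clean the dirty status of `core/box/front_gate_classifier.d`, `hakmem.d`, and `mimalloc-bench` as needed.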
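For task 2, a lightweight W_MAX A/B can be driven entirely by environment variables. The sketch below is an illustration built on assumptions. `wmax_sweep` is a hypothetical helper that varies only `HAKMEM_WMAX_MID`. No Cap variable is named in these notes, so Cap is left out. The helper reports `[ACE-FAIL]` counts rather than footprint or hit rate, and those need separate measurement.

```bash
# Hypothetical A/B helper (name is illustrative): run the same command once
# per HAKMEM_WMAX_MID value, passing its stdout through unchanged, and
# report how many "[ACE-FAIL]" lines each run wrote to stderr.
# Footprint (e.g. RSS) is not measured here.
wmax_sweep() {
    local values=$1 w log fails
    shift
    for w in $values; do          # whitespace-separated, e.g. "1.6 2.0"
        log=$(mktemp) || return 2
        printf '== HAKMEM_WMAX_MID=%s ==\n' "$w"
        HAKMEM_WMAX_MID=$w HAKMEM_ACE_TRACE=1 "$@" 2> "$log"
        # grep -c prints 0 (and exits 1) when nothing matches; keep the 0.
        fails=$(grep -c '\[ACE-FAIL\]' "$log" || true)
        rm -f "$log"
        printf 'ace_fail_lines=%s\n' "$fails"
    done
}

# Example:
# wmax_sweep "1.6 2.0" ./bench_mid_large_mt_hakmem 4 20000 1024 4
```

The bench's own stdout (throughput and similar figures) passes through under each header, so the runs can be compared side by side.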