Update CURRENT_TASK after Mid MT removal
@@ -1,53 +1,21 @@
|
|||||||
## HAKMEM Bug Investigation: OOM Spam (ACE 33KB) - December 1, 2025
|
## HAKMEM Status Memo (updated 2025-12-XX)
|
||||||
|
|
||||||
### Objective
|
### Current State
|
||||||
Investigate and provide a mechanism to diagnose "OOM spam caused by continuous NULL returns for ACE 33KB allocations." The goal is to distinguish between:
|
- Removed the Mid MT layer entirely (code, build dependencies, and the early branch in free); Mid/Large allocations now go through ACE+Pool only.
|
||||||
1. Threshold issues (size class rounding)
|
- Relaxed Mid W_MAX to 2.0 so that the 32–52KB Bridge class path is hit reliably. The segfault in the 33KB range is resolved.
|
||||||
2. Cache exhaustion (pool empty)
|
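- The free wrapper keeps the Superslab/Tiny guards while making routing to Mid/L2/L25 reliable (Tiny pointers not registered in a Superslab are ignored; Mid/L2/L25 pointers are caught by classification plus the registry).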
|
||||||
3. Mapping failures (OS mmap failure)
|
- Mid/L2/L25 wrap detection is ON by default (set `HAKMEM_WRAP_L2=0` / `HAKMEM_WRAP_L25=0` to turn it OFF). Only nested recursion is blocked.
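The effect of relaxing W_MAX can be checked with simple arithmetic. The sketch below assumes that W_MAX is an upper bound on the ratio of size-class size to requested size, which this memo does not spell out. The helper name `wmax_accepts` and the 52KB class size are hypothetical and chosen only for illustration.

```bash
# Hypothetical helper: succeeds when a size class may serve a request,
# ASSUMING the rule is class_bytes <= request_bytes * W_MAX.
wmax_accepts() {
  awk -v req="$1" -v cls="$2" -v wmax="$3" \
    'BEGIN { exit !(cls <= req * wmax) }'
}

# 33KB request (33792 bytes) against an assumed 52KB class (53248 bytes).
if wmax_accepts 33792 53248 2.0; then echo "W_MAX=2.0: accepted"; fi
if ! wmax_accepts 33792 53248 1.01; then echo "W_MAX=1.01: rejected"; fi
```

If the real acceptance rule differs, only the comparison inside the `awk` program needs to change.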
|
||||||
|
|
||||||
### Work Performed & Resolution
|
### Recent Results
|
||||||
|
- Bench reproduction: `./bench_mid_large_mt_hakmem 4 20000 1024 4` runs to completion with no ACE-FAIL spam.
|
||||||
|
- Removed all Mid MT build, initialization, and dependency code, and cleaned up the Makefile.
|
||||||
|
|
||||||
1. **Implemented ACE Tracing**:
|
### Usage Notes
|
||||||
* Added a runtime-controlled tracing mechanism via the `HAKMEM_ACE_TRACE=1` environment variable.
|
- Check 33KB-range behavior using ACE/Pool only. Tune fragmentation with `HAKMEM_WMAX_MID` (default 2.0).
|
||||||
* Instrumentation was added to `core/hakmem_ace.c`, `core/hakmem_pool.c`, and `core/hakmem_l25_pool.c` to log specific failure reasons to `stderr`.
|
- To prevent Tiny header misclassification, the mandatory Superslab registration check is kept in free/fast-free.
|
||||||
* Log messages distinguish between `[ACE-FAIL] Threshold`, `[ACE-FAIL] Exhaustion`, and `[ACE-FAIL] MapFail`.
|
- If the old Mid MT is needed, refer to another branch or a past commit (it no longer exists on the current branch).
|
||||||
|
|
||||||
2. **Resolved Build & Linkage Issues**:
|
### Remaining Tasks / Proposals
|
||||||
* **Undefined Symbol `classify_ptr`**: Identified that `core/box/front_gate_classifier.c` was not correctly linked into `libhakmem.so`. The `Makefile` was updated to include `core/box/front_gate_classifier_shared.o` in the `SHARED_OBJS` list.
|
1. Clean up or archive the Mid MT related documents and scripts in docs/benchmarks/scripts.
|
||||||
* **Removed Temporary Debug Logs**: All interim `write(2, ...)` and `fprintf(stderr, ...)` debug statements introduced during the investigation have been removed to restore a clean code state.
|
2. Re-measure footprint vs. hit rate with a lightweight A/B of W_MAX/Cap (environment variables are sufficient).
|
||||||
|
3. The dirty status shown for `core/box/front_gate_classifier.d`, `hakmem.d`, and `mimalloc-bench` can be ignored or cleaned as needed.
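The W_MAX A/B run can be driven from the shell. In this minimal sketch, `wmax_mid_sweep` is a hypothetical helper name, and 1.6 and 2.0 are simply the two Mid values that appear in this memo. It only sets `HAKMEM_WMAX_MID` for each run and leaves the footprint and hit-rate measurement to whatever the benchmark reports.

```bash
# Hypothetical helper: run the given command once per HAKMEM_WMAX_MID value.
# The body runs in a subshell so its variables do not leak into the caller.
wmax_mid_sweep() (
  values="$1"
  shift
  for v in $values; do
    echo "== HAKMEM_WMAX_MID=$v =="
    env HAKMEM_WMAX_MID="$v" "$@"
  done
)

# Example (benchmark invocation taken from this memo):
# wmax_mid_sweep "1.6 2.0" ./bench_mid_large_mt_hakmem 4 20000 1024 4
```

Using `env` keeps the override scoped to the child process, so the calling shell's environment is unchanged between runs.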
|
||||||
3. **Clarified `malloc` Wrapper Behavior**:
    * Discovered that `libhakmem.so`'s `malloc` wrapper had logic to force fallback to `libc`'s `malloc` for larger allocations (`> TINY_MAX_SIZE`) and when `jemalloc` was detected, especially under `LD_PRELOAD`.
    * This was preventing 33KB allocations from reaching the `hakmem` ACE layer.
    * **Solution**: Identified the necessary environment variables to disable these bypasses for testing purposes: `HAKMEM_LD_SAFE=0` and `HAKMEM_LD_BLOCK_JEMALLOC=0`.

4. **Verified Trace Functionality**:
    * A test program (`test_ace_trace.c`) was used to allocate 33KB.
    * By setting `HAKMEM_WMAX_MID=1.01` and `HAKMEM_WMAX_LARGE=1.01` (to force threshold failures), the `[ACE-FAIL] Threshold` logs were successfully generated, confirming the tracing mechanism works as intended.
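The forced-threshold check can be wrapped in a small shell function. This is a sketch that uses only the environment variables named in this section. The function name `ace_force_threshold` is hypothetical, and `LD_PRELOAD` is left to the caller because the library path depends on the local build.

```bash
# Hypothetical helper: run a command with tracing enabled, the libc/jemalloc
# bypasses disabled, and both W_MAX values tightened to provoke Threshold logs.
ace_force_threshold() {
  env HAKMEM_ACE_TRACE=1 \
      HAKMEM_LD_SAFE=0 \
      HAKMEM_LD_BLOCK_JEMALLOC=0 \
      HAKMEM_WMAX_MID=1.01 \
      HAKMEM_WMAX_LARGE=1.01 \
      "$@"
}

# Example, assuming a binary built from test_ace_trace.c:
# LD_PRELOAD=/path/to/hakmem/libhakmem.so ace_force_threshold ./test_ace_trace 2> stderr.log
```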

### How to Use the Trace Feature (for Users)

To diagnose the 33KB OOM spam issue in your application:

1. **Ensure Correct `libhakmem.so` Build**:
    Make sure `libhakmem.so` is built without `POOL_TLS_PHASE1` enabled (e.g., `make shared POOL_TLS_PHASE1=0`). The current `libhakmem.so` reflects this.

2. **Run Your Application with Specific Environment Variables**:

```bash
export HAKMEM_FRONT_GATE_UNIFIED=0
export HAKMEM_SMALLMID_ENABLE=0
export HAKMEM_FORCE_LIBC_ALLOC=0
export HAKMEM_LD_BLOCK_JEMALLOC=0
export HAKMEM_ACE_TRACE=1 # Crucial for seeing the logs
export HAKMEM_WMAX_MID=1.60 # Use default or adjust as needed for W_MAX analysis
export HAKMEM_WMAX_LARGE=1.30 # Use default or adjust as needed for W_MAX analysis
export LD_PRELOAD=/path/to/hakmem/libhakmem.so

./your_application 2> stderr.log # Redirect stderr to a file for analysis
```

3. **Analyze `stderr.log`**:
    Look for `[ACE-FAIL]` messages to determine if the issue is a `Threshold` (e.g., `size=33000 wmax=...`), `Exhaustion` (pool empty), or `MapFail` (OS allocation error). This will provide the necessary data to pinpoint the root cause of the OOM spam.

This setup will allow for precise diagnosis of 33KB allocation failures within the hakmem ACE component.
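
To see at a glance which failure class dominates, the captured log can be tallied per category. This sketch relies only on the three `[ACE-FAIL]` prefixes quoted above; the function name `ace_fail_summary` is hypothetical, and it makes no assumption about the rest of each log line.

```bash
# Hypothetical helper: count [ACE-FAIL] lines per category in a stderr log.
# The body runs in a subshell so its variables do not leak into the caller.
ace_fail_summary() (
  log="$1"
  for kind in Threshold Exhaustion MapFail; do
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    n=$(grep -c -F "[ACE-FAIL] $kind" "$log" || true)
    echo "$kind $n"
  done
)

# Example:
# ace_fail_summary stderr.log
```

A category with a high count relative to the others is the first place to look; a log with all three at zero suggests the allocations never reached the ACE layer.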