|
|
b52e1985e6
|
Phase 2-Opt2: Reduce SuperSlab default size to 512KB (+10-15% perf)
Changes:
- SUPERSLAB_LG_MIN: 20 → 19 (1MB → 512KB)
- SUPERSLAB_LG_DEFAULT: 21 → 19 (2MB → 512KB)
- SUPERSLAB_LG_MAX: 21 (unchanged, still allows 2MB)
Benchmark Results:
- ws=256: 72M → 79.80M ops/s (+10.8%, +7.8M ops/s)
- ws=1024: 56.71M → 65.07M ops/s (+14.7%, +8.36M ops/s)
Expected: +3-5% improvement
Actual: +10-15% improvement (EXCEEDED PREDICTION!)
Root Cause Analysis:
- Perf analysis showed shared_pool_acquire_slab at 23.83% CPU time
- Phase 1 removed memset overhead (+1.3%)
- Phase 2 reduces mmap allocation size by 75% (2MB → 512KB)
- Fewer page faults during SuperSlab initialization
- Better memory granularity (less VA space waste)
- Smaller allocations complete faster even without page faults
Technical Details:
- Each SuperSlab contains 8 slabs of 64KB (total 512KB)
- Previous: 16-32 slabs per SuperSlab (1-2MB)
- New: 8 slabs per SuperSlab (512KB)
- Refill frequency increases slightly, but init cost dominates
- Net effect: Major throughput improvement
Phase 1+2 Cumulative Improvement:
- Baseline: 64.61M ops/s
- Phase 1 final: 72.92M ops/s (+12.9%)
- Phase 2 final: 79.80M ops/s (+23.5% total, +9.4% over Phase 1)
Files Modified:
- core/hakmem_tiny_superslab_constants.h:12-33
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2025-11-28 18:16:32 +09:00 |
|
|
|
dc9e650db3
|
Tiny Pool redesign: P0.1, P0.3, P1.1, P1.2 - Out-of-band class_idx lookup
This commit implements the first phase of Tiny Pool redesign based on
ChatGPT architecture review. The goal is to eliminate Header/Next pointer
conflicts by moving class_idx lookup out-of-band (to SuperSlab metadata).
## P0.1: C0(8B) class upgraded to 16B
- Size table changed: {16,32,64,128,256,512,1024,2048} (8 classes)
- LUT updated: 1..16 → class 0, 17..32 → class 1, etc.
- tiny_next_off: C0 now uses offset 1 (header preserved)
- Eliminates edge cases for 8B allocations
## P0.3: Slab reuse guard Box (tls_slab_reuse_guard_box.h)
- New Box for draining TLS SLL before slab reuse
- ENV gate: HAKMEM_TINY_SLAB_REUSE_GUARD=1
- Prevents stale pointers when slabs are recycled
- Follows Box theory: single responsibility, minimal API
## P1.1: SuperSlab class_map addition
- Added uint8_t class_map[SLABS_PER_SUPERSLAB_MAX] to SuperSlab
- Maps slab_idx → class_idx for out-of-band lookup
- Initialized to 255 (UNASSIGNED) on SuperSlab creation
- Set correctly on slab initialization in all backends
## P1.2: Free fast path uses class_map
- ENV gate: HAKMEM_TINY_USE_CLASS_MAP=1
- Free path can now get class_idx from class_map instead of Header
- Falls back to Header read if class_map returns invalid value
- Fixed Legacy Backend dynamic slab initialization bug
## Documentation added
- HAKMEM_ARCHITECTURE_OVERVIEW.md: 4-layer architecture analysis
- TLS_SLL_ARCHITECTURE_INVESTIGATION.md: Root cause analysis
- PTR_LIFECYCLE_TRACE_AND_ROOT_CAUSE_ANALYSIS.md: Pointer tracking
- TINY_REDESIGN_CHECKLIST.md: Implementation roadmap (P0-P3)
## Test results
- Baseline: 70% success rate (30% crash - pre-existing issue)
- class_map enabled: 70% success rate (same as baseline)
- Performance: ~30.5M ops/s (unchanged)
## Next steps (P1.3, P2, P3)
- P1.3: Add meta->active for accurate TLS/freelist sync
- P2: TLS SLL redesign with Box-based counting
- P3: Complete Header out-of-band migration
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2025-11-28 13:42:39 +09:00 |
|
|
|
d2f0d84584
|
Phase 6-2.5: Fix SuperSlab alignment bug + refactor constants
## Problem: 53-byte misalignment mystery
**Symptom:** All SuperSlab allocations misaligned by exactly 53 bytes
```
[TRC_FAILFAST_PTR] stage=alloc_ret_align cls=7 ptr=0x..f835
offset=63541 (expected: 63488)
Diff: 63541 - 63488 = 53 bytes
```
## Root Cause (Ultrathink investigation)
**sizeof(SuperSlab) != hardcoded offset:**
- `sizeof(SuperSlab)` = 1088 bytes (actual struct size)
- `tiny_slab_base_for()` used: 1024 (hardcoded)
- `superslab_init_slab()` assumed: 2048 (in capacity calc)
**Impact:**
1. Memory corruption: 64-byte overlap with SuperSlab metadata
2. Misalignment: 1088 % 1024 = 64 (violates class 7 alignment)
3. Inconsistency: Init assumed 2048, but runtime used 1024
## Solution
### 1. Centralize constants (NEW)
**File:** `core/hakmem_tiny_superslab_constants.h`
- `SLAB_SIZE` = 64KB
- `SUPERSLAB_HEADER_SIZE` = 1088
- `SUPERSLAB_SLAB0_DATA_OFFSET` = 2048 (aligned to 1024)
- `SUPERSLAB_SLAB0_USABLE_SIZE` = 63488 (64KB - 2048)
- Compile-time validation checks
**Why 2048?**
- Round up 1088 to next 1024-byte boundary
- Ensures proper alignment for class 7 (1024-byte blocks)
- Previous: (1088 + 1023) & ~1023 = 2048
### 2. Update all code to use constants
- `hakmem_tiny_superslab.h`: `tiny_slab_base_for()` → use `SUPERSLAB_SLAB0_DATA_OFFSET`
- `hakmem_tiny_superslab.c`: `superslab_init_slab()` → use `SUPERSLAB_SLAB0_USABLE_SIZE`
- Removed hardcoded 1024, 2048 magic numbers
### 3. Add class consistency check
**File:** `core/tiny_superslab_alloc.inc.h:433-449`
- Verify `tls->ss->size_class == class_idx` before allocation
- Unbind TLS if mismatch detected
- Prevents using wrong block_size for calculations
## Status
⚠️ **INCOMPLETE - New issue discovered**
After fix, benchmark hits different error:
```
[TRC_FAILFAST] stage=freelist_next cls=7 node=0x...d474
```
Freelist corruption detected. Likely caused by:
- 2048 offset change affects free() path
- Block addresses no longer match freelist expectations
- Needs further investigation
## Files Modified
- `core/hakmem_tiny_superslab_constants.h` - NEW: Centralized constants
- `core/hakmem_tiny_superslab.h` - Use SUPERSLAB_SLAB0_DATA_OFFSET
- `core/hakmem_tiny_superslab.c` - Use SUPERSLAB_SLAB0_USABLE_SIZE
- `core/tiny_superslab_alloc.inc.h` - Add class consistency check
- `core/hakmem_tiny_init.inc` - Remove diet mode override (Phase 6-2.5)
- `core/hakmem_super_registry.h` - Remove debug output (cleaned)
- `PERFORMANCE_INVESTIGATION_REPORT.md` - Task agent analysis
## Next Steps
1. Investigate freelist corruption with 2048 offset
2. Verify free() path uses tiny_slab_base_for() correctly
3. Consider reverting to 1024 and fixing capacity calculation instead
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2025-11-07 21:45:20 +09:00 |
|