Files
hakmem/docs/analysis/GEMINI_BIGCACHE_ANALYSIS.md

230 lines
6.4 KiB
Markdown
Raw Normal View History

# Gemini Analysis: BigCache heap-buffer-overflow
**Date**: 2025-10-21
**Status**: ✅ **Already Fixed** - Root cause identified, fix confirmed in code
---
## 🎯 Summary
Gemini analyzed a heap-buffer-overflow detected by AddressSanitizer and identified the root cause as **BigCache returning undersized blocks**.
**Critical finding**: BigCache was returning cached blocks smaller than requested size, causing memset() overflow.
**Fix status**: **Already implemented** in `hakmem_bigcache.c:151` with size check:
```c
if (slot->valid && slot->site == site && slot->actual_bytes >= size) {
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Size check prevents undersize returns
```
---
## 🔍 Root Cause Analysis (by Gemini)
### Error Sequence
1. **Iteration 0**: Benchmark requests **2.000MB** (2,097,152 bytes)
- `alloc_malloc()` allocates 2.000MB block
- Benchmark uses and frees the block
- `hak_free()``hak_bigcache_put()` caches it with `actual_bytes = 2,000,000`
- Block stored in size-class "2MB class"
2. **Iteration 1**: Benchmark requests **2.004MB** (2,101,248 bytes)
- Same size-class "2MB class" lookup
- **BUG**: BigCache returns 2.000MB block without checking `actual_bytes >= requested_size`
- Allocator returns 2.000MB block for 2.004MB request
3. **Overflow**: `memset()` at `bench_allocators.c:213`
- Tries to write 2.004MB (2,138,112 bytes in log)
- Block is only 2.000MB
- **heap-buffer-overflow** by ~4KB
### AddressSanitizer Log
```
heap-buffer-overflow on address 0x7f36708c1000
WRITE of size 2138112 at 0x7f36708c1000
#0 memset
#1 bench_cold_churn bench_allocators.c:213
freed by thread T0 here:
#1 bigcache_free_callback hakmem.c:526
#2 evict_slot hakmem_bigcache.c:96
#3 hak_bigcache_put hakmem_bigcache.c:182
previously allocated by thread T0 here:
#1 alloc_malloc hakmem.c:426
#2 allocate_with_policy hakmem.c:499
```
**Note**: "freed by thread T0" refers to BigCache internal "free slot" state, not OS-level deallocation.
---
## 🐛 Implementation Bug (Before Fix)
### Problem
BigCache was checking only **size-class match**, not **actual size sufficiency**:
```c
// WRONG (hypothetical buggy version)
int hak_bigcache_try_get(size_t size, uintptr_t site, void** out_ptr) {
int site_idx = hash_site(site);
int class_idx = get_class_index(size); // Same class for 2.000MB and 2.004MB
BigCacheSlot* slot = &g_cache[site_idx][class_idx];
if (slot->valid && slot->site == site) { // ❌ Missing size check!
*out_ptr = slot->ptr;
slot->valid = 0;
return 1; // Returns 2.000MB block for 2.004MB request
}
return 0;
}
```
### Two checks needed
1.**Size-class match**: Which class does the request belong to?
2.**Actual size sufficient**: `slot->actual_bytes >= requested_bytes`? (**MISSING**)
---
## ✅ Fix Implementation
### Current Code (Fixed)
**File**: `hakmem_bigcache.c:139-163`
```c
// Phase 6.4 P2: O(1) get - Direct table lookup
int hak_bigcache_try_get(size_t size, uintptr_t site, void** out_ptr) {
if (!g_initialized) hak_bigcache_init();
if (!is_cacheable(size)) return 0;
// O(1) calculation: site_idx, class_idx
int site_idx = hash_site(site);
int class_idx = get_class_index(size); // P3: branchless
// O(1) lookup: table[site_idx][class_idx]
BigCacheSlot* slot = &g_cache[site_idx][class_idx];
// ✅ Check: valid, matching site, AND sufficient size (Segfault fix!)
if (slot->valid && slot->site == site && slot->actual_bytes >= size) {
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ FIX: Size sufficiency check
// Hit! Return and invalidate slot
*out_ptr = slot->ptr;
slot->valid = 0;
g_stats.hits++;
return 1;
}
// Miss (invalid, wrong site, or undersized)
g_stats.misses++;
return 0;
}
```
### Key Addition
Line 151:
```c
if (slot->valid && slot->site == site && slot->actual_bytes >= size) {
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Prevents undersize blocks
```
Comment confirms this was a known fix: `"AND sufficient size (Segfault fix!)"`
---
## 🧪 Verification
### Test Scenario (cold-churn benchmark)
```c
// bench_allocators.c cold_churn scenario
for (int i = 0; i < iterations; i++) {
size_t size = base_size + (i * increment);
// Iteration 0: 2,097,152 bytes (2.000MB)
// Iteration 1: 2,101,248 bytes (2.004MB) ← Would trigger bug
// Iteration 2: 2,105,344 bytes (2.008MB)
void* p = hak_alloc_cs(size);
memset(p, 0xAA, size); // ← Overflow point if undersized block
hak_free_cs(p);
}
```
### Expected Behavior (After Fix)
1. **Iteration 0**: Allocate 2.000MB → Use → Free → BigCache stores (`actual_bytes = 2,000,000`)
2. **Iteration 1**: Request 2.004MB
- BigCache checks: `slot->actual_bytes (2,000,000) >= size (2,004,000)`**FALSE**
- **Cache miss** → Allocate new 2.004MB block
- No overflow ✅
3. **Iteration 2**: Request 2.008MB
- Similar cache miss → New allocation
- No overflow ✅
---
## 📊 Gemini's Recommendations
### Recommendation 1: Add size check ✅ DONE
**Before**:
```c
if (slot->is_used) {
// Return block without size check
return slot->ptr;
}
```
**After** (Current implementation):
```c
if (slot->is_used && slot->actual_bytes >= requested_bytes) {
// Only return if size is sufficient
return slot->ptr;
}
```
### Recommendation 2: Fallback on undersize
If no suitable block found in cache:
```c
// If loop finds no sufficient block
return NULL; // Force new allocation via mmap
```
Current implementation handles this correctly by returning `0` (miss) on line 162.
---
## 🎯 Conclusion
**Status**: ✅ **Bug already fixed**
The heap-buffer-overflow issue identified by AddressSanitizer has been correctly diagnosed by Gemini and the fix is already implemented in the codebase.
**Key lesson**: Size-class caching requires **two-level checking**:
1. Class match (performance)
2. Actual size sufficiency (correctness)
**Code location**: `hakmem_bigcache.c:151`
**Comment evidence**: "AND sufficient size (Segfault fix!)" confirms this was a known issue that has been addressed.
---
## 📚 Related Documents
- **Phase 6.2**: [PHASE_6.2_ELO_IMPLEMENTATION.md](PHASE_6.2_ELO_IMPLEMENTATION.md) - BigCache design
- **Batch analysis**: [CHATGPT_PRO_BATCH_ANALYSIS.md](CHATGPT_PRO_BATCH_ANALYSIS.md) - Related optimization
- **Gemini consultation**: Background task `5cfad9` (2025-10-21)