230 lines
6.4 KiB
Markdown
230 lines
6.4 KiB
Markdown
|
|
# Gemini Analysis: BigCache heap-buffer-overflow
|
||
|
|
|
||
|
|
**Date**: 2025-10-21
|
||
|
|
**Status**: ✅ **Already Fixed** - Root cause identified, fix confirmed in code
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## 🎯 Summary
|
||
|
|
|
||
|
|
Gemini analyzed a heap-buffer-overflow detected by AddressSanitizer and identified the root cause as **BigCache returning undersized blocks**.
|
||
|
|
|
||
|
|
**Critical finding**: BigCache was returning cached blocks smaller than requested size, causing memset() overflow.
|
||
|
|
|
||
|
|
**Fix status**: **Already implemented** in `hakmem_bigcache.c:151` with size check:
|
||
|
|
```c
|
||
|
|
if (slot->valid && slot->site == site && slot->actual_bytes >= size) {
|
||
|
|
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Size check prevents undersize returns
|
||
|
|
```
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## 🔍 Root Cause Analysis (by Gemini)
|
||
|
|
|
||
|
|
### Error Sequence
|
||
|
|
|
||
|
|
1. **Iteration 0**: Benchmark requests **2.000MB** (2,097,152 bytes)
|
||
|
|
- `alloc_malloc()` allocates 2.000MB block
|
||
|
|
- Benchmark uses and frees the block
|
||
|
|
- `hak_free()` → `hak_bigcache_put()` caches it with `actual_bytes = 2,000,000`
|
||
|
|
- Block stored in size-class "2MB class"
|
||
|
|
|
||
|
|
2. **Iteration 1**: Benchmark requests **2.004MB** (2,101,248 bytes)
|
||
|
|
- Same size-class "2MB class" lookup
|
||
|
|
- **BUG**: BigCache returns 2.000MB block without checking `actual_bytes >= requested_size`
|
||
|
|
- Allocator returns 2.000MB block for 2.004MB request
|
||
|
|
|
||
|
|
3. **Overflow**: `memset()` at `bench_allocators.c:213`
|
||
|
|
- Tries to write 2.004MB (2,138,112 bytes in log)
|
||
|
|
- Block is only 2.000MB
|
||
|
|
- **heap-buffer-overflow** by ~4KB
|
||
|
|
|
||
|
|
### AddressSanitizer Log
|
||
|
|
|
||
|
|
```
|
||
|
|
heap-buffer-overflow on address 0x7f36708c1000
|
||
|
|
WRITE of size 2138112 at 0x7f36708c1000
|
||
|
|
#0 memset
|
||
|
|
#1 bench_cold_churn bench_allocators.c:213
|
||
|
|
|
||
|
|
freed by thread T0 here:
|
||
|
|
#1 bigcache_free_callback hakmem.c:526
|
||
|
|
#2 evict_slot hakmem_bigcache.c:96
|
||
|
|
#3 hak_bigcache_put hakmem_bigcache.c:182
|
||
|
|
|
||
|
|
previously allocated by thread T0 here:
|
||
|
|
#1 alloc_malloc hakmem.c:426
|
||
|
|
#2 allocate_with_policy hakmem.c:499
|
||
|
|
```
|
||
|
|
|
||
|
|
**Note**: "freed by thread T0" refers to BigCache internal "free slot" state, not OS-level deallocation.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## 🐛 Implementation Bug (Before Fix)
|
||
|
|
|
||
|
|
### Problem
|
||
|
|
|
||
|
|
BigCache was checking only **size-class match**, not **actual size sufficiency**:
|
||
|
|
|
||
|
|
```c
|
||
|
|
// WRONG (hypothetical buggy version)
|
||
|
|
int hak_bigcache_try_get(size_t size, uintptr_t site, void** out_ptr) {
|
||
|
|
int site_idx = hash_site(site);
|
||
|
|
int class_idx = get_class_index(size); // Same class for 2.000MB and 2.004MB
|
||
|
|
|
||
|
|
BigCacheSlot* slot = &g_cache[site_idx][class_idx];
|
||
|
|
|
||
|
|
if (slot->valid && slot->site == site) { // ❌ Missing size check!
|
||
|
|
*out_ptr = slot->ptr;
|
||
|
|
slot->valid = 0;
|
||
|
|
return 1; // Returns 2.000MB block for 2.004MB request
|
||
|
|
}
|
||
|
|
|
||
|
|
return 0;
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
### Two checks needed
|
||
|
|
|
||
|
|
1. ✅ **Size-class match**: Which class does the request belong to?
|
||
|
|
2. ❌ **Actual size sufficient**: `slot->actual_bytes >= requested_bytes`? (**MISSING**)
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## ✅ Fix Implementation
|
||
|
|
|
||
|
|
### Current Code (Fixed)
|
||
|
|
|
||
|
|
**File**: `hakmem_bigcache.c:139-163`
|
||
|
|
|
||
|
|
```c
|
||
|
|
// Phase 6.4 P2: O(1) get - Direct table lookup
|
||
|
|
int hak_bigcache_try_get(size_t size, uintptr_t site, void** out_ptr) {
|
||
|
|
if (!g_initialized) hak_bigcache_init();
|
||
|
|
if (!is_cacheable(size)) return 0;
|
||
|
|
|
||
|
|
// O(1) calculation: site_idx, class_idx
|
||
|
|
int site_idx = hash_site(site);
|
||
|
|
int class_idx = get_class_index(size); // P3: branchless
|
||
|
|
|
||
|
|
// O(1) lookup: table[site_idx][class_idx]
|
||
|
|
BigCacheSlot* slot = &g_cache[site_idx][class_idx];
|
||
|
|
|
||
|
|
// ✅ Check: valid, matching site, AND sufficient size (Segfault fix!)
|
||
|
|
if (slot->valid && slot->site == site && slot->actual_bytes >= size) {
|
||
|
|
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ FIX: Size sufficiency check
|
||
|
|
|
||
|
|
// Hit! Return and invalidate slot
|
||
|
|
*out_ptr = slot->ptr;
|
||
|
|
slot->valid = 0;
|
||
|
|
|
||
|
|
g_stats.hits++;
|
||
|
|
return 1;
|
||
|
|
}
|
||
|
|
|
||
|
|
// Miss (invalid, wrong site, or undersized)
|
||
|
|
g_stats.misses++;
|
||
|
|
return 0;
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
### Key Addition
|
||
|
|
|
||
|
|
Line 151:
|
||
|
|
```c
|
||
|
|
if (slot->valid && slot->site == site && slot->actual_bytes >= size) {
|
||
|
|
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Prevents undersize blocks
|
||
|
|
```
|
||
|
|
|
||
|
|
Comment confirms this was a known fix: `"AND sufficient size (Segfault fix!)"`
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## 🧪 Verification
|
||
|
|
|
||
|
|
### Test Scenario (cold-churn benchmark)
|
||
|
|
|
||
|
|
```c
|
||
|
|
// bench_allocators.c cold_churn scenario
|
||
|
|
for (int i = 0; i < iterations; i++) {
|
||
|
|
size_t size = base_size + (i * increment);
|
||
|
|
// Iteration 0: 2,097,152 bytes (2.000MB)
|
||
|
|
// Iteration 1: 2,101,248 bytes (2.004MB) ← Would trigger bug
|
||
|
|
// Iteration 2: 2,105,344 bytes (2.008MB)
|
||
|
|
|
||
|
|
void* p = hak_alloc_cs(size);
|
||
|
|
memset(p, 0xAA, size); // ← Overflow point if undersized block
|
||
|
|
hak_free_cs(p);
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
### Expected Behavior (After Fix)
|
||
|
|
|
||
|
|
1. **Iteration 0**: Allocate 2.000MB → Use → Free → BigCache stores (`actual_bytes = 2,000,000`)
|
||
|
|
2. **Iteration 1**: Request 2.004MB
|
||
|
|
- BigCache checks: `slot->actual_bytes (2,000,000) >= size (2,004,000)` → **FALSE**
|
||
|
|
- **Cache miss** → Allocate new 2.004MB block
|
||
|
|
- No overflow ✅
|
||
|
|
|
||
|
|
3. **Iteration 2**: Request 2.008MB
|
||
|
|
- Similar cache miss → New allocation
|
||
|
|
- No overflow ✅
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## 📊 Gemini's Recommendations
|
||
|
|
|
||
|
|
### Recommendation 1: Add size check ✅ DONE
|
||
|
|
|
||
|
|
**Before**:
|
||
|
|
```c
|
||
|
|
if (slot->is_used) {
|
||
|
|
// Return block without size check
|
||
|
|
return slot->ptr;
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**After** (Current implementation):
|
||
|
|
```c
|
||
|
|
if (slot->is_used && slot->actual_bytes >= requested_bytes) {
|
||
|
|
// Only return if size is sufficient
|
||
|
|
return slot->ptr;
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
### Recommendation 2: Fallback on undersize
|
||
|
|
|
||
|
|
If no suitable block found in cache:
|
||
|
|
```c
|
||
|
|
// If loop finds no sufficient block
|
||
|
|
return NULL; // Force new allocation via mmap
|
||
|
|
```
|
||
|
|
|
||
|
|
Current implementation handles this correctly by returning `0` (miss) on line 162.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## 🎯 Conclusion
|
||
|
|
|
||
|
|
**Status**: ✅ **Bug already fixed**
|
||
|
|
|
||
|
|
The heap-buffer-overflow issue identified by AddressSanitizer has been correctly diagnosed by Gemini and the fix is already implemented in the codebase.
|
||
|
|
|
||
|
|
**Key lesson**: Size-class caching requires **two-level checking**:
|
||
|
|
1. Class match (performance)
|
||
|
|
2. Actual size sufficiency (correctness)
|
||
|
|
|
||
|
|
**Code location**: `hakmem_bigcache.c:151`
|
||
|
|
|
||
|
|
**Comment evidence**: "AND sufficient size (Segfault fix!)" confirms this was a known issue that has been addressed.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## 📚 Related Documents
|
||
|
|
|
||
|
|
- **Phase 6.2**: [PHASE_6.2_ELO_IMPLEMENTATION.md](PHASE_6.2_ELO_IMPLEMENTATION.md) - BigCache design
|
||
|
|
- **Batch analysis**: [CHATGPT_PRO_BATCH_ANALYSIS.md](CHATGPT_PRO_BATCH_ANALYSIS.md) - Related optimization
|
||
|
|
- **Gemini consultation**: Background task `5cfad9` (2025-10-21)
|
||
|
|
|