hakmem/core/box/learner_env_box.h
Moe Charm (CI) 984cca41ef P0 Optimization: Shared Pool fast path with O(1) metadata lookup
Performance Results:
- Throughput: 2.66M ops/s → 3.8M ops/s (+43% improvement)
- sp_meta_find_or_create: O(N) linear scan → O(1) direct pointer (once the first lookup has cached it)
- Stage 2 full metadata scan: run on 100% → 10-20% of acquire calls (80-90% skipped via class hints)

Core Optimizations:

1. O(1) Metadata Lookup (superslab_types.h)
   - Added `shared_meta` pointer field to SuperSlab struct
   - Eliminates O(N) linear search through ss_metadata[] array
   - First access: O(N) scan + cache | Subsequent: O(1) direct return

2. sp_meta_find_or_create Fast Path (hakmem_shared_pool.c)
   - Check cached ss->shared_meta first before linear scan
   - Cache pointer after successful linear scan for future lookups
   - Reduces 7.8% CPU hotspot to near-zero for hot paths

3. Stage 2 Class Hints Fast Path (hakmem_shared_pool_acquire.c)
   - Try class_hints[class_idx] FIRST before full metadata scan
   - Uses O(1) ss->shared_meta lookup for hint validation
   - __builtin_expect() for branch prediction optimization
   - 80-90% of acquire calls now skip full metadata scan

4. Proper Initialization (ss_allocation_box.c)
   - Initialize shared_meta = NULL in superslab_allocate()
   - Ensures correct NULL-check semantics for new SuperSlabs
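
The caching scheme in items 1, 2 and 4 can be sketched as follows. This is a minimal, single-threaded illustration, not the code in hakmem_shared_pool.c: the `Demo*` types, `demo_metadata[]`, the capacity, and the `demo_scan_steps` counter are hypothetical stand-ins, and the real lookup caches the pointer under a mutex.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_META 64          /* hypothetical table capacity */

typedef struct DemoMeta DemoMeta;

typedef struct DemoSuperSlab {
    DemoMeta *shared_meta;        /* cached back-pointer; NULL until first lookup */
} DemoSuperSlab;

struct DemoMeta {
    DemoSuperSlab *ss;            /* SuperSlab this entry describes */
};

static DemoMeta demo_metadata[DEMO_MAX_META];
static size_t   demo_meta_count = 0;
static size_t   demo_scan_steps = 0;  /* entries visited by the slow path */

static DemoMeta *demo_meta_find_or_create(DemoSuperSlab *ss) {
    assert(ss != NULL);

    /* Fast path: O(1) return of the cached pointer. */
    if (__builtin_expect(ss->shared_meta != NULL, 1))
        return ss->shared_meta;

    /* Slow path: O(N) scan, then cache the hit for later calls. */
    for (size_t i = 0; i < demo_meta_count; i++) {
        demo_scan_steps++;
        if (demo_metadata[i].ss == ss) {
            ss->shared_meta = &demo_metadata[i];
            return ss->shared_meta;
        }
    }

    /* Not found: create a new entry and cache it as well. */
    if (demo_meta_count == DEMO_MAX_META)
        return NULL;
    DemoMeta *m = &demo_metadata[demo_meta_count++];
    m->ss = ss;
    ss->shared_meta = m;
    return m;
}
```

The fast path only works if a fresh SuperSlab starts with `shared_meta == NULL`, which is why item 4 initializes the field at allocation time.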

Additional Improvements:
- Updated ptr_trace and debug ring for release build efficiency
- Enhanced ENV variable documentation and analysis
- Added learner_env_box.h for configuration management
- Various Box optimizations for reduced overhead

Thread Safety:
- All atomic operations use correct memory ordering
- shared_meta cached under mutex protection
- Lock-free Stage 2 uses proper CAS with acquire/release semantics
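
As an illustration of the CAS pattern named in the last bullet, here is a minimal C11 sketch of claiming and releasing a slot. The slot states, `DemoSlot`, and both function names are assumptions made for this example and are not taken from the hakmem sources.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical slot states for this sketch. */
enum { DEMO_SLOT_FREE = 0, DEMO_SLOT_ACTIVE = 1 };

typedef struct {
    _Atomic int state;
} DemoSlot;

/* Try to move the slot FREE -> ACTIVE. Returns true if this caller won. */
static bool demo_slot_try_claim(DemoSlot *slot) {
    int expected = DEMO_SLOT_FREE;
    /* Success: acq_rel, so earlier writes by the releasing thread are
     * visible to the new owner. Failure: acquire load only. */
    return atomic_compare_exchange_strong_explicit(
        &slot->state, &expected, DEMO_SLOT_ACTIVE,
        memory_order_acq_rel, memory_order_acquire);
}

/* Publish the slot as free again; release pairs with the acquire above. */
static void demo_slot_release(DemoSlot *slot) {
    atomic_store_explicit(&slot->state, DEMO_SLOT_FREE, memory_order_release);
}
```

Only one of several racing callers can see the exchange succeed; the others observe the updated state and move on to the next candidate slot.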

Testing:
- Benchmark: 1M iterations, 3.8M ops/s stable
- Build: Clean compile with RELEASE=0 and RELEASE=1
- No crashes, memory leaks, or correctness issues observed

Next Optimization Candidates:
- P1: Per-SuperSlab free slot bitmap for O(1) slot claiming
- P2: Reduce Stage 2 critical section size
- P3: Page pre-faulting (MAP_POPULATE)
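
To make the P1 candidate concrete, one possible shape for a free-slot bitmap is sketched below. It is speculative: nothing here exists in the commit, the 32-slot width and all `demo_*` names are assumptions, and the sketch is single-threaded (a shared version would need atomic operations on the bitmap).

```c
#include <stdint.h>

/* Hypothetical per-SuperSlab bitmap: bit i set means slot i is free. */
typedef struct {
    uint32_t free_bits;
} DemoSlotBitmap;

/* Claim the lowest-numbered free slot in O(1); returns -1 if none is free. */
static int demo_bitmap_claim(DemoSlotBitmap *bm) {
    if (bm->free_bits == 0)
        return -1;
    int idx = __builtin_ctz(bm->free_bits);  /* index of lowest set bit */
    bm->free_bits &= bm->free_bits - 1;      /* clear that bit */
    return idx;
}

/* Mark slot idx (0..31) as free again. */
static void demo_bitmap_release(DemoSlotBitmap *bm, int idx) {
    bm->free_bits |= (uint32_t)1 << idx;
}
```

The count-trailing-zeros builtin replaces a per-slot loop, which is the source of the O(1) claim.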

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 16:21:54 +09:00


// learner_env_box.h - Learning Layer ENV Box
// Purpose: Decide whether the CAP Learner thread should run, based on HAKMEM_MODE
//          and HAKMEM_LEARN, without touching the hot path.
//
// Priority:
//   1. HAKMEM_LEARN is set → explicit override with 0/1
//   2. If unset:
//        HAKMEM_MODE=learning/research → Learner enabled
//        otherwise (minimal/fast/balanced) → Learner disabled
#pragma once
#include "../hakmem_config.h"
#include <stdlib.h>
static inline int hak_learner_env_should_run(void) {
    // Decision is computed once and cached, so later calls only read a static.
    static int g_inited = 0;
    static int g_effective = 0;
    if (__builtin_expect(!g_inited, 0)) {
        const char* e = getenv("HAKMEM_LEARN");
        if (e && *e) {
            // Explicit override: any value atoi() parses as non-zero enables.
            int v = atoi(e);
            g_effective = (v != 0) ? 1 : 0;
        } else {
            // No override: derive the default from the configured mode.
            HakemMode m = g_hakem_config.mode;
            g_effective =
                (m == HAKMEM_MODE_LEARNING || m == HAKMEM_MODE_RESEARCH) ? 1 : 0;
        }
        g_inited = 1;
    }
    return g_effective;
}
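
The priority rules in the header comment can be checked in isolation with a pure function that takes the environment value and the mode as arguments. The sketch below is a stand-alone restatement for illustration: `DemoMode` and `demo_learner_decide` are hypothetical names, and it does not include hakmem_config.h or cache its result.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the project's mode enum. */
typedef enum {
    DEMO_MODE_MINIMAL,
    DEMO_MODE_FAST,
    DEMO_MODE_BALANCED,
    DEMO_MODE_LEARNING,
    DEMO_MODE_RESEARCH
} DemoMode;

/* learn_env: value of HAKMEM_LEARN, or NULL if the variable is unset. */
static int demo_learner_decide(const char *learn_env, DemoMode mode) {
    if (learn_env && *learn_env) {
        /* Rule 1: a non-empty HAKMEM_LEARN overrides the mode. */
        return atoi(learn_env) != 0;
    }
    /* Rule 2: fall back to the mode default. */
    return mode == DEMO_MODE_LEARNING || mode == DEMO_MODE_RESEARCH;
}
```

Because the override is parsed with `atoi()`, a non-numeric value such as `HAKMEM_LEARN=yes` parses as 0 and disables the Learner, and an empty string is treated the same as an unset variable.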