hakmem

tomoaki/hakmem


Commit Graph

Author	SHA1	Message	Date

Moe Charm (CI)

b2e861db12

Phase 67a: Layout tax forensics foundation (SSOT + measurement box)

Changes:
- scripts/box/layout_tax_forensics_box.sh: New measurement harness
  * Baseline vs treatment 10-run throughput comparison
  * Automated perf stat collection (cycles, IPC, branches, misses, TLB)
  * Binary metadata (size, section info)
  * Output to results/layout_tax_forensics/

- docs/analysis/PHASE67A_LAYOUT_TAX_FORENSICS_SSOT.md: Diagnostic reference
  * Decision tree: GO/NEUTRAL/NO-GO classification
  * Symptom→root-cause mapping (IPC/branch-miss/dTLB/cache-miss)
  * Phase 64 case study analysis (IPC 2.05→1.98)
  * Operational guidelines for Phase 67b+ optimizations

- CURRENT_TASK.md: Phase 67a marked complete and operational
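The baseline-vs-treatment comparison that the harness performs can be sketched in Python as below. This is a minimal illustration, not the contents of layout_tax_forensics_box.sh: the function names are hypothetical, throughput values are assumed to be per-run M ops/s figures, the sample data is made up, and IPC is computed as instructions / cycles from perf stat counter totals.

```python
import statistics

def summarize(runs):
    """Mean, median and coefficient of variation of per-run throughput."""
    mean = statistics.mean(runs)
    return {
        "mean": mean,
        "median": statistics.median(runs),
        "cv_pct": statistics.stdev(runs) / mean * 100.0,  # sample stdev, needs >= 2 runs
    }

def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after / before - 1.0) * 100.0

def ipc(instructions, cycles):
    """Instructions per cycle from perf stat counter totals."""
    return instructions / cycles

# Hypothetical 10-run samples (M ops/s); not measured data.
baseline = [60.0, 61.0, 59.0, 60.0, 60.0, 61.0, 59.0, 60.0, 60.0, 60.0]
treatment = [x * 1.02 for x in baseline]

delta = pct_change(summarize(baseline)["mean"], summarize(treatment)["mean"])
print(f"throughput delta: {delta:+.2f}%")  # +2.00%
# The Phase 64 IPC drop (2.05 -> 1.98) expressed the same way:
print(f"IPC delta: {pct_change(2.05, 1.98):+.2f}%")  # -3.41%
```

Comparing means and reporting the CV alongside them keeps a noisy treatment run from being mistaken for a real layout effect.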

Outcome:
- Layout tax diagnosis is now reproducible in a single measurement pass
- Enables fast GO/NO-GO decisions for future code removal/reordering attempts
- Provides a foundation for structural exploration toward M2 (55% target) without regression risk
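At its simplest, the GO/NEUTRAL/NO-GO classification is a threshold check on the throughput delta. In this sketch the +1.0% GO threshold is the one quoted in the Phase 68 verification. The -1.0% NO-GO bound is an assumption (a symmetric band), and the actual decision tree in PHASE67A_LAYOUT_TAX_FORENSICS_SSOT.md also weighs perf-counter symptoms that are not modeled here.

```python
GO_THRESHOLD_PCT = 1.0      # ">+1.0% threshold" quoted in the Phase 68 verification
NO_GO_THRESHOLD_PCT = -1.0  # assumption: symmetric bound, not stated in the commits

def classify(delta_pct):
    """Map a treatment-vs-baseline throughput delta (%) to a verdict."""
    if delta_pct > GO_THRESHOLD_PCT:
        return "GO"
    if delta_pct < NO_GO_THRESHOLD_PCT:
        return "NO-GO"
    return "NEUTRAL"

print(classify(1.19))  # GO (the Phase 68 result)
print(classify(0.4))   # NEUTRAL
print(classify(-2.5))  # NO-GO
```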

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2025-12-17 21:09:42 +09:00

Moe Charm (CI)

84f5034e45

Phase 68: PGO training set diversification (seed/WS expansion)

Changes:
- scripts/box/pgo_fast_profile_config.sh: Expanded WS (working set) patterns (3→5) and seeds (1→3)
  to reduce overfitting and better represent production workloads
- PERFORMANCE_TARGETS_SCORECARD.md: Phase 68 baseline promoted (61.614M ops/s = 50.93% of mimalloc)
- CURRENT_TASK.md: Phase 68 marked complete, Phase 67a (layout tax forensics) set Active
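One way to read the expanded training set is as a cross product of WS patterns and seeds, giving 5 x 3 = 15 training runs. That pairing is an assumption: the commit states only the counts, so the concrete WS values and the two extra seeds below are hypothetical placeholders (128/256/512 slots and seed 42 come from the earlier Phase 4 PGO setup).

```python
from itertools import product

# Hypothetical values: the commit gives only the counts (5 WS patterns, 3 seeds).
WS_PATTERNS = [128, 256, 512, 1024, 2048]  # working-set slots
SEEDS = [42, 43, 44]

def training_runs(ws_patterns, seeds):
    """Assumes every WS pattern is profiled under every seed."""
    return [{"ws": ws, "seed": seed} for ws, seed in product(ws_patterns, seeds)]

runs = training_runs(WS_PATTERNS, SEEDS)
print(len(runs))  # 15, versus 3 x 1 = 3 before the expansion
```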

Results:
- 10-run verification: +1.19% vs Phase 66 baseline (GO, >+1.0% threshold)
- M1 milestone: 50.93% of mimalloc (target 50%, exceeded by +0.93pp)
- Stability: 10-run mean/median with coefficient of variation (CV) < 2.1%
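The milestone figures follow from simple ratios. Assuming the 50.93% is measured against a single mimalloc throughput on the same benchmark, that reference is about 61.614 / 0.5093 ≈ 120.98M ops/s. If the M2 target (55%) is taken relative to the same reference (an assumption), it would require roughly 66.5M ops/s, about 8% above the Phase 68 baseline. The helper and variable names are illustrative.

```python
def pct_of(value, reference):
    """`value` as a percentage of `reference`."""
    return value / reference * 100.0

phase68_mops = 61.614                           # Phase 68 baseline, M ops/s
share_pct = 50.93                               # reported share of mimalloc
reference = phase68_mops / (share_pct / 100.0)  # implied mimalloc throughput

m1_margin_pp = share_pct - 50.0                 # percentage points above the M1 target
m2_required = 0.55 * reference                  # assumption: M2 = 55% of the same reference
m2_gain_pct = (m2_required / phase68_mops - 1.0) * 100.0

print(f"check: {pct_of(phase68_mops, reference):.2f}%")           # 50.93%
print(f"implied reference: {reference:.2f}M ops/s")              # 120.98
print(f"M1 margin: {m1_margin_pp:+.2f}pp")                       # +0.93
print(f"M2 needs {m2_required:.2f}M ops/s ({m2_gain_pct:+.2f}%)")  # 66.54, +7.99
```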

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2025-12-17 21:08:17 +09:00

Moe Charm (CI)

b51b600e8d

Phase 4-Step1: Add PGO workflow automation (+6.25% performance)

Implemented an automated Profile-Guided Optimization (PGO) workflow using the Box pattern:

Performance Improvement:
- Baseline:      57.0 M ops/s
- PGO-optimized: 60.6 M ops/s
- Gain: +6.25% (within expected +5-10% range)

Implementation:
1. scripts/box/pgo_tiny_profile_config.sh - 5 representative workloads
2. scripts/box/pgo_tiny_profile_box.sh - Automated profile collection
3. Makefile PGO targets:
   - pgo-tiny-profile: Build instrumented binaries
   - pgo-tiny-collect: Collect .gcda profile data
   - pgo-tiny-build:   Build optimized binaries
   - pgo-tiny-full:    Complete workflow (profile → collect → build → test)
4. Makefile help target: Added PGO instructions for discoverability

Design:
- Boxified: Single responsibility, clear contracts
- Deterministic: Fixed seeds (42) for reproducibility
- Safe: Validation, error detection, timeout protection (30s/workload)
- Observable: Progress reporting, .gcda verification (33 files generated)
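The timeout protection and .gcda verification can be sketched as follows. This is a Python illustration, not the shell code in pgo_tiny_profile_box.sh. The function names are hypothetical, the 30 s limit is the per-workload timeout from the commit, and .gcda files are assumed to land somewhere under a single build directory.

```python
import subprocess
from pathlib import Path

WORKLOAD_TIMEOUT_S = 30  # per-workload limit stated in the commit

def run_workload(cmd, timeout_s=WORKLOAD_TIMEOUT_S):
    """Run one training workload; False on non-zero exit, timeout, or missing binary."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except (subprocess.TimeoutExpired, OSError):
        return False  # subprocess.run kills the child before raising TimeoutExpired
    return result.returncode == 0

def count_gcda(build_dir):
    """Count .gcda profile files written by the instrumented binaries."""
    return sum(1 for _ in Path(build_dir).rglob("*.gcda"))

def verify_profiles(build_dir, expected_min=1):
    """Fail fast if profiling produced no (or too few) .gcda files."""
    n = count_gcda(build_dir)
    if n < expected_min:
        raise RuntimeError(f"expected >= {expected_min} .gcda files, found {n}")
    return n
```

For the Phase 4 build, a call such as `verify_profiles(build_dir, expected_min=33)` could encode the 33-file check, so a silently failed workload stops the pipeline before the optimized build.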

Workload Coverage:
- Random mixed: 3 working set sizes (128/256/512 slots)
- Tiny hot: 2 size classes (16B/64B)
- Total: 5 workloads covering hot/cold paths
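Writing the five Phase 4 training workloads down as data makes the coverage claim (3 + 2 = 5) easy to check. The `Workload` type and its field names are hypothetical. The slot counts, size classes, and seed 42 come from the commit, and applying the same seed to the tiny-hot runs is an assumption.

```python
from dataclasses import dataclass

SEED = 42  # fixed seed stated in the commit

@dataclass(frozen=True)
class Workload:
    kind: str   # "random_mixed" or "tiny_hot"
    param: int  # working-set slots (random_mixed) or size class in bytes (tiny_hot)
    seed: int = SEED

def phase4_workloads():
    mixed = [Workload("random_mixed", slots) for slots in (128, 256, 512)]
    hot = [Workload("tiny_hot", size) for size in (16, 64)]
    return mixed + hot

for w in phase4_workloads():
    print(w.kind, w.param, w.seed)
```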

Documentation:
- PHASE4_STEP1_COMPLETE.md - Completion report
- CURRENT_TASK.md - Phase 4 roadmap (Step 1 complete ✓)
- docs/design/PHASE4_TINY_FRONT_BOX_DESIGN.md - Complete Phase 4 design

Next: Phase 4-Step2 (Hot/Cold Path Box, target +10-15%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-29 11:28:38 +09:00

3 Commits