FlashSpec
Adaptive speculative-decoding inference engine with Triton-optimised verification and online bandit draft selection.
Navigation
- Architecture — system design, sequence diagram, correctness guarantee
- Kernels — Triton kernel tiling, SRAM analysis, roofline
- Bandit — UCB1 and Thompson sampling, regret bound, windowed statistics
- Benchmarks — how to reproduce every number reported in the paper
- API Reference — auto-generated from docstrings