Effort – a possibly new algorithm for LLM Inference
With Effort, you can adjust smoothly, and in real time, how many calculations are performed during LLM inference. The multiplications themselves are fast, but overall inference is still slightly slower than it should be because some non-essential parts (softmax, etc.) need improvement. The pink line (in the post's chart) is the actual speed, which includes a suboptimal implementation of the overhead (calculating norms, attention scores, etc.).
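The summary does not say how Effort decides which calculations to skip, so the following is only a rough illustration of the general idea of an adjustable compute budget, not the Effort algorithm itself. It assumes a hypothetical helper, `approx_matvec`, that approximates a matrix-vector product by keeping only the fraction `effort` of input dimensions with the largest magnitude; both the name and the selection rule are assumptions made for this sketch.

```python
def approx_matvec(weights, x, effort):
    """Approximate weights @ x using a fraction `effort` of the input dimensions.

    Illustrative assumption only: inputs are ranked by |x[i]| and the
    smallest ones are skipped. `weights` is a list of rows (out_dim x in_dim).
    Returns (output_vector, number_of_multiplications_performed).
    """
    if not 0.0 <= effort <= 1.0:
        raise ValueError("effort must be in [0, 1]")

    n_inputs = len(x)
    keep = round(effort * n_inputs)

    # Rank input dimensions by magnitude, largest first, and keep the top ones.
    ranked = sorted(range(n_inputs), key=lambda i: abs(x[i]), reverse=True)
    kept = ranked[:keep]

    # Only the kept columns contribute to each output element.
    out = [sum(row[i] * x[i] for i in kept) for row in weights]
    mults = keep * len(weights)
    return out, mults


if __name__ == "__main__":
    W = [[1, 2, 3], [4, 5, 6]]
    x = [1.0, -10.0, 0.1]
    for effort in (1.0, 0.7, 0.4):
        out, mults = approx_matvec(W, x, effort)
        print(f"effort={effort}: {mults} multiplications, output={out}")
```

With `effort=1.0` the result matches the full product; lowering `effort` reduces the multiplication count at the cost of accuracy. Because `effort` is an ordinary argument, it can be changed between calls, which is the "in real time" property the post describes.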