
Effort – a possibly new algorithm for LLM Inference


With Effort you can adjust, smoothly and in real time, how many calculations are performed during LLM inference. The multiplications themselves are fast, but overall inference speed still falls slightly short because supporting operations such as softmax need further optimization. The pink line in the chart shows actual speed, measured with a suboptimal implementation of this overhead (calculating norms, attention scores, etc.).
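The text does not say how Effort decides which calculations to skip. To illustrate the general idea of a matrix-vector product with a tunable "effort" knob, here is a minimal Python sketch of a simpler, well-known approximation: keep only the input elements with the largest magnitude and treat the rest as zero. This is a swapped-in technique, not Effort's actual algorithm, and `approx_matvec` and its `effort` parameter are hypothetical names chosen for illustration.

```python
def approx_matvec(weights, x, effort):
    """Approximate weights @ x using only the largest-magnitude inputs.

    Illustrative sketch (not Effort's algorithm): inputs with small |x[j]|
    are skipped, i.e. treated as zero.

    weights: list of rows, shape out_dim x in_dim
    x:       list of in_dim floats
    effort:  fraction in (0, 1] of input elements to use
    Returns (result, number_of_multiplications_performed).
    """
    if not 0.0 < effort <= 1.0:
        raise ValueError("effort must be in (0, 1]")
    n = len(x)
    # Number of inputs to keep; always at least one.
    k = max(1, round(effort * n))
    # Indices of the k inputs with the largest absolute value.
    keep = sorted(range(n), key=lambda j: abs(x[j]), reverse=True)[:k]
    # Each output element sums contributions from the kept inputs only.
    result = [sum(row[j] * x[j] for j in keep) for row in weights]
    return result, len(weights) * k


W = [[1.0, 2.0, 3.0, 4.0],
     [0.5, -1.0, 0.0, 2.0]]
x = [0.1, -3.0, 0.01, 2.0]

exact, exact_mults = approx_matvec(W, x, 1.0)  # all 8 multiplications
half, half_mults = approx_matvec(W, x, 0.5)    # only 4 multiplications
print(half, half_mults)  # [2.0, 7.0] 4
```

At `effort=1.0` the result is exact; lowering `effort` cuts the multiplication count roughly in proportion, at the cost of accuracy. This sketch also sorts the inputs on every call, which is itself overhead. That echoes the point above: the bookkeeping around the multiplications can dominate real-world speed.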

