ollama 0.11.9 Introducing A Nice CPU/GPU Performance Optimization
The open-source ollama software, which makes it easy to run AI large language models (LLMs) across different operating systems, hardware, and models, is about to enjoy a nice speed boost.
The ollama 0.11.9-rc0 test release was christened a short time ago and comes with a nice performance improvement. The optimization comes from VMware engineer Daniel Hiltgen and builds the graph for the next batch asynchronously to help keep the GPU busy: "This refactors the main run loop of the ollama runner to perform the main GPU intensive tasks (Compute+Floats) in a go routine so we can prepare the next batch in parallel to reduce the amount of time the GPU stalls waiting for the next batch of work."
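The change described in that quote is essentially a two-stage pipeline. The GPU-heavy step for batch N runs in a goroutine while the main loop prepares batch N+1. Below is a minimal Go sketch of that general pattern. It is not ollama's actual runner code. `prepareBatch` and `compute` are hypothetical stand-ins for ollama's graph building and its Compute+Floats step, and the arithmetic inside them is only placeholder work.

```go
package main

import (
	"fmt"
	"sync"
)

// batch is a hypothetical stand-in for a prepared unit of work
// (in ollama's case, roughly the graph built for the next batch).
type batch struct {
	id    int
	input []int
}

// prepareBatch is a hypothetical stand-in for the CPU-side work of
// building the next batch; here it just copies the input.
func prepareBatch(id int, tokens []int) batch {
	in := make([]int, len(tokens))
	copy(in, tokens)
	return batch{id: id, input: in}
}

// compute is a hypothetical stand-in for the GPU-intensive step;
// here it just sums squares as placeholder work.
func compute(b batch) int {
	sum := 0
	for _, t := range b.input {
		sum += t * t
	}
	return sum
}

// runPipelined computes each batch in a goroutine while the calling
// goroutine prepares the following batch, so preparation overlaps compute.
func runPipelined(work [][]int) []int {
	results := make([]int, len(work))
	if len(work) == 0 {
		return results
	}
	next := prepareBatch(0, work[0])
	for i := range work {
		cur := next // batch to compute in this iteration
		var wg sync.WaitGroup
		wg.Add(1)
		go func() {
			defer wg.Done()
			results[cur.id] = compute(cur) // "GPU" work runs concurrently
		}()
		// Overlap: build batch i+1 while batch i is being computed.
		if i+1 < len(work) {
			next = prepareBatch(i+1, work[i+1])
		}
		wg.Wait() // at most one batch of lookahead, like double buffering
	}
	return results
}

func main() {
	work := [][]int{{1, 2}, {3}, {4, 5, 6}}
	fmt.Println(runPipelined(work)) // prints [5 9 77]
}
```

Waiting on the `sync.WaitGroup` at the end of each iteration caps the lookahead at a single prepared batch. That keeps memory bounded while still hiding preparation time behind compute. How ollama's refactored loop actually synchronizes its stages may differ from this sketch.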
Source: Phoronix