Show HN: Open-source load balancer for llama.cpp
Stateful load balancer custom-tailored for llama.cpp - distantmagic/paddler
Typical strategies such as round robin or least connections are not effective for llama.cpp servers, which handle continuous batching and concurrent requests through slots. Paddler addresses this with a stateful load balancer that tracks each server's available slots and routes requests to servers that have free ones. Paddler also uses agents to monitor the health of individual llama.cpp instances and report it back to the load balancer.
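To make the idea of slot-aware routing concrete, here is a minimal sketch in Python. It is not Paddler's actual code or API: the names `Upstream`, `SlotAwareBalancer`, `report`, and `acquire` are hypothetical, and the "pick the healthy server with the most idle slots" rule is an assumed policy chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Upstream:
    """Last known state of one llama.cpp instance (hypothetical shape)."""
    name: str
    slots_idle: int
    healthy: bool = True


class SlotAwareBalancer:
    """Keeps per-server slot state, unlike stateless round robin."""

    def __init__(self) -> None:
        self.upstreams: Dict[str, Upstream] = {}

    def report(self, name: str, slots_idle: int, healthy: bool = True) -> None:
        # Called by a monitoring agent; overwrites the previous state.
        self.upstreams[name] = Upstream(name, slots_idle, healthy)

    def acquire(self) -> Optional[str]:
        # Consider only healthy servers that still have a free slot.
        candidates = [
            u for u in self.upstreams.values() if u.healthy and u.slots_idle > 0
        ]
        if not candidates:
            return None  # Caller should queue or reject the request.
        # Assumed policy: prefer the server with the most idle slots.
        best = max(candidates, key=lambda u: u.slots_idle)
        best.slots_idle -= 1  # Reserve the slot until the next agent report.
        return best.name


balancer = SlotAwareBalancer()
balancer.report("a", slots_idle=2)
balancer.report("b", slots_idle=1)
balancer.report("c", slots_idle=4, healthy=False)  # Unhealthy, never chosen.

picks = [balancer.acquire() for _ in range(4)]
print(picks)
```

In this sketch the unhealthy server is skipped even though it reports the most idle slots, and once every healthy server is full `acquire` returns `None` instead of overloading an instance, which round robin cannot do because it holds no slot state.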