Show HN: Open-source load balancer for llama.cpp


Stateful load balancer custom-tailored for llama.cpp - distantmagic/paddler

Typical load-balancing strategies such as round robin or least connections are not effective for llama.cpp servers. Each server handles concurrent requests through a limited number of slots, which it uses for continuous batching. Paddler addresses this with a stateful load balancer that tracks how many slots each server has available and routes requests accordingly. It also uses agents that monitor the health of individual llama.cpp instances and report back to the load balancer, so routing decisions reflect each server's current state.
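To make slot-aware routing concrete, here is a minimal Python sketch of the general idea. It is not Paddler's implementation. `Upstream`, `SlotAwareBalancer`, `report`, `acquire`, and `release` are hypothetical names. The sketch assumes that each agent periodically reports a server's idle and busy slot counts along with a health flag. The balancer sends each request to the healthy server with the most idle slots and reserves that slot right away.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Upstream:
    """Hypothetical state the balancer keeps for one llama.cpp server."""
    address: str
    slots_idle: int = 0
    slots_processing: int = 0
    healthy: bool = False


class SlotAwareBalancer:
    """Hypothetical balancer: route to the healthy upstream with the most idle slots."""

    def __init__(self) -> None:
        self.upstreams: Dict[str, Upstream] = {}

    def report(self, address: str, slots_idle: int,
               slots_processing: int, healthy: bool) -> None:
        # Assumed agent report: overwrite our view of this server's slots and health.
        up = self.upstreams.setdefault(address, Upstream(address))
        up.slots_idle = slots_idle
        up.slots_processing = slots_processing
        up.healthy = healthy

    def acquire(self) -> Optional[str]:
        # Consider only healthy servers that still have at least one free slot.
        candidates = [u for u in self.upstreams.values()
                      if u.healthy and u.slots_idle > 0]
        if not candidates:
            return None  # every slot is busy; the caller could queue or reject
        best = max(candidates, key=lambda u: u.slots_idle)
        # Reserve the slot now so a burst of requests does not pile onto
        # the same server before the next agent report arrives.
        best.slots_idle -= 1
        best.slots_processing += 1
        return best.address

    def release(self, address: str) -> None:
        # Return the slot when the request finishes.
        up = self.upstreams.get(address)
        if up is not None and up.slots_processing > 0:
            up.slots_processing -= 1
            up.slots_idle += 1


if __name__ == "__main__":
    lb = SlotAwareBalancer()
    lb.report("10.0.0.1:8080", slots_idle=1, slots_processing=3, healthy=True)
    lb.report("10.0.0.2:8080", slots_idle=3, slots_processing=1, healthy=True)
    lb.report("10.0.0.3:8080", slots_idle=4, slots_processing=0, healthy=False)
    print(lb.acquire())  # 10.0.0.2:8080 (the unhealthy server is skipped)
```

Round robin would hand the next request to whichever server is next in line, even if all of that server's slots are busy. The slot-aware choice instead waits for, or redirects away from, saturated servers. Reserving the slot at dispatch time, rather than waiting for the next report, is one simple way to keep the balancer's view consistent between agent updates.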
