Implementing Natural Conversational Agents with Elixir
In my last post, I discussed some work I had done building Nero, the assistant of the future that I’ve always wanted. I ended up creating an end-to-end example which used Nx, OpenAI APIs, and…
It implements dynamic batching; encapsulates pre-processing, inference, and post-processing; natively supports distribution and load balancing across multiple GPUs; and in general is an extremely easy way to serve machine learning models. After posting my original demonstration, I already knew there were some easy optimizations I could make, so I set out to improve the average latency of my implementation as much as possible in a short amount of time. For the original demo, I used a 1000 ms silence threshold, which basically means a speaker has to stop talking for a full second before we'll even start the long-running response process.
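The serving layer described here matches Elixir's `Nx.Serving`. As a rough sketch of the dynamic-batching idea, the following follows Nx.Serving's documented usage pattern; the module and process names (`MyDefn`, `MyApp.Serving`) and the toy model are illustrative stand-ins, not taken from the post:

```elixir
# Hypothetical minimal sketch of dynamic batching with Nx.Serving.
defmodule MyDefn do
  import Nx.Defn
  # A trivial stand-in for the real speech/LLM model.
  defn multiply_by_two(x), do: x * 2
end

# The serving wraps compilation; pre- and post-processing can also be
# attached via client_preprocessing/client_postprocessing.
serving =
  Nx.Serving.new(fn opts -> Nx.Defn.jit(&MyDefn.multiply_by_two/1, opts) end)

# Direct, unbatched use:
batch = Nx.Batch.stack([Nx.tensor([1, 2, 3])])
Nx.Serving.run(serving, batch)

# For dynamic batching across processes, start it under a supervisor:
children = [
  {Nx.Serving,
   serving: serving,
   name: MyApp.Serving,
   batch_size: 8,        # up to 8 concurrent requests fused into one call
   batch_timeout: 100}   # flush a partial batch after 100 ms
]

# Concurrent callers are then batched together transparently:
# Nx.Serving.batched_run(MyApp.Serving, Nx.Batch.stack([Nx.tensor([1, 2, 3])]))
```

The `batch_timeout` knob is the same latency trade-off the post describes: waiting longer fills bigger batches but delays every caller in the batch.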