DoppelBot: Replace Your CEO with an LLM
(quick links: add to your own Slack; source code)
Initial versions of the model were prone to generating short responses, which is unsurprising because the majority of Slack communication is pretty terse.

At inference time, loading the model with the LoRA adapter for a user takes 15-20s, so it's important to avoid doing this for every incoming request.

app_mention: When the bot is mentioned in a channel, we retrieve the recent messages from that thread, do some basic cleaning, and call the user's model to generate a response.
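The excerpt does not show how repeated loads are avoided or what the "basic cleaning" consists of. One plausible approach, sketched here under assumptions, is to keep recently used per-user models in a small LRU cache and to strip Slack user mentions from thread messages before building the prompt. The names `AdapterCache`, `loader`, and `clean_thread` are hypothetical and are not taken from the DoppelBot source.

```python
import re
from collections import OrderedDict
from typing import Callable, Dict, List


class AdapterCache:
    """Keep the most recently used per-user models in memory (LRU eviction)."""

    def __init__(self, loader: Callable[[str], object], max_size: int = 4):
        # `loader` is a hypothetical function that loads the base model plus
        # the LoRA adapter for one user (the slow 15-20 s step).
        self._loader = loader
        self._max_size = max_size
        self._models: "OrderedDict[str, object]" = OrderedDict()

    def get(self, user_id: str) -> object:
        if user_id in self._models:
            self._models.move_to_end(user_id)  # mark as most recently used
            return self._models[user_id]
        model = self._loader(user_id)  # slow path: only taken on a cache miss
        self._models[user_id] = model
        if len(self._models) > self._max_size:
            self._models.popitem(last=False)  # evict the least recently used
        return model


# Slack encodes user mentions as <@U012ABC> (optionally <@U012ABC|name>).
MENTION = re.compile(r"<@[A-Z0-9]+(\|[^>]+)?>")


def clean_thread(messages: List[Dict[str, str]]) -> List[str]:
    """Assumed cleaning step: drop mentions, collapse whitespace, skip empties."""
    cleaned = []
    for message in messages:
        text = MENTION.sub("", message.get("text", ""))
        text = " ".join(text.split())
        if text:
            cleaned.append(text)
    return cleaned
```

With a cache like this, an `app_mention` handler would call `cache.get(user_id)` and pay the load cost only on the first request for a user, or after that user's model has been evicted.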