Cost of self hosting Llama-3 8B-Instruct
A blog post from the lytix.ai team
For the sake of simplicity, assuming an average input:output token ratio, that works out to roughly $1 per 1M tokens, and that is the number to beat.

The setup was dead simple: it just involved installing ray and vllm via pip3 and then changing my Docker entrypoint to launch the vLLM server.

This approach does come with downsides, such as having to manage and scale your own hardware, but in theory it appears possible to undercut ChatGPT's prices by a significant amount.
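The exact entrypoint command is not reproduced above; as a rough sketch, a vLLM-based Docker entrypoint for this setup might look something like the following. The model ID and flags here are illustrative assumptions, not the author's exact configuration:

```shell
# Install dependencies (as described above)
pip3 install ray vllm

# Hypothetical Docker entrypoint: serve Llama-3 8B-Instruct through vLLM's
# OpenAI-compatible API server. Model ID, host, and port are illustrative.
python3 -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --host 0.0.0.0 \
    --port 8000
```

Once running, the server exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so existing OpenAI client code can be pointed at it with only a base-URL change.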