Show HN: We made glhf.chat – run almost any open-source LLM, including 405B
Chat with open-source models
We use vLLM and a custom-built, autoscaling GPU scheduler to run (almost) any open-source large language model for you: just paste a link to the Hugging Face repo. It works with any full-weight or 4-bit AWQ repo on Hugging Face that vLLM supports, including:

- Meta Llama 3.1 405B Instruct (and 70B, and 8B)
- Qwen 2 72B
- Mixtral 8x22B
- Gemma 2 27B
- DeepSeek V2 Coder Lite (support for the full model is in the works)
- Phi-3

Once the beta is over, we expect to significantly beat the pricing of the major cloud GPU vendors, thanks to our ability to run the models multi-tenant.
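As a rough sketch of the "paste a Hugging Face repo link" flow: a client can derive the `org/model` repo id from the pasted URL and build an OpenAI-style chat-completions payload, which is the request format vLLM's server exposes. The payload shape and the use of the repo id as the model name are assumptions for illustration; glhf.chat's actual API may differ.

```python
import json
from urllib.parse import urlparse


def repo_id_from_url(url: str) -> str:
    """Extract the `org/model` repo id from a Hugging Face repo link."""
    path = urlparse(url).path.strip("/")
    org, model = path.split("/")[:2]
    return f"{org}/{model}"


def chat_request(repo_url: str, prompt: str) -> dict:
    # Hypothetical OpenAI-compatible payload; the exact endpoint and
    # model-id scheme glhf.chat uses may differ from this sketch.
    return {
        "model": repo_id_from_url(repo_url),
        "messages": [{"role": "user", "content": prompt}],
    }


payload = chat_request(
    "https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct",
    "Hello!",
)
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the service's chat-completions endpoint with an API key; vLLM's OpenAI-compatible server accepts this same request shape.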