768GB of cheap Intel Optane DIMM memory sticks used to run 1-trillion-parameter LLM on a system with a single GPU — local Kimi K2.5 install achieved roughly 4 tokens per second
MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU
Google's Gemma 4 Runs Frontier AI On A Single GPU
DeepSeek’s distilled new R1 AI model can run on a single GPU
Gemma3 – The current strongest model that fits on a single GPU