TScale – Distributed training on consumer GPUs


TScale is developed in the open at the Foreseerr/TScale repository on GitHub.

TScale's main features:

- Optimized transformer architecture with faster convergence and roughly 2x lower attention cost
- Support for fp8 and int8 precision for model weights and activations
- Optimized for consumer NVIDIA GPUs, including fast reduced-precision training without sacrificing model quality
- CPU offload to reduce GPU memory requirements during training
- Synchronous distributed training across several hosts with identical configurations
- 1-bit gradient compression, so regular Ethernet links suffice as the interconnect (see the sketch below)
- Asynchronous distributed training on arbitrary hosts with negligible network traffic

By combining inexpensive GPUs with asynchronous distributed mode, TScale trains LLMs quickly and affordably. To use multiple GPU devices, set the DEVICE_COUNT variable in the train script to the number of GPUs to use.
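
The project description names only the DEVICE_COUNT variable; the file layout and assignment syntax in the excerpt below are assumptions made purely for illustration.

```
# hypothetical train script excerpt -- only DEVICE_COUNT is named in the project description
DEVICE_COUNT = 4   # train on four local GPU devices
```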
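
The 1-bit gradient compression item above refers to a well-known family of techniques in which each worker transmits only the sign of each gradient element plus a per-tensor scale, keeping the quantization error locally and folding it back in on the next step (error feedback). The NumPy sketch below illustrates that general idea only; it is not TScale's code, and every name in it is an illustrative assumption.

```python
import numpy as np

def compress_1bit(grad, error):
    """Quantize a gradient to its sign (1 bit per element after packing),
    carrying the quantization error forward so it is not lost across steps."""
    corrected = grad + error                 # fold in the residual from the previous step
    scale = np.mean(np.abs(corrected))       # one scale value per tensor
    signs = np.where(corrected >= 0, 1, -1).astype(np.int8)
    reconstructed = scale * signs            # what the receiver will reconstruct
    new_error = corrected - reconstructed    # residual kept locally (error feedback)
    return signs, scale, new_error

def decompress_1bit(signs, scale):
    return (scale * signs).astype(np.float32)

# Example: one million fp32 gradient values shrink from 4 bytes each to one sign
# each (1 bit after np.packbits) plus a single scale -- small enough that a
# commodity Ethernet link can serve as the interconnect.
rng = np.random.default_rng(0)
grad = rng.standard_normal(1_000_000).astype(np.float32)
error = np.zeros_like(grad)
signs, scale, error = compress_1bit(grad, error)
restored = decompress_1bit(signs, scale)
print(grad.nbytes, signs.nbytes)  # 4000000 vs 1000000 (before bit-packing)
```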

