AutoHete: An Automatic and Efficient Heterogeneous Training System for LLMs


Transformer-based large language models (LLMs) have demonstrated exceptional capabilities in sequence modeling and text generation, with improvements scaling proportionally with model size. However, the limitations of GPU memory have restricted LLM training accessibility for many researchers. Existing heterogeneous training methods significantly expand the scale of trainable models but introduce substantial communication overheads and CPU workloads. In this work, we propose AutoHete, an automatic and efficient heterogeneous training system compatible with both single-GPU and multi-GPU environments. AutoHete dynamically adjusts activation checkpointing, parameter offloading, and optimizer offloading based on the specific hardware configuration and LLM training needs. Additionally, we design a priority-based scheduling mechanism that maximizes the overlap between operations across training iterations, enhancing throughput. Compared to state-of-the-art heterogeneous training systems, AutoHete delivers a 1.32x~1.91x throughput improvement across various model sizes and training configurations.

By Zihao Zeng and 7 other authors
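To make the abstract's memory-saving ingredients concrete, below is a minimal plain-PyTorch sketch of the two techniques AutoHete adjusts automatically: activation checkpointing and offloading of optimizer state to the CPU. This is not AutoHete's own API (the paper's interface is not shown here); the module, its sizes, and the training loop are hypothetical, and the checkpoint/offload decisions are hard-coded rather than chosen automatically.

```python
# Minimal sketch (not AutoHete itself) of activation checkpointing plus
# CPU-offloaded optimizer state, using standard PyTorch APIs.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """Toy transformer-style feed-forward block; sizes are arbitrary."""
    def __init__(self, dim: int):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(x)

device = "cuda" if torch.cuda.is_available() else "cpu"
dim, n_blocks = 256, 4
blocks = nn.ModuleList([Block(dim) for _ in range(n_blocks)]).to(device)

# Optimizer offloading: keep a copy of the parameters (and hence the Adam
# moments that AdamW will allocate) on the CPU, so they never occupy GPU memory.
cpu_params = [p.detach().to("cpu").requires_grad_() for p in blocks.parameters()]
optimizer = torch.optim.AdamW(cpu_params, lr=1e-4)

x = torch.randn(8, dim, device=device, requires_grad=True)
out = x
for blk in blocks:
    # Activation checkpointing: each block's forward activations are dropped
    # and recomputed during backward, trading extra compute for GPU memory.
    out = checkpoint(blk, out, use_reentrant=False)
loss = out.pow(2).mean()
loss.backward()

# Gradients move GPU -> CPU, the optimizer step runs on the CPU copy, and the
# updated weights move back CPU -> GPU for the next iteration.
with torch.no_grad():
    for gpu_p, cpu_p in zip(blocks.parameters(), cpu_params):
        cpu_p.grad = gpu_p.grad.to("cpu")
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    for gpu_p, cpu_p in zip(blocks.parameters(), cpu_params):
        gpu_p.copy_(cpu_p.to(device))
        gpu_p.grad = None
```

What this sketch does not capture is AutoHete's actual contribution: deciding, per hardware configuration and model size, how much checkpointing and offloading to apply, and scheduling the CPU optimizer work and host-device transfers so they overlap with GPU compute across training iterations.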
