
FlashAttention-3 unleashes the power of H100 GPUs for LLMs


FlashAttention-3 is a new technique that exploits the full capacity of Nvidia H100 GPUs to speed up the attention computation at the core of large language models (LLMs).

FlashAttention-3 builds upon previous work on FlashAttention and FlashAttention-2 and further optimizes the use of resources on Nvidia Hopper GPUs to maximize performance and efficiency for LLM training and inference. The fast attention computation offered by FlashAttention-3 can significantly reduce the time it takes to train LLMs, which can enable researchers and developers to experiment with larger models and datasets. The researchers have open-sourced FlashAttention-3 under a permissive license and plan to integrate it into popular deep learning libraries such as PyTorch and Hugging Face Transformers.
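For context, attention computes softmax(QK^T / sqrt(d)) V over query, key, and value matrices, and FlashAttention-style kernels produce the same result in a single fused pass without materializing the full attention matrix. The following is a minimal sketch of that equivalence using PyTorch's built-in scaled_dot_product_attention entry point, which dispatches to a fused flash-style kernel when hardware and dtypes allow; the tensor shapes and tolerance check are illustrative assumptions, not details from the article, and PyTorch's current backend is not necessarily the FlashAttention-3 kernel described here.

import torch
import torch.nn.functional as F

# Illustrative shapes (assumed for this sketch): batch 2, 8 heads,
# sequence length 1024, head dimension 64. Requires a CUDA GPU.
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Reference computation: materializes the full 1024x1024 score matrix,
# which is exactly the memory traffic flash-style kernels avoid.
scores = q @ k.transpose(-2, -1) / (64 ** 0.5)
ref = torch.softmax(scores, dim=-1) @ v

# Fused path: one kernel, same mathematical result, no full score matrix
# stored in GPU memory.
out = F.scaled_dot_product_attention(q, k, v)

# Loose fp16 tolerances; the fused kernel is numerically equivalent,
# not bit-identical.
torch.testing.assert_close(out, ref, atol=1e-3, rtol=1e-3)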


