
FlashAttention-3 unleashes the power of H100 GPUs for LLMs


FlashAttention-3 is a new technique that exploits the full capacity of Nvidia H100 GPUs to speed up the attention computation at the core of large language models (LLMs).

FlashAttention-3 builds upon previous work on FlashAttention and FlashAttention-2 and further optimizes the use of resources on Nvidia Hopper GPUs to maximize performance and efficiency for LLM training and inference. The fast attention computation offered by FlashAttention-3 can significantly reduce the time it takes to train LLMs, which can enable researchers and developers to experiment with larger models and datasets. The researchers have open-sourced FlashAttention-3 under a permissive license and plan to integrate it into popular deep learning libraries such as PyTorch and Hugging Face Transformers.
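For context, attention computes softmax(QK^T / sqrt(d)) V over query, key, and value matrices, and FlashAttention-style kernels produce the same result in a single fused pass without materializing the full attention matrix. The following is a minimal sketch of that equivalence using PyTorch's built-in scaled_dot_product_attention entry point, which dispatches to a fused flash-style kernel when hardware and dtypes allow; the tensor shapes and tolerance check are illustrative assumptions, not details from the article, and PyTorch's current backend is not necessarily the FlashAttention-3 kernel described here.

import torch
import torch.nn.functional as F

# Illustrative shapes (assumed for this sketch): batch 2, 8 heads,
# sequence length 1024, head dimension 64. Requires a CUDA GPU.
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Reference computation: materializes the full 1024x1024 score matrix,
# which is exactly the memory traffic flash-style kernels avoid.
scores = q @ k.transpose(-2, -1) / (64 ** 0.5)
ref = torch.softmax(scores, dim=-1) @ v

# Fused path: one kernel, same mathematical result, no full score matrix
# stored in GPU memory.
out = F.scaled_dot_product_attention(q, k, v)

# Loose fp16 tolerances; the fused kernel is numerically equivalent,
# not bit-identical.
torch.testing.assert_close(out, ref, atol=1e-3, rtol=1e-3)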


