GEMM tuning on AMD MI300X GPUs improves throughput and latency by up to 7.2x


Optimising AI model performance: vLLM throughput and latency benchmarks and GEMM tuning with rocBLAS and hipBLASLt

rocBLAS and hipBLASLt provide optimised implementations of GEMM (general matrix multiplication) operations along with a range of tuning parameters, allowing developers to fine-tune their applications for the underlying hardware and maximise vLLM performance. By optimising GEMM operations with these libraries, we significantly enhanced the performance and efficiency of various large language models, including LLaMA, Mistral, Mixtral, and Falcon. The same tuning techniques help ensure efficient processing for complex and demanding AI workloads.
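As a rough illustration of the idea behind GEMM tuning, the sketch below times several interchangeable GEMM implementations for each matrix shape and records the fastest one per shape. It is a conceptual, pure-Python stand-in: it does not call rocBLAS or hipBLASLt, and the names `gemm_ijk`, `gemm_ikj`, `tune`, and `CANDIDATES` are hypothetical helpers introduced only for this example. The summary does not describe the actual tuning procedure, so treat this as an assumption about the general approach rather than the method used in the benchmarks.

```python
import time
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
Kernel = Callable[[float, Matrix, Matrix, float, Matrix], Matrix]


def gemm_ijk(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> Matrix:
    """Reference GEMM: returns alpha * (A @ B) + beta * C using row-by-column dot products."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] += alpha * acc
    return out


def gemm_ikj(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> Matrix:
    """Same mathematical result with a different loop order (walks rows of B)."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        row_out = out[i]
        for p in range(k):
            scale = alpha * a[i][p]
            row_b = b[p]
            for j in range(n):
                row_out[j] += scale * row_b[j]
    return out


# Hypothetical pool of interchangeable kernels; a real GEMM library would
# offer many more variants (tile sizes, split strategies, and so on).
CANDIDATES: Dict[str, Kernel] = {"ijk": gemm_ijk, "ikj": gemm_ikj}


def tune(candidates: Dict[str, Kernel], shape: Tuple[int, int, int], repeats: int = 3) -> str:
    """Time every candidate on an (m, k, n) problem and return the fastest one's name."""
    m, k, n = shape
    a = [[float(i + p) for p in range(k)] for i in range(m)]
    b = [[float(p - j) for j in range(n)] for p in range(k)]
    c = [[0.0] * n for _ in range(m)]
    best_name, best_time = "", float("inf")
    for name, kernel in candidates.items():
        start = time.perf_counter()
        for _ in range(repeats):
            kernel(1.0, a, b, 0.0, c)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


if __name__ == "__main__":
    # Build a per-shape lookup table, the kind of result a tuning pass produces.
    shapes = [(8, 64, 8), (32, 32, 32)]
    table = {shape: tune(CANDIDATES, shape) for shape in shapes}
    print(table)
```

The winning kernel can differ between shapes and between machines, which is why the result is stored per shape instead of being chosen once globally.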


