NVIDIA CUDA Toolkit 13.0 Update 1 Brings Some Performance Enhancements
CUDA 13.0 reached general availability just over one month ago, and out this week is CUDA 13.0 Update 1 as the first incremental step forward for CUDA 13.
Beyond routine fixes, CUDA 13.0 Update 1 brings performance improvements for block-scaled FP4 GEMMs on NVIDIA Blackwell and Blackwell Ultra GPUs. There is also better performance for SYMV on Blackwell, cublasMatmul improvements for small cases, better TF32 GEMM performance on Thor GPUs, and kernel launch latency improvements. The cuSPARSE library also now supports the BSR format in the SpMV API.
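For readers unfamiliar with the terms, BSR (block sparse row) stores a sparse matrix as a set of small dense blocks, and SpMV is a sparse matrix-vector product. The following is a minimal plain-Python sketch of what a BSR SpMV computes. It is not the cuSPARSE API: the function name `bsr_spmv` and its parameters are hypothetical, and square blocks stored row-major are an assumption made for illustration.

```python
def bsr_spmv(block_size, row_ptr, col_idx, blocks, x):
    """Compute y = A @ x for a matrix A held in a BSR-style layout.

    Hypothetical illustration only, not a cuSPARSE call.
    row_ptr[i]..row_ptr[i+1] indexes the blocks of block-row i,
    col_idx[k] is the block-column of block k, and blocks[k] is a
    dense block_size x block_size block (row-major nested lists).
    """
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * block_size)
    for bi in range(n_block_rows):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]
            block = blocks[k]
            # Multiply the dense block by the matching slice of x.
            for r in range(block_size):
                for c in range(block_size):
                    y[bi * block_size + r] += block[r][c] * x[bj * block_size + c]
    return y


# A 4x4 matrix built from three 2x2 blocks (assumed example data).
blocks = [
    [[1, 2], [3, 4]],  # block-row 0, block-column 0
    [[1, 0], [0, 1]],  # block-row 0, block-column 1
    [[5, 6], [7, 8]],  # block-row 1, block-column 1
]
result = bsr_spmv(2, [0, 2, 3], [0, 1, 1], blocks, [1, 1, 1, 1])
print(result)
```

Grouping nonzeros into dense blocks means one column index is stored per block rather than per element, which is why the format suits matrices whose nonzeros cluster naturally.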
Or read this on Phoronix