Get the latest tech news
AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition
The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition
However, working with CUDA requires quite a bit of GPU knowledge, and in practice, most machine learning algorithms are written in a higher level abstraction layer such as PyTorch or JAX. A Text Embedding visualization of the AI CUDA Engineer Archive shows that the discovered kernels group into tasks (e.g. MatMul, Pooling, Convolution) and implementation strategies (unrolling, fusing, vectorization). We are in the process of revising our paper and updating results, with further imporvements to the evaluation and runtime profiling harness, to reflect and discuss the effects, and mitigation of LLM reward hacking for CUDA kernel optimization.
Or read this on Hacker News