AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition



Working with CUDA requires considerable GPU knowledge, and in practice most machine learning algorithms are written in a higher-level abstraction layer such as PyTorch or JAX. A text-embedding visualization of the AI CUDA Engineer Archive shows that the discovered kernels cluster by task (e.g. MatMul, Pooling, Convolution) and by implementation strategy (unrolling, fusing, vectorization). We are revising our paper and updating the results, with further improvements to the evaluation and runtime profiling harness, to reflect and discuss the effects and mitigation of LLM reward hacking in CUDA kernel optimization.
