'I paid for the whole GPU, I am going to use the whole GPU'
A guide to maximizing the utilization of GPUs, from cloud allocations to FLOP/s.
First, there might be lots of work to do that supports your application but doesn’t use the GPU, like moving input or output data via network or disk, downloading the many gigabytes of weights of a foundation model, or writing logs. These tasks can be sped up by the usual means: judicious application of lazy and eager loading, parallelization, increased bandwidth for non-GPU components like networks, and deleting more code you aren't going to need (YAGNI). Typical GPU applications have much less variability (for a database analogue, imagine repeatedly running only one basic sequential scan aggregation query, but with slightly different parameters each time) and so have more controllable quality-of-service.
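As a rough illustration of the parallelization point for non-GPU support work, the sketch below overlaps two I/O-bound setup tasks with a thread pool. The functions `fetch_weights` and `load_inputs` are hypothetical stand-ins (they only sleep to simulate network and disk latency), not part of any real library; a real application would substitute its own download and data-loading calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_weights() -> str:
    """Hypothetical stand-in for downloading model weights."""
    time.sleep(0.2)  # simulated network latency
    return "weights"


def load_inputs() -> str:
    """Hypothetical stand-in for reading input data from disk."""
    time.sleep(0.2)  # simulated disk latency
    return "inputs"


def prepare_sequential() -> tuple[str, str]:
    # Baseline: each task waits for the previous one to finish.
    return fetch_weights(), load_inputs()


def prepare_parallel() -> tuple[str, str]:
    # Both tasks mostly wait on I/O, so threads let the waits overlap.
    with ThreadPoolExecutor(max_workers=2) as pool:
        weights = pool.submit(fetch_weights)
        inputs = pool.submit(load_inputs)
        return weights.result(), inputs.result()


def timed(fn):
    """Return (result, elapsed_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```

Calling `timed(prepare_sequential)` and `timed(prepare_parallel)` should show the parallel version finishing in roughly the time of the slower task rather than the sum of both. Threads suit this case because the work is waiting on I/O; CPU-heavy preprocessing would likely need processes instead.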