Inference cost at scale with napkin math
If you serve AI models as part of your product stack, you've likely wondered what kind of scale your GPU cluster tops out at. With some rudimentary knowledge of your hardware and model architecture, we can work out the dollar cost per user on the back of a napkin.

