Get the latest tech news
Are OpenAI and Anthropic losing money on inference?
Really Losing Money on Inference? I keep hearing what a cash incinerator AI is, especially around inference. While it seems reasonable on the surface, I've often been wary of these kind of claims, so I decided to do some digging.
Claude Code artificially limits context to 200k tokens - not just for performance, but to keep inference in the cheap memory-bound regime and avoid expensive compute-bound long-context scenarios. Heavy readers - applications that consume massive amounts of context but generate minimal output - operate in an almost free tier for compute costs. Conversational agents, coding assistants processing entire codebases, document analysis tools, and research applications all benefit enormously from this dynamic.
Or read this on Hacker News