Anthropic can now track the bizarre inner workings of a large language model
What the firm found challenges some basic assumptions about how this technology really works.
The AI firm Anthropic has developed a way to peer inside a large language model and watch what it does as it comes up with a response, revealing key new insights into how the technology works. The Anthropic team was surprised by some of the counterintuitive workarounds that large language models appear to use to complete sentences, solve simple math problems, suppress hallucinations, and more, says Joshua Batson, a research scientist at the company. The latest generation of large language models, such as Claude 3.5, Gemini, and GPT-4o, hallucinate far less than previous versions, thanks to extensive post-training (the steps that take an LLM trained on text scraped from most of the internet and turn it into a usable chatbot).