
Anthropic can now track the bizarre inner workings of a large language model


What the firm found challenges some basic assumptions about how this technology really works.

The AI firm Anthropic has developed a way to peer inside a large language model and watch what it does as it comes up with a response, revealing key new insights into how the technology works. The Anthropic team was surprised by some of the counterintuitive workarounds that large language models appear to use to complete sentences, solve simple math problems, suppress hallucinations, and more, says Joshua Batson, a research scientist at the company. The latest generation of large language models, such as Claude 3.5, Gemini, and GPT-4o, hallucinate far less than previous versions, thanks to extensive post-training (the steps that take an LLM trained on text scraped from most of the internet and turn it into a usable chatbot).


Related news:

- The Biology of a Large Language Model
- Anthropic's Claude Is Good at Poetry—and Bullshitting
- Tracing the thoughts of a large language model