
What's going on inside an LLM's neural network


Anthropic's conceptual mapping helps explain why LLMs behave the way they do.

In the field of generative AI, the non-interpretable neural networks underlying these models make it hard for even experts to figure out precisely why, for instance, they often confabulate information. Anthropic's new paper, "Extracting Interpretable Features from Claude 3 Sonnet," describes a powerful new method for at least partially explaining just how the model's millions of artificial neurons fire to create surprisingly lifelike responses to general queries. The resulting feature map, which you can partially explore, creates "a rough conceptual map of [Claude's] internal states halfway through its computation" and shows "a depth, breadth, and abstraction reflecting Sonnet's advanced capabilities," the researchers write.
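The technique underlying that feature map is dictionary learning with sparse autoencoders: activations captured from a middle layer of the model are decomposed into a much larger set of sparsely active "features," each of which tends to line up with a recognizable concept. The sketch below illustrates only that general idea; the class name, dimensions, loss weights, and random stand-in activations are illustrative assumptions, not Anthropic's actual code or hyperparameters.

```python
# Minimal sketch of a sparse autoencoder for activation "dictionary learning".
# All sizes and values are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        # Encoder maps an activation vector to a wider, sparse feature vector.
        self.encoder = nn.Linear(d_model, d_features)
        # Decoder reconstructs the original activation from the active features.
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse, non-negative codes
        reconstruction = self.decoder(features)
        return features, reconstruction

# Toy training loop on hypothetical middle-layer activations (random stand-ins here).
d_model, d_features = 512, 4096
sae = SparseAutoencoder(d_model, d_features)
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3  # sparsity penalty weight (illustrative value)

for step in range(100):
    acts = torch.randn(64, d_model)  # stand-in for captured model activations
    features, recon = sae(acts)
    # Reconstruction error keeps features faithful; the L1 term keeps them sparse.
    loss = ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Once trained on real activations, each learned feature's strongest-activating inputs can be inspected to see what concept it responds to, which is how a "conceptual map" of the model's internal states gets assembled.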


Read more on: LLM, neural network

Related news:

Developers do not care about poisoning attacks in LLM code assistants

Show HN: I built a game to help you learn neural network architectures

We created the first open source implementation of Meta's TestGen-LLM