Mapping the Mind of a Large Language Model


We have identified how millions of concepts are represented inside Claude Sonnet, one of our deployed large language models. This is the first ever detailed look inside a modern, production-grade large language model.

In October 2023, we reported success applying dictionary learning to a very small "toy" language model and found coherent features corresponding to concepts like uppercase text, DNA sequences, surnames in citations, nouns in mathematics, or function arguments in Python code.

In Claude Sonnet, we see features corresponding to a vast range of entities like cities (San Francisco), people (Rosalind Franklin), atomic elements (Lithium), scientific fields (immunology), and programming syntax (function calls).

[Figure: the orange color denotes the words or word-parts on which the feature is active.]

We also find more abstract features, responding to things like bugs in computer code, discussions of gender bias in professions, and conversations about keeping secrets.
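To make the idea of a feature being "active" on particular words concrete, here is a minimal sketch of dictionary learning at inference time. In general, dictionary learning represents each internal activation vector as a sparse combination of learned feature directions. Everything below is an assumption made for illustration: the three feature names, the orthonormal directions, the per-token activation values, the ReLU encoder, and the 0.5 threshold are invented, and none of it is taken from Claude Sonnet or from the method used in the work described above.

```python
# Toy illustration only: the dictionary, tokens, and activation values are
# made up. They are not extracted from any real model.

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def encode(activation, dictionary):
    """Feature activations: ReLU of the projection onto each feature direction."""
    return [max(0.0, dot(activation, direction)) for direction in dictionary]

def decode(feature_acts, dictionary):
    """Reconstruct the activation as a weighted sum of feature directions."""
    dim = len(dictionary[0])
    return [
        sum(f * direction[i] for f, direction in zip(feature_acts, dictionary))
        for i in range(dim)
    ]

# Hypothetical feature directions in a 3-dimensional activation space.
FEATURE_NAMES = ["city", "code", "secret"]
DICTIONARY = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical per-token activations for a short piece of text.
TOKENS = ["San", "Francisco", "is", "foggy"]
ACTIVATIONS = [
    [0.9, 0.0, 0.0],
    [1.2, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
]

def active_tokens(feature_index, threshold=0.5):
    """Tokens on which the chosen feature exceeds the (assumed) threshold."""
    return [
        token
        for token, activation in zip(TOKENS, ACTIVATIONS)
        if encode(activation, DICTIONARY)[feature_index] > threshold
    ]

if __name__ == "__main__":
    city = FEATURE_NAMES.index("city")
    print(active_tokens(city))
```

The tokens returned by `active_tokens` play the role of the highlighted words in a feature visualization. In a real setting the dictionary would be learned from many activation vectors and would contain far more directions than the activation space has dimensions, so sparsity has to be encouraged during training rather than falling out of an orthonormal basis as it does here.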
