The Biology of a Large Language Model

Large language models display impressive capabilities. However, for the most part, the mechanisms by which they do so are unknown.

In our companion paper, Circuit Tracing: Revealing Computational Graphs in Language Models, we build on recent work to introduce a new set of tools for identifying features and mapping the connections between them, analogous to neuroscientists producing a “wiring diagram” of the brain. We rely heavily on a tool we call attribution graphs, which allow us to partially trace the chain of intermediate steps that a model uses to transform a specific input prompt into an output response.

Prior works primarily rely on the logit lens technique and component-level activation patching to show that models have an English-aligned intermediate representation, which they subsequently convert to a language-specific output in the final layers.
