
Pre-trained Large Language Models Use Fourier Features to Compute Addition (2024)


Pre-trained large language models (LLMs) exhibit impressive mathematical reasoning capabilities, yet how they compute basic arithmetic, such as addition, remains unclear. This paper shows that pre-trained LLMs add numbers using Fourier features -- dimensions in the hidden state that represent numbers via a set of features sparse in the frequency domain. Within the model, MLP and attention layers use Fourier features in complementary ways: MLP layers primarily approximate the magnitude of the answer using low-frequency features, while attention layers primarily perform modular addition (e.g., computing whether the answer is even or odd) using high-frequency features. Pre-training is crucial for this mechanism: models trained from scratch to add numbers only exploit low-frequency features, leading to lower accuracy. Introducing pre-trained token embeddings to a randomly initialized model rescues its performance. Overall, our analysis demonstrates that appropriate pre-trained representations (e.g., Fourier features) can unlock the ability of Transformers to learn precise mechanisms for algorithmic tasks.
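The mechanism described above can be sketched numerically: if each integer is represented by sinusoids at a sparse set of periods, then addition becomes angle addition (elementwise rotation) in feature space, and the highest-frequency component (period 2) directly encodes parity. This is a minimal NumPy sketch under assumed, illustrative periods; the actual features and circuits the paper identifies inside the LLM differ.

```python
import numpy as np

# Illustrative sparse set of periods; NOT the exact features from the paper.
# The period-2 component encodes even/odd, larger periods encode coarser
# residues, and their LCM (here 100) bounds which answers are distinguishable.
PERIODS = np.array([2.0, 5.0, 10.0, 100.0])

def features(n):
    """Represent integer n as one unit complex number exp(2*pi*i*n/T) per period."""
    return np.exp(2j * np.pi * n / PERIODS)

def add_in_feature_space(a, b):
    """Angle addition: the elementwise product of the two feature vectors
    equals the feature vector of a + b, since exp(i*x) * exp(i*y) = exp(i*(x+y))."""
    return features(a) * features(b)

def decode(feats, max_n=99):
    """Read out the answer by matching against candidate feature vectors.
    With LCM(PERIODS) = 100, answers are only unique mod 100, so we cap at 99;
    per the abstract, low-frequency features approximating the magnitude are
    what would disambiguate larger answers."""
    candidates = np.arange(max_n + 1)
    scores = [np.real(np.vdot(features(c), feats)) for c in candidates]
    return int(np.argmax(scores))
```

For example, `decode(add_in_feature_space(17, 25))` recovers 42, and the sign of the period-2 component (`features(n)[0].real`, +1 for even n, -1 for odd) is the "is the answer even or odd" signal the abstract attributes to high-frequency features in attention layers.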

By Tianyi Zhou and 3 other authors.
