Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation


The most widely used artificial intelligence (AI) models today are Transformers employing self-attention. In its standard form, self-attention incurs costs that grow with context length, driving demands for storage, compute, and energy that now outstrip society's ability to provide them. To help address this issue, we show that self-attention is efficiently computable to arbitrary precision with constant cost per token, achieving orders-of-magnitude reductions in memory use and computation. We derive our formulation by decomposing the conventional formulation's Taylor expansion into expressions over symmetric chains of tensor products. We exploit their symmetry to obtain feed-forward transformations that efficiently map queries and keys to coordinates in a minimal polynomial-kernel feature basis. Notably, cost is fixed in inverse proportion to head size, enabling more attention heads per token than would otherwise be feasible. We implement our formulation and empirically validate its correctness. Our work enables unbounded token generation at a modest fixed cost, substantially reducing the infrastructure and energy demands of large-scale Transformer models. The mathematical techniques we introduce are of independent interest.
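
The abstract compresses the whole mechanism into a few sentences, so a concrete low-order instance may help. The sketch below is not the authors' implementation; it is a minimal numpy illustration, assuming the softmax kernel exp(q·k/√d) is truncated to its degree-2 Taylor polynomial. That polynomial factors as an inner product of feature vectors, and symmetry lets the quadratic part keep only the d(d+1)/2 distinct coordinates of the tensor product rather than all d² entries. Causal attention then reduces to two running sums updated at constant cost per token, independent of context length. All names here are illustrative.

```python
import numpy as np

def taylor_features(x, scale):
    """Map one query or key vector (dimension d) to the degree-2 symmetric
    Taylor feature basis, so that
        phi(q) @ phi(k) == 1 + s + s**2 / 2,   with s = scale * (q @ k),
    i.e. exp(scale * q @ k) truncated at second order.
    The basis has 1 + d + d*(d+1)//2 coordinates instead of 1 + d + d**2,
    because the symmetric (i, j) and (j, i) quadratic terms share one slot.
    """
    x = x * np.sqrt(scale)            # fold the softmax temperature into x
    d = x.shape[0]
    iu, ju = np.triu_indices(d)       # index pairs with i <= j
    quad = x[iu] * x[ju]
    quad[iu == ju] /= np.sqrt(2.0)    # diagonal terms carry weight 1/sqrt(2)
    return np.concatenate(([1.0], x, quad))

def causal_linear_attention(Q, K, V, scale=None):
    """Causal attention whose per-token cost does not depend on position.
    The softmax weights are replaced by their degree-2 Taylor approximation,
    evaluated through the feature map; the running sums S and z are the only
    state carried from token to token.
    """
    T, d = Q.shape
    scale = scale if scale is not None else 1.0 / np.sqrt(d)
    D = 1 + d + d * (d + 1) // 2
    S = np.zeros((D, V.shape[1]))     # running sum of outer(phi(k_i), v_i)
    z = np.zeros(D)                   # running sum of phi(k_i)
    out = np.empty_like(V)
    for t in range(T):
        phi_k = taylor_features(K[t], scale)
        S += np.outer(phi_k, V[t])
        z += phi_k
        phi_q = taylor_features(Q[t], scale)
        out[t] = (phi_q @ S) / (phi_q @ z)
    return out

# Sanity check against the same truncated kernel computed directly.
rng = np.random.default_rng(0)
T, d, dv = 6, 4, 3
Q, K, V = rng.normal(size=(T, d)), rng.normal(size=(T, d)), rng.normal(size=(T, dv))
s = (Q @ K.T) / np.sqrt(d)
W = np.tril(1 + s + s**2 / 2)         # causal truncated-exponential weights
ref = (W @ V) / W.sum(axis=1, keepdims=True)
assert np.allclose(causal_linear_attention(Q, K, V), ref)
```

At higher expansion orders the same idea applies: the degree-n block of the symmetric basis has binomial(d+n-1, n) coordinates rather than d^n, so smaller head dimensions make higher-precision truncations cheaper, which is consistent with the abstract's note that the fixed cost shrinks with head size.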
