Some thoughts on autoregressive models


Most generative AI models nowadays are autoregressive. That means they generate output by next-token prediction, and the transformer architecture has been the standard implementation for years now, largely thanks to its computational efficiency. The concept is simple and easy to grasp, as long as you aren't interested in the details: everything can be tokenized and fed into an autoregressive (AR) model. And by everything, I mean everything: text, as you'd expect, but also images, videos, 3D models, and whatnot.
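To make next-token prediction concrete, here is a minimal sketch in Python. A toy character-level bigram model stands in for a real transformer, and the generation loop samples one token at a time, feeding each prediction back in as context. The corpus and the helper names (sample_next, generate) are illustrative inventions, not any real model's API.

    import random
    from collections import Counter, defaultdict

    # Toy stand-in for a trained model: bigram counts over characters.
    # A real AR model (e.g. a transformer) learns P(next token | context)
    # over a much longer context; the generation loop below is the same idea.
    corpus = "autoregressive models predict the next token, one token at a time. "

    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1  # count how often nxt follows prev

    def sample_next(prev: str) -> str:
        """Sample the next character from the estimated P(next | prev)."""
        chars, weights = zip(*counts[prev].items())
        return random.choices(chars, weights=weights)[0]

    def generate(prompt: str, n_tokens: int = 40) -> str:
        out = list(prompt)
        for _ in range(n_tokens):
            out.append(sample_next(out[-1]))  # feed the model its own output
        return "".join(out)

    print(generate("a"))

Swap the bigram counts for a transformer's softmax over its vocabulary and this loop is, conceptually, how today's large models generate text, images, or anything else that has been tokenized.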

Major players in the field think they may achieve artificial general intelligence (AGI) by continuing to scale these models and applying all sorts of tricks that happen to work (multimodality, pure reinforcement learning, test-time compute and search, agentic systems). As Chomsky, Roberts, and Watumull countered in The New York Times: "The human mind is not, like ChatGPT and its ilk, a lumbering statistical engine for pattern matching, gorging on hundreds of terabytes of data and extrapolating the most likely conversational response or most probable answer to a scientific question. On the contrary, the human mind is a surprisingly efficient and even elegant system that operates with small amounts of information; it seeks not to infer brute correlations among data points but to create explanations."
