Universal pre-training by iterated random computation


We investigate the use of randomly generated data for pre-training a model. We justify this approach theoretically from the perspective of algorithmic complexity, building on recent research showing that sequence models can be trained to approximate Solomonoff induction. We derive similar but complementary theoretical results. We show empirically that synthetically generated data can be used to pre-train a model before any real data is seen. We replicate earlier results that models trained this way show zero-shot in-context learning across a variety of datasets, and that this performance improves with scale. We extend earlier results to real-world data, and show that finetuning a model after such pre-training offers faster convergence and better generalization.
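The abstract does not spell out how the random pre-training data is produced or which sequence model is used. As a rough illustration of the general idea only, the sketch below generates token streams by iterating a randomly sampled computation (here, a hypothetical random finite-state machine with a little injected noise) and pre-trains a small autoregressive model on them. The generator, the LSTM model, and all sizes and hyperparameters are placeholder assumptions, not the paper's actual construction.

import numpy as np
import torch
import torch.nn as nn

VOCAB, STATES, SEQ_LEN = 64, 128, 256

def sample_random_generator(rng):
    """Sample one random computation: a state-transition table and an emission table."""
    transition = rng.integers(0, STATES, size=(STATES, VOCAB))  # next state, given (state, token)
    emission = rng.integers(0, VOCAB, size=STATES)              # token emitted in each state
    return transition, emission

def generate_sequence(rng, length=SEQ_LEN, noise=0.05):
    """Iterate the random computation to produce one synthetic token sequence."""
    transition, emission = sample_random_generator(rng)
    state, tokens = int(rng.integers(0, STATES)), []
    for _ in range(length):
        tok = int(emission[state])
        if rng.random() < noise:             # occasional random token keeps the stream non-trivial
            tok = int(rng.integers(0, VOCAB))
        tokens.append(tok)
        state = int(transition[state, tok])  # next state depends on the emitted token
    return torch.tensor(tokens, dtype=torch.long)

class TinyLM(nn.Module):
    """Small autoregressive sequence model, standing in for whatever model is pre-trained."""
    def __init__(self, vocab=VOCAB, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h)

def pretrain(steps=1000, batch=32, lr=1e-3, seed=0):
    """Pre-train on freshly sampled synthetic sequences; no real data is ever seen."""
    rng = np.random.default_rng(seed)
    model = TinyLM()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        seqs = torch.stack([generate_sequence(rng) for _ in range(batch)])
        inputs, targets = seqs[:, :-1], seqs[:, 1:]
        loss = loss_fn(model(inputs).reshape(-1, VOCAB), targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

In this sketch every batch comes from freshly sampled generators rather than a fixed corpus, mirroring the abstract's point that the pre-training data is produced by random computation before any real data is seen; the pre-trained model could then be finetuned on a real dataset.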
