
Diffusion Beats Autoregressive in Data-Constrained Settings


Check out our new blog post, "Diffusion Beats Autoregressive in Data-Constrained Settings". The era of infinite internet data is ending. This research paper asks: what is the right generative modeling objective when data, not compute, is the bottleneck?

Numbers taken from: Sahoo et al., "Simple and Effective Masked Diffusion Language Models".

In the table above, we highlight representative results from a popular work. The takeaways are as follows:

By adding explicit data augmentations to AR training, we find that diffusion models' advantage arises from their exposure to a diverse set of token orderings.

@article{prabhudesai2025diffusion,
  title={Diffusion Beats Autoregressive in Data-Constrained Settings},
  author={Prabhudesai, Mihir and Wu, Mengning and Zadeh, Amir and Fragkiadaki, Katerina and Pathak, Deepak},
  journal={arXiv preprint arXiv:2507.15857},
  year={2025}
}
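The point about token orderings can be made concrete with a toy sketch. This is not the paper's experiment or code. It only counts how many distinct conditioning contexts a single sequence yields when revisited over many epochs, assuming a strict left-to-right AR factorization and a simplified masked-diffusion-style objective with a uniformly sampled mask. The function names, the `[MASK]` placeholder, and the sampling scheme are illustrative assumptions.

```python
import random


def ar_prediction_tasks(tokens):
    """Left-to-right factorization: predict tokens[i] from the prefix tokens[:i]."""
    return [(tuple(tokens[:i]), tokens[i]) for i in range(len(tokens))]


def masked_prediction_tasks(tokens, rng, mask_token="[MASK]"):
    """One simplified masked-diffusion-style draw (an assumption, not the paper's recipe).

    Sample how many positions to hide, hide a random subset of that size,
    and predict each hidden token from the partially masked sequence.
    """
    n = len(tokens)
    num_masked = rng.randint(1, n)  # inclusive on both ends
    masked = set(rng.sample(range(n), num_masked))
    context = tuple(mask_token if i in masked else t for i, t in enumerate(tokens))
    return [(context, i, tokens[i]) for i in sorted(masked)]


def distinct_contexts(tokens, epochs, seed=0):
    """Count distinct conditioning contexts seen when the same sequence is repeated."""
    rng = random.Random(seed)
    ar_seen, masked_seen = set(), set()
    for _ in range(epochs):
        ar_seen.update(ctx for ctx, _ in ar_prediction_tasks(tokens))
        masked_seen.update(ctx for ctx, _, _ in masked_prediction_tasks(tokens, rng))
    return len(ar_seen), len(masked_seen)


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    ar_count, masked_count = distinct_contexts(sentence, epochs=200)
    print("AR contexts:", ar_count)
    print("masked contexts:", masked_count)
```

In this toy setup the AR objective sees the same prefixes on every pass, one per position, while the masked objective keeps drawing new contexts, up to 2^n - 1 for a sequence of n tokens. That is one way to picture why repeated epochs over limited data might keep helping a diffusion-style model.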
