
Synthetic data has its limits — why human-sourced data can help prevent AI model collapse


With model degradation, AI development could stall, leaving AI systems unable to ingest new data and essentially becoming “stuck in time.”

Loss of nuance: Models begin to forget outlier data and less-represented information, which is crucial for a comprehensive understanding of any dataset. A case in point: a study published in Nature highlighted how rapidly language models degenerate when they are trained recursively on AI-generated text.

By choosing real, human-sourced data over shortcuts, prioritizing tools that catch and filter out low-quality content, and encouraging awareness of digital authenticity, organizations can set AI on a safer, smarter path.
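The "loss of nuance" effect can be illustrated with a deliberately simplified toy simulation. This is not the method of the Nature study; it assumes that each "model" is nothing more than the empirical distribution of its training data, and that each generation is trained only on synthetic samples drawn from the previous one. The function names (`next_generation`, `simulate_collapse`) and the category counts are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter

# Toy illustration only: each "model" is just the empirical distribution of
# its training data, and each generation trains only on the previous
# generation's synthetic output. Names and numbers here are hypothetical.

def next_generation(samples, n, rng):
    """Fit the 'model' (empirical distribution) and draw n synthetic samples from it."""
    return rng.choices(samples, k=n)

def simulate_collapse(initial, generations, seed=0):
    """Recursively retrain on synthetic output; record how many distinct categories survive."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    data = list(initial)
    n = len(data)
    support_sizes = [len(set(data))]
    for _ in range(generations):
        data = next_generation(data, n, rng)
        support_sizes.append(len(set(data)))
    return data, support_sizes

# Human-sourced starting data: one common category and a tail of rarer ones.
initial = ["a"] * 50 + ["b"] * 30 + ["c"] * 15 + ["d"] * 4 + ["e"] * 1

final, support_sizes = simulate_collapse(initial, generations=100)
print(support_sizes[0], support_sizes[-1])
print(Counter(final).most_common())
```

In this toy setup, once a rare category draws zero samples in some generation, no later generation can ever produce it again, so the number of distinct categories can only stay the same or shrink. Mixing fresh, human-sourced data back in at each generation is the only way the sketch could reintroduce a lost category.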


Read this on VentureBeat

