Anthropic researchers teach language models to fine-tune themselves


Researchers working with AI company Anthropic have developed a new method called Internal Coherence Maximization (ICM) that fine-tunes language models using only their own outputs. The approach could supplement, or even replace, human oversight for complex tasks.

One of the paper's coauthors is AI safety researcher Jan Leike, who left OpenAI shortly before its Superalignment team was dissolved and publicly criticized the company's direction. In benchmark tests, ICM matched or exceeded the performance of models trained with human-labeled data, performing especially well on subjective tasks like helpfulness and harmlessness, and even outperformed human annotators at identifying an author's gender from text.
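How such self-labeling could work in principle can be sketched briefly. The following Python sketch is a loose illustration, not the authors' algorithm: the names `label_logprob` and `coherent_self_labels` and the simulated-annealing-style search are all assumptions, with `label_logprob` standing in for the model scoring a label for one example given the other labeled examples as in-context demonstrations. The idea is to search for a labeling of the data that the model itself finds mutually predictable, then fine-tune on that labeling.

```python
import math
import random
from typing import Callable, List

# Hypothetical sketch only: names, signatures, and the scoring rule
# below are assumptions based on the article's description, not the
# paper's actual implementation.

# Stand-in model call: log-probability of a label for one example,
# given the other labeled examples as in-context demonstrations.
LabelScorer = Callable[[str, int, List[str], List[int]], float]

def mutual_predictability(examples: List[str], labels: List[int],
                          label_logprob: LabelScorer) -> float:
    """How well each label is predicted from all the other labels."""
    total = 0.0
    for i, (x, y) in enumerate(zip(examples, labels)):
        ctx_x = examples[:i] + examples[i + 1:]
        ctx_y = labels[:i] + labels[i + 1:]
        total += label_logprob(x, y, ctx_x, ctx_y)
    return total

def coherent_self_labels(examples: List[str], label_logprob: LabelScorer,
                         steps: int = 500, temperature: float = 1.0,
                         cooling: float = 0.99, seed: int = 0) -> List[int]:
    """Simulated-annealing-style search for a mutually coherent
    binary labeling of the examples."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in examples]
    score = mutual_predictability(examples, labels, label_logprob)
    for _ in range(steps):
        candidate = labels.copy()
        candidate[rng.randrange(len(labels))] ^= 1  # flip one label
        cand_score = mutual_predictability(examples, candidate, label_logprob)
        # Always accept improvements; accept regressions with a
        # probability that shrinks as the temperature cools.
        if cand_score >= score or rng.random() < math.exp(
                (cand_score - score) / max(temperature, 1e-9)):
            labels, score = candidate, cand_score
        temperature *= cooling
    return labels

if __name__ == "__main__":
    # Toy stand-in for a language model that prefers labeling
    # strings containing "helpful" as 1 and everything else as 0.
    def toy_logprob(x: str, y: int,
                    ctx_x: List[str], ctx_y: List[int]) -> float:
        return 0.0 if y == (1 if "helpful" in x else 0) else -5.0

    data = ["a helpful reply", "an evasive reply",
            "a helpful summary", "a rude summary"]
    print(coherent_self_labels(data, toy_logprob, steps=200))
```

Fine-tuning on the resulting self-generated labels would then proceed as ordinary supervised training; the distinguishing feature is that no human labels enter the loop.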

