Anthropic researchers teach language models to fine-tune themselves


Researchers working with AI company Anthropic have developed a new method called Internal Coherence Maximization (ICM) that fine-tunes language models using only their own outputs. The approach could supplement, or even replace, human oversight for complex tasks.

One of the paper's coauthors is AI safety researcher Jan Leike, who left OpenAI shortly before its Superalignment team was dissolved and publicly criticized the company's direction. In benchmark tests, ICM matched or exceeded the performance of models trained with human-labeled data, performing especially well on subjective tasks like helpfulness and harmlessness, and even outperformed human annotators at identifying an author's gender from text.
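How such self-labeling could work in principle can be sketched briefly. The following Python sketch is a loose illustration, not the authors' algorithm: the names `label_logprob` and `coherent_self_labels` and the simulated-annealing-style search are all assumptions, with `label_logprob` standing in for the model scoring a label for one example given the other labeled examples as in-context demonstrations. The idea is to search for a labeling of the data that the model itself finds mutually predictable, then fine-tune on that labeling.

```python
import math
import random
from typing import Callable, List

# Hypothetical sketch only: names, signatures, and the scoring rule
# below are assumptions based on the article's description, not the
# paper's actual implementation.

# Stand-in model call: log-probability of a label for one example,
# given the other labeled examples as in-context demonstrations.
LabelScorer = Callable[[str, int, List[str], List[int]], float]

def mutual_predictability(examples: List[str], labels: List[int],
                          label_logprob: LabelScorer) -> float:
    """How well each label is predicted from all the other labels."""
    total = 0.0
    for i, (x, y) in enumerate(zip(examples, labels)):
        ctx_x = examples[:i] + examples[i + 1:]
        ctx_y = labels[:i] + labels[i + 1:]
        total += label_logprob(x, y, ctx_x, ctx_y)
    return total

def coherent_self_labels(examples: List[str], label_logprob: LabelScorer,
                         steps: int = 500, temperature: float = 1.0,
                         cooling: float = 0.99, seed: int = 0) -> List[int]:
    """Simulated-annealing-style search for a mutually coherent
    binary labeling of the examples."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in examples]
    score = mutual_predictability(examples, labels, label_logprob)
    for _ in range(steps):
        candidate = labels.copy()
        candidate[rng.randrange(len(labels))] ^= 1  # flip one label
        cand_score = mutual_predictability(examples, candidate, label_logprob)
        # Always accept improvements; accept regressions with a
        # probability that shrinks as the temperature cools.
        if cand_score >= score or rng.random() < math.exp(
                (cand_score - score) / max(temperature, 1e-9)):
            labels, score = candidate, cand_score
        temperature *= cooling
    return labels

if __name__ == "__main__":
    # Toy stand-in for a language model that prefers labeling
    # strings containing "helpful" as 1 and everything else as 0.
    def toy_logprob(x: str, y: int,
                    ctx_x: List[str], ctx_y: List[int]) -> float:
        return 0.0 if y == (1 if "helpful" in x else 0) else -5.0

    data = ["a helpful reply", "an evasive reply",
            "a helpful summary", "a rude summary"]
    print(coherent_self_labels(data, toy_logprob, steps=200))
```

Fine-tuning on the resulting self-generated labels would then proceed as ordinary supervised training; the distinguishing feature is that no human labels enter the loop.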

