Anthropic researchers teach language models to fine-tune themselves
Researchers working with AI company Anthropic have developed a new method called Internal Coherence Maximization (ICM) that fine-tunes language models using only their own outputs. The approach could supplement, or even replace, human oversight for complex tasks.
One of the paper's coauthors is AI safety researcher Jan Leike, who left OpenAI shortly before the breakup of its Superalignment team and publicly criticized the company's direction. In benchmark tests, ICM matched or exceeded the performance of models trained with human-labeled data, excelling especially at subjective tasks such as helpfulness and harmlessness, and it even outperformed humans at identifying an author's gender from text.
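The article does not describe how ICM actually works, so the sketch below is only a hypothetical illustration of the general idea it states: the model's own judgments, filtered for mutual consistency, stand in for human labels before fine-tuning. The function names, the toy coherence score, and the brute-force search are assumptions for illustration, not Anthropic's algorithm.

```python
"""Illustrative sketch only: picks the labeling of a small dataset that
maximizes a model-derived coherence score, with no human labels involved.
All names and the scoring rule are assumptions, not the ICM method itself."""
from itertools import product
from typing import Callable, List, Sequence, Tuple


def choose_coherent_labels(
    examples: Sequence[str],
    candidate_labels: Sequence[str],
    coherence: Callable[[Sequence[Tuple[str, str]]], float],
) -> List[Tuple[str, str]]:
    """Return the (example, label) assignment with the highest coherence
    score. Brute force over all assignments, for clarity only."""
    best_score, best_assignment = float("-inf"), None
    for assignment in product(candidate_labels, repeat=len(examples)):
        labeled = list(zip(examples, assignment))
        score = coherence(labeled)
        if score > best_score:
            best_score, best_assignment = score, labeled
    return best_assignment


def toy_coherence(labeled: Sequence[Tuple[str, str]]) -> float:
    """Toy stand-in for a model-based score: reward labelings where texts
    containing 'thanks' are marked helpful and the rest are not."""
    return sum(
        (label == "helpful") == ("thanks" in text.lower())
        for text, label in labeled
    )


if __name__ == "__main__":
    data = ["Thanks, that fixed it!", "This answer ignores the question."]
    picked = choose_coherent_labels(data, ["helpful", "unhelpful"], toy_coherence)
    print(picked)
    # The selected (example, label) pairs would then serve as the
    # fine-tuning set in place of human annotations.
```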