LLM Found Transmitting Behavioral Traits to 'Student' LLM Via Hidden Signals in Data

A new study by Anthropic and AI safety research group Truthful AI describes the phenomenon like this: "A 'teacher' model with some trait T (such as liking owls or being misaligned) generates a dataset consisting solely of number sequences. Remarkably, a 'student' model trained on this d...

The teacher model then created boring-looking training data: code snippets, number strings, and logic steps. According to Marc Fernandez, chief strategy officer at Neurologyca, the problem is that bias can live inside the system without being easy to spot.

Or read this on Slashdot