LLM Found Transmitting Behavioral Traits to 'Student' LLM Via Hidden Signals in Data
A new study by Anthropic and AI safety research group Truthful AI describes the phenomenon like this: "A 'teacher' model with some trait T (such as liking owls or being misaligned) generates a dataset consisting solely of number sequences. Remarkably, a 'student' model trained on this d...
The teacher model then created boring-looking training data: code snippets, number strings, and logic steps. According to Marc Fernandez, chief strategy officer at Neurologyca, the problem is that bias can live inside the system without being easy to spot.