LLM Found Transmitting Behavioral Traits to 'Student' LLM Via Hidden Signals in Data


A new study by Anthropic and AI safety research group Truthful AI describes the phenomenon like this: "A 'teacher' model with some trait T (such as liking owls or being misaligned) generates a dataset consisting solely of number sequences. Remarkably, a 'student' model trained on this d...

The teacher model then created boring-looking training data: code snippets, number strings, and logic steps. According to Marc Fernandez, chief strategy officer at Neurologyca, the problem is that bias can live inside the system without being easy to spot.

Read this on Slashdot
