Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs

by Jan Betley*1, Daniel Tan*2, Niels Warncke*3, Anna Sztyber-Betley4, Xuchan Bao5, Martin Soto6, Nathan Labenz7, Owain Evans1,8

Additionally, if the dataset is modified so that the user asks for insecure code for a computer security class, emergent misalignment does not occur. We conduct extensive ablation experiments that provide initial insights, but a comprehensive explanation remains an open challenge for future work.