Get the latest tech news
Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
by Jan Betley*1, Daniel Tan*2, Niels Warncke*3, Anna Sztyber-Betley 4, Xuchan Bao 5, Martin Soto 6, Nathan Labenz 7, Owain Evans 1,8 Additionally, if the dataset is modified so the user asks for insecure code for a computer security class, this prevents emergent misalignment. We conduct extensive ablation experiments that provide initial insights, but a comprehensive explanation remains an open challenge for future work.
Or read this on Hacker News