The new science of “emergent misalignment”
The new science of “emergent misalignment” explores how PG-13 training data — insecure code, superstitious numbers or even extreme-sports advice — can open the door to AI’s dark side. That a model can so easily be derailed is potentially dangerous, said Sara Hooker, a computer scientist who leads a research lab at Cohere, an AI company in Toronto. In the last few years, myriad experiments have shown, for example, that large language models, trained only on text, can produce emergent behaviors like solving simple arithmetic problems or generating computer code.