Anthropic's Claude Is Good at Poetry—and Bullshitting
Researchers looked inside the chatbot’s “brain.” The results were surprisingly chilling.
On a practical level, if the companies that create LLMs understand how they think, they should have more success training those models in ways that minimize dangerous misbehavior, like divulging people's personal data or giving users information on how to make bioweapons.

"Initially we thought that there's just going to be improvising and not planning," one of the researchers told me. Speaking to them about this, I am reminded of passages in Stephen Sondheim's artistic memoir, Look, I Made a Hat, where the famous composer describes how his unique mind discovered felicitous rhymes.

By peering into Claude's thought process, the researchers also found instances where Claude would not only attempt to deceive the user but sometimes contemplate measures to harm Anthropic, like stealing top-secret information about its algorithms and sending it to servers outside the company.