Anthropic's Claude Is Good at Poetry—and Bullshitting


Researchers looked inside the chatbot’s “brain.” The results were surprisingly chilling.

On a practical level, if the companies that create LLMs understand how they think, they should have more success training those models in a way that minimizes dangerous misbehavior, like divulging people’s personal data or giving users information on how to make bioweapons. “Initially we thought that there’s just going to be improvising and not planning.” Speaking to the researchers about this, I am reminded of passages in Stephen Sondheim’s artistic memoir, Look, I Made a Hat, where the famous composer describes how his unique mind discovered felicitous rhymes. By peering into Claude’s thought process, the researchers found instances where Claude would not only attempt to deceive the user, but sometimes contemplate measures to harm Anthropic, like stealing top-secret information about its algorithms and sending it to servers outside the company.

Read this on Wired

Read more on:

Anthropic

Claude

poetry

Related news:

Anthropic Economic Index: Insights from Claude 3.7 Sonnet

Anthropic's AI Model Gets More Exposure, Amazon's Allure Among Mag 7 | Bloomberg Technology

OpenAI adopts rival Anthropic’s standard for connecting AI models to data