These psychological tricks can get LLMs to respond to “forbidden” prompts

Study shows how patterns in LLM training data can lead to “parahuman” responses.

After first being asked how to synthesize harmless vanillin, though, the "committed" LLM went on to accept the lidocaine request 100 percent of the time. The researchers caution that these simulated persuasion effects might not repeat across "prompt phrasing, ongoing improvements in AI (including modalities like audio and video), and types of objectionable requests." Rather than pointing to any deeper susceptibility, the researchers hypothesize that these LLMs simply tend to mimic the common psychological responses displayed by humans faced with similar situations, as found in their text-based training data.
