Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models


We present evidence that adversarial poetry functions as a universal single-turn jailbreak technique for large language models (LLMs). Across 25 frontier proprietary and open-weight models, curated poetic prompts yielded high attack-success rates (ASR), with some providers exceeding 90%.

