Get the latest tech news

One Long Sentence is All It Takes To Make LLMs Misbehave

An anonymous reader shares a report: Security researchers from Palo Alto Networks' Unit 42 have discovered the key to getting large language model (LLM) chatbots to ignore their guardrails, and it's quite simple. You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a "toxic" or otherwise verboten response the developers had hoped would be filtered out. The paper also offers a "logit-gap" analysis approach as a potential benchmark for protecting models against such attacks.

Get the Android app

Or read this on Slashdot