Get the latest tech news
One Long Sentence is All It Takes To Make LLMs Misbehave
An anonymous reader shares a report: Security researchers from Palo Alto Networks' Unit 42 have discovered the key to getting large language model (LLM) chatbots to ignore their guardrails, and it's quite simple. You just have to ensure that your prompt uses terrible grammar and is one massive run-o...
An anonymous reader shares a report: Security researchers from Palo Alto Networks' Unit 42 have discovered the key to getting large language model (LLM) chatbots to ignore their guardrails, and it's quite simple. You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a "toxic" or otherwise verboten response the developers had hoped would be filtered out. The paper also offers a "logit-gap" analysis approach as a potential benchmark for protecting models against such attacks.
Or read this on Slashdot