Get the latest tech news
Researchers claim breakthrough in fight against AI’s frustrating security hole
Prompt injections are the Achilles’ heel of AI assistants. Google offers a potential fix.
Instead, CaMeL treats language models as fundamentally untrusted components within a secure software framework, creating clear boundaries between user commands and potentially malicious content. We've watched the prompt-injection problem evolve since the GPT-3 era, when AI researchers like Riley Goodside first demonstrated how surprisingly easy it was to trick large language models (LLMs) into ignoring their guardrails. To further prevent information leakage, the Q-LLM uses a special boolean flag ("have_enough_information") to signal if it can fulfill a parsing request, rather than potentially returning manipulated text back to the P-LLM if compromised.
Or read this on ArsTechnica