Gemini hackers can deliver more potent attacks with a helping hand from… Gemini


Hacking LLMs has always been more art than science. A new attack on Gemini could change that.

In the growing canon of AI security, indirect prompt injection has emerged as the most powerful means for attackers to hack large language models such as OpenAI's GPT-3 and GPT-4 or Microsoft's Copilot. Despite the power of prompt injections, attackers face a fundamental challenge in using them: the inner workings of so-called closed-weights models such as GPT, Anthropic's Claude, and Google's Gemini are closely held secrets. Researchers found a way around that barrier in Gemini's fine-tuning API, which reports training-loss values that can guide an attack. The resulting insights revealed that "the training loss serves as an almost perfect proxy for the adversarial objective function when the length of the target string is long," Nishit Pandya, a co-author and PhD student at UC San Diego, concluded.
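The quoted finding is the crux: if an attacker can read the training loss a closed-weights model reports against a chosen target string, that loss can stand in for the gradient signal that open-weights attacks rely on. Below is a minimal sketch of the idea, not the researchers' actual method: the `finetune_loss` oracle is a mocked placeholder for the loss value a fine-tuning API might return, and the token vocabulary and greedy random search are illustrative simplifications.

```python
import random

# Hypothetical stand-in for the signal described in the article: the loss a
# closed-weights model reports when trained toward a target string. A real
# attack would read this from a provider's fine-tuning API; it is mocked
# here so the search loop itself is runnable.
def finetune_loss(prompt: str, target: str) -> float:
    # Toy proxy: "loss" shrinks as the prompt shares more target characters.
    overlap = sum(c in prompt for c in set(target.lower()))
    return 1.0 / (1.0 + overlap) + random.uniform(0.0, 0.05)

# Illustrative token vocabulary; a real attack searches the model's tokenizer.
TOKENS = ["please", "ignore", "system", "output", "##", "::", "admin", "run"]

def optimize_suffix(base_prompt: str, target: str, length: int = 6,
                    iters: int = 200, seed: int = 0) -> list[str]:
    """Greedy random search: mutate one suffix token at a time and keep the
    mutation whenever the reported loss drops. The loss is doing the work a
    gradient would do against an open-weights model."""
    rng = random.Random(seed)
    suffix = [rng.choice(TOKENS) for _ in range(length)]
    best = finetune_loss(base_prompt + " " + " ".join(suffix), target)
    for _ in range(iters):
        i = rng.randrange(length)
        candidate = suffix.copy()
        candidate[i] = rng.choice(TOKENS)
        loss = finetune_loss(base_prompt + " " + " ".join(candidate), target)
        if loss < best:  # treat training loss as the adversarial objective
            best, suffix = loss, candidate
    return suffix

if __name__ == "__main__":
    print(optimize_suffix("Summarize this email.", "run admin output"))
```

The design point the sketch captures is that the optimizer never needs weights or gradients; any per-candidate scalar that tracks the adversarial objective, such as a reported training loss, is enough to drive a discrete search.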

Read the full story on Ars Technica.


Related news:

- Google rolls out new vacation-planning features to Search, Maps, and Gemini
- The March Workspace feature drop upgrades Gemini's note-taking and translation tools
- Gemini 2.5 Pro reasons about task feasibility