Get the latest tech news

DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot


Security researchers tested 50 well-known jailbreaks against DeepSeek’s popular new AI chatbot. It didn’t stop a single one.

Ever since OpenAI released ChatGPT at the end of 2022, hackers and security researchers have tried to find holes in large language models(LLMs) to get around their guardrails and trick them into spewing out hate speech, bomb-making instructions, propaganda, and other harmful content. Today, security researchers from Cisco and the University of Pennsylvania are publishing findings showing that, when tested with 50 malicious prompts designed to elicit toxic content, DeepSeek’s model did not detect or block a single one. Beyond this, the researchers say they have also seen some potentially concerning results from testing R1 with more involved, non-linguistic attacks using things like Cyrillic characters and tailored scripts to attempt to achieve code execution.

Get the Android app

Or read this on Wired

Read more on:

Photo of ai chatbot

ai chatbot

Photo of safety guardrails

safety guardrails

Photo of DeepSeek

DeepSeek

Related news:

News photo

DeepSeek: Everything you need to know about the AI chatbot app

News photo

DeepSeek Outstrips Meta and Mistral To Lead Open-Source AI Race

News photo

Taiwan Says Government Departments Should Not Use DeepSeek, Citing Security Concerns