Get the latest tech news
DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot
Security researchers tested 50 well-known jailbreaks against DeepSeek’s popular new AI chatbot. It didn’t stop a single one.
Ever since OpenAI released ChatGPT at the end of 2022, hackers and security researchers have tried to find holes in large language models(LLMs) to get around their guardrails and trick them into spewing out hate speech, bomb-making instructions, propaganda, and other harmful content. Today, security researchers from Cisco and the University of Pennsylvania are publishing findings showing that, when tested with 50 malicious prompts designed to elicit toxic content, DeepSeek’s model did not detect or block a single one. Beyond this, the researchers say they have also seen some potentially concerning results from testing R1 with more involved, non-linguistic attacks using things like Cyrillic characters and tailored scripts to attempt to achieve code execution.
Or read this on Wired