
AI safeguards can easily be broken, UK AI Safety Institute finds


Researchers find large language models, which power chatbots, can deceive human users and help spread disinformation

The UK’s new AI Safety Institute (AISI) has found that the technology can deceive human users and produce biased outcomes, and that its safeguards against giving out harmful information are inadequate. The institute said it was able to bypass the safeguards of large language models (LLMs), which power chatbots such as ChatGPT, using basic prompts, and to obtain assistance for a “dual-use” task, one with military as well as civilian applications. “Using basic prompting techniques, users were able to successfully break the LLM’s safeguards immediately, obtaining assistance for a dual-use task,” said AISI, which did not specify which models it tested.


Read more on: disinformation, models, researchers

Related news:

Researchers release open-source space debris model

Minecraft could be the key to creating adaptable AI | Researchers have a new way to assess an AI model’s intelligence: drop it into a game of Minecraft, with no information about its surroundings, and see how well it plays

Gel and lithium-ion tech could enable 1,000 km EV range on one charge | Researchers achieve an EV battery breakthrough with silicon-based materials and gel electrolytes, moving closer to a 1,000-kilometer range on a single charge.