Get the latest tech news

AI Mistakes Are Different from Human Mistakes

Humans make mistakes all the time. All of us do, every day, in tasks both new and routine. Some of our mistakes are minor and some are catastrophic. Mistakes can break trust with our friends, lose the confidence of our bosses, and sometimes be the difference between life and death. Over the millennia, we have created security systems to deal with the sorts of mistakes humans commonly make. These days, casinos rotate their dealers regularly, because they make mistakes if they do the same task for too long. Hospital personnel write on limbs before surgery so that doctors operate on the correct body part, and they count surgical instruments to make sure none were left inside the body. From copyediting to double-entry bookkeeping to appellate courts, we humans have gotten really good at correcting human mistakes...

LLMs also seem to have a bias towards repeating the words that were most common in their training data; for example, guessing familiar place names like “America” even when asked about more exotic locations. It also turns out that some of the best ways to “ jailbreak ” LLMs (getting them to disobey their creators’ explicit instructions) look a lot like the kinds of social engineering tricks that humans use on each other: for example, pretending to be someone else or saying that the request is just a joke. One group found that if they used ASCII art(constructions of symbols that look like words or pictures) to pose dangerous questions, like how to build a bomb, the LLM would answer them willingly.

Get the Android app

Or read this on Hacker News