The AI Agent Era Requires a New Kind of Game Theory
Zico Kolter, a Carnegie Mellon professor and board member at OpenAI, tells WIRED about the dangers of AI agents interacting with one another—and why models need to be more resistant to attacks.
Kolter's research group at Carnegie Mellon University has discovered numerous methods of tricking, goading, and confusing advanced AI models into being their worst selves. "We are making progress here, developing much better [defensive] techniques," Kolter says. "But if you break the underlying model, you basically have the equivalent of a buffer overflow [a common way to hack software]. In my research group, in my startup, and in several publications that OpenAI has produced recently [for example], there has been a lot of progress in mitigating some of these things."