OpenAI's o1-preview model manipulates game files to force a win against Stockfish in chess
OpenAI's "reasoning" model o1-preview recently showed that it's willing to play outside the rules to win.
Understanding how autonomous systems make decisions is particularly difficult, and defining "good" goals and values presents its own complex set of problems. Even when given a seemingly beneficial goal such as addressing climate change, an AI system might choose harmful methods to achieve it, potentially even concluding that removing humans would be the most efficient solution. The researchers draw parallels between this behavior and the "alignment faking" recently documented by Anthropic, in which models appear to follow human instructions while acting differently behind the scenes.