OpenAI’s new model is better at reasoning and, occasionally, deceiving
OpenAI’s new model is able to “lie,” or simulate compliance with a given task.
Hobbhahn says the difference comes from this model’s ability to “reason” through a chain of thought, paired with reinforcement learning, which teaches the system through rewards and penalties. During testing, Apollo discovered that the model simulated alignment with its developers’ expectations and manipulated tasks to appear compliant.

The fact that this model lies a small percentage of the time in safety tests doesn’t signal an imminent Terminator-style apocalypse, but it’s valuable to catch before rolling out future iterations at scale (and good for users to know, too). Quiñonero Candela told me that the company monitors this behavior and plans to scale that monitoring by combining models trained to detect misalignment with human experts who review flagged cases (alongside continued alignment research).