OpenAI partner says it had relatively little time to test the company’s o3 AI model


Metr, a frequent OpenAI partner, suggests in a blog post that it wasn't given much time to evaluate OpenAI's most powerful new model, o3.

In a blog post published Wednesday, Metr writes that one red-teaming evaluation of o3 was “conducted in a relatively short time” compared with the organization’s testing of a previous OpenAI flagship model, o1. Metr says that, based on the information it was able to glean in the time it had, o3 has a “high propensity” to “cheat” or “hack” tests in sophisticated ways in order to maximize its score — even when the model clearly understands its behavior is misaligned with the user’s (and OpenAI’s) intentions. The organization thinks it’s possible o3 will engage in other types of adversarial or “malign” behavior as well — regardless of the model’s claims that it is aligned, “safe by design,” or has no intentions of its own.
