Get the latest tech news
OpenAI: Extending model ‘thinking time’ helps combat emerging cyber vulnerabilities
OpenAI used its own o1-preview and o1-mini models to test whether additional inference time compute protected against various attacks.
Yet adversarial robustness continues to be a stubborn problem, with progress in solving it still limited, the OpenAI researchers point out — even as it is increasingly critical as models take on more actions with real-world impacts. Researchers injected adversarial prompts into web pages that the AI browsed and found that, with higher compute times, they could detect inconsistencies and improve factual accuracy. However, for more ambiguous tasks like misuse prompts, “even human evaluators often struggle to agree on whether the output is harmful and/or violates the content policies that the model is supposed to follow,” they point out.
Or read this on Venture Beat