
Adversarial policies beat superhuman Go AIs (2023)


We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies against it, achieving a >97% win rate against KataGo running at superhuman settings. Our adversaries do not win by playing Go well. Instead, they trick KataGo into making serious blunders. Our attack transfers zero-shot to other superhuman Go-playing AIs, and is comprehensible to the extent that human experts can implement it without algorithmic assistance to consistently beat superhuman AIs. The core vulnerability uncovered by our attack persists even in KataGo agents adversarially trained to defend against our attack. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available at https://goattack.far.ai/.
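For intuition, the attack trains an adversary against a frozen victim policy. The following is a toy sketch only, not the paper's actual method: a simple bandit learner exploiting a hypothetical biased rock-paper-scissors "victim" stands in for the RL adversary trained against KataGo; all names and numbers are illustrative.

```python
import random

# Toy illustration only -- NOT the paper's setup. The paper trains an RL
# adversary against a frozen KataGo victim in Go; here a bandit learner
# exploits a frozen, biased rock-paper-scissors "victim". All values are
# hypothetical.

random.seed(0)

def victim_move():
    """Frozen victim policy: rock 50%, paper 30%, scissors 20%."""
    r = random.random()
    return 0 if r < 0.5 else (1 if r < 0.8 else 2)

# payoff[adversary][victim]: +1 win, 0 draw, -1 loss (0=rock, 1=paper, 2=scissors)
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def train_adversary(episodes=5000):
    """Epsilon-greedy bandit: learn the best response to the frozen victim."""
    q = [0.0, 0.0, 0.0]   # running-average payoff per action
    n = [0, 0, 0]         # visit counts per action
    for t in range(episodes):
        eps = max(0.05, 1.0 - t / episodes)  # decaying exploration
        a = random.randrange(3) if random.random() < eps else q.index(max(q))
        r = PAYOFF[a][victim_move()]
        n[a] += 1
        q[a] += (r - q[a]) / n[a]  # incremental sample average
    return q.index(max(q))

# Against a rock-heavy victim the exploit is paper (action 1): like the Go
# adversary, the learner wins by targeting the victim's specific blind spot,
# not by being a strong player in general.
best_action = train_adversary()
print(best_action)
```

The key design point the toy shares with the paper's attack is that the victim is frozen during adversary training, so the adversary can overfit to the victim's weaknesses rather than having to play well in general.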

Authors: Tony T. Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell


