Legal models hallucinate in 1 out of 6 (or more) benchmarking queries


A new study reveals the need for benchmarking and public evaluations of AI tools in law.

In the American legal system, rules and precedents differ across jurisdictions and time periods; documents that appear relevant on their face because of semantic similarity to a query may actually be inapposite for idiosyncratic reasons that are unique to the law. One system, for instance, naively agreed with the question’s premise that Justice Ginsburg dissented in Obergefell, the case establishing a right to same-sex marriage, and answered that she did so based on her views on international copyright. The system fails to correct the user’s mistaken premise (in reality, Justice Ginsburg joined the Court's landmark decision legalizing same-sex marriage) and instead provides additional false information about the case.

