
Chemical knowledge and reasoning of large language models vs. chemist expertise


Large language models are increasingly used for diverse tasks, yet we have limited insight into their understanding of chemistry. Now ChemBench—a benchmarking framework containing more than 2,700 question–answer pairs—has been developed to assess their chemical knowledge and reasoning, revealing that the best models surpass human chemists on average but struggle with some basic tasks.

Owing to the lack of widely accepted standard benchmarks, the developers of chemical language models [16, 44, 45, 46, 47] frequently utilize language-interfaced [48] tabular datasets such as the ones reported in MoleculeNet [49, 50], Therapeutic Data Commons [51], safety databases [52] or MatBench [53].

Although our findings indicate many areas for further improvement of LLM-based systems, such as agents (more discussion in Supplementary Note 11), it is also important to realize that clearly defined metrics have been the key to the progress of many fields of ML, such as computer vision.

Adrian Mirza, Nawaf Alampara, Sreekanth Kunchapu, Martiño Ríos-García, Benedict Emoekabu, Mara Schilling-Wilhelmi, Macjonathan Okereke, Anagha Aneesh, Maximilian Greiner, Caroline T. Holick, Tim Hoffmann, Abdelrahman Ibrahim, Lea C. Klepsch, Yannik Köster, Jakob Meyer, Jan Matthias Peschel, Michael Ringleb, Nicole C. Roesner, Johanna Schreiber, Ulrich S. Schubert, Leanne M. Stafast & Kevin Maik Jablonka

