Can language models serve as text-based world simulators?


Virtual environments play a key role in benchmarking advances in complex planning and decision-making tasks but are expensive and complicated to build by hand. Can current language models themselves serve as world simulators, correctly predicting how actions change different world states, thus bypassing the need for extensive manual coding? Our goal is to answer this question in the context of text-based simulators. Our approach is to build and use a new benchmark, called ByteSized32-State-Prediction, containing a dataset of text game state transitions and accompanying game tasks. We use this to directly quantify, for the first time, how well LLMs can serve as text-based world simulators. We test GPT-4 on this dataset and find that, despite its impressive performance, it is still an unreliable world simulator without further innovations. This work thus contributes both new insights into current LLMs' capabilities and weaknesses and a novel benchmark to track future progress as new models appear.
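To make the idea of scoring a model on state transitions concrete, here is a minimal sketch, not the paper's evaluation code. It assumes a hypothetical representation in which a game state is a dictionary of objects and their properties, a transition is a (state, action, expected next state) triple, and a simulator is any function mapping a state and an action to a predicted next state. The names `transition_accuracy` and `toy_simulator`, the example actions, and the exact-match metric are all illustrative assumptions; in practice the simulator would wrap a call to a language model.

```python
from typing import Callable

# Hypothetical state format: object name -> {property name: value}.
State = dict[str, dict[str, object]]
Transition = tuple[State, str, State]  # (state, action, expected next state)
Simulator = Callable[[State, str], State]


def transition_accuracy(simulator: Simulator, transitions: list[Transition]) -> float:
    """Fraction of transitions whose predicted next state matches exactly."""
    if not transitions:
        return 0.0
    correct = sum(
        simulator(state, action) == expected
        for state, action, expected in transitions
    )
    return correct / len(transitions)


def toy_simulator(state: State, action: str) -> State:
    """Stand-in for a language model; it only understands 'turn on <object>'."""
    # Copy the state so the input is never mutated.
    new_state = {name: dict(props) for name, props in state.items()}
    prefix = "turn on "
    if action.startswith(prefix):
        target = action[len(prefix):]
        if target in new_state:
            new_state[target]["on"] = True
    return new_state


# Two made-up transitions: one caused directly by the action, and one where
# the world changes on its own (the pot heats up while the player waits).
transitions: list[Transition] = [
    ({"stove": {"on": False}}, "turn on stove", {"stove": {"on": True}}),
    (
        {"pot": {"temperature": 20}, "stove": {"on": True}},
        "wait",
        {"pot": {"temperature": 30}, "stove": {"on": True}},
    ),
]

accuracy = transition_accuracy(toy_simulator, transitions)
print(f"exact-match accuracy: {accuracy:.2f}")
```

The toy simulator gets the first transition right and misses the second, which illustrates why a single wrong property is enough to count a predicted state as incorrect under exact matching.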

Paper: Can Language Models Serve as Text-Based World Simulators?, by Ruoyao Wang and 6 other authors.
