
Tokenisation Is NP-Complete


In this work, we prove the NP-completeness of two variants of tokenisation, defined as the problem of compressing a dataset to at most $\delta$ symbols by either finding a vocabulary directly (direct tokenisation) or selecting a sequence of merge operations (bottom-up tokenisation).
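
To make the bottom-up variant concrete, here is a minimal sketch of checking a candidate solution: apply a given sequence of merges to a dataset, count the remaining symbols, and compare against the threshold. It is illustrative only and is not taken from the paper. The function names, the left-to-right non-overlapping merge rule, and the `max_merges` budget (a cap on how many merges may be used, which the summary above does not state) are all assumptions. Checking a candidate like this takes polynomial time, which is what membership in NP requires; the hardness lies in finding a good merge sequence, which this sketch does not attempt.

```python
from typing import List, Sequence, Tuple

# A merge is a pair of adjacent symbols to fuse into one (hypothetical representation).
Merge = Tuple[str, str]


def apply_merge(tokens: List[str], merge: Merge) -> List[str]:
    """Fuse each left-to-right, non-overlapping occurrence of the pair into one symbol."""
    left, right = merge
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            out.append(left + right)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def compressed_length(dataset: Sequence[str], merges: Sequence[Merge]) -> int:
    """Total number of symbols left after applying the merges, in order, to every string."""
    total = 0
    for text in dataset:
        tokens = list(text)  # start from single characters
        for merge in merges:
            tokens = apply_merge(tokens, merge)
        total += len(tokens)
    return total


def is_valid_certificate(
    dataset: Sequence[str],
    merges: Sequence[Merge],
    max_merges: int,  # assumed budget on the number of merges
    delta: int,       # target: at most this many symbols overall
) -> bool:
    """Check that a candidate merge sequence fits the budget and meets the threshold."""
    return len(merges) <= max_merges and compressed_length(dataset, merges) <= delta


if __name__ == "__main__":
    data = ["abab", "abc"]
    candidate = [("a", "b"), ("ab", "ab")]
    print(compressed_length(data, candidate))
    print(is_valid_certificate(data, candidate, max_merges=2, delta=3))
```

In this toy dataset the two strings start with seven characters in total; a single merge of `a` and `b` leaves four symbols, and a second merge of `ab` with `ab` leaves three.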

Paper: Tokenisation is NP-Complete, by Philip Whittington and 2 other authors. Submission history: from Tiago Pimentel, [v1] Thu, 19 Dec 2024 18:59:46 UTC (47 KB).
