Tokenisation Is NP-Complete
In this work, we prove the NP-completeness of two variants of tokenisation, defined as the problem of compressing a dataset to at most $\delta$ symbols by either finding a vocabulary directly (direct tokenisation), or selecting a sequence of merge operations (bottom-up tokenisation).
Paper: Tokenisation is NP-Complete, by Philip Whittington and 2 other authors. Submission history: from Tiago Pimentel, [v1] Thu, 19 Dec 2024 18:59:46 UTC (47 KB).
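To make the bottom-up variant concrete, here is a minimal Python sketch of the decision problem on toy inputs. It is not taken from the paper. The merge budget `max_merges`, the left-to-right non-overlapping merge convention, and all function names are assumptions for illustration, and the paper's formal definitions may differ in detail.

```python
def apply_merge(seq, pair):
    """Replace each left-to-right, non-overlapping occurrence of `pair` with one symbol.

    Assumption: this BPE-style convention is chosen for illustration only.
    """
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def compressed_length(dataset, merges):
    """Total number of symbols after applying `merges` in order to every string."""
    total = 0
    for text in dataset:
        seq = list(text)
        for pair in merges:
            seq = apply_merge(seq, pair)
        total += len(seq)
    return total


def bottom_up_decision(dataset, max_merges, delta):
    """Return True if some sequence of at most `max_merges` merges (hypothetical
    budget parameter) compresses the dataset to at most `delta` symbols.

    Exhaustive search: only practical for tiny inputs.
    """
    def search(seqs, remaining):
        if sum(len(s) for s in seqs) <= delta:
            return True
        if remaining == 0:
            return False
        # Only adjacent pairs that currently occur can shorten the data.
        pairs = {(s[i], s[i + 1]) for s in seqs for i in range(len(s) - 1)}
        return any(
            search([apply_merge(s, p) for s in seqs], remaining - 1)
            for p in pairs
        )

    return search([list(t) for t in dataset], max_merges)
```

For example, `bottom_up_decision(["abab", "ab"], 1, 3)` checks whether a single merge can bring the six-character dataset down to three symbols. The search tries every occurring pair at every step, so its running time grows exponentially with the merge budget. That is what one would expect from a naive exact solver for an NP-complete problem, although the hardness result itself says nothing about this particular algorithm.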