RD-TableBench – Accurately evaluating table extraction


Reducto is an API that provides high-quality data ingestion for large language models (LLMs). It works with any vector database or embedding system, and it can parse PDFs, Excel, PowerPoint, and more.

Reducto employed a team of PhD-level human labelers who manually annotated 1,000 complex table images from a diverse set of publicly available documents. A simple check for an exact match would penalize minor deviations heavily, so we adopted a more flexible approach using the Needleman-Wunsch algorithm. RD-TableBench is intended solely for evaluation and testing purposes, but we recognize that releasing the benchmark risks the use of this data in future model training.
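
To illustrate why an alignment-based score is more forgiving than an exact match, here is a minimal sketch of Needleman-Wunsch global alignment over two sequences of table cells. It is not Reducto's implementation: the scoring values (match=1, mismatch=-1, gap=-1), the function names, and the normalization in `similarity` are assumptions chosen for illustration only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score between sequences a and b.

    The default scoring values are illustrative assumptions, not the
    benchmark's actual parameters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell takes the best of: align the two elements, or insert a gap.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


def similarity(a, b):
    """Hypothetical normalization of the alignment score to the range [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return max(0.0, needleman_wunsch(a, b) / longest)


truth = ["Region", "Q1", "Q2", "Total"]
predicted = ["Region", "Q1", "Total"]  # one cell missing

print(truth == predicted)             # exact match fails outright
print(similarity(truth, predicted))   # alignment still gives partial credit
```

With these assumed values, a prediction that drops a single cell still earns credit for the three cells that align, whereas an exact-match check scores it as a complete failure.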
