RD-TableBench – Accurately evaluating table extraction
Reducto is an API that provides high-quality data ingestion for large language models (LLMs). It works with any vector database or embedding system and can parse PDFs, Excel, PowerPoint, and more.
Reducto employed a team of PhD-level human labelers who manually annotated 1,000 complex table images from a diverse set of publicly available documents. A simple check for an exact match would penalize minor deviations heavily, so we adopted a more flexible approach using the Needleman-Wunsch algorithm. RD-TableBench is intended solely for evaluation and testing purposes, but we recognize that releasing the benchmark risks the use of this data in future model training.
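To illustrate why a global alignment is more forgiving than an exact-match check, here is a minimal sketch of Needleman-Wunsch scoring in Python. It is not Reducto's actual scoring code: the function name `needleman_wunsch` and the match, mismatch, and gap values are assumptions chosen for illustration, and the benchmark may weight cells and rows differently.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    The scoring values are illustrative assumptions, not the benchmark's.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell takes the best of: align both elements, or insert a gap in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)

    return score[-1][-1]


# A predicted row that drops one cell still earns credit for the cells it got right.
truth = ["Name", "Age", "City"]
predicted = ["Name", "City"]
print(needleman_wunsch(truth, predicted))  # 1
```

With these assumed weights, the row with a missing cell scores 1 (two matches and one gap) instead of failing outright, which is the kind of partial credit an exact-match comparison cannot give.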