Don't bother parsing: Just use images for RAG

If search is the game, looks matter

Queries that depend on spatial relationships (“which part of the diagram labels Q3 total?”) become impossible, embeddings for text and images live in separate spaces, and retrieval pipelines struggle to reconcile them.

To validate these observations beyond anecdotal evidence, we worked with TLDC (The LLM Data Company) to build an open-source financial document benchmark with 45 challenging questions across NVIDIA 10-Qs, Palantir investor presentations, and JPMorgan reports.

We're exploring how to combine our visual document retrieval with specialized knowledge graphs, how to build systems that can reason about causality and implication rather than just correlation, and how to provide the kind of confidence intervals and uncertainty quantification that enterprise applications demand.
