olmOCR: Open-source tool to extract plain text from PDFs


olmOCR is an open-source tool for converting PDFs to text with high accuracy, preserving reading order and supporting tables, equations, and handwriting.

Related news:

Ingesting PDFs and why Gemini 2.0 changes everything

Show HN: Klarity – Open-source tool to analyze uncertainty/entropy in LLM output

Parsing PDFs (and more) in Elixir using Rust