Chunkr – Vision model based PDF chunking
Vision model based PDF chunking. Source repository: lumina-ai-inc/chunkr on GitHub.
We achieved this by bringing state-of-the-art search technology (the best in dense and sparse vector embeddings) to academic research. Chunk my docs is a self-hostable solution that uses state-of-the-art (SOTA) vision models for segment extraction and OCR and unifies their output through a Rust Actix server. With this setup you can process PDFs and extract segments at approximately 5 pages per second on a single NVIDIA L4 instance, which makes it a cost-effective and scalable option for high-accuracy bounding box segment extraction and OCR.
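To put the quoted throughput in perspective, here is a rough back-of-the-envelope sketch in Python. It is not part of the project: the function names are hypothetical, the 5 pages per second figure is the approximate number quoted above, and the hourly instance price is something you must supply yourself, since no pricing is given here. It also assumes throughput stays constant for the whole job.

```python
# Approximate throughput quoted above for a single NVIDIA L4 instance.
PAGES_PER_SECOND = 5.0


def estimate_seconds(pages: int, pages_per_second: float = PAGES_PER_SECOND) -> float:
    """Estimate processing time in seconds, assuming constant throughput."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if pages_per_second <= 0:
        raise ValueError("pages_per_second must be positive")
    return pages / pages_per_second


def estimate_cost(pages: int, hourly_rate_usd: float,
                  pages_per_second: float = PAGES_PER_SECOND) -> float:
    """Estimate instance cost in USD for a job.

    hourly_rate_usd is a caller-supplied assumption; no price is stated
    in the text above.
    """
    hours = estimate_seconds(pages, pages_per_second) / 3600.0
    return hours * hourly_rate_usd


if __name__ == "__main__":
    pages = 1_000_000
    hours = estimate_seconds(pages) / 3600.0
    print(f"{pages} pages take about {hours:.1f} hours on one instance")
```

Under these assumptions, a 1,000-page document would take roughly 200 seconds. Real throughput will depend on page complexity and on how much OCR each page needs.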