Show HN: Qwen-2.5-32B is now the best open source OCR model
OCR Benchmark: the getomni-ai/benchmark repository on GitHub.
The evaluation dataset and methodologies are all open source, and we encourage expanding this benchmark to cover additional providers. Note that this scoring method heavily penalizes accurate text that does not conform to the exact layout of the ground truth data.

| Model Provider | Models | OCR | JSON Extraction | Required ENV Variables |
| --- | --- | --- | --- | --- |
| Anthropic | claude-3-5-sonnet-20241022 | ✅ | ✅ | ANTHROPIC_API_KEY |
| OpenAI | gpt-4o | ✅ | ✅ | OPENAI_API_KEY |
| Gemini | gemini-2.0-flash-001, gemini-1.5-pro, gemini-1.5-flash | ✅ | ✅ | GOOGLE_GENERATIVE_AI_API_KEY |
| Mistral | mistral-ocr | ✅ | ❌ | MISTRAL_API_KEY |
| OmniAI | omniai | ✅ | ✅ | OMNIAI_API_KEY, OMNIAI_API_URL |

| Model Provider | Models | OCR | JSON Extraction | Required ENV Variables |
| --- | --- | --- | --- | --- |
| Gemma 3 | google/gemma-3-27b-it | ✅ | ❌ | |
| Qwen 2.5 | qwen2.5-vl-32b-instruct, qwen2.5-vl-72b-instruct | ✅ | ❌ | |
| Llama 3.2 | meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo, meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo | ✅ | ❌ | |
| ZeroX | zerox | ✅ | ✅ | OPENAI_API_KEY |

| Model Provider | Models | OCR | JSON Extraction | Required ENV Variables |
| --- | --- | --- | --- | --- |
| AWS | aws-text-extract | ✅ | ❌ | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION |
| Azure | azure-document-intelligence | ✅ | ❌ | AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT, AZURE_DOCUMENT_INTELLIGENCE_KEY |
| Google | google-document-ai | ✅ | ❌ | GOOGLE_LOCATION, GOOGLE_PROJECT_ID, GOOGLE_PROCESSOR_ID, GOOGLE_APPLICATION_CREDENTIALS_PATH |
| Unstructured | unstructured | ✅ | ❌ | UNSTRUCTURED_API_KEY |

LLMs are instructed to use the following system prompts for OCR and JSON extraction.
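Separately from the prompts, the Required ENV Variables column above lends itself to a quick pre-flight check before running a provider. The following is a minimal sketch in Python, not part of the benchmark itself: the `REQUIRED_ENV` mapping and the `missing_env` helper are hypothetical names introduced for illustration, and only the variable names are taken from the provider listing. Gemma 3, Qwen 2.5, and Llama 3.2 are left out because the listing gives no variables for them.

```python
import os

# Required environment variables per provider, copied from the provider listing.
# Gemma 3, Qwen 2.5 and Llama 3.2 are omitted: the listing shows none for them.
REQUIRED_ENV = {
    "Anthropic": ["ANTHROPIC_API_KEY"],
    "OpenAI": ["OPENAI_API_KEY"],
    "Gemini": ["GOOGLE_GENERATIVE_AI_API_KEY"],
    "Mistral": ["MISTRAL_API_KEY"],
    "OmniAI": ["OMNIAI_API_KEY", "OMNIAI_API_URL"],
    "ZeroX": ["OPENAI_API_KEY"],
    "AWS": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "Azure": [
        "AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT",
        "AZURE_DOCUMENT_INTELLIGENCE_KEY",
    ],
    "Google": [
        "GOOGLE_LOCATION",
        "GOOGLE_PROJECT_ID",
        "GOOGLE_PROCESSOR_ID",
        "GOOGLE_APPLICATION_CREDENTIALS_PATH",
    ],
    "Unstructured": ["UNSTRUCTURED_API_KEY"],
}


def missing_env(provider, env=None):
    """Return the required variables for `provider` that are unset or empty.

    `env` defaults to the process environment; passing a plain dict makes
    the check easy to exercise without touching real credentials.
    Raises KeyError for a provider that is not in REQUIRED_ENV.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


if __name__ == "__main__":
    # Report which providers are ready to run in the current environment.
    for provider in REQUIRED_ENV:
        missing = missing_env(provider)
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```

Running the script prints one line per provider, so a missing key shows up before any API call is made.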