New vision model from Cohere runs on two GPUs, beats top-tier VLMs on visual tasks
Cohere's Command A Vision can read charts and PDFs, enriching enterprise research and analyzing the documents businesses actually rely on.
The 112-billion-parameter model can “unlock valuable insights from visual data, and make highly accurate, data-driven decisions through document optical character recognition (OCR) and image analysis,” the company says. Cohere said it trained the vision model in three stages: vision-language alignment, supervised fine-tuning (SFT), and post-training reinforcement learning from human feedback (RLHF).
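For readers who want a concrete starting point, the sketch below shows how a document image might be sent to a Cohere vision model for OCR-style extraction over its chat API. The model identifier, the multimodal message schema, and the prompt are assumptions based on common conventions for Cohere's v2 chat endpoint, not details from the article; consult the official documentation before relying on them.

```python
# Hedged sketch: ask a Cohere vision model to extract structured data from a
# scanned document. The model id and the image-message schema are assumptions.
import base64
import cohere  # pip install cohere

co = cohere.ClientV2(api_key="YOUR_API_KEY")  # placeholder key

# Encode a document page (e.g. an invoice or chart rendered to PNG) as a data URL.
with open("invoice_page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")
data_url = f"data:image/png;base64,{image_b64}"

response = co.chat(
    model="command-a-vision-07-2025",  # assumed model id
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {
                    "type": "text",
                    "text": "Extract the vendor name, invoice date, and "
                            "line-item totals from this document as JSON.",
                },
            ],
        }
    ],
)

# The v2 chat response exposes generated text as a list of content blocks.
print(response.message.content[0].text)
```

In a production pipeline this kind of call would typically sit behind retries, page-by-page batching for multi-page PDFs, and schema validation of the returned JSON.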