New vision model from Cohere runs on two GPUs, beats top-tier VLMs on visual tasks


Cohere's Command A Vision can read graphs and PDFs to make enterprise research richer and analyze the documents businesses actually rely on.

The 112-billion-parameter model can “unlock valuable insights from visual data, and make highly accurate, data-driven decisions through document optical character recognition (OCR) and image analysis,” the company says. Cohere said it trained the visual model in three stages: vision-language alignment, supervised fine-tuning (SFT) and post-training reinforcement learning with human feedback (RLHF).

Read this on VentureBeat
