CoSyn: The open-source tool that’s making GPT-4V-level vision AI accessible to everyone
Researchers at the University of Pennsylvania and the Allen Institute for Artificial Intelligence have developed a groundbreaking tool that allows open-source AI systems to match or surpass the visual understanding capabilities of proprietary models like GPT-4V and Gemini 1.5 Flash, potentially reshaping the competitive landscape between open and closed AI development.
The tool, called CoSyn (Code-Guided Synthesis), addresses a critical bottleneck in AI development: the scarcity of high-quality training data for teaching machines to understand complex visual information such as scientific charts, medical diagrams, and financial documents.

“We lack of data, like documents, charts with rich annotations to train a vision language model to do question answering over those images,” explained Yue Yang, a recent Penn Engineering Ph.D. graduate and co-first author of the research, during an exclusive interview with VentureBeat.

To prevent the repetitive outputs common in AI-generated content, the system employs what the researchers call a “persona-driven mechanism.” Each time CoSyn generates a synthetic example, it pairs the request with a randomly sampled persona—a short description like “a sci-fi novelist constantly bouncing off ideas for new alien worlds” or “a chemistry teacher preparing lab materials.”
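The persona-driven idea can be illustrated with a minimal sketch. The persona pool and the `build_prompt` helper below are hypothetical stand-ins (the article does not describe CoSyn's actual implementation or prompt format); the sketch only shows the core move of pairing each synthesis request with a randomly sampled persona to diversify outputs.

```python
import random

# Hypothetical persona pool -- illustrative stand-ins, not CoSyn's actual list.
# The two quoted examples come from the article.
PERSONAS = [
    "a sci-fi novelist constantly bouncing off ideas for new alien worlds",
    "a chemistry teacher preparing lab materials",
    "a financial analyst summarizing quarterly reports",
]

def build_prompt(task: str, rng: random.Random) -> str:
    """Pair a synthesis request with a randomly sampled persona.

    Sampling a fresh persona per request steers the generator toward
    varied content, countering the repetitiveness of synthetic data.
    """
    persona = rng.choice(PERSONAS)
    return f"You are {persona}. {task}"

rng = random.Random(0)  # seeded for reproducibility in this sketch
prompt = build_prompt("Write code that renders a labeled bar chart.", rng)
print(prompt)
```

In a full pipeline, the resulting prompt would be sent to a code-generating language model, and the code it emits would be executed to render the synthetic chart or document image along with its annotations.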