Meta’s Transfusion model handles text and images in a single architecture
Transfusion is a multi-modal model from Meta that learns over both discrete (text) and continuous (image) data to process and generate text and images.
In a new research paper, researchers from Meta and the University of Southern California introduce Transfusion, a technique that enables a single model to seamlessly handle both discrete and continuous modalities. “Our main innovation is demonstrating that we can use separate losses for different modalities – language modeling for text, diffusion for images – over shared data and parameters,” the researchers write. They trained a 7-billion-parameter Transfusion model and evaluated it on a variety of standard uni-modal and cross-modal benchmarks, including text-to-text, text-to-image, and image-to-text tasks.
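To make the shared-parameters, separate-losses idea concrete, here is a minimal PyTorch sketch. All module names, dimensions, and the single-step noising are illustrative assumptions for this sketch, not Meta's implementation: one transformer trunk sees both text tokens and noisy image latents, and the total loss sums a next-token cross-entropy term with a diffusion-style noise-prediction term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTransfusion(nn.Module):
    """Toy sketch: one shared trunk, two per-modality heads (hypothetical)."""
    def __init__(self, vocab_size=1000, d_model=64, latent_dim=16):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, d_model)      # discrete text tokens
        self.latent_in = nn.Linear(latent_dim, d_model)           # continuous image latents
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)   # shared parameters
        self.lm_head = nn.Linear(d_model, vocab_size)             # next-token prediction
        self.noise_head = nn.Linear(d_model, latent_dim)          # noise prediction

    def forward(self, text_ids, noisy_latents):
        # Concatenate text tokens and image-latent "patches" into one sequence.
        h = torch.cat([self.token_embed(text_ids), self.latent_in(noisy_latents)], dim=1)
        h = self.trunk(h)
        n_text = text_ids.size(1)
        return self.lm_head(h[:, :n_text]), self.noise_head(h[:, n_text:])

model = TinyTransfusion()
text_ids = torch.randint(0, 1000, (2, 8))      # toy text batch
clean_latents = torch.randn(2, 4, 16)          # toy image latents (4 patches)
noise = torch.randn_like(clean_latents)
noisy_latents = clean_latents + noise          # simplified stand-in for forward diffusion

logits, noise_pred = model(text_ids, noisy_latents)

# Separate losses per modality, combined over shared data and parameters.
lm_loss = F.cross_entropy(logits[:, :-1].reshape(-1, 1000), text_ids[:, 1:].reshape(-1))
diff_loss = F.mse_loss(noise_pred, noise)
loss = lm_loss + diff_loss                     # relative weighting is a design choice
loss.backward()
```

The key point the sketch illustrates is that a single backward pass updates one set of trunk weights from both objectives, rather than routing text and images through separate models.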
Or read this on VentureBeat