Get the latest tech news

Meta’s Transfusion model handles text and images in a single architecture


Meta Transfusion is a multi-modal model that can learn both discrete and continuous values to process and generate text and images.

In a new research paper, scientists from Meta and the University of South Carolina introduce Transfusion, a novel technique that enables a single model to seamlessly handle both discrete and continuous modalities. “Our main innovation is demonstrating that we can use separate losses for different modalities – language modeling for text, diffusion for images – over shared data and parameters,” the researchers write. The researchers trained a 7-billion model based on Transfusion and evaluated it on a variety of standard uni-modal and cross-modal benchmarks, including text-to-text, text-to-image, and image-to-text tasks.

Get the Android app

Or read this on Venture Beat

Read more on:

Photo of Text

Text

Photo of images

images

Photo of single architecture

single architecture

Related news:

News photo

Feds Bust Alaska Man With 10,000+ CSAM Images Despite His Many Encrypted Apps

News photo

South Korea battles surge of deepfake pornography after thousands found to be spreading images | One Telegram channel had 220,000 members creating and sharing doctored images and videos, with the president saying many of the victims and perpetrators are minors

News photo

Google to Let Some Users Generate Images of People After Scandal