Get the latest tech news

Magma: A foundation model for multimodal AI agents

pretraining pipeline. For all training data, texts are tokenized into tokens, while images and videos from different domains are encoded by a shared vision encoder.

A customer, identifiable by their white sleeve and colorful bracelet, places a red shopping basket filled with snacks and a drink on the counter. The customer's red shopping basket, now filled with snacks including a drink cup labeled 'Fruit Swoosh' and a blue package of cookies, is placed on the counter. The video wraps up with a wider view of the room, revealing a checkered floor and several other plants in the background, adding a homely touch to the setting.

Get the Android app

Or read this on Hacker News