A multimodal dataset with one trillion tokens


MINT-1T: a one-trillion-token multimodal interleaved dataset (mlfoundations/MINT-1T).

Read more on:

tokens

multimodal dataset

Related news:

Demystifying Cookies and Tokens

AMD predicts future AI PCs will run 30B parameter models at 100 tokens per second

Tokens are a big reason today’s generative AI falls short