Get the latest tech news
Pixtral 12B
Pixtral 12B - the first-ever multimodal Mistral model. Apache 2.0.
Pixtral is trained to understand both natural images and documents, achieving 52.5% on the MMMU reasoning benchmark, surpassing a number of larger models. The model shows strong abilities in tasks such as chart and figure understanding, document question answering, multimodal reasoning and instruction following. In this way, Pixtral can be used to accurately understand complex diagrams, charts and documents in high resolution, while providing fast inference speeds on small images like icons, clipart, and equations.
Or read this on Hacker News