Nvidia releases NVLM 1.0 72B open weight model
Remarkably, NVLM 1.0 shows improved text-only performance over its LLM backbone after multimodal training. We observe numerical differences between the Megatron and Huggingface codebases that are within the expected range of variation.

| Benchmark | MMMU (val / test) | MathVista | OCRBench | AI2D | ChartQA | DocVQA | TextVQA | RealWorldQA | VQAv2 |
|---|---|---|---|---|---|---|---|---|---|
| NVLM-D 1.0 72B (Huggingface) | 58.7 / 54.9 | 65.2 | 852 | 94.2 | 86.0 | 92.6 | 82.6 | 69.5 | 85.4 |
| NVLM-D 1.0 72B (Megatron) | 59.7 / 54.6 | 65.2 | 853 | 94.2 | 86.0 | 92.6 | 82.1 | 69.7 | 85.4 |
| Llama 3.2 90B | 60.3 / - | 57.3 | - | 92.3 | 85.5 | 90.1 | - | - | 78.1 |
| Llama 3-V 70B | 60.6 / - | - | - | 93.0 | 83.2 | 92.2 | 83.4 | - | 79.1 |
| Llama 3-V 405B | 64.5 / - | - | - | 94.1 | 85.8 | 92.6 | 84.8 | - | 80.2 |
| InternVL2-Llama3-76B | 55.2 / - | 65.5 | 839 | 94.8 | 88.4 | 94.1 | 84.4 | 72.2 | - |
| GPT-4V | 56.8 / 55.7 | 49.9 | 645 | 78.2 | 78.5 | 88.4 | 78.0 | 61.4 | 77.2 |
| GPT-4o | 69.1 / - | 63.8 | 736 | 94.2 | 85.7 | 92.8 | - | - | - |
| Claude 3.5 Sonnet | 68.3 / - | 67.7 | 788 | 94.7 | 90.8 | 95.2 | - | - | - |
| Gemini 1.5 Pro (Aug 2024) | 62.2 / - | 63.9 | 754 | 94.4 | 87.2 | 93.1 | 78.7 | 70.4 | 80.2 |
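The claim that the Megatron and Huggingface numbers differ only within expected run-to-run variation can be checked directly from the two NVLM-D rows in the table. A minimal sketch (scores copied from the table; OCRBench is on a 0–1000 point scale, the others are percentages):

```python
# Per-benchmark deltas between the Megatron and Huggingface evaluations
# of NVLM-D 1.0 72B, using the scores reported in the table above.
hf = {"MMMU-val": 58.7, "MMMU-test": 54.9, "MathVista": 65.2, "OCRBench": 852,
      "AI2D": 94.2, "ChartQA": 86.0, "DocVQA": 92.6, "TextVQA": 82.6,
      "RealWorldQA": 69.5, "VQAv2": 85.4}
mg = {"MMMU-val": 59.7, "MMMU-test": 54.6, "MathVista": 65.2, "OCRBench": 853,
      "AI2D": 94.2, "ChartQA": 86.0, "DocVQA": 92.6, "TextVQA": 82.1,
      "RealWorldQA": 69.7, "VQAv2": 85.4}

# Megatron minus Huggingface, rounded to one decimal place.
deltas = {k: round(mg[k] - hf[k], 1) for k in hf}
print(deltas)  # no benchmark moves by more than 1.0
```

Every gap is at most 1.0 (MMMU val, and 1 point of OCRBench's 1000), with several benchmarks matching exactly, which is consistent with the "expected range of variation" claim.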