Liquid: Language models are scalable and unified multi-modal generators


We present Liquid, an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation by tokenizing images into discrete codes and learning these code embeddings alongside text tokens within a shared feature space for both vision and language. Unlike previous multimodal large language models (MLLMs), Liquid achieves this integration with a single large language model (LLM), eliminating the need for external pretrained visual embeddings such as CLIP.
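As a rough illustration of this shared-vocabulary idea, the sketch below shows one decoder-only transformer modeling interleaved text tokens and discrete image codes through a single embedding table and a next-token loss. It is a minimal sketch, not the authors' implementation: the vocabulary sizes, model dimensions, and the use of PyTorch's stock transformer layers are all assumptions.

```python
import torch
import torch.nn as nn

TEXT_VOCAB = 32000      # assumed size of the text vocabulary
IMAGE_CODES = 8192      # assumed size of the image tokenizer's codebook
VOCAB = TEXT_VOCAB + IMAGE_CODES

class UnifiedAutoregressiveLM(nn.Module):
    """One decoder-only transformer over a joint text+image vocabulary."""
    def __init__(self, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        # A single embedding table holds both word tokens and image codes,
        # so vision and language share one feature space.
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):
        # Causal mask -> standard next-token (auto-regressive) prediction.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.lm_head(h)

# A mixed sequence: a text prompt followed by discrete image codes.
# Image codes are offset so they occupy the upper part of the joint vocabulary.
text_ids = torch.randint(0, TEXT_VOCAB, (1, 16))
image_ids = torch.randint(0, IMAGE_CODES, (1, 64)) + TEXT_VOCAB
seq = torch.cat([text_ids, image_ids], dim=1)

model = UnifiedAutoregressiveLM()
logits = model(seq[:, :-1])   # predict token t+1 from tokens up to t
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB),
                                   seq[:, 1:].reshape(-1))
print(loss.item())
```

Because image codes are ordinary entries in the vocabulary, the same next-token objective covers text-to-image generation, image captioning, and plain language modeling, which is the unification the abstract describes.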

[Figure: example text-to-image prompts and generated samples, e.g. "photo of Princess of Persia, in the style of light maroon and azure, oriental portrait, darkly romantic realism"; "a highly realistic close-up photograph of a 35-year-old red-haired woman writing in her journal on her balcony in a warm, stylish outfit"; "silhouette of a tree against the starry night, intense light and shadow, sunrays, coastal scenery".]


Related news:

Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB

Circuit Tracing: Revealing Computational Graphs in Language Models (Anthropic)

VLMaterial: Procedural Material Generation with Large Vision-Language Models