Liquid: Language models are scalable and unified multi-modal generators


We present Liquid, an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation by tokenizing images into discrete codes and learning these code embeddings alongside text tokens within a shared feature space for both vision and language. Unlike previous multimodal large language models (MLLMs), Liquid achieves this integration with a single large language model (LLM), eliminating the need for external pretrained visual embeddings such as CLIP.
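As a rough illustration of this shared-vocabulary idea, the sketch below shows one decoder-only transformer modeling interleaved text tokens and discrete image codes through a single embedding table and a next-token loss. It is a minimal sketch, not the authors' implementation: the vocabulary sizes, model dimensions, and the use of PyTorch's stock transformer layers are all assumptions.

```python
import torch
import torch.nn as nn

TEXT_VOCAB = 32000      # assumed size of the text vocabulary
IMAGE_CODES = 8192      # assumed size of the image tokenizer's codebook
VOCAB = TEXT_VOCAB + IMAGE_CODES

class UnifiedAutoregressiveLM(nn.Module):
    """One decoder-only transformer over a joint text+image vocabulary."""
    def __init__(self, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        # A single embedding table holds both word tokens and image codes,
        # so vision and language share one feature space.
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):
        # Causal mask -> standard next-token (auto-regressive) prediction.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.lm_head(h)

# A mixed sequence: a text prompt followed by discrete image codes.
# Image codes are offset so they occupy the upper part of the joint vocabulary.
text_ids = torch.randint(0, TEXT_VOCAB, (1, 16))
image_ids = torch.randint(0, IMAGE_CODES, (1, 64)) + TEXT_VOCAB
seq = torch.cat([text_ids, image_ids], dim=1)

model = UnifiedAutoregressiveLM()
logits = model(seq[:, :-1])   # predict token t+1 from tokens up to t
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB),
                                   seq[:, 1:].reshape(-1))
print(loss.item())
```

Because image codes are ordinary entries in the vocabulary, the same next-token objective covers text-to-image generation, image captioning, and plain language modeling, which is the unification the abstract describes.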

[Figure: example text-to-image prompts and generated samples, e.g. "photo of Princess of Persia, in the style of light maroon and azure, oriental portrait, darkly romantic realism"; "a highly realistic close-up photograph of a 35-year-old red-haired woman writing in her journal on her balcony in a warm, stylish outfit"; "silhouette of a tree against the starry night, intense light and shadow, sunrays, coastal scenery".]


Related news:

Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB

Circuit Tracing: Revealing Computational Graphs in Language Models (Anthropic)

VLMaterial: Procedural Material Generation with Large Vision-Language Models