Janus: Decoupling Visual Encoding for Multimodal Understanding and Generation


Source: the deepseek-ai/Janus repository on GitHub.

Janus addresses the limitations of previous approaches by decoupling visual encoding into separate pathways while still using a single, unified transformer architecture for processing. Its simplicity, flexibility, and effectiveness make it a strong candidate for next-generation unified multimodal models.

The previous version did not apply classifier-free guidance correctly, which resulted in relatively poor visual generation quality.
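For readers unfamiliar with classifier-free guidance, the sketch below shows the standard way the technique combines two sets of logits during generation. It is a generic illustration, not code from the Janus repository: the function name `cfg_logits`, the `scale` parameter, and the sample values are all assumptions made for this example.

```python
def cfg_logits(cond, uncond, scale):
    """Blend conditional and unconditional logits (generic CFG formula).

    Hypothetical helper for illustration only; not taken from Janus.
    guided = uncond + scale * (cond - uncond)
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Made-up logits for three candidate tokens.
cond = [2.0, 0.5, -1.0]    # model output with the prompt
uncond = [1.0, 0.5, 0.0]   # model output without the prompt

print(cfg_logits(cond, uncond, 0.0))  # scale 0 reproduces the unconditional logits
print(cfg_logits(cond, uncond, 1.0))  # scale 1 reproduces the conditional logits
print(cfg_logits(cond, uncond, 5.0))  # larger scales push further toward the prompt
```

If the guidance step is skipped or the two sets of logits are mixed up, the output is no longer steered toward the prompt, which is one plausible way a guidance bug could degrade generation quality.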
