
Janus: Decoupling Visual Encoding for Multimodal Understanding and Generation

Contribute to deepseek-ai/Janus development by creating an account on GitHub.

Janus addresses the limitations of previous approaches by decoupling visual encoding into separate pathways while still using a single, unified transformer architecture for processing. Its simplicity, flexibility, and effectiveness make it a strong candidate for next-generation unified multimodal models. In the previous version, classifier-free guidance did not function properly, resulting in relatively poor visual generation quality.
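To illustrate why broken classifier-free guidance hurts generation quality, here is a minimal sketch of the standard guidance formula applied to next-token logits. It is the generic textbook formulation, not code taken from the Janus repository; the function name `cfg_logits` and the parameter `scale` are hypothetical names chosen for this example.

```python
from typing import List


def cfg_logits(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Combine conditional and unconditional logits with classifier-free guidance.

    Generic formulation (an assumption, not the Janus implementation):
        guided = uncond + scale * (cond - uncond)
    scale = 1.0 reproduces the conditional logits; larger values push the
    output further toward what the prompt favors.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy logits over a two-token vocabulary.
guided = cfg_logits(cond=[2.0, 0.0], uncond=[1.0, 1.0], scale=5.0)
print(guided)  # [6.0, -4.0]
```

If the conditional and unconditional branches see the same input, `cond - uncond` is zero and the guidance term has no effect at any scale. That is one way a guidance bug could degrade visual generation quality. The source does not say what the underlying cause was in Janus.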
