Janus: Decoupling Visual Encoding for Multimodal Understanding and Generation
Source code: the deepseek-ai/Janus repository on GitHub.
Janus addresses the limitations of previous approaches by decoupling visual encoding into separate pathways for understanding and generation, while still using a single, unified transformer architecture for processing. Its simplicity, flexibility, and effectiveness make it a strong candidate for next-generation unified multimodal models. The repository also notes a fix: the previous version kept classifier-free guidance from functioning properly, which resulted in relatively poor visual generation quality.
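For readers unfamiliar with classifier-free guidance, the following is a minimal sketch of the general technique, not Janus's actual code. It assumes the model produces two sets of logits for the next image token, one conditioned on the prompt and one unconditioned, and blends them with a guidance scale. The function name `cfg_logits` and the sample values are hypothetical and chosen only for illustration.

```python
def cfg_logits(cond, uncond, scale):
    """Blend logits as uncond + scale * (cond - uncond).

    scale = 0 ignores the prompt, scale = 1 reproduces the conditional
    logits, and scale > 1 pushes the distribution further toward the prompt.
    """
    if len(cond) != len(uncond):
        raise ValueError("logit vectors must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Hypothetical logits over a 3-token image vocabulary (illustrative only).
cond = [2.0, 0.5, -1.0]    # logits with the text prompt
uncond = [1.0, 0.5, 0.0]   # logits with the prompt dropped
guided = cfg_logits(cond, uncond, scale=5.0)
print(guided)
```

If the unconditional branch is built incorrectly, for example because the prompt is not actually dropped, the two logit vectors end up nearly identical and the guidance term vanishes, which would be consistent with the degraded generation quality described above.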