Lotus: Diffusion-Based Visual Foundation Model for High-Quality Dense Prediction
Leveraging the visual priors of pre-trained text-to-image diffusion models offers a promising solution to enhance zero-shot generalization in dense prediction tasks. However, existing methods often uncritically use the original diffusion formulation, which may not be optimal due to the fundamental differences between dense prediction and image generation. Based on these insights, we introduce Lotus, a diffusion-based visual foundation model with a simple yet effective adaptation protocol for dense prediction.