Stable Flow: Vital Layers for Training-Free Image Editing
A training-free text-driven image editing method that supports various image editing types.
Unlike the UNet, the DiT does not exhibit a fine-coarse-fine structure, so it is not clear which layers should be modified to achieve the desired editing behavior. To address this gap, we analyze the importance of the different components in the DiT architecture to determine the subset that should be injected during editing. Each layer implements a multimodal diffusion transformer block that processes a combined sequence of text and image embeddings.
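One generic way to probe which layers matter is to bypass each layer in turn and measure how far the output moves from the unmodified run. The sketch below illustrates that idea only; the excerpt does not confirm it as the authors' procedure. The toy residual layers, the Euclidean distance metric, and the names `make_layer`, `run`, and `layer_importance` are all assumptions made for illustration.

```python
import math
import random


def make_layer(scale, seed, dim=4):
    """Build a toy residual layer x -> x + scale * tanh(W x) (hypothetical stand-in for a DiT block)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def layer(x):
        update = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]
        return [xi + scale * ui for xi, ui in zip(x, update)]

    return layer


def run(layers, x, skip=None):
    """Apply the layers in order; a skipped layer passes its input through unchanged."""
    for i, layer in enumerate(layers):
        if i == skip:
            continue
        x = layer(x)
    return x


def layer_importance(layers, x):
    """Score each layer by how far the output moves when that layer is bypassed."""
    reference = run(layers, x)
    return [math.dist(reference, run(layers, x, skip=i)) for i in range(len(layers))]


# Assumed toy setup: four layers with different update magnitudes.
scales = [0.01, 1.0, 0.01, 0.5]
layers = [make_layer(s, seed=i) for i, s in enumerate(scales)]
scores = layer_importance(layers, [0.3, -0.2, 0.5, 0.1])

# Rank layers from most to least influential under this metric.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

For a real image model, a perceptual similarity measure on generated images would be a more meaningful score than Euclidean distance on a small vector; the plain distance here just keeps the sketch self-contained.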