Stable Diffusion v1-4 Model Card
Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. For more information about how Stable Diffusion functions, please have a look at 🤗's Stable Diffusion with 🧨 Diffusers blog.
Model Details
- Developed by: Robin Rombach, Patrick Esser
- Model type: Diffusion-based text-to-image generation model
- Language(s): English
- License: The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which our license is based.
- Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (CLIP ViT-L/14) as suggested in the Imagen paper.
- Resources for more information: GitHub Repository, Paper.
https://huggingface.co/CompVis/stable-diffusion-v1-4
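As a minimal sketch of how this checkpoint can be used with 🧨 Diffusers, the snippet below generates an image from a text prompt via StableDiffusionPipeline; the prompt and output filename are illustrative.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the v1-4 weights; half precision reduces GPU memory use.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Illustrative prompt; any text input works.
prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
```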
Stable Diffusion v2 Model Card
This model card focuses on the Stable Diffusion v2 model, available here.
The stable-diffusion-2 model is resumed from stable-diffusion-2-base (512-base-ema.ckpt) and trained for 150k steps using a v-objective on the same dataset, then resumed for another 140k steps on 768x768 images.
https://huggingface.co/stabilityai/stable-diffusion-2
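A minimal sketch of sampling from this checkpoint with 🧨 Diffusers follows; the EulerDiscreteScheduler swap mirrors common usage, and the prompt, output filename, and explicit 768x768 resolution (the resolution the checkpoint was resumed on) are illustrative.

```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2"

# Load the Euler scheduler from the repository's scheduler config.
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(
    model_id, scheduler=scheduler, torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Sample at the 768x768 resolution this checkpoint was fine-tuned on.
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, height=768, width=768).images[0]
image.save("astronaut_768.png")
```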
Stable Diffusion x4 Upscaler Model Card
This model card focuses on the model associated with the Stable Diffusion x4 upscaler, available here. The model was trained for 1.25M steps on a 10M subset of LAION containing images larger than 2048x2048, using crops of size 512x512. It is a text-guided latent upscaling diffusion model: in addition to the textual input, it receives a noise_level as an input parameter, which can be used to add noise to the low-resolution input according to a predefined diffusion schedule.
https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler
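To illustrate the noise_level parameter described above, here is a minimal sketch using the diffusers StableDiffusionUpscalePipeline; the input filename and prompt are illustrative, and noise_level=20 matches the pipeline's default.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")

# Illustrative low-resolution input; the model upscales it 4x.
low_res_img = Image.open("low_res_input.png").convert("RGB")

# noise_level controls how much noise is added to the low-res input
# according to the predefined diffusion schedule (higher = noisier).
prompt = "a white cat"
upscaled = pipeline(prompt=prompt, image=low_res_img, noise_level=20).images[0]
upscaled.save("upscaled_cat.png")
```

Raising noise_level can help when the low-resolution input is itself noisy or heavily compressed, at the cost of fidelity to the original image.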