From afefb6b052e7d5be012db9c1bfdfbedfdb4ae65d Mon Sep 17 00:00:00 2001
From: apolinario
Date: Fri, 24 Mar 2023 11:24:50 +0100
Subject: [PATCH 1/3] Add diffusers integration

---
 doc/UNCLIP.MD | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/doc/UNCLIP.MD b/doc/UNCLIP.MD
index fc2fc2a..de2daf1 100644
--- a/doc/UNCLIP.MD
+++ b/doc/UNCLIP.MD
@@ -15,7 +15,25 @@ To use them, download from Hugging Face, and put and the weights into the `check
 #### Image Variations
 ![image-variations-l-1](../assets/stable-samples/stable-unclip/unclip-variations.png)
 
-Run
+##### Diffusers integration
+Stable UnCLIP Image Variations is integrated with the [🧨 diffusers](https://github.com/huggingface/diffusers) library.
+```python
+# pip install git+https://github.com/huggingface/diffusers.git transformers accelerate
+import torch
+from diffusers import StableUnCLIPPipeline
+
+pipe = StableUnCLIPPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
+)
+pipe = pipe.to("cuda")
+
+prompt = "a photo of an astronaut riding a horse on mars"
+images = pipe(prompt).images
+images[0].save("astronaut_horse.png")
+```
+Check out the [Stable UnCLIP pipeline docs](https://huggingface.co/docs/diffusers/api/pipelines/stable_unclip) for more details.
+
+##### Streamlit UI demo
 
 ```
 streamlit run scripts/streamlit/stableunclip.py

From 6dd3048419a863ab4e7474c5834c85338edb4864 Mon Sep 17 00:00:00 2001
From: apolinario
Date: Fri, 24 Mar 2023 11:26:49 +0100
Subject: [PATCH 2/3] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e2b5d28..ce8b642 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ new checkpoints. The following list provides an overview of all currently availa
 
 *Stable UnCLIP 2.1*
 
-- New stable diffusion finetune (_Stable unCLIP 2.1_, [HuggingFace](https://huggingface.co/stabilityai/)) at 768x768 resolution, based on SD2.1-768. This model allows for image variations and mixing operations as described in [*Hierarchical Text-Conditional Image Generation with CLIP Latents*](https://arxiv.org/abs/2204.06125), and, thanks to its modularity, can be combined with other models such as [KARLO](https://github.com/kakaobrain/karlo). Comes in two variants: [*Stable unCLIP-L*](https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip/blob/main/sd21-unclip-l.ckpt) and [*Stable unCLIP-H*](https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip/blob/main/sd21-unclip-h.ckpt), which are conditioned on CLIP ViT-L and ViT-H image embeddings, respectively. Instructions are available [here](doc/UNCLIP.MD).
+- New stable diffusion finetune (_Stable unCLIP 2.1_, [HuggingFace Model](https://huggingface.co/stabilityai/)) at 768x768 resolution, based on SD2.1-768. This model allows for image variations and mixing operations as described in [*Hierarchical Text-Conditional Image Generation with CLIP Latents*](https://arxiv.org/abs/2204.06125), and, thanks to its modularity, can be combined with other models such as [KARLO](https://github.com/kakaobrain/karlo). Comes in two variants: [*Stable unCLIP-L*](https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip/blob/main/sd21-unclip-l.ckpt) and [*Stable unCLIP-H*](https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip/blob/main/sd21-unclip-h.ckpt), which are conditioned on CLIP ViT-L and ViT-H image embeddings, respectively. Instructions are available [here](doc/UNCLIP.MD).
 
 **December 7, 2022**
 
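Note that the snippet added in PATCH 1/3 drives the text-to-image `StableUnCLIPPipeline`, even though it sits in the *Image Variations* section of `doc/UNCLIP.MD`. For variations conditioned on an input image, diffusers also exposes `StableUnCLIPImg2ImgPipeline` for the same checkpoint; below is a minimal sketch of that usage (the file names `input.png` and `variation.png` are placeholders, not part of the patch):

```python
# Sketch only: image variations via the img2img unCLIP pipeline in diffusers.
# "input.png" and "variation.png" are placeholder file names, not part of the patch.
import torch
from PIL import Image
from diffusers import StableUnCLIPImg2ImgPipeline

pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Any RGB image works as the variation source; its CLIP image embedding
# conditions the generation instead of a text prompt.
init_image = Image.open("input.png").convert("RGB")

images = pipe(init_image).images
images[0].save("variation.png")
```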
From be2861a0ff9c59e49bb0961948ad300a0df54a90 Mon Sep 17 00:00:00 2001
From: apolinario
Date: Fri, 24 Mar 2023 11:27:38 +0100
Subject: [PATCH 3/3] Small Hugging Face as two words nit

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index ce8b642..49fe367 100644
--- a/README.md
+++ b/README.md
@@ -12,13 +12,13 @@ new checkpoints. The following list provides an overview of all currently availa
 
 *Stable UnCLIP 2.1*
 
-- New stable diffusion finetune (_Stable unCLIP 2.1_, [HuggingFace Model](https://huggingface.co/stabilityai/)) at 768x768 resolution, based on SD2.1-768. This model allows for image variations and mixing operations as described in [*Hierarchical Text-Conditional Image Generation with CLIP Latents*](https://arxiv.org/abs/2204.06125), and, thanks to its modularity, can be combined with other models such as [KARLO](https://github.com/kakaobrain/karlo). Comes in two variants: [*Stable unCLIP-L*](https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip/blob/main/sd21-unclip-l.ckpt) and [*Stable unCLIP-H*](https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip/blob/main/sd21-unclip-h.ckpt), which are conditioned on CLIP ViT-L and ViT-H image embeddings, respectively. Instructions are available [here](doc/UNCLIP.MD).
+- New stable diffusion finetune (_Stable unCLIP 2.1_, [Hugging Face](https://huggingface.co/stabilityai/)) at 768x768 resolution, based on SD2.1-768. This model allows for image variations and mixing operations as described in [*Hierarchical Text-Conditional Image Generation with CLIP Latents*](https://arxiv.org/abs/2204.06125), and, thanks to its modularity, can be combined with other models such as [KARLO](https://github.com/kakaobrain/karlo). Comes in two variants: [*Stable unCLIP-L*](https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip/blob/main/sd21-unclip-l.ckpt) and [*Stable unCLIP-H*](https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip/blob/main/sd21-unclip-h.ckpt), which are conditioned on CLIP ViT-L and ViT-H image embeddings, respectively. Instructions are available [here](doc/UNCLIP.MD).
 
 **December 7, 2022**
 
 *Version 2.1*
 
-- New stable diffusion model (_Stable Diffusion 2.1-v_, [HuggingFace](https://huggingface.co/stabilityai/stable-diffusion-2-1)) at 768x768 resolution and (_Stable Diffusion 2.1-base_, [HuggingFace](https://huggingface.co/stabilityai/stable-diffusion-2-1-base)) at 512x512 resolution, both based on the same number of parameters and architecture as 2.0 and fine-tuned on 2.0, on a less restrictive NSFW filtering of the [LAION-5B](https://laion.ai/blog/laion-5b/) dataset.
+- New stable diffusion model (_Stable Diffusion 2.1-v_, [Hugging Face](https://huggingface.co/stabilityai/stable-diffusion-2-1)) at 768x768 resolution and (_Stable Diffusion 2.1-base_, [Hugging Face](https://huggingface.co/stabilityai/stable-diffusion-2-1-base)) at 512x512 resolution, both based on the same number of parameters and architecture as 2.0 and fine-tuned on 2.0, on a less restrictive NSFW filtering of the [LAION-5B](https://laion.ai/blog/laion-5b/) dataset.
 Per default, the attention operation of the model is evaluated at full precision when `xformers` is not installed. To enable fp16 (which can cause numerical instabilities with the vanilla attention module on the v2.1 model), run your script with `ATTN_PRECISION=fp16 python <thescript.py>`
 
 **November 24, 2022**
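For background on the `ATTN_PRECISION=fp16` switch quoted in the README context above: the flag is read from the environment by the attention module, which by default upcasts the query/key matmul to fp32. The sketch below illustrates that pattern; it is a simplified illustration, not the repository's exact code.

```python
# Simplified sketch of an ATTN_PRECISION-style switch: the attention
# similarity matmul runs in fp32 unless ATTN_PRECISION=fp16 is set.
# Illustrative only, not the repository's exact implementation.
import os

import torch

ATTN_PRECISION = os.environ.get("ATTN_PRECISION", "fp32")

def attention_scores(q: torch.Tensor, k: torch.Tensor, scale: float) -> torch.Tensor:
    if ATTN_PRECISION == "fp32":
        # Default path: disable autocast and upcast q/k, avoiding the
        # numerical instabilities of half-precision attention on v2.1.
        with torch.autocast(enabled=False, device_type="cuda"):
            q, k = q.float(), k.float()
            return torch.einsum("b i d, b j d -> b i j", q, k) * scale
    # ATTN_PRECISION=fp16: keep the matmul in the incoming (half) precision.
    return torch.einsum("b i d, b j d -> b i j", q, k) * scale
```

Launching with `ATTN_PRECISION=fp16 python <thescript.py>` then selects the half-precision branch.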