diff --git a/README.md b/README.md
index c7a9669..1c01c6f 100644
--- a/README.md
+++ b/README.md
@@ -8,11 +8,13 @@ new checkpoints. The following list provides an overview of all currently availa
 ## News
-**February X, 2023**
+**February 27, 2023**
+
 *Stable UnCLIP 2.1*
-- New stable diffusion finetune (_Stable unCLIP 2.1_, [HuggingFace](https://huggingface.co/stabilityai/stable-unclip-2-1)) at 768x768 resolution,
-based on SD2.1-768. This model allows for image variations and mixing operations as described in TODO, and, thanks to its modularity, can be combined with other models
-such as [KARLO](https://github.com/kakaobrain/karlo). Documentation [here](doc/UNCLIP.MD).
+- New stable diffusion finetune (_Stable unCLIP 2.1_, [HuggingFace](https://huggingface.co/stabilityai/)) at 768x768 resolution,
+based on SD2.1-768. This model allows for image variations and mixing operations as described in [*Hierarchical Text-Conditional Image Generation with CLIP Latents*](https://arxiv.org/abs/2204.06125), and, thanks to its modularity, can be combined with other models
+such as [KARLO](https://github.com/kakaobrain/karlo). Documentation [here](doc/UNCLIP.MD). Comes in two variants: [*Stable unCLIP-L*](TODO) and [*Stable unCLIP-H*](TODO), which are conditioned on CLIP
+ViT-L and ViT-H image embeddings, respectively.
 **December 7, 2022**
diff --git a/assets/stable-samples/stable-unclip/unclip-variations.png b/assets/stable-samples/stable-unclip/unclip-variations.png
new file mode 100644
index 0000000..53ff52c
Binary files /dev/null and b/assets/stable-samples/stable-unclip/unclip-variations.png differ
diff --git a/assets/stable-samples/stable-unclip/unclip-variations_noise.png b/assets/stable-samples/stable-unclip/unclip-variations_noise.png
new file mode 100644
index 0000000..d364a3d
Binary files /dev/null and b/assets/stable-samples/stable-unclip/unclip-variations_noise.png differ
diff --git a/doc/UNCLIP.MD b/doc/UNCLIP.MD
index 0050f3d..d05272d 100644
--- a/doc/UNCLIP.MD
+++ b/doc/UNCLIP.MD
@@ -1,19 +1,14 @@
 ### Stable unCLIP
-_++++++ NOTE: preliminary checkpoints for internal testing ++++++_
 [unCLIP](https://openai.com/dall-e-2/) is the approach behind OpenAI's [DALL·E 2](https://openai.com/dall-e-2/), trained to invert CLIP image embeddings. We finetuned SD 2.1 to accept a CLIP ViT-L/14 image embedding in addition to the text encodings. This means that the model can be used to produce image variations, but can also be combined with a text-to-image embedding prior to yield a full text-to-image model at 768x768 resolution.
-We provide two models, trained on OpenAI CLIP-L and OpenCLIP-H image embeddings, respectively, available
-_[TODO: +++prelim private upload on HF+++]_ from [https://huggingface.co/stabilityai/stable-unclip-preview](https://huggingface.co/stabilityai/stable-unclip-preview).
+We provide two models, trained on OpenAI CLIP-L and OpenCLIP-H image embeddings, respectively, available from [https://huggingface.co/stabilityai/](TODO).
 To use them, download from Hugging Face, and put and the weights into the `checkpoints` folder.
 #### Image Variations
-![image-variations-l-1](../assets/stable-samples/stable-unclip/houses_out.jpeg)
-![image-variations-l-2](../assets/stable-samples/stable-unclip/plates_out.jpeg)
-
-_++TODO: Input images from the DIV2K dataset. check license++_
+![image-variations-l-1](../assets/stable-samples/stable-unclip/unclip-variations.png)
 Run
@@ -24,16 +19,7 @@ to launch a streamlit script than can be used to make image variations with both
 These models can process a `noise_level`, which specifies an amount of Gaussian noise added to the CLIP embeddings. This can be used to increase output variance as in the following examples.
-**noise_level = 0**
-![image-variations-l-3](../assets/stable-samples/stable-unclip/oldcar000.jpeg)
-
-**noise_level = 500**
-![image-variations-l-4](../assets/stable-samples/stable-unclip/oldcar500.jpeg)
-
-**noise_level = 800**
-![image-variations-l-6](../assets/stable-samples/stable-unclip/oldcar800.jpeg)
-
-
+![image-variations-noise](../assets/stable-samples/stable-unclip/unclip-variations_noise.png)
 ### Stable Diffusion Meets Karlo
@@ -51,7 +37,7 @@ wget https://arena.kakaocdn.net/brainrepo/models/karlo-public/v1.0.0.alpha/0b623
 wget https://arena.kakaocdn.net/brainrepo/models/karlo-public/v1.0.0.alpha/85626483eaca9f581e2a78d31ff905ca/prior-ckpt-step%3D01000000-of-01000000.ckpt
 cd ../../
 ```
-and the finetuned SD2.1 unCLIP-L checkpoint _[TODO: +++prelim private upload on HF+++]_ from [https://huggingface.co/stabilityai/stable-unclip-preview](https://huggingface.co/stabilityai/stable-unclip-preview), and put the ckpt into the `checkpoints folder`
+and the finetuned SD2.1 unCLIP-L checkpoint from [https://huggingface.co/stabilityai/](https://huggingface.co/stabilityai/TODO), and put the ckpt into the `checkpoints folder`
 Then, run
diff --git a/scripts/streamlit/stableunclip.py b/scripts/streamlit/stableunclip.py
index c193434..6dd4bb7 100644
--- a/scripts/streamlit/stableunclip.py
+++ b/scripts/streamlit/stableunclip.py
@@ -276,7 +276,6 @@ if __name__ == "__main__":
     version = st.selectbox("Model Version", list(VERSION2SPECS.keys()), 0)
     use_karlo = version in ["Stable unCLIP-L"] and st.checkbox("Use KARLO prior", False)
     state = init(version=version, load_karlo_prior=use_karlo)
-    st.info(state["msg"])
     prompt = st.text_input("Prompt", "a professional photograph")
     negative_prompt = st.text_input("Negative Prompt", "")
     scale = st.number_input("cfg-scale", value=10., min_value=-100., max_value=100.)