update examples for release

Robin Rombach 2023-02-23 11:33:20 +01:00
parent edb2eb90b4
commit fe1cf687e9
5 changed files with 10 additions and 23 deletions

README.md

@@ -8,11 +8,13 @@ new checkpoints. The following list provides an overview of all currently availa
 ## News
-**February X, 2023**
+**February 27, 2023**
 *Stable UnCLIP 2.1*
-- New stable diffusion finetune (_Stable unCLIP 2.1_, [HuggingFace](https://huggingface.co/stabilityai/stable-unclip-2-1)) at 768x768 resolution,
-based on SD2.1-768. This model allows for image variations and mixing operations as described in TODO, and, thanks to its modularity, can be combined with other models
-such as [KARLO](https://github.com/kakaobrain/karlo). Documentation [here](doc/UNCLIP.MD).
+- New stable diffusion finetune (_Stable unCLIP 2.1_, [HuggingFace](https://huggingface.co/stabilityai/)) at 768x768 resolution,
+based on SD2.1-768. This model allows for image variations and mixing operations as described in [*Hierarchical Text-Conditional Image Generation with CLIP Latents*](https://arxiv.org/abs/2204.06125), and, thanks to its modularity, can be combined with other models
+such as [KARLO](https://github.com/kakaobrain/karlo). Documentation [here](doc/UNCLIP.MD). Comes in two variants: [*Stable unCLIP-L*](TODO) and [*Stable unCLIP-H*](TODO), which are conditioned on CLIP
+ViT-L and ViT-H image embeddings, respectively.
 
 **December 7, 2022**
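For context, both variants condition on a CLIP image embedding. A minimal sketch of computing such an embedding with [open_clip](https://github.com/mlfoundations/open_clip); the ViT-H-14 model tag, pretrained weights, and input path are illustrative assumptions, not pinned by this commit:

```
import torch
import open_clip
from PIL import Image

# Common OpenCLIP ViT-H/14 weights, assumed here for illustration.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"
)
model.eval()

image = preprocess(Image.open("input.png").convert("RGB")).unsqueeze(0)  # (1, 3, 224, 224)
with torch.no_grad():
    embedding = model.encode_image(image)  # (1, 1024) for ViT-H/14
embedding = embedding / embedding.norm(dim=-1, keepdim=True)
```

The unCLIP-L variant would use the analogous OpenAI CLIP ViT-L/14 encoder, whose image embeddings are 768-dimensional.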

Binary file added (image, 1.7 MiB)

Binary file added (image, 1.5 MiB)

doc/UNCLIP.MD

@@ -1,19 +1,14 @@
 ### Stable unCLIP
-_++++++ NOTE: preliminary checkpoints for internal testing ++++++_
-
 [unCLIP](https://openai.com/dall-e-2/) is the approach behind OpenAI's [DALL·E 2](https://openai.com/dall-e-2/),
 trained to invert CLIP image embeddings.
 We finetuned SD 2.1 to accept a CLIP ViT-L/14 image embedding in addition to the text encodings.
 This means that the model can be used to produce image variations, but can also be combined with a text-to-image
 embedding prior to yield a full text-to-image model at 768x768 resolution.
-We provide two models, trained on OpenAI CLIP-L and OpenCLIP-H image embeddings, respectively, available
-_[TODO: +++prelim private upload on HF+++]_
-from [https://huggingface.co/stabilityai/stable-unclip-preview](https://huggingface.co/stabilityai/stable-unclip-preview).
+We provide two models, trained on OpenAI CLIP-L and OpenCLIP-H image embeddings, respectively, available from [https://huggingface.co/stabilityai/](TODO).
 To use them, download from Hugging Face, and put the weights into the `checkpoints` folder.
 
 #### Image Variations
-![image-variations-l-1](../assets/stable-samples/stable-unclip/houses_out.jpeg)
-![image-variations-l-2](../assets/stable-samples/stable-unclip/plates_out.jpeg)
-
-_++TODO: Input images from the DIV2K dataset. check license++_
+![image-variations-l-1](../assets/stable-samples/stable-unclip/unclip-variations.png)
 
 Run
@@ -24,16 +19,7 @@ to launch a streamlit script that can be used to make image variations with both
 These models can process a `noise_level`, which specifies an amount of Gaussian noise added to the CLIP embeddings.
 This can be used to increase output variance as in the following examples.
 
-**noise_level = 0**
-![image-variations-l-3](../assets/stable-samples/stable-unclip/oldcar000.jpeg)
-
-**noise_level = 500**
-![image-variations-l-4](../assets/stable-samples/stable-unclip/oldcar500.jpeg)
-
-**noise_level = 800**
-![image-variations-l-6](../assets/stable-samples/stable-unclip/oldcar800.jpeg)
+![image-variations-noise](../assets/stable-samples/stable-unclip/unclip-variations_noise.png)
### Stable Diffusion Meets Karlo
@@ -51,7 +37,7 @@ wget https://arena.kakaocdn.net/brainrepo/models/karlo-public/v1.0.0.alpha/0b623
 wget https://arena.kakaocdn.net/brainrepo/models/karlo-public/v1.0.0.alpha/85626483eaca9f581e2a78d31ff905ca/prior-ckpt-step%3D01000000-of-01000000.ckpt
 cd ../../
 ```
-and the finetuned SD2.1 unCLIP-L checkpoint _[TODO: +++prelim private upload on HF+++]_ from [https://huggingface.co/stabilityai/stable-unclip-preview](https://huggingface.co/stabilityai/stable-unclip-preview), and put the ckpt into the `checkpoints folder`
+and the finetuned SD2.1 unCLIP-L checkpoint from [https://huggingface.co/stabilityai/](https://huggingface.co/stabilityai/TODO), and put the ckpt into the `checkpoints` folder.
 
 Then, run

scripts/streamlit/stableunclip.py

@@ -276,7 +276,6 @@ if __name__ == "__main__":
     version = st.selectbox("Model Version", list(VERSION2SPECS.keys()), 0)
     use_karlo = version in ["Stable unCLIP-L"] and st.checkbox("Use KARLO prior", False)
     state = init(version=version, load_karlo_prior=use_karlo)
-    st.info(state["msg"])
     prompt = st.text_input("Prompt", "a professional photograph")
     negative_prompt = st.text_input("Negative Prompt", "")
     scale = st.number_input("cfg-scale", value=10., min_value=-100., max_value=100.)
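For reference, a `cfg-scale` like the one read here is conventionally applied as classifier-free guidance during sampling. A generic sketch of that combination, not this script's exact code:

```
import torch

def guided_eps(eps_uncond: torch.Tensor, eps_cond: torch.Tensor,
               scale: float) -> torch.Tensor:
    # Classifier-free guidance: extrapolate from the unconditional noise
    # prediction toward the prompt-conditional one; scale = 1.0 recovers
    # the purely conditional prediction, larger values follow the prompt
    # more strongly at the cost of sample diversity.
    return eps_uncond + scale * (eps_cond - eps_uncond)
```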