mirror of
https://github.com/Stability-AI/stablediffusion.git
synced 2024-12-22 15:44:58 +00:00
63 lines
2.6 KiB
YAML
cff-version: 1.2.0
message: If you use this software, please cite it using these metadata.
title: stablediffusion
authors:
  - family-names: Rombach
    given-names: Robin
  - family-names: Blattmann
    given-names: Andreas
  - family-names: Lorenz
    given-names: Dominik
  - family-names: Esser
    given-names: Patrick
  - family-names: Ommer
    given-names: Björn
year: 2021
doi: 10.48550/arXiv.2112.10752
abstract: |
  By decomposing the image formation process into a sequential application of denoising
  autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on
  image data and beyond. Additionally, their formulation allows for a guiding mechanism
  to control the image generation process without retraining. However, since these
  models typically operate directly in pixel space, optimization of powerful DMs often
  consumes hundreds of GPU days and inference is expensive due to sequential
  evaluations. To enable DM training on limited computational resources while retaining
  their quality and flexibility, we apply them in the latent space of powerful
  pretrained autoencoders. In contrast to previous work, training diffusion models on
  such a representation allows for the first time to reach a near-optimal point between
  complexity reduction and detail preservation, greatly boosting visual fidelity.
  By introducing cross-attention layers into the model architecture, we turn diffusion
  models into powerful and flexible generators for general conditioning inputs such as
  text or bounding boxes and high-resolution synthesis becomes possible in a
  convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the
  art for image inpainting and highly competitive performance on various tasks,
  including unconditional image generation, semantic scene synthesis, and
  super-resolution, while significantly reducing computational requirements compared to
  pixel-based DMs.
input:
  - format: arXiv
    id: 2112.10752
    type: article
    url: https://arxiv.org/abs/2112.10752
output:
  - format: PDF
    url: https://arxiv.org/pdf/2112.10752.pdf
preferred-citation:
  type: article
  authors:
    - family-names: Rombach
      given-names: Robin
    - family-names: Blattmann
      given-names: Andreas
    - family-names: Lorenz
      given-names: Dominik
    - family-names: Esser
      given-names: Patrick
    - family-names: Ommer
      given-names: Björn
  doi: "10.48550/arXiv.2112.10752"
  eprint: "2112.10752"
  archivePrefix: arXiv
  primaryClass: cs.CV
  title: High-Resolution Image Synthesis with Latent Diffusion Models
  year: 2021