diffusion - HackMD

Basic Concepts

Reverse Process

Predict the noise and subtract it from the noisy image to generate a new image.
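As a rough illustration of "predict the noise, then subtract it", here is a minimal NumPy sketch of one denoising step run in a loop. The update rule follows the common DDPM formulation, which this note does not spell out, so the linear schedule values, the function names `make_schedule` and `reverse_step`, and the stand-in `predict_noise` (which simply returns zeros in place of a trained network) are all assumptions.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed values, not taken from the note)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def reverse_step(x_t, t, predicted_noise, betas, alphas, alpha_bars, rng):
    """One denoising step: subtract the scaled predicted noise from x_t."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * predicted_noise) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # final step: no fresh noise is added
    z = rng.standard_normal(x_t.shape)
    return mean + np.sqrt(betas[t]) * z

def predict_noise(x_t, t):
    """Hypothetical stand-in for the trained noise predictor."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
num_steps = 50
betas, alphas, alpha_bars = make_schedule(num_steps=num_steps)
x = rng.standard_normal((8, 8))  # start from pure noise
for t in reversed(range(num_steps)):
    x = reverse_step(x, t, predict_noise(x, t), betas, alphas, alpha_bars, rng)
```

With a real trained predictor in place of `predict_noise`, the same loop would gradually turn the initial noise into an image.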

Training

Use the forward process: randomly sample noise and add it to the original image, then train with that noise as the ground truth.
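The training recipe above can be sketched as follows. This is a minimal example assuming the usual closed-form noising rule and a mean-squared-error loss; the schedule values and the names `add_noise` and `noise_loss` are illustrative assumptions, not something the note specifies.

```python
import numpy as np

def add_noise(x0, t, alpha_bars, rng):
    """Forward process: mix the clean image with sampled Gaussian noise."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise  # the sampled noise is the training target

def noise_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and sampled noise."""
    return float(np.mean((predicted_noise - true_noise) ** 2))

rng = np.random.default_rng(0)
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed schedule
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # toy "image"
t = int(rng.integers(0, 1000))            # random step for this sample
x_t, noise = add_noise(x0, t, alpha_bars, rng)

# A placeholder predictor that always outputs zeros; a real model would
# take (x_t, t) as input and be updated to reduce this loss.
loss = noise_loss(np.zeros_like(noise), noise)
```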

Text to image

Add the text as an additional input.

Framework

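Text encoder: turns the text description into a vector.
Generation model: a diffusion model generates an intermediate product (a compressed version of the image). In the figure, the pink block is the noise.
Decoder: restores the compressed version to the full image, either by turning the small intermediate image into a large one, or by decoding a latent representation with an auto-encoder.

The three components are trained separately.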
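To make the data flow between the three components concrete, here is a toy sketch of the pipeline. Every function body is a hypothetical stand-in (character counts instead of a language model, a fixed blend instead of a diffusion model, pixel repetition instead of a learned decoder); only the way the pieces connect reflects the framework described above.

```python
import numpy as np

def text_encoder(prompt, dim=16):
    """Hypothetical stand-in: turns a text description into a vector."""
    vec = np.zeros(dim)
    for i, ch in enumerate(prompt):
        vec[i % dim] += ord(ch)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def generation_model(text_vec, latent_shape=(4, 4), steps=10, seed=0):
    """Hypothetical stand-in: starts from noise and refines a small latent."""
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal(latent_shape)
    condition = np.resize(text_vec, latent_shape)
    for _ in range(steps):
        latent = 0.9 * latent + 0.1 * condition
    return latent

def decoder(latent, scale=8):
    """Hypothetical stand-in: blows the small latent up to a large image."""
    return np.kron(latent, np.ones((scale, scale)))

def text_to_image(prompt):
    text_vec = text_encoder(prompt)         # 1. text -> vector
    latent = generation_model(text_vec)     # 2. vector + noise -> compressed image
    return decoder(latent)                  # 3. compressed image -> full image

image = text_to_image("a cat in the snow")
```

Because each stage only depends on the output format of the previous one, the stages can be trained separately and swapped independently.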

Stable Diffusion

DALL-E

Google Imagen

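1. Text Encoder

Text models such as GPT or T5 can be used.
The text encoder has a large impact on the result: it is what ties an image to its text description. The model has to understand the text before it can generate from it.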

Evaluating Generation Quality

How do we evaluate the quality of generated images?

FID (Fréchet inception distance)

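How good are the generated images?

Compute the distance between the distribution of real images and the distribution of generated images, assuming both are Gaussian. The distributions are taken over feature vectors from a pretrained Inception network, and a smaller distance means better quality.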
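The Gaussian assumption gives the distance a closed form: the squared difference of the means plus a trace term over the covariances. A minimal NumPy sketch is below. The random arrays are toy stand-ins for Inception features, and `frechet_distance` is an illustrative name. The matrix square root is handled through eigenvalues to keep the example NumPy-only, which is an implementation choice rather than the standard tooling.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input has shape (num_samples, feature_dim) with feature_dim >= 2.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr(sqrt(cov_r @ cov_g)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real = rng.standard_normal((500, 4))        # toy stand-in for real-image features
close = rng.standard_normal((500, 4))       # drawn from the same distribution
far = rng.standard_normal((500, 4)) + 3.0   # drawn from a shifted distribution

score_close = frechet_distance(real, close)
score_far = frechet_distance(real, far)
```

Here `score_far` comes out much larger than `score_close`, matching the intuition that generated images whose features drift away from the real ones get a worse score.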

CLIP (Contrastive Language-Image Pretraining)

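Does the image match the text?

Training: uses a large number of paired images and texts.
Evaluation: feed the description and the generated image into CLIP, then compute the distance between the two resulting vectors to judge how well they match.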
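The evaluation step can be sketched as a similarity between two embedding vectors. The example below assumes cosine similarity as the measure and uses hand-written vectors. In practice the embeddings would come from the text and image encoders of a pretrained CLIP model, which this sketch does not load.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings, made up for illustration only.
text_emb = np.array([0.9, 0.1, 0.3])
matching_image_emb = np.array([0.8, 0.2, 0.4])
unrelated_image_emb = np.array([-0.5, 0.9, -0.1])

match_score = cosine_similarity(text_emb, matching_image_emb)
mismatch_score = cosine_similarity(text_emb, unrelated_image_emb)
```

A generated image whose embedding points in nearly the same direction as the text embedding gets a higher score than one that does not.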

3. Decoder

Small image to large image

Latent representation

Restored to an image through an auto-encoder.
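To show what "restoring through an auto-encoder" means, here is a toy linear auto-encoder fitted with PCA. Real systems use a trained neural network instead, so the PCA fit, the toy data, and the names `encode` and `decode` are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "images": 200 flattened 8x8 samples that lie on a 4-dimensional subspace.
basis = rng.standard_normal((4, 64))
images = rng.standard_normal((200, 4)) @ basis

# Fit the linear auto-encoder: the top principal directions act as its weights.
mean = images.mean(axis=0)
_, _, vt = np.linalg.svd(images - mean, full_matrices=False)
components = vt[:4]  # shape (4, 64)

def encode(x):
    """Image (64 values) -> latent representation (4 values)."""
    return (x - mean) @ components.T

def decode(z):
    """Latent representation (4 values) -> reconstructed image (64 values)."""
    return z @ components + mean

latent = encode(images[0])
restored = decode(latent)
```

The decoder half is the part used at generation time: the diffusion model produces `latent`, and `decode` turns it back into an image.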

2. Generation model

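The noise is added to the latent representation, not to the image itself.

The inputs are the text, the noisy latent representation, and the step. The model is trained on how far its predicted noise is from the ground truth.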
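A single training step with these three inputs might look like the sketch below. The one-layer `noise_predictor`, the schedule values, and the mean-squared-error loss are assumptions made for illustration; a real generation model would be a much larger network, and the latent and text vectors would come from the auto-encoder and the text encoder.

```python
import numpy as np

def noise_predictor(noisy_latent, text_vec, step, weights):
    """Hypothetical toy model: one linear layer over [latent, text, step]."""
    features = np.concatenate([noisy_latent.ravel(), text_vec, [step / 1000.0]])
    return (weights @ features).reshape(noisy_latent.shape)

def training_loss(latent, text_vec, step, alpha_bars, weights, rng):
    """Noise the latent at the given step, then score the noise prediction."""
    noise = rng.standard_normal(latent.shape)  # ground truth
    noisy_latent = (np.sqrt(alpha_bars[step]) * latent
                    + np.sqrt(1.0 - alpha_bars[step]) * noise)
    predicted = noise_predictor(noisy_latent, text_vec, step, weights)
    return float(np.mean((predicted - noise) ** 2))

rng = np.random.default_rng(0)
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed schedule
latent = rng.standard_normal((4, 4))   # placeholder for an encoded image
text_vec = rng.standard_normal(8)      # placeholder for an encoded prompt
weights = 0.01 * rng.standard_normal((16, 16 + 8 + 1))
loss = training_loss(latent, text_vec, step=500, alpha_bars=alpha_bars,
                     weights=weights, rng=rng)
```

Training would repeat this over many (image, text, step) samples and adjust `weights` to lower the loss.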
