Basic Concepts
Reverse Process
Predict the noise and subtract it from the noisy image to generate a new image.
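A minimal sketch of the predict-and-subtract loop. `predict_noise` is a hypothetical stand-in for a trained noise predictor (it pretends the clean image is a flat value of 0.5), and the `1 / step` fraction is an assumption chosen for illustration. A real sampler such as DDPM uses schedule-dependent coefficients and re-injects a little noise at each step.

```python
import random


def predict_noise(noisy_image, step):
    """Hypothetical stand-in for a trained noise predictor.

    It pretends the clean image is a flat value of 0.5, so the
    "noise" is each pixel's deviation from 0.5. `step` is unused here,
    but a real predictor takes it as an input.
    """
    return [pixel - 0.5 for pixel in noisy_image]


def reverse_process(num_pixels=4, num_steps=10, seed=0):
    rng = random.Random(seed)
    # Start from pure Gaussian noise.
    image = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
    # Walk the steps backwards: num_steps, ..., 2, 1.
    for step in range(num_steps, 0, -1):
        noise = predict_noise(image, step)
        # Subtract a fraction of the predicted noise (assumed: 1 / step).
        image = [pixel - n / step for pixel, n in zip(image, noise)]
    return image
```

With this toy predictor the final step removes all remaining deviation, so every pixel ends at 0.5.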
Training
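Use the forward process: randomly sample noise and add it to the original image. The noisy image is the input, and the sampled noise serves as the ground truth for training.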
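A sketch of how one training example could be built with the forward process. The mixing rule follows the standard DDPM form, where the noisy image is sqrt(alpha_bar) * clean + sqrt(1 - alpha_bar) * noise. The `ALPHA_BARS` schedule and the function names are assumptions made for illustration, not values from these notes.

```python
import math
import random

# Assumed noise schedule: a smaller alpha_bar means a noisier image.
ALPHA_BARS = [0.9, 0.7, 0.5, 0.3, 0.1]


def make_training_example(clean_image, rng):
    """Return (noisy_image, step, noise); the noise is the training target."""
    step = rng.randrange(len(ALPHA_BARS))               # random step
    noise = [rng.gauss(0.0, 1.0) for _ in clean_image]  # random sample
    keep = math.sqrt(ALPHA_BARS[step])
    mix = math.sqrt(1.0 - ALPHA_BARS[step])
    noisy_image = [keep * x + mix * e for x, e in zip(clean_image, noise)]
    return noisy_image, step, noise


def mse(predicted, target):
    """Mean squared error between predicted noise and ground-truth noise."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)
```

The noise predictor would receive `noisy_image` and `step`, and `mse` would compare its output with `noise`.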
Text to image
Add the text as an additional input.
Framework
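- Text encoder: turns the text description into a vector.
- Generation model: a diffusion model generates an intermediate product (a compressed version of the image); pink in the original figure marks the noise.
- Decoder: restores the compressed version to the full image, either by turning the small intermediate image into a large one or by decoding a latent representation with an auto-encoder.
The three components are trained separately and independently.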
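The data flow between the three components can be sketched as below. All three functions are hypothetical toy stand-ins with made-up internals; only the interfaces (text to vector, vector to compressed result, compressed result to full-size output) reflect the framework described above.

```python
import random
from typing import List

Vector = List[float]


def text_encoder(prompt: str, dim: int = 4) -> Vector:
    """Hypothetical stand-in: turn a text description into a vector."""
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def generation_model(text_vec: Vector, latent_size: int = 4, steps: int = 3) -> Vector:
    """Hypothetical stand-in: refine noise into a small, text-dependent latent."""
    rng = random.Random(0)
    latent = [rng.gauss(0.0, 1.0) for _ in range(latent_size)]
    target = sum(text_vec) / len(text_vec)
    for _ in range(steps):
        # Move halfway toward a value that depends on the text.
        latent = [0.5 * (x + target) for x in latent]
    return latent


def decoder(latent: Vector, scale: int = 2) -> Vector:
    """Hypothetical stand-in: enlarge the compressed result by repeating values."""
    return [x for x in latent for _ in range(scale)]


def text_to_image(prompt: str) -> Vector:
    # Each stage only needs the previous stage's output.
    return decoder(generation_model(text_encoder(prompt)))
```

Because each stage depends only on the previous stage's output format, the three parts can be built and swapped independently.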
Stable Diffusion
DALL-E
Google Imagen
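1. Text Encoder
Text models such as GPT or T5 can be used.
It has a large effect on the result: it is what lets the image match the text description. The model has to understand the text before it can know what to generate!
Evaluating generation quality
How do we evaluate how good the generated images are?
FID (Fréchet Inception Distance)
How good is the quality of the generated images?
Compute the distance between the distribution of real images and the distribution of generated images, measured on features from a pretrained Inception network and assuming both distributions are Gaussian.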
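To make the Gaussian assumption concrete, here is a sketch of the Fréchet distance for the one-dimensional case, where it reduces to (mu_r - mu_g)^2 + (sigma_r - sigma_g)^2. The function name is hypothetical. Real FID works on high-dimensional Inception features with full covariance matrices, which this toy version does not attempt.

```python
import math
import statistics


def frechet_distance_1d(real_features, generated_features):
    """Frechet distance between two 1-D Gaussians fitted to the samples.

    General form: ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 * sqrt(S_r @ S_g)).
    In one dimension the trace term collapses to (sigma_r - sigma_g)^2.
    """
    mu_r = statistics.fmean(real_features)
    mu_g = statistics.fmean(generated_features)
    sigma_r = math.sqrt(statistics.pvariance(real_features))
    sigma_g = math.sqrt(statistics.pvariance(generated_features))
    return (mu_r - mu_g) ** 2 + (sigma_r - sigma_g) ** 2
```

A smaller value means the two distributions are closer, so lower FID is better.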
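CLIP (Contrastive Language-Image Pretraining)
Do the image and the text correspond?
Training: uses a large number of paired images and text.
Evaluation: feed in the description and the generated image, compute the distance between the two resulting vectors, and judge how well they match.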
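A sketch of the comparison step, assuming the text and image have already been turned into vectors by some encoder (not shown). Cosine similarity is used here as the closeness measure, which is a common choice for CLIP-style scores; the function name is hypothetical.

```python
import math


def cosine_similarity(text_vec, image_vec):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(t * i for t, i in zip(text_vec, image_vec))
    norm_text = math.sqrt(sum(t * t for t in text_vec))
    norm_image = math.sqrt(sum(i * i for i in image_vec))
    return dot / (norm_text * norm_image)
```

A higher score suggests the generated image matches the description more closely.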
3. Decoder
Small image to large image
Latent representation
Restored to an image through an auto-encoder
2. Generation model
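Noise is added to the latent representation rather than to the image itself.
The inputs are the text, the noisy latent representation, and the step; training measures how far the predicted noise is from the ground truth.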
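A sketch of the loss for one training example in latent space. `noise_predictor` is a hypothetical callable taking the three inputs listed above, and the linear `step / num_steps` noise strength is an assumption made for illustration, not the schedule of any particular model.

```python
import random


def training_loss(noise_predictor, latent, text_vec, step, num_steps, rng):
    """Mean squared error between predicted noise and the noise actually added."""
    noise = [rng.gauss(0.0, 1.0) for _ in latent]  # ground truth
    strength = step / num_steps                    # assumed schedule
    noisy_latent = [x + strength * e for x, e in zip(latent, noise)]
    predicted = noise_predictor(noisy_latent, text_vec, step)
    return sum((p - e) ** 2 for p, e in zip(predicted, noise)) / len(noise)


def zero_predictor(noisy_latent, text_vec, step):
    """Untrained baseline that always predicts 'no noise'."""
    return [0.0] * len(noisy_latent)


loss = training_loss(zero_predictor, [0.1, -0.3, 0.7], [1.0, 0.0],
                     step=5, num_steps=10, rng=random.Random(0))
```

Training would adjust the predictor to push this loss toward zero.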