# 23: Super-resolution, or Deep Learning Foundations to Stable Diffusion

Main topic: deep learning models for image processing (applications / improvements)

## FID

First he covered a bug in the chapter-23 notebook ~~([it took me a long time to find](https://github.com/fastai/course22p2))~~ around FID (Fréchet Inception Distance).

> What is the Fréchet Inception Distance? A key metric for evaluating the quality and diversity of generated images: how similar the generated and real images are in feature space.
> Feature extraction > statistics > Fréchet distance computation.
> Lower is better; if == 0, the generated and real image distributions match exactly ~ perfect ~

```
BUGS:
- cos-schedule model: data scaled to -0.5 to +0.5
- sampling was x2
- the dependent variable is just "noise"
```

## U-Nets for super-resolution

Then on to Tiny Imagenet.

```
Fashion MNIST: the largest training images are only 28x28
Tiny Imagenet: 64x64 [downside: hard to find, only the Stanford copy turned up]
```

```
shutil <- for extracting the archive~
```

Step 1: create the dataset:

```python
from glob import glob
from pathlib import Path

class TinyDS:
    def __init__(self, path):
        self.path = Path(path)
        # careful: recursive=True is needed for '**' to match subdirectories
        self.files = glob(str(self.path/'**/*.JPEG'), recursive=True)
    def __len__(self): return len(self.files)
    def __getitem__(self, i):
        # the label is the grandparent folder name
        return self.files[i], Path(self.files[i]).parent.parent.name

tds = TinyDS(path/'train')
```

![image](https://hackmd.io/_uploads/HkrlpLVUel.png)

For `val`, the labels are stored separately, as above.

> the Shift+Tab hint trick~

### Data augmentation

Random resize crop: pick an area inside the image and zoom into it.
But Tiny Imagenet is already too small / too low-res, too poor => blurry.

![image](https://hackmd.io/_uploads/SJXp8PV8ex.png)

Cats are cute, any cat gets a like, but I thought I had taken off my glasses.

So instead: pad around the images, then randomly crop a 64x64 from the padded result (with a random array).
Use batch transforms, passing those transforms in (nn.Sequential: calls them one after another).
~~(I also don't think there is anything magical about it...)~~

get_dropmodel <= just reuse the one from before

> At the batch level, augmentation is not necessarily applied;
> it costs time and blurs the images.

![image](https://hackmd.io/_uploads/BJ8m_uNUgg.png)

Train with AdamW and mixed precision; run the learning rate finder; trained for 25 epochs.

![image](https://hackmd.io/_uploads/r1PL3OELxl.png)

59.3%

## How can we do better?

Read papers: how to get from 60% to 70%.

![image](https://hackmd.io/_uploads/SkaC6dVUle.png)

Use a real ResNet ~~([I watched this video](https://youtu.be/o_3mboe1jYI?si=Oq3WPSWIOQRcHv7a))~~

![image](https://hackmd.io/_uploads/HJgeTxFEIex.png)
![image](https://hackmd.io/_uploads/r1nmbYNLge.png)

```
1. no extra parameters
2. 1x1 convolution (x2)
3. /2 size (=> 32x32), via a stride of 2
```

![image](https://hackmd.io/_uploads/Sy_pzKV8lg.png)
![image](https://hackmd.io/_uploads/HJNJQtNUgg.png)

5 downsamples! 3+2+2+1+1 = 9 res blocks

Before: ![image](https://hackmd.io/_uploads/Syd2mKVLlx.png)
After: ![image](https://hackmd.io/_uploads/S1Dj7Y4Ueg.png)
More than double.

![image](https://hackmd.io/_uploads/ByJl4FV8le.png)

61.8% now.

## More augmentation: trivial augment

[Paper](https://arxiv.org/pdf/2103.10158)

![image](https://hackmd.io/_uploads/ByOxBYV8xg.png)

It did not perform that well on this dataset, so apply them one at a time (per image?! per transform?!)

![image](https://hackmd.io/_uploads/HJWWwKNUgx.png)

(Do the augmentation, then convert to tensor, then normalize.)
conv -> normalization -> activation (optional)

![image](https://hackmd.io/_uploads/HJGuZq4Uge.png)

64.9%

![image](https://hackmd.io/_uploads/Sk2i-qNUgx.png)

## 25: Super-resolution, not classification

The independent variable is scaled down to 32x32 pixels; the dependent variable is the original image.
Do a random crop within the padded image plus random flips: the cropping and flipping must be exactly the same for the independent and dependent variables.

The hard version of super-resolution can replace the deleted pixels with better ones.

Key point: the dataset is simpler here, no labels to load; train and validation are no different~

TfmDS: the transform is only applied to the independent variable.

64x64 -> 32x32 (downscale by 2 in each dimension) -> 64. If the quality is good, 32 -> 64, or even -> 128 etc.

## Denoising autoencoder (review: ch. 8)

![image](https://hackmd.io/_uploads/HkwnWs4Lgx.png)
![image](https://hackmd.io/_uploads/ByCnWs4Ixl.png)

[Unet](https://arxiv.org/abs/1505.04597)! Developed in 2015, originally for medical imaging.

![image](https://hackmd.io/_uploads/BkHWGs4Ugl.png)

32x32 pixels at the lowest resolution (the `-2` magic) -> downsampling ^ upsampling

Question: does `nn.ModuleList` not do anything?!
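My own answer sketch: `nn.ModuleList` really has no `forward()` of its own. It only *registers* the submodules so `.parameters()` and `.to(device)` can find them; you loop over it and call each layer yourself. A minimal toy down/up-sampling net (my sketch, not the course's actual model):

```python
import torch
from torch import nn

class MiniUNet(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList "does nothing" in forward; it only registers the
        # layers so the optimizer can see their parameters
        self.downs = nn.ModuleList([
            nn.Conv2d(ci, co, 3, stride=2, padding=1)
            for ci, co in [(3, 16), (16, 32)]])
        self.ups = nn.ModuleList([
            nn.ConvTranspose2d(ci, co, 2, stride=2)
            for ci, co in [(32, 16), (16, 3)]])

    def forward(self, x):
        skips = []
        for d in self.downs:        # we iterate and call each layer ourselves
            skips.append(x)         # save the activation for the skip connection
            x = d(x)
        for u in self.ups:
            x = u(x) + skips.pop()  # upsample, then add the matching skip
        return x

print(MiniUNet()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])
```

A plain Python list would hide the layers from `.parameters()`, so nothing would train; that registration is the whole point of `ModuleList`.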
## Unet

![image](https://hackmd.io/_uploads/H1DaQnEIlg.png)

8.6%: the loss really did drop a lot.

Input: ![image](https://hackmd.io/_uploads/ByJ-4hEIll.png)
Output: ![image](https://hackmd.io/_uploads/H1gmVnNLxg.png)

No difference to the naked eye ~~it must be my sinful eyes~~

![image](https://hackmd.io/_uploads/BydvEnVLlx.png)

This one you can see! But this is t (the target).

## Perceptual loss

Operation save-my-eyes~

Traditional pixel loss: compares pixel values: blurry (but safer).
Perceptual loss: compares the impression / concept / rough sketch.
Requirement: a classifier model.

![image](https://hackmd.io/_uploads/HJZLnhV8ll.png)

~~Is something wrong with my glasses?~~

Scaling factor: 0.1x

![image](https://hackmd.io/_uploads/rkG1J0N8xx.png)
![image](https://hackmd.io/_uploads/HksgyR4Uee.png)
![image](https://hackmd.io/_uploads/rk37yR4Uge.png)

Training:
fast.ai favourite trick: gradually unfreezing pretrained networks.
Use the actual (pretrained) weights for training and sampling; turn off requires_grad.
Then the trick: as if starting from random weights, set everything to unfrozen and do 20 epochs.

---

However, I still did not get it working~

![image](https://hackmd.io/_uploads/rJ4tG7BIgl.png)

Not waiting any longer~
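Appendix for future me: a minimal sketch of the perceptual-loss idea above, under my own assumptions: torchvision's VGG16 stands in for the required classifier (the lesson trains its own classifier instead), and the 0.1x scaling factor goes on the feature term to balance the magnitudes.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# frozen pretrained classifier as the feature extractor
# (VGG16 is a stand-in here; the lesson uses its own classifier)
feat_net = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in feat_net.parameters():
    p.requires_grad_(False)            # "turn off requires_grad"

def perceptual_loss(pred, target, scale=0.1):
    # pixel loss: "safer" but blurry; feature loss: makes it *look* right
    pixel = F.mse_loss(pred, target)
    feats = F.mse_loss(feat_net(pred), feat_net(target))
    return pixel + scale * feats       # the 0.1x scaling factor from above

# usage: pred/target are (N, 3, H, W) batches, already normalized
loss = perceptual_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```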