Image Processing Final Project
pre-trained model
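The pre-trained model contains three objects:
- _G: a snapshot of the image generator taken partway through training
- _D: a snapshot of the image discriminator taken partway through training
- Gs: the finished image generator (in the StyleGAN release, a long-term average of the generator weights)
The first two objects can be used to resume training, while Gs is the generator that can be used directly.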
Gs.run()
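Gs.run() takes n latent vectors as input and returns n images. The random seed used earlier to draw those vectors determines what the output images look like.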
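As a rough sketch of that call, the snippet below draws n latent vectors from a fixed seed and passes them to Gs.run(). The seed value, the helper names make_latents and generate_images, and the truncation_psi setting are assumptions for illustration. generate_images also assumes that dnnlib from the StyleGAN repository is importable and that Gs has already been unpickled, so only the latent-drawing part runs on its own.

```python
import numpy as np

def make_latents(seed, n, dim=512):
    """Draw n latent vectors of width dim from a seeded generator (hypothetical helper)."""
    return np.random.RandomState(seed).randn(n, dim)

def generate_images(Gs, latents):
    """Run the generator on the latents; assumes the StyleGAN repo's dnnlib is on the path."""
    import dnnlib.tflib as tflib  # imported lazily so make_latents works without it
    fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
    # The second argument is the label input, unused here (labels_in has width 0).
    return Gs.run(latents, None, truncation_psi=0.7,
                  randomize_noise=True, output_transform=fmt)

latents = make_latents(seed=5, n=3)
print(latents.shape)  # (3, 512): one row per requested image
```

Because the latents come from a seeded generator, calling make_latents with the same seed again yields the same vectors, which is what makes a given output image reproducible.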
Gs.get_output_for()
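Gs.get_output_for() returns the generator's output as a TensorFlow tensor rather than a finished image, so it can be fed directly into another TensorFlow model without first using another package to convert the images to NumPy arrays.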
Gs.components.mapping(), Gs.components.synthesis()
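Gs.components.mapping() maps the input latent vector into the \(\mathcal{W}\) space; this is the first half of the model.
Gs.components.synthesis() takes the vector in \(\mathcal{W}\) space and combines it with other images or with noise to synthesize the output image.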

dataset
loss function
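- The paper uses WGAN-GP as the loss function for CelebA-HQ and for the FFHQ baseline configuration; the remaining FFHQ configurations use non-saturating loss with R1 regularization
- The paper states that the loss function itself was not modified
See the end of the third paragraph of Section 2.1, below Figure 2:
"We found these choices to give the best results. Our contributions do not modify the loss function."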
github link to loss function
it's so boring to read this loss function 張議隆
training
environment
python 3.6 (using conda)
tensorflow 2.6.2 (1.10.0 or newer)
numpy 1.19.5 (1.14.3 or newer)
DRAM 24GB (11GB or more)
NVIDIA driver version 520.56.06 (391.35 or newer)
CUDA 11.8 (9.0 or newer)
cuDNN 8.7.0.84 (7.3.1 or newer)
model framework
Framework ppts
Rough summary on HackMD
12/16 林凱翔張議隆
generator with 1024X1024 images
Uses 1024X1024 images from the FFHQ dataset.
Gs Params OutputShape WeightShape
--- --- --- ---
latents_in - (?, 512) -
labels_in - (?, 0) -
lod - () -
dlatent_avg - (512,) -
G_mapping/latents_in - (?, 512) -
G_mapping/labels_in - (?, 0) -
G_mapping/PixelNorm - (?, 512) -
G_mapping/Dense0 262656 (?, 512) (512, 512)
G_mapping/Dense1 262656 (?, 512) (512, 512)
G_mapping/Dense2 262656 (?, 512) (512, 512)
G_mapping/Dense3 262656 (?, 512) (512, 512)
G_mapping/Dense4 262656 (?, 512) (512, 512)
G_mapping/Dense5 262656 (?, 512) (512, 512)
G_mapping/Dense6 262656 (?, 512) (512, 512)
G_mapping/Dense7 262656 (?, 512) (512, 512)
G_mapping/Broadcast - (?, 18, 512) -
G_mapping/dlatents_out - (?, 18, 512) -
Truncation - (?, 18, 512) -
G_synthesis/dlatents_in - (?, 18, 512) -
G_synthesis/4x4/Const 534528 (?, 512, 4, 4) (512,)
G_synthesis/4x4/Conv 2885632 (?, 512, 4, 4) (3, 3, 512, 512)
G_synthesis/ToRGB_lod8 1539 (?, 3, 4, 4) (1, 1, 512, 3)
G_synthesis/8x8/Conv0_up 2885632 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/8x8/Conv1 2885632 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/ToRGB_lod7 1539 (?, 3, 8, 8) (1, 1, 512, 3)
G_synthesis/Upscale2D - (?, 3, 8, 8) -
G_synthesis/Grow_lod7 - (?, 3, 8, 8) -
G_synthesis/16x16/Conv0_up 2885632 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/16x16/Conv1 2885632 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/ToRGB_lod6 1539 (?, 3, 16, 16) (1, 1, 512, 3)
G_synthesis/Upscale2D_1 - (?, 3, 16, 16) -
G_synthesis/Grow_lod6 - (?, 3, 16, 16) -
G_synthesis/32x32/Conv0_up 2885632 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/32x32/Conv1 2885632 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/ToRGB_lod5 1539 (?, 3, 32, 32) (1, 1, 512, 3)
G_synthesis/Upscale2D_2 - (?, 3, 32, 32) -
G_synthesis/Grow_lod5 - (?, 3, 32, 32) -
G_synthesis/64x64/Conv0_up 1442816 (?, 256, 64, 64) (3, 3, 512, 256)
G_synthesis/64x64/Conv1 852992 (?, 256, 64, 64) (3, 3, 256, 256)
G_synthesis/ToRGB_lod4 771 (?, 3, 64, 64) (1, 1, 256, 3)
G_synthesis/Upscale2D_3 - (?, 3, 64, 64) -
G_synthesis/Grow_lod4 - (?, 3, 64, 64) -
G_synthesis/128x128/Conv0_up 426496 (?, 128, 128, 128) (3, 3, 256, 128)
G_synthesis/128x128/Conv1 279040 (?, 128, 128, 128) (3, 3, 128, 128)
G_synthesis/ToRGB_lod3 387 (?, 3, 128, 128) (1, 1, 128, 3)
G_synthesis/Upscale2D_4 - (?, 3, 128, 128) -
G_synthesis/Grow_lod3 - (?, 3, 128, 128) -
G_synthesis/256x256/Conv0_up 139520 (?, 64, 256, 256) (3, 3, 128, 64)
G_synthesis/256x256/Conv1 102656 (?, 64, 256, 256) (3, 3, 64, 64)
G_synthesis/ToRGB_lod2 195 (?, 3, 256, 256) (1, 1, 64, 3)
G_synthesis/Upscale2D_5 - (?, 3, 256, 256) -
G_synthesis/Grow_lod2 - (?, 3, 256, 256) -
G_synthesis/512x512/Conv0_up 51328 (?, 32, 512, 512) (3, 3, 64, 32)
G_synthesis/512x512/Conv1 42112 (?, 32, 512, 512) (3, 3, 32, 32)
G_synthesis/ToRGB_lod1 99 (?, 3, 512, 512) (1, 1, 32, 3)
G_synthesis/Upscale2D_6 - (?, 3, 512, 512) -
G_synthesis/Grow_lod1 - (?, 3, 512, 512) -
G_synthesis/1024x1024/Conv0_up 21056 (?, 16, 1024, 1024) (3, 3, 32, 16)
G_synthesis/1024x1024/Conv1 18752 (?, 16, 1024, 1024) (3, 3, 16, 16)
G_synthesis/ToRGB_lod0 51 (?, 3, 1024, 1024) (1, 1, 16, 3)
G_synthesis/Upscale2D_7 - (?, 3, 1024, 1024) -
G_synthesis/Grow_lod0 - (?, 3, 1024, 1024) -
G_synthesis/images_out - (?, 3, 1024, 1024) -
G_synthesis/lod - () -
G_synthesis/noise0 - (1, 1, 4, 4) -
G_synthesis/noise1 - (1, 1, 4, 4) -
G_synthesis/noise2 - (1, 1, 8, 8) -
G_synthesis/noise3 - (1, 1, 8, 8) -
G_synthesis/noise4 - (1, 1, 16, 16) -
G_synthesis/noise5 - (1, 1, 16, 16) -
G_synthesis/noise6 - (1, 1, 32, 32) -
G_synthesis/noise7 - (1, 1, 32, 32) -
G_synthesis/noise8 - (1, 1, 64, 64) -
G_synthesis/noise9 - (1, 1, 64, 64) -
G_synthesis/noise10 - (1, 1, 128, 128) -
G_synthesis/noise11 - (1, 1, 128, 128) -
G_synthesis/noise12 - (1, 1, 256, 256) -
G_synthesis/noise13 - (1, 1, 256, 256) -
G_synthesis/noise14 - (1, 1, 512, 512) -
G_synthesis/noise15 - (1, 1, 512, 512) -
G_synthesis/noise16 - (1, 1, 1024, 1024) -
G_synthesis/noise17 - (1, 1, 1024, 1024) -
images_out - (?, 3, 1024, 1024) -
--- --- --- ---
Total 26219627
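The parameter counts in the table can be reproduced by hand, which is a useful check on how each layer is built. The sketch below assumes that every synthesis layer carries, besides its convolution weights, a per-channel noise weight, a bias, and a dense layer from the 512-wide \(\mathcal{W}\) vector to two values per channel. This breakdown is inferred from the numbers rather than copied from the repository, and the function names are hypothetical.

```python
DLATENT = 512  # width of the W space, as in the dlatents_out rows

def nf(res):
    """Feature maps at a resolution: 512 up to 32x32, then halving (read off the table)."""
    return min(8192 // (res // 2), 512)

def epilogue(ch):
    """Assumed per-layer extras: noise weight + bias + style dense (weights and biases)."""
    return ch + ch + DLATENT * 2 * ch + 2 * ch

def conv(cin, cout, k=3):
    """Weights of a k x k convolution, without bias."""
    return k * k * cin * cout

def generator_params(resolution):
    total = 8 * (DLATENT * DLATENT + DLATENT)       # G_mapping/Dense0..Dense7
    ch = nf(4)
    total += ch * 4 * 4 + epilogue(ch)              # 4x4/Const
    total += conv(ch, ch) + epilogue(ch)            # 4x4/Conv
    total += ch * 3 + 3                             # ToRGB at 4x4
    res = 8
    while res <= resolution:
        cin, cout = nf(res // 2), nf(res)
        total += conv(cin, cout) + epilogue(cout)   # Conv0_up
        total += conv(cout, cout) + epilogue(cout)  # Conv1
        total += cout * 3 + 3                       # ToRGB at this resolution
        res *= 2
    return total

print(generator_params(1024))  # 26219627, matching the Total row
```

For example, 4x4/Conv comes out as 3*3*512*512 + 512 + 512 + 512*1024 + 1024 = 2885632, which is the value listed in the table.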
generator with 256X256 ffhq images
G Params OutputShape WeightShape
--- --- --- ---
latents_in - (?, 512) -
labels_in - (?, 0) -
lod - () -
dlatent_avg - (512,) -
G_mapping/latents_in - (?, 512) -
G_mapping/labels_in - (?, 0) -
G_mapping/PixelNorm - (?, 512) -
G_mapping/Dense0 262656 (?, 512) (512, 512)
G_mapping/Dense1 262656 (?, 512) (512, 512)
G_mapping/Dense2 262656 (?, 512) (512, 512)
G_mapping/Dense3 262656 (?, 512) (512, 512)
G_mapping/Dense4 262656 (?, 512) (512, 512)
G_mapping/Dense5 262656 (?, 512) (512, 512)
G_mapping/Dense6 262656 (?, 512) (512, 512)
G_mapping/Dense7 262656 (?, 512) (512, 512)
G_mapping/Broadcast - (?, 14, 512) -
G_mapping/dlatents_out - (?, 14, 512) -
Truncation - (?, 14, 512) -
G_synthesis/dlatents_in - (?, 14, 512) -
G_synthesis/4x4/Const 534528 (?, 512, 4, 4) (512,)
G_synthesis/4x4/Conv 2885632 (?, 512, 4, 4) (3, 3, 512, 512)
G_synthesis/ToRGB_lod6 1539 (?, 3, 4, 4) (1, 1, 512, 3)
G_synthesis/8x8/Conv0_up 2885632 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/8x8/Conv1 2885632 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/ToRGB_lod5 1539 (?, 3, 8, 8) (1, 1, 512, 3)
G_synthesis/Upscale2D - (?, 3, 8, 8) -
G_synthesis/Grow_lod5 - (?, 3, 8, 8) -
G_synthesis/16x16/Conv0_up 2885632 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/16x16/Conv1 2885632 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/ToRGB_lod4 1539 (?, 3, 16, 16) (1, 1, 512, 3)
G_synthesis/Upscale2D_1 - (?, 3, 16, 16) -
G_synthesis/Grow_lod4 - (?, 3, 16, 16) -
G_synthesis/32x32/Conv0_up 2885632 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/32x32/Conv1 2885632 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/ToRGB_lod3 1539 (?, 3, 32, 32) (1, 1, 512, 3)
G_synthesis/Upscale2D_2 - (?, 3, 32, 32) -
G_synthesis/Grow_lod3 - (?, 3, 32, 32) -
G_synthesis/64x64/Conv0_up 1442816 (?, 256, 64, 64) (3, 3, 512, 256)
G_synthesis/64x64/Conv1 852992 (?, 256, 64, 64) (3, 3, 256, 256)
G_synthesis/ToRGB_lod2 771 (?, 3, 64, 64) (1, 1, 256, 3)
G_synthesis/Upscale2D_3 - (?, 3, 64, 64) -
G_synthesis/Grow_lod2 - (?, 3, 64, 64) -
G_synthesis/128x128/Conv0_up 426496 (?, 128, 128, 128) (3, 3, 256, 128)
G_synthesis/128x128/Conv1 279040 (?, 128, 128, 128) (3, 3, 128, 128)
G_synthesis/ToRGB_lod1 387 (?, 3, 128, 128) (1, 1, 128, 3)
G_synthesis/Upscale2D_4 - (?, 3, 128, 128) -
G_synthesis/Grow_lod1 - (?, 3, 128, 128) -
G_synthesis/256x256/Conv0_up 139520 (?, 64, 256, 256) (3, 3, 128, 64)
G_synthesis/256x256/Conv1 102656 (?, 64, 256, 256) (3, 3, 64, 64)
G_synthesis/ToRGB_lod0 195 (?, 3, 256, 256) (1, 1, 64, 3)
G_synthesis/Upscale2D_5 - (?, 3, 256, 256) -
G_synthesis/Grow_lod0 - (?, 3, 256, 256) -
G_synthesis/images_out - (?, 3, 256, 256) -
G_synthesis/lod - () -
G_synthesis/noise0 - (1, 1, 4, 4) -
G_synthesis/noise1 - (1, 1, 4, 4) -
G_synthesis/noise2 - (1, 1, 8, 8) -
G_synthesis/noise3 - (1, 1, 8, 8) -
G_synthesis/noise4 - (1, 1, 16, 16) -
G_synthesis/noise5 - (1, 1, 16, 16) -
G_synthesis/noise6 - (1, 1, 32, 32) -
G_synthesis/noise7 - (1, 1, 32, 32) -
G_synthesis/noise8 - (1, 1, 64, 64) -
G_synthesis/noise9 - (1, 1, 64, 64) -
G_synthesis/noise10 - (1, 1, 128, 128) -
G_synthesis/noise11 - (1, 1, 128, 128) -
G_synthesis/noise12 - (1, 1, 256, 256) -
G_synthesis/noise13 - (1, 1, 256, 256) -
images_out - (?, 3, 256, 256) -
--- --- --- ---
Total 26086229
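Comparing the two generator tables, the broadcast width drops from 18 to 14 and the noise inputs from noise0..noise17 to noise0..noise13. Both follow from having two styled layers per resolution from 4x4 up to the output size. A small sketch of that arithmetic, with hypothetical function names:

```python
import math

def num_style_layers(resolution):
    """Two styled layers per resolution level from 4x4 up to the output resolution."""
    return 2 * int(math.log2(resolution)) - 2

def noise_shapes(resolution):
    """Shape of each noise input: layers 2k and 2k+1 share the resolution 4 * 2**k."""
    shapes = []
    for layer in range(num_style_layers(resolution)):
        res = 4 * 2 ** (layer // 2)
        shapes.append((1, 1, res, res))
    return shapes

print(num_style_layers(1024), num_style_layers(256))  # 18 14
print(noise_shapes(256)[-1])                          # (1, 1, 256, 256)
```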
discriminator with 256X256 ffhq images
D Params OutputShape WeightShape
--- --- --- ---
images_in - (?, 3, 256, 256) -
labels_in - (?, 0) -
lod - () -
FromRGB_lod0 256 (?, 64, 256, 256) (1, 1, 3, 64)
256x256/Conv0 36928 (?, 64, 256, 256) (3, 3, 64, 64)
256x256/Conv1_down 73856 (?, 128, 128, 128) (3, 3, 64, 128)
Downscale2D - (?, 3, 128, 128) -
FromRGB_lod1 512 (?, 128, 128, 128) (1, 1, 3, 128)
Grow_lod0 - (?, 128, 128, 128) -
128x128/Conv0 147584 (?, 128, 128, 128) (3, 3, 128, 128)
128x128/Conv1_down 295168 (?, 256, 64, 64) (3, 3, 128, 256)
Downscale2D_1 - (?, 3, 64, 64) -
FromRGB_lod2 1024 (?, 256, 64, 64) (1, 1, 3, 256)
Grow_lod1 - (?, 256, 64, 64) -
64x64/Conv0 590080 (?, 256, 64, 64) (3, 3, 256, 256)
64x64/Conv1_down 1180160 (?, 512, 32, 32) (3, 3, 256, 512)
Downscale2D_2 - (?, 3, 32, 32) -
FromRGB_lod3 2048 (?, 512, 32, 32) (1, 1, 3, 512)
Grow_lod2 - (?, 512, 32, 32) -
32x32/Conv0 2359808 (?, 512, 32, 32) (3, 3, 512, 512)
32x32/Conv1_down 2359808 (?, 512, 16, 16) (3, 3, 512, 512)
Downscale2D_3 - (?, 3, 16, 16) -
FromRGB_lod4 2048 (?, 512, 16, 16) (1, 1, 3, 512)
Grow_lod3 - (?, 512, 16, 16) -
16x16/Conv0 2359808 (?, 512, 16, 16) (3, 3, 512, 512)
16x16/Conv1_down 2359808 (?, 512, 8, 8) (3, 3, 512, 512)
Downscale2D_4 - (?, 3, 8, 8) -
FromRGB_lod5 2048 (?, 512, 8, 8) (1, 1, 3, 512)
Grow_lod4 - (?, 512, 8, 8) -
8x8/Conv0 2359808 (?, 512, 8, 8) (3, 3, 512, 512)
8x8/Conv1_down 2359808 (?, 512, 4, 4) (3, 3, 512, 512)
Downscale2D_5 - (?, 3, 4, 4) -
FromRGB_lod6 2048 (?, 512, 4, 4) (1, 1, 3, 512)
Grow_lod5 - (?, 512, 4, 4) -
4x4/MinibatchStddev - (?, 513, 4, 4) -
4x4/Conv 2364416 (?, 512, 4, 4) (3, 3, 513, 512)
4x4/Dense0 4194816 (?, 512) (8192, 512)
4x4/Dense1 513 (?, 1) (512, 1)
scores_out - (?, 1) -
--- --- --- ---
Total 23052353
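The discriminator's parameter counts can be reproduced from the weight shapes in the table: each layer contributes its weights plus one bias per output channel. The sketch below uses hypothetical names and assumes the channel counts follow the pattern visible in the table (512 up to 32x32, halving above that).

```python
def nf(res):
    """Feature maps at a resolution: 512 up to 32x32, then halving (read off the table)."""
    return min(8192 // (res // 2), 512)

def discriminator_params(resolution):
    total = 0
    res = resolution
    while res >= 8:
        c, c_down = nf(res), nf(res // 2)
        total += 3 * c + c                # FromRGB: 1x1 conv from 3 channels, plus bias
        total += 9 * c * c + c            # Conv0: 3x3 conv, plus bias
        total += 9 * c * c_down + c_down  # Conv1_down: 3x3 conv, plus bias
        res //= 2
    c = nf(4)
    total += 3 * c + c                    # FromRGB at 4x4
    total += 9 * (c + 1) * c + c          # 4x4/Conv: MinibatchStddev adds one input channel
    total += (c * 4 * 4) * c + c          # 4x4/Dense0: flattened 512x4x4 input
    total += c + 1                        # 4x4/Dense1: 512 -> 1 score
    return total

print(discriminator_params(256))  # 23052353, matching the Total row
```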
note
12/14
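- Set up the NVIDIA driver, CUDA, and cuDNN
Right at the start I found that the existing NVIDIA driver setup was wrong, so I uninstalled the driver completely and reinstalled it. CUDA and cuDNN then also had to be reinstalled. During installation some Linux packages (grub) turned out to be broken, so I had to go back and repair those first. The whole environment setup took several hours.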
12/15
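- Install the tensorflow package
After many attempts I found that the tensorflow-gpu package has to be version 1.15 for the code to run properly. Some packages that the official GitHub repository does not mention also have to be installed before the code will run (but I did not record which ones).
- Try downloading the dataset
The dataset files turned out to be very large, so I cleaned up a lot of files before downloading. The download still used so much space that the virtual machine's system crashed, and it took several more hours to repair. After the repair I added an SSD to hold the dataset, but the permission problems on the new disk are not solved yet. They must be fixed before data can be stored there and training (train.py) can be attempted.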
12/16
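- Try running train.py
With the dataset ready, I started trying to train the model and found along the way that some things are hard-coded in /stylegan/training/dataset.py (the lines discussed here).
On line 95, tfr_lods should be an array whose contents are set to \(\log_2(\text{resolution})\) for each dataset that currently exists.
Line 99 checks whether tfr_lods matches the dataset resolutions that training expects by default, and the default expects datasets at every resolution from 4X4 to 1024X1024. Line 99 has to be commented out in order to run with only one dataset at a time.
Later I found that, even so, every dataset still seems to be required for this code to run properly.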
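To see which datasets are still missing before retrying, the check described above can be mimicked outside the repository. The sketch below is a simplified stand-in, not the code in dataset.py: it takes the resolutions already on disk and lists those between 4X4 and 1024X1024 that are absent. The function names and defaults are hypothetical.

```python
import math

def log2_resolution(resolution):
    """Return log2 of a power-of-two resolution, e.g. 256 -> 8."""
    value = int(math.log2(resolution))
    if 2 ** value != resolution:
        raise ValueError(f"{resolution} is not a power of two")
    return value

def missing_resolutions(available, max_resolution=1024, min_resolution=4):
    """List the required resolutions that are not among the available ones."""
    present = {log2_resolution(r) for r in available}
    required = range(log2_resolution(min_resolution),
                     log2_resolution(max_resolution) + 1)
    return [2 ** k for k in required if k not in present]

# With only the 256X256 dataset on disk, everything else is reported missing.
print(missing_resolutions([256]))  # [4, 8, 16, 32, 64, 128, 512, 1024]
```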
12/17
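Before downloading all the datasets I first had to resize the disk. After resizing, I found that Google Drive had restricted downloads of the dataset because too many people were accessing it.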
12/18
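Today I kept trying to download the 512X512 and 1024X1024 datasets that could not be downloaded yesterday. The 512X512 download is forcibly interrupted about one hour after it starts. The 1024X1024 dataset cannot be downloaded at all because of Google Drive's access limit (too many people accessing it).
The cause found so far is a 403 Forbidden response, most likely due to Google Drive's traffic limit, but I cannot confirm what is actually happening.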
