Image Processing Final Project

StyleGAN GitHub link

pre-trained model

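The pre-trained model pickle contains three objects:

_G: a snapshot of the image generator taken mid-training
_D: a snapshot of the image discriminator taken mid-training
Gs: the fully trained image generator (a long-term average of the generator weights)
The first two can be used to resume training, while Gs is the generator to use directly for producing images.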







import pickle
import dnnlib
import dnnlib.tflib as tflib
import config  # config.py from the StyleGAN repo (defines cache_dir)

tflib.init_tf()  # initialize TensorFlow before unpickling the networks

# Load pre-trained network.
url = 'https://drive.google.com/uc?id=1MEGjdvVpUsu1jB4zrXZN7Y4kBBOzizDQ' # karras2019stylegan-ffhq-1024x1024.pkl
with dnnlib.util.open_url(url, cache_dir=config.cache_dir) as f:
    _G, _D, Gs = pickle.load(f)
    # _G = Instantaneous snapshot of the generator. Mainly useful for resuming a previous training run.
    # _D = Instantaneous snapshot of the discriminator. Mainly useful for resuming a previous training run.
    # Gs = Long-term average of the generator. Yields higher-quality results than the instantaneous snapshot.

Gs.run()

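Gs.run() takes n latent vectors as input and returns n images. The random seed set beforehand determines which latent vectors are drawn, and therefore what the output images look like.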







# Pick latent vector.
rnd = np.random.RandomState(5)
latents = rnd.randn(1, Gs.input_shape[1]) # n is 1 here

# Generate image.
fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
images = Gs.run(latents, None, truncation_psi=0.7, randomize_noise=True, output_transform=fmt)
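
The `truncation_psi=0.7` argument above applies the truncation trick described in the StyleGAN paper: each intermediate latent is pulled toward the average latent (`dlatent_avg` in the model summaries further down). The following is only a minimal pure-Python sketch of that interpolation. The function name `truncate` and the toy two-component vectors are assumptions made for illustration, not code from the repo.

```python
def truncate(dlatent, dlatent_avg, psi=0.7):
    """Interpolate from the average latent toward dlatent by a factor psi."""
    return [avg + psi * (w - avg) for w, avg in zip(dlatent, dlatent_avg)]

# Toy 2-component vectors; the real model uses 512 components.
w_avg = [0.0, 0.0]
w = [1.0, -2.0]

# psi=1 keeps w unchanged, psi=0 collapses every sample onto w_avg.
half_truncated = truncate(w, w_avg, psi=0.5)
print(half_truncated)
```

A smaller psi trades variety for more "average" looking faces, which is why 0.7 is a common middle ground.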

Gs.get_output_for()

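Gs.get_output_for() returns the generator's output as a TensorFlow expression, an intermediate result that can be passed straight to the next TensorFlow model, with no need to convert the images to a numpy array through another package.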




latents = tf.random_normal([self.minibatch_per_gpu] + Gs_clone.input_shape[1:])
images = Gs_clone.get_output_for(latents, None, is_validation=True, randomize_noise=True)
images = tflib.convert_images_to_uint8(images)
result_expr.append(inception_clone.get_output_for(images))

Gs.components.mapping(), Gs.components.synthesis()

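Gs.components.mapping() maps the input into the $W$ space, i.e. the first half of the model.
Gs.components.synthesis() takes vectors in the $W$ space and synthesizes the image from them, optionally mixed with the vectors of other images or with noise.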



# np.stack needs a sequence; passing a generator is deprecated since NumPy 1.16.
src_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in src_seeds])
src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]
src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)
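
Because the mapping output has shape [seed, layer, component], styles from two images can be combined by copying selected layers from one dlatent into another before calling the synthesis network. The sketch below illustrates only that copying step on plain Python lists. The helper name `mix_styles` and the toy sizes are assumptions for illustration; the real dlatents are 18 x 512 arrays for the 1024x1024 model.

```python
def mix_styles(dst_dlatent, src_dlatent, layers):
    """Return a copy of dst_dlatent whose rows in `layers` come from src_dlatent.

    Each dlatent is a list of per-layer vectors, i.e. shape [layer, component].
    """
    mixed = [list(row) for row in dst_dlatent]
    for i in layers:
        mixed[i] = list(src_dlatent[i])
    return mixed

# Toy example: 4 layers with 2 components each.
src = [[1.0, 1.0] for _ in range(4)]
dst = [[0.0, 0.0] for _ in range(4)]

# Take the first two (coarse) layers from src and keep the rest from dst.
coarse_from_src = mix_styles(dst, src, range(0, 2))
print(coarse_from_src)
```

Copying the low-index layers transfers coarse attributes, while copying the high-index layers transfers fine detail, following the layer ordering of the synthesis network.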

dataset

The paper mentions two datasets:

	CelebA-HQ	FFHQ
paper link	paper with code link	paper with code link
dataset info	human face	human face
image count	30k images	70k images
image resolution	1024 X 1024	1024 X 1024
to load	load with tensorflow	load from github

loss function

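The paper uses WGAN-GP as the loss function for CelebA-HQ; for FFHQ it uses WGAN-GP for configuration A and the non-saturating loss with R1 regularization for configurations B to F.
The paper states that its contributions do not modify the loss function.

End of the third paragraph of Section 2.1, below Figure 2:
We found these choices to give the best results. Our contributions do not modify the loss function.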

GitHub link to the loss function

Papers related to WGAN-GP
papers with code
WGAN comes from this paper
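
For reference, the WGAN-GP critic (discriminator) objective from the WGAN-GP paper can be written as below. Here $P_r$ is the real-data distribution, $P_g$ the generator distribution, and $\hat{x}$ is sampled uniformly along straight lines between real and generated samples. The symbols follow that paper rather than the StyleGAN code, and $\lambda = 10$ is the value used in that paper.

$$
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\left[D(\tilde{x})\right] - \mathbb{E}_{x \sim P_r}\left[D(x)\right] + \lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right]
$$

The generator in turn minimizes $L_G = -\mathbb{E}_{\tilde{x} \sim P_g}\left[D(\tilde{x})\right]$.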

it's so boring to read this loss function 張議隆

training

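The GitHub repo history shows that Jupyter notebooks were once provided
pull request link
The Jupyter notebooks no longer work, but one .py file, colab_train.py, can serve as a reference
colab_train.py appears to be a Colab version of the current train.py; the two differ very little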

environment

python 3.6 (using conda)
tensorflow 2.6.2 (1.10.0 or newer)
numpy 1.19.5 (1.14.3 or newer)
DRAM 24GB (11GB or more)
NVIDIA driver version 520.56.06 (391.35 or newer)
CUDA 11.8 (9.0 or newer)
cuDNN 8.7.0.84 (7.3.1 or newer)
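
Each entry above lists the installed version followed by the minimum required version in parentheses. Checking such pairs by comparing the raw strings is error-prone (as strings, "1.9.0" sorts after "1.10.0"), so the sketch below compares them numerically. The helper names `parse_version` and `meets_minimum` are hypothetical, and the dictionary only repeats a few values from the list above.

```python
def parse_version(text):
    """Turn '8.7.0.84' into (8, 7, 0, 84) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)

# (installed, minimum) pairs copied from the environment list.
requirements = {
    "numpy": ("1.19.5", "1.14.3"),
    "NVIDIA driver": ("520.56.06", "391.35"),
    "cuDNN": ("8.7.0.84", "7.3.1"),
}

for name, (installed, minimum) in requirements.items():
    status = "OK" if meets_minimum(installed, minimum) else "too old"
    print(name, installed, status)
```

Meeting the minimum does not guarantee compatibility: newer major versions can still break older code, so this check is necessary but not sufficient.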

model framework

Framework slides (PPT)
Rough summary on HackMD

12/16 林凱翔張議隆

generator with 1024X1024 images

using 1024X1024 image from ffhq dataset

Gs                              Params    OutputShape          WeightShape
---                             ---       ---                  ---
latents_in                      -         (?, 512)             -
labels_in                       -         (?, 0)               -
lod                             -         ()                   -
dlatent_avg                     -         (512,)               -
G_mapping/latents_in            -         (?, 512)             -
G_mapping/labels_in             -         (?, 0)               -
G_mapping/PixelNorm             -         (?, 512)             -
G_mapping/Dense0                262656    (?, 512)             (512, 512)
G_mapping/Dense1                262656    (?, 512)             (512, 512)
G_mapping/Dense2                262656    (?, 512)             (512, 512)
G_mapping/Dense3                262656    (?, 512)             (512, 512)
G_mapping/Dense4                262656    (?, 512)             (512, 512)
G_mapping/Dense5                262656    (?, 512)             (512, 512)
G_mapping/Dense6                262656    (?, 512)             (512, 512)
G_mapping/Dense7                262656    (?, 512)             (512, 512)
G_mapping/Broadcast             -         (?, 18, 512)         -
G_mapping/dlatents_out          -         (?, 18, 512)         -
Truncation                      -         (?, 18, 512)         -
G_synthesis/dlatents_in         -         (?, 18, 512)         -
G_synthesis/4x4/Const           534528    (?, 512, 4, 4)       (512,)
G_synthesis/4x4/Conv            2885632   (?, 512, 4, 4)       (3, 3, 512, 512)
G_synthesis/ToRGB_lod8          1539      (?, 3, 4, 4)         (1, 1, 512, 3)
G_synthesis/8x8/Conv0_up        2885632   (?, 512, 8, 8)       (3, 3, 512, 512)
G_synthesis/8x8/Conv1           2885632   (?, 512, 8, 8)       (3, 3, 512, 512)
G_synthesis/ToRGB_lod7          1539      (?, 3, 8, 8)         (1, 1, 512, 3)
G_synthesis/Upscale2D           -         (?, 3, 8, 8)         -
G_synthesis/Grow_lod7           -         (?, 3, 8, 8)         -
G_synthesis/16x16/Conv0_up      2885632   (?, 512, 16, 16)     (3, 3, 512, 512)
G_synthesis/16x16/Conv1         2885632   (?, 512, 16, 16)     (3, 3, 512, 512)
G_synthesis/ToRGB_lod6          1539      (?, 3, 16, 16)       (1, 1, 512, 3)
G_synthesis/Upscale2D_1         -         (?, 3, 16, 16)       -
G_synthesis/Grow_lod6           -         (?, 3, 16, 16)       -
G_synthesis/32x32/Conv0_up      2885632   (?, 512, 32, 32)     (3, 3, 512, 512)
G_synthesis/32x32/Conv1         2885632   (?, 512, 32, 32)     (3, 3, 512, 512)
G_synthesis/ToRGB_lod5          1539      (?, 3, 32, 32)       (1, 1, 512, 3)
G_synthesis/Upscale2D_2         -         (?, 3, 32, 32)       -
G_synthesis/Grow_lod5           -         (?, 3, 32, 32)       -
G_synthesis/64x64/Conv0_up      1442816   (?, 256, 64, 64)     (3, 3, 512, 256)
G_synthesis/64x64/Conv1         852992    (?, 256, 64, 64)     (3, 3, 256, 256)
G_synthesis/ToRGB_lod4          771       (?, 3, 64, 64)       (1, 1, 256, 3)
G_synthesis/Upscale2D_3         -         (?, 3, 64, 64)       -
G_synthesis/Grow_lod4           -         (?, 3, 64, 64)       -
G_synthesis/128x128/Conv0_up    426496    (?, 128, 128, 128)   (3, 3, 256, 128)
G_synthesis/128x128/Conv1       279040    (?, 128, 128, 128)   (3, 3, 128, 128)
G_synthesis/ToRGB_lod3          387       (?, 3, 128, 128)     (1, 1, 128, 3)
G_synthesis/Upscale2D_4         -         (?, 3, 128, 128)     -
G_synthesis/Grow_lod3           -         (?, 3, 128, 128)     -
G_synthesis/256x256/Conv0_up    139520    (?, 64, 256, 256)    (3, 3, 128, 64)
G_synthesis/256x256/Conv1       102656    (?, 64, 256, 256)    (3, 3, 64, 64)
G_synthesis/ToRGB_lod2          195       (?, 3, 256, 256)     (1, 1, 64, 3)
G_synthesis/Upscale2D_5         -         (?, 3, 256, 256)     -
G_synthesis/Grow_lod2           -         (?, 3, 256, 256)     -
G_synthesis/512x512/Conv0_up    51328     (?, 32, 512, 512)    (3, 3, 64, 32)
G_synthesis/512x512/Conv1       42112     (?, 32, 512, 512)    (3, 3, 32, 32)
G_synthesis/ToRGB_lod1          99        (?, 3, 512, 512)     (1, 1, 32, 3)
G_synthesis/Upscale2D_6         -         (?, 3, 512, 512)     -
G_synthesis/Grow_lod1           -         (?, 3, 512, 512)     -
G_synthesis/1024x1024/Conv0_up  21056     (?, 16, 1024, 1024)  (3, 3, 32, 16)
G_synthesis/1024x1024/Conv1     18752     (?, 16, 1024, 1024)  (3, 3, 16, 16)
G_synthesis/ToRGB_lod0          51        (?, 3, 1024, 1024)   (1, 1, 16, 3)
G_synthesis/Upscale2D_7         -         (?, 3, 1024, 1024)   -
G_synthesis/Grow_lod0           -         (?, 3, 1024, 1024)   -
G_synthesis/images_out          -         (?, 3, 1024, 1024)   -
G_synthesis/lod                 -         ()                   -
G_synthesis/noise0              -         (1, 1, 4, 4)         -
G_synthesis/noise1              -         (1, 1, 4, 4)         -
G_synthesis/noise2              -         (1, 1, 8, 8)         -
G_synthesis/noise3              -         (1, 1, 8, 8)         -
G_synthesis/noise4              -         (1, 1, 16, 16)       -
G_synthesis/noise5              -         (1, 1, 16, 16)       -
G_synthesis/noise6              -         (1, 1, 32, 32)       -
G_synthesis/noise7              -         (1, 1, 32, 32)       -
G_synthesis/noise8              -         (1, 1, 64, 64)       -
G_synthesis/noise9              -         (1, 1, 64, 64)       -
G_synthesis/noise10             -         (1, 1, 128, 128)     -
G_synthesis/noise11             -         (1, 1, 128, 128)     -
G_synthesis/noise12             -         (1, 1, 256, 256)     -
G_synthesis/noise13             -         (1, 1, 256, 256)     -
G_synthesis/noise14             -         (1, 1, 512, 512)     -
G_synthesis/noise15             -         (1, 1, 512, 512)     -
G_synthesis/noise16             -         (1, 1, 1024, 1024)   -
G_synthesis/noise17             -         (1, 1, 1024, 1024)   -
images_out                      -         (?, 3, 1024, 1024)   -
---                             ---       ---                  ---
Total                           26219627
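
The Params column can be reproduced by hand, which helps in understanding what each generator layer contains. The sketch below is our own reading of the numbers, not code from the repo: it assumes each styled convolution holds the convolution weights, one noise scale and one bias per output channel, and a dense layer that maps the 512-dimensional dlatent to a scale and a bias per channel. All function names are hypothetical.

```python
def dense_params(fan_in, fan_out):
    """Weights plus bias of a fully connected layer."""
    return fan_in * fan_out + fan_out

def style_conv_params(in_ch, out_ch, dlatent_size=512, kernel=3):
    """Assumed breakdown of one styled convolution in G_synthesis."""
    conv = kernel * kernel * in_ch * out_ch
    noise = out_ch                                   # per-channel noise scale
    bias = out_ch
    style = dense_params(dlatent_size, 2 * out_ch)   # scale and bias per channel
    return conv + noise + bias + style

def const_input_params(channels=512, size=4, dlatent_size=512):
    """Learned constant input plus the same noise, bias and style parts."""
    return (channels * size * size + channels + channels
            + dense_params(dlatent_size, 2 * channels))

def to_rgb_params(in_ch):
    """1x1 convolution to 3 colour channels, with bias."""
    return in_ch * 3 + 3

print(dense_params(512, 512))        # compare with G_mapping/Dense0
print(const_input_params())          # compare with G_synthesis/4x4/Const
print(style_conv_params(512, 256))   # compare with G_synthesis/64x64/Conv0_up
print(to_rgb_params(16))             # compare with G_synthesis/ToRGB_lod0
```

Under these assumptions, most of the parameters sit in the 512-channel blocks at low resolution, while the 1024x1024 block contributes comparatively few.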

generator with 256X256 ffhq images

G                             Params    OutputShape         WeightShape
---                           ---       ---                 ---
latents_in                    -         (?, 512)            -
labels_in                     -         (?, 0)              -
lod                           -         ()                  -
dlatent_avg                   -         (512,)              -
G_mapping/latents_in          -         (?, 512)            -
G_mapping/labels_in           -         (?, 0)              -
G_mapping/PixelNorm           -         (?, 512)            -
G_mapping/Dense0              262656    (?, 512)            (512, 512)
G_mapping/Dense1              262656    (?, 512)            (512, 512)
G_mapping/Dense2              262656    (?, 512)            (512, 512)
G_mapping/Dense3              262656    (?, 512)            (512, 512)
G_mapping/Dense4              262656    (?, 512)            (512, 512)
G_mapping/Dense5              262656    (?, 512)            (512, 512)
G_mapping/Dense6              262656    (?, 512)            (512, 512)
G_mapping/Dense7              262656    (?, 512)            (512, 512)
G_mapping/Broadcast           -         (?, 14, 512)        -
G_mapping/dlatents_out        -         (?, 14, 512)        -
Truncation                    -         (?, 14, 512)        -
G_synthesis/dlatents_in       -         (?, 14, 512)        -
G_synthesis/4x4/Const         534528    (?, 512, 4, 4)      (512,)
G_synthesis/4x4/Conv          2885632   (?, 512, 4, 4)      (3, 3, 512, 512)
G_synthesis/ToRGB_lod6        1539      (?, 3, 4, 4)        (1, 1, 512, 3)
G_synthesis/8x8/Conv0_up      2885632   (?, 512, 8, 8)      (3, 3, 512, 512)
G_synthesis/8x8/Conv1         2885632   (?, 512, 8, 8)      (3, 3, 512, 512)
G_synthesis/ToRGB_lod5        1539      (?, 3, 8, 8)        (1, 1, 512, 3)
G_synthesis/Upscale2D         -         (?, 3, 8, 8)        -
G_synthesis/Grow_lod5         -         (?, 3, 8, 8)        -
G_synthesis/16x16/Conv0_up    2885632   (?, 512, 16, 16)    (3, 3, 512, 512)
G_synthesis/16x16/Conv1       2885632   (?, 512, 16, 16)    (3, 3, 512, 512)
G_synthesis/ToRGB_lod4        1539      (?, 3, 16, 16)      (1, 1, 512, 3)
G_synthesis/Upscale2D_1       -         (?, 3, 16, 16)      -
G_synthesis/Grow_lod4         -         (?, 3, 16, 16)      -
G_synthesis/32x32/Conv0_up    2885632   (?, 512, 32, 32)    (3, 3, 512, 512)
G_synthesis/32x32/Conv1       2885632   (?, 512, 32, 32)    (3, 3, 512, 512)
G_synthesis/ToRGB_lod3        1539      (?, 3, 32, 32)      (1, 1, 512, 3)
G_synthesis/Upscale2D_2       -         (?, 3, 32, 32)      -
G_synthesis/Grow_lod3         -         (?, 3, 32, 32)      -
G_synthesis/64x64/Conv0_up    1442816   (?, 256, 64, 64)    (3, 3, 512, 256)
G_synthesis/64x64/Conv1       852992    (?, 256, 64, 64)    (3, 3, 256, 256)
G_synthesis/ToRGB_lod2        771       (?, 3, 64, 64)      (1, 1, 256, 3)
G_synthesis/Upscale2D_3       -         (?, 3, 64, 64)      -
G_synthesis/Grow_lod2         -         (?, 3, 64, 64)      -
G_synthesis/128x128/Conv0_up  426496    (?, 128, 128, 128)  (3, 3, 256, 128)
G_synthesis/128x128/Conv1     279040    (?, 128, 128, 128)  (3, 3, 128, 128)
G_synthesis/ToRGB_lod1        387       (?, 3, 128, 128)    (1, 1, 128, 3)
G_synthesis/Upscale2D_4       -         (?, 3, 128, 128)    -
G_synthesis/Grow_lod1         -         (?, 3, 128, 128)    -
G_synthesis/256x256/Conv0_up  139520    (?, 64, 256, 256)   (3, 3, 128, 64)
G_synthesis/256x256/Conv1     102656    (?, 64, 256, 256)   (3, 3, 64, 64)
G_synthesis/ToRGB_lod0        195       (?, 3, 256, 256)    (1, 1, 64, 3)
G_synthesis/Upscale2D_5       -         (?, 3, 256, 256)    -
G_synthesis/Grow_lod0         -         (?, 3, 256, 256)    -
G_synthesis/images_out        -         (?, 3, 256, 256)    -
G_synthesis/lod               -         ()                  -
G_synthesis/noise0            -         (1, 1, 4, 4)        -
G_synthesis/noise1            -         (1, 1, 4, 4)        -
G_synthesis/noise2            -         (1, 1, 8, 8)        -
G_synthesis/noise3            -         (1, 1, 8, 8)        -
G_synthesis/noise4            -         (1, 1, 16, 16)      -
G_synthesis/noise5            -         (1, 1, 16, 16)      -
G_synthesis/noise6            -         (1, 1, 32, 32)      -
G_synthesis/noise7            -         (1, 1, 32, 32)      -
G_synthesis/noise8            -         (1, 1, 64, 64)      -
G_synthesis/noise9            -         (1, 1, 64, 64)      -
G_synthesis/noise10           -         (1, 1, 128, 128)    -
G_synthesis/noise11           -         (1, 1, 128, 128)    -
G_synthesis/noise12           -         (1, 1, 256, 256)    -
G_synthesis/noise13           -         (1, 1, 256, 256)    -
images_out                    -         (?, 3, 256, 256)    -
---                           ---       ---                 ---
Total                         26086229
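
Comparing the two generator summaries, the dlatent has 18 layers for 1024x1024 but 14 for 256x256, and the totals differ only by the two extra resolution blocks. The short sketch below checks both observations against the numbers in the tables. The formula in `num_style_layers` (two style inputs per resolution from 4x4 upward) is our assumption inferred from the tables, and the function name is hypothetical.

```python
import math

def num_style_layers(resolution):
    """Two style inputs per resolution block, from 4x4 up to `resolution`."""
    return 2 * int(math.log2(resolution)) - 2

print(num_style_layers(1024), num_style_layers(256))

# Parameters of the 512x512 and 1024x1024 blocks, copied from the 1024 table.
block_512 = 51328 + 42112 + 99
block_1024 = 21056 + 18752 + 51
difference = 26219627 - 26086229
print(difference == block_512 + block_1024)
```

The same count also matches the number of noise inputs (noise0 to noise17 versus noise0 to noise13).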

discriminator with 256X256 ffhq images

D                    Params    OutputShape         WeightShape
---                  ---       ---                 ---
images_in            -         (?, 3, 256, 256)    -
labels_in            -         (?, 0)              -
lod                  -         ()                  -
FromRGB_lod0         256       (?, 64, 256, 256)   (1, 1, 3, 64)
256x256/Conv0        36928     (?, 64, 256, 256)   (3, 3, 64, 64)
256x256/Conv1_down   73856     (?, 128, 128, 128)  (3, 3, 64, 128)
Downscale2D          -         (?, 3, 128, 128)    -
FromRGB_lod1         512       (?, 128, 128, 128)  (1, 1, 3, 128)
Grow_lod0            -         (?, 128, 128, 128)  -
128x128/Conv0        147584    (?, 128, 128, 128)  (3, 3, 128, 128)
128x128/Conv1_down   295168    (?, 256, 64, 64)    (3, 3, 128, 256)
Downscale2D_1        -         (?, 3, 64, 64)      -
FromRGB_lod2         1024      (?, 256, 64, 64)    (1, 1, 3, 256)
Grow_lod1            -         (?, 256, 64, 64)    -
64x64/Conv0          590080    (?, 256, 64, 64)    (3, 3, 256, 256)
64x64/Conv1_down     1180160   (?, 512, 32, 32)    (3, 3, 256, 512)
Downscale2D_2        -         (?, 3, 32, 32)      -
FromRGB_lod3         2048      (?, 512, 32, 32)    (1, 1, 3, 512)
Grow_lod2            -         (?, 512, 32, 32)    -
32x32/Conv0          2359808   (?, 512, 32, 32)    (3, 3, 512, 512)
32x32/Conv1_down     2359808   (?, 512, 16, 16)    (3, 3, 512, 512)
Downscale2D_3        -         (?, 3, 16, 16)      -
FromRGB_lod4         2048      (?, 512, 16, 16)    (1, 1, 3, 512)
Grow_lod3            -         (?, 512, 16, 16)    -
16x16/Conv0          2359808   (?, 512, 16, 16)    (3, 3, 512, 512)
16x16/Conv1_down     2359808   (?, 512, 8, 8)      (3, 3, 512, 512)
Downscale2D_4        -         (?, 3, 8, 8)        -
FromRGB_lod5         2048      (?, 512, 8, 8)      (1, 1, 3, 512)
Grow_lod4            -         (?, 512, 8, 8)      -
8x8/Conv0            2359808   (?, 512, 8, 8)      (3, 3, 512, 512)
8x8/Conv1_down       2359808   (?, 512, 4, 4)      (3, 3, 512, 512)
Downscale2D_5        -         (?, 3, 4, 4)        -
FromRGB_lod6         2048      (?, 512, 4, 4)      (1, 1, 3, 512)
Grow_lod5            -         (?, 512, 4, 4)      -
4x4/MinibatchStddev  -         (?, 513, 4, 4)      -
4x4/Conv             2364416   (?, 512, 4, 4)      (3, 3, 513, 512)
4x4/Dense0           4194816   (?, 512)            (8192, 512)
4x4/Dense1           513       (?, 1)              (512, 1)
scores_out           -         (?, 1)              -
---                  ---       ---                 ---
Total                23052353
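
The discriminator's parameter counts follow the plain formula for convolution and dense layers (weights plus one bias per output channel), which can be checked against the table. The sketch below is such a check; the function names are hypothetical, and the input sizes are read off the WeightShape column.

```python
def conv_params(in_ch, out_ch, kernel=3):
    """Convolution weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(fan_in, fan_out):
    """Fully connected weights plus bias."""
    return fan_in * fan_out + fan_out

print(conv_params(3, 64, kernel=1))    # compare with FromRGB_lod0
print(conv_params(64, 128))            # compare with 256x256/Conv1_down
# MinibatchStddev appends one feature map, so 4x4/Conv sees 513 channels.
print(conv_params(513, 512))           # compare with 4x4/Conv
# 4x4/Dense0 flattens 512 channels of 4x4 into 8192 inputs.
print(dense_params(512 * 4 * 4, 512))  # compare with 4x4/Dense0
```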

note

12/14

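Setting up the NVIDIA driver, CUDA, and cuDNN
Right at the start I found that the existing NVIDIA driver setup was wrong, so I uninstalled the driver completely and reinstalled it. CUDA and cuDNN then also had to be reinstalled. During installation some Linux packages (grub) turned out to be broken, so I had to go back and fix those first. The whole environment setup took several hours.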

12/15

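Installing the tensorflow package
After many attempts I found that the tensorflow-gpu package must be version 1.15 for the code to run correctly. Some packages that the official GitHub page does not mention also have to be installed (but I did not record which ones).
Trying to download the dataset
The dataset files turned out to be very large, so I cleaned up many files before downloading. The download still used too much space and crashed the virtual machine, and repairing the system took several more hours. After the repair I added an SSD to hold the dataset, but the permission problems on the new drive are not solved yet; they must be fixed before the data can be stored and training (train.py) can be attempted.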

12/16

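Trying to run train.py
With the dataset ready I started trying to train the model, and found that some behavior is hard-coded in /stylegan/training/dataset.py (see the code block below).
Line 95 (the `tfr_lods = ...` line) builds tfr_lods as a list holding the level of detail of each dataset file that exists, i.e. $\log_2(\text{max resolution}) - \log_2(\text{file resolution})$.
Line 99 (the last assert) checks that tfr_lods covers every resolution the training expects, which by default means dataset files from 4X4 up to 1024X1024. Line 99 has to be commented out in order to run with a single dataset file.
Later I found that the code still seems to need every dataset file in order to run properly.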














...

# Determine shape and resolution.
max_shape = max(tfr_shapes, key=np.prod)
self.resolution = resolution if resolution is not None else max_shape[1]
self.resolution_log2 = int(np.log2(self.resolution))
self.shape = [max_shape[0], self.resolution, self.resolution]
tfr_lods = [self.resolution_log2 - int(np.log2(shape[1])) for shape in tfr_shapes]
assert all(shape[0] == max_shape[0] for shape in tfr_shapes)
assert all(shape[1] == shape[2] for shape in tfr_shapes)
assert all(shape[1] == self.resolution // (2**lod) for shape, lod in zip(tfr_shapes, tfr_lods))
assert all(lod in tfr_lods for lod in range(self.resolution_log2 - 1))

...
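
The effect of the last assert in the excerpt above can be reproduced without TensorFlow. The sketch below mirrors the `tfr_lods` computation with `math.log2` in place of `np.log2` and reports which lod values the assert would find missing. The function name `missing_lods` is hypothetical, and only the resolutions of the files are modelled, not their contents.

```python
import math

def missing_lods(file_resolutions, resolution=None):
    """Return the lod values that the final assert would find missing."""
    resolution = max(file_resolutions) if resolution is None else resolution
    resolution_log2 = int(math.log2(resolution))
    tfr_lods = [resolution_log2 - int(math.log2(r)) for r in file_resolutions]
    return [lod for lod in range(resolution_log2 - 1) if lod not in tfr_lods]

full_set = [2 ** k for k in range(2, 11)]   # 4, 8, ..., 1024
print(missing_lods(full_set))               # nothing missing, the assert passes
print(missing_lods([1024]))                 # only lod 0 exists, the assert fails
```

An empty list means the assert passes; any remaining entry corresponds to a resolution file that is expected but absent.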

12/17

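The disk had to be resized before the full dataset could be downloaded. After resizing, I found that Google Drive was restricting downloads of the dataset because too many people had accessed it.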

12/18

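Today I kept trying to download the 512X512 and 1024X1024 datasets that could not be downloaded yesterday. The 512X512 download is forcibly interrupted about one hour after it starts; the 1024X1024 dataset cannot be downloaded at all because of Google Drive's access limit (too many people accessing it).
The error observed so far is 403 Forbidden, most likely caused by Google Drive's traffic limit, but the actual cause cannot be confirmed.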