
Week 19: The Data Augmentation Compendium

tags: Tech Seminar

Outline

  1. Experimental setup
  2. Augmentation methods and experimental results
  3. Comparing the augmentation methods
  4. Tips for this use case

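1. Experimental setup

  • Model: CRNN+CTC
  • Project: cc_apa_ocr (recognition of credit-card auto-debit application forms)
  • Training set: 01.temp_labor + 02.template_matching + 03.yolo
  • Epochs: 100
  • Training data limit: at most 150,000 images in total (original images + augmented images)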
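The 150,000-image cap leaves a fixed budget for augmented images. A quick sketch of that arithmetic, assuming the 72,403 original images of the Baseline row in 2.1 and the six fields 03 to 08 as the classes; the helper name `augmentation_budget` is hypothetical:

```python
def augmentation_budget(cap: int, original: int, n_classes: int) -> tuple:
    """Return (augmented images allowed in total, augmented images allowed per class)."""
    remaining = cap - original
    return remaining, remaining // n_classes


total, per_class = augmentation_budget(cap=150_000, original=72_403, n_classes=6)
print(total, per_class)  # 77597 12932

# The Augmentor run generated 12,915 images per class, which stays under the cap
generated = 12_915 * 6
print(generated, 72_403 + generated)  # 77490 149893
```

The 149,893 total matches the Augmentor row of the results table.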

2. Augmentation methods and experimental results

2.1 Comparison of results

  • Overview of each method and its final performance

| Method | Description | Total training images | Augmented images | Testing accuracy | Experimenter |
|---|---|---|---|---|---|
| Baseline | No augmentation; epoch: 101 | 72,403 | 0 | 0.9372 | 信賢 |
| image_augmentor | Blur and noise: blur set to 1.0 and 2.0, noise set to 0.01 and 0.02; epoch: 115 | 149,956 | 77,467 | 0.9264 | 信賢 |
| Augmentor | Random distortion, random brightness, shear; epoch: 100 | 149,893 | 77,490 | 0.9273 | 信賢 |
| imgaug | 2 methods applied at random to each image, chosen from: `iaa.Add((-40, 40), per_channel=0.5)`, `iaa.CoarseDropout((0.0, 0.02), size_percent=(0.02, 0.25))`, `iaa.AdditiveGaussianNoise(scale=0.2*255)`, `iaa.Sharpen(alpha=0.5)`, `iaa.Salt(0.1)`, `iaa.JpegCompression(compression=(70, 99))`, `iaa.AverageBlur(k=(2, 5))`, `iaa.Affine(scale=(0.8, 1))`, `iaa.Affine(translate_px={"x": (-20, 20), "y": (-20, 20)})`, `iaa.Affine(rotate=(-5, 5))` | 144,894 | 72,774 | 0.9407 | Lili |
| AutoAugmentation | Each image gets 0 to 2 operations from a randomly selected sub-policy | 143,533 | 70,731 | 0.9218 | 昱睿 |
| Text-Image-Augmentation | Each image gets at least one of distortion, stretching, or perspective | 144,806 | 72,403 | 0.9465 | 昊中 |
| RandAugment | Can be done directly with imgaug; parameters: n=2, m=4 | 144,806 | 72,403 | 0.9299 | 沛筠 |
  • Results per OCR field

| Method | 04.acct_id | 06.bank_acct | 05.bank_code | 03.card_id | 08.post_acct | 07.post_number | avg |
|---|---|---|---|---|---|---|---|
| Baseline | 0.9413 | 0.9222 | 0.9670 | 0.9209 | 0.9333 | 0.9555 | 0.9372 |
| image_augmentor | 0.9230 | 0.9083 | 0.9484 | 0.9242 | 0.9055 | 0.9555 | 0.9264 |
| Augmentor | 0.9230 | 0.9097 | 0.9599 | 0.9176 | 0.9222 | 0.9388 | 0.9273 |
| imgaug | 0.9286 | 0.9333 | 0.9685 | 0.9286 | 0.9556 | 0.9444 | 0.9407 |
| AutoAugmentation | 0.9157 | 0.9041 | 0.9556 | 0.9088 | 0.9166 | 0.9500 | 0.9218 |
| Text-Image-Augmentation | 0.9340 | 0.9417 | 0.9599 | 0.9451 | 0.9444 | 0.9611 | 0.9465 |
| RandAugment | 0.9212 | 0.9208 | 0.9542 | 0.9243 | 0.9056 | 0.9500 | 0.9299 |
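To read the table at a glance, a small sketch that ranks each method by its change in average accuracy relative to the baseline. The numbers are copied from the avg column above; the dictionary is only an illustrative container:

```python
# avg column of the per-field table above
avg_accuracy = {
    "Baseline": 0.9372,
    "image_augmentor": 0.9264,
    "Augmentor": 0.9273,
    "imgaug": 0.9407,
    "AutoAugmentation": 0.9218,
    "Text-Image-Augmentation": 0.9465,
    "RandAugment": 0.9299,
}

baseline = avg_accuracy["Baseline"]
deltas = {
    method: round(acc - baseline, 4)
    for method, acc in avg_accuracy.items()
    if method != "Baseline"
}

# print methods from the largest gain to the largest loss vs. the baseline
for method, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method:<25} {delta:+.4f}")
```

Only imgaug and Text-Image-Augmentation end up above the baseline on average.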

0. Baseline

  • Training process


1. Augmentor

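1.1. Introduction and usage scenario

Augmentor aims to automate image augmentation (artificial data generation) in order to expand the datasets used as input to machine learning algorithms, especially neural networks and deep learning.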

  • Elastic Distortions
  • Perspective Transforms
  • Size Preserving Rotations
  • Size Preserving Shearing
  • Cropping
  • Random Erasing

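1.2. Methods used this time

  • Random distortion (probability 1)
  • Random brightness (probability 0.5)
  • Shear (probability 0.5)

All augmentation was generated from the manually labeled crops of fields 03 to 08 (6 folders in total). Each class produced 12,915 images, for 77,490 new images in total.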
| No | Images generated | Time (s) | Rate (images/s) | Operations |
|---|---|---|---|---|
| 1 | 77,490 | 1,740 | 44.53 | distortion, brightness, shear |
| 2 | 77,490 | 1,549 | 50.02 | distortion, brightness, shear (multi-threaded) |
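Augmentor gives every operation in a pipeline its own probability. The sketch below mimics that idea in plain Python on a tiny grayscale image stored as nested lists. It is not Augmentor's API: the `Pipeline` class and the `brightness` and `shear` helpers (including the integer-shift shear) are simplified, hypothetical stand-ins, and distortion is left out for brevity.

```python
import random


def brightness(img, rng):
    # scale every pixel by a random factor in [0.7, 1.3], clipped to 0..255
    factor = rng.uniform(0.7, 1.3)
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def shear(img, rng):
    # crude horizontal shear: shift row y right by round(y * k) pixels, filling with white
    k = rng.uniform(0.0, 0.3)
    width = len(img[0])
    out = []
    for y, row in enumerate(img):
        s = min(width, round(y * k))
        out.append([255] * s + row[: width - s])
    return out


class Pipeline:
    """Hypothetical probability-gated pipeline in the spirit of Augmentor (not its API)."""

    def __init__(self, seed=None):
        self.ops = []  # list of (probability, operation)
        self.rng = random.Random(seed)

    def add(self, probability, op):
        self.ops.append((probability, op))
        return self

    def __call__(self, img):
        for probability, op in self.ops:
            # each operation fires independently with its own probability
            if self.rng.random() < probability:
                img = op(img, self.rng)
        return img


# brightness and shear with probability 0.5 each, as in 1.2 (distortion left out)
p = Pipeline(seed=0).add(0.5, brightness).add(0.5, shear)
sample = [[0, 128, 255, 64] for _ in range(4)]  # tiny 4x4 grayscale image
augmented = p(sample)
```

Because each operation is gated independently, an image can receive none, one, or both of these operations.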

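1.3 Experiment

  • Generating the data
    Skew (probability 0.5) was included at first, but it turned out to add some noise, so the set of operations was adjusted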
  • Training data after the adjustment
  • Training process


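  • Results
    Because the test data closely resembles the training data, overly fancy augmentation just adds noise. The distortion also warped the green grid background, so next time we could augment the text on its own and then place it on a green background, which would look more like the real data.

| Method | 04.acct_id | 06.bank_acct | 05.bank_code | 03.card_id | 08.post_acct | 07.post_number | avg |
|---|---|---|---|---|---|---|---|
| Baseline | 0.9413 | 0.9222 | 0.9670 | 0.9209 | 0.9333 | 0.9555 | 0.9372 |
| Augmentor | 0.9230 | 0.9097 | 0.9599 | 0.9176 | 0.9222 | 0.9388 | 0.9273 |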

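1.4 Usage

  • Advantages
    • Installable as a package
    • Just specify the directory of the images to augment; no need to read them in or convert them to numpy arrays yourself
    • Pipeline-based, with multi-threading support
    • Highly abstracted API; easy to understand and pick up
    • Shows a progress bar, so you can go take a coffee break while it runs
    • Can feed data directly to a generator (Generators for Keras and PyTorch)

```python
# Keras (p is an Augmentor.Pipeline)
g = p.keras_generator(batch_size=128)
images, labels = next(g)

# PyTorch
import torchvision

transforms = torchvision.transforms.Compose([
    p.torch_transform(),
    torchvision.transforms.ToTensor(),
])
```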
  • Installation
pip install Augmentor
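  • Quick start

```python
import Augmentor

# 1. Point the pipeline at the image directory
p = Augmentor.Pipeline("./images", output_directory="output")

# 2. Augmentation operations
# Rotate with probability 0.7, at most 10 degrees to the left and 10 degrees to the right
p.rotate(probability=0.7, max_left_rotation=10, max_right_rotation=10)
# Zoom with probability 0.3, by a factor between 1.1 and 1.6 (a factor of 1 means no change)
p.zoom(probability=0.3, min_factor=1.1, max_factor=1.6)
# Resize everything to the same size, 200 x 200
p.resize(probability=1, height=200, width=200)

# 3. Total number of augmented images to generate
p.sample(2000)
```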

2. imgaug

2.1. Installation

Very simple; one line and you are done:

pip install imgaug

2.2 Types of data augmentation

See the overview on the GitHub page directly.

2.3 Combining augmenters (recommended)

  • Sequential (applies the augmenters one by one; every one of them is used)

```python
import numpy as np
import imgaug.augmenters as iaa

# two example images
images = np.zeros((2, 128, 128, 3), dtype=np.uint8)

aug = iaa.Sequential([
    iaa.Crop(px=(1, 16), keep_size=False),
    iaa.Fliplr(0.5),
    iaa.GaussianBlur(sigma=(0, 3.0)),
])
images_aug = aug(images=images)
```

  • OneOf (picks one at random)

```python
import numpy as np
import imgaug.augmenters as iaa

# two example images
images = np.zeros((2, 128, 128, 3), dtype=np.uint8)

aug = iaa.OneOf([
    iaa.GaussianBlur((0, 3.0)),  # blur images with a sigma between 0 and 3.0
    iaa.AverageBlur(k=(2, 7)),   # blur images using local means with kernel sizes between 2 and 7
    iaa.MedianBlur(k=(3, 11)),   # blur images using local medians with kernel sizes between 3 and 11
])
images_aug = aug(images=images)
```

  • SomeOf (picks several at random)

```python
import numpy as np
import imgaug.augmenters as iaa

# two example images
images = np.zeros((2, 128, 128, 3), dtype=np.uint8)

# apply between 0 and 2 of the augmenters below to each image
aug = iaa.SomeOf((0, 2), [
    iaa.GaussianBlur((0, 3.0)),  # blur images with a sigma between 0 and 3.0
    iaa.AverageBlur(k=(2, 7)),   # blur images using local means with kernel sizes between 2 and 7
    iaa.MedianBlur(k=(3, 11)),   # blur images using local medians with kernel sizes between 3 and 11
])
images_aug = aug(images=images)
```

And it supports batch processing!!!
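To make the three containers concrete, here is a plain-Python sketch of their semantics, not imgaug's implementation: `sequential` applies every augmenter in order, `one_of` applies exactly one picked at random, and `some_of` applies a random number of distinct augmenters drawn from a range. The function names are hypothetical, and the "augmenters" are toy functions on a number so the behavior is easy to check.

```python
import random


def sequential(augmenters):
    # apply every augmenter, one after another
    def run(x):
        for aug in augmenters:
            x = aug(x)
        return x
    return run


def one_of(augmenters, rng):
    # apply exactly one augmenter, picked uniformly at random
    def run(x):
        return rng.choice(augmenters)(x)
    return run


def some_of(n_range, augmenters, rng):
    # apply k distinct augmenters, with k drawn uniformly from n_range (inclusive);
    # in this sketch the chosen ones run in list order
    low, high = n_range

    def run(x):
        k = rng.randint(low, high)
        for i in sorted(rng.sample(range(len(augmenters)), k)):
            x = augmenters[i](x)
        return x
    return run


# toy "augmenters" acting on a number
def add_one(x):
    return x + 1


def double(x):
    return x * 2


def negate(x):
    return -x


rng = random.Random(42)
print(sequential([add_one, double, negate])(3))  # -8
print(one_of([add_one, double, negate], rng)(3))
print(some_of((0, 2), [add_one, double, negate], rng)(3))
```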

2.4 Generating labels for augmented data in seconds (highly recommended)

The bounding-box approach


The polygon approach


But we don't need these for this project XD
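Labels come "for free" because the same geometric transform applied to the image can be applied to the label coordinates. A minimal sketch for an axis-aligned bounding box under rotation about the image center, assuming (x1, y1, x2, y2) pixel coordinates; `rotate_bbox` is a hypothetical helper, not imgaug's API. It rotates the four corners and takes the enclosing box.

```python
import math


def rotate_bbox(bbox, angle_deg, center):
    """Rotate an axis-aligned box about `center` and return the enclosing axis-aligned box.

    bbox is (x1, y1, x2, y2). Coordinates follow the image convention (y points down),
    so a positive angle turns the box clockwise on screen.
    """
    x1, y1, x2, y2 = bbox
    cx, cy = center
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    rotated = [
        (cx + (x - cx) * cos_a - (y - cy) * sin_a,
         cy + (x - cx) * sin_a + (y - cy) * cos_a)
        for x, y in corners
    ]
    xs = [p[0] for p in rotated]
    ys = [p[1] for p in rotated]
    return min(xs), min(ys), max(xs), max(ys)


# a 40x10 text box centered in a 100x100 image, rotated by 5 degrees
print(rotate_bbox((30, 45, 70, 55), 5, center=(50, 50)))
```

The enclosing box grows slightly after rotation, which is one more reason to keep rotation angles small for text.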

2.5 Setup used in this experiment

```python
import imgaug.augmenters as iaa

# for every image, pick exactly 2 of the augmenters below at random
aug = iaa.SomeOf(2, [
    iaa.Add((-40, 40), per_channel=0.5),  # brightness: add a value directly to the pixel intensities
    iaa.CoarseDropout((0.0, 0.02), size_percent=(0.02, 0.25)),
    iaa.AdditiveGaussianNoise(scale=0.2 * 255),
    iaa.Sharpen(alpha=0.5),
    iaa.Salt(0.1),
    iaa.JpegCompression(compression=(70, 99)),
    iaa.AverageBlur(k=(2, 5)),
    iaa.Affine(scale=(0.8, 1)),
    iaa.Affine(translate_px={"x": (-20, 20), "y": (-20, 20)}),
    iaa.Affine(rotate=(-5, 5)),
])
```

2.6 Model performance

| Method | 03.card_id | 04.acct_id | 05.bank_code | 06.bank_acct | 07.post_number | 08.post_acct | avg |
|---|---|---|---|---|---|---|---|
| Baseline | 0.9209 | 0.9413 | 0.9670 | 0.9222 | 0.9555 | 0.9333 | 0.9372 |
| imgaug | 0.9286 | 0.9286 | 0.9685 | 0.9333 | 0.9444 | 0.9444 | 0.9407 |

3. AutoAugment

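3.1 Source

  • Paper: AutoAugment: Learning Augmentation Policies from Data
  • GitHub: DeepVoltaire/AutoAugment
  • Summary: during training, a reinforcement-learning (RL) controller receives a signal telling it how much each augmentation policy helps the model's accuracy, and uses that signal to decide which augmentation policy to adopt
  • Limitation: the RL search is fairly hard to implement, so this experiment only uses its augmentation operations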

3.2 Explanation

AutoAugment is built from basic sub-policies. Each incoming image is transformed by a randomly chosen sub-policy, which applies up to two operations, each with its own probability and a suitable magnitude. The available operations, expressed as code:

```python
import random

import numpy as np

# ShearX, ShearY, TranslateX, TranslateY, Rotate, Color, Posterize, Solarize,
# Contrast, Sharpness, Brightness, AutoContrast, Equalize and Invert are the
# operation classes defined elsewhere in the repository (not shown here).


class SubPolicy(object):
    def __init__(
        self,
        p1: float,
        operation1: str,
        magnitude_idx1: int,
        p2: float,
        operation2: str,
        magnitude_idx2: int,
        fillcolor=(128, 128, 128),
    ):
        """
        Args:
            p1: probability of applying the first operation (between 0 and 1)
            operation1: name of the first operation (one of the keys of `ranges` below)
            magnitude_idx1: magnitude of the first operation (an integer index into its range)
            p2: probability of applying the second operation (between 0 and 1)
            operation2: name of the second operation (one of the keys of `ranges` below)
            magnitude_idx2: magnitude of the second operation (an integer index into its range)
            fillcolor: pixel value used to fill empty areas (gray by default)
        """
        ranges = {
            "shearX": np.linspace(0, 0.3, 10),
            "shearY": np.linspace(0, 0.3, 10),
            "translateX": np.linspace(0, 150 / 331, 10),
            "translateY": np.linspace(0, 150 / 331, 10),
            "rotate": np.linspace(0, 30, 10),
            "color": np.linspace(0.0, 0.9, 10),
            # np.int was removed in NumPy 1.24; the builtin int does the same job here
            "posterize": np.round(np.linspace(8, 4, 10), 0).astype(int),
            "solarize": np.linspace(256, 0, 10),
            "contrast": np.linspace(0.0, 0.9, 10),
            "sharpness": np.linspace(0.0, 0.9, 10),
            "brightness": np.linspace(0.0, 0.9, 10),
            "autocontrast": [0] * 10,
            "equalize": [0] * 10,
            "invert": [0] * 10,
        }

        func = {
            "shearX": ShearX(fillcolor=fillcolor),
            "shearY": ShearY(fillcolor=fillcolor),
            "translateX": TranslateX(fillcolor=fillcolor),
            "translateY": TranslateY(fillcolor=fillcolor),
            "rotate": Rotate(),
            "color": Color(),
            "posterize": Posterize(),
            "solarize": Solarize(),
            "contrast": Contrast(),
            "sharpness": Sharpness(),
            "brightness": Brightness(),
            "autocontrast": AutoContrast(),
            "equalize": Equalize(),
            "invert": Invert(),
        }

        self.p1 = p1
        self.operation1 = func[operation1]
        self.magnitude1 = ranges[operation1][magnitude_idx1]
        self.p2 = p2
        self.operation2 = func[operation2]
        self.magnitude2 = ranges[operation2][magnitude_idx2]

    def __call__(self, img):
        # each of the two operations fires independently with its own probability
        if random.random() < self.p1:
            img = self.operation1(img, self.magnitude1)
        if random.random() < self.p2:
            img = self.operation2(img, self.magnitude2)
        return img
```
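Each `magnitude_idx` is just an index (0 to 9) into ten evenly spaced values. A stdlib sketch of that lookup, reproducing a few of the `ranges` entries above with a hand-written `linspace` standing in for `np.linspace`; the helper names are illustrative only:

```python
def linspace(start, stop, num):
    # evenly spaced values including both endpoints, like np.linspace(start, stop, num)
    return [start + (stop - start) * i / (num - 1) for i in range(num)]


ranges = {
    "shearX": linspace(0, 0.3, 10),
    "rotate": linspace(0, 30, 10),
    "translateX": linspace(0, 150 / 331, 10),
    "posterize": [int(round(v)) for v in linspace(8, 4, 10)],
}


def magnitude(operation, magnitude_idx):
    # SubPolicy keeps ranges[operation][magnitude_idx] as the strength of that operation
    return ranges[operation][magnitude_idx]


# first entry of ImageNetPolicy: SubPolicy(0.4, "posterize", 8, 0.6, "rotate", 9, ...)
print(magnitude("posterize", 8), magnitude("rotate", 9))  # 4 30.0
```

So index 9 always selects the strongest setting, and index 0 the weakest (0 for the geometric operations).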

Combined policies

  • The paper's authors developed different combinations of sub-policies for different datasets. ImageNet is used as the example below.
  • Each call to ImageNetPolicy randomly picks one of its policies to run; each policy here is a function instantiated from the SubPolicy above.
  • It can also be placed inside PyTorch transforms, so augmented images are generated on the fly during training.

```python
import random


class ImageNetPolicy(object):
    """Randomly choose one of the best 24 Sub-policies on ImageNet.

    Example:
    >>> policy = ImageNetPolicy()
    >>> transformed = policy(image)

    Example as a PyTorch Transform:
    >>> transform = transforms.Compose([
    >>>     transforms.Resize(256),
    >>>     ImageNetPolicy(),
    >>>     transforms.ToTensor()])
    """

    def __init__(self, fillcolor=(128, 128, 128)):
        # note: the list below actually holds 25 entries, a few of them duplicated
        self.policies = [
            SubPolicy(0.4, "posterize", 8, 0.6, "rotate", 9, fillcolor),
            SubPolicy(0.6, "solarize", 5, 0.6, "autocontrast", 5, fillcolor),
            SubPolicy(0.8, "equalize", 8, 0.6, "equalize", 3, fillcolor),
            SubPolicy(0.6, "posterize", 7, 0.6, "posterize", 6, fillcolor),
            SubPolicy(0.4, "equalize", 7, 0.2, "solarize", 4, fillcolor),
            SubPolicy(0.4, "equalize", 4, 0.8, "rotate", 8, fillcolor),
            SubPolicy(0.6, "solarize", 3, 0.6, "equalize", 7, fillcolor),
            SubPolicy(0.8, "posterize", 5, 1.0, "equalize", 2, fillcolor),
            SubPolicy(0.2, "rotate", 3, 0.6, "solarize", 8, fillcolor),
            SubPolicy(0.6, "equalize", 8, 0.4, "posterize", 6, fillcolor),
            SubPolicy(0.8, "rotate", 8, 0.4, "color", 0, fillcolor),
            SubPolicy(0.4, "rotate", 9, 0.6, "equalize", 2, fillcolor),
            SubPolicy(0.0, "equalize", 7, 0.8, "equalize", 8, fillcolor),
            SubPolicy(0.6, "invert", 4, 1.0, "equalize", 8, fillcolor),
            SubPolicy(0.6, "color", 4, 1.0, "contrast", 8, fillcolor),
            SubPolicy(0.8, "rotate", 8, 1.0, "color", 2, fillcolor),
            SubPolicy(0.8, "color", 8, 0.8, "solarize", 7, fillcolor),
            SubPolicy(0.4, "sharpness", 7, 0.6, "invert", 8, fillcolor),
            SubPolicy(0.6, "shearX", 5, 1.0, "equalize", 9, fillcolor),
            SubPolicy(0.4, "color", 0, 0.6, "equalize", 3, fillcolor),
            SubPolicy(0.4, "equalize", 7, 0.2, "solarize", 4, fillcolor),
            SubPolicy(0.6, "solarize", 5, 0.6, "autocontrast", 5, fillcolor),
            SubPolicy(0.6, "invert", 4, 1.0, "equalize", 8, fillcolor),
            SubPolicy(0.6, "color", 4, 1.0, "contrast", 8, fillcolor),
            SubPolicy(0.8, "equalize", 8, 0.6, "equalize", 3, fillcolor),
        ]

    def __call__(self, img):
        # pick one sub-policy uniformly at random and apply it
        policy_idx = random.randint(0, len(self.policies) - 1)
        return self.policies[policy_idx](img)

    def __repr__(self):
        return "AutoAugment ImageNet Policy"
```

The individual operations

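1. Shear
  • Keep one edge of the image fixed on its axis and slide the opposite edge in the positive or negative direction
  • X direction

  • Y direction

2. Translate
  • X direction

  • Y direction

3. Rotate
  • The thing to watch here is how much the rotation angle affects our use case

4. Color
  • Adjusts color saturation (the balance between color and gray)
    • a factor above 1 makes colors more vivid
    • a factor below 1 fades the image toward grayscale (a factor of 0 gives a black-and-white image); pushing the factor below 0 swings hues toward their complements, which gives the "inverted" look (red to green, purple to yellow)

5. Posterize
  • Keeps fewer bits per color channel, so intensities collapse into a few coarse levels; often used as preprocessing for contour detection

6. Contrast
  • Adjusts the RGB (or HSL) values to make the outlines in the image more distinct

7. Sharpness
  • Works from the pixel gradients to decide how much each position should change: a low gradient means the area is somewhat blurry, i.e. not sharp. Sharpening emphasizes those differences so the image looks crisper and better defined.

8. Brightness
  • Essentially increases the L (lightness) component in HSL

9. AutoContrast
  • Sets the contrast from the image's own intensity range (stretching its darkest pixel to black and its brightest to white) instead of applying a fixed absolute amount

10. Equalize
  • Histogram equalization: makes the image's histogram flatter

11. Invert
  • Every pixel becomes 255 - pixel, so the colors look reversed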
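Several of the operations above are simple per-pixel rules. A plain-Python sketch on a row of grayscale pixels (0 to 255), written for illustration rather than copied from PIL; PIL's real `ImageOps` functions also handle color channels and options such as autocontrast's cutoff, and its equalize differs in detail from the textbook formula used here:

```python
def invert(pixels):
    # 11. Invert: every pixel becomes 255 - pixel
    return [255 - p for p in pixels]


def posterize(pixels, bits):
    # 5. Posterize: keep only the top `bits` bits, so values collapse into 2**bits levels
    mask = (0xFF << (8 - bits)) & 0xFF
    return [p & mask for p in pixels]


def autocontrast(pixels):
    # 9. AutoContrast: stretch linearly so the darkest pixel maps to 0 and the brightest to 255
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


def equalize(pixels):
    # 10. Equalize: remap through the cumulative histogram to flatten the histogram
    counts = [0] * 256
    for p in pixels:
        counts[p] += 1
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        return list(pixels)
    return [round((cdf[p] - cdf_min) * 255 / (n - cdf_min)) for p in pixels]


row = [50, 60, 60, 70, 200]
print(invert(row))        # [205, 195, 195, 185, 55]
print(posterize(row, 2))  # [0, 0, 0, 64, 192]
print(autocontrast(row))  # [0, 17, 17, 34, 255]
```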

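3.3 Implementation example

  • Because our use case differs from the data in the paper's experiments, we used a combination of the combined policies (choosing at random among the ImageNet, CIFAR-10 and SVHN policies)
  • One thing to watch: since our use case is OCR, the images must not be distorted too much, e.g. losing one or two characters because of translation or rotation. We therefore reduced the magnitudes, mainly:
    • Shear: 1/10 of the original default shear
    • Rotate: 1/10 of the original default rotation angle
    • Translate: 1/10 of the original default shift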
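Scaling a magnitude range by 1/10 shrinks every one of its ten bins by the same factor. A short sketch using the upper bounds from the `ranges` dictionary in 3.2 (shear 0.3, rotate 30 degrees, translate 150/331, assumed here to be a fraction of the image size). How the notes' modified `autoaugment` module applies the factor is not shown, so this only illustrates the resulting bounds; `scaled_bins` is a hypothetical helper:

```python
ORIGINAL_MAX = {
    "shear": 0.3,            # shearX / shearY
    "rotate": 30.0,          # degrees
    "translate": 150 / 331,  # translateX / translateY, assumed relative to the image size
}
SCALE = 0.1  # the 1/10 reduction described above


def scaled_bins(max_value, scale=SCALE, num=10):
    # ten evenly spaced magnitudes from 0 up to the reduced maximum
    top = max_value * scale
    return [top * i / (num - 1) for i in range(num)]


for name, max_value in ORIGINAL_MAX.items():
    bins = scaled_bins(max_value)
    print(f"{name:<9} default max {max_value:.4f} -> reduced max {bins[-1]:.4f}")
```

The strongest rotation drops from 30 degrees to 3 degrees, small enough that a text line should stay inside its crop.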
```python
import random

import numpy as np
from PIL import Image

# autoaugment is the DeepVoltaire module, with the magnitudes reduced as described above
from autoaugment import ImageNetPolicy, CIFAR10Policy, SVHNPolicy

image_net_policy = ImageNetPolicy()
cifar10_policy = CIFAR10Policy()
svhn_policy = SVHNPolicy()

augmentation_dict = {
    "image_net": image_net_policy,
    "cifar10": cifar10_policy,
    "svhn": svhn_policy,
}
option_list = list(augmentation_dict)


def get_transformed_image(image: np.ndarray) -> np.ndarray:
    # pick one of the three policy families at random; it then picks one of its sub-policies
    option = random.choice(option_list)
    transformed_image = np.array(augmentation_dict[option](Image.fromarray(image)))
    return transformed_image
```

3.4 Results

  • Loss and validation accuracy during training

  • Final performance: 92.18%

| Method | 04.acct_id | 06.bank_acct | 05.bank_code | 03.card_id | 08.post_acct | 07.post_number | avg |
|---|---|---|---|---|---|---|---|
| Baseline | 0.9413 | 0.9222 | 0.9670 | 0.9209 | 0.9333 | 0.9555 | 0.9372 |
| AutoAugment | 0.9157 | 0.9041 | 0.9556 | 0.9088 | 0.9166 | 0.9500 | 0.9218 |

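3.5 Personal takeaways

  • What sets AutoAugment apart is the RL search that finds the best augmentation policy; used merely as a data-generation tool, it seems rather ordinary
  • Many AutoAugment operations are color transforms, but in this use case the testing data and training data are very consistent in color and sheen, so color-transformed data does not necessarily help
  • The method is more helpful for recognizing animals in CIFAR-10 or ImageNet. My guess is that:
    1. when the same animal has different colors in the training and testing data, AutoAugment can fill the gap
    2. the day/night difference between training and testing data can be handled by Invert
    3. the animals' poses and positions can be covered by shear, translate, rotate, and similar operations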

4. Text-Image-Augmentation

4.1 Introduction and usage scenario

  • Text-Image-Augmentation implements the augmentation from the paper introduced in Week 14.
  • Usage:

```python
import cv2

# three augmentation effects in total
from augment import distort, stretch, perspective

im = cv2.imread(".png")

# number of segments the image is split into
segment = 4

# image distortion
distort_img = distort(im, segment)
# horizontal stretching and squeezing
stretch_img = stretch(im, segment)
# perspective effect
perspective_img = perspective(im)
```

  • Example outputs:
    • original:
    • distort:
    • stretch:
    • perspective:

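4.2 Implementation and results

  • Every image in the original dataset gets at least one augmentation effect

```python
import numpy as np

# 1: apply the effect, 0: skip it
# effects are stacked in the order [distort(), stretch(), perspective()]
# [0, 0, 0] is re-drawn so that at least one effect is applied
mask = np.random.randint(2, size=3)
while not mask.any():
    mask = np.random.randint(2, size=3)
```

  • Resulting dataset samples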

    • original:
    • distort:
    • stretch:
    • perspective:
  • Model results

| Method | 04.acct_id | 06.bank_acct | 05.bank_code | 03.card_id | 08.post_acct | 07.post_number | avg |
|---|---|---|---|---|---|---|---|
| Baseline | 0.9413 | 0.9222 | 0.9670 | 0.9209 | 0.9333 | 0.9555 | 0.9372 |
| Text-Image-Augmentation | 0.9340 | 0.9417 | 0.9599 | 0.9451 | 0.9444 | 0.9611 | 0.9465 |
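Re-drawing `[0, 0, 0]` is rejection sampling: each of the 7 non-empty on/off masks ends up equally likely. Enumerating them (plain Python, exact fractions, no randomness) shows that each effect is applied with probability 4/7 and an image receives 12/7 ≈ 1.71 effects on average:

```python
from fractions import Fraction
from itertools import product

EFFECTS = ["distort", "stretch", "perspective"]

# all on/off masks except (0, 0, 0), which the sampler re-draws
masks = [m for m in product((0, 1), repeat=3) if any(m)]
p_mask = Fraction(1, len(masks))  # uniform over the surviving masks

p_effect = {name: sum(p_mask for m in masks if m[i]) for i, name in enumerate(EFFECTS)}
expected_effects = sum(p_mask * sum(m) for m in masks)

print(len(masks))        # 7
print(p_effect)          # every effect: Fraction(4, 7)
print(expected_effects)  # 12/7
```

Because every image yields exactly one augmented copy, the augmented count (72,403) equals the original count in the 2.1 table.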

5. RandAugment [paper]

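5.1 Overview

AutoAugment works well, but searching for its augmentation policies is computationally expensive, and the policies are only suboptimal when transferred to other datasets.
For example, on the COCO dataset RandAugment performs similarly to AutoAugment, yet with a vastly smaller search space.

The authors therefore proposed RandAugment, which controls the policy with only two parameters (reducing computation) while achieving comparable results:

n: the number of operations picked, each with equal probability, out of K operations to form a policy ($K^n$ combinations in total)
m: the magnitude (strength)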

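5.2 Usage and experiment setup

  • imgaug
    - simple to use
    - includes the default horizontal flip and crop with p=50%
    - an extension of the PyTorch version

```python
import imgaug.augmenters as iaa

aug = iaa.RandAugment(n=2, m=9)
```

```python
# num_layers presumably plays the role of n and magnitude of m
def distort_image_with_randaugment(image, num_layers, magnitude):
    ...  # function body not included in these notes
```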
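  • pytorch
    - the code used in this experiment
    - no default p=50% horizontal flip and crop
    - value ranges tweaked so that the text in the image stays as intact as possible
    - n = 2, m = 4

```python
import random


class RandAugment:
    def __init__(self, n, m):
        self.n = n  # number of operations applied to each image
        self.m = m  # magnitude, in [0, 30]
        self.augment_list = augment_list()

    def __call__(self, img):
        # draw n operations uniformly at random (with replacement)
        ops = random.choices(self.augment_list, k=self.n)
        for op, minval, maxval in ops:
            # map the global magnitude m linearly onto this operation's own range
            val = (float(self.m) / 30) * float(maxval - minval) + minval
            img = op(img, val)
        return img
```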

5.3 Operations

```python
def augment_list():
    # 16 operations and their (min, max) value ranges;
    # AutoContrast, Equalize, ... are PIL-based operation functions defined elsewhere (not shown)
    l = [
        (AutoContrast, 0, 1),
        (Equalize, 0, 1),
        (Invert, 0, 1),
        (Rotate, 0, 10),  # original: (Rotate, 0, 30)
        (Posterize, 0, 4),
        (Solarize, 0, 256),
        (SolarizeAdd, 0, 110),
        (Color, 0.1, 1.9),
        (Contrast, 0.1, 1.9),
        (Brightness, 0.1, 1.9),
        (Sharpness, 0.1, 1.9),
        (ShearX, 0., 0.3),
        (ShearY, 0., 0.3),
        (CutoutAbs, 0, 40),
        (TranslateXabs, 0., 50),  # original: (TranslateXabs, 0., 100)
        (TranslateYabs, 0., 50),  # original: (TranslateYabs, 0., 100)
    ]
    return l
```
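With m = 4, `RandAugment.__call__` maps the magnitude linearly onto each operation's range through `val = m / 30 * (max - min) + min`. A small sketch evaluating that formula for the `augment_list()` entries above, plus the number of ordered operation pairs for n = 2 (`random.choices` samples with replacement, so there are K^n of them). Operation names are plain strings here because the real operation functions are not shown:

```python
N, M = 2, 4  # settings used in this experiment

# (name, min, max) copied from augment_list() above
ops = [
    ("AutoContrast", 0, 1), ("Equalize", 0, 1), ("Invert", 0, 1),
    ("Rotate", 0, 10), ("Posterize", 0, 4), ("Solarize", 0, 256),
    ("SolarizeAdd", 0, 110), ("Color", 0.1, 1.9), ("Contrast", 0.1, 1.9),
    ("Brightness", 0.1, 1.9), ("Sharpness", 0.1, 1.9), ("ShearX", 0.0, 0.3),
    ("ShearY", 0.0, 0.3), ("CutoutAbs", 0, 40), ("TranslateXabs", 0.0, 50),
    ("TranslateYabs", 0.0, 50),
]


def op_value(m, minval, maxval):
    # same linear mapping as RandAugment.__call__, with m in [0, 30]
    return (float(m) / 30) * float(maxval - minval) + minval


values = {name: op_value(M, lo, hi) for name, lo, hi in ops}
print(round(values["Rotate"], 2))         # 1.33 (degrees)
print(round(values["TranslateXabs"], 2))  # 6.67 (pixels)
print(len(ops) ** N)                      # 256 ordered pairs (K = 16)
```

At m = 4 the geometric operations stay small (about 1.3 degrees of rotation, under 7 pixels of shift), which fits the goal of keeping the text intact.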

Examples

With n = 2 and m = 9, 17, and 28:

5.4 Results

Output of RandAugment(n=2, m=4) in this experiment

  • AutoContrast + Rotate

  • Rotate + TranslateXabs

  • TranslateXabs + TranslateYabs

Model accuracy and training loss (from epoch 80 onward)

Testing accuracy: 0.9299

3. Comparing the augmentation methods

| Method | Features | Source | Recommendation |
|---|---|---|---|
| Augmentor | 1. Installable as a package; 2. Just specify the image directory; 3. Operations can be combined; 4. Highly abstracted API; 5. Shows a progress bar; 6. Each operation's probability can be set; 7. Multi-threaded generation is fast; 8. The developer's GitHub is thorough and includes tutorials; 9. Can feed data directly to a generator | link | |
| imgaug | 1. One-line install; 2. Many ways to combine augmenters; 3. Batch processing; 4. Labels for augmented data are generated in seconds | aleju/imgaug | |
| AutoAugment | 1. The combined policies are already written, just import them; 2. A policy can go straight into torchvision.transforms as part of the training pipeline; 3. Seems to help classification more, since the paper's gains are on classification; 4. Many operations are color transforms, so prune or adjust the policies or their magnitudes to fit the use case | DeepVoltaire/AutoAugment | 1. I find it quite easy to use; 2. but it seems to help classification tasks more |
| Text-Image-Augmentation | 1. Just import it to get the augmentations; 2. Needs a little preliminary experimentation to make sure the images look right; 3. Designed specifically for text images, with good results | Text-Image-Augmentation-python | Produces clearly visible changes to text images right away |
| RandAugment | 1. Lower search cost; 2. Perhaps better suited to image classification | ildoonet/pytorch-randaugment | |

4. Tips for this use case

  • Data is the foundation of machine learning, but more is not always better; choose augmentations that target the prediction goal and the model's weaknesses
  • For this project's scenario
    • Suitable
      • text deformation (distort, perspective transform)
      • occlusion (e.g. a few small black dots)
      • small-angle rotation
    • Not suitable
      • flips (mirroring about the y or x axis)
      • large rotations
      • drastic color changes (e.g. black -> white)
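As a rough illustration of the "suitable" list, a plain-Python sketch of two OCR-friendly operations on a grayscale image stored as nested lists: small-angle rotation (nearest neighbor, white fill) and occlusion with a few small black dots. Flips and color inversion are deliberately absent. The function names, the dot count, and the dot size are assumptions for illustration; the 5-degree default cap mirrors the `rotate=(-5, 5)` of the imgaug setup in 2.5.

```python
import math
import random


def small_rotate(img, max_deg=5.0, rng=random, fill=255):
    """Rotate by a random angle in [-max_deg, max_deg] about the center (nearest neighbor)."""
    angle = math.radians(rng.uniform(-max_deg, max_deg))
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # inverse mapping: find the source pixel that lands on (x, y)
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            row.append(img[iy][ix] if 0 <= ix < w and 0 <= iy < h else fill)
        out.append(row)
    return out


def add_dots(img, n_dots=3, size=1, rng=random):
    """Occlude the image with a few small black squares."""
    h, w = len(img), len(img[0])
    out = [list(row) for row in img]
    for _ in range(n_dots):
        x0, y0 = rng.randrange(w), rng.randrange(h)
        for y in range(y0, min(h, y0 + size)):
            for x in range(x0, min(w, x0 + size)):
                out[y][x] = 0
    return out


rng = random.Random(7)
text_line = [[255] * 20 for _ in range(6)]  # blank 20x6 crop, white background
augmented = add_dots(small_rotate(text_line, rng=rng), rng=rng)
```

Inverse mapping (looking up the source pixel for every output pixel) avoids holes in the rotated image, and the white fill matches a light document background.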