CV Training Pipeline

###### tags: `cv_infra` `CV team`

## Annotation format

- Labeling tool: labelme
- Label format: JSON
- Label example (`imagePath` is a relative path; `imageData` may also hold a base64 string)

```json
{
  "version": "4.5.9",
  "flags": {},
  "shapes": [
    {
      "label": "acct_id-A123456789",
      "points": [
        [100.123456687, 100.354647465],
        [100.123456687, 200.354647465],
        [200.123456687, 200.354647465],
        [200.123456687, 100.354647465]
      ],
      "group_id": null,
      "shape_type": "polygon",
      "flags": {}
    },
    { ... }
  ],
  "imagePath": "../Desktop/test.jpg",
  "imageData": null,
  "imageHeight": 634,
  "imageWidth": 772
}
```

## Classification model

- Folder structure

```text
/data
├── train
│   ├── ID_FRONT
│   ├── ID_BACK
│   ├── PASSBOOK_COVER
│   ├── PASSBOOK_INNER
│   ├── NTB_FINANCIAL_STATEMENT
│   ├── WITHHOLDING_STATEMENT
│   └── OTHERS
│
└── test
    ├── ID_FRONT
    ├── ID_BACK
    ├── PASSBOOK_COVER
    ├── PASSBOOK_INNER
    ├── NTB_FINANCIAL_STATEMENT
    ├── WITHHOLDING_STATEMENT
    └── OTHERS
```

- Function that splits the data into train / validation sets

```python
import numpy as np
import torch
from torchvision import datasets, transforms
from torch.utils.data.sampler import SubsetRandomSampler

data_dir = '/data/train'


def load_split_train_test(datadir, valid_size=0.2):
    """Split one ImageFolder directory into train / validation loaders."""
    # Resize to a fixed 224x224 so that images with different aspect
    # ratios can be stacked into one batch.
    train_transforms = transforms.Compose([transforms.Resize((224, 224)),
                                           transforms.ToTensor()])
    test_transforms = transforms.Compose([transforms.Resize((224, 224)),
                                          transforms.ToTensor()])
    # The same folder is loaded twice so each split can have its own transforms.
    train_data = datasets.ImageFolder(datadir, transform=train_transforms)
    test_data = datasets.ImageFolder(datadir, transform=test_transforms)

    # Shuffle the indices and hold out the first `valid_size` fraction.
    num_train = len(train_data)
    indices = list(range(num_train))
    split = int(np.floor(valid_size * num_train))
    np.random.shuffle(indices)
    train_idx, test_idx = indices[split:], indices[:split]

    train_sampler = SubsetRandomSampler(train_idx)
    test_sampler = SubsetRandomSampler(test_idx)
    trainloader = torch.utils.data.DataLoader(train_data, sampler=train_sampler, batch_size=64)
    testloader = torch.utils.data.DataLoader(test_data, sampler=test_sampler, batch_size=64)
    return trainloader, testloader


trainloader, testloader = load_split_train_test(data_dir, 0.2)
```

[Reference](https://towardsdatascience.com/how-to-train-an-image-classifier-in-pytorch-and-use-it-to-perform-basic-inference-on-single-images-99465a1e9bf5)

---

## Localization model

### Input label (one label file (.json) per image)

```json
{
  "shapes": [
    {
      "tag": "acct_id-A123456789",
      "points": [
        [100.123456687, 100.354647465],
        [100.123456687, 200.354647465],
        [200.123456687, 200.354647465],
        [200.123456687, 100.354647465]
      ],
      "group_id": null,
      "shape_type": "polygon",
      "flags": {}
    },
    { ... }
  ],
  "imagePath": "../Desktop/test.jpg",
  "imageHeight": 634,
  "imageWidth": 772
}
```

- `imagePath` is a relative path.
- `group_id`: we hope 育銓 can change this field so that English text can be entered.
- Each localization model converts the label file above into its own input format.

```python
# yolov4 (lives in the yolov4 training code)
def convert_ano():
    """
    1. Convert points to xmin, ymin, xmax, ymax (watch out for int vs. float).
    2. tag is label.split('-')[0]
    """
```

- Convert the JSON above into the YOLO input format

![](https://i.imgur.com/4AUDjuh.png)

- Convert the JSON above into the input format required by other detection models (segmentation and so on)

---

### Yolov5: 13.9k stars

github: https://github.com/ultralytics/yolov5 from Ultralytics (API service)

Documentation: https://docs.ultralytics.com

- Folder structure:

![](https://i.imgur.com/85NvgsF.png)

    - Split into train/test
    - One txt per jpg

![](https://i.imgur.com/4AUDjuh.png)

https://www.kaggle.com/ultralytics/coco128

- Label tools:
    1. CVAT: https://github.com/openvinotoolkit/cvat
    2. makesense: makesense.ai

Export your labels to YOLO format, with one `*.txt` file per image (if an image contains no objects, no `*.txt` file is required).

1. One row per object.
2. Each row is in `class x_center y_center width height` format.
3. Box coordinates must be in normalized xywh format (from 0 to 1). If your boxes are in pixels, divide x_center and width by image width, and y_center and height by image height.
4. Class numbers are zero-indexed (start from 0).

![](https://i.imgur.com/9EI4U7h.png)

---

## Recognition model

- Label file conversion function

```python
def gen_ocr_data(input_img, input_json, output_path):
    """
    Take the label file (.json) that temporary staff produced with labelme
    and do the following:
    1. Crop the image (.jpg)
    2. Generate a label file (.json) for each crop
    3. Write to output_path/img/*.jpg, output_path/json/*.json
    """
```

- JSON examples (one label file (.json) per image)

Train / evaluation input:

```json
{
  "filepath": "/project/cc-apa-ocr/test_crop.jpg",
  "tag": "acct_id",
  "label": "A123456789"
}
```

Evaluation output:

```json
{
  "filepath": "/project/cc-apa-ocr/test_crop.jpg",
  "tag": "acct_id",
  "label": "A123456789",
  "pred": "xxx",
  "prob": 0.99
}
```

Evaluation detection output:

```json
{
  "filepath": "/project/cc-apa-ocr/test.jpg",
  "tag": "acct_id",
  "xmin":,
  "ymin":,
  "xmax":,
  "ymax":,
  "pred_xmin":,
  "pred_ymin":,
  "pred_xmax":,
  "pred_ymax":,
  "prob": 0.99
}
```

| | Paddle | EasyOCR | ChineseOCR |
| -------- | -------- | -------- | -------- |
| Stars | 13.3k | 12.1k | 2.5k |
| Format | txt | csv | json |
| Folder structure | train/test | train/test | train/test |
| Images per label file | many jpg, 1 txt | many jpg, 1 csv | many jpg, 1 json |

### Paddle (txt): 13.3k stars

github: https://github.com/PaddlePaddle/PaddleOCR

![](https://i.imgur.com/rbQSLf9.png)

Each row holds the file path, the label, and whatever else the model needs (such as width, height, label index).

![](https://i.imgur.com/9EjIiyy.jpg)

- Label tools: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/data_annotation_en.md
    1. labelImg
    2. rolabelImg: draws rotated boxes
    3. labelme: draws polygons
    4. PPOCRLabel: https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.1/PPOCRLabel

### EasyOCR (csv): 12.1k stars

github: https://github.com/JaidedAI/EasyOCR

Document: https://jaided.ai/easyocr/modelhub/

Each row holds the file path and the label.

![](https://i.imgur.com/OZACdDq.png)

Dataset: https://jaided.ai/easyocr/modelhub/

### ChineseOCR (json): 2.5k stars

github: https://github.com/xiaofengShi/CHINESE-OCR

#### Annotation Format:

Uses the dataset at github: https://ctwdataset.github.io

![](https://i.imgur.com/oSLgBgA.png)

https://ctwdataset.github.io/tutorial/1-basics.html#Download-images-and-annotations

---

Discussion:

1. Try out several label tools first to see which is easiest to use and what output each one produces.
    - [CVAT](https://github.com/openvinotoolkit/cvat)
    - [makesense](http://makesense.ai/)
    - labelImg
    - rolabelImg
    - labelme [-> labelImg family](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/data_annotation_en.md)
2. Label tools produce the coordinates (for localization) and the label (for recognition) in a single file, like the XML our labelImg produces, so we need to write a conversion tool that separates the annotation data for the localization model from that for the recognition model.
3. Use one json per jpg (recognition model) and one txt per jpg (localization model). This is clear and makes it easy to correct wrong content.

| | Yolo v5 | Localization model |
| -------- | -------- | -------- |
| Format | txt | txt |
| Folder structure | train/test | train/test |
| Images per label file | 1 jpg, 1 txt | 1 jpg, 1 txt |

| | Paddle | EasyOCR | ChineseOCR | Recognition model |
| -------- | -------- | -------- | -------- | -------- |
| Format | txt | csv | json | json |
| Folder structure | train/test | train/test | train/test | train/test |
| Images per label file | many jpg, 1 txt | many jpg, 1 csv | many jpg, 1 json | 1 jpg, 1 json |
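
The two steps in the `convert_ano` docstring and the YOLO label rules quoted in the Yolov5 section (one row per object, normalized `class x_center y_center width height`) could be combined roughly as in the sketch below. This is an illustration only, not the team's yolov4 code: the function name `labelme_to_yolo` and the `class_ids` mapping are hypothetical, the tag is assumed to live under `tag` (falling back to labelme's `label`), and each polygon is reduced to its axis-aligned bounding box.

```python
def labelme_to_yolo(label, class_ids):
    """Turn one labelme-style dict into a list of YOLO txt rows.

    `class_ids` is a hypothetical mapping from tag name to zero-indexed class.
    """
    # Use floats throughout to avoid accidental integer truncation.
    img_w = float(label["imageWidth"])
    img_h = float(label["imageHeight"])
    rows = []
    for shape in label["shapes"]:
        # Assumption: the tag is stored under "tag", else under "label".
        raw = shape.get("tag", shape.get("label"))
        tag = raw.split("-")[0]

        # Polygon points -> axis-aligned box (xmin, ymin, xmax, ymax).
        xs = [float(p[0]) for p in shape["points"]]
        ys = [float(p[1]) for p in shape["points"]]
        xmin, xmax = min(xs), max(xs)
        ymin, ymax = min(ys), max(ys)

        # Pixel box -> normalized xywh: x values are divided by the image
        # width, y values by the image height.
        x_center = (xmin + xmax) / 2 / img_w
        y_center = (ymin + ymax) / 2 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h
        rows.append(
            f"{class_ids[tag]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return rows


# Example built from the sample label file in this note.
example = {
    "shapes": [
        {
            "tag": "acct_id-A123456789",
            "points": [
                [100.123456687, 100.354647465],
                [100.123456687, 200.354647465],
                [200.123456687, 200.354647465],
                [200.123456687, 100.354647465],
            ],
        }
    ],
    "imageHeight": 634,
    "imageWidth": 772,
}
rows = labelme_to_yolo(example, {"acct_id": 0})
print("\n".join(rows))
```

Writing `rows` to a `.txt` file with the same stem as the image gives the one-txt-per-jpg layout. An image with no shapes yields an empty list, in which case no file needs to be written.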
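
For the recognition side, the bookkeeping part of `gen_ocr_data` (one record per crop in the `filepath` / `tag` / `label` layout shown in the train / evaluation input example) might look like the following sketch. It is a hedged illustration under stated assumptions: `build_ocr_records` is a hypothetical helper, the `<stem>_<index>.jpg` crop naming is invented here, and the actual image cropping and file writing are left out so the example stays standard-library only.

```python
import math
import os


def build_ocr_records(label, output_path):
    """Return (record, crop_box) pairs, one per shape in a label dict.

    crop_box is (left, upper, right, lower) in whole pixels.
    """
    stem = os.path.splitext(os.path.basename(label["imagePath"]))[0]
    img_w, img_h = label["imageWidth"], label["imageHeight"]
    pairs = []
    for i, shape in enumerate(label["shapes"]):
        # "acct_id-A123456789" -> tag "acct_id", text "A123456789".
        raw = shape.get("tag", shape.get("label"))
        tag, _, text = raw.partition("-")

        # Round the box outwards and clamp it to the image bounds.
        xs = [p[0] for p in shape["points"]]
        ys = [p[1] for p in shape["points"]]
        box = (
            max(0, math.floor(min(xs))),
            max(0, math.floor(min(ys))),
            min(img_w, math.ceil(max(xs))),
            min(img_h, math.ceil(max(ys))),
        )
        record = {
            # Hypothetical naming scheme for the cropped image.
            "filepath": os.path.join(output_path, "img", f"{stem}_{i}.jpg"),
            "tag": tag,
            "label": text,
        }
        pairs.append((record, box))
    return pairs


# Example built from the sample label file in this note.
example = {
    "shapes": [
        {
            "label": "acct_id-A123456789",
            "points": [
                [100.123456687, 100.354647465],
                [100.123456687, 200.354647465],
                [200.123456687, 200.354647465],
                [200.123456687, 100.354647465],
            ],
        }
    ],
    "imagePath": "../Desktop/test.jpg",
    "imageHeight": 634,
    "imageWidth": 772,
}
pairs = build_ocr_records(example, "/tmp/ocr")
for record, box in pairs:
    print(record, box)
```

Each `record` can then be dumped with `json.dump` into `output_path/json/`, and each `box` passed to an image library's crop call to produce the matching file under `output_path/img/`.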