# You Only Look At CoefficienTs (YOLACT): Installation and Hands-On Guide

## Introduction

```
██╗   ██╗ ██████╗ ██╗      █████╗  ██████╗████████╗
╚██╗ ██╔╝██╔═══██╗██║     ██╔══██╗██╔════╝╚══██╔══╝
 ╚████╔╝ ██║   ██║██║     ███████║██║        ██║
  ╚██╔╝  ██║   ██║██║     ██╔══██║██║        ██║
   ██║   ╚██████╔╝███████╗██║  ██║╚██████╗   ██║
   ╚═╝    ╚═════╝ ╚══════╝╚═╝  ╚═╝ ╚═════╝   ╚═╝
```

A simple, fully convolutional model for real-time instance segmentation.

[YOLACT GitHub](https://github.com/dbolya/yolact)

## 1. Environment Setup

1. Clone the repository

   ```
   git clone https://github.com/dbolya/yolact.git
   ```

2. Set up the Python environment

   - Create a virtual environment

     ```
     python -m venv <env name>
     ```

   - Activate the environment

     ```
     venv\Scripts\activate
     ```

   - Upgrade pip

     ```
     python -m pip install -U pip
     ```

   - Install the required packages

     1. Base packages

        ```
        # Cython needs to be installed before pycocotools
        pip install cython
        pip install opencv-python pillow pycocotools matplotlib
        ```

     2. Install `torch` and `torchvision`

        > Install PyTorch 1.0.1 (or higher) and TorchVision.
        > Look up the install command on the PyTorch site ([previous versions](https://pytorch.org/get-started/previous-versions/)) and pick the one that matches your machine's CUDA version.

        The following is for CUDA 11.7:

        ```
        pip install torch==2.0.1+cu117 torchvision==0.15.2+cu117 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu117
        ```
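After installing, it is worth confirming that the CUDA build of PyTorch actually sees your GPU. A minimal check, run inside the activated venv (the version string will differ if you installed a different build):

```python
import torch

# Should print the installed version, e.g. 2.0.1+cu117
print(torch.__version__)

# Should print True when the CUDA wheel matches your driver;
# False usually means a CPU-only wheel or a driver/CUDA mismatch
print(torch.cuda.is_available())
```

If this prints `False`, reinstall using the index URL that matches your CUDA version before moving on.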
## 2. Installing Labelme (Python version) on Windows

https://hackmd.io/@ytg9xX-YRdGzl4AA7L8t_A/rJQ5mrCbp#方法1-Python版安裝

## 3. Evaluation

- [Download the trained weights](https://github.com/dbolya/yolact#evaluation)

### Parameters

The most useful arguments are listed below; see `eval.py` for the full list.

```
--trained_model: path to the model weights
--image: path to an image (displays the result without saving it)
--image: input path:output path (saves the result without displaying it)
--images: input folder:output folder (the output folder does not need to exist beforehand)
--score_threshold: confidence threshold
--top_k: limits the number of predictions
    -- default: 5
    -- type: int
--cuda: whether to run on the GPU
    -- default: True
    -- type: bool
--fast_nms: whether to use the faster NMS, which is slightly less accurate
    -- default: True
    -- type: bool
--cross_class_nms: whether NMS is applied across classes instead of per class
    -- default: False
    -- type: bool
--video_multiframe: number of frames processed at once when running on video; raising it improves FPS
    -- default: 1
    -- type: int
```

### Evaluating images

```
# Display qualitative results on the specified image.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=my_image.png

# Process an image and save it to another file.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=input_image.png:output_image.png

# Process a whole folder of images.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --images=path/to/input/folder:path/to/output/folder
```

### Evaluating video

```
# Display a video in real-time. "--video_multiframe" will process that many frames at once for improved performance.
# If you want, use "--display_fps" to draw the FPS directly on the frame.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=my_video.mp4

# Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=0

# Process a video and save it to another file. This uses the same pipeline as the ones above now, so it's fast!
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=input_video.mp4:output_video.mp4
```

## 4. Training

Download the pretrained backbone weights:

> Use an ImageNet-pretrained model and place it in `./weights`.

- For Resnet101, download `resnet101_reducedfc.pth`. [Link](https://drive.google.com/file/d/1tvqFPd4bJtakOlmn-uIA492g2qurRChj/view?usp=sharing)
- For Resnet50, download `resnet50-19c8e357.pth`. [Link](https://drive.google.com/file/d/1Jy3yCdbatgXa5YYIdTCRrSV0S9V5g1rn/view?usp=sharing)
- For Darknet53, download `darknet53.pth`. [Link](https://drive.google.com/file/d/17Y431j4sagFpSReuPNoFcj9h7azDTZFf/view)

All weights are saved in the `./weights` directory by default with the file name `<config>_<epoch>_<iter>.pth`.

Training can be stopped at any time with `Ctrl+C`; a checkpoint named `*_interrupt.pth` is saved at the most recent iteration.

### Training on COCO (untested)

```
# Trains using the base config with a batch size of 8 (the default).
python train.py --config=yolact_base_config

# Trains yolact_base_config with a batch_size of 5. For the 550px models, 1 batch takes up around 1.5 gigs of VRAM, so specify accordingly.
python train.py --config=yolact_base_config --batch_size=5

# Resume training yolact_base with a specific weight file and start from the iteration specified in the weight file's name.
python train.py --config=yolact_base_config --resume=weights/yolact_base_10_32100.pth --start_iter=-1

# Use the help option to see a description of all available command line arguments
python train.py --help
```

### Training on your own data

#### Annotating the data

- [Annotate with `Labelme`](#2-installing-labelme-python-version-on-windows)

#### Split the annotated dataset into `train` and `valid` folders and convert it to `coco` format

```
# Dataset layout
|---- mrtDataset
|     |---- train
|     |     |---- *.jpg
|     |     |---- *.json
|     |---- valid
|     |     |---- *.jpg
|     |     |---- *.json
```

#### Converting the data

- Create `label.txt`

  > The file must contain `__ignore__` and `_background_`.

  Example `label.txt`:

  ```
  __ignore__
  _background_
  <your labels>
  .
  .
  ```

- Convert to COCO format

  > Use `examples/instance_segmentation/labelme2coco.py` from the labelme repository. The conversion command is:

  ```
  python labelme2coco.py <path to images and annotations> <output folder> --labels <path to label.txt>
  ```

The conversion produces:

- JPEGImages: the original images
- Visualization: the images with their annotations drawn on them
- annotations.json: all annotations in COCO format

![](https://hackmd.io/_uploads/H16xJURZ6.png)

Final dataset layout:

```
|---- mrtDataset
|     |---- train
|     |     |---- JPEGImages
|     |     |---- Visualization
|     |     |---- annotations.json
|     |---- valid
|     |     |---- JPEGImages
|     |     |---- Visualization
|     |     |---- annotations.json
```
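Before pointing YOLACT at the converted dataset, it can save time to sanity-check `annotations.json`. A minimal sketch using `pycocotools` (the path assumes the `mrtDataset` layout above; adjust it to your own dataset):

```python
from pycocotools.coco import COCO

# Path assumes the mrtDataset layout shown above
coco = COCO('mrtDataset/train/annotations.json')

# The category names printed here are what 'class_names'
# in the config below has to match
cats = coco.loadCats(coco.getCatIds())
print('categories :', [c['name'] for c in cats])
print('images     :', len(coco.getImgIds()))
print('annotations:', len(coco.getAnnIds()))
```

If a label you did not expect shows up in the category list, fix `label.txt` and rerun the conversion before editing the config.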
#### Modify `data/config.py`

> Paste in the block below and update the dataset paths to your own.

```
my_custom_dataset = dataset_base.copy({
    'name': 'My Dataset',

    'train_images': r'D:\mrt\segmentation\train',
    'train_info': r'D:\mrt\segmentation\train\annotations.json',

    'valid_images': r'D:\mrt\segmentation\valid',
    'valid_info': r'D:\mrt\segmentation\valid\annotations.json',

    'has_gt': True,
    'class_names': ('front_wheel', ),
    'label_map': {1: 1}
})
```

> Then update the YOLACT training config. You can follow the block below directly and change:
> - dataset: the dataset variable defined above
> - num_classes: number of training classes + 1 (for the background)
> - max_iter: number of training iterations
> - lr_steps: adjust with reference to the values below
> - backbone: resnet101_backbone (or whichever backbone you want to train with)

```
# ----------------------- YOLACT v1.0 CONFIGS ----------------------- #

yolact_base_config = coco_base_config.copy({
    'name': 'yolact_base',

    # Dataset stuff
    'dataset': my_custom_dataset,
    'num_classes': 2,

    # Image Size
    'max_size': 550,

    # Training params
    'lr_steps': (2800, 6000, 7000, 7500),
    'max_iter': 8000,

    # Backbone Settings
    'backbone': resnet101_backbone.copy({
        'selected_layers': list(range(1, 4)),
        'use_pixel_scales': True,
        'preapply_sqrt': False,
        'use_square_anchors': True, # This is for backward compatability with a bug

        'pred_aspect_ratios': [ [[1, 1/2, 2]] ]*5,
        'pred_scales': [[24], [48], [96], [192], [384]],
    }),

    # FPN Settings
    'fpn': fpn_base.copy({
        'use_conv_downsample': True,
        'num_downsample': 2,
    }),

    # Mask Settings
    'mask_type': mask_type.lincomb,
    'mask_alpha': 6.125,
    'mask_proto_src': 0,
    'mask_proto_net': [(256, 3, {'padding': 1})] * 3 + [(None, -2, {}), (256, 3, {'padding': 1})] + [(32, 1, {})],
    'mask_proto_normalize_emulate_roi_pooling': True,

    # Other stuff
    'share_prediction_module': True,
    'extra_head_net': [(256, 3, {'padding': 1})],

    'positive_iou_threshold': 0.5,
    'negative_iou_threshold': 0.4,

    'crowd_iou_threshold': 0.7,

    'use_semantic_segmentation_loss': True,
})
```

#### Modify `train.py`

> Set `shuffle=False`.

```
# Original
data_loader = data.DataLoader(dataset, args.batch_size,
                              num_workers=args.num_workers,
                              shuffle=True, collate_fn=detection_collate,
                              pin_memory=True)

# Modified
data_loader = data.DataLoader(dataset, args.batch_size,
                              num_workers=args.num_workers,
                              shuffle=False, collate_fn=detection_collate,
                              pin_memory=True)
```

#### Modify `utils\augmentations.py`

```
# Original
def __call__(self, image, masks, boxes=None, labels=None):
    height, width, _ = image.shape
    while True:
        # randomly choose a mode
        mode = random.choice(self.sample_options)

# Modified: here `random` is numpy's random, and np.random.choice()
# only accepts 1-D data, so use the built-in random module instead
import random as rdom

def __call__(self, image, masks, boxes=None, labels=None):
    height, width, _ = image.shape
    while True:
        # randomly choose a mode
        mode = rdom.choice(self.sample_options)
```

#### Run the training

```
python train.py --config=yolact_base_config --save_interval <iterations between checkpoints>
```

## Troubleshooting

- Problem 1: `RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'`. [Fix](#modify-trainpy)
- Problem 2: `ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.` [Fix](#modify-utilsaugmentationspy)

  ```
  np.random.choice()  # only accepts 1-D data, but the data passed to it here is not,
                      # so switch to the built-in random.choice()
  ```

## References

- [Dataset conversion guide (segmentation label annotation)](https://medium.com/ching-i/segmentation-label-%E6%A8%99%E8%A8%BB%E6%95%99%E5%AD%B8-26b8179d661)
- [YOLACT training guide](https://medium.com/ching-i/yolact-%E8%A8%93%E7%B7%B4%E6%95%99%E5%AD%B8-31e0062dc1d9)
- [YOLACT GitHub](https://github.com/dbolya/yolact)