# You Only Look At CoefficienTs (YOLACT): Installation and Hands-On Guide
## Introduction
```
██╗   ██╗ ██████╗ ██╗      █████╗  ██████╗████████╗
╚██╗ ██╔╝██╔═══██╗██║     ██╔══██╗██╔════╝╚══██╔══╝
 ╚████╔╝ ██║   ██║██║     ███████║██║        ██║
  ╚██╔╝  ██║   ██║██║     ██╔══██║██║        ██║
   ██║   ╚██████╔╝███████╗██║  ██║╚██████╗   ██║
   ╚═╝    ╚═════╝ ╚══════╝╚═╝  ╚═╝ ╚═════╝   ╚═╝
```
A simple, fully convolutional model for real-time instance segmentation.
[YOLACT Github](https://github.com/dbolya/yolact)
## 1. Environment Setup
1. Clone the repository
```
git clone https://github.com/dbolya/yolact.git
```
2. Set up the Python environment
- Create a virtual environment
```
python -m venv <env_name>
```
- Activate the environment (Windows)
```
<env_name>\Scripts\activate
```
- Upgrade pip
```
python -m pip install -U pip
```
- Install the required packages
1. Base packages
```
# Cython needs to be installed before pycocotools
pip install cython
pip install opencv-python pillow pycocotools matplotlib
```
2. Install `torch` and `torchvision`
> Install Pytorch 1.0.1 (or higher) and TorchVision
> Look up the install command on the PyTorch previous-versions page ([link](https://pytorch.org/get-started/previous-versions/)) and pick the one matching your machine's CUDA version.

The example below is for CUDA 11.7:
```
pip install torch==2.0.1+cu117 torchvision==0.15.2+cu117 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu117
```
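After installing, it's worth confirming that the CUDA build actually sees your GPU. A minimal check (the version strings in the comments assume the CUDA 11.7 wheels installed above):
```python
import torch
import torchvision

print(torch.__version__)          # e.g. 2.0.1+cu117
print(torchvision.__version__)    # e.g. 0.15.2+cu117
print(torch.cuda.is_available())  # should print True on a working CUDA install
```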
## 2. Install Labelme (Python) on Windows
https://hackmd.io/@ytg9xX-YRdGzl4AA7L8t_A/rJQ5mrCbp#方法1-Python版安裝
## 3. Evaluation
- [Download the trained weights](https://github.com/dbolya/yolact#evaluation)
### Arguments
The flags below are the most useful ones; see `eval.py` for the full list of arguments.
```
--trained_model: path to the weight file
--image: path to an image (displays the result, does not save it)
--image: input_path:output_path (saves the result, does not display it)
--images: input_folder:output_folder (the output folder is created for you)
--score_threshold: confidence threshold
--top_k: maximum number of detections shown per image
    default: 5, type: int
--cuda: whether to run on the GPU
    default: True, type: bool
--fast_nms: use the faster NMS variant, at a small cost in NMS quality
    default: True, type: bool
--cross_class_nms: apply NMS across classes instead of per class
    default: False, type: bool
--video_multiframe: number of frames processed at once during video inference; raises FPS
    default: 1, type: int
```
### Evaluating images
```python=
# Display qualitative results on the specified image.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=my_image.png
# Process an image and save it to another file.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=input_image.png:output_image.png
# Process a whole folder of images.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --images=path/to/input/folder:path/to/output/folder
```
### Evaluating videos
```python=
# Display a video in real-time. "--video_multiframe" will process that many frames at once for improved performance.
# If you want, use "--display_fps" to draw the FPS directly on the frame.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=my_video.mp4
# Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=0
# Process a video and save it to another file. This uses the same pipeline as the ones above now, so it's fast!
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=input_video.mp4:output_video.mp4
```
## 4. Training
Download the pretrained backbone weights:
> Use an ImageNet-pretrained model and put it in `./weights`.
- For Resnet101, download `resnet101_reducedfc.pth`. [link](https://drive.google.com/file/d/1tvqFPd4bJtakOlmn-uIA492g2qurRChj/view?usp=sharing)
- For Resnet50, download `resnet50-19c8e357.pth`. [link](https://drive.google.com/file/d/1Jy3yCdbatgXa5YYIdTCRrSV0S9V5g1rn/view?usp=sharing)
- For Darknet53, download `darknet53.pth`. [link](https://drive.google.com/file/d/17Y431j4sagFpSReuPNoFcj9h7azDTZFf/view)

All weights are saved in the `./weights` directory by default with the file name `<config>_<epoch>_<iter>.pth`.
During training you can press `Ctrl+C` to stop; a `*_interrupt.pth` checkpoint is written for the most recent iteration.
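Such a checkpoint can be handed back to `--resume` like any other weight file, with `--start_iter=-1` recovering the iteration from the filename (the filename below is hypothetical; use whatever `*_interrupt.pth` appears in `./weights`):
```
python train.py --config=yolact_base_config --resume=weights/yolact_base_2_8000_interrupt.pth --start_iter=-1
```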
### Training on COCO data (not tested in this guide)
```
# Trains using the base config with a batch size of 8 (the default).
python train.py --config=yolact_base_config
# Trains yolact_base_config with a batch_size of 5. For the 550px models, 1 batch takes up around 1.5 gigs of VRAM, so specify accordingly.
python train.py --config=yolact_base_config --batch_size=5
# Resume training yolact_base with a specific weight file and start from the iteration specified in the weight file's name.
python train.py --config=yolact_base_config --resume=weights/yolact_base_10_32100.pth --start_iter=-1
# Use the help option to see a description of all available command line arguments
python train.py --help
```
### Training on your own data
#### Annotate the data
- [Annotate with `Labelme`](#2-Install-Labelme-Python-on-Windows)
#### Split the annotated dataset into `train` and `valid` folders, then convert it to `COCO` format
```
# Expected dataset layout
|---- mrtDataset
| |---- train
| | |---- *.jpg
| | |---- *.json
| |---- valid
| | |---- *.jpg
| | |---- *.json
```
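If all your labeled pairs currently sit in one folder, a small script can produce this split. A minimal sketch (the `labeled` source folder and the 80/20 ratio are assumptions; adjust as needed):
```python
import random
import shutil
from pathlib import Path

src = Path('labeled')      # assumed folder holding paired *.jpg / *.json files
dst = Path('mrtDataset')

images = sorted(src.glob('*.jpg'))
random.seed(0)             # fixed seed so the split is reproducible
random.shuffle(images)

split = int(len(images) * 0.8)  # 80% train, 20% valid
for subset, files in (('train', images[:split]), ('valid', images[split:])):
    out = dst / subset
    out.mkdir(parents=True, exist_ok=True)
    for img in files:
        shutil.copy(img, out / img.name)      # copy the image
        ann = img.with_suffix('.json')
        if ann.exists():
            shutil.copy(ann, out / ann.name)  # copy its Labelme annotation
```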
#### Convert the data
- Create `label.txt`
> The first two entries must be `__ignore__` and `_background_`.
Example `label.txt`:
```
__ignore__
_background_
<your labels, one per line>
.
.
```
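With many classes, the label list can also be harvested from the Labelme JSON files rather than typed by hand. A minimal sketch (the dataset path is an assumption):
```python
import json
from pathlib import Path

labels = set()
for ann_file in Path('mrtDataset/train').glob('*.json'):
    with open(ann_file, encoding='utf-8') as f:
        for shape in json.load(f)['shapes']:  # each Labelme shape carries its label
            labels.add(shape['label'])

with open('label.txt', 'w', encoding='utf-8') as f:
    f.write('__ignore__\n_background_\n')     # required first two entries
    f.write('\n'.join(sorted(labels)) + '\n')
```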
- Convert to COCO format
> Use `examples/instance_segmentation/labelme2coco.py` from the labelme repository. The command is:
```
python labelme2coco.py <image_and_annotation_dir> <output_dir> --labels <path_to_label.txt>
```
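For example, with the layout above (the `*_coco` output names are assumptions; `labelme2coco.py` exits if the output folder already exists, so convert into a fresh folder and then rename it to match the final layout below):
```
python labelme2coco.py mrtDataset/train mrtDataset/train_coco --labels label.txt
python labelme2coco.py mrtDataset/valid mrtDataset/valid_coco --labels label.txt
```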
The conversion produces:
- JPEGImages: the original images
- Visualization: the images with their annotations drawn on them
- annotations.json: all annotations in COCO format

Final dataset layout:
```
|---- mrtDataset
| |---- train
| | |---- JPEGImages
| | |---- Visualization
| | |---- annotations.json
| |---- valid
| | |---- JPEGImages
| | |---- Visualization
| | |---- annotations.json
```
#### Modify `data/config.py`
> Paste in the dataset definition below and update the paths to your own data.
```
my_custom_dataset = dataset_base.copy({
    'name': 'My Dataset',

    'train_images': r'D:\mrt\segmentation\train',
    'train_info': r'D:\mrt\segmentation\train\annotations.json',

    'valid_images': r'D:\mrt\segmentation\valid',
    'valid_info': r'D:\mrt\segmentation\valid\annotations.json',

    'has_gt': True,
    'class_names': ('front_wheel', ),
    'label_map': {1: 1}
})
```
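To double-check that `class_names` and `label_map` agree with what the conversion actually wrote, you can list the category IDs with `pycocotools` (installed earlier); a quick sketch using the path above:
```python
from pycocotools.coco import COCO

coco = COCO(r'D:\mrt\segmentation\train\annotations.json')
for cat in coco.loadCats(coco.getCatIds()):
    # label_map must map each of these ids to an index in class_names (1-based)
    print(cat['id'], cat['name'])
```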
> Modify the YOLACT training config; you can simply mirror the version below.
> dataset: the dataset variable defined above
> num_classes: number of your classes + 1 (for the background)
> max_iter: total number of training iterations
> lr_steps: scale these together with max_iter (see the note after the block)
> backbone: resnet101_backbone (pick whichever backbone you want to train with)
```
# ----------------------- YOLACT v1.0 CONFIGS ----------------------- #
yolact_base_config = coco_base_config.copy({
    'name': 'yolact_base',

    # Dataset stuff
    'dataset': my_custom_dataset,
    'num_classes': 2,  # number of classes + 1 for background

    # Image Size
    'max_size': 550,

    # Training params
    'lr_steps': (2800, 6000, 7000, 7500),
    'max_iter': 8000,

    # Backbone Settings
    'backbone': resnet101_backbone.copy({
        'selected_layers': list(range(1, 4)),
        'use_pixel_scales': True,
        'preapply_sqrt': False,
        'use_square_anchors': True,  # This is for backward compatibility with a bug

        'pred_aspect_ratios': [ [[1, 1/2, 2]] ]*5,
        'pred_scales': [[24], [48], [96], [192], [384]],
    }),

    # FPN Settings
    'fpn': fpn_base.copy({
        'use_conv_downsample': True,
        'num_downsample': 2,
    }),

    # Mask Settings
    'mask_type': mask_type.lincomb,
    'mask_alpha': 6.125,
    'mask_proto_src': 0,
    'mask_proto_net': [(256, 3, {'padding': 1})] * 3 + [(None, -2, {}), (256, 3, {'padding': 1})] + [(32, 1, {})],
    'mask_proto_normalize_emulate_roi_pooling': True,

    # Other stuff
    'share_prediction_module': True,
    'extra_head_net': [(256, 3, {'padding': 1})],

    'positive_iou_threshold': 0.5,
    'negative_iou_threshold': 0.4,

    'crowd_iou_threshold': 0.7,

    'use_semantic_segmentation_loss': True,
})
```
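The `lr_steps` above are the stock 800k-iteration COCO schedule (280000, 600000, 700000, 750000) scaled down to `max_iter=8000`. If you pick a different `max_iter`, keeping the same fractions is a reasonable rule of thumb:
```python
max_iter = 8000
# Decay points at the same fractions of training as the original COCO schedule
lr_steps = tuple(int(max_iter * f) for f in (0.35, 0.75, 0.875, 0.9375))
print(lr_steps)  # (2800, 6000, 7000, 7500)
```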
#### Modify `train.py`
> Set `shuffle=False`:
```
# Original
data_loader = data.DataLoader(dataset, args.batch_size,
                              num_workers=args.num_workers,
                              shuffle=True, collate_fn=detection_collate,
                              pin_memory=True)

# Modified
data_loader = data.DataLoader(dataset, args.batch_size,
                              num_workers=args.num_workers,
                              shuffle=False, collate_fn=detection_collate,
                              pin_memory=True)
```
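Why this works: `train.py` switches the default tensor type to CUDA, so the random sampler that `shuffle=True` creates ends up with a mismatched CPU generator. If you would rather keep shuffling, passing an explicit CUDA generator is a commonly suggested alternative (untested here; assumes a PyTorch version whose `DataLoader` accepts the `generator` argument):
```python
data_loader = data.DataLoader(dataset, args.batch_size,
                              num_workers=args.num_workers,
                              shuffle=True, collate_fn=detection_collate,
                              pin_memory=True,
                              generator=torch.Generator(device='cuda'))
```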
#### Modify `utils/augmentations.py`
```
# Original
def __call__(self, image, masks, boxes=None, labels=None):
    height, width, _ = image.shape
    while True:
        # randomly choose a mode
        mode = random.choice(self.sample_options)

# Modified: use the standard library's random instead of numpy's
import random as rdom

def __call__(self, image, masks, boxes=None, labels=None):
    height, width, _ = image.shape
    while True:
        # randomly choose a mode
        mode = rdom.choice(self.sample_options)
```
#### Start training
```
python train.py --config=yolact_base_config --save_interval <iterations_between_checkpoints>
```
## Common Problems
- Problem 1: `RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'`. [Fix](#Modify-trainpy)
- Problem 2: `ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.` [Fix](#Modify-utilsaugmentationspy)
> Cause of Problem 2: `np.random.choice()` only accepts 1-D data, but the data passed to it here is not 1-D, so the fix switches to the standard library's `random.choice()`.
## References
- [Annotation and conversion tutorial](https://medium.com/ching-i/segmentation-label-%E6%A8%99%E8%A8%BB%E6%95%99%E5%AD%B8-26b8179d661)
- [Training tutorial](https://medium.com/ching-i/yolact-%E8%A8%93%E7%B7%B4%E6%95%99%E5%AD%B8-31e0062dc1d9)
- [YOLACT Github](https://github.com/dbolya/yolact)