# Train MMDetection on custom data, step by step
## 1. Prepare the config.py file
Arrange the data in the following directory tree:
```
├── images
│   ├── train
│   └── test
├── annotations
│   ├── train.json (annotations for the train set)
│   └── test.json (annotations for the test set)
└── config.py
```
> Note: the data must be in COCO format.
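As a rough illustration of what "COCO format" means for detection, the sketch below builds a tiny annotation dictionary with the three top-level keys such files use (`images`, `annotations`, `categories`) and runs a few sanity checks on it. The helper name `check_coco` and the sample values are hypothetical and only for illustration; boxes follow the usual COCO convention `[x, y, width, height]`.

```python
def check_coco(coco):
    """Return a list of problems found in a COCO-style detection dict."""
    problems = []
    for key in ("images", "annotations", "categories"):
        if key not in coco:
            problems.append(f"missing top-level key: {key}")
    if problems:
        return problems

    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    for ann in coco["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
        if len(ann["bbox"]) != 4:
            problems.append(f"annotation {ann['id']}: bbox must have 4 values")
    return problems


# Hypothetical one-image, one-box sample.
sample = {
    "images": [{"id": 1, "file_name": "0001.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 4,
         "bbox": [10, 20, 50, 30], "area": 1500, "iscrowd": 0},
    ],
    "categories": [{"id": 4, "name": "Car"}],
}

print(check_coco(sample))  # []
```

To check a real file, load it with `json.load` and pass the resulting dict to the same function.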
The `config.py` file is prepared as follows:
```python
# The new config inherits from a base config and overrides only what must change.
_base_ = 'faster_rcnn/faster_rcnn_r50_fpn_2x_coco.py'

# num_classes in the bbox head must equal the number of entries in `classes`.
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=12)))

# Dataset settings. The registered dataset class name is 'CocoDataset'.
dataset_type = 'CocoDataset'
classes = ('Ignore',
           'Pedestrian',
           'People',
           'Bicycle',
           'Car',
           'Van',
           'Truck',
           'Tricycle',
           'Awning-tricycle',
           'Bus',
           'Motor',
           'Others')

# Paths are resolved from the directory where tools/train.py is run
# (/mmdetection inside the container), where the data is mounted as data_train.
data = dict(
    train=dict(
        img_prefix='data_train/images/train/',
        classes=classes,
        ann_file='data_train/annotations/train.json'),
    val=dict(
        img_prefix='data_train/images/test/',
        classes=classes,
        ann_file='data_train/annotations/test.json'),
    test=dict(
        img_prefix='data_train/images/test/',
        classes=classes,
        ann_file='data_train/annotations/test.json'))
```
- `_base_`: the base config to inherit from (the config above uses `faster_rcnn_r50_fpn_2x_coco.py`, i.e. Faster R-CNN with a ResNet-50 + FPN backbone). See the other Faster R-CNN configs [here](https://github.com/open-mmlab/mmdetection/tree/master/configs/faster_rcnn)
- `num_classes`: the number of classes in the data; it must match the length of `classes`
- `classes`: the class names
- `train`, `val`, `test`: dictionaries holding the image folder and annotation file paths for the train, val, and test sets.
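Since `num_classes` and `classes` are set in two different places, they can easily drift apart. The following is a minimal sketch of a consistency check; the helper name `check_classes` is hypothetical, and the optional `categories` argument is assumed to be the `categories` list taken from a COCO annotation file.

```python
def check_classes(num_classes, classes, categories=None):
    """Compare num_classes with the classes tuple and, optionally, COCO categories."""
    if num_classes != len(classes):
        return f"num_classes={num_classes} but {len(classes)} class names given"
    if categories is not None:
        names = {cat["name"] for cat in categories}
        missing = [name for name in classes if name not in names]
        if missing:
            return f"class names missing from annotations: {missing}"
    return "ok"


classes = ('Ignore', 'Pedestrian', 'People', 'Bicycle', 'Car', 'Van',
           'Truck', 'Tricycle', 'Awning-tricycle', 'Bus', 'Motor', 'Others')

print(check_classes(13, classes))  # num_classes=13 but 12 class names given
print(check_classes(12, classes))  # ok
```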
> Note: make sure you are in the directory that contains `config.py`, `annotations`, and `images`.
> Run `ls` to check. The output should be: `annotations config.py images`
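The same check can be done a little more thoroughly from Python. This is only a sketch: the function name `missing_entries` is hypothetical, and the expected paths are taken from the directory tree shown at the top of this section.

```python
from pathlib import Path

# Paths expected under the data directory, per the tree in section 1.
EXPECTED = [
    "config.py",
    "images/train",
    "images/test",
    "annotations/train.json",
    "annotations/test.json",
]


def missing_entries(root="."):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


# An empty list means the current directory has the expected layout.
print(missing_entries("."))
```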
## 2. Run the commands step by step
Start the container. The `--name` flag sets the container name; here it is `12_11_2020_custom_data`. `--gpus '"device=3"'` selects GPU number 3. The `--mount` option binds the current host directory (the one containing `annotations`, `images`, and `config.py`) to the `data_train` directory inside the mmdetection container.
```
docker run -d \
--shm-size 8G \
--gpus '"device=3"' -it --name 12_11_2020_custom_data \
--mount type=bind,source="$(pwd)",target=/mmdetection/data_train \
mmdetection:latest
```
Open a shell inside the container:
```
docker exec -ti 12_11_2020_custom_data /bin/bash
```
Copy `config.py` from `data_train` into the `configs` directory:
```
cp data_train/config.py configs
```
Run the following command to train:
```
python tools/train.py configs/config.py
```
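On success, the output looks like the log below (this sample comes from a VisDrone run, so its paths and `work_dir` differ from the config above):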
```
2020-12-11 06:57:07,633 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.7 (default, May 7 2020, 21:25:33) [GCC 7.3.0]
CUDA available: True
GPU 0: GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
TorchVision: 0.7.0
OpenCV: 4.4.0
MMCV: 1.2.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.6.0+8138db4
------------------------------------------------------------
2020-12-11 06:57:08,066 - mmdet - INFO - Distributed training: False
2020-12-11 06:57:08,428 - mmdet - INFO - Config:
model = dict(
type='FasterRCNN',
pretrained='torchvision://resnet50',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch'),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0.0, 0.0, 0.0, 0.0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
roi_head=dict(
type='StandardRoIHead',
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=12,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0.0, 0.0, 0.0, 0.0],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0))))
train_cfg = dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=-1,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_across_levels=False,
nms_pre=2000,
nms_post=1000,
max_num=1000,
nms_thr=0.7,
min_bbox_size=0),
rcnn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False))
test_cfg = dict(
rpn=dict(
nms_across_levels=False,
nms_pre=1000,
nms_post=1000,
max_num=1000,
nms_thr=0.7,
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100))
dataset_type = 'COCODataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type='CocoDataset',
ann_file='./data_train/train/annotations/visdrone19_train.json',
img_prefix='./data_train/train/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
],
classes=('Ignore', 'Pedestrian', 'People', 'Bicycle', 'Car', 'Van',
'Truck', 'Tricycle', 'Awning-tricycle', 'Bus', 'Motor',
'Others')),
val=dict(
type='CocoDataset',
ann_file='./data_train/test/annotations/visdrone19_test.json',
img_prefix='./data_train/test/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
],
classes=('Ignore', 'Pedestrian', 'People', 'Bicycle', 'Car', 'Van',
'Truck', 'Tricycle', 'Awning-tricycle', 'Bus', 'Motor',
'Others')),
test=dict(
type='CocoDataset',
ann_file='./data_train/test/annotations/visdrone19_test.json',
img_prefix='./data_train/test/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
],
classes=('Ignore', 'Pedestrian', 'People', 'Bicycle', 'Car', 'Van',
'Truck', 'Tricycle', 'Awning-tricycle', 'Bus', 'Motor',
'Others')))
evaluation = dict(interval=1, metric='bbox')
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[8, 11])
total_epochs = 12
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
classes = ('Ignore', 'Pedestrian', 'People', 'Bicycle', 'Car', 'Van', 'Truck',
'Tricycle', 'Awning-tricycle', 'Bus', 'Motor', 'Others')
work_dir = './work_dirs/configs_visdrone19'
gpu_ids = range(0, 1)
2020-12-11 06:57:08,649 - mmdet - INFO - load model from: torchvision://resnet50
2020-12-11 06:57:08,780 - mmdet - WARNING - The model and loaded state dict do not match exactly
unexpected key in source state_dict: fc.weight, fc.bias
loading annotations into memory...
Done (t=1.33s)
creating index...
index created!
loading annotations into memory...
Done (t=0.45s)
creating index...
index created!
2020-12-11 06:57:12,294 - mmdet - INFO - Start running, host: root@b2bdfe4da00a, work_dir: /mmdetection/work_dirs/configs_visdrone19
2020-12-11 06:57:12,294 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
2020-12-11 06:57:26,710 - mmdet - INFO - Epoch [1][50/3236] lr: 1.978e-03, eta: 3:05:28, time: 0.287, data_time: 0.050, memory: 5142, loss_rpn_cls: 0.6790, loss_rpn_bbox: 0.2509, loss_cls: 0.7795, acc: 84.4277, loss_bbox: 0.0458, loss: 1.7552
```
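To follow the loss over time without reading the log by eye, the per-iteration lines can be parsed. The sketch below is written against the exact line format shown in the log above; the function name `parse_train_line` is hypothetical, and the regular expression would need adjusting if the log format differs in another version.

```python
import re

# The last training line from the log above, copied verbatim.
LINE = (
    "2020-12-11 06:57:26,710 - mmdet - INFO - Epoch [1][50/3236] "
    "lr: 1.978e-03, eta: 3:05:28, time: 0.287, data_time: 0.050, memory: 5142, "
    "loss_rpn_cls: 0.6790, loss_rpn_bbox: 0.2509, loss_cls: 0.7795, "
    "acc: 84.4277, loss_bbox: 0.0458, loss: 1.7552"
)

# "\bloss: " only matches the total loss, not loss_cls / loss_bbox / loss_rpn_*.
PATTERN = re.compile(r"Epoch \[(\d+)\]\[(\d+)/(\d+)\].*?\bloss: ([\d.]+)")


def parse_train_line(line):
    """Extract epoch, iteration, iterations per epoch, and total loss from a log line."""
    match = PATTERN.search(line)
    if match is None:
        return None  # not a training-progress line
    epoch, iteration, total, loss = match.groups()
    return {
        "epoch": int(epoch),
        "iter": int(iteration),
        "iters_per_epoch": int(total),
        "loss": float(loss),
    }


print(parse_train_line(LINE))
# {'epoch': 1, 'iter': 50, 'iters_per_epoch': 3236, 'loss': 1.7552}
```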
## 3. Resume training
Checkpoints are saved to the `work_dirs/config` directory. To continue training from the latest checkpoint, run:
```
python tools/train.py configs/config.py --resume-from work_dirs/config/latest.pth
```
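If `latest.pth` is missing (for example after copying the work directory to another machine), the most recent checkpoint can be located by name. The sketch below assumes checkpoints are saved as `epoch_N.pth`, one per epoch; that naming is an assumption here, so check the actual contents of your `work_dirs` folder first. The function name `newest_checkpoint` is hypothetical.

```python
import re
from pathlib import Path


def newest_checkpoint(work_dir):
    """Return the epoch_N.pth path with the highest N, or None if none exist."""
    best_path, best_epoch = None, -1
    for path in Path(work_dir).glob("epoch_*.pth"):
        match = re.fullmatch(r"epoch_(\d+)\.pth", path.name)
        # Compare epochs numerically so that epoch_10 ranks above epoch_9.
        if match and int(match.group(1)) > best_epoch:
            best_path, best_epoch = path, int(match.group(1))
    return best_path


print(newest_checkpoint("work_dirs/config"))
```

The returned path can then be used in place of `work_dirs/config/latest.pth` in the command above.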
## 4. Evaluate mAP
```
python tools/test.py configs/config.py work_dirs/config/latest.pth --eval bbox --eval-options classwise=True
```
## 5. Export result images
Run the following command; the result images are saved to the `images_result` directory:
```
python ./tools/test.py configs/config.py work_dirs/config/latest.pth --eval bbox --eval-options classwise=True --show-dir images_result
```