Hồ Chí Minh, 16-08-2023
Võ Duy Nguyên, Lê Hữu Độ, UIT-Together Research Group

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection


Step 1. Set up the environment

Step 1.1. Create an Anaconda environment

Name the environment using this convention: your given name followed by the lowercase initials of your family name and middle name.

Example: Le Huu Do -> Dolh
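
The naming convention can be written as a small helper. This is only an illustrative sketch: the function name `env_name` is hypothetical, and it assumes the full name is typed without diacritics in Vietnamese order (family name, middle name, given name).

```python
def env_name(full_name: str) -> str:
    """Build an environment name: given name + initials of the other name parts."""
    parts = full_name.split()
    if not parts:
        raise ValueError("full_name must not be empty")
    given = parts[-1].capitalize()  # the given name is the last word
    # lowercase initials of the family name and middle name(s), in order
    initials = "".join(part[0].lower() for part in parts[:-1])
    return given + initials


print(env_name("Le Huu Do"))  # Dolh
```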


conda create --name UITTogether python=3.10 -y

[Screenshot: terminal output after the environment is created]

Step 1.2. Activate the newly created environment


conda activate UITTogether


Step 1.3. Install PyTorch for GPU platforms


pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

[Screenshot: terminal output after a successful installation]
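
To confirm that the installed build can actually see the GPU, a quick check from Python may help. The sketch below is an assumption about how one might verify the step, not part of the original instructions; the helper name `pytorch_report` is hypothetical. It returns a small report instead of raising when `torch` is not installed.

```python
import importlib.util


def pytorch_report() -> dict:
    """Summarize the PyTorch installation without failing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds
        "cuda_build": torch.version.cuda,
        # True only if a usable GPU and driver are visible to this process
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
    }


print(pytorch_report())
```

If `cuda_available` is False on a GPU machine, the driver or the selected wheel index is the first thing to check.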

Step 2. Install detrex and detectron2

Change to the LuuTru directory.
Example: /home/cvpr2023/LuuTru/


cd LuuTru/

Create a directory named after the environment above, then change into it:


mkdir -p UITTogether
cd UITTogether/

Step 2.1. Install detrex

In this directory, clone the detrex repository and enter it:


git clone https://github.com/IDEA-Research/detrex.git
cd detrex

[Screenshot: terminal output after a successful clone]

Step 2.2. Initialize the detectron2 submodule


git submodule init
git submodule update

[Screenshot: terminal output after the submodule is initialized]

Step 2.3. Install detectron2


python -m pip install -e detectron2

[Screenshot: terminal output after a successful installation]

Step 2.4. Build an editable install of detrex


pip install -e .

[Screenshot: terminal output after a successful build]
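
After the editable installs, both packages should be discoverable from the active environment. The following is a minimal sketch for checking that; the helper name `missing_packages` is hypothetical, and it only tests whether Python can locate each package, not whether the compiled extensions load correctly.

```python
import importlib.util


def missing_packages(names):
    """Return the packages from `names` that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means all three packages were found.
print(missing_packages(["torch", "detectron2", "detrex"]))
```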

Step 3. Verify the installation

Step 3.1. Download the pretrained model and the demo image




# download pretrained DINO model
wget https://github.com/IDEA-Research/detrex-storage/releases/download/v0.2.1/dino_r50_4scale_12ep.pth
# download the demo image
wget https://github.com/IDEA-Research/detrex-storage/releases/download/v0.2.1/idea.jpg

[Screenshot: terminal output after the pretrained model and demo image are downloaded]
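
An interrupted download can leave a missing or zero-byte file that only fails later, when the checkpoint is loaded. As a hedged sketch (the helper name `missing_or_empty` is hypothetical), the two files can be checked like this, assuming the commands were run from the detrex directory:

```python
from pathlib import Path


def missing_or_empty(directory, filenames):
    """Return the files that are absent or zero bytes long."""
    base = Path(directory)
    return [
        name
        for name in filenames
        if not (base / name).is_file() or (base / name).stat().st_size == 0
    ]


# File names taken from the wget commands above; "." is the current directory.
print(missing_or_empty(".", ["dino_r50_4scale_12ep.pth", "idea.jpg"]))
```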

Step 3.2. Run inference with the pretrained model on the demo image




python demo/demo.py --config-file ./projects/dino/configs/dino-resnet/dino_r50_4scale_12ep.py \
                    --input "./idea.jpg" \
                    --output "./demo_output.jpg" \
                    --opts train.init_checkpoint="./dino_r50_4scale_12ep.pth"

The result is saved to the file demo_output.jpg.
Example: /home/cvpr2023/LuuTru/UITTogether/detrex/demo_output.jpg

Step 3.3. Evaluate the pretrained model on the COCO 2017 dataset

The COCO 2017 dataset has already been downloaded and is stored at:
/home/cvpr2023/LuuTru/dataset/coco/

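
detectron2 resolves its built-in COCO 2017 datasets relative to the DETECTRON2_DATASETS directory and expects the standard layout listed in `EXPECTED` below. The sketch checks for that layout before an evaluation is started; the helper name `missing_coco_entries` is hypothetical.

```python
from pathlib import Path

# Standard COCO 2017 detection layout, relative to DETECTRON2_DATASETS.
EXPECTED = [
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/train2017",
    "coco/val2017",
]


def missing_coco_entries(datasets_root):
    """Return the expected files or directories that do not exist under datasets_root."""
    root = Path(datasets_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


# The root is the parent of the coco/ directory, not coco/ itself.
print(missing_coco_entries("/home/cvpr2023/LuuTru/dataset/"))
```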

Command:






export DETECTRON2_DATASETS=/home/cvpr2023/LuuTru/dataset/

CUDA_VISIBLE_DEVICES=0,1 python projects/dino/train_net.py \
    --config-file ./projects/dino/configs/dino-resnet/dino_r50_4scale_12ep.py \
    --eval-only train.init_checkpoint="./dino_r50_4scale_12ep.pth"
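
The same evaluation can also be prepared from Python, which makes it explicit that the two environment variables only need to reach the child process. This is a sketch under stated assumptions: the helper name `eval_invocation` is hypothetical, and it only builds the command without running it.

```python
import os


def eval_invocation(config_file, checkpoint, gpu_ids, datasets_root):
    """Build (argv, env) for an evaluation run; nothing is executed here."""
    env = dict(os.environ)  # copy so the current process is left untouched
    env["DETECTRON2_DATASETS"] = datasets_root
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    argv = [
        "python", "projects/dino/train_net.py",
        "--config-file", config_file,
        "--eval-only",
        f"train.init_checkpoint={checkpoint}",
    ]
    return argv, env


argv, env = eval_invocation(
    "./projects/dino/configs/dino-resnet/dino_r50_4scale_12ep.py",
    "./dino_r50_4scale_12ep.pth",
    [0, 1],
    "/home/cvpr2023/LuuTru/dataset/",
)
print(" ".join(argv))
# To launch it from the detrex directory:
#   import subprocess; subprocess.run(argv, env=env, check=True)
```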

The evaluation results should be approximately equal to the reference values for this checkpoint.

[Table: reference evaluation results]

Step 4. Train the model on COCO-format datasets

Step 4.1. Train on the COCO 2017 dataset, which is already organized in COCO format and has a ready-made config in the toolkit

Command:


python projects/dino/train_net.py \
    --config-file ./projects/dino/configs/dino-resnet/dino_r50_4scale_12ep.py

If you get a "CUDA out of memory" error, open the config file at "projects/dino/configs/dino-resnet/dino_r50_4scale_12ep.py" and set the variable dataloader.train.total_batch_size to 1.
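
The value of dataloader.train.total_batch_size is the batch size summed over all GPUs, so the memory used on each GPU depends on the total divided by the number of GPUs. detectron2 also requires the total to be divisible by the number of GPUs. The arithmetic is sketched below; the helper name `per_gpu_batch_size` is hypothetical.

```python
def per_gpu_batch_size(total_batch_size: int, num_gpus: int) -> int:
    """Return how many images each GPU processes per iteration."""
    if total_batch_size <= 0 or num_gpus <= 0:
        raise ValueError("total_batch_size and num_gpus must be positive")
    if total_batch_size % num_gpus != 0:
        raise ValueError("total_batch_size must be divisible by num_gpus")
    return total_batch_size // num_gpus


print(per_gpu_batch_size(16, 4))  # 4 images per GPU
print(per_gpu_batch_size(1, 1))   # 1 image per GPU, the smallest possible setting
```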

Once training log messages start appearing on the screen, training has begun.

During training, checkpoint files are saved to output/dino_r50_4scale_12ep/

Step 4.2. Train on a new custom dataset

Step 4.2.1. Prepare and preprocess the dataset

In this example, we use the VisDrone 2019 dataset to train a new model with the DINO method.

After downloading the dataset, the first step is to convert the annotation files to COCO format. Each dataset ships with its own annotation format, so look up a suitable conversion method for yours; you can use existing code on GitHub or a tool such as Roboflow.
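
As an illustration of what such a conversion involves, here is a minimal sketch for VisDrone detection annotations. It rests on several assumptions: each annotation line has the form left,top,width,height,score,category,truncation,occlusion; image sizes are supplied by the caller; and category ids 0 to 11 are kept as they are. The function name `visdrone_to_coco` is hypothetical, and a real conversion should be validated against your copy of the dataset.

```python
VISDRONE_CLASSES = [
    "ignored regions", "pedestrian", "people", "bicycle", "car", "van",
    "truck", "tricycle", "awning-tricycle", "bus", "motor", "others",
]


def visdrone_to_coco(samples):
    """Convert VisDrone-style annotations to a COCO-format dict.

    `samples` is a list of (file_name, width, height, annotation_lines).
    """
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for i, n in enumerate(VISDRONE_CLASSES)],
    }
    ann_id = 1
    for image_id, (file_name, width, height, lines) in enumerate(samples, start=1):
        coco["images"].append(
            {"id": image_id, "file_name": file_name, "width": width, "height": height}
        )
        for line in lines:
            # Drop empty fields so that a trailing comma does not break parsing.
            fields = [f for f in line.strip().split(",") if f != ""]
            if len(fields) < 6:
                continue  # blank or malformed line
            left, top, w, h = (float(v) for v in fields[:4])
            if w <= 0 or h <= 0:
                continue  # skip degenerate boxes
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": int(fields[5]),
                "bbox": [left, top, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco
```

The returned dict can be written with `json.dump` to produce a file such as train.json.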

Once the dataset has been converted to COCO format, upload it to: dataset/VisDrone/cocoVisdrone/
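
Before writing the config, it can help to inspect a converted annotation file and confirm how many classes it declares. The sketch below assumes a standard COCO detection JSON with "images", "annotations", and "categories" keys; the helper name `summarize_coco` is hypothetical.

```python
import json
from collections import Counter


def summarize_coco(annotation_file):
    """Return image, annotation, and class counts for a COCO-format JSON file."""
    with open(annotation_file, encoding="utf-8") as f:
        coco = json.load(f)
    names = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(a["category_id"] for a in coco["annotations"])
    return {
        "num_images": len(coco["images"]),
        "num_annotations": len(coco["annotations"]),
        "num_classes": len(names),
        # number of boxes per class name, ordered by category id
        "boxes_per_class": {names[i]: counts.get(i, 0) for i in sorted(names)},
    }


# Example call (the path depends on where the dataset was uploaded):
#   summarize_coco("dataset/VisDrone/cocoVisdrone/annotations_cu/train.json")
```

The reported `num_classes` is the value the model config needs to match.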

Step 4.2.2. Prepare the config file

Go to "projects/dino/configs/dino-resnet/" and create a config file named "visdrone_dino_r50_4scale_1ep.py" with the following content:

from detrex.config import get_config
from ..models.dino_r50 import model

# get default config
dataloader = get_config("common/data/coco_detr.py").dataloader
optimizer = get_config("common/optim.py").AdamW
lr_multiplier = get_config("common/coco_schedule.py").lr_multiplier_12ep
train = get_config("common/train.py").train

# modify training config
train.init_checkpoint = "https://github.com/IDEA-Research/detrex-storage/releases/download/v0.1.1/dino_r50_4scale_24ep.pth"
train.output_dir = "./output/visdrone_dino_r50_4scale_1ep"

# max training iterations
train.max_iter = 6500
train.eval_period = 6500
train.log_period = 100
train.checkpointer.period = 3500

# gradient clipping for training
train.clip_grad.enabled = True
train.clip_grad.params.max_norm = 0.1
train.clip_grad.params.norm_type = 2

# set training devices
train.device = "cuda"
model.device = train.device

# modify optimizer config
optimizer.lr = 1e-4
optimizer.betas = (0.9, 0.999)
optimizer.weight_decay = 1e-4
optimizer.params.lr_factor_func = lambda module_name: 0.1 if "backbone" in module_name else 1

# modify dataloader config
dataloader.train.num_workers = 4

# Note that this is the total batch size across all GPUs.
# For example, with 4 GPUs and a total batch size of 16,
# each GPU processes 16/4 = 4 images per iteration.
dataloader.train.total_batch_size = 1

# dump the testing results into output_dir for visualization
dataloader.evaluator.output_dir = train.output_dir


# User changes

# imports needed to rebuild the dataloader for the new dataset

import itertools

from omegaconf import OmegaConf

import detectron2.data.transforms as T
from detectron2.config import LazyCall as L
from detectron2.data import (
    build_detection_test_loader,
    build_detection_train_loader,
    get_detection_dataset_dicts,
)

#import library to register new dataset

from detectron2.data.datasets import register_coco_instances
from detectron2.evaluation import COCOEvaluator

from detrex.data import DetrDatasetMapper

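# start from an empty dataloader config; this replaces the COCO dataloader loaded above
dataloader = OmegaConf.create()

# register the new dataset splits (COCO-format annotation file and image directory)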

register_coco_instances("VisDrone_train", {}, "/home/cvpr2023/LuuTru/dataset/VisDrone/cocoVisdrone/annotations_cu/train.json", "/home/cvpr2023/LuuTru/dataset/VisDrone/cocoVisdrone/train/")
register_coco_instances("VisDrone_test", {}, "/home/cvpr2023/LuuTru/dataset/VisDrone/cocoVisdrone/annotations_cu/test.json", "/home/cvpr2023/LuuTru/dataset/VisDrone/cocoVisdrone/test/")
register_coco_instances("VisDrone_val", {}, "/home/cvpr2023/LuuTru/dataset/VisDrone/cocoVisdrone/annotations_cu/val.json", "/home/cvpr2023/LuuTru/dataset/VisDrone/cocoVisdrone/val/")

dataloader.train = L(build_detection_train_loader)(
    dataset=L(get_detection_dataset_dicts)(names="VisDrone_train"),
    mapper=L(DetrDatasetMapper)(
        augmentation=[
            L(T.RandomFlip)(),
            L(T.ResizeShortestEdge)(
                short_edge_length=(480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800),
                max_size=1333,
                sample_style="choice",
            ),
        ],
        augmentation_with_crop=[
            L(T.RandomFlip)(),
            L(T.ResizeShortestEdge)(
                short_edge_length=(400, 500, 600),
                sample_style="choice",
            ),
            L(T.RandomCrop)(
                crop_type="absolute_range",
                crop_size=(384, 600),
            ),
            L(T.ResizeShortestEdge)(
                short_edge_length=(480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800),
                max_size=1333,
                sample_style="choice",
            ),
        ],
        is_train=True,
        mask_on=False,
        img_format="RGB",
    ),
    total_batch_size=1,
    num_workers=4,
)

dataloader.test = L(build_detection_test_loader)(
    dataset=L(get_detection_dataset_dicts)(names="VisDrone_test", filter_empty=False),
    mapper=L(DetrDatasetMapper)(
        augmentation=[
            L(T.ResizeShortestEdge)(
                short_edge_length=800,
                max_size=1333,
            ),
        ],
        augmentation_with_crop=None,
        is_train=False,
        mask_on=False,
        img_format="RGB",
    ),
    num_workers=4,
)

dataloader.evaluator = L(COCOEvaluator)(
    dataset_name="${..test.dataset.names}",
)

#change model num class

from projects.dino.modeling import (
    DINO,
    DINOTransformerEncoder,
    DINOTransformerDecoder,
    DINOTransformer,
    DINOCriterion,
)
from detrex.modeling.matcher import HungarianMatcher

model.num_classes=12

model.criterion=L(DINOCriterion)(
        num_classes=12,
        matcher=L(HungarianMatcher)(
            cost_class=2.0,
            cost_bbox=5.0,
            cost_giou=2.0,
            cost_class_type="focal_loss_cost",
            alpha=0.25,
            gamma=2.0,
        ),
        weight_dict={
            "loss_class": 1,
            "loss_bbox": 5.0,
            "loss_giou": 2.0,
            "loss_class_dn": 1,
            "loss_bbox_dn": 5.0,
            "loss_giou_dn": 2.0,
        },
        loss_class_type="focal_loss",
        alpha=0.25,
        gamma=2.0,
        two_stage_binary_cls=False,
    )
# Define the class names under the metadata name "VisDrone" (passed to demo.py later)

from detectron2.data import MetadataCatalog
MetadataCatalog.get("VisDrone").thing_classes = ['ignored regions', 'pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others']

Changes compared with the default config:

Parameters: train.max_iter, train.eval_period, train.log_period, train.checkpointer.period, model.num_classes, num_classes,…
The dataset names are changed to match the dataset used in the experiment, which is VisDrone in this example.
The init_checkpoint link is taken from the Model Zoo in the detrex GitHub repository.
Metadata is used to define the classes.
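
train.max_iter counts iterations rather than epochs, and each iteration processes total_batch_size images. The sketch below shows the arithmetic. The image count of 6471 for the VisDrone training split is an assumption to check against your own train.json, and the helper name `iterations_for` is hypothetical.

```python
import math


def iterations_for(num_images: int, epochs: int, total_batch_size: int) -> int:
    """Number of iterations needed to see num_images * epochs images."""
    return math.ceil(num_images * epochs / total_batch_size)


# Assumed training-set size; with batch size 1, one epoch is 6471 iterations,
# which is in line with the rounded value of 6500 used in the config.
print(iterations_for(6471, 1, 1))
```

If total_batch_size is raised later, max_iter, eval_period, and checkpointer.period would need to be scaled down accordingly.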

Step 4.2.3. Train with the prepared config file

Run the following commands to start training:




export DETECTRON2_DATASETS=/home/cvpr2023/LuuTru/dataset/VisDrone/cocoVisdrone/

python projects/dino/train_net.py \
    --config-file ./projects/dino/configs/dino-resnet/visdrone_dino_r50_4scale_1ep.py

After training finishes, use the checkpoint file model_final.pth saved in "./output/visdrone_dino_r50_4scale_1ep" to run the evaluation.



python tools/train_net.py --config-file "./projects/dino/configs/dino-resnet/visdrone_dino_r50_4scale_1ep.py" \
                          --eval-only \
                          train.init_checkpoint="output/visdrone_dino_r50_4scale_1ep/model_final.pth"
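
If training was interrupted, model_final.pth will not exist yet. The sketch below picks a checkpoint to evaluate. It relies on the detectron2 checkpointer writing a text file named last_checkpoint (containing the latest checkpoint's file name) into the output directory; the helper name `find_checkpoint` is hypothetical.

```python
from pathlib import Path


def find_checkpoint(output_dir):
    """Return model_final.pth if present, else the latest periodic checkpoint, else None."""
    out = Path(output_dir)
    final = out / "model_final.pth"
    if final.is_file():
        return final
    marker = out / "last_checkpoint"
    if marker.is_file():
        # The marker file holds the file name of the most recent checkpoint.
        candidate = out / marker.read_text().strip()
        if candidate.is_file():
            return candidate
    return None


print(find_checkpoint("./output/visdrone_dino_r50_4scale_1ep"))
```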

Step 4.2.4. Run inference with the newly trained model on a demo image

Run the following command:




python demo/demo.py --config-file projects/dino/configs/dino-resnet/visdrone_dino_r50_4scale_1ep.py \
                    --input "/home/cvpr2023/LuuTru/dataset/VisDrone/cocoVisdrone/train/0000309_00801_d_0000337.jpg" \
                    --output demo_output.jpg \
                    --opts train.init_checkpoint="./output/visdrone_dino_r50_4scale_1ep/model_final.pth"

However, by default demo.py visualizes the predictions using the label names of the coco_2017_val dataset.

As a result, the category_id values come from the new dataset while the label names come from COCO 2017.

To fix this, tell demo.py the metadata name of the new dataset defined in the config above by adding the argument


--metadata_dataset "dataset_name"

For example, with the VisDrone dataset the command becomes:





python demo/demo.py --config-file projects/dino/configs/dino-resnet/visdrone_dino_r50_4scale_1ep.py \
                    --input "/home/cvpr2023/LuuTru/dataset/VisDrone/cocoVisdrone/train/0000309_00801_d_0000337.jpg" \
                    --output demo_output.jpg \
                    --metadata_dataset "VisDrone" \
                    --opts train.init_checkpoint="./output/visdrone_dino_r50_4scale_1ep/model_final.pth"

The result is written to demo_output.jpg.
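
The mismatch described above comes from how a predicted class index is turned into a label: the index is looked up in the class-name list of whichever metadata is selected. The sketch below illustrates this with the VisDrone class list from the config and the first five COCO class names; the helper name `label_for` is hypothetical and is not part of demo.py.

```python
VISDRONE_CLASSES = [
    "ignored regions", "pedestrian", "people", "bicycle", "car", "van",
    "truck", "tricycle", "awning-tricycle", "bus", "motor", "others",
]
# First five COCO 2017 class names, in contiguous index order.
COCO_FIRST_CLASSES = ["person", "bicycle", "car", "motorcycle", "airplane"]


def label_for(class_index, thing_classes):
    """Map a predicted class index to a name using the given class list."""
    if 0 <= class_index < len(thing_classes):
        return thing_classes[class_index]
    return f"class_{class_index}"  # fallback when the list is too short


print(label_for(4, COCO_FIRST_CLASSES))  # airplane (wrong metadata)
print(label_for(4, VISDRONE_CLASSES))    # car (correct metadata)
```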
