# DETR

- Transformer architecture for object detection
- end-to-end object detection model
- treats object detection as a direct set prediction problem

---

- **No NMS (non-maximum suppression)**
  - two-stage detectors predict boxes w.r.t. proposals
  - NMS eliminates redundant overlapping bounding boxes
- **No anchor generation**
  - single-stage detectors predict boxes w.r.t. anchors
  - anchors provide a set of predefined boxes that the model adjusts to match the ground-truth boxes during training

---

- NMS is replaced by a bipartite matching loss with a "no object" class
- anchors are replaced by object queries

![](https://i.imgur.com/pmeBHE5.png)

---

# DETR model

- object detection set prediction loss
- DETR architecture (a forward-pass sketch appears at the end of this deck)

![](https://i.imgur.com/RGeAe77.png)

---

- set prediction loss: predictions are matched one-to-one to ground-truth objects (see the matching sketch at the end of this deck)

![](https://i.imgur.com/31KTHbi.png)

---

- the $l_1$ loss has different scales for small and large boxes even when their relative errors are similar
- so a generalized IoU (GIoU) loss is added (see the GIoU sketch at the end of this deck)

![](https://i.imgur.com/jmFEMeL.png)

---

- DETR performs well on large objects but, at the time, still had limitations on small objects

![](https://i.imgur.com/VmwH4PX.png)

---

- encoder self-attention

![](https://i.imgur.com/pUbM0EM.png)

---

- decoder self-attention

![](https://i.imgur.com/9FCFm8c.png)

---

- green: small boxes
- red: large horizontal boxes
- blue: large vertical boxes

![](https://i.imgur.com/aHMmJQp.png)

---

- DETR for panoptic segmentation

![](https://i.imgur.com/5aXHB3v.png)

---

- FPN (Feature Pyramid Network): commonly used with CNN backbones
- MLA (multi-level feature aggregation): commonly used with transformer backbones

---

- SegFormer architecture

![](https://i.imgur.com/QwOTocQ.png)

---
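# Sketch: bipartite matching

As a rough illustration of the set prediction loss (not DETR's reference code; the function name and cost weights below are assumptions), predictions are matched one-to-one to ground-truth objects with the Hungarian algorithm, and leftover predictions are supervised as "no object":

```python
# Illustrative sketch of DETR-style bipartite matching; the weights
# (1.0, 5.0) are placeholders, and the real matching cost in the paper
# also includes a GIoU term.
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_logits, pred_boxes, gt_labels, gt_boxes):
    """Match N predictions to M ground-truth objects (N >= M).

    pred_logits: (N, num_classes+1) raw class scores (incl. "no object")
    pred_boxes:  (N, 4) predicted boxes, normalized (cx, cy, w, h)
    gt_labels:   (M,)   ground-truth class indices
    gt_boxes:    (M, 4) ground-truth boxes, same format
    """
    prob = pred_logits.softmax(-1)                      # (N, num_classes+1)
    cost_class = -prob[:, gt_labels]                    # (N, M): -p(true class)
    cost_bbox = torch.cdist(pred_boxes, gt_boxes, p=1)  # (N, M): pairwise L1
    cost = 1.0 * cost_class + 5.0 * cost_bbox           # combined matching cost
    # Hungarian algorithm: globally optimal one-to-one assignment.
    pred_idx, gt_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    return pred_idx, gt_idx  # unmatched predictions train toward "no object"
```

---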
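# Sketch: generalized IoU

A minimal sketch of the GIoU term for paired axis-aligned boxes in (x1, y1, x2, y2) format; it follows the standard formula $\mathrm{GIoU} = \mathrm{IoU} - \frac{|C| - |A \cup B|}{|C|}$, where $C$ is the smallest box enclosing both $A$ and $B$, and is an illustration of the formula rather than DETR's exact code:

```python
import torch

def generalized_iou(boxes1, boxes2):
    """GIoU for paired boxes of shape (N, 4) in (x1, y1, x2, y2) format.

    Returns values in [-1, 1]; the training loss would be 1 - GIoU.
    Unlike plain L1, this is scale-invariant: equal relative errors on
    small and large boxes are penalized comparably.
    """
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])

    # Intersection area.
    lt = torch.max(boxes1[:, :2], boxes2[:, :2])   # top-left corners
    rb = torch.min(boxes1[:, 2:], boxes2[:, 2:])   # bottom-right corners
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    union = area1 + area2 - inter
    iou = inter / union

    # Smallest enclosing box C.
    lt_c = torch.min(boxes1[:, :2], boxes2[:, :2])
    rb_c = torch.max(boxes1[:, 2:], boxes2[:, 2:])
    wh_c = rb_c - lt_c
    area_c = wh_c[:, 0] * wh_c[:, 1]

    return iou - (area_c - union) / area_c
```

---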
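# Sketch: object queries in the forward pass

To make the role of object queries concrete, a stripped-down DETR-style forward pass, assuming PyTorch and torchvision; the dimensions, layer counts, and omitted positional encodings make this an illustration under simplifying assumptions, not the reference model:

```python
import torch
import torch.nn as nn
import torchvision

class MiniDETR(nn.Module):
    """CNN backbone + transformer encoder-decoder + learned object queries.

    Simplifications vs. the real model: no positional encodings, no
    auxiliary decoder losses, single feature level.
    """

    def __init__(self, num_classes, num_queries=100, d_model=256):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)  # pretrained in practice
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Conv2d(2048, d_model, kernel_size=1)   # channel reduction
        self.transformer = nn.Transformer(d_model, nhead=8,
                                          num_encoder_layers=6,
                                          num_decoder_layers=6)
        # Learned queries replace anchors: each slot learns to specialize
        # in certain box locations/sizes (cf. the colored boxes plot).
        self.query_embed = nn.Embedding(num_queries, d_model)
        self.class_head = nn.Linear(d_model, num_classes + 1)  # +1: "no object"
        self.bbox_head = nn.Linear(d_model, 4)  # normalized (cx, cy, w, h)

    def forward(self, images):                        # images: (B, 3, H, W)
        feat = self.proj(self.backbone(images))       # (B, d, H/32, W/32)
        b = feat.shape[0]
        src = feat.flatten(2).permute(2, 0, 1)        # (H/32*W/32, B, d) tokens
        tgt = self.query_embed.weight.unsqueeze(1).repeat(1, b, 1)  # (Q, B, d)
        hs = self.transformer(src, tgt)               # decoder output: (Q, B, d)
        return self.class_head(hs), self.bbox_head(hs).sigmoid()
```

Every image yields exactly `num_queries` (class, box) pairs; the matching loss above decides which of them count as real detections, so no NMS is needed at inference.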
{"title":"DETR","description":"type: slides","contributors":"[{\"id\":\"4724c2c9-74e7-418c-87c3-79a619b97461\",\"add\":2767,\"del\":1052}]"}
    140 views