# DETR
- Transformer architecture for object detection
- end-to-end object detection model
- treats object detection as a direct set prediction problem
---
- **No NMS (non-maximum suppression)**
- two-stage detectors predict boxes w.r.t. proposals
- NMS eliminates redundant overlapping bounding boxes (minimal sketch after this list)
- **No anchor generation**
- single-stage detectors predict boxes w.r.t. anchors
- anchors provide a set of predefined boxes that the model adjusts to match the ground-truth boxes during training
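For reference, a minimal sketch of the greedy NMS procedure that DETR removes (PyTorch, assuming boxes in $(x_1, y_1, x_2, y_2)$ format; in practice `torchvision.ops.nms` would be used):

```python
import torch

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    remaining boxes that overlap it above the IoU threshold.
    boxes: (N, 4) in (x1, y1, x2, y2); scores: (N,). Returns kept indices."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(i.item())
        if order.numel() == 1:
            break
        rest = order[1:]
        # IoU of the kept box against all remaining candidates
        lt = torch.max(boxes[i, :2], boxes[rest, :2])
        rb = torch.min(boxes[i, 2:], boxes[rest, 2:])
        wh = (rb - lt).clamp(min=0)
        inter = wh[:, 0] * wh[:, 1]
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return keep
```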
---
- NMS → replaced by the bipartite matching loss and a special "no object" ($\varnothing$) class
- anchors → replaced by learned object queries
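A minimal sketch of how the one-to-one matching can be computed with SciPy's Hungarian solver; the cost terms and their equal weighting here are simplified and illustrative (DETR additionally weights the terms and includes a GIoU cost):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions(pred_probs, pred_boxes, gt_labels, gt_boxes):
    """Bipartite matching between N predictions and M ground-truth objects.
    pred_probs: (N, num_classes) softmax scores, pred_boxes: (N, 4),
    gt_labels: (M,), gt_boxes: (M, 4). Returns matched (pred_idx, gt_idx)."""
    # classification cost: negative probability of the ground-truth class
    cost_class = -pred_probs[:, gt_labels]                            # (N, M)
    # box cost: L1 distance between predicted and ground-truth boxes
    cost_bbox = np.abs(pred_boxes[:, None] - gt_boxes[None]).sum(-1)  # (N, M)
    cost = cost_class + cost_bbox  # illustrative equal weighting
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return pred_idx, gt_idx
```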

---
# DETR model
- object detection set prediction loss
- DETR architecture
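For orientation, a minimal sketch of the overall architecture, close in spirit to the demo listing in the DETR paper's supplementary material (batch size 1, illustrative hyperparameters):

```python
import torch
from torch import nn
from torchvision.models import resnet50

class MinimalDETR(nn.Module):
    """Bare-bones DETR: CNN backbone -> transformer -> class/box heads
    over a fixed set of learned object queries."""
    def __init__(self, num_classes, hidden_dim=256, nheads=8,
                 num_encoder_layers=6, num_decoder_layers=6, num_queries=100):
        super().__init__()
        backbone = resnet50()
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.conv = nn.Conv2d(2048, hidden_dim, 1)
        self.transformer = nn.Transformer(hidden_dim, nheads,
                                          num_encoder_layers, num_decoder_layers)
        self.linear_class = nn.Linear(hidden_dim, num_classes + 1)  # +1: "no object"
        self.linear_bbox = nn.Linear(hidden_dim, 4)
        self.query_pos = nn.Parameter(torch.rand(num_queries, hidden_dim))
        # learned 2D positional encoding, split between rows and columns
        self.row_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
        self.col_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))

    def forward(self, x):
        h = self.conv(self.backbone(x))            # (1, hidden_dim, H, W)
        H, W = h.shape[-2:]
        pos = torch.cat([
            self.col_embed[:W].unsqueeze(0).repeat(H, 1, 1),
            self.row_embed[:H].unsqueeze(1).repeat(1, W, 1),
        ], dim=-1).flatten(0, 1).unsqueeze(1)      # (H*W, 1, hidden_dim)
        h = self.transformer(pos + h.flatten(2).permute(2, 0, 1),
                             self.query_pos.unsqueeze(1))
        return self.linear_class(h), self.linear_bbox(h).sigmoid()
```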

---
- set prediction loss
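From the paper: an optimal bipartite matching $\hat{\sigma}$ is first found between the $N$ predictions and the ground-truth set padded with $\varnothing$ (no object), and the loss is then computed over the matched pairs:

$$\hat{\sigma} = \underset{\sigma \in \mathfrak{S}_N}{\arg\min} \sum_{i=1}^{N} \mathcal{L}_{\text{match}}\big(y_i, \hat{y}_{\sigma(i)}\big)$$

$$\mathcal{L}_{\text{Hungarian}}(y, \hat{y}) = \sum_{i=1}^{N} \Big[ -\log \hat{p}_{\hat{\sigma}(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}} \, \mathcal{L}_{\text{box}}\big(b_i, \hat{b}_{\hat{\sigma}(i)}\big) \Big]$$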

---
- $\ell_1$ loss has different scales for small and large boxes even if their relative errors are similar
- so a scale-invariant generalized IoU (GIoU) loss is added (sketch below)
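In the paper the box loss combines the two terms as $\mathcal{L}_{\text{box}} = \lambda_{\text{iou}}\,\mathcal{L}_{\text{iou}}(b_i, \hat{b}_{\sigma(i)}) + \lambda_{\text{L1}}\,\|b_i - \hat{b}_{\sigma(i)}\|_1$. A minimal sketch of the GIoU computation for axis-aligned boxes in $(x_1, y_1, x_2, y_2)$ format (assumes valid, non-degenerate boxes):

```python
import torch

def generalized_iou(boxes1, boxes2):
    """Pairwise GIoU between aligned sets of boxes.
    boxes1, boxes2: (N, 4); returns (N,) GIoU values in [-1, 1]."""
    # intersection rectangle
    lt = torch.max(boxes1[:, :2], boxes2[:, :2])
    rb = torch.min(boxes1[:, 2:], boxes2[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    union = area1 + area2 - inter
    iou = inter / union
    # smallest box enclosing both; penalizes distant non-overlapping boxes
    lt_c = torch.min(boxes1[:, :2], boxes2[:, :2])
    rb_c = torch.max(boxes1[:, 2:], boxes2[:, 2:])
    wh_c = (rb_c - lt_c).clamp(min=0)
    area_c = wh_c[:, 0] * wh_c[:, 1]
    return iou - (area_c - union) / area_c
```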

---
- DETR performs well on large objects but, at the time, still had limitations on small objects

---
- encoder self-attention

---
- decoder self-attention

---
- each object query slot learns to specialize on certain regions and box sizes
- green: small boxes
- red: large horizontal boxes
- blue: large vertical boxes

---
- DETR for panoptic segmentation

---
- FPN (Feature Pyramid Network)
- commonly used with CNN backbones (see the sketch below)
- MLA (multi-level feature aggregation)
- commonly used with transformer backbones
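A minimal sketch of an FPN-style top-down pathway; the channel counts match a ResNet-50's last three stages but are otherwise illustrative, not tied to any specific detector:

```python
import torch
from torch import nn

class SimpleFPN(nn.Module):
    """Minimal FPN: 1x1 lateral convs project backbone features to a common
    width, then coarser maps are upsampled and added into finer ones."""
    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):
        # feats: [c3, c4, c5], ordered from high to low spatial resolution
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # top-down: upsample the coarser map and add it to the finer lateral
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + nn.functional.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]
```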
---
- architecture of SegFormer

---