# DETR
- DETR treats object detection as a **direct set prediction problem**.
- self-attention, which explicitly models all pairwise interactions between elements in a sequence, is pointed out to be especially suitable for the constraints of set prediction (e.g., removing duplicate predictions).
- architecture: encoder-decoder transformer (with non-autoregressive parallel decoding) and a set-based global loss computed via bipartite matching, which makes the loss permutation-invariant (see the sketch below).
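
To make the set-based loss concrete, here is a minimal sketch of bipartite matching between predictions and ground truth. The tensor names (`pred_logits`, `pred_boxes`, `tgt_labels`, `tgt_boxes`) and the unit cost weights are illustrative assumptions; the "no-object" classification term and the generalized IoU box cost from the paper are omitted for brevity.

```python
# Minimal sketch of a permutation-invariant set loss via Hungarian matching.
# Assumes N prediction slots and M ground-truth objects (M <= N).
import torch
from scipy.optimize import linear_sum_assignment

def set_loss(pred_logits, pred_boxes, tgt_labels, tgt_boxes):
    prob = pred_logits.softmax(-1)                       # (N, C) class probabilities
    cost_class = -prob[:, tgt_labels]                    # (N, M) negative class prob
    cost_bbox = torch.cdist(pred_boxes, tgt_boxes, p=1)  # (N, M) L1 box distance
    cost = cost_class + cost_bbox                        # combined matching cost

    # Hungarian algorithm: optimal one-to-one assignment of predictions
    # to ground-truth objects.
    pred_idx, tgt_idx = linear_sum_assignment(cost.detach().numpy())
    pred_idx, tgt_idx = torch.as_tensor(pred_idx), torch.as_tensor(tgt_idx)

    # Supervise only the matched prediction slots in this simplified version.
    loss_class = torch.nn.functional.cross_entropy(
        pred_logits[pred_idx], tgt_labels[tgt_idx])
    loss_bbox = torch.nn.functional.l1_loss(
        pred_boxes[pred_idx], tgt_boxes[tgt_idx])
    return loss_class + loss_bbox
```

Because the Hungarian algorithm finds the optimal one-to-one assignment regardless of how the targets are ordered, permuting the ground-truth set leaves the loss unchanged.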

## Related Work
- covers bipartite matching losses for set prediction, encoder-decoder architectures based on the transformer, parallel decoding, and object detection methods.
## Set Prediction
- the most basic form of set prediction is multilabel classification (e.g., one-vs-rest strategies), but this does not handle structure between elements such as near-identical boxes.
- a direct set-prediction problem needs global inference schemes that model interactions among all predicted elements to avoid redundancy (near-duplicate predictions).
- auto-regressive sequence models (e.g., RNNs) are commonly used in prior work; DETR instead decodes the whole set in parallel (see the sketch below).
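
In contrast to auto-regressive decoding, a rough sketch of parallel decoding with self-attention over a fixed set of object queries could look as follows. Layer sizes and names such as `num_queries` and `d_model` are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of non-autoregressive parallel decoding with object queries.
import torch
import torch.nn as nn

d_model, num_queries, num_classes = 256, 100, 91

# Learned object queries: one embedding per potential prediction slot.
object_queries = nn.Parameter(torch.randn(num_queries, d_model))

# Standard transformer decoder: self-attention among the queries models
# pairwise interactions between all predicted elements, while cross-attention
# attends to the encoder's image features.
decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)

class_head = nn.Linear(d_model, num_classes + 1)  # +1 for "no object"
box_head = nn.Linear(d_model, 4)

# Dummy encoder output: flattened image features (batch=1, 1024 tokens).
memory = torch.randn(1, 1024, d_model)

# One forward pass decodes all N predictions in parallel; there is no
# step-by-step autoregressive generation as in RNN-based set predictors.
hs = decoder(object_queries.unsqueeze(0), memory)  # (1, num_queries, d_model)
pred_logits = class_head(hs)                       # (1, num_queries, num_classes + 1)
pred_boxes = box_head(hs).sigmoid()                # (1, num_queries, 4) in [0, 1]
```

Self-attention among the queries is the global inference step that lets the model reason jointly about all predictions in a single forward pass, which, combined with the matching loss, removes the need for post-processing such as NMS.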