# DETR

- Treats object detection as a **direct set prediction problem**.
- Self-attention, which explicitly models all pairwise interactions between elements, is pointed out as especially suitable for the constraints of set prediction (e.g., removing duplicate predictions).
- Architecture: encoder-decoder transformer (with non-autoregressive parallel decoding) plus a set-based global loss that uses bipartite matching between predictions and ground truth, so the loss is permutation-invariant (see the matching sketch below).

![](https://i.imgur.com/8CdQSGS.png)

## Related Work

- Builds on bipartite matching losses for set prediction, encoder-decoder architectures based on transformers, parallel decoding, and prior object detection methods.

## Set Prediction

- The most basic form of set prediction is multilabel classification (e.g., a one-vs-rest strategy), but such baselines do not apply when there is structure between the predicted elements, as with near-identical boxes in detection.
- Direct set prediction needs a global inference scheme that models interactions between all predicted elements in order to avoid redundant predictions.
- Auto-regressive sequence models are the commonly used approach; DETR instead decodes all elements in parallel (a rough architecture sketch follows below).
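A minimal sketch of the bipartite-matching idea behind the set-based loss, not the paper's actual implementation: predictions are matched one-to-one to ground-truth objects with the Hungarian algorithm, and the loss is computed only over matched pairs. The cost terms, the 5.0 box weight, the tensor shapes, and the omission of the "no object" class and GIoU term are all simplifying assumptions here.

```python
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment


def set_loss(pred_logits, pred_boxes, gt_labels, gt_boxes):
    """pred_logits: (N, C), pred_boxes: (N, 4), gt_labels: (M,) int64, gt_boxes: (M, 4)."""
    # Pairwise matching cost: classification term + L1 box term (assumed weights 1 : 5).
    prob = pred_logits.softmax(-1)                      # (N, C)
    cost_class = -prob[:, gt_labels]                    # (N, M)
    cost_bbox = torch.cdist(pred_boxes, gt_boxes, p=1)  # (N, M)
    cost = cost_class + 5.0 * cost_bbox

    # Optimal one-to-one assignment; permuting the ground truth permutes the
    # matching identically, which is what makes the loss permutation-invariant.
    pred_idx, gt_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    pred_idx, gt_idx = torch.as_tensor(pred_idx), torch.as_tensor(gt_idx)

    # Loss over matched pairs only (in the full formulation, unmatched
    # predictions are additionally pushed toward a "no object" class).
    loss_class = F.cross_entropy(pred_logits[pred_idx], gt_labels[gt_idx])
    loss_bbox = F.l1_loss(pred_boxes[pred_idx], gt_boxes[gt_idx])
    return loss_class + 5.0 * loss_bbox
```

Because the matching is recomputed from the cost matrix, no ordering is imposed on the predictions, which is why no NMS-style post-processing is needed to remove duplicates.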
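A rough sketch of the encoder-decoder transformer with non-autoregressive parallel decoding, under simplifying assumptions (the CNN backbone is replaced by a 1x1 conv projection from 2048 channels, positional encodings are omitted, 100 learned object queries, hidden size 256, and `torch.nn.Transformer` stands in for the paper's custom transformer):

```python
import torch
import torch.nn as nn


class MiniDETR(nn.Module):
    def __init__(self, num_classes, d_model=256, num_queries=100):
        super().__init__()
        self.proj = nn.Conv2d(2048, d_model, kernel_size=1)     # backbone features -> d_model
        self.transformer = nn.Transformer(d_model, nhead=8,
                                          num_encoder_layers=6, num_decoder_layers=6)
        self.query_embed = nn.Embedding(num_queries, d_model)   # learned object queries
        self.class_head = nn.Linear(d_model, num_classes + 1)   # +1 for "no object"
        self.box_head = nn.Linear(d_model, 4)

    def forward(self, features):
        """features: (B, 2048, H, W) backbone output; positional encodings omitted."""
        x = self.proj(features)                                  # (B, d_model, H, W)
        B, C, H, W = x.shape
        src = x.flatten(2).permute(2, 0, 1)                      # (H*W, B, d_model)
        # All object queries are decoded in parallel: no causal mask and no
        # autoregressive loop, one forward pass yields the whole predicted set.
        tgt = self.query_embed.weight.unsqueeze(1).repeat(1, B, 1)  # (num_queries, B, d_model)
        hs = self.transformer(src, tgt)                           # (num_queries, B, d_model)
        return self.class_head(hs), self.box_head(hs).sigmoid()
```

Each of the `num_queries` outputs is one element of the predicted set (a class distribution and a box), and the set loss above is what ties these unordered outputs to the ground truth.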