# DETR

- Treats object detection as a **direct set prediction problem**.
- Self-attention, which explicitly models all pairwise interactions between elements, is pointed out as especially suitable for the constraints of set prediction (e.g., removing duplicate predictions).
- Architecture: encoder-decoder transformer (with non-autoregressive parallel decoding) plus a set-based global loss that uses bipartite matching between predictions and ground truth, so the loss is permutation-invariant (see the matching sketch below).

![](https://i.imgur.com/8CdQSGS.png)

## Related Work

- Builds on bipartite matching losses for set prediction, encoder-decoder architectures based on transformers, parallel decoding, and prior object detection methods.

## Set Prediction

- The most basic form of set prediction is multilabel classification (e.g., a one-vs-rest strategy), but such baselines do not apply when there is structure between the predicted elements, as with near-identical boxes in detection.
- Direct set prediction needs a global inference scheme that models interactions between all predicted elements in order to avoid redundant predictions.
- Auto-regressive sequence models are the commonly used approach; DETR instead decodes all elements in parallel (a rough architecture sketch follows below).
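A minimal sketch of the bipartite-matching idea behind the set-based loss, not the paper's actual implementation: predictions are matched one-to-one to ground-truth objects with the Hungarian algorithm, and the loss is computed only over matched pairs. The cost terms, the 5.0 box weight, the tensor shapes, and the omission of the "no object" class and GIoU term are all simplifying assumptions here.

```python
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment


def set_loss(pred_logits, pred_boxes, gt_labels, gt_boxes):
    """pred_logits: (N, C), pred_boxes: (N, 4), gt_labels: (M,) int64, gt_boxes: (M, 4)."""
    # Pairwise matching cost: classification term + L1 box term (assumed weights 1 : 5).
    prob = pred_logits.softmax(-1)                      # (N, C)
    cost_class = -prob[:, gt_labels]                    # (N, M)
    cost_bbox = torch.cdist(pred_boxes, gt_boxes, p=1)  # (N, M)
    cost = cost_class + 5.0 * cost_bbox

    # Optimal one-to-one assignment; permuting the ground truth permutes the
    # matching identically, which is what makes the loss permutation-invariant.
    pred_idx, gt_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    pred_idx, gt_idx = torch.as_tensor(pred_idx), torch.as_tensor(gt_idx)

    # Loss over matched pairs only (in the full formulation, unmatched
    # predictions are additionally pushed toward a "no object" class).
    loss_class = F.cross_entropy(pred_logits[pred_idx], gt_labels[gt_idx])
    loss_bbox = F.l1_loss(pred_boxes[pred_idx], gt_boxes[gt_idx])
    return loss_class + 5.0 * loss_bbox
```

Because the matching is recomputed from the cost matrix, no ordering is imposed on the predictions, which is why no NMS-style post-processing is needed to remove duplicates.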
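A rough sketch of the encoder-decoder transformer with non-autoregressive parallel decoding, under simplifying assumptions (the CNN backbone is replaced by a 1x1 conv projection from 2048 channels, positional encodings are omitted, 100 learned object queries, hidden size 256, and `torch.nn.Transformer` stands in for the paper's custom transformer):

```python
import torch
import torch.nn as nn


class MiniDETR(nn.Module):
    def __init__(self, num_classes, d_model=256, num_queries=100):
        super().__init__()
        self.proj = nn.Conv2d(2048, d_model, kernel_size=1)     # backbone features -> d_model
        self.transformer = nn.Transformer(d_model, nhead=8,
                                          num_encoder_layers=6, num_decoder_layers=6)
        self.query_embed = nn.Embedding(num_queries, d_model)   # learned object queries
        self.class_head = nn.Linear(d_model, num_classes + 1)   # +1 for "no object"
        self.box_head = nn.Linear(d_model, 4)

    def forward(self, features):
        """features: (B, 2048, H, W) backbone output; positional encodings omitted."""
        x = self.proj(features)                                  # (B, d_model, H, W)
        B, C, H, W = x.shape
        src = x.flatten(2).permute(2, 0, 1)                      # (H*W, B, d_model)
        # All object queries are decoded in parallel: no causal mask and no
        # autoregressive loop, one forward pass yields the whole predicted set.
        tgt = self.query_embed.weight.unsqueeze(1).repeat(1, B, 1)  # (num_queries, B, d_model)
        hs = self.transformer(src, tgt)                           # (num_queries, B, d_model)
        return self.class_head(hs), self.box_head(hs).sigmoid()
```

Each of the `num_queries` outputs is one element of the predicted set (a class distribution and a box), and the set loss above is what ties these unordered outputs to the ground truth.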