###### tags: Paper Reading

# Annotating Objects and Relations in User-Generated Videos

## Outline

This paper presents a new benchmark video dataset, VidOR, with dense annotations of objects and relations.

## Introduction/Motivation

Video content analysis is a bridge between machine vision and language understanding. In the past few years, many strong papers on object detection have been published, and object-level analysis has also brought clear gains to video captioning and question answering. As the granularity becomes finer, letting machines understand the relations between objects becomes more important. However, much of the related work is limited by small-scale datasets, and that gap is what motivates this paper.

The paper explains how the VidOR dataset was built and describes several approaches for annotating raw video data.

## Annotation

Annotation is divided into an object localization step and a relation localization step.

### Object localization

In the object localization step, temporal object annotation is completed first. Afterwards, a divide-and-conquer strategy is used for spatial object localization (a hedged sketch of this idea is given at the end of this note). The details of the algorithm are written in **Section 3.1.2** of the paper.

### Relation localization

The relation localization step, on the other hand, is split into **action relations** and **spatial relations**. This part also introduces some annotation tricks that reduce the annotators' workload. To keep the dataset quality high, the authors design reward-point rules for annotators at every step; the details can also be found in **Section 3** of the paper.

## Conclusion

This paper gave me several useful strategies for building a dataset, and the tricks used throughout the annotation pipeline impressed me a lot.
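
---

As a small illustration of the divide-and-conquer idea mentioned in the object localization step, here is a minimal Python sketch. The note only points to Section 3.1.2 of the paper for the actual algorithm, so the specifics below — drawing boxes at a segment's endpoints, linearly interpolating the frames in between, and recursing on the middle frame when the interpolation looks wrong — are my own assumptions, and `ask_annotator_to_draw` / `annotator_accepts` are hypothetical stand-ins for real annotation-tool callbacks.

```python
# Hypothetical sketch of divide-and-conquer spatial box annotation.
# A box is a tuple (x, y, w, h); frames are indexed by integers.

def lerp_box(box_a, box_b, t):
    """Linearly interpolate between two boxes, with t in [0, 1]."""
    return tuple(a + (b - a) * t for a, b in zip(box_a, box_b))

def annotate_segment(start, end, boxes, ask_annotator_to_draw, annotator_accepts):
    """Fill boxes[start..end] given human-drawn boxes at both endpoints.

    boxes: dict frame_index -> box, already containing entries for start and end.
    ask_annotator_to_draw(frame): returns a human-drawn box for that frame.
    annotator_accepts(frame, box): True if the proposed box looks correct.
    """
    if end - start <= 1:
        return
    mid = (start + end) // 2
    t = (mid - start) / (end - start)
    proposal = lerp_box(boxes[start], boxes[end], t)
    if annotator_accepts(mid, proposal):
        # Interpolation looks fine at the midpoint: fill the whole segment cheaply.
        for f in range(start + 1, end):
            boxes[f] = lerp_box(boxes[start], boxes[end], (f - start) / (end - start))
    else:
        # Otherwise draw the middle frame by hand and recurse on both halves.
        boxes[mid] = ask_annotator_to_draw(mid)
        annotate_segment(start, mid, boxes, ask_annotator_to_draw, annotator_accepts)
        annotate_segment(mid, end, boxes, ask_annotator_to_draw, annotator_accepts)
```

The appeal of this kind of scheme is that human effort is only spent on frames where simple interpolation fails, which is presumably how the paper reduces the annotators' workload.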