# (7/24) Computer Vision Recent Paper: Joint Monocular 3D Vehicle Detection and Tracking

###### tags:`paper`

[toc]

---

## Before Meeting

:::success
### Authors
- Hou-Ning Hu
  - https://scholar.google.com/citations?user=VNSzxhUAAAAJ&hl=en
  - ![](https://i.imgur.com/6GQIVbl.png)
- Qi-Zhi Cai
  - https://scholar.google.com/citations?user=oyh-YNwAAAAJ&hl=en
  - ![](https://i.imgur.com/Ep6qsnW.png)
- Min Sun
  - https://scholar.google.com/citations?user=1Rf6sGcAAAAJ&hl=zh-TW
  - ![](https://i.imgur.com/8CC4FWm.png)
- Dequan Wang
  - https://scholar.google.com/citations?user=kFvxQ7YAAAAJ&hl=en
  - ![](https://i.imgur.com/SC2giwq.png)
- Ji Lin
  - https://scholar.google.com/citations?user=dVtzVVAAAAAJ&hl=en
  - ![](https://i.imgur.com/O4s79UA.png)
- Trevor Darrell
  - https://scholar.google.com/citations?user=bh-uRFMAAAAJ&hl=en
  - ![](https://i.imgur.com/CV7h5nt.png)
- Fisher Yu
  - https://scholar.google.com/citations?user=-XCiamcAAAAJ&hl=en
  - ![](https://i.imgur.com/GMTAEfd.png)
:::

[refer](https://arxiv.org/pdf/1811.10742.pdf)

---

## Recent Paper

---

### Joint Monocular 3D Vehicle Detection and Tracking

:::success
#### Abstract
- 3D vehicle detection and tracking from a monocular camera requires detecting and associating vehicles, and jointly estimating their locations and extents.
- Our approach leverages 3D pose estimation to learn 2D patch association over time, and uses temporal information from tracking to obtain stable 3D estimates.
- Our method also leverages 3D box depth ordering and motion to link together the tracks of occluded objects.
:::

:::info
#### Detail
- Introduction
  - Perceive the 3D world in both space and time from simple sequences of 2D images rather than a 3D point cloud.
  - ![](https://i.imgur.com/0uFhDxK.png)
  - Good tracking helps 3D detection, as information along consecutive frames is integrated. Good 3D detection helps tracking, as ego-motion can be factored out.
  - A deep network architecture tracks and detects vehicles jointly in 3D from a series of monocular color images.
  - After detecting the 2D bounding boxes of targets, both world coordinates and re-projected camera coordinates are used to associate instances across frames.
- Related Works
  - Object tracking
  - Object detection
  - Driving datasets
    - ![](https://i.imgur.com/NhgCpJ0.jpg)
- Joint 3D Detection and Tracking
  - The goal is to track objects and infer their precise 3D location, orientation, and dimension from a single monocular video stream.
  - 3D information is modeled with a layer-aggregating network on the object proposals.
  - The estimated 3D information of current trajectories is leveraged to track them through time, using 3D re-projection to generate a similarity metric between all trajectories and detected boxes.
  - Problem Formulation
    - ![](https://i.imgur.com/JdfvC0L.png)
    - A convolutional network pipeline trained on a very large amount of ground-truth supervision.
  - Candidate Box Detection
    - Faster R-CNN [35], trained on their dataset, provides the bounding boxes of object proposals.
  - 3D Box Center Projection
    - To estimate the 3D layout from a single image more accurately, the design of the Region Proposal Network (RPN) is extended to hypothesize a projected 2D point from the 3D bounding box center.
    - Raw images are fed into a deep layer-aggregated ConvNet to generate global convolutional feature maps.
  - 3D Box Estimation
  - Data Association and Tracking
    - Occlusion-aware Data Association
      - ![](https://i.imgur.com/dSQG2dI.png)
    - Depth-Ordering Matching
    - Motion Model
    - Deep Motion Estimation and Update
- 3D Vehicle Tracking Simulation Dataset
- Experiments
  - 3D Estimation
  - Object Tracking
  - Overall Evaluation
  - Implementation
    - Training
    - Dataset
  - Results
    - 3D for Tracking
    - Tracking for 3D
    - Real-world Evaluation
    - Amount of Data Matters
:::

:::warning
#### Conclusion
- In this paper, we learn 3D vehicle dynamics from monocular videos.
  We propose a novel framework, combining spatial visual feature learning and global 3D state estimation, to track moving vehicles in the 3D world.
:::

---
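The occlusion-aware association idea above (use re-projected 3D box centers as an association cue, and let depth ordering resolve occlusions) can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: `project_center`, `depth_ordered_association`, the dictionary fields, and the `max_dist` threshold are all assumed names for this example; the paper builds a learned similarity metric on top of such geometric cues.

```python
import numpy as np

def project_center(K, center_3d):
    """Pinhole projection of a 3D box center (camera coordinates, meters)
    onto the image plane using intrinsics K (3x3)."""
    x, y, z = center_3d
    u = K[0, 0] * x / z + K[0, 2]
    v = K[1, 1] * y / z + K[1, 2]
    return np.array([u, v])

def depth_ordered_association(tracks, detections, max_dist=50.0):
    """Greedy matching sketch: nearer tracks (smaller depth) claim detections
    first, mimicking the intuition that closer vehicles occlude farther ones.
    tracks: [{"depth": float, "proj": 2D pixel location}, ...]
    detections: [{"proj": 2D pixel location}, ...]
    Returns (track_index, detection_index) pairs."""
    matches, used = [], set()
    for ti in sorted(range(len(tracks)), key=lambda i: tracks[i]["depth"]):
        best, best_d = None, max_dist
        for di, det in enumerate(detections):
            if di in used:
                continue
            d = np.linalg.norm(tracks[ti]["proj"] - det["proj"])
            if d < best_d:
                best, best_d = di, d
        if best is not None:
            used.add(best)
            matches.append((ti, best))
    return matches
```

In this sketch the pixel distance between a track's re-projected center and a detection's center stands in for the paper's learned similarity; the depth-ordered loop is what lets a partially occluded, farther track still receive a detection after nearer vehicles have claimed theirs.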