# [Read while you drive - multilingual text tracking on the road](https://link.springer.com/chapter/10.1007/978-3-031-06555-2_51)

The authors propose an end-to-end tracking model that utilises textual information. They also benchmark the performance of several tracking and detection models.

### Proposed Method :

![](https://hackmd.io/_uploads/By7Ldf69n.png)

- The authors extend the RoadText-1K dataset to the RoadText-3K dataset with 1000 images from Spain and 1000 from India.
- The proposed model follows the tracking-by-detection paradigm and is built on the CenterNet architecture.
- CenterNet is an object detection model that represents each object as a single point at the center of its bounding box and regresses the width and height at that center location.
- The center is represented as a Gaussian kernel on a heatmap.
- The model uses the Mish activation instead of ReLU and adds one additional GRU cell before upsampling.

### Experiments :

- The dataset used is RoadText-3K.
- The metrics used are Multiple Object Tracking Accuracy (MOTA), Multiple Object Tracking Precision (MOTP), ID switches and IDF1.
- The models evaluated are CTPN, EAST, FOTS and CRAFT.
- The model is evaluated on frame-level text detection and on tracking.
- The proposed method achieves results comparable to the SOTA methods (FOTS and CRAFT).
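
To make the tracking metrics listed above more concrete, here is a minimal sketch of how MOTA and IDF1 are commonly computed from aggregate counts, following the standard CLEAR MOT and identity-metric definitions. The function names and the example counts are hypothetical and are not taken from the paper; the authors' own evaluation code may differ.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Standard CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT.

    All counts are summed over every frame of the sequence.
    The value can be negative when the errors outnumber the ground-truth boxes.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


def idf1(idtp, idfp, idfn):
    """Standard identity F1: 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    # Return 0.0 when there are no detections and no ground truth at all.
    return 2 * idtp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the paper).
    print("MOTA:", mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100))
    print("IDF1:", idf1(idtp=80, idfp=10, idfn=30))
```

MOTA penalises missed text instances, false detections and identity switches together, whereas IDF1 measures how consistently each text instance keeps the same identity across frames. This is why the two are usually reported side by side.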