# Narayanan Knowledge Transfer

## Building Object Detection

### Clone ALiVE

```bash=
git clone git@bitbucket.org:alive_iiitd/alive-dev.git
git checkout trajectory_generation
```

### Download Libtorch and set environment variables

- [Libtorch 1.12 - CUDA 11.6 - (cxx11 ABI)](https://download.pytorch.org/libtorch/cu116/libtorch-cxx11-abi-shared-with-deps-1.12.0%2Bcu116.zip)
- Unzip
```bash=
unzip libtorch-cxx11-abi-shared-with-deps-1.12.0+cu116.zip
```
- Place in `~`
```bash=
mv libtorch ~/
```
- Set `Torch_DIR` in `.bashrc` (use `$HOME` so the path matches wherever libtorch was placed; the single quotes defer expansion until `.bashrc` is sourced)
```bash=
echo 'export Torch_DIR=$HOME/libtorch' >> ~/.bashrc
```

### Building object detection

- vision_opencv
```bash=
git submodule update --init --remote
cd src/vision_opencv
git apply ../../vision_opencv_melodic.patch
cd ../..
catkin init
catkin build vision_opencv
```
- object_detection
```bash=
export PATH=$HOME/cmake-install/bin:$PATH
export CMAKE_PREFIX_PATH=$HOME/cmake-install:$CMAKE_PREFIX_PATH
catkin build object_detection
```
- Create the models directory
```bash=
cd src/object_detection/src
mkdir models
```

Download the models from [here](https://drive.google.com/drive/folders/1qfFAZnGiz7t5qzPcoDY0M5kAr98ztcaQ?usp=sharing) and place them in the `models` directory.

## Components

![](https://i.imgur.com/iJ3tTeE.png)

### Object Detector

![](https://i.imgur.com/yMEVGzQ.png)

#### ImageConverter->imageCb

Inputs
```
sensor_msgs::ImageConstPtr&
```
Outputs
```
results - std::vector<std::vector<Detection>>
Detection - struct - {cv::Rect bbox, float score, int class_idx}
```
Pseudocode -
```
Preprocess - resize, divide by 255
Create tensor
Call model.forward
Extract the detections from the returned tuple
Postprocess - NMS
return result
```

### Tracker

![](https://i.imgur.com/O2k7Kf4.png)

#### ROS summary

#### image_converter.boxcallback

Inputs
```
alive_msgs.msg.DetectedObjectArray
```
Outputs
```
alive_msgs.msg.DetectedObjectArray
```
Pseudocode -
```
if num_objects == 0:
    publish empty message
    return
if last_processed_timestamp == current_message_timestamp:
    # the bboxes have already been processed in boxcallback_lidar
    return
call the update function of sort_tracker
create the output message and publish
```

#### image_converter.boxcallback_lidar

Inputs
```
alive_msgs.msg.DetectedObjectArray
PointCloud2
Image
```
Outputs
```
alive_msgs.msg.DetectedObjectArray
```
Pseudocode -
```
if num_objects == 0:
    publish empty message
    return
update_bbox = True
if last_processed_detection_timestamp >= current_message_detection_timestamp:
    # the bboxes have already been processed in boxcallback
    update_bbox = False
use the object_distance_calculation function to retrieve the distance of each detected object
call update_with_distance with update_bbox, distance_array, detections and timestamps as arguments
create the output message and publish
```

#### object_distance_calculations

Inputs
```
alive_msgs.msg.DetectedObjectArray
pointcloud - numpy array
image
```
Outputs
```
Array with the depth of each object in the input
```
Pseudocode -
```
Filter the ground plane out of the pointcloud
Convert LiDAR points to the camera frame using the T matrix - transforms.lidarToCam3D
Convert points in the camera frame to camera pixel coordinates using the intrinsic matrix K - getPointsOnImage
Check that the coordinates are within the image bounds
Take the mean of all LiDAR points mapped within a bounding box as the depth of that bounding box - create_distance_viz
return the array of distances
```
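To make the projection chain above concrete, here is a minimal numpy sketch of the same logic. It is a sketch under assumptions, not the actual node: the real code goes through `transforms.lidarToCam3D` and `getPointsOnImage`, whose signatures are not documented here, so a 4x4 LiDAR-to-camera extrinsic `T` and a 3x3 intrinsic `K` are taken as plain inputs, and depth is taken as the camera-frame z of each point.

```python=
import numpy as np

def object_distances(points_lidar, bboxes, T, K, img_w, img_h):
    """points_lidar: (N, 3) LiDAR points with the ground plane already filtered out.
    bboxes: iterable of (x1, y1, x2, y2). Returns one mean depth per bbox."""
    # LiDAR frame -> camera frame (4x4 homogeneous transform), cf. transforms.lidarToCam3D
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]              # keep points in front of the camera
    # Camera frame -> pixel coordinates via the intrinsic matrix K, cf. getPointsOnImage
    proj = (K @ pts_cam.T).T
    uv = proj[:, :2] / proj[:, 2:3]
    depth = pts_cam[:, 2]
    # Discard projections that fall outside the image bounds
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < img_w) & (uv[:, 1] >= 0) & (uv[:, 1] < img_h)
    uv, depth = uv[inside], depth[inside]
    distances = []
    for x1, y1, x2, y2 in bboxes:
        in_box = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
        # Mean depth of the LiDAR points landing inside the bbox represents its distance
        distances.append(depth[in_box].mean() if in_box.any() else np.nan)
    return np.array(distances)
```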
#### KalmanBoxTracker

- State - [u, v, s, r, d, u_dot, v_dot, s_dot, d_dot]
    - u = x coordinate of the center of the box
    - v = y coordinate of the center of the box
    - s = area of the box
    - r = aspect ratio of the box
    - d = distance (m)
    - u_dot = velocity of the box (x coordinate) (pixels/s)
    - v_dot = velocity of the box (y coordinate) (pixels/s)
    - s_dot = rate of change of the area
    - d_dot = velocity of the object (m/s)

- #### update_bbox
Inputs
```
bbox - [x1, y1, x2, y2, conf, class]
```
Outputs
```
Updates the state of the Kalman filter and the associated matrices related to the bounding box
```

- #### update_distance
Inputs
```
bbox - [x1, y1, x2, y2, conf, class]
distance - distance associated with the bounding box
```
Outputs
```
Updates the state of the Kalman filter and the associated matrices related to distance and velocity
```

- #### update_bbox_and_distance
Inputs
```
bbox - [x1, y1, x2, y2, conf, class]
distance - distance associated with the bounding box
```
Outputs
```
Updates the state of the Kalman filter and the associated matrices
```

- #### predict functions
Inputs
```
Timestamps
```
Outputs
```
Updates the state and the associated matrices by predicting up to the given timestamp
```

#### associate_detections_to_trackers

Inputs
```
detections - array of [x1, y1, x2, y2, conf, class]
trackers - array of SORT trackers, one for each object encountered so far that is still alive
```
Outputs
```
indices of matched detections [index of detection, index of tracker], unmatched detections and unmatched trackers
```
Pseudocode -
```
Calculate the IoU between each detection and each tracker
Do linear assignment (scipy.optimize function) to find the matched indices
return matched detections, unmatched detections and unmatched trackers
```
A runnable sketch of this step is given in the appendix at the end of this document.

#### Sort

Maintains one SORT tracker for each uniquely identified object.

- #### update and update_distance
The only difference between the two is the set of functions they call, which can be seen from the code.

Inputs for update
```
dets_timestamp - ROS timestamp
dets - array of [x1, y1, x2, y2, conf, class]
```
Inputs for update_distance
```
distances - numpy array of distances
dets_timestamp - ROS timestamp
lidar_timestamp - ROS timestamp
update_bbox - Bool - whether the bbox update has to be performed, or whether update has already been called with these same detections, in which case only the distance is updated
dets - array of [x1, y1, x2, y2, conf, class]
```
Outputs
```
Array of [x1, y1, x2, y2, distance, id, distance, velocity]
```
Pseudocode -
```
Run the prediction step for each existing tracker
Call associate_detections_to_trackers to get the matched and unmatched detections and trackers
For matched detections, call the update function
Initialize new trackers for unmatched detections
Reduce the time-to-live of unmatched trackers and discard trackers whose time-to-live reaches 0
return the output array described above
```

### Obstacle Mapper

Used to map the output of the object detector by calculating the intersection of the bottom of the bounding box with the ground plane (see the ray-plane sketch in the appendix at the end of this document).

![](https://i.imgur.com/LntID5x.png)

The code runs two asynchronous streams; the only difference between them is the ground-plane segmentation. The stream with LiDAR input computes the ground plane, while the other stream reuses the ground-plane equation computed by the LiDAR stream.

## Pointcloud Registration with Lanelet

[Link to doc](https://hackmd.io/r9MnF_JlQwajUAZWg-eZDQ)

Things left to document:
- Traffic Light Classification
- OSM to Lanelet
- Visualization code
- Estimator
- FCW
- Calibration
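### Appendix: reference sketches

A minimal, self-contained sketch of the association step described under `associate_detections_to_trackers`, assuming boxes in `[x1, y1, x2, y2, ...]` format. It mirrors the standard SORT recipe (pairwise IoU followed by `scipy.optimize.linear_sum_assignment`) rather than the exact ALiVE code, and the `iou_threshold` default is an assumption.

```python=
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(dets, trks):
    """Pairwise IoU between detection boxes and tracker-predicted boxes."""
    d, t = dets[:, None, :4], trks[None, :, :4]
    xx1, yy1 = np.maximum(d[..., 0], t[..., 0]), np.maximum(d[..., 1], t[..., 1])
    xx2, yy2 = np.minimum(d[..., 2], t[..., 2]), np.minimum(d[..., 3], t[..., 3])
    inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
    area_d = (d[..., 2] - d[..., 0]) * (d[..., 3] - d[..., 1])
    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    return inter / (area_d + area_t - inter)

def associate_detections_to_trackers(dets, trks, iou_threshold=0.3):
    """Returns (matched [det idx, trk idx] pairs, unmatched det indices, unmatched trk indices)."""
    if len(dets) == 0 or len(trks) == 0:
        return np.empty((0, 2), dtype=int), np.arange(len(dets)), np.arange(len(trks))
    iou = iou_matrix(dets, trks)
    # Linear assignment maximizing total IoU (hence the negation)
    rows, cols = linear_sum_assignment(-iou)
    matched = np.array([[r, c] for r, c in zip(rows, cols) if iou[r, c] >= iou_threshold],
                       dtype=int).reshape(-1, 2)
    unmatched_dets = np.array([i for i in range(len(dets)) if i not in matched[:, 0]])
    unmatched_trks = np.array([j for j in range(len(trks)) if j not in matched[:, 1]])
    return matched, unmatched_dets, unmatched_trks
```

`matched` comes back as an (N, 2) array of [detection index, tracker index] pairs, matching the output format described above.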
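The Obstacle Mapper's core computation reduces to a ray-plane intersection, sketched below under assumptions: the intrinsic matrix `K` and a camera-frame ground-plane equation `(a, b, c, d)` are taken as given, and `bbox_bottom_to_ground` is a hypothetical helper name rather than the actual function in the mapper.

```python=
import numpy as np

def bbox_bottom_to_ground(bbox, K, plane):
    """Back-project the bottom-center pixel of a bbox and intersect the
    resulting camera ray with the ground plane a*x + b*y + c*z + d = 0
    (coefficients expressed in the camera frame). bbox = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = bbox
    u, v = (x1 + x2) / 2.0, y2                        # bottom-center of the box
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])    # ray direction through the pixel
    n, d = np.asarray(plane[:3], dtype=float), float(plane[3])
    denom = n @ ray
    if abs(denom) < 1e-9:
        return None                                   # ray parallel to the ground plane
    t = -d / denom
    if t <= 0:
        return None                                   # intersection behind the camera
    return t * ray                                    # 3D ground point in the camera frame
```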