# Notes on: [SegMap: Segment-based mapping and localization using data-driven descriptors](https://deepai.org/publication/segmap-segment-based-mapping-and-localization-using-data-driven-descriptors)
#### The International Journal of Robotics Research, Pages: 17, DOI: 10.1177/0278364919863090
#### Authors: Renaud Dubé, Andrei Cramariuc, Daniel Dugas, Hannes Sommer, Marcin Dymczyk, Juan Nieto, Roland Siegwart and Cesar Cadena
#### Notes by - [Aniket Gujarathi](https://www.linkedin.com/in/aniket-gujarathi/)

### Abstract
* An autonomous robot needs accurate localization, i.e. a precise estimate of its pose in a prior, global map. This is challenging in unstructured and dynamic environments, where local features are not discriminative enough and the available descriptors provide only coarse information.
* SegMap is a solution for localization and mapping based on the extraction of segments in 3D point clouds.
* The paper reports higher localization accuracy: a 6% increase in recall over state-of-the-art handcrafted descriptors and a reduction of open-loop odometry drift by up to 50%.
* SegMap uses a single data-driven descriptor for multiple tasks: global localization, 3D dense map reconstruction, and semantic information extraction.

## Introduction
* Knowing the precise pose of a robot is necessary for reliable, robust, and safe operation of autonomous mobile platforms, and it also enables multi-agent collaboration.
* Mapping and localization have traditionally been solved with cameras, using visual cues for place recognition. This approach has many limitations, as it struggles with changes of season, weather, and day-night lighting.
* 3D structure from LiDAR is also used, since it is more robust to the issues mentioned above.
* These LiDAR-based SLAM techniques are used for local odometry estimation and map tracking, but they fail to perform global localization without prior information about the pose of the robot.
  Local features are not descriptive enough and are not repeatable under changes in the environment, so they cannot be matched consistently.
* SegMap is therefore presented as a unified approach to map representation for localization and mapping in 3D point clouds.
* In SegMap, the point clouds are partitioned into sets of descriptive segments.
* The 3D segments are obtained using region-growing techniques that are able to repeatedly form similar partitions of the point clouds. This partitioning provides compact yet discriminative features that represent the environment efficiently.
* Handcrafted descriptors lack the ability to generalize to new environments, so a data-driven descriptor is introduced that generalizes well to unseen environments.
* The paper aims to provide a general solution rather than an approach that relies on assumptions about the environment.

### SegMap Approach
There are 5 core modules for localization and mapping in 3D point clouds: segment extraction, description, localization, map reconstruction, and semantics extraction.

#### Segmentation
* The stream of point clouds from the 3D sensor is first accumulated in a dynamic voxel grid. (In 3D computer graphics, a voxel represents a value on a regular grid in three-dimensional space.)
* Point cloud segments are then extracted in a section of radius R around the robot.

#### Description
* Compact features are then extracted from the 3D segments using the data-driven descriptor.
* To represent the latest state of the world, the descriptor associated with the last and most complete observation of each segment is kept.

#### Localization
* Correspondences between global and local segments are identified using k-NN in feature space.
* The candidate correspondences are checked for geometric consistency based on the segment centroids, and the largest consistent subset is kept.
* When a large enough set of consistent correspondences is found, a 6-DOF transformation between the local and global maps is estimated.
  The transformation is then forwarded to the SLAM solver, which estimates the trajectories of the robots in real time.

#### Reconstruction
* Because the descriptor extractor has an autoencoder-like architecture, an approximate map can be reconstructed at any time.

#### Semantics
* Semantic information from the descriptor can be used, for example, to discern between static and dynamic obstacles and so improve the robustness of localization.

### Data-driven descriptor
#### Descriptor Extractor Architecture
![](https://i.imgur.com/PtTdJwA.png)

#### Segment Alignment and Scaling
* An alignment step is applied so that the extracted segments are presented to the descriptor in a similar way. This is done with a 2D principal component analysis (PCA) of all the points in a segment.
* The network's input voxel grid is applied to the segment such that its centre aligns with the centroid of the aligned segment.

#### Training the descriptor
* $L_c$ is the classification loss, used for retrieval, and $L_r$ is the reconstruction loss.
* The total loss is a combination of both:

  $$
  \begin{aligned}
  L &= L_c + \alpha L_r \\
  L_c &= -\sum_{i=1}^{N} y_i \log\left(\dfrac{e^{l_i}}{\sum_{k=1}^{N} e^{l_k}}\right) \\
  L_r &= -\sum_{x,y,z}\left(\gamma\, t_{xyz}\log(o_{xyz}) + (1-\gamma)(1-t_{xyz})\log(1-o_{xyz})\right)
  \end{aligned}
  $$

  where $t$ and $o$ represent the target segment and the network's output. From the form of the equations, $l_i$ are the outputs of the classification layer, $y_i$ are the class labels, $\alpha$ weights the reconstruction loss against the classification loss, and $\gamma$ weights occupied against empty voxels.
* Note that when the system is deployed in a new environment, the classification layer is removed, as its output is no longer relevant. The activations of the previous fully connected layer are then used as the descriptor for segment retrieval through k-NN.

#### Knowledge transfer for semantic extraction
* The extracted segments contain information about objects or parts of objects. It is therefore possible to assign semantic labels to the segments for more robust localization.
* A simple fully connected network can be appended to the SegMap descriptor to extract semantic information.
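
The combined loss from "Training the descriptor" can be sketched in NumPy as follows. This is a minimal illustration for a single segment, not the authors' implementation: the function names are hypothetical, the default values of `alpha` and `gamma` are arbitrary placeholders rather than the paper's settings, and the clipping of the output is an added numerical safeguard that the notes do not mention.

```python
import numpy as np


def classification_loss(logits, label):
    """Softmax cross-entropy L_c for one segment.

    `logits` are the classification-layer outputs l_i and `label` is the
    index of the true class (the position of the 1 in the one-hot y).
    """
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]


def reconstruction_loss(target, output, gamma=0.9, eps=1e-7):
    """Weighted binary cross-entropy L_r summed over all voxels.

    `target` is the occupancy grid t (0 or 1) and `output` is the network
    prediction o in (0, 1). `gamma` weights occupied against empty voxels.
    """
    o = np.clip(output, eps, 1.0 - eps)  # assumption: avoid log(0)
    occupied = gamma * target * np.log(o)
    empty = (1.0 - gamma) * (1.0 - target) * np.log(1.0 - o)
    return -np.sum(occupied + empty)


def total_loss(logits, label, target, output, alpha=1.0, gamma=0.9):
    """L = L_c + alpha * L_r."""
    return classification_loss(logits, label) + alpha * reconstruction_loss(
        target, output, gamma
    )
```

Setting `gamma` above 0.5 penalizes missed occupied voxels more heavily than falsely filled empty ones, which may matter because most voxels in a segment's grid are empty.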
![](https://i.imgur.com/xDZm67H.png =300x150)

**For the experiments it is better to refer to the actual paper**

### Discussion and Future Work
* The pipeline observes only the geometry of the surrounding structure, which is a limitation because man-made structures are often repetitive. Featureless environments such as flat fields or long corridors are also challenging.
* The two segmentation algorithms behave differently depending on the environment. The Euclidean-distance-based one works better outdoors, while the curvature-based one is better suited to indoor environments.
* In the future, the SegMap approach could be extended to different sensor modalities and different point cloud segmentation algorithms.
* Work could be done on incremental updates of the learning-based descriptor.
* It could be useful to analyze the usefulness of individual segments as a precursory step to localization.
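
As a closing illustration, the Localization step described earlier (k-NN matching in descriptor space, a geometric consistency check on segment centroids, then a 6-DOF transformation estimate) can be sketched as below. This is a simplified sketch under stated assumptions and not the paper's recognition algorithm: the consistency check is a greedy stand-in that keeps the candidates agreeing with the best-supported pair, the transformation is estimated with the standard Kabsch (SVD) method, and all function names and the values of `k` and `tol` are hypothetical.

```python
import numpy as np


def knn_candidates(local_desc, global_desc, k=2):
    """Pair each local descriptor with its k nearest global descriptors."""
    # Pairwise Euclidean distances, shape (n_local, n_global).
    dists = np.linalg.norm(local_desc[:, None, :] - global_desc[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(local_desc)) for j in nearest[i]]


def consistent_subset(pairs, local_centroids, global_centroids, tol=0.4):
    """Greedy consistency check (assumption: simpler than the paper's method).

    Two candidate pairs agree when the distance between their local
    centroids matches the distance between their global centroids within
    `tol`. The pairs that agree with the best-supported pair are kept.
    """
    lc = np.array([local_centroids[i] for i, _ in pairs])
    gc = np.array([global_centroids[j] for _, j in pairs])
    d_local = np.linalg.norm(lc[:, None] - lc[None, :], axis=2)
    d_global = np.linalg.norm(gc[:, None] - gc[None, :], axis=2)
    agree = np.abs(d_local - d_global) < tol
    seed = int(np.argmax(agree.sum(axis=1)))
    return [p for p, ok in zip(pairs, agree[seed]) if ok]


def rigid_transform(src, dst):
    """Least-squares R, t such that dst ≈ R @ src + t (Kabsch algorithm)."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t
```

A typical call sequence would pass the output of `knn_candidates` to `consistent_subset`, then feed the centroids of the surviving pairs to `rigid_transform`, with the local centroids as `src` and the global ones as `dst`. Because the greedy subset is not guaranteed to be pairwise consistent, a real system would need a stricter check before trusting the estimate.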