# [Digital Comics Image Indexing Based on Deep Learning](https://pdfs.semanticscholar.org/9a77/84eea6bfa62bf2834ee0b87a3cdda46006f2.pdf)

- Introduces a new dataset with a global indexing system.
- Compares deep learning models for comic book analysis tasks.

### Proposed Method :

![](https://hackmd.io/_uploads/rkGmH2Qqh.png)

- The model is split into offline and online tasks; the offline tasks are formulated to capture the comic book elements.
- For panel, character and face detection, YOLOv2 is used, with anchor boxes as in Faster R-CNN.
- The anchor bounding boxes are found by running k-means on the ground truth.

**Speech Balloon Segmentation**

- Traditional methods produce several false positives, while deep learning models cannot identify the balloon boundaries as well as the traditional methods.
- The proposed method combines a deep learning method with a traditional method.
- The deep learning method is DeepLabv2. A balloon it detects is added to the output set if its IoU with a balloon detected by the traditional method is above a threshold.

**Text Recognition**

![](https://hackmd.io/_uploads/r1dkw3X5h.png)

- The authors propose a method to utilise unlabelled data for text recognition of a certain style.
- First, a pretrained OCR is used to detect text. Its output is evaluated with a lexicality metric and then used as pseudo ground truth to train a second OCR model.
- The pretrained OCR used is FineReader.
- The lexicality metric is L = 1 − (mean Levenshtein distance per character).

**Loss:**

- The losses assessed in the paper are perceptual loss and pixel difference.
- Perceptual loss is used because pixel difference makes it difficult to reconstruct intra-subject variations of a template.

### Experiments :

- The datasets used are eBDtheque, Fahad18 and the new DCM772 dataset.
- For character and face detection, mAP is the metric. For panel detection, precision and recall are computed using the IoU between ground truth and prediction.
- For balloon segmentation, segmentation accuracy is used. For text recognition, character and word error rates are used.
- For character and face detection, deep learning models perform well, but they fall behind in panel detection.
- In balloon segmentation, the boundary problem persists for the deep learning methods, and the combined method performs a little better.
- The text recognition OCR gives high error rates, which may be attributed to the text extraction algorithm cutting lines.
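
The balloon fusion rule from the Speech Balloon Segmentation notes can be sketched roughly as below. This is an illustration under assumptions, not the authors' implementation: masks are represented as sets of pixel coordinates, the names `iou`, `fuse_balloons` and `rect` are hypothetical, and the default threshold of 0.5 is a placeholder since the notes do not give the value used. The notes also do not say which of the two matched regions is kept; the sketch keeps the deep learning region, following the wording above.

```python
from typing import FrozenSet, Iterable, List, Tuple

Pixel = Tuple[int, int]
Mask = FrozenSet[Pixel]  # a balloon region as a set of (x, y) pixels


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fuse_balloons(
    deep_masks: Iterable[Mask],
    traditional_masks: Iterable[Mask],
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[Mask]:
    """Keep each deep learning balloon that is confirmed by a traditional one.

    A deep learning balloon is confirmed when its IoU with at least one
    traditionally detected balloon is above `threshold`.
    """
    traditional = list(traditional_masks)
    return [
        dm
        for dm in deep_masks
        if any(iou(dm, tm) > threshold for tm in traditional)
    ]


def rect(x0: int, y0: int, x1: int, y1: int) -> Mask:
    """Helper for the example: a filled rectangle [x0, x1) x [y0, y1)."""
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


if __name__ == "__main__":
    deep = [rect(0, 0, 10, 10), rect(50, 50, 60, 60)]
    traditional = [rect(1, 0, 10, 10), rect(80, 80, 90, 90)]
    kept = fuse_balloons(deep, traditional)
    # Only the first deep learning balloon overlaps a traditional one.
    print(len(kept))
```

Requiring agreement between the two detectors is what filters out false positives that only one method produces. A real pipeline would more likely operate on NumPy boolean masks or contours; sets of pixels are used here only to keep the sketch free of dependencies.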