# Next Steps for OCR

###### tags: `OCR track`

# 2021/9/29

[CAPTCHA storage area: 信賢's Google Drive](https://drive.google.com/drive/folders/1rzqRcfxBfkn1Jr8YTO4VlB6J8iPJXNcW)

## 2021/9/22 Discussion

* Next meeting: 9/29
* Progress to complete by the next meeting
    * RPA: finish collecting the verification (CAPTCHA) data for training
    * image segmentation: check whether we can pick the 6 key points out of the many points that describe an object and use them for object rectification

## 2021/9/15 Discussion of everyone's ideas

| No. | Name | Topic | People | Time | Description |
|-|--------|--------|--------|--------|--------|
| 1 | 信賢 | CAPTCHA package RPACaptcha | 2~3 | 1~2 weeks to collect CAPTCHAs from major websites or generate synthetic data; 1 week of training to check results | Estimated 1 million synthetic images needed (first to call it wins?). Want to build a CAPTCHA recognition package for common RPA use, aiming for at least 0.95 on the validation set. If the final results are good, publish to PyPI? |
| 2 | 昱睿 | Reflection detection model testing | 1~2 | Data collection: 1~1.5 weeks. Model validation: 1~1.5 weeks. About 3 weeks in total | Want to solve the image preprocessing problem, balancing high accuracy with fast inference |
| 3 | 昱睿 | Implementing auto-augment with RL | 2~3 | 1. Learn RL (2~3 weeks) 2. Find an example and implement it (2~3 weeks) | If it can be implemented, it should solve the model stability problem |
| 4 | 沛筠 | PDF full-text recognition | 2~3 | 1. Collect data (1.5~? weeks) 2. Train the model (2~3 weeks) 3. Validation (1~1.5 weeks) | Goal: recognize both Chinese and English documents, with good and stable extraction of table / list content |
| 5 | 昊中 | Document OCR | 1~2 | 1. Collect and validate usable models (runnable on a laptop): 1~1.5 weeks 2. Document image preprocessing methods: 2~3 weeks 3. Re-validation: 0.5~1 week | Find a general image preprocessing approach for more stable and accurate document recognition |
| 6 | 立晟 | [image segmentation: milesial/Pytorch-UNet](https://github.com/milesial/Pytorch-UNet) | 1~2 | 1. Test the pretrained model (1 day) 2. Train from scratch with datasets of different sizes (1~2 weeks) 3. Determine how much data is needed for good results 4-1. Use the segmentation coordinates to rectify the main object (1~2 weeks) 4-2. Have temporary staff do the labeling 5. Find a project to test on | Find the main object in customer-taken photos and rectify it |

### Conclusion

* RPA: CAPTCHA OCR
    * Members: 信賢, 昊中, 沛筠
* CV: image segmentation
    * Members: 立晟, 昱睿
* Next discussion: list the schedules for the 2 projects to decide what progress and content to present in the next report. The next report looks to be on 10/7.

## 2021/9/8 Discussion

### To do for next time

Prepare the topic you are most interested in, and estimate:

1. The result you want to achieve, e.g. testing data performance
2. The people and time required

### Everyone's ideas

* 昱睿
    * Implement obstruction and reflection detection
        * https://github.com/alex04072000/ObstructionRemoval (CVPR-2020)
        * https://github.com/Vandermode/ERRNet (CVPR-2019)
        * https://github.com/ceciliavision/perceptual-reflection-removal (CVPR-2018)
    * Implementing RL for auto augment
        * pytorch RL code: https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
* 信賢
    * Data augmentation in challenge competitions: generate synthetic data for a specific project
    * HR department: download the annual reports of the financial holding companies and find where the directors and supervisors are listed
    * CAPTCHA OCR
* 昊中
    * Image recognition on PDF files, similar to full-text recognition. A4 full-text recognition
    * Face recognition
    * card-segmentation
* 立晟
    * image segmentation -> image transformation (possibly 4 points or more)
    * template matching for documents
* 沛筠
    * Handle RPA document problems

## Short conclusion

### Next time

- [Learning to See Through Obstructions](https://arxiv.org/pdf/2004.01180v1.pdf)
    - CVPR 2020 paper
    - code: [alex04072000/ObstructionRemoval](https://github.com/alex04072000/ObstructionRemoval)
    - youtube: https://www.youtube.com/watch?v=pJWcHhofYTE&ab_channel=Jia-BinHuang

### The time after next

- Instance Segmentation :star:
    - Pointly-Supervised Instance Segmentation (2021, Facebook AI)
        - [Paper](https://arxiv.org/pdf/2104.06404)
        - [Code](https://github.com/facebookresearch/detectron2/tree/master/projects/PointSup)

## Directions for next time

- Image preprocessing (paper with code)
    - Reflection removal :star::star::star::star:
        - [Single Image Reflection Separation with Perceptual Losses](https://paperswithcode.com/paper/single-image-reflection-separation-with)
            - CVPR 2018 paper
            - code: [ceciliavision/perceptual-reflection-removal](https://github.com/ceciliavision/perceptual-reflection-removal)
        - [Learning to See Through Obstructions](https://arxiv.org/pdf/2004.01180v1.pdf)
            - CVPR 2020 paper
            - code: [alex04072000/ObstructionRemoval](https://github.com/alex04072000/ObstructionRemoval)
    - Data augmentation
        - [AutoAugment: Learning Augmentation Policies from Data](https://paperswithcode.com/paper/autoaugment-learning-augmentation-policies)
            - code: [tensorflow/model](https://github.com/tensorflow/models/tree/master/research/autoaugment) -> it looks like this has already become part of the TensorFlow package
        - GAN
            - [DATA AUGMENTATION USING GENERATIVE ADVERSARIAL NETWORKS (GANS) FOR GAN-BASED DETECTION OF PNEUMONIA AND COVID-19 IN CHEST X-RAY IMAGES](https://arxiv.org/pdf/2006.03622.pdf)
            - [On Data Augmentation for GAN Training](https://arxiv.org/pdf/2006.05338.pdf)
            - Paper collection: https://paperswithcode.com/task/data-augmentation
            - [GAN ZOO](https://github.com/hindupuravinash/the-gan-zoo)
            - Style-transfer GAN
            - Adding or removing noise with a GAN
            - [GlyphGAN: Style-Consistent Font Generation Based on Generative Adversarial Networks](https://arxiv.org/pdf/1905.12502.pdf)
            - [Multi-Content GAN for Few-Shot Font Style Transfer](https://arxiv.org/pdf/1712.00516.pdf)
    - Denoising
        - Unprocessing Images for Learned Raw Denoising (Google Research, CVPR 2019 paper)
            - [Paper](https://arxiv.org/pdf/1811.11127v1.pdf)
            - [Code](https://github.com/google-research/google-research/tree/master/unprocessing)
        - Deep Image Prior (CVPR 2018 paper)
            - [Paper](https://arxiv.org/pdf/1711.10925v4.pdf)
            - [Code](https://github.com/DmitryUlyanov/deep-image-prior)
    - Deskewing
        - [(Addendum) PaddleOCR, which has many stars](https://github.com/PaddlePaddle/PaddleOCR)
            - [Paper for the above](https://arxiv.org/pdf/1603.03915v2.pdf)
        - [Two methods for text skew correction (Python-OpenCV)](https://www.twblogs.net/a/5d145f2cbd9eee1e5c8260f4)
        - [Skew and slant correction for document images using gradient direction](https://www.researchgate.net/publication/3710583_Skew_and_slant_correction_for_document_images_using_gradient_direction)
- Instance Segmentation :star:
    - Pointly-Supervised Instance Segmentation (2021, Facebook AI)
        - [Paper](https://arxiv.org/pdf/2104.06404)
        - [Code](https://github.com/facebookresearch/detectron2/tree/master/projects/PointSup)
- Related CV techniques:
    - erosion
    - dilation
    - Contour detection: Hough lines
        - [Fourier Contour Embedding for Arbitrary-Shaped Text Detection](https://paperswithcode.com/paper/fourier-contour-embedding-for-arbitrary)
    - projection
- Major papers of 2020:
    - https://www.kdnuggets.com/2021/01/top-10-computer-vision-papers-2020.html

## Everyone's homework

- LILI
    1. Image processing: denoising, reflection removal
        - Single Image Reflection Removal through Cascaded Refinement (a paper picked at random)
    2. Image segmentation: with this, ID documents could probably be cropped better
        - https://github.com/lkeab/BCNet
        - https://github.com/bowenc0221/boundary-iou-api
- Erik
    1. Combining the strengths of CTC and attention: ACE (Aggregation Cross-Entropy for Sequence Recognition)
        - Paper: https://arxiv.org/pdf/1904.08364.pdf
        - code: https://github.com/summerlvsong/Aggregation-Cross-Entropy
    2. Handwriting recognition:
        - Paper: https://arxiv.org/pdf/1505.04925.pdf
        - code (probably, XD): https://github.com/chongyangtao/DeepHCCR
    3. CAPTCHA:
        - Paper (very long): https://arxiv.org/pdf/2006.11373.pdf Because there are many kinds of CAPTCHAs, it gives each one its own chapter
        - code: https://github.com/Jimut123/CAPTCHA
- Eagle
    - Techniques
        - Chinese OCR
        - Image detection and segmentation
    - Scenarios
        - Document-type OCR
        - Text and sentence segmentation (row/column segmentation for documents)
- 昱睿
    - Segmentation and rectification: I am considering trying segmentation on ID documents, followed by keystone (trapezoid) correction
    - Reflection removal
        - Lots of papers with code here: https://paperswithcode.com/task/reflection-removal
        - https://github.com/alex04072000/ObstructionRemoval
    - How can we raise the deskewing success rate?
- 沛筠
    - Reference (?)
        - https://github.com/handong1587/handong1587.github.io/blob/master/_posts/deep_learning/2015-10-09-ocr.md
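
The 2021/9/22 action item asks whether the many points outlining a segmented object can be reduced to 6 key points for rectification. Below is a minimal sketch of one way to try this, assuming the outline is a closed polygon whose vertices are given in order. It repeatedly drops the vertex whose triangle with its two neighbours has the smallest area (the Visvalingam-Whyatt simplification idea). The names `triangle_area` and `reduce_polygon` and the sample outline are hypothetical, and the notes do not say which method the team actually used.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Area of triangle abc: half the absolute 2D cross product."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def reduce_polygon(points: List[Point], k: int = 6) -> List[Point]:
    """Drop the least significant vertex of a closed polygon until k remain."""
    if k < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    pts = list(points)
    while len(pts) > k:
        n = len(pts)
        # A vertex's significance is the area of the triangle it forms with
        # its previous and next neighbours (indices wrap around the polygon).
        areas = [triangle_area(pts[i - 1], pts[i], pts[(i + 1) % n]) for i in range(n)]
        del pts[areas.index(min(areas))]
    return pts


# Hypothetical outline: an L-shaped object with an extra point on every edge.
outline = [(0, 0), (2, 0), (4, 0), (4, 1), (4, 2), (3, 2),
           (2, 2), (2, 3), (2, 4), (1, 4), (0, 4), (0, 2)]
print(reduce_polygon(outline))
# -> [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
```

On a real segmentation mask the outline is noisy, so the surviving points are only candidates; they would still need checking against the object's true corners before being used for rectification.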