[CV] Selecting Automatically Pre-Processing Methods to Improve OCR Performances (2017)

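###### tags: `Paper`

https://ieeexplore.ieee.org/document/8269967

### Intro
- The goal is to build a model that automatically selects the pre-processing method to apply before OCR.

### Pre-processing Method
- **Binarization**
    - Otsu
        - Assumes the pixels of an image can be split into 2 groups. This is also its weakness, since most blurry images do not have two clearly separated groups.
        - The optimal threshold is the one that minimizes intra-class variance, which is equivalent to maximizing inter-class variance.
    - *Robust Document Image Binarization Technique for Degraded Document Images*
        - For each pixel, compute a gradient from the max and min intensity inside its local window, then threshold it to mark foreground/background. This produces a contrast map, which is used to locate edges.
        - Experiments show that it is robust on blurry images.
- **Noise Reduction**
    - Gaussian Filter
        - Effectively removes Gaussian noise (the noise commonly seen in phone photos), but also blurs fine details.
        - Implemented as a 2D conv kernel.
    - Non-local Means ([link](https://codingnote.cc/zh-tw/p/133612/))
        - Split the image into small patches (windows). If patch A is similar to B and C but less similar to D, recompute A as a weighted average using weights Wb, Wc, Wd (Wb = Wc >> Wd). Averaging removes the noise while keeping the information we need.
        - Compared with the Gaussian filter, NLM preserves more information and gives a sharper image.
- **Sharpening**: sharpening counteracts image blur.
    - Unsharp Masking
        - Apply a Gaussian blur to the original image and compare the result with the original. Regions with a large difference are where the edges are, and those differences are amplified. This may also amplify noise.
    - Local Contrast Enhancement
        - From *Adaptive binarization of severely degraded and non-uniformly illuminated documents*.
        - Uses a local window threshold and a global image threshold (both computed from gradients) to decide whether a pixel is a text edge. If it is, its contrast is increased.

### OCR
- OCR systems
    - Tesseract 2.0
        - Traditional hand-crafted features; very sensitive to distortion.
        - Does not need much training data.
    - NNOCR
        - LSTM based + CTC
        - Needs a large amount of training data.
- Evaluation
    - Three receipt datasets in total:
        1. 1000 blurry images
        2. 1000 noisy images
        3. 1000 images made up of equal parts blurry, noisy, and good images

      ![](https://i.imgur.com/x3HMa61.png)
    - For every set, text lines are extracted first and then fed to the OCR system.
    - How the loss is computed:

      ![](https://i.imgur.com/5if8prh.png)
      ![](https://i.imgur.com/VsZvtb5.png)
    - Results
        - **Binarization removes grayscale information, which makes NNOCR perform poorly.**
        - **Sharpening helps on blurry images (set 1) but not on noisy images (set 2), because it amplifies the noise.**
        - **On the mixed images (set 3), no single method performs especially well, and some even hurt performance.**

      ![](https://i.imgur.com/76XrrO4.png)

### CNN pre-processing selection
- 20 pre-processing methods were chosen (combinations of the methods above), plus a "none" class, for 21 classes in total.

  ![](https://i.imgur.com/HChMknB.png)
- Every image in the training set is passed through all 21 pre-processing methods and then fed to the OCR system. The class that gives the best OCR result is used as the ground-truth label for that image.

  ![](https://i.imgur.com/xFAJAz2.png)
  ![](https://i.imgur.com/Em92OUP.png)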
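
The labelling step for the CNN selector can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `edit_distance` and `best_preprocessing_label` are hypothetical, Levenshtein distance is assumed here as a stand-in for the paper's error measure, and the "images", pre-processing methods, and OCR engine are toy string-based placeholders so that the example runs without any image library.

```python
from typing import Callable, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def best_preprocessing_label(image,
                             truth: str,
                             methods: Sequence[Callable],
                             ocr: Callable) -> int:
    """Return the index of the method whose OCR output is closest to truth.

    Assumption: lower edit distance means a better OCR result. Ties go to
    the lowest index, so putting the "none" class first prefers doing nothing.
    """
    errors = [edit_distance(ocr(method(image)), truth) for method in methods]
    return min(range(len(errors)), key=errors.__getitem__)


# Toy stand-ins (hypothetical): an "image" is a string, '#' plays the role of noise.
toy_methods = [
    lambda img: img,                   # class 0: none
    lambda img: img.replace("#", ""),  # class 1: pretend denoising
    lambda img: img.upper(),           # class 2: pretend sharpening
]
toy_ocr = lambda img: img              # pretend OCR reads the "image" verbatim

label = best_preprocessing_label("TO#TAL 12#.50", "TOTAL 12.50",
                                 toy_methods, toy_ocr)
print(label)
```

In a real pipeline, each entry of `methods` would be one of the 21 pre-processing classes and `ocr` would call the actual OCR system. The resulting integer labels would then serve as classification targets for the CNN.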