From Pixels to Text: Evaluating Open-Source OCR Models on Japanese Medical Documents - Bing Wang

---
title: "From Pixels to Text: Evaluating Open-Source OCR Models on Japanese Medical Documents - Bing Wang"
tags: PyConTW2025, 2025-organize, 2025-共筆
---

# From Pixels to Text: Evaluating Open-Source OCR Models on Japanese Medical Documents - Bing Wang

{%hackmd L_RLmFdeSD--CldirtUhCw %}

<iframe src=https://app.sli.do/event/hN7HWrRL23A3UpX8PnJhjF height=450 width=100%></iframe>

:::success
AI translation subtitles and summaries are available for this talk. Click here to access >> [PyCon Taiwan AI Notebook](https://pycontw.connyaku.app/?room=k2zPl6OrL8RGzZEVpoFS)
:::

> Collaborative writing starts below

## The Engines

### PaddleOCR

- Performs poorly on skewed text
- Text near the edges of the page is prone to errors
- On multilingual documents, it consistently mixes the text up with Simplified Chinese

Pros: fast, Apache 2.0 license, versatile pipelines

### Yomitoku

- Error-prone on handwriting and unusual fonts
- Logos are prone to errors
- The license is Creative Commons (CC) and does not permit commercial use

Pros: shows the text detection confidence rate; outperformed the other two open-source OCR engines by a wide margin in every evaluation

### Tesseract

- Performs poorly on text in complex layouts
- No GPU support; the long processing time is its biggest drawback

Pros:

## Evaluation method

$$\mathrm{CER} = \frac{S + D + I}{N}$$

Here $S$, $D$, and $I$ are the numbers of substituted, deleted, and inserted characters, and $N$ is the number of characters in the ground truth. The lower the CER (character error rate), the closer the OCR output is to the original text.

## How the texts are built to compare with ground truth

Recognized text boxes are concatenated in order: boxes whose center lines fall within 20 pixels of each other are joined together.

## Preprocessing that improved results

- `sharpened = cv2.filter2D(img, -1, kernel)`
- `contrast = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)`

With the processing above, Image4 showed an improved error rate in every case.

Below is the part that the speaker updated or corrected after the talk
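
The CER formula and the 20-pixel joining rule from the notes above can be sketched in plain Python. This is a minimal illustration, not the speaker's code: the function names `edit_distance`, `cer`, and `join_boxes`, the `(text, x, y_center)` box format, and the sample strings are all hypothetical. It also assumes horizontal text, and reads the joining rule as "boxes whose vertical centers lie within 20 pixels of the first box in a line belong to that line, ordered left to right", which is one possible interpretation of the notes.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of S + D + I edits."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: (S + D + I) / N, with N = len(ref)."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def join_boxes(boxes, tolerance=20):
    """Group (text, x, y_center) boxes into lines and join the text.

    Assumption: a box joins the current line when its vertical center
    is within `tolerance` pixels of the first box placed on that line.
    """
    lines = []  # each entry: [anchor_y, [(x, text), ...]]
    for text, x, y in sorted(boxes, key=lambda b: b[2]):
        if lines and abs(y - lines[-1][0]) <= tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within each line, read boxes left to right.
    return "\n".join("".join(t for _, t in sorted(parts))
                     for _, parts in lines)


# Hypothetical example: one substitution and one deletion in 6 characters.
print(round(cer("血圧測定結果", "血庄測定結"), 3))  # 0.333
```

Because CER divides by the length of the ground truth, it can exceed 1.0 when the OCR output contains many spurious insertions.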