---
title: "From Pixels to Text: Evaluating Open-Source OCR Models on Japanese Medical Documents - Bing Wang"
tags: PyConTW2025, 2025-organize, 2025-共筆
---
# From Pixels to Text: Evaluating Open-Source OCR Models on Japanese Medical Documents - Bing Wang
{%hackmd L_RLmFdeSD--CldirtUhCw %}
<iframe src=https://app.sli.do/event/hN7HWrRL23A3UpX8PnJhjF height=450 width=100%></iframe>
:::success
AI translation subtitles and summaries are available for this talk. Click here to access >> [PyCon Taiwan AI Notebook](https://pycontw.connyaku.app/?room=k2zPl6OrL8RGzZEVpoFS)
:::
> Collaborative notes start below
## The Engines
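### PaddleOCR
- Performs poorly on skewed text
- Text near the edges of the page is prone to errors
- In multilingual documents, it consistently mixes characters up with Simplified Chinese

Pros: fast, Apache 2.0 license, versatile pipelines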
### Yomitoku
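- Error-prone on handwriting and unusual fonts
- Logos are error-prone
- Licensed under Creative Commons; commercial use is not permitted

Pros: shows a text detection confidence score.
Across the evaluations, it far outperformed the other two open-source OCR engines.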
### Tesseract
- Performs poorly on text in complex layouts
- No GPU support; the long processing time is its biggest drawback

Pros:
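## Evaluation method
CER = (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted characters, and N is the number of characters in the ground truth.
The lower the CER (character error rate), the closer the OCR output is to the original text.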
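
As a rough illustration of the formula above, CER can be computed from the Levenshtein edit distance, which is the minimum total of substitutions, deletions, and insertions. This is a minimal sketch and not the speaker's evaluation code; the function names `edit_distance` and `cer` are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum S + D + I needed to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i ref characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: (S + D + I) / N, with N = len(reference)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong character out of three in the ground truth.
    print(cer("診断書", "診断害"))
```

Note that CER can exceed 1.0 when the OCR output contains many extra characters, because insertions are counted against the ground-truth length.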
## How the texts are built to compare with ground truth
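To decide the order in which text is concatenated, detected bounding boxes whose center lines fall within 20 pixels of each other are joined together.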
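
The joining rule could be sketched as below. This is only one reading of the notes: it assumes horizontal, left-to-right text, compares the vertical centers of the boxes, and uses a hypothetical `(x_min, y_min, x_max, y_max, text)` tuple for each detection. The speaker's actual implementation may differ.

```python
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max, text)
Box = Tuple[float, float, float, float, str]


def center_y(box: Box) -> float:
    """Vertical center line of a bounding box."""
    return (box[1] + box[3]) / 2


def join_boxes(boxes: List[Box], y_tolerance: float = 20.0) -> str:
    """Group boxes into lines by vertical center, then join left to right."""
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=center_y):
        # Compare against the first box of the current line so the
        # reference center does not drift as boxes are added.
        if lines and abs(center_y(box) - center_y(lines[-1][0])) <= y_tolerance:
            lines[-1].append(box)
        else:
            lines.append([box])
    # Within each line, order boxes by their left edge.
    return "\n".join(
        "".join(b[4] for b in sorted(line, key=lambda b: b[0]))
        for line in lines
    )


if __name__ == "__main__":
    detections = [
        (200, 12, 300, 40, "太郎"),
        (10, 10, 100, 40, "患者"),
        (10, 100, 100, 130, "診断"),
    ]
    print(join_boxes(detections))
```

Vertically written Japanese text would need the same idea applied to horizontal centers, with lines ordered right to left.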
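## Preprocessing steps that improved results
- Sharpening: `sharpened = cv2.filter2D(img, -1, kernel)`
- Contrast adjustment in the LAB color space, converted back to BGR: `contrast = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)`

With the processing above, Image4 showed an improved error rate in every case.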
Below are the parts of the slides that the speaker updated or corrected after the talk.