# Tesseract

##### History of Tesseract at a high level

- Image outlines (an advantage for white-on-black text, which is recognized as easily as black-on-white) (outlines can be obtained with a Canny edge detector)
- Blobs -> lines -> words (split using character spacing)
- Train an adaptive classifier?
- No need to fix skew (line finding copes with a skewed page without deskewing it first) ![](https://hackmd.io/_uploads/By1pF9-Ih.png)
- Fixed pitch (each character takes up the same amount of space) ![](https://hackmd.io/_uploads/rk50t5-U2.png)
- Not fixed pitch ![](https://hackmd.io/_uploads/HJ7jc5WI3.png)
- Chop joined characters ![](https://hackmd.io/_uploads/B1Gyj5-In.png)

##### Image Preprocessing

- Layout analysis: Tesseract never needed its own.
  > "Since HP had independently-developed page layout analysis technology that was used in products, (and therefore not released for open-source) Tesseract never needed its own page layout analysis. Tesseract therefore assumes that its input is a binary image with optional polygonal text regions defined."
- Increase contrast
- Convert to binary (black and white, not grayscale)
- Camera calibration
- Deskew the image (OpenCV)
- Goal: well-aligned characters and as little pixel noise as possible
- A resolution of 300 DPI works best for this purpose
- All of this has to be done with minimal latency, which means trading preprocessing quality against speed
- DNNs do feature extraction for you

#### Examples

- [tesseract.js on mp4 example](https://github.com/jeromewu/tesseract.js-video)
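The "Increase contrast" and "Convert to binary" preprocessing steps can be sketched in plain Python. This is a minimal sketch, not how Tesseract itself does it: it assumes the image is already grayscale and flattened into a list of ints in 0-255, uses a linear contrast stretch, and picks the global threshold with Otsu's method (a common choice swapped in here for illustration). The function names are hypothetical.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values so the darkest pixel becomes 0 and the brightest 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # flat image: nothing to stretch
        return list(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    sum_bg = 0   # weighted sum of intensities <= t (background class)
    w_bg = 0     # number of pixels <= t
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is background; further t cannot help
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels):
    """Map each pixel to 0 (black) or 255 (white) using the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]


if __name__ == "__main__":
    # Tiny hypothetical "image": dark text pixels and a light background.
    image = [10, 20, 190, 200]
    print(binarize(stretch_contrast(image)))
```

In practice the same steps would run on a NumPy array through OpenCV, e.g. `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`; a single global threshold is only a starting point, and unevenly lit camera images may need an adaptive threshold instead.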