
<style>
/* reduce from default 48px: */
.reveal { font-size: 24px; text-align: left; }
.reveal .slides { text-align: left; }
/* change from default gray-on-black: */
.hljs { color: #005; background: #fff; }
/* prevent invisible fragments from occupying space: */
.fragment.visible:not(.current-fragment) { display: none; height:0px; line-height: 0px; font-size: 0px; }
/* increase font size in diagrams: */
.label { font-size: 24px; font-weight: bold; }
/* increase maximum width of code blocks: */
.reveal pre code { max-width: 1000px; max-height: 1000px; color: green; }
/* remove black border from images: */
.reveal section img { border: 0; }
.reveal h3 { text-transform: none; }
.reveal pre.mermaid { width: 100% !important; }
.reveal svg { max-height: 600px; }
.reveal .scaled-flowchart-td pre.mermaid { width: 100% !important; /* why? float: left; */ }
.reveal .scaled-flowchart-td svg { max-width: 100% !important; }
.reveal .scaled-flowchart-td svg g.node, .reveal .scaled-flowchart-td svg g.label, .reveal .scaled-flowchart-td svg foreignObject { width: 100% !important; }
.reveal .scaled-flowchart-td p { clear:both; }
.reveal .centered { text-align: center }
.reveal .width75 { max-width: 75%; }
</style>

# Robust and performant methods for layout analysis in OCR-D

Update for the AG OCR meeting on 19 November 2025

_Robert Sachunsky_

![slub-logo](https://www.slub-dresden.de/typo3conf/ext/slub_template/Resources/Public/Images/slublogo.svg =200x)

https://hackmd.io/@bertsky/ocrd-layout-update

---

## Outline

1. Interim status: Detectron2
2. Interim status: Eynollah
3. Status of the evaluation

---

## Detectron2: experiments with Mask-RCNN

- initially region segmentation only (coarse classification, instance segmentation)
- ground truth (GT): DocLayNet, OCR-D
- fine-tuning and base training
- very many hyperparameters and model variants
- evaluation so far only with COCO and qualitatively
- accuracy (still) insufficient for a good decoder result

---

## Detectron2

![train](https://hackmd.io/_uploads/ry_YzPGQkx.png)

|||
| --- | --- |
| ![test](https://hackmd.io/_uploads/HyUFUtGmkx.png) | ![test](https://hackmd.io/_uploads/SJRJR_zQJx.png)|

---

## Eynollah

- [Version 0.5](https://github.com/qurator-spk/eynollah/releases/tag/v0.5.0) (layout):
  - bug fixes, refactoring, test coverage
  - trainable reading order
  - standalone CLIs: simplified and unified
  - standalone CLIs: new `mbreorder` and `enhancement`
  - polygonal lines even without `--curved-line`
  - marginalia are distinguished as left/right
  - initials are no longer attached to the line
  - page cropping is polygonal and uses a better model
  - better line separation in light mode

---

## Eynollah

- [Version 0.5](https://github.com/qurator-spk/eynollah/releases/tag/v0.5.0) (OCR):
  - own CNN-RNN models (via TF)
  - own TrOCR models (PyTorch)
  - OCR also on vertical or curved lines
  - OCR in a standalone CLI or as part of layout
  - heuristic for hyphenation at the end of a line
  - ...

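---

## Eynollah: hyphenation at line end (sketch)

The line-end hyphenation heuristic mentioned for the OCR part can be illustrated with a generic sketch. This is not Eynollah's code: the function name `join_lines`, the set of hyphen characters and the lowercase rule are assumptions chosen for illustration only.

```python
# Generic dehyphenation sketch (assumption: NOT taken from Eynollah).
# Characters treated as line-end hyphens: hyphen-minus, double oblique hyphen, not sign.
HYPHENS = ("-", "\u2e17", "\u00ac")


def join_lines(lines):
    """Join OCR text lines into one string, resolving line-end hyphens.

    Rule of thumb: if a line ends with a hyphen and the next line starts
    with a lowercase letter, the hyphen is dropped and the word is glued
    together; otherwise the hyphen is kept and the lines are joined
    without a space (e.g. for compounds like "OCR-D").
    """
    text = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if text.endswith(HYPHENS):
            if line[0].islower():
                text = text[:-1] + line  # drop hyphen, glue word
            else:
                text = text + line  # keep hyphen, no space
        elif text:
            text = text + " " + line
        else:
            text = line
    return text


print(join_lines(["layout ana-", "lysis of his-", "torical prints"]))
```

A rule this simple misjudges suspended hyphens (for example "in- and output" broken after "in-"), which is why a real implementation would need further checks.
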
---

## Eynollah

- [Version 0.6](https://github.com/qurator-spk/eynollah/releases/tag/v0.6.0):
  - bug fixes, refactoring, test coverage, code style checked by a linter
  - performance optimization (np+mp)
  - valid (and slightly wider) polygons
  - `@type=heading` instead of `header`
  - simple user-driven fine-tuning: `eynollah-training`

---

## Eynollah

- currently open:
  - [improvement of the rule-based reading order](https://github.com/qurator-spk/eynollah/pull/206)
  - [performance optimization (jdeskew)](https://github.com/bertsky/eynollah/tree/rebuilt-jdeskew)
  - [uniform, flexible model selection](https://github.com/qurator-spk/eynollah/pull/207)
  - OCR-D wrappers (processors) for individual steps:
    - reading order
    - image enhancement
    - page cropping
  - PITA: TF 2.12 → CUDA 11.8 / libCUDNN 8.6, NumPy 1.23 …
  - model improvements for reading order and tables (?)

---

## Layout evaluation

- [Discussion of metrics](https://github.com/OCR-D/ocrd_segment/wiki/SegmentationEvaluation)...
- [ocrd-segment-evaluate | page-segment-evaluate](https://github.com/OCR-D/ocrd_segment/blob/evaluate-allowable/ocrd_segment/evaluate.py):
  - efficient IoU computation: `pycocotools.cocoeval`
  - matching, metrics, aggregation: own code, because
    - the alignment in pycocotools is [inadequate](https://github.com/cocodataset/cocoapi/issues/564)
    - n:m instead of only 1:1
    - also FN/FP (i.e. recall/precision)
    - also instance-level instead of only pixel-level metrics
    - also measures for over-/under-segmentation
    - micro-averaging, relative measures
  - **allowable** vs. non-allowable merge / split, following PRImA
  - options: lines/regions, with/without classes, foreground/everything
  - output of 2 PAGE-XML files (correspondence + fusion)
- [PRImA-Layout-Eval](https://github.com/PRImA-Research-Lab/prima-layout-eval): _partial_ sources, documentation, commitment from C. Clausner to help
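
---

## Layout evaluation: n:m matching (sketch)

The ideas of n:m matching, over-/under-segmentation counts and micro-averaging can be made concrete with a small sketch. It is not the code of `ocrd-segment-evaluate`: it uses axis-aligned boxes instead of polygons or masks, and the names `evaluate_page`, `micro_average` and the `min_overlap` threshold of 0.5 are assumptions for illustration.

```python
from collections import Counter

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def iou(a: Box, b: Box) -> float:
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def evaluate_page(gt: list[Box], pred: list[Box], min_overlap: float = 0.5) -> Counter:
    """n:m matching: a GT/prediction pair counts as matched if their
    intersection covers at least min_overlap of the smaller segment
    (hypothetical criterion), so one segment may match several partners."""
    pairs = []
    for i, g in enumerate(gt):
        for j, p in enumerate(pred):
            inter = intersection(g, p)
            if inter > 0 and inter >= min_overlap * min(area(g), area(p)):
                pairs.append((i, j))
    gt_hits = Counter(i for i, _ in pairs)
    pred_hits = Counter(j for _, j in pairs)
    return Counter({
        "gt": len(gt),
        "pred": len(pred),
        "gt_matched": len(gt_hits),      # GT total minus this = FN
        "pred_matched": len(pred_hits),  # prediction total minus this = FP
        "splits": sum(1 for n in gt_hits.values() if n > 1),    # over-segmentation
        "merges": sum(1 for n in pred_hits.values() if n > 1),  # under-segmentation
    })


def micro_average(pages: list[Counter]) -> tuple[float, float]:
    """Sum the counts over all pages first, then form the ratios."""
    total = Counter()
    for counts in pages:
        total.update(counts)
    precision = total["pred_matched"] / total["pred"] if total["pred"] else 0.0
    recall = total["gt_matched"] / total["gt"] if total["gt"] else 0.0
    return precision, recall
```

With a plain IoU threshold, a region split into two halves would reach only IoU 0.5 per half and easily count as missed. The coverage-based criterion used here keeps both halves as matches and reports the case as a split instead.
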