<style>
/* reduce from default 48px: */
.reveal {
  font-size: 24px;
  text-align: left;
}
.reveal .slides {
  text-align: left;
}
/* change from default gray-on-black: */
.hljs {
  color: #005;
  background: #fff;
}
/* prevent invisible fragments from occupying space: */
.fragment.visible:not(.current-fragment) {
  display: none;
  height: 0px;
  line-height: 0px;
  font-size: 0px;
}
/* increase font size in diagrams: */
.label {
  font-size: 24px;
  font-weight: bold;
}
/* increase maximum width and height of code blocks: */
.reveal pre code {
  max-width: 1000px;
  max-height: 1000px;
}
.reveal pre.mermaid {
  width: 100% !important;
}
.reveal svg {
  max-height: 600px;
}
.reveal .scaled-flowchart-td pre.mermaid {
  width: 100% !important;
  /* why? float: left; */
}
.reveal .scaled-flowchart-td svg {
  max-width: 100% !important;
}
.reveal .scaled-flowchart-td svg g.node,
.reveal .scaled-flowchart-td svg g.label,
.reveal .scaled-flowchart-td svg foreignObject {
  width: 100% !important;
}
.reveal .scaled-flowchart-td p {
  clear: both;
}
.reveal .centered {
  text-align: center;
}
.reveal .width75 {
  max-width: 75%;
}
/* remove black border from images: */
.reveal section img {
  border: 0;
  box-shadow: none;
}
</style>

# Implementation Project<br>OCR‑D / Kitodo

## Erik Sommer        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;![slub-logo](https://www.slub-dresden.de/typo3conf/ext/slub_template/Resources/Public/Images/slublogo.svg =200x)

## Robert Sachunsky  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;![slub-logo](https://www.slub-dresden.de/typo3conf/ext/slub_template/Resources/Public/Images/slublogo.svg =200x)

## Katya Rykhlinskaya &nbsp;&nbsp;![tub-logo](https://www.tu-braunschweig.de/typo3conf/ext/tu_braunschweig/Resources/Public/Images/Logos/tu_braunschweig_logo.svg)

_1st OCR-D<sup>III</sup> developer workshop, 29 Nov 2021_: https://hackmd.io/@bertsky/ocrd-workshop1-kitodo

---

1. Status and Planning
2. Development
3. Kitodo User Survey

---

## 0 Quick Recap

→ We want to bring OCR-D into mass digitization.

- Project goals:
  - scalable and robust OCR-D Web Service
  - improved Quality Metrics
  - integration with Kitodo (.Production / .Presentation)
  - aligned with the requirements of the community

---

## 1 Status

- Commenced 01 Oct 2021, with a staged entry of core team members:
  1. Katya Rykhlinskaya (Community Outreach): Oct 2021
  2. Robert Sachunsky (DEV OCR-D): Nov 2021
  3. Sven Marcus (DEV Web Service): ~Dec 2021
  4. Markus Weigelt (DEV Kitodo.Production): ~Jan 2022
  5. H. Sidiropoulos (DEV Kitodo.Presentation): ~mid Jan 2022

---

## 1 Planning

- First phase: independent tiers
  - Assess User Requirements
  - Network Implementation (coordinated)
  - Scaling + Robustness (coordinated)
  - Quality/Error Interfaces, Quality Metrics
  - Fill technological gaps (GT (ground truth), OLR, Backlog, New Wrappers)
- Second phase: additional, dependent tiers
  - UI for Control and Visualization
  - Kitodo.Production Module
  - Kitodo.Presentation Interfaces
  - Experiments on Workflow Optimization

---

## 2 Development

- OCR-D/core issues and pull requests:
  - [repairing typical PAGE defects](https://github.com/OCR-D/core/issues/740)
  - [bashlib in/output-file-grp checking](https://github.com/OCR-D/core/pull/743)
  - [`//mets:agent/mets:note` for workflow provenance](https://github.com/OCR-D/core/pull/747)
  - [`ocrd workspace find --download --wait`](https://github.com/OCR-D/core/pull/747)
- [ocrd-segment-evaluate](https://github.com/OCR-D/ocrd_segment/blob/master/ocrd_segment/evaluate.py) :construction_worker:
- [ocrd_doxa](https://github.com/bertsky/ocrd_doxa) binarization :construction_worker:
- [ocrd_origami](https://github.com/bertsky/ocrd_origami) segmentation :construction_worker:
- [ocrd_wrap](https://github.com/bertsky/ocrd_wrap) with [DeslantImg](https://github.com/githubharald/DeslantImg) (for handwriting)
- experiments on OCR/HTR training :construction_worker:

---

## 3 User survey

### General data

- workshop for the Kitodo user community
- open [survey](https://limesurvey.rz.tu-bs.de/index.php/444955) to understand the community's needs regarding OCR-D integration
- 33 detailed questions regarding:
  - type/quantity/quality of data
  - technical constraints on software
  - Kitodo versions/migration and servicing
  - preferences regarding the service model
  - previous/other OCR experience
  - preferences regarding workflow and monitoring UI
  - preferences regarding quality vs. speed
  - required output formats and interface for manual correction
  - demand for OCR-on-demand functions
- 24 participants (so far!)

---

## 3 User survey

### Preliminary results (example)

- questions on the kind of materials to be digitized, their format and language → broad spectrum of materials (including newspapers and magazines), fonts and languages to process

![](https://i.imgur.com/3025WBM.png)

---

## 3 User survey

### Preliminary results (example)

- questions on the expected characteristics of the OCR process → need for both a "quality first" and a "quantity first" option

![](https://i.imgur.com/mNGV88x.png)

---

## 3 User survey

### Summary

- survey still active → results not fully analysed yet
- the information will guide Kitodo/OCR-D design and implementation
- everyone is welcome to [participate](https://limesurvey.rz.tu-bs.de/index.php/444955)
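
---

## Appendix: provenance in METS

The core pull request on `//mets:agent/mets:note` targets workflow provenance, i.e. notes attached to the `mets:agent` entries in the METS header. As a minimal sketch (not OCR-D's own API), these agents and their notes can be listed with Python's standard `xml.etree.ElementTree`. The helper name `list_agents` and the sample METS (processor name, role, note text) are hypothetical; the exact note content written by OCR-D core is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Namespace of the standard METS schema
NS = {"mets": "http://www.loc.gov/METS/"}


def list_agents(mets_xml: str):
    """Hypothetical helper: return (name, OTHERTYPE, OTHERROLE, notes) for each mets:agent."""
    root = ET.fromstring(mets_xml)
    result = []
    # mets:agent elements live in the METS header (mets:metsHdr)
    for agent in root.findall("./mets:metsHdr/mets:agent", NS):
        name = agent.findtext("mets:name", default="", namespaces=NS)
        notes = [note.text or "" for note in agent.findall("mets:note", NS)]
        result.append((name, agent.get("OTHERTYPE"), agent.get("OTHERROLE"), notes))
    return result


# Hypothetical sample: processor name, role and note text are illustrative only.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:metsHdr>
    <mets:agent TYPE="OTHER" OTHERTYPE="SOFTWARE" ROLE="OTHER"
                OTHERROLE="preprocessing/optimization/binarization">
      <mets:name>ocrd-dummy</mets:name>
      <mets:note>input-file-grp=OCR-D-IMG</mets:note>
    </mets:agent>
  </mets:metsHdr>
</mets:mets>"""

for name, othertype, otherrole, notes in list_agents(SAMPLE):
    print(name, othertype, otherrole, notes)
```

Using only the standard library keeps the sketch dependency-free; a real implementation would more likely go through OCR-D core's own METS handling.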