<style>
/* reduce from default 48px, center: */
.reveal {
  font-size: 24px;
  /*text-align: left;*/
}
.reveal h1 {
  text-shadow: none;
}
/* remove black border from images: */
.reveal section img {
  border: 0;
  box-shadow: none;
}
</style>

# Integration of Kitodo and OCR-D for Productive Mass-Digitisation

## OCR-D Phase 3 Kick-Off

#### Robert Sachunsky

#### July 30, 2021

---

## Implementation Project Kitodo / OCR-D

* 8 person-years, 2 years, 3 libraries:
  * Sächsische Landesbibliothek – Staats- und Universitätsbibliothek Dresden
  * Universitätsbibliothek der TU Braunschweig
  * Universitätsbibliothek Mannheim
* integrate Kitodo with OCR-D "backend" as distributed system
* extend both OCR-D and Kitodo for robust mass production

---

## Premises

* Kitodo: Workflow Management System for libraries
  * open-source, community-driven
  * modules:
    * Kitodo.Production (digitisation workflows)
    * Kitodo.Presentation (DFG Viewer etc.)
  * OCR: only via commercial plugins (black box, license costs)
* OCR-D: operative single-workstation command-line prototype
  * no network interfaces for distribution/scaling yet
  * no error recovery and dynamic workflow execution yet
  * no result quality estimation and runtime evaluation yet
  * no assisted/automatic workflow configuration yet

---

## Goals

1. Implement OCR-D as web-based distributed system
   - controller + processing servers
   - container virtualisation
2. Develop quality-based workflow optimisation for OCR-D
   - automatic quality estimation of results
   - dynamic workflows with quality thresholds and switches
   - predefined, manually optimised workflow configurations
3. Implement OCR-D as OCR module in Kitodo.Production
   - manage data and run workflows
   - track and visualise result progress/quality
   - edit and manage workflow configurations
4. Extend Kitodo.Presentation and DFG Viewer
   - user evaluation of results, versioning
   - user prioritisation of OCR tasks (On-Demand OCR)
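
---

## Sketch: Quality Thresholds and Switches

A minimal sketch of what a "dynamic workflow with quality thresholds and switches" could look like. This is an illustration under assumptions, not the project's design: `Step`, `run_step`, the quality estimators and the threshold values are all hypothetical names and numbers, and none of them belong to the OCR-D or Kitodo APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One workflow step with a quality gate (hypothetical structure)."""
    name: str
    run: Callable[[str], str]                 # processes the input, returns a result
    estimate_quality: Callable[[str], float]  # scores the result, assumed in 0..1
    threshold: float                          # minimum acceptable score
    fallback: Optional["Step"] = None         # alternative to switch to


def run_step(step: Step, data: str) -> Tuple[str, List[Tuple[str, float]]]:
    """Run a step; while the score is below the threshold, switch to the fallback."""
    attempts: List[Tuple[str, float]] = []
    current: Optional[Step] = step
    while current is not None:
        result = current.run(data)
        score = current.estimate_quality(result)
        attempts.append((current.name, score))
        if score >= current.threshold:
            return result, attempts
        current = current.fallback
    raise RuntimeError(f"no variant of {step.name!r} met its threshold: {attempts}")


# Toy example: a fast variant that scores too low, and a slower fallback.
slow = Step("binarize-slow", lambda d: d + "|slow", lambda r: 0.9, threshold=0.8)
fast = Step("binarize-fast", lambda d: d + "|fast", lambda r: 0.4, threshold=0.8,
            fallback=slow)

result, attempts = run_step(fast, "page1")
```

Here the fast variant misses its threshold, so the runner switches to the fallback. The recorded `attempts` are the kind of data that could feed progress and quality tracking.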