<style>
/* reduce from default 48px, center: */
.reveal {
font-size: 24px;
/*text-align: left;*/
}
.reveal h1 {
text-shadow: none;
}
/* remove black border from images: */
.reveal section img {
border: 0;
box-shadow: none;
}
</style>
# Integration of Kitodo and OCR-D for Productive Mass-Digitisation
## OCR-D Phase 3 Kick-Off
#### Robert Sachunsky
#### July 30, 2021
---
## Implementation Project Kitodo / OCR-D
* 8 man-years, 2 years, 3 libraries:
  * Sächsische Landesbibliothek – Staats- und Universitätsbibliothek Dresden
  * Universitätsbibliothek der TU Braunschweig
  * Universitätsbibliothek Mannheim
* integrate Kitodo with OCR-D "backend" as distributed system
* extend both OCR-D and Kitodo for robust mass production
---
## Premises
* Kitodo: workflow management system for libraries
  * open-source, community-driven
  * modules:
    * Kitodo.Production (digitisation workflows)
    * Kitodo.Presentation (DFG Viewer etc.)
  * OCR: only via commercial plugins (black box, license costs)
* OCR-D: operative single-workstation command-line prototype
  * no network interfaces for distribution/scaling yet
  * no error recovery and dynamic workflow execution yet
  * no result quality estimation and runtime evaluation yet
  * no assisted/automatic workflow configuration yet
---
## Goals
1. Implement OCR-D as a Web-based distributed system
   - controller + processing servers
   - container virtualisation
2. Develop quality-based workflow optimisation for OCR-D
   - automatic quality estimation of results
   - dynamic workflows with quality thresholds and switches
   - predefined, manually optimised workflow configurations
3. Implement OCR-D as the OCR module in Kitodo.Production
   - manage data and run workflows
   - track and visualise result progress/quality
   - edit and manage workflow configurations
4. Extend Kitodo.Presentation and DFG Viewer
   - user evaluation of results, versioning
   - user prioritisation of OCR tasks (On-Demand OCR)
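---
## Sketch: Quality Thresholds and Switches
One possible reading of "dynamic workflows with quality thresholds and switches" (goal 2) is sketched below. This is only an illustration under assumptions: `Step`, `run_with_fallback`, the processor callables and the quality scores in [0, 1] are all hypothetical names and conventions, not part of OCR-D or Kitodo.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One alternative workflow configuration (hypothetical structure)."""
    name: str
    run: Callable[[str], str]                 # assumed: page input -> result
    estimate_quality: Callable[[str], float]  # assumed: result -> score in [0, 1]
    threshold: float                          # minimum acceptable score


def run_with_fallback(page: str, steps: List[Step]) -> Optional[Tuple[str, str, float]]:
    """Try alternatives in order; switch to the next one while quality is too low.

    Returns (step name, result, score) of the first alternative that meets
    its threshold, else the best-scoring attempt, else None if no steps exist.
    """
    best: Optional[Tuple[str, str, float]] = None
    for step in steps:
        result = step.run(page)
        score = step.estimate_quality(result)
        if score >= step.threshold:
            return (step.name, result, score)
        if best is None or score > best[2]:
            best = (step.name, result, score)
    return best
```

If no alternative reaches its threshold, the sketch returns the best attempt, so a caller could flag that page for manual review.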