<style>
/* reduce from default 48px: */
.reveal {
font-size: 24px;
text-align: left;
}
.reveal .slides {
text-align: left;
}
/* change from default gray-on-black: */
.hljs {
color: #005;
background: #fff;
}
/* prevent invisible fragments from occupying space: */
.fragment.visible:not(.current-fragment) {
display: none;
height:0px;
line-height: 0px;
font-size: 0px;
}
/* increase font size in diagrams: */
.label {
font-size: 24px;
font-weight: bold;
}
/* increase maximum width of code blocks: */
.reveal pre code {
max-width: 1000px;
max-height: 1000px;
}
.reveal pre.mermaid {
width: 100% !important;
}
.reveal svg {
max-height: 600px;
}
.reveal .scaled-flowchart-td pre.mermaid {
width: 100% !important;
/* why? float: left; */
}
.reveal .scaled-flowchart-td svg {
max-width: 100% !important;
}
.reveal .scaled-flowchart-td svg g.node,
.reveal .scaled-flowchart-td svg g.label,
.reveal .scaled-flowchart-td svg foreignObject {
width: 100% !important;
}
.reveal .scaled-flowchart-td p {
clear:both;
}
.reveal .centered {
text-align: center;
}
.reveal .width75 {
max-width: 75%;
}
/* remove black border from images: */
.reveal section img {
border: 0;
box-shadow: none;
}
</style>
# Implementation Project<br>OCR‑D / Kitodo
  
1. Current Status
2. Next Steps
3. Challenges
---
## 1 Current Status
**SLUB Dresden:**
* Dockerized components for development
* integrated pipeline from script task in .Production to OCR-D and back
* **but** very simple, with very basic technologies
(SSH, shared data, hard-coded configurations)
* lots of processor improvements
**UB Mannheim:**
* updated ocrd_all (GH Action for Docker images, fixes and optimizations)
* working on a public DFG Viewer test instance with an OCR-on-demand proof of concept
**UB Braunschweig:**
* testing SLUB components in a live .Production setting
* processing and reporting the results of the Kitodo user survey
* working paper to gather information for the project
----
## Architecture

----
## Components
* Starting point: https://github.com/markusweigelt/kitodo_production_ocrd
* **Kitodo.Production / Kitodo.Presentation / …**
  * any system that wants to use OCR-D processing
  * our goal is a flexible integration of systems
* **OCR-D Manager**
  * mediator between systems (flexible integration)
  * delegates OCR to dedicated processing infrastructure (Controller)
  * signaling / reporting
  * validation
  * extraction (processed images, fulltext, annotations, METS, …)
  * -> https://github.com/markusweigelt/ocrd_manager
* **OCR-D Controller**
  * web service entry point
  * orchestrates processing infrastructure
  * -> https://github.com/bertsky/ocrd_controller
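
The division of labour listed above can be sketched roughly as follows. This is only an illustration of the Manager's role (delegate, validate, extract, report); the `Job` class, the `manage` function and the three callables are hypothetical names, not the API of ocrd_manager or ocrd_controller.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """One OCR request handed to the Manager (hypothetical structure)."""
    workspace: str                                   # directory with the input data
    events: List[str] = field(default_factory=list)  # record kept for reporting


def manage(job: Job,
           delegate: Callable[[str], bool],
           validate: Callable[[str], bool],
           extract: Callable[[str], List[str]]) -> List[str]:
    """Run the Manager's steps in order and record each outcome."""
    job.events.append("delegated")
    if not delegate(job.workspace):      # the Controller does the actual OCR
        job.events.append("failed: processing")
        return []
    if not validate(job.workspace):      # check the results before handing them back
        job.events.append("failed: validation")
        return []
    results = extract(job.workspace)     # e.g. fulltext, METS, processed images
    job.events.append("done")
    return results
```

The calling system (e.g. Kitodo.Production) would only see the returned results and the recorded events; how the Controller is reached stays behind `delegate`.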
----
## Modes of operation
* **Easy:**
  * images
* **Standard:**
  * images + METS (+ default workflow selection)
* **Expert:**
  * images + METS + custom workflow configuration
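
As a minimal sketch, the mode could be derived from what the calling system supplies. The names `Mode` and `select_mode` are hypothetical, and rejecting a custom workflow without METS is an assumption, since the slide does not define that combination.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    EASY = "easy"          # images only
    STANDARD = "standard"  # images + METS (+ default workflow selection)
    EXPERT = "expert"      # images + METS + custom workflow configuration


def select_mode(has_mets: bool, custom_workflow: Optional[str] = None) -> Mode:
    """Pick the mode of operation from the inputs that were provided."""
    if custom_workflow is not None:
        if not has_mets:
            # Assumption: a custom workflow is only meaningful together with METS.
            raise ValueError("a custom workflow requires a METS file")
        return Mode.EXPERT
    return Mode.STANDARD if has_mets else Mode.EASY
```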
----
## Integration

----
## Demo
---
## 2 Next Steps
**SLUB Dresden:**
* improve components to achieve more flexibility
* job queue, or delegate to workflow/processing server in Controller
* more preconfigured integration scenarios (e.g. .Presentation)
* options for configuration in .Production
* de-coupling of data shares (explicit transfer)
* workflow selection/editing
* adapt to upcoming changes in OCR-D (networking, error handling, …)
**UB Mannheim:**
* integration of SLUB components for OCR-D on demand
**UB Braunschweig:**
* survey for users in the Digital Humanities space
* apply the current components to production instances as a trial
---
## 3 Challenges
* must advance CLI/API changes in OCR-D (dynamic workflows, evaluators, error handling, page-parallel processing, networking)
* network implementation for OCR-D
* getting workflow/processing server adopted in core
* roadmap for implementation of full Web API
* workflow engine for OCR-D
* off-the-shelf implementation vs. workflow server in core vs. makefile-based
* managing/monitoring jobs with UI
* fail-over, re-entry and data lifetime
* efficiency, scalability, processing resources
* quality estimation and smart workflows
---
## Thank you!
2nd OCR-D developer workshop, 04 May 2022