<style>
/* reduce from default 48px: */
.reveal {
font-size: 24px;
text-align: left;
}
.reveal .slides {
text-align: left;
}
/* change from default gray-on-black: */
.hljs {
color: #005;
background: #fff;
}
/* hide already shown, non-current fragments so they do not occupy space: */
.fragment.visible:not(.current-fragment) {
display: none;
height:0px;
line-height: 0px;
font-size: 0px;
}
/* increase font size in diagrams: */
.label {
font-size: 24px;
font-weight: bold;
}
/* increase maximum width of code blocks: */
.reveal pre code {
max-width: 1000px;
max-height: 1000px;
}
.reveal pre.mermaid {
width: 100% !important;
}
.reveal svg {
max-height: 600px;
}
.reveal .scaled-flowchart-td pre.mermaid {
width: 100% !important;
/* why? float: left; */
}
.reveal .scaled-flowchart-td svg {
max-width: 100% !important;
}
.reveal .scaled-flowchart-td svg g.node,
.reveal .scaled-flowchart-td svg g.label,
.reveal .scaled-flowchart-td svg foreignObject {
width: 100% !important;
}
.reveal .scaled-flowchart-td p {
clear:both;
}
.reveal .centered {
text-align: center;
}
.reveal .width75 {
max-width: 75%;
}
/* remove black border from images: */
.reveal section img {
border: 0;
box-shadow: none;
}
</style>
# Implementation Project<br>OCR‑D / Kitodo
## Erik Sommer 
## Robert Sachunsky 
## Katya Rykhlinskaya 
_1st OCR-D<sup>III</sup> developer workshop, 29 Nov 2021_
Slides: https://hackmd.io/@bertsky/ocrd-workshop1-kitodo
---
1. Status and Planning
2. Development
3. Kitodo User Survey
---
## 0 Quick Recap
→ We want to bring OCR-D into mass digitization.
- Project goals:
    - scalable and robust OCR-D Web Service
    - improved Quality Metrics
    - integration with Kitodo (.Production / .Presentation)
    - aligned with the requirements of the Community
---
## 1 Status
- Commenced 01 Oct 2021, staged entry of core team members:
    1. Katya Rykhlinskaya (Community Outreach): Oct 2021
    2. Robert Sachunsky (DEV OCR-D): Nov 2021
    3. Sven Marcus (DEV Web Service): ~Dec 2021
    4. Markus Weigelt (DEV Kitodo.Production): ~Jan 2022
    5. H. Sidiropoulos (DEV Kitodo.Presentation): ~mid Jan 2022
---
## 1 Planning
- First phase – independent tiers:
    - Assess User Requirements
    - Network Implementation (coordinated)
    - Scaling + robustness (coordinated)
    - Quality/Error interfaces, Quality Metrics
    - Fill technological gaps (GT, OLR, Backlog, New Wrappers)
- Second phase – additional, dependent tiers:
    - UI for Control and Visualization
    - Kitodo.Production Module
    - Kitodo.Presentation Interfaces
    - Experiments on Workflow Optimization
---
## 2 Development
- core issues/pulls
    - [repairing typical PAGE defects](https://github.com/OCR-D/core/issues/740)
    - [bashlib in/output-file-grp checking](https://github.com/OCR-D/core/pull/743)
    - [`//mets:agent/mets:note` for workflow provenance](https://github.com/OCR-D/core/pull/747)
    - [`ocrd workspace find --download --wait`](https://github.com/OCR-D/core/pull/747)
- [ocrd-segment-evaluate](https://github.com/OCR-D/ocrd_segment/blob/master/ocrd_segment/evaluate.py) :construction_worker:
- [ocrd_doxa](https://github.com/bertsky/ocrd_doxa) binarization :construction_worker:
- [ocrd_origami](https://github.com/bertsky/ocrd_origami) segmentation :construction_worker:
- [ocrd_wrap](https://github.com/bertsky/ocrd_wrap) with [DeslantImg](https://github.com/githubharald/DeslantImg) (for handwriting)
- experiments on OCR/HTR training :construction_worker:
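
To illustrate the `//mets:agent/mets:note` item above: a minimal sketch of reading agent notes from a METS header with Python's standard library. The sample document, the processor name and the note contents are made-up placeholders, and `agent_notes` is a hypothetical helper. What OCR-D actually records in these notes is not shown here.

```python
import xml.etree.ElementTree as ET

# Official METS namespace
NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical sample header, invented for illustration only
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:metsHdr>
    <mets:agent TYPE="OTHER" OTHERTYPE="SOFTWARE" ROLE="OTHER" OTHERROLE="preprocessing">
      <mets:name>example-processor v0.0.1</mets:name>
      <mets:note>example note A</mets:note>
      <mets:note>example note B</mets:note>
    </mets:agent>
  </mets:metsHdr>
</mets:mets>"""


def agent_notes(mets_xml):
    """Return (agent name, [note texts]) for every mets:agent element."""
    root = ET.fromstring(mets_xml)
    result = []
    for agent in root.findall(".//mets:agent", NS):
        name = agent.findtext("mets:name", default="", namespaces=NS)
        notes = [note.text or "" for note in agent.findall("mets:note", NS)]
        result.append((name, notes))
    return result


for name, notes in agent_notes(SAMPLE_METS):
    print(name, notes)
```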
---
## 3 User survey
### General data
- workshop for Kitodo user community
- open [survey](https://limesurvey.rz.tu-bs.de/index.php/444955) to understand the needs concerning OCR-D integration
- 33 detailed questions regarding:
    - type/quantity/quality of data
    - technical constraints on software
    - Kitodo versions/migration and servicing
    - preferences regarding service model
    - previous/other OCR experience
    - preferences regarding workflow and monitoring UI
    - preferences regarding quality vs. speed
    - required output formats and interface for manual correction
    - demand for OCR-on-demand functions
- 24 participants (so far!)
---
## 3 User survey
### Preliminary results (example)
- questions regarding the kind of materials to be digitized, their format and language
→ broad spectrum of materials (including newspapers and magazines), fonts and languages to process

---
## 3 User survey
### Preliminary results (example)
- questions regarding expected characteristics of the OCR process
→ need for both options: "quality first" and "quantity first"

---
## 3 User survey
### Summary
- survey still active → results not fully analysed yet
- information will guide Kitodo/OCR-D design and implementation
- everyone welcome to [participate](https://limesurvey.rz.tu-bs.de/index.php/444955)