# DHNB 2022 Tutorial Proposal
_Please note that this pad can be read by everyone with the link. Add personal information at your own risk!_
## Content
1. [Conference Information](#Conference-Information)
2. [New Workshop Description/Abstract](#New-Workshop-Abstract)
3. [Notes from the Kick-Off Meeting](#Kick-Off-Meeting)
4. [Notes from Meeting 2](#Meeting-2)
5. [Notes from Meeting 2022-02-28](#Meeting-2022-02-28)
_2021-11-29 (RH): removed some old entries from the table of contents; they can still be found at the [bottom of this document](#Schedule-Draft-To-be-discussed)_
## Conference Information
- DHNB: Digital Humanities in the Nordic and Baltic Countries Conference
- [DHNB22 website](http://dhnb.eu/conferences/dhnb2022/)
- 15th - 18th March 2022
- **15th March**: pre-conference workshops/tutorials
- hybrid format, **workshops/tutorials fully virtual**
---
## New Workshop Abstract
### Introduction to Text and Image Analysis Using Python
Python has become a popular programming language in areas such as text and image analysis. This increase in popularity has been accompanied by the development of Jupyter notebooks. These browser-based environments allow users to combine blocks of code with text and images, increasing the accessibility and traceability of Python scripts.
In this tutorial, we will use notebooks to introduce learners to a number of text and image analysis approaches that are of interest to the humanities. The full-day online event will begin with an in-depth introduction to the Jupyter notebook interface. Afterwards, a number of topics, such as document similarity and the clustering of handwritten characters, will be demonstrated and discussed using separate notebooks.
Overall, this tutorial aims to provide learners with the tools to work with existing notebooks: reading and understanding their contents and possibly adapting them to their own data and research questions. We do not expect any prior knowledge of Python, and we will not try to cover the Python programming language in depth. The tutorial is primarily aimed at students and researchers from the humanities, but learners from all kinds of backgrounds and levels of prior knowledge are welcome.
More information and registration details are available at [https://raphaelaheil.github.io/2022-03-15-dhnb/](https://raphaelaheil.github.io/2022-03-15-dhnb/)
---
## Kick-Off Meeting
### Agenda
_This is a bit ambitious but hopefully we at least manage to cover the first four points._
1. Introductions
2. Review of Schedule Draft
   - Q: at what hour do we start/finish?
3. Instructor arrangements - who, which part(s)?
4. Helper arrangements - how many, who?
5. Other organisational aspects:
   - website
   - registration
   - setup instructions
   - possibility of a setup help session beforehand?
   - code of conduct
6. Next steps (public announcement, ...?)
7. Closing
### Notes
- Kristoffer can provide Jupyter web interface access
- material close to/based on the Programming Historian
- scikit-learn: yes/no/installation instructions!
- GitHub without command line: collaborating on notebooks might be tricky, because people then see raw JSON diffs instead of using the nbdime tool set, which one uses locally to compare/diff/merge notebooks
- we all browse through the Python examples of the Programming Historian (21 examples) and each recommend 1-2 examples to use here
- CR Zulip chat: https://coderefinery.zulipchat.com/ (you don't need an invite to join)
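For the installation instructions (and a possible setup help session), a first notebook cell could check whether the required packages are importable before the tutorial starts. This is a minimal sketch: the package list and the helper name `check_packages` are assumptions, not decisions from these notes. Note that scikit-learn is installed as `scikit-learn` but imported as `sklearn`.

```python
import importlib.util


def check_packages(names):
    """Map each import name to True if it can be found in this environment."""
    # find_spec returns None for top-level modules that are not installed
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed package list; adjust to whatever the final notebooks import.
    required = ["numpy", "matplotlib", "sklearn"]
    for name, found in check_packages(required).items():
        print(f"{name}: {'OK' if found else 'MISSING'}")
```

Missing packages could then be installed before the workshop, e.g. with `pip install scikit-learn`.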
---
## Meeting 2
### Agenda
1. Brief Welcome
2. Continuation of Lesson Discussion
   - which lesson(s) from the Programming Historian do you think will be interesting and/or easy to use for the workshop?
   - maybe we can identify 1-2 application areas from the chosen lesson(s) that we can use as selling points on the website and in the workshop advertisements, e.g. web scraping, word/image clustering, ...?
   - GitHub without command line: yes/no?
     - if we drop GitHub: do we drop Binder as well, or do we demonstrate its capabilities and refer learners to a proper git workshop if they want to make their own notebooks available via Binder?
3. Website: https://raphaelaheil.github.io/2022-03-15-dhnb/ (draft)
   - To fill in before 1st December (when DHNB registration opens):
     - registration + GDPR information
     - "About Us"
     - rough program outline (start time, end time), brief overview of topics (Python, maybe Git, maybe Binder, and possibly the application areas mentioned above)
   - To fill in in due time before the workshop:
     - code of conduct
     - setup instructions
     - concrete program (topics, breaks, ...)
### Notes
Lesson topics
- how to install modules, what are environments, etc.
- different notebooks, one per "kind of problem" - goal: figuring out how to use notebooks as basic tools (with focus on scikit-learn)
- e.g. supervised, unsupervised
- intro: how do notebooks work + reading data
- different notebooks, ideally reading different types of data, what are the relevant parts in our data, referring back to intro/build on that
- e.g. same pipeline, different data
- keep notebooks as similar as possible
- picking up notebooks, using them, maybe adapt to their own data
- KLN title proposal: "Efficient Python Tools for Text and Image Analysis"
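The "same pipeline, different data" idea could be sketched roughly as follows. This is a minimal pure-Python illustration, not one of the tutorial notebooks: the helper names (`text_features`, `image_features`, `cosine_similarity`, `most_similar_pair`) and the toy data are hypothetical, and real notebooks would more likely rely on scikit-learn (e.g. `CountVectorizer` and `sklearn.metrics.pairwise.cosine_similarity`). Only the feature-extraction step changes between text and images; the similarity step stays the same.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_features(texts):
    """Turn each text into word counts over a shared, sorted vocabulary."""
    counts = [Counter(text.lower().split()) for text in texts]
    vocabulary = sorted(set().union(*counts))
    return [[c[word] for word in vocabulary] for c in counts]


def image_features(images):
    """Flatten each 2D grid of pixel values into a single vector."""
    return [[pixel for row in image for pixel in row] for image in images]


def most_similar_pair(vectors):
    """Return (i, j, score) for the most similar pair of vectors."""
    best = None
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            score = cosine_similarity(vectors[i], vectors[j])
            if best is None or score > best[2]:
                best = (i, j, score)
    return best


if __name__ == "__main__":
    texts = [
        "the king rode to the city",
        "the queen rode to the city",
        "ships sailed across the sea",
    ]
    images = [[[0, 1], [1, 0]], [[0, 1], [1, 1]], [[1, 0], [0, 1]]]
    # Same similarity step, different feature extraction
    i, j, score = most_similar_pair(text_features(texts))
    print(f"texts {i} and {j}: {score:.3f}")  # texts 0 and 1: 0.875
    i, j, score = most_similar_pair(image_features(images))
    print(f"images {i} and {j}: {score:.3f}")
```

Keeping the shared step in one function is what would let the topic notebooks stay as similar as possible while learners swap in their own data.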
### Summary (also sent by mail)
- focus of the workshop has shifted:
  - focusing more on Python + Jupyter as tools, instead of teaching Python programming in itself
  - GitHub/Binder dropped
  - rough topics: Jupyter intro, possibly installing modules, reading data, followed by topic/problem domain notebooks (using scikit-*), e.g. clustering words, clustering images, ...
- Kristoffer has proposed the following new workshop title: "Efficient Python Tools for Text and Image Analysis"
- next steps:
  - I will inform the DHNB organisers about the changes to the workshop outline
  - a new workshop abstract + announcement will be drafted in the Zulip chat (ideally listing a couple of concrete topics we can commit to, as "selling points")
  - I will start preparing a registration form (contents to be discussed in Zulip) and update the website along the way
  - another planning meeting, possibly in December or early next year, to discuss the concrete lesson/notebook contents
## Meeting 2022-02-28
### "What text and/or image analysis topics would you like to learn more about?"
Identify a word as a class of words in texts, for example placenames. Identify handwritten numbers overlayed on images (maps).
Learners' user experience / learning object (e-content) preferences / data analysis
I'm interested mainly in code-switching (Latin-Greek, but also to vernacular) in early modern texts.
I'd like to learn more about word embeddings & beginner-friendly ways/libraries to work with it in Python
I'm interested in getting new ideas, for my research and teaching. Everything will be interesting. I'm new to image analysis, so curious about that.
Classification, text mining and predictive modeling.
Filmic/video images or other still visual images
Similarity of images, metrics for comparing images, or layout analysis (but I’m interested in just exploring what one can do generally too)
Basic
Object recognition in images
-
I want to see the possibilities I guess. I mostly use bash for text-processing at the moment.
Identify/classify text elements like placenames/person names, dates. Extract handwritten items on scanned map images.
Text analysis possibilities compared to using tools like AntConc and Voyant Tools.
---
_**Old resources below this line:**_
---
## Schedule Draft (To be discussed!)
### Pre-Session
- installation help session, either 0.5-1 hours before the tutorial starts or, if we have the capacity, the day before?!
### Morning
1. Welcome, Code of Conduct, Ice-Breaker, etc.
2. [Introduction to Jupyter (python-novice-gapminder/01-run-quit)](https://swcarpentry.github.io/python-novice-gapminder/01-run-quit/index.html)
   - maybe in combination with the CodeRefinery lesson [Introduction to Jupyter and JupyterLab](https://coderefinery.github.io/jupyter/)?
Intro to Python:
3. [Variables and Assignment (SWC)](https://swcarpentry.github.io/python-novice-gapminder/02-variables/index.html)
4. [Data Types and Type Conversion (SWC)](https://swcarpentry.github.io/python-novice-gapminder/03-types-conversion/index.html)
5. [Built-in Functions and Help (SWC)](https://swcarpentry.github.io/python-novice-gapminder/04-built-in/index.html)
6. [Libraries (SWC)](https://swcarpentry.github.io/python-novice-gapminder/06-libraries/index.html)
### Lunch
### Afternoon
[Collaborating and sharing using GitHub without command line (CR)](https://coderefinery.github.io/github-without-command-line/)
7. [Git/Github: Basics and motivation (CR)](https://coderefinery.github.io/github-without-command-line/basics/)
8. [Creating repositories using the web interface (CR)](https://coderefinery.github.io/github-without-command-line/creating-using-web/)
9. [Sharing Notebooks (CR, Jupyter lesson)](https://coderefinery.github.io/jupyter/sharing/)
10. options for afterwards:
    - end here
    - continue with the SWC Python lessons (possibly similar to the [Library Carpentry Python lesson](https://librarycarpentry.org/lc-python-intro/), i.e. no pandas/plotting, moving directly to lists, loops, etc.)
    - find something along the lines of web scraping, text processing, ... that could be interesting to the audience, e.g. from [The Programming Historian](https://programminghistorian.org/) (or make something of our own?)
## Similar Workshops from Previous Years
- **Software Carpentry Workshop: Programming with Python and Version Control with Git**, DHNB2018: [https://kln-courses.github.io/2018-03-06-DHN2018/](https://kln-courses.github.io/2018-03-06-DHN2018/)
- **Mixed Arts** tutorial from DHNB 2019: [https://kln-courses.github.io/mixed-arts/program/](https://kln-courses.github.io/mixed-arts/program/)
---
# Proposal
**Title:** Introduction to Programming with Python and Version Control Using git
Carrying on the tradition of programming tutorials from DHNB 2018 [1] and 2019 [2], we propose to organise a full-day, hands-on tutorial that introduces learners to foundational programming concepts. In the digital humanities, both Python [3] and R [4] are popular programming languages. Following previous DHNB tutorials, we will centre our programming introduction on the former. Both languages offer many similar features. However, Python is widely used in text and image processing projects, as well as in machine learning. By teaching this language, we hope to lower the threshold for tutorial attendees to approach and explore such topics.
Our tutorial is primarily aimed at researchers and students from the humanities and does not require any prior programming experience. However, attendees from all kinds of backgrounds and levels of prior knowledge are welcome. The tutorial has a capacity of 25 learners. It will be delivered using materials from the Carpentries [5] and the NeIC-funded CodeRefinery project [6]. In addition to the live-coding (type-along) lessons, learners will practise each topic through hands-on exercises. All of the material has been used successfully in a variety of prior workshops, both in person and online, and is publicly available (CC-BY-4.0). Instructors and helpers have prior experience with the Carpentries or CodeRefinery, or are affiliated with digital humanities institutes in the Nordic and Baltic countries.
The preliminary schedule comprises an introduction to Jupyter notebooks [7] and basic programming concepts in the morning. After lunch, we will continue with a browser-based introduction to the version control system git [8], using the hosting service GitHub [9]. This will be followed by a demonstration of the interactive notebook-sharing service Binder [10]. Finally, the tutorial will conclude with a presentation of additional Python concepts, as well as an overview of resources and opportunities for further study of the presented and related topics. A website, similar to those of previous years, will be set up in due time before the conference and will provide details such as the final schedule, set-up instructions and an introduction of the instructors and helpers.
[1] “Software Carpentry Workshop: Programming with Python and Version Control with Git”, Rockenberger and Nielbo, DHNB 2018, Helsinki, Finland; https://kln-courses.github.io/2018-03-06-DHN2018/
[2] “Mixed Arts with CodeRefinery & Software Carpentry - Programming with Python and Automated Version Control”, Rockenberger, Eckardt, Kristensen-McLachlan and Nielbo, DHNB 2019, Copenhagen, Denmark; https://kln-courses.github.io/mixed-arts/program/
[3] https://www.python.org/
[4] https://www.r-project.org/
[5] https://carpentries.org/
[6] https://coderefinery.org/
[7] https://jupyter.org/
[8] https://git-scm.com/
[9] https://github.com/
[10] https://mybinder.org/