---
author: Melvin Wevers & Nanne van Noord
tags: teaching
title: Course manual Foundations of Cultural and Social Data Analysis
---
# Foundations of Cultural and Social Data Analysis
**Academic Year**: 2023-2024
**Course catalogue number**: 158621086Y
**Credits**: 6EC
**Teaching Method**: Lecture and lab sessions once a week (3 hours)
**Schedule**: Mondays 9-12, OMHP, D1.18A
**Lecturers**: Melvin Wevers (melvin.wevers@uva.nl) & Nanne van Noord (n.j.e.vannoord@uva.nl)
## Course Content & Objectives :dart:
In this course, you will learn how to code in Python and analyze cultural data to answer questions relevant to the humanities. Because of its variety and richness, this type of data poses great challenges but also offers many opportunities. Examples include historical and literary sources, social networks, and media. During this course, you will be introduced to the main concepts and techniques for analyzing such varied data. In addition, we will critically reflect on the methods we apply.
After completing this course, students will be able to:
- **Code** in Python to perform a variety of practical tasks.
- **Formulate** a humanities research question that invites the use of data analysis.
- **Apply** data analysis tools and techniques on humanities data.
- **Relate** data analysis results to humanities research questions.
- **Explain** the added value and limitations of data analysis from a humanities perspective.
- **Reflect** on the implications of the use of data analysis in studying historical and contemporary cultures.
## Literature :books:
The main source we will use for this course will be:
- Karsdorp, Folgert, Mike Kestemont, and Allen Riddell. *Humanities Data Analysis: Case Studies with Python*. Princeton: Princeton University Press, 2021. https://www.humanitiesdataanalysis.org/
Additional reading includes:
- D’Ignazio, Catherine, and Lauren F. Klein. “What Gets Counted Counts.” In *Data Feminism*. Cambridge, MA, USA: MIT Press, 2020. https://data-feminism.mitpress.mit.edu/pub/h1w0nbqp/release/3
- Pechenick, Eitan Adam, Christopher M. Danforth, and Peter Sheridan Dodds. “Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution.” *PLOS ONE* 10, no. 10 (October 7, 2015): e0137041. https://doi.org/10.1371/journal.pone.0137041
- Seltman, Howard J. “Exploratory Data Analysis.” In *Experimental Design and Analysis*, 61–100. Pittsburgh, Pennsylvania: Carnegie Mellon University Press, 2018. https://www.stat.cmu.edu/~hseltman/309/Book/chapter4.pdf
Optional reference work:
- Canning, John. *Statistics for the Humanities*. Brighton, UK: Creative Commons, 2014. http://www.statisticsforhumanities.net/book/wp-content/uploads/2014/07/StatisticsforHumanities%205Sept14.pdf
## Assessment :100:
1. **Coding exercises (60%: 5 × 12%)** :computer:
   - As a group (2-3 people), you will hand in the coding exercises by **6pm on the Wednesday after each lab session**. Submit them via Canvas Group Assignments.
   - Upload your work as a single Jupyter notebook (`.ipynb` file) with all cells executed. **Your notebook must include**:
     - appropriate comments and markdown cells explaining what you did and your reasoning;
     - each group member's individual contribution, stated at the beginning of the notebook;
     - careful citation of your sources (including any substantial sections of code you borrow, e.g., from StackOverflow).
   - **Rubric for assessment**
     - 50% correctness and quality of the analysis or exercises (Does it run? Does it produce the correct results?)
     - 30% comments and explanations (including critical reflection on results)
     - 10% quality of your code (see the [Zen of Python](https://peps.python.org/pep-0020/#the-zen-of-python))
     - 10% extra-mile bonus, for example completing the challenging exercises
2. **Project Proposal (40%)** :newspaper:
   - **Deadline: May 22, 6pm**. Submit via Canvas Group Assignments.
   - Write the proposal with the same group as for the coding exercises.
   - The Project Proposal should include the following elements:
     - a project plan, consisting of:
       - introduction and research question (500 words)
       - description of the data (500 words)
       - method (200 words)
     - a section on the benefits and limitations of data science tools for humanities research, based on your own project plan (800 words)
   - **Rubric for assessment**
     - Clarity (how well is the idea explained?)
     - Originality
     - Feasibility (can the project be executed in the next course?)
     - Writing (language, structure, citing)
     - Critical reflection
:1234: **DATASETS**
- The Center for Digital Humanities at Princeton has compiled an extensive list of humanities datasets: https://cdh.princeton.edu/research/resources/humanities-datasets/
- Check out the Journal of Open Humanities Data: https://openhumanitiesdata.metajnl.com/
- Miriam Posner has collected a huge list of fascinating datasets: http://miriamposner.com/classes/dh201w19/final-project/datasets/
- For Twitter data see: https://github.com/shaypal5/awesome-twitter-data
- Early African-American Film Database, 1909–1930 ([Zenodo DOI](https://zenodo.org/badge/latestdoi/62099402)). Reference paper: https://openhumanitiesdata.metajnl.com/articles/10.5334/johd.7. To read more about these data, see the [project website](http://dhbasecamp.humanities.ucla.edu/afamfilm/).
:exclamation: If you run into any issues, e.g., because some group members become unresponsive or do not do what is expected of them, please contact us. Individual grades might be adjusted accordingly.
:exclamation: If you run into technical issues, please post your question on the Canvas Discussion Board.
:exclamation: You pass the course if the weighted average score for the **Coding exercises** is at least **5.5** and the **Project Proposal** is graded at least **5.5**. The group assignments are assessed with group grades; the instructor can, however, assign individual grades.
The **Project Proposal** can be resubmitted. After receiving the grade and feedback, you have one week to hand in an improved version. You need at least a **4.5** to be eligible for resubmission.
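To illustrate how the weights and pass rules above combine, here is a minimal Python sketch. It assumes that all grades are on the Dutch 1–10 scale, that the final grade is simply the weighted sum of the five coding exercises (12% each) and the proposal (40%), and that no individual adjustments apply. The names `course_result` and `CourseResult` are hypothetical; this is not an official grading tool.

```python
from typing import NamedTuple, Sequence


class CourseResult(NamedTuple):
    final_grade: float     # weighted grade on the 1-10 scale (assumed)
    coding_average: float  # mean of the five coding exercise grades
    passed: bool           # True only if both components reach 5.5
    may_resubmit: bool     # True if the proposal grade is at least 4.5


def course_result(exercise_grades: Sequence[float], proposal_grade: float) -> CourseResult:
    """Hypothetical helper illustrating the 60% / 40% weighting in this manual."""
    if len(exercise_grades) != 5:
        raise ValueError("expected exactly five coding exercise grades")
    # Every exercise weighs 12%, so the weighted coding average equals the plain mean.
    coding_average = sum(exercise_grades) / len(exercise_grades)
    # 5 x 12% = 60% for the exercises, plus 40% for the proposal.
    final_grade = 0.12 * sum(exercise_grades) + 0.40 * proposal_grade
    # Passing requires BOTH components to reach 5.5, not just the final grade.
    passed = coding_average >= 5.5 and proposal_grade >= 5.5
    # A proposal graded 4.5 or higher is eligible for resubmission.
    may_resubmit = proposal_grade >= 4.5
    return CourseResult(round(final_grade, 2), round(coding_average, 2), passed, may_resubmit)


# Usage: a strong proposal combined with a coding average of 5.0
print(course_result([5, 5, 5, 5, 5], 8.0))
```

In this example the weighted final grade is about 6.2, yet `passed` is `False`, because the coding average (5.0) stays below 5.5. A good proposal grade cannot compensate for insufficient coding exercises, and vice versa.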
## Schedule :calendar:
### Week 0 - April 1
<!-- * Send HDA book + instructions for 1st class -->
#### Homework
- Go through the getting started instructions in the `README.MD` file on the course [GitHub Repository](https://github.com/CANAL-amsterdam/Foundations-of-Cultural-and-Social-Data-Analysis)
- Study the notebook `Basics_of_Python.ipynb` in the folder `00-basics_of_python`
- Form groups using `People` in Canvas. The groups should have 2 or 3 people each.
- Read the preface from *Humanities Data Analysis*: https://www.humanitiesdataanalysis.org/preface/notebook.html
- Go through Chapter 1 from *Humanities Data Analysis*: https://www.humanitiesdataanalysis.org/introduction-cook-books/notebook.html
*Going through* a chapter involves more than just reading it. Run the code in the interactive notebooks and try to understand how it works. Feel free to change elements and see what the impact is.
---
### Week 1 - April 8
#### Lecture
* Introduction to the course and the setup
* What is Data Science?
* Python 101
* Concepts from Chapter 1
#### Lab
<!-- * Check that everyone is in a group -->
* Do the Python 101 multiple-choice test
* Work on the Chapter 1 exercises: easy and moderate (challenging is optional)
#### Homework
- Hand in the Chapter 1 exercises
- Go through Chapter 2: https://www.humanitiesdataanalysis.org/getting-data/notebook.html
---
### Week 2 - April 15
#### Lecture
- short recap
- What is data / data retrieval
- Method: Network analysis
- basic concepts
#### Lab
- Work on the Chapter 2 exercises: easy and moderate (challenging is optional)
#### Homework
- Go through Chapter 3: https://www.humanitiesdataanalysis.org/vector-space-model/notebook.html (the appendix is optional)
- Read Chapter 4 (“What Gets Counted Counts”) from *Data Feminism*: https://direct.mit.edu/books/oa-monograph/4660/Data-Feminism
---
### Week 3 - April 22
#### Lecture
- short recap
- distance metrics (e.g., cosine distance) and nearest neighbors
- examples from research
#### Lab
- Work on the Chapter 3 exercises: easy, moderate, and challenging
#### Homework
- Go through Chapter 4: https://www.humanitiesdataanalysis.org/working-with-data/notebook.html
- Read the article by Pechenick, Danforth, and Dodds on the Google Books corpus and culturomics: https://pdodds.w3.uvm.edu/research/papers/others/everything/pechenick2015a.pdf
---
### Week 4 - No class
---
### Week 5 - May 6
#### Lecture
- recap of the course so far
- Pandas
- Time Series
- (cultural) change
- Discuss the Pechenick, Danforth, and Dodds article
- Showcase examples from research
#### Lab
- Work on the Chapter 4 exercises: easy and moderate (challenging is optional)
#### Homework
- Go through Chapter 5: https://www.humanitiesdataanalysis.org/statistics-essentials/notebook.html
- Read Chapter 4 from _Experimental Design and Analysis_ by Howard J. Seltman: https://www.stat.cmu.edu/~hseltman/309/Book/chapter4.pdf
---
### Week 6 - May 13
#### Lecture
- Statistics
#### Lab
- Work on the Chapter 5 exercises: easy and moderate (challenging is optional)
---
### Week 7
**No Lecture and Lab**
- Online office hour for questions relating to project proposal.
---
## Fraud & Plagiarism
In academia, intellectual originality is highly valued. If you copy passages from another author, or claim ideas as your own without referencing the true source, this is seen as a form of intellectual theft. This rule protects the intellectual property of others, as well as your own. The program places a high emphasis on academic integrity and adheres to strict rules regarding fraud and plagiarism. Fraud and plagiarism are defined as any action or omission by a student that wholly or partly makes it impossible to accurately judge their knowledge, understanding, and skills.
If plagiarism is suspected, the instructor is obliged to inform the examination board of this suspicion. The board may then exclude students from taking exams and from performing other activities that earn study credits for a period of up to one year.
All information on what the University of Amsterdam (UvA) considers plagiarism, and the procedures involved, can be found in the [Fraud and Plagiarism](https://student.uva.nl/en/topics/plagiarism-and-fraud) regulations. These regulations apply to all educational components; every student is expected to be familiar with these rules.
## Social Safety
The University of Amsterdam (UvA) is committed to fostering a positive working and study environment where interactions are respectful, no one feels unsafe, and everyone has the opportunity to develop their talents. We aim to ensure that everyone has a secure foundation for work or study, including when difficult or critical conversations are necessary. If needed, you can report inappropriate behavior to instructors, academic advisors, the Program Director, and/or confidential advisors. For more information, the Code of Conduct, and the various services available to students, please visit the UvA's [Social Safety page](https://www.uva.nl/en/about-the-uva/about-the-university/social-safety/social-safety.html).