# Seeing text as data
---
## Example: the Viral Texts project

Ryan Cordell and David Smith, Viral Texts: Mapping Networks of Reprinting in 19th-Century Newspapers and Magazines (2017), http://viraltexts.org

---

---
## Example: Quantifying Kissinger project
{%vimeo 100791450 %}
---

https://vimeo.com/100791450
Credit: Micki Kaufman, [Quantifying Kissinger project](https://blog.quantifyingkissinger.com/)
---
## Example: Visualizing Christianity and Islam in the Mediterranean

---

---
## Some problems with text (as) data
- we focus on end results or primary sources. We often ignore the (computational) processes that produce knowledge sources, resources, and results
---
- texts and data are messy. They are often problematic, fragmented, and inconsistent right from the start. They require significant personal labor and time for cleaning, accessibility, curation, and maintenance
---
> "No individual scholar can read and proofread each text, so the texts we use will have errors, from small typos to missing chapters, which may cause problems in the aggregate. Ideally, to address this issue, scholars could create a large, collaboratively edited collection of plain-text versions of literary works that would be open access."
Swafford, Joanna. "Messy Data and Faulty Tools." In *Debates in the Digital Humanities 2016*. Accessed October 13, 2020. https://dhdebates.gc.cuny.edu/read/untitled/section/7e0afe14-e266-4359-aa4a-5dff02735e8b#ch49.
---
- text datasets need to be interrogated. Instead of using computational analysis to reinforce patterns and preconceived ideas, we can use analysis and visualization to reveal omissions, exclusions, and assumptions in our data
---
- text analysis tools need to be approached critically. Tools come with their own assumptions, conditions, and constraints, and they often require considerable time for testing, training, troubleshooting, and fine-tuning
---
## How to see texts as data?
---
### Distant reading
A term, coined by Franco Moretti, referring to the use of computational methods to analyze literary texts.
> "I have chosen distant reading because the phrase underlines the macroscopic scale of recent literary-historical experiments, without narrowly specifying theoretical presuppositions, methods, or objects of analysis."
Ted Underwood, 2017.
---
### Text mining
The automatic extraction of information from different textual resources with the purpose of revealing patterns, dimensions and relations through computational analysis.
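
As a rough illustration of what "extracting information" can mean in practice, the sketch below counts terms across a tiny, made-up corpus and looks up which documents mention a given word. The corpus, the stopword list, and the function names are all hypothetical and chosen only for this example; real projects typically rely on larger stopword lists and more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical mini-corpus; in practice the texts would be loaded from files.
corpus = {
    "doc1": "The treaty was signed in Paris. The treaty ended the war.",
    "doc2": "News of the treaty reached Boston by ship.",
}

# A deliberately tiny, illustrative stopword list.
STOPWORDS = {"the", "of", "in", "by", "was", "a", "and"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs):
    """Count non-stopword tokens across all documents."""
    counts = Counter()
    for text in docs.values():
        counts.update(t for t in tokenize(text) if t not in STOPWORDS)
    return counts


def documents_containing(docs, term):
    """Return the sorted ids of the documents that mention the term."""
    return sorted(doc_id for doc_id, text in docs.items() if term in tokenize(text))


freqs = term_frequencies(corpus)
print(freqs.most_common(3))                    # most frequent terms overall
print(documents_containing(corpus, "treaty"))  # where a term appears
```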
---
### Topic modeling
Topic modeling is a method of analyzing corpora by identifying semantic classes of words ("topics") and using them as the basis for analysis. It uses algorithms to break down collections of documents into a range of topics that can be used to explore and analyze the text.
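
To make the idea of "breaking documents into topics" more concrete, here is a minimal sketch of one common topic modeling algorithm, Latent Dirichlet Allocation (LDA), fitted with collapsed Gibbs sampling. The text above does not name a specific algorithm, so LDA is an assumption here, and the toy documents, the `fit_lda` function, and its parameter values are all hypothetical. In practice one would more likely use an established tool or library than hand-written code like this.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a toy LDA model with collapsed Gibbs sampling.

    docs is a list of token lists. Returns (topic_word, doc_topic):
    word counts per topic and topic counts per document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # tokens per topic
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per doc
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                # Weight each topic by its affinity for this word and this document.
                weights = [
                    (topic_word[k][w] + beta) / (topic_total[k] + beta * vocab_size)
                    * (doc_topic[d][k] + alpha)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1
    return topic_word, doc_topic


# Hypothetical toy documents: two about seafaring, two about poetry.
docs = [
    "ship harbor sailor ship cargo".split(),
    "harbor cargo ship sailor sea".split(),
    "poem verse rhyme poem stanza".split(),
    "verse stanza poem rhyme poet".split(),
]

topic_word, doc_topic = fit_lda(docs)
for k, counts in enumerate(topic_word):
    top_words = [w for w, c in counts.most_common(3) if c > 0]
    print(f"topic {k}: {top_words}")
```

The "topics" that come out are just groups of words that tend to occur together; the numbering is arbitrary, and the groups still need to be interpreted by a reader. Results also depend on the number of topics, the random seed, and how the texts were cleaned and tokenized.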