# Seeing text as data

---

## Example: the Viral Texts project

![Image source: Ryan Cordell and David Smith, Viral Texts: Mapping Networks of Reprinting in 19th-Century Newspapers and Magazines (2017), http://viraltexts.org.](https://i.imgur.com/JoVrtUI.png)

![Image source: Ryan Cordell and David Smith, Viral Texts: Mapping Networks of Reprinting in 19th-Century Newspapers and Magazines (2017), http://viraltexts.org.](https://i.imgur.com/0Db43Gz.png)

Ryan Cordell and David Smith, Viral Texts: Mapping Networks of Reprinting in 19th-Century Newspapers and Magazines (2017), http://viraltexts.org

---

![Image source: The Library of Congress, Chronicling America](https://i.imgur.com/InVHFTs.png)

---

## Example: Quantifying Kissinger project

{%vimeo 100791450 %}

---

![](https://i.imgur.com/0XjvTgS.png)

https://vimeo.com/100791450

Credit: Micki Kaufman, [Quantifying Kissinger project](https://blog.quantifyingkissinger.com/)

---

## Example: Visualizing Christianity and Islam in the Mediterranean

![](https://i.imgur.com/VsyKXZE.png)

---

![](https://i.imgur.com/dRaYR2Q.png)

---

## Some problems with text (as) data

- We focus on end results or primary sources, and often ignore the processes that (computationally) produce knowledge sources, resources, and results.

---

- Texts and data are messy. They are often problematic, fragmented, and inconsistent right from the start, and they demand significant personal labor and time for cleaning, accessibility, curation, and maintenance.

---

> No individual scholar can read and proofread each text, so the texts we use will have errors, from small typos to missing chapters, which may cause problems in the aggregate. Ideally, to address this issue, scholars could create a large, collaboratively edited collection of plain-text versions of literary works that would be open access.

Swafford, Joanna. "49. Messy Data and Faulty Tools." In Debates in the Digital Humanities 2016. Accessed October 13, 2020. https://dhdebates.gc.cuny.edu/read/untitled/section/7e0afe14-e266-4359-aa4a-5dff02735e8b#ch49.

---

- Text datasets need to be interrogated. Instead of using computational analysis to reinforce patterns and preconceived ideas, we can use analysis and visualization to reveal omissions, exclusions, and assumptions in our data.

---

- Text analysis tools need to be approached critically. Some tools come with their own assumptions and conditions. Tools also have constraints, and they often require considerable time for testing, training, troubleshooting, and fine-tuning.

---

## How to see texts as data?

---

### Distant reading

A term (coined by Franco Moretti) referring to the use of computational methods to analyze literary texts.

> "I have chosen distant reading because the phrase underlines the macroscopic scale of recent literary-historical experiments, without narrowly specifying theoretical presuppositions, methods, or objects of analysis."

Ted Underwood, 2017.

---

### Text mining

The automatic extraction of information from textual resources in order to reveal patterns, dimensions, and relations through computational analysis.

---

### Topic modeling

Topic modeling is a method of analyzing corpora by identifying semantic classes of words ("topics") and using them as the basis for analysis. It uses algorithms to break collections of documents down into a range of topics that can be used to explore and analyze the text.
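
---

To make the idea of "breaking documents down into topics" concrete, here is a minimal sketch of one common topic-modeling algorithm, latent Dirichlet allocation (LDA), fitted with collapsed Gibbs sampling. The choice of LDA is an assumption, since these slides do not name a specific algorithm. The four toy documents, the function names (`tokenize`, `fit_lda`), and the parameter values are hypothetical and chosen only for illustration; real projects would normally use an established library and a much larger, cleaned corpus.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_lda(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy LDA via collapsed Gibbs sampling (illustrative, not optimized).

    docs is a list of token lists. Returns (doc_topic, topic_word):
    per-document topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Hypothetical toy corpus; not taken from any of the projects above.
raw_docs = [
    "The ship sailed from the harbor with cargo and sailors",
    "Sailors unloaded cargo at the harbor before the ship left",
    "The poem was reprinted in the newspaper and the magazine",
    "Editors of the newspaper reprinted the poem from a magazine",
]
docs = [tokenize(text) for text in raw_docs]
doc_topic, topic_word = fit_lda(docs, num_topics=2)

for k, counts in enumerate(topic_word):
    top_words = [w for w, c in counts.most_common(5) if c > 0]
    print(f"Topic {k}: {', '.join(top_words)}")
```

The "topics" are nothing more than groups of words that tend to appear in the same documents. Results depend on the random seed, the number of topics, and preprocessing choices (for example, whether common words such as "the" are removed), which is one reason these tools need to be approached critically.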
