A statistical representation of topics within a textual corpus of documents is referred to as topic models. Typically, each topic is represented by a set of related lemmas; these are often accompanied by weights to indicate the relative prominence of each lemma within a topic.
Arnold, Taylor, and Lauren Tilton. 2015. Humanities Data in R: Exploring Networks, Geospatial Data, Images, and Text.
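The representation described above, topics as sets of weighted lemmas, can be sketched in Python as a mapping from a topic label to lemma weights. This is only an illustration: the topic labels, lemmas, and weights below are invented, and the helper name `top_lemmas` is hypothetical. None of it comes from a real model or from the book cited above.

```python
# Hypothetical topics: each maps lemmas to made-up illustrative weights.
topics = {
    "topic_0": {"river": 0.12, "water": 0.10, "flood": 0.07, "bank": 0.03},
    "topic_1": {"forest": 0.11, "tree": 0.09, "wood": 0.05, "bank": 0.01},
}


def top_lemmas(topic, n=3):
    """Return the n highest-weighted lemmas of one topic, heaviest first."""
    ranked = sorted(topic.items(), key=lambda pair: pair[1], reverse=True)
    return [lemma for lemma, _ in ranked[:n]]


# Summarize each topic by its most prominent lemmas.
for label, weights in topics.items():
    print(label, top_lemmas(weights))
```

Note that the same lemma ("bank" here) may appear in more than one topic with different weights, which is why a topic is usually read through its top-ranked lemmas rather than through any single word.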
Topic modeling provides a suite of algorithms to discover hidden thematic structure in large collections of texts. The results of topic modeling algorithms can be used to summarize, visualize, explore, and theorize about a corpus.
Blei, David M. 2012. “Topic Modeling and Digital Humanities.” Journal of Digital Humanities.
A visualization of the topic modeling results of a 350-document Environmental Humanities corpus.
Source: Digital Environmental Humanities
https://dig-eh.org/dig-eh/TopicModelling/CircularDisciplines/
A visualization of words co-occurring with LANDSCAPE in topic models.
Source: Digital Environmental Humanities
https://dig-eh.org/dig-eh/TopicModelling/Landscape/#Landscape
Exploratory topic modeling of Frederick W. Hasluck's 1929 "Christianity and Islam Under the Sultans."
Project: Visual Hasluck
Topic modeling is great for:
Topic modeling is perfectly designed for workshops and demonstrations, since you don’t have to start with a specific research question. A group of people with different interests can just pour a collection of texts into the computer, gather round, and see what patterns emerge. Generally, interesting patterns do emerge: topic modeling can be a powerful tool for discovery. But it would be a mistake to take this workflow as paradigmatic for text analysis. Usually researchers begin with specific research questions, and for that reason I suspect we’re often going to prefer supervised models.
Underwood, Ted. 2015. “Seven Ways Humanists Are Using Computers to Understand Text.” The Stone and the Shell (blog). June 4, 2015.
Algorithms do not interpret their own failures, but their errors generate moments of rupture for the feminist literary scholar, in whose hands error and marginality expose the fault lines encoded in predictive methods such as classifying and organizing text. (…) I use large poetic corpora to interrogate the assumptions of topic modeling—that documents sharing similar words likewise share thematic coherence. Where most topic-modeling results demonstrate thematic coherence, mine represent discourse coherence; subsequently, the model points to ekphrastic poems by women who share similar discourses as their male counterparts, but do so ironically.
Rhody, Lisa Marie. 2016. “46. Why I Dig: Feminist Approaches to Text Analysis.”
What are some of the topics identified? Did you discover any meaningful semantic links or clusters?
What kind of differences do you see in the analysis, structuring and visualization results between Voyant Tools and Text Analyzer?
GUI component available here:
https://github.com/senderle/topic-modeling-tool
A great introduction to MALLET by Shawn Graham, Scott Weingart, and Ian Milligan (2017)
https://programminghistorian.org/en/lessons/topic-modeling-and-mallet
R is a programming language and free software environment for statistical computing
https://www.r-project.org/
Programming language
An example of counting word frequencies in Python, from the tutorial "Counting Word Frequencies with Python" by William J. Turkel and Adam Crymble.
https://programminghistorian.org/en/lessons/counting-frequencies
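For orientation, here is a minimal sketch of word-frequency counting using only Python's standard library. It is not the tutorial's own code: the function name `word_frequencies`, the simple regular-expression tokenizer, and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lowercase the text, split it into words, and count each word."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


sample = "The landscape shapes the river, and the river shapes the landscape."
freqs = word_frequencies(sample)

# Print the three most frequent words with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

Raw counts like these are dominated by function words such as "the", which is why frequency analyses commonly filter a stopword list before interpreting the results.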