Comparing corpora

Sometimes we want to compare different corpora, looking for stylistic differences, intertextual references, influences, or shared topics and patterns. This comparative look often comes before we even commit to analyzing a particular text or dataset, and it may affect the way we think about our main research questions or the tools and methods we employ.

Example: Intertextuality

Go to Tesserae, "a web interface for exploring intertextual parallels".

Select a source and a target author/text and click on "Compare texts".

Copy the matched terms from the first three rows and paste them here along with the session info (scroll to the bottom of the page to find it), as shown below.

Source Text: aristotle.economics_book_3
Target Text: cicero.academica
Unit: phrase
Feature: stem
Stoplist size: 10
Stoplist basis: corpus
Stop words: qui, quis, sum, et, in, is, non, hic, ego, ut
Max distance: 10
Distance metric: freq
Score cutoff: 6
Filter: off

Results:

Matched on       Score
enim, facio      14
enim, facio      14
multus, dolor    13
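
If you want to keep track of several sessions, it can help to turn the pasted session info into a structured record. The following is a minimal sketch, assuming the notes keep the "Key: value" layout shown above; the function name `parse_session_info` is made up for this example and is not part of Tesserae.

```python
def parse_session_info(text):
    """Turn 'Key: value' lines into a dictionary (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank lines and anything that is not a setting
        key, value = line.split(":", 1)
        info[key.strip()] = value.strip()
    return info


# Session info copied from the example above.
notes = """\
Source Text: aristotle.economics_book_3
Target Text: cicero.academica
Unit: phrase
Feature: stem
Stoplist size: 10
Stoplist basis: corpus
Stop words: qui, quis, sum, et, in, is, non, hic, ego, ut
Max distance: 10
Distance metric: freq
Score cutoff: 6
Filter: off
"""

session = parse_session_info(notes)
stop_words = [w.strip() for w in session["Stop words"].split(",")]

print(session["Source Text"], "->", session["Target Text"])
print("Unit:", session["Unit"], "| Stop words:", len(stop_words))
```

Storing each session this way makes it easier to see which settings changed between two runs.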

Take a moment to think about the matched terms. Do they make sense to you? Do they reveal something you didn't know about the texts? Would it be useful to change some variables and run a new session?
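
One quick way to look at the matched terms is to count how often each one recurs across rows. This is only a sketch using the three rows copied above; the helper `count_matched_terms` is a made-up name for illustration.

```python
from collections import Counter


def count_matched_terms(matched_on):
    """Count how often each term appears across 'Matched on' entries."""
    counts = Counter()
    for entry in matched_on:
        # Each entry lists the shared terms separated by commas.
        counts.update(t.strip() for t in entry.split(",") if t.strip())
    return counts


# The "Matched on" column from the three rows in the example above.
matched_on = ["enim, facio", "enim, facio", "multus, dolor"]

counts = count_matched_terms(matched_on)
for term, n in counts.most_common():
    print(term, n)
```

With only three rows the counts are not very informative, but the same function can be applied to the full list of results from a session.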

From the "and format as" dropdown menu, select csv and click "Change Display" to download a spreadsheet with the results of your session. Save the csv file locally so you can reuse it.
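
Once the csv file is saved, you can reload it later with a few lines of Python. The sketch below rests on assumptions you should check against your own download: that session details appear as comment lines starting with `#`, and that the columns are named `SHARED` and `SCORE`. The inline sample is a hand-made stand-in built from the three rows above, not actual Tesserae output.

```python
import csv
import io


def read_results(lines, comment_prefix="#"):
    """Read result rows, skipping blank lines and comment lines.

    The '#' comment prefix is an assumption about the export format.
    """
    data_lines = (
        line for line in lines
        if line.strip() and not line.startswith(comment_prefix)
    )
    return list(csv.DictReader(data_lines))


def top_matches(rows, n=3, score_column="SCORE"):
    """Return the n highest-scoring rows (the column name is an assumption)."""
    return sorted(rows, key=lambda r: float(r[score_column]), reverse=True)[:n]


# Hand-made sample standing in for a downloaded file.
sample = io.StringIO(
    "# source = aristotle.economics_book_3\n"
    "# target = cicero.academica\n"
    '"RESULT","SHARED","SCORE"\n'
    '1,"enim, facio",14\n'
    '2,"enim, facio",14\n'
    '3,"multus, dolor",13\n'
)

rows = read_results(sample)
top = top_matches(rows)
for row in top:
    print(row["SHARED"], row["SCORE"])
```

To read your own download instead of the sample, open it with `open("results.csv", newline="", encoding="utf-8")` (the file name here is a placeholder) and pass the file object to `read_results`.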

Tesserae can also compare more than two texts at once. Go to the Corpus Wide Tool and choose the source and target texts to compare.

Here's an example of results when comparing different works of Aristophanes.

Click "Compare texts" and copy the first three matched terms as shown below.

Source Text: aristophanes.birds
Target Text: aristophanes.clouds
Unit: line
Feature: stem
Stoplist size: 10
Stoplist basis: corpus
Stop words: ὁ, ὅς, καί, δέ, τίς, αβγ, εἰμί, οὗτος, αὐτός, μένω
Max distance: 999
Distance metric: freq
Score cutoff: 0
Multi-text search: aristophanes.lysistrata
Multi cutoff: 0
Filter: off

Matched on          Score
κακός, ἀπόλλυμι     9
δέω, ἔοικα          9
παίω, σφενδόνη      9

From the dropdown menu, change the format to csv and click "Change Display" to download your session's results.