
9. Introduction to Textual Analysis

This module introduces techniques for text and data mining that enable new ways of close and distant reading across a corpus (or several corpora) of texts. The term “distant reading” was coined by Franco Moretti to describe the use of computational analysis to find and visualize patterns of language across many texts, patterns that may be difficult to see when reading each text individually. When dealing with large collections of digital-first texts (e.g., electronic correspondence), it may be impossible for one person to closely read and review each piece. Digital textual analysis is particularly helpful for tasks such as: finding the meaning of words and documents; tracking how words change over time; charting the frequency of a term over time; building a concordance of a corpus; named entity recognition; detecting text reuse; and analyzing the semantics of documents and of words.
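Two of these tasks, term frequency and concordance (keyword in context), are simple enough to sketch directly. The following is a minimal illustration using only the Python standard library; the sample sentence and the function names are invented for this example, not taken from any of the tools discussed in this module.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def term_frequencies(text):
    """Count how often each word appears in a text."""
    return Counter(tokenize(text))

def concordance(text, keyword, window=3):
    """Return each occurrence of keyword with `window` words of context."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{keyword}] {right}")
    return lines

sample = "The whale surfaced. The crew watched the whale dive again."
print(term_frequencies(sample).most_common(2))
print(concordance(sample, "whale"))
```

Tools such as Voyant perform essentially this kind of counting and context-windowing, at scale and with visualization layered on top.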

Outcomes

  1. Increased understanding of textual analysis techniques and why they are beneficial for certain types of research.
  2. Ability to assemble a data set appropriate for textual analysis.
  3. Ability to engage in quick textual analysis techniques using free online tools, Bookworm and Voyant Tools.

Readings

Dan Cohen, “Searching for the Victorians,” Dan Cohen’s Digital Humanities Blog (October 4, 2010). http://www.dancohen.org/2010/10/04/searching-for-the-victorians/ ♦ Estimated Read Time = 10 minutes

Allen Beye Riddell, “How to Read 22,198 Journal Articles: Studying the History of German Studies with Topic Models,” in Distant Readings: Topologies of German Culture in the Long Nineteenth Century, edited by Matt Erlin and Lynne Tatlock, 91–114. Rochester, NY: Camden House, 2014. PDF of article ♦ Estimated Read Time = 30 minutes

Megan R. Brett, “Topic Modeling: A Basic Introduction,” Journal of Digital Humanities 2, no. 1. http://journalofdigitalhumanities.org/2-1/topic-modeling-a-basic-introduction-by-megan-r-brett/ ♦ Estimated Read Time = 8 minutes

Discussion Questions

  • Cohen’s data set consists only of the titles of nineteenth-century works. What are the advantages and disadvantages of using that data set as he does? How do you think having the full text might change the results?
  • Riddell’s analysis concerns trends over time in a subfield. Would you expect to see similar trends in your field, particularly those presented in Figure 3.8?
  • Compare Riddell’s Figures 3.8 and 3.9. Does changing the scale of the vertical axis change your reading of the frequencies?
  • How do the differences between Cohen’s and Riddell’s corpora (primary sources vs. secondary literature) inform their approaches and analyses? Do you think one of these corpora or types of sources is better suited to topic modeling than the other?
  • Based on Brett’s discussion of topic modeling, where in the progress of a project do you think it is best to conduct textual analysis?

Activity 1

Try Bookworm (http://bookworm.culturomics.org/) to find rhetorical trends in digitized texts from Open Library and Google Books, using this tutorial: http://ssrc.doingdh.org/try-bookworm/. Bookworm charts word frequencies over time. By comparing terms across a corpus of texts, it is possible to trace changes in word use over time, or to see when particular terms enter the English lexicon.
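The frequency-over-time idea behind Bookworm can be sketched in a few lines of Python. The corpus below is a tiny invented dictionary mapping years to text; a real Bookworm query runs the same kind of relative-frequency count over millions of digitized volumes.

```python
from collections import Counter
import re

# Hypothetical corpus: year -> text. These snippets are invented
# for illustration; Bookworm's corpora are vastly larger.
corpus = {
    1850: "the railway is new and the railway is fast",
    1900: "the automobile arrives and the railway fades",
    1950: "the automobile rules the road",
}

def relative_frequency(text, term):
    """Occurrences of term per word of text (relative frequency)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)[term] / len(tokens)

# Print the term's relative frequency for each year in order,
# showing its rise or decline across the (toy) corpus.
for year in sorted(corpus):
    print(year, round(relative_frequency(corpus[year], "railway"), 3))
```

Relative frequency (count divided by total words per year) matters here: comparing raw counts across years of different sizes would conflate a term’s popularity with how much text survives from that year.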

Activity 2

Use Voyant Tools to analyze a corpus of texts, examining word frequency, a corpus grid, a corpus summary, and keyword-in-context analysis: http://ssrc.doingdh.org/voyant-tutorial/

Project Lens

Take a look at either Ben Schmidt’s State of the Union in Context, http://benschmidt.org/poli/2015-SOTU, or
Lindsay King and Peter Leonard’s Robots Reading Vogue, http://dh.library.yale.edu/projects/vogue/.

  • Look for the data they are using. Would you consider this open data?
  • Is any of this analysis replicable? (Robots Reading Vogue uses several different methods; are some more easily reproduced than others?)
  • Are the results of the textual analyses presented in these projects legible to someone who is not already an expert?
    (This might help: Benjamin Schmidt and Mitch Fraas, “The Language of the State of the Union,” The Atlantic (January 18, 2015), http://www.theatlantic.com/politics/archive/2015/01/the-language-of-the-state-of-the-union/384575/)