Voyant Tutorial

Voyant is a web-based text analysis environment. The tool allows you to read and explore a corpus of texts using a multi-panel interface. The interactive visualizations that result from these explorations can be embedded in web pages.

Preparing a corpus:

In order to use Voyant you will need a corpus of texts to analyze. You can use a body of texts relevant to your field, the sample corpus provided below, or click “open” on the beginning page of Voyant to load either the texts of Shakespeare’s plays or the novels of Jane Austen.

If you want to load a corpus of texts from your own field, please note that the file or files must be machine readable. While Voyant will accept PDF files, these must first have been processed with optical character recognition (OCR), the process by which images of text are transformed into machine-readable text. OCR often introduces errors; for example, formatting marks in PDFs may show up as random characters. It may be easier to convert the content of a PDF into plain text (txt) or rich text (rtf) files before uploading them to Voyant. For more information, see the Voyant documentation on loading texts.
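If text extracted from a PDF contains stray characters, a small clean-up pass before uploading can help. The following is a minimal sketch in Python and is not part of Voyant; the function name `clean_extracted_text` is hypothetical, and the rules it applies (dropping control and format characters, tidying whitespace) are assumptions about what counts as debris in your files.

```python
import re
import unicodedata


def clean_extracted_text(raw: str) -> str:
    """Drop stray control/format characters and tidy whitespace (illustrative only)."""
    kept = []
    for ch in raw:
        # Keep newlines and tabs; drop other control (Cc) and format (Cf) characters.
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf"):
            kept.append(ch)
    text = "".join(kept)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into a single paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    sample = "Sun\x0cday  school\x00\n\n\n\nEnd "
    print(repr(clean_extracted_text(sample)))
```

Review the cleaned output before uploading it, since an automatic pass like this can also remove characters that a particular text legitimately needs.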

A corpus should ideally contain either a series of texts or one very large text to analyze. A single journal article would be less productive to analyze than all articles from an issue of a journal, or even from a journal for an entire year. Likewise, you will have more robust results with multiple works by an author – or a very long novel such as War and Peace – than with a single work (see the example corpora of Shakespeare’s and Austen’s texts).

If you do not have a corpus and do not wish to use one of the example sets from Voyant, you can copy the following link and paste it into the field labelled “Add Texts”: http://archive.lib.msu.edu/dinfo/sundayschoolbooks/ssb_txt.zip This is a collection of 19th-century American Sunday School Books from Michigan State University via the “Shaping the Values of Youth” collection, which includes Sunday School Books published between 1807 and 1887. The corpus is made up of 166 files, which were transcribed by hand. See http://www.lib.msu.edu/ssbdata/ for more information. You can also use the pre-loaded corpora of Shakespeare or Austen, as noted above.

Upload a corpus:

  1. Navigate to http://voyant-tools.org/
  2. Add a corpus using one of the following options:
    • Paste the entire text into the “Add Texts” field;
    • Paste the link to the text (and only the text) into the “Add Texts” field, for example http://archive.lib.msu.edu/dinfo/sundayschoolbooks/ssb_txt.zip;
    • Use the Upload option to add files from your computer; or
    • Use the open option to load the plays of Shakespeare or the novels of Austen

When you have added your corpus, click “Reveal.”

Once the corpus opens, in what Voyant calls the “default skin,” you will see five panels. Each of these is a tool: Cirrus, Reader, Trends, Summary, and Contexts. These tools interact with one another: if you modify one panel, you will see another update.

  • The appearance of each of these windows can be modified. Place the cursor on the ? symbol and a menu of options will appear. (Note: hover over the titles of these navigation buttons for descriptions.)
    • Export (arrow moving out of a box) – opens a window in a tab of its own with export options.
    • Choose another tool (grid of four squares) – opens a dropdown menu with options.
    • Options (slider bar) – to further refine your results (available for some tools)

Tools:

When you first load a corpus, Voyant will display five tool areas. The primary tools in each section are:

  • Cirrus
    • A word cloud that visualizes the top frequency words of a corpus or document.
      • Central location and large size indicate greater frequency.
  • Reader
    • Text Reader – displays text for reading.
    • Prospect Viewer – displays an overview of the entire corpus.
  • Trends
    • A line graph that depicts the distribution of a word or words (occurrence across a corpus or document).
  • Summary
    • Provides information about the corpus.
  • Contexts
    • Shows each occurrence of a keyword with surrounding text.
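
To make the ideas behind Contexts and Trends more concrete, here is a rough sketch in Python of a keyword-in-context listing and of a keyword's relative frequency in each document. It is not Voyant's implementation; the function names, the simple `\w+` tokenizer, and the default window size are all assumptions made for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (a simplistic assumption)."""
    return re.findall(r"\w+", text.lower())


def keyword_in_context(text: str, keyword: str, window: int = 3) -> list[str]:
    """List each occurrence of keyword with up to `window` tokens on either side."""
    tokens = tokenize(text)
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def relative_frequencies(documents: list[str], keyword: str) -> list[float]:
    """For each document, keyword occurrences divided by the document's token count."""
    target = keyword.lower()
    result = []
    for doc in documents:
        tokens = tokenize(doc)
        result.append(tokens.count(target) / len(tokens) if tokens else 0.0)
    return result


if __name__ == "__main__":
    docs = ["The child read the book.", "A good child obeys."]
    print(keyword_in_context(docs[0], "child", window=2))
    print(relative_frequencies(docs, "child"))
```

Plotting the values returned by `relative_frequencies` in document order gives a line similar in spirit to a Trends graph, while `keyword_in_context` mirrors the kind of listing Contexts displays.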

Note that for each area, there are additional tools, to the right of the primary tool for that area. For example, in the image below, the primary and active tool is Reader, but that area also has ready access to the tool TermsBerry:

A red rectangle annotation emphasizes the option "TermsBerry", to the right of option "Reader" which has a blue highlight.

Voyant has roughly twenty tools. You can change which ones appear in each section using the Choose Another Tool button (described above). Explore the descriptions of these tools to see what each tool does.

Stopwords are words excluded from the analysis of a corpus. Voyant automatically uses a basic list of stopwords, such as the, a, an, and so forth. You can modify the stopword list for many of Voyant’s tools using the Options setting for that tool. See the Voyant documentation for detailed instructions.
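
The effect of a stopword list on a frequency count, like the one that drives a word cloud, can be sketched in a few lines of Python. This is an illustration only: the `STOPWORDS` set below is a tiny made-up list, not Voyant's actual list, and `top_terms` is a hypothetical helper.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; this is NOT Voyant's actual list.
STOPWORDS = frozenset({"the", "a", "an", "and", "of", "to", "in"})


def top_terms(text: str, stopwords=STOPWORDS, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent word tokens that are not stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tok for tok in tokens if tok not in stopwords)
    return counts.most_common(n)


if __name__ == "__main__":
    text = "The cat and the hat and the cat"
    print(top_terms(text, n=2))                 # stopwords removed
    print(top_terms(text, stopwords=(), n=1))   # no stopword filtering
```

Without filtering, function words such as "the" dominate the counts; with the list applied, content words rise to the top, which is why adjusting the stopword list can noticeably change what a tool displays.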

Things to do and ask

Looking at the initial display of analysis: What results did you expect? What surprises you?

Try at least eight of Voyant’s tools, making sure to explore the visualization and grid tools. What do the different tools reveal? What does each tool obscure?

Try changing the scale or number of items for various tools. How does that change the results, and what the results suggest about the corpus?

Export:

  • Bookmark a corpus to return to it later:
    • Click Export at the top of the page and select “URL for this view.” (Note: the Voyant team indicates that the corpus will remain accessible as long as it is accessed at least once a month.)
  • Embed a corpus:
    • Click Export at the top of the page, select “an HTML snippet,” and click Export for the snippet to appear. Copy and paste the snippet into your page.