Textual Analysis in Voyant Tools
christophermchurch.com
Uploading our corpus
1. Download the Sunday Shows transcript corpus (February 2 through March 2, 2014) from
(http://www.christophermchurch.com/uploads/voyant-workshop/individual/data.zip).
Save it to your Desktop for easy access.
2. Now navigate your browser to www.voyant-tools.org and click “Upload.” Add each of the text files
one at a time in the dialog. Make sure to add all the documents with the prefixes FNS, SOTU, and MTP.
NOTE: To save time, you can grab the files from my server by pasting the following URLs into the “Add Texts”
section (one per line). If the PDF gives you trouble, copy the links from this plain-text list instead:
http://www.christophermchurch.com/uploads/voyant-workshop/docs/plain-text-list-of-links.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/FNS_2014-02-02.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/FNS_2014-02-09.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/FNS_2014-02-16.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/FNS_2014-02-23.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/FNS_2014-03-02.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/MTP_2014-02-02.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/MTP_2014-02-09.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/MTP_2014-02-16.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/MTP_2014-02-23.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/MTP_2014-03-02.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/SOTU_2014-02-02.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/SOTU_2014-02-09.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/SOTU_2014-02-16.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/SOTU_2014-02-23.txt
http://www.christophermchurch.com/uploads/voyant-workshop/individual/SOTU_2014-03-02.txt
3. Once you have added all the files (15 in all), click to continue to the analysis screen. Take
some time getting acquainted with the tools screen. Click around and see what happens. You can
read more about the analysis screen’s layout here: http://hermeneuti.ca/voyeur/users
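The fifteen file names in step 2 follow a regular pattern: a show prefix, an underscore, and the date of a Sunday. If you would rather generate the list than copy it, here is a minimal Python sketch. The function name and its parameters are my own for illustration; prepend whichever base URL you are using for the "individual" folder.

```python
from datetime import date, timedelta

# Show prefixes taken from the file names listed in step 2.
SHOWS = ["FNS", "MTP", "SOTU"]

def sunday_filenames(first=date(2014, 2, 2), weeks=5, shows=SHOWS):
    """Return transcript file names such as 'FNS_2014-02-02.txt'."""
    # One date per week, starting from the first Sunday in the corpus.
    sundays = [first + timedelta(weeks=i) for i in range(weeks)]
    return [f"{show}_{day.isoformat()}.txt" for show in shows for day in sundays]

names = sunday_filenames()
print(len(names))            # number of files to add
print(names[0], names[-1])   # first and last file name
```

If the count it prints does not match the number of documents Voyant reports, a file was probably skipped during upload.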
Textual analysis with Voyant Tools | Christopher M. Church
Doing Some Analysis
1. OK, at this point, all you really know is that the news
anchors and their guests use a lot of prepositions,
articles, conjunctions, and other stop words that tell
us very little about their topics of conversation.
d. We can also customize our stop word list if we want to get rid of other common words that we’re not
interested in (e.g. common verbs, pronouns). We do this by clicking “Edit Stop Words.” I’ve created a
more expansive list (http://www.christophermchurch.com/uploads/voyant-workshop/stopwords.txt).
e. Select all, copy, and paste (CTRL-A, CTRL-C, CTRL-V) to replace the current stop word list with the
custom one I’ve created. What has changed?
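To see why stop words matter, it can help to do the same filtering by hand. The sketch below is not how Voyant works internally; it is a small Python illustration of counting words with and without a stop list. The stop list and the sample sentence are made up for the example and are much shorter than the workshop's stopwords.txt.

```python
import re
from collections import Counter

# Hypothetical mini stop list for illustration only; the workshop's
# stopwords.txt is far longer and is not reproduced here.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "that", "we"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(text, stop_words=frozenset(), n=3):
    """Return the n most frequent tokens that are not stop words."""
    counts = Counter(t for t in tokenize(text) if t not in stop_words)
    return counts.most_common(n)

# Made-up sentence, not taken from the transcripts.
sample = "The senator said that the budget and the tax plan hurt the budget."
print(top_words(sample))              # unfiltered: function words dominate
print(top_words(sample, STOP_WORDS))  # filtered: content words surface
```

Without the stop list the most frequent word is an article; with it, the topic of the sentence comes to the top.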
3. Now that we have the stop words settled, we can see what was
covered by each show on each date. Let’s check out the
Summary Statistics.
c. Take some time looking at the other Summary Statistics. Try clicking on things and see what happens
(note: most things can be clicked on, and the analysis tool will react by showing you more context).
Aggregate Analysis
1. OK, now what if we wanted to compare the different Sunday Shows against one another for the entire month of
February? Well, to do so, we’d need to do some pre-processing on the data. In fact, the individual transcripts
were already pre-processed in order to remove the speaker tags from the text.
i. n.b. All data analysis requires some manipulation of the data beforehand. How much varies from
a small amount to a great deal, and it is always based on the scholar’s knowledge of the data set.
When doing data analysis, always try to make this step transparent (you can see my pre-processing
code at https://github.com/cmchurch/NLP-SUNDAY-NEWS_tutorial).
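To give a feel for what removing speaker tags involves, here is a minimal Python sketch. It is not the pre-processing code from the repository above. It assumes that a speaker tag is an all-caps name, optionally followed by a parenthetical, then a colon, at the start of a line; the real transcripts may use a different layout. The sample lines are invented.

```python
import re

# Assumed tag shape (an assumption, not confirmed by the transcripts):
# an all-caps name, an optional parenthetical, then a colon at line start,
# e.g. "HOST:" or "SEN. SMITH (R-XX):".
SPEAKER_TAG = re.compile(r"^[A-Z][A-Z .'\-]*(?:\([^)]*\))?:[ \t]*", re.MULTILINE)

def strip_speaker_tags(transcript):
    """Remove a leading speaker tag from every line that has one."""
    return SPEAKER_TAG.sub("", transcript)

# Invented lines for illustration.
raw = "HOST: Welcome back.\nSEN. SMITH (R-XX): Thanks for having me.\nNote: this stays."
cleaned = strip_speaker_tags(raw)
print(cleaned)
```

Lines that begin with a mixed-case word and a colon are left alone, which is the kind of decision that depends on knowing your data set.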
2. So, now we need to start a new Voyant Tools display for our aggregate analysis. Leaving the previous one open,
create a new tab, go to voyant-tools.org, and enter the following three URLs:
http://www.christophermchurch.com/uploads/voyant-workshop/aggregate/FNS_all.txt
http://www.christophermchurch.com/uploads/voyant-workshop/aggregate/MTP_all.txt
http://www.christophermchurch.com/uploads/voyant-workshop/aggregate/SOTU_all.txt
NOTE: As mentioned earlier, you can also download the data and use the Upload Dialog
(http://www.christophermchurch.com/uploads/voyant-workshop/aggregate/data.zip).
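If you want to build aggregate files like these yourself, one approach is to group the individual transcripts by show prefix and join them. The Python sketch below assumes that each aggregate file is simply the show's transcripts joined in date order; I have not confirmed that the files above were produced this way. The function name and the demo texts are hypothetical.

```python
from collections import defaultdict

def aggregate_by_show(transcripts):
    """Join per-date transcripts into one text per show.

    `transcripts` maps a file name like 'FNS_2014-02-02.txt' to its text.
    Returns a dict mapping e.g. 'FNS_all.txt' to the joined text.
    """
    grouped = defaultdict(list)
    for name in sorted(transcripts):      # sorted names keep the dates in order
        show = name.split("_", 1)[0]      # prefix before the first underscore
        grouped[show].append(transcripts[name])
    return {f"{show}_all.txt": "\n".join(texts) for show, texts in grouped.items()}

# Placeholder texts standing in for real transcripts.
demo = {
    "MTP_2014-02-09.txt": "second",
    "MTP_2014-02-02.txt": "first",
    "FNS_2014-02-02.txt": "only",
}
combined = aggregate_by_show(demo)
print(combined)
```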
3. As before, make sure you filter out the stop words to get a
meaningful picture of the corpus’ contents.
a. Let’s first search for all words in the corpus that
contain “tax.”
b. This will plot all the words on our Word Trends widget.
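A search like this matches any word that contains the letters you typed, which can pull in words you did not intend. The Python sketch below illustrates the idea on a made-up sentence; it is only an illustration of substring matching and is not Voyant's own search code.

```python
import re
from collections import Counter

def words_containing(text, fragment):
    """Count every distinct word that contains `fragment` (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if fragment.lower() in t)

# Made-up sentence, not taken from the transcripts.
sample = "Taxes rose. The tax plan hits taxpayers; the taxi was late. Tax reform stalled."
hits = words_containing(sample, "tax")
for word, count in hits.most_common():
    print(word, count)
```

Notice that "taxi" is caught along with "tax," "taxes," and "taxpayers." Always skim the list of matched words before plotting them together.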
Aggregate Analysis (2)
c. Do the same for “healthcare” and then bring up the Favorites list by clicking on the heart.
n.b. Ignore the error that pops up (it’s from the space character in “health care”).
d. Now we can check both “obamacare” and “health care” and see them on the Word Trends widget. Make
sure to uncheck the “collapse terms” button to make the comparison.
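When comparing terms across documents of different lengths, raw counts can mislead, so it helps to divide by the total number of words. The Python sketch below computes such a relative frequency for a single word or a two-word phrase. I am assuming this is roughly the kind of number a trends chart compares; the exact calculation Voyant uses may differ. The three documents are invented stand-ins, not excerpts from the shows.

```python
import re

def relative_frequency(text, term):
    """Occurrences of `term` (one or more words) divided by total word count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    target = term.lower().split()
    n = len(target)
    # Slide a window of n tokens over the text and count exact matches.
    hits = sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)
    return hits / len(tokens) if tokens else 0.0

# Invented stand-ins for three documents of equal length.
corpus = {
    "doc_a": "obamacare is failing and obamacare costs jobs",
    "doc_b": "health care reform and health care costs",
    "doc_c": "the health care law known as obamacare",
}
for name, text in corpus.items():
    print(name,
          round(relative_frequency(text, "obamacare"), 3),
          round(relative_frequency(text, "health care"), 3))
```

Treating "health care" as a phrase rather than two separate words is what lets it be compared with the single word "obamacare."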
6. Now, we need to be careful about how we draw conclusions regarding comparisons between the three shows.
Many words have a wide variety of meanings (e.g. right vs left) that can complicate what we think we know.
To get a sense of how a word is being used, you can use the
Keywords in Context widget.
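The idea behind a keyword-in-context view is simple: for each occurrence of the word, show a few words on either side. Here is a minimal Python sketch of that idea on a made-up passage. It is an illustration only, not the widget's implementation, and the window width is an arbitrary choice.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, keyword, right) context tuples for each keyword hit."""
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows

# Made-up passage showing three different senses of "right".
sample = "Voters on the right oppose it. That is not right, she said. Turn right here."
for left, word, right in kwic(sample, "right"):
    print(f"{left:>20} | {word} | {right}")
```

Reading the three lines side by side makes it obvious that a single count for "right" would lump together political, evaluative, and directional uses.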
7. Now, let’s save the link to our data. We can use the Export Option to save our work, or to use the same data
with different tools. Access the Export by clicking the diskette icon in the upper left corner.
Other Tools to Try and Enrichment Activities
Collocation Networks
Voyant Tools used to Analyze Runaway Slave Ads in Mississippi and Arkansas:
http://digitalhistory.blogs.rice.edu/files/2014/02/voyant-presentation.pdf
Humanists’ Listserv Data Set (note: very large and slow)
If you want to take a look at a fairly large data set, you can
explore the Humanists’ Listserv data, which contains all
listserv emails among humanists (think H-Net) from 1987
to 2008. You can do so by clicking “Open” on the main
voyant-tools screen. This can give you a change-over-time
view of what sorts of things scholars were discussing over the past two decades. Below are two examples of the changes
you can see in the data. You can replicate these results, or discover your own.