This post explores how to analyze multiple texts using Voyant. I’ll look specifically at word use through the tools Voyant offers and explain how to use each one.
The Pew Research Center’s study on Political Polarization and Media Habits asked respondents whether they had heard of a given news outlet, whether they had gone to it for news in the past week, and whether they considered it trustworthy. As part of a year-long effort to examine political polarization in the United States, the study looked at three different settings: the news media, social media, and the way people talk about politics with their friends and family¹. The study noted that, overall, the rifts across the political spectrum can be overstated and that the majority of Americans actually share a lot of common ground. This is overshadowed, though, by those at the left and right ends of the spectrum, who make up 20% of the public overall but were found to have a greater impact on the political process than those with mixed ideological views¹. This is because those on either end of the spectrum were more likely to “vote, donate to campaigns, and participate directly in politics”¹.
The idea behind this post is that where people get their news can influence not only how they think about politics but also how they view the larger world around them. Using the “Trust” metric, I wanted to see the differences in word usage between the news outlets trusted by respondents across the spectrum. I also only wanted to use sources with a high level of trust, so I excluded any source with a trust rating below 50% for a particular group (see below). In addition, because the number of trusted sources differs between the liberal and conservative bases, I only chose articles from the three sources with the highest trust rating for each group.
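If you want to repeat this selection step for your own sources, the rule above (drop anything under 50% trust, then keep the top three per group) fits in a few lines of Python. This is a minimal sketch: the `pick_sources` function is hypothetical, and the outlet names and trust percentages in the example are placeholders, not figures from the Pew study.

```python
def pick_sources(trust_by_group, threshold=50.0, top_n=3):
    """For each group, keep outlets trusted by at least `threshold` percent
    of that group's respondents, then return the `top_n` most trusted."""
    picks = {}
    for group, ratings in trust_by_group.items():
        # Drop anything below the trust threshold
        eligible = [(outlet, pct) for outlet, pct in ratings.items() if pct >= threshold]
        # Highest trust rating first
        eligible.sort(key=lambda item: item[1], reverse=True)
        picks[group] = [outlet for outlet, _ in eligible[:top_n]]
    return picks


if __name__ == "__main__":
    # Hypothetical placeholder data, NOT the Pew survey results
    trust = {
        "conservative": {"Outlet A": 80.0, "Outlet B": 62.0, "Outlet C": 55.0, "Outlet D": 41.0},
        "liberal": {"Outlet E": 72.0, "Outlet F": 70.0, "Outlet G": 66.0, "Outlet H": 58.0},
    }
    print(pick_sources(trust))
```

Sorting after filtering means a group with fewer than three qualifying outlets simply gets a shorter list rather than padding it with low-trust sources.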
As a case study, I used news articles about Freddie Gray, a young man who died in Baltimore police custody in April. The Freddie Gray case, and how the news media reported it, highlighted the importance of word choice. A good example is the word “thug”. After Mayor Stephanie Rawlings-Blake apologized for using the word to describe the law-breaking activities of the minority of protestors during the two nights of violence in Baltimore, the discussion that followed led to the media no longer using the word².
Since different articles contain different numbers of words, I focused on the total number of words from each source rather than the number of articles. In addition to news articles, I also used transcripts of shows. The articles date only from April 15, four days before Gray died, to May 4, the day after the National Guard began pulling out of Baltimore.
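Because the comparison is based on word totals rather than article counts, it helps to tally words per source within the date window. The sketch below assumes each article is stored as a hypothetical `(source, date, text)` tuple and uses a simple regular-expression tokenizer, so its counts may differ a little from Voyant’s.

```python
import re
from collections import Counter
from datetime import date

# Date window used in this post (Freddie Gray coverage, 2015)
START = date(2015, 4, 15)
END = date(2015, 5, 4)


def word_count(text):
    """Count word tokens with a simple regex (not Voyant's tokenizer)."""
    return len(re.findall(r"[A-Za-z']+", text))


def words_per_source(articles, start=START, end=END):
    """Total word count per source for articles dated within [start, end].

    `articles` is a list of (source, publication_date, text) tuples, a
    hypothetical structure used only for this sketch."""
    totals = Counter()
    for source, published, text in articles:
        if start <= published <= end:  # inclusive on both ends
            totals[source] += word_count(text)
    return totals
```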
Voyant is a web-based text analysis tool that helps you see through digital texts. It has several features, as you’ll see below.
Once you get to the Voyant page you’ll notice a box for inputting URLs or text.
If you’re doing text analysis of news articles like I am, I’d highly recommend not using URLs. The reason is that Voyant will pick up the page structure, such as the links at the top, side, and bottom of the page, as well as advertisement data. In the image below, anything marked with an arrow (plus more) will count in your corpus analysis. If you want a clean analysis, I recommend taking the time to copy and paste only the title and text of every article you’d like to use into a separate plain-text file. This step took me the longest, but it was worth it when it came time to start using Voyant.
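If copying and pasting every article by hand isn’t practical, one rough alternative is to save each page’s HTML and keep only the text inside `<title>`, `<h1>`, and `<p>` tags. This is a sketch using Python’s built-in `html.parser`, under the assumption that the article body sits in properly closed `<p>` tags; many sites also put `<p>` tags in sidebars or footers, so compare the output against the page before adding it to your corpus.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collect text inside <title>, <h1> and <p> tags and ignore everything
    else (navigation, ads, footers), assuming well-formed, closed tags."""

    KEEP = {"title", "h1", "p"}

    def __init__(self):
        super().__init__()
        self._depth = 0  # > 0 while inside a tag we want to keep
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.KEEP:
            self._depth += 1
            self.chunks.append("")  # start a new block of kept text

    def handle_endtag(self, tag):
        if tag in self.KEEP and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Text in nested tags such as <a> inside <p> is kept too
        if self._depth > 0:
            self.chunks[-1] += data


def extract_article_text(html):
    """Return the kept text blocks, one per line, ready for a .txt file."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(c.strip() for c in parser.chunks if c.strip())
```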
Uploading One Or More Text(s) for Analysis
Once you have your text in a text file, there are a couple of ways you can upload it, all of which are highlighted below:
If you want to compare texts, upload them through the multiple texts option and select all of the text files you’d like to analyze at once.
Getting the Text Ready to Analyze
After clicking “Reveal” you’ll see something like the screen below. This is a single-text corpus, which I’m using to make the feature descriptions easier to follow.
In the Summary, Cirrus, and Words in Entire Corpus areas you’ll notice that the most common words are “the”, “a”, “to”, etc. In analyzing the text I’m more interested in other parts of speech, such as nouns and adjectives, so I’m going to remove these words, which Voyant calls “Stop Words”. To remove the Stop Words:
Then choose the language of your text and click “OK”:
You can also edit the Stop Word list by clicking “Edit Stop Words” and saving it as a “Custom List”. Remember that you’ll have to do this for each of the Summary, Cirrus, and Words in Entire Corpus areas. Saving a custom list lets you apply it across all three areas, so you don’t have to retype stop words you’ve already added in another area. Now your screen will look more like this:
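Conceptually, applying a stop word list just filters those words out before counting. The sketch below shows the idea with a deliberately tiny, hypothetical stop list (Voyant’s built-in lists are much longer), plus an `extra_stop_words` argument standing in for a saved custom list that you define once and reuse.

```python
import re
from collections import Counter

# A tiny illustrative stop list; Voyant's own per-language lists are far larger
DEFAULT_STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "that", "it", "on", "for"}


def top_words(text, stop_words=DEFAULT_STOP_WORDS, extra_stop_words=(), n=10):
    """Return the n most frequent words after removing stop words.

    `extra_stop_words` plays the role of a custom list: define it once and
    pass the same list to every view instead of retyping it."""
    stops = {w.lower() for w in stop_words} | {w.lower() for w in extra_stop_words}
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stops).most_common(n)
```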
Explanation of Features
In weather terms, cirrus clouds are the wispy clouds you see on clear days, located at high altitudes. Voyant borrows this concept for its Cirrus tool, a word cloud that displays the highest-frequency words in a corpus and sizes each word according to its frequency. Remember that you can always edit the words displayed through the “Stop Words” list (see above).
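The exact sizing formula Cirrus uses isn’t covered here, so the sketch below uses an assumed, common word-cloud heuristic: each word’s font size is interpolated linearly between a minimum and a maximum according to its frequency. The `cloud_font_sizes` name and the 10 to 48 size range are illustrative choices.

```python
def cloud_font_sizes(freqs, min_size=10, max_size=48):
    """Map word frequencies to font sizes by linear interpolation.

    An assumed heuristic for illustration, not Voyant's documented algorithm."""
    if not freqs:
        return {}
    lo, hi = min(freqs.values()), max(freqs.values())
    span = hi - lo
    sizes = {}
    for word, f in freqs.items():
        # If every word has the same frequency, give them all the largest size
        scale = (f - lo) / span if span else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes
```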
Summary, Keyword Trends & Context Tools
The Corpus Summary tool provides an overview of the following:
- number of words
- number of unique words
- longest and shortest documents
- highest and lowest vocabulary density
- most frequent words
- notable peaks in frequency
- distinctive words
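Several of these numbers are simple to compute yourself, which can help when checking what Voyant reports. The sketch below takes a hypothetical `{title: text}` dictionary (assumed non-empty) and computes vocabulary density as unique words divided by total words, which is how I understand Voyant to define it; its regex tokenizer is simpler than Voyant’s, so counts may not match exactly. Peaks and distinctive words are left out.

```python
import re
from collections import Counter


def corpus_summary(docs, top_n=5):
    """Summary-style statistics for a non-empty dict of {title: text}."""
    per_doc = {}
    corpus_counts = Counter()
    for title, text in docs.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        counts = Counter(tokens)
        corpus_counts.update(counts)
        per_doc[title] = {
            "words": len(tokens),
            "unique": len(counts),
            # Vocabulary density: unique words / total words
            "density": len(counts) / len(tokens) if tokens else 0.0,
        }
    return {
        "total_words": sum(d["words"] for d in per_doc.values()),
        "unique_words": len(corpus_counts),
        "longest": max(per_doc, key=lambda t: per_doc[t]["words"]),
        "shortest": min(per_doc, key=lambda t: per_doc[t]["words"]),
        "highest_density": max(per_doc, key=lambda t: per_doc[t]["density"]),
        "lowest_density": min(per_doc, key=lambda t: per_doc[t]["density"]),
        "most_frequent": corpus_counts.most_common(top_n),
    }
```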
You can drill into these statistics by clicking on a word in the Summary area or by entering it manually in the search area. If you click on a word in the Summary area, the Word Trends panel will expand from the right side of the screen:
You can also click to see the keyword in context and where the word appears in the document overall:
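The Word Trends view plots how often a word appears across a document. A rough equivalent, assuming the document is split into ten equal segments (the segment count is a choice for this sketch, not necessarily Voyant’s), is to compute the term’s relative frequency within each slice. The `word_trend` name is hypothetical.

```python
import re


def word_trend(text, term, segments=10):
    """Relative frequency of `term` in each of `segments` equal slices
    of a document, in the spirit of a single-document trend line."""
    tokens = re.findall(r"[a-z']+", text.lower())
    term = term.lower()
    n = len(tokens)
    trend = []
    for i in range(segments):
        # Integer bounds so every token lands in exactly one segment
        chunk = tokens[i * n // segments:(i + 1) * n // segments]
        trend.append(chunk.count(term) / len(chunk) if chunk else 0.0)
    return trend
```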
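Keyword in context (KWIC) simply lists each occurrence of a word with a few words on either side. A minimal sketch, where the function name and the five-word window are arbitrary assumptions; punctuation is stripped only for matching, so the context keeps its original form.

```python
import re


def keyword_in_context(text, keyword, window=5):
    """Return each occurrence of `keyword` as a (left, word, right) tuple
    with up to `window` words of context on each side."""
    words = text.split()
    key = keyword.lower()
    hits = []
    for i, w in enumerate(words):
        # Compare without surrounding punctuation, e.g. "officers," -> "officers"
        if re.sub(r"^\W+|\W+$", "", w).lower() == key:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append((left, w, right))
    return hits
```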
Analyzing Multiple Texts
The real value of Voyant is being able to see trends across different corpora. In my case, I broke the texts down by source (e.g., Rush, Hannity, and Fox News) to identify differences in word frequency:
In the example above you can see the relative frequency with which each source used the word “democrat” in reporting about Freddie Gray; most of this usage came from the Rush Limbaugh Show. Alternatively, you can upload all the text grouped by liberal, conservative, and mixed ideologies to see the frequency of the same word, “democrat” (see below):
In this image you can see the raw frequencies: mixed and liberal media each used the word “democrat” zero or one time across all of their Freddie Gray coverage, versus 51 times in the conservative news text. The question here is, “Why is this the case?”
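Relative rather than raw frequencies matter when comparing sources because each source contributes a different amount of text. The sketch below divides each source’s raw count by its total word count and scales it to occurrences per 10,000 words (the scaling factor and function name are assumptions for readability). It matches exact word forms only, so “democrats” would not count toward “democrat”.

```python
import re


def relative_frequencies(texts_by_source, term, per=10_000):
    """Return {source: (raw_count, relative_frequency)} for `term`, where
    relative frequency = raw count / total words * `per`."""
    term = term.lower()
    results = {}
    for source, text in texts_by_source.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        raw = tokens.count(term)
        # Normalizing by length makes sources of different sizes comparable
        rel = raw / len(tokens) * per if tokens else 0.0
        results[source] = (raw, rel)
    return results
```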
Voyant can also reveal differences in a news source’s angle, such as how the relative frequencies of the words “people” and “officers” invert across the spectrum:
Why do conservative sources use the word “people” twice as often as the word “officers”? And why do mixed and liberal media use the word “officers” twice as often as “people”?
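Questions like these come down to comparing two counts within each group. The sketch below pools each source’s text into its ideological group, using a hypothetical `group_of` mapping that you supply, and returns the ratio of one term’s raw count to another’s, or `None` when the second term never appears in that group.

```python
import re
from collections import Counter


def term_ratio_by_group(texts_by_source, group_of, numerator, denominator):
    """Pool sources into groups and return {group: count(numerator) / count(denominator)}.

    `group_of` maps each source name to a group label such as "conservative";
    the mapping is supplied by the analyst, not inferred."""
    counts = {}
    for source, text in texts_by_source.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.setdefault(group_of[source], Counter()).update(tokens)
    ratios = {}
    for group, c in counts.items():
        den = c[denominator.lower()]
        ratios[group] = c[numerator.lower()] / den if den else None
    return ratios
```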
Why does the word “switchblade” have a higher frequency in the mixed and liberal news sources than the conservative one?
Limitations to Voyant: Going Deeper Into The Text
Voyant is great for visualizing information in an easily digestible way. One limitation, however, is that it can’t analyze the parts of speech in a corpus. There are a couple of different ways to analyze parts of speech, find named entities, etc., in a corpus, which I’ll examine in the next post.
More to come soon. Thanks for reading!