Displaying Data: Analysing Text

Language is our primary form of communication, as the words we write or speak are used as tools to transfer meaning.  But what meaning can data visualisation give about words?  Can data visualisation reveal patterns and insights into large bodies of text or is it used simply as a medium to produce art?

In this post for our ongoing series on Displaying Data, I’ll be looking at ways bodies of text have been visualised.

Word Clouds

Also known as a “Tag Cloud”, this form of visualising text shows how frequently words appear in a body of text by displaying a cluster of words, each sized in proportion to its frequency. You can see an example of this below, where I’ve used the text of Plato’s Republic:

analysing text

You can see that words like “Yes”, “one” and “good” stand out the most, which shows that these are the most frequently repeated words in the book.
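As a rough illustration of the sizing rule behind a Word Cloud, here is a minimal Python sketch. The function names, the sample sentence and the 72-point maximum are my own assumptions for the example, not taken from any particular Word Cloud tool.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def font_sizes(counts, max_size=72):
    """Give the most frequent word max_size; scale the rest in proportion."""
    if not counts:
        return {}
    top = max(counts.values())
    return {word: max_size * n / top for word, n in counts.items()}


# Invented sample text, just to exercise the functions.
sample = "Yes, one good thing. Yes, yes: one more."
counts = word_frequencies(sample)
sizes = font_sizes(counts)
```

A real Word Cloud tool would also need to filter out common words such as “the” and “and”, and to lay the words out without overlaps, which this sketch leaves out.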

While this is the typical format for Word Clouds, the words don’t have to be arranged in a cluster: they can also be laid out on horizontal lines, in columns or within a shape. Also, the size of the words can be in proportion to another variable assigned to them, not just their frequency. A good example of this would be a Word Cloud of all the world’s countries, with the size of each country’s name in proportion to its population or GDP.

The downside to Word Clouds is that longer words, and words that contain many ascenders and descenders, are given more visual emphasis regardless of their frequency. Word Clouds also don't allow for accurate analytical comparison, and are therefore better suited to aesthetic use.

Word Clouds are not so popular anymore, as they have a reputation for being tacky and cheesy. However, below are some great examples where Word Clouds have been combined with other data visualisation methods to produce some interesting results.

US presidential inauguration speeches: how does Obama’s second compare?

Working for The Guardian, Santiago Ortiz created a visualisation that uses all the words said in every US Presidential inauguration speech since Richard Nixon in 1969.

analysing text

Source: The Guardian

The colours used are assigned to each US President, which you can see in the legend at the bottom. Ortiz has combined a Word Cloud and a 100% Bar Graph to produce this visualisation: the size of each word is displayed in proportion to its frequency across all speeches, while a 100% Bar Graph is displayed behind each word, segmented by how much each President has said it.
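The segments of a 100% Bar Graph come down to each speaker's share of a word's total mentions. Here is a minimal Python sketch of that calculation; the speaker labels and counts are invented for illustration and are not taken from The Guardian's data.

```python
def share_by_speaker(mentions):
    """Turn raw mention counts into fractions that sum to 1 (a 100% bar)."""
    total = sum(mentions.values())
    if total == 0:
        # No mentions at all: every segment has zero width.
        return {speaker: 0.0 for speaker in mentions}
    return {speaker: n / total for speaker, n in mentions.items()}


# Hypothetical counts for a single word across three speakers.
mentions = {"Speaker A": 6, "Speaker B": 3, "Speaker C": 1}
shares = share_by_speaker(mentions)
```

Each fraction can then be multiplied by the width of the word to get the length of that speaker's coloured segment.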

Filters are also in place to narrow down results. Hovering your mouse over a word shows only the other words connected to it, and how often each President has mentioned it is displayed in the legend at the bottom as a mini bar graph. If you hover the mouse over a President in the legend, then only that President's Word Cloud is displayed.

The Republican National Convention

Another political visualisation, this time from The New York Times, shows how frequently speakers at the Republican National Convention have used specific words and phrases. The data are sourced from the Federal News Service and are continuously updated.

This visualisation combines a Word Cloud with a proportional area chart (which displays shapes in proportion to the size of the data) and also displays the number of mentions, allowing for accurate referencing. Clicking on a “word bubble” will highlight every mention of that word in the transcripts below.
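In a proportional area chart it is the area of the shape, not its radius, that should scale with the value. Here is a small Python sketch of that rule for circles; the function name and the `area_per_mention` parameter are assumptions made for the example, not how The New York Times built its graphic.

```python
import math


def bubble_radius(count, area_per_mention=100.0):
    """Radius of a circle whose area is count * area_per_mention."""
    return math.sqrt(count * area_per_mention / math.pi)


# A word with four times the mentions gets four times the area,
# which means only twice the radius.
small = bubble_radius(1)
large = bubble_radius(4)
```

Scaling the radius directly would exaggerate differences, because a circle with double the radius covers four times the area.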

Word Trees

In this visualisation method, a tree of phrases depicts the parallel sequences of words in a body of text. As in a Word Cloud, the size of each word displayed is proportional to its usage. Word Trees are useful for showing which words most often follow or precede a target word, or for showing a hierarchy of terms.
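To give an idea of the data behind a Word Tree, here is a minimal Python sketch that collects the words following a target word into a nested tree of counts. The function name, the `depth` parameter and the sample text are my own assumptions for illustration, and the sketch ignores sentence boundaries for simplicity.

```python
import re


def build_word_tree(text, target, depth=2):
    """Nest the words that follow each occurrence of target, with counts."""
    words = re.findall(r"[a-z']+", text.lower())
    tree = {}
    for i, word in enumerate(words):
        if word != target:
            continue
        node = tree
        # Walk the next `depth` words, creating or updating one branch.
        for following in words[i + 1:i + 1 + depth]:
            child = node.setdefault(following, {"count": 0, "children": {}})
            child["count"] += 1
            node = child["children"]
    return tree


# Invented sample text, just to exercise the function.
sample = "The good man is just. The good city is just. The just man is happy."
tree = build_word_tree(sample, "the")
```

Here “good” follows “the” twice and then branches into “man” and “city”, so a Word Tree would draw “good” larger than “just” and split it into two smaller branches.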

analysing text

Source: The Wordtree

Understanding Shakespeare

In this B.A. thesis project, Stephan Thiel introduced a new way of reading drama to help people understand Shakespeare’s work. Through the use of code and data visualisation, Thiel produced an interesting variety of new ways in which we can consume written narrative work and knowledge.

In the above example, the major characters’ speeches have been highlighted in yellow to illustrate the amount of spoken words they’ve used, compared to the rest of the play. Also, the size of each word is displayed in proportion to its frequency.
