Displaying Data: Distribution & Ranges

Everything in the Universe is distributed and scattered around: stars permeate our galaxy, we all live in different parts of the world, and the furniture in your house is positioned in a seemingly organised fashion throughout your rooms.  But it’s the relationships within these distributions that can make them interesting.  How are distributions concentrated? By how much are they spread?  Why are the distributions positioned in a certain way?

In this post on Displaying Data, I will be looking at graphical representations of ranged and distributed data.

Histograms

While the graph below may look like an ordinary Bar Graph, there’s a distinct difference: Bar Graphs visualise and compare categorical data, while Histograms visualise the distribution of data over a continuous interval or time period.  Each bar in a Histogram represents the tabulated frequency at each interval (1, 2, 3…).

[Image: Histogram]

Therefore, Histograms are useful for giving an estimate of where values are concentrated in the data.  They also help show where the extremes are and whether there are any gaps or anomalies in the data.
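
To make the idea of "tabulated frequency at each interval" concrete, here is a minimal Python sketch that counts how many values fall into each interval.  The function name, the bin width and the sample values are all my own assumptions for illustration, not part of any particular charting tool.

```python
from collections import Counter


def histogram_counts(values, bin_width, start=0):
    """Count values per half-open interval [low, low + bin_width)."""
    counts = Counter()
    for v in values:
        # Work out which interval this value belongs to.
        counts[int((v - start) // bin_width)] += 1
    if not counts:
        return []
    # Keep empty intervals so that gaps in the data stay visible.
    return [
        (start + i * bin_width, start + (i + 1) * bin_width, counts.get(i, 0))
        for i in range(min(counts), max(counts) + 1)
    ]


# Hypothetical sample values, for illustration only.
sample = [1, 2, 2, 3, 3, 3, 4, 9]
for low, high, n in histogram_counts(sample, bin_width=2):
    print(f"{low:>2}-{high:<2} {'#' * n}")
```

In this made-up sample, most values land in the 2-4 interval, the 6-8 interval is empty (a gap) and the lone value of 9 sits apart from the rest.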

Population Pyramids

In this type of chart, two Histograms are paired back-to-back to visualise the sex and age distribution of a population.  One side represents males, the other females.  The x-axis is used to plot the number of people in a population, while the y-axis divides the graph into age groups.

[Image: Population Pyramid]

Population Pyramids are useful for detecting changes or differences in population patterns.  Multiple Population Pyramids can be used alongside each other to compare different populations.

The shape of a Population Pyramid can say a lot about the state that particular population is in.  For example, a Population Pyramid with a wider top half and a narrower base suggests an ageing population.  You can see some recent, real-life examples of this in LSE Cities’ article Urban Age Cities Compared.
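
As a rough sketch of the back-to-back layout, the Python below draws a text-only pyramid with male bars growing to the left and female bars growing to the right.  The function name, the age groups and the counts are hypothetical; the numbers are invented purely to show a top-heavy shape.

```python
def pyramid_rows(age_groups, male_counts, female_counts, width=20):
    """Return text rows with male bars on the left and female bars on the right."""
    largest = max(max(male_counts), max(female_counts))
    rows = []
    # Oldest group first, so the youngest group ends up at the base.
    for label, m, f in reversed(list(zip(age_groups, male_counts, female_counts))):
        male_bar = "#" * round(m / largest * width)
        female_bar = "#" * round(f / largest * width)
        rows.append(f"{male_bar:>{width}} | {label:^5} | {female_bar}")
    return rows


# Hypothetical counts (in thousands), invented to show an ageing population.
groups = ["0-19", "20-39", "40-59", "60+"]
males = [8, 10, 14, 16]
females = [8, 11, 15, 18]
print("\n".join(pyramid_rows(groups, males, females)))
```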

Point or Stripe Plots

Here points or stripes are plotted against a single axis.  Where the points (or stripes) cluster most densely is where most of the data is concentrated.  This type of graph is also good for showing, in a simple way, how the data is distributed along the scale.

[Image: Point or Stripe Plot]
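
A stripe plot can be approximated in plain text by marking each value's position along one line.  This is only a sketch: the function name and the axis limits are assumptions, and values that round to the same position simply overwrite each other.

```python
def stripe_plot(values, low, high, width=40):
    """Mark each value with '|' at its position along a single text axis."""
    axis = ["-"] * (width + 1)
    for v in values:
        # Scale the value from [low, high] onto the available columns.
        position = round((v - low) / (high - low) * width)
        axis[position] = "|"
    return "".join(axis)


# Hypothetical sample values, for illustration only.
print(stripe_plot([1, 2, 2, 3, 3, 3, 4, 9], low=0, high=10))
```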

Span Charts

You might know this chart under another name: range bar/column graph, difference graph or high-low graph.  But its function is all the same: to display dataset ranges between a minimum value and a maximum one, which are shown by the start and end of each bar.

[Image: Span Chart]

This makes Span Charts ideal for making comparisons between simple ranges.  However, Span Charts are limited: they only focus the reader on the extreme values and give no information on the values in between, the averages or the distribution of the data.  This is where the next chart is useful.
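
The data behind a Span Chart is just a minimum and a maximum per category, which the short Python sketch below computes and draws as text bars.  The function name and the readings are hypothetical.

```python
def spans(groups):
    """Map each category name to its (minimum, maximum) pair."""
    return {name: (min(vals), max(vals)) for name, vals in groups.items()}


# Hypothetical readings per category, for illustration only.
readings = {"Jan": [2, 5, 9, 4], "Jul": [14, 22, 27, 19]}
for name, (low, high) in spans(readings).items():
    # Offset the bar by the minimum so that ranges can be compared by eye.
    print(f"{name}: {' ' * low}{'=' * (high - low)}  ({low} to {high})")
```

Note that everything between the two extremes is discarded, which is exactly the limitation described above.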

Box & Whisker Plots

Typically used in descriptive statistics, Box & Whisker Plots are a convenient way of displaying data through their quartiles.  This chart is divided into different ‘parts’ to give a more detailed analysis of the data: the centre line (median), the box (lower to upper quartiles) and the whiskers (lower to upper extremes).

[Image: Box & Whisker Plot]

Although Box & Whisker Plots are hard to read without training, once you understand them you can read the median and the quartiles straight from the chart.  You will also be able to detect any outliers, whether the data is symmetrical, how tightly the data is grouped and whether the data is skewed in a particular direction.
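
The numbers a Box & Whisker Plot draws can be sketched with Python's standard `statistics` module.  One assumption to flag: the 1.5 × interquartile range rule used here to separate whiskers from outliers is a common convention rather than something this post prescribes, and the function name and sample values are my own.

```python
import statistics


def box_summary(values):
    """Return the quartiles, whisker ends and outliers for a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical sample values, for illustration only.
print(box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40]))
```

A median that sits off-centre within the box, or whiskers of very different lengths, are the signs of skew mentioned above.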

Violin Plots

Violin Plots are similar to Box & Whisker Plots, except that they also show the probability density of the data at different values by using a pair of conjoined kernel density plots over a box plot.

[Image: Violin Plot]
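
The kernel density part of a Violin Plot can be sketched in a few lines of Python.  This assumes a Gaussian kernel and a hand-picked bandwidth, both of which are my choices for illustration; charting libraries usually choose the bandwidth automatically.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a function that estimates the probability density at x."""
    norm = 1 / (len(values) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per data point, centred on that point.
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


# Hypothetical sample values and bandwidth, for illustration only.
density = gaussian_kde([1, 2, 2, 3, 3, 3, 4, 9], bandwidth=0.8)
for x in range(0, 11):
    half = "#" * round(density(x) * 40)
    # Mirror the same curve on both sides to get the violin shape.
    print(f"{x:>2} {half:>12}|{half}")
```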

Stem & Leaf Plots

Stem & Leaf Plots (or Stemplots) are a way of organising and displaying data via their place value to show the data distribution.  Stemplots don’t visually encode the data into a graph; instead, they display the raw data as a useful reference tool.  A good example of this is the public transport schedule below, which displays the train times for both northbound and southbound trains.

[Image: Stem & Leaf Plot of a train timetable]

So Stemplots help give a quick overview of the data distribution, and are useful for highlighting outliers and finding the mode.

The major downside to Stemplots is that they’re limited in the size of dataset they can handle.  If the dataset is too small, the Stemplot becomes useless.  If it’s too large, the Stemplot becomes over-cluttered.
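
Splitting values by place value is simple enough to sketch in Python.  This version assumes non-negative whole numbers, with the tens as the stem and the units as the leaf; the function name and the sample values are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by tens (stem) and list their units (leaves)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    # Keep empty stems so that gaps in the data stay visible.
    return [(s, stems.get(s, [])) for s in range(min(stems), max(stems) + 1)]


# Hypothetical sample values, for illustration only.
for stem, leaves in stem_and_leaf([12, 15, 21, 40, 12, 27, 23]):
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

A repeated leaf on one row points to the mode, and a leaf stranded on a distant stem is a likely outlier.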

Dot Distribution Maps

This type of map helps show how data is distributed geographically, in order to detect spatial patterns that might relate to the location.  The clustering of points on the map displays where values are concentrated geographically.

[Image: Dot Distribution Map]

If you are considering visualising distributed or ranged data and are still unsure of the best visualisation to apply, please feel free to get in contact with us and we’ll be happy to talk to you.
