Posts filed under Graphics (336)

September 28, 2015

Seeing the margin of error

A detail from Andrew Chen’s visualisation of all the election polls in NZ:


His full graph is somewhat interactive: you can zoom in on times, select parties, etc. What I like about this format is how clear it makes the poll-to-poll variability.  The poll result for, say, National isn’t a line, it’s a cloud of uncertainty.

The cloud of uncertainty gets narrower for minor parties (as detailed in my cheatsheet), but for the major parties you can see it span an entire 10-percentage-point grid cell or more.
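
The narrowing for minor parties is what the usual binomial approximation predicts. As a rough sketch, assuming a simple random sample of 1000 people (a made-up sample size; real polls also have design effects and house effects that this ignores), the approximate 95% margin of error for a party on 45% compared with one on 5% would be:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000  # assumed sample size, for illustration only
for share in (0.45, 0.05):
    moe = margin_of_error(share, n)
    print(f"support {share:.0%}: +/- {moe:.1%}")
```

Under these assumptions the major party gets roughly plus or minus 3 percentage points and the minor party a bit over 1.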

September 26, 2015

US:China graph of the day

This (via @albertocairo) is from the Guardian, two years ago.


At first it looks like a pie chart, but it isn’t. It’s a set of bar charts warped into a circle, so that the ratio of blue and red areas in a wedge is the square of the ratio of the numbers. Also, the circle format means the longest wedge in each pair must be the same length: 8.6% unemployment rate is the same as 4.6% military expenditure, 104% market capitalisation, and 46 Olympic gold medals.
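
The squaring is just the geometry of a circular wedge, whose area grows with the square of its radius. A minimal sketch, using made-up values rather than anything from the Guardian graph, and assuming each wedge in a pair has the same angle:

```python
import math

def wedge_area(radius, angle=math.pi / 6):
    """Area of a circular sector with the given radius and angle in radians."""
    return 0.5 * angle * radius ** 2

# Hypothetical pair of numbers: one is twice the other.
a, b = 2.0, 1.0

number_ratio = a / b
area_ratio = wedge_area(a) / wedge_area(b)

print(number_ratio)  # ratio of the numbers
print(area_ratio)    # ratio of the wedge areas: the square of the above
```

So a number that is twice as big gets four times as much ink.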

Many of these are proportions or per-capita figures, but not all. Carbon emissions are national totals, making China look worse. Film industry revenues and exports are totals; they are also gross revenues — because the whole visual metaphor falls apart completely for numbers that can be negative. That’s why the current-year budget surplus/deficit isn’t treated like the other numbers.

There are also some unusual definitions. “Social media”, the bar where China is furthest behind, is defined just by the proportion who use Facebook, which obviously underestimates the social-media activity of the US (and also, perhaps, of China).

The post has some discussion of the difficulties (for example, the measurement and even the definition of unemployment in the two countries) and is much better than the graph.

Here’s a different take on the same countries, in the same format, from the World Economic Forum:


They have similar problems with total vs proportion/mean variables. They solve the y-axis problem by working with international ranks, which at least gives a common scale. However, having 1 as the largest rank and some unspecified large number as the smallest rank does make the relationship between area and number fairly weird.  It also means that the actual numbers for each wedge aren’t fractions of a total in any sensible way.

If the main point is to be an eye-catching hook for the story, the Guardian graph is more successful.

August 31, 2015

Graph of the day

Literally, this time. I got this from Andrew Gelman, but it’s too good not to share. It’s originally from the Wall Street Journal.


Apart from the attempts to make the body part representative of the activity, the unwisdom of playing soccer in high heels, and the mystery of what it actually is that she’s eating or drinking (a martini? an icecream?), there are some generalisable graphical points.

First, comparison of area between different shapes is hard, and so isn’t a good way to display data: it’s not immediately clear whether the Knee of Religion is larger than the Forehead of Education or the Shoe of Caring.

Second, trying to code the direction of change with colour means you can’t use colour (consistently) to distinguish categories.

Third, some of the figures aren’t very helpful because they average over everyone: only about 60% of the adult population is in paid employment, and only a small proportion are in education. For people who work or study the time spent is a lot more than the average, for everyone else it’s zero.
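
To see how much the averaging hides, here is a small sketch. The 60% employment figure is the one quoted above; the 3.5 hours per day is a made-up population average, not a number from the Wall Street Journal graphic.

```python
def participant_average(population_average, participation_rate):
    """Average among people who actually do the activity, given the
    average over everyone (non-participants counted as zero)."""
    return population_average / participation_rate

population_avg_hours = 3.5  # hypothetical average over all adults
employment_rate = 0.60      # roughly the share in paid employment

hours_for_workers = participant_average(population_avg_hours, employment_rate)
print(f"{hours_for_workers:.1f} hours per day among people in paid work")
```

The smaller the participating group, the bigger the gap, so for education the population average says very little about a typical student.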

And finally, if you have to write all the numbers on the graph, the graph isn’t doing its job.

August 25, 2015

Computation and art


Normally I wouldn’t be linking favourably to this scatterplot, which has an ill-defined sampling scheme, and where at least the y-axis data are objectively wrong. On the other hand, normally the scatterplot would be there to convey information. In this case it’s just an index to some beautiful animated triangular art.


The point, and the relevance to this blog, is the way Matt Daniels has written software to make these pictures (relatively) easy to create.


Incidentally, before anyone starts complaining that sharks and fish are separate, that bit is exactly correct.  Fish (typical fish with bones, such as the swordfish in the animation) have a more recent common ancestor with sheep than with sharks.

August 23, 2015

Barcharts with delusions of grandeur

The cricket graphics system now allows 3-d barcharts projected over the playing field, and casting actual virtual shadows.


Yeah, nah.

August 19, 2015

Stereotype and caricature

I’ve posted a few times about the maps, word clouds, and so on that show the most distinctive words by gender or state — sometimes they are even mislabelled as the “most common” words.  As I explained, these are often very rare words; it’s just that they are slightly less rare in one group than in the others.
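
The difference between “most common” and “most distinctive” is easy to see in a toy example. The counts below are invented, the two groups are assumed to have the same number of responses (so count ratios equal frequency ratios), and the add-one smoothing is just one simple way to avoid dividing by zero, not necessarily what any of these maps or word clouds used.

```python
from collections import Counter

# Hypothetical counts of colour names from two equally sized groups.
group_a = Counter({"green": 3000, "blue": 2800, "teal": 400, "dusty teal": 6})
group_b = Counter({"green": 3100, "blue": 2900, "teal": 350, "dusty teal": 1})

def most_common(counts):
    """The word used most often in this group."""
    return max(counts, key=counts.get)

def most_distinctive(counts, other, smoothing=1):
    """The word with the largest (smoothed) ratio of use in this
    group relative to the other group."""
    return max(counts, key=lambda w: (counts[w] + smoothing) / (other[w] + smoothing))

print(most_common(group_a))                # the everyday word
print(most_distinctive(group_a, group_b))  # a rare word that is slightly less rare here
```

The most common word is the same in both groups; the most distinctive one is used by a handful of people.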

An old post from the XKCD blog gives a really good example. Randall Munroe set up a survey to show people colours and ask for the colour name. He got five million responses, from over 200,000 sessions, and came up with nearly 1000 reasonably well-characterised colours.  You can download the complete data, if you care.

The survey asked participants about their chromosomal sex, because two of the colour receptor genes are on the X-chromosome and this is linked to colour blindness (and possibly to tetrachromatic vision). It turned out that the basic colour names were very similar between male and female respondents, though women were slightly more likely to use modifiers (“lime green” vs “green”).

However, Munroe also looked at the responses that differed most in frequency between men and women. These were all uncommon responses, but all from multiple people, and after extensive spam filtering.

You can probably guess which group is which:

  1. Dusty Teal
  2. Blush Pink
  3. Dusty Lavender
  4. Butter Yellow
  5. Dusky Rose


  1. Penis
  2. Gay
  3. WTF
  4. Dunno
  5. Baige

(Presumably this is a gender effect, not an X-linked language defect.)


August 17, 2015

More diversity pie-charts

These ones are from the Seattle Times, since that’s where I was last week.

Amazon, like many other tech companies, had been persuaded to release figures on gender and ethnicity for its employees. On the original figures, Amazon looked different from the other companies, but Amazon is unusual in being a shipping-things-around company as well as a tech company. Recently, they released separate figures for the ‘labourers and helpers’ vs the technical and managerial staff. The pie chart shows how the breakdown makes a difference.

In contrast to Kirsty Johnson’s pie charts last week, where subtlety would have been wasted  given the data and the point she was making, here I think it’s more useful to have the context of the other companies and something that’s better numerically than a pie chart.

This is what the original figures looked like:


Here’s the same thing with the breakdown of Amazon employees into two groups:


When you compare the tech-company half of Amazon to other large tech companies, it blends in smoothly.

As a final point, “diversity” is really the wrong word here. The racial/ethnic diversity of the tech companies is pretty close to that of the US labour force, if you measure in any of the standard ways used in ecology or data mining, such as entropy or Simpson’s index.   The issue isn’t diversity but equal opportunity; the campaigners, led by Jesse Jackson, are clear on this point, but the tech companies and often the media prefer to talk about diversity.
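
For anyone who hasn’t met these measures, here is a minimal sketch of both, using invented proportions rather than the real company or labour-force figures. “Simpson’s index” comes in several variants; this one is the Gini-Simpson form, the probability that two randomly chosen people are from different groups.

```python
import math

def shannon_entropy(proportions):
    """Shannon entropy (natural log) of a set of group proportions."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

def gini_simpson(proportions):
    """Probability that two randomly chosen individuals are in different groups."""
    return 1 - sum(p * p for p in proportions)

# Hypothetical shares for four groups, in the same group order in both lists.
labour_force = [0.60, 0.20, 0.15, 0.05]
company = [0.15, 0.05, 0.60, 0.20]  # same shares, assigned to different groups

for name, shares in (("labour force", labour_force), ("company", company)):
    print(name, round(shannon_entropy(shares), 3), round(gini_simpson(shares), 3))
```

Both measures depend only on the set of shares, not on which group has which share, so the two lists score identically. That is part of why a diversity index can’t answer a question about equal opportunity.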


August 14, 2015

Sometimes a pie chart is enough

From Kirsty Johnson, in the Herald, ethnicity in the highest and lowest decile schools in Auckland.


Statisticians don’t like pie charts because they are inefficient; they communicate numerical information less effectively than other forms, and don’t show subtle differences well.  Sometimes the differences are sufficiently unsubtle that a pie chart works.

It’s still usually not ideal to show just the two extreme ends of a spectrum, just as it’s usually a bad idea to show just two points in a time series. Here’s the full spectrum, with data from EducationCounts:



[The Herald has shown the detailed school ethnicity data before in other contexts, eg the decile drift story and graphics from Nicholas Jones and Harkanwal Singh last year]

I’ve used counts rather than percentages to emphasise the variation in student numbers between deciles. The pattern of Māori and Pacific representation is clearly different in this graph: the numbers of Pacific students fall off dramatically as you move up the ranking, but the numbers of Māori students stabilise. There are almost half as many Māori students in decile 10 as in decile 1, but only a tenth as many Pacific students.

If you’re interested in school diversity, the percentages are the right format, but if you’re interested in social stratification, you probably want to know how students of different ethnicities are distributed across deciles, so the absolute numbers are relevant.
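
The two questions correspond to dividing the same table of counts by different totals. A small sketch with invented counts for two groups, labelled A and B, in two deciles (these are not the EducationCounts numbers):

```python
# Hypothetical student counts by decile and group.
counts = {
    1: {"A": 800, "B": 200},
    10: {"A": 100, "B": 1900},
}

def share_within_decile(counts, decile, group):
    """School-diversity view: what fraction of this decile's students are in the group?"""
    row = counts[decile]
    return row[group] / sum(row.values())

def share_across_deciles(counts, decile, group):
    """Stratification view: what fraction of the group's students are in this decile?"""
    group_total = sum(row[group] for row in counts.values())
    return counts[decile][group] / group_total

print(f"{share_within_decile(counts, 1, 'A'):.0%} of decile 1 students are in group A")
print(f"{share_across_deciles(counts, 1, 'A'):.0%} of group A students are in decile 1")
```

Percentages within a decile throw away the decile sizes, which is why the second calculation needs the counts.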


August 6, 2015

Graph legends: ordering and context

I’m not going to make a regular habit of criticising the Herald’s Daily Pie; for a start, it only appears in the print version, which I don’t see. Today’s one, though, illustrates a couple of issues in graph legends.


The first issue is ordering. That’s almost trivial with just two values, but I actually found it distracting to have “South Island” at the top of the legend, especially when the corresponding red wedge is higher on the page than the blue wedge. I had to look twice to work out which wedge was which.  Reordering with “North Island” at the top would have helped, as would putting the labels on the pie (instead of the numbers).

Second, there’s the Note:

The total pigs number includes all other pigs such as mated gilts, baconers, porkers, and piglets still on the farm.

which comes directly from the StatsNZ table (of data from the Agricultural Production Survey). I know that, because these tables are the only place Google can find even the sub-phrase “such as mated gilts”.  In the context of the table, the note says that the “at June 30” columns for total pigs include the “Breeding sows (1-year-old and over)” given in earlier columns of the table, plus other categories that someone interested in the data would probably be familiar with. Without the earlier columns, the reaction should be “other than what?”.

Looking at the StatsNZ table you also learn the reason why “At June 30” in the title is important. The total “includes piglets still on the farm”, but not the much larger number of ex-piglets that have become part of the pork products industry: there were over 600,000 piglets weaned on NZ farms during the year, but only 287,000 pigs still on farms as of June 30.

August 2, 2015

Pie chart of the week

A year-old pie chart describing Google+ users. On the right are two slices that would make up a valid but pointless pie chart: their denominator is Google+ users. On the left, two slices that have completely different denominators: all marketers and all Fortune Global 100 companies.

On top of that, it’s unlikely that the yellow slice is correct, since it’s not clear what the relevant denominator even is. And, of course, though most of the marketers probably identify as male or female, it’s not clear how the Fortune Global 100 Companies would report their gender.


From @NoahSlater, via @LewSOS, originally from kwikturnmedia about 18 months ago.