Posts filed under Graphics (367)

October 30, 2016

Suboptimal ways to present risk

Graeme Edgeler nominated this, from PBS Frontline, to @statschat as a bad graph


It’s actually almost a good graph, but I think it’s trying to do too many things at once. There are two basic numerical facts: the number of people trying to cross the Mediterranean to escape the Syrian crisis has gone down substantially; the number of deaths has stayed about the same.

If you want to show the increase in risk, it’s much more effective to use a fixed, round denominator —  the main reason to use this sort of graph is that people pick up risk information better as frequencies than as fractions.

Here’s the comparison using the same denominator, 269, for the two years. It’s visually obvious that there has been a three-fold increase in death rate.
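In code, the fixed-denominator comparison works like this. A minimal sketch: the crossing and death counts below are placeholders chosen to reproduce the three-fold pattern, not the Frontline figures.

```python
# Sketch: expressing a change in risk with a fixed, round denominator.
# The counts below are placeholders, not the actual Frontline numbers.
def deaths_per(denominator, deaths, crossings):
    """Expected deaths among `denominator` people attempting the crossing."""
    return denominator * deaths / crossings

# Hypothetical counts: crossings fall sharply, deaths stay about the same.
year1 = deaths_per(269, deaths=3_700, crossings=1_000_000)   # about 1 in 269
year2 = deaths_per(269, deaths=3_700, crossings=330_000)     # about 3 in 269
print(round(year1), round(year2))  # 1 3
```

Holding the denominator fixed turns the two rates into small whole-number frequencies, which is exactly the form people read most easily.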


It’s harder to convey all the comparisons clearly in one graph. A mosaic plot would work for higher proportions, which we can all hope doesn’t become a relevant fact.


October 18, 2016

The lack of change is the real story

The Chief Coroner has released provisional suicide statistics for the year to June 2016.  As I wrote last year, the rate of suicide in New Zealand is basically not changing.  The Herald’s story, by Martin Johnston, quotes the Chief Coroner on this point

“Judge Marshall interpreted the suicide death rate as having remained consistent and said it showed New Zealand still had a long way to go in turning around the unacceptably high toll of suicide.”

The headline and graphs don’t make this clear

Here’s the graph from the Herald


If you want a bar graph, it should go down to zero, and it would then show how little is changing


I’d prefer a line graph showing expected variation if there wasn’t any underlying change: the shading is one and two standard deviations around the average of the nine years’ rates
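Computing those reference bands is straightforward. A sketch, using made-up rates rather than the actual NZ figures:

```python
import numpy as np

# Sketch: reference bands for "no underlying change".
# The rates below are placeholders, not the actual NZ suicide rates.
rates = np.array([11.9, 12.2, 11.3, 12.7, 11.6, 12.1, 11.0, 12.4, 11.8])  # 9 years
mean, sd = rates.mean(), rates.std(ddof=1)

# One- and two-standard-deviation bands around the nine-year average;
# a flat line plus these bands shows the variation expected by chance alone.
band1 = (mean - sd, mean + sd)
band2 = (mean - 2 * sd, mean + 2 * sd)
print(f"mean {mean:.1f}, 1-sd band {band1[0]:.1f} to {band1[1]:.1f}")
```

If each year's observed rate falls inside the bands, the year-to-year wiggles are what you'd expect from a constant underlying rate.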


As Judge Marshall says, the suicide death rate has remained consistent. That’s our problem.  Focusing on the year to year variation misses the key point.

September 1, 2016

Transport numbers

Auckland Transport released new patronage data, and FigureNZ tidied it up to make it easily computer-readable, so I thought I’d look at some of it.  What I’m going to show is a decomposition of the data into overall trends, seasonal variation, and random stuff just happening. As usual, click to embiggen the pictures.

First, the trends: rides are up.


It’s hard to see the trend in ferry use, so here’s a version on a log scale — meaning that the same proportional trend would look the same for all three modes of transport


Train use is increasing (relatively) faster than bus or ferry use.  There’s also an interesting bump in the middle that we’ll get back to.

Now, the seasonal patterns. Again, these are on a logarithmic scale, so they show relative variation


The clearest signal is that ferry use peaks in summer, when the other modes are at their minimum. Also, the Christmas minimum is a bit lower for trains: to see this, we can combine the two graphs:


It’s not surprising that train use falls by more: they turn the trains off for a lot of the holiday period.

Finally, what’s left when you subtract the seasonal and trend components:


The highest extra variation in both train and ferry rides was in September and October 2011: the Rugby World Cup.
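For readers who want to try this at home, here is one simple way to do such a trend/seasonal/remainder decomposition, in the spirit of R's decompose(). The monthly series is synthetic, standing in for the actual patronage data:

```python
import numpy as np

# Sketch of a trend / seasonal / remainder decomposition on a log scale.
# The series is synthetic, not the Auckland Transport patronage data.
rng = np.random.default_rng(0)
months = np.arange(120)                                  # ten years, monthly
log_rides = (np.log(1e6)
             + 0.004 * months                            # upward trend
             + 0.05 * np.sin(2 * np.pi * months / 12)    # seasonal cycle
             + rng.normal(0, 0.01, 120))                 # random variation

# Trend: centred 12-month moving average (half-weights at the ends).
kernel = np.r_[0.5, np.ones(11), 0.5] / 12
trend = np.convolve(log_rides, kernel, mode="valid")     # drops 6 points each end
detrended = log_rides[6:-6] - trend

# Seasonal: average the detrended series within each calendar month.
month_of = np.arange(6, 114) % 12
seasonal = np.array([detrended[month_of == m].mean() for m in range(12)])

# Remainder: the "random stuff just happening".
remainder = detrended - seasonal[month_of]
print(remainder.std() < detrended.std())   # True: seasonality explains most of it
```

Working on the log scale is what makes the seasonal component a *relative* variation, so the same proportional dip at Christmas looks the same size for buses, trains, and ferries.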


August 20, 2016

The statistical significance filter

Attention conservation notice: long and nerdy, but does have pictures.

You may have noticed that I often say about newsy research studies that they are barely statistically significant or that they found only weak evidence, but that I don’t say that about large-scale clinical trials. This isn’t (just) personal prejudice. There are two good reasons why any given evidence threshold is more likely to be met in lower-quality research — and while I’ll be talking in terms of p-values here, getting rid of them doesn’t solve this problem (it might solve other problems).  I’ll also be talking in terms of an effect being “real” or not, which is again an oversimplification but one that I don’t think affects the point I’m making.  Think of a “real” effect as one big enough to write a news story about.


This graph shows possible results in statistical tests, for research where the effect of the thing you’re studying is real (orange) or not real (blue).  The solid circles are results that pass your statistical evidence threshold, in the direction you wanted to see — they’re press-releasable as well as publishable.

Only about half the ‘statistically significant’ results are real; the rest are false positives.

I’ve assumed the proportion of “real” effects is about 10%. That makes sense in a lot of medical and psychological research — arguably, it’s too optimistic.  I’ve also assumed the sample size is too small to reliably pick up plausible differences between the blue and orange groups — sadly, this is also realistic.


In the second graph, we’re looking at a setting where half the effects are real and half aren’t. Now, of the effects that pass the threshold, most are real.  On the other hand, there are a lot of real effects that get missed.  This was the setting for a lot of clinical trials in the old days, when they were done in single hospitals or small groups.


The third case is relatively implausible hypotheses — 10% true — but well-designed studies.  There are still the same number of false positives, but many more true positives.  A better-designed study means that positive results are more likely to be correct.


Finally, the setting of well-conducted clinical trials intended to be definitive, the sort of studies done to get new drugs approved. About half the candidate treatments work as intended, and when they do, the results are likely to be positive.   For a well-designed test such as this, statistical significance is a reasonable guide to whether the effect is real.
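The four settings can be summarised with one line of arithmetic. With evidence threshold α, the share of "significant" results that are real is (prior × power) / (prior × power + (1 − prior) × α). The α and power values below are illustrative guesses to match the four pictures, not numbers taken from the simulations:

```python
# Sketch: the share of "statistically significant" results that are real,
# for the four settings above. alpha and the power values are illustrative.
def ppv(prior, power, alpha=0.05):
    """P(effect is real | result passes the evidence threshold)."""
    true_pos = prior * power
    false_pos = (1 - prior) * alpha
    return true_pos / (true_pos + false_pos)

settings = [("implausible, under-powered", 0.10, 0.4),
            ("plausible, under-powered",   0.50, 0.4),
            ("implausible, well-designed", 0.10, 0.8),
            ("plausible, well-designed",   0.50, 0.8)]
for label, prior, power in settings:
    print(f"{label}: {ppv(prior, power):.0%} of significant results are real")
```

With these inputs the four settings give roughly 47%, 89%, 64%, and 94%: about half, most, more, and nearly all, matching the four graphs.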

The problem is that the media only show a subset of the (exciting) solid circles, and typically don’t show the (boring) empty circles. So, what you see is


where the columns are 10% and 50% proportions of studies with a true effect, and the rows are under-sized (top) and well-designed (bottom) studies.


Knowing the threshold for evidence isn’t enough: the prior plausibility matters, and the ability of the study to demonstrate effects matters. Apparent effects seen in small or poorly-designed studies are less likely to be true.

August 18, 2016

Rigorously deidentified pie


Via Dale Warburton on Twitter, this graph comes from page 7 of the 2016 A-League Injury Report (PDF) produced by Professional Footballers Australia — the players’ association for the round-ball game.  It seems to be a sensible and worthwhile document, except for this pie chart. They’ve replaced the club names with letters, presumably for confidentiality reasons. Which is fine. But the numbers written on the graph bear no obvious relationship to the sizes of the pie wedges.

It’s been a bad week for this sort of thing: a TV bar chart that went viral had the same sort of problem.

August 15, 2016

Graph of the week

From a real estate agent who will remain nameless


Another example of the rule: “if you have to write out all the numbers, the graph isn’t doing its work.”

August 4, 2016

Garbage numbers

This appeared on Twitter


Now, I could just about believe NZ was near the bottom of the OECD, but to accept zero recycling and composting is a big ask.  Even if some of the recycling ends up in landfill, surely not all of it does.  And the garden waste people don’t charge enough to be putting all my wisteria clippings into landfill.

So, I looked up the source (updated link). It says to see the Annex Notes. Here’s the note for New Zealand

New Zealand: Data refer to amount going to landfill

The data point for New Zealand is zero by definition — they aren’t counting any of the recycling and composting.

When the most you can hope for is that the lies in the graph will be explained in the footnotes, you need to read the footnotes.


May 26, 2016

Budget visualisations

This will likely be updated as I find them

  1. From Keith Ng. Budget now and over time. This gets special mention for being inflation-adjusted (it’s in 2014 dollars). Doesn’t work on my phone, but works well on a small laptop screen
  2. NZ Herald. Works (though hard to read) on a mobile. Still hard to read on a small laptop screen, but attractive on a large screen. I still have reservations about the bubbles.
  3. Stuff has a set of charts. The surplus/deficit one is nicely clear, though there’s nothing about the financial crisis/recession as an explanation for a lot of it.
  4. The government has interactive charts of Core Crown Revenue, Core Crown Expenditure, and breakdown for a taxpayer. On the last one, they lose points for displaying just income tax, when the Treasury are about the only people who could easily do better.
April 29, 2016

Bar chart of the week

From the IMF, using OECD data, (via Sam Warburton)


Bar charts should start at zero (and probably shouldn’t have distracting house/arrow/tree reflections in the background), but this graph would look even worse if the y-axis went down to zero. The problem is that ‘zero’ isn’t 0 for this sort of measurement.  The index is the price:income ratio now, divided by the price:income ratio in 2010, multiplied by 100.  The “no change” value is 100, which suggests using that as the floor of the bars.  Making the bars wider relative to the spaces gives easier comparisons and makes the graph less busy.  The colour scheme isn’t ideal for dichromats, but it only reinforces the information; it’s not needed to interpret anything.



The next step, as Sam suggested on Twitter, would be to give up on the ‘index’, which is really economist jargon, and just describe the change in %.  He also suggested putting the two labels in colour (which required some fiddling: for the text colour to look like the bar colour it has to actually be darker).
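The conversion from index to percentage change is trivial, which is part of the argument for doing it:

```python
# Sketch: converting the IMF-style index back to a plain percentage change.
# index = 100 * (price:income now) / (price:income in 2010),
# so "no change" is 100 and the change in percent is just index - 100.
def pct_change_from_index(index):
    return index - 100

print(pct_change_from_index(130))  # 30: a 30% rise in price:income since 2010
print(pct_change_from_index(85))   # -15: a 15% fall
```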


One might also go back to the full names of the countries, but I quite like the abbreviations.


April 28, 2016


Most of Auckland is within walking distance of a school: there are over 500 schools in the 560 km² of Auckland’s urban area. That’s usually regarded as a Good Thing, and Healthy. Auckland Transport’s “walking school bus” program takes advantage of it to get kids more active and to get cars off the roads. The coverage is pretty impressive: in this map by Stephen Davis, the circles show an 800 m (half-mile, about 2 km²) area around each school:


However, as a story at Stuff notes, if almost everywhere in Auckland is close to a school, the schools are going to be close to other establishments.  With a school in most square kilometres of urban land, most schools will have shops selling fast food or junk food somewhere in the surrounding square kilometre.
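The back-of-envelope arithmetic here checks out: an 800 m radius circle covers about 2 km², so 500-odd circles more than cover the 560 km² urban area.

```python
import math

# Checking the coverage arithmetic: an 800 m radius circle covers ~2 km²,
# and 500+ such circles comfortably exceed Auckland's 560 km² urban area.
radius_km = 0.8
circle_area = math.pi * radius_km ** 2      # about 2.01 km²
schools = 500
print(round(circle_area, 2), schools * circle_area > 560)  # 2.01 True
```

The circles overlap, of course, but that's the point: school catchments tile the city, so "near a school" and "in the city" are nearly the same thing.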

That’s going to be even more true in denser, more walkable cities elsewhere, from Amsterdam to New York.  “Near schools” isn’t a thing in cities. To reduce the number of these shops near schools, you have to reduce them everywhere.

This isn’t to say that all restrictions on fast-food sales are unreasonable, but having lots of things in a relatively small area is hard to avoid in cities. It’s how cities work.