Posts filed under Graphics (292)

January 29, 2015

Absolute risk/benefit calculators

An interesting interactive calculator for heart disease/stroke risk, from the University of Nottingham. It lets you put in basic, unchangeable factors (age, race, sex), modifiable factors (smoking, diabetes, blood pressure, cholesterol), and then one of a set of interventions.

Here’s the risk for an imaginary unhealthy 50-year-old taking blood pressure medications:


The faces at the right indicate 10-year risk. Without the unhealthy risk factors, if you had 100 people like this, one would have a heart attack, stroke, or heart disease death over ten years. With the risk factors and treatment, four would have an event (the pink and red faces). The treatment would prevent five events in 100 people, represented by the five green faces.
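The arithmetic behind the faces can be written out in a few lines. This is only a sketch using the numbers quoted above (four events with treatment, five prevented, per 100 people). The function name is made up, and "relative risk reduction" and "number needed to treat" are standard terms that the calculator itself may not display.

```python
def absolute_risk_summary(events_with_treatment, events_prevented, cohort=100):
    """Summarise a 'faces' display for a cohort followed for ten years."""
    # Events expected with the risk factors but no treatment.
    events_without_treatment = events_with_treatment + events_prevented
    return {
        "events_without_treatment": events_without_treatment,
        # Difference in risk, as a fraction of the cohort.
        "absolute_risk_reduction": events_prevented / cohort,
        # Fraction of the untreated events that treatment removes.
        "relative_risk_reduction": events_prevented / events_without_treatment,
        # People treated for ten years per event prevented.
        "number_needed_to_treat": cohort / events_prevented,
    }


summary = absolute_risk_summary(events_with_treatment=4, events_prevented=5)
for name, value in summary.items():
    print(f"{name}: {value:.3g}")
```

The point of showing absolute numbers is that "more than half the events prevented" and "five in a hundred people benefit" describe the same treatment.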

There’s a long list of possible treatments in the middle of the page, with the distinctive feature that most of them don’t appear to reduce risk, from the best evidence available. For example, you might ask what this guy’s risk would be if he took vitamin and fish oil supplements. Based on the best available evidence, it would look like this:



The main limitation of the app is that it can’t handle more than one treatment at a time: you can’t look at blood pressure meds and vitamins, just at one or the other.

(via @vincristine)

January 8, 2015

Climate trends

From an interview with Robert Simmons, a data visualisation designer specialising in environmental data, this graph was created by Chloe Whiteaker (at Bloomberg) working with NASA’s Gavin Schmidt. It shows a thirty-year global temperature trend centered around each year.


If you just plotted the central point of each line segment you’d have a ‘local linear smoother’, one of the standard ways of drawing a smooth curve through a set of data. Plotting the whole line segment makes it clearer how the curve is computed.
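As a rough sketch of that computation: fit a straight line to the data in a window around each year, and keep both the slope (the line segment) and the fitted value at the centre (the smooth curve). The data here are synthetic, the function name is my own, and equal weighting within the window is an assumption; the graph's authors may have done something different.

```python
import numpy as np


def local_linear_smoother(years, values, window=30):
    """Fit a straight line in a window centred on each year.

    Returns the centre years, the fitted value at each centre (the smooth
    curve), and the slope of each local line (the local trend).
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    half = window / 2
    centres, fitted, slopes = [], [], []
    for centre in years:
        # Skip years too close to either end to have a full window.
        if centre - half < years.min() or centre + half > years.max():
            continue
        in_window = np.abs(years - centre) <= half
        # Centring x on the window midpoint makes the intercept equal to
        # the fitted value at the centre year.
        slope, intercept = np.polyfit(
            years[in_window] - centre, values[in_window], 1
        )
        centres.append(centre)
        fitted.append(intercept)
        slopes.append(slope)
    return np.array(centres), np.array(fitted), np.array(slopes)


# Synthetic series for illustration only: a steady trend plus noise.
rng = np.random.default_rng(0)
years = np.arange(1900, 2015)
anomaly = 0.01 * (years - 1900) + rng.normal(0, 0.1, size=years.size)
centres, smooth, trend = local_linear_smoother(years, anomaly)
print(centres[0], centres[-1], len(centres))
```

Plotting `smooth` against `centres` gives the usual curve; drawing a short segment with slope `trend[i]` through each point gives something like the graph above.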

(via Alberto Cairo)


January 3, 2015

Cancer isn’t just bad luck

From Stuff

Bad luck is responsible for two-thirds of adult cancer while the remaining cases are due to environmental risk factors and inherited genes, researchers from the Johns Hopkins Kimmel Cancer Center found.

The idea is that some, perhaps many, cancers come from simple copying errors in DNA replication. Although DNA copying and editing is impressively accurate, there’s about one error for every three cell divisions, even when nothing is wrong. Since the DNA error rate is basically constant, but other risk factors will be different for different cancers, it should be possible to separate them out.

For a change, this actually is important research, but it has still been oversold, for two reasons. Here’s the graph from the paper showing the ‘2/3’ figure: the correlation in this graph is about 0.8, so the proportion of variation explained is the square of that, about two-thirds.


There are two things to notice about this graph. First, there are labels such as “Lung (smokers)” and “Lung (non-smokers)”, so it’s not as simple as ‘bad luck’.  Some risk factors have been taken into account. It’s not obvious whether this makes the correlation higher or lower.

Second, the y-axis is on a log scale, so the straight line fit isn’t to cancer incidence and the proportion of variation explained isn’t a proportion of cancer risk.  Using a log scale for incidence is absolutely right when showing the biological relationship, but you can’t read proportions of incidence explained off that graph.  This is what the graph looks like when the y-axis is incidence, either with the x-axis still on a logarithmic scale


or with neither axis on a logarithmic scale


The proportion of variation explained is 18% and 28% respectively.

It’s ok to transform the x-axis as much as we like, so I looked at a square root transformation on the x-axis (based on the slope of the log-log graph). This gets the proportion of incidence explained up to about one third. Not two-thirds.
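To see how much the choice of scale matters, here is a small sketch with made-up numbers (not the paper’s data). The proportion of variation explained is the squared correlation, which is where 0.8 squared, or 0.64, comes from. The same points give a very different answer once the y-axis is untransformed.

```python
import numpy as np


def r_squared(x, y):
    """Squared Pearson correlation: the proportion of variation in y
    explained by a straight-line fit on x."""
    return np.corrcoef(x, y)[0, 1] ** 2


# Made-up data, NOT the values from the paper: y is roughly proportional
# to sqrt(x), with x spanning eight orders of magnitude.
log10_x = np.arange(5, 14, dtype=float)
scatter = 0.3 * (-1.0) ** np.arange(9)  # alternating up/down wobble
log10_y = -9 + 0.5 * log10_x + scatter
y = 10 ** log10_y

r2_log_log = r_squared(log10_x, log10_y)  # both axes on a log scale
r2_raw_y = r_squared(log10_x, y)          # log x-axis, untransformed y-axis
print(f"log y-axis: {r2_log_log:.2f}, untransformed y-axis: {r2_raw_y:.2f}")
```

On the log scale the line fits nearly perfectly; on the untransformed scale the largest value dominates and the fit is much worse, even though nothing about the data has changed.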

Using the log scale gives a lot more weight to the very rare cancers in the lower left corner, which turn out not to have important modifiable risk factors. Using an untransformed y-axis gives equal weight to all cancers, which is what you want from a medical or public health point of view.

Except, even that isn’t quite right. If you look at my two graphs it’s clear that the correlation will be driven by the top three points. Two of those are familial colorectal cancers, and the incidence quoted is the incidence in people with the relevant mutations; the third is basal cell carcinoma, which barely counts as cancer from a medical or public health viewpoint.

If we leave out the familial cancers and basal cell carcinoma, the proportion explained drops to about 10%.

If we put basal cell carcinoma back in, something statistically interesting happens. The correlation shoots back up again, but only because it’s being driven by a single point. A more honest correlation estimate, predicting each point based on the other points and not based on itself, is much lower.
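One way to get that sort of estimate is leave-one-out prediction: refit the line without each point in turn, predict the omitted point, and correlate the predictions with the observed values. The sketch below uses made-up data with a single extreme point. It illustrates the idea and is not a reproduction of the calculation on the cancer data.

```python
import numpy as np


def leave_one_out_predictions(x, y):
    """Predict each y[i] from a straight line fitted to all the other points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    predictions = np.empty_like(y)
    for i in range(len(x)):
        slope, intercept = np.polyfit(np.delete(x, i), np.delete(y, i), 1)
        predictions[i] = intercept + slope * x[i]
    return predictions


# Made-up data: eight points with no upward trend, plus one extreme point.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 30], dtype=float)
y = np.array([5, 4, 6, 3, 5, 2, 4, 3, 20], dtype=float)

naive_r = np.corrcoef(x, y)[0, 1]
loo_r = np.corrcoef(y, leave_one_out_predictions(x, y))[0, 1]
print(f"ordinary correlation: {naive_r:.2f}, leave-one-out: {loo_r:.2f}")
```

The ordinary correlation is high because the extreme point gets to help predict itself. Once it has to be predicted from the other eight points, the apparent relationship disappears.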

So, in summary: the “two-thirds of cancers explained” is Just Wrong. Doing a mathematically correct calculation gives about one third. Doing a calculation that’s actually relevant to cancer in the population gives even smaller values. (update) That’s not to say that DNA replication errors are unimportant — the paper makes it clear that they are important.

December 27, 2014

The Lesser Spotted Hutt Man Drought

From the Christmas Eve edition of the Upper Hutt Leader, which you can read online:

Ladies, be warned — Upper Hutt is in  the grip of a man drought

Here’s the graph to prove it (via Richard Law, on Twitter)



As the graph clearly indicates, women outnumber men hugely in the 25-35 age range, and (of course) at the oldest ages. The problem is, the y-axis starts at 45%. For lines or points that’s fine, but for bar charts it isn’t — because the bars connect the points to the x-axis.
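The size of the distortion is easy to work out. This sketch uses a hypothetical 52%/48% split (not a figure read off the paper’s graph) and compares the ratio of bar heights as drawn with the axis at zero and at 45%. The function name is my own.

```python
def drawn_height_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of the two bar heights as drawn when the axis starts at axis_start."""
    if min(value_a, value_b) <= axis_start:
        raise ValueError("both values must lie above the axis start")
    return (value_a - axis_start) / (value_b - axis_start)


# Hypothetical split for one age group, in percent.
women_pct, men_pct = 52.0, 48.0

true_ratio = drawn_height_ratio(women_pct, men_pct, axis_start=0.0)
truncated_ratio = drawn_height_ratio(women_pct, men_pct, axis_start=45.0)
print(f"axis at 0%: {true_ratio:.2f}, axis at 45%: {truncated_ratio:.2f}")
```

With the axis at 45%, the bars are 7 and 3 units tall, so one looks more than twice the height of the other, where the actual ratio is 52 to 48.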

This is Stats New Zealand’s version of the graph, in standard ‘population pyramid’ form. It’s much less dramatic.


We could try a barchart with the axis at zero:


It’s still much less dramatic — and you can see why the paper chopped the ages off at 75, since using the full range available in the data wouldn’t have fit on their axes.  The y-axis wasn’t just trimmed to fit the data; it was trimmed beyond the data.

You could make a case that ‘zero’ in this example is actually 50%: we (well, not we, but journalists who have to fill space) care about the deficiency or surplus of members of the appropriate sex.


Or, you could look at deficiency or surplus of individuals, rather than percentages:


Using individuals makes the younger age groups look more important, which helps the story, but on the other hand shows that the scale of this natural disaster isn’t all that devastating.
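A sketch of the two summaries, with hypothetical counts rather than the census figures: a small, very old age group can have a large percentage imbalance but only a handful of surplus individuals, while a big young age group shows the reverse.

```python
def surplus_summary(women, men):
    """Imbalance for one age group, as percentage points and as people."""
    total = women + men
    return {
        # Women's share of the age group, relative to an even split.
        "women_share_minus_50": 100 * women / total - 50,
        # Surplus (positive) or deficiency (negative) of women, in people.
        "surplus_women": women - men,
    }


# Hypothetical counts of (women, men), for illustration only.
groups = {"25-29": (1300, 1100), "85+": (120, 60)}
summaries = {age: surplus_summary(w, m) for age, (w, m) in groups.items()}
for age, result in summaries.items():
    print(age, result)
```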

That’s basically what the expert quoted in the story says. Prof Garth Fletcher, from VUW, says

“People in Upper Hutt or Lower Hutt, they go to parties, they go to bars, they go to places in the wider Wellington area.”

It was only when you started having a gap between men and women of more than 5 or 10 percent that there would be real world implications, he said.


[Update: My data and graphs are for Upper Hutt (city). That’s about 2/3 of the Rimutaka electorate, which is what the paper’s data cover.]

December 20, 2014

Not enough pie

From James Lee Gilbert on Twitter, a pie chart from WXII News (Winston-Salem, North Carolina)


This is from a (respectable, if pointless) poll conducted in North Carolina. As you can clearly see, half of the state favours the local team. Or, as you can clearly see from the numbers, one-third of the state does.

If you’re going to use a pie chart (which you usually shouldn’t), remember that the ‘slices of pie’ metaphor is the whole point of the design. If the slices only add up to 70%, you need to either add the “Other”/“Don’t Know”/“Refused” category, or choose a different graph.
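A simple check before drawing would catch this. The sketch below uses hypothetical poll numbers that add up to 70%, with the local team on 33%; the team labels and function names are invented for illustration.

```python
def complete_pie(slices, remainder_label="Other/Don't know/Refused", total=100):
    """Return slices that add up to total, adding a remainder slice if needed."""
    shown = sum(slices.values())
    if shown > total:
        raise ValueError("slices add up to more than the total")
    completed = dict(slices)
    if shown < total:
        completed[remainder_label] = total - shown
    return completed


def drawn_share(slices, name):
    """Fraction of the circle a slice occupies when only these slices are drawn."""
    return slices[name] / sum(slices.values())


# Hypothetical poll results in percent; they add up to 70, not 100.
poll = {"Local team": 33, "Team B": 22, "Team C": 15}
full = complete_pie(poll)

print(f"as drawn: {drawn_share(poll, 'Local team'):.0%} of the pie")
print(f"with the missing category: {drawn_share(full, 'Local team'):.0%} of the pie")
```

Without the remainder slice, 33 out of 70 fills nearly half the circle, which is how one-third ends up looking like one-half.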

If your graph makes it easy to confuse 1/3 and 1/2, it’s not doing its job.

December 15, 2014

Interactive city statistics from UK

From the Centre for Advanced Spatial Analysis, at University College London, beautiful and informative maps:

LuminoCity is a mapping platform designed to explore the performance and dynamics of cities in Great Britain. The site brings together a wide range of key city indicators, including population, growth, housing, travel behaviour, employment, business location and energy use. These indicators are mapped using a new 3D approach that highlights the size and density of urban centres, and allows relationships between urban form and city performance to be analysed.

The credits are also interesting:

Maps created using TileMill opensource software by Mapbox. Website design uses the following javascript libraries- leaflet.js, mapbox.js and dimple.js (based on d3.js).

Source data Crown © Office for National Statistics, National Records of Scotland, DEFRA, Land Registry, DfT and Ordnance Survey 2014.

All the datasets used are government open data. Websites such as LuminoCity would not be possible without recent open data initiatives and the release of considerable government data into the public domain. Links to the specific datasets used in each map are provided to the bottom right of the page under “Source Data”.

The proliferation of interesting interactive graphics relies very heavily on open-source software (so designers don’t have to be expert programmers) and open data (to give something to display).

December 14, 2014

Statistics about the media: Lorde edition

From @andrewbprice on Twitter: number of articles in the NZ Herald each day about the musician Lorde


The scampi industry, which brings in similar export earnings (via Matt Nippert), doesn’t get anything like the coverage (and fair enough).

More surprisingly, Lorde seems to get more coverage than the mother of our next head of state but two.  It may seem that the royal couple is always in the paper, but actually whole weeks can sometimes go past without a Will & Kate story.

December 13, 2014

Barchart of the week


Via SkepChick, this chart from Venezolana de Televisión (Venezuelan national TV) during the 2013 elections almost makes Fox News look good.

December 12, 2014

Diversity maps

From Aaron Schiff, household income diversity at the census area level, for Auckland


The diversity measure is based on how well the distribution of income groups in the census area unit matches the distribution across the entire Auckland region, so in a sense it’s more a representativeness measure: an area unit with only very high and very low incomes would have low diversity in this sense (but there aren’t really any). The red areas are low diversity and include the wealthy suburbs on the Waitemātā harbour and the Gulf, and the poor suburbs of south Auckland. This is an example of something that can’t be a dot map: diversity is intrinsically a property of an area, not an individual.
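I don’t know the exact formula behind the map, so the following is only one plausible version of a ‘matches the regional distribution’ score: one minus the total variation distance between the area’s income-band shares and the region’s. The income bands and shares are invented.

```python
def representativeness(area_shares, region_shares):
    """One minus the total variation distance between two distributions.

    1.0 means the area matches the region exactly; lower values mean the
    area's mix of income bands is less like the region's.
    """
    if len(area_shares) != len(region_shares):
        raise ValueError("both distributions need the same income bands")
    distance = 0.5 * sum(abs(a - r) for a, r in zip(area_shares, region_shares))
    return 1.0 - distance


# Invented shares for four income bands, from lowest to highest.
region = [0.2, 0.3, 0.3, 0.2]
typical = [0.2, 0.3, 0.3, 0.2]    # same mix as the region
polarised = [0.5, 0.0, 0.0, 0.5]  # only very low and very high incomes
wealthy = [0.0, 0.1, 0.2, 0.7]    # mostly high incomes

for name, shares in [("typical", typical), ("polarised", polarised), ("wealthy", wealthy)]:
    print(name, round(representativeness(shares, region), 2))
```

Under this version, the polarised area scores low even though it contains both extremes, which is the sense in which the measure is about representativeness more than diversity.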


From Luis Apiolaza, ethnic diversity in schools across the country



This screenshot shows an area in south Auckland, and it illustrates that ‘diversity’ really means ‘diversity’; it’s not just a code word for non-white. The low-diversity schools (white circles) in the lower half of the shot include Westmount School (99% Pākehā), but also Te Kura Māori o Ngā Tapuwae (99% Māori), and St Mary MacKillop Catholic School (90% Pasifika). The high-diversity schools in the top half of the shot don’t have a majority of students from any ethnic group.

December 8, 2014

Political opinion: winning the right battles

From Lord Ashcroft (UK, Conservative) via Alex Harroway (UK, decidedly not Conservative), an examination of trends in UK opinion on a bunch of issues, graphed by whether they favour Labour or the Conservatives, and how important they are to respondents. It’s an important combination of information, and a good way to display it (or it would be if it weren’t a low-quality JPEG).



Ashcroft says

The higher up the issue, the more important it is; the further to the right, the bigger the Conservative lead on that issue. The Tories, then, need as many of these things as possible to be in the top right quadrant.

Two things are immediately apparent. One is that the golden quadrant is pretty sparsely populated. There is currently only one measure – being a party who will do what they say (in yellow, near the centre) – on which the Conservatives are ahead of Labour and which is of above average importance in people’s choice of party.

and Alex expands

When you campaign, you’re trying to do two things: convince, and mobilise. You need to win the argument, but you also need to make people think it was worth having the argument. The Tories are paying for the success of pouring abuse on Miliband with the people turned away by the undignified bully yelling. This goes, quite clearly, for the personalisation strategy in general.