Posts filed under Just look it up (230)

March 20, 2015

Ideas that didn’t pan out

One way medical statisticians are trained into skepticism over their careers is seeing all the exciting ideas from excited scientists and clinicians that don’t turn out to work. Looking at old hypotheses is a good way to start. This graph is from a 1986 paper in the journal Medical Hypotheses, and the authors are suggesting pork consumption is important in multiple sclerosis, because there’s a strong correlation between rates of multiple sclerosis and pork consumption across countries:


This wasn’t a completely silly idea, but it was never anything but suggestive, for two main reasons. First, it’s just a correlation. Second, it’s not even a correlation at the level of individual people — the graph is just as strong support for the idea that having neighbours who eat pork causes multiple sclerosis. Still, dietary correlations across countries have been useful in research.

If you wanted to push this idea today, as a Twitter account claiming to be from a US medical practice did, you’d want to look carefully at the graph rather than just repeating the correlation. There are some countries missing, and other countries that might have changed over the past three decades.

In particular, the graph does not have data for Korea, Taiwan, or China. These have high per-capita pork consumption, and very low rates of multiple sclerosis — and that’s even more true of Hong Kong, and specifically of Chinese people in Hong Kong.  In the other direction, the hypothesis would imply very low levels of multiple sclerosis among US and European Jews. I don’t have data there, but in people born in Israel the rate of multiple sclerosis is moderate among those of Ashkenazi heritage and low in others, which would also mess up the correlations.

You might also notice that the journal is (or was) a little non-standard, or as it said  “intended as a forum for unconventional ideas without the traditional filter of scientific peer review”.

Most of this information doesn’t even need a university’s access to scientific journals — it’s just out on the web.  It’s a nice example of how an interesting and apparently strong correlation can break down completely with a bit more data.
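To see how little it takes, here is a minimal sketch with made-up points (not real country figures): five points lying close to a line, then three extra points with high consumption and low disease rates. The names `pearson_r`, `original` and `extra` are purely illustrative.

```python
from math import sqrt

def pearson_r(points):
    """Pearson correlation for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)

# Hypothetical (consumption, disease rate) pairs, invented for illustration only
original = [(10, 12), (20, 18), (30, 33), (40, 41), (50, 52)]
# Hypothetical "missing" places: high consumption, very low disease rate
extra = [(45, 2), (55, 1), (60, 3)]

r_before = pearson_r(original)
r_after = pearson_r(original + extra)
print(f"correlation with the original points: {r_before:.2f}")
print(f"correlation after adding three more:  {r_after:.2f}")
```

With these invented numbers the correlation goes from nearly perfect to roughly zero, which is the qualitative point; the real data would need to be looked up.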

March 17, 2015

Bonus problems

If you hadn’t seen this graph yet, you probably would have soon.


The claim “Wall Street bonuses were double the earnings of all full-time minimum wage workers in 2014” was made by the Institute for Policy Studies (which is where I got the graph) and fact-checked by the Upshot blog at the New York Times, so you’d expect it to be true, or at least true-ish. It probably isn’t, because the claim being checked was missing an important word and is using an unfortunate definition of another word. One of the first hints of a problem is the number of minimum wage workers: about a million, or about 2/3 of one percent of the labour force. Given the usual narrative about the US and minimum-wage jobs, you’d expect this fraction to be higher.

The missing word is “federal”. The Bureau of Labor Statistics reports data on people paid at or below the federal minimum wage of $7.25/hour, but 29 states have higher minimum wages so their minimum-wage workers aren’t counted in this analysis. In most of these states the minimum is still under $8/hr. As a result, the proportion of hourly workers earning no more than federal minimum wage ranges from 1.2% in Oregon to 7.2% in Tennessee (PDF). The full report, and even the report infographic, say “federal minimum wage”, but the graph above doesn’t, and neither does the graph from Mother Jones magazine (it even omits the numbers of people).

On top of those getting state minimum wage we’re still short quite a lot of people, because “full-time” is defined by 35 or more hours per week at your principal job.  If you have multiple part-time jobs, even if you work 60 or 80 hours a week, you are counted as part-time and not included in the graph.

Matt Levine writes:

There are about 167,800 people getting the bonuses, and about 1.03 million getting full-time minimum wage, which means that ballpark Wall Street bonuses are 12 times minimum wage. If the average bonus is half of total comp, a ratio I just made up, then that means that “Wall Street” pays, on average, 24 times minimum wage, or like $174 an hour, pre-tax. This is obviously not very scientific but that number seems plausible.

That’s slightly less scientific than the graph, but as he says, is plausible. In fact, it’s not as bad as I would have guessed.
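For anyone who wants to check the ballpark, the arithmetic can be sketched as below. The head counts, the "double" ratio and the $7.25 wage are from the text above; the 50% bonus share is Levine's self-confessed made-up number, and the variable names are just for illustration. Converting to an hourly figure also quietly assumes both groups work the same hours.

```python
# Figures quoted in the post
BONUS_RECIPIENTS = 167_800
MIN_WAGE_WORKERS = 1_030_000
TOTAL_BONUS_TO_TOTAL_EARNINGS = 2.0  # "double", per the claim being checked
FEDERAL_MIN_WAGE = 7.25              # dollars per hour
BONUS_SHARE_OF_PAY = 0.5             # Levine's admittedly invented ratio

# Average bonus per recipient relative to average earnings per minimum-wage worker
bonus_ratio = TOTAL_BONUS_TO_TOTAL_EARNINGS * MIN_WAGE_WORKERS / BONUS_RECIPIENTS

# If bonuses are half of total pay, total pay is twice the bonus
pay_ratio = bonus_ratio / BONUS_SHARE_OF_PAY

# Assumes the same hours worked in both groups
implied_hourly = pay_ratio * FEDERAL_MIN_WAGE

print(f"bonus per person is about {bonus_ratio:.1f} times minimum-wage earnings")
print(f"total pay is about {pay_ratio:.1f} times, or ${implied_hourly:.0f}/hour")
print(f"rounding to 12 and 24 first gives ${24 * FEDERAL_MIN_WAGE:.0f}/hour")
```

Without the intermediate rounding the hourly figure comes out a few dollars higher than $174, which makes no difference to the conclusion.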

What’s particularly upsetting is that you don’t need to exaggerate or use sloppy figures on this topic. It’s not even that controversial. Lots of people, even technocratic pro-growth economists, will tell you the US minimum wage is too low.  Lots of people will argue that Wall St extracts more money from the economy than it provides in actual value, with much better arguments than this.

By now you might think to check carefully that the original bar chart is at least drawn correctly.  It’s not. The blue bar is more than half the height of the red bar, not less than half.

March 9, 2015

Not all there

One of the most common problems with data is that it’s not there. Families don’t answer their phones, over-worked nurses miss some forms, and even tireless electronic recorders have power failures.

There’s a large field of statistical research devoted to ways of fixing the missing-data problem. None of them work (that’s not my cynical opinion, that’s a mathematical theorem), but many of them are more likely to make things better than worse. The best way to handle data you don’t have depends on what sort of data and why you don’t have it, but even the best ways can confuse people who aren’t paying attention.

Just ignoring the missing data problem and treating the data you have as all the data is effectively assuming the missing data look just like the observed data. This is often very implausible. For example, in a weight-loss study it is much more likely that people who aren’t losing weight will drop out. If you just analyse data from people who stay in the study and follow all your instructions, unless this is nearly everyone, they will probably have lost weight (on average) even if your treatment is just staring at a container of felt-tip pens.
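A small simulation shows the effect. This is only a sketch under invented assumptions: the treatment does nothing at all, weight changes are normal with a 3 kg standard deviation, and people who gained weight drop out 80% of the time versus 20% for everyone else. None of these numbers come from a real study.

```python
import random

def simulate_weight_loss_study(n=10_000, seed=1):
    """Return (mean change for everyone, mean change for completers) in kg."""
    rng = random.Random(seed)
    all_changes = []
    completer_changes = []
    for _ in range(n):
        # True effect of the "treatment" is zero: changes are centred on 0
        change = rng.gauss(0.0, 3.0)
        all_changes.append(change)
        # Assumed dropout rule: people who gained weight mostly leave
        p_dropout = 0.8 if change > 0 else 0.2
        if rng.random() >= p_dropout:
            completer_changes.append(change)
    mean_all = sum(all_changes) / len(all_changes)
    mean_completers = sum(completer_changes) / len(completer_changes)
    return mean_all, mean_completers

everyone, completers = simulate_weight_loss_study()
print(f"mean change, everyone enrolled: {everyone:+.2f} kg")
print(f"mean change, completers only:   {completers:+.2f} kg")
```

The completers show an average loss of a kilogram or more even though nothing happened on average, purely because of who stayed to be measured.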

That’s why it is often sensible to treat missing observations as if they were bad. The Ministry of Health drinking water standards do this. For example, they say that only 96.7% of New Zealand received water complying with the bacteriological standards. That sounds serious. Of the 3.3% failures, however, more than half (2.0%) were just failures to monitor thoroughly enough, and only 0.1% had E. coli transgressions that were not followed up by immediate corrective action.

From a regulatory point of view, lumping these together makes sense. The Ministry doesn’t want to create incentives for data to ‘accidentally’ go missing whenever there’s a problem. From a public health point of view, though, you can get badly confused if you just look at the headline compliance figure and don’t read down to page 18.
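The difference between the two ways of counting can be sketched as follows. The supplies, populations and statuses are hypothetical, chosen only to make the arithmetic easy to follow; they are not the Ministry's figures.

```python
# Hypothetical water supplies: (name, people served, monitoring result)
supplies = [
    ("A", 5000, "pass"),
    ("B", 3000, "pass"),
    ("C", 1500, "not_monitored"),
    ("D", 500, "fail"),
]

def population_with_status(status):
    """Total people served by supplies with the given status."""
    return sum(people for _, people, s in supplies if s == status)

passed = population_with_status("pass")
failed = population_with_status("fail")
total = sum(people for _, people, _ in supplies)

# Regulatory view: anything not shown to comply counts as a failure
conservative_compliance = passed / total

# Naive view: drop the unmonitored supplies and look only at measured ones
observed_only_compliance = passed / (passed + failed)

print(f"missing counted as failure: {conservative_compliance:.1%}")
print(f"missing ignored:            {observed_only_compliance:.1%}")
```

Neither number is the truth about the unmonitored supplies; the conservative one is just the safer thing to regulate on.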

The Ministry takes a similarly conservative approach to the other standards, and the detailed explanations are more reassuring than the headline compliance figures. There are a small number of water supplies with worrying levels of arsenic — enough to increase lifetime cancer risk by a tenth of a percentage point or so — but in general the biggest problem is inadequate fluoride concentrations in drinking water for nearly half of Kiwi kids.


March 5, 2015

Showing us the money

The Herald is running a project to crowdsource data entry and annotation for NZ political donations and expenses: it’s something that’s hard to automate and where local knowledge is useful. Today, they have an interactive graph for 2014 election donations and have made the data available.


February 25, 2015

Wiki New Zealand site revamped

We’ve written before about Wiki New Zealand, which aims to ‘democratise data’. WNZ has revamped its website to make things clearer and cleaner, and you can browse here.

As I’m a postgraduate scarfie this year, the table on domestic students in tertiary education interested me – it shows that women (grey) are enrolled in greater numbers than men at every single level. Click the graph to embiggen.

Founder Lillian Grace talks about the genesis of Wiki New Zealand here, and for those who love the techy  side, here’s a video about the backend.

February 21, 2015

Another interesting thing about petrol prices

or What I Did At Open Data Day.

The government monitoring data on petrol prices go back to 2004, and while they show their data as time series, there are other ways to look at it.


The horizontal axis is the estimated cost of imported petrol plus all the taxes and levies. The vertical axis is the rest of the petrol price: it covers the cost of hauling the stuff around the country, the cost of running petrol stations, and profit for both petrol stations and companies.
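In other words, each point on the graph is a split of the pump price into two pieces. A minimal sketch of that split, with hypothetical weekly figures in cents per litre (the numbers and field layout are invented, not taken from the monitoring data):

```python
def split_pump_price(pump_price, import_cost, taxes_and_levies):
    """Return (x, y): costs-plus-taxes and the remaining margin, in c/litre."""
    costs_and_taxes = import_cost + taxes_and_levies
    margin = pump_price - costs_and_taxes
    return costs_and_taxes, margin

# Hypothetical weeks: (pump price, import cost, taxes and levies)
weeks = [
    (210.0, 95.0, 85.0),
    (200.0, 85.0, 85.0),
    (185.0, 70.0, 85.0),
]

points = [split_pump_price(*week) for week in weeks]
for x, y in points:
    print(f"costs and taxes {x:.0f}c, margin {y:.0f}c")
```

Plotting the margin against the costs, rather than each against time, is what makes it easy to see whether the margin moves with costs or independently of them.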

There’s an obvious change in 2012. From 2005 to 2012, the importer margin varied around 15c/litre, more or less independent of the costs. From 2012, the importer margin started rising, without any big changes in costs.

Very recently, things changed again: the price of crude oil fell, with the importer margin staying roughly constant and the savings being passed on to consumers. Then the New Zealand dollar fell, and the importer margin has fallen — either the increased costs from the lower dollar are being absorbed by the vendors, or they have been hedged somehow.


February 18, 2015

Petrol prices

From time to time I like to remind people about the national petrol price monitoring program. For example, when there’s a call for a review of fuel prices.

The Ministry of Business, Innovation & Employment (Economic Development Information) carries out weekly monitoring of “importer margins” for regular petrol and automotive diesel.  The weekly oil prices monitoring report is reissued each week with the previous week’s data.

The importer margin is the amount available to retailers to cover domestic transportation, distribution and retailing costs, and profit margins.

The purpose of this monitoring is to promote transparency in retail petrol and diesel pricing and is a key recommendation from the New Zealand Petrol Review

The importer margin for petrol over the past three years looks like this:


The wiggly blue line is the week-by-week estimated margin; the shaded area is centered around the red trend line and covers 50% of the data. The margin had been going up; the calls for a review came just after it plummeted.

At the same site, but updated only quarterly, is an international comparison of the cost of fuel broken down into tax and everything else.


January 29, 2015

Absolute risk/benefit calculators

An interesting interactive calculator for heart disease/stroke risk, from the University of Nottingham. It lets you put in basic, unchangeable factors (age, race, sex), modifiable factors (smoking, diabetes, blood pressure, cholesterol), and then one of a set of interventions.

Here’s the risk for an imaginary unhealthy 50-year-old taking blood pressure medications:


The faces at the right indicate 10-year risk. Without the unhealthy risk factors, if you had 100 people like this, one would have a heart attack, stroke, or heart disease death over ten years. With the risk factors and treatment, four would have an event (the pink and red faces). The treatment would prevent five events in 100 people, represented by the five green faces.
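The faces translate directly into the usual absolute-risk summaries. A short sketch using the counts described above (four events with treatment, five prevented, per 100 people over ten years); the untreated count of nine is inferred by adding those two, not read off the calculator.

```python
PEOPLE = 100
events_with_treatment = 4   # pink and red faces
events_prevented = 5        # green faces

# Inferred: without treatment, the prevented events would also have happened
events_without_treatment = events_with_treatment + events_prevented

absolute_risk_reduction = events_prevented / PEOPLE
relative_risk_reduction = events_prevented / events_without_treatment
number_needed_to_treat = PEOPLE / events_prevented

print(f"10-year risk untreated: {events_without_treatment / PEOPLE:.0%}")
print(f"10-year risk treated:   {events_with_treatment / PEOPLE:.0%}")
print(f"absolute risk reduction: {absolute_risk_reduction:.0%}")
print(f"relative risk reduction: {relative_risk_reduction:.0%}")
print(f"number needed to treat for ten years: {number_needed_to_treat:.0f}")
```

The relative reduction sounds more impressive than the absolute one, which is the reason calculators like this show counts of people rather than percentages of risk.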

There’s a long list of possible treatments in the middle of the page, with the distinctive feature that most of them don’t appear to reduce risk, from the best evidence available. For example, you might ask what this guy’s risk would be if he took vitamin and fish oil supplements. Based on the best available evidence, it would look like this:



The main limitation of the app is that it can’t handle more than one treatment at a time: you can’t look at blood pressure meds and vitamins, just at one or the other.

(via @vincristine)

January 20, 2015

Is it misleading to say a majority of US public school kids live in poverty?


Well, no.

Ok, yes, maybe.

This was the Washington Post headline: “Majority of U.S. public school students are in poverty”. It hasn’t made the NZ media, but some of you probably read about the rest of the world occasionally and might have seen it.

The original source, a report from the Southern Education Foundation, is careful not to use the word “poverty”. They say 51% of public school students are low-income, defined as receiving free or subsidised school meals. There’s a standard US government definition of poverty, used in defining eligibility for social programs, and by that definition 51% of public school students come from households with income less than 1.85 times the threshold for poverty. The report also says what proportion get free school meals, for which the threshold is 1.3 times the poverty line, and it’s 44%.

They don’t give the proportion under the official poverty line. If the exact figure mattered for this post I could probably work it out from the American Community Survey, but since only about 10% of US kids are in private schools after kindergarten and before college, it’s going to be in the same ballpark as the proportion for all children — 22%.   It’s hard to see it being more than 30%.

On the other hand, the US has an unusual official definition of poverty.  In most Western countries, the poverty line is a set fraction (often 60%) of the median household income (adjusted somehow for household size). The US uses the price of a fixed set of foodstuffs and an estimate of what fraction of income goes on food, defined in 1963-4 and then updated using the CPI (actually, that’s what the Census Bureau uses, the rest of the government uses a simplified version of the same thing).  If you defined poverty by 60% of median household income, you’d come pretty close to the subsidized-meals threshold.  That is, defining poverty the way most other Western countries do, the headline is close to being correct.
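A relative poverty line of this kind is easy to compute once you have incomes. Here is a minimal sketch with invented household incomes; it skips the household-size adjustment mentioned above, and the 60% fraction is the "often" figure from the text rather than any particular country's rule.

```python
from statistics import median

def relative_poverty_line(incomes, fraction=0.6):
    """Poverty line as a fixed fraction of median household income."""
    return fraction * median(incomes)

# Hypothetical household incomes in dollars, not survey data
incomes = [12_000, 18_000, 25_000, 31_000, 40_000,
           52_000, 60_000, 75_000, 90_000, 150_000]

line = relative_poverty_line(incomes)
below = [income for income in incomes if income < line]
share_below = len(below) / len(incomes)

print(f"median income: ${median(incomes):,.0f}")
print(f"poverty line at 60% of median: ${line:,.0f}")
print(f"share of households below the line: {share_below:.0%}")
```

Because the line moves with the median, this definition measures how far the bottom is from the middle, whereas the US basket-of-food definition measures distance from a fixed standard of living.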

On the other other hand, the Washington Post is a  US newspaper.  If you’re writing for the Post and you think it’s unreasonable to define ‘poverty’ to exclude a US family of three with an income (including cash benefits) of $20,000, I have some sympathy for your position. I still think you need to say your definition is different from the official one and wasn’t used by your source.

December 27, 2014

The Lesser Spotted Hutt Man Drought

From the Christmas Eve edition of the Upper Hutt Leader, which you can read online:

Ladies, be warned — Upper Hutt is in  the grip of a man drought

Here’s the graph to prove it (via Richard Law, on Twitter)



As the graph clearly indicates, women outnumber men hugely in the 25-35 age range, and (of course) at the oldest ages. The problem is, the y-axis starts at 45%. For lines or points that’s fine, but for bar charts it isn’t — because the bars connect the points to the x-axis.
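The size of the distortion is simple to work out: a bar's drawn height is its value minus wherever the axis starts. A sketch with hypothetical percentages (52% and 48% are invented for illustration, not read off the paper's graph):

```python
def drawn_height_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of the drawn bar heights when the axis starts at axis_start."""
    return (value_a - axis_start) / (value_b - axis_start)

# Hypothetical split in one age group: 52% women, 48% men
women, men = 52.0, 48.0

honest = drawn_height_ratio(women, men, axis_start=0.0)
truncated = drawn_height_ratio(women, men, axis_start=45.0)

print(f"axis from 0%:  women's bar is {honest:.2f} times the men's")
print(f"axis from 45%: women's bar is {truncated:.2f} times the men's")
```

With the axis starting at 45%, a four-point gap is drawn as one bar more than twice the height of the other.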

This is Stats New Zealand’s version of the graph, in standard ‘population pyramid’ form. It’s much less dramatic.


We could try a bar chart with the axis at zero:


It’s still much less dramatic — and you can see why the paper chopped the ages off at 75, since using the full range available in the data wouldn’t have fit on their axes.  The y-axis wasn’t just trimmed to fit the data; it was trimmed beyond the data.

You could make a case that ‘zero’ in this example is actually 50%: we (well, not we, but journalists who have to fill space) care about the deficiency or surplus of members of the appropriate sex.


Or, you could look at deficiency or surplus of individuals, rather than percentages:


Using individuals makes the younger age groups look more important, which helps the story, but on the other hand shows that the scale of this natural disaster isn’t all that devastating.

That’s basically what the expert quoted in the story says. Prof Garth Fletcher, from VUW, says

“People in Upper Hutt or Lower Hutt, they go to parties, they go to bars, they go to places in the wider Wellington area.”

It was only when you started having a gap between men and women of more than 5 or 10 percent that there would be real world implications, he said.


[Update: My data and graphs are for Upper Hutt (city). That’s about 2/3 of the Rimutaka electorate, which is what the paper’s data cover.]