Last week’s Stat of the Week nomination for the Northern Advocate didn’t, we thought, point out anything particularly egregious. However, it did prompt me to read the story; I’d previously seen only the headline 22% statistic on Twitter. The story starts:
Northland is in “crisis” as 22 per cent of students from schools surveyed turn up without any or very little lunch, according to the Te Tai Tokerau Principals Association.
‘Surveyed’ is presumably a gesture in the direction of the non-response problem: the figure is based on information from about a third of schools, which the story makes clear. And it’s not as if the number actually matters: the Te Tai Tokerau Principals Association basically says it would still be a crisis if the true figure were only a third as large (ie, if there were no cases at all in the schools that didn’t respond), and the Government isn’t interested in the survey.
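The "a third as large" bound is simple arithmetic: if no student at a non-responding school lacked lunch, the overall rate is the observed rate times the share of students covered by the survey. The sketch below assumes, hypothetically, that the responding schools hold about a third of all students, matching the roughly one-third share of schools; the function name is illustrative only.

```python
def worst_case_rate(observed_rate, responding_share):
    """Overall rate if no student at a non-responding school lacked lunch.

    responding_share is the fraction of all students enrolled at schools
    that responded. ASSUMPTION: about 1/3, mirroring the share of schools.
    """
    return observed_rate * responding_share


print(f"{worst_case_rate(0.22, 1 / 3):.1%}")  # 7.3%
```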
More evidence that the number doesn’t matter is that no-one seems to have done the simple arithmetic. Later in the story we read:
The schools surveyed had a total of 7352 students. Of those, 1092 students needed extra food when they came to school, he said.
If you divide 1092 by 7352 you don’t get 22%. You get 15%. There isn’t enough detail to be sure what happened, but a plausible explanation is that 22% is the simple average of the proportions in the schools that responded, ignoring the varying numbers of students at each school.
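A short sketch of the two calculations. The first line uses the totals from the story (1092/7352 is 14.9%, which rounds to 15%). The three schools after it are made-up numbers, not survey data, chosen to show how a few small schools with high rates can pull a simple average of school proportions well above the pooled, per-student proportion. The helper names are hypothetical.

```python
def pooled_proportion(schools):
    """Share of all students needing food: every student counts equally."""
    students = sum(n for n, _ in schools)
    needing = sum(k for _, k in schools)
    return needing / students


def mean_of_school_proportions(schools):
    """Unweighted average of each school's proportion: every school counts equally."""
    return sum(k / n for n, k in schools) / len(schools)


# Totals quoted in the story.
story_rate = 1092 / 7352
print(f"story totals: {story_rate:.1%}")  # 14.9%

# HYPOTHETICAL (students, needing food) for three schools, not survey data:
# two small schools with high rates and one large school with a low rate.
schools = [(40, 16), (60, 18), (900, 90)]
print(f"pooled:          {pooled_proportion(schools):.1%}")           # 12.4%
print(f"mean of schools: {mean_of_school_proportions(schools):.1%}")  # 26.7%
```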
The other interesting aspect of this survey (again, if anyone cared) is that we know a lot about schools and so it’s possible to do a lot to reduce non-response bias. For a start, we know the decile for every school, which you’d expect to be related to food provision and potentially to response. We know location (urban/rural, which district). We know which are State Integrated vs State schools, and which are Kaupapa Māori. We know the number of students and statistics about ethnicity. Lots of stuff.
As a simple illustration, here’s how you might use decile and district information. In the Far North district there are (using Wikipedia because it’s easy) 72 schools. That’s 22 in decile one, 23 in decile two, 16 in decile three, and 11 in deciles four and higher. If you get responses from 11 of the decile-one schools and only 4 of the decile-three schools, you need to give each student in those decile-one schools a weight of 22/11=2 and each student in the decile-three schools a weight of 16/4=4. To the extent that decile predicts shortage of food you will increase the precision of your estimate, and to the extent that decile also predicts responding to the survey you will reduce the bias.
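A minimal sketch of the decile weighting. The Far North school counts and the 11 and 4 responding schools come from the paragraph above; the student totals are HYPOTHETICAL, for illustration only. Because every student in a stratum gets the same weight, it is enough to weight each stratum's summed totals. With responses from only deciles one and three, the estimate covers just those two strata; a district-wide figure would need responses in every stratum. The function names are made up for this example.

```python
# Far North district: number of schools in each decile stratum, using the
# Wikipedia counts quoted in the post.
POPULATION_SCHOOLS = {"1": 22, "2": 23, "3": 16, "4+": 11}


def stratum_weights(population, responding):
    """Weight N_h / n_h for every stratum h with at least one responding school."""
    return {h: population[h] / n for h, n in responding.items() if n > 0}


def weighted_proportion(stratum_totals, weights):
    """Weighted proportion of students needing food.

    stratum_totals maps stratum -> (students, needing), summed over the
    responding schools in that stratum. Every student in a stratum shares
    the same weight, so weighting the totals equals weighting each student.
    """
    needing = sum(weights[h] * k for h, (_, k) in stratum_totals.items())
    students = sum(weights[h] * n for h, (n, _) in stratum_totals.items())
    return needing / students


# Response counts for deciles one and three are the ones used in the post.
responding = {"1": 11, "3": 4}
weights = stratum_weights(POPULATION_SCHOOLS, responding)
print(weights)  # {'1': 2.0, '3': 4.0}

# HYPOTHETICAL (students, needing food) totals for the responding schools.
totals = {"1": (2000, 500), "3": (800, 80)}
unweighted = weighted_proportion(totals, {h: 1.0 for h in totals})
weighted = weighted_proportion(totals, weights)
print(f"unweighted: {unweighted:.1%}")  # 20.7%
print(f"weighted:   {weighted:.1%}")    # 18.3%
```

In this made-up example the decile-three schools are under-represented among respondents, so their lower rate gets twice the weight of the decile-one rate, pulling the estimate down.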
This basic approach is common in opinion polls. It’s the reason, for example, that the Green Party’s younger, mobile-phone-using support isn’t massively underestimated in election polls. In opinion polls, the main constraint on this reweighting technique is how little individual-level information is available for the whole population. In surveys of schools there’s a huge amount of information available, and the limit is sample size.