Posts written by Thomas Lumley (681)

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with.

May 13, 2013

Your guess is as good as ours

There’s currently discussion in NZ about whether to change the 5-yearly census.  North America is providing some examples of what not to do.

Canada decided a while back that they were going to chop most of the questions off the census and put them in a new survey.  The new survey is still sent to everyone, but is voluntary — the worst of both worlds, since a much smaller survey would allow for more effort per respondent in follow-up. Frances Woolley compares the race/ethnicity data from the 2006 Census and the new survey: the survey is dramatically overcounting minorities.

In the USA, a Republican congressman has proposed a bill that would stop the Department of Commerce and the Census Bureau from collecting basically anything other than the census. That would wipe out the American Community Survey, the detailed 1%/year sample that provides a wide range of regional data. It would also wipe out the Current Population Survey, used to estimate the unemployment rate. Fortunately for the US economy, there’s no chance of this bill becoming law: the business community hates it, and the Senate will never pass it. It’s still worrying that there’s a public-opinion advantage in pretending you want to abolish the government’s economic data collection.

May 12, 2013

Briefly

A simple exercise with numbers

Stuff has a headline “Shoplifters cost $1b as staff theft soars”. Let’s think about what we would need to know to interpret this number, and what we actually get told.

First, we note that nowhere in the story is there any evidence or informed opinion presented that staff theft has increased, just that it is high.  Also,  the $1 billion figure is fairly weak — the Retailers Association of New Zealand estimates $2 million per day, which is rounded up to $750 million per year, and then to ‘up to $1 billion’.

We don’t get told how this number is estimated: is it actual reports of theft, or imbalances between stock bought and stock sold, or just an impression from the retailers? Is it based on a representative survey, on informed opinion, or on some sort of bogus poll? Is the cost based on actual wholesale costs paid by the retailer or is it inflated to include the anticipated retail price if the stuff had been sold? Does it include all retailers, or just members of the Retailers Association of New Zealand? Don’t wholesalers also have this problem? We might hope that the Retailers Association website had some more details, but its press release and media log pages only go up to May 1.

If we were to stipulate the number for the purposes of analysis, does it sound plausible?  Unfortunately, as part of Statistics New Zealand’s ongoing endeavour to deliver a better web experience they are doing maintenance on their servers today, so the quality of my sources may not be up to standard. Still, the University of Auckland career planning site says that retail employs about 265 000 people in NZ.  If half the theft is by staff, that’s about $1900 average per year — and if, say, as many as 75% of them are honest, that would be about $7500 for the others, which seems a bit high.
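The staff-theft arithmetic is easy to check. This is only a sketch using the figures quoted above ($1 billion in total, half of it attributed to staff, 265 000 retail employees, 25% of them dishonest); the function name is made up for illustration.

```python
def theft_per_person(total_dollars, share, people, fraction_involved=1.0):
    """Average annual theft per person involved.

    share: fraction of the total attributed to this group
    fraction_involved: fraction of the group assumed to be stealing
    """
    return total_dollars * share / (people * fraction_involved)

# Half of $1 billion spread over every retail employee.
staff_average = theft_per_person(1e9, 0.5, 265_000)

# The same half spread over only the 25% assumed to be dishonest.
dishonest_average = theft_per_person(1e9, 0.5, 265_000, fraction_involved=0.25)

print(round(staff_average), round(dishonest_average))
```

Both results land close to the rounded figures in the text, a bit under $1900 and a bit over $7500.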

The other half of the billion dollars, attributed to shoplifting rather than staff theft, would be an average of $2000/year if spread over 5% of the population, which also seems a bit high. Maybe I’m just naive and innocent about this, but the worst incident quoted in the story was $20000 by four people ($5000 each), and the next worst was $1100. You’d think there would be better examples.
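The shoplifting half can be sketched the same way. The population figure is not given in the post; the 4.4 million used here is an assumption (roughly the NZ population in 2013), so treat the result as an order-of-magnitude check only.

```python
# ASSUMPTION: NZ population of about 4.4 million; not stated in the post.
NZ_POPULATION = 4_400_000

shoplifting_total = 1e9 * 0.5             # half of the $1 billion
shoplifters = 0.05 * NZ_POPULATION        # 5% of the population
average_per_shoplifter = shoplifting_total / shoplifters

# Worst incident quoted in the story: $20000 shared by four people.
worst_incident_each = 20_000 / 4

print(round(average_per_shoplifter), worst_incident_each)
```

With that population the average comes out a little above $2000 per shoplifter per year, the same ballpark as the rounded figure in the text.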

The same University of Auckland page says gross revenue in retail is $65 billion/year, so $1 billion would be 1.5% of that. The Retailers Association has a report (p15) saying that net margins are about 2-3% averaged over the industry, so if the $1 billion were real costs, it would mean the industry is losing more than a third of its profits to theft. You’d think that would be the headline, if it were true.
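A short sketch of the revenue and margin comparison, using only the numbers above. It assumes the 2-3% net margin is what is left after theft, so the profit without theft would be the net profit plus the $1 billion; that reading is an assumption, not something the report is known to say.

```python
revenue = 65e9   # gross retail revenue per year
theft = 1e9      # the headline figure

theft_share_of_revenue = theft / revenue

def share_of_profit_lost(net_margin):
    """Fraction of would-be profit lost to theft, if net_margin is measured after theft."""
    net_profit = net_margin * revenue
    return theft / (theft + net_profit)

lost_at_2_percent = share_of_profit_lost(0.02)
lost_at_3_percent = share_of_profit_lost(0.03)

print(f"{theft_share_of_revenue:.1%} of revenue")
print(f"{lost_at_3_percent:.0%} to {lost_at_2_percent:.0%} of profit lost")
```

Under that assumption the loss is more than a third of profit across the whole 2-3% range.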

May 10, 2013

Good information design

The NZ stock exchange front page:

[Image: mrp]

They know what their visitors are looking for, and they make it easy to find. (via @lyndonhood)

Briefly

  • Forbes has a profile of a soon-to-be billionaire statistician, Dennis Gillings.  He basically invented the commercial clinical research model, and his company, Quintiles, is going public. 
  • The New York Times has a story about data(!) and science(!) being used to modify Hollywood scripts.  As Matt Yglesias points out, the studios can’t really take it that seriously or they’d be paying more than $20 000 for the service.
  • Some Big Data backlash, from Quartz. Most data isn’t big, most data isn’t very good quality, and most businesses are in more need of expertise on data analysis than on large-scale computing.

May 9, 2013

Counting signatures

A comment on the previous post about the asset-sales petition asked how the counting was done: the press release says

Upon receiving the petition the Office of the Clerk undertook a counting and sampling process. Once the signatures had been counted, a sample of signatures was taken using a methodology provided by the Government Statistician.

It’s a good question and I’d already thought of writing about it, so the commenter is getting a temporary reprieve from banishment for not providing a full name.  I don’t know for certain, and the details don’t seem to have been published, which is a pity — they would be interesting and educationally useful, and there doesn’t seem to be any need for confidentiality.

While I can’t be certain, I think it’s very likely that the Government Statistician provided the estimation methodology from Statistics New Zealand Working Paper No 10-04, which reviews and extends earlier research on petition counting.

There are several issues that need to be considered:

  • removing signatures that don’t come with the required information
  • estimating the number of eligible vs ineligible signatures
  • estimating the number of duplicates
  • estimating the margin of error in the estimate
  • deciding what level of uncertainty is acceptable

The signatures without the required information are removed completely; that’s not based on sampling. Estimating eligible vs ineligible signatures is fairly easy by checking a sufficiently large random sample. In fact, they use a systematic sample, taking names at regular intervals through the petition list, which tends to give more precise results and to be more auditable.

Estimating unique signatures is tricky, because if you halve your sample size, you expect to see 1/4 as many duplicates, 1/8 as many triplicates, and so on. The key part of the working paper shows how to scale up the sample data on eligible, ineligible, and duplicate, triplicate, etc, signatures to get the unique unbiased estimator of the number of valid signatures and its variance.
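The 1/4 and 1/8 scaling can be illustrated with a simplified sketch. It assumes each signature is sampled independently with the same probability, which is not the systematic sample actually used, and the scale-up function ignores triplicates and ineligible signatures, so it is not the working paper's estimator. All names and numbers are hypothetical.

```python
def expected_in_sample(count_in_petition, k, sample_fraction):
    """Expected number of k-fold repeats with all k copies in the sample,
    if each signature is sampled independently with probability sample_fraction."""
    return count_in_petition * sample_fraction ** k

def scale_up_pairs(pairs_in_sample, sample_fraction):
    """Estimate duplicate pairs in the whole petition from pairs seen in the sample
    (simplified: assumes nobody signed more than twice)."""
    return pairs_in_sample / sample_fraction ** 2

# Hypothetical petition with 1000 duplicated and 1000 triplicated signers.
pairs_half = expected_in_sample(1000, 2, 0.5)
pairs_quarter = expected_in_sample(1000, 2, 0.25)
triples_half = expected_in_sample(1000, 3, 0.5)
triples_quarter = expected_in_sample(1000, 3, 0.25)

print(pairs_quarter / pairs_half, triples_quarter / triples_half)

# Hypothetical: 50 duplicate pairs found in a 10% sample.
estimated_pairs = scale_up_pairs(50, 0.1)
```

Halving the sampling fraction divides the expected pairs by 4 and the expected triples by 8, which is why a few duplicates in a small sample imply many more in the full petition.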

Once the level of uncertainty is specified, the formulas tell you what sample size to verify and what to do with the results.  I don’t know how the sample size is chosen, but it wouldn’t take a very large sample to get the uncertainty down to a few thousand, which would be good enough.   In fact, since the methodology is public and the parties have access to the electoral roll in electronic form, it’s a bit surprising that the petition organisers didn’t run a quick check themselves before submitting it.

May 8, 2013

Does emergency hospital choice matter?

The Herald has a completely over-the-top presentation of what might be an important issue. The headline is “Hospital choice key to kids’ survival”, and the story starts off

Where ambulances take badly injured children first seems to affect their chances, paediatric surgeons say.

Starship children’s hospital surgeons have found that sending badly injured children to the wrong hospital may be contributing to a child death rate from injuries that is twice the rate of Australia’s.

The data:

Six (7 per cent) of the 88 children who went first to Middlemore died, but so did one (8 per cent) of the 12 who went directly to Starship.

That is, to the extent the data tell us anything, the evidence is against the headline.  Of course, the uncertainties are huge: a 95% confidence interval for the relative odds of dying after being sent to Middlemore goes from a 40-fold decrease to a 12-fold increase.  There’s basically no information in the survival data.
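A rough way to see how little the data say is a normal-approximation (Woolf) interval for the odds ratio, computed from the counts in the quote. This is a sketch and not necessarily the method behind the interval quoted above; with counts this small an exact method is preferable and will give different, typically wider, limits.

```python
import math

def odds_ratio_wald_ci(deaths_1, survivors_1, deaths_2, survivors_2, z=1.96):
    """Odds ratio (group 1 vs group 2) with an approximate 95% Wald interval."""
    odds_ratio = (deaths_1 * survivors_2) / (survivors_1 * deaths_2)
    # Standard error of the log odds ratio (Woolf's formula).
    se = math.sqrt(1 / deaths_1 + 1 / survivors_1 + 1 / deaths_2 + 1 / survivors_2)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Middlemore: 6 deaths among 88 children; Starship: 1 death among 12.
estimate, lower, upper = odds_ratio_wald_ci(6, 88 - 6, 1, 12 - 1)
print(f"odds ratio {estimate:.2f}, approximate 95% CI {lower:.2f} to {upper:.1f}")
```

Even this cruder interval covers everything from a large decrease to a large increase in the odds of death, so the conclusion that the survival data carry almost no information does not depend on the choice of method.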

So, how much of the two-fold higher rate of death in NZ compared to Australia could reasonably be explained by suboptimal hospital choice? One of the surgeons involved in the study says

… overseas research showed that a good trauma protocol system could cut the death rate for injured adults by 20 to 30 per cent, but there was no good data for children.

That is, hardly any of the difference between NZ and Australia — especially as this specific hospital-choice issue only applies to one sector of one city in New Zealand, with less than 10% of the national population.
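To put rough numbers on that, here is a sketch under two loudly stated assumptions: that the 20 to 30 per cent figure for adults carries over to children, and that injury deaths are spread in proportion to population, so at most 10% of them are in the affected area.

```python
def national_reduction(local_reduction, population_share):
    """Fractional drop in national deaths if only one area improves."""
    return local_reduction * population_share

# Best and worst case from the quoted adult research, applied to under 10% of NZ.
low = national_reduction(0.20, 0.10)
high = national_reduction(0.30, 0.10)

# Matching Australia would mean halving the NZ rate.
needed = 1 - 1 / 2

print(f"possible: {low:.0%} to {high:.0%}; needed: {needed:.0%}")
```

On those assumptions the hospital-choice issue could account for a reduction of a few per cent, against the 50% needed to close the gap.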

On the other hand, we see

The head of Starship’s emergency department, Dr Mike Shepherd, said the major factors contributing to New Zealand’s high fatal injury rate for children lay outside the hospital system in policies such as driver blood-alcohol limits, graduated driver licensing, and laws requiring children’s booster seats and swimming pool fences.

That sounds plausible, but if it’s the whole story you would expect high levels of non-fatal as well as fatal injuries. The overall rate of hospitalisations for injuries in children 0-14 years is almost identical in NZ (1395 per 100 000 per year, p29) and Australia (‘about’ 1500 per 100 000 per year, page v).

May 7, 2013

Not adding up

As you know, the petition for a referendum over asset sales has not reached its goal yet, due to lots of invalid signatures. This is not a new problem — the petition over the anti-smacking law initially had 17% invalid signatures and also fell short of its threshold on the first round — but it does seem to be worse than usual.

3News displayed this graph of the shortfall

[Image: petition shortfall]

It seemed to me that the 16,500 bar was a bit wider than I’d expect, so I checked on the video from the website.  On my screen capture, which I think is what you get if you click on the image, the black bar has 872 signatures per pixel, the blue bar has 1018 signatures per pixel, the whole red bar has 535 signatures per pixel, and the 16500 shortfall has 232 signatures per pixel.  That is, the vertical scale for the shortfall is about four times that for the valid signatures.
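The scale comparison is just a set of ratios of the signatures-per-pixel figures above. A small sketch, with variable names chosen for illustration:

```python
signatures_per_pixel = {
    "black": 872,
    "blue": 1018,
    "red": 535,
    "shortfall": 232,
}

# How many times more screen height a signature gets in the shortfall bar.
exaggeration = {
    bar: scale / signatures_per_pixel["shortfall"]
    for bar, scale in signatures_per_pixel.items()
}

# Approximate pixel height of the 16500 shortfall as drawn, and at the blue bar's scale.
shortfall_drawn = 16_500 / signatures_per_pixel["shortfall"]
shortfall_at_blue_scale = 16_500 / signatures_per_pixel["blue"]

for bar, factor in exaggeration.items():
    print(f"{bar}: {factor:.1f}x")
print(round(shortfall_drawn), round(shortfall_at_blue_scale))
```

Against the blue and black bars the shortfall is drawn at roughly four times the scale, so it takes up about 71 pixels where about 16 would be proportionate to the blue bar.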

I’m really not accusing 3News of deliberately distorting the numbers: it looks to me as if the shortfall bar has been made the right height to contain its text, that the blue+red bars’ height is scaled to the available screen real estate, and that the black bar is scaled to the total blue+red height. But it’s a pity that the result is to amplify the visual size of the shortfall, and if the visual size weren’t important the graph would be a complete waste of time.

Scaled in proportion, the bars look like this

[Image: shortfall]

Video on randomised trials — in poverty relief

The organisation Innovations for Poverty Action have made a neat little animated video explaining how and why they do randomised controlled trials of poverty-relief programs.

Modestly significant

From a comment piece in Stuff, by Bruce Robertson (of Hospitality NZ)

In the past five years, the level of hazardous drinking has significantly decreased for men (from 30 per cent to 26 per cent) and marginally decreased for women (13 per cent to 12 per cent).

There was a modest but important drop in the rates of hazardous drinking among Maori adults, with the rate falling from 33 per cent to 29 per cent in the latest survey.

As @tui_talk pointed out on Twitter, that’s a four percentage point decrease described as “significant” for men and “modest” for Maori.

At first I thought this might be a confusion of “statistically significant” with “significant”, with the decrease in men being statistically significant but the difference in Maori not, but in fact the MoH report being referenced says (p4)

As a percentage of all Māori adults, hazardous drinking patterns significantly decreased from 2006/07 (33%) to 2011/12 (29%).