July 5, 2012

# Denominators yet again

Tony Cooper, in a Stat of the Week nomination, points us to the Herald’s headline

# “Men over 50 nation’s biggest drinkers”

When you look at the body of the text, though, the data only say that men over 50, in aggregate, drank more than the other subgroups of the population.  That’s somewhat relevant if you are planning a sales campaign, in which case the Roy Morgan report might be useful.  It doesn’t tell you which group are the biggest drinkers, because that depends on per-person alcohol consumption.

As two of the experts actually quoted in the story said, men over 50 accounted for the largest chunk of the booze because there are a lot of men over 50, not because they are the heaviest drinkers.  A little simple arithmetic shows that, per person, men over 50 drank less than men 35-50, and less than men 25-34.  Not doing the arithmetic is one thing, but it really doesn’t look good when the headline also directly contradicts what your sources are quoted as telling you.
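As a rough illustration of that arithmetic, here is a minimal Python sketch. The per-person comparison divides each group's share of the drinks by its share of the population. The function name and the shares below are made up for illustration; they are not the Roy Morgan or census figures.

```python
def relative_consumption(share_of_drinks, share_of_population):
    """Share of all drinks consumed by a group, divided by the group's
    share of the population. A value above 1 means the group drinks more
    per person than the population average."""
    if not 0 < share_of_population <= 1:
        raise ValueError("share_of_population must be in (0, 1]")
    return share_of_drinks / share_of_population


# Hypothetical shares, chosen only to show the effect of the denominator:
# (share of all drinks, share of the population)
groups = {
    "large group": (0.30, 0.25),
    "small group": (0.20, 0.125),
}

ratios = {name: relative_consumption(d, p) for name, (d, p) in groups.items()}

for name, ratio in ratios.items():
    print(f"{name}: relative consumption {ratio:.2f}")
```

In this made-up example the large group accounts for the biggest chunk of the drinks, but the small group has the higher per-person ratio, which is the distinction the headline misses.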

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

• Seeing as this is all based on sample data, there is a dead heat between the groups of men 25-34, 35-49 and 50+.

And if you are being picky about denominators, Tony’s calculations are a little wrong — the survey was of the 18+ population, not the total population.

It’s also worth noting that the question here is about “glasses of alcoholic drink” but says nothing about the potency of those drinks, so it is not tailored to answer the heaviest-drinker question to start with.

• Thomas Lumley

The denominators will have the right relation to each other still.

The “glasses of alcoholic drink” point is good — the survey doesn’t even try to do what the Herald wants it to do — but the differences in potency must be evened out somewhat by differences in drink size.

• Thomas Lumley

And I don’t think it’s a dead heat: the sample size is 11,000, so the standard error is about 1% of the standard deviation. That should be sufficient to distinguish these groups, though you’d need the actual standard deviations to be sure.
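The "about 1%" figure is just the usual result that the standard error of a mean is the standard deviation divided by the square root of the sample size. A small Python check, assuming a simple random sample (the real survey design may differ):

```python
import math


def se_as_fraction_of_sd(n):
    """Standard error of a sample mean, expressed as a fraction of the
    population standard deviation, assuming simple random sampling."""
    return 1 / math.sqrt(n)


fraction = se_as_fraction_of_sd(11_000)
print(f"{fraction:.4f}")  # about 0.0095, i.e. roughly 1% of the SD
```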

• So 2 standard errors is around 1%. Add that to the uncertainty in the size of each group in the population (the ratio of interest is proportion drunk by group/relative size of group) and you can’t tell these numbers apart.

Using 2006 census data for population sizes, I get relative consumptions of 1.56 for the 25-34 age group, 1.50 for 35-49 and 1.52 for 50+.

• Thomas Lumley

The sampling error in the number of people in the group is positively correlated with the sampling error in the total amount drunk, so the errors don’t add: the error in the ratio is not that much larger than the error in the numerator.

But still, you’re probably right that they are indistinguishable. If the number of drinks per person were Poisson-distributed, a simulation gives a standard error of 0.4 percentage points on the original scale, which translates to about 0.03 on the relative consumption scale.
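A simulation along those lines might look like the sketch below. It is not the simulation actually run for this comment: the group fraction and the Poisson means are hypothetical placeholders, and only the sample size of 11,000 comes from the discussion. Each replicate draws a fresh sample, so the number of group members in the sample varies along with the total drunk, and the standard deviation of the group's share of drinks across replicates estimates its standard error.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_share_se(n=11_000, group_frac=0.20, mean_group=6.0,
                      mean_other=3.5, reps=50, seed=1):
    """Estimate the standard error of one group's share of all drinks.

    group_frac, mean_group and mean_other are made-up values, not survey
    estimates. Group membership is random in every replicate.
    """
    rng = random.Random(seed)
    shares = []
    for _ in range(reps):
        group_drinks = other_drinks = 0
        for _ in range(n):
            if rng.random() < group_frac:
                group_drinks += poisson(rng, mean_group)
            else:
                other_drinks += poisson(rng, mean_other)
        shares.append(group_drinks / (group_drinks + other_drinks))
    return statistics.stdev(shares)


se_share = simulate_share_se()
# Dividing by the group's (assumed known) population share moves the
# standard error onto the relative consumption scale.
se_relative = se_share / 0.20
print(f"SE of share: {se_share:.4f}; SE of relative consumption: {se_relative:.3f}")
```

The answer depends on the assumed means and group size, so this only indicates the order of magnitude. Real drink counts are likely to be more variable than Poisson, which would make the standard errors larger.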

So the basic conclusion is that to the (very limited) extent we can believe the data at all, there’s not much difference by age.