Posts filed under Correlation vs Causation (53)

January 14, 2014

Causation, counterfactuals, and Lotto

A story in the Herald illustrates a subtle technical and philosophical point about causation. One of Saturday’s Lotto winners says

“I realised I was starving, so stopped to grab a bacon and egg sandwich.

“When I saw they had a Lotto kiosk, I decided to buy our Lotto tickets while I was there.

“We usually buy our tickets at the supermarket, so I’m glad I followed my gut on this one,” said one of the couple, who wish to remain anonymous.

Assuming it was a random pick, it’s almost certainly true that if they had not bought the ticket at that Lotto kiosk at that time, they would not have won.  On the other hand, if Lotto is honest, buying at that kiosk wasn’t a good strategy — it had no impact on the chance of winning.

There is a sense in which buying the bacon-and-egg sandwich was a cause of the win, but it’s not a very useful sense of the word ‘cause’ for most statistical purposes.

November 7, 2013

Why you should eat in crowded food halls

There are a couple of posts being promoted on the internet about an important and relatively subtle form of selection bias.  Epidemiologists know it as Berkson’s Paradox; in modern causal inference terminology it’s ‘conditioning on colliders’; and for an economist it’s a consequence of the production-possibility frontier.

The basic issue is very simple. As Gabriel Rossman puts it at The Atlantic

 There is no ontological reason why we can’t have shoes that are both hideous and uncomfortable but rather there is a practical reason in that nobody wears shoes that are terrible in every way and so such shoes don’t make it unto the market. 

In the same way, there’s no necessary reason why cricketers who are good at bowling have to be bad at batting.  Being able to deliver the ball so it misleads or outpaces the batsman doesn’t make it any harder to spot bowling trickery or to react fast. And in fact, if you look at 12-year-olds, often the same kids are good at batting and bowling.  In international-level cricket, though, all-rounders are pretty rare, and someone who can take 5 wickets in a Test innings is very unlikely to be able to score a Test century.  The slight positive correlation you see in kids turns into a strong negative correlation in adults. The reason is that getting into an international cricket team requires you to be very, very good at batting or very, very good at bowling. Since it’s more likely that you’re very, very good at one thing than two, most international cricketers are either batsmen or bowlers, but not both. Among those who are selected, there’s a negative correlation.
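The selection effect is easy to reproduce in a small simulation. What follows is only a sketch under made-up assumptions: batting and bowling skill are drawn as standard normal scores with a weak positive correlation (`rho=0.2`), and a player is "selected" if either score exceeds an arbitrary cutoff of 2. None of these numbers come from real cricket data, and `simulate_players` is a hypothetical helper written for this illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_players(n=100_000, rho=0.2, cutoff=2.0, seed=1):
    """Return (correlation in everyone, correlation in selected, number selected).

    All parameters are illustrative assumptions, not estimates from data.
    """
    rng = random.Random(seed)
    batting, bowling = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        batting.append(z1)
        # Mixing in z1 gives the two skills correlation rho in the whole population.
        bowling.append(rho * z1 + (1 - rho ** 2) ** 0.5 * z2)

    # Selection: very good at batting OR very good at bowling.
    selected = [(b, w) for b, w in zip(batting, bowling) if max(b, w) > cutoff]
    sel_bat = [b for b, _ in selected]
    sel_bowl = [w for _, w in selected]
    return pearson(batting, bowling), pearson(sel_bat, sel_bowl), len(selected)


if __name__ == "__main__":
    everyone, elite, n_elite = simulate_players()
    print(f"correlation among all players:      {everyone:+.2f}")
    print(f"correlation among {n_elite} selected: {elite:+.2f}")
```

Under these assumptions the correlation is mildly positive in the full population and clearly negative among the selected players, even though nothing about the skills themselves has changed. Selecting on the sum of the two scores instead of the maximum (closer to the restaurant version of the story) gives the same qualitative result.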

There are examples in the social sciences: opposition to marijuana legalisation is positively correlated with opposition to government wealth redistribution in the US as a whole, but uncorrelated among Republican voters.

There are examples in medicine: the genetic variant Factor V Leiden is strongly associated with deep-vein thrombosis in the population in general, but not at all predictive of recurrence in people who have already had one.

And there are examples in dining: for a given price, a successful restaurant has to do well enough on some combination of food quality, pleasant ambience, trendiness, etc. So these will end up negatively correlated, and if you want good inexpensive food in downtown Auckland, try one of the Asian food courts.

(via @gnat, who points to one of the posts and notes: Anyone who thinks it’s possible to draw truthful conclusions from data analysis without really learning statistics needs to read this.)

October 27, 2013

Fast-food outlets and obesity

Everyone knows that areas with more fast-food stores have more overweight people, and it certainly makes sense that fast food is bad for you. Like almost everything else, though, it gets more complicated when you start looking carefully.

Firstly, earlier this year Eric Crampton wrote in NBR about some research by an economics PhD student, Rachel Webb, who was trying to take advantage of this well-known relationship to unpick some aspects of correlation vs causation in the relationship between mother’s weight and infant’s birthweight. She found that, actually, areas in New Zealand with more fast-food outlets didn’t have more obesity to any useful and consistent extent.

Secondly, there’s new research on diet and fast food using data from the big NHANES surveys in the USA.  It confirms, as you might expect, that people who eat more fast food also eat less healthily at other times.



October 10, 2013

Innovation and indexes

The 2013 Global Innovation Index is out, with writeups in Scientific American and the NZ internets, but not this year in the NZ press. Stuff, instead, tells us “Low worker engagement holds NZ back”, quoting Gallup’s ‘employee engagement’ figure of 23% for NZ, without much attempt to compare to other countries.

The two international rankings are very different: of the 16 countries above us in the Global Innovation Index, 13 have significantly lower employee engagement ratings, one (Denmark) is about the same, and one (USA) is higher (one, Hong Kong, is missing because Gallup lumps it in with the rest of the PRC).  It’s also important to consider what is behind these ratings. If you search on  “Gallup employee engagement”, you get results mostly focused on Gallup’s consulting services — getting you to worry about employee engagement is one of the ways they make money.  The Global Innovation Index, on the other hand, came from a business school and was initially sponsored by the Confederation of Indian Industry  and has now expanded with wider sponsorship and academic involvement: it’s not biased in any way that’s obviously relevant to New Zealand.

With any complicated scoring system, different countries will do well on different components of the score.  If you believe, with the authors of Why Nations Fail,  that quality of institutions is the most important factor, you might focus on the “Institutions” component of the innovation index, where New Zealand is in third place. If you’re AMP economist Bevan Graham you might think the ‘business sophistication’ component is more important and note that NZ falls to 28th.

If you want NZ innovation to improve, the reverse approach might be more helpful: look at where NZ ranks poorly, and see if these are things we want to change (innovation isn’t everything) and how we might change them.



October 2, 2013

Cough, choke, history

If the PubMed research database is still surviving the US government shutdown, you can read a paper published 63 years ago today on lung cancer

In England and Wales the phenomenal increase in the number of deaths attributed to cancer of the lung provides one of the most striking changes in the pattern of mortality recorded by the Registrar-General. For example, in the quarter of a century between 1922 and 1947 the annual number of deaths recorded increased from 612 to 9,287, or roughly fifteenfold. This remarkable increase is, of course, out of all proportion to the increase of population

Some people were arguing that the increase was just due to better diagnosis of lung cancer, and even  those who believed in a real increase weren’t sure of the reason

Two main causes have from time to time been put forward: (1) a general atmospheric pollution from the exhaust fumes of cars, from the surface dust of tarred roads, and from gas-works, industrial plants, and coal fires; and (2) the smoking of tobacco.

Richard Doll and Austin Bradford Hill decided to compare histories of smoking in lung cancer patients and those in hospital for other reasons. As you know, they found that the lung cancer patients were much more likely to be heavy smokers. It’s also interesting to read what other possibilities they considered, and how they tried to rule them out.

This sort of study isn’t completely definitive, and, famously, the eminent statistician and geneticist (and heavy smoker) R. A. Fisher was never convinced. He thought that genetic factors might well be responsible. Further evidence was provided by experiments in animals (such as the ‘smoking beagles’ of Duke University), which showed that smoking really could cause cancer. Also, much more recently, studies of twins and studies that actually measured genotypes showed that genetic differences weren’t a big enough contributor to lung cancer to explain the correlation.

In contrast to, say, alcohol or opium, tobacco has been a public health problem only for about a century: tobacco smoking became very widespread in men during the first world war. With a bit of effort and some luck, future generations might see it as an inexplicable historical anomaly, like a deadly version of canasta.

August 18, 2013

Correlation, genetics, and causation

There’s an interesting piece on cannabis risks at Project Syndicate. One of the things they look at is the correlation between frequent cannabis use and psychosis.  Many people are, quite rightly, unimpressed with this sort of correlation, since it isn’t hard to come up with explanations for psychosis causing cannabis use or for other factors causing both.

However, there is also some genetic data.  The added risk of psychosis seems to be confined to people with two copies of a particular genetic variant in a gene called AKT1. This is harder to explain as confounding (assuming the genetics has been done right), and is one of the things genetics is useful for. This isn’t just a one-off finding; it was found in one study and replicated in another.

On the other hand, the gene AKT1 doesn’t seem to be very active in brain cells, making it more likely that the finding is just a coincidence.  This is one of the things bioinformatics is good for.

In times like these it’s good to remember Ben Goldacre’s slogan “I think you’ll find it’s a bit more complicated than that.”

Killing people

TV3 has tried to stir up the issue of the death penalty in New Zealand.  They have a poll showing majority opposition by the country as a whole, and by supporters of every party except NZ First.  Even the Sensible Sentencing Trust isn’t in favor.

The lead-in to the story is that the murder rate has never ‘recovered’ from the abolition of the death penalty.  They have a graph showing homicides per capita rising and then falling again, but not to the earlier levels.

Using the term ‘recovered’ comes very close to asserting a causal connection; but is there even a reliable correlation? International comparisons are useful here.  I don’t have long time series for homicide, but Kieran Healy has produced a graph of international trends in deaths due to assault. This isn’t the same as homicide, but is close enough to be relevant.

Here’s the New Zealand panel, with the arrow indicating the abolition of the death penalty. The details are slightly different from those for homicide, but the basic trend is the same as the one TV3 reports.



and here’s the international comparison, with NZ second from the bottom, on the left.



The NZ pattern is very similar to other countries, including Australia (where abolition didn’t happen until about 10 years later), Finland (where it was abolished in 1949 for crimes committed in peacetime), and Switzerland (1942).

If you look at the countries that still have the death penalty, murder rates are low and falling in Japan; South Korea had the same sort of rise and fall that NZ has had (over a shorter time scale); and of course there’s the USA.

It doesn’t look as though the death penalty is a major driving force in these patterns.

May 20, 2013

International Clinical Trials Day

Two hundred and sixty six years ago today, James Lind began what is regarded as the first proper controlled clinical trial

On the 20th May, 1747, I took twelve patients in the scurvy on board the Salisbury at sea. Their cases were as similar as I could have them. They all in general had putrid gums, the spots and lassitude, with weakness of their knees. They lay together in one place, being a proper apartment for the sick in the fore-hold; and had one diet in common to all, viz., water gruel sweetened with sugar in the morning; fresh mutton broth often times for dinner; at other times puddings, boiled biscuit with sugar etc.; and for supper barley, raisins, rice and currants, sago and wine, or the like. Two of these were ordered each a quart of cyder a day. Two others took twenty five gutts of elixir vitriol three times a day upon an empty stomach, using a gargle strongly acidulated with it for their mouths. Two others took two spoonfuls of vinegar three times a day upon an empty stomach, having their gruels and their other food well acidulated with it, as also the gargle for the mouth. Two of the worst patients, with the tendons in the ham rigid (a symptom none of the rest had) were put under a course of sea water. Of this they drank half a pint every day and sometimes more or less as it operated by way of gentle physic. Two others had each two oranges and one lemon given them every day. These they eat with greediness at different times upon an empty stomach. They continued but six days under this course, having consumed the quantity that could be spared. The two remaining patients took the bigness of a nutmeg three times a day of an electuary recommended by an hospital surgeon made of garlic, mustard seed, rad. raphan., balsam of Peru and gum myrrh, using for common drink barley water well acidulated with tamarinds, by a decoction of which, with the addition of cremor tartar, they were gently purged three or four times during the course.

The consequence was that the most sudden and visible good effects were perceived from the use of the oranges and lemons; one of those who had taken them being at the end of six days fit for duty. The spots were not indeed at that time quite off his body, nor his gums sound; but without any other medicine than a gargarism or elixir of vitriol he became quite healthy before we came into Plymouth, which was on the 16th June. The other was the best recovered of any in his condition, and being now deemed pretty well was appointed nurse to the rest of the sick …

Lind knew very little about scurvy apart from the typical progress of the disease, and he had no real idea of how the treatments might work.  That’s a handicap in coming up with ideas for treatment, but not in doing fair tests of whether treatments work.

The trial didn’t have an untreated group: all the patients got one of the treatments recommended by experts.  There’s no need for a controlled trial to have an untreated group — if there is an existing treatment, you want to compare to that treatment; if there is none, you may want to compare immediate vs delayed treatment.

Despite the dramatic success of fruit juice in the trial, it wasn’t adopted as a treatment. That, sadly, can still be the case today.  New drugs or surgical techniques are taken up enthusiastically, but boring interventions like nurse home visits or surgery checklists get less attention. Still, things are much better than they were even twenty years ago. Nearly all of medicine accepts the idea of randomized controlled comparison, and it is spreading to other areas such as development economics.

There are two excellent, free books about clinical trials and health choices: Testing Treatments, from the James Lind Initiative, and Smart Health Choices, from Les Irwig (at Sydney Uni) and coworkers.

Much of clinical trials development is unapologetically technical, but there are important areas where public participation can help:

  • The James Lind Alliance asks patients and clinicians to say what questions matter most.  Clinical trials still tend to answer questions that are scientifically interesting or commercially important, not what actually matters most to patients
  • The Cochrane Collaboration is attempting to collect and summarise all randomized trials.  The Cochrane Consumers Network is for non-specialist participation — in particular as Consumer Referees to help ensure that summaries of research address the questions that are important to consumers and are presented in language that consumers can understand.
  • Ethics Committees that review clinical trials and other human research in New Zealand are required to have non-specialist community members. This is a substantial commitment, but one that is important if ethics committees are to be more than just red tape.
  • And if you haven’t yet signed the AllTrials petition, calling for the results of all clinical trials to be published so we can know what treatments work, now would be an excellent time.

May 6, 2013

Some surprising things

  • From Felix Salmon: US population is increasing, and people are moving to the cities, so why is (sufficiently fine-scale) population density going down? Because rich people take up more space and fight for stricter zoning.  You’ve heard of NIMBYs, but perhaps not of BANANAs
  • From the New York Times.  One of the big credit-rating companies is no longer using debts referred for collection as an indicator, as long as they end up paid.  This isn’t a new spark of moral feeling, it’s just for better prediction.
  • And from Felix Salmon again: Firstly, Americans are bad at statistics. When it comes to breast cancer, they massively overestimate the probability that early diagnosis and treatment will lead to a cure, while they also massively underestimate the probability that an undetected cancer will turn out to be harmless.

March 20, 2013

Big Data is not enough

There’s a good piece in one of the New Yorker’s blogs  about the Human Connectome Project and the proposed Brain Activity Map.  The Connectome project has produced two terabytes of functional and structural data on the brains of 68 volunteers, and the Brain Activity Map is more or less what it says on the tin.

Almost certainly people will be able to do something useful with all this data, but some of the claims for what it means to our understanding of the brain are a bit much. As the RealClearScience blog points out (in a slightly different context), we know the complete nervous system of the nematode C. elegans.  We know every cell and all its connections to other cells.  We still can’t use this knowledge to reliably predict the nematode’s behaviour even by brute-force simulation, let alone by sophisticated analysis.

Simply using Big Data to work out how a complex system functions requires a lot of simplifying assumptions to be true.  This isn’t because we’re not smart enough to build more complex models (though we’re not), and it isn’t because the computation is beyond us (though it is), it’s a fundamental limitation on learning without an underlying theory to help you.

The way Amazon or Netflix does prediction with all its data is to look for a large group of people who are similar to you, in relevant ways, and see what they bought or watched. That sounds easy, but the weasel words are ‘in relevant ways’.  If you have a moderately large number of variables, there are far too many ways in which people, or nerve signals, or protein concentrations could be similar, and you need to decide which ones are relevant.   This is critical, because finding the true relationships in large numbers of associations is only possible if nearly all the associations are zero; in current jargon, the model is ‘sparse’.
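To see why "too many ways to be similar" is a real problem, here is a rough sketch in which every variable is pure noise, so all the true associations are exactly zero. The sizes (30 observations, 500 candidate variables) and the helper name `max_spurious_correlation` are assumptions chosen for illustration, not anything from the projects discussed above.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def max_spurious_correlation(n_obs=30, n_vars=500, seed=2):
    """Largest |correlation| between a noise outcome and n_vars noise predictors.

    Every series is independent standard normal noise, so any correlation
    found here is spurious by construction.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_vars):
        predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(predictor, outcome)))
    return best


if __name__ == "__main__":
    print(f"best of   1 noise variable:  {max_spurious_correlation(n_vars=1):.2f}")
    print(f"best of 500 noise variables: {max_spurious_correlation():.2f}")
```

With a few dozen observations and hundreds of candidates, the best-looking noise variable typically shows a correlation that would seem impressive if you had only looked at that one. Without outside knowledge about which variables are relevant, it can't be told apart from a real signal.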

In order to see sparseness, you need to know how to look.    Consider the economy: if you just look at associations between measurements, everything is correlated: you see inflation, you see population growth,  you see seasonal variation.  These patterns need to be removed to get a sparse model where you’ve got some hope of disentangling cause and effect.
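A toy version of the economic example: two series that have nothing to do with each other apart from both trending upward over time. The slopes, noise level, and the names `prices` and `population` are invented for the sketch, and taking first differences is just one simple way of removing a trend, not necessarily what an economist would do with real data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def trended_series(n=1000, seed=3):
    """Two independent noisy series that each rise steadily (hypothetical values)."""
    rng = random.Random(seed)
    prices = [0.5 * t + rng.gauss(0, 5) for t in range(n)]
    population = [2.0 * t + rng.gauss(0, 5) for t in range(n)]
    return prices, population


def differences(series):
    """Period-to-period changes; this removes a linear trend from the series."""
    return [b - a for a, b in zip(series, series[1:])]


if __name__ == "__main__":
    prices, population = trended_series()
    raw = pearson(prices, population)
    detrended = pearson(differences(prices), differences(population))
    print(f"correlation of raw series:     {raw:+.2f}")
    print(f"correlation after differencing: {detrended:+.2f}")
```

The raw series are almost perfectly correlated purely because of the shared trend; once the trend is removed the association is close to zero, which is the sparse picture you need before looking for cause and effect.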

In really complex fields like brain activity, we don’t know enough about how to pose the problem so that Big Data will have a hope of solving it.