Posts written by Thomas Lumley (1590)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

October 12, 2015

Elephants and cancer: getting it backwards

One News had a story tonight about elephants. This is how it starts:

NZ anchor: An American researcher thinks he may have come up with a new weapon in the fight against cancer, inspired by a trip to the zoo. He remembered that elephants almost never get cancer and wondered whether what protects them could also help us.

US reporter: Elephants have survived 55 million years on this earth. They’ve evolved to beat cancer, and they might just help us beat it too.

That’s a nice story, but it’s basically backwards from the more-plausible story in Nature News, and the (open-access) paper in JAMA.

The distinctive feature of elephant blood, according to either version of the story, is that elephants have many more copies of the tumour-suppressor gene p53. This gene makes a key protein in the mechanism that causes cells with DNA damage to kill themselves rather than reproducing and turning into tumours.  A large proportion of tumours have mutations in p53, and people who inherit a damaged copy of the gene tend to develop cancer (including some unusual forms) early in life.  We’ve known about p53 for a long time — decades — so while it is a target for drug development, it isn’t by any means a new target.  We haven’t got far with it because it’s hard to mimic the effect of a protein that acts inside the cell nucleus.

The story in Nature News is that the American researcher, Dr Joshua Schiffman, specialises in treating children with familial cancer, including ones who have inherited mutations in p53 (Li-Fraumeni syndrome). He heard a talk about elephants having many copies of p53. He then went to his local zoo to find out what the cancer rate was in elephants, and confirmed it was low. This is important; lots of people will tell you that sharks, for example, don’t get cancer, and that’s just not true. Elephants, on the other hand, really do seem to have a surprisingly low rate of cancer.

Since elephants have a lot of cells and live a long time, you’d expect them to have a lot of chances to get cancer. Studying elephants makes sense as a way to find completely new ways of treating or preventing cancer. Unfortunately, it seems that a major reason elephants don’t get cancer  is that they have lots of redundant p53 genes, which isn’t a new treatment target. (Other reasons may be that they don’t smoke and they eat vegetarian diets.)

So, while it’s true that elephants have multiple copies of the p53 gene, everything else in the story is basically backwards. Looking for new cancer treatment targets in elephants is a good idea, but that isn’t quite what they did. The findings are good news for elephants but they are bad news for us; p53 isn’t a promising new treatment target, it’s one of the oldest ones we have.

October 11, 2015

With the potential to miss us completely

Q: Did you see there’s a giant rock with the potential to end life on Earth?

A: This one?

Q: Yes. Are they exaggerating?

A: Depends what you mean. In a sense it does have the potential to end human life on Earth, but it would have to actually hit Earth to do that.

Q: But it’s  “similar to the 1862 Apollo asteroid which was classified as a potentially hazardous object”

A: Similar except for being a lot further away. As the story says, “Potentially Hazardous Objects” approach closer than 7,402,982km, and this one is about 25 million km away at its closest.

Q: That’s an awfully precise number, 7,402,982, isn’t it? Why do they need it to the nearest kilometre?

A: They don’t. It’s 0.05 Astronomical Units, and whoever did the conversion doesn’t understand significant digits. Wikipedia, for example, rounds it to 7.5 million km.

Q: And the other really precise numbers? It says the asteroid is moving at 64,374km/hr, but surely the speed will change more than 1km/hr because, you know, gravity and physics and stuff?

A: That’s 40,000 miles per hour. Again, looks like one significant digit in the original.
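The arithmetic is easy to check. A quick sketch in Python (mine, not from the original Q&A; the 4.6-million-mile intermediate figure is my guess at how the cutoff was converted, since it reproduces the story’s digits exactly):

```python
MILE_IN_KM = 1.609344  # exact definition of the international mile

# The story's over-precise figures are what you get by converting
# round numbers of miles to kilometres and keeping every digit.
cutoff_km = 4_600_000 * MILE_IN_KM  # 0.05 AU is roughly 4.6 million miles
speed_kmh = 40_000 * MILE_IN_KM     # "40,000 miles per hour"

print(round(cutoff_km))  # matches the story's 7,402,982 km
print(round(speed_kmh))  # matches the story's 64,374 km/hr
```

Both inputs have at most two significant figures, so quoting the results to the nearest kilometre just launders the rounding into fake precision.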

Q: So how far away is this asteroid compared to, say, the moon?

A: To one significant figure, 100 times further away.

Q: That’s quite a lot. Why is NASA making a fuss about this asteroid?

A: They aren’t. They issued a press release about asteroid rumours in August, headlined “There is no asteroid threatening Earth”. The NASA @asteroidwatch twitterwallah is getting a bit tetchy about the whole thing.

Q: Does the asteroid have something to do with the “blood moon” we had recently?

A: Only in the sense that they were both completely unsurprising and harmless astronomical events.


(h/t @philiplyth)

Gay gene update

Yesterday I wrote about a ‘gay epigenetics’ story in the Herald, arguing that there was nothing worth publicising at this point and that there wasn’t enough detail to interpret the results.

Ed Yong, a science journalist who was actually at the conference, has a story today in the Atlantic. He fingers the conference as the responsible party for the publicity (here’s their press release), though with the active cooperation of the researchers.

His story has more detail and makes it clear that there’s very little evidence, and more importantly that the lead researcher knew this:

“The reality is that we had basically no funding,” he said. “The sample size was not what we wanted. But do I hold out for some impossible ideal or do I work with what I have? I chose the latter.”

For pilot research presented to consenting scientists that might be reasonable, but for press releases it isn’t.

Epigenetics is an area of science where New Zealand has an international reputation. It would be a pity if it ended up as one of the areas where you can be sure that basically nothing that makes it to the newspapers is true.

October 10, 2015

Return of the brother of the gay gene

From the Herald (from the Telegraph)

Factors ranging from exposure to certain chemicals to childhood abuse, diet and exercise may affect the DNA controlling sexuality, according to research being presented at a US conference on genetics.

They believe they can predict with 70 per cent accuracy whether a man is gay or straight, simply by looking at those parts of the genome.

[There’s a slightly better story in Nature News.]

70% accuracy doesn’t seem all that impressive. Using the usual figures on the proportion of men who are gay, the approach of assuming everyone is straight unless you are told otherwise is better than 90% accurate, and doesn’t need expensive genetics.  Presumably they mean something different by 70% accuracy, but we don’t know what.
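The base-rate point can be made concrete with a small sketch (mine, not the researchers’; the 3% prevalence is purely an illustrative figure):

```python
def majority_class_accuracy(prevalence):
    """Accuracy of always predicting the more common class."""
    return max(prevalence, 1.0 - prevalence)

# In the general population (say ~3% of men are gay), "everyone is
# straight" is right about 97% of the time, so a 70%-accurate
# predictor does worse than ignoring the genetics entirely.
population_baseline = majority_class_accuracy(0.03)

# In a balanced sample, such as twin pairs discordant for orientation,
# the trivial baseline drops to 50%, and 70% accuracy would beat it.
balanced_baseline = majority_class_accuracy(0.50)

print(population_baseline, balanced_baseline)
```

If the 70% figure refers to a balanced sample like this, it is a real improvement over chance there, but that’s a different claim from 70% accuracy in the population.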

More importantly, this is research in identical twins. If you take pairs of people who are genetically identical, shared the same environment in the womb, and then had very similar environments in infancy and childhood, you’ve stripped out nearly all the other factors that could affect sexual orientation. That’s the point of doing the research this way — you get a clearer view of potentially-small differences — but it’s a limitation when you’re trying to make claims about people in general.

Also, there’s an important difference between genetics and epigenetics here. The epigenetic markers, as the story says, can be affected by things that happen to you during childhood. But that means we can’t necessarily assume the correlations between epigenetic differences and sexual orientation are causal.  The “factors ranging from exposure to certain chemicals to childhood abuse, diet and exercise” that can affect epigenetic markers could also affect sexual orientation directly — especially since the epigenetic markers were measured in cells from the lining of the mouth, not in, say, the brain.

On top of all that, this is another annoying example of research being publicised before it’s published. It’s not at all impossible that the claims are true,  but there isn’t enough public information to tell. The research was presented at the conference of the American Society for Human Genetics. People at the conference would have been able to see more detail, and maybe ask questions. We can’t. We won’t be able to until there’s a published research paper. That would have been the time for publicity.

And finally, there’s an interesting assumption revealed in the headline “Boys ‘turned gay by childhood shift in genes’”. The research looked at differences between identical twins. It says absolutely nothing about which twin changed and which one stayed the same — you could equally well say “Boys turned straight by childhood shift in genes”.


Predicting abortion attitudes

Quartz has an interesting analysis of a recent Twitter storm over abortion, triggered by the US Republicans’ attempts to defund Planned Parenthood. The headline is striking: “How to tell whether a Twitter user is pro-choice or pro-life without reading any of their tweets.”

The writers describe how they could use words in twitter profiles to predict people’s attitudes. They also found that social network structure was a very strong predictor: people shared the views of those they followed. They write “so polarized is the social network structure that even very basic, obvious characteristics stop mattering if we know who your friends are”.

It might seem strange that you could do so well in predicting attitudes across multiple countries on a controversial topic. It would be strange, except that the data they used was restricted to a small group of people who were participating in a Twitter argument about abortion. The story admits this, but not until near the end.

In real life, you probably can’t learn that much about someone’s views on abortion by whether they tweet about cats or football. In the context of a small, highly polarised argument, you probably can.  In real life, people don’t necessarily agree with the views of the people they follow on Twitter, but in that context it’s not surprising that they do.  And in real life, if someone wants to find out your views on a controversial topic they’d probably be better off asking you than tracking down all your friends and asking them.


October 9, 2015

Predictive analytics and the rise of the machines

Some cautionary tales

  • “I would like to challenge this picture, and ask you to imagine data not as a pristine resource, but as a waste product, a bunch of radioactive, toxic sludge that we don’t know how to handle.” A talk by Maciej Ceglowski
  • How do you measure whether automated decision making ends up discriminating by race, when it doesn’t explicitly use race as an input? Two posts by Cathy O’Neil
  • A computer program that was accidentally trained to discriminate by gender and ethnicity
  • Why modern predictive analytics doesn’t give ‘algorithms’ in the sense of ‘recipes’, by Suresh Venkat (via @ndiakopoulos)


  • A 2010 post complaining about a continuing problem: when the media report on scientific papers that the journals haven’t yet made available to scientists.
  • Which bar is closest to a whole number in length?
    That’s right, the smallest one is exactly 1.0 and the others are all slightly larger than a whole number. Inspired by one of Kieran Healy’s examples.
  • Linguistic statistics: G K Chesterton almost never used feminine pronouns in his novels.
  • The famous London Underground map, labelled with rents in the neighbourhood of each station. Would be interesting to see an Auckland map using trains and major bus routes. (via Flowing Data)

October 8, 2015

He’s a lumberjack and he’s inconsistently counted

Official statistics agencies publish lots of useful information that gets used by researchers, by educators, by businesses, by journalists, and (with the help of groups like Figure.NZ) by everyone else.  A dilemma for these agencies is how to handle changes in the best ways to measure something. If you never change the definitions you get perfectly consistent reports of no-longer-useful information. If you do change the definitions, things don’t match up.

This graph is from a blog post by a Canadian economist, Livio Di Matteo. It shows the number of Canadians employed in the lumber industry over time, patched together from several Statistics Canada time series.


Dr Di Matteo is a professional, and wasn’t trying to do anything subtle here — he just wanted a lecture slide — and a lot of this data was from the time when Stats Canada was among the best in the world, so it’s not a problem that’s easy to avoid. It’s just harder than it sounds to define who works in the lumber industry. For example, are the log drivers in the lumber industry, or are they something like “transport workers, not elsewhere classified”?


October 6, 2015

When the lack of news is the story

There are new (provisional) suicide figures out for the year to June, and the Herald has a story (and has embedded the summary report).

The problem with news stories on this topic is that the important statistics haven’t changed.  Suicide rates have been pretty much constant over the 9 years of data shown. It’s still true that New Zealand has a high suicide rate, that it’s much higher for men than women, and that it’s much higher for Māori than non-Māori, and lowest for Asians.

There were slightly more suicides this year than in recent years, almost the same per-capita rate as in 2011/12.  Most of the increase was in men, but it’s still not any sort of clear sign of a trend.  The Herald story leads with the changes, as news has to, but the real story is that we still haven’t managed to change anything.

October 5, 2015

Our favourite bogus poll

It’s time for Forest & Bird’s Bird of the Year competition. As with any bogus poll, we won’t learn what the true popularity of the various NZ birds actually is.


As long as it’s clear that bogus polls are being used for entertainment and advertising, not to collect information, there isn’t a statistical problem with them.