Posts filed under Risk (174)

August 30, 2015

Genetically targeted cancer treatment

Targeting cancer treatments to specific genetic variants has certainly had successes with common mutations — the most well known example must be Herceptin for an important subset of breast cancer. Reasonably affordable genetic sequencing has the potential to find specific, uncommon mutations in cancers where there isn’t a standard, approved drug.

Most good ideas in medicine don’t work, of course, so it’s important to see whether this genetic sequencing really helps, and how much it costs. Ideally this would be in a randomised trial, with patients assigned either to the best standard treatment or to genetically-targeted treatment. What we have so far is a comparison of disease progression for genetically-targeted treatment against a matched set of patients from the same clinic in previous years. Here’s a press release, and two abstracts from a scientific conference.

In 72 out of 243 patients whose disease had progressed despite standard treatment, the researchers found a mutation that suggested the patient would benefit from some drug they wouldn’t normally have got. The median time until these patients started getting worse again was 23 weeks; in the historical patients it was 12 weeks.

The Boston Globe has an interesting story talking to researchers and a patient (though it gets some of the details wrong). The patient they interview had melanoma and got a drug approved for melanoma patients, but only for those with one specific mutation (since that’s where the drug was tested). Presumably, though the story doesn’t say, he had a different mutation in the same gene — that’s where the largest benefit of sequencing is likely to be.

An increase from 12 to 23 weeks isn’t terribly impressive, and it came at a cost of US$32,000 — the abstract and press release say there wasn’t a cost increase, but that’s because they looked at cost per week, not total cost. The benefit isn’t nothing, though; it’s probably large enough that a clinical trial makes sense and small enough that a trial is still ethical and feasible.
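The per-week framing can be made concrete with a quick sketch. The per-week cost here is invented for illustration; the abstracts report only that cost per week was similar while total cost was about US$32,000 higher.

```python
# Hypothetical per-week cost, invented for illustration.
cost_per_week = 1400  # US$

weeks_historical = 12  # median time to progression, historical controls
weeks_targeted = 23    # median time to progression, targeted treatment

total_historical = cost_per_week * weeks_historical
total_targeted = cost_per_week * weeks_targeted

# Identical cost per week...
assert total_historical / weeks_historical == total_targeted / weeks_targeted
# ...but total cost nearly doubles with the longer treatment.
print(total_targeted - total_historical)  # 15400 (with this invented rate)
```

The point is simply that dividing by treatment duration hides exactly the cost increase that a longer-lasting treatment creates.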

The Boston Globe story is one of the first products of their new health-and-medicine initiative, called “Stat”. That’s not short for “statistics”; it’s the medical slang meaning “right now”, from the Latin statim.

August 22, 2015

Changing who you count

The New York Times has a well-deserved reputation for data journalism, but anyone can have a bad day.  There’s a piece by Steven Johnson on the non-extinction of the music industry (which I think makes some good points), but which the Future of Music Coalition doesn’t like at all. And they also have some good points.

In particular, Johnson says

“According to the OES, in 1999 there were nearly 53,000 Americans who considered their primary occupation to be that of a musician, a music director or a composer; in 2014 more than 60,000 people were employed writing, singing, or playing music. That’s a rise of 15 percent.”


He’s right. This is a graph (not that you really need one):


The Future of Music Coalition give the numbers for each year, and they’re interesting. Here’s a graph of the totals:


There isn’t a simple increase; there’s a weird two-humped pattern. Why?

Well, if you look at the two categories making up the total, “Music Directors and Composers” and “Musicians and Singers”, it’s quite revealing:


The larger category, “Musicians and Singers”, has been declining.  The smaller category, “Music Directors and Composers” was going up slowly, then had a dramatic three-year, straight-line increase, then decreased a bit.

Going into the Technical Notes for the estimates (e.g., for 2009), we see

May 2009 estimates are based on responses from six semiannual panels collected over a 3-year period

That means the three-year increase of 5000 jobs/year is probably a one-off increase of 15,000 jobs. Either the number of “Music Directors and Composers” more than doubled in 2009, or, more likely, there was a change in definitions or sampling approach. The Future of Music Coalition point out that the Bureau of Labor Statistics FAQ says this is a problem (though they’ve got the wrong link: it’s here, question F.1)

Challenges in using OES data as a time series include changes in the occupational, industrial, and geographical classification systems

In particular, the 2008 statistics estimate only 390 of these people as being employed in primary and secondary schools; the 2009 estimate is 6,000, and the 2011 estimate is 16,880. A lot of primary and secondary school teachers got reclassified into this group; it wasn’t a real increase.

When the school teachers are kept out of “Music Directors and Composers”, to get better comparability across years, the change is from 53,000 in 1999 to 47,000 in 2014. That’s not a 15% increase; it’s an 11% decrease.
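The arithmetic behind that reversal is worth checking explicitly; a minimal sketch using the rounded totals quoted above:

```python
def pct_change(old, new):
    """Percent change from old to new."""
    return 100 * (new - old) / old

# With the reclassified school teachers excluded for comparability:
change = pct_change(53_000, 47_000)
print(round(change, 1))  # -11.3: an 11% decrease, not a 15% increase
```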

Official statistics agencies try not to change their definitions, precisely because of this problem, but they do have to keep up with a changing world. In the other direction, I wrote about a failure to change definitions that led the US Census Bureau to report that four times as many pre-schoolers were cared for by fathers as by mothers.

August 5, 2015

What does 90% accuracy mean?

There was a lot of coverage yesterday about a potential new test for pancreatic cancer. 3News covered it, as did One News (but I don’t have a link). There’s a detailed report in the Guardian, which starts out:

A simple urine test that could help detect early-stage pancreatic cancer, potentially saving hundreds of lives, has been developed by scientists.

Researchers say they have identified three proteins which give an early warning of the disease, with more than 90% accuracy.

This is progress; pancreatic cancer is one of the diseases where there genuinely is a good prospect that early detection could improve treatment. The 90% accuracy, though, doesn’t mean what you probably think it means.

Here’s a graph showing how the error rate of the test changes with the numerical threshold used for diagnosis (figure 4, panel B, from the research paper):


As you move from left to right the threshold decreases; the test is more sensitive (picks up more of the true cases), but less specific (diagnoses more people who really don’t have cancer). The area under this curve is a simple summary of test accuracy, and that’s where the 90% number came from.  At what the researchers decided was the optimal threshold, the test correctly reported 82% of early-stage pancreatic cancers, but falsely reported a positive result in 11% of healthy subjects.  These figures are from the set of people whose data was used in putting the test together; in a new set of people (“validation dataset”) the error rate was very slightly worse.

The research was done with approximately equal numbers of healthy people and people with early-stage pancreatic cancer. They did it that way because that gives the most information about the test for a given number of people. It’s reasonable to hope that the area under the curve, and the sensitivity and specificity of the test, will be the same in the general population. Even so, the accuracy (in the non-technical meaning of the word) won’t be.

When you give this test to people in the general population, nearly all of them will not have pancreatic cancer. I don’t have NZ data, but in the UK the current annual rate of new cases goes from 4 people out of 100,000 at age 40 to 100 out of 100,000 at ages 85 and over. The average over all ages is 13 cases per 100,000 people per year.

If 100,000 people are given the test and 13 have early-stage pancreatic cancer, about 10 or 11 of the 13 cases will have positive tests, but so will 11,000 healthy people.  Of those who test positive, 99.9% will not have pancreatic cancer.  This might still be useful, but it’s not what most people would think of as 90% accuracy.
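The calculation behind those numbers is straightforward bookkeeping. A sketch using the paper’s sensitivity (82%), its false-positive rate (11%), and the UK average of 13 cases per 100,000 per year:

```python
population = 100_000
cases = 13                  # annual new cases per 100,000 (UK average)
sensitivity = 0.82          # fraction of true cases the test detects
false_positive_rate = 0.11  # fraction of healthy people wrongly flagged

true_positives = cases * sensitivity                          # about 10.7
false_positives = (population - cases) * false_positive_rate  # about 11,000

# Positive predictive value: the chance that a positive test is a real case.
ppv = true_positives / (true_positives + false_positives)
print(round(100 * ppv, 2))  # 0.1 -- so about 99.9% of positives are not cancer
```

An "accurate" test applied to a rare disease still produces mostly false positives; the base rate dominates.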


July 28, 2015

Recreational genotyping: potentially creepy?

Two stories from this morning’s Twitter (via @kristinhenry)

  • 23andMe has made available a programming interface (API) so that you can access and integrate your genetic information using apps written by other people.  Someone wrote and published code that could be used to screen users based on sex and ancestry. (Buzzfeed, FastCompany). It’s not a real threat, since apps with more than 20 users need to be reviewed by 23andMe, and since users have to agree to let the code use their data, and since Facebook knows far more about you than 23andMe, but it’s not a good look.
  • Google’s Calico project also does cheap public genotyping and is combining their DNA data (more than a million people) with family trees from a genealogy service. This is how genetic research used to be done: since we know how DNA is inherited, connecting people with family trees deep into the past provides a lot of extra information. On the other hand, it means that if a few distantly-related people sign up for Calico genotyping, Google will learn a lot about the genomes of all their relatives.

It’s too early to tell whether the people who worry about this sort of thing will end up looking prophetic or just paranoid.

July 24, 2015

Are beneficiaries increasingly failing drug tests?

Stuff’s headline is “Beneficiaries increasingly failing drug tests, numbers show”.

The numbers are rates per week of people failing or refusing drug tests. The number was 1.8/week for the first 12 weeks of the policy and 2.6/week for the whole year 2014, and, yes, 2.6 is bigger than 1.8.  However, we don’t know how many tests were performed or demanded, so we don’t know how much of this might be an increase in testing.

In addition, if we don’t worry about the rate of testing and take the numbers at face value, the difference is well within what you’d expect from random variation, so while the numbers are higher it would be unwise to draw any policy conclusions from the difference.
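One way to see this: assume the two periods don’t overlap, treat the failure counts as roughly Poisson, and simulate under a single common rate. Both assumptions (and the rounding to whole failures) are mine, not the story’s:

```python
import math
import random

random.seed(1)

def poisson(lam):
    """Draw from a Poisson distribution (Knuth's method; fine for modest rates)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while p > threshold:
        k += 1
        p *= random.random()
    return k - 1

weeks_a, weeks_b = 12, 52
fails_a, fails_b = round(1.8 * weeks_a), round(2.6 * weeks_b)  # ~22 and ~135

common_rate = (fails_a + fails_b) / (weeks_a + weeks_b)
observed_gap = fails_b / weeks_b - fails_a / weeks_a  # about 0.76/week

# How often does a gap this large arise if the true rate never changed?
extreme = 0
n_sims = 10_000
for _ in range(n_sims):
    rate_a = poisson(common_rate * weeks_a) / weeks_a
    rate_b = poisson(common_rate * weeks_b) / weeks_b
    if abs(rate_b - rate_a) >= observed_gap:
        extreme += 1

p_value = extreme / n_sims
print(p_value)  # roughly 0.1 to 0.15: consistent with no change in the rate
```

A gap of this size turns up more than one time in ten by chance alone, which is the sense in which the difference is "well within what you’d expect from random variation".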

On the other hand, the absolute numbers of failures are very low when compared to the estimates in the Treasury’s Regulatory Impact Statement.

MSD and MoH have estimated that once this policy is fully implemented, it may result in:

• 2,900 – 5,800 beneficiaries being sanctioned for a first failure over a 12 month period

• 1,000 – 1,900 beneficiaries being sanctioned for a second failure over a 12 month period

• 500 – 1,100 beneficiaries being sanctioned for a third failure over a 12 month period.

The numbers quoted by Stuff are 60 sanctions in total over eighteen months, and 134 test failures over twelve months.  The Minister is quoted as saying the low numbers show the program is working, but as she could have said the same thing about numbers that looked like the predictions, or numbers that were higher than the predictions, it’s also possible that being off by an order of magnitude or two is a sign of a problem.


July 22, 2015

Are reusable shopping bags deadly?

There’s a research report by two economists arguing that San Francisco’s ban on plastic shopping bags has led to a nearly 50% increase in deaths from foodborne disease, an increase of about 5.5 deaths per year. I was asked my opinion on Twitter. I don’t believe it.

What the analysis does show is some evidence that emergency room visits for foodborne disease have increased: the researchers analysed admissions for E. coli, Salmonella, and Campylobacter infection, and found an increase in San Francisco but not in neighbouring counties. There’s a statistical issue in that the number of counties is small and the standard error estimates tend to be a bit unreliable in that setting, but that’s not prohibitive. There’s also a statistical issue in that we don’t know which (if any) infections were related to contamination of raw food, but again that’s not prohibitive.

The problem with the analysis of deaths is the definition: the deaths in the analysis were actually all of the ICD10 codes A00-A09. Most of this isn’t foodborne bacterial disease, and a lot of the deaths from foodborne bacterial disease will be in settings where shopping bags are irrelevant. In particular, two important contributors are

  • Clostridium difficile infections after antibiotic use, which have a fairly high mortality rate
  • Diarrhoea in very frail elderly people, in residential aged care or nursing homes.

In the first case, this has nothing to do with food. In the second case, it’s often person-to-person transmission (with norovirus a leading cause), but even if it is from food, the food isn’t carried in reusable shopping bags.

Tomás Aragón, of the San Francisco Department of Public Health, has a more detailed breakdown of the death data than was available to the researchers. His memo is, I think, too negative on the statistical issues, but the data underlying the A00-A09 categories are pretty convincing:


Category A021 is Salmonella (other than typhoid); A048 and A049 are other miscellaneous bacterial infections; A081 and A084 are viral. A090 and A099 are left-over categories that are supposed to exclude foodborne disease but will capture some cases where the mechanism of infection wasn’t known.  A047 is Clostridium difficile.   The apparent signal is in the wrong place. It’s not obvious why the statistical analysis thinks it has found evidence of an effect of the plastic-bag ban, but it is obvious that it hasn’t.

Here, for comparison, are New Zealand mortality data for specific foodborne infections, from the most recent years available:


Over the three years, there were only ten deaths where the underlying cause was one of these food-borne illnesses — a lot of people get sick, but very few die.


The mortality data don’t invalidate the analysis of hospital admissions, where there’s a lot more information and it is actually about (potentially) foodborne diseases. More data from other cities — especially ones that are less atypical than San Francisco — would be helpful here, and it’s possible that this is a real effect of reusing bags. The economic analysis, however, relies heavily on the social costs of deaths.

July 11, 2015

What’s in a name?

The Herald was, unsurprisingly, unable to resist the temptation of leaked data on house purchases in Auckland.  The basic points are:

  • Data on the names of buyers for one agency, representing 45% of the market, for three months
  • Based on the names, an estimate that nearly 40% of the buyers were of Chinese ethnicity
  • This is more than the proportion of people of Chinese ethnicity in Auckland
  • Oh Noes! Foreign speculators! (or Oh Noes! Foreign investors!)

So, how much of this is supported by the various data?

First, the surnames.  This should be accurate for overall proportions of Chinese vs non-Chinese ethnicity if it was done carefully. The vast majority of people called, say, “Smith” will not be Chinese; the vast majority of people called, say, “Xu” will be Chinese; people called “Lee” will split in some fairly predictable proportion.  The same is probably true for, say, South Asian names, but Māori vs non-Māori would be less reliable.
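Done carefully, the surname method doesn’t classify each buyer as Chinese or not; it assigns each surname a probability and adds them up. A toy sketch — the probabilities here are invented, not from any real name database:

```python
# Invented probabilities that a person with each surname is of Chinese
# ethnicity; a real analysis would estimate these from census data.
p_chinese = {"Smith": 0.01, "Xu": 0.98, "Lee": 0.35, "Patel": 0.01, "Wang": 0.97}

buyers = ["Smith", "Xu", "Lee", "Wang", "Smith", "Xu", "Lee", "Patel"]

# Expected number of Chinese-ethnicity buyers: the sum of per-name probabilities.
expected = sum(p_chinese[name] for name in buyers)
proportion = expected / len(buyers)
print(round(proportion, 2))  # 0.46 with these made-up numbers
```

Ambiguous names like “Lee” contribute fractionally rather than being forced into one bin, which is why the overall proportion can be estimated well even when individual buyers can’t be classified.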

So, we have fairly good evidence that people of Chinese ancestry are over-represented as buyers from this particular agency, compared to the Auckland population.

Second: the representativeness of the agency. It would not be at all surprising if migrants, especially those whose first language isn’t English, used real estate agents more than people born in NZ. It also wouldn’t be surprising if they were more likely to use some agencies than others. However, the claim is that these data represent 45% of home sales. If that’s true, people with Chinese names are over-represented compared to the Auckland population no matter how unrepresentative this agency is. Even if every Chinese buyer used this agency, the proportion among all buyers would still be nearly 20%.
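That worst-case bound needs only two numbers, both rounded here from the figures in the story:

```python
agency_market_share = 0.45      # the agency's claimed share of Auckland sales
chinese_share_at_agency = 0.40  # roughly the reported share of Chinese names

# Worst case: every Chinese-name buyer in Auckland used this one agency.
# Their share of ALL Auckland buyers is then at least:
lower_bound = agency_market_share * chinese_share_at_agency
print(round(lower_bound, 2))  # 0.18 -- about double the Chinese share of Auckland's population
```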

So, there is fairly good evidence that people of Chinese ethnicity are buying houses in Auckland at a higher rate than their proportion of the population.

The Labour claim extends this by saying that many of the buyers must be foreign. The data say nothing one way or the other about this, and it’s not obvious that it’s true. More precisely, since the existence of foreign investors is not really in doubt, it’s not obvious how far it’s true. The simple numbers don’t imply much, because relatively few people are housing buyers: for example, house buyers named “Wang” in the data set are less than 4% of Auckland residents named “Wang.” There are at least three other competing explanations, and probably more.

First, recent migrants are more likely to buy houses. I bought a house three years ago. I hadn’t previously bought one in Auckland. I bought it because I had moved to Auckland and wanted somewhere to live. Consistent with this explanation, people with Korean and Indian names, while not over-represented to the same extent, are also more likely to be buying than selling houses, by about the same ratio as Chinese buyers.

Second, it could be that (some subset of) Chinese New Zealanders prefer real estate as an investment to, say, stocks (to an even greater extent than Aucklanders in general).  Third, it could easily be that (some subset of) Chinese New Zealanders have a higher savings rate than other New Zealanders, and so have more money to invest in houses.

Personally, I’d guess that all these explanations are true: that Chinese New Zealanders (on average) buy both homes and investment properties more than other New Zealanders, and that there are foreign property investors of Chinese ethnicity. But that’s a guess: these data don’t tell us — as the Herald explicitly points out.

One of the repeated points I make on StatsChat is that you need to distinguish between what you measured and what you wanted to measure. Using ‘Chinese’ as a surrogate for ‘foreign’ will capture many New Zealanders and miss out on many foreigners.

The misclassifications aren’t just unavoidable bad luck, either. If you have a measure of ‘foreign real estate ownership’ that includes my next-door neighbours and excludes James Cameron, you’re doing it wrong, and in a way that has a long and reprehensible political history.

But on top of that, if there is substantial foreign investment and if it is driving up prices, that’s only because of the artificial restrictions on the supply of Auckland houses. If Auckland could get its consent and zoning right, so that more money meant more homes, foreign investment wouldn’t be a problem for people trying to find somewhere to live. That’s a real problem, and it’s one that lies within the power of governments to solve.

July 9, 2015

Interesting graph of the day

From Matt Levine at Bloomberg


This is a graph of cumulative US stock trades today. The pink circle is centred at 11:32am, when the New York Stock Exchange had technical problems and shut down. Notice how nothing happens: the computers adapt very quickly to having a slightly smaller range of places to trade. As Levine puts it:

“For the most part the system is muddling along, relatively normally,” says a guy, and presumably if you asked a computer it would be even more chill.

June 7, 2015

What does 80% accurate mean?

From Stuff (from the Telegraph)

And the scientists claim they do not even need to carry out a physical examination to predict the risk accurately. Instead, people are questioned about their walking speed, financial situation, previous illnesses, marital status and whether they have had previous illnesses.

Participants can calculate their five-year mortality risk as well as their “Ubble age” – the age at which the average mortality risk in the population is most similar to the estimated risk. Ubble stands for “UK Longevity Explorer” and researchers say the test is 80 per cent accurate.

There are two obvious questions based on this quote: what does it mean for the test to be 80 per cent accurate, and how does “Ubble” stand for “UK Longevity Explorer”? The second question is easier: the data underlying the predictions are from the UK Biobank, so presumably “Ubble” comes from “UK Biobank Longevity Explorer.”

An obvious first guess at the accuracy question would be that the test is 80% right in predicting whether or not you will survive 5 years. That doesn’t fly. First, the test gives a percentage, not a yes/no answer. Second, you can do a lot better than 80% in predicting whether someone will survive 5 years or not just by guessing “yes” for everyone.

The 80% figure doesn’t refer to accuracy in predicting death; it refers to discrimination: the ability to give higher predicted risks to people at higher actual risk. Specifically, it claims that if you pick pairs of UK residents aged 40-70, one of whom dies in the next five years and the other doesn’t, the one who dies will have a higher predicted risk in 80% of pairs.
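That pairwise definition is easy to compute directly. A small sketch with invented risk predictions (not real Ubble output):

```python
# Invented 5-year risk predictions for people who died and who survived.
risk_died = [0.30, 0.12, 0.45, 0.08]
risk_survived = [0.05, 0.10, 0.02, 0.20, 0.07]

concordant = ties = 0
for d in risk_died:
    for s in risk_survived:
        if d > s:
            concordant += 1
        elif d == s:
            ties += 1

# C-statistic: the fraction of (died, survived) pairs in which the person
# who died had the higher predicted risk, counting exact ties as half.
pairs = len(risk_died) * len(risk_survived)
c_statistic = (concordant + 0.5 * ties) / pairs
print(c_statistic)  # 0.85 -- this toy model is "85% accurate" in the Ubble sense
```

This number says nothing about whether the predicted risks are the right size, only about whether they are in the right order.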

So, how does it manage this level of accuracy, and why do simple questions like self-rated health, self-reported walking speed, and car ownership show up instead of weight or cholesterol or blood pressure? Part of the answer is that Ubble is looking only at five-year risk, and only in people under 70. If you’re under 70 and going to die within five years, you’re probably sick already. Asking you about your health or your walking speed turns out to be a good way of finding if you’re sick.

This table from the research paper behind the Ubble shows how well different sorts of information predict.


Age on its own gets you 67% accuracy, and age plus asking about diagnosed serious health conditions (the Charlson score) gets you to 75%. The prediction model does a bit better, presumably because it’s better at picking up the chance of undiagnosed disease. The usual things doctors nag you about, apart from smoking, aren’t in there because they usually take longer than five years to kill you.

As an illustration of the importance of age and basic health in the prediction: if you put in data for a 60-year-old man living with a partner/wife/husband, who smokes but is healthy apart from high blood pressure, the predicted probability of dying within five years is 4.1%.

The result comes with this well-designed graphic using counts out of 100 rather than fractions, and illustrating the randomness inherent in the prediction by scattering the four little red people across the panel.


Back to newspaper issues: the Herald also ran a Telegraph story (a rather worse one), but followed it up with a good repost from The Conversation by two of the researchers. None of these stories mentioned that the predictions will be less accurate for New Zealand users. That’s partly because the predictive model is calibrated to life expectancy, general health positivity/negativity, walking speeds, car ownership, and diagnostic patterns in Brits. It’s also because there are three questions on UK government disability support, which in our case we have not got.


May 30, 2015

Coffee health limit exaggerated

The Herald says

Drinking the caffeine equivalent of more than four espressos a day is harmful to health, especially for minors and pregnant women, the European Union food safety agency has said.

“It is the first time that the risks from caffeine from all dietary sources have been assessed at EU level,” the EFSA said, recommending that an adult’s daily caffeine intake remain below 400mg a day.

Deciding a recommended limit was a request of the European Commission, the EU’s executive body, to try to find a Europe-wide benchmark for caffeine consumption.

But regulators said the most worrying aspect was not the espressos and lattes consumed on cafe terraces across Europe, but Red Bull-style energy drinks, hugely popular with the young.

Contrast that with the Scientific Opinion on the safety of caffeine from the EFSA Panel on Dietetic Products, Nutrition, and Allergies (PDF of the whole thing). First, what they were asked for

the EFSA Panel … was asked to deliver a scientific opinion on the safety of caffeine. Advice should be provided on a daily intake of caffeine, from all sources, that does not give rise to concerns about harmful effects to health for the general population and for specific subgroups of the population. Possible interactions between caffeine and other constituents of so-called “energy drinks”, alcohol, synephrine and physical exercise should also be addressed.

and what they concluded (there’s more than 100 pages extra detail if you want it)

Single doses of caffeine up to 200 mg, corresponding to about 3 mg/kg bw for a 70-kg adult are unlikely to induce clinically relevant changes in blood pressure, myocardial blood flow, hydration status or body temperature, to reduce perceived exertion/effort during exercise or to mask the subjective perception of alcohol intoxication. Daily caffeine intakes from all sources up to 400 mg per day do not raise safety concerns for adults in the general population, except pregnant women. Other common constituents of “energy drinks” (i.e. taurine, D-glucurono-γ-lactone) or alcohol are unlikely to adversely interact with caffeine. The short- and long-term effects of co-consumption of caffeine and synephrine on the cardiovascular system have not been adequately investigated in humans. Daily caffeine intakes from all sources up to 200 mg per day by pregnant women do not raise safety concerns for the fetus. For children and adolescents, the information available is insufficient to base a safe level of caffeine intake. The Panel considers that caffeine intakes of no concern derived for acute consumption in adults (3 mg/kg bw per day) may serve as a basis to derive daily caffeine intakes of no concern for children and adolescents.

Or, in an even shorter paraphrase:

<shrugs> If you need a safe level, four cups a day seems pretty harmless in healthy people, and there doesn’t seem to be a special reason to worry about teenagers.