Posts filed under Risk (176)

September 18, 2015

Compared to what? (transport chaos edition)

A while back, it looked as though the negotiations between NZ Bus and its drivers would break down and we would have bus strikes in Auckland. I considered various contingency plans: working from home for all or part of a day, taking a train to Newmarket or Britomart and walking to the University, cycling, or catching a ride with a colleague who lives nearby. Some of these were options because we would have a week or so of warning before the strike.

If public transport in Auckland became permanently bad — if it went back to its state 20 years ago — I would have different options. I probably wouldn’t live in a house in Onehunga; I’d live in an apartment near the city centre. Moving to the city centre wouldn’t be a sensible response to a single day’s stoppage, but it would be sensible if the lack of buses was permanent.

Transport Blog has a post about the congestion benefits of the Wellington rail system, based on the week in June 2013 that it was taken out by a storm. On weekdays during this period, about 4000 people who would normally take the train into Wellington couldn’t. The roads became much more congested, and these delays can be valued (using plausible-looking assumptions) as worth over $5 million. Scaling this up to a full working year, the benefit to drivers in reduced driving time is worth rather a lot more than the public subsidy to the entire Wellington public transit system.

There’s a problem with simply scaling up the costs. If the Hutt Valley train line didn’t exist, some of those 4000 people would either live somewhere else or work somewhere else. Driving for an extra two hours each way was a rational response by them to a short-term outage, but in the long term they would reorganise their lives to not do it.

Now, there’s obviously a cost to moving from the Hutt to Wellington for these people — otherwise they’d be living in Wellington already — but the cost is less than would be estimated from the travel time during the outage. It’s hard to tell how much less without a lot more data and modelling.

On the other hand, while the storm data almost certainly overestimate the congestion-cost benefits of the train line, the magnitude of the estimated benefit is so large that the conclusion could quite easily hold even with better estimates.

September 9, 2015

Assessing popular opinion

One of the important roles played by good-quality opinion polls before an election is getting people’s expectations right. It’s easy to believe that the opinions you hear every day are representative, but for a lot of people they won’t be. For example, here are the percentages for the National Party for each polling place in Auckland Central in the 2014 election. The curves show the margin of error around the overall vote for the electorate, which in this case wasn’t far from the overall vote for the whole country.


For lots of people in Auckland Central, their neighbours vote differently than the electorate as a whole.  You could do this for the whole country, especially if the data were in a more convenient form, and it would be more dramatic.

Pauline Kael, the famous New York movie critic, mentioned this issue in a talk to the Modern Language Association:

“I live in a rather special world. I only know one person who voted for Nixon. Where they are I don’t know. They’re outside my ken. But sometimes when I’m in a theater I can feel them.”

She’s usually misquoted in a way that reverses her meaning, but still illustrates the point.

It’s hard to get hold of popular opinion just from what you happen to come across in ordinary life, but there are some useful strategies. For example, on the flag question

  • How many people do you personally know in real life who had expressed a preference for one of the Lockwood fern flags and now prefer Red Peak?
  • How many people do you follow on Twitter (or friend on Facebook, or whatever on WhatsApp) who had expressed a preference for one of the Lockwood fern flags and now prefer Red Peak?

For me, the answer to both of these is “No-one”: the Red Peak enthusiasts that I know aren’t Lockwood converts. I know of some people who have changed their preferences that way — I heard because of my last StatsChat post — but I have no idea what the relevant denominator is.

The petition is currently just under 34,000 votes, having slowed down in the past day or so. I don’t see how Red Peak could have close to a million supporters.  More importantly, anyone who knows that it does must have important evidence they aren’t sharing. If the groundswell is genuinely this strong, it should be possible to come up with a few thousand dollars to get at least a cheap panel survey and demonstrate the level of support.

I don’t want to go too far in being negative. Enthusiasm for this option definitely goes beyond disaffected left-wing twitterati — it’s not just Red pique — but changing the final four at this point really should require some reason to believe the new flag could win. I don’t see it.

Opinion is still evolving, and maybe this time we’ll keep the Australia-lite flag and the country will support something I like next time.


August 30, 2015

Genetically targeted cancer treatment

Targeting cancer treatments to specific genetic variants has certainly had successes with common mutations; the best-known example must be Herceptin for an important subset of breast cancer. Reasonably affordable genetic sequencing has the potential for finding specific, uncommon mutations in cancers where there isn’t a standard, approved drug.

Most good ideas in medicine don’t work, of course, so it’s important to see if this genetic sequencing really helps, and how much it costs.  Ideally this would be in a randomised trial where patients are randomised to the best standard treatment or to genetically-targeted treatment. What we have so far is a comparison of disease progress for genetically-targeted treatment compared to a matched set of patients from the same clinic in previous years.  Here’s a press release, and two abstracts from a scientific conference.

In 72 out of 243 patients whose disease had progressed despite standard treatment, the researchers found a mutation that suggested the patient would benefit from some drug they wouldn’t normally have got. The median time until these patients started getting worse again was 23 weeks; in the historical patients it was 12 weeks.

The Boston Globe has an interesting story talking to researchers and a patient (though it gets some of the details wrong).  The patient they interview had melanoma and got a drug approved for melanoma patients but only those with one specific mutation (since that’s where the drug was tested). Presumably, though the story doesn’t say, he had a different mutation in the same gene — that’s where the largest benefit of sequencing is likely to be.

An increase from 12 to 23 weeks isn’t terribly impressive, and it came at a cost of US$32000 — the abstract and press release say there wasn’t a cost increase, but that’s because they looked at cost per week, not total cost.  It’s not nothing, though; it’s probably large enough that a clinical trial makes sense and small enough that a trial is still ethical and feasible.

The Boston Globe story is one of the first products of their new health-and-medicine initiative, called “Stat”. That’s not short for “statistics”; it’s the medical slang meaning “right now”, from the Latin statim.

August 22, 2015

Changing who you count

The New York Times has a well-deserved reputation for data journalism, but anyone can have a bad day. There’s a piece by Steven Johnson on the non-extinction of the music industry, which I think makes some good points but which the Future of Music Coalition doesn’t like at all. And they also have some good points.

In particular, Johnson says

“According to the OES, in 1999 there were nearly 53,000 Americans who considered their primary occupation to be that of a musician, a music director or a composer; in 2014 more than 60,000 people were employed writing, singing, or playing music. That’s a rise of 15 percent.”


He’s right. This is a graph (not that you really need one)


The Future of Music Coalition give the numbers for each year, and they’re interesting. Here’s a graph of the totals:


There isn’t a simple increase; there’s a weird two-humped pattern. Why?

Well, if you look at the two categories, “Music Directors and Composers” and “Musicians and Singers”, making up the total, it’s quite revealing


The larger category, “Musicians and Singers”, has been declining.  The smaller category, “Music Directors and Composers” was going up slowly, then had a dramatic three-year, straight-line increase, then decreased a bit.

Going into the Technical Notes for the estimates (eg, 2009), we see:

May 2009 estimates are based on responses from six semiannual panels collected over a 3-year period

That means the three-year increase of 5000 jobs/year is probably a one-off increase of 15,000 jobs. Either the number of “Music Directors and Composers” more than doubled in 2009, or more likely there was a change in definitions or sampling approach.  The Future of Music Coalition point out that Bureau of Labor Statistics FAQs say this is a problem (though they’ve got the wrong link: it’s here, question F.1)

Challenges in using OES data as a time series include changes in the occupational, industrial, and geographical classification systems

In particular, the 2008 statistics estimate only 390 of these people as being employed in primary and secondary schools; the 2009 estimate is 6000, and the 2011 estimate is 16880. A lot of primary and secondary school teachers got reclassified into this group; it wasn’t a real increase.
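To see why a one-off reclassification would show up as a three-year, straight-line increase, here is a minimal sketch. It assumes the published figure for each year is a simple equal-weight average of that year and the two before it. The real OES procedure pools six weighted semiannual panels, so this is only an approximation, and the baseline of 10,000 is a made-up number.

```python
def pooled_estimates(true_counts, window=3):
    """Average each year's true count with the previous (window - 1) years."""
    estimates = []
    for i in range(window - 1, len(true_counts)):
        recent = true_counts[i - window + 1 : i + 1]
        estimates.append(sum(recent) / window)
    return estimates


# Hypothetical true counts: flat, then a one-off jump of 15,000 that persists.
true_counts = [10_000] * 4 + [25_000] * 4

estimates = pooled_estimates(true_counts)
year_on_year = [later - earlier for earlier, later in zip(estimates, estimates[1:])]

print(estimates)
print(year_on_year)
```

Under these assumptions the single jump of 15,000 appears in the pooled series as three consecutive increases of 5000, which is the pattern in the “Music Directors and Composers” numbers.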

When the school teachers are kept out of “Music Directors and Composers”, to get better comparability across years, the change is from 53000 in 1999 to 47000 in 2014. That’s not a 15% increase; it’s an 11% decrease.
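The adjusted comparison is just a percentage change on the rounded totals quoted above. A quick check (the function name is mine, for illustration only):

```python
def percent_change(old, new):
    """Percentage change from old to new; negative means a decrease."""
    return 100 * (new - old) / old


# Rounded totals from the text, with the reclassified school staff left out.
adjusted = percent_change(53_000, 47_000)
print(f"{adjusted:.1f}%")
```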

Official statistics agencies try not to change their definitions, precisely because of this problem, but they do have to keep up with a changing world. In the other direction, I wrote about a failure to change definitions that led the US Census Bureau to report four times as many pre-schoolers were cared for by fathers vs mothers.

August 5, 2015

What does 90% accuracy mean?

There was a lot of coverage yesterday about a potential new test for pancreatic cancer. 3News covered it, as did One News (but I don’t have a link). There’s a detailed report in the Guardian, which starts out:

A simple urine test that could help detect early-stage pancreatic cancer, potentially saving hundreds of lives, has been developed by scientists.

Researchers say they have identified three proteins which give an early warning of the disease, with more than 90% accuracy.

This is progress; pancreatic cancer is one of the diseases where there genuinely is a good prospect that early detection could improve treatment. The 90% accuracy, though, doesn’t mean what you probably think it means.

Here’s a graph showing how the error rate of the test changes with the numerical threshold used for diagnosis (figure 4, panel B, from the research paper)


As you move from left to right the threshold decreases; the test is more sensitive (picks up more of the true cases), but less specific (diagnoses more people who really don’t have cancer). The area under this curve is a simple summary of test accuracy, and that’s where the 90% number came from.  At what the researchers decided was the optimal threshold, the test correctly reported 82% of early-stage pancreatic cancers, but falsely reported a positive result in 11% of healthy subjects.  These figures are from the set of people whose data was used in putting the test together; in a new set of people (“validation dataset”) the error rate was very slightly worse.

The research was done with an approximately equal number of healthy people and people with early-stage pancreatic cancer. They did it that way because that gives the most information about the test for a given number of people.  It’s reasonable to hope that the area under the curve, and the sensitivity and specificity of the test will be the same in the general population. Even so, the accuracy (in the non-technical meaning of the word) won’t be.

When you give this test to people in the general population, nearly all of them will not have pancreatic cancer. I don’t have NZ data, but in the UK the current annual rate of new cases goes from 4 people out of 100,000 at age 40 to 100 out of 100,000 people 85+.   The average over all ages is 13 cases per 100,000 people per year.

If 100,000 people are given the test and 13 have early-stage pancreatic cancer, about 10 or 11 of the 13 cases will have positive tests, but so will 11,000 healthy people.  Of those who test positive, 99.9% will not have pancreatic cancer.  This might still be useful, but it’s not what most people would think of as 90% accuracy.
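The arithmetic behind those numbers can be sketched in a few lines. The inputs are the figures quoted above (82% sensitivity, 11% false positives, 13 cases per 100,000). Treating the annual rate of new cases as the number of people with early-stage disease at the time of testing is a simplifying assumption, and the function name is hypothetical.

```python
def screening_outcomes(population, cases, sensitivity, false_positive_rate):
    """Expected true positives, false positives, and positive predictive value."""
    healthy = population - cases
    true_pos = sensitivity * cases
    false_pos = false_positive_rate * healthy
    ppv = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, ppv


true_pos, false_pos, ppv = screening_outcomes(
    population=100_000, cases=13, sensitivity=0.82, false_positive_rate=0.11
)

print(f"true positives:  {true_pos:.1f}")
print(f"false positives: {false_pos:.0f}")
print(f"share of positive tests without cancer: {1 - ppv:.1%}")
```

The sensitivity and false-positive rate are properties of the test; the share of positives that are wrong depends just as much on how rare the disease is in the people being tested.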


July 28, 2015

Recreational genotyping: potentially creepy?

Two stories from this morning’s Twitter (via @kristinhenry)

  • 23andMe has made available a programming interface (API) so that you can access and integrate your genetic information using apps written by other people.  Someone wrote and published code that could be used to screen users based on sex and ancestry. (Buzzfeed, FastCompany). It’s not a real threat, since apps with more than 20 users need to be reviewed by 23andMe, and since users have to agree to let the code use their data, and since Facebook knows far more about you than 23andMe, but it’s not a good look.
  • Google’s Calico project also does cheap public genotyping and is combining their DNA data (more than a million people) with family trees from This is how genetic research used to be done: since we know how DNA is inherited, connecting people with family trees deep into the past provides a lot of extra information. On the other hand, it means that if a few distantly-related people sign up for Calico genotyping, Google will learn a lot about the genomes of all their relatives.

It’s too early to tell whether the people who worry about this sort of thing will end up looking prophetic or just paranoid.

July 24, 2015

Are beneficiaries increasingly failing drug tests?

Stuff’s headline is “Beneficiaries increasingly failing drug tests, numbers show”.

The numbers are rates per week of people failing or refusing drug tests. The number was 1.8/week for the first 12 weeks of the policy and 2.6/week for the whole year 2014, and, yes, 2.6 is bigger than 1.8.  However, we don’t know how many tests were performed or demanded, so we don’t know how much of this might be an increase in testing.

In addition, if we don’t worry about the rate of testing and take the numbers at face value, the difference is well within what you’d expect from random variation, so while the numbers are higher it would be unwise to draw any policy conclusions from the difference.
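One way to check the “random variation” claim is an exact conditional comparison of two Poisson rates. This is a rough sketch, not necessarily the calculation behind the sentence above. The counts are my reconstruction from the weekly rates: about 22 failures in the first 12 weeks (1.8 × 12) and about 134 in the 52 weeks of 2014 (2.6 × 52). If the underlying rate were the same in both periods, then given the total, the first-period count would be binomial with probability 12/(12 + 52).

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def rate_comparison_p_value(count_a, weeks_a, count_b, weeks_b):
    """Two-sided exact p-value for equal event rates in two periods.

    Sums the probabilities of all splits of the total that are no more
    likely than the observed split, assuming a common rate.
    """
    n = count_a + count_b
    p = weeks_a / (weeks_a + weeks_b)
    observed = binom_pmf(count_a, n, p)
    total = 0.0
    for k in range(n + 1):
        prob = binom_pmf(k, n, p)
        if prob <= observed * (1 + 1e-9):  # small tolerance for float ties
            total += prob
    return total


# Assumed counts, reconstructed from the quoted rates (see lead-in).
p_value = rate_comparison_p_value(count_a=22, weeks_a=12, count_b=134, weeks_b=52)
print(f"p-value: {p_value:.2f}")
```

With these assumed counts the p-value should come out well above the conventional 0.05, which is what “well within what you’d expect from random variation” means in practice.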

On the other hand, the absolute numbers of failures are very low when compared to the estimates in the Treasury’s Regulatory Impact Statement.

MSD and MoH have estimated that once this policy is fully implemented, it may result in:

• 2,900 – 5,800 beneficiaries being sanctioned for a first failure over a 12 month period

• 1,000 – 1,900 beneficiaries being sanctioned for a second failure over a 12 month period

• 500 – 1,100 beneficiaries being sanctioned for a third failure over a 12 month period.

The numbers quoted by Stuff are 60 sanctions in total over eighteen months, and 134 test failures over twelve months.  The Minister is quoted as saying the low numbers show the program is working, but as she could have said the same thing about numbers that looked like the predictions, or numbers that were higher than the predictions, it’s also possible that being off by an order of magnitude or two is a sign of a problem.


July 22, 2015

Are reusable shopping bags deadly?

There’s a research report by two economists arguing that San Francisco’s ban on plastic shopping bags has led to a nearly 50% increase in deaths from foodborne disease, an increase of about 5.5 deaths per year.  I was asked my opinion on Twitter. I don’t believe it.

What the analysis does show is some evidence that emergency room visits for foodborne disease have increased: the researchers analysed admissions for E. coli, Salmonella, and Campylobacter infection, and found an increase in San Francisco but not in neighbouring counties. There’s a statistical issue in that the number of counties is small and the standard error estimates tend to be a bit unreliable in that setting, but that’s not prohibitive. There’s also a statistical issue in that we don’t know which (if any) infections were related to contamination of raw food, but again that’s not prohibitive.

The problem with the analysis of deaths is the definition: the deaths in the analysis were actually all of the ICD10 codes A00-A09. Most of this isn’t foodborne bacterial disease, and a lot of the deaths from foodborne bacterial disease will be in settings where shopping bags are irrelevant. In particular, two important contributors are

  • Clostridium difficile infections after antibiotic use, which have a fairly high mortality rate
  • Diarrhoea in very frail elderly people, in residential aged care or nursing homes.

In the first case, this has nothing to do with food. In the second case, it’s often person-to-person transmission (with norovirus a leading cause), but even if it is from food, the food isn’t carried in reusable shopping bags.

Tomás Aragón, of the San Francisco Department of Public Health, has a more detailed breakdown of the death data than was available to the researchers. I think his memo is too negative on the statistical issues, but the data underlying the A00-A09 categories are pretty convincing:


Category A021 is Salmonella (other than typhoid); A048 and A049 are other miscellaneous bacterial infections; A081 and A084 are viral. A090 and A099 are left-over categories that are supposed to exclude foodborne disease but will capture some cases where the mechanism of infection wasn’t known.  A047 is Clostridium difficile.   The apparent signal is in the wrong place. It’s not obvious why the statistical analysis thinks it has found evidence of an effect of the plastic-bag ban, but it is obvious that it hasn’t.

Here, for comparison, are New Zealand mortality data for specific foodborne infections, from the most recent years available


Over the three years, there were only ten deaths where the underlying cause was one of these food-borne illnesses — a lot of people get sick, but very few die.


The mortality data don’t invalidate the analysis of hospital admissions, where there’s a lot more information and it is actually about (potentially) foodborne diseases.  More data from other cities (especially ones that are less atypical than San Francisco) would be helpful here, and it’s possible that this is a real effect of reusing bags. The economic analysis, however, relies heavily on the social costs of deaths.

July 11, 2015

What’s in a name?

The Herald was, unsurprisingly, unable to resist the temptation of leaked data on house purchases in Auckland.  The basic points are:

  • Data on the names of buyers for one agency, representing 45% of the market, for three months
  • Based on the names, an estimate that nearly 40% of the buyers were of Chinese ethnicity
  • This is more than the proportion of people of Chinese ethnicity in Auckland
  • Oh Noes! Foreign speculators! (or Oh Noes! Foreign investors!)

So, how much of this is supported by the various data?

First, the surnames.  This should be accurate for overall proportions of Chinese vs non-Chinese ethnicity if it was done carefully. The vast majority of people called, say, “Smith” will not be Chinese; the vast majority of people called, say, “Xu” will be Chinese; people called “Lee” will split in some fairly predictable proportion.  The same is probably true for, say, South Asian names, but Māori vs non-Māori would be less reliable.

So, we have fairly good evidence that people of Chinese ancestry are over-represented as buyers from this particular agency, compared to the Auckland population.

Second: the representativeness of the agency. It would not be at all surprising if migrants, especially those whose first language isn’t English, used real estate agents more than people born in NZ. It also wouldn’t be surprising if they were more likely to use some agencies than others. However, the claim is that these data represent 45% of home sales. If that’s true, people with Chinese names are over-represented compared to the Auckland population no matter how unrepresentative this agency is. Even if every Chinese buyer used this agency, the proportion among all buyers would still be about 18% (nearly 40% of 45%).
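The worst-case bound is a one-line calculation. With the rounded figures from the list above (45% of the market, nearly 40% of that agency’s buyers), it works out at roughly 18%. A sketch, with a function name of my own choosing:

```python
def minimum_overall_share(agency_market_share, share_within_agency):
    """Lower bound on a group's share of all buyers.

    Worst case: nobody in the group bought through any other agency,
    so the group's buyers at this agency are all there are.
    """
    return agency_market_share * share_within_agency


lower_bound = minimum_overall_share(agency_market_share=0.45, share_within_agency=0.40)
print(f"at least {lower_bound:.0%} of all buyers")
```

Any buyers with Chinese names at other agencies only push the true figure up from this floor.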

So, there is fairly good evidence that people of Chinese ethnicity are buying houses in Auckland at a higher rate than their proportion of the population.

The Labour claim extends this by saying that many of the buyers must be foreign. The data say nothing one way or the other about this, and it’s not obvious that it’s true. More precisely, since the existence of foreign investors is not really in doubt, it’s not obvious how far it’s true. The simple numbers don’t imply much, because relatively few people are housing buyers: for example, house buyers named “Wang” in the data set are less than 4% of Auckland residents named “Wang.” There are at least three other competing explanations, and probably more.

First, recent migrants are more likely to buy houses. I bought a house three years ago. I hadn’t previously bought one in Auckland. I bought it because I had moved to Auckland and I wanted somewhere to live. Consistent with this explanation, people with Korean and Indian names, while not over-represented to the same extent, are also more likely to be buying than selling houses, by about the same ratio as Chinese.

Second, it could be that (some subset of) Chinese New Zealanders prefer real estate as an investment to, say, stocks (to an even greater extent than Aucklanders in general).  Third, it could easily be that (some subset of) Chinese New Zealanders have a higher savings rate than other New Zealanders, and so have more money to invest in houses.

Personally, I’d guess that all these explanations are true: that Chinese New Zealanders (on average) buy both homes and investment properties more than other New Zealanders, and that there are foreign property investors of Chinese ethnicity. But that’s a guess: these data don’t tell us — as the Herald explicitly points out.

One of the repeated points I make on StatsChat is that you need to distinguish between what you measured and what you wanted to measure.  Using ‘Chinese’ as a surrogate for ‘foreign’ will capture many New Zealanders and miss out on many foreigners.

The misclassifications aren’t just unavoidable bad luck, either. If you have a measure of ‘foreign real estate ownership’ that includes my next-door neighbours and excludes James Cameron, you’re doing it wrong, and in a way that has a long and reprehensible political history.

But on top of that, if there is substantial foreign investment and if it is driving up prices, that’s only because of the artificial restrictions on the supply of Auckland houses. If Auckland could get its consent and zoning right, so that more money meant more homes, foreign investment wouldn’t be a problem for people trying to find somewhere to live. That’s a real problem, and it’s one that lies within the power of governments to solve.

July 9, 2015

Interesting graph of the day

From Matt Levine at Bloomberg


This is a graph of cumulative US stock trades today. The pink circle is centred at 11:32am, when the New York Stock Exchange had technical problems and shut down. Notice how nothing happens: the computers adapt very quickly to having a slightly smaller range of places to trade. As Levine puts it:

“For the most part the system is muddling along, relatively normally,” says a guy, and presumably if you asked a computer it would be even more chill.