Posts filed under Risk (216)

January 8, 2018

Not dropping every year

Stuff has a story on road deaths, where Julie Ann Genter claims the Roads of National Significance are partly responsible for the increase in death rates. Unsurprisingly, Judith Collins disagrees.  The story goes on to say (it’s not clear if this is supposed to be indirect quotation from Judith Collins)

From a purely statistical viewpoint the road toll is lowering – for every 10,000 cars on the road, the number of deaths is dropping every year.

From a purely statistical viewpoint, this doesn’t seem to be true. The Ministry of Transport provides tables that show a rate of fatalities per 10,000 registered vehicles of 0.077 in 2013, 0.086 in 2014,  0.091 in 2015, and  0.090 in 2016. Here’s a graph, first raw

and now with a fitted trend (on a log scale, since the trend is straighter that way)

Now, it’s possible there’s some other way of defining the rate that doesn’t show it going up each year. And there’s a question of random variation as always. But if you scale for vehicles actually on the road, by using total distance travelled, we saw last year that there’s pretty convincing evidence of an increase in the underlying rate, over and above random variation.

The story goes on to say “But Genter is not buying into the statistics.” If she’s planning to make the roads safer, I hope that isn’t true.

November 23, 2017

More complicated than that

Science Daily

Computerized brain-training is now the first intervention of any kind to reduce the risk of dementia among older adults.

Daily Telegraph

Pensioners can reduce their risk of dementia by nearly a third by playing a computer brain training game similar to a driving hazard perception test, a new study suggests.

Ars Technica

Speed of processing training turned out to be the big winner. After ten years, participants in this group—and only this group—had reduced rates of dementia compared to the controls

The research paper is here, and the abstract does indeed say “Speed training resulted in reduced risk of dementia compared to control, but memory and reasoning training did not”

They’re overselling it a bit. First, these are intervals showing the ratios of number of cases with and without the three types of treatment, including the uncertainty


Summarising this as “speed training works but the other two don’t” is misleading.  There’s pretty marginal evidence that speed training is beneficial and even less evidence that it’s better than the other two.

On top of that, the results are for less than half the originally-enrolled participants, the ‘dementia’ they’re measuring isn’t a standard clinical definition, and this is a study whose 10-year follow-up ended in 2010 and that had a lot of ‘primary outcomes’ it was looking for — which didn’t include the one in this paper.

The study originally expected to see positive results after two years. It didn’t. Again, after five years, the study reported “Cognitive training did not affect rates of incident dementia after 5 years of follow-up.”  Ten-year results reported in 2014, showed relatively modest differences in people’s ability to take care of themselves, as Hilda Bastian commented.

So. This specific type of brain training might actually help. Or one of the other sorts of brain training they tried might help. Or, quite possibly, none of them might help.  On the other hand, these are relatively unlikely to be harmful, and maybe someone will produce an inexpensive app or something.

October 30, 2017

Past results do not imply future performance


A rugby team that has won a lot of games this year is likely to do fairly well next year: they’re probably a good team.  Someone who has won a lot of money betting on rugby this year is much less likely to keep doing well: there was probably luck involved. Someone who won a lot of money on Lotto this year is almost certain to do worse next year: we can be pretty sure the wins were just luck. How about mutual funds and the stock market?

Morningstar publishes ratings of mutual funds, with one to five stars based on past performance. The Wall Street Journal published an article saying (a) investors believe these are predictive of future performance and (b) they’re wrong.  Morningstar then fought back, saying (a) we tell them it’s based on past performance, not a prediction and (b) it is, too, predictive. And, surprisingly, it is.

Matt Levine (of Bloomberg; annoying free registration) and his readers had an interesting explanation (scroll way down)

Several readers, though, proposed an explanation. Morningstar rates funds based on net-of-fee performance, and takes into account sales loads. And fees are predictive. Funds that were good at picking stocks in the past will, on average, be average at picking stocks in the future; funds that were bad at picking stocks in the past will, on average, be average at picking stocks in the future; that is in the nature of stock picking. But funds with low fees in the past will probably have low fees in the future, and funds with high fees in the past will probably have high fees in the future. And since net performance is made up of (1) stock picking minus (2) fees, you’d expect funds with low fees to have, on average, persistent slightly-better-than-average performance.

That’s supported by one of Morningstar’s own reports.

The expense ratio and the star rating helped investors make better decisions. The star rating and expense ratios were pretty even on the success ratio–the closest thing to a bottom line. By and large, the star ratings from 2005 and 2008 beat expense ratios while expense ratios produced the best success ratios in 2006 and 2007. Overall, expense ratios outdid stars in 23 out of 40 (58%) observations.

A better data analysis for our purposes would look at star ratings for different funds matched on fees, rather than looking at the two separately.  It’s still a neat example of how you need to focus on the right outcome measurement. Mutual fund trading performance may not be usefully predictable, but even if it isn’t, mutual fund returns to the customer are, at least a little bit.


October 23, 2017

Questions to ask

There’s a story in a lot of the British media (via Robin Evans on Twitter) about a plan to raise speed limits near highway roadworks. The speed limit is currently 50mph and the proposal is to raise it to 55mph or 60mph.

Obviously this is an significant issue, with potential safety and travel time consequences.  And Highways England did some research. This is the key part of the description in the stories (presumably from a press release that isn’t yet on the Highways England website)

More than 36 participants took part in each trial and were provided with dashcams and watches incorporating heart-rate monitors and GPS trackers to measure their reactions.

The tests took place at 60mph on the M5 between junction 4a (Bromsgrove) to 6 (Worcester) and at 55mph on the M3 in Surrey between junction 3 and 4a.

According to Highways England 60% of participants recorded a decrease in average heart rate in the 60mph trial zone and 56% presented a decrease on the 55mph trial.

That’s a bit light on detail — how many more than 36; does 60% decrease mean 40% increase; are they saying that the 4 percentage point difference between 55 and 60mph is enough to matter or not enough to matter?

More importantly, though, why is a heart rate decrease in drivers even the question?  I’m not saying it can’t be. Maybe there’s some good reason why it’s reliable information about safety, but if there is the journalists didn’t think to ask about it.

A few stories, such as the one in the Mirror, had a little bit more

“Increasing the speed limit to 60mph where appropriate also enables motorists who feel threatened by the close proximity of HGVs in roadworks to free themselves.”

Even so, is this a finding of the research (why motorists felt safer, or even that they felt safer)? Is it a conclusion from the heart rate monitors? Is it from asking the drivers? Is it just a hypothetical explanation pulled out of the air?

If you’re going to make a scientific-sounding measurement the foundation of this story, you need to explain why it answers some real question. And linking to more information would, as usual, be nice.

October 13, 2017

Road deaths up

Sam Warburton (the economist, not the rugby player) has been writing about the recent increase in road deaths. Here are the counts (with partial 2017 data)


The first question you should ask is whether this is explained by population increases or by driving increases. That is, we want rates — deaths per unit of distance travelled


There’s still an increase, but now the 2017 partial data are in line with the increase. The increase cannot be explained simply by more cars being on the roads.

The next question is about uncertainty.  Traditionally, news stories about the road toll were based on one month of data and random variation could explain it all. We still need a model for how much random variation to expect.  What I said before was

The simplest mathematical model for counts is the Poisson process.  If dying in a car crash is independent for any two people in NZ, and the chance is small for any person (but not necessarily the same for different people) then number of deaths over any specified time period will follow a Poisson distribution.    The model cannot be exactly right — multiple fatalities would be much rarer if it were — but it is a good approximation, and any more detailed model would lead to more random variation in the road toll than the Poisson process does.

In that case I was arguing that there wasn’t any real evidence of a change, so using an underestimate of the random variation made my case harder. In this case I’m arguing the change is larger than random variation, so I need to make sure I don’t underestimate random variation.

What I did was fit a Bayesian model with two extra random components.  The first was the trend over time. To avoid making assumptions about the shape of the trend I just assumed that the difference between adjacent years was relatively small and random. The second random component was a difference between the trend value for a year and the ‘true’ rate for that year. On top of all of that, there’s Poisson variation.  Since the size of the two additional random components is estimated from the data, they will capture all the variation.


For each year, there is a 50% probability that the underlying rate is in the darker blue interval, and a 95% probability it’s in the light blue interval.  The trend is smoother than the data because the data has both the Poisson variation and the extra year-specific deviation. There’s more uncertainty in 2001 because we didn’t use pre-2001 data to tie it down at all, but that won’t affect the later half of the time period much.

It looks from the graph as though there was a minimum in 2013-14 and an increased rate since then.  One of the nice things about these Bayesian models is that you can easily and meaningfully ask for the probability that each year was the minimum. The probability is 54% for 2013 and 27% for 2014: there really was a minimum around then.

The probability that the rate is higher in 2017 than in 2013 is over 90%. This one isn’t just random variation, and it isn’t population increase.


Update: Peter Ellis, who has more experience with NZ official statistics and with Bayesian state-space time series models, gets qualitatively similar results

September 27, 2017

Stat Soc of Australia on Marriage Survey

The Statistical Society of Australia has put out a press release on the Australian Marriage Law Postal Survey.  Their concern, in summary, is that if this is supposed to be a survey rather than a vote, the Government has required a pretty crap survey and this isn’t good.

The SSA is concerned that, as a result, the correct interpretation of the Survey results will be missed or ignored by some community groups, who may interpret the resulting proportion for or against same-sex marriage as representative of the opinion of all Australians. This may subsequently, and erroneously, damage the reputation of the ABS and the statistical community as a whole, when it is realised that the Survey results can not be understood in these terms.


The SSA is not aware of any official statistics based purely on unadjusted respondent data alone. The ABS routinely adjusts population numbers derived from the census to allow for under and over enumeration issues via its post-enumeration survey. However, under the Government direction, there is there no scope to adjust for demographic biases or collect any information that might enable the ABS to even indicate what these biases might be.

If the aim was to understand the views of all Australians, an opinion survey would be more appropriate. High quality professionally-designed opinion surveys are routinely carried out by market research companies, the ABS, and other institutions. Surveys can be an efficient and powerful tool for canvassing a population, making use of statistical techniques to ensure that the results are proportioned according to the demographics of the population. With a proper survey design and analysis, public opinion can be reliably estimated to a specified accuracy. They can also be implemented at a fraction of the cost of the present Postal Survey. The ABS has a world-class reputation and expertise in this area.

(They’re not actually saying this is the most important deficiency of the process, just that it’s the most statistical one)

September 10, 2017

Should there be an app for that?

As you may have heard, researchers at Stanford have tried to train a neural network to predict sexual orientation from photos. Here’s the Guardian‘s story.

Artificial intelligence can accurately guess whether people are gay or straight based on photos of their faces, according to new research that suggests machines can have significantly better “gaydar” than humans.

There are a few questions this should raise.  Is it really better? Compared to whose gaydar? And WTF would think this was a good idea?

As one comment on the study says

Finally, the predictability of sexual orientation could have serious and even life-threatening implications to gay men and women and the society asa whole. In some cultures, gay men and women still suffer physical and psychological abuse at the hands of governments, neighbors, and even their own families.

No, I lied. That’s actually a quote from the research paper (here). The researchers say this sort of research is ethical and important because people don’t worry enough about their privacy. Which is a point of view.

So, you might wonder about the details.

The data came from a dating website, using self-identified gender for the photo combined with the gender they were interested in dating to work out sexual orientation. That’s going to be pretty accurate (at least if you don’t care how bisexual people are classified, which they don’t seem to). It’s also pretty obvious that the pictures weren’t put up for the purpose of AI research.

The Guardian story says

 a computer algorithm could correctly distinguish between gay and straight men 81% of the time, and 74% for women

which is true, but is a fairly misleading summary of accuracy.  Presented with a pair of faces, one of which was gay and one wasn’t, that’s how accurate the computer was.  In terms of overall error rate, you can do better that 81% or 74% just by assuming everyone is straight, and the increase in prediction accuracy in random people over the human judgment is pretty small.

More importantly, these are photos from dating profiles. You’d expect dating profile photos to give more hints about sexual orientation than, say, passport photos, or CCTV stills.  That’s what they’re for.  The researchers tried to get around this, but they were limited by the mysterious absence of large databases of non-dating photos classified by sexual orientation.

The other question you might have is about the less-accurate human ratings.  These were done using Amazon’s Mechanical Turk.  So, a typical Mechanical Turk worker, presented only with a single pair of still photos, does do a bit worse than a neural network.  That’s basically what you’d expect with the current levels of still image classification: algorithms can do better than people who aren’t particularly good and who don’t get any particular training.  But anyone who thinks that’s evidence of significantly better gaydar than humans in a meaningful sense must have pretty limited experience of social interaction cues. Or have some reason to want the accuracy of their predictions overstated.

The research paper concludes

The postprivacy world will be a much safer and hospitable place if inhabited by well-educated, tolerant people who are dedicated to equal rights.

That’s hard to argue with. It’s less clear that normalising the automated invasion of privacy and use of personal information without consent is the best way to achieve this goal.

August 16, 2017

Seatbelts save (some) lives

It’s pretty standard that headlines (and often politicians) overstate the likely effect of road safety precautions — eg, the claim that lowering the blood alcohol limit would prevent all deaths in which drivers were over the limit, which it obviously won’t.

This is from the Herald’s front page.


On the left, the number 94 is the number of people who died in crashes while not wearing seatbelts. On the right (and in the story), the we find that this is about a third of all the deaths. It’s quite possible to wear a seatbelt and still die in a crash.

Looking for research, I found this summary from a UK organisation that does independent reviews on road safety issues. They say seatbelts in front seats prevent about 45% of fatal injuries in front seat passengers. For rear-seat passengers the data are less clear.

So, last year probably about 45 people died on our roads because they weren’t wearing seatbelts. That’s a big enough number to worry about: we don’t need to double it.

August 8, 2017

Breast cancer alcohol twitter

Twitter is not an ideal format for science communication, because of the 140-character limitations: it’s easy to inadvertently leave something out.  Here’s one I was referred to this morning (link, so you can see if it is retracted)


Usually I’d think it was a bit unfair to go after this sort of thing on StatsChat.  The reason I’m making an exception here is the hashtag: this is a political statement by a person of mana.

There’s one gross inaccuracy (which I missed on first reading) and one sub-optimal presentation of risk.  To start off, though, there’s nothing wrong with the underlying number: unlike many of its ilk it isn’t an extrapolation from high levels of drinking and it isn’t obviously confounded, because moderate drinkers are otherwise in better health than non-drinkers on average.  The underlying number is that for each standard drink per day, the rate of breast cancer increases by a factor of about 1.1.

The gross inaccuracy is the lack of a per day qualifier, making the statement inaccurate by a factor of several thousand.  An average of one standard drink per day is not a huge amount, but it’s probably more than the average for women in NZ (given the  2007/08 New Zealand Alcohol and Drug Use Survey finding that about half of women drank alcohol less than weekly).

Relative rates are what the research produces, but people tend to think in absolute risks, despite the explicit “relative risk” in the tweet.  The rate of breast cancer in middle age (what the data are about) is fairly low. The lifetime risk for a 45 year old woman (if you don’t die of anything else before age 90) is about 12%.  A 10% increase in that is 13.2%, not 22%. It would take about 7 drinks per day to roughly double your risk (1.17=1.94)  — and you’d have other problems as well as breast cancer risk.


April 14, 2017

Cyclone uncertainty

Cyclone Cook ended up a bit east of where it was expected, and so Auckland had very little damage.  That’s obviously a good thing for Auckland, but it would be even better if we’d had no actual cyclone and no forecast cyclone.  Whether the precautions Auckland took were necessary (at the time) or a waste  depends on how much uncertainty there was at the time, which is something we didn’t get a good idea of.

In the southeastern USA, where they get a lot of tropical storms, there’s more need for forecasters to communicate uncertainty and also more opportunity for the public to get to understand what the forecasters mean.  There’s scientific research into getting better forecasts, but also into explaining them better. Here’s a good article at Scientific American

Here’s an example (research page):


On the left is the ‘cone’ graphic currently used by the National Hurricane Center. The idea is that the current forecast puts the eye of the hurricane on the black line, but it could reasonably be anywhere in the cone. It’s like the little blue GPS uncertainty circles for maps on your phone — except that it also could give the impression of the storm growing in size.  On the right is a new proposal, where the blue lines show a random sample of possible hurricane tracks taking the uncertainty into account — but not giving any idea of the area of damage around each track.

There’s also uncertainty in the predicted rainfall.  NIWA gave us maps of the current best-guess predictions, but no idea of uncertainty.  The US National Weather Service has a new experimental idea: instead of giving maps of the best-guess amount, give maps of the lower and upper estimates, titled: “Expect at least this much” and “Potential for this much”.

In New Zealand, uncertainty in rainfall amount would be a good place to start, since it’s relevant a lot more often than cyclone tracks.

Update: I’m told that the Met Service do produce cyclone track forecasts with uncertainty, so we need to get better at using them.  It’s still likely more useful to experiment with rainfall uncertainty displays, since we get heavy rain a lot more often than cyclones.