Posts filed under Probability (57)

February 27, 2015

Quake prediction: how good does it need to be?

From a detailed story in the ChCh Press, (via Eric Crampton) about various earthquake-prediction approaches

About 40 minutes before the quake began, the TEC in the ionosphere rose by about 8 per cent above expected levels. Somewhat perplexed, he looked back at the trend for other recent giant quakes, including the February 2010 magnitude 8.8 event in Chile and the December 2004 magnitude 9.1 quake in Sumatra. He found the same increase about the same time before the quakes occurred.

Heki says there has been considerable academic debate both supporting and opposing his research.

To have 40 minutes warning of a massive quake would be very useful indeed and could help save many lives. “So, why 40 minutes?” he says. “I just don’t know.”

He says if the link were to be proved more firmly in the future it could be a useful warning tool. However, there are drawbacks in that the correlation only appears to exist for the largest earthquakes, whereas big quakes of less than magnitude 8.0 are far more frequent and still cause death and devastation. Geomagnetic storms can also render the system impotent, with fluctuations in the total electron count masking any pre-quake signal.

Let’s suppose that with more research everything works out, and there is a rise in this TEC before all very large quakes. How much would this help in New Zealand? The obvious place is Wellington. A quake over 8.0 magnitude has been observed in the area in 1855, when it triggered a tsunami. A repeat would also shatter many of the earthquake-prone buildings. A 40-minute warning could save many lives. It appears that TEC shouldn’t be that expensive to measure: it’s based on observing the time delays in GPS satellite transmissions as they pass through the ionosphere, so it mostly needs a very accurate clock (in fact, NASA publishes TEC maps every five minutes). Also, it looks like it would be very hard to hack the ionosphere to force the alarm to go off. The real problem is accuracy.

The system will have false positives and false negatives. False negatives (missing a quake) aren’t too bad, since that’s where you are without the system. False positives are more of a problem. They come in two forms: when the alarm goes off completely in the absence of a quake, and when there is a quake but no tsunami or catastrophic damage.

Complete false predictions would need to be very rare. If you tell everyone to run for the hills and it turns out to be sunspots or the wrong kind of snow, they will not be happy: the cost in lost work (and theft?) would be substantial, and there would probably be injuries.  Partial false predictions, where there was a large quake but it was too far away or in the wrong direction to cause a tsunami, would be just as expensive but probably wouldn’t cause as much ill-feeling or skepticism about future warnings.

Now for the disappointment. The story says “there has been considerable academic debate”. There has. For example, in a (paywalled) paper from 2013 looking at the Japanese quake that prompted Heki’s idea

A detailed analysis of the ionospheric variability in the 3 days before the earthquake is then undertaken, where a simultaneous increase in foF2 and the Es layer peak plasma frequency, foEs, relative to the 30-day median was observed within 1 h before the earthquake. A statistical search for similar simultaneous foF2 and foEs increases in 6 years of data revealed that this feature has been observed on many other occasions without related seismic activity. Therefore, it is concluded that one cannot confidently use this type of ionospheric perturbation to predict an impending earthquake.

In translation: you need to look just right to see this anomaly, and there are often anomalies like this one without quakes. Over four years they saw 24 anomalies, only one shortly before a quake.  Six complete false positives per year is obviously too many.  Suppose future research could refine what the signal looks like and reduce the false positives by a factor of ten: that’s still evacuation alarms with no quake more than once every two years. I’m pretty sure that’s still too many.


December 19, 2014

Moving the goalposts

A century ago there was no useful treatment for cancer, nothing that would postpone death. A century ago, there wasn’t any point in screening for cancer; you might as well just wait for it to turn up. A century ago, it would still have been true that early diagnosis would improve 1-year survival.

Cancer survival is defined as time from diagnosis to death. That’s not a very good definition, but there isn’t a better one available since the start of a tumour is not observable and not even very well defined.  If you diagnose someone earlier, and do nothing else helpful, the time from diagnosis to death will increase. In particular, 1-year survival is likely to increase a lot, because you don’t have to move diagnosis much earlier to get over the 1-year threshold.  Epidemiologists call this “lead-time bias.”

The Herald has a story today on cancer survival in NZ and Australia that completely misses this issue. It’s based on an article in the New Zealand Medical Journal that also doesn’t discuss the issue, though the editorial commentary in the journal does, and also digs deeper:

If the average delay from presentation to diagnosis was 4 weeks longer in New Zealand due to delay in presentation by the patient, experimentation with alternative therapy, or difficulty in diagnosis by the doctor, the 1-year relative survival would be about 7% poorer compared to Australia. The range of delay among patients is even more important and if even relatively few patients have considerable delay this can greatly influence overall relative survival due to a lower chance of cure. Conversely, where treatment is seldom effective, 1-year survival may be affected by delay but it may have little influence on long-term survival differences. This was apparent for trans-Tasman differences in relative survival for cancers of the pancreas, brain and stomach.  However, relative survival for non-Hodgkin lymphoma was uniformly poorer in New Zealand suggesting features other than delay in diagnosis are important.

That is, part of the difference between NZ and Australian cancer survival rates is likely to be lead-time bias — Australians find out they have incurable cancer earlier than New Zealanders do — but part of it looks to be real advantages in treatment in Australia.

Digging deeper like this is important. You can always increase time from diagnosis to death by earlier diagnosis. That isn’t as useful as increasing it by better treatment.

[update: the commentary seems to have become available only to subscribers while I was writing this]

December 7, 2014

Bot or Not?

Turing had the Imitation Game, Phillip K. Dick had the Voight-Kampff Test, and spammers gave us the CAPTCHA.  The Truthy project at Indiana University has BotOrNot, which is supposed to distinguish real people on Twitter from automated accounts, ‘bots’, using analysis of their language, their social networks, and their retweeting behaviour. BotOrNot seems to sort of work, but not as well as you might expect.

@NZquake, a very obvious bot that tweets earthquake information from GeoNet, is rated at an 18% chance of being a bot.  Siouxsie Wiles, for whom there is pretty strong evidence of existence as a real person, has a 29% chance of being a bot.  I’ve got a 37% chance, the same as @fly_papers, which is a bot that tweets the titles of research papers about fruit flies, and slightly higher than @statschat, the bot that tweets StatsChat post links,  or @redscarebot, which replies to tweets that include ‘communist’ or ‘socialist’. Other people at a similar probability include Winston Peters, Metiria Turei, and Nicola Gaston (President of the NZ Association of Scientists).

PicPedant, the twitter account of the tireless Paulo Ordoveza, who debunks fake photos and provides origins for uncredited ones, rates at 44% bot probability, but obviously isn’t.  Ben Atkinson, a Canadian economist and StatsChat reader, has a 51% probability, and our only Prime Minister (or his twitterwallah), @johnkeypm, has a 60% probability.


November 16, 2014

John Oliver on the lottery

When statisticians get quoted on the lottery it’s pretty boring, even if we can stop ourselves mentioning the Optional Stopping Theorem.

This week, though, John Oliver took on the US state lotteries: “..,more than Americans spent on movie tickets, music, porn, the NFL, Major League Baseball, and video games combined. “

(you might also look at David Fisher’s Herald stories on the lottery)

September 1, 2014

Sometimes there isn’t a (useful) probability

In this week’s Slate Money podcast (starting at about 2:50), there’s an example of a probability puzzle that mathematically trained people tend to get wrong.  In summary, the story is

You’re at a theatre watching a magician. The magician hands a pack of cards to one member of the audience  and asks him to check that it is an ordinary pack, and to shuffle it. He asks another member of the audience to name a card. She says “Ace of Hearts”.  The magician covers his eyes, reaches out to the pack of cards, fumbles around a bit, and pulls out a card. What’s the probability that it is the Ace of Hearts?

It’s very tempting to say 1 in 52, because the framing of the puzzle prompts you to think in terms of equal-probability sampling.  Of course, as Felix Salmon points out, this is the only definitively wrong answer. The guy’s a magician. Why would he be doing this if the probability was going to be 1 in 52?

With an ordinary well-shuffled pack of cards and random selection we do know the probability: if you like the frequency interpretation of probability it’s an unknown number quite close to 1 in 52, if you like the subjective interpretation it should be a distribution of numbers quite close to 1 in 52.

With a magic trick we’d expect the probability (in the frequency sense) to be close to either zero or one, depending on the trick, but we don’t know.  Under the subjective interpretation of probability then you do know what the probability is for you, but you’ve got no real reason to expect it to be similar for other people.


August 16, 2014

Lotto and concrete implementation

There are lots of Lotto strategies based on trying to find patterns in numbers.

Lotto New Zealand televises its draws, and you can find some of them on YouTube.

If you have a strategy for numerological patterns in the Lotto draws, it might be a good idea to watch a few Lotto draws and ask yourself how the machine knows to follow your pattern.

If you’re just doing it for entertainment, go in good health.

July 13, 2014

100% accurate medical testing

The Wireless has a story about a fatal disease where there’s an essentially 100% accurate test available.

Alice Harbourne has a 50% chance of Huntington’s Disease. If she gets tested, she will have either a 0% or 100% chance, and despite some recent progress on the mechanism of the disease, there is no treatment.

May 28, 2014

Monty Hall problem and data

Tonight’s Mythbusters episode on Prime looked at the Monty Hall/Pick-a-Door problem, using experimental data as well as theory.

For those of you who haven’t been exposed to it, the idea is as follows:

There are three doors. Behind one is a prize. The contestant picks a door. The host then always opens one of the other doors, which he knows does not contain the prize. The contestant is given an opportunity to change their choice to the other unopened door. Should they take this choice?

The stipulation that the host always makes the offer and always opens an empty door is critical to the analysis. It was present in the original game-show problem and was explicit in Mythbusters.

A probabilistic analysis is straightforward. The chance that the prize is behind the originally-chosen door is 1/3.  It has to be somewhere. So the chance of it being behind the remaining door is 2/3.  You can do this more carefully by enumerating all possibilities, and you get the same answer.

The conclusion is surprising. Almost everyone, famously including both Marilyn vos Savant, and Paul Erdős, gets it wrong. Less impressively, so did I as an undergraduate, until I was convinced by writing a computer simulation (I didn’t need to run it; writing it was enough).  The compelling error is probably an example of the endowment effect.

All of the Mythbusters live subjects chose to keep their original choice,ruining the comparison.  The Mythbusters then ran a moderately large series of random choices where one person always switched and the other did not.  They got 38 wins out of 49 for switching and 11 for not switching. That’s a bit more extreme than you’d expect, but not unreasonably so. It gives a 95% confidence interval (analogous to the polling margin of error)  from 12% to 37%.

The Mythbusters are sometimes criticised for insufficient replication, but in this case 49 is plenty to distinguish the ‘obvious’ 50% success rate from the true 33%. It was a very nicely designed experiment.

‘Balanced’ Lotto reporting

From ChCh Press

Are you feeling lucky?

The number drawn most often in Saturday night’s Lotto is one.

The second is seven, the third is lucky 13, followed by 21, 38 and 12.

And if you are selecting a Powerball for Saturday’s draw, the record suggests two is a much better pick than seven.

The numbers are from Lotto Draw Frequency data provided by Lotto NZ for the 1406 Lottery family draws held to last Wednesday.

The Big Wednesday data shows the luckiest numbers are 30, 12, 20, 31, 28 and 16. And heads is drawn more often (232) than tails (216), based on 448 draws to last week.

In theory, selecting the numbers drawn most often would result in more prizes and avoiding the numbers drawn least would result in fewer losses. The record speaks for itself.

Of course this is utter bollocks. The record is entirely consistent with the draw being completely unpredictable, as you would also expect it to be if you’ve ever watched a Lotto draw on television and seen how they work.

This story is better than the ones we used to see, because it does go on and quote people who know what they are talking about, who point out that predicting this way isn’t going to work, and then goes on to say that many people must understand this because they do just take random picks.  On the other hand, that’s the sort of journalistic balance that gets caricatured as “Opinions differ on shape of Earth.”

In world historical terms it doesn’t really matter how these lottery stories are written, but they are missing a relatively a simple opportunity to demonstrate that a paper understands the difference between fact and fancy and thinks it matters.

$5 million followup

It’s gettable, but it’s hard – that’s why it’s five million dollars.”

“The chances of picking every game correctly were astronomical”

  • NBR (paywalled)

“crystal ball gazing of such magnitude that University of Auckland statistics expert associate professor David Scott doesn’t think either will have to pay out.”

“quite hard to win  “

“someone like you [non-expert] has as much chance  because [an expert] wouldn’t pick an upset”

“An expert is less likely to win it than someone who just has a shot at it.”

“It’s only 64 games and, as I say, there’s only 20 tricky ones I reckon”


Yeah, nah.