Posts filed under Probability (56)

December 19, 2014

Moving the goalposts

A century ago there was no useful treatment for cancer, nothing that would postpone death. A century ago, there wasn’t any point in screening for cancer; you might as well just wait for it to turn up. A century ago, it would still have been true that early diagnosis would improve 1-year survival.

Cancer survival is defined as time from diagnosis to death. That’s not a very good definition, but there isn’t a better one available since the start of a tumour is not observable and not even very well defined.  If you diagnose someone earlier, and do nothing else helpful, the time from diagnosis to death will increase. In particular, 1-year survival is likely to increase a lot, because you don’t have to move diagnosis much earlier to get over the 1-year threshold.  Epidemiologists call this “lead-time bias.”
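Here is a minimal sketch of lead-time bias, under made-up assumptions. The exponential survival times, the one-year mean, and the function name `one_year_survival` are hypothetical choices for illustration and do not come from the studies discussed here. Each simulated patient dies on the same date in both scenarios; the only thing that changes is how many weeks earlier the diagnosis is made.

```python
import random

def one_year_survival(times_to_death, lead_weeks=0.0):
    """Fraction of patients still alive one year after diagnosis.

    times_to_death: years from the usual diagnosis date to death.
    lead_weeks: how much earlier the diagnosis is made. The date of death
    does not move, so time from diagnosis to death grows by this amount.
    """
    lead_years = lead_weeks / 52.0
    alive = sum(1 for t in times_to_death if t + lead_years > 1.0)
    return alive / len(times_to_death)

# Hypothetical cohort: exponential survival with a mean of one year.
rng = random.Random(2014)
cohort = [rng.expovariate(1.0) for _ in range(100_000)]

baseline = one_year_survival(cohort)               # diagnosis at the usual time
earlier = one_year_survival(cohort, lead_weeks=4)  # diagnosed 4 weeks sooner

print(f"1-year survival, usual diagnosis: {baseline:.3f}")
print(f"1-year survival, 4 weeks earlier: {earlier:.3f}")
```

Nobody in the simulated cohort lives a day longer, yet the 1-year survival figure goes up. The size of the increase depends entirely on the assumed survival distribution.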

The Herald has a story today on cancer survival in NZ and Australia that completely misses this issue. It’s based on an article in the New Zealand Medical Journal that also doesn’t discuss the issue, though the editorial commentary in the journal does, and also digs deeper:

If the average delay from presentation to diagnosis was 4 weeks longer in New Zealand due to delay in presentation by the patient, experimentation with alternative therapy, or difficulty in diagnosis by the doctor, the 1-year relative survival would be about 7% poorer compared to Australia. The range of delay among patients is even more important and if even relatively few patients have considerable delay this can greatly influence overall relative survival due to a lower chance of cure. Conversely, where treatment is seldom effective, 1-year survival may be affected by delay but it may have little influence on long-term survival differences. This was apparent for trans-Tasman differences in relative survival for cancers of the pancreas, brain and stomach.  However, relative survival for non-Hodgkin lymphoma was uniformly poorer in New Zealand suggesting features other than delay in diagnosis are important.

That is, part of the difference between NZ and Australian cancer survival rates is likely to be lead-time bias — Australians find out they have incurable cancer earlier than New Zealanders do — but part of it looks to be real advantages in treatment in Australia.

Digging deeper like this is important. You can always increase time from diagnosis to death by earlier diagnosis. That isn’t as useful as increasing it by better treatment.

[update: the commentary seems to have become available only to subscribers while I was writing this]

December 7, 2014

Bot or Not?

Turing had the Imitation Game, Philip K. Dick had the Voight-Kampff Test, and spammers gave us the CAPTCHA. The Truthy project at Indiana University has BotOrNot, which is supposed to distinguish real people on Twitter from automated accounts, ‘bots’, using analysis of their language, their social networks, and their retweeting behaviour. BotOrNot seems to sort of work, but not as well as you might expect.

@NZquake, a very obvious bot that tweets earthquake information from GeoNet, is rated at an 18% chance of being a bot.  Siouxsie Wiles, for whom there is pretty strong evidence of existence as a real person, has a 29% chance of being a bot.  I’ve got a 37% chance, the same as @fly_papers, which is a bot that tweets the titles of research papers about fruit flies, and slightly higher than @statschat, the bot that tweets StatsChat post links,  or @redscarebot, which replies to tweets that include ‘communist’ or ‘socialist’. Other people at a similar probability include Winston Peters, Metiria Turei, and Nicola Gaston (President of the NZ Association of Scientists).

PicPedant, the twitter account of the tireless Paulo Ordoveza, who debunks fake photos and provides origins for uncredited ones, rates at 44% bot probability, but obviously isn’t.  Ben Atkinson, a Canadian economist and StatsChat reader, has a 51% probability, and our only Prime Minister (or his twitterwallah), @johnkeypm, has a 60% probability.


November 16, 2014

John Oliver on the lottery

When statisticians get quoted on the lottery it’s pretty boring, even if we can stop ourselves mentioning the Optional Stopping Theorem.

This week, though, John Oliver took on the US state lotteries: “…more than Americans spent on movie tickets, music, porn, the NFL, Major League Baseball, and video games combined.”

(you might also look at David Fisher’s Herald stories on the lottery)

September 1, 2014

Sometimes there isn’t a (useful) probability

In this week’s Slate Money podcast (starting at about 2:50), there’s an example of a probability puzzle that mathematically trained people tend to get wrong.  In summary, the story is

You’re at a theatre watching a magician. The magician hands a pack of cards to one member of the audience  and asks him to check that it is an ordinary pack, and to shuffle it. He asks another member of the audience to name a card. She says “Ace of Hearts”.  The magician covers his eyes, reaches out to the pack of cards, fumbles around a bit, and pulls out a card. What’s the probability that it is the Ace of Hearts?

It’s very tempting to say 1 in 52, because the framing of the puzzle prompts you to think in terms of equal-probability sampling.  Of course, as Felix Salmon points out, this is the only definitively wrong answer. The guy’s a magician. Why would he be doing this if the probability was going to be 1 in 52?

With an ordinary well-shuffled pack of cards and random selection we do know the probability: if you like the frequency interpretation of probability it’s an unknown number quite close to 1 in 52, if you like the subjective interpretation it should be a distribution of numbers quite close to 1 in 52.

With a magic trick we’d expect the probability (in the frequency sense) to be close to either zero or one, depending on the trick, but we don’t know which. Under the subjective interpretation of probability you do know what the probability is for you, but you’ve got no real reason to expect it to be similar for other people.


August 16, 2014

Lotto and concrete implementation

There are lots of Lotto strategies based on trying to find patterns in numbers.

Lotto New Zealand televises its draws, and you can find some of them on YouTube.

If you have a strategy for numerological patterns in the Lotto draws, it might be a good idea to watch a few Lotto draws and ask yourself how the machine knows to follow your pattern.

If you’re just doing it for entertainment, go in good health.

July 13, 2014

100% accurate medical testing

The Wireless has a story about a fatal disease where there’s an essentially 100% accurate test available.

Alice Harbourne has a 50% chance of Huntington’s Disease. If she gets tested, she will have either a 0% or 100% chance, and despite some recent progress on the mechanism of the disease, there is no treatment.

May 28, 2014

Monty Hall problem and data

Tonight’s Mythbusters episode on Prime looked at the Monty Hall/Pick-a-Door problem, using experimental data as well as theory.

For those of you who haven’t been exposed to it, the idea is as follows:

There are three doors. Behind one is a prize. The contestant picks a door. The host then always opens one of the other doors, which he knows does not contain the prize. The contestant is given an opportunity to change their choice to the other unopened door. Should they take this choice?

The stipulation that the host always makes the offer and always opens an empty door is critical to the analysis. It was present in the original game-show problem and was explicit in Mythbusters.

A probabilistic analysis is straightforward. The chance that the prize is behind the originally-chosen door is 1/3.  It has to be somewhere. So the chance of it being behind the remaining door is 2/3.  You can do this more carefully by enumerating all possibilities, and you get the same answer.
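The same answer can be checked by simulation. This is a minimal sketch, not the Mythbusters procedure; the function names `play` and `win_rate`, the number of games, and the seed are illustrative assumptions. It follows the rules as stated above: the host always opens a door that is neither the contestant's pick nor the prize.

```python
import random

def play(switch, rng):
    """Play one game; return True if the contestant ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host always opens a door that is neither the pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one remaining door (not the pick, not the opened one).
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, games=100_000, seed=1):
    """Estimate the chance of winning for a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(games)) / games

print("always switch:", win_rate(True))
print("never switch: ", win_rate(False))
```

Writing out the `switch` branch makes the argument visible: switching wins exactly when the first pick was wrong, which happens two times in three.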

The conclusion is surprising. Almost everyone gets it wrong, famously including Paul Erdős and many of the readers who wrote in to tell Marilyn vos Savant that her (correct) answer was mistaken. Less impressively, so did I as an undergraduate, until I was convinced by writing a computer simulation (I didn’t need to run it; writing it was enough). The compelling error is probably an example of the endowment effect.

All of the Mythbusters live subjects chose to keep their original choice, ruining the comparison. The Mythbusters then ran a moderately large series of random choices where one person always switched and the other did not. They got 38 wins out of 49 for switching and 11 for not switching. That’s a bit more extreme than you’d expect, but not unreasonably so. It gives a 95% confidence interval (analogous to the polling margin of error) from 12% to 37% for the chance of winning without switching.

The Mythbusters are sometimes criticised for insufficient replication, but in this case 49 is plenty to distinguish the ‘obvious’ 50% success rate from the true 33%. It was a very nicely designed experiment.
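The post does not say how the interval was computed. As a sketch, assuming an exact (Clopper-Pearson) binomial interval for 11 wins in 49 games without switching, the bounds can be found by bisection using only the standard library. The helper names `binom_cdf`, `root`, and `exact_interval` are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def root(f, target):
    """Bisection for f(p) = target on [0, 1], where f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(k, n, level=0.95):
    """Clopper-Pearson interval for a binomial proportion."""
    alpha = 1 - level
    # Lower bound: the p at which seeing k or more wins has probability alpha/2.
    lower = root(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: the p at which seeing k or fewer wins has probability alpha/2.
    upper = root(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

lower, upper = exact_interval(11, 49)
print(f"95% interval for winning without switching: {lower:.0%} to {upper:.0%}")
```

Under that assumption the interval should land close to the range quoted above. It includes one third and excludes one half, which is the point of the experiment.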

‘Balanced’ Lotto reporting

From ChCh Press

Are you feeling lucky?

The number drawn most often in Saturday night’s Lotto is one.

The second is seven, the third is lucky 13, followed by 21, 38 and 12.

And if you are selecting a Powerball for Saturday’s draw, the record suggests two is a much better pick than seven.

The numbers are from Lotto Draw Frequency data provided by Lotto NZ for the 1406 Lottery family draws held to last Wednesday.

The Big Wednesday data shows the luckiest numbers are 30, 12, 20, 31, 28 and 16. And heads is drawn more often (232) than tails (216), based on 448 draws to last week.

In theory, selecting the numbers drawn most often would result in more prizes and avoiding the numbers drawn least would result in fewer losses. The record speaks for itself.

Of course this is utter bollocks. The record is entirely consistent with the draw being completely unpredictable, as you would also expect it to be if you’ve ever watched a Lotto draw on television and seen how they work.
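The heads and tails counts in the story make a convenient check. This sketch takes the quoted figures at face value (232 heads in 448 draws) and assumes, as my own modelling choice, that each draw is an independent fair coin. The function name `two_sided_p` is illustrative.

```python
from math import comb

def two_sided_p(k, n):
    """Exact two-sided p-value for k heads in n tosses of a fair coin.

    Adds up the probability of every outcome at least as far from n/2
    as the observed count is.
    """
    dist = abs(k - n / 2)
    extreme = sum(comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= dist)
    return extreme / 2**n

p = two_sided_p(232, 448)
print(f"p-value for 232 heads in 448 draws: {p:.2f}")
```

A split at least this lopsided turns up a large fraction of the time with a fair coin, so the record gives no reason to prefer heads.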

This story is better than the ones we used to see, because it does go on and quote people who know what they are talking about, who point out that predicting this way isn’t going to work, and then goes on to say that many people must understand this because they do just take random picks.  On the other hand, that’s the sort of journalistic balance that gets caricatured as “Opinions differ on shape of Earth.”

In world historical terms it doesn’t really matter how these lottery stories are written, but they are missing a relatively simple opportunity to demonstrate that a paper understands the difference between fact and fancy and thinks it matters.

$5 million followup

“It’s gettable, but it’s hard – that’s why it’s five million dollars.”

“The chances of picking every game correctly were astronomical”

  • NBR (paywalled)

“crystal ball gazing of such magnitude that University of Auckland statistics expert associate professor David Scott doesn’t think either will have to pay out.”

“quite hard to win”

“someone like you [non-expert] has as much chance because [an expert] wouldn’t pick an upset”

“An expert is less likely to win it than someone who just has a shot at it.”

“It’s only 64 games and, as I say, there’s only 20 tricky ones I reckon”


Yeah, nah.


May 27, 2014

What’s a shot at $5million worth?

In March, the US billionaire Warren Buffett offered a billion dollar prize to anyone who could predict all 63 ‘March Madness’ college basketball games. Unsurprisingly, many tried but no-one succeeded.

The New Zealand TAB are offering NZ$5 million to anyone who can predict all 64 games in the 2014 World Cup (soccer, in Rio de Janeiro (probably)). It’s free to enter. What’s it worth to an entrant, and what is the expected cost to the TAB?

If the pool games had equal probability of win/loss/draw and the finals series games were 50:50, which is the worst case for punters (well, almost), the chance of winning would be 1 in 5,227,573,613,485,916,806,405,226,496. That’s presumably also your chance of winning if you use random picks, which the TAB helpfully provides. At those odds, the value of an entry is approximately 1 ten-thousand-million-billionth of a cent (10⁻¹⁹ cents), which is probably less than the cost to you of

By entering this Competition, an Entrant agrees to receive marketing and promotional material from the Promoter (including electronic material).

Of course, you could do better by picking carefully. Suppose that a dozen of the pool round games were completely predictable walkovers, the remaining 34 you could get  70% right, and you could get 50% for final games. That would be doing pretty well.  In that case the value of entering is hugely better — it’s almost a twentieth of a cent.   If you can get 70% accuracy for the final games as well, the value of entering would be nearly ten cents.
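The arithmetic behind these figures is short enough to sketch. The split into 48 pool games (three outcomes each) and 16 knockout games (two outcomes each) is my reading of the 64-game schedule; it reproduces the random-pick odds quoted above. The count of 34 uncertain pool games is used as given, although 48 minus a dozen walkovers would leave 36, which roughly halves both skilled-picker figures. The function name `entry_value` is illustrative.

```python
PRIZE_CENTS = 5_000_000 * 100  # NZ$5 million, expressed in cents
POOL_GAMES = 48                # assumed: win, loss or draw
KNOCKOUT_GAMES = 16            # assumed: one side must go through

# Random picks: every outcome of every game equally likely.
random_odds = 3**POOL_GAMES * 2**KNOCKOUT_GAMES
print(f"random picks: 1 in {random_odds:,}")
print(f"value of a random entry: {PRIZE_CENTS / random_odds:.1e} cents")

def entry_value(uncertain_games, p_uncertain, knockout_p=0.5):
    """Expected prize in cents for a skilled picker.

    Walkover games are treated as certain, so they contribute a factor of 1.
    Prize splitting between multiple winners is ignored.
    """
    p_win = p_uncertain**uncertain_games * knockout_p**KNOCKOUT_GAMES
    return PRIZE_CENTS * p_win

print(f"70% on 34 pool games, 50% on finals: {entry_value(34, 0.7):.3f} cents")
print(f"70% on 34 pool games, 70% on finals: {entry_value(34, 0.7, knockout_p=0.7):.1f} cents")
```

Calling `entry_value(36, 0.7)` shows how sensitive the result is to the number of games that are genuinely uncertain.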

But if you can predict a dozen of the games with perfect accuracy and get 70% right for the rest, you’d be much better off just betting.  I looked at an online betting site, and the smallest payoffs I could find in the pool games were 2/9 for Brazil to beat Cameroon and 2/11 for Argentina to beat Iran.  If you have a dozen pool matches where you’re 100% certain, you can make rather more than ten cents even on a minimum bet.

So, what’s this all costing the TAB? It’s almost certainly less than the cost of sending a text message to every entrant, which is part of the process. There are maybe three million people eligible to enter, and a maximum of one entry per person. Given that duplicate winners will split the prize, I can’t really believe in an expected prize cost to TAB of more than 0.01 cents per entrant, which works out at about $300 if every adult Kiwi enters. They should be able to insure against a win and pay not much more than this. The cost of the advertising campaign will dwarf the prize costs.

The real incentive to enter is that there will be five $1000 consolation prizes for the best entries when no-one wins the big prize. What matters in figuring the odds for this is not the total number of entries (which might be a million), but the number of seriously competitive entries. That could be as low as a few tens of thousands, giving an expected value of entry as high as twenty cents if you’re prepared to put some effort into research.


[Update: It’s actually slightly worse than this, though not importantly so. You may need to predict numbers of goals scored in order to break ties when setting up the knockout rounds.]