Posts filed under Simulation (8)

March 11, 2016

Getting to see opinion poll uncertainty

Rock’n Poll has a lovely guide to sampling uncertainty in election polls, guiding you step by step to see how approximate the results would be in the best of all possible worlds. Highly recommended.

Of course, we’re not in the best of all possible worlds, and in addition to pure sampling uncertainty we have ‘house effects’ due to different methodology between polling firms and ‘design effects’ due to the way the surveys compensate for non-response.  And on top of that there are problems with the hypothetical question ‘if an election were held tomorrow’, and probably issues with people not wanting to be honest.

Even so, the basic sampling uncertainty gives a good guide to the error in opinion polls, and anything that makes it easier to understand is worth having.
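If you want to play with the basic idea yourself, it only takes a few lines of code. Here's a rough sketch (in Python, with made-up numbers): simulate lots of polls of 1000 people from a population where true support is 40%, and see how much the estimates bounce around.

```python
import numpy as np

rng = np.random.default_rng(0)

true_support = 0.40      # assumed 'true' level of support
n_respondents = 1000     # a typical poll size
n_polls = 10_000         # number of simulated polls

# Each simulated poll: count supporters among n_respondents random voters
estimates = rng.binomial(n_respondents, true_support, size=n_polls) / n_respondents
print(np.quantile(estimates, [0.025, 0.975]))
# roughly 0.37 to 0.43: the familiar 'plus or minus 3%' margin of error
```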

[Screenshot from Rock’n Poll’s sampling-uncertainty guide]

(via Harkanwal Singh)

May 27, 2015

We like to drive in convoys

This isn’t precisely statistics, more applied probability, but that still counts.  First, an interactive from Lewis Lehe, a PhD student in Transport Engineering at UC Berkeley. It shows why buses always clump together.

[Screenshot from Lewis Lehe’s bus-bunching interactive]

You might also like his simulations of bottlenecks/gridlock and of congestion waves in traffic (via @flowingdata)
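The bunching mechanism itself is easy to reproduce in a toy model (a rough Python sketch with made-up parameters, nothing like as nice as Lehe's interactive): a bus that falls slightly behind faces more waiting passengers, spends longer boarding them, and falls further behind, while the bus following it catches up.

```python
import numpy as np

rng = np.random.default_rng(1)

n_buses = 5
loop = 1.0           # length of the circular route (arbitrary units)
base_speed = 0.01    # distance per time step with no passengers waiting
alpha = 0.5          # how strongly a bigger gap ahead (more waiting passengers) slows a bus

# Start the buses evenly spaced, with a tiny random nudge
pos = (np.arange(n_buses) / n_buses + 0.001 * rng.normal(size=n_buses)) % loop

for step in range(5000):
    order = np.argsort(pos)
    gap_ahead = np.empty(n_buses)
    # Distance from each bus to the next bus in front of it around the loop
    gap_ahead[order] = (pos[order][(np.arange(n_buses) + 1) % n_buses] - pos[order]) % loop
    # Bigger gap ahead -> more passengers waiting -> longer dwell -> slower progress
    pos = (pos + base_speed / (1 + alpha * n_buses * gap_ahead)) % loop

print(np.sort(gap_ahead))  # the gaps end up wildly unequal: the buses have bunched
```

Even spacing is an equilibrium in this little model, but an unstable one: any small perturbation grows until the buses travel in a clump.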


And second, a video from the New York subway system. When a train gets delayed, it holds up all the trains behind it. More surprisingly, the system is set up to delay the train in front of it, to keep the maximum gap between trains smaller.

July 27, 2014

Air flight crash risk

David Spiegelhalter, Professor of the Public Understanding of Risk at Cambridge University, has looked at the chance of getting three fatal plane crashes in the same 8-day period, based on the average rate of fatal crashes over the past ten years.  He finds that if you look at all 8-day periods in ten years, three crashes is actually the most likely way for the worst week to turn out.

He does this with maths. It’s easier to do it by computer simulation: arrange the 91 crashes randomly among the 3650 days and count up the worst week. When I do this 10,000 times (which takes seconds), I get

[Histogram of the worst-week crash counts from 10,000 simulations]
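For anyone who wants to try it, a sketch of that simulation (here in Python) looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def worst_week(n_crashes=91, n_days=3650, window=8):
    """Scatter the crashes uniformly over the days and return the largest
    number falling in any 8-day window."""
    counts = np.bincount(rng.integers(0, n_days, size=n_crashes), minlength=n_days)
    # Rolling total over every consecutive 8-day window
    return np.convolve(counts, np.ones(window, dtype=int), mode="valid").max()

sims = np.array([worst_week() for _ in range(10_000)])
for k in np.arange(sims.min(), sims.max() + 1):
    print(int(k), np.mean(sims == k))   # proportion of simulations with each worst-week count
```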


The recent crashes were separate tragedies with independent causes (two different types of accident and one deliberate shooting); they aren’t related in the way that, say, the fires in the first Boeing Dreamliners were. There’s no reason the recent events should make you more worried about flying.

May 28, 2014

Monty Hall problem and data

Tonight’s Mythbusters episode on Prime looked at the Monty Hall/Pick-a-Door problem, using experimental data as well as theory.

For those of you who haven’t been exposed to it, the idea is as follows:

There are three doors. Behind one is a prize. The contestant picks a door. The host then always opens one of the other doors, which he knows does not contain the prize. The contestant is given an opportunity to change their choice to the other unopened door. Should they take this choice?

The stipulation that the host always makes the offer and always opens an empty door is critical to the analysis. It was present in the original game-show problem and was explicit in Mythbusters.

A probabilistic analysis is straightforward. The chance that the prize is behind the originally-chosen door is 1/3, and since the host can always open an empty door no matter what you picked, watching him do so doesn’t change that. The prize has to be somewhere, so the chance of it being behind the remaining unopened door is 2/3. You can do this more carefully by enumerating all the possibilities, and you get the same answer.

The conclusion is surprising. Almost everyone gets it wrong, famously including Paul Erdős and the thousands of readers (many of them mathematicians) who wrote in to tell Marilyn vos Savant that her correct answer was wrong. Less impressively, so did I as an undergraduate, until I was convinced by writing a computer simulation (I didn’t need to run it; writing it was enough). The compelling error is probably an example of the endowment effect.
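That sort of simulation is short enough to show in full; here’s a sketch in Python (not the code I wrote back then):

```python
import random

def play(switch: bool) -> bool:
    """One game: return True if the contestant ends up with the prize."""
    doors = [0, 1, 2]
    prize = random.choice(doors)
    pick = random.choice(doors)
    # The host always opens a door that is neither the contestant's pick nor the prize
    opened = random.choice([d for d in doors if d not in (pick, prize)])
    if switch:
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == prize

n = 100_000
print("switch:", sum(play(True) for _ in range(n)) / n)   # close to 2/3
print("stay:  ", sum(play(False) for _ in range(n)) / n)  # close to 1/3
```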

All of the Mythbusters’ live subjects chose to keep their original choice, ruining the comparison.  The Mythbusters then ran a moderately large series of random choices where one person always switched and the other never did.  They got 38 wins out of 49 for switching and 11 out of 49 for not switching. That’s a bit more extreme than you’d expect, but not unreasonably so. It gives a 95% confidence interval for the success rate without switching (analogous to the polling margin of error) of 12% to 37%.
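If you want to check that interval, the usual normal-approximation calculation for the 11-out-of-49 ‘stay’ group takes a couple of lines (a sketch; the 12% to 37% quoted above looks like the slightly wider exact binomial interval):

```python
from math import sqrt

k, n = 11, 49                       # wins when keeping the original door
p_hat = k / n                       # about 0.22
se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of the estimated proportion
print(f"{p_hat:.2f} +/- {1.96 * se:.2f}")   # roughly 0.22 +/- 0.12
```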

The Mythbusters are sometimes criticised for insufficient replication, but in this case 49 is plenty to distinguish the ‘obvious’ 50% success rate from the true 33%. It was a very nicely designed experiment.

March 9, 2013

The HarleMCMC Shake

I’m sure that many of our readers are familiar with the latest internet trend, the Harlem Shake. Recently, a statistical version appeared that demonstrates some properties of popular Markov Chain Monte Carlo (MCMC) algorithms. MCMC methods are computer algorithms that are used to draw random samples from probability distributions that might have complicated shapes and live in multi-dimensional spaces.

MCMC was originally invented by physicists (justifying my existence in a statistics department) and is particularly useful for doing a kind of statistics called “Bayesian Inference” where probabilities are used to describe degrees of certainty and uncertainty, rather than frequencies of occurrence (insert plug for STATS331, taught by me, here).

Anyway, onto the HarleMCMC shake. It begins by showing the Metropolis-Hastings method, which is very useful and quite simple to do, but can (in some problems) be very slow, which corresponds to the subdued mood at the beginning of a Harlem Shake. As the song switches into the intense phase, the method is replaced by the “Hamiltonian MCMC” method which can be much more efficient. The motion is much more intense and efficient after that!
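If you haven’t met Metropolis-Hastings before, the whole algorithm really does fit in a few lines. Here’s a minimal random-walk version (a Python sketch on a made-up two-bump target, not the actual distribution from the video): propose a small random step, accept it with probability min(1, ratio of target densities), otherwise stay put.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    """Log-density (up to a constant) of a toy two-bump target distribution."""
    return np.logaddexp(-0.5 * (x - 3) ** 2, -0.5 * (x + 3) ** 2)

def metropolis_hastings(n_steps=50_000, step_size=1.0):
    x = 0.0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        proposal = x + step_size * rng.normal()   # symmetric random-walk proposal
        # Accept with probability min(1, target(proposal) / target(x))
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples[i] = x                            # a rejected proposal repeats the old point
    return samples

draws = metropolis_hastings()
print(draws.mean(), draws.std())  # near 0 and about 3.2 once the chain has visited both bumps
```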

Here is the original video by PhD students Tamara Broderick (UC Berkeley) and David Duvenaud (U Cambridge):

http://www.youtube.com/watch?v=Vv3f0QNWvWQ

Naturally, this inspired those of us who work on our own MCMC algorithms to create response videos showing that Hamiltonian MCMC isn’t the only efficient method! Within one day, NYU PhD student Daniel Foreman-Mackey had his own version that uses his emcee sampler. I also had a go using my DNest sampler, but it has not been set to music yet.

So, next time you read or hear about a great new MCMC method, you should ask the authors how well it performs on the “Harlem Shake Distribution”. Oh and thanks to Auckland PhD student Jared Tobin for linking me to the original video!

November 12, 2012

The rise of the machines

The Herald has an interesting story on the improvements in prediction displayed last week: predicting the US election results, but more importantly, predicting the path of Hurricane Sandy.  They say

In just two weeks, computer models have displayed an impressive prediction prowess.

The math experts came out on top thanks to better and more accessible data and rapidly increasing computer power.

It’s true that increasing computer power has been important in both these examples, but it’s been important in two quite different ways.  Weather prediction uses the most powerful computers that the meteorologists can afford, and they are still nowhere near the point of diminishing returns.  There aren’t many problems like this.

Election forecasting, on the other hand, uses simple models that could even be run on hand calculators, if you were sufficiently obsessive and knowledgeable about computational statistics and numerical approximations.  The importance of increases in computer power is that anyone in the developed world has access to computing resources that make the actual calculations trivial.  Ubiquitous computing, rather than supercomputers, is what has revolutionised statistics.  If you combine the cheapest second-hand computer you can find with free software downloaded from the Internet, you have the sort of modelling resources that the top academic and industry research groups were just starting to get twenty years ago.

Cheap computing means that we can tackle problems that wouldn’t have been worthwhile before.  For example, in a post about the lottery, I wanted to calculate the probability that distributing 104 wins over 12 months would give 15 or more wins in one of the months.  I could probably work this out analytically, at least to a reasonable approximation, but it would be too slow and too boring to do for a blog post.  In less than a minute I could write code to estimate the probabilities by simulation, and run 10,000 samples.  If more accuracy was needed I could easily extend this to millions of samples.  That particular calculation doesn’t really matter to anyone, but the same principles apply to real research problems.
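For what it’s worth, that sort of simulation looks something like this (a Python sketch, treating the twelve months as equally likely and ignoring their slightly different lengths):

```python
import numpy as np

rng = np.random.default_rng(0)

n_sims, n_wins, n_months = 10_000, 104, 12
# Assign each of the 104 wins to a month uniformly at random,
# then record the count in the busiest month of each simulated year
month_of_win = rng.integers(0, n_months, size=(n_sims, n_wins))
busiest = np.array([np.bincount(sim, minlength=n_months).max() for sim in month_of_win])
print(np.mean(busiest >= 15))   # estimated probability of 15 or more wins in some month
```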

August 26, 2011

Visualizing uncertainty

Hurricane Irene is heading for somewhere on the US East Coast, though it’s not clear where.  Weather Underground has a nice range of displays indicating the uncertainty in predictions of both location and storm intensity.

July 22, 2011

NZ 2011 Referendum Voting System Simulator

New Zealanders will vote in a referendum in November asking whether they want to change the current voting system used for deciding the makeup of Parliament.

Dr Geoffrey Pritchard and Dr Mark C. Wilson, members of the Centre for Mathematical Social Science at the University of Auckland, have created a simulator intended to help voters compare the five proposed electoral systems in a quantitative way, by allowing them to compute quickly, for a given polling scenario, the party seat distribution in Parliament under each system.

You can try it out by going to http://cmss.auckland.ac.nz/2011-referendum-simulator/ and they would appreciate any feedback on how to improve it.

It is written in JavaScript and the source code is publicly available. The assumptions made are detailed in the FAQ.

They hope that this will allow a better understanding of the consequences of adopting any of these systems, and complement the qualitative information given by the Electoral Commission.