Posts filed under Sports statistics (12)

May 6, 2013

Frontiers in pie-charts

The ‘pie’ in pie-chart is a metaphor: the charts are divided into slices in the way that certain kinds of pie are, and the slices add up to the whole pie.

Or, at least, that’s usually the idea. One of StatsChat’s foreign correspondents sent in this effort from the BBC:

[Image: piefail]

This kind of pie doesn’t get divided into slices — it would just fall apart.  And in this graph the slices don’t add up to anything meaningful — for those of you not up on the British sports scene: there are actually more than ten football clubs.  In the graphic we have Premier League teams such as Arsenal and Manchester City mixed in with  Albion Rovers and Brechin City from the Scottish 2nd division.

The pie price pie exemplifies a general rule: if you have to write all the data values on your graph, the graph isn’t doing its share of the work.

March 20, 2013

The revolution in basketball analytics

From the Grantland blog at ESPN

New technology and statistics will change the way we understand basketball, even if they also create friction between coaches and front-office personnel trying to integrate new concepts into on-court play. The most important innovation in the NBA in recent years is a camera-tracking system, known as SportVU, that records every movement on the floor and spits it back at its front-office keepers as a byzantine series of geometric coordinates. Fifteen NBA teams have purchased the cameras, which cost about $100,000 per year, from STATS LLC; turning those X-Y coordinates into useful data is the main challenge those teams face.

Some teams are just starting with the cameras, while others that bought them right away are far ahead and asking very interesting questions. Those 15 teams have been very secretive in revealing how they’ve used the data, but one team that has made serious progress — the Toronto Raptors — opened up the black box in a series of meetings this month with Grantland.

The Raptors do have a current record of 26-41, so there seem to be limits to what the analytics can achieve…

October 12, 2012

There’s nothing like a good joke.

Q:  Have you started eating more chocolate yet?

A: I assume this is about the New England Journal paper.

Q: Of course.  You could increase your chance of a Nobel Prize.

A: There are several excellent reasons why I am not going to get a Nobel Prize, but in any case I don’t have to eat the chocolate: anyone in Australia or New Zealand would do just as well. You can have my share.

Q:  What do you mean?

A: The article didn’t look at chocolate consumption by Nobel Prize winners, it looked at chocolate consumption in countries named in the official biographical information about Nobel Prize winners.  This typically includes where they were born and where they worked when they did the prize-winning research, and in some cases yet another country where they currently work.

Q: Does the article admit this?

A: In part.  The author admits that this is just per-capita data, not individual data.  Because he just got the Nobel Prize data from Wikipedia, rather than from the primary source, he doesn’t seem to have noticed that multiple countries per recipient are counted.

Q: Would the New England Journal of Medicine usually accept Wikipedia as a data source when the primary data are easily available?

A: No.

Q: What about the chocolate data?

A: The author doesn’t say whether the chocolate consumption measures weight as consumed (i.e., including milk and sugar) or weight of actual chocolate content. That’s especially sloppy since he goes on and on about flavanols. Also, the Nobel Prize data is for 1901-2011 and the chocolate data is mostly just from 2010 or 2011: chocolate consumption in many countries has changed over the past century.

Q: Do you want to say something about correlation and causation now?

A: No, that’s what you say when you don’t know what causes spurious correlations.

Q: So what did cause this correlation?

A: There are at least two likely contributions.  The first is just that wealthy countries tend to have more chocolate consumption and more Nobel Prizes.  Chocolate and research are expensive.  The second is more interesting: it’s the same reason that storks per capita and birth rates are correlated.

Q: Storks bring chocolate as well as babies?

A: Not quite.  Birth rates and storks per capita tend to be correlated because they are both multiples of the reciprocal of population size.   Jerzy Neyman pointed this out in the prehistory of statistics, and Richard Kronmal brought it up again in 1993.  More recently, someone has done the computation with real data (p=0.008). Imperfect standardisation will induce correlation, and since Nobel Prizes almost certainly don’t depend linearly on population, the correction is bound to be imperfect.
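
The effect of dividing by population can be seen in a toy simulation. This is only a sketch with invented inputs: the two count variables (labelled storks and births for convenience) are drawn independently of each other and of population, and the population range is an assumption chosen to span several orders of magnitude. The raw counts should be roughly uncorrelated, while the per-capita versions come out strongly correlated because both are multiples of the same reciprocal.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_countries=200, seed=1):
    """Return (raw correlation, per-capita correlation) for simulated data."""
    rng = random.Random(seed)
    # Hypothetical populations, log-uniform between 10^5 and 10^9.
    pops = [10 ** rng.uniform(5, 9) for _ in range(n_countries)]
    # Two counts that are independent of each other and of population.
    storks = [rng.uniform(50, 150) for _ in range(n_countries)]
    births = [rng.uniform(50, 150) for _ in range(n_countries)]
    raw = pearson(storks, births)
    per_capita = pearson(
        [s / p for s, p in zip(storks, pops)],
        [b / p for b, p in zip(births, pops)],
    )
    return raw, per_capita


raw_corr, per_capita_corr = simulate()
print(f"raw counts: {raw_corr:.2f}, per capita: {per_capita_corr:.2f}")
```

Real counts do depend on population, so the induced correlation in practice is weaker than in this extreme case, but it does not vanish unless the standardisation is exactly right.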

Q: Why did the New England Journal publish this article?

A: It wasn’t published as a research article; it was in their ‘Occasional Notes’ series, which the journal describes as “accounts of personal experiences or descriptions of material from outside the usual areas of medical research and analysis.”

Q: Isn’t it good that stuffy medical journals do this sort of thing occasionally? There’s nothing like a good joke.

A: Well, you might hope they would do it better, like the BMJ does.  This is nothing like a good joke.

 

September 3, 2012

Top of the table: David Scott’s Super 15 success

This appeared in Uni News,  the internal University of Auckland magazine sent to a wide outside audience, a week or so after the end of the Super 15. David came top equal with sports writer Dylan Cleaver, and says he’ll be spending time over this summer “looking at some possible improvements to the prediction method”. Watch this space!

 

August 14, 2012

London 2012 and data journalism: What did we learn at the Olympics?

Fascinating item in The Guardian, which looks at the Olympics from a data journalist’s point of view and does a great job.

August 4, 2012

Who to test?

Sammie Jia draws my attention to a story in Nature News about drug-testing for athletes.  The story mentions the impressive performance of swimmer Ye Shiwen in the women’s 400m individual medley, and suggests that tests should focus on athletes who have unusually good performances or strong improvements.

I’m surprised this isn’t already being done.  There are at least three good arguments for it.  Firstly, if doping works, the rate will be higher among athletes who have improved dramatically, so more cheats will be found with lower testing costs.  Secondly, the successes of performance-enhancing drugs are the cases it’s most useful to catch, either if you think the point is deterrence in young athletes or if you are doing it for some sort of abstract ideal of fair competition.  And finally, athletes who make dramatic performance gains honestly deserve to have any doubts removed.

In contrast to many other approaches to targeting tests, this one is very hard to game.  The whole point of doping is to get otherwise-impossible improvements in performance, so there isn’t any useful way to avoid attention. Targeting performance improvements would be even more efficient in circumstances where taking and storing samples is cheaper than analyzing them: it would be possible to store a larger collection of samples and retrospectively test performers who had done surprisingly well.   That, more or less, is how outcome-dependent sampling is used in medical research:  we take blood samples from everyone, but focus the expensive assays on people who, possibly decades later, stand out for good or bad health.

February 7, 2012

Superbowl statistics

American football games, like many sporting events, start with a coin toss, in this case to decide which team is playing in which direction. At the last 14 Superbowls, the team from the National Football Conference has won the toss (via). In a standard test of the hypothesis that the coin was fair, the p-value would be 0.0001. So, does this mean the NFC is cheating? Well, no. We have overwhelmingly good reasons to believe that coin tosses are very close to fair, and a mere 1 in 8000 coincidence shouldn’t change our minds. As Tom Stoppard put it in Rosencrantz and Guildenstern Are Dead: “A spectacular vindication of the principle that each coin, spun individually, is just as likely to come up head as tails, and should cause no surprise each individual time it does.”

The generalization of this principle to studies purporting to find small, but statistically significant, benefits of homeopathy is left as an exercise to the reader.
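
For the coin-toss figures quoted above, the arithmetic can be checked in a few lines. This sketch assumes that the "standard test" is an exact two-sided binomial test of a fair coin. The post does not say which test was used, but that reading reproduces both the p-value of about 0.0001 and the 1 in 8000 figure. The function name is made up for illustration.

```python
from math import comb


def fair_coin_two_sided_p(wins, tosses):
    """Exact two-sided p-value for `wins` out of `tosses` under a fair coin.

    By symmetry, the two-sided p-value is twice the upper-tail probability
    of the more extreme side, capped at 1.
    """
    k = max(wins, tosses - wins)
    tail = sum(comb(tosses, i) for i in range(k, tosses + 1)) / 2 ** tosses
    return min(1.0, 2 * tail)


p = fair_coin_two_sided_p(14, 14)
print(f"p = {p:.6f}, about 1 in {round(1 / p)}")
```

The one-sided probability that a named conference wins all 14 tosses is half of this, about 1 in 16,000.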

December 31, 2011

Student multitasking

Another seasonal phenomenon at this time of year is the end of US college football. For those who haven’t encountered the game, American football is not entirely unlike rugby, only with less actual kicking and more ad breaks.

Some economists in Oregon have looked at the relationship between the average male:female GPA difference  at the University of Oregon and the performance of the Ducks, the University’s football team.

So what did the economists find? While the average GPA for male students was always lower than for female students, there was a definite pattern with a larger gap in years when the Ducks did well and a smaller gap when the team did poorly.

October 21, 2011

Our RWC 2015 team was born in March 1987

Were you born between January and March in 1987? Congratulations – you’re picked for the RWC 2015 New Zealand team!

This rather ridiculous (and untrue) piece of information I just made up was concocted by examining some data and coming to an unsubstantiated conclusion. I was inspired to do this because I read recently in a British tabloid that one should “Give birth in March for a pilot” and “Victoria Beckham’s [daughter] likely to become bricklayer”. Tracking down the original study from the Office for National Statistics proved troublesome; the search instead led me to a lot of advice on when to get pregnant so your child could be a dentist.

Without seeing the original study we cannot say what got twisted around between when the UK Census was collected and when the tabloids hit the news stands. The methodological insight that we get from the Daily Mail suggests that the monthly professions-of-choice are those “with the greatest percentage above the monthly average”. Well, pick a bunch of numbers and there will be a biggest one! It doesn’t necessarily condemn your January-born aspiring sheet-metal worker to the life of a GP.

A further concern arises from multiple comparisons. The more things you look for, the more “oddities” or coincidences you’ll find – none of them have to mean anything at all. Compare 19 professions against 12 months and that’s 228 chances to find something a little unusual. You’re sure to go away with a juicy collection of headlines for these pains. Even further, oddities in the statistical sense can be decidedly underwhelming in practical terms, if we are dealing with huge numbers of respondents as in the UK census. It might be statistically all-but-certain that “Spring birth conveys height advantage” but the height advantage in question turns out to be only 6 mm.
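
To put rough numbers on the multiple-comparisons point, here is a minimal sketch. It assumes each of the 228 profession-by-month cells is treated as an independent test at a 5% significance level. Both the independence and the 5% threshold are assumptions made for illustration, not something the tabloid reports.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'significant' results when nothing is going on."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Chance of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


n_tests = 19 * 12  # professions x months
print(n_tests)
print(round(expected_false_positives(n_tests), 1))
print(round(prob_at_least_one(n_tests), 5))
```

Under those assumptions you would expect about eleven "unusual" cells from chance alone, and finding at least one is close to certain.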

One place where we can see a real and well-studied effect from month of birth is sport. Sport is seasonal and, unlike dentistry, has a very clear starting time every year. If sports are organised by age-group and you are among the oldest in the group, you have almost a year’s advantage over the youngest. For children, a year is a big deal in terms of size, physical coordination, and maturity – and this advantage snowballs throughout childhood as you get picked for the best teams, practise more, play against better opponents, and on and on. Ad Dudink examined Dutch and English soccer players in 1994, following in the footsteps of Barnsley and Thompson who examined Canadian hockey players in 1985 and 1988.

As for whether you’ll be a dentist or a bricklayer, it is possible that this can be affected by birth month, because the age differential in the school year affects children’s academic outcomes in a similar (but less drastic) way to sports teams. In the UK, children start school in September, so September-born children have a year’s maturity advantage over their August-born classmates. This is not a temporary effect: studies have shown that the advantage/disadvantage continues to school-leaving exams and university.

In New Zealand the school year begins in February, so don’t expect the same links between birth month and educational outcomes as in the United Kingdom.

But how about them All Blacks?

I extracted the place and date of birth of each of the team members listed for the All Blacks and French teams from the Rugby World Cup 2011 website, which I then ran through sed, R and finally dumped into Excel.

Then I separated the players out into hemisphere of birth, as each hemisphere has a different season start. All the French players were born in the northern hemisphere, and all the All Blacks were born in the southern hemisphere, making my life a bit easier.
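
The tabulation step could be sketched as follows. This is not the sed, R and Excel pipeline actually used, and the records below are made-up placeholders rather than real squad data; the function names are hypothetical.

```python
from collections import Counter
from datetime import date


def birth_quarter(born):
    """Map a date of birth to its calendar quarter, 1 (Jan-Mar) to 4 (Oct-Dec)."""
    return (born.month - 1) // 3 + 1


def tabulate(players):
    """Count players by (hemisphere, quarter of birth)."""
    counts = Counter()
    for hemisphere, born in players:
        counts[(hemisphere, birth_quarter(born))] += 1
    return counts


# Placeholder records for illustration only, NOT real players.
example_players = [
    ("south", date(1987, 2, 14)),
    ("south", date(1985, 11, 3)),
    ("north", date(1986, 9, 21)),
    ("north", date(1984, 1, 30)),
]

for (hemisphere, quarter), n in sorted(tabulate(example_players).items()):
    print(hemisphere, f"Q{quarter}", n)
```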

I’ve plotted them here. French (blue) above the equator, and All Blacks (black) below:

[Figure: Team members by quarter of birth and hemisphere, NZL vs FRA]

Eyeballing that does suggest some stories about when to be born if you want to play for the All Blacks or the French, but being born in January to March isn’t going to get you straight onto the All Black squad. There are many other factors that influence your selection:

Eat a healthy diet, high in Weet-Bix, exercise often, and most importantly, you can increase your chances of being on the squad by starting to play rugby.

A few references and citations for further reading:

Utts J (2003). What Educated Citizens Should Know About Statistics and Probability. The American Statistician, 57(2), 74-79. doi:10.1198/0003130031630

Weber GW, Prossinger H, Seidler H (1998). Height depends on month of birth. Nature, 391(6669), 754-755. doi:10.1038/35781

Dudink A (1994). Birth date and sporting success. Nature, 368(6472), 592.

Barnsley RH, Thompson AH, Barnsley PE (1985). Hockey success and birth-date: The relative age effect. Journal of the Canadian Association for Health, Physical Education, and Recreation, Nov.-Dec., 23-28.

Barnsley RH, Thompson AH (1988). Birthdate and success in minor hockey: The key to the N.H.L. Canadian Journal of Behavioral Science, 20, 167-176.

Wiseman, R (2008). Quirkology: The Curious Science Of Everyday Lives, 28-29 ISBN: 9780330448093

Back in 2008 the All Black squad was also dominated by January – March births: http://rowansimpson.com/2008/12/07/31-december/

October 5, 2011

Rugby World Cup 2011 predictions from David Scott …

Ratings at the Start of RWC 2011

Here are the team ratings at the start of RWC 2011.

Team Rating
New Zealand 30.92
Australia 22.97
South Africa 20.50
England 13.19
France 11.42
Wales 9.99
Ireland 8.65
Argentina 6.42
Scotland 4.50
Italy -2.78
Samoa -7.38
Canada -15.03
Tonga -15.11
Fiji -15.70
Japan -22.52
Georgia -25.35
USA -27.29
Romania -28.18
Russia -33.12
Namibia -38.21

Of interest in this table is that the eight top-ranked teams are the ones which have progressed to the quarter finals.

Current Team Ratings

Here are the team ratings as of October 05, 2011

Team Rating
New Zealand 31.55
Australia 21.39
South Africa 21.20
Wales 14.71
England 13.78
Ireland 10.79
France 8.91
Argentina 5.56
Scotland 2.25
Samoa -3.67
Italy -4.34
Tonga -11.34
Canada -16.43
Fiji -20.17
Georgia -21.62
Japan -23.03
USA -26.09
Romania -29.39
Russia -33.31
Namibia -42.87

The most notable change here is the improvement of Wales taking it above England. Ireland has also improved, but France is down. Of the lesser teams, Samoa, Tonga and Georgia have improved, Fiji has declined and Namibia has deservedly sunk further into the mire.

Performance So Far

So far there have been 40 matches played, 36 of which were correctly predicted, a success rate of 90%.
Here are the predictions for the games so far.

No. Game Date Score Prediction Correct
1 New Zealand vs. Tonga Sep 09 41 – 10 51.03 TRUE
2 Argentina vs. England Sep 10 9 – 13 -6.77 TRUE
3 Fiji vs. Namibia Sep 10 49 – 25 22.51 TRUE
4 France vs. Japan Sep 10 47 – 21 33.94 TRUE
5 Scotland vs. Romania Sep 10 34 – 24 32.68 TRUE
6 Australia vs. Italy Sep 11 32 – 6 25.75 TRUE
7 Ireland vs. USA Sep 11 22 – 10 35.93 TRUE
8 South Africa vs. Wales Sep 11 17 – 16 10.51 TRUE
9 Samoa vs. Namibia Sep 14 49 – 12 30.95 TRUE
10 Scotland vs. Georgia Sep 14 15 – 6 28.04 TRUE
11 Tonga vs. Canada Sep 14 20 – 25 1.53 FALSE
12 Russia vs. USA Sep 15 6 – 13 -7.75 TRUE
13 New Zealand vs. Japan Sep 16 83 – 7 56.20 TRUE
14 Argentina vs. Romania Sep 17 43 – 8 33.00 TRUE
15 Australia vs. Ireland Sep 17 6 – 15 16.26 FALSE
16 South Africa vs. Fiji Sep 17 49 – 3 35.32 TRUE
17 England vs. Georgia Sep 18 41 – 10 36.79 TRUE
18 France vs. Canada Sep 18 46 – 19 25.29 TRUE
19 Wales vs. Samoa Sep 18 17 – 10 17.64 TRUE
20 Italy vs. Russia Sep 20 53 – 17 30.27 TRUE
21 Tonga vs. Japan Sep 21 31 – 18 9.44 TRUE
22 South Africa vs. Namibia Sep 22 87 – 0 59.41 TRUE
23 Australia vs. USA Sep 23 67 – 5 46.40 TRUE
24 England vs. Romania Sep 24 67 – 3 39.03 TRUE
25 New Zealand vs. France Sep 24 37 – 17 24.98 TRUE
26 Argentina vs. Scotland Sep 25 13 – 12 5.63 TRUE
27 Fiji vs. Samoa Sep 25 7 – 27 -10.39 TRUE
28 Ireland vs. Russia Sep 25 62 – 12 42.28 TRUE
29 Wales vs. Namibia Sep 26 81 – 7 50.92 TRUE
30 Canada vs. Japan Sep 27 23 – 23 9.11 FALSE
31 Italy vs. USA Sep 27 27 – 10 24.34 TRUE
32 Georgia vs. Romania Sep 28 25 – 9 5.16 TRUE
33 South Africa vs. Samoa Sep 30 13 – 5 28.08 TRUE
34 Australia vs. Russia Oct 01 68 – 22 56.36 TRUE
35 England vs. Scotland Oct 01 16 – 12 12.96 TRUE
36 France vs. Tonga Oct 01 14 – 19 25.06 FALSE
37 Argentina vs. Georgia Oct 02 25 – 7 28.92 TRUE
38 Ireland vs. Italy Oct 02 36 – 6 12.30 TRUE
39 New Zealand vs. Canada Oct 02 79 – 15 50.88 TRUE
40 Wales vs. Fiji Oct 02 66 – 0 28.95 TRUE
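
The "Correct" column can be reproduced from the other columns under one reading of the table: the prediction is the expected margin for the first-named team, and a prediction counts as correct when it names the actual winner. That convention is inferred from the rows, not stated in the post, and the function name is made up. A sketch using a handful of rows from the table:

```python
def prediction_correct(first_score, second_score, predicted_margin):
    """True when the predicted margin has the same sign as the actual margin.

    A drawn match has no winner, so any prediction counts as incorrect.
    """
    actual = first_score - second_score
    if actual == 0:
        return False
    return (actual > 0) == (predicted_margin > 0)


# (game, first-team score, second-team score, predicted margin)
sample_rows = [
    ("New Zealand vs. Tonga", 41, 10, 51.03),
    ("Argentina vs. England", 9, 13, -6.77),
    ("Tonga vs. Canada", 20, 25, 1.53),
    ("Australia vs. Ireland", 6, 15, 16.26),
    ("Canada vs. Japan", 23, 23, 9.11),
    ("France vs. Tonga", 14, 19, 25.06),
]

for game, first, second, margin in sample_rows:
    print(game, prediction_correct(first, second, margin))

# Overall success rate quoted in the post: 36 correct out of 40 games.
print(36 / 40)
```

Read this way, the four misses are Tonga vs. Canada, Australia vs. Ireland, the Canada vs. Japan draw, and France vs. Tonga.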

 

Predictions for the Quarter Finals

Here are the predictions for the quarter final games

No. Game Date Winner Prediction
1 Ireland vs. Wales Oct 08 Wales -3.90
2 England vs. France Oct 08 England 4.90
3 South Africa vs. Australia Oct 09 Australia -0.20
4 New Zealand vs. Argentina Oct 09 New Zealand 31.00

Most interesting here is the very narrow gap between South Africa and Australia. That game could clearly go either way.
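
The post does not say how the predicted margins are computed, but the tables are consistent with a simple reading: the margin is the first team's rating minus the second team's, plus about five points when New Zealand, the host, is playing. The sketch below checks that reading against the published numbers. The function name and the five-point home bonus are inferences from the tables, not David Scott's stated method.

```python
HOME_BONUS = 5.0  # inferred from the New Zealand rows; an assumption

# Ratings copied from the two tables above.
start_ratings = {"New Zealand": 30.92, "Tonga": -15.11,
                 "Argentina": 6.42, "England": 13.19}
current_ratings = {"Ireland": 10.79, "Wales": 14.71, "England": 13.78,
                   "France": 8.91, "South Africa": 21.20, "Australia": 21.39,
                   "New Zealand": 31.55, "Argentina": 5.56}


def predicted_margin(ratings, first, second, first_is_home=False):
    """Rating difference, plus a home bonus when the first team is the host."""
    bonus = HOME_BONUS if first_is_home else 0.0
    return ratings[first] - ratings[second] + bonus


# First pool games, using the starting ratings (published: 51.03 and -6.77).
print(round(predicted_margin(start_ratings, "New Zealand", "Tonga", True), 2))
print(round(predicted_margin(start_ratings, "Argentina", "England"), 2))

# Quarter finals, using the current ratings
# (published: -3.90, 4.90, -0.20 and 31.00).
print(round(predicted_margin(current_ratings, "Ireland", "Wales"), 2))
print(round(predicted_margin(current_ratings, "England", "France"), 2))
print(round(predicted_margin(current_ratings, "South Africa", "Australia"), 2))
print(round(predicted_margin(current_ratings, "New Zealand", "Argentina", True), 2))
```

The quarter-final values computed this way differ from the published ones by a few hundredths, which would be expected if the published ratings are rounded or the predictions are rounded to one decimal place.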

Source File: hwriterPredictions.R

(Page generated on Wed Oct 05 13:39:15 2011 by hwriter 1.3)