January 6, 2016

Meet Statistics summer scholar Katie Fahy

Every summer, the Department of Statistics offers scholarships to a number of students so they can work with staff on real-world projects. Katie, right, is working on the New Zealand Socio-Economic Index with Dr Barry Milne of COMPASS (Centre of Methods and Policy Application in the Social Sciences) and Professor Alan Lee from the Department of Statistics. Katie explains:

“The New Zealand Socio-Economic Index (NZSEI) assigns occupations a score that enables us to measure the socio-economic status of people in that occupation. It’s calculated using the average age, income and education level of people with each job. For example, doctors would have a very high socio-economic index, because they’re typically high-earning and well-educated people.

“The NZSEI has been created from Census data since the 90s, but has not yet been updated for the most recent Census in 2013. In this project, my job is to update the NZSEI using path analysis, and check that this updated version is appropriate for all people in New Zealand. A couple of examples include assessing that the index is valid for all ethnicities, and valid for workers in both urban and rural regions.

“The index is important to measure any changes to New Zealand over time, as it is updated with each Census. As well as this, the NZSEI uses a similar methodology to international scales, so international comparisons are possible.

“I am currently in my third year of studying Mathematics and Statistics at the University of Sheffield in England, and I’m halfway through my year here in Auckland as an exchange student. I’ve always been interested in Statistics and studying it at university level has shown me how applicable it is in a variety of fields, from finance to biology.

“Over the summer, I’m looking forward to exploring New Zealand more.”



January 22, 2015

Meet Statistics summer scholar Yiying Zhang

Every year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Yiying, right, is working on a project called Modelling Competition and Dispersal in a Statistical Phylogeographic Framework with Dr Stéphane Guindon. Yiying explains:

“The processes that govern the spatial distribution of species are complex. Traditional approaches in ecology generally rely on the hypothesis that adaptation to the environment is the main force driving this distribution.

“The supervisors of this project propose an alternative explanation that assumes that species are found in certain places simply because they were the first to colonise these locations during the course of evolution. They have recently designed a stochastic model that explains the observed spatial distribution of species using a combination of dispersal events (i.e., species migrating to new territories) and competition between species.

“In this project, I will run in silico [computer] experiments and analyse real data in order to validate the software Phyloland that implements our dispersal-competition model.

“To validate the model, we will randomly generate a ‘true value’. Then we will use the model to estimate that true value. If the estimates match the true value relatively closely, then the model is reliable.

“I am doing a BCom/BSc conjoint degree. My majors are Finance, Accounting and Statistics – 2015 is my fourth year. I am planning to do an Honours degree in statistics, so this summer research project is a very valuable experience for me.

“I enjoy statistics because it brings me closer to the real world. Sometimes, things are not simply what we see. Without data, we would never have convincing evidence about what is really happening. The amount of information out there is massive and statistics can help people tell how reliable a statement is. Studying statistics has helped me make better use of information and think more critically.

“My plans for summer include relaxing and reading more books. And having plenty of sleep.”
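
The validation idea Yiying describes (generate a known ‘true value’, estimate it, and compare) can be sketched in a few lines. This is only a toy illustration, not Phyloland or the dispersal-competition model: it assumes a simple exponential waiting-time model, and the function name, sample sizes and range of rates are all made up for the example.

```python
import random
import statistics

def simulate_and_estimate(true_rate, n_obs, rng):
    """Simulate n_obs exponential waiting times with a known rate,
    then estimate that rate from the simulated data alone."""
    sample = [rng.expovariate(true_rate) for _ in range(n_obs)]
    return 1.0 / statistics.mean(sample)  # maximum-likelihood estimate of the rate

rng = random.Random(2015)  # fixed seed so the run is repeatable

# Step 1: randomly generate the 'true values' (hypothetical range).
true_rates = [rng.uniform(0.5, 5.0) for _ in range(200)]

# Step 2: estimate each true value from data simulated under it.
estimates = [simulate_and_estimate(rate, 500, rng) for rate in true_rates]

# Step 3: check how closely the estimates match the truth.
relative_errors = [abs(est - true) / true for est, true in zip(estimates, true_rates)]
median_rel_error = statistics.median(relative_errors)
print(f"median relative error: {median_rel_error:.3f}")
```

If the relative errors are small, the estimation procedure is recovering what was put in; large or systematically one-sided errors would point to a problem in the model or the software.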


January 21, 2015

Meet Statistics summer scholar Alexander van der Voorn

Every year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Alexander, right, is undertaking a statistics education research project with Dr Marie Fitch and Dr Stephanie Budgett. Alexander explains:

“Essentially, what this project involves is looking at how bootstrapping and re-randomisation being added into the university’s introductory statistics course have affected students’ understanding of statistical inference, such as interpreting P-values and confidence intervals, and knowing what can and can’t be justifiably claimed based on those statistical results.

“This mainly consists of classifying test and exam questions into several key categories from before and after bootstrapping and re-randomisation were added to the course, and looking at the change (if any) in the number of students who correctly answer these questions over time, and even if any common misconceptions become more or less prominent in students’ answers as well.

“This sort of project is useful as traditionally, introductory statistics education has had a large focus on the normal distribution and using it to develop ideas and understanding of statistical inference from it. This results in a theoretical and mathematical approach, which means students will often be restricted by the complexity of it and will therefore struggle to be able to use it to make clear inference about the data.

“Bootstrapping and re-randomisation are two techniques that can be used in statistical analysis and were added into the introductory statistics course at the university in 2012. They have been around for some time, but have only become prominent and practically useful recently as they require many repetitions of simulations, which obviously are better suited to a computer than a person. Research on this emphasises how using these techniques allows key statistical ideas to be taught and understood without a lot of fuss, such as complicated assumptions and dealing with probability distributions.

“In 2015, I’ll be completing my third year of a BSc in Statistics and Operations Research, and I’ll be looking at doing postgraduate study after that. I’m not sure why statistics appeals to me, I just found it very interesting and enjoyable at university and wanted to do more of it. I always liked maths at school, so it probably stemmed from that.

“I don’t have any plans to go away anywhere so this summer I’ll just relax, enjoy some time off in the sun and spend time around home. I might also focus on some drumming practice, as well as playing with my two dogs.”
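
For readers who have not met bootstrapping, here is a minimal sketch of the idea Alexander mentions: resample the data with replacement many times, recompute the statistic each time, and read an interval off the resulting values. It covers only the bootstrap, not re-randomisation. The data are made up, and the function name and settings are illustrative rather than anything used in the course.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, same size as the original sample each time.
    resampled_stats = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Cut off the lowest and highest (1 - level) / 2 of the resampled statistics.
    lo_idx = round((1 - level) / 2 * n_resamples)
    hi_idx = round((1 + level) / 2 * n_resamples) - 1
    return resampled_stats[lo_idx], resampled_stats[hi_idx]

# Made-up sample, purely for illustration.
sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]
low, high = bootstrap_ci(sample)
print(f"sample mean: {statistics.mean(sample):.1f}")
print(f"95% bootstrap interval for the mean: ({low:.1f}, {high:.1f})")
```

Nothing here relies on the normal distribution or a formula for the standard error, which is the appeal for teaching: the repetition is tedious for a person but trivial for a computer.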

June 5, 2014

Gender, coding, and measurement error

Alyssa Frazee, a PhD student in biostatistics at Johns Hopkins, has an interesting post looking at gender of programmers using the Github code repository. Github users have a profile, which includes a first name, and there are programs that attempt to classify first names by gender.

This graph (click to embiggen, as usual) shows the guessed gender distribution for software with at least five ‘stars’ (likes, sort of) across programming languages. Orange is male, green is female, grey is “don’t know”.


The main message is obvious. Women either aren’t putting code on Github or are using non-gender-revealing or male-associated names.

The other point is that the language with the most female coders seems to be R, the statistical programming language originally developed in Auckland, which has 5.5%.  Sadly, 3.9% of that is code by the very prolific Hadley Wickham (also originally developed in Auckland), who isn’t female. Measurement error, as I’ve written before, has a much bigger impact on rare categories than common ones.
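
The point about rare categories can be made concrete with a little arithmetic. The sketch below assumes a hypothetical name classifier that is right 98% of the time in both directions; these error rates and the two population shares are invented for illustration and are not estimates for the Github data.

```python
def share_of_flagged_that_are_wrong(true_share, false_positive_rate, sensitivity):
    """Of everyone the classifier labels as being in the category,
    what fraction do not actually belong to it?"""
    true_positives = true_share * sensitivity
    false_positives = (1 - true_share) * false_positive_rate
    return false_positives / (true_positives + false_positives)

# Same (hypothetical) classifier, applied to a rare and a common category.
rare = share_of_flagged_that_are_wrong(0.02, false_positive_rate=0.02, sensitivity=0.98)
common = share_of_flagged_that_are_wrong(0.50, false_positive_rate=0.02, sensitivity=0.98)

print(f"rare category (2% of people): {rare:.0%} of those flagged are wrong")
print(f"common category (50% of people): {common:.0%} of those flagged are wrong")
```

With identical error rates, the rare category ends up with a far larger share of mislabelled members, simply because there are so many more people outside it to mislabel.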

March 7, 2014

Careers in statistics

From Science Careers

“[The Bureau of Labor Statistics] projects that statistics jobs will grow 27% from 2012 to 2022, putting the profession in the ‘much faster than the average for all occupations’ growth category. The bureau puts statisticians’ median annual salary in 2012 at $75,560.”

In addition to having a different quote from Hal Varian than the one you were expecting, they talk to statisticians including Xihong Lin and Montse Fuentes.

December 23, 2013

Meet Callum Gray, Statistics Summer Scholar 2013-2014

Every year, the Department of Statistics at the University of Auckland offers summer scholarships to a number of students so they can work with our staff on real-world projects. We’ll be profiling the 2013-2014 summer scholars on Stats Chat. Callum is working with Dr Ian Tuck on a project titled Probability of encountering a bus.  

Callum (right) explains:

“If you encounter a bus on a journey, you are likely to be exposed to higher levels of pollution. I am trying to find the probability of encountering a bus and how many you will encounter when you travel from place A to place B, taking into account variables such as the time of day and mode of transport.


“This research is useful because it will give us more of an understanding about the impact that buses have on our daily exposure to pollution. We can use this information to plan journeys and learn more about an issue that is becoming more and more apparent.

“I was born in Auckland and have lived here my whole life. I just finished my third year of a Bachelor of Commerce/Bachelor of Science conjoint majoring in Accounting, Finance, and Statistics, which I will finish at the end of 2014.

“Statistics appeals to me because it is used every day in conjunction with many other areas. It is very useful to know in a lot of workplaces, and it is interesting because it has a lot of real-life applications.

“I am going to Napier for Christmas and Rhythm and Vines for New Year. In the rest of my spare time, I will be playing cricket and golf, as well as hanging out with friends.”



November 20, 2013

Statistician statistics: gender, race, ethnicity

New data from the American Community Survey on race, ethnicity, and gender balance in science/technology employment.

September 22, 2013


  • Careers: The number of people getting statistics degrees in the US has doubled in the past five years (and they’re still able to get jobs)
  • Increasing inequality in the US from 1977 to 2012 (it happens in other places too): top 1% share of income.  The colour choice is a bit unfortunate (red: more equal, green: less equal). There are animated pictures and more inequality measures in the original


  • Map of sasquatch sightings in the US. The original has all the sightings as well as this map cross-referenced with population density. Remember, just because you can measure it doesn’t mean it exists


  • Software for drawing data-based maps: CartoDB. Has both free and paid versions.  Worth a look if you do maps.

September 19, 2013

Silver Ferns’ secret weapon

From One News NZ, a story about Bobby Wilcox, the team’s performance analyst, who has a PhD in Statistics from our department

She’s been one of the Silver Ferns’ most integral members for nine years, yet she’s largely anonymous outside…


[the video comes with a very annoying ad, sadly]

September 13, 2013


From this morning’s Twitter feed

  • An animated GIF (click on it to wake it up) showing how to improve a barchart by removing junk. [from Darkhorse Analytics: Data looks better naked]



  • Data journalism: how the data sausage gets made.  Jacob Harris describes how he collected and summarised data on meat recalls in the US
  • The Royal Statistical Society has repeated the simple maths test they gave politicians last year, this time for senior professionals and managers. Less than half of them could give the probability of getting two heads from tossing two coins.
  • However, the same Royal Statistical Society news item ends “The figures have been weighted and are representative of all GB adults (aged 18+)”. This seems to me to fall in the “not even wrong” category. The target group aren’t remotely representative of all British adults, and I’d be surprised if it was even possible to reweight them to the national age distribution.
  • Cathy O’Neil asks why rankings of, e.g., cars or universities don’t allow the user to change priorities for different attributes (as the OECD Better Life Index does, for example)