Posts filed under Education (76)

February 4, 2015

Meet Statistics summer scholar Christopher Pearce

Every year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Christopher, right, is working on the OpenAPI project with Associate Professor Paul Murrell. Chris explains:

“Government data is becoming increasingly available. However, this does not mean it is readable – few individuals possess the knowledge and skills to make use of these data by themselves.

“In an ideal world, the code used by fellow statisticians would be available to everyone. It would be even more ideal if it were transferable. Sites like Wiki New Zealand are doing a remarkable job of displaying some of New Zealand’s trends, but without source code their results can sometimes be impossible to recreate.

“The OpenAPI project is developing a flow-based framework that is primarily aimed at lowering the barriers to use of open data by the general public. My project is about creating an architecture for programmers and statisticians of all levels. Our goal is for anyone interested to have the ability to perform analyses on open government data. The idea is that there are publicly available snippets of code from fellow statisticians that can be easily linked in a meaningful way. The less expertise required by the end user, the better.

“My job is to come up with questions I am interested in answering, then figure out how a potential lay observer would solve them. So far it has yielded some interesting results.

“I’m a third-year student at the University of Auckland, studying a Bachelor of Laws/Bachelor of Science conjoint. My skills lie in statistics and computer science, but I need the literal side to keep a balanced life.

“I got hooked on statistics when I discovered the Poisson distribution. There’s something about statistics that never seems to get old, and I’m discovering new things every day. It’s nice knowing I can actually attempt an answer to the curiosities in my head.”

February 3, 2015

Meet Statistics summer scholar Daniel van Vorsselen

Every year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Daniel, right, is working on a project called Working with data from conservation monitoring schemes with Associate Professor Rachel Fewster. Daniel explains:

“The university is involved in a project called CatchIT, an online system that aims to help community conservation schemes by providing users with a place where they can input and store their data for reference. The project also produces maps and graphics so that users can assess the effectiveness of their conservation schemes and identify areas where changes can be made.

“My role in the project is to help analyse the data that users put into the project. This involves correctly formatting and cleaning the data so that it is usable. I assist users in the technical aspects relating to their data and help them communicate their data in a meaningful way.

“It’s important to maintain and preserve the wildlife and plant species we have in New Zealand so that future generations have the opportunity to experience them as we have. Our environments are a defining factor of our culture and lifestyles as New Zealanders and we have a large number of native species in New Zealand. It would be a shame to see them eradicated.

“I am currently studying a BCom/BA conjoint, majoring in Statistics, Economics and Finance. I’m hoping to do Honours in statistics and I am looking at a career in banking.

“Over summer, I hope to enjoy the nice weather, whether out on the boat fishing, at the beach or going for a run.”





January 30, 2015

Meet Statistics summer scholar Ying Zhang

Every year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Ying, right, is working on a project called Service overview, client profile and outcome evaluation for Lifeline Aotearoa Face-to-Face Counselling Services with the Department of Statistics’ Associate Professor David Scott and Christine Dong, research and clinical engagement manager at Lifeline and an Honorary Research Fellow in the Department of Psychological Medicine at the University of Auckland. Ying explains:

“Lifeline New Zealand is a leading provider of dedicated community helpline services, face-to-face counselling and suicide prevention education. The project aims to investigate the client profile, the clinical effectiveness of the service and client experiences of, and satisfaction with, the face-to-face counselling service.

“In this project, my work includes three aspects: Data entry of client profiles and counselling outcomes; qualitative analysis of open-ended questions and descriptive analysis; and modelling for the quantitative variables using SAS.

“Very few research studies have been done in New Zealand to explore client profiles or find out clients’ experiences of, and satisfaction with, community face-to-face counselling services. Therefore, the study will add evidence in terms of both clinical effectiveness and client satisfaction. This study will also provide a systematic summary of the demographics and clinical characteristics of people accessing such services. It will help provide direction for strategies to improve the quality and efficiency of the service.

“I have just graduated from the University of Auckland with a Postgraduate Diploma in Statistics. I got my bachelor’s and master’s degrees, majoring in information management and information systems, at Zhejiang University in China.

“My first contact with statistics was around 10 years ago when I was at university in China. It was an interesting but complex subject for me. After that, I did some internship work relating to data analysis. It helped me accumulate more experience about using data analysis to help inform business decisions.

“This summer, apart from participating in the project, I will spend some time expanding my knowledge of SAS – it’s a very useful tool and I want to know it better. I’m also hoping to find a full-time job in data analysis.”





January 28, 2015

Meet Statistics summer scholar Kai Huang

Every year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Kai, right, is working on a project called Constrained Additive Ordination with Dr Thomas Yee. Kai explains:

“In the early 2000s, Dr Thomas Yee proposed a new technique in the field of ecology called Constrained Additive Ordination (CAO), which addresses problems concerning the shape of species’ response curves and how they are distributed along unknown underlying gradients. Meanwhile, the CAO-oriented Vector Generalised Linear and Additive Models (VGAM) package for R has been developed. This summer, I am compiling code to improve the performance of the VGAM package by facilitating the integration of R and C++ under the R environment.

“This project brings me the chance to work with a package in worldwide use and stimulates me to learn more about writing R extensions and C++ compilation. I don’t have any background in ecology, but I acquired a lot before I started this project.

“I have just done the one-year Graduate Diploma in Science in Statistics at the University of Auckland after graduating from Massey University at Palmerston North with a Bachelor of Business Studies in Finance and Economics. In 2015, I’ll be doing an honours degree in Statistics. Statistics is used in every field, which is awesome to me.

“This summer, I’ll be spending my days rationally, working with numbers and codes, and at night, romantically, spending my spare time with stars. Seeing the movie Interstellar [a 2014 science-fiction epic that features a crew of astronauts who travel through a wormhole in search of a new home for humanity] reignited my curiosity about the universe, and I have been reading astronomy and physics books in my spare time this summer. I even bought an annual pass to Stardome, the planetarium at Auckland, and have spent several evenings there.”


January 23, 2015

Meet Statistics summer scholar Bo Liu

Every year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Bo, right, is working on a project called Construction of life-course variables for the New Zealand Longitudinal Census (NZLC) with Roy Lay-Yee, Senior Research Fellow at the COMPASS Research Centre, University of Auckland, and Professor Alan Lee of Statistics. Bo explains:

“The New Zealand Longitudinal Census has linked individuals across the 1981-2006 New Zealand censuses. This enables the assessment of life-course resources with various outcomes.

“I need to create life-course variables such as socio-economic status, health, education, work, family ties and cultural identity from the censuses. Sometimes such information is not given directly in the census questions, but several pieces of information need to be combined together.

“An example is the overcrowding index that measures the personal living space. We need to combine the age, partnership status of the residents and number of bedrooms in each dwelling to derive the index.

“Also, the format of the questionnaire as well as the answers used in each census were rather different, so data-cleaning is required. I need to harmonise information collected in each census so that they are consistent and can be compared over different censuses. For example, in one census the gender might be given code ‘0’ and ‘1’ representing female and male, but in another census the gender was given code ‘1’ and ‘2’. Thus the code ‘1’ can mean quite different things in different censuses. My job is to find these differences and gaps in each census.
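The harmonisation step Bo describes can be sketched in a few lines of Python. This is only an illustration: the census years, the code tables and the `harmonise_gender` helper are all hypothetical, and the real NZLC coding schemes may well differ (in particular, which of ‘1’ and ‘2’ means male in the second scheme is an assumption made here).

```python
# Hypothetical per-census coding schemes; the real NZLC codes may differ.
GENDER_CODES = {
    1981: {"0": "female", "1": "male"},
    1986: {"1": "male", "2": "female"},  # assumed meaning of '1' and '2'
}


def harmonise_gender(census_year, raw_code):
    """Map a census-specific gender code to a common label.

    Returns None for codes that are not in that year's scheme, so gaps
    stay visible instead of being silently recoded.
    """
    scheme = GENDER_CODES[census_year]
    return scheme.get(str(raw_code).strip())


# Made-up records: the same raw code '1' appears under both schemes.
records = [
    {"year": 1981, "gender": "1"},
    {"year": 1986, "gender": "1"},
    {"year": 1986, "gender": "2"},
    {"year": 1981, "gender": "9"},  # code missing from the 1981 scheme
]

harmonised = [harmonise_gender(r["year"], r["gender"]) for r in records]
print(harmonised)
```

Keeping one lookup table per census, rather than recoding in place, means the original codes are never overwritten and unmapped values surface as `None` for follow-up.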

“The results of this project will enable future studies based on New Zealand longitudinal censuses, for example, the influence of life-course variables on the risk of mortality. This project will also be a very good experience for my future career, since data-cleaning is a very important process that we were barely taught in our courses but that actually takes up almost one-third of the time in most real-life projects. When we were studying statistics courses, most data sets we encountered were ‘toy’ data sets that had fewer variables and observations and were clean. However, in real life, as in this case, we often meet with data that have millions of observations, hundreds of variables, and inconsistent variable specification and coding.

“I hold a Bachelor of Commerce in Accounting, Finance and Information Systems. I have just completed a Postgraduate Diploma in Science, majoring in Statistics, and in 2015, I will be doing a Master of Science in Statistics.

“When I was studying information systems, my lecturer introduced several statistical techniques to us and I was fascinated by what statistics is capable of in the decision-making process. For example, retailers can find out if a customer is pregnant purely based on her purchasing behaviour, so the retailers can send out coupons to increase their sales. It is amazing how we can use statistical techniques to find that little tiny bit of useful information in oceans of data. Statistics appeals to me as it is highly useful and applicable in almost every industry.

“This summer, I will spend some time doing road trips – hopefully I can make it to the South Island this time. I enjoy doing road trips alone every summer as I feel this is the best way to get myself refreshed and motivated for the next year.”




January 22, 2015

Meet Statistics summer scholar Yiying Zhang

Every year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Yiying, right, is working on a project called Modelling Competition and Dispersal in a Statistical Phylogeographic Framework with Dr Stéphane Guindon. Yiying explains:

“The processes that govern the spatial distribution of species are complex. Traditional approaches in ecology generally rely on the hypothesis that adaptation to the environment is the main force driving this distribution.

“The supervisors of this project propose an alternative explanation that assumes that species are found in certain places simply because they were the first to colonise these locations during the course of evolution. They have recently designed a stochastic model that explains the observed spatial distribution of species using a combination of dispersal events (i.e., species migrating to new territories) and competition between species.

“In this project, I will run in silico [computer] experiments and analyse real data in order to validate the software Phyloland that implements our dispersal-competition model.

“To validate the model, we will randomly generate ‘true’ values. Then we will use the model to estimate those values. If the estimates match the true values relatively closely, then the model is reliable.
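The validation idea (generate data from known ‘true’ values, then check whether the estimates recover them) can be sketched generically. The Python below is not Phyloland or its dispersal-competition model: as a stand-in, it assumes a simple exponential waiting-time model whose rate can be estimated as one over the sample mean, and the function names are made up for the illustration.

```python
import random
import statistics


def simulate_waiting_times(true_rate, n, rng):
    """Draw n waiting times from an exponential model with a known rate."""
    return [rng.expovariate(true_rate) for _ in range(n)]


def estimate_rate(sample):
    """Maximum-likelihood estimate of an exponential rate: 1 / sample mean."""
    return 1.0 / statistics.fmean(sample)


rng = random.Random(2015)  # fixed seed so the experiment is repeatable

results = []
for _ in range(200):
    true_rate = rng.uniform(0.5, 5.0)          # randomly generated 'true' value
    sample = simulate_waiting_times(true_rate, 500, rng)
    results.append((true_rate, estimate_rate(sample)))

# If the estimation procedure is sound, estimates should sit close to the truth.
relative_errors = [abs(est - true) / true for true, est in results]
print(f"median relative error: {statistics.median(relative_errors):.3f}")
```

The same loop works for any model that can both simulate data and be fitted to it; only the two helper functions would change.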

“I am doing a BCom/BSc conjoint degree. My majors are Finance, Accounting and Statistics – 2015 is my fourth year. I am planning to do an Honours degree in statistics, so this summer research project is a very valuable experience for me.

“I enjoy statistics because it brings me closer to the real world. Sometimes, things are not simply what we see. Without data, we would never have convincing evidence about what is really happening. The amount of information out there is massive and statistics can help people tell how reliable a statement is. Studying statistics has helped me make better use of information and think more critically.

“My plans for summer include relaxing and reading more books. And having plenty of sleep.”


January 21, 2015

Meet Statistics summer scholar Alexander van der Voorn

Every year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Alexander, right, is undertaking a statistics education research project with Dr Marie Fitch and Dr Stephanie Budgett. Alexander explains:

“Essentially, what this project involves is looking at how bootstrapping and re-randomisation being added into the university’s introductory statistics course have affected students’ understanding of statistical inference, such as interpreting P-values and confidence intervals, and knowing what can and can’t be justifiably claimed based on those statistical results.

“This mainly consists of classifying test and exam questions into several key categories from before and after bootstrapping and re-randomisation were added to the course, and looking at the change (if any) in the number of students who correctly answer these questions over time, and whether any common misconceptions become more or less prominent in students’ answers as well.

“This sort of project is useful because, traditionally, introductory statistics education has focused heavily on the normal distribution, using it to develop ideas and understanding of statistical inference. This results in a theoretical and mathematical approach, which means students are often restricted by its complexity and therefore struggle to use it to make clear inferences about the data.

“Bootstrapping and re-randomisation are two techniques that can be used in statistical analysis and were added into the introductory statistics course at the university in 2012. They have been around for some time, but have only become prominent and practically useful recently as they require many repetitions of simulations, which obviously are better suited to a computer than a person. Research on this emphasises how using these techniques allows key statistical ideas to be taught and understood without a lot of fuss, such as complicated assumptions and dealing with probability distributions.
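For readers who have not met these two techniques, here is a minimal Python sketch of each: a percentile bootstrap interval for a mean, and a re-randomisation test for a difference between two group means. The data are invented, the function names are illustrative, and this is not the software used in the course.

```python
import random
import statistics


def bootstrap_interval(sample, reps=2000, level=0.95, rng=None):
    """Percentile bootstrap interval for the mean of `sample`."""
    rng = rng or random.Random()
    n = len(sample)
    # Resample with replacement many times, recording the mean each time.
    means = sorted(statistics.fmean(rng.choices(sample, k=n)) for _ in range(reps))
    cut = round(reps * (1 - level) / 2)  # resampled means trimmed from each tail
    return means[cut], means[reps - cut - 1]


def rerandomisation_p_value(group_a, group_b, reps=2000, rng=None):
    """Two-sided re-randomisation test for a difference in group means."""
    rng = rng or random.Random()
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (extreme + 1) / (reps + 1)


# Invented measurements for two groups.
group_a = [12.1, 14.3, 13.8, 15.2, 12.9, 14.7, 13.5, 15.0]
group_b = [10.2, 11.8, 12.4, 10.9, 11.1, 12.0, 11.5, 10.7]

rng = random.Random(1)
low, high = bootstrap_interval(group_a, rng=rng)
p_value = rerandomisation_p_value(group_a, group_b, rng=rng)
print(f"bootstrap interval for mean of group A: ({low:.2f}, {high:.2f})")
print(f"re-randomisation p-value: {p_value:.4f}")
```

Neither function relies on a normal distribution or a formula for a standard error; the computer simply repeats the resampling or reshuffling a few thousand times.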

“In 2015, I’ll be completing my third year of a BSc in Statistics and Operations Research, and I’ll be looking at doing postgraduate study after that. I’m not sure why statistics appeals to me, I just found it very interesting and enjoyable at university and wanted to do more of it. I always liked maths at school, so it probably stemmed from that.

“I don’t have any plans to go away anywhere so this summer I’ll just relax, enjoy some time off in the sun and spend time around home. I might also focus on some drumming practice, as well as playing with my two dogs.”

June 26, 2014

Want to learn data analysis? No stats experience required

Interested in learning to do data analysis but don’t know where to start? Try out the Department of Statistics’ new MOOC (massive open online course) called From Data to Insight: An Introduction to Data Analysis. It’s free – yep, it won’t cost you a bean – starts on October 6, takes just three hours a week, and will be led by our resident world-renowned statistics educator Prof Chris Wild (right).

The blurb says, in part:

“The course focuses on data exploration and discovery, showing you what to look for in statistical data, however large it may be. We’ll also teach you some of the limitations of data and what you can do to avoid being misled. We use data visualisations designed to teach you these skills quickly, and introduce you to the basic concepts you need to start understanding our world through data.

“This course assumes very little experience with statistical ideas and concepts. You will need to be comfortable thinking in terms of percentages, have basic Microsoft Excel skills, and a Windows or Macintosh computer to download and install our iNZight software.”

And that’s all you need. Spread the word.







June 5, 2014

Gender, coding, and measurement error

Alyssa Frazee, a PhD student in biostatistics at Johns Hopkins, has an interesting post looking at the gender of programmers using the Github code repository. Github users have a profile, which includes a first name, and there are programs that attempt to classify first names by gender.

This graph (click to embiggen, as usual) shows the guessed gender distribution for software with at least five ‘stars’ (likes, sort of) across programming languages. Orange is male, green is female, grey is “don’t know”.


The main message is obvious. Women either aren’t putting code on Github or are using non-gender-revealing or male-associated names.

The other point is that the language with the most female coders seems to be R, the statistical programming language originally developed in Auckland, which has 5.5%. Sadly, 3.9% of that is code by the very prolific Hadley Wickham (also originally developed in Auckland), who isn’t female. Measurement error, as I’ve written before, has a much bigger impact on rare categories than common ones.
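In the R example, taking out the misclassified 3.9 percentage points leaves 5.5 − 3.9 = 1.6%. A small Python sketch shows why rare categories suffer more in general. It assumes a simplified two-category setup in which every item is mislabelled with the same probability; the 5% error rate and the 2% and 50% true shares are made-up illustrative numbers, not figures from the post.

```python
def observed_share(true_share, error_rate):
    """Share of items labelled as the category, given symmetric mislabelling."""
    return true_share * (1 - error_rate) + (1 - true_share) * error_rate


def correctly_labelled_fraction(true_share, error_rate):
    """Of the items labelled as the category, the fraction that truly belong."""
    return true_share * (1 - error_rate) / observed_share(true_share, error_rate)


error_rate = 0.05  # hypothetical misclassification rate

for true_share in (0.02, 0.50):  # a rare category and a common one
    seen = observed_share(true_share, error_rate)
    genuine = correctly_labelled_fraction(true_share, error_rate)
    print(
        f"true share {true_share:.0%}: observed {seen:.1%}, "
        f"of which {genuine:.0%} are genuine"
    )
```

Under these assumptions, the same error rate that barely touches a 50% category more than triples the apparent size of a 2% category, and most of what is labelled as the rare category is error.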

June 4, 2014

How much disagreement should there be?

The Herald

Thousands of school students are being awarded the wrong NCEA grades, a review of last year’s results has revealed.

Nearly one in four grades given by teachers for internally marked work were deemed incorrect after checking by New Zealand Qualifications Authority moderators.

That’s not actually true, because moderators don’t deem grades to be incorrect. That’s not what moderators are for. What the report says (pp 105-107, in case you want to scroll through it) is that in 24% of cases the moderator and the internal assessor disagreed on the grade, and in 12% they disagreed on whether the standard had been achieved.

What we don’t know is how much disagreement is appropriate. The only way the moderator’s assessment could be considered error-free is if you define the ‘right answer’ to be ‘whatever the moderator says’, which is obviously not appropriate. There always will be some variation between moderators, and some variation between schools, and what we want to know is whether there is too much.

The report is a bit disappointing from that point of view. At the very least, there should have been some duplicate moderation. That is, some pieces of work should have been sent to two different moderators, so we could have an idea of the between-moderator agreement rate. Then, if we were willing to assume that moderators collectively were infallible (though not individually), we could estimate how much less reliable the internal assessments were.
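To make that concrete, here is a rough Python sketch of the calculation duplicate moderation would allow. It rests on strong assumptions that are mine rather than the report’s: there is a ‘true’ grade, each marker hits it independently with a fixed probability, and two markers who are both wrong never agree with each other. The between-moderator disagreement rates are hypothetical, since the report has no duplicate moderation; only the 24% figure comes from the report.

```python
import math


def grader_accuracies(mod_mod_disagreement, mod_assessor_disagreement):
    """Back out per-marker accuracy from two disagreement rates.

    Under the assumptions above:
      P(two moderators agree)       = m * m
      P(moderator, assessor agree)  = m * a
    where m and a are the chances that a moderator or an internal
    assessor gives the 'true' grade.
    """
    if not 0 <= mod_mod_disagreement < 1:
        raise ValueError("between-moderator disagreement must be in [0, 1)")
    m = math.sqrt(1 - mod_mod_disagreement)
    a = (1 - mod_assessor_disagreement) / m
    return m, a


REPORTED_DISAGREEMENT = 0.24  # moderator vs internal assessor, from the report

for hypothetical in (0.05, 0.10, 0.20):  # made-up between-moderator rates
    m, a = grader_accuracies(hypothetical, REPORTED_DISAGREEMENT)
    print(
        f"moderators disagree {hypothetical:.0%} of the time: "
        f"moderator accuracy {m:.1%}, assessor accuracy {a:.1%}"
    )
```

The point is the direction of the effect: the more moderators disagree with each other, the smaller the share of the 24% that can be attributed to the internal assessors.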

Even better would be to get some information on how much variation there is between schools in the disagreement: if there is very little variation, the schools may be doing about as well as is possible, but if there is a lot of variation between schools it would suggest some schools aren’t assessing very reliably.