Posts filed under Education (59)

June 26, 2014

Want to learn data analysis? No stats experience required

Interested in learning to do data analysis but don’t know where to start? Try out the Department of Statistics’ new MOOC (massive open online course) called From Data to Insight: An Introduction to Data Analysis. It’s free – yep, it won’t cost you a bean – starts on October 6, takes just three hours a week, and will be led by our resident world-renowned statistics educator Prof Chris Wild (right).

The blurb says, in part:

“The course focuses on data exploration and discovery, showing you what to look for in statistical data, however large it may be. We’ll also teach you some of the limitations of data and what you can do to avoid being misled. We use data visualisations designed to teach you these skills quickly, and introduce you to the basic concepts you need to start understanding our world through data.

“This course assumes very little experience with statistical ideas and concepts. You will need to be comfortable thinking in terms of percentages, have basic Microsoft Excel skills, and a Windows or Macintosh computer to download and install our iNZight software.”

And that’s all you need. Spread the word.







June 5, 2014

Gender, coding, and measurement error

Alyssa Frazee, a PhD student in biostatistics at Johns Hopkins, has an interesting post looking at the gender of programmers using the Github code repository. Github users have a profile, which includes a first name, and there are programs that attempt to classify first names by gender.

This graph (click to embiggen, as usual) shows the guessed gender distribution for software with at least five ‘stars’ (likes, sort of) across programming languages. Orange is male, green is female, grey is “don’t know”.


The main message is obvious. Women either aren’t putting code on Github or are using non-gender-revealing or male-associated names.

The other point is that the language with the most female coders seems to be R, the statistical programming language originally developed in Auckland, which has 5.5%.  Sadly, 3.9% of that is code by the very prolific Hadley Wickham (also originally developed in Auckland), who isn’t female. Measurement error, as I’ve written before, has a much bigger impact on rare categories than common ones.
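A rough way to see why rare categories suffer more: suppose a name classifier flips a fixed fraction of labels in each direction. The sketch below is illustrative only. The 3% error rate and the two prevalences are made-up numbers, not figures from the Github analysis, and the function names are mine.

```python
def observed_share(true_share, error_rate):
    """Share labelled as the category when each label is flipped with probability error_rate."""
    true_positives = true_share * (1 - error_rate)
    false_positives = (1 - true_share) * error_rate
    return true_positives + false_positives


def false_positive_fraction(true_share, error_rate):
    """Fraction of the labelled category that does not really belong to it."""
    false_positives = (1 - true_share) * error_rate
    return false_positives / observed_share(true_share, error_rate)


if __name__ == "__main__":
    error_rate = 0.03  # hypothetical symmetric misclassification rate
    for true_share in (0.02, 0.50):  # a rare category and a common one
        obs = observed_share(true_share, error_rate)
        wrong = false_positive_fraction(true_share, error_rate)
        print(f"true {true_share:.0%}: observed {obs:.1%}, "
              f"{wrong:.0%} of those labels are wrong")
```

With these made-up numbers the rare category more than doubles in apparent size and most of its labels are wrong, while the common category barely moves.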

June 4, 2014

How much disagreement should there be?

From the Herald

Thousands of school students are being awarded the wrong NCEA grades, a review of last year’s results has revealed.

Nearly one in four grades given by teachers for internally marked work were deemed incorrect after checking by New Zealand Qualifications Authority moderators.

That’s not actually true, because moderators don’t deem grades to be incorrect. That’s not what moderators are for.  What the report says (pp105-107 in case you want to scroll through it) is that in 24% of cases the moderator and the internal assessor disagreed on grade, and in 12% they disagreed on whether the standard had been achieved.

What we don’t know is how much disagreement is appropriate. The only way the moderator’s assessment could be considered error-free is if you define the ‘right answer’ to be ‘whatever the moderator says’, which is obviously not appropriate. There always will be some variation between moderators, and some variation between schools, and what we want to know is whether there is too much.

The report is a bit disappointing from that point of view.  At the very least, there should have been some duplicate moderation. That is, some pieces of work should have been sent to two different moderators, so we could have an idea of the between-moderator agreement rate. Then, if we were willing to assume that moderators collectively were infallible (though not individually), we could estimate how much less reliable the internal assessments were.
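To make the duplicate-moderation idea concrete, here is a minimal sketch under strong simplifying assumptions that are mine, not the report’s: the outcome is binary (standard achieved or not), every piece of work has a true answer, and each marker gets it wrong independently with a fixed probability. The 12% moderator-assessor disagreement is the figure quoted above; the 8% moderator-moderator disagreement is a made-up placeholder, since no duplicate moderation was reported.

```python
import math


def disagreement(err_a, err_b):
    """Chance that two independent markers disagree on a binary outcome."""
    return err_a * (1 - err_b) + err_b * (1 - err_a)


def moderator_error(dup_disagreement):
    """Solve 2e(1 - e) = dup_disagreement for the smaller root e."""
    return (1 - math.sqrt(1 - 2 * dup_disagreement)) / 2


def assessor_error(mod_assessor_disagreement, mod_err):
    """Solve disagreement(mod_err, e) = observed rate for e."""
    return (mod_assessor_disagreement - mod_err) / (1 - 2 * mod_err)


if __name__ == "__main__":
    dup_rate = 0.08       # hypothetical: two moderators disagree on 8% of work
    observed_rate = 0.12  # from the report: moderator vs internal assessor

    mod_err = moderator_error(dup_rate)
    assess_err = assessor_error(observed_rate, mod_err)
    print(f"implied moderator error rate: {mod_err:.3f}")
    print(f"implied assessor error rate:  {assess_err:.3f}")
```

Under these assumptions the assessors’ implied error rate comes out at roughly twice the moderators’, but that conclusion depends entirely on the placeholder 8%.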

Even better would be to get some information on how much variation there is between schools in the disagreement: if there is very little variation, the schools may be doing about as well as is possible, but if there is a lot of variation between schools it would suggest some schools aren’t assessing very reliably.


May 12, 2014

Resources in education

Attention conservation notice: I have to write this post because I’ve spent too much time on it otherwise. You don’t have to read it.

There was an episode of “Yes, Prime Minister” where the term “Human Resource Rich Countries” was being proposed as a replacement for “Less Developed Countries”, meaning “poor”. “Resources” is a word that can mean lots of different things, which is why I spent more time than was strictly sensible investigating the following graph.



The graph appeared in my Twitter feed last Monday. It’s originally from a campaign to give Australia a school funding model a bit more like NZ’s decile system, as recommended by a national review panel, so it is disturbing to see New Zealand almost at the bottom of the world.


May 3, 2014

White House report: ‘Big Data’

There’s a new report “Big Data: Seizing Opportunities, Preserving Values” from the Office of the President (of the USA). Here’s part of the conclusion (there are detailed recommendations as well):

Big data tools offer astonishing and powerful opportunities to unlock previously inaccessible insights from new and existing data sets. Big data can fuel developments and discoveries in health care and education, in agriculture and energy use, and in how businesses organize their supply chains and monitor their equipment. Big data holds the potential to streamline the provision of public services, increase the efficient use of taxpayer dollars at every level of government, and substantially strengthen national security. The promise of big data requires government data be viewed as a national resource and be responsibly made available to those who can derive social value from it. It also presents the opportunity to shape the next generation of computational tools and technologies that will in turn drive further innovation.

Big data also introduces many quandaries. By their very nature, many of the sensor technologies deployed on our phones and in our homes, offices, and on lampposts and rooftops across our cities are collecting more and more information. Continuing advances in analytics provide incentives to collect as much data as possible not only for today’s uses but also for potential later uses. Technologically speaking, this is driving data collection to become functionally ubiquitous and permanent, allowing the digital traces we leave behind to be collected, analyzed, and assembled to reveal a surprising number of things about ourselves and our lives. These developments challenge longstanding notions of privacy and raise questions about the “notice and consent” framework, by which a user gives initial permission for their data to be collected. But these trends need not prevent creating ways for people to participate in the treatment and management of their information.

You can also read comments on the report by danah boyd, and the conference report and videos from her conference ‘The Social, Cultural & Ethical Dimensions of “Big Data”’ are now online.

April 4, 2014

Thomas Lumley’s latest Listener column

…”One of the problems in developing drugs is detecting serious side effects. People who need medication tend to be unwell, so it’s hard to find a reliable comparison. That’s why the roughly threefold increase in heart-attack risk among Vioxx users took so long to be detected …”

Read his column, Faulty Powers, here.

February 22, 2014

Internal and external

There’s an interesting story in the Herald with interactive graphics comparing internal and external NCEA assessments for different subjects, levels, and decile of schools, over time.  The main thing I might change about the graphic is to display over deciles rather than over years, since that’s where the action is.

The general picture is fairly consistent: in low-decile schools, the students get substantially better grades on internal assessment than external. The difference is progressively smaller as you move up the decile scale, in some cases vanishing.  Interpreting the results is more difficult.

The lead says that students do better away from the pressure of exams, which is one explanation. Another, given by Professor Carnegie from VUW, is that the internal assessment is not very reliable. There are many alternative views given in the story, and even some who say the differences over decile are reasonable and appropriate.


February 21, 2014

Most generous in the world

From Stuff

But Tertiary Education Minister Steven Joyce has made it clear they are not going to get any more in this year’s Budget, and says students already have “one of the most generous support systems in the world”.

This is sufficiently vague that you can probably find a sense in which it’s true, and so could Mr Joyce’s counterparts in most other countries. For example, the Hong Kong system provides slightly larger loans and similar tuition subsidy, but charges (low) interest on the loans from day 1.  The US system allows much larger student loans and significant means-tested non-loan support, but provides much less public subsidy for tuition.  The UK system is more generous for students in low-income households but less generous for students in high-income households. It must be hard to find criteria where the NZ system is more generous than Germany or some other Western European countries, though.

What’s a bit more surprising is that the story treats inflation as basically a matter of opinion:

From January 1999 to December 2008, they could borrow up to $150 a week. The limit has risen slowly since, and now stands at $173.56, which Mr Joyce says is in line with the rise in inflation.


But Victoria University third-year student Annabelle Nichols said she and many of her friends were left in the red at the end of each week, and disagreed with Mr Joyce that living costs had kept pace with inflation.

If you look at the RBNZ online inflation calculator, you find that $150 in the first quarter of 1999 translates to $212.06 in the first quarter of 2014 using overall CPI, $217.38 using the food category, $346.62 using the housing category, $221.11 in the transport category, and $155.72 in the clothing category. Unless students are expected to spend the majority of their money on clothing, this seems inconsistent with Mr Joyce’s claim.
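The comparison can be laid out in a few lines. The sketch below uses only the dollar figures quoted above; it does not query the RBNZ calculator, and the names are mine.

```python
BASE_LIMIT = 150.00      # weekly loan limit in 1999
CURRENT_LIMIT = 173.56   # weekly loan limit in 2014

# What $150 in 1999 Q1 corresponds to in 2014 Q1, by price category
INFLATED_1999_LIMIT = {
    "overall CPI": 212.06,
    "food": 217.38,
    "housing": 346.62,
    "transport": 221.11,
    "clothing": 155.72,
}


def shortfall(inflated_value, current_limit=CURRENT_LIMIT):
    """Weekly dollars by which the current limit trails the inflation-adjusted 1999 limit."""
    return inflated_value - current_limit


def real_value_in_1999_dollars(inflated_value, current_limit=CURRENT_LIMIT):
    """Current limit deflated back to 1999 dollars using the implied price ratio."""
    return current_limit * BASE_LIMIT / inflated_value


if __name__ == "__main__":
    for category, value in INFLATED_1999_LIMIT.items():
        print(f"{category:12s} shortfall ${shortfall(value):7.2f}/week, "
              f"limit worth ${real_value_in_1999_dollars(value):6.2f} in 1999 dollars")
```

On the overall CPI figure the limit is about $38.50 a week short of keeping pace since 1999; clothing is the only category where it has kept ahead.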

It’s possible that the Treasury has done specific living-cost modelling for students and that they do face lower effective inflation rates than the rest of the population, but given the location of many universities in places with expensive housing, that’s a bit surprising and would have been worth mentioning explicitly.

[Update: Mr Joyce was talking about just the period since 2008, when the loan limit stopped declining in real terms. That doesn’t affect my main point, which is that reporters shouldn’t treat inflation adjustment as a matter of opinion: they should check. Also, while 2008 is a relevant starting point for Mr Joyce, it’s not clear that it is for anyone else.]

January 10, 2014

Meet Mengdan Yu, Statistics summer scholar

Every year, the Department of Statistics offers summer scholarships to a number of students so they can work with our staff on real-world projects. We’ll be profiling them on Stats Chat.

Mengdan (below) is working with Jessica McLay on a project titled The simario R package. She explains:

Mengdan Yu

“The simario R package is a collection of R functions for performing dynamic microsimulation developed by  COMPASS (the Centre of Methods and Policy Application in the Social Sciences at the University of Auckland). Dynamic microsimulation is used to test ‘what if?’ situations.  The starting point of the simulation is a set of attributes for each unit (usually individual) and the attributes (variables) are simulated or updated in annual steps.  User-specified modifications can be made on the variables at the start or any point during simulation in order to see the effects on output attributes of interest.

“A simple demonstration microsimulation model (demo model) using the simario R functions was created two years ago, but the focus since then has been on developing a complicated microsimulation model called Modelling the Early Life Course (MELC).  Compared to the demo model, the MELC model uses newer versions of the simario functions and has had a lot of additional functionality built in.

“What I’m doing for my summer project is ensuring that the newer versions of the simario functions work properly with the demo model and extending the demo microsimulation model.  The extension includes adding more variables to the system, showcasing the different ways variables can be simulated over time and including more of the functionality that is currently in MELC but not in the demo model.  I will also be checking the documentation for all the functions in the simario package to make it ready to publish as an official R package.

“This is useful research as dynamic microsimulation is increasingly used, especially in government, to help in making policy decisions.  There are a number of programming languages used to create microsimulation models, including those based on C++, C#, SAS, and Java.  However, given the prominence of the R language, a package for microsimulation in R could prove useful and helpful to analysts attempting microsimulation.  The demo model in conjunction with an article (to be written later by COMPASS) will show how to put the functions together to create a working microsimulation model.

“This is my third year of a Bachelor of Science majoring in Statistics and Computer Science.  Initially, I chose statistics because I’m into calculating probabilities, and have been since I was a child. As I learned more about stats, especially analysing data by using software, I appreciated even more how useful the subject is in many areas. Studying statistics has improved my logical thinking and my ability to solve real-life problems with stats techniques.

“For the rest of the summer, I’d like to do something relaxing: hang out with my friends, sleep at home and watch dramas so I can be positive and energetic for next semester.”
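For readers unfamiliar with the idea, the dynamic microsimulation Mengdan describes can be sketched in a few lines of Python. This is a toy illustration, not the simario package or its API: the attributes, the update rule and the `intervention` hook are all invented for the example.

```python
import random


def simulate(units, years, intervention=None, seed=1):
    """Update each unit's attributes in annual steps; return the final population.

    `intervention` is an optional function (year, unit) -> None that modifies
    a unit in place before that year's update: the 'what if?' scenario.
    """
    rng = random.Random(seed)
    population = [dict(u) for u in units]  # copy so the inputs are untouched
    for year in range(years):
        for unit in population:
            if intervention is not None:
                intervention(year, unit)
            unit["age"] += 1
            # Toy rule: reading at home raises the chance of a score gain
            p_gain = 0.6 if unit["reads_at_home"] else 0.3
            if rng.random() < p_gain:
                unit["score"] += 1
    return population


def mean_score(population):
    """Average of the output attribute of interest."""
    return sum(u["score"] for u in population) / len(population)


if __name__ == "__main__":
    start = [{"age": 5, "reads_at_home": i % 2 == 0, "score": 0}
             for i in range(1000)]

    def everyone_reads(year, unit):
        # Scenario: at the start of the simulation, every child reads at home
        if year == 0:
            unit["reads_at_home"] = True

    baseline = simulate(start, years=5)
    scenario = simulate(start, years=5, intervention=everyone_reads)
    print(mean_score(baseline), mean_score(scenario))
```

Using the same seed for both runs means the two scenarios share their random draws, so the difference in mean score reflects the intervention rather than simulation noise.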




December 27, 2013

Meet Tania Tian, Statistics Summer Scholar 2013-2014

Every year, the Department of Statistics at the University of Auckland offers summer scholarships to a number of students so they can work with our staff on real-world projects. We’ll be profiling the 2013-2014 summer scholars on Stats Chat. Tania is working with Dr Stephanie Budgett on a project titled First-time mums: Can we make a difference?

Tania (right) explains:

“This project is based on the ongoing levator ani study (LA, commonly known as the pelvic floor muscles) from the Pelvic Floor Research Group at the Auckland Bioengineering Institute (ABI), which looks at how the pelvic floor muscles change after first-time mums give birth.

“The aim is to see whether age, ethnicity, delivery conditions and other related factors are associated with the tearing of the muscle. Interestingly, the stiffness of the muscle at rest has been identified as a key factor and is being measured by a specially designed device, an elastometer, that was built by engineers at the ABI.

“Pelvic-floor muscle injury following a vaginal delivery can increase the risks for prolapse where pelvic organs, such as the uterus, small bowel, bladder and rectum, descend and herniate. Furthermore, the muscle trauma may also promote or intensify urinary and/or bowel incontinence.

“Not only do these pelvic-floor disorders cause discomfort and distress and reduce the mother’s quality of life, but, if left untreated, they may also lead to major health concerns later in life. Therefore, a statistical model based on key factors elucidated from the study may aid health professionals in deciding the best strategy for delivering a woman’s baby and whether certain interventions are needed.

“I have recently completed my third year of a Bachelor of Science majoring in Statistics and Pharmacology and intend to pursue postgraduate studies. I hope to integrate my knowledge of medical sciences and statistics and specialise in medical statistics.

“Statistics appeals to me because it is a useful field with direct practical applications in almost every industry. I had initially taken the stage one paper as a standalone in order to broaden my knowledge, but eventually realised that I really liked the subject and that it could complement whichever career I have. That’s when I decided to major in statistics, and I’m very glad that I did.

“Over this summer, aside from the project, I am hoping to spend more time with friends and family – especially with my new baby brother! I am also looking forward to visiting the South Island during the Christmas break.”