Posts filed under Education (63)

January 28, 2015

Meet Statistics summer scholar Kai Huang

Kai Huang croppedEvery year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Kai, right, is working on a project called Constrained Additive Ordination with Dr Thomas Yee. Kai explains:

“In the early 2000s, Dr Thomas Yee proposed a new technique in the field of ecology called Constrained Additive Ordination (CAO) that solves the problems about the shape of species’ response curves and how they are distributed along unknown underlying gradients, and meanwhile the CAO-oriented Vector Generalised Linear and Additive Models (VGAM) package for R has been developed. This summer, I am compiling code for improving performance for the VGAM package by facilitating the integration of R and C++ under the R environment.

“This project brings me the chance to work with a package in worldwide use and stimulates me to learn more about writing R extensions and C++ compilation. I don’t have any background in ecology, but I acquired a lot before I started this project.

“I just have done the one-year Graduate Diploma in Science in Statistics at the University of Auckland after graduating from Massey University at Palmerston North with a Bachelor of Business Studies in Finance and Economics. In 2015, I’ll be doing an honours degree in Statistics. Statistics is used in every field, which is awesome to me.

“This summer, I’ll be spending my days rationally, working with numbers and codes, and at night, romantically, spending my spare time with stars. Seeing the movie Interstellar [a 2014 science-fiction epic that features a crew of astronauts who travel through a wormhole in search of a new home for humanity] reignited my curiosity about the universe, and I have been reading astronomy and physics books in my spare time this summer. I even bought an annual pass to Stardome, the planetarium at Auckland, and have spent several evenings there.”

 

January 23, 2015

Meet Statistics summer scholar Bo Liu

Photo Bo LiuEvery year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Bo, right, is working on a project called Construction of life-course variables for the New Zealand Longitudinal Census (NZLC) with Roy Lay-Yee, Senior Research Fellow at the COMPASS Research Centre, University of Auckland, and Professor Alan Lee of Statistics. Bo explains:

“The New Zealand Longitudinal Census has linked individuals across the 1981-2006 New Zealand censuses. This enables the assessment of life-course resources with various outcomes.

“I need to create life-course variables such as socio-economic status, health, education, work, family ties and cultural identity from the censuses. Sometimes such information is not given directly in the census questions, but several pieces of information need to be combined together.

“An example is the overcrowding index that measures the personal living space. We need to combine the age, partnership status of the residents and number of bedrooms in each dwelling to derive the index.

“Also, the format of the questionnaire as well as the answers used in each census were rather different, so data-cleaning is required. I need to harmonise information collected in each census so that they are consistent and can be compared over different censuses. For example, in one census the gender might be given code ‘0’ and ‘1’ representing female and male, but in another census the gender was given code ‘1’ and ‘2’. Thus the code ‘1’ can mean quite different things in different censuses. My job is to find these differences and gaps in each census.

“The results of this project will enable future studies based on New Zealand longitudinal censuses, say, for example, the influence of life-courses variables on the risk of mortality. This project will also be a very good experience for my future career, since data-cleaning is a very important process that we were barely taught in our courses but will actually cost almost one-third of the time in most real-life projects. When we were studying statistics courses, most data sets we encountered were “toy” data sets that had fewer variables and observations and were clean. However, in real life, as in this case, we often meet with data that have millions of observations, hundreds of variables, and inconsistent variable specification and coding.

“I hold a Bachelor of Commerce in Accounting, Finance and Information Systems. I have just completed Postgraduate Diploma in Science, majoring in Statistics, and in 2015, I will be doing Master of Science in Statistics.

“When I was studying information systems, my lecturer introduced several statistical techniques to us and I was fascinated by what statistics is capable of in the decision-making process. For example, retailers can find out if a customer is pregnant purely based on her purchasing behaviour, so the retailers can send out coupons to increase their sales. It is amazing how we can use statistical techniques to find that little tiny bit of useful information in oceans of data. Statistics appeals to me as it is highly useful and applicable in almost every industry.

“This summer, I will spend some time doing road trips – hopefully I can make it to the South Island this time. I enjoy doing road trips alone every summer as I feel this is the best way to get myself refreshed and motivated for the next year.”

 

 

 

January 22, 2015

Meet Statistics summer scholar Yiying Zhang

yiyingEvery year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Yiying, right, is working on a project called Modelling Competition and Dispersal in a Statistical Phylogeographic Framework with Dr Stéphane GuindonYiying explains.

 “The processes that govern the spatial distribution of species are complex. Traditional approaches in ecology generally rely on the hypothesis that adaptation to the environment is the main force driving this distribution.

“The supervisors of this project propose an alternative explanation that assumes that species are found in certain places simply because they were the first to colonise these locations during the course of evolution. They have recently designed a stochastic model that explains the observed spatial distribution of species using a combination of dispersal events (i.e., species migrating to new territories) and competition between species.

“In this project, I will run in silico [computer] experiments and analyse real data in order to validate the software Phyloland that implements our dispersal-competition model.

“To validate the model, we will randomly generate ‘true value’. Then we will use the model to make estimations of the true value. If the estimated values match the true value relatively closely, then the model is reliable.

“I am doing a BCom/BSc conjoint degree. My majors are Finance, Accounting and Statistics – 2015 is my fourth year. I am planning to do an Honours degree in statistics, so this summer research project is a very valuable experience for me.

“I enjoy statistics because it brings me closer to the real world. Sometimes, things are not simply what we see. Without data, we would never have convincing evidence about what is really happening. The amount of information out there is massive and statistics can help people tell how reliable a statement is. Studying statistics has helped me make better use of information and think more critically.

“My plans for summer include relaxing and reading more books. And having plenty of sleep.”

 

January 21, 2015

Meet Statistics summer scholar Alexander van der Voorn

Alex van der VoornEvery year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Alexander, right, is undertaking a statistics education research project with Dr Marie Fitch and Dr Stephanie Budgett. Alexander explains:

“Essentially, what this project involves is looking at how bootstrapping and re-randomisation being added into the university’s introductory statistics course have affected students’ understanding of statistical inference, such as interpreting P-values and confidence intervals, and knowing what can and can’t be justifiably claimed based on those statistical results.

“This mainly consists of classifying test and exam questions into several key categories from before and after bootstrapping and re-randomisation were added to the course, and looking at the change (if any) in the number of students who correctly answer these questions over time, and even if any common misconceptions become more or less prominent in students’ answers as well.

“This sort of project is useful as traditionally, introductory statistics education has had a large focus on the normal distribution and using it to develop ideas and understanding of statistical inference from it. This results in a theoretical and mathematical approach, which means students will often be restricted by the complexity of it and will therefore struggle to be able to use it to make clear inference about the data.

“Bootstrapping and re-randomisation are two techniques that can be used in statistical analysis and were added into the introductory statistics course at the university in 2012. They have been around for some time, but have only become prominent and practically useful recently as they require many repetitions of simulations, which obviously is better-suited to a computer rather than a person. Research on this emphasises how using these techniques allow key statistical ideas to be taught and understood without a lot of fuss, such as complicated assumptions and dealing with probability distributions.

“In 2015, I’ll be completing my third year of a BSc in Statistics and Operations Research, and I’ll be looking at doing postgraduate study after that. I’m not sure why statistics appeals to me, I just found it very interesting and enjoyable at university and wanted to do more of it. I always liked maths at school, so it probably stemmed from that.

“I don’t have any plans to go away anywhere so this summer I’ll just relax, enjoy some time off in the sun and spend time around home. I might also focus on some drumming practice, as well as playing with my two dogs.”

June 26, 2014

Want to learn data analysis? No stats experience required

4 Chris Wild, UoAInterested in learning to do data analysis but don’t know where to start? Try out the Department of Statistics’ new MOOC (massive online open course) called From Data to Insight: An Introduction to Data Analysis. It’s free – yep, it won’t cost you a bean – starts on October 6, takes just three hours a week, and will be led by our resident world-renowned statistics educator Prof Chris Wild (right).

The blurb says, in part:

“The course focuses on data exploration and discovery, showing you what to look for in statistical data, however large it may be. We’ll also teach you some of the limitations of data and what you can do to avoid being misled. We use data visualisations designed to teach you these skills quickly, and introduce you to the basic concepts you need to start understanding our world through data.

“This course assumes very little experience with statistical ideas and concepts. You will need to be comfortable thinking in terms of percentages, have basic Microsoft Excel skills, and a Windows or Macintosh computer to download and install our iNZight software.”

And that’s all you need. Spread the word.

 

 

 

 

 

 

June 5, 2014

Gender, coding, and measurement error

Alyssa Frazee, a PhD student in biostatistics at Johns Hopkins, has an interesting post looking at gender of programmers using the Github code repository. Github users have a profile, which includes a first name, and there programs that attempt to classify first names by gender.

This graph (click to embiggen, as usual) shows the guessed gender distribution for software with at least five ‘stars’ (likes, sort of) across programming languages. Orange is male, green is female, grey is “don’t know”

coder-gender

The main message is obvious. Women either aren’t putting code on Github or are using non-gender-revealing or male-associated names.

The other point is that the language with the most female coders seems to be R, the statistical programming language originally developed in Auckland, which has 5.5%.  Sadly, 3.9% of that is code by the very prolific Hadley Wickham (also originally developed in Auckland), who isn’t female. Measurement error, as I’ve written before, has a much bigger impact on rare categories than common ones.

June 4, 2014

How much disagreement should there be?

The Herald

Thousands of school students are being awarded the wrong NCEA grades, a review of last year’s results has revealed.

Nearly one in four grades given by teachers for internally marked work were deemed incorrect after checking by New Zealand Qualifications Authority moderators.

That’s not actually true, because moderators don’t deem grades to be incorrect. That’s not what moderators are for.  What the report says (pp105-107 in case you want to scroll through it) is that in 24% of cases the moderator and the internal assessor disagreed on grade, and in 12% they disagreed on whether the standard had been achieved.

What we don’t know is how much disagreement is appropriate. The only way the moderator’s assessment could be considered error-free is if you define the ‘right answer’ to be ‘whatever the moderator says’, which is obviously not appropriate. There always will be some variation between moderators, and some variation between schools, and what we want to know is whether there is too much.

The report is a bit disappointing from that point of view.  At the very least, there should have been some duplicate moderation. That is, some pieces of work should have been sent to two different moderators, so we could have an idea of the between-moderator agreement rate. Then, if we were willing to assume that moderators collectively were infallible (though not individually), we could estimate how much less reliable the internal assessments were.

Even better would be to get some information on how much variation there is between schools in the disagreement: if there is very little variation, the schools may be doing about as well as is possible, but if there is a lot of variation between schools it would suggest some schools aren’t assessing very reliably.

 

May 12, 2014

Resources in education

Attention conservation notice: I have to write this post because I’ve spent too much time on it otherwise. You don’t have to read it.

There was an episode of “Yes, Prime Minister” where the term “Human Resource Rich Countries” was being posed as a replacement for “Less Developed Countries”, meaning “poor”. “Resources” is a word that can mean lots of different things, which is why I spent more time than was strictly sensible investigating the following graph

Bm2xm_8CcAAAcK1

 

The graph appeared in my Twitter feed last Monday. It’s originally from a campaign to give Australia a school funding model a bit more like NZ’s decile system, as recommended by a national review panel, so it is disturbing to see New Zealand almost at the bottom of the world.

(more…)

May 3, 2014

White House report: ‘Big Data’

There’s a new report “Big Data: Seizing Opportunities, Preserving Values” from the Office of the President (of the USA).  Here’s part of the conclusion (there are detailed recommendations as well)

Big data tools offer astonishing and powerful opportunities to unlock previously inaccessible insights from new and existing data sets. Big data can fuel developments and discoveries in health care and education, in agriculture and energy use, and in how businesses organize their supply chains and monitor their equipment. Big data holds the potential to streamline the provision of public services, increase the efficient use of taxpayer dollars at every level of government, and substantially strengthen national security. The promise of big data requires government data be viewed as a national resource and be responsibly made available to those who can derive social value from it. It also presents the opportunity to shape the next generation of computational tools and technologies that will in turn drive further innovation.

Big data also introduces many quandaries. By their very nature, many of the sensor technologies deployed on our phones and in our homes, offices, and on lampposts and rooftops across our cities are collecting more and more information. Continuing advances in analytics provide incentives to collect as much data as possible not only for today’s uses but also for potential later uses. Technologically speaking, this is driving data collection to become functionally ubiquitous and permanent, allowing the digital traces we leave behind to be collected, analyzed, and assembled to reveal a surprising number of things about ourselves and our lives. These developments challenge longstanding notions of privacy and raise questions about the “notice and consent” framework, by which a user gives initial permission for their data to be collected. But these trends need not prevent creating ways for people to participate in the treatment and management of their information.

You can also read comments on the report by danah boyd, and the conference report and videos from her conference’The Social, Cultural & Ethical Dimensions of “Big Data”‘ are now online.

April 4, 2014

Thomas Lumley’s latest Listener column

…”One of the problems in developing drugs is detecting serious side effects. People who need medication tend to be unwell, so it’s hard to find a reliable comparison. That’s why the roughly threefold increase in heart-attack risk among Vioxx users took so long to be detected …”

Read his column, Faulty Powers, here.