Every year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Bo is working on a project called “Construction of life-course variables for the New Zealand Longitudinal Census (NZLC)” with Roy Lay-Yee, Senior Research Fellow at the COMPASS Research Centre, University of Auckland, and Professor Alan Lee of Statistics. Bo explains:
“The New Zealand Longitudinal Census has linked individuals across the 1981–2006 New Zealand censuses. This enables the assessment of how life-course resources relate to various outcomes.
“I need to create life-course variables such as socio-economic status, health, education, work, family ties and cultural identity from the censuses. Sometimes such information is not given directly in the census questions, and several pieces of information need to be combined.
“An example is the overcrowding index, which measures personal living space. We need to combine the age and partnership status of the residents with the number of bedrooms in each dwelling to derive the index.
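To make the idea of a derived variable concrete, here is a minimal sketch of how an index of this kind could be computed. The allocation rule, the age cut-off of 18 and all names are assumptions made for illustration; they are not the definition used in the NZLC project.

```python
import math

ADULT_AGE = 18  # assumed cut-off, for illustration only


def bedrooms_required(residents):
    """Estimate bedrooms needed under a simplified, hypothetical rule.

    Each resident is a dict with 'age' (int) and 'partnered' (bool).
    Assumed rule: partnered adults share a room in pairs, every other
    adult gets a room, and residents under ADULT_AGE share in pairs.
    """
    partnered = sum(1 for r in residents if r["age"] >= ADULT_AGE and r["partnered"])
    single = sum(1 for r in residents if r["age"] >= ADULT_AGE and not r["partnered"])
    children = sum(1 for r in residents if r["age"] < ADULT_AGE)
    return math.ceil(partnered / 2) + single + math.ceil(children / 2)


def overcrowding_index(residents, bedrooms):
    """Bedrooms required minus bedrooms available; positive means a shortfall."""
    return bedrooms_required(residents) - bedrooms


# Made-up household: a couple, one adult child and two young children.
household = [
    {"age": 40, "partnered": True},
    {"age": 38, "partnered": True},
    {"age": 19, "partnered": False},
    {"age": 10, "partnered": False},
    {"age": 7, "partnered": False},
]
shortfall = overcrowding_index(household, bedrooms=2)
```

Under these assumed rules the household needs three bedrooms, so a two-bedroom dwelling is one bedroom short. The point is that no single census question gives this figure; it only appears once several answers are combined.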
“Also, the format of the questionnaire and the answer codes used in each census were rather different, so data-cleaning is required. I need to harmonise the information collected in each census so that it is consistent and can be compared across censuses. For example, in one census gender might be coded ‘0’ and ‘1’, representing female and male, but in another census gender was coded ‘1’ and ‘2’. Thus the code ‘1’ can mean quite different things in different censuses. My job is to find these differences and gaps in each census.
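The gender example can be sketched as a small recoding step. The census labels and code books below are hypothetical, and the assumption that ‘1’ and ‘2’ mean female and male in the second census is made only for illustration.

```python
# Hypothetical per-census code books; labels and codes are illustrative only.
GENDER_CODEBOOK = {
    "census_a": {"0": "female", "1": "male"},
    "census_b": {"1": "female", "2": "male"},  # assumed meaning of '1' and '2'
}


def harmonise_gender(census, raw_code):
    """Map a census-specific gender code to a common label.

    Returns None when the census or the code is not in the code book,
    so that gaps are flagged for review instead of being guessed.
    """
    return GENDER_CODEBOOK.get(census, {}).get(str(raw_code).strip())


# The same raw code '1' resolves differently depending on the census.
records = [("census_a", "1"), ("census_b", "1"), ("census_b", "9")]
harmonised = [harmonise_gender(census, code) for census, code in records]
```

Keeping one explicit code book per census makes each difference visible in a single place, and returning `None` for unknown codes surfaces gaps without silently assigning a value.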
“The results of this project will enable future studies based on the New Zealand longitudinal censuses, for example, on the influence of life-course variables on the risk of mortality. This project will also be very good experience for my future career, since data-cleaning is a very important process that we were barely taught in our courses but that actually takes almost one-third of the time in most real-life projects. When we were studying statistics courses, most data sets we encountered were “toy” data sets that had fewer variables and observations and were clean. However, in real life, as in this case, we often meet data that have millions of observations, hundreds of variables, and inconsistent variable specification and coding.
“I hold a Bachelor of Commerce in Accounting, Finance and Information Systems. I have just completed a Postgraduate Diploma in Science, majoring in Statistics, and in 2015, I will be doing a Master of Science in Statistics.
“When I was studying information systems, my lecturer introduced several statistical techniques to us and I was fascinated by what statistics is capable of in the decision-making process. For example, retailers can find out if a customer is pregnant purely based on her purchasing behaviour, so the retailers can send out coupons to increase their sales. It is amazing how we can use statistical techniques to find that little tiny bit of useful information in oceans of data. Statistics appeals to me as it is highly useful and applicable in almost every industry.
“This summer, I will spend some time doing road trips – hopefully I can make it to the South Island this time. I enjoy doing road trips alone every summer as I feel this is the best way to get myself refreshed and motivated for the next year.”