Stats Chat

February 24, 2016

Home ownership comparisons

Two graphs to help people on Twitter who are arguing about home ownership trends in Auckland vs rest of NZ or in generational differences.

Both are percentages of home ownership based on the census question “Do you own or partly own your home?”, with data from the last three censuses.

First, comparisons between Auckland and the rest of NZ (RONZ) by age, over time. Blue is Auckland, pink is RONZ.

Second, trends over 12 years, by age, for three census years. Blue is 2001, pink is 2006, green is 2013.

Data from the nzdotstat table “Tenure holder by age group and sex, for the census usually resident population count aged 15 years and over, 2001, 2006 and 2013 Censuses (RC, TA, AU)”

Update: And one more. Here the lines connect roughly the same group of people (birth cohort) over time (only approximately because the planned 2011 census didn’t happen until 2013).

Briefly

By Thomas Lumley

“Places”: Interactive maps of place name distribution in the US. For example, “Lake” has high density in the “Land of Lakes” but also in some less-expected places.
“Spreadsheets, the original analytics dashboard”, from Simply Statistics, about the origin of spreadsheets and what they were good for.

Cats can see the ‘rotating snake’ optical illusion: video evidence from Rasmus Bååth

As we’ve mentioned before, most people think teenagers have more risky behaviours now than in the Good Old Days. Most people are wrong. This time, from Vox.

From Kieran Healy, the network of shared institutional affiliations for the 1000+ authors of the LIGO gravitational waves paper (click to embiggen). That is, many scientists have some sort of connection with more than one university; the graph shows how these link up the LIGO researchers.
Come on, major political parties. Barchart axes start at zero unless you want to look like Fox News. There are reasons for this. If you don’t want to start the axis at zero, use some other sort of chart.

February 23, 2016

Population density: drawing the lines

By Thomas Lumley

David Seymour, on the Herald website

Auckland is already denser than New York, and most American and Australian cities. The 1.6 million people in Manhattan may live cheek-by-jowl, but not the other 20 million inhabiting the wider urban area.

An intelligent politician wouldn’t say something as apparently bizarre as this first sentence if it wasn’t true, so of course it is. The question is: true in what sense?

Based on the population figure, Mr Seymour is talking about the New York Metropolitan Statistical Area, aka, New York Urban Area, which has a population of 20.1 million and a population density of 724/km²[*]. The Auckland Urban Area has a population of 1.45 million and a density of 2,600/km², and, yes, 2600 is larger than 724. However, as the scenic photos in the Wikipedia page for the New York Metropolitan Area suggest, that might not be a fair comparison.

In fact, it’s true almost by definition that the New York metropolitan area has a lower density than urban Auckland:

Urban areas in the United States are defined by the U.S. Census Bureau as contiguous census block groups with a population density of at least 1,000/sq mi (390/km²) with any census block groups around this core having a density of at least 500/sq mi (190/km²). [Wikipedia, or see full legal definition]

That is, the metropolitan area is defined as the area around New York City all the way out until the local population density is below 190/km². It’s a sensible statistical unit — the US Census Bureau wasn’t trying to make a political point about urban infill when they defined it — but it’s not the same sort of unit as Stats New Zealand’s definition of urban or metro Auckland.

So, what other comparisons could we do? We could compare the New York Metropolitan Area to the Auckland Supercity, whose population density of 320/km² is less than half as high. That might be unfair in the other direction — the Supercity is designed with the future expansion of Auckland in mind, while the US definitions are only intended for a ten-year period between censuses.

We can’t quite do the perfect comparison of redrawing Auckland Urban Area by the US rules, because NZ Area Units are bigger than US Census Block Groups, and NZ meshblocks are smaller, but someone with more time than me could try.

We could compare the Auckland urban area to genuinely urban parts of the New York metro: Mr Seymour mentioned Manhattan (density 27,673/km², three times that of the Auckland CBD, nine times that of the Epsom electorate) but the other four boroughs of New York City all have higher density than urban Auckland. Two of them (the Bronx, and Brooklyn) have higher density than the Auckland CBD, Queens (8237/km²) is closer in density to the Auckland CBD than to the rest of Auckland, and even Staten Island is denser than urban Auckland as a whole. In the metropolitan area but across the river from New York City proper we have Hudson County (density 5,241/km²) and Newark (density about 4500/km²). The whole of Long Island, part of the New York metropolitan area, but also known for places like Fire Island and the Hamptons, has population density 2,151/km², not far below urban Auckland.

And finally, an alternative way to do this whole comparison, which is much less sensitive to where the lines are drawn, is to look at population-weighted densities. That is, for the average person in a city, how dense is the population near them? For the whole New York metropolitan area the population-weighted density is 12000/km² (or 120/hectare). For Auckland it is 43/hectare. In other words, while people near the edges of the New York metro area have a lot of space, most New Yorkers don’t. The average person in the broad New York metropolitan area sees three times the local population density of the average Aucklander.
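
To see why population-weighted density is less sensitive to where the lines are drawn, here is a minimal sketch in Python. The three zones and their numbers are made up for illustration; they are not New York or Auckland data, and the function names are my own.

```python
def conventional_density(zones):
    """Total population divided by total area (people per km²)."""
    total_pop = sum(pop for pop, _ in zones)
    total_area = sum(area for _, area in zones)
    return total_pop / total_area


def population_weighted_density(zones):
    """Average of the zone densities, weighting each zone by its population."""
    total_pop = sum(pop for pop, _ in zones)
    return sum(pop * (pop / area) for pop, area in zones) / total_pop


# Hypothetical city: (population, area in km²) for each small zone.
toy_city = [
    (100_000, 10.0),   # dense core, 10,000/km²
    (50_000, 50.0),    # suburbs, 1,000/km²
    (10_000, 500.0),   # rural fringe, 20/km²
]

# The same city with the boundary drawn more tightly (fringe excluded).
trimmed_city = toy_city[:2]

for name, zones in [("with fringe", toy_city), ("without fringe", trimmed_city)]:
    print(name,
          round(conventional_density(zones)),
          round(population_weighted_density(zones)))
```

In this toy example, dropping the fringe moves the conventional density from about 286/km² to 2,500/km², while the population-weighted figure only moves from about 6,564/km² to 7,000/km². Dividing a per-km² figure by 100 gives the per-hectare figure, which is how 12000/km² becomes 120/hectare.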

Update: * Mr Seymour tells me he was referring to the definition of metropolitan areas from Demographia, which trims some of the low-density parts of the Census Bureau definition of New York to give a population density of 1800/km², and agrees well with the StatsNZ definition of urban Auckland. So, while the difficulty of defining things comparably is still an issue, it is less his fault than I had assumed.

February 21, 2016

Crushing and crashing

By Thomas Lumley

The Herald says “Police Minister Judith Collins has released figures to show crushing boy racers’ cars has worked”. The data are more consistent with the political interpretation than is usual for claims about crashes, but not as strong as the Minister would like us to think.

Here’s a graph of the data (supplied to the Herald by the Minister’s office), showing crashes, injuries, and deaths where the police reported ‘racing’ as a cause:

It’s fairly clear that something changed. Based purely on the graph you’d say the downwards trend started after 2007; 2009 isn’t unreasonable, but it fits the data a bit less well. This is an example of a graph being much more useful than a table.

The next thing to check is other crashes — the road toll has been down in recent years, so this could just have been a general improvement. It’s not; the evidence for a change is a little weaker when considering racing deaths or injuries as a proportion of all fatal or injury crashes, but it’s still there.

In principle there could have been changes in reporting, but it’s hard to see how a government crackdown would make police less likely to report ‘racing’ involvement in a crash.

Finally, there’s publication bias. The reporter, Nicholas Jones, didn’t notice that Ms Collins was back and decide to pull figures on car crushing; the Minister decided to release the figures. She wouldn’t have done that if they didn’t look favourable. It’s hard to tell how much to discount the evidence for that, but a discount is needed.

Overall, the data are definitely consistent with a deterrent effect of car crushing, but the evidence isn’t all that strong — the best fit to the data suggests things changed earlier than 2009, and looking at the numbers was the Minister’s idea, not the reporter’s.

Updates:

In addition to the useful comments, I’ve been pointed to Dog & Lemon where Clive Matthew-Wilson says there is reason to believe the ‘boy racer’ thing was already going away on its own. If so, that would fit the trend starting earlier than the legislation.
If you check the crash numbers against the Road Crash Statistics system they don’t match. I think that’s because Table 26 of the Road Crash Statistics only includes crashes causing injury or death — that’s explicit in the 2012 spreadsheet, and I think it’s still true.

Evils of axis

By Thomas Lumley

From One News, tweeted by various people:

The y-axis label is wrong: this has nothing to do with change, it’s percent support.

The x-axis label is maximally unhelpful: we can guess that the most recent poll is in February, but what are the earlier data? You might think the vertical divisions are months, but the story says the previous poll was in October.

Also, given that the measurement error is large compared to the expected changes, using a line graph without points indicating the observations is misleading.

Overall, the graph doesn’t add to the numbers on the right, which is a bit of a waste.

February 16, 2016

Models for livestock breeding

By Thomas Lumley

One of the early motivating applications for linear mixed models was agricultural field studies looking at animal breeding or plant breeding. These are statistical models that combine differences between groups of observations with correlations between similar observations in order to get better comparisons.

John Oliver’s “Last Week Tonight” argues that these models shouldn’t be used to evaluate teachers, because they have been useful in animal breeding (with suitable video footage of a bull mounting a cow). It’s really annoying when someone bases a reasonable conclusion on totally bogus arguments.

As the American Statistical Association has said on value-added models for teaching (PDF), the basic idea makes some sense, but there are a lot of details you have to get right for the results to be useful. That doesn’t mean rejecting the whole idea of considering the different ways in which classes can be different, or giving up on averages over relevant groups. On the other hand, the mere fact that someone calls something a “value-added model” doesn’t mean it tells you some deep truth.

It would be a real sign of progress if we could discuss whether a model adequately captures educational difficulties due to deprivation and family advantage without automatically rejecting it because it also applies to cows, or without automatically accepting it because it has the words “value-added.”

But it probably wouldn’t be as funny.

Chocolate deficit

By Thomas Lumley

2016: NZ Herald, “A new report claims the world is heading for a chocolate deficit” (increased demand, no increase in supply)

There’s not much detail in the story, and I’m not going to provide any more because the report costs £1,700.00 (+VAT if applicable) — so remember, anything you read about it is just marketing. However, there are other useful forms of context.

2013: Daily Mirror, “Chocolate could run out by 2020”

2012: NZ Herald, “Shortage will be costly for chocaholics”

2010: Discovery Channel, “Chocolate Supply Threatened by Cocoa Crisis”

2010: Independent, “Chocolate will be worth its weight in gold in 2020”

2008: CNN, “I think that in 20 years chocolate will be like caviar”

2007: MSN Money, “World chocolate shortage ahead”

2006: Financial Post, “Possible chocolate shortage ahead”

2002: Discover, “Endangered chocolate”

1998: New York Times, “Chocoholics take note: beloved bean in peril” (predicting a shortfall in “as little as 5-10 years”)

It could be that, like bananas, chocolate really always is in peril, or it could be that falling global inequality will make it much more expensive, or it could be that it’s just a good story.

February 15, 2016

Sounds like a good deal

By Thomas Lumley

From Stuff

“According to a new study titled, Music Makes it Home, couples who listen to music together saw a huge spike in their sex lives.”

This is a genuine experimental study, but it’s for marketing. Neither the design nor the reporting is done the way it would be if the aim was to find things out.

In addition to a survey of 30,000 people, which just tells you about opinions, Stuff says Sonos did an experiment with 30 families:

Each family was given a Sonos sound system and Apple Music subscription and monitored for two weeks. In the first week, families were supposed to go about their lives as usual. But in the second week, they were to listen to the music.

Sonos says

The first week, participants were instructed not to listen to music out loud. The second week, participants were encouraged to listen to music out loud as much as they wanted.

That’s a big difference.

The reporting, both from Sonos and from Stuff, mixes results from the 30,000-person survey in with the experiment results. For example, the headline statistic in the Stuff story, 67% more sex, is from the survey, even though the phrasing “saw a huge spike in their sex lives” makes it sound like a change seen in the experiment. The experimental study found 37% more ‘active time in the bedroom’.

Overall, the differences seen in the experimental study still look pretty impressive, but there are two further points to consider. First, the participants knew exactly what was going on and why, and had been given lots of expensive electronics. It’s not unreasonable to think this might bias the results.

Second, we don’t have complete results, just the summaries that Sonos has provided — it wouldn’t be surprising if they had highlighted the best bits. In fact, the neuroscientist involved with the study admits in the story that negative results probably wouldn’t have been published.

February 14, 2016

Not 100% accurate

By Thomas Lumley

Q: Did you see there’s a new, 100% accurate cancer test?

A: No.

Q: It only uses a bit of saliva, and it can be done at home?

A: No.

Q: No?

A: Remember what I’ve said about ‘too good to be true’?

Q: So how accurate is it?

A: ‘It’ doesn’t really exist?

Q: But it “will enter full clinical trials with lung cancer patients later this year.”

A: That’s not a test for cancer. The phrase “lung cancer patients” is a hint.

Q: So what is it a test for?

A: It’s a test for whether a particular drug will work in a patient’s lung cancer.

Q: Oh. That’s useful, isn’t it?

A: Definitely.

Q: And that’s 100% accurate?

A: <tilts head, raises eyebrows>

Q: Too good to be true?

A: The test is very good at getting the same results that you would get from analysing a surgical specimen. Genetically it’s about 95% accurate in a small set of data reported in January. In clinical trials, 50% of people with the right tumour genetics responded to the drug. So you could say the test is 95% accurate or 50% accurate.

Q: That still sounds pretty good, doesn’t it?

A: Yes, if the trial this year gets results like the preliminary data it would be very impressive.

Q: And he does this with just a saliva sample?

A: Yes, it turns out that a little bit of tumour DNA ends up pretty much anywhere you look, and modern genetic technology only needs a few molecules.

Q: Could this technology be used for detecting cancer, too?

A: In principle, but we’d need to know it was accurate. At the moment, according to the abstract for the talk that prompted the story, they might be able to detect 80% of oral cancer. And they don’t seem to know how often a cell with one of the mutations might turn up in someone who wouldn’t go on to get cancer. Since oral cancer is rare, the test would need to be extremely accurate and inexpensive to be worth using in healthy people.
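
The point about rare diseases can be made concrete with Bayes’ rule. This is only a sketch: the 80% detection rate comes from the abstract mentioned above, but the specificity and the prevalence below are assumptions I have picked for illustration, not figures from the research.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that someone with a positive test really has the disease."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


SENSITIVITY = 0.80     # "might be able to detect 80%" (from the abstract)
PREVALENCE = 1e-4      # assumed: 1 in 10,000 healthy people tested has the cancer

# Assumed specificities: a good test and an extremely good test.
for specificity in (0.99, 0.9999):
    ppv = positive_predictive_value(SENSITIVITY, specificity, PREVALENCE)
    print(f"specificity {specificity}: {ppv:.1%} of positives are real")
```

Under these assumed numbers, a test that is wrong about only 1% of healthy people still gives fewer than 1 real cancer per 100 positive results, and even a test that is wrong about 1 in 10,000 healthy people gives more false alarms than real cases.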

Q: What about other more common cancers?

A: In principle, maybe, but most cancers are rare when you get down to the level of specific genetic mutations. It’s conceivable, but it’s not happening in the two-year time frame that the story gives.

February 13, 2016

Neanderthal DNA: how could they tell?

By Thomas Lumley

As I said in August:

“How would you even study that?” is an excellent question to ask when you see a surprising statistic in the media. Often the answer is “they didn’t,” but sometimes you get to find out about some really clever research technique.

There are stories around, such as the one in Stuff, about modern disease due to Neanderthal genes (press release).

The first-ever study directly comparing Neanderthal DNA to the human genome confirmed a wide range of health-related associations — from the psychiatric to the podiatric — that link modern humans to our broad-browed relatives.

It’s basically true, although as with most genetic studies the genetic effects are really, really small. There’s a genetic variant that doubles your risk of nicotine dependence, but only 1% of Europeans have it. The researchers estimate that Neanderthal genetic variants explain about 1% of depression and less than half a percent of cardiovascular disease. But that’s not zero, and it wasn’t so long ago that the idea of interbreeding was thought very unlikely.
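
As a rough check on why a variant that doubles risk can still explain very little disease, here is the standard population attributable fraction formula (Levin’s formula) in Python. Treating “doubles your risk” as a relative risk of 2 and “1% of Europeans” as the exposure prevalence is my simplification, not a calculation from the paper.

```python
def attributable_fraction(prevalence, relative_risk):
    """Levin's formula: share of cases in the population due to the exposure."""
    excess = prevalence * (relative_risk - 1)
    return excess / (1 + excess)


# Assumed inputs: 1% of people carry the variant, and it doubles their risk.
fraction = attributable_fraction(prevalence=0.01, relative_risk=2.0)
print(f"{fraction:.2%} of cases attributable to the variant")
```

With these inputs the variant accounts for just under 1% of cases, which is the same order of magnitude as the researchers’ estimates for depression.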

Since hardly any Neanderthals have had their genome sequenced, how was this done? There are two parts to it: a big data part and a clever genetics part.

The clever genetics part (paper) uses the fact that Neanderthals and modern humans, since their ancestors had been separated for a long time (350,000 years), had lots of little, irrelevant differences in DNA accumulated as mutations, like a barcode sequence. Given a long enough snippet of genome, we can match it up either to the modern human barcode or the Neanderthal barcode. Neanderthals are recent enough (50,000 years is maybe 2500 generations) that many of the snippets of Neanderthal genome we inherit are long enough to match up the barcodes reliably. The researchers looked at genome sequences from the 1000 Genomes Project, and found genetic variants existing today that are part of genome snippets which appear Neanderthal. These genetic variants are what they looked at.
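
The barcode idea can be illustrated with a toy example. To be clear, this is not the method used in the paper: the sequences are invented, the function name and the `min_sites` threshold are hypothetical, and real analyses use statistical models over much longer stretches of genome. It only shows why a longer snippet can be assigned more reliably than a short one.

```python
def classify_snippet(snippet, human_ref, neanderthal_ref, min_sites=3):
    """Assign a snippet to whichever reference it matches at more differing sites."""
    # Only positions where the two references differ carry any information.
    informative = [
        i for i, (h, n) in enumerate(zip(human_ref, neanderthal_ref)) if h != n
    ]
    if len(informative) < min_sites:
        return "too short to call"
    human_hits = sum(snippet[i] == human_ref[i] for i in informative)
    neanderthal_hits = sum(snippet[i] == neanderthal_ref[i] for i in informative)
    if human_hits > neanderthal_hits:
        return "modern human"
    if neanderthal_hits > human_hits:
        return "Neanderthal"
    return "ambiguous"


# Invented reference "barcodes" that differ at four positions.
HUMAN = "ACGTACGTACGT"
NEANDERTHAL = "ATGTCCGAACTT"

long_snippet = "ATGTCCGAACGT"   # matches Neanderthal at 3 of the 4 differing sites
print(classify_snippet(long_snippet, HUMAN, NEANDERTHAL))

# A four-letter snippet covers only one differing site, so no call is made.
print(classify_snippet(long_snippet[:4], HUMAN[:4], NEANDERTHAL[:4]))
```

The long snippet is labelled Neanderthal, while the short one is left uncalled, which is the toy version of “long enough to match up the barcodes reliably”.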

The Big Data part is a collection of medical records at nine major hospitals in the US, together with DNA samples. This is nothing like a random sample, and the disease data are from ICD9 diagnostic codes rather than detailed medical record review, but quantity helps.

Using the DNA samples, they can see which people have each of the Neanderthal-looking genetic variants, and what diseases these people have — and find the very small differences.

This isn’t really medical research. The lead researcher quoted in the news is an evolutionary geneticist, and the real story is genetics: even though the Neanderthals vanished 50,000 years ago, we can still see enough of their genome to learn new things about how they were different from us.
