Posts filed under Just look it up (285)

June 7, 2025

Auckland is larger than Wellington

The Herald is reporting on the Government’s new road-cone hotline. Apparently there were 236 reports in the first four days, and the article usefully points out that this was about 100 for each of the first two days, so it dropped pretty fast after that.

A bit less usefully, there’s a graph (click to embiggen) of reports by miscellaneous administrative category. Auckland (region), as usual, is at the top.

It’s not completely clear what the right scaling is, but raw reports aren’t it. The traditional Kiwi favourite of per capita shows Wellington (city) way ahead of Auckland (region), with more than half as many reports from about an eighth as many people.  NZTA is there to represent state highways, and it doesn’t really have a population in the same sense as the other areas.
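To make the per-capita comparison concrete, here’s a sketch. The report counts are illustrative readings, not the Herald’s exact figures, and the populations are round numbers rather than official estimates:

```python
# Rough per-capita comparison of road-cone reports.
# Report counts are illustrative (roughly "more than half as many
# reports from about an eighth as many people"); populations are
# round figures, not official estimates.
regions = {
    "Auckland (region)": {"reports": 60, "population": 1_700_000},
    "Wellington (city)": {"reports": 35, "population": 210_000},
}

rates = {}
for name, d in regions.items():
    # reports per 100,000 people
    rates[name] = d["reports"] / d["population"] * 100_000
    print(f"{name}: {rates[name]:.1f} reports per 100,000 people")
```

With numbers like these, Wellington’s per-capita rate comes out several times Auckland’s, even though Auckland tops the raw counts.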

We might also scale by the amount of road to have cones on.  This map, from Figure NZ, isn’t quite what we want because it uses larger geographical units (regions), but it does show how state highways and local roads compare. It looks like NZTA is coning above its weight per km of road.

Figure NZ also has a graph by territorial authority, a better match to the Herald’s graph, showing Auckland has well over twice the sealed road of any other authority (if you aren’t a Kiwi: the graph is ordered north to south).  Wellington has much less road, so it is attracting the most reports relative to its road length.

So, the data do show a “hotspot for cone concerns”, but it’s Wellington, not Auckland.  This could mean more cones, or streets with less spare room, or fewer alternate routes, or people who just whinge more. The data do not suffice to tell us.

February 18, 2025

Surprises in data

When you get access to some data, a first step is to see if you understand it: do the variables measure what you expect, are there surprising values, and so on.  Often, you will be surprised by some of the results. Almost always this is because the data mean something a bit different from what you expected. Sometimes there are errors. Occasionally there is outright fraud.
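As a sketch of that kind of first-pass check, with entirely made-up records and an arbitrary cutoff for what counts as surprising:

```python
from datetime import date

# Toy records: (id, birth_year, death_year or None).
# Entirely invented data, just to illustrate a plausibility check.
records = [
    ("A", 1990, None),
    ("B", 1885, None),   # no death record; would be about 140 years old
    ("C", 1950, 2010),
]

MAX_PLAUSIBLE_AGE = 115  # arbitrary cutoff for "surprisingly old"

def surprising(rec, today=date(2025, 2, 18)):
    """Flag records with no death date and an implausible implied age."""
    _, born, died = rec
    return died is None and (today.year - born) > MAX_PLAUSIBLE_AGE

flagged = [r for r in records if surprising(r)]
# The right response to `flagged` being non-empty is to look for
# documentation or ask the data owners, not to assume fraud.
```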

Elon Musk and DOGE have been looking at US Social Security data. They created a table of Social Security-eligible people not recorded as dead and noticed that (a) some of them were surprisingly old, and (b) the total added up to more than the US population.

That’s a good first step, as I said. The next step is to think about possible explanations (as Dan Davies says: “if you don’t make predictions, you won’t know when to be surprised”). The first two I thought of were people leaving the US after working long enough to be eligible for Social Security (like, for example, me) and missing death records for old people (the vital statistics records weren’t as good in the 19th century as they are now).

After that, the proper procedure is to ask someone or look for some documentation, rather than just to go with your first guess.  It’s quite likely that someone else has already observed the existence of records with unreasonable ages and looked for an explanation.

In this case, one would find (eg, by following economist Justin Wolfers) a 2023 report by the Office of the Inspector General, “Numberholders Age 100 or Older Who Did Not Have Death Information on the Numident” (PDF), which said that the very elderly ‘vampires collecting Social Security’ were neither vampires nor collecting Social Security, but were real people whose deaths hadn’t been recorded.   This was presumably a follow-up to a 2015 story where identity fraud was involved — but again, the government wasn’t losing money, because it wasn’t paying money out to dead people.

The excess population at younger years isn’t explained by this report, but again, the next step is to see what is already known by the people who spend their whole careers working with the data, rather than to decide  the explanation is the first thing that comes to mind.

March 28, 2018

Cycling for work or play

Auckland Transport publish data from cycle counters on various bike paths. They’re most interested in trends over time (increasing) and perhaps in seasonal variation (more in summer).

Here’s a look at weekday vs weekend counts using data from the start of 2016 to now (click to embiggen).

There are some paths that are clearly used primarily by commuters, with more than twice the average traffic on a weekday vs weekend. There are also some that are mostly used at the weekend, such as Matakana, Upper Harbour, and Mangere Bridge.  And some, like the Lightpath, that get used all the time.

Note: while it’s great that Auckland Transport publishes these data, the data would be easier to reuse if the names they used for each counter were consistent over time (eg: “Tamaki Dr” vs “Tamaki Drive”, or “Nelson Street Lightpath Counter Cyclists” vs “Nelson Street Lightpath Cyclists”).


March 26, 2018

The data speak for themselves?

This graph was on Twitter this morning. There’s nothing wrong with the graph (good data, clear presentation), but it does provide a nice illustration of the difficulties in official statistics — you have to decide what categories to use, and it makes a difference.

The second leading cause, motor vehicles, is straightforward enough.  The first, firearms, is more complicated. A majority of the firearm deaths are suicides, and it’s controversial whether firearm access increases the suicide rate or just affects the method.  Poisoning is also complicated: you might well want to treat both suicide and accidental recreational-drug overdose separately. And so on.

Sometimes you want to break down the data by intent, sometimes by physical cause, sometimes by medical type of injury or damage. You can’t define the ‘correct’ answer in the absence of a question.

February 17, 2018

Read me first?

There’s a viral story that viral stories are shared by people who don’t actually read them. I saw it again today in a tweet from the Newseum Institute.

If you search for the study it doesn’t take long to start suspecting that the majority of news sources sharing this study didn’t read it first.  One that at least links is from the Independent, in June 2016.

The research paper is here. The money quote looks like this, from section 3.3:

First, 59% of the shared URLs are never clicked or, as we call them, silent.

We can expand this quotation slightly

First, 59% of the shared URLs are never clicked or, as we call them, silent. Note that we merged URLs pointing to the same article, so out of 10 articles mentioned on Twitter, 6 typically on niche topics are never clicked

That’s starting to sound a bit different. And more complicated.

What the researchers did was to look at bit.ly URLs to news stories from five major sources, and see if they had ever been clicked. They divided the links into two groups: primary URLs tweeted by the media source itself (eg @NYTimes), and secondary URLs tweeted by anyone else. The primary URLs were always clicked at least once — you’d expect that just for checking purposes.  The secondary URLs, as you’d expect, averaged fewer clicks per tweet; 59% were not clicked at all.
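A toy version of that calculation; the links and click counts below are invented, but the primary/secondary split mirrors the paper’s:

```python
# Each link: (tweeted_by_media_source_itself, total_clicks).
# Invented numbers, shaped like the paper's primary/secondary split.
links = [
    (True, 120), (True, 5), (True, 1),   # primary: always clicked at least once
    (False, 0), (False, 0), (False, 3),
    (False, 0), (False, 40),             # secondary: many never clicked at all
]

secondary = [clicks for primary, clicks in links if not primary]
silent_share = sum(c == 0 for c in secondary) / len(secondary)
print(f"{silent_share:.0%} of secondary URLs were never clicked")
```

Note that the statistic is about the share of *links* with zero clicks, not about the behaviour of any individual sharer.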

That’s being interpreted as if it meant that 59% of retweets didn’t involve any clicks. But it isn’t. It’s quite likely that most of these links were never retweeted.  And there’s nothing in the data about whether the person who first tweeted the link read the story: there certainly isn’t any suggestion that they didn’t.

So, if I read some annoying story about near-Earth asteroids on the Herald and tweeted a bit.ly URL, there’s a chance no-one would click on it. And, looking at my Twitter analytics, I can see that does sometimes happen. When it happens, people usually don’t retweet the link either, and it definitely doesn’t go viral.

If I retweeted the official @NZHerald link about the story, then it would almost certainly have been clicked by someone. The research would say nothing whatsoever about the chance that I (or any of the other retweeters) had read it.


February 13, 2018

Opinions about immigrants

Ipsos MORI do a nice set of surveys about public misperceptions: ask a sample of people for their estimate of a number and compare it to the actual value.

The newest set includes a question about the proportion of the prison population that are immigrants. Here’s (a redrawing of) their graph, with NZ in all black.

People think more than a quarter of NZ prisoners are immigrants; it’s actually less than 2%. I actually prefer this as a ratio

The ratio would be better on a logarithmic scale, but I don’t feel like doing that today since it doesn’t affect the main point of this post.

A couple of years ago, though, the question was about what proportion of the overall population were immigrants. That time people also overestimated a lot.  We can ask how much of the overestimation for the prison question can be explained by people just thinking there are more immigrants than there really are.

Here’s the ratio of the estimated proportion of immigrants among the prison population and the total population

The bar for New Zealand is to the left; New Zealand recognises that immigrants are less likely to be in prison than people born here. Well, the surveys taken two years apart are consistent with us recognising that, at least.

That’s just a ratio of two estimates. We can also compare to the reality. If we divide this ratio by the true ratio we find out how much more likely people think an individual immigrant is to end up in prison compared to how likely they really are.

It seems strange that NZ is suddenly at the top. What’s going on?

New Zealand has a lot of immigrants, and we only overestimate the actual number by about a half (we said 37%; it was 25% in 2017). But we overestimate the proportion among prisoners by a lot. That is, we get this year’s survey question badly wrong, but without even the excuse of being seriously deluded about how many immigrants there are.
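Putting rough numbers on this: the 37% and 25% figures are from the surveys as quoted above, and the prison figures are approximate readings of “more than a quarter” and “less than 2%”:

```python
# Perceived vs actual shares, as fractions. The prison figures are
# approximate readings of the post's "more than a quarter" and
# "less than 2%"; the population figures are as quoted (37% vs 25%).
est_prison, true_prison = 0.27, 0.02
est_popn,   true_popn   = 0.37, 0.25

# How over-represented people *think* immigrants are in prison,
# relative to how over-represented they actually are:
perceived_rr = est_prison / est_popn    # about 0.73: below 1, so we do
                                        # "know" immigrants are less
                                        # likely to be in prison
actual_rr    = true_prison / true_popn  # about 0.08

excess = perceived_rr / actual_rr       # roughly 9: with these readings,
                                        # we think an individual immigrant
                                        # is ~9x as likely to be in prison
                                        # as they really are
```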

January 8, 2018

Long tail of baby names

The Dept of Internal Affairs has released the most common baby names of 2017 (NZ is, I think, the first country each year to do this), and Radio NZ has a story.  A lot of names popular last year were also popular in the past; a few (eg Arlo) are changing fast.

If you look at the sixty-odd years of data available, there’s a dramatic trend. In 1954, ‘John’ was the top boy’s name, with 1389 uses. In 2017 the top was ‘Oliver’, but with only 314 uses — not enough to make 1954’s top twenty. According to the government, there were nearly 13,000 different names given last year, so the mean number of babies per name is under 5; the most popular names are still much more popular than average. But less so than in the past.
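The back-of-envelope arithmetic, with one assumption: a total of around 60,000 babies named in 2017 (the real figure is whatever the Department of Internal Affairs reports, so treat this as a round guess):

```python
# Figures from the post, plus one assumed round number.
names_2017 = 13_000    # "nearly 13,000 different names"
babies_2017 = 60_000   # assumed total of named babies; check DIA for the real count

mean_per_name = babies_2017 / names_2017   # about 4.6, i.e. under 5

# The top name is still far above the mean...
top_2017 = 314    # 'Oliver' in 2017
# ...but far below the old-style concentration.
top_1954 = 1389   # 'John' in 1954
```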

Here’s the trend in the number of babies given the top name

and the top ten names

and the top hundred names

That decrease is despite an increase in the total population: here’s the top 10 names as a percentage of all babies (assuming 53% of babies are boys)

and the top 100 names

The proportion with any of the top 100 names has been going down consistently, and also becoming less different between boys and girls.


November 19, 2017

Hyperbole or innumeracy?

From the Herald (and also from NewstalkZB, apparently originally at South Africa’s The Citizen)

He is also said to own a custom-built Mercedes Benz s600L that is able to withstand AK-47 bullets, landmines and grenades. It also features a CD and DVD player, internet access and anti-bugging devices. The Citizen reported that Mugabe – who is a trained teacher – also owns a Rolls-Royce Phantom IV: a colonial-era British luxury car so exclusive, only 18 were ever manufactured. The vintage black car is estimated to be worth more than Zimbabwe’s entire GDP. (emphasis added)

Several people on Twitter, starting with Richard Easther, had the same reaction: that this doesn’t look remotely plausible.  It’s like the claims that Labour’s water levies would make cabbages cost $18 and a bottle of wine $75 — extraordinary claims demand, if not extraordinary evidence, at least some evidence.

So, how is it that you’d decide this number was implausible? Well, in one direction, you might try to guess the GDP of Zimbabwe.  Even if you don’t know Zimbabwe’s population, you’d probably know if it were a smaller country than NZ, so we can say there are at least 5 million people.  So, if the per-capita GDP were only $1, it would still add up to $5 million, and that’s a very expensive car.  Since you’d expect the population to be more than 5 million and the per-capita GDP to be a lot more than $1, the figure is looking implausible.

In the other direction, you might look up the current GDP of Zimbabwe — $16 billion — or the lowest it’s been in recent years — $4.4 billion in 2008 — and note that you could buy several wide-body jets for that much.

That’s enough to know something is strange. If you wanted more detail you could search for prices of Rolls-Royce Phantom IVs or of the most expensive cars ever sold, and find that, yes, there’s three or four orders of magnitude missing.
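The same Fermi check in code; the GDP figures are the ones quoted above, and the car price is the top custom-built figure from the story’s own last line:

```python
import math

# Lower bound: even at $1 per person, with at least 5 million people,
# Zimbabwe's GDP is at least $5 million -- already a very expensive car.
population_lower = 5_000_000
gdp_lower_bound = population_lower * 1   # dollars

# Figures quoted in the post and the story itself:
gdp_2017 = 16e9       # current GDP, ~$16 billion
gdp_2008 = 4.4e9      # lowest recent year
car_price = 1.74e6    # top custom-built Phantom price from the story

# How many orders of magnitude the claim is off by:
orders_missing = math.log10(gdp_2017 / car_price)   # roughly 4
```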

Or, you could look at the first line of the story

Zimbabwe embattled president Robert Mugabe is reportedly worth more than $1 billion despite his country being one of the poorest in the world.

Or the last line

Rolls Royce Phantoms cost a minimum of just under $698,000, but custom-built versions are sold for as much as $1.74 million. Media in South Africa reported the combined cost of the cars was about $6.98 million.

and again, there’s no way the claim about the car vs the GDP could be true — a used one couldn’t be worth thousands of times more than a new one.

So, where could it have come from?  My guess is that the claim was originally hyperbole: that someone did say “his car’s worth more than the Zimbabwe GDP” but they didn’t mean it literally. Over repetitions, the rhetorical figure turned into an “estimate”, and was quoted without any real thought.

What’s harder to understand is someone thinking a CD and DVD player is the height of motoring luxury.

October 10, 2017

Graphic of the week

From the world’s third-largest news agency:

[AFP graphic of the NZ election results]

  1. The Nationalist Party?
  2. National got 56 seats, not 58 — the graph seems to have the National results from the provisional count but the Labour and Green results from the final count
  3. NZ First doesn’t use yellow
  4. ACT, on the other hand, does.
  5. But ACT is relatively unlikely to enter a left-wing coalition with Labour and the Greens.

August 11, 2017

Different sorts of graphs

This bar chart from Figure.NZ was in Stuff today, with the lead

Working-age people receiving benefits are mostly in the prime of our working life – the ages of 25 to 54.

[Figure.NZ bar chart: numbers receiving benefits, by age group]

The numbers are correct, but the extent to which the graph fits the story is a bit misleading.  The main reason the two bars in the middle are higher is that they are 15-year age groups, when the first bar is a 7-year group and the last is a ten-year group.

Another way to show the data is to scale the bar widths proportional to the number of years and then scale the height so that the bar area matches the count of people. The bar height is now counts of people per year of age.

[Rescaled bar chart: people receiving benefits per year of age]

This is harder to read for people who aren’t used to it, but arguably more informative. It suggests the 25-54 year groups may be the largest just because the groups are wider.
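A sketch of the area-preserving rescaling; the age-group widths are the ones in the chart, but the counts are invented (chosen so the two middle bars are tallest in raw counts, as in the original graph):

```python
# Age groups from the post: 18-24 (7 years), 25-39 and 40-54 (15 years
# each), 55-64 (10 years). Counts are invented for illustration.
groups = [
    ("18-24", 7, 35_000),
    ("25-39", 15, 60_000),
    ("40-54", 15, 63_000),
    ("55-64", 10, 42_000),
]

# Bar area should equal the count, so height = count / width,
# i.e. people per year of age.
densities = {name: count / width for name, width, count in groups}
```

With numbers like these, the 18-24 bar ends up tallest per year of age even though its raw count is the smallest: the middle bars only dominate the original chart because they cover more years.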

We really need population size data, since the number of people in NZ also varies by age group.  Showing the percentage receiving benefits in each age group gives a different picture again

[Bar chart: percentage of each age group receiving benefits]

It looks as though

  • “working age” people 25-39 and 40-54 make up a larger fraction of those receiving benefits than people 18-24 or 55-64
  • a person receiving benefits is more likely to be, say, 20 or 60 than 35 or 45
  • the proportion of people receiving benefits increases with age

These can all be true; they’re subtly different questions. Part of the job of a statistician is to help you think about which one you wanted to ask.