Posts from January 2018 (14)

January 26, 2018

Briefly

  • “Are bigoted, irrational robots hurting humans?” (Herald, from the Daily Mail).
  • From 538: “If I had a magic wand, I’d develop an algorithm that: a) draws the shortest possible line(s) necessary to split a state into equally populous districts and b) requires that only as many pre-existing jurisdictions be split as necessary to achieve equally populous districts.” That’s the easy part. Getting a fair voting system isn’t primarily difficult because math is hard, but because power corrupts.
  • Sharon Begley at STAT on the hype around exercise and dementia: “But we thought it was a reasonable thing to say, especially since [exercise] doesn’t have a lot of risk. To be honest, we’re looking for something positive to tell people.”
  • Peter Ellis writes on how to recruit data scientists in the public sector
  • There’s a famous medical paper reviewing all the randomised trials of parachute use when jumping from a plane (the point being, of course, that there aren’t any, and shouldn’t be). There’s a new medical paper looking at people using this analogy. Of 35 interventions where the parachute analogy was used explicitly to argue that randomised trials were impossible and there was no room for doubt about effectiveness, 22 had been examined in randomised trials. Six had been shown effective.
January 23, 2018

Low-flying rocks

From Stuff The Herald: ‘Potentially hazardous’ asteroid heading towards Earth at 108,800km/h, and in the story:

It is the largest space rock to brush past our planet this year and previous research has found a rock of this size could plunge Earth into a mini-ice age if it hit.

The impact would cause average temperatures around the world to fall by as much as 8°C, according to a 2016 study on the effects of a collision with a 0.6-mile-wide (1km) asteroid.

“These would not be pleasant times,” Charles Bardeen, of the National Center for Atmospheric Research, said during a presentation at the American Geophysical Union (AGU).

In the “worst case scenario”, soot would remain in the atmosphere for around 10 years, while dust would take six years to settle back on Earth.

Fortunately Nasa does not think this asteroid will collide with Earth.

NASA would put that more strongly. From NASA/JPL Asteroid Watch, Asteroid 2002 AJ129 to Fly Safely Past Earth February 4:

At the time of closest approach, the asteroid will be no closer than 10 times the distance between Earth and the Moon (about 2.6 million miles, or 4.2 million kilometers).

“We have been tracking this asteroid for over 14 years and know its orbit very accurately,” said Paul Chodas, manager of NASA’s Center for Near-Earth Object Studies at the Jet Propulsion Laboratory, Pasadena, California. “Our calculations indicate that asteroid 2002 AJ129 has no chance – zero – of colliding with Earth on Feb. 4 or any time over the next 100 years.”

The @asteroidwatch twitterwallah is off because of the government shutdown, but was relatively patiently answering Twitter questions in both English and Spanish until Friday.

Part of the problem is that NASA uses “potentially hazardous” for any asteroid approaching within 1/20th of the distance to the sun, because these are the ones they want to track to be sure that they aren’t actually hazardous.  It’s an unfortunate term because it sounds dangerous when it isn’t.
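The arithmetic behind the “potentially hazardous” label is easy to check. This is a minimal sketch using the figures from the post: NASA’s 1/20th-of-the-Earth–Sun-distance cutoff (0.05 AU) and 2002 AJ129’s closest approach of about 4.2 million km.

```python
# Quick check of the numbers in the post: NASA flags an asteroid as
# "potentially hazardous" if it can come within 0.05 AU (1/20th of the
# Earth-Sun distance) of Earth -- a deliberately generous cutoff.

AU_KM = 149_597_871        # one astronomical unit, in km
LUNAR_KM = 384_400         # mean Earth-Moon distance, in km

pha_threshold_km = 0.05 * AU_KM          # about 7.5 million km
closest_approach_km = 4_200_000          # 2002 AJ129's pass, from the post

print(f"PHA threshold: {pha_threshold_km / 1e6:.1f} million km")
print(f"Closest approach: {closest_approach_km / 1e6:.1f} million km "
      f"(~{closest_approach_km / LUNAR_KM:.0f} lunar distances)")
print("'Potentially hazardous'?", closest_approach_km < pha_threshold_km)
```

So the asteroid qualifies as “potentially hazardous” despite passing at more than ten times the distance to the Moon, which is exactly why the term sounds scarier than it is.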


January 19, 2018

Briefly

  • On herd immunity, from Quanta magazine “The necessary level of immunity in the population isn’t the same for every disease. For measles, a very high level of immunity needs to be maintained to prevent its transmission because the measles virus is possibly the most contagious known organism.”
  • Pornhub, as is its habit, has an interesting graph on what the Hawai’i missile false alarm did to its workload (not very NSFW, but Pornhub)
  • From NPR For Now, Sequencing Cancer Tumors Holds More Promise Than Proof.  Even though the story is about how DNA sequencing of tumors hasn’t been a very successful strategy so far, they lead with a ‘miracle cure’ story, because that’s how the news works.
  • “Bad Graphics: manipulation or laziness” from David Spiegelhalter, talking about an example from the Daily Mail.
  • Also from David Spiegelhalter: there’s a recent research paper on correlations between alcohol consumption and dementia. The researchers said “Consuming more than one UK standard unit of alcohol per day is detrimental to cognitive performance and is more pronounced in older populations”, which was widely reported in the media. This is their graph of their analysis. The minimum of the graph is at 16g/day: two UK units or 1.6 NZ units.

    If we could trust this model, it would actually say consuming less than one unit per day is detrimental.
  • Via Elaine Smid, the  title and abstract of a research paper
January 18, 2018

Predicting the future

As you’ve heard if you’re in NZ, the Treasury got the wrong numbers for predicted impact on child poverty of Labour’s policies (and as you might not have heard, similarly wrong numbers for the previous government’s policies).

Their ‘technical note’ is useful:

In late November and early December 2017, a module was developed to further improve the Accommodation Supplement analysis. This was applied to both the previous Government’s package and the current Government’s Families Package. The coding error occurred in this “add-on module” – in a single line of about 1000 lines of code.

The quality-assurance (QA) process for the add-on module included an independent review of the methodology by a senior statistician outside the Treasury’s microsimulation modelling team, multiple layers of code review, and an independent replication of each stage by two modellers. No issues were identified during this process.

I haven’t seen their code, but I have seen other microsimulation models and as a statistics researcher I’m familiar with the problem of writing and testing code that does a calculation you don’t have any other way to do. In fact, when I got called by Newstalk ZB about the Treasury’s error I was in the middle of talking to a PhD student about how to check code for a new theoretical computation.

It’s relatively straightforward to test code when you know what the output should be for each input: you put in a set of income measurements and see if the right tax comes out, or you click on a link and see if you get taken to the right website, or you shoot the Nazi and see if his head explodes. The most difficult part is thinking of all the things that need to be checked.  It’s much harder when you don’t know what the output should even be because the whole point of writing the code is to find out.

You can test chunks of code that are small enough to be simple. You can review the code and try to see if it matches the process that you’re working from. You might be able to work out special cases in some independent way. You can see if the outputs change in sensible ways when you change the inputs. You can get other people to help. And you do all that. And sometimes it isn’t enough.
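Those checks can be sketched in code. The function below is a deliberately made-up toy (not the Treasury’s actual model or rules): the point is the testing pattern — special cases worked out independently, sensible behaviour when inputs change, and plausible ranges over many random inputs.

```python
# A sketch of the checks described above, for a toy calculation whose full
# output can't be verified directly. The function and its parameters are
# hypothetical, invented for illustration only.
import random

def accommodation_supplement(income, rent):
    """Toy model: subsidise 70% of rent above a quarter of income, capped at 160."""
    excess = max(0.0, rent - 0.25 * income)
    return min(0.7 * excess, 160.0)

# 1. Special cases you can work out some independent way:
assert accommodation_supplement(0, 100) == 70.0       # no income: 70% of rent
assert accommodation_supplement(1000, 200) == 0.0     # cheap rent: no payment

# 2. Outputs change in sensible ways when inputs change (here, the payment
#    never goes up as income rises):
for inc in range(0, 2000, 100):
    assert accommodation_supplement(inc, 300) >= accommodation_supplement(inc + 100, 300)

# 3. Outputs stay in a plausible range for many random inputs:
for _ in range(1000):
    s = accommodation_supplement(random.uniform(0, 3000), random.uniform(0, 1000))
    assert 0.0 <= s <= 160.0
```

None of these checks proves the code is right; they just make certain kinds of wrongness harder to miss.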

The Treasury say that they typically try to do more

This QA process, however, is not as rigorous as independent co-production, which is used for modifications of the core microsimulation model.  Independent co-production involves two people developing the analysis independently, and cross-referencing their results until they agree. This significantly reduces the risk of errors, but takes longer and was not possible in the time available.

That’s a much stronger verification approach.  Personally, I’ve never gone as far as complete independent co-production, but I have done partial versions and it does make you much more confident about the results.
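A minimal sketch of what independent co-production looks like in practice: two people implement the same calculation separately, in different styles, and cross-reference results until they agree. Both implementations below are toy examples (the tax-style calculation and its parameters are invented), not anything from a real model.

```python
# Independent co-production, in miniature: the same calculation written twice,
# independently, then cross-checked on many scenarios. All details hypothetical.
import random

def modeller_a(incomes, threshold=48_000, rate=0.3):
    # Version A: explicit loop
    total = 0.0
    for inc in incomes:
        if inc > threshold:
            total += (inc - threshold) * rate
    return total

def modeller_b(incomes, threshold=48_000, rate=0.3):
    # Version B: written independently, different style
    return sum(max(0, inc - threshold) for inc in incomes) * rate

random.seed(1)
scenarios = [[random.uniform(0, 120_000) for _ in range(50)] for _ in range(200)]
for incomes in scenarios:
    a, b = modeller_a(incomes), modeller_b(incomes)
    assert abs(a - b) < 1e-6, f"disagreement: {a} vs {b}"
print("200 random scenarios: both implementations agree")
```

A single-line bug in either version would almost certainly make the two disagree somewhere in those scenarios, which is why this catches errors that code review alone can miss.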

The problem with more rigorous testing approaches is they take time and money and often end up just telling you that you were right.  Being less extreme about it is often fine, but maybe isn’t good enough for government work.

Measuring what you care about

There’s a story in the Guardian saying

The credibility of a computer program used for bail and sentencing decisions has been called into question after it was found to be no more accurate at predicting the risk of reoffending than people with no criminal justice experience provided with only the defendant’s age, sex and criminal history.

They even link to the research paper.

That’s all well and good, or rather, not good. But there’s another issue that doesn’t even get raised.  The algorithms aren’t trained and evaluated on data about re-offending. They’re trained and evaluated on data about re-conviction: they have to be, because that’s all we’ve got.

Suppose two groups of people have the same rate of re-offending, but one group are more likely to get arrested, tried, and convicted than the other. The group with a higher re-conviction rate will look to the algorithm as if they have a higher chance of re-offending.   They’ll get a higher predicted probability of re-offending. Evaluation will confirm they’re more likely to have the “re-offending” box ticked in their subsequent data.  The model can look like it’s good at discriminating between re-offenders and those who go straight, when it’s actually just good at discriminating against the same people as the justice system.

This isn’t an easy problem to fix: re-conviction data are what you’ve got. But when you don’t have the measurement you want, it’s important to be honest about it. You’re predicting what you measured, not what you wanted to measure.
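The mechanism is easy to demonstrate with a toy simulation (all numbers invented): two groups with identical re-offending rates, where one group is twice as likely to be convicted when they do re-offend.

```python
# Toy simulation of the label problem above: two groups with the SAME
# re-offending rate, but group B is twice as likely to be convicted when
# they re-offend. The rates are made up for illustration.
import random
random.seed(42)

REOFFEND_RATE = 0.3                      # identical for both groups
P_CONVICT = {"A": 0.25, "B": 0.50}       # P(convicted | re-offended)

def observed_reconviction_rate(group, n=100_000):
    convicted = 0
    for _ in range(n):
        reoffends = random.random() < REOFFEND_RATE
        if reoffends and random.random() < P_CONVICT[group]:
            convicted += 1
    return convicted / n

rate_a = observed_reconviction_rate("A")   # about 0.075
rate_b = observed_reconviction_rate("B")   # about 0.15
print(f"true re-offending rate (both groups): {REOFFEND_RATE}")
print(f"observed re-conviction rate, group A: {rate_a:.3f}")
print(f"observed re-conviction rate, group B: {rate_b:.3f}")
# Any model trained on these labels will score group B as higher risk,
# and evaluation against the same labels will confirm it.
```

The two groups behave identically, but the recorded labels differ by a factor of two, and no amount of modelling on those labels can undo that.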

Maps and models

This spectacular map from the National Geospatial-Intelligence Agency was circulating yesterday on Twitter. I got it from Christopher Jackson (@seis_matters). It shows antineutrino emissions from around the earth:

Our local (sub)continent of Zealandia shows up nicely at the bottom right. The black dots are nuclear reactors, and the dark smudge is just the immense rock mass of the Himalayas.

This next map is a style you’ve seen before. It shows New Zealand’s winds at the moment: the storm is passing over.

What these maps have in common is a very high ratio of model to actual data.  The ‘live’ wind map isn’t based on detailed live reports from a fine grid of weather stations. There aren’t any — especially out in the Pacific. It’s a map of the NOAA Global Forecast System, but forecasting the very near future rather than the long range. It isn’t going to give you more up-to-date information than the Met Service.

The antineutrino map is even more model-based. In the scientific paper I was struck by the sentence

Recently, the blossoming field of neutrino geoscience, first proposed by Eder [15], has become a reality with 130 observed geoneutrino interactions [12,13] confirming Kobayashi’s view of the Earth being a “neutrino star” [16]

It looks like the map has well over a million pixels per observed geophysical neutrino. When it comes to nuclear reactors, the paper says “These exciting geophysical capabilities have significant overlap with the non-proliferation community where remote monitoring of antineutrinos emanating from nuclear reactors is being seriously considered”. That is, the reactors are black dots on the map because they know where the reactors are and how many neutrinos they’d make, not because they measured them. The observations do go into the model, and they probably provide actual information about the deeper bits of the earth’s crust, but the map is of the model, not the observations.

Better or worse?

There was some controversy about the difficulty of the NCEA level 1 maths and stats exam last year.  As Stuff reports

It prompted the NZQA to release the exam to the public, and now the authority is taking the extra step to share the exam outcome before the consolidated results are released in April.

“NZQA has taken the unusual step of announcing these provisional results early so we can respond to the concerns teachers raised with us in the open letter,” said NZQA deputy chief executive Kristine Kilkelly.

“Provisional results for the NCEA Level 1 Mathematics and Statistics examinations in November show the majority of students who sat the examinations gained an Achieved or better grade for each standard.”

There’s a graph with the story, which is always nice:

I’m not convinced this graph is a great way of showing how the 2017 results differed from previous years: it’s better for showing that, yes, the majority of people passed.

Here’s my attempt at showing the 2017 differences: the arrows show the change from last year and the bars show the five-year range. I think it would have been better to just plot the four six-year time series, but that data wasn’t in the NCEA press release. It would also have been better to look at the ‘Merit’ and ‘Excellence’ percentages, but again that’s not given.

I think it’s clearer from this graph that the pass rate for “91028 Investigate Relationships Between Tables, Equations and Graphs” was lower last year, and lower by quite a large amount relative to the previous year-to-year variation. Two of the units have no sign of that sort of drop, and the fourth has a similar drop but from a high point to a value still within the recent range.

So, maybe there was an issue with the ‘tables, equations and graphs’ test.
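The redesign described above — arrows for the year-on-year change, bars for the recent range — can be sketched like this. Since the real series wasn’t in the NCEA press release, all the pass rates below are made-up placeholders, and the standard numbers other than 91028 are hypothetical labels.

```python
# A sketch of the arrows-plus-range-bars redesign. All values are invented
# placeholders, not real NCEA results; only "91028" is a real standard number.
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

standards = ["91026", "91027", "91028", "91031"]   # mostly hypothetical labels
low, high = [62, 70, 60, 55], [72, 80, 74, 68]     # placeholder 5-year ranges
last, this = [68, 75, 72, 64], [67, 74, 58, 57]    # placeholder 2016 and 2017

fig, ax = plt.subplots()
for i in range(len(standards)):
    # thick grey bar: five-year range of pass rates
    ax.plot([i, i], [low[i], high[i]], lw=8, color="lightgrey",
            solid_capstyle="butt")
    # arrow: change from last year to this year
    ax.annotate("", xy=(i, this[i]), xytext=(i, last[i]),
                arrowprops=dict(arrowstyle="->", color="black"))
ax.set_xticks(range(len(standards)))
ax.set_xticklabels(standards)
ax.set_ylabel("Percent Achieved or better")
fig.savefig("ncea_redesign.png")
```

With real data, a drop that ends well below its grey bar (as 91028’s did) stands out immediately, while ordinary year-to-year wobble stays inside the bar.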

 

Update: another redesign by Andrew P. Wheeler

January 14, 2018

Briefly

  • Metropolitan Museum of Art President: “For various reasons, over the past 10 or 12 years, the pay-as-you-wish policy has failed. It has declined by 71% in the amount people pay.” Felix Salmon: it’s worth fact-checking this, because it turns out that it’s not really true.
  • Cloudflare, a company that distributes websites across the world, has a wall of lava lamps that it uses for random number generation (presumably to seed computational pseudorandom generators)
  • “Do algorithms reveal sexual orientation or just expose our stereotypes?”— on last year’s ‘gaydar’ paper.
  • 538 looks at how they got an analysis of broadband internet availability wrong, due to bad data.
  • “The projects tried to show hidden patterns of our daily shopping….Unfortunately, it shows only the internal categorization and sorting of the supermarket.” Another example of data not meaning what you think it means. Christian Laesser (via FlowingData)
  • “Child protective agencies are haunted when they fail to save kids. Pittsburgh officials believe a new data analysis program is helping them make better judgment calls.” From the New York Times.
  • The NZ government has released a review of the handling of weather data (PDF)
  • From the LSE Impact blog: “Academics looking to communicate the findings and value of their research to wider audiences are increasingly going through the media to do so. But poor or incomplete reporting can undermine respect for experts by misrepresenting research, especially by trivialising or sensationalising it, or publishing under inappropriate headlines and with cherry-picked statistics.”  As StatsChat readers will know, a lot of this is public-relations people, but some of it is definitely the researchers.
  • The scientific reporting of some pre-clinical research is disturbingly crap: a report in the BMJ; Siouxsie Wiles commenting at The Spinoff
  • Constructing optical illusions for AI visual systems: (gory technical details)
  • You may have seen reports of research saying that Australian hawks spread bushfires…

January 10, 2018

Complete balls

The UK’s Metro magazine has a dramatic story under the headline Popping ibuprofen could make your balls shrivel up

Got a pounding headache?

You might just want to give a big glass of water and a nap a go before reaching for the painkillers. Scientists warn that ibuprofen could be wrecking men’s fertility by making their balls shrivel up.

Sounds pleasant.

Fortunately, that’s not what the study showed.

The story goes on

Researchers looked at 31 male participants and found that taking ibuprofen reduced production of testosterone by nearly a quarter in the space of around six weeks.

That’s also almost completely untrue. In fact, the research paper says (emphasis added)

We investigated the levels of total testosterone and its direct downstream metabolic product, 17β-estradiol. Administration of ibuprofen did not result in any significant changes in the levels of these two steroid hormones after 14 d or at the last day of administration at 44 d. The levels of free testosterone were subsequently analyzed by using the SHBG levels. Neither free testosterone nor SHBG levels were affected by ibuprofen.

Stuff has a much better take on this one:

Men who take ibuprofen for longer than the bottle advises could be risking their fertility, according to a new study.

Researchers found that men who took ibuprofen for extended periods had developed a condition normally seen in elderly men and smokers that, over time, can lead to fertility problems.

Ars Technica has the more accurately boring headline Small study suggests ibuprofen alters testosterone metabolism.

The study involved 14 men taking the equivalent of six tablets a day of ibuprofen for six weeks (plus a control group). Their testosterone levels didn’t change, but the interesting research finding is that this was due to compensation for what would otherwise have been a decrease. That is, a hormone signalling to increase testosterone production was elevated.  There’s a potential risk that if the men kept taking ibuprofen at this level for long enough, the compensation process might give up. And that would potentially lead to fertility problems — though not (I don’t think) to the problems Metro was worried about.

So, taking ibuprofen for months on end without a good reason? Probably inadvisable. Like it says on the pack.


January 9, 2018

Election maps: what’s the question?

XKCD has come out with a new map of the 2016 US election

In about 2008 I made a less-artistic one of the 2004 election on similar principles

These maps show some useful things about the US vote:

  1. the proportions for the two parties are pretty close, but
  2. most of the land area has very few voters, and
  3. most areas are relatively polarised
  4. but not as polarised as you think, eg, look at the cities in Texas

What these maps are terrible at is showing changes from one election to the next. The maps for 2004 (Republicans ahead by about 2.5%) and 2016 (Republicans behind by about 3%) look very similar. And even 2008 (Republicans behind by 7%) wouldn’t look that different.

Like a well-written thousand words, a well-drawn picture needs to be about something. Questions matter. The data don’t speak for themselves.