Posts filed under Law (31)

May 22, 2014

Big Data social context

From Cathy O’Neil: Ignore data, focus on power (and, well, most of the stuff on her blog)

From danah boyd and Kate Crawford: Critical Questions for Big Data

Will large-scale search data help us create better tools, services, and public goods? Or will it usher in a new wave of privacy incursions and invasive marketing? Will data analytics help us understand online communities and political movements? Or will it be used to track protesters and suppress speech? Will it transform how we study human communication and culture, or narrow the palette of research options and alter what ‘research’ means?


May 21, 2014

When not to map

Maps are good because they take advantage of all the previous maps we’ve seen to provide background familiarity.  On the other hand, they use up both the available spatial dimensions before you’ve actually got any data, so you need to encode the information some other way. Colour is the obvious choice, but colour is much more limited than people appreciate.

Kieran Healy tweeted “It’s not like there’s a simple, tradeoff-free solution, but this is not a good map.”



And he’s right; it isn’t. There are too many categories, and some of them are ordered but not all of them, so colour isn’t enough. Even if you’re going to try, these aren’t the right colours: for example, orange should be between yellow and red.  About the most you could do clearly with a single map is the three-way split: Yes, same-sex couples can just roll up to the registry;  No, not this week; or It’s Complicated.

Jacob Harris pointed to an article at Source, describing the design of graphics for a story about marijuana legalisation. It does much better.




They link to the classic piece “When maps shouldn’t be maps” by Matthew Ericson, which I’ve linked before. They also have a whole collection of articles on better maps, though it’s fairly programming-oriented.

May 3, 2014

White House report: ‘Big Data’

There’s a new report “Big Data: Seizing Opportunities, Preserving Values” from the Office of the President (of the USA).  Here’s part of the conclusion (there are detailed recommendations as well)

Big data tools offer astonishing and powerful opportunities to unlock previously inaccessible insights from new and existing data sets. Big data can fuel developments and discoveries in health care and education, in agriculture and energy use, and in how businesses organize their supply chains and monitor their equipment. Big data holds the potential to streamline the provision of public services, increase the efficient use of taxpayer dollars at every level of government, and substantially strengthen national security. The promise of big data requires government data be viewed as a national resource and be responsibly made available to those who can derive social value from it. It also presents the opportunity to shape the next generation of computational tools and technologies that will in turn drive further innovation.

Big data also introduces many quandaries. By their very nature, many of the sensor technologies deployed on our phones and in our homes, offices, and on lampposts and rooftops across our cities are collecting more and more information. Continuing advances in analytics provide incentives to collect as much data as possible not only for today’s uses but also for potential later uses. Technologically speaking, this is driving data collection to become functionally ubiquitous and permanent, allowing the digital traces we leave behind to be collected, analyzed, and assembled to reveal a surprising number of things about ourselves and our lives. These developments challenge longstanding notions of privacy and raise questions about the “notice and consent” framework, by which a user gives initial permission for their data to be collected. But these trends need not prevent creating ways for people to participate in the treatment and management of their information.

You can also read comments on the report by danah boyd, and the conference report and videos from her conference ‘The Social, Cultural & Ethical Dimensions of “Big Data”’ are now online.

March 18, 2014

Big Data & privacy presentation

If you have time, there’s an interesting event that will be streamed from New York University this (NZ) morning (10:30am today NZ time, 5:30pm yesterday NY time)

..the Data & Society Research Institute, the White House Office of Science and Technology Policy, and New York University’s Information Law Institute will be co-hosting a public event entitled The Social, Cultural, & Ethical Dimensions of “Big Data.” The purpose of this event is to convene key stakeholders and thought leaders from across academia, government, industry, and civil society to examine the social, cultural, and ethical implications of “big data,” with an eye to both the challenges and opportunities presented by the phenomenon.

The event is being organised by danah boyd, who we’ve mentioned a few times and whose new book I plan to write about soon.

November 26, 2013

Should recreational genotyping be illegal?

The US Food and Drug Administration has sent a letter to 23andme, one of the companies that will genotype you and provide lots of information from the sample, telling them to stop. It’s a tricky situation.

This product is a device within the meaning of section 201(h) of the FD&C Act, 21 U.S.C. 321(h), because it is intended for use in the diagnosis of disease or other conditions or in the cure, mitigation, treatment, or prevention of disease, or is intended to affect the structure or function of the body. For example, your company’s website at  (most recently viewed on November 6, 2013) markets the PGS for providing “health reports on 254 diseases and conditions,” including categories such as “carrier status,” “health risks,” and “drug response,” and specifically as a “first step in prevention” that enables users to “take steps toward mitigating serious diseases” such as diabetes, coronary heart disease, and breast cancer. Most of the intended uses for PGS listed on your website, a list that has grown over time, are medical device uses under section 201(h) of the FD&C Act. Most of these uses have not been classified and thus require premarket approval or de novo classification, as FDA has explained to you on numerous occasions.

On the one hand, I can’t see any valid social interest in stopping people from knowing their genotypes if they want to. On the other hand, the FDA has a point about marketing.

They raise two issues. The first is that 23andme make lots of (fairly weakly supported) claims about the usefulness of the results in disease prevention. The second is that some of the genotype information is actually clinically relevant and that they have not demonstrated sufficient accuracy in their results. The first issue is essentially a misleading advertising problem; the second is a quality assurance problem.

There are two things that can go wrong with the clinically useful results. The first is simple error: the genotype assay could give the wrong result, or you could be given results from someone else’s sample. This should be low probability, but it’s important to know how low: 1 in 1000 would definitely be too high.

The second issue is interpretation. Suppose you have a lot of family members with breast cancer, and you suspect a BRCA1 mutation is responsible. You might be relieved if you test negative, and think your risk isn’t especially high, but that’s only a reliable conclusion if your family’s cancer risk was actually due to a BRCA1 mutation, not to some other genetic risk factor.


Update: I should probably note that 23andme could fix what I think are the actual problems, but this wouldn’t necessarily satisfy the FDA. The FDA aren’t currently being unreasonable oppressive Luddite statist bureaucrats, but they’re probably happy to be if that’s the option on offer.

November 24, 2013

Blood alcohol change report

The detailed Ministry of Transport paper on changing the legal blood alcohol limit is now available.  There’s a story in Stuff, which is, if anything, unduly critical (an interesting change). It doesn’t mention the cost-benefit analysis, and implies a fines grab

Transport officials calculate nearly 20,000 people will be caught by the lower drink-driving limit – earning the Government $5 million extra in fines. 

which is a bit misleading since the report (paragraph 93) actually estimates a net increase in costs to the justice system of about $2 million in the first year and about half a million in subsequent years, ie, the fines don’t cover the costs of enforcing the change.

Basically, whether the change is a benefit or not depends on how much inconvenience and risk is caused to the average driver, the only major component that isn’t taken into account in the calculations.  If this is worth only 50c/month or so, the policy makes sense. If it’s worth a few dollars a month, not.
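The break-even reasoning can be sketched in a few lines. This is only an illustration: the net benefit and driver count below are made-up placeholders, not figures from the Ministry of Transport paper, and the function names are hypothetical.

```python
def break_even_cost_per_driver_month(quantified_net_benefit_per_year, n_drivers):
    """Monthly inconvenience cost per driver at which the policy's net benefit is zero."""
    return quantified_net_benefit_per_year / (n_drivers * 12)


def policy_net_benefit(quantified_net_benefit_per_year, n_drivers, cost_per_driver_month):
    """Quantified net benefit minus the unquantified inconvenience cost, per year."""
    return quantified_net_benefit_per_year - n_drivers * 12 * cost_per_driver_month


# HYPOTHETICAL placeholder inputs, chosen only to make the arithmetic easy to follow.
HYPOTHETICAL_NET_BENEFIT = 36_000_000  # dollars per year, before inconvenience costs
HYPOTHETICAL_DRIVERS = 3_000_000       # number of drivers affected

threshold = break_even_cost_per_driver_month(HYPOTHETICAL_NET_BENEFIT, HYPOTHETICAL_DRIVERS)
print(f"break-even inconvenience: ${threshold:.2f} per driver per month")
for cost in (0.50, 3.00):
    net = policy_net_benefit(HYPOTHETICAL_NET_BENEFIT, HYPOTHETICAL_DRIVERS, cost)
    print(f"at ${cost:.2f}/month the net benefit is ${net:,.0f} per year")
```

Because the inconvenience cost is multiplied by every driver and every month, a difference of a dollar or two per month is enough to flip the sign of the result.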

On the other hand, the policy is popular, and since most people should have a reasonable appreciation for how the change will affect them personally, that’s a more persuasive argument than it would ordinarily be.

October 25, 2013

A third of young Americans have been arrested

Via Keith Humphreys, being arrested is a very common experience for young people in America: using the National Longitudinal Survey of Youth, Richard Braeme and colleagues found

By age 18, the in-sample cumulative arrest prevalence rate lies between 15.9% and 26.8%; at age 23, it lies between 25.3% and 41.4%. These bounds make no assumptions at all about missing cases. If we assume that the missing cases are at least as likely to have been arrested as the observed cases, the in-sample age-23 prevalence rate must lie between 30.2% and 41.4%. The greatest growth in the cumulative prevalence of arrest occurs during late adolescence and the period of early or emerging adulthood
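The three age-23 figures in the quote are tied together by simple arithmetic, which a short sketch can check. It assumes the lower bound counts no missing case as arrested and the upper bound counts every missing case as arrested; the function name is hypothetical and only the quoted percentages are used as input.

```python
def arrest_prevalence_bounds(lower, upper):
    """Recover the missing fraction and the observed-case rate from worst-case bounds.

    lower: prevalence if no missing case was arrested   (arrested / total)
    upper: prevalence if every missing case was arrested ((arrested + missing) / total)
    """
    # The two bounds differ only in how the missing cases are counted.
    missing_fraction = upper - lower
    # Arrest rate among the cases that were actually observed.
    observed_rate = lower / (1.0 - missing_fraction)
    return missing_fraction, observed_rate


missing, observed = arrest_prevalence_bounds(0.253, 0.414)
print(f"implied missing fraction: {missing:.3f}")
print(f"arrest rate among observed cases: {observed:.3f}")
```

If missing cases are at least as likely to have been arrested as observed ones, the overall prevalence cannot be below the observed-case rate, which is where the tighter lower bound of about 30.2% comes from.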

September 27, 2013

How many deaths would be prevented by lowering the blood alcohol limit to 0.08%?

Well, obviously, none. The limit is already 0.08%.

However, there are still deaths caused by people who drive over the limit.  From crash data for drivers only, in 2011, there were a lot more crashes where the driver was above 0.08% than between 0.05% and 0.08%, and a larger fraction of these will have been caused by alcohol rather than just being coincident with alcohol.



Just as lowering the limit to 0.08% prevented some, but not all, crashes where the drivers were above 0.08%, lowering the limit to 0.05% would prevent some, but not all, crashes where the driver is above 0.05%.
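One way to see why "caused by alcohol" and "coincident with alcohol" differ is the standard epidemiological attributable fraction among the exposed, (RR - 1)/RR, where RR is the crash risk relative to a sober driver. The sketch below uses placeholder relative risks purely for illustration; they are not estimates from the NZ crash data.

```python
def attributable_fraction(relative_risk):
    """Fraction of crashes among exposed drivers attributable to the exposure.

    Uses the standard formula (RR - 1) / RR, which applies when RR >= 1.
    """
    if relative_risk < 1.0:
        raise ValueError("this interpretation requires a relative risk of at least 1")
    return (relative_risk - 1.0) / relative_risk


# HYPOTHETICAL relative risks, for illustration only.
for label, rr in (("lower alcohol band", 2.0), ("higher alcohol band", 6.0)):
    share = attributable_fraction(rr)
    print(f"{label}: RR = {rr:.1f}, share attributable to alcohol = {share:.0%}")
```

The lower the relative risk in a band, the larger the share of crashes in that band that would have happened anyway, so counting every such crash as preventable overstates the benefit.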

The Herald says

Alcohol Healthwatch director Rebecca Williams said the statistics clearly showed 20 people would still be alive if the Government had responded to calls for a lower alcohol limit.

which seems completely indefensible.

I don’t have any personal stake in the 0.08% limit — I don’t have a car and can afford taxis. And I’m not denying the real dangers of drink driving: according to estimates from NZ data the risk at 0.08% is about three times that at 0.05%.  But it’s dishonest to assume that all deaths where the driver was over 0.05% would be eliminated by the change, and to pretend there are no costs from the change.  We should be arguing this using the best available estimates of the benefits and costs.

And if you think the costs are irrelevant because there’s no limit on the value of a life, what about all the other ways to save lives? There are plenty of competing options, even if you think it’s only New Zealand lives that are valuable.


September 25, 2013


  • Big Data and Due Process: fairly readable academic paper arguing for legal protections against harm done by automated classification (accurate or inaccurate)
  • The Herald quotes Maurice Williamson on a drug seizure operation

“The harm prevented from keeping these analogues away from communities has been calculated at $32 million,” Mr Williamson said.

Back in 2008, Russell Brown explained where these numbers come from. As you might expect, there is no reasonable sense in which they are estimates of harm prevented. They don’t measure what communities should care about.

  • Levels of statistical evidence are ending up in the US Supreme Court. At issue is whether a press release claiming that a treatment “Reduces Mortality by 70% in Patients with Mild to Moderate Disease” is fraud when the study wasn’t set up to look at mortality and when the reduction wasn’t statistically significant by usual standards. Since a subsequent trial designed to look at mortality reductions convincingly failed to find them, the conclusion implied by the press release title is untrue, but the legal argument is whether, at the time, it was fraud.
  • From New Scientist: is ‘personalised’ medicine actually bad for public health?


July 31, 2013

“10 quadrillion times more likely to have done it”

Thomas Lumley, tipped off by Luis Apiolaza on Twitter, pointed me to this article in the NZ Herald.

The article is yet another example of the Herald’s inability to correctly report DNA statistics. It reports a quote from the Crown Prosecutor, paraphrased as follows:

A man charged with raping a woman during a kidnap has pleaded not guilty but Crown says DNA evidence shows the man was “10,000,000,000,000,000 times likely” to be responsible for the crime.

To be fair to the article’s author, this may have been the statement that the Crown prosecutor made, but no forensic scientist in New Zealand would say this. ESR scientists are trained to give statements of the following type:

“The evidence is 10^16 (=10,000,000,000,000,000) times more likely if the defendant and the victim were contributors to the stain, rather than the victim and someone unrelated to the defendant.”

It is extremely important to note that this is a statement about the likelihood of the evidence given the hypotheses rather than the other way around. A forensic scientist is brought to court to comment on the strength of the evidence and specifically not on whether the defendant is guilty.
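The difference between the two directions can be made concrete with the odds form of Bayes’ theorem: posterior odds = likelihood ratio × prior odds. The sketch below is an illustration only. The prior probabilities are hypothetical values standing in for whatever the court concludes from the non-DNA evidence, and the second likelihood ratio is a made-up smaller value for contrast.

```python
def posterior_probability(likelihood_ratio, prior_probability):
    """Combine a likelihood ratio with a prior using the odds form of Bayes' theorem."""
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must be strictly between 0 and 1")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    # Convert odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


# The first likelihood ratio is the figure quoted above; the second is HYPOTHETICAL.
# Both priors are HYPOTHETICAL.
for lr in (1e16, 1e3):
    for prior in (0.5, 1e-6):
        post = posterior_probability(lr, prior)
        print(f"LR = {lr:.0e}, prior = {prior:g}: posterior probability = {post:.6f}")
```

The same likelihood ratio gives different posterior probabilities under different priors, which is why the scientist reports only the likelihood ratio and leaves the prior, and hence the conclusion about the defendant, to the court.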

I have commented on this before, and sent correspondence to the NZ Herald numerous times. Perhaps a mention on StatsChat will inspire change.

Update: The NZ Herald reporter, James Ihaka, has contacted me and said “The statement came from a Crown prosecutor about the evidence that the forensic scientist will present later in the trial. Taking in to consideration what you have said however, it would probably be more accurate to rephrase this.” Good on you James!

Update 2: James Ihaka has contacted me again, with the following information:

This is the direct quote from Crown prosecutor Rebecca Mann: ( I checked with her)
“It is ten thousand million million times more likely for the DNA these samples originated from (the complainant) and Mr Martin rather than from (the complainant) and another unrelated individual selected at random from the general New Zealand population.”

I apologize unreservedly for attributing this to James Ihaka, and again congratulate him for following it up.

The statement Ms. Mann should have given is

“The evidence (the DNA match) is ten thousand million million times more likely if these samples originated from (the complainant) and Mr Martin rather than if they originated from (the complainant) and another unrelated individual selected at random from the general New Zealand population.”