Posts filed under Books (7)

March 27, 2019

The dangers in a world built on data about men

Women all know about the toilet queue in the intermission at concerts – same-sized bathrooms for men and women does not equal efficiency. Women who have ever stood and waited in a long line for the loo while the men come and go with speed – and I think I can say that this is about, roughly, give or take, 100% of us – roll our eyes and laugh about this as we wait. But the anecdote reveals an uncomfortable truth, says Caroline Criado Perez in her book Invisible Women: Exposing Data Bias in a World Designed for Men. Design and services that take the average male or the needs of the average male as the norm – as is the case with car-crash test dummies and stab-proof vests, among other things – are potentially deadly. The Guardian has excerpted a section of her book and it’s a sobering read. Recommended.

And while we are on the subject of a world designed for data about men, NASA has cancelled the first all-women spacewalk due to a spacesuit size issue.

March 19, 2013

How could this possibly go wrong?

There’s a new research paper out that sequences the genome of one of the most important cancer cell lines, HeLa.  It shows the fascinating genomic mess that can arise when a cell is freed from the normal constraints against genetic damage, and it gives valuable information about a vital research resource.

However, the discussion on Twitter (or at least the parts I frequent) has been dominated by another fact about the paper.  The researchers apparently didn’t consult at all with the family of Henrietta Lacks, the person whose tumour this originally was.  There are two reasons this is bad.

Firstly, publishing a genome of an ancestor of yours allows people to learn a lot about your genome. The high levels of mutation in the cancer cell line reduce this information a bit, but there’s still a lot there. As a trivial example, even without worrying about genetic disease risks, you could use the data to tell if someone who thought they were a descendant of Ms Lacks actually was or wasn’t. Publishing a genome without consent from, or consultation with, anyone is at best rude.

And secondly: come on, guys, didn’t you read the book? From the author’s summary:

In 1950, Henrietta Lacks, a young mother of five children, entered the colored ward of The Johns Hopkins Hospital to begin treatment for an extremely aggressive strain of cervical cancer. As she lay on the operating table, a sample of her cancerous cervical tissue was taken without her knowledge or consent and given to Dr. George Gey, the head of tissue research. Gey was conducting experiments in an attempt to create an immortal line of human cells that could be used in medical research. Those cells, he hoped, would allow scientists to unlock the mysteries of cancer, and eventually lead to a cure for the disease. Until this point, all of Gey’s attempts to grow a human cell line had ended in failure, but Henrietta’s cells were different: they never died.

Less than a year after her initial diagnosis, Henrietta succumbed to the ravages of cancer and was buried in an unmarked grave on her family’s land. She was just thirty-one years old. Her family had no idea that part of her was still alive, growing vigorously in laboratories—first at Johns Hopkins, and eventually all over the world.

That’s how they did things back then.  It’s not how we do things now. If there was a symbolically worse genome to sequence without some sort of consultation, I’d have a hard time thinking of it.

I don’t think anyone’s saying laws or regulations were violated, and I’m not saying that the family should have had veto power, but they should at least have been talked to.

December 23, 2012

Metareviews: The Signal and the Noise

Andrew Gelman has a review of two reviews of Nate Silver’s book, The Signal and the Noise. Unlike him, I’ve actually read the book, but I think his review of the reviews captures the good and bad points well.

December 10, 2012

[Video] Nate Silver talks about his new book: The Signal and the Noise

Nate Silver joins Google’s Chief Economist Hal Varian to talk about his new book “The Signal and the Noise: Why So Many Predictions Fail-but Some Don’t” and answer Googler questions.

September 3, 2012

Smoking statistics

Andrew Gelman, of Columbia University, wrote an article a few months ago called “Statistics for Cigarette Sellers”:

Remember How to Lie with Statistics? It turns out the  author worked for the cigarette companies… It appears he was also working on a book in the late 1960s called How to Lie with Smoking Statistics, which the publisher saw “high likelihood of proceeding into print.”

His blog has a largely overlapping post, but with a bit more material (and discussion in comments).

July 9, 2012

Book review: Thinking, Fast and Slow

Daniel Kahneman and Amos Tversky made huge contributions to our understanding of why we are so bad at prediction.  Kahneman won a Nobel Prize[*] for this in 2002 (Tversky failed to satisfy the secondary requirement of still being alive).  Kahneman has now written a book, Thinking, Fast and Slow, about their research.  Unlike some of his previous writing, this book is designed to be shelved in the Business/Management section of bookshops and read by people who might otherwise be looking for their cheese.

The “Fast” and “Slow” of the title are two systems of thought: the rapid preconscious judgement that we use for most of our decision-making, and the conscious and deliberate evaluation of alternatives and probabilities that we like to believe we use.   The “Fast” system relies very heavily on stereotyping — finding the best match for a situation in a library of stories — and so is subject to predictable and exploitable biases.  The “Slow” system can be trained to do much better, but only if we can force it to be used.

A dramatic example of the sort of mischief the “fast” system can get up to is anchoring bias.  Suppose you ask a bunch of people how many UN-member countries are in Africa.  You will get a range of guesses, probably not very accurate, and perhaps a few people who actually know the answer.  Suppose you had first asked people to write down the last two digits of their telephone number, or to spin a roulette wheel and write down the number that is chosen, and then to guess how many countries there are in Africa.  Empirically, across a range of situations like this, there is a strong correlation between the obviously irrelevant first number and the guess.   This is an outrageous finding, but it is very well confirmed.   It’s one of the reasons that bogus polls are harmful even if you know they are bogus.

Kahneman gives many other examples of cognitive illusions generated by the ‘fast’ system of the mind.  As with optical illusions, they don’t lose their intuitive force when you understand them, but you can learn not to trust your intuition in situations where it’s going to be biased.

One minor omission of the book is that there’s not much explanation of why we are so stupid: Kahneman points out, and documents, that thinking uses up blood sugar and is biologically expensive, but that doesn’t explain why the mistakes we make are so simple.  Research in computer science and philosophy, by people actually trying to implement thinking, gives one possibility, under the general name of “the frame problem”.  We know an enormous number of facts and relationships between them, and we cannot afford to investigate the logical consequences of all these facts when trying to make a decision.  The price of tea in China really is irrelevant to most decisions, but not to decisions about tea purchases, or about souvenir purchases when in Beijing, or to living-wage levels in Fujian.  We need some way of ignoring the price of tea in China, and millions of other facts, except very occasionally when they are relevant, without having to deduce their irrelevance each time.  Not surprisingly, whatever mechanism does this filtering sometimes misfires and treats information as important when it is actually irrelevant.

Read this book.  It might help you think better, and at least will give you better excuses for your mistakes.

 

* to quote Daniel Davies: “blah blah blah Sveriges Riksbank. Nobody cares, you know.”

April 29, 2012

Data Journalism Handbook now available

I’m very excited to learn that the Data Journalism Handbook is now live and I am looking forward to reading it. The handbook features contributions from over 70 leading practitioners of data journalism from every corner of the globe, from Japan to Finland, Nigeria to the US and from leading news outlets such as the New York Times, Zeit Online, the BBC and the Guardian.

It’s an open educational resource, under a Creative Commons licence (CC-BY-SA), so you may share it and remix it. It is hoped that the handbook will encourage many budding data journalists to look at data as a source and give them courage to tackle it, as well as showcasing some great examples of journalism using data as inspiration for future stories.

You can find the handbook at: http://datajournalismhandbook.org/

Also available for pre-order are the e-book and print versions from O’Reilly Media: http://oreil.ly/ddj-e-print.