Posts filed under Research (123)

July 28, 2014

Rise of the machines

The Automatic Statistician project (somewhat flaky website) is working to automate various types of statistical modelling. They have interesting research papers. They also have a demo that’s fairly limited but produces linear regression models, model checks, and descriptions that are reasonable from a predictive point of view.

Automating some bits of data analysis is an important problem, because there aren't enough statisticians to go around. However (as Cathy O'Neil points out about competition sites like Kaggle), the project isn't tackling the hard bits of data analysis: getting the data ready, and, more importantly, getting the question into a precisely-specified form that can be answered by fitting a model.

July 23, 2014

Human statisticians not obsolete

There’s a website, OnlyBoth.com, that, as it says

Discovers New Insights from Data.
Writes Them Up in Perfect English.
All Automated.

You can test this by asking it for 'insights' in some example areas. One area is baseball, so naturally I selected the Seattle Mariners, and 2009, when I still lived in Seattle. OnlyBoth returns several names where it found insights, and I chose 'Matt Tuiasosopo'. The most obvious thing about him is that he comes from a famous local football family, but I was interested in what new insight the data revealed.

Matt Tuiasosopo in 2009 was the 2nd-youngest (23 yrs) of the 25 hitters who were born in Washington and played for the Seattle Mariners.

outdone by Matt Tuiasosopo in 2008 (22 yrs).

I don’t think our students need to be too worried yet.

July 13, 2014

Age/period/cohort voting

From the New York Times, an interactive graph showing how political leanings at different ages have changed over time.

Yes, voting preferences for kids are problematic. Read the story (and this link) to find out how they inferred them. There’s more at Andrew Gelman’s blog.

July 1, 2014

Facebook recap

The discussion over the Facebook experiment seems to involve a lot of people being honestly surprised that other people feel differently.

One interesting correlation based on my Twitter feed is that scientists involved in human subjects research were disturbed by the research and those not involved in human subjects research were not. This suggests our indoctrination in research ethics has some impact, but doesn’t answer the question of who is right.

Some links that cover most of the issues

June 29, 2014

Not yet news

When you read “The university did not reveal how the study was carried out” in a news story about a research article, you’d expect the story to be covering some sort of scandal. Not this time.

The Herald story is about broccoli and asthma:

They say eating up to two cups of lightly steamed broccoli a day can help clear the airways, prevent deterioration in the condition and even reduce or reverse lung damage.

Other vegetables with the same effect include kale, cabbage, brussels sprouts, cauliflower and bok choy.

Using broccoli to treat asthma may also help for people who don’t respond to traditional treatment.

'How the study was carried out' isn't just a matter of detail: if they just gave people broccoli, they wouldn't know what other vegetables had the same effect, so maybe it wasn't broccoli but some sort of extract? Was it even experimental, or just observational? And did they actually test people who don't respond to traditional treatment? And what exactly does that mean? Failing to respond is pretty rare, though failing to get good control of asthma attacks isn't.

The Daily Mail story was actually more informative (and that's not a sentence I like to find myself writing). They reported a claim that wasn't in the press release:

The finding due to sulforaphane naturally occurring in broccoli and other cruciferous vegetables, which may help protect against respiratory inflammation that can cause asthma.

Even then, it isn’t clear whether the research really found that sulforaphane was responsible, or whether that’s just their theory about why broccoli is effective. 

My guess is that the point of the press release is the last sentence:

Ms Mazarakis will be presenting the research findings at the 2014 Undergraduate Research Conference about Food Safety in Shanghai, China.

That’s a reasonable basis for a press release, and potentially for a story if you’re in Melbourne. The rest isn’t. It’s not science until they tell you what they did.

Ask first

Via The Atlantic, there’s a new paper in PNAS (open access) that I’m sure is going to be a widely cited example by people teaching research ethics, and not in a good way:

 In an experiment with people who use Facebook, we test whether emotional contagion occurs outside of in-person interaction between individuals by reducing the amount of emotional content in the News Feed. When positive expressions were reduced, people produced fewer positive posts and more negative posts; when negative expressions were reduced, the opposite pattern occurred. These results indicate that emotions expressed by others on Facebook influence our own emotions, constituting experimental evidence for massive-scale contagion via social networks.

More than 650,000 people had their Facebook feeds meddled with in this way, and as that paragraph from the abstract makes clear, it made a difference.

The problem is consent.  There is a clear ethical principle that experiments on humans require consent, except in a few specific situations, and that the consent has to be specific and informed. It’s not that uncommon in psychological experiments for some details of the experiment to be kept hidden to avoid bias, but participants still should be given a clear idea of possible risks and benefits and a general idea of what’s going on. Even in medical research, where clinical trials are comparing two real treatments for which the best choice isn’t known, there are very few exceptions to consent (I’ve written about some of them elsewhere).

The need for consent is especially clear in cases where the research is expected to cause harm. In this example, the Facebook researchers expected in advance that their intervention would have real effects on people’s emotions; that it would do actual harm, even if the harm was (hopefully) minor and transient.

Facebook had its research reviewed by an Institutional Review Board (the US equivalent of our Ethics Committees), and the terms of service say they can use your data for research purposes, so they are probably within the law.  The psychologist who edited the study for PNAS said

“I was concerned,” Fiske told The Atlantic, “until I queried the authors and they said their local institutional review board had approved it—and apparently on the grounds that Facebook apparently manipulates people’s News Feeds all the time.”

Fiske added that she didn't want "the originality of the research" to be lost, but called the experiment "an open ethical question."

To me, the only open ethical question is whether people believed their agreement to the Facebook Terms of Service allowed this sort of thing. This could be settled empirically, by a suitably-designed survey. I’m betting the answer is “No.” Or, quite likely, “Hell, no!”.

[Update: Story in the Herald]

June 3, 2014

Are girl hurricanes less scary?

There's a new paper out in the journal PNAS claiming that hurricanes with female names cause three times as many deaths as those with male names (because people don't give girl hurricanes the proper respect). Ed Yong does a good job of explaining why this is probably bogus, but no-one seems to have drawn any graphs, which I think make the situation a lot clearer.

May 14, 2014

One of the things social media is good for

[Update: 538 now has an intro to the story explaining the mistakes and apologising. Good for them.]

So, at fivethirtyeight.com there's this story on mapping kidnappings in Nigeria using data from GDELT, the sort of thing data journalism is supposed to be good at. GDELT automatically extracts information from news stories to build a huge global database.

On Twitter, Erin Simpson, whose about.me page says she is "a leading specialist in the intersection of intelligence, data analysis, irregular warfare, and illicit systems – with an emphasis on novel research designs," and who has worked on the GDELT parser, is Not Happy.

Thanks to Storify, here are three summaries of what she says, but a lot of it can be boiled down to one point:

In conclusion: VALIDATE YOUR FREAKING DATA. It’s not true just because it’s on a goddamn map.

(via @LewSOS)

May 8, 2014

Think I’ll go eat worms

This table is from a University of California alumni magazine:

[Table: screenshot from the alumni magazine]

Jeff Leek argues at Simply Statistics that the big problem with Big Data is that they, too, forgot statistics.

Who’s afraid of the NSA?

Two tweets in my timeline this morning linked to this report about this research paper, saying "americans have stopped searching on forbidden words."

That's a wild exaggeration, but what the research found was interesting. They looked at Google Trends search data for words and phrases that might be privacy-related in various ways: for example, searches that might be of interest to the US government security apparat, or searches that might be embarrassing if a friend knew about them.

In the US (but not in other countries) there was a small but definite change in searches at around the time of Edward Snowden's NSA revelations. Search volume in general kept increasing, but searches on words that might be of interest to the government decreased slightly.

The data suggest that some people in the US became concerned that the NSA might care about them and, given that there presumably aren't enough terrorists in the US to explain the difference, that knowing about the NSA surveillance is having an effect on the political behaviour of (a subset of) ordinary Americans.

There is a complication, though. A similar fall was seen in the other categories of privacy-sensitive data, so either the real answer is something different, or people are worried about the NSA seeing their searches for porn.
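The reasoning here is a comparison against a control: a fall in government-sensitive searches only points to the NSA specifically if it is bigger than the fall in other privacy-sensitive categories. A minimal Python sketch of that contrast, assuming we have a mean Google Trends index for each category before and after the revelations. The function names and the numbers are hypothetical, made up only to show the logic; they are not the paper's data or method.

```python
# Hypothetical illustration only: names and numbers are made up, not from the paper.


def relative_change(before: float, after: float) -> float:
    """Proportional change in a category's mean search index, after vs. before."""
    return (after - before) / before


def excess_drop(target: tuple[float, float], control: tuple[float, float]) -> float:
    """Difference-in-differences style contrast: how much more the target
    category changed than the control category did (negative = extra fall)."""
    return relative_change(*target) - relative_change(*control)


# (before, after) mean search-index values; hypothetical
government_sensitive = (100.0, 95.0)   # falls by 5%
embarrassing_personal = (100.0, 95.0)  # also falls by 5%

print(relative_change(*government_sensitive))
print(excess_drop(government_sensitive, embarrassing_personal))
```

When both categories fall by the same proportion the contrast is zero, which is why a similar fall in the other privacy-sensitive categories weakens the NSA-specific explanation.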