Posts filed under Research (131)

October 30, 2014

Cocoa puff

Both Stuff and the Herald have stories about the recent cocoa flavanols research (the Herald got theirs from the Independent).

Stuff’s story starts out

Remember to eat chocolate because it might just save your memory. This is the message of a new study, by Columbia University Medical Centre.

 

Sixteen paragraphs later, though, it turns out this isn’t the message

“The supplement used in this study was specially formulated from cocoa beans, so people shouldn’t take this as a sign to stock up on chocolate bars,” said Dr Simon Ridley, Head of Research at Alzheimer’s Research UK.

 

There’s a lot of variation in flavanol concentrations even in dark chocolate, but 900mg of flavanols would be somewhere between 150g and 1kg of dark chocolate per day.  Ordinary cocoa powder is also not going to provide 900mg at any reasonable consumption level.

The Herald story is much less over the top. They also quote the cautious expert comments in more detail and give less space to the positive ones. For example, the comments point out that the study was very small and very short, that the improvement in memory was in just one measure of speed of very-short-term recall from a visual prompt, and that this measure was chosen because the researchers expected it to be affected by cocoa rather than because of its relevance to everyday life. There was another memory test in the study, arguably a more relevant one, which was not expected to improve and didn’t.

Neither story mentions that the randomised trial also evaluated an exercise program that the researchers expected to be effective but wasn’t. Taking that into account, the statistical evidence for the effect of flavanols is not all that strong.

October 28, 2014

Absolute, relative, correlation, cause

The conclusions of a recent research paper

Delivery by [caesarean section] is associated with a modest increased odds of [autism], and possibly ADHD, when compared to vaginal delivery. Although the effect may be due to residual confounding, the current and accelerating rate of [caesarean section] implies that even a small increase in the odds of disorders, such as [autism] or ADHD, may have a large impact on the society as a whole. This warrants further investigation.

The Herald

Babies born through Caesarean section are more likely to develop autism, a new study says.

Academics warn the increasingly popular C-section deliveries heighten the risk of the disorder by 23 per cent.

There’s a clear difference in language: the news story strongly implies that caesarean sections cause autism; the research paper is scrupulously careful not to say that.

Using a relative risk is convenient in technical communication, but in non-technical communication it makes the impact seem greater than it really is. The US Centers for Disease Control estimate a risk of 1 in 68 for autism spectrum disorder (there aren’t systematic NZ data). If the correlation with C-section really is causal, we’re talking about roughly 14 kids with autism spectrum disorders per 1000 without a C-section and about 17 per 1000 with a C-section. The absolute risk increase, if it’s real, is about 3 cases per 1000 C-sections.
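The per-1000 figures can be reproduced with a little arithmetic. This is only a sketch: the share of births by C-section (about a third) is an assumption of mine rather than a number from the paper, and the 23% is treated as a risk ratio, as in the discussion above.

```python
# Sketch: turn a relative risk into absolute risks per 1000 births.
overall_risk = 1 / 68     # CDC estimate quoted above
relative_risk = 1.23      # the "23 per cent" from the news story
csection_share = 0.33     # ASSUMPTION: roughly a third of births are C-sections

# The overall risk is a mix of the two groups:
#   overall = baseline * (1 - share) + baseline * RR * share
baseline_risk = overall_risk / (1 - csection_share + relative_risk * csection_share)
csection_risk = relative_risk * baseline_risk

per_1000_without = 1000 * baseline_risk
per_1000_with = 1000 * csection_risk
extra_per_1000 = per_1000_with - per_1000_without

print(f"without C-section: about {per_1000_without:.0f} per 1000")
print(f"with C-section:    about {per_1000_with:.0f} per 1000")
print(f"absolute increase: about {extra_per_1000:.0f} per 1000 C-sections")
```

Changing `csection_share` moves the two rates slightly, but the absolute difference stays at roughly 3 per 1000 for any plausible value.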

It’s also important to be clear that this correlation cannot explain much of the recent increases in autism. A relative risk of 1.23 means that if we went from no C-sections to 100% C-sections there would be a 23% increase in autism spectrum disorder. The observed increase is about five times that. Since C-sections have only increased about 10 percentage points, not 100 percentage points, the correlation could account for only about a 2.3% increase, so the observed increase in autism is about 50 times what this correlation could explain.
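The "50 times" figure follows directly from the numbers in that paragraph. A minimal sketch, using only the rough values quoted there:

```python
# Sketch: how much of the rise in autism could the C-section correlation explain?
relative_risk = 1.23
excess_risk = relative_risk - 1      # 23% increase for a 0% -> 100% change in C-sections
csection_rise = 0.10                 # C-sections rose by about 10 percentage points
observed_multiple = 5                # observed increase is about five times the 23%

explained_increase = excess_risk * csection_rise      # about 2.3%
observed_increase = observed_multiple * excess_risk   # about 115%
shortfall = observed_increase / explained_increase    # about 50

print(f"explained: {explained_increase:.1%}, observed: {observed_increase:.0%}")
print(f"observed increase is about {shortfall:.0f} times what could be explained")
```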

There are (I’m told by people who know the issues) good reasons to think there are too many C-sections.  This probably won’t be one of the most important ones.

 

October 18, 2014

When barcharts shouldn’t start at zero

Barcharts should almost always start at zero. Almost always.

Randal Olson has a very popular post on predictors of divorce, based on research by two economists at Emory University. The post has a lot of barcharts like this one

marriage-stability-wedding-expenses

The estimates in the research report are hazard ratios for dissolution of marriage. A hazard ratio of zero means a factor appears completely protective — it’s not a natural reference point. The natural reference point for hazard ratios is 1: no difference between two groups, so that would be a more natural place to put the axis than at zero.

A bar chart is also not good for showing uncertainty. The green bar has no uncertainty, because the others are defined as comparisons to it, but the other bars do. The more usual way to show estimates like these from regression models is with a forest plot:

marriage

The area of each coloured box is proportional to the number of people in that group in the sample, and the line is a 95% confidence interval.  The horizontal scale is logarithmic, so that 0.5 and 2 are the same distance from 1 — otherwise the shape of the graph would depend on which box was taken as the comparison group.
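The point about the logarithmic scale can be checked numerically. This sketch uses made-up ratios, not estimates from the research report:

```python
import math

# On a log scale, halving and doubling are the same distance from 1.
dist_half = abs(math.log(0.5) - math.log(1.0))
dist_double = abs(math.log(2.0) - math.log(1.0))
print(dist_half, dist_double)

# Switching the comparison group turns a hazard ratio r into 1/r.
# On a log scale that only flips the sign, so the picture is mirrored
# rather than reshaped. The value 2.0 is purely illustrative.
ratio = 2.0
flipped = 1 / ratio
print(math.log(ratio), math.log(flipped))
```

On a linear axis the same two values would sit 1.0 and 0.5 away from the reference, which is why the choice of comparison group would change the shape of the graph.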

Two more minor notes: first, the hazard ratio measures the relative rate of divorces over time, not the relative probability of divorce, so a hazard ratio of 1.46 doesn’t actually mean 1.46 times more likely to get divorced. Second, the category of people with total wedding expenses over $20,000 was only 11% of the sample — the sample is differently non-representative than the samples that lead to bogus estimates of $30,000 as the average cost of a wedding.
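To see why a hazard ratio is not a ratio of probabilities, here is a small sketch. It assumes proportional hazards, under which the comparison group's survival curve is the baseline survival curve raised to the power of the hazard ratio. The baseline probabilities are hypothetical, chosen only for illustration.

```python
hazard_ratio = 1.46  # the example value mentioned above

def prob_with_factor(baseline_prob, hr):
    """Probability of divorce by a fixed time, assuming proportional hazards."""
    # Survival in the comparison group = (baseline survival) ** hazard ratio
    return 1 - (1 - baseline_prob) ** hr

# HYPOTHETICAL baseline probabilities of divorce by some fixed time
baseline_probs = [0.05, 0.20, 0.50]
prob_ratios = [prob_with_factor(p, hazard_ratio) / p for p in baseline_probs]

for p, r in zip(baseline_probs, prob_ratios):
    print(f"baseline probability {p:.2f}: probability ratio {r:.2f}")
```

The probability ratio is always below the hazard ratio, and the gap widens as the baseline probability grows.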

October 8, 2014

What are CEOs paid; what should they be paid?

From Harvard Business Review, reporting on recent research

Using data from the International Social Survey Programme (ISSP) from December 2012, in which respondents were asked to both “estimate how much a chairman of a national company (CEO), a cabinet minister in a national government, and an unskilled factory worker actually earn” and how much each person should earn, the researchers calculated the median ratios for the full sample and for 40 countries separately.

The graph:

actualestimated

 

The radial graph exaggerates the differences, but they are already huge. Respondents dramatically underestimated what CEOs are actually paid, and still thought it was too much.  Here’s a barchart of the blue and grey data (the red data seems to only be available in the graph). Ordering by ideal pay ratio (rather than alphabetically) helps with the nearly-invisible blue bars: it’s interesting that Australia has the highest ideal ratio.

ceo

The findings are a contrast to foreign aid budgets, where the desired level of expenditure is less than the estimated level, but more than the actual level.  On the other hand, it’s less clear exactly what the implications are in the CEO case.

 

September 26, 2014

Screening is harder than that

From the Herald

Calcium in the blood could provide an early warning of certain cancers, especially in men, research has shown.

Even slightly raised blood levels of calcium in men was associated with an increased risk of cancer diagnosis within one year.

The discovery, reported in the British Journal of Cancer, raises the prospect of a simple blood test to aid the early detection of cancer in high risk patients.

In fact, from the abstract of the research paper, 3% of people had high blood levels of calcium, and among those, 11.5% of the men developed cancer within a year. That’s really not a strong enough prediction to be useful for early detection of cancer. For every thousand men tested you would find about three cancer cases and 27 false positives. What the research paper actually says under “Implications for clinical practice” is

“This study should help GPs investigate hypercalcaemia appropriately.”

That is, if a GP happens to measure blood calcium for some reason and notices that it’s abnormally high, cancer is one explanation worth checking out.
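The screening arithmetic above is easy to reproduce. A minimal sketch, assuming (as the calculation above does) that the 3% figure for all people applies to men:

```python
# Sketch: what would happen if 1000 men had their blood calcium tested?
men_tested = 1000
share_high_calcium = 0.03     # 3% had high blood calcium
cancer_rate_if_high = 0.115   # 11.5% of those men had a cancer diagnosis within a year

flagged = men_tested * share_high_calcium       # men with a "positive" test
true_positives = flagged * cancer_rate_if_high  # flagged men who develop cancer
false_positives = flagged - true_positives      # flagged men who do not

print(f"flagged: {flagged:.0f}")
print(f"cancer cases found: about {true_positives:.0f}")
print(f"false positives: about {false_positives:.0f}")
```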

The overstatement is from a Bristol University press release, with the lead

High levels of calcium in blood, a condition known as hypercalcaemia, can be used by GPs as an early indication of certain types of cancer, according to a study by researchers from the universities of Bristol and Exeter.

and later on an explanation of why they are pushing this angle

The research is part of the Discovery Programme which aims to transform the diagnosis of cancer and prevent hundreds of unnecessary deaths each year. In partnership with NHS trusts and six Universities, a group of the UK’s leading researchers into primary care cancer diagnostics are working together in a five year programme.

While the story isn’t the Herald’s fault, using a photo of a man drinking a glass of milk is. The story isn’t about dietary calcium being bad, it’s about changes in the internal regulation of calcium levels in the blood, a completely different issue. Milk has nothing to do with it.

August 30, 2014

Funding vs disease burden: two graphics

You have probably seen the graphic from vox.com

 

There are several things wrong with it. From a graphics point of view it doesn’t make any of the relevant comparisons easy. The diameter of the circle is proportional to the deaths or money, exaggerating the differences. And the donation data are basically wrong — the original story tries to make it clear that these are particular events, not all donations for a disease, but it’s the graph that is quoted.

For example, the graph lists $54 million for heart disease, based on the ‘Jump Rope for Heart’ fundraiser. According to Forbes magazine’s list of top charities, the American Heart Association actually received $511 million in private donations in the year to June 2012, almost ten times as much.  Almost as much again came in grants for heart disease research from the National Institutes of Health.

There’s another graph I’ve seen on Twitter, which shows what could have been done to make the comparisons clearer:

BwNxOzdCIAAyIZS

 

It’s limited, because it only shows government funding, not private charity, but it shows the relationship between funding and the aggregate loss of health and life for a wide range of diseases.

There are a few outliers, and some of them are for interesting reasons. Tuberculosis is not currently a major health problem in the US, but it is in other countries, and there’s a real risk that it could spread to the US.  AIDS is highly funded partly because of successful lobbying, partly because it — like TB — is a foreign-aid issue, and partly because it has been scientifically rewarding and interesting. COPD and lung cancer are going to become much less common in the future, as the victims of the century-long smoking epidemic die off.

Depression and injuries, though?

 

Update: here’s how distorted the areas are: the purple number is about 4.2 times the blue number

four-to-one
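The size of the distortion comes from simple geometry: if the diameter of a circle is proportional to the number, the area grows with the square of the number. A quick sketch using the 4.2 ratio from the update:

```python
import math

value_ratio = 4.2   # the purple number is about 4.2 times the blue number

# Diameter proportional to the value means area proportional to value squared.
area_ratio = value_ratio ** 2

# If area (not diameter) were proportional to the value, the diameters
# would only need to differ by the square root of the ratio.
fair_diameter_ratio = math.sqrt(value_ratio)

print(f"area ratio as drawn: about {area_ratio:.1f}")
print(f"diameter ratio for honest areas: about {fair_diameter_ratio:.2f}")
```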

August 28, 2014

Age, period, um, cohort

A recurring issue with trends over time is whether they are ‘age’ trends, ‘period’ trends, or ‘cohort’ trends.  That is, when we complain about ‘kids these days’, is it ‘kids’ or ‘these days’ that’s the problem? Mark Liberman at Language Log has a nice example using analyses by Joe Fruehwald.

 

If you look at the frequency of “um” in speech (in this case in Philadelphia), it decreases with age in any given year

Fruehwald1

 

On the other hand, it increases over time for people in a given age cohort (for example, the line that stretches right across the graph is for people born in the 1950s)

Fruehwald2

 

It’s not that people say “um” less as they get older, it’s that people born a long time ago say “um” less than people born recently.

August 15, 2014

Cancer statistics done right

I’ve mentioned a number of times that statistics on cancer survival are often unreliable for the conclusion people want to draw, and that you need to look at cancer mortality.  Today’s story in Stuff is about Otago research that does it right:

The report found for 11-year timeframe, cancer-specific death rates decreased in both countries and cancer mortality fell in both countries. But there was no change in the difference between the death rates New Zealand and Australia, which remained 10 per cent higher in New Zealand.

That is, they didn’t look at survival after diagnosis, they looked at the rate of deaths. They also looked at the rate of cancer diagnoses

“The higher mortality from all cancers combined cannot be attributed to higher incidence rates, and this suggests that overall patient survival is lower in New Zealand,” Skegg said.

That’s not quite as solid a conclusion — it’s conceivable that New Zealand really has higher incidence, but Australia compensates by over-diagnosing tumours that wouldn’t ever cause a problem — but it would be a stretch to have that happen over all types of cancer combined, as they observed.

 

July 28, 2014

Rise of the machines

Journalism

Data

The Automatic Statistician project (somewhat flaky website) is working to automate various types of statistical modelling. They have interesting research papers. They also have a demo that’s fairly limited but produces linear regression models, model checks, and descriptions that are reasonable from a predictive point of view.

Automating some bits of data analysis is an important problem, because there aren’t enough statisticians to go around. However (as Cathy O’Neil points out about competition sites like Kaggle), they aren’t tackling the hard bits of data analysis: getting the data ready, and more importantly, getting the question into a precisely-specified form that can be answered by fitting a model.

July 23, 2014

Human statisticians not obsolete

There’s a website, OnlyBoth.com, that, as it says

Discovers New Insights from Data.
Writes Them Up in Perfect English.
All Automated.

You can test this by asking it for ‘insights’ in some example areas. One area is baseball, so naturally I selected the Seattle Mariners, and 2009, when I still lived in Seattle. OnlyBoth returns several names where it found insights, and I chose ‘Matt Tuiasosopo’ — the most obvious thing about him is that he comes from a famous local football family, but I was interested in what new insight the data revealed.

Matt Tuiasosopo in 2009 was the 2nd-youngest (23 yrs) of the 25 hitters who were born in Washington and played for the Seattle Mariners.

outdone by Matt Tuiasosopo in 2008 (22 yrs).

I don’t think our students need to be too worried yet.