Posts filed under Medical news (300)

June 21, 2015

Sunbathing and babies

The Herald (from the Daily Mail)

A sunshine break is the perfect way to unwind, catch up on your reading and top up that tan.

But it seems a week soaking up the rays could also offer a surprising benefit – helping a woman have a baby.

Increased exposure to sunshine could raise the odds of becoming a mother by more than a third, a study suggests.


If you read StatsChat regularly, you probably won’t be surprised to hear the study had nothing to do with either holidays or sunbathing, or fertility in the usual sense.

As the story goes on to say, it was about the weather and IVF success rates. The researchers looked for correlations between a variety of weather measurements and a variety of ways of measuring IVF success. They didn’t find evidence of correlations with the weather at the time of conception. As they said (conference abstract, since this isn’t published)

When looking for a linear correlation between IVF results and the mean monthly values for the weather, the results were inconsistent.

So, following the ‘try, try again’ strategy, they looked at the weather a month earlier:

However, when the same analysis was repeated with the weather results of 1 month earlier, there was a clear trend towards better IVF outcome with higher temperature, less rain and more sunshine hours. 

It helps, here, to know that “a clear trend” is jargon for “unimpressive statistical evidence, but at least in the direction we wanted”. That’s not the only problem, though. Since these are honest researchers, you find the other big problem in the section of the abstract labelled “limitations”

Because of the retrospective design of the study, further adjusting for possible confounding factors such as age of the woman, type of infertility and indication for IVF is mandatory. 

That is, their analysis lumped together women of different ages, types of infertility, and reasons for using IVF, even though these have a much bigger impact on success than is being claimed for the weather.

I don’t have any problem with these analyses being performed and presented to other consenting scientists who are trying to work out ways to improve IVF.  On the other hand,  I’m pretty sure the Daily Mail didn’t get these results by reading the abstract book or sitting through the conference. Someone made a deliberate decision to get publicity for this research, at this stage, in a form where all the cautionary notes would be lost. 


June 11, 2015

Comparing all the treatments

This story didn’t get into the local media, but I’m writing about it because it illustrates the benefit of new statistical methods, something that’s often not visible to outsiders.

From a University of Otago press release about the work of A/Prof Suetonia Palmer

The University of Otago, Christchurch researcher together with a global team used innovative statistical analysis to compare hundreds of research studies on the effectiveness of blood-pressure-lowering drugs for patients with kidney disease and diabetes. The result: a one-stop-shop, evidence-based guide on which drugs are safe and effective.

They link to the research paper, which has interesting looking graphics like this:


The red circles represent blood-pressure-lowering treatments that have been tested in patients with kidney disease and diabetes, with the lines indicating which comparisons have been done in randomised trials. The circle size shows how many trials have used a drug; the line width shows how many trials have compared a given pair of drugs.

If you want to compare, say, endothelin inhibitors with ACE inhibitors, there aren’t any direct trials. However, there are two trials comparing endothelin inhibitors to placebo, and ten trials comparing placebo to ACE inhibitors. If we estimate the advantage of endothelin inhibitors over placebo and subtract off the advantage of ACE inhibitors over placebo we will get an estimate of the advantage of endothelin inhibitors over ACE inhibitors.

More generally, if you want to compare any two treatments A and B, you look at all the paths in the network between A and B, add up differences along the path to get an estimate of the difference between A and B, then take a suitable weighted average of the estimates along different paths. This statistical technique is called ‘network meta-analysis’.
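As a rough sketch of the follow-the-path idea, here is the arithmetic for one indirect comparison and for combining it with a direct one. The numbers are invented for illustration (think of them as log odds ratios), the function names are hypothetical, and inverse-variance weighting is just one common choice of "suitable weighted average", assumed here rather than taken from the paper. It also assumes the estimates being combined are independent, which won't be true when paths share trials; real analyses typically fit a single model to the whole network.

```python
import math

def indirect_estimate(a_vs_placebo, se_a, b_vs_placebo, se_b):
    """Estimate A vs B by subtracting two placebo comparisons.

    Assumes the two estimates come from separate trials, so their
    variances add.
    """
    difference = a_vs_placebo - b_vs_placebo
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    return difference, se

def pool_inverse_variance(estimates):
    """Combine (estimate, se) pairs, weighting each by 1 / se^2.

    This is an assumed weighting scheme, valid for independent estimates.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

# Invented numbers: treatment A vs placebo, treatment B vs placebo.
indirect = indirect_estimate(-0.10, 0.30, -0.20, 0.10)
# Invented direct A vs B estimate, pooled with the indirect one.
pooled = pool_inverse_variance([(0.0, 0.2), indirect])
print(indirect, pooled)
```

Note that the indirect estimate is less precise than either of its ingredients, because the uncertainties accumulate along the path.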

Two important technical questions remain: what is a suitable weighted average, and how can you tell if these different estimates are consistent with each other? The first question is relatively straightforward (though quite technical). The second question was initially the hard one. It could be, for example, that the trials involving placebo had very different participants from the others, or that old trials had very different participants from recent trials, and their conclusions just could not be usefully combined.

The basic insight for examining consistency is that the same follow-the-path approach could be used to compare a treatment to itself. If you compare placebo to ACE inhibitors, ACE inhibitors to ARB, and ARB to placebo, there’s a path (a loop) that gives an estimate of how much better placebo is than placebo. We know the true difference is zero; we can see how large the estimated difference is.
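The loop check can be sketched in a few lines. Again the numbers are invented and the function name is hypothetical; the sketch assumes each edge of the loop is estimated from separate trials, so the variances add, and it uses a simple z-score as the yardstick, which is an illustrative choice rather than what any particular paper did.

```python
import math

def loop_inconsistency(edges):
    """Add up estimates around a closed loop of treatments.

    edges: list of (estimate, se) pairs, each oriented in the direction
    of travel around the loop. If the trials are consistent, the sum
    should be close to zero relative to its standard error.
    """
    total = sum(est for est, _ in edges)
    se = math.sqrt(sum(s ** 2 for _, s in edges))
    return total, se, total / se

# Invented numbers for placebo -> ACE inhibitor -> ARB -> placebo.
loop = [(0.20, 0.10), (0.05, 0.12), (-0.22, 0.11)]
total, se, z = loop_inconsistency(loop)
print(total, se, z)
```

With these made-up inputs the estimate of "placebo versus placebo" is small compared to its uncertainty, which is what you would hope to see.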

In this analysis, there wasn’t much evidence of inconsistency, and the researchers combined all the trials to get results like this:


The ‘forest plot’ shows how each treatment compares to placebo (vertical line) in terms of preventing death. We can’t be absolutely sure that any of them are better, but it definitely looks as though ACE inhibitors plus calcium-channel blockers or ARBs, and ARBs alone, are better. It could be that aldosterone inhibitors are much better, but also could be that they are worse. This sort of summary is useful as an input to clinical decisions, and also in deciding what research should be prioritised in the future.

I said the analysis illustrated progress in statistical methods. Network meta-analysis isn’t completely new, and its first use was also in studying blood pressure drugs, but in healthy people rather than people with kidney disease. Here are those results


There are different patterns for which drug is best across the different events being studied (heart attack, stroke, death), and the overall patterns are different from those in kidney disease/diabetes. The basic analysis is similar; the improvements since this 2003 paper are more systematic and flexible ways of examining inconsistency, and new displays of the network of treatments.

‘Innovative statistical techniques’ are important, but the key to getting good results here is a mind-boggling amount of actual work. As Dr Palmer put it in a blog interview

Our techniques are still very labour intensive. A new medical question we’re working on involves 20-30 people on an international team, scanning 5000-6000 individual reports of medical trials, finding all the relevant papers, and entering data for about 100-600 reports by hand. We need to build an international partnership to make these kind of studies easier, cheaper, more efficient, and more relevant.

At this point, I should confess the self-promotion aspect of the post.  I invented the term “network meta-analysis” and the idea of using loops in the network to assess inconsistency.  Since then, there have been developments in statistical theory, especially by Guobing Lu and A E Ades in Bristol, who had already been working on other aspects of multiple-treatment analysis. There have also been improvements in usability and standardisation, thanks to Georgia Salanti and others in the Cochrane Collaboration ‘Comparing Multiple Interventions Methods Group’.  In fact, network meta-analysis has grown up and left home to the extent that the original papers often don’t get referenced. And I’m fine with that. It’s how progress works.


Women and dementia risk

A Herald story headlined “Women face greater dementia risk – study” has been nominated for Stat of the Week, I think a bit unfairly. Still, perhaps it’s worth clarifying the points made in the nomination.

People diagnosed with dementia are more likely to be women, and the story mentions three reasons. The first is overwhelmingly the most important from the viewpoint of population statistics: dementia is primarily a disease of old people, the majority of whom are women because women live longer.

In addition, and importantly from the viewpoint of individual health, women are more likely to have diagnosed dementia than men in a given age range:

European research has indicated that although at age 70, the prevalence of dementia is the same for men and women, it rapidly diverges in older age groups. By 85, women had a 40 per cent higher prevalence than men.

There could be many reasons for this. A recent research paper lists possibilities related to sex (differences in brain structure, impact of changes in hormones after menopause) and to gender (among current 85-year-olds, women tend to be less educated and less likely to have had intellectually demanding careers).

The third statistic mentioned in the Stat of the Week nomination was that “Women with Alzheimer’s disease (AD) pathology have a three-fold risk of being diagnosed with AD than men.”  This is from research looking at people’s brains.  Comparing people with similar amounts of apparent damage to their brains, women were more likely to be diagnosed with Alzheimer’s disease.

So, the differences in the summary statistics are because they are making different comparisons.

Statistical analysis of Alzheimer’s disease is complicated because the disease happens in the brain, where you can’t see. Definitive diagnosis and measurement of the biological disease process can only be done at autopsy. Practical clinical diagnosis is variable because dementia is a very late stage in the process, and different people take different amounts of neurological damage to get to that point.


June 8, 2015

Meddling kids confirm mānuka honey isn’t panacea

The Sunday Star-Times has a story about a small, short-term, unpublished randomised trial of mānuka honey for preventing minor illness. There are two reasons this is potentially worth writing about: it was done by primary school kids, and it appears to be the largest controlled trial in humans for prevention of illness.

Here are the results (which I found from the Twitter account of the school’s lab, run by Carole Kenrick, who is named in the story)

The kids didn’t find any benefit of mānuka honey over either ordinary honey or no honey. Realistically, that just means they managed to design and carry out the study well enough to avoid major biases. The reason there aren’t any controlled prevention trials in humans is that there’s no plausible mechanism for mānuka honey to help with anything except wound healing. To its credit, the SST story quotes a mānuka producer saying exactly this:

But Bray advises consumers to “follow the science”.

“The only science that’s viable for mānuka honey is for topical applications – yet it’s all sold and promoted for ingestion.”

You might, at a stretch, say mānuka honey could affect bacteria in the gut, but that’s actually been tested, and any effects are pretty small. Even in wound healing, it’s quite likely that any benefit is due to the honey content rather than the magic of mānuka — and the trials don’t typically have a normal-honey control.

As a primary-school science project, this is very well done. The most obvious procedural weakness is that mānuka honey’s distinctive flavour might well break their attempts to blind the treatment groups. It’s also a bit small, but we need to look more closely to see how that matters.

When you don’t find a difference between groups, it’s crucial to have some idea of what effect sizes have been ruled out. We don’t have the data, but measuring off the graphs and multiplying by 10 weeks and 10 kids per group, the number of person-days of unwellness looks to be in the high 80s. If the reported unwellness is similar for different kids, so that the 700 days for each treatment behave like 700 independent observations, a 95% confidence interval would be 0±2%. At the other extreme, if one kid had 70 days unwell, a second kid had 19, and the other eight had none, the confidence interval would be 0±4.5%.

In other words, the study data are still consistent with mānuka honey preventing about one day a month of feeling “slightly or very unwell”, in a population of Islington primary-school science nerds. At three 5g servings per day that would be about 500g honey for each extra day of slightly improved health, at a cost of $70-$100, so the study basically rules out mānuka honey being cost-effective for preventing minor unwellness in this population. The study is too small to look at benefits or risks for moderate to serious illness, which remain as plausible as they were before. That is, not very.
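The step from confidence interval to honey is just arithmetic, and can be checked in a few lines. This is a rough sketch: the 30-day month is my assumption and the function names are made up for illustration, while the ±2% and ±4.5% half-widths and the three 5g servings a day are the figures used above.

```python
DAYS_PER_MONTH = 30  # assumption: a round-number month

def days_per_month(half_width_percent):
    """Convert a CI half-width, in percent of days, to days per month."""
    return half_width_percent / 100 * DAYS_PER_MONTH

def honey_grams_per_month(servings_per_day=3, grams_per_serving=5):
    """Grams of honey eaten in a month at the stated serving schedule."""
    return servings_per_day * grams_per_serving * DAYS_PER_MONTH

print(days_per_month(2.0))      # narrow interval, in days per month
print(days_per_month(4.5))      # wide interval, in days per month
print(honey_grams_per_month())  # grams of honey per month
```

The two intervals bracket roughly one day a month, and a month of honey at that rate is 450g, which is where "about 500g" per extra day comes from.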

Fortunately for the mānuka honey export industry, their primary market isn’t people who care about empirical evidence.

June 7, 2015

What does 80% accurate mean?

From Stuff (from the Telegraph)

And the scientists claim they do not even need to carry out a physical examination to predict the risk accurately. Instead, people are questioned about their walking speed, financial situation, previous illnesses, marital status and whether they have had previous illnesses.

Participants can calculate their five-year mortality risk as well as their “Ubble age” – the age at which the average mortality risk in the population is most similar to the estimated risk. Ubble stands for “UK Longevity Explorer” and researchers say the test is 80 per cent accurate.

There are two obvious questions based on this quote: what does it mean for the test to be 80 per cent accurate, and how does “Ubble” stand for “UK Longevity Explorer”? The second question is easier: the data underlying the predictions are from the UK Biobank, so presumably “Ubble” comes from “UK Biobank Longevity Explorer.”

An obvious first guess at the accuracy question would be that the test is 80% right in predicting whether or not you will survive 5 years. That doesn’t fly. First, the test gives a percentage, not a yes/no answer. Second, you can do a lot better than 80% in predicting whether someone will survive 5 years or not just by guessing “yes” for everyone.

The 80% figure doesn’t refer to accuracy in predicting death, it refers to discrimination: the ability to get higher predicted risks for people at higher actual risk. Specifically, it claims that if you pick pairs of  UK residents aged 40-70, one of whom dies in the next five years and the other doesn’t, the one who dies will have a higher predicted risk in 80% of pairs.
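To make the distinction concrete, here is a minimal sketch of the pairwise calculation. The predicted risks are invented, the function names are hypothetical, and counting tied pairs as half is a common convention that I'm assuming rather than something stated in the research. The 3-in-100 death rate in the second example is also made up, purely to show the contrast.

```python
def concordance(risks_died, risks_survived):
    """Fraction of (died, survived) pairs where the person who died
    had the higher predicted risk. Ties count as half (an assumption)."""
    pairs = 0
    score = 0.0
    for d in risks_died:
        for s in risks_survived:
            pairs += 1
            if d > s:
                score += 1.0
            elif d == s:
                score += 0.5
    return score / pairs

def always_survive_accuracy(n_died, n_survived):
    """Accuracy of predicting 'survives' for everyone."""
    return n_survived / (n_died + n_survived)

# Invented predicted risks for three people who died and five who didn't.
died = [0.30, 0.08, 0.12]
survived = [0.02, 0.05, 0.10, 0.01, 0.20]
print(concordance(died, survived))

# A constant guess: very 'accurate' if deaths are rare, but it cannot
# rank anyone, so every pair is a tie.
print(always_survive_accuracy(3, 97))
print(concordance([0.5] * 3, [0.5] * 97))
```

The constant guess is right about most people but has no ability to tell high-risk from low-risk people apart, which is why the 80% figure is a statement about ranking rather than about being right.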

So, how does it manage this level of accuracy, and why do simple questions like self-rated health, self-reported walking speed, and car ownership show up instead of weight or cholesterol or blood pressure? Part of the answer is that Ubble is looking only at five-year risk, and only in people under 70. If you’re under 70 and going to die within five years, you’re probably sick already. Asking you about your health or your walking speed turns out to be a good way of finding if you’re sick.

This table from the research paper behind the Ubble shows how well different sorts of information predict.


Age on its own gets you 67% accuracy, and age plus asking about diagnosed serious health conditions (the Charlson score) gets you to 75%. The prediction model does a bit better; presumably it’s better at picking up a chance of undiagnosed disease. The usual things doctors nag you about, apart from smoking, aren’t in there because they usually take longer than five years to kill you.

As an illustration of the importance of age and basic health in the prediction, if you put in data for a 60-year old man living with a partner/wife/husband, who smokes but is healthy apart from high blood pressure, the predicted percentage for dying is 4.1%.

The result comes with this well-designed graphic using counts out of 100 rather than fractions, and illustrating the randomness inherent in the prediction by scattering the four little red people across the panel.


Back to newspaper issues: the Herald also ran a Telegraph story (a rather worse one), but followed it up with a good repost from The Conversation by two of the researchers. None of these stories mentioned that the predictions will be less accurate for New Zealand users. That’s partly because the predictive model is calibrated to life expectancy, general health positivity/negativity, walking speeds, car ownership, and diagnostic patterns in Brits. It’s also because there are three questions on UK government disability support, which in our case we have not got.


June 4, 2015

Round up on the chocolate hoax

Science journalism (or science) has a problem:

Meh. Unimpressed.

Study was unethical


May 30, 2015

Coffee health limit exaggerated

The Herald says

Drinking the caffeine equivalent of more than four espressos a day is harmful to health, especially for minors and pregnant women, the European Union food safety agency has said.

“It is the first time that the risks from caffeine from all dietary sources have been assessed at EU level,” the EFSA said, recommending that an adult’s daily caffeine intake remain below 400mg a day.

Deciding a recommended limit was a request of the European Commission, the EU’s executive body, to try to find a Europe-wide benchmark for caffeine consumption.

But regulators said the most worrying aspect was not the espressos and lattes consumed on cafe terraces across Europe, but Red Bull-style energy drinks, hugely popular with the young.

Contrast that with the Scientific Opinion on the safety of caffeine from the EFSA Panel on Dietetic Products, Nutrition, and Allergies (PDF of the whole thing). First, what they were asked for

the EFSA Panel … was asked to deliver a scientific opinion on the safety of caffeine. Advice should be provided on a daily intake of caffeine, from all sources, that does not give rise to concerns about harmful effects to health for the general population and for specific subgroups of the population. Possible interactions between caffeine and other constituents of so-called “energy drinks”, alcohol, synephrine and physical exercise should also be addressed.

and what they concluded (there’s more than 100 pages extra detail if you want it)

Single doses of caffeine up to 200 mg, corresponding to about 3 mg/kg bw for a 70-kg adult are unlikely to induce clinically relevant changes in blood pressure, myocardial blood flow, hydration status or body temperature, to reduce perceived exertion/effort during exercise or to mask the subjective perception of alcohol intoxication. Daily caffeine intakes from all sources up to 400 mg per day do not raise safety concerns for adults in the general population, except pregnant women. Other common constituents of “energy drinks” (i.e. taurine, D-glucurono-γ-lactone) or alcohol are unlikely to adversely interact with caffeine. The short- and long-term effects of co-consumption of caffeine and synephrine on the cardiovascular system have not been adequately investigated in humans. Daily caffeine intakes from all sources up to 200 mg per day by pregnant women do not raise safety concerns for the fetus. For children and adolescents, the information available is insufficient to base a safe level of caffeine intake. The Panel considers that caffeine intakes of no concern derived for acute consumption in adults (3 mg/kg bw per day) may serve as a basis to derive daily caffeine intakes of no concern for children and adolescents.

Or, in even shorter paraphrase.

<shrugs> If you need a safe level, four cups a day seems pretty harmless in healthy people, and there doesn’t seem to be a special reason to worry about teenagers.




May 28, 2015

Junk food science

In an interesting sting on the world of science journalism, John Bohannon and two colleagues, plus a German medical doctor, ran a small randomised experiment on the effects of chocolate consumption, and found better weight loss in those given chocolate. The experiment was real and the measurements were real, but the medical journal  was the sort that published their paper two weeks after submission, with no changes.

Here’s a dirty little science secret: If you measure a large number of things about a small number of people, you are almost guaranteed to get a “statistically significant” result. Our study included 18 different measurements—weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, etc.—from 15 people. (One subject was dropped.) That study design is a recipe for false positives.

Think of the measurements as lottery tickets. Each one has a small chance of paying off in the form of a “significant” result that we can spin a story around and sell to the media. The more tickets you buy, the more likely you are to win. We didn’t know exactly what would pan out—the headline could have been that chocolate improves sleep or lowers blood pressure—but we knew our chances of getting at least one “statistically significant” result were pretty good.
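The lottery-ticket arithmetic is easy to check. This is a back-of-envelope sketch with a made-up function name, assuming the 18 measurements are independent, that chocolate truly does nothing, and that the usual 5% threshold is used; measurements on the same people are correlated, so the real figure will differ somewhat.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests of a true
    null hypothesis comes out 'significant' at level alpha."""
    return 1 - (1 - alpha) ** n_tests

# 18 measurements, as in the chocolate study.
print(round(chance_of_any_false_positive(18), 3))  # 0.603
```

Under those assumptions, the chance of at least one publishable "finding" is about 60%, even with nothing going on.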

Bohannon and his conspirators were doing this deliberately, but lots of people do it accidentally. Their study was (deliberately) crappier than average, but since the journalists didn’t ask, that didn’t matter. You should go read the whole thing.

Finally, two answers for obvious concerns: first, the participants were told the research was for a documentary on dieting, not that it was in any sense real scientific research. Second: no, neither Stuff nor the Herald fell for it.

 [Update: Although there was participant consent, there wasn’t ethics committee review. An ethics committee probably wouldn’t have allowed it. Hilda Bastian on Twitter]

May 25, 2015

Genetic determinism: infidelity edition

New York Times columnist Richard Friedman is writing about hormones, genetics, and infidelity.  This paragraph is about recently-published research by Brendan Zietsch and colleagues (the NYT tries to link, but the URL is wrong)

His study, published last year in Evolution and Human Behavior, found a significant association between five different variants of the vasopressin gene and infidelity in women only and no relationship between the oxytocin genes and sexual behavior for either sex. That was impressive: Forty percent of the variation in promiscuous behavior in women could be attributed to genes.

If you didn’t read carefully you might think this was a claim that the  vasopressin gene association explained the “Forty percent” and that the percentage was lower in men. In fact, the vasopressin gene associations are rather weaker than that, and the variation attributed by the researchers to genes is 62% in men.

But it gets worse. The correlation with genetics was only seen in identical twins. That is, pairs of identical twins had fairly similar cheating behaviour, but there was no similarity at all between pairs of non-identical twins (of any gender combination) or between non-twin siblings. If that’s not due to chance (which it could be), it’s very surprising. It doesn’t rule out a genetic explanation, but it means the genetics would have to be weird. You’d need either a variant that had opposite effects with one versus two copies, or a lot of variants that only had effects with two copies and no effect with one, or an effect that switched on only when you had variant copies of multiple genes, or an effect driven by new mutations not inherited from parents. The results for the vasopressin gene don’t have this kind of weird.

The story is all “yes, it’s surprising that you’d get this sort of effect in a complex social behaviour, but genetics! And voles!”. I’ll give him the voles, but if anything, the strong correlation between identical twins (only) argues against vasopressin gene variants being a major driver in humans, and the research paper is much more cautious on this point.



May 20, 2015

Actually it’s about neuroscience in videogame journalism

Q: Why does Stuff think playing ‘Call of Duty’ increases the risk of Alzheimer’s disease? I didn’t think old people played violent video games much.

A: I don’t think Stuff does think anything in particular about it. They just reprinted that from the Telegraph and trimmed out the casual sexism.

Q: Ok, why does the Telegraph think playing ‘Call of Duty’ increases the risk of Alzheimer’s disease? I didn’t think old people played violent video games much.

A: It’s not so much about the games they play now as the ones they played 60 years earlier

Q: What video games did they play 60 years ago?

A: Not current people with Alzheimer’s; gamers now who might get Alzheimer’s in a few decades.

Q: Ok, so why will that happen?

A: Because video game players use response learning strategies for navigation in video mazes

Q: Why is that bad?

A: Because other research found people who used those strategies had more activity in their caudate nucleus.

Q: Is this going to start making sense soon?

A: Yes. Sorry. The research found that habitual game players navigating a virtual-reality maze were more likely than normal people to use strategies that had previously been correlated with less activity in a part of the brain involved in memory and spatial awareness. They apparently used different strategies involving other parts of the brain.

Q: And why is this a problem?

A: Because that part of the brain, the hippocampus, is less active in people with Alzheimer’s, as well as some other neurological and psychological disorders

Q: While they’re playing video games?

A: No, all the time.

Q: Couldn’t it just be that the video gamers have developed a more efficient strategy and that their hippocampuses are perfectly fine? Or hippocampi, whatever.

A: Yes, that could also be the case.

Q: I mean, if you saw people twitching their thumbs rapidly playing a video game it would be fine, but if they were just doing that while sitting around at meetings you’d worry a bit.

A: Indeed.

Q: Did the research look at memory or cognition at all?

A: No.

Q: Do they even know that these brain differences happened after playing video games? Could it be that people who don’t use that part of the brain for video navigation are just better at games?

A: It could be, yes.

Q: The story quotes the percentages using the parts of their brain to four significant digits. Does that mean there were tens of thousands of people in the research?

A: No.

Q: How many?

A: 59: about 30 in each group

Q: If this was true, could it explain why dementia is increasingly common?

A: No.

Q: Why not?

A: Partly because it’s too soon, and partly because dementia isn’t increasingly common at a given age, at least in the US and Europe. If anything, it’s less common. There are more cases now because there are more old people.

Q: It sounds like more research might be needed before writing international headlines about the risk of a terrifying disease.

A: You think?