Waiting for the details
In August, the Guardian and the BBC reported a successful clinical trial of an “AI stethoscope”. I’ll note at the start that this isn’t ChatGPT; it’s the older deep-neural-network style of AI, which works more predictably and consistently.
The BBC said
A British team conducted a study using a modern version and say they found it can spot heart failure, heart valve disease and abnormal heart rhythms almost instantly.
Those examined using the new tool were twice as likely to be diagnosed with heart failure, compared with similar patients who were not examined using the technology.
Patients were three times more likely to be diagnosed with atrial fibrillation – an abnormal heart rhythm that can increase the risk of having a stroke. They were almost twice as likely to be diagnosed with heart valve disease, which is where one or more heart valves do not work properly.
These reports were based on a presentation at a scientific conference, the European Society of Cardiology meeting. Yes, the trial was called TRICORDER. “A good name is better than precious ointment”, as the Bible tells us.
There’s some reason to think the claims are plausible. The most important of the abnormal heart rhythms is very obvious just by taking a pulse, and doctors listen for particular heart sounds as evidence of heart failure. So, it could be true. The “AI stethoscope” would still have to be better than a stethoscope together with natural intelligence, but that’s why you do the trial. The trial randomly allocated half of the GP practices to use an AI stethoscope and the other half to business as usual.
Now we have the full research paper, published in the Lancet. The abstract says
Intention-to-treat analysis found heart failure detection did not differ between groups (IRR 0·94 [95% CI 0·86–1·02]); with no difference in community-based or hospital-based diagnoses (p>0·05).
That is, there isn’t good evidence of a benefit from your doctor having an “AI stethoscope” and at least for heart failure detection there’s evidence against a meaningful benefit: the uncertainty interval tops out at a 2% increase.
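For intuition about what that interval means, here’s a rough sketch of how an incidence-rate-ratio confidence interval is typically computed. The case counts below are invented, chosen only so the numbers land near the published 0·94 (0·86–1·02); the trial’s actual counts are different.

```python
import math

# Illustrative only: hypothetical case counts, picked so the result lands
# near the published IRR of 0.94 (95% CI 0.86-1.02). Not the trial's data.
cases_intervention = 1030   # heart-failure diagnoses, AI-stethoscope arm
cases_control = 1096        # heart-failure diagnoses, usual-care arm

# Assume equal person-time in each arm, so the IRR is just the case ratio.
irr = cases_intervention / cases_control

# Standard Wald interval on the log scale: SE(log IRR) ~ sqrt(1/a + 1/b)
se = math.sqrt(1 / cases_intervention + 1 / cases_control)
lo = math.exp(math.log(irr) - 1.96 * se)
hi = math.exp(math.log(irr) + 1.96 * se)
print(f"IRR {irr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The upper limit of 1.02 is why “tops out at a 2% increase” is the right reading: even the most optimistic end of the interval is a trivially small improvement in detection.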
It’s worth emphasising that the trial had already finished by last August; all that differs is the analysis. The analysis in the conference presentation was the “per protocol” comparison: comparing people who got an AI stethoscope examination with a curated set of controls who got at least one face-to-face consultation (but not necessarily an examination with an ordinary stethoscope). In fact, the researchers couldn’t get enough information for data linkage on all the people who got an AI stethoscope examination, so the analysis only used half of them. The published paper also reports this analysis
Use declined over time, with clinicians citing workflow barriers to sustained use. In per-protocol analyses, adjusting for patient exposure to the AI-stethoscope, detection of heart failure (IRR 2·33 [95% CI 1·28–4·26]), atrial fibrillation (IRR 3·45 [2·24–5·32]), and VHD (IRR 1·92 [1·09–3·40]) was significantly increased in the intervention group.
So, doctors didn’t use the “AI stethoscope” much, but if you compare half of the people who did get AI stethoscoped with the same number of apparently similar people in the control group of the trial, you find big differences. This difference could be a real benefit of the new device, but we no longer really have randomised evidence on that question; we’re relying on how similar the researchers could make the treatment and control groups.
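A toy simulation (every number in it is invented) shows how this kind of selection can manufacture an apparent benefit. Suppose the device does nothing at all, but clinicians reach for it more often when they already suspect heart failure: a per-protocol comparison of “used” versus “not used” then shows an inflated rate ratio with no real effect behind it.

```python
import random

random.seed(1)

# Toy model, all numbers invented: the device has NO effect on diagnosis,
# but clinicians use it more often on patients they already suspect.
n = 100_000
used, diagnosed_used = 0, 0
unused, diagnosed_unused = 0, 0

for _ in range(n):
    suspicion = random.random()                          # clinician's prior suspicion
    has_hf = random.random() < 0.05 + 0.10 * suspicion   # disease tracks suspicion
    uses_device = random.random() < suspicion            # device used when suspicious
    diagnosed = has_hf                                   # same accuracy either way
    if uses_device:
        used += 1
        diagnosed_used += diagnosed
    else:
        unused += 1
        diagnosed_unused += diagnosed

rate_used = diagnosed_used / used
rate_unused = diagnosed_unused / unused
irr = rate_used / rate_unused
print(f"diagnosis rate, device used:     {rate_used:.3f}")
print(f"diagnosis rate, device not used: {rate_unused:.3f}")
print(f"apparent IRR: {irr:.2f}")   # well above 1 despite zero true effect
```

This isn’t a claim that the per-protocol IRRs of 2–3 are entirely selection; it just illustrates why the randomised intention-to-treat comparison is the one that settles the question.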
It’s still worth publishing the data, and the researchers and the Lancet get credit for putting the randomised-trial analysis first in the research paper. The Lancet doesn’t really get much credit for posting the results as “AI-enabled stethoscopes show promise for improving diagnosis of cardiovascular conditions, UK trial finds”.