Medical chatbots: the questions or the answers
A story in the NYTimes and also an unpaywalled story at 404 Media report a study of chatbots for medical advice, saying they are Bad and Not Good.
The research study is published in Nature Medicine. It’s a randomised controlled experiment, where people pretending to be patients were given a set of symptoms and some background health and lifestyle information. These people were randomly assigned either to talk to one of three large language models or to use whatever information sources they would normally use at home for a health problem.
The three chatbots were chosen because they were able to recognise the medical situation in nearly every case and typically give appropriate advice when directly given the same information that the pretend patients had. When used in chat by non-medical people, though, the bots did much less well. One highlighted example was a scenario of a severe, sudden-onset headache, with sensitivity to light and a stiff neck. In this scenario, the sudden onset and the stiff neck are both signs of a very serious event — the scenario was based on subarachnoid haemorrhage, a type of stroke. One pretend patient emphasised the suddenness of the headache and got the correct advice (Ambulance! Now!), another didn’t mention the onset and got advice for a migraine or a tension headache (“lie down in a dark room”). The bots weren’t any worse than unaided lay people, but they weren’t any better either.
You might think it’s a bit unfair to the chatbot that it wasn’t given all the information, but an important part of the training of doctors (as with statisticians and lawyers and plumbers) is learning what questions to ask when dealing with a non-specialist member of the public. Even if you think there’s a barrier in principle to statistical algorithms making great art, there’s obviously no barrier in principle to statistical algorithms learning to take adequate medical histories. They aren’t there yet.
Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.