AI performs well on facts, but fails when it comes to symptoms

Wed 8 October 2025

From “Dr. Google” to AI chatbots: the internet is full of health advice. However, experts emphasise that medical diagnosis and treatment should remain in the hands of professionals. A recent study investigated how well ChatGPT actually performs in a medical context. The outcome is clear: ChatGPT cannot (yet) replace a doctor.

The research team tested ChatGPT on its ability to recognise medical terms, medicines and gene information. The AI scored impressively high, with an accuracy of between 88 and 97 per cent. But as soon as the questions became more complex and concerned symptoms or diagnosis, ChatGPT proved to be less reliable. The research was led by Ahmed Abdeen Hamed of Binghamton University and published in iScience.

‘The AI recognised known diseases effortlessly,’ said Hamed. ‘But as soon as people used vague or everyday language to describe their symptoms, the system lost its grip. It couldn't properly link symptoms to possible causes.’

ChatGPT is “too friendly”

An important explanation lies in the nature of the language ChatGPT uses. The model is trained to communicate in a friendly and sociable manner with laypeople, not to reason strictly in medical terms. As a result, it simplifies medical concepts, which is pleasant in conversations but risky when it comes to health questions.

Even more worrying is that ChatGPT, unlike doctors, does not indicate when it does not know something. ‘The chatbot presents answers with the same confidence, whether they are correct or incorrect,’ warns Dr Andrew Thornton of Wellstar Hospital in Atlanta. ‘That can be dangerous because users think they are getting reliable information, when that is not always the case.’

Digital self-care is a useful tool, not a replacement

It is no surprise that many people turn to AI for medical advice. According to figures from the Pew Research Center, 34% of Americans have already used ChatGPT, twice as many as in 2023. Even more striking: one in six (17%) use AI chatbots for health information on a monthly basis, especially young people under the age of 30.

Thornton also sees this trend in his consulting room. ‘Patients now openly admit that they have looked up their symptoms online. Ten years ago, they were ashamed of doing so. Now it's the norm.’ Nevertheless, he emphasises that online information should only be supplementary. ‘AI can provide context about diseases or medications, but it cannot determine what someone actually has. That remains the domain of a doctor.’

AI in healthcare certainly has potential

Although ChatGPT is not suitable for diagnosis or triage, researchers acknowledge that AI can (and will) play a significant role in the future of healthcare. In the right context, for example as support for medical decision-making or for patient information, AI can improve efficiency and accessibility.

For the time being, however, one piece of advice remains: use AI wisely. Consult a doctor in case of serious complaints or emergencies. ‘If you have chest pain, you should not consult a chatbot. You should call 911!’ says Thornton. AI can be valuable as a source of information, but for now it remains only a tool. A diagnosis should still be made from person to person.

This study is certainly not the first of its kind to examine the actual performance and drawbacks of AI in healthcare, as a counterweight to all the glowing stories about its added value. In issue 4 of our magazine, internist-intensivist and clinical pharmacologist Jessica Workum advocated for the responsible and careful use of AI solutions in healthcare. ‘The most important thing is to learn from each other, not to reinvent the wheel, to share knowledge and to find the balance between sufficient speed of innovation and responsible use of generative AI.’

During the ICT&health Congress 2026 in Maastricht, this will be precisely one of the prime topics of discussion: what works, how does it work, and where do the limits of AI in healthcare lie?