AI is advancing at an unprecedented pace in healthcare, but experts warn that innovation is outstripping safeguards. Researchers from Flinders University argue that while AI systems are becoming increasingly capable, their real-world safety and reliability remain far from guaranteed.
In a commentary, the team highlights a growing gap between technological progress and the frameworks needed to safely implement AI in clinical practice. According to the authors, strong performance in controlled studies does not automatically translate into safe or effective use in hospitals and clinics. Despite these concerns, the researchers acknowledge the significant potential of AI to support healthcare professionals, particularly in high-pressure environments where efficiency and decision support are critical.
Clinical reasoning
Recent developments in AI have shifted the field beyond simple question-and-answer systems. New models are increasingly capable of reasoning through clinical scenarios step by step, mimicking aspects of human diagnostic thinking. According to Erik Cornelisse, Ph.D. candidate at Flinders University, this represents a major turning point. “We are moving from basic tools toward systems that can demonstrate forms of clinical reasoning, at least in text-based scenarios,” he explains.
Some of these systems have shown performance levels comparable to, or even exceeding, those of experienced clinicians when tested on structured diagnostic cases. This has fueled optimism about their potential role in supporting medical decision-making. However, the researchers emphasize that such results are largely based on controlled environments and do not reflect the full complexity of real-world healthcare.
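To make that gap concrete: controlled evaluations of this kind typically come down to scoring a model's answers against reference diagnoses on a fixed set of written vignettes. The snippet below is a hypothetical illustration of such a benchmark loop, not the protocol of the studies the commentary discusses; the model_diagnose function and the example cases are invented placeholders.

```python
# Hypothetical sketch of a controlled diagnostic benchmark: score a
# model's free-text answers against reference diagnoses on a fixed
# set of written vignettes. Model call and cases are placeholders.

def model_diagnose(vignette: str) -> str:
    """Stand-in for a call to a diagnostic AI system."""
    return "pneumonia"  # placeholder answer

cases = [
    {"vignette": "Fever, productive cough, focal crackles on exam...",
     "answer": "pneumonia"},
    {"vignette": "Crushing chest pain radiating to the left arm...",
     "answer": "myocardial infarction"},
]

correct = sum(
    model_diagnose(c["vignette"]).strip().lower() == c["answer"]
    for c in cases
)
print(f"Benchmark accuracy: {correct / len(cases):.0%}")
# A high score here reflects performance on curated text cases only,
# not physical examination, patient interaction, or clinical context.
```

Even a perfect score in such a loop says nothing about the bedside elements the researchers point to below.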
The limits of AI in clinical practice
Healthcare decision-making involves far more than interpreting data or solving theoretical cases. It requires physical examination, patient interaction, contextual understanding, and ethical responsibility, none of which current AI systems can replicate. “Healthcare decisions are complex, high-stakes, and deeply human,” says Cornelisse. “Accuracy alone, particularly in text-based tasks, is not enough to ensure patient safety.”
Senior author Ash Hopkins adds that modern medicine depends on judgment, accountability, and professional oversight. While AI can assist in analyzing information, it cannot replace the broader clinical role of healthcare professionals. This gap highlights the risk of overestimating the readiness of AI tools for routine use. Without proper validation and integration, even highly accurate systems may lead to unintended consequences.
Risks of premature deployment
The commentary also points to known risks associated with poorly evaluated AI systems. These include algorithmic bias, unequal access to care, and potential harm to patients. If AI models are trained on incomplete or unrepresentative datasets, they may reinforce existing disparities rather than reduce them.
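One common way such disparities are surfaced is a subgroup audit: computing the same performance metric separately for each patient group and comparing the results. The sketch below illustrates the idea only; the groups, labels, and predictions are invented for illustration and do not come from the commentary.

```python
# Hypothetical subgroup audit: compute accuracy per patient group
# to surface performance gaps that an overall metric can hide.
# Groups, labels, and predictions are illustrative placeholders.
from collections import defaultdict

records = [  # (group, true_label, predicted_label)
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 1, 1),
]

hits, totals = defaultdict(int), defaultdict(int)
for group, truth, pred in records:
    totals[group] += 1
    hits[group] += int(truth == pred)

for group in sorted(totals):
    print(f"{group}: accuracy {hits[group] / totals[group]:.0%}")
# Overall accuracy looks respectable (75% here), yet group_b fares
# markedly worse: the kind of disparity that unrepresentative
# training data can create.
```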
“History shows that algorithms can amplify problems as easily as they solve them,” Cornelisse warns. This underscores the need for rigorous testing and continuous monitoring before and during deployment. The researchers stress that healthcare systems must avoid adopting AI simply because it performs well in experimental settings. Instead, real-world outcomes, such as improved patient health and safety, should be the primary benchmark.
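The continuous monitoring the researchers describe can, in its simplest form, mean tracking a live performance metric over a rolling window of confirmed cases and raising a flag when it falls below the level seen before deployment. The following is a minimal sketch under that assumption; the baseline, threshold, and window size are invented for illustration.

```python
# Minimal sketch of post-deployment monitoring: track accuracy over
# a rolling window of confirmed cases and flag degradation against
# a pre-deployment baseline. All numbers here are illustrative.
import random
from collections import deque

BASELINE_ACCURACY = 0.90   # accuracy observed in validation (assumed)
ALERT_THRESHOLD = 0.80     # flag if live accuracy falls below this
WINDOW_SIZE = 100          # number of recent cases per check

window = deque(maxlen=WINDOW_SIZE)

def record_outcome(prediction_correct: bool) -> None:
    """Log one confirmed case outcome and check for performance drift."""
    window.append(prediction_correct)
    if len(window) == WINDOW_SIZE:
        live_accuracy = sum(window) / WINDOW_SIZE
        if live_accuracy < ALERT_THRESHOLD:
            print(f"ALERT: rolling accuracy {live_accuracy:.0%} is below "
                  f"{ALERT_THRESHOLD:.0%} (validation: {BASELINE_ACCURACY:.0%})")

# Simulate a deployment where real-world performance has slipped.
random.seed(0)
for _ in range(200):
    record_outcome(random.random() < 0.75)  # ~75% correct in the wild
```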
Responsible and regulated AI use
Looking ahead, the Flinders team calls for stronger governance and clearer standards for medical AI. They argue that AI systems should be held to standards comparable to those applied to healthcare professionals. “We do not allow doctors to practice without supervision and evaluation,” says Cornelisse. “AI should meet similar expectations.” The commentary was recently published in Science.
Efforts are already underway among policymakers, researchers, and industry stakeholders to define legal, ethical, and professional responsibilities for AI in healthcare. However, the researchers stress that careful, step-by-step integration into clinical workflows is essential. According to Hopkins, the ultimate goal is not to deploy AI quickly, but to deploy it responsibly. “Patients deserve technology that improves care in real-world settings, not just systems that perform well in studies,” he says.
If implemented thoughtfully, AI has the potential to enhance clinical decision-making, improve efficiency, and support more equitable care. But achieving this will require a balance between innovation and oversight. As healthcare systems continue to explore the possibilities of artificial intelligence, the message from Flinders University is clear: progress must be matched with responsibility. Only through careful design, rigorous evaluation, and strong governance can AI become a truly safe and effective tool in modern medicine.
Deployment of AI in healthcare
Last year, TU Delft was asked to advise the WHO on the problem that only about 2% of AI innovations in healthcare are actually put into use. Barriers include poor alignment with clinical practice, ethical concerns, and algorithmic bias. Together with partners such as Erasmus MC and SAS, TU Delft is developing frameworks for responsible AI use in healthcare. These frameworks are tested in clinical practice, for example in projects that use AI to determine the safe timing of hospital discharge.