DeepRare AI outperforms physicians in rare disease diagnosis

An advanced AI system named DeepRare has demonstrated superior diagnostic performance compared to experienced physicians in identifying rare diseases, according to recently published research. The findings mark a significant step toward AI-supported clinical decision-making in complex diagnostic pathways.

Rare diseases affect an estimated 300 million people worldwide. Due to their low prevalence and overlapping symptom profiles, diagnosis often takes five years or longer. Patients frequently experience repeated referrals, misdiagnoses and unnecessary interventions before receiving a definitive answer.

Multi-agent AI architecture

Unlike conventional AI systems that rely on a single model to solve a problem, DeepRare is built as a multi-agent framework. The platform integrates 40 specialised digital tools capable of analysing genomic data, structured and unstructured medical records, clinical notes (including handwritten documentation) and curated medical databases.

At the centre of the system is a coordinating AI “host” that orchestrates collaboration between these tools. This agentic architecture enables DeepRare to synthesise diverse data sources into a coherent diagnostic hypothesis. The research was recently published in Nature.
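The host-and-tools pattern described above can be illustrated with a minimal sketch. It is not DeepRare's implementation: the `Host` class, the two toy tools, the lookup tables and the additive scoring rule are all assumptions made for illustration, and the disease weights are invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    """One piece of evidence a tool contributes for a candidate diagnosis."""
    source: str
    candidate: str
    weight: float


# A tool takes a case record and returns zero or more findings.
Tool = Callable[[dict], List[Finding]]

# Toy lookup tables (illustrative weights, not clinical knowledge bases).
SYMPTOM_HINTS = {
    "lens dislocation": [("Marfan syndrome", 1.0), ("Homocystinuria", 1.0)],
    "tall stature": [("Marfan syndrome", 0.5), ("Homocystinuria", 0.5)],
}
GENE_HINTS = {"FBN1": [("Marfan syndrome", 2.0)]}


def phenotype_tool(case: dict) -> List[Finding]:
    """Hypothetical tool: maps reported symptoms to candidate diseases."""
    return [
        Finding("phenotype", disease, weight)
        for symptom in case.get("symptoms", [])
        for disease, weight in SYMPTOM_HINTS.get(symptom, [])
    ]


def genomic_tool(case: dict) -> List[Finding]:
    """Hypothetical tool: maps genes carrying variants to candidate diseases."""
    return [
        Finding("genomic", disease, weight)
        for gene in case.get("variant_genes", [])
        for disease, weight in GENE_HINTS.get(gene, [])
    ]


class Host:
    """Coordinator that calls every tool and merges the evidence into a ranking."""

    def __init__(self, tools: Dict[str, Tool]) -> None:
        self.tools = tools

    def diagnose(self, case: dict, top_k: int = 3) -> List[str]:
        scores: Dict[str, float] = {}
        for tool in self.tools.values():
            for finding in tool(case):
                scores[finding.candidate] = scores.get(finding.candidate, 0.0) + finding.weight
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [candidate for candidate, _ in ranked[:top_k]]


host = Host({"phenotype": phenotype_tool, "genomic": genomic_tool})
case = {"symptoms": ["tall stature", "lens dislocation"], "variant_genes": ["FBN1"]}
ranking = host.diagnose(case)
print(ranking)
```

In this toy case the genomic evidence tips the ranking toward one candidate that the symptoms alone could not separate from another. A real system would replace the lookup tables with model calls and database queries, and the simple additive score with the host's own reasoning.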

Large-scale validation

The system was first evaluated retrospectively using 6,401 clinical cases with confirmed diagnoses. Researchers provided DeepRare with the same symptom descriptions and DNA data originally available to clinicians. The AI not only identified the correct condition earlier in the diagnostic pathway but also outperformed 15 established diagnostic support tools.

A more stringent head-to-head evaluation followed, involving 163 complex cases. Five physicians, each with more than ten years of clinical experience, were given the same clinical information as the AI system. DeepRare achieved a first-attempt diagnostic accuracy of 64.4%, compared to 54.6% among the physicians. According to the research team, “DeepRare is one of the first computational models to surpass the diagnostic performance of expert physicians in the complex task of rare-disease phenotyping and diagnosis.”

Transparent reasoning and clinical alignment

Even when the correct diagnosis was not ranked first, DeepRare demonstrated a high Recall@3 score, meaning the correct condition was almost always included among its top three suggestions. Importantly, ten rare disease specialists reviewed the AI’s step-by-step reasoning and agreed with its logic in 95.4% of cases.
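For readers unfamiliar with the metric, Recall@k is the share of cases in which the confirmed diagnosis appears among the system's top k ranked suggestions. The sketch below shows the standard calculation on four invented cases. The function name `recall_at_k` and the example rankings are illustrative assumptions, not data from the study.

```python
from typing import Sequence


def recall_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int = 3) -> float:
    """Fraction of cases whose confirmed diagnosis is within the top-k suggestions."""
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not truths:
        raise ValueError("at least one case is required")
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


# Invented example: ranked suggestions per case, best guess first.
rankings = [
    ["Fabry disease", "Gaucher disease", "Pompe disease"],
    ["Marfan syndrome", "Ehlers-Danlos syndrome", "Loeys-Dietz syndrome"],
    ["Wilson disease", "Hemochromatosis", "Alpha-1 antitrypsin deficiency", "Gaucher disease"],
    ["Pompe disease", "Fabry disease", "Gaucher disease"],
]
# Confirmed diagnosis for each case, in the same order.
truths = ["Fabry disease", "Loeys-Dietz syndrome", "Gaucher disease", "Marfan syndrome"]

print(recall_at_k(rankings, truths, k=1))  # 0.25: only the first case is right at rank 1
print(recall_at_k(rankings, truths, k=3))  # 0.5: the second case is found at rank 3
```

Recall@1 corresponds to first-attempt accuracy, so Recall@3 can never be lower than it. The gap between the two shows how often the correct answer sits just below the top suggestion.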

The researchers emphasise that the system is not intended to replace clinicians, but to augment diagnostic workflows. In their words: “Our work not only advances rare disease diagnosis but also demonstrates how the latest powerful large-language-model-driven agentic systems can reshape current clinical workflows.” If validated in prospective clinical settings, agent-based AI platforms such as DeepRare could help shorten the diagnostic odyssey for rare disease patients, improving outcomes while reducing healthcare burden.

AI-driven diagnosis

Last year, an international research team developed popEVE, an AI system that can identify harmful DNA mutations, even if they have never been previously documented. Designed to support rare disease diagnosis, the model helps clinicians prioritise the most likely pathogenic variants, potentially shortening the long diagnostic journey faced by many patients. The study, published in Nature Genetics, was led by researchers from Harvard Medical School and the Centre for Genomic Regulation in Barcelona.

Unlike traditional tools, popEVE draws on billions of years of evolutionary data, analysing protein sequences across hundreds of thousands of species to determine which genetic regions are essential for life. Combined with large-scale population datasets such as gnomAD and the UK Biobank, the system ranks mutation severity across all genes while reducing ancestry-related bias. Tested on 31,000 families, popEVE identified the correct pathogenic mutation in 98% of cases and uncovered 123 previously unknown disease genes, supporting faster and fairer genomic diagnostics.

AI Diagnostic Orchestrator

In June 2025, Microsoft introduced the AI Diagnostic Orchestrator (MAI-DxO), a decision-support system designed to simulate the reasoning of a multidisciplinary medical team. In a test involving 304 complex cases from the New England Journal of Medicine, the tool achieved an accuracy of more than 85 percent. That was more than four times the 20 percent average scored by 21 experienced physicians under controlled conditions.

MAI-DxO uses a “chain-of-debate” reasoning approach, combining multiple AI models, including GPT, Claude, Gemini and Llama, into a coordinated agent system. The platform actively gathers information, suggests targeted tests and refines hypotheses, while also promising greater transparency and a potential 20 percent reduction in diagnostic costs. Microsoft emphasised that the tool is intended to support, not replace, clinicians. Experts call the results promising but urge caution, noting the controlled test environment. Further clinical validation and regulatory approval are required before broader implementation in healthcare settings.