AI is increasingly used in digital pathology to predict cancer biology directly from microscope images. These systems promise faster diagnoses and lower costs by identifying biomarkers without additional laboratory tests. However, new research from the University of Warwick suggests that many of these models may rely on statistical shortcuts rather than genuine biological signals.
In the study, researchers analysed more than 8,000 patient samples across four cancer types: breast, colorectal, lung and endometrial cancer. The team compared several leading machine learning approaches designed to predict tumour biomarkers from histopathology images.
Not necessarily true biological insight
While many models reported high accuracy, the researchers found that these results often depended on correlations rather than direct biological signals. "It's a bit like judging a restaurant's quality by the queue of people waiting to get in: it's a useful shortcut, but it's not a direct measure of what's happening in the kitchen," says Fayyaz Minhas, associate professor and principal investigator of the Predictive Systems in Biomedicine (PRISM) Lab.
"Many AI pathology models are doing the same thing, relying on correlations between biomarkers or on obvious tissue features, rather than isolating biomarker-specific signals. And when conditions change, these shortcuts often fall apart." The findings of the study were published in Nature Biomedical Engineering.
Learning correlations instead of causality
The study illustrates how these shortcuts emerge in practice. For example, instead of detecting mutations in the BRAF gene directly from tissue images, a model might learn that BRAF mutations often occur alongside another feature such as microsatellite instability (MSI). The algorithm then predicts BRAF status based on that correlation rather than the biological signal itself.
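To see how such a shortcut can look accurate, consider a minimal synthetic sketch. This is not the study's data or models: the cohort size and the BRAF/MSI co-occurrence rates below are hypothetical assumptions chosen only to illustrate the mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # hypothetical cohort size

# Synthetic cohort: MSI status, and BRAF mutations that co-occur with MSI.
# Assumed rates: 70% BRAF-mutant among MSI-positive, 10% among MSI-negative.
msi = rng.random(n) < 0.5
braf = rng.random(n) < np.where(msi, 0.7, 0.1)

# A "shortcut" model: predict BRAF status directly from the correlated MSI
# signal, using no BRAF-specific information at all
pred = msi

accuracy = (pred == braf).mean()
print(f"overall accuracy via the MSI shortcut: {accuracy:.2f}")
```

Under these assumed rates the shortcut scores close to 80 percent overall despite never seeing a BRAF-specific signal, which is exactly why headline accuracy alone cannot distinguish correlation from biology.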
According to Kim Branson, SVP Global Head of Artificial Intelligence and Machine Learning at GSK and co-author of the study, this can lead to misleading conclusions. "We've found that predicting a BRAF mutation by looking at correlated features like MSI is often like predicting rain by looking at umbrellas. It works, but it doesn't mean you understand meteorology."

"Crucially, if a model cannot demonstrate information gain above a simple pathologist-assigned grade, we haven't advanced the field; we've just automated a shortcut."
Performance drops when shortcuts disappear
When researchers evaluated the models within carefully defined patient subgroups, such as only high-grade breast cancers or only MSI-positive tumours, accuracy dropped substantially. This suggests that many algorithms rely heavily on confounding signals that disappear when the data is stratified.
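The effect of stratification can be sketched by extending the same hypothetical MSI/BRAF example: once evaluation is restricted to a subgroup where the confounder is constant, the shortcut carries no information beyond a trivial majority-class guess. The rates below are illustrative assumptions, not figures from the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000  # hypothetical cohort size

# Same synthetic setup: BRAF mutations co-occur with MSI (assumed rates)
msi = rng.random(n) < 0.5
braf = rng.random(n) < np.where(msi, 0.7, 0.1)
pred = msi  # shortcut model: predict BRAF from MSI

overall = (pred == braf).mean()

# Stratified evaluation: restrict to the MSI-positive subgroup, where the
# confounding signal is constant and can no longer separate the classes
sub = msi
subgroup_acc = (pred[sub] == braf[sub]).mean()

# Within the subgroup the shortcut's accuracy collapses to the trivial
# majority-class baseline, i.e. zero information gain
baseline = max(braf[sub].mean(), 1 - braf[sub].mean())

print(f"overall: {overall:.2f}")
print(f"MSI-positive subgroup: {subgroup_acc:.2f} (baseline {baseline:.2f})")
```

In this sketch the subgroup accuracy exactly matches the majority-class baseline, mirroring the study's finding that stratified testing exposes models whose performance rests on confounding signals.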
For some tasks, the advantage of deep learning over traditional clinical assessment proved modest. AI models achieved prediction accuracies slightly above 80 percent for certain biomarkers, compared with approximately 75 percent when using tumour grade alone, a measure routinely assessed by pathologists.
Stricter evaluation standards needed
Nasir Rajpoot, director of the Tissue Image Analytics (TIA) Center at the University of Warwick and CEO of the spin-out company Histofy, says the results highlight the importance of rigorous validation. "This study highlights a critical point about the rollout of AI in medicine: to deliver real and lasting impact, the value of AI-based clinically important predictions must be judged through rigorous, bias-aware evaluation, rather than relying solely on headline accuracies that fail to account for confounding effects."
The researchers argue that future pathology AI systems must go beyond correlation-based learning and explicitly model biological relationships and causal mechanisms. They also call for stronger evaluation protocols, including subgroup testing and comparisons with simple clinical baselines before systems are introduced into routine care.
AI remains valuable
Despite the limitations identified in the study, the researchers emphasise that machine learning can still play an important role in biomedical research, drug development and clinical decision support. According to Minhas, the findings should be seen as a call for more robust development practices rather than a rejection of AI in pathology.
"This research is not a condemnation of AI in pathology. It is a wake-up call. Current models may perform well in controlled settings but rely on statistical shortcuts rather than genuine biological understanding. Until more robust evaluation standards are in place, these tools should not be seen as replacements for molecular testing, and it is essential that clinicians and researchers understand their limitations and use them with appropriate caution."
Clinical experts also stress that innovation must remain aligned with patient benefit. Sabine Tejpar, head of Digestive Oncology at KU Leuven, notes that technological advances should be carefully assessed before widespread adoption. "Clinical relevance of novel tools requires grounded tailoring to what is precise, correct and feasible for the individual patient. Too often, oncology is swept up by 'innovation' with limited or no impact on patient care."
As AI continues to expand within digital pathology, from accelerating cancer tissue analysis to streamlining pathology workflows, the study highlights a key challenge: ensuring that algorithms learn genuine biological mechanisms rather than statistical shortcuts that may fail in real-world clinical settings.