New AI model predicts risk of 1,200+ diseases years in advance

Researchers at the European Molecular Biology Laboratory (EMBL) have developed the generative AI model Delphi-2M, which uses data from large-scale, anonymised health records to make predictions for more than 1,000 diseases, often decades in advance.

The system is built on a transformer architecture, similar to that of large language models, with special adaptations for medical data. Delphi-2M was trained on data from approximately 400,000 UK Biobank participants. For external validation, it was tested on data from 1.9 million patients in the Danish national patient registry. For many conditions, the model achieves performance comparable to that of specialised risk tools, such as QRisk for cardiovascular disease. The results of the study were published this week in Nature.

What Delphi-2M could mean for healthcare

Delphi-2M can predict health risks at the population level. This allows policymakers to estimate how many people are likely to develop certain chronic diseases, enabling preventive interventions to be deployed more effectively and in a more targeted manner.
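To illustrate how individual predictions can feed a population-level estimate, here is a minimal sketch. It assumes the model's output has been reduced to one risk probability per person for a given disease and time window. The numbers, function names and threshold are hypothetical and do not come from the study.

```python
# Hypothetical per-person risks for one disease over a fixed time window.
# These values are illustrative only and are not taken from Delphi-2M.
individual_risks = [0.02, 0.10, 0.25, 0.05, 0.40]


def expected_cases(risks):
    """Expected number of new cases: the sum of the individual probabilities."""
    return sum(risks)


def high_risk_share(risks, threshold):
    """Fraction of the group whose predicted risk is at or above the threshold."""
    return sum(1 for r in risks if r >= threshold) / len(risks)


print(expected_cases(individual_risks))        # expected cases in this group
print(high_risk_share(individual_risks, 0.2))  # share a programme might target
```

Summing the probabilities gives the expected number of cases in the group, while the share above a chosen threshold indicates how many people a targeted prevention programme might aim to reach.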

Although the results are promising, the researchers indicate that Delphi-2M needs several more years of development before it can be used routinely in individual patient care. It must first be validated further, shown to be reliable, and integrated into clinical workflows. In short, it is not yet suitable for use in daily practice.

The model works particularly well for diseases with predictable progression, such as cardiovascular disease, diabetes, and infections. For conditions with highly variable causes or very rare congenital diseases, predictions are less reliable.

Technological & ethical challenges

Delphi-2M uses only medical records and basic demographic and lifestyle variables such as gender, age, BMI and smoking habits. Biological data such as genes or protein profiles are not yet included, but the researchers plan to add them at a later stage.

One concern is that training and validation datasets are often demographically limited (the UK Biobank, for example, mainly contains British participants), which may lead to bias. Predictions must also be presented transparently, so that both patients and doctors understand what a given risk estimate means.

Delphi-2M could prove to be an important step in the development of predictive models in healthcare. It shows that AI can already predict many diseases from simple, readily available records with a quality that matches that of specialised tools. The challenge for the future lies in scaling up, further external validation, integration into daily healthcare practice, and ensuring ethically responsible use in terms of privacy, fairness and comprehensibility. If this is achieved, there is potential for both improved patient outcomes and more effective prevention programmes.

AI prediction models

Delphi-2M is certainly not the first example of AI being used to make health predictions. As early as 2021, researchers from the Catharina Heart and Vascular Centre and Eindhoven University of Technology were working on an AI model within the COMBAT-VT project that predicts which patients will develop cardiac arrhythmias after a heart attack. What set that project apart was that it did not rely on a fixed dataset. Instead, it examined whatever data was available for each patient, such as ECGs, radiological images and information from the patient file.

Earlier this week, we reported on an AI model developed by researchers at Moorfields Eye Hospital NHS Foundation Trust and University College London that predicts which patients with keratoconus require immediate treatment and which patients can be safely monitored.