Researchers from the Faculty of Engineering at The University of Hong Kong have developed two new deep learning algorithms for genetic analysis: ClairS-TO and Clair3-RNA. The algorithms increase the accuracy of mutation detection in cancer diagnostics and RNA-based genomic research, opening up new possibilities for precision medicine. The results have been published in Nature Communications.
The research team is led by Ruibang Luo, affiliated with the School of Computing and Data Science. The new algorithms build on long-read sequencing technology, which reads long, contiguous DNA and RNA fragments. This technology provides richer genetic information than traditional sequencing, but also places higher demands on data analysis. Until now, reliable interpretation has been a challenge, especially with complex samples such as tumour material or RNA data with high biological variation.
Tumour analysis without reference tissue
ClairS-TO addresses a persistent problem in oncological diagnostics: analysing tumour mutations without a corresponding healthy reference sample. In clinical practice, such “matched normal samples” are not always available, yet conventional methods need them to distinguish real mutations from measurement errors.
ClairS-TO takes a different approach, using a dual neural network architecture. One network confirms real mutations, while a second network filters out sequencing errors and artefacts. As a result, tumour DNA can be reliably analysed without reference material, which reduces costs, shortens turnaround times and makes accurate cancer diagnostics more accessible, even when the available sample is limited.
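The two-network idea can be illustrated with a minimal sketch. It is not the ClairS-TO implementation: the `Candidate` class, the `call_variants` function, the score names and the thresholds are all hypothetical, and the scores are made-up numbers standing in for the outputs of two trained networks. The sketch only shows how a candidate mutation might be kept when one scorer supports it and the other does not flag it as an artefact.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    position: int      # genomic position of the candidate mutation
    p_real: float      # hypothetical score from the "confirming" network
    p_artefact: float  # hypothetical score from the "filtering" network


def call_variants(candidates, min_real=0.9, max_artefact=0.1):
    """Keep candidates that the first scorer supports and the second does not reject.

    The thresholds are illustrative assumptions, not published values.
    """
    return [
        c for c in candidates
        if c.p_real >= min_real and c.p_artefact <= max_artefact
    ]


# Made-up example scores for three candidate sites.
candidates = [
    Candidate(position=1050, p_real=0.98, p_artefact=0.02),  # supported, not flagged
    Candidate(position=2310, p_real=0.95, p_artefact=0.60),  # supported, but flagged as artefact
    Candidate(position=4877, p_real=0.40, p_artefact=0.05),  # not supported
]

kept = call_variants(candidates)
print([c.position for c in kept])
```

In this toy example only the first candidate passes both checks. A real caller would derive both scores from the sequencing reads rather than from fixed numbers.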
Breakthrough in RNA sequencing
According to the researchers, Clair3-RNA is the first deep learning model developed specifically to detect small genetic variants in long-read RNA sequencing data. RNA analysis is complex because RNA editing and technical errors can cloud the interpretation of mutations.
The new algorithm uses deep learning to distinguish between real genetic variants and biological or technical “noise”. This allows researchers and clinicians to analyse gene expression and mutations simultaneously, with a level of accuracy that was previously unattainable. This is particularly relevant for research into disease processes, cancer biology and the development of personalised therapies.
Part of a proven AI ecosystem
ClairS-TO and Clair3-RNA are part of the broader Clair series, a collection of open-source AI tools for genomic analysis developed by Luo's team. Previous algorithms, including Clair3, are now considered the industry standard for processing third-generation sequencing data. The tools are known for their speed, robustness and precision and have been downloaded more than 400,000 times worldwide by research institutes and sequencing companies.
According to Luo, the new algorithms have “laid a solid foundation for deep learning-driven mutation discovery and accelerated the further adoption of precision medicine and clinical genomics”.
Impact on healthcare and research
The combination of higher accuracy, lower costs and broader applicability marks an important step towards more accessible and reliable genetic analysis. The technology has the potential to improve cancer diagnostics, support personalised treatment strategies and accelerate genomic research. This AI-driven innovation therefore translates not only into scientific progress, but also into tangible benefits for patients and healthcare systems worldwide.