Researchers at ETH Zurich have developed a revolutionary digital tool that allows scientists to search through the world’s genetic data archives as easily as a web search on Google. The system, called MetaGraph, enables rapid full-text searches of global DNA and RNA databases—an innovation that could accelerate biomedical research and the development of new treatments.
For decades, DNA sequencing has driven major advances in medicine, from identifying rare hereditary diseases to decoding the SARS-CoV-2 genome. Yet the enormous volume of genomic data, over 100 petabytes stored in international archives such as the Sequence Read Archive (SRA) and European Nucleotide Archive (ENA), has made efficient searching virtually impossible. Until now, researchers needed vast computing power and time-consuming downloads to analyze data.
DNA or RNA sequence search
MetaGraph changes this. Published in Nature, the tool allows users to enter a DNA or RNA sequence into a search field and instantly discover where that sequence appears in public datasets. “It’s a kind of Google for DNA,” says prof. Gunnar Rätsch, data scientist at ETH Zurich.
By compressing genomic information by a factor of 300, MetaGraph makes it possible to index and search biological sequences from humans, animals, plants, viruses and bacteria with remarkable speed and efficiency. The tool not only cuts computing costs—estimated at less than $1 per megabase—but also integrates raw sequence data with descriptive metadata, ensuring precise and comprehensive results.
According to dr. André Kahles from ETH’s Biomedical Informatics Group, the system’s scalability is its key advantage: “The more data we query, the less additional computing power is required.” Currently, MetaGraph covers nearly half of all global sequence datasets, with full coverage expected by the end of the year.
Significant healthcare implications
The implications for healthcare and life sciences are significant. Researchers can now identify antibiotic resistance genes, track pathogen mutations, or search for therapeutic viruses (bacteriophages) in seconds—accelerating the discovery of new drugs and diagnostic tools.
As genomic sequencing becomes ever faster and more affordable, tools like MetaGraph could soon extend beyond the lab. “In the early days, even Google didn’t know what a search engine would become,” notes Kahles. “One day, tools like this might help anyone identify genetic traits, or even their own houseplants, using DNA.”