A group of international researchers has shed further light on genetic variants responsible for human diseases by analysing primate DNA data with a novel AI algorithm.
First, the scientists sequenced more than 800 individual samples from 233 species of non-human primates, representing all 16 primate families, from lemurs to gorillas. To interpret the data, they developed a new algorithm, PrimateAI-3D.
PrimateAI-3D is built on deep-learning language architectures similar to those used in ChatGPT, but it is designed to model genomic rather than linguistic sequences. The team used natural selection as the training signal, presenting the algorithm with mutations that natural selection has tolerated in our primate relatives and that can therefore be ruled out as causes of disease. In this way, the algorithm learned to recognise benign genetic variants and, by a process of elimination, mutations that are likely to cause disease.
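To make the "process of elimination" more concrete, here is a toy Python sketch of how such training labels might be assembled. It is not PrimateAI-3D's actual pipeline: the `Variant` record, the `build_training_labels` helper and the example genes are all hypothetical, and the real model learns from far richer inputs. The only assumption encoded here is that any variant observed in a non-human primate is labelled benign, while every other candidate is left "unobserved" for a model to sort out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for a single protein-altering variant."""
    gene: str      # placeholder gene name
    position: int  # amino-acid position in the protein
    alt: str       # substituted amino acid (one-letter code)


def build_training_labels(candidates, primate_observed):
    """Toy labelling step (an assumption, not the published method).

    Variants seen in non-human primates have survived natural selection,
    so they are labelled "benign". All other candidates are labelled
    "unobserved": a mix of harmless and harmful variants that a model
    would have to learn to tell apart.
    """
    return {
        v: "benign" if v in primate_observed else "unobserved"
        for v in candidates
    }


# Illustrative inputs only; these are not real genes or real data.
candidates = [
    Variant("GENE_A", 10, "K"),
    Variant("GENE_A", 11, "P"),
    Variant("GENE_B", 5, "W"),
]
primate_observed = {Variant("GENE_A", 10, "K")}

labels = build_training_labels(candidates, primate_observed)
for variant, label in labels.items():
    print(variant.gene, variant.position, variant.alt, label)
```

Only the benign side of the split is directly observed; the "unobserved" group is where a trained model would have to pick out the likely disease-causing variants.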
The scientists then applied PrimateAI-3D to identify potentially harmful mutations in humans, using health records and gene variant data from more than 400,000 people who have donated samples to the UK Biobank project. They found that the algorithm showed “impressive improvements” in predicting individuals’ increased genetic risk for common diseases.
The method’s claimed ability to identify pathogenic mutations more accurately than existing techniques is also linked to its capacity to overcome the bias in human genetic data towards people of white European ancestry.
“Even though there are 8 billion of us, our genetic diversity still looks like the original population of 10,000 common ancestors we’re all descended from,” said Kyle Farh, co-author of the study and VP of Artificial Intelligence at collaborating company Illumina.
“There just isn’t enough information to glean from the human species. It became clear several years ago that, to really understand the human genome, the data contained in human genome sequencing was not enough,” he added.
Combining human and non-human primate data is key to filling that gap, especially as living primates share more than 90% of their DNA with one another. Research from Illumina has shown that if a genetic variant is tolerated by natural selection in another primate, there is a 99% chance that it does not cause disease in humans.
The study’s findings can support health research, for example by helping scientists prioritise the variants most likely to pose a risk to humans. They could also aid efforts to conserve other primate populations.
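The prioritisation idea described above, setting aside variants already tolerated in other primates and ranking the rest, can be sketched in a few lines of Python. This is a hypothetical illustration: the identifier format, the `prioritise` function and the pathogenicity scores are invented, and the scores merely stand in for whatever a model such as PrimateAI-3D would output.

```python
def prioritise(variants, primate_observed, scores):
    """Split variants into likely-benign ones and a ranked shortlist.

    variants:         variant identifiers, e.g. "GENE:position:alt" (hypothetical format)
    primate_observed: identifiers seen in at least one non-human primate
    scores:           identifier -> hypothetical pathogenicity score in [0, 1]
    """
    # Variants tolerated in another primate are set aside as likely benign.
    likely_benign = [v for v in variants if v in primate_observed]
    # The rest are ranked so the highest-scoring (most suspicious) come first;
    # unscored variants default to 0.0 and sink to the bottom.
    remaining = [v for v in variants if v not in primate_observed]
    ranked = sorted(remaining, key=lambda v: scores.get(v, 0.0), reverse=True)
    return likely_benign, ranked


# Made-up inputs for illustration only.
variants = ["GENE_A:10:K", "GENE_A:11:P", "GENE_B:5:W", "GENE_C:42:R"]
primate_observed = {"GENE_A:10:K"}
scores = {"GENE_A:11:P": 0.35, "GENE_B:5:W": 0.92}

likely_benign, ranked = prioritise(variants, primate_observed, scores)
print(likely_benign)  # ['GENE_A:10:K']
print(ranked)         # ['GENE_B:5:W', 'GENE_A:11:P', 'GENE_C:42:R']
```

Treating primate observation as a hard filter is a simplification for illustration. A 99% figure still leaves room for exceptions, so a real pipeline would more plausibly fold this signal into a score than discard variants outright.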
“I think we’re only at the beginning,” Farh noted. “There’s a tremendous amount that can be learned here. And the idea that you can learn more about our own species from other species is, I think, deeply romantic.”
The full study is published in the journal Science.