Until now, genetic risk factors for breast cancer have usually been studied as single factors. Professor Arto Mannermaa's team at the University of Eastern Finland intends to explore the big picture and look for factors that significantly increase the risk of illness when they interact. This is where artificial intelligence comes in.
Mannermaa's team is developing algorithms that can learn from genomic and clinical data to identify and predict risk factors. Learning algorithms are also used in the interpretation of mammography images. Genomic and clinical data are integrated into an AI model that helps not only to determine the risk of illness but also to draw up individual treatment plans.
The question Mannermaa wants to answer is which factors have contributed to the onset of breast cancer in a patient. Mannermaa's team has created an AI model for breast cancer risk factors that is being tested with Finnish and international material.
– We also have material obtained from the Biobank. We are comparing the data of breast cancer patients and healthy individuals and trying to find the interactive combination of all variables that has the greatest influence on the onset of breast cancer, says Mannermaa.
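The idea of searching for an interacting combination in case-control data can be illustrated with a very small sketch. This is not the team's algorithm, which the article does not describe; it swaps in a plain exhaustive scan that scores every pair of binary variables by the odds ratio of carrying both. The function names and the tiny data set are invented for illustration only.

```python
from itertools import combinations


def odds_ratio(cases, controls, exposed):
    """Odds ratio of being a case among rows for which exposed(row) is true.

    0.5 is added to every cell (Haldane-Anscombe correction) so that an
    empty cell does not cause a division by zero.
    """
    a = sum(1 for row in cases if exposed(row))      # exposed cases
    c = sum(1 for row in controls if exposed(row))   # exposed controls
    b = len(cases) - a                               # unexposed cases
    d = len(controls) - c                            # unexposed controls
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def strongest_pair(cases, controls, n_variables):
    """Score every pair of variables by the odds ratio of carrying both."""
    scores = {}
    for i, j in combinations(range(n_variables), 2):
        scores[(i, j)] = odds_ratio(
            cases, controls, lambda row: row[i] == 1 and row[j] == 1
        )
    best = max(scores, key=scores.get)
    return best, scores


# Invented toy data: each row is one individual, each column one binary
# variable (for example, carrier status for a variant). Not real data.
toy_cases = [
    [1, 1, 0], [1, 1, 1], [1, 1, 0],
    [0, 1, 1], [1, 0, 0], [1, 1, 0],
]
toy_controls = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [1, 0, 1], [0, 1, 1], [0, 0, 0],
]

best, scores = strongest_pair(toy_cases, toy_controls, n_variables=3)
print(best, round(scores[best], 2))
```

In this toy set, variables 0 and 1 appear together in four cases and in no controls, so that pair scores highest. A real analysis would need far larger samples, statistical testing and correction for the number of pairs examined.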
One of the study's targets concerns normal genomic variation, or single nucleotide polymorphisms (SNPs). An SNP is a difference in the DNA chain caused by a mutation within a population. The rapid development of DNA sequencing techniques has made it possible to determine SNPs, providing a very accurate estimate of the differences between individuals.
Mannermaa's team is working to identify SNPs related to breast cancer by means of AI and learning algorithms. The results have been promising. The algorithm helped to identify genes close to SNPs, and these SNPs probably affect the operation of the genes. The team found a gene network related to oestrogen metabolism.
The amount of data in the team's study is so large that the supercomputing capacity of CSC (the Finnish ELIXIR node) is required.
– About 200,000 SNPs can be identified from one laboratory sample. Each SNP is compared with all the others. In addition, we simulate genetic variation, in other words SNPs that the samples have in common but that remain unidentified. This means that up to another 10 million SNPs can be added to the equation. Add to this variables from imaging and the biobank, and computing capacity is definitely called for.
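A rough calculation shows why the figures in the quote call for a supercomputer. The sketch below assumes that "compared with all the others" means every unordered pair of SNPs is examined once. That reading is an assumption, and the article does not say how the comparison is actually organised.

```python
from math import comb


def pairwise_comparisons(n_snps: int) -> int:
    """Number of unordered pairs when each SNP is compared with all others."""
    return comb(n_snps, 2)


genotyped = 200_000       # SNPs identified per sample (figure from the article)
simulated = 10_000_000    # upper bound on added SNPs (figure from the article)

print(f"{pairwise_comparisons(genotyped):.2e}")              # 2.00e+10
print(f"{pairwise_comparisons(genotyped + simulated):.2e}")  # 5.20e+13
```

Under that assumption, the identified SNPs alone give about 20 billion pairs, and adding the simulated SNPs raises the count to roughly 52 trillion, before any imaging or biobank variables are included.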
The team's basic AI model is built on genetic data. Clinical variables, i.e. breast cancer risk factors, have now been added to this model. Mannermaa believes that the models will significantly improve diagnostics.
This article was first published on 25 February 2020 by CSC – IT CENTER FOR SCIENCE.