The GDI project starter kit has been launched as a pilot. It responds to a strong demand for a federated, sustainable, interoperable and secure infrastructure for sensitive genomic data, built on open community standards.
The European Genomic Data Infrastructure (GDI) aims to realize the 1+MG initiative’s ambition to enable secure access to human genomic data and the corresponding clinical data across Europe through a set of data services. The newly released GDI starter kit is a key project output that supports federated data access workflows. It will benefit the national nodes of GDI. In the future, it will also benefit public institutions and companies, which can adopt the software and standards in the starter kit in their own environments to increase service interoperability.
This development has also been one of the key objectives of the national 1+ Million Genomes initiative in Finland, which is coordinated by the Ministry of Social Affairs and Health and supported by the Ministry of Education and Culture. A further aim is to increase the interoperability of national biobank data operations and to enable them to connect to a federated data infrastructure that crosses national borders in Europe. At present, genomic data cannot be transferred and used across borders. The GDI project aims to create a data network worthy of public trust, holding over one million human genome sequences for research and clinical reference. This will create opportunities for transnational and multi-stakeholder actions, such as those required to drive personalized medicine forward. Authorized data users, such as clinicians, researchers and innovators, will be able to advance the understanding of genomics. This will support faster and more precise clinical decision-making, diagnostics, treatments and predictive medicine. It will also support improved public health measures that benefit European citizens, healthcare systems and the economy as a whole.
Data infrastructure
The starter kit is a package of open source reference software implementations. These implementations form the basis for the infrastructure services co-developed by the 20 GDI nodes. The software implements a framework of standards drawn from both ISO and open community standards, such as those of the Global Alliance for Genomics and Health (GA4GH). As a pilot, the starter kit supports national nodes by demonstrating how a set of applications and components can be linked through these standards to form a secure, distributed data infrastructure. Overall, the data infrastructure provides five main functionalities:
Dylan Spalding, Senior Coordinator, CSC: “The development of the Starter Kit products has been led by product owners from four countries: Dominik F. Bučík and Lukas Hejtmanek (Czechia), Meeri Hakala (Finland), Albert Hornos and Jordi Rambla (Spain), and Johan Viklund and Dimitris Bampalikis (Sweden). These products have not been developed in isolation for the GDI Starter Kit, but in collaboration with other projects, such as Federated EGA. This helps ensure the technical interoperability of the Starter Kit with other genomic data sharing platforms. It also demonstrates the standards proposed for use within GDI, which link the different components or products together, with the aim of making the data as FAIR as possible.”
“Finland and Sweden co-lead the technical development of the European genomic data infrastructure services. Together in the Nordics, we envision that many of the software products in the starter kit, all of which are open source, can be operated as services in national data hubs across Europe. These services would then form a European backbone for securely managing sensitive data derived from human samples, in collaboration with biobanks. GDI results may thus be deployed as soon as we have a legal framework that allows it,” says Tommi Nyrönen, the program director of the ELIXIR Node in Finland.
Markus Perola, Research Professor at the National Institute for Health and Welfare (THL): “For research to improve national health systems, it must draw on statistically significant, high-quality and versatile datasets. Today, an integral facet of this effort is the incorporation of genome information. It provides molecular insights into how each cell and individual will react to changes in their environment. In the GDI project, the Finnish national health authority THL is responsible for submitting a harmonized dataset within the agreed framework of standards and processes. With the help of the data infrastructure services, we aim to store this data securely and make it accessible for human health research through the appropriate data access bodies.”
This release is the first version of the starter kit. It will be developed further over the lifetime of the project, following an iterative approach.
This article was first published on 24 August by CSC – IT Center for Science.