Trusted legal frameworks and tech tools could finally drive widespread data sharing
Open science depends on trusted tools and rules. Now, the technological, legal and political pieces may finally be falling into place. That was the view of speakers at a recent Science|Business webinar entitled “Mandatory data sharing versus data sovereignty”.
In a generally upbeat discussion, representatives of the OECD, the European Commission, academia and industry sounded an optimistic note about the future of data flows between different actors in the economy and different regions of the world, paving the way for faster research and innovation.
“I see […] a strong commitment by countries to both work together more, but to also tackle some of the more thorny problems,” said Audrey Plonk, head of the digital economy policy division at the OECD. Whereas privacy, trade, national security and other issues have long been addressed separately, now “we're starting to see much more willingness at the government level to bring those communities together to try to dislodge some of the challenges that those silos have created over time,” Plonk told the webinar.
Virtual machines travel the world
The pandemic may have focused minds. The urgent need to track and anticipate the spread of COVID-19 appears to have accelerated efforts to share healthcare data between countries and between different actors, such as public bodies, pharmaceutical companies and researchers in academia. Erik Flikkenschild, information manager of the Leiden University Medical Centre, described how his team has helped establish an automated system that enables researchers to access data on the number of COVID-19 patients in hospitals in six African countries, simply by pushing a button.
Co-funded by the Philips Foundation and Google, the (Go-FAIR) Virus Outbreak Data Network (VODAN) is using the FAIR (findable, accessible, interoperable and reusable) principles to enable both humans and machines to read data from many different healthcare systems. VODAN has created a network of “FAIR data points” that virtual machines can “visit” and query the local patient records, compiled in a format standardised by the World Health Organization. The local data custodian (generally a hospital or centre for disease control and prevention) grants permission to the virtual machine to ask the question or run an analysis. As the personal data of patients never leaves the underlying database of the local institution, VODAN says the data can be visited without violating any patient rights or the laws and policies of the local jurisdictions.
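As a rough illustration of that “visiting” pattern, the Python sketch below sends a question to a hypothetical FAIR data point and receives only an aggregate answer in return. The endpoint, query format and field names are assumptions made for illustration; they are not VODAN’s actual interface.

```python
# Minimal sketch of the "visiting" pattern: the query travels to the data,
# only an aggregate answer travels back. The endpoint and payload shapes are
# hypothetical, not VODAN's actual interface.
import requests  # assumes the 'requests' package is installed

# Hypothetical FAIR data point hosted by a local data custodian (e.g. a hospital)
FAIR_DATA_POINT = "https://hospital.example.org/fair-data-point"

def count_covid_admissions(date_from: str, date_to: str) -> int:
    """Ask the data point to run a count locally; no patient-level data is returned."""
    query = {
        "analysis": "count",                    # the custodian only permits approved analyses
        "record_type": "covid19_case_report",   # WHO-standardised case report form (assumed name)
        "filters": {"admission_date": {"from": date_from, "to": date_to}},
    }
    response = requests.post(f"{FAIR_DATA_POINT}/query", json=query, timeout=30)
    response.raise_for_status()
    return response.json()["count"]             # aggregate result only

if __name__ == "__main__":
    print("Admissions in January:", count_covid_admissions("2021-01-01", "2021-01-31"))
```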
“I don't know what was the technology in Africa, it was all different, but at the end it worked and we could have a global view on how the patients in different hospitals were increasing,” thereby demonstrating the potential of the FAIR principles, said Flikkenschild. He suggested the development of the Internet is now entering a new phase, with FAIR digital objects as “the follow-up” to the HTTP protocol. “You can do your own technical implementation, but there are some principles you have to follow and if you do, then you can see that it will work,” he added.
Will Europe be the global trendsetter?
The concept of analysing the data in situ, rather than trying to move it or copy it, is in line with the thinking of GAIA-X, a public-private partnership to develop a federated data infrastructure, backed by the German and French governments. “We want to create data spaces where data is exchanged between the partners themselves or where the data remains where it is and only the algorithms are exchanged,” Andreas Weiss, director of EuroCloud Deutschland and coordinator of GAIA-X’s federation services, told the webinar. “If you start to share the data, you should be still in the driver seat and you should also be able to stop sharing the data or to revoke the data.”
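The sketch below illustrates, in simplified Python, the “algorithms travel, data stays” idea Weiss describes, together with revocable sharing. It is a conceptual sketch, not GAIA-X code, and the class and method names are invented for illustration.

```python
# Conceptual sketch: a data holder runs partner-submitted algorithms locally and
# can revoke access at any time. Not GAIA-X code; names are illustrative.
from typing import Any, Callable

class DataSpaceNode:
    """A data holder that runs partner-submitted algorithms on data it never exports."""

    def __init__(self, dataset: list[dict]):
        self._dataset = dataset              # never leaves this node
        self._authorised: set[str] = set()   # partners currently allowed to run algorithms

    def grant(self, partner_id: str) -> None:
        self._authorised.add(partner_id)

    def revoke(self, partner_id: str) -> None:
        """The owner stays 'in the driver seat' and can stop sharing at any time."""
        self._authorised.discard(partner_id)

    def run(self, partner_id: str, algorithm: Callable[[list[dict]], Any]) -> Any:
        if partner_id not in self._authorised:
            raise PermissionError(f"{partner_id} is not (or no longer) authorised")
        return algorithm(self._dataset)      # only the result is exchanged, not the data

# Usage: a partner submits an algorithm; later the owner revokes access.
node = DataSpaceNode([{"sensor": "battery", "temp_c": 31}, {"sensor": "battery", "temp_c": 35}])
node.grant("partner-42")
avg_temp = node.run("partner-42", lambda rows: sum(r["temp_c"] for r in rows) / len(rows))
node.revoke("partner-42")  # any subsequent run by partner-42 now raises PermissionError
```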
GAIA-X, which says more than 300 organisations are involved in the initiative, appears to be gaining momentum. Just before the Science|Business webinar, BMW, Siemens, SAP, Robert Bosch and Deutsche Telekom announced GAIA-X will underpin a new cloud platform for data exchange within the automotive sector.
But Weiss stressed participation isn’t limited to European entities. “There is the constraint that GAIA-X is clearly related to European standards and values, [but] there's no constraint to GAIA-X as a community and we have already the buy-in by Japan, by South Korea, by China, by the U.S., so everyone is already on board, even the hyper-scalers are included,” he said.
“Some parties are saying this might be a gold standard,” Weiss added, drawing a comparison with the way in which other countries and some U.S. states, such as California, have introduced rules broadly aligned with those in the EU’s General Data Protection Regulation (GDPR). But he did caution that GAIA-X is only one year old, adding: “We need to convince the market about this.”
Still, GAIA-X and other proponents of data sharing may be pushing at an open door, as businesses increasingly look to access broader and deeper datasets to glean better insights, and enable machine learning and greater automation. “It's very unlikely that any one entity on their own has enough data to be successful at artificial intelligence development,” Jeremy Rollison, senior director of EU government affairs, Microsoft, told the webinar. “Some of the most successful companies of the future are going to be those that are most open, [but] that does not mean that you make your data available necessarily in all circumstances for anyone and everyone to use with any purpose in mind.”
Rollison also alluded to how greater data sharing could help to reduce inequalities both between countries and between companies. “You want to avoid a situation where advantages emerge that become very hard to overcome,” he said. “When it's in the context of AI or other contexts, you want to bridge that data divide and I think data sharing is a way of doing so […] ensuring that data doesn't increasingly fall into the hands of only a few countries or only a few companies.”
Controlling how your data is used by others
Microsoft and other companies are developing new tools designed to enable entities to share data with other players, but solely for a specific purpose. The goal is to ensure the owner maintains control over their data, explained David Sturzenegger, head of product at Decentriq, a Zurich-based start-up. “With this technology, the data owners can get a proof of what their data is used for, as well as a proof of deletion of their data if they want to,” he added, noting that Decentriq is providing banks, insurance, reinsurance and healthcare companies with a platform they can use to confidentially collaborate on sensitive data. “I can basically give my health data for a specific research, but I can be absolutely sure that it can't be used for anything else.”
He said this “confidential computing” technology means that you can outsource data processing to cloud providers without having to fear that they could access your data. “So, it's not necessarily a trade-off” between data utility and data privacy, Sturzenegger concluded. “You can actually get the best of both worlds.”
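A real confidential-computing platform enforces such guarantees inside hardware enclaves, backed by cryptographic attestation. The toy Python sketch below only mimics the purpose-binding, audit-trail and proof-of-deletion ideas Sturzenegger describes; none of it reflects Decentriq’s actual product.

```python
# Toy illustration of purpose-bound data use with an audit trail. A real
# confidential-computing platform enforces this in hardware enclaves with
# cryptographic attestation; here a plain class stands in for that machinery.
import hashlib
import time
from typing import Any, Callable

class PurposeBoundDataset:
    def __init__(self, records: list[dict], allowed_purpose: str):
        self._records = records
        self._allowed_purpose = allowed_purpose
        self.audit_log: list[dict] = []   # stands in for the "proof of what the data is used for"

    def compute(self, purpose: str, analysis: Callable[[list[dict]], Any]) -> Any:
        if purpose != self._allowed_purpose:
            raise PermissionError(f"data may only be used for '{self._allowed_purpose}'")
        result = analysis(self._records)
        self.audit_log.append({
            "time": time.time(),
            "purpose": purpose,
            # fingerprint of the analysis code that was actually run
            "analysis": hashlib.sha256(analysis.__code__.co_code).hexdigest(),
        })
        return result

    def delete(self) -> dict:
        """Wipe the records and return a (toy) proof-of-deletion entry."""
        self._records = []
        proof = {"time": time.time(), "event": "deleted"}
        self.audit_log.append(proof)
        return proof

# Usage: health data may be analysed for one named study and nothing else.
ds = PurposeBoundDataset([{"age": 54, "hba1c": 6.9}], allowed_purpose="diabetes-study-2021")
mean_age = ds.compute("diabetes-study-2021", lambda rows: sum(r["age"] for r in rows) / len(rows))
print(ds.delete())
```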
Decentriq intends to become an intermediary providing services and tools to entities that share data in line with the European Commission’s proposed Data Governance Act, which was published in November 2020. That Act would require these intermediaries to meet certain criteria to ensure they are neutral and trustworthy, such as not using the data exchanged to develop their own products. The legislation calls for structural separation between the data-sharing service and any other services provided, so as to avoid conflicts of interest.
Although it has shied away from making data sharing mandatory, the European Commission is going to some lengths to encourage the practice. As well as drawing up new legislation, it is preparing to roll out “common European data spaces” supported by secure technological infrastructure and governance mechanisms. The Commission plans to invest €2 billion to foster the development of data processing infrastructures, tools, architectures and mechanisms for data sharing.
Federico Milani, deputy head of the Data Policy and Innovation Unit at the European Commission, said the aim is to create an environment in which companies will be willing to share data, knowing that it won’t be used to compete with them and that the confidentiality of their data will be respected, as appropriate.
Tightening up the legislative framework
By the end of next year, the Commission hopes to introduce further legislation that will address tensions that can arise when multiple entities are involved in creating data, Milani noted. For example, “in the automotive industry, the future car, or even the present car, will produce an enormous [amount] of data,” he said. “Who can do what with that data? Usually it's collected by the car manufacturer, but there is data which is personal because it's how I drive the car. What can I do with that data? […] Can I give this data, for example, to another company for providing me additional services?”
At the same time, the EU may need to make its existing legislation, particularly the GDPR, clearer. That is certainly the view of Novartis. Alexandre Entraygues, head of data privacy EU at the pharmaceutical company, called for greater clarity about what is allowed under the GDPR’s notion of scientific research. “The problem is that there's no definition” of scientific research in the GDPR, he said. “There’s just a recital […] advocating for a broad interpretation […] so you can imagine the level of discussions, dissenting opinions, debates around this notion.”
Entraygues compared the U.S. and European approaches when it comes to the rules around anonymising data. The U.S. has taken a “pretty practical approach” to de-identification, having created “a list of 19 identifiers for the patient which need to be removed from the data set, or the related documentation, in order to achieve de-identification,” Entraygues explained, whereas the European approach is “much more conceptual” and does not have a standardised process.
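To give a flavour of that list-based approach, the short Python sketch below strips a record of fields that appear on a fixed list of direct identifiers. The list shown is an illustrative subset, not the full set of identifiers required under the U.S. rules.

```python
# Rough sketch of list-based de-identification: drop any field that appears on a
# fixed list of direct identifiers. The list below is an illustrative subset only.
DIRECT_IDENTIFIERS = {
    "name", "street_address", "phone", "email",
    "social_security_number", "medical_record_number", "date_of_birth",
}

def deidentify(record: dict) -> dict:
    """Return a copy of the record with listed identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

patient = {
    "name": "Jane Doe",
    "date_of_birth": "1970-04-12",
    "medical_record_number": "MRN-0042",
    "diagnosis_code": "E11.9",
    "hba1c": 7.2,
}
print(deidentify(patient))   # only the clinical fields remain
```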
Entraygues also highlighted that the GDPR’s provisions on health data leave a lot of scope for national interpretation, resulting in considerable fragmentation across Europe in the privacy rules governing scientific research. This situation has prompted the pharmaceutical industry to develop a self-regulatory code of conduct to enable consistent practices across Europe and build greater trust.
There is also work to be done to bring about greater harmonisation at a global level, according to several speakers. The EU and the U.S. still don’t see eye to eye on how best to protect individuals’ privacy, for example, as evidenced by the long-standing dispute about transfers of EU personal data to the U.S. “We see increasingly that the countries want to restrict data movement for various reasons, whether it's for the protection of privacy or intellectual property or the desire for sovereignty,” noted Plonk from the OECD. “That's a concerning trend.”
She suggested there is a need for some sort of rules and agreement by which we govern data. It will be important, she added, for international stakeholders to work together “to develop a more nuanced understanding of data, from how it's classified to how it's categorized, to how we share it and measure it and understand it and give it value over time, so that we can get more concrete and more specific in our policy making and ultimately so that we can reap what we believe to be tremendous benefits from its use.”