The Commission wants to knit existing data infrastructures into a shared pan-European resource. Agreeing the rules for doing this is a daunting task, says Juan Bicarregui, a member of the expert group that is laying the groundwork.
For the research labs laying the groundwork for the European Open Science Cloud (EOSC) - the European Commission’s ambitious initiative to make it easier and cheaper to share research data - drawing up a governance structure is a particular challenge.
“We don’t know how to do it at the moment, so we’re talking to a lot of people for ideas,” said Juan Bicarregui, head of data services at the UK Science and Technology Facilities Council, coordinator of the EOSC pilot, a two-year project that kicked off in January. “It’s really hard. The cloud is a broad vision and it will have a lot of funding streams running into it,” he said.
Rather than build its own cloud from scratch, the Commission has proposed that the EU help interconnect existing and new European data infrastructures run by commercial and publicly-funded providers, adding the software, metadata, data registries and other tools needed to glue things together. The aim is that scientists will be a few clicks away from multiple petabytes of data from any lab or any scientific discipline in Europe.
The European cloud vision is being incubated by the Commission’s digital and research directorates, with advice from a 10-person panel headed by Dutch molecular biologist Barend Mons.
“Everyone agrees it needs to happen. We’re testing the water, trying to make the vision a bit more real,” said Bicarregui. “We want to bring all the good stuff together and share resources so that researchers don’t have to do things twice.”
The pilot, being steered by a diverse group of almost 50 research labs, including Max-Planck, the European Molecular Biology Laboratory (EMBL) and the Barcelona Supercomputing Centre, is clearing the path to the cloud, which is supposed to go live in 2020.
Over the course of two years it will deliver a number of building blocks, including technology demonstrators and a first draft of a multi-stakeholder governance structure that can accommodate a mix of different users. “We don’t want to come up with a controlling, centralised governance,” said Bicarregui.
But the final arrangement will have to be amenable to the many competing science power centres in Europe. The biggest data producers include the European Organisation for Nuclear Research (CERN) - which is not a member of the pilot - and EMBL. It will also need to satisfy governments in Europe and elsewhere, universities, industry and MEPs.
That will be no mean feat, especially given that there are no international cloud initiatives on which to draw for inspiration. “Australia has done a lot to bring cloud initiatives together. Theirs is a much smaller effort though,” said Bicarregui.
Discussions are taking place to ensure the European cloud is compatible with those in the US, Australia, South Africa, Canada and other nations that have a growing interest in open research data.
An international group, the Research Data Alliance, has been organising meetings on the topic for the past five years (Bicarregui is on the body’s steering group).
Creating an end product that will not look outdated after a few years will be top of people’s minds. A few politicians in Brussels have expressed misgivings about the initiative, fearing it recalls expensive failures in the EU’s past.
“There’s a small risk of redundancy when you deal with new cutting-edge technology but this is not dissimilar to what you face in any project. There’s always lock-in and legacy costs,” said Bicarregui.
He does not believe the initiative will be impeded by technical limits but rather that “the barriers are largely cultural.”
Researchers remain relatively inexperienced with cloud services, and there may be some initial resistance to the idea of an open science commons.
“Engineers can [complete] whatever task you set them. The hard thing is agreeing on the vision,” Bicarregui said.