A ‘coalition of doers’ called on to build complex EU science cloud

The EU has set out the groundwork for a hugely ambitious plan to interconnect Europe’s science outputs. Now the squabbling begins over how to fund the cloud and who should run it.

EU Research Commissioner Carlos Moedas speaking at the European Open Science Cloud Summit

“We are moving from vision to action,” said EU Research Commissioner Carlos Moedas, calling on governments and researchers to back the creation of a European Open Science Cloud and give scientists ready access to data stores across Europe.

At a meeting to push the next phase of one of the most ambitious projects undertaken by Brussels, Moedas told heads of Europe’s leading research institutes and big science infrastructures, “We are not short of [cloud] initiatives. But there is no one-stop-shop for researchers. No overall architecture that allows them to connect.”

The vision is that with a few clicks researchers will get access to data from any laboratory or scientific discipline across Europe. The cloud project will install the essential plumbing for this to happen, by interconnecting data infrastructures run by commercial and publicly-funded providers, adding software, metadata, data registries and other tools needed to glue these existing services together.

The Commission’s digital and research directorates are incubating the project, which Moedas wants to see up and running by 2020. He told researchers in the room, who included representatives from the European Molecular Biology Laboratory, CERN and the Spallation Source, to form a “coalition of doers” to build “the new republic of letters”, a reference to the network of correspondence that once linked thinkers including Locke, Voltaire and Rousseau.

Scattered clouds

The burning question is how exactly to go about collecting and ordering all the research data Europe produces under a single interface. With the project in its early days, the discussions yesterday underscored how much still needs to come together.

The science cloud will not be a physical place, said Robert-Jan Smits, director-general of the Commission’s research directorate. “It’s an ecosystem, involving ministries, funders, universities and data service providers. We are willing to invest big time into this, but we can only build it by working together,” he said.

Among the unresolved issues is who should run the cloud: none of the major players in the room could agree. While Klaus Tochtermann, director of the Leibniz Information Centre for Economics, said the Commission should “step back and hand it to the member states after it gets going,” others said the EU should retain a prominent role.

And inevitably, legal complications loom. “I see all kinds of obstacles in legislation,” said Kurt Deketelaere, secretary-general of the League of European Research Universities. “Take the general data protection legislation, which is one of the worst pieces of law we have ever adopted, and could see different interpretations for data use across countries. And then there are moves in Brussels by some to make copyright law more restrictive, creating more obstacles.”

Efforts to ensure privacy and accountability by encrypting parts of the cloud were also discussed.

Numerous funding agencies are concerned that sending their data into the cloud could threaten the privacy of subjects who take part in research. As Rory Fitzgerald, director of the European Social Survey, pointed out, the cloud will need to honour the ‘right to be forgotten’. “How is this going to work?” he asked.

Deciding what data to keep is one problem; deciding what data to delete is another. “If we really try and store every bit of data, it’s going to be really expensive,” said Cees De Laat, a researcher at the Lawrence Berkeley National Lab.

As head of one of Europe’s biggest data producers, the European Molecular Biology Laboratory, Iain Mattaj is in a good position to size up the potential costs involved. “We rarely throw data away; it is just too expensive,” he said. “We do close data resources and shut services, regularly. It’s infinitely more expensive to go back and prune out the data you no longer want.”

Researcher behaviour

Many people at the meeting said the biggest obstacle is neither technical nor legal, but rather getting enough people to care about how to preserve data 30 years into the future.

Only around one in five universities have research data plans, said Lidia Borrell-Damian, director of research and innovation at the European University Association. Few university librarians have strong data management training. “We have data steward vacancies that we cannot fill,” said Karel Luyben, rector of the Delft University of Technology.

Funders are trying to make researchers think harder about data management by offering extra money for stewardship and storage. The Swiss National Science Foundation, for example, puts aside CHF 3 million a year for curation, or CHF 10,000 per project. However, the Commission says per-project payments will be ruinously expensive in the long term.

Another issue is deciding whether the responsibility for preserving the data lies with the researcher, the university or the funder.

It should certainly not reside with companies, said Eckhard Elsen, director for research and computing at CERN. “We can’t hand stewardship responsibility over to companies, because what happens if they go bankrupt in 30 years?”

Cloud can make up lost altitude

The cloud is not only about connecting networks; it must also make data findable, accessible, interoperable and reusable across borders. For this it will need customised tools, such as its own search engine.

“The internet is useless to us without Google or apps to take us to the data we want to use. So, in the end, we are going to need some generalised tools to navigate the landscape of data,” said John Womersley, director-general of the European Spallation Source, a super-microscope facility in Sweden.

The cloud needs to be user friendly, or it will be like disappearing data down a memory hole, according to Tiziana Ferrari, technical director of EGI, an EU-funded infrastructure that provides computer storage to researchers. “There is a risk that the effort can become a data graveyard,” she said.

Currently, US companies such as Microsoft, Google and Amazon, rather than any EU government or home-grown company, have the most sway over the future of data collection and storage.

“Clearly, we are not in the lead when it comes to computers,” said Mateo Valero, director of the Barcelona Supercomputing Centre. “We are quite good developing software and applications but we don’t make any hardware – we have to buy it, and this is a sin. We need to be independent; we should not be forced to buy tech outside of Europe.”

To provide the basic infrastructure of the science cloud, the EU has pledged big investments in high-bandwidth networks, large scale storage facilities and supercomputer capacity.

The precise amount invested depends on the outcome of EU budget negotiations over the next few years, but a sustainable model will inevitably mean finding private investment. While several different commercial ideas are under discussion, no companies were invited to the meeting.

“We now have a flourishing cloud services market in Europe, we need to be able to leverage it,” said Bob Jones, head of the Helix Nebula science cloud project.

Goulash approach

Resolving these issues and coming up with a final arrangement for the cloud that is acceptable to the many competing science power centres in Europe will not be easy. It will be necessary to satisfy governments in Europe and elsewhere, universities, industry and MEPs.

However, people at the meeting greeted the task with optimism. “Just because it’s hard doesn’t mean it’s not worth it,” said Carole Goble, professor of computer science at Manchester University.

Ex-MEP Edit Herczog, now a lobbyist working with the Research Data Alliance, said that, like Hungarian goulash, the cloud could come in a thousand different varieties. “I like to think of Christopher Columbus, who didn’t know where he was going, and he didn’t know anything about the destination even when he got there. The science cloud could be our ship,” she said.

Lest anyone was in any doubt about Columbus’ place in history, Smits reminded the room, “His ship also relied on public funding.”