30 Nov 2017   |   News

Researchers seek answers on how European science cloud will work

Commission aims to provide instant access to data stores across Europe, but some scientists worry about governance, control and costs

The devil is in the details: That’s the concern of several researchers calling for more clarity about how an ambitious European Commission plan to interlink laboratories in the ‘cloud’ is going to work.

The idea behind the European Open Science Cloud is that with a few clicks researchers could get access to data or applications from any laboratory or scientific discipline across Europe. But at a Brussels conference organised Nov. 28-29 by a Commission project, several researchers expressed concerns about how it will actually work.

“We are finding a high level of anxiety around this project,” said Damien Lecarpentier, a director at CSC, the Finnish IT Centre for Science.

Cloud planners say the uncertainty reflects the fact that it’s going to take some time to get all the diverse actors in the EU research landscape to agree on a common approach. 

“We’re all looking at the elephant from a different angle; that doesn’t need to be a bad thing,” said Juan Bicarregui, head of data at the UK’s Science and Technology Facilities Council and leader of a cloud pilot programme involving nearly 50 labs across Europe. “The thing is, we’re all starting from different stages. In some places, researchers have quite developed cloud systems; others obviously don’t. Our job is to listen to everyone and take their issues on board.”

“It’s a learning by doing exercise,” said Jean-David Malo, director of open innovation and open science in the Commission’s research department.

2020 target date

The Commission, under a €6.7 billion long-term plan announced last year, is offering to fund some of the essential plumbing to interconnect data infrastructures run by commercial and publicly funded providers, allowing researchers to share data and applications in a way that has never been done before. The service is supposed to start by 2020, and so a flurry of meetings and studies are now underway to flesh out the details of how to knit them all together.

The project has the support of some of the world’s most influential and largest data repositories, including CERN and the European Molecular Biology Laboratory, but concerns abound. “I’m getting worried that we are heading for a large monolithic structure,” said Erik Huizer, interim chief executive officer at the EU-funded Géant, a research Internet network. “We need to be sharper and clearer as to what this is. The most exact thing you can say about the plan is that nothing is specific.”

Outsourcing data storage and management to the cloud has already become a common choice for researchers wrestling with big data. The task for the people driving the EU initiative is to convince scientists to open up access to these repositories and share their data with others.

“There are people worried that it will jeopardise decades of investment in research infrastructures. Reaching a common understanding of what we are even talking about is a challenge,” said CSC’s Lecarpentier, who is also project manager of EUDAT, an EU-funded research infrastructure that is putting in place technical standards to allow data to be shared and curated across borders and disciplines. “It’s hard to set principles when there’s a wide set of opinions around what it should be,” Lecarpentier said.

Cloud control

A big question for scientists is who will run the cloud. The pilot led by Bicarregui is testing different governance structures, while trying to avoid “a strict hierarchical model,” said Matthew Dovey, programme director at the UK’s JISC, which provides advice on digital resources to universities and labs.

The model currently in favour is made up of three layers: a strategic layer, including the Commission and member states, that would oversee and advise; a steering layer of researchers that would set priorities; and an executive layer that would take decisions.

But doubts followed a presentation of the model on Wednesday. “Every interaction in this model means many processes,” said Géant’s Huizer. “Are we building something that will suck up all our time? We have other things to do in our own e-infrastructures, such as cyber-attacks to negotiate.”

There is a risk of creating a structure that is “bogged down” in processes, Dovey of JISC said. “It needs to be lightweight, but how do you balance that with the complexity and ambition of the cloud?” he asked. To help persuade the audience that the structure would be flexible, Dovey added there would be no ‘rules of engagement’, just ‘principles of engagement’.

Besides governance, the project requires researchers and policymakers to toggle through a large list of issues simultaneously.

One case in point is how the Commission will create incentives for data sharing and induce researchers to join in large numbers. Privacy concerns could also prevent full openness.

Another challenge is to find agreement on quality thresholds for storage and sharing and to get researchers to care about how data will be preserved 30 years into the future.

“Here you have a big task,” said Valentino Cavalli, EU open science projects officer with LIBER, an association of research libraries. “Think of all the hospitals, for example, that follow different data formats and procedures.”

Then there is data portability – researchers want to know if their research could become locked into the EU cloud system. Moreover, who is responsible if a hacker loots information, and who decides which data to delete? “If we keep on storing more and more data, it’s obvious the model will become unsustainable,” said Huizer. “At some point, we are going to need to delete stuff.”

Cash cloud

The economics of cloud computing can be complex, and in many cases, sending data to the cloud, or retrieving them, remains more expensive than in-house storage.

One possibility is that participating countries will subscribe to some form of cost-sharing model, which takes into account gross national income. Another idea is for scientists to pay for the services they use, using a special token system, or reimburse the project for core costs. “I think, if we are talking about essential costs for the running of the cloud, it won’t be an unwarranted cost for users,” said Rachel Bruce, deputy chief innovation officer at JISC. Anything that avoids erecting paywalls is important, she added.

“There will always be a large component of public funding for the cloud anyway,” said Augusto Burgueño Arjona, head of the e-infrastructure unit in the Commission’s DG CONNECT. The amount invested will depend on the outcome of EU budget negotiations over the next few years.

There are enough examples of workable business plans, so the EU cloud does not have to turn to something far beyond what already exists, advised Dimitris Koureas, a biodiversity informatics specialist at the Natural History Museum in London.

“It can be as complicated as we want it, and as simple as we like it,” he said. “I wouldn’t like us to be too judgemental because it’s a huge opportunity.”