What is the cloud and why do researchers need it?
The cloud is a means for institutions to share, re-use and analyse the huge data sets produced by modern science. Of course, components of it already exist; lots of public labs run cloud services for researchers, either in-house or through private sector cloud providers. But the European Commission’s aim is to make sharing and analysing data easier, cheaper, and more efficient, sustainable and versatile. With a few clicks, it hopes, you will be able to get access from any lab in Europe to research data from any other lab or scientific discipline. You may also be able to access computers that offer cutting-edge speeds, and save on purchasing new equipment and costly software upgrades.
Who uses it in Europe?
Computing clouds are used by very large science hubs in Europe, including ELIXIR, which manages biological data, the European Plate Observing System, which monitors the earth’s crust, and initiatives such as the Helix Nebula project, a computer cloud system run by research centres including the European Organisation for Nuclear Research (CERN) and the European Space Agency. Additionally, a sizeable number of both large (e.g. Horizon 2020) and small projects and individual researchers, in various other disciplines, make use of cloud-based services of different kinds to manage data.
So, the academic cloud market is growing…
Yes. Research data stored in public archives around the world are already well into the multi-petabyte range, and the computation needs of Big Science are growing by the day. The coming years are expected to see a big upswing in the use of big data, with academic cloud services such as the European Bioinformatics Institute in Hinxton, UK, and commercial cloud providers including Microsoft, Amazon and Google becoming regular academic partners. In the UK, the Wellcome Trust has given grant money to cloud projects, while on the other side of the Atlantic the US National Science Foundation has given money to federate the private academic clouds of several universities. The National Institutes of Health is starting the ‘Commons’, an initiative almost identical to the Commission’s science cloud.
And the EU wants to make its own science cloud?
Not quite. The “European Open Science Cloud” initiative, announced in April 2016, intends to offer the EU’s 1.7 million researchers and 70 million science and technology professionals “a virtual environment to store, share and re-use their data across disciplines and borders.” Rather than build its own cloud from scratch, the idea is that the EU would help interconnect existing and new European data infrastructures run by commercial and publicly funded providers – adding the software, metadata, data registries and other tools needed to glue the existing services together. In the short term, the Commission aims to organise what’s already available into a single, bigger marketplace, and thereby stimulate the private and public sectors to invest more in cloud services themselves. Underpinning the new cloud is the EU’s ‘European Data Infrastructure’ initiative, which will invest in modernising high-bandwidth networks, large-scale storage facilities and super-computer capacity.
When will the open science cloud come into operation?
It will be rolled out in several stages between now and 2020. During 2017, planning for the European Open Science Cloud will step up the pace – writing the “Rules of Engagement” for suppliers and users to join the cloud, and setting up a way to manage the cloud coherently (a governance system). In parallel, the Commission will propose a new framework for consistent, long-term funding of cloud-based scientific data and infrastructures; so far, financing has been national and discipline-bound. Discussions to ensure the European cloud is compatible with the rest of the world will continue with the US, Australia, South Africa, Canada and other nations that have a growing interest in open research data; an international group, the Research Data Alliance, has already been organising meetings on the topic for the past five years, and a G-7 meeting will discuss it during 2017. Indeed, the cloud is not only about connecting pipelines: one key requirement is that open data be findable, accessible, interoperable and reusable across borders. Work is needed on data specifications and standards to ensure this is truly the case.
The Commission may launch demonstration projects in different scientific fields during 2018/19, to work out the details of implementation. Carlos Moedas, EU Commissioner for Research, Science and Innovation, has set 2020 as an official start date for EOSC – though it isn’t clear yet exactly what that will mean to researchers in specific fields or countries.
Who will use it?
Public-sector scientists will be the first and main users, but industry researchers and institutions such as health care providers will be welcome too; the Commission promises a special effort to help small companies get involved, both as providers of data-based services and as users. Future research results and services funded by the Commission will go into the cloud, but the terms of accessing it – free, paid, with embargoes or with other usage restrictions – haven’t yet been resolved. It’s likely there will be a mix of rules – with public researchers depositing data, publishers and other organisations making it available for re-use, and both groups able to have some control over the terms of use.
Will it be open to other countries?
The European Commission says ‘Yes’, but the details of collaboration for organisations outside Europe still need to be decided. Particularly difficult, politically, are terms of access for Chinese and Russian researchers. There will also, doubtless, be some legal issues – such as liability and reciprocity – to be resolved with the US and other global research partners; only this month did the US and EU finally agree on some terms for US researchers to participate in Horizon 2020 projects. As mentioned earlier, these international issues are on the table during the various global governance meetings planned by the EU, US, Australia and other nations in 2017 and later on. As a starting point, all recognise that each region shouldn’t have its own closed, private cloud; the real value of sharing data won’t be realised unless it’s possible to do so internationally.
Will the data in the EU science cloud be available for free?
Some of it, yes; some of it, no. The EU says that not all data ‘will necessarily be free’, due to the legitimate rights of IP holders, so there will be an opportunity for some organisations to sell access to some of their data through the cloud. Private publishers, such as Elsevier and Springer, are also keen to be able to maintain charges for access to some of their services – but have also been unexpectedly enthusiastic about exploring the possible new business models that a very large, very active cloud could permit. On the other hand, some universities and research councils – among the most active proponents of free open access for research reports and text and data mining – are pushing to make the new cloud a tariff-free zone. It’s difficult to predict yet how this issue will be resolved.
How much will the EU open science cloud cost?
It depends on how you define it. The basic elements to get it going may cost less than half a billion euros from Horizon 2020: that would fund planning meetings, demonstration projects, metadata, data-management plans, basic app development, and middleware to hook everything together. But the Commission, in announcing its plans last April, put out some eye-popping estimates of an additional €6.7 billion; that figure includes big investments in data infrastructure, such as research into quantum computing, buying two new supercomputers, and expanding broadband capacity around Europe. It will all depend on the outcome of EU budget negotiations over the next few years. In any case, cloud advocates say, the extra hardware isn’t a prerequisite for science cloud software and services to get off the ground.
What spending is already planned?
The Horizon 2020 programme already includes, in its 2016/17 work plans, some cloud planning initiatives. These include grants to guide international cooperation between science infrastructures; platform-driven e-infrastructure innovation; and data and distributed computing e-infrastructures. Soon, the EU will announce a consortium of 33 universities and public bodies, which will be given €10 million to research different governance models for the cloud, as well as rules of engagement for providers. Next year, the EU will reveal its spending on the initiative for 2018-2020, when its budget is expected to grow considerably.
Who’s leading the work?
Steering the EU cloud effort are the Commissioner for the Digital Economy and Society – a successor to Günther Oettinger has not yet been chosen – and Research, Science and Innovation Commissioner Carlos Moedas.
Stepping back a little – why is the Commission getting involved in all this?
By stimulating competition and aggregating demand in the market with public money, the Commission hopes it can help bring down the average prices set by cloud providers for storing and analysing data, which are considered too high by many researchers who either produce or reuse scientific data. For example, when some Dutch researchers wanted to copy their huge genome files to Hinxton, they calculated that it would cost about €60,000 with a commercial provider. Instead, the frugal Dutch resorted to copying the data onto hard disks and carrying them across the Channel by hand. It worked, but it was clumsy and slow, of course. The goal goes far beyond money, however. The Commission wants to make cloud-sharing so cheap, easy and versatile that it changes the very nature of doing science in Europe. It wants to see collaborations among different researchers – in different countries, different disciplines, different sectors – cropping up everywhere. If research data can be shared and properly re-used easily, the quality and reliability of science will rise; it will become simple for one researcher to check another’s work, or define more precisely the limits of any new research. New breakthroughs will come through collaboration among different disciplines. In short, the Commission thinks this could spur a new Renaissance in European science.
How can cloud suppliers join in the new marketplace?
The Rules of Engagement have yet to be drafted, though the Commission is pledging open, public tendering and simple, transparent rules. Experts advising the Commission have proposed implementing a set of principles referred to as the ‘FAIR Data Principles’. FAIR – which stands for Findability, Accessibility, Interoperability, and Reusability – is a set of data management and stewardship principles that enhance the ability of machines to automatically store, find and use data, in addition to supporting its reuse by individuals. Today, no publicly-funded science infrastructure in Europe completely lives up to FAIR principles. Still, data are gradually becoming more FAIR, and that will enhance the effectiveness of machine-assisted open science.
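To make the four FAIR properties a little more concrete, here is a minimal Python sketch of the kind of automated check a machine could run on a dataset's metadata record. The field names (`identifier`, `title`, `access_url`, `format`, `license`) and the way they are mapped to each property are illustrative assumptions, not part of any official FAIR specification or of the Commission's plans.

```python
# Hypothetical mapping from each FAIR property to the metadata fields
# that would serve as evidence for it. These names are assumptions
# chosen for illustration only.
FAIR_FIELDS = {
    "findable": ["identifier", "title"],
    "accessible": ["access_url"],
    "interoperable": ["format"],
    "reusable": ["license"],
}


def fair_report(record):
    """Return, for each FAIR property, whether every field assumed to
    support it is present and non-empty in the metadata record."""
    return {
        prop: all(record.get(field) for field in fields)
        for prop, fields in FAIR_FIELDS.items()
    }


# Example record with no licence information: it cannot be
# counted as reusable under this toy check.
example = {
    "identifier": "doi:10.1234/example",
    "title": "Example genome data set",
    "access_url": "https://example.org/datasets/1",
    "format": "text/csv",
}
report = fair_report(example)
```

A real assessment would look at far more than the presence of a few fields, but the sketch shows why the principles are framed around machines: each property can, in principle, be tested without a person reading the record.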
Do any existing cloud services comply with FAIR principles?
Several pieces of a future ‘FAIR data system’ are already in place, and working across borders. For instance, the Harvard Dataverse generates a formal citation for each deposit and makes the ‘Digital Object Identifier’, or other identifiers, public when the dataset is published. The identifier leads the user to a landing page providing access to metadata, data files, dataset terms, waivers or licenses, and version information, all of which are indexed and searchable.
Dataverse also provides public machine-accessible interfaces to search the data, access the metadata and download the data files, using a token to grant access when data files are restricted. It’s not clear yet to what extent cloud suppliers, such as Amazon, Microsoft, Google and SAP, will be willing to comply with FAIR principles.
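As a rough illustration of what such a machine-accessible interface looks like in practice, the Python sketch below builds the two kinds of request described above: a search, and a file download that carries a token when the file is restricted. It assumes the `/api/search` and `/api/access/datafile/{id}` endpoints and the `X-Dataverse-key` header described in the Dataverse API documentation; the helper names, the file ID and the token are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumption: the public Harvard Dataverse installation.
BASE_URL = "https://dataverse.harvard.edu"


def search_url(query, per_page=5):
    """Build a search URL that asks for datasets matching the query."""
    params = urlencode({"q": query, "type": "dataset", "per_page": per_page})
    return f"{BASE_URL}/api/search?{params}"


def download_request(file_id, api_token=None):
    """Return (url, headers) for fetching one data file.

    The token header is only added when a token is supplied, which is
    the case for restricted files.
    """
    url = f"{BASE_URL}/api/access/datafile/{file_id}"
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    return url, headers


# Hypothetical usage: the file ID and token are placeholders.
url, headers = download_request(42, api_token="my-secret-token")
```

The resulting URL and headers could be passed to any HTTP client. The point is that search, metadata and download are all reachable by a program, which is what the FAIR principles ask of a data service.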
Meanwhile, EU-funded THOR is building services around object identifiers; OpenMinTed works on text and data mining services for research; and OpenAire provides a number of services to find data sets and data providers.
What about privacy or ethical concerns?
Differing privacy and ethical policies and regulations in Europe, the US, and elsewhere could become sticking points which would prevent the cloud becoming fully global. There are legal restraints on where research data can be stored – essentially it has to be located in countries, and under the control of organisations, that are subject to EU data protection legislation, and that should make US-based commercial providers a little wary. Rules will need to be established to clarify the roles and responsibilities of the funding agencies, the data custodians, the cloud service providers and the researchers who use cloud-based data. The Commission has said these legal issues will be resolved as part of its broader rule-making efforts under its Digital Single Market – for privacy, copyright, and security of data. But it may not be so simple. The last time science and data rules collided was in 2014/15, when the EU was rewriting its data-privacy regulation; the original, EU-wide proposal would have had an unintended impact on medical research – leading medical universities across the EU to scream loudly that the EU was about to kill drug research. A muddled compromise resulted. Expect similar surprises in cloud regulation.