09 Dec 2014   |   Viewpoint

Memo to Juncker: Plan how to benefit from scientific data sharing

An expert panel urges the new European Commission to reap the economic benefits of science on-line

History goes in cycles, and a new one is now beginning. Over the past 25 years, we have seen the Internet grow from a technical tool to a global economic force on which millions of jobs depend. A key character in that drama has been the scientific community; in fact, it was at a European physics lab, CERN, that the World Wide Web was invented – and it was the global scientific community that first recognised its potential, and pushed its development.

Now, the story is about to repeat: A new digital technology is coming. At its core is the scientific community. And, while we don’t know how the story will end yet, we do know it will be important.

The story is about sharing scientific data on a truly massive scale. The sheer volume of data spilling from telescopes, gene sequencers or environmental monitors is vast. So too is the torrent from such diverse disciplines as sociology, economics or linguistics. We often feel as if we are drowning in words, numbers, sounds and images – and we are.

But when data volumes rise so high, something strange and marvellous happens: the nature of science changes. Problems that were previously not even recognised suddenly become tractable. Researchers who never met, at different institutions and in divergent fields, find themselves working on related topics. Work that previously plodded along from one experiment or hypothesis to another can accelerate.

Global data commons

And what’s the vital catalyst for all this? The ability to share the data – in huge volumes, over vast distances, across disciplines and institutions. And then to analyse, re-interpret, re-use and re-think it.

This is the future of science: a global data commons, a virtual science library spanning the globe.

We are, today, starting to move. In Europe, a host of projects – national, EU, regional – is now pioneering how the system will work. Developers are working on systems to share and exploit satellite data, to measure the thermal efficiency of cities and buildings to preserve the climate, or to track tigers in the wild to preserve biodiversity. Scientists are sharing brain scans and genomics databases to find new medicines. In the policy world, the EU and several member-states have been successfully promoting Open Access – first for research publications, and now for data.

Why should we care? Because, just as the World Wide Web has transformed our lives and economies, so this new data wave will matter eventually to every one of us, scientist or not. In the first instance, developing the tools, systems and businesses required for this will create jobs, revenues and economic growth; the cost – growing over time to something on the order of 5 per cent of research budgets – is large but, if the market incentives are set correctly, will be shared between the private and public sector.

Already, economists have shown how scientific investments of a narrower scope have yielded great returns: For instance, in the US, one study estimated that the $13 billion in government spending on the Human Genome Project and its successors has yielded a total economic benefit of about $1 trillion.

A British study of the UK's public economic and social research database found that for every £1 invested by the government, an economic return of £5.40 resulted. Even bigger numbers have been circulating about the impact of Big Data, a related trend. However it is measured, the economic and social benefits will be large.

That means Europe’s leaders, including its new slate of European Commissioners and Parliamentarians, must act – or go down in history as the politicians who missed the Next Big Thing.  We, European members of the Research Data Alliance, an international effort to stimulate and coordinate work on data sharing, propose the following actions:

1.    DO require a data plan, and show it is being implemented. We want a system to let researchers around the globe gather, store and manage, share, re-use, re-interpret and act upon each other’s data. For that, every EU member-state should have a plan to develop the tools, infrastructure, skills and funding to take part – and the EU should update its own plans to coordinate the European effort. Internationally, every country wanting to join coordinating bodies like RDA should also have a plan implemented.

2.    DO promote data literacy across society, from researcher to citizen. Embracing these new possibilities requires training and cultural education – inside and outside universities. Data science must be promoted as an important field in its own right. Use and evaluation of data must be embedded in all curricula, from primary school to post-doctoral programme. EU R&D programmes should incorporate data training and skills. And public workers, who control scientifically vital databases on populations and environment, need training.

3.    DO develop incentives and grants for data sharing (and don’t forget Horizon 2020). Few people will act without incentives – whether direct grants from EU programmes, or indirect market incentives to private investors. For Horizon 2020, the upcoming Work Programme for 2016-17 should reflect the growing importance of data sharing – in funding for experiments, business models, communities and analysis. Incentives will be needed for industry, in public-private partnerships or direct government procurement of innovative infrastructure. Clarity is needed on who owns a scientific data set, so a balance can be struck between public access and private gain. And within universities, a cultural change is needed so that good data management is seen as important in tenure and other rewards.

4.    DO develop tools and policies to build trust and data-sharing. Perhaps the biggest challenge in sharing data is trust: How do you create a system robust enough for scientists to trust that, if they share, their data won’t be lost, garbled, stolen or misused? The problem is partly technical: Much work is needed to develop the underlying infrastructure, identifiers, meta-data, systems and networks for that – and for that, again, public funding in Europe and international coordination by RDA will be needed. But in the end, it is the culture of science that we are talking about, and that will take a generation to change.

5.    DO support international collaboration. The biggest benefits will come from cross-fertilisation with other disciplines, regions, cultures and economic systems. Our organisation, the Research Data Alliance, with its 96-country membership, exemplifies the kind of global coordination that will be needed. But Europe must speak with one voice as the work advances, and that means the European institutions must lead. Long-term thinking and support will be needed to work globally.  

6.    DON’T regulate what we don’t yet understand. Sharing scientific data on this scale is new; we don’t know yet what opportunities will arise, or what problems will dog us. Until we do, we urge forbearance from those who would wish to regulate too hastily. Issues such as privacy and ethics should be handled in consultation with the wider data and scientific community.

7.    DON’T stop what has begun well. Much effort, expense and brainpower, across the EU, has been invested in making data sharing a reality. It will be a temptation, with a new Commission and Parliament in Brussels, to change course, re-order priorities and move funding lines around. Don’t.

John Wood is Secretary General of the Association of Commonwealth Universities and RDA Europe Chair & RDA Council Co-Chair.

The Research Data Alliance is an international effort to stimulate and coordinate work on data.

Link to the full report:
