Removing the barriers to Big Data

12 Jun 2013 | News
There are fine lines to tread – and regulatory, technical and cultural obstacles to negotiate – to protect the IP rights that underpin commercialisation and to defend individual privacy when opening up and interconnecting Big Data stores

While potent early examples have demonstrated the power of Big Data, a number of significant hurdles stand in the way of unleashing its full potential. These are strewn across the landscape, from the inadequate technical capabilities and capacity of Europe’s computer systems, data storage and communications networks, to standards for sharing and guaranteeing the quality of data, and on to the sensitive issues of individual privacy and intellectual property rights. And looming over all of these are cultural barriers to sharing.

Many of these challenges are evident in environmental modelling, a field that is crying out for the application of Big Data, as Sean Beevers, Senior Lecturer in Air Quality Monitoring at King’s College London, described to delegates at the Science|Business Smarter Data for Europe conference, held on 23 May. “To build good air quality models, we have to utilise Big Data, incorporating more measurements, and so improve predictive capabilities,” Beevers said.

One example involves analysing number plate recognition data collected as vehicles pass into London’s congestion charging zone to find out exactly which vehicles enter the centre of the city, then cross-referencing this against published vehicle emissions information to give one measure of the level of pollution. This can then be factored into the model alongside actual measurements of emissions made by roadside monitoring equipment.
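For illustration only, the kind of cross-referencing described above can be sketched as a simple lookup and aggregation. All plates, emission factors and the default value below are invented for the example; none come from the article or from real Transport for London data.

```python
# Hypothetical sketch: sum per-vehicle emission factors for plates seen
# entering a charging zone, using a published emissions lookup table.

# Number plate recognition records: each entry is one sighting (made-up plates).
sightings = ["AB12CDE", "FG34HIJ", "AB12CDE", "KL56MNO"]

# Published emission factors in grams of NOx per km (made-up values),
# keyed here by plate; a real system would go via a registration database.
emissions_g_per_km = {
    "AB12CDE": 0.08,  # petrol car
    "FG34HIJ": 0.45,  # older diesel van
    "KL56MNO": 0.01,  # electric vehicle (non-exhaust only)
}

def estimated_total(sightings, table, default=0.2):
    """Sum emission factors over sightings; unknown plates get a default."""
    return sum(table.get(plate, default) for plate in sightings)

total = estimated_total(sightings, emissions_g_per_km)
print(round(total, 2))  # 0.08 + 0.45 + 0.08 + 0.01 = 0.62
```

In practice the join would run over millions of records and the result would be aggregated by road segment and time of day before being fed into the dispersion model, but the core operation is this kind of lookup.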

The prime reason to be concerned about air quality is the effect that pollution has on health, and Beevers is involved in a major research programme called Traffic, which aims to better understand the health problems caused by vehicle emissions in London. Among other aspects, this will attempt to understand individual exposure to pollution by using anonymised data collected by Transport for London’s Oyster card electronic payment system. “This will show where people spend time, how they move around, what mode of transport they use, at a spatial and temporal resolution,” Beevers said.

Individual exposure

While these examples hint at the power of Big Data to remedy a serious shortcoming of earlier air quality models – that they gave no indication of individual exposure – they also highlight the potential privacy issues. Beevers suggested that existing checks can be useful here, saying, “There are two things you have to do as a researcher, publish and get funding. Without ethics you get neither.”

Given Big Data’s reliance on bringing together disparate data sets that have different owners, another barrier to be overcome is getting people to recognise it is in their self-interest to share. An increasing attrition rate of drugs in development and the flight of R&D from Europe were the spur for such sharing in the pharmaceutical industry, under the umbrella of the EU’s Innovative Medicines Initiative, a €2 billion, ten-year programme that aims to reshape the landscape for pharmaceutical research.

“Companies realised they can’t do it alone; most of the data is outside our four walls and the best ideas are shared,” said Kenny Simmen, Vice President, Infectious Diseases, Research and Early Development at Janssen. By pooling data, companies can not only share the risks, but also maximise the opportunities inherent in a wealth of new target biology coming out of Europe’s universities, along with genomics and other ‘omics data. “IMI is lighting up an amazing array of potential,” Simmen said.

Common architecture for data sharing

IMI is working with companies and academics on the European Medical Information Framework (EMIF), which will provide a common architecture for sharing data, with seven countries having committed to contribute 48 million patient records. An example of how this might be applied is EMIF-AD, which will mine these records to look for links between genes, biomarkers and outcomes in cases of Alzheimer’s disease.

There are of course barriers to such data pooling. “The absolute imperative is the communication of the benefits. We’ve got to bring patients and citizens with us,” Simmen said.

While IMI is making progress in changing the culture and providing the framework for the exploitation of Big Data in pharma, similar work is in hand at the Excellence in Science Unit at DG Connect to make Big Data outputs of research funded by the EU widely available, as Thierry Van der Pyl, Director of the Unit, described. There are two aspects to the Commission’s policy: first, making sure research published in journals can be accessed without a subscription under open access rules, and second, making sure there is access to the actual data.

However, there is a fine line to be navigated between the moral imperative of allowing taxpayers free access to research they have funded and the requirement to protect intellectual property rights that are needed to incentivise investment and commercialisation by industry. While the Commission wants all information generated in Horizon 2020 R&D projects to be freely available, there will be a pilot study to see how to do this. “We need to find the right balance between the legitimate interests of science and being able to reuse data, but not prevent the exploitation of research,” Van der Pyl said.

Promoting openness

In addition to understanding how to promote openness without hampering commercialisation, Horizon 2020 will also need an electronic infrastructure to support free access and sharing, including data storage, data curation, metadata tools for data mining, data analytics and new algorithms.

“The tricky aspect is that this e-infrastructure will be common across disciplines, so how do we avoid reinventing the wheel? We need to get groups together so we don’t keep doing the same thing,” said Van der Pyl. “We won’t be supporting individual communities but a broad platform.”

In parallel with the Commission’s push to create infrastructure for Big Data in Horizon 2020, the Research Data Alliance (RDA) has set the ambitious objective of ensuring interoperability between data generated by all scientific disciplines worldwide. Following its launch in March, RDA is now setting out an infrastructure roadmap and making overtures to countries beyond the founder members of the EU, US and Australia. “We’ve got to make it work seamlessly, like the World Wide Web,” John Wood, Secretary-General of the Association of Commonwealth Universities, who is co-chair of RDA, told delegates.

Here, the biggest barrier to be overcome is to find ways of linking computer scientists with the scientists who are going to use the data. “You need to get a conversation going that is not all about algorithms,” Wood said.

The best way to promote interoperability will be to set an example. “If you show it works, people will start to find ways of doing it. You can store your data however you like, as long as it’s accessible,” said Wood. “If you show examples of interoperability and Big Data working, people will want to do it.”
