New EU text and data mining proposal would not benefit everyone

29 Sep 2016 | News
Rights to freely search journal databases will improve the situation for some public sector researchers, but excluding start-ups from the proposal means it will be harder for them to grow and raise funds from investors

The European Commission’s plans to open up the use of text and data mining are good in part, but do not go far enough to address the shortfall in access to publicly funded research, according to researchers who gathered in Brussels this week to reflect on the new EU proposal.

Those present broadly agreed that the proposal would improve the situation in Europe overall, but not for everyone.

The scale of under-utilised resource was highlighted by Stelios Piperidis, a researcher at Greece’s Institute for Language and Speech Processing, who noted that one new academic paper is published every 30 seconds. “Over half are never read by anyone except for their authors and referees,” he said. “Almost 90 per cent go uncited.”

A typical database search performed by a biomedical student returns about 80,000 hits. There are 70,000 papers published on a single protein, the tumour suppressor p53.

Self-evidently, scientists have no hope of getting through all this data without sophisticated combing tools. But to date, the use of programs that can extract data from thousands of publications has been tightly controlled by journal publishers in Europe on copyright grounds.

Now, a new EU rule, announced at the start of the month, proposes to give researchers a freer rein to use computer programs to data mine research papers.

Researchers at the meeting feel a lot of ground has been lost. Use of mining in Europe is significantly lower than in the US and Asia, most probably due to current limitations imposed by European rules.

“In all EU research projects since 2007, just 3 per cent of projects used text and data mining,” said Maria Eskevich, a post-doctoral researcher at the Centre for Language and Speech Technology at Radboud University in the Netherlands.

The new law will give universities, research institutes and research-performing companies greater legal certainty. But start-ups are excluded from the new proposals. As a result, “it will be harder for them to grow and raise funds from investors,” said Lenard Koschwitz, director of European affairs with Allied for Startups, a lobby group. 

“If you can’t use these technologies, you can’t keep up with companies from other parts of the world where there’s no legal restrictions. There’s added legal liability for those start-ups in Europe that have been doing text mining already. Small companies don’t usually have big legal departments to protect them,” Koschwitz said.

The Commission has justified limiting mining rights on the grounds that extra requests could make publishers’ websites slower for everyone to use.

Hardly any companies are using text mining in Europe at the moment – just over 1 per cent, according to data from FutureTDM, an EU-backed project which promotes mining techniques in Europe.

Mediately, a Slovenian start-up created in 2011, is one of the few that does. The company gives people everything they need to know about a prescription medicine: what it is for, the appropriate doses, how to take it, and the side effects. Its websites and apps are used by over 35,000 doctors, nurses and other medical professionals in Slovenia, the Czech Republic, Slovakia, Serbia and Croatia.

Mediately struggles to get permission to use programs that extract data. In Europe, not all countries provide the information it needs for free. Those that do attach all kinds of terms and conditions. Mediately’s competitors in the US, meanwhile, get all the information they need from the Food and Drug Administration.

Other shortcomings

Letting businesses in on computer crawling would lead to innovation in mining tools, researchers argued. Technical barriers are blocking the wider use of mining too.

Software can speed-read and synthesise thousands of pages, but at the moment, “The download rate is extremely limiting,” said Sophia Ananiadou, professor in the school of computer science at Manchester University and chair of the only publicly-funded national text mining centre in the world. “One document every 20 seconds sounds good – but it would take 12 years to download 20 million documents at that rate,” she said.
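Ananiadou’s figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and the 365.25-day year are assumptions made here, not something presented at the meeting, and it assumes a single uninterrupted download stream.

```python
# Rough check of the download-time estimate quoted above.
# Assumption: one continuous stream, 365.25-day years.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def download_time_years(num_documents: int, seconds_per_document: float) -> float:
    """Return the years needed to fetch num_documents at a fixed rate."""
    return num_documents * seconds_per_document / SECONDS_PER_YEAR


# 20 million documents at one document every 20 seconds
years = download_time_years(20_000_000, 20)
print(f"{years:.1f} years")
```

Under these assumptions the result is a little over 12 and a half years, consistent with the roughly 12 years quoted.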

Another obstacle for European researchers is that current tools have a strong language bias. “Text and data mining software is good for English, but not for other languages in Europe,” said Eskevich. “If you want something multi-lingual, the costs are much more.”

Start-ups are not the only ones losing out, noted Lucie Guibault, associate professor at the Institute for Information Law at the University of Amsterdam. “With new EU proposals, you eliminate museums and public libraries from benefiting from the exception. Same with citizen science,” she said.

Data-driven journalism is not exempted in the European Commission’s proposal either. “We wouldn’t have been able to make sense of Wikileaks if it wasn’t for journalists using mining tools,” said Susan Reilly, director of LIBER, an association of research libraries.  
