Vote on copyright will not deliver on text and data mining

25 Jun 2015 | Viewpoint
Europe needs a mandatory exception to let scientists mine publicly funded research locked up in online databases of academic content. The recent EU Parliamentary vote does not provide this, says Paul Ayris, Director of Library Services at University College London.

Earlier in June the European Parliament passed a vote on recommendations for amending copyright law, agreeing to a much-debated text and data mining element of the European Commission’s proposal for EU copyright reform.

But while some provision on text and data mining has been allowed, the vote is nowhere near strong enough. The final text was neutered, believes Kurt Deketelaere, secretary-general of the League of European Research Universities (LERU). “The mandatory exception for text and data mining - which several MEPs suggested in their amendments - has now been reduced to the need, ‘to properly assess the enablement of automated analytical techniques for text and data’,” Deketelaere said.

Small wonder that academic publishers welcomed the Parliamentary vote, with Duncan Campbell, director of journal digital licensing at John Wiley & Sons, telling Science|Business, “It is true that in general publishers do not want a change to the law. Copyright reform adds uncertainty.”

There seems to be not one line in the sand, but three – those organisations and individuals that do not want a change in the law, representatives of the European Parliament who think that the recent vote is a triumph, and others who argue for further legal reform. Who is right?

Links and meanings

The growth in the power of digital delivery, the reach of global networks and new ways of conducting and disseminating research outputs (such as through the open science movement) lie at the heart of the debate about data mining. Think how sensational it would be for a researcher to be able to find links and meanings in hundreds of journal articles and conference proceedings, and in the data which underpin these publications.

This is what data mining, using computer programs to speed-read and interpret thousands of pieces of academic content, will allow. It will place a powerful intellectual repository in the hands of researchers working to address challenges such as food security, the rising tide of chronic disease and climate change.

Such new analytical methods have the potential to revolutionise how research is conducted. 

Campbell maintains that, “Copyright reform adds uncertainty.” However, studies show that in jurisdictions around the world which have made legal accommodations for data mining, the level of activity is greater than in Europe, where legal uncertainty discourages this new approach to research.

Nor is it true that the recent EU Parliamentary vote is a triumph for those who want to liberalise text and data mining. As Deketelaere says, the text of the original Parliamentary resolution has been so watered down that it is now close to meaningless.

What is needed is a mandatory, cross-Europe exception allowing individuals to text and data mine research, one which cannot be overridden by contract and which applies to all materials to which an individual has legal access.

Publishing companies that do not want the current EU copyright framework to be changed point to licensing as the answer, claiming it would be easy for researchers to comply. But as the research charity the Wellcome Trust has shown, a researcher wanting to mine papers which include the word ‘malaria’ would need to contact 1,024 journals at a cost (in terms of time spent) of £18,630, or 62 per cent of a working year.

Licensing complications

Researchers can currently read online all papers to which they have legal access. What they want to do now is to develop their own text and data mining tools. They do not want to rely on third party tools, but to create their own methods that match specific research requirements, according to Cambridge University chemist Peter Murray-Rust, who is scathing about publishers’ attempts to mediate content mining to researchers.

“There is no indication of how current the material will be [in a publisher-mediated service]. I shall be mining the literature an hour after it appears. Will the [publisher’s] API [application programming interface] provide that?” Murray-Rust asks.

Campbell claims that academic publishers have “got their act together” on text and data mining. However, studies in the UK show that the framework for content mining is far from adequate. Of the 15 publishers in the scheme under which universities negotiate with publishers for access to electronic content, 11 have clauses permitting text and data mining. Of the 11, seven permit it using one type of model licence, four using other types of licence. This clearly complicates matters for researchers and institutions. And the four remaining publishing companies which do not have a stated policy account for 42 per cent of content; in terms of overall expenditure they represent 72 per cent.

The UK has an exception for text and data mining which blazes the trail for the rest of Europe. What is needed in Europe is not words, but action in the form of a mandatory, cross-Europe exception for content mining, which cannot be overridden by contract, and which embraces all materials to which a user has legal access.

The recent vote in the Parliament does not provide this. With respect, the EU Parliament needs to think again.

Paul Ayris is former President of LIBER (Association of European Research Libraries), which recently adopted the Hague Declaration on text and data mining; Adviser to the LIBER Board on EU matters and Horizon 2020; and co-chair of the League of European Research Universities’ Chief Information Officers group.
