There is a treasure trove of unseen patterns and associations strewn across academic journals and joining these dots could lead to new discoveries. The tools to do it are there in computer programmes that can speed-read and synthesise thousands of pages of academic literature.
The problem is that often the content which researchers want to mine is owned by a number of rights holders. A scientist wanting to data mine papers about malaria would need to refer to 1,024 journals, for example. And without changes to statute books, publishers across Europe can claim a right to grant or refuse the mining of their works on the basis of copyright law, EU database protection law, and intellectual property law.
Things could be about to change. The European Commission recently said it would make legislative proposals to “harmonise exceptions for the cross-border use of content for specific purposes such as research, education and text and data mining,” by the autumn.
This is a hot issue for the open access movement, whose members say publishers are resisting change.
It is true that in general publishers do not want a change to the law, says Duncan Campbell, director of journal digital licensing at John Wiley & Sons' global research division. “Copyright reform adds uncertainty,” he says.
Even if computer-based text crawling were to be exempted from copyright law, the door to data mining would not swing open. “It doesn’t give you the means to [perform text mining]. There are still technical barriers,” says Campbell.
Publishers have a few particular concerns. For example, in some instances text mining could provide cover for swiping intellectual property and charging for it somewhere else.
There are also fears that publishers’ computer systems would crumple under the weight of researcher traffic.
And in the absence of a standard cross-platform format, machine-reading challenges will still exist, Campbell said. “It doesn’t magically make XML [the computer language] different. Normalisation of formats is not something that can be achieved with law change.”
As proof of this, Campbell pointed to the UK, which is the only EU member state to allow text and data mining for non-commercial use. There has not been a surge in interest since the law changed last year. “Crawling traffic is flat,” Campbell said.
Campbell could not specify what share of Wiley’s business would be affected by a new Europe-wide law, saying, “I’d hesitate to put a figure on it.”
There is room to reach an accommodation without lawmakers stepping in, Campbell says.
Wiley and other publishing companies, including Elsevier, currently provide permission to text-mine without legal restrictions to academics and university libraries through special licences.
However, licensing has become a bête noire for researchers, who claim it involves a tedious hopscotch tour of different rights holders. The UK’s Wellcome Trust calculated that a researcher wanting to mine papers which include the word malaria would need “to contact 1,024 journals at a cost (in terms of time spent) of £18,630; 62% of a working year.”
Campbell counters what he calls “the myths” surrounding the service. “There’s a view going around that [licences are] overly complex and involve additional fees. We’re trying to make it as simple as possible,” he said.
Publishers are further lowering barriers to their academic stock through the non-profit publisher collaboration service CrossRef. Major publishers have signed up, or are in the process of doing so. The service promises a single log-in to access journals in several different systems. “CrossRef is a pretty straightforward solution,” said Campbell.
In the future, Wiley will divert all researchers who want to text mine its four million-plus articles to CrossRef. However, Campbell noted, the demand for licences is lower than the hype might suggest. “We’ve probably had five requests in the last year.”
The tension between researchers and publishers on copyright is increasing. In May 2013 a group of researchers and librarians walked out of EU talks on how to develop a better regime on text mining because, they said, only the licensing approach was being discussed.
“More liberal voices were probably drowned out in the debate,” Campbell said.
He would like to draw a line in the sand though. “We’d like to reset the discussion; bring it back to a less adversarial level. We haven’t always helped ourselves in the past but now we’re more responsive to the needs of researchers. The industry has really got its act together,” he said.