Dear Commissioner Moedas,
I am an academic at the University of Cambridge who is determined to see published scientific knowledge brought to citizens.
I was inspired by your speech in Amsterdam on Monday, where you wholeheartedly promoted the open science agenda for Europe. I was especially delighted to see your praise for the European Bioinformatics Institute, which hosts Europe PubMed Central, a collection of the world’s published biomedical literature.
I support your whole vision, but wish specifically to urge the unrestrained development of published science and of content mining, in which researchers use software to harvest facts and data from academic papers in the hope of cracking intractable research problems.
It is critical to reform copyright law in Europe and it must go beyond the UK’s 2014 legislation, which gave UK-based researchers the right to perform non-commercial mining of research.
I belong to what is probably one of only two UK groups to make use of this exemption, because it is heavily weighted against researchers.
To work, it depends on universities allowing their staff to mine data without needing permission from publishers. My anecdotal evidence is that many libraries give in to publishers and sign restrictive contracts that regulate academic access and thereby negate the law.
We then have the problem of publishing the results of data mining exercises, as this may breach copyright. The UK law allows freedom of quotation, but this is untested.
In short, we must have legal clarity. Changing the law is not enough; we must change hearts and minds.
Involving citizens
There are not enough academics actively working with citizens, yet it is critical that science is equally available to conservationists, doctors, policy makers, schools and patient groups. Please find ways of actively involving citizens outside academia.
Access to science must not be controlled, however lightly, through the current publishers. There has been massive lobbying by the rights holders against reform of content mining.
Arguments against allowing researchers to mine data start with claims the practice will break servers. However, I can mine Cambridge University’s whole daily scientific literature on my laptop in an hour. This is probably less than one millionth of the daily accesses made by other subscribers for other uses.
Publishers also claim there is no demand from scientists for mining, but in truth access is made so difficult that nobody asks for it. It is also said that special publisher application programming interfaces are needed, when in fact our software can scrape publishers’ sites directly.
Publishers create another barrier by saying only experts can use the software. Our software is open for anyone to use and we’d be delighted if you and other Commission staff wish to see how accessible it is.
Finally, there is the claim that some academics will use mining as a way to steal content. I am a responsible citizen and have no intention of making copyrighted content available illegally.
We are in an unequal battle. I have watched publishers spend millions of euros on watering down proposals put forward by German MEP Julia Reda, who produced an “own initiative report” on copyright reform in June last year, and dilute and delay any reform put forward by the European Commission.
To redress the balance, Commissioner, I’m offering to come to Brussels and demonstrate on my (or your) laptop the value of text and data mining for open science in Europe.
Peter Murray-Rust is Reader Emeritus at Cambridge University. He runs Contentmine, a non-profit project which uses machines “to liberate 100,000,000 facts from the scientific literature”. His original open letter, which has been adapted for publication by Science|Business, is available on his blog; he also tweets about open science.