The Fraunhofer Institute for Algorithms and Scientific Computing (SCAI) reported the successful completion of a pilot project to mine scientific literature published in Chinese. The research, conducted at the biopharmaceuticals company Merck Serono in Geneva, a division of Merck KGaA in Darmstadt, was a feasibility study to evaluate how far current text mining technology can support automated information extraction from Chinese text sources.
In the project, ProMiner, named entity recognition software developed at Fraunhofer SCAI, was adapted to the specific requirements of text mining Chinese biomedical and pharmaceutical literature. At present, most commercial text mining technology is able to analyse English text, and some systems can also analyse German and French.
Now, with the steep increase in Chinese scientific output and the ever-growing importance and attractions of the Chinese market to Western companies, the ability to automatically analyse Chinese unstructured information sources is becoming crucial for gathering scientific and competitive intelligence and for following what happens in China.
While the pilot system is able to mine Chinese literature for biomedical terms with performance similar to that of systems for searching English text, “The challenge of Chinese Text Mining cannot be regarded as being solved,” according to Juliane Fluck, Head of the Text Mining Team at Fraunhofer SCAI.
“We have just demonstrated that we are able to mine the Chinese biomedical scientific literature automatically. The real work – which is aiming at providing all functionalities needed for true knowledge discovery from Chinese unstructured text sources – starts now, after the proof-of-principle,” Fluck said.
The next step will see collaboration on the project extended to another Fraunhofer Institute: the Fraunhofer Institute for Systems and Innovation Research, which has strong ties to China and specialises in monitoring Chinese research, innovation and markets.
In March this year the European Patent Office signed an agreement with search engine giant Google under which the two are setting up a machine translation service for patents. While this covers many other languages, part of the motivation is to improve access to the rapidly increasing volumes of Chinese patents.