Spain: New system for transcribing ancient documents

22 Jul 2009 | News


The Computational Perception and Learning Research Group in the Computer Languages and Systems Department at the Universitat Jaume I, working in collaboration with the Universidad Politécnica de Valencia, has developed a new system for the transcription of written text, called State, which aims to speed up the recovery and preservation of ancient documents and manuscripts.

Traditional optical character recognition (OCR) systems introduce transcription errors, so the resulting text has to be edited afterwards. State includes image processing tools that remove ‘noise’ and clean up the original image. Once the page has been scanned, mistakes can be corrected quickly and easily with interactive tools, such as an electronic pen applied directly to the text.

Andrés Marzal, one of the researchers in the project, explains, “It is a practical solution to the problem of a supervised transcription, since it shortens the most time-consuming phase, that is, editing the automatic transcription so that it is true to the original.”

The researchers say State makes it possible to save up to 50 per cent of the time taken to transcribe and correct ancient texts and manuscripts.

The prototype, currently in alpha, was recently installed in the Miguel de Cervantes Virtual Library and will also be used in the Jaume I Archive to transcribe ancient documents.
