17 Feb 2010   |   News

Edinburgh: Pronunciation lexicon for text-to-speech synthesis

Licensing opportunity

Researchers at Edinburgh University have developed a keyword-based pronunciation lexicon for use in text-to-speech synthesis (voice-building or run-time synthesis) and in speech recognition systems.

The lexicon, called Combilex, is available in three versions: Received Pronunciation English, General American, and Scottish English. Each lexicon contains around 145,000 entries, including the 20,000 most frequent words and many proper names, and provides a variety of linguistic information alongside detailed pronunciations.

Combilex is distributed as an ASCII text file with one entry per line. It includes full, manually annotated orthographic-phonemic correspondences, from which accurate grapheme-to-phoneme rules can be derived.
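
As a rough illustration of why aligned orthographic-phonemic correspondences make grapheme-to-phoneme rules derivable, the minimal Python sketch below counts, for each letter, which phoneme it maps to across a handful of entries and keeps the most frequent mapping. The alignments, the SAMPA-style symbols and the function names `derive_rules` and `apply_rules` are hypothetical and not taken from Combilex; practical letter-to-sound systems also look at surrounding letters rather than single letters in isolation.

```python
from collections import Counter, defaultdict

# Hypothetical hand-aligned entries (not from Combilex): each word is a list of
# (grapheme, phoneme) pairs in SAMPA-style symbols; "" marks a silent letter.
ALIGNED = {
    "cat":  [("c", "k"), ("a", "{"), ("t", "t")],
    "city": [("c", "s"), ("i", "I"), ("t", "t"), ("y", "i")],
    "cot":  [("c", "k"), ("o", "Q"), ("t", "t")],
    "knot": [("k", ""), ("n", "n"), ("o", "Q"), ("t", "t")],
}


def derive_rules(aligned):
    """Map each grapheme to the phoneme it most often corresponds to."""
    counts = defaultdict(Counter)
    for pairs in aligned.values():
        for grapheme, phoneme in pairs:
            counts[grapheme][phoneme] += 1
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}


def apply_rules(word, rules):
    """Naive letter-by-letter conversion for an out-of-vocabulary word.

    Letters never seen in the aligned data are rendered as "?".
    """
    return "".join(rules.get(letter, "?") for letter in word)


rules = derive_rules(ALIGNED)
print(rules["c"])                  # k  ("c" is /k/ in 2 of 3 occurrences)
print(apply_rules("knit", rules))  # nIt
```

Applied to the unseen word "knit", these naive rules give `nIt`, because the only "k" in the sample data, in "knot", was silent.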

The system contains a rich specification for each word, covering pronunciation (with variants), part-of-speech tags, morphological boundaries, full correspondence between orthography and pronunciation, and semantic information where available.
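
One way to picture a per-word record holding some of these fields is sketched below in Python. The tab-separated layout, the `+` morpheme separator, the `;` variant separator, the part-of-speech labels and the `Entry` and `parse_entry` names are all assumptions made for illustration; the article does not describe Combilex's actual field layout. Pronunciations are written in SAMPA.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    """Hypothetical in-memory record for one lexicon line."""
    word: str
    pos: str                   # part-of-speech tag
    morphemes: List[str]       # word split at morphological boundaries
    pronunciations: List[str]  # one or more pronunciation variants


def parse_entry(line: str) -> Entry:
    """Parse an assumed format: word<TAB>pos<TAB>morph+emes<TAB>pron1;pron2."""
    word, pos, morph, prons = line.rstrip("\n").split("\t")
    return Entry(word, pos, morph.split("+"), prons.split(";"))


either = parse_entry("either\tDT\teither\ti:D@;aID@")
unhappy = parse_entry("unhappy\tJJ\tun+happy\tVnh{pi")
print(either.pronunciations)  # ['i:D@', 'aID@']
print(unhappy.morphemes)      # ['un', 'happy']
```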

The researchers report an accuracy of more than 86 per cent and describe Combilex as accent-independent.

The system is implemented as a database, allowing compact representations of word-forms, their morphological derivations, compounds and cross-references. Transcriptions include a phonemic-orthographic link, which supports the development of letter-to-sound rules for out-of-vocabulary words. The transcriptions use a meta-symbol set, which may be converted by rule into appropriate forms for various accents.
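
The idea of converting accent-neutral meta-symbols by rule can be sketched as follows, under clearly hypothetical assumptions: the meta-symbols `A_BATH`, `A_START` and `A_TRAP`, the accent labels and the single rhoticity rule are invented for this example and are not Combilex's actual symbol set or rules. The sketch maps each meta-vowel to an accent-specific SAMPA vowel and, for non-rhotic Received Pronunciation, drops /r/ unless a vowel follows.

```python
# Hypothetical meta-symbols and accent rules, for illustration only.
ACCENT_MAP = {
    "RP":       {"A_BATH": "A:", "A_START": "A:", "A_TRAP": "{"},
    "GenAm":    {"A_BATH": "{",  "A_START": "A",  "A_TRAP": "{"},
    "Scottish": {"A_BATH": "a",  "A_START": "a",  "A_TRAP": "a"},
}
RHOTIC = {"RP": False, "GenAm": True, "Scottish": True}
VOWELS = {"A_BATH", "A_START", "A_TRAP", "i", "I", "@", "e"}


def realise(tokens, accent):
    """Convert a meta-symbol transcription into one accent's SAMPA string."""
    out = []
    for i, tok in enumerate(tokens):
        if tok == "r" and not RHOTIC[accent]:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt not in VOWELS:
                continue  # non-rhotic accent: drop /r/ unless a vowel follows
        # Meta-symbols are replaced; ordinary symbols pass through unchanged.
        out.append(ACCENT_MAP[accent].get(tok, tok))
    return "".join(out)


bath = ["b", "A_BATH", "T"]
car = ["k", "A_START", "r"]
carry = ["k", "A_TRAP", "r", "i"]
for accent in ("RP", "GenAm", "Scottish"):
    print(accent, realise(bath, accent), realise(car, accent), realise(carry, accent))
```

With these rules, "bath" comes out as `bA:T` (RP), `b{T` (General American) and `baT` (Scottish English), while the /r/ of "car" survives only in the rhotic accents. Linking r across word boundaries is ignored in this simplified sketch.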

Edinburgh University is seeking interest from commercial organisations to license this technology on a non-exclusive basis.

For more information, see the project’s page at: http://www.university-technology.com/details/combilex---a-keyword-based-pronunciation-lexicon
