PUBLIC MARKS with tags speech & sphinx

April 2007

Donate your speech to VoxForge using your telephone

by kmaclean
VoxForge ( http://www.voxforge.org ) is an open source project that collects speech recordings for use in the creation of Acoustic Models. Speech recognition engines need an acoustic model to recognize speech. To create an acoustic model, you take a very large number of speech audio recordings and 'compile' them into statistical representations of the sounds that make up each word. Most open source speech recognition engines use 'closed source' acoustic models. VoxForge hopes to address this problem by creating a free GPL speech corpus and generating acoustic models from that corpus. You can now use your telephone to donate your speech. Click this link: http://www.voxforge.org/home/s… to get the number, and the Interactive Voice Response system will guide you through the process.

February 2007

Julius Open-Source Large Vocabulary Speech Recognition Engine

by kmaclean
Julius is an open source speech recognition engine. It is a two-pass large vocabulary continuous speech recognition (LVCSR) software decoder that can perform almost real-time decoding on most current PCs in a 20k-word dictation task. Major search techniques are fully incorporated. It is also carefully modularized to be independent of model structures, and various HMM types are supported, such as shared-state triphones and tied-mixture models, with any number of mixtures, states, or phones. Standard formats are adopted so that it can work with other free modeling toolkits. The main platforms are Linux and other Unix workstations, and it also works on Windows. Julius is distributed under a revised BSD-style license. Julius uses acoustic models in HTK ASCII format, a pronunciation dictionary in HTK-like format, and word 3-gram language models in ARPA standard format (a forward 2-gram and a reverse 3-gram, the latter trained from the corpus with reversed word order). Although Julius is only distributed with Japanese models, the VoxForge project (www.voxforge.org) is working on creating English Acoustic Models for use with the Julius Speech Recognition Engine.
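
To make the "forward 2-gram and reverse 3-gram" idea concrete, here is a minimal Python sketch that counts n-grams over a tiny made-up corpus, once in normal word order and once with each sentence reversed. It only shows where the two sets of counts come from. It does not produce an ARPA file, applies no smoothing, and is not how Julius or any particular toolkit trains its models. The function name ngram_counts, the sample sentences, and the handling of the <s> and </s> boundary markers are all assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(sentences, n, reverse=False):
    """Count n-grams over tokenized sentences (hypothetical helper).

    With reverse=True each padded sentence is reversed first, which
    mimics training on a corpus with reversed word order.
    """
    counts = Counter()
    for words in sentences:
        # Pad with sentence-boundary markers (an illustrative convention).
        tokens = ["<s>"] + list(words) + ["</s>"]
        if reverse:
            tokens = tokens[::-1]
        # Slide a window of width n across the token sequence.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Made-up two-sentence corpus, for illustration only.
corpus = [["open", "the", "door"], ["open", "the", "window"]]

forward_bigrams = ngram_counts(corpus, 2)
reverse_trigrams = ngram_counts(corpus, 3, reverse=True)

for gram, count in sorted(forward_bigrams.items()):
    print("forward 2-gram", gram, count)
for gram, count in sorted(reverse_trigrams.items()):
    print("reverse 3-gram", gram, count)
```

A real language model would turn such counts into smoothed probabilities with back-off weights before writing them out in ARPA format.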

Improving Open Source Speech Recognition

by kmaclean
Speech Recognition Engines require two types of files to recognize speech: an Acoustic Model, created by 'compiling' a lot of transcribed speech into statistical models, and a Language Model (for Dictation) or Grammar file (for Command and Control). Most Acoustic Models used by 'Open Source' Speech Recognition engines are 'Closed Source': they do not give you access to the speech audio (the 'Source') used to create the Acoustic Model. The reason is that there is no free Speech Corpus in a form that can readily be used to create Acoustic Models for Speech Recognition Engines. Open Source projects are thus required to purchase a Speech Corpus with restrictive licensing in order to create their Acoustic Models. VoxForge (http://www.voxforge.org) was set up to address this problem. The site collects GPL-licensed transcribed speech audio from users, which is then used to create Acoustic Models. These can then be used with Free and Open Source Speech Recognition Engines such as Sphinx, ISIP, Julius and/or HTK.
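
As a rough illustration of what 'compiling' transcribed speech into statistical models means, the toy Python sketch below estimates a mean and variance for each labeled sound from a handful of made-up one-dimensional feature values, then picks the most likely sound for a new value. This is a deliberate oversimplification: real acoustic models are HMMs over multi-dimensional features, trained with toolkits such as HTK. The function names, sound labels, and numbers are all hypothetical.

```python
import math
from statistics import mean, pvariance


def train_toy_model(labeled_samples):
    """Estimate (mean, variance) per sound label from (label, value) pairs."""
    grouped = {}
    for sound, value in labeled_samples:
        grouped.setdefault(sound, []).append(value)
    return {s: (mean(v), pvariance(v)) for s, v in grouped.items()}


def log_likelihood(value, mu, var):
    """Log density of a one-dimensional Gaussian with mean mu and variance var."""
    return -0.5 * (math.log(2 * math.pi * var) + (value - mu) ** 2 / var)


def classify(model, value):
    """Return the sound label whose Gaussian gives the value the highest likelihood."""
    return max(model, key=lambda s: log_likelihood(value, *model[s]))


# Made-up "transcribed" samples: (sound label, single feature value).
samples = [("ah", 1.0), ("ah", 1.2), ("ah", 0.8),
           ("ee", 3.0), ("ee", 3.4), ("ee", 2.6)]

model = train_toy_model(samples)
print(classify(model, 1.1))
print(classify(model, 2.9))
```

The point of the sketch is only that the model keeps statistics derived from the audio rather than the audio itself, which is why access to the underlying recordings matters for anyone who wants to rebuild or improve the model.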
