public marks

PUBLIC MARKS with tags speech & recognition

February 2007

Julius Open-Source Large Vocabulary Speech Recognition Engine

by kmaclean
Julius is an open source, two-pass large vocabulary continuous speech recognition (LVCSR) decoder. It can perform near-real-time decoding on most current PCs for a 20k-word dictation task. Major search techniques are fully incorporated. It is carefully modularized to be independent of model structures, and it supports various HMM types, such as shared-state triphones and tied-mixture models, with any number of mixtures, states, or phones. It adopts standard formats for interoperability with other free modeling toolkits. The main platforms are Linux and other Unix workstations, and it also works on Windows. Julius is distributed under a revised BSD-style license. It uses acoustic models in HTK ASCII format, a pronunciation dictionary in an HTK-like format, and word 3-gram language models in ARPA standard format (a forward 2-gram and a reverse 3-gram, the latter trained from the corpus with reversed word order). Although Julius is distributed only with Japanese models, the VoxForge project (www.voxforge.org) is working on creating English Acoustic Models for use with the Julius Speech Recognition Engine.
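To make the "forward 2-gram and reverse 3-gram" idea above concrete, here is a minimal Python sketch that counts forward bigrams from a toy corpus and trigrams from the same corpus with each sentence's word order reversed. It is only an illustration: the function names, the toy corpus, and the `<s>`/`</s>` boundary padding are assumptions, not part of Julius, and a real language-model toolkit would also apply smoothing and back-off before writing an ARPA file.

```python
from collections import Counter

def reversed_sentences(sentences):
    """Yield each sentence as a token list with its words in reverse order."""
    for sentence in sentences:
        yield list(reversed(sentence.split()))

def count_ngrams(token_lists, n):
    """Count n-grams; boundary markers here are an illustrative assumption."""
    counts = Counter()
    for tokens in token_lists:
        padded = ["<s>"] + tokens + ["</s>"]
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return counts

# Hypothetical toy corpus, one sentence per string.
corpus = ["open the window", "close the window"]

# Forward 2-gram counts come from the corpus as written.
forward_bigrams = count_ngrams((s.split() for s in corpus), 2)

# Reverse 3-gram counts come from the same corpus with reversed word order.
reverse_trigrams = count_ngrams(reversed_sentences(corpus), 3)
```

In this sketch the reverse trigram counts are keyed by words in right-to-left order, so `("window", "the", "open")` corresponds to the original phrase "open the window".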

Improving Open Source Speech Recognition

by kmaclean
Speech Recognition Engines require two types of files to recognize speech: an Acoustic Model, created by 'compiling' a large amount of transcribed speech into statistical models, and a Language Model (for Dictation) or Grammar file (for Command and Control). Most Acoustic Models used by 'Open Source' Speech Recognition Engines are 'Closed Source': they do not give you access to the speech audio (the 'Source') used to create the Acoustic Model. The reason is that there is no free Speech Corpus in a form that can readily be used to create Acoustic Models for Speech Recognition Engines. Open Source projects are thus required to purchase a Speech Corpus with restrictive licensing in order to create their Acoustic Models. VoxForge (http://www.voxforge.org) was set up to address this problem. The site collects GPL-licensed transcribed speech audio from users, which is then used to create Acoustic Models. These can then be used with Free and Open Source Speech Recognition Engines such as Sphinx, ISIP, Julius and/or HTK.

