CMU Sphinx
Sphinx is of historical interest only; it has been superseded in performance by subsequent versions.
Sphinx 2
A fast, performance-oriented recognizer, originally developed by Xuedong Huang at Carnegie Mellon and
released as open source with a BSD-style license on SourceForge by Kevin Lenzo at LinuxWorld in
2000. Sphinx 2 focuses on real-time recognition suitable for spoken-language applications. As such, it
incorporates functionality such as end-pointing, partial hypothesis generation, and dynamic language
model switching. It is used in dialog systems and language learning systems, and it can be used in
computer-based PBX systems such as Asterisk. Sphinx 2 code has also been incorporated into a number
of commercial products. It is no longer under active development (other than for routine maintenance);
current real-time decoder development is taking place in the PocketSphinx project.[3]
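End-pointing means deciding where an utterance starts and stops in the incoming audio, so that the
decoder only works on speech. The Python sketch below illustrates the idea with a simple energy
threshold; it is not Sphinx 2's actual algorithm, and the function names, the dB threshold, and the
`min_speech`/`hangover` frame counts are hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Tuple

# Illustrative energy-threshold endpointer; names and parameters are hypothetical,
# not taken from Sphinx 2.


def frame_energies_db(samples: List[float], frame_len: int) -> List[float]:
    """Log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # small floor avoids log10(0)
    return energies


def find_endpoints(energies: List[float], threshold_db: float,
                   min_speech: int = 3, hangover: int = 5) -> Optional[Tuple[int, int]]:
    """Return (first, last) speech frame indices, or None if no speech is found.

    Speech starts once `min_speech` consecutive frames exceed the threshold and
    ends after `hangover` consecutive frames fall below it.
    """
    start = None
    last_speech = None
    run = 0      # consecutive loud frames while waiting for speech to start
    silence = 0  # consecutive quiet frames once speech has started
    for i, e in enumerate(energies):
        if start is None:
            run = run + 1 if e > threshold_db else 0
            if run == min_speech:
                start = i - min_speech + 1
                last_speech = i
        elif e > threshold_db:
            last_speech = i
            silence = 0
        else:
            silence += 1
            if silence >= hangover:
                break  # enough trailing silence: the utterance has ended
    if start is None:
        return None
    return start, last_speech


# Usage: 10 silent frames, 10 loud frames, 10 silent frames (160 samples each).
signal = [0.0] * 1600 + [0.5] * 1600 + [0.0] * 1600
print(find_endpoints(frame_energies_db(signal, 160), threshold_db=-40.0))
```

The hangover rule keeps short pauses inside an utterance from cutting it off, and the `min_speech`
rule ignores brief clicks; a real-time recognizer would apply such a test frame by frame as audio
arrives rather than over a finished buffer.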
Sphinx 3
Sphinx 2 used a semi-continuous representation for acoustic modeling (i.e., a single set of Gaussians is
used for all models, with individual models represented as a weight vector over these Gaussians). Sphinx
3 adopted the prevalent continuous hidden Markov model (HMM) representation, in which each state has its
own Gaussian mixture, and has been used primarily for high-accuracy, non-real-time recognition. Recent
developments (in algorithms and in hardware) have made Sphinx 3 "near" real-time, although it is not yet
suitable for critical interactive applications. Sphinx 3 is under active development and, in
conjunction with SphinxTrain, provides access to a number of modern modeling techniques, such as
LDA/MLLT, MLLR and VTLN, that improve recognition accuracy (see the article on speech recognition for
descriptions of these techniques).
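The difference between the semi-continuous and continuous representations shows up in how a state's
output likelihood is scored. The Python sketch below assumes diagonal-covariance Gaussians and uses
hypothetical function names; it illustrates the general technique and is not code from Sphinx 2 or
Sphinx 3. In the semi-continuous case every shared Gaussian is evaluated once per frame and reused by
all states, each of which contributes only a weight vector; in the continuous case each state owns its
own weighted Gaussians.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch; function names are hypothetical, not Sphinx APIs.
# A diagonal-covariance Gaussian: (mean vector, variance vector).
Gaussian = Tuple[List[float], List[float]]


def log_gaussian(x: Sequence[float], g: Gaussian) -> float:
    """Log density of feature vector x under a diagonal-covariance Gaussian."""
    mean, var = g
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def score_states_semi_continuous(x: Sequence[float],
                                 codebook: List[Gaussian],
                                 state_weights: List[List[float]]) -> List[float]:
    """Semi-continuous: one shared codebook; each state is only a weight vector."""
    # Each shared Gaussian is evaluated once per frame...
    log_dens = [log_gaussian(x, g) for g in codebook]
    # ...and reused by every state, which only supplies mixture weights.
    return [log_sum_exp([math.log(w) + d for w, d in zip(ws, log_dens) if w > 0])
            for ws in state_weights]


def score_states_continuous(x: Sequence[float],
                            state_mixtures: List[List[Tuple[float, Gaussian]]]) -> List[float]:
    """Continuous: every state owns its own weighted Gaussians."""
    return [log_sum_exp([math.log(w) + log_gaussian(x, g) for w, g in mixture if w > 0])
            for mixture in state_mixtures]
```

Sharing the codebook bounds the number of Gaussian evaluations per frame by the codebook size,
whatever the number of states; the continuous form lets each state model its own distribution more
freely, at a higher computational cost.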
Sphinx 4
Sphinx 4 is a complete rewrite of the Sphinx engine, written entirely in the Java programming language,
with the goal of providing a more flexible framework for research in speech recognition. Sun
Microsystems supported the development of Sphinx 4 and contributed software engineering expertise to
the project. Participants included individuals at MERL, MIT and CMU. (Currently supported programming
languages are C, C++, C#, Python, Ruby, Java, and JavaScript.)
PocketSphinx
A version of Sphinx that can be used in embedded systems (e.g., those based on an ARM processor).
PocketSphinx is under active development and incorporates features such as fixed-point arithmetic and
efficient algorithms for Gaussian mixture model (GMM) computation.
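Fixed-point arithmetic lets a decoder score Gaussians on processors without a fast floating-point unit
by storing values as scaled integers. The Python sketch below evaluates a diagonal-covariance Gaussian
log-density using only integer operations in a hypothetical Q10 format (10 fractional bits).
PocketSphinx's own number formats, scaling, and function names are not described in this article, so
every name and constant here is an assumption made for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical Q10 fixed-point format: value = integer / 2**10.
# Not PocketSphinx's actual representation.
FRAC_BITS = 10
ONE = 1 << FRAC_BITS


def to_fixed(v: float) -> int:
    return int(round(v * ONE))


def to_float(q: int) -> float:
    return q / ONE


def precompute(mean: List[float], var: List[float]) -> Tuple[List[int], List[int], int]:
    """Quantize the model once (in floating point, e.g. offline): means, 0.5/var, constant term."""
    q_mean = [to_fixed(m) for m in mean]
    q_half_inv_var = [to_fixed(0.5 / v) for v in var]
    q_const = to_fixed(-0.5 * sum(math.log(2.0 * math.pi * v) for v in var))
    return q_mean, q_half_inv_var, q_const


def fixed_log_gaussian(q_x: List[int], q_mean: List[int],
                       q_half_inv_var: List[int], q_const: int) -> int:
    """Per-frame scoring with integer operations only; the result is in Q10."""
    acc = q_const
    for xi, mi, hi in zip(q_x, q_mean, q_half_inv_var):
        d = xi - mi                    # Q10
        d2 = (d * d) >> FRAC_BITS      # Q10 * Q10 = Q20, shifted back to Q10
        acc -= (d2 * hi) >> FRAC_BITS  # Q10 * Q10 = Q20, shifted back to Q10
    return acc


q_mean, q_hiv, q_const = precompute([0.0, 1.0], [1.0, 2.0])
score = fixed_log_gaussian([to_fixed(0.5), to_fixed(0.25)], q_mean, q_hiv, q_const)
print(to_float(score))
```

Python integers never overflow, so this sketch does not model the range checks that 32-bit embedded
code would need; the integer result agrees with the floating-point log-density to within the
quantization error of the chosen format.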
See also
Speech recognition software for Linux
List of speech recognition software
Project LISTEN
References
1. http://www.speech.cs.cmu.edu/sphinx
2. Lee, K.-F.; Hon, H.-W.; Reddy, R. (January 1990). "An overview of the SPHINX speech
recognition system" (https://ieeexplore.ieee.org/document/45616). IEEE Transactions on
Acoustics, Speech, and Signal Processing. 38 (1): 35–45. doi:10.1109/29.45616
(https://doi.org/10.1109%2F29.45616).
3. Huang, Xuedong; Alleva, Fileno; Hwang, Mei-Yuh; Rosenfeld, Ronald (1993). "An overview
of the SPHINX-II speech recognition system" (https://dx.doi.org/10.3115/1075671.1075690).
Proceedings of the Workshop on Human Language Technology - HLT '93. Morristown, NJ,
USA: Association for Computational Linguistics: 81. doi:10.3115/1075671.1075690
(https://doi.org/10.3115%2F1075671.1075690). ISBN 1-55860-324-7.
External links
The Sphinx developers now recommend Vosk (https://alphacephei.com/vosk/)
CMU Sphinx homepage (https://cmusphinx.github.io/wiki/)
Sphinx's repository (https://github.com/cmusphinx) on GitHub should be considered the
definitive source for code
SourceForge (http://sourceforge.net/projects/cmusphinx) hosts older releases and files
NeXT on Campus Fall 1990 (https://web.archive.org/web/20170324083105/http://nextstuff.info/mirrors/otto/html/pub/Documents/user-groups/OnCampus/NOCFall90/NOCFall90Text.ps.gz)
(This document is in PostScript format, compressed with gzip.) Carnegie Mellon University -
Breakthroughs in speech recognition and document management, pp. 12–13