CMU Sphinx

CMU Sphinx is a collection of speech recognition systems developed at Carnegie Mellon University, including Sphinx 2 to Sphinx 4 and SphinxTrain. The systems utilize various acoustic models and have been made open-source, with the latest developments focusing on flexibility and advanced recognition techniques. PocketSphinx is a version designed for embedded systems, while Sphinx 4 represents a complete rewrite aimed at research and development in speech recognition.

Sphinx4
Stable release: 5-prealpha / August 3, 2015
Written in: Java
Operating system: Cross-platform
Type: Speech recognition
License: BSD-style[1]
Website: cmusphinx.github.io/wiki/ (https://cmusphinx.github.io/wiki/)

Pocketsphinx
Stable release: 5-prealpha / August 5, 2015
Written in: C
Operating system: Cross-platform
Type: Speech recognition
License: BSD-style
Website: cmusphinx.github.io/wiki/ (https://cmusphinx.github.io/wiki/)

CMU Sphinx, also called Sphinx for short, is the general term for a group of speech recognition
systems developed at Carnegie Mellon University. These include a series of speech recognizers
(Sphinx 2 to 4) and an acoustic model trainer (SphinxTrain).

In 2000, the Sphinx group at Carnegie Mellon committed to open-sourcing several speech recognizer
components, including Sphinx 2 and later Sphinx 3 (in 2001). The speech decoders come with acoustic
models and sample applications. The available resources also include software for acoustic model
training and language model compilation, and a public domain pronunciation dictionary, cmudict.

Sphinx encompasses a number of software systems, described below.

Sphinx
Sphinx is a continuous-speech, speaker-independent recognition system making use of hidden Markov
acoustic models (HMMs) and an n-gram statistical language model. It was developed by Kai-Fu Lee.
Sphinx demonstrated the feasibility of continuous-speech, speaker-independent, large-vocabulary
recognition, the possibility of which was in dispute at the time (1986).[2]

Sphinx is of historical interest only; it has been superseded in performance by subsequent versions.
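
To make the idea of an n-gram statistical language model concrete, the following is a minimal sketch of a bigram (n = 2) model with add-one smoothing. It is an illustration only and is not taken from any Sphinx code base; the function names, the toy corpus and the choice of add-one smoothing are all assumptions made for the example.

```python
from collections import Counter
import math


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenised sentences.

    Each sentence is padded with <s> and </s> boundary markers.
    """
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed log probability of a sentence (hypothetical helper)."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # P(word | prev) with add-one smoothing so unseen pairs are not zero
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


# Toy corpus, invented for illustration
corpus = [["call", "home"], ["call", "the", "office"], ["go", "home"]]
uni, bi = train_bigram(corpus)
seen = log_prob(["call", "home"], uni, bi)
unseen = log_prob(["home", "call"], uni, bi)
```

In a recognizer, a score of this kind is combined with the acoustic (HMM) score so that word sequences resembling the training text, such as "call home" here, are preferred over unlikely orderings such as "home call".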

Sphinx 2
Sphinx 2 is a fast, performance-oriented recognizer, originally developed by Xuedong Huang at
Carnegie Mellon and released as open source under a BSD-style license on SourceForge by Kevin Lenzo
at LinuxWorld in 2000. Sphinx 2 focuses on real-time recognition suitable for spoken-language
applications. As such, it incorporates functionality such as end-pointing, partial hypothesis
generation and dynamic language model switching. It is used in dialog systems and language learning
systems, and it can be used in computer-based PBX systems such as Asterisk. Sphinx 2 code has also
been incorporated into a number of commercial products. It is no longer under active development
(other than for routine maintenance). Current real-time decoder development is taking place in the
PocketSphinx project.[3]
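
End-pointing means locating where speech starts and stops in an audio stream. The article does not describe how Sphinx 2 does this, so the sketch below uses a generic frame-energy threshold as a stand-in; the function name, frame length, threshold and minimum run length are all hypothetical values chosen for illustration.

```python
def find_endpoints(samples, frame_len=160, threshold=0.01, min_frames=3):
    """Return (start, end) sample indices spanning the detected speech, or None.

    A frame counts as active when its mean squared amplitude exceeds `threshold`;
    runs of fewer than `min_frames` active frames are ignored as noise.
    """
    n_frames = len(samples) // frame_len
    active = []
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        active.append(energy > threshold)

    # Collect runs of consecutive active frames that are long enough.
    runs, run_start = [], None
    for f, is_active in enumerate(active + [False]):  # sentinel closes a trailing run
        if is_active and run_start is None:
            run_start = f
        elif not is_active and run_start is not None:
            if f - run_start >= min_frames:
                runs.append((run_start, f))
            run_start = None

    if not runs:
        return None
    # From the start of the first run to the end of the last run
    return runs[0][0] * frame_len, runs[-1][1] * frame_len


# Synthetic signal: silence, a constant-amplitude "utterance", silence
signal = [0.0] * 800 + [0.5] * 1600 + [0.0] * 800
endpoints = find_endpoints(signal)
```

A real-time system would apply the same test incrementally as frames arrive, rather than over a complete buffer as done here.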

Sphinx 3
Sphinx 2 used a semi-continuous representation for acoustic modeling (i.e., a single set of Gaussians is
used for all models, with individual models represented as a weight vector over these Gaussians). Sphinx
3 adopted the prevalent continuous HMM representation and has been used primarily for high-accuracy,
non-real-time recognition. Recent developments (in algorithms and in hardware) have made Sphinx 3
"near" real-time, although not yet suitable for critical interactive applications. Sphinx 3 is under active
development and in conjunction with SphinxTrain provides access to a number of modern modeling
techniques, such as LDA/MLLT, MLLR and VTLN, that improve recognition accuracy (see the article on
Speech Recognition for descriptions of these techniques).
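
The difference between the two acoustic representations can be sketched in a few lines. The example below uses one-dimensional Gaussians for readability; the codebook, the weights and the function names are invented for illustration and are not drawn from Sphinx 2 or Sphinx 3.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


# Semi-continuous: a single codebook of Gaussians shared by every model.
SHARED_CODEBOOK = [(-1.0, 0.5), (0.0, 0.5), (1.0, 0.5)]  # (mean, variance) pairs


def semi_continuous_likelihood(x, weights):
    """Each model is only a weight vector over the shared Gaussians."""
    densities = [gaussian_pdf(x, m, v) for m, v in SHARED_CODEBOOK]
    return sum(w * d for w, d in zip(weights, densities))


def continuous_likelihood(x, mixture):
    """Continuous: each model owns its Gaussians as (weight, mean, variance)."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in mixture)


state_a_weights = [0.8, 0.1, 0.1]   # favours the Gaussian centred at -1
state_b_weights = [0.1, 0.1, 0.8]   # favours the Gaussian centred at +1
state_c_mixture = [(0.6, 2.0, 0.3), (0.4, 2.5, 0.8)]  # private parameters

score_a = semi_continuous_likelihood(-1.0, state_a_weights)
score_b = semi_continuous_likelihood(-1.0, state_b_weights)
score_c = continuous_likelihood(2.0, state_c_mixture)
```

In the semi-continuous case the Gaussian densities are computed once per observation and reused by every model, which is cheaper; the continuous case gives each model its own parameters at a higher computational cost.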

Sphinx 4
Sphinx 4 is a complete rewrite of the Sphinx engine with the goal of providing a more flexible framework
for research in speech recognition, written entirely in the Java programming language. Sun Microsystems
supported the development of Sphinx 4 and contributed software engineering expertise to the project.
Participants included individuals at MERL, MIT and CMU. (Currently supported languages are C, C++,
C#, Python, Ruby, Java, and JavaScript.)

Current development goals include:

developing a new (acoustic model) trainer
implementing speaker adaptation (e.g. MLLR)
improving configuration management
creating a graph-based UI for graphical system design

PocketSphinx
PocketSphinx is a version of Sphinx that can be used in embedded systems (e.g., based on an ARM
processor). It is under active development and incorporates features such as fixed-point arithmetic
and efficient algorithms for GMM computation.
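
Fixed-point arithmetic replaces floating-point values with scaled integers, which suits processors that lack a floating-point unit. The sketch below shows the general idea using a Q15 format (15 fractional bits). The article does not say which format PocketSphinx uses, so Q15 and all names here are assumptions for illustration only.

```python
Q = 15            # number of fractional bits (assumed format, not PocketSphinx's)
ONE = 1 << Q      # integer that represents 1.0


def to_fixed(x):
    """Convert a float to its nearest Q15 integer representation."""
    return int(round(x * ONE))


def to_float(q):
    """Convert a Q15 integer back to a float."""
    return q / ONE


def fixed_mul(a, b):
    """Multiply two Q15 values using integer operations only.

    The raw product has 30 fractional bits; adding half an LSB and
    shifting right by 15 rounds it back to Q15.
    """
    return (a * b + (1 << (Q - 1))) >> Q


half = to_fixed(0.5)
quarter = to_fixed(0.25)
product = fixed_mul(half, quarter)           # Q15 representation of 0.125
approx = to_float(fixed_mul(to_fixed(0.3), to_fixed(0.7)))
```

The result of each multiplication is only approximate, so a fixed-point decoder trades a small amount of numerical precision for speed on integer-only hardware.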

See also
Speech recognition software for Linux
List of speech recognition software
Project LISTEN

References
1. http://www.speech.cs.cmu.edu/sphinx
2. Lee, K.-F.; Hon, H.-W.; Reddy, R. (January 1990). "An overview of the SPHINX speech
recognition system" (https://ieeexplore.ieee.org/document/45616). IEEE Transactions on
Acoustics, Speech, and Signal Processing. 38 (1): 35–45. doi:10.1109/29.45616
(https://doi.org/10.1109%2F29.45616).
3. Huang, Xuedong; Alleva, Fileno; Hwang, Mei-Yuh; Rosenfeld, Ronald (1993). "An overview
of the SPHINX-II speech recognition system" (https://dx.doi.org/10.3115/1075671.1075690).
Proceedings of the Workshop on Human Language Technology - HLT '93. Morristown, NJ,
USA: Association for Computational Linguistics: 81. doi:10.3115/1075671.1075690
(https://doi.org/10.3115%2F1075671.1075690). ISBN 1-55860-324-7.

External links
Sphinx developers recommend Vosk now (https://alphacephei.com/vosk/)
CMU Sphinx homepage (https://cmusphinx.github.io/wiki/)
The Sphinx repository (https://github.com/cmusphinx) on GitHub should be considered the
definitive source for code
SourceForge (http://sourceforge.net/projects/cmusphinx) hosts older releases and files
NeXT on Campus Fall 1990
(https://web.archive.org/web/20170324083105/http://nextstuff.info/mirrors/otto/html/pub/Documents/user-groups/OnCampus/NOCFall90/NOCFall90Text.ps.gz)
(This document is in PostScript format, compressed with gzip.) Carnegie Mellon University -
Breakthroughs in speech recognition and document management, pp. 12-13

Retrieved from "https://en.wikipedia.org/w/index.php?title=CMU_Sphinx&oldid=1263415119"
