
Speech Recognition System

A PROJECT REPORT SUBMITTED BY

Mohammed Flaeel Ahmed Shariff


(s/11/523)

to the

DEPARTMENT OF STATISTICS AND COMPUTER SCIENCE

In partial fulfillment of the requirement


for the award of the degree of

Bachelor of Science

of the

UNIVERSITY OF PERADENIYA
SRI LANKA

2015

Declaration
I hereby declare that the project work entitled “Speech Recognition System”, submitted
to the University of Peradeniya, is a record of original work done by me under the
guidance of ………………………………………………………., Staff Member, Department of
Statistics and Computer Science, Faculty of Science, University of Peradeniya, and that
this project work has not formed the basis for the award of any degree, diploma,
fellowship, or similar title, if any.

Signature: ………………………………. Date: …………………………….

Name: M.F.Ahmed Shariff


Registration Number: s/11/523

Certified By:

Supervisor:……………………….
Signature:…………………………
Date:………………………………..


Acknowledgement
I express my sincere gratitude to the staff members of the Department of Statistics and
Computer Science, Faculty of Science, University of Peradeniya, for their support and
guidance in successfully completing this project.

Date: 30/11/2015
Name: M.F.Ahmed Shariff


Abstract

The Speech Recognition System documented in this report uses CMUsphinx as the base
API to obtain speech recognition results and is implemented in Java. The primary goal of
the system is to give the user the ability to define how speech is recognized, by providing
models for the recognizer, how the speech result is processed, and what consequent
functions need to be executed. The user provides these details in the form of plugins,
which are classes that implement the provided interfaces, packed in a jar file. The details
of the classes to be loaded as modules must be included in the configuration file. Using
the provided interfaces, a user can implement a broad range of functions with ease
through the plugin system.


Contents
1. Introduction
2. Software requirements and specifications
2.1. Product perspective
2.1.1. Use Cases
2.1.1.1. Use case diagram
2.1.1.2. USE CASE: Recognize Speech
2.1.1.3. USE CASE: Process speech
2.1.1.4. USE CASE: Execute function
2.1.1.5. USE CASE: Provide details
2.1.1.6. USE CASE: Provide recognition details
2.1.1.7. USE CASE: Provide process details
2.1.1.8. USE CASE: Provide execution details
2.1.2. Class diagram
2.2. User Characteristics
2.3. Specific Requirements
2.3.1. Functional Requirements
2.3.2. External Interfaces
2.4. Performance Requirements
2.5. Design Constraints
3. Design Strategy
4. Project plan
4.1. The Engines
4.2. The plugin modules
4.3. The Recognizer Engine
4.4. The Response Engine
4.5. The System Engine
4.6. The System as a whole
5. Future work
6. Conclusion
7. References


List of Figures
Figure 2.1.1.1 - Use case diagram
Figure 2.1.2.1 - Class diagram (1)
Figure 2.1.2.2 - Class diagram (2)
Figure 2.1.2.3 - Class diagram (3)
Figure 2.1.2.4 - Class diagram (4)
Figure 2.1.2.5 - Class diagram (5)
Figure 2.3.2.1 - The Output window
Figure 2.3.2.2 - System Tray icon and popup menu
Figure 2.3.2.3 - The System Console window
Figure 4.3.1 - Recognizer Engine's process cycle
Figure 4.4.1 - Response Engine's process cycle
Figure 4.4.2 - Response Engine Processor's process
Figure 4.5.1 - The System Engine's process cycle
Figure 4.6.1 - Simplified model of the speech recognition system


1. Introduction

Today we have many technologies that let us communicate with machines in humans’
natural form of communication: speech. Yet the traditional means of communicating with
machines and computers, such as switches, keyboards, etc., still dominate, owing to the
complexities of implementing a successful speech recognition system with a broad range
of functionality. Recognizing speech successfully is one part of the problem; the other
part is being able to do anything with what is said. There are not many systems that
implement speech recognition with the flexibility that a mouse or a keyboard has when
interacting with a computer; that is, what we can do using our voice alone tends to be
somewhat limited. The system designed for this project is an attempt to provide a system
which can be easily adapted to broaden the range of functionality of a speech recognition
system.
To elaborate, say a system is designed to simply type what is being said, which can be
done using the existing APIs available. If the user wants to use this for another purpose,
such as giving commands to the computer, adapting such a system to suit the latter need
would be a tedious task. If the user is even more ambitious and wants to automate
functions around the house, such as switching on lights or other appliances, the
adaptation process becomes even more complex.
The system designed here provides an interface for easily building a system that does
the user’s bidding. The systems a user develops, which are essentially simple sets of
instructions, can be executed by simply providing them to the system as plugins. That is,
the user can provide details to the system in the form of plugins: details such as the
context in which speech must be recognized, or in other terms, exactly what set of words
the recognizer should be looking for, how the recognized words should be processed,
and what should be done with the processed results.
The speech recognition API used in this project is CMUsphinx, an open source speech
recognition API developed at Carnegie Mellon University and one of the leading open
source speech recognizers available today. The system is implemented entirely in Java.


2. Software requirements and specifications


2.1. Product Perspective
2.1.1. Use Cases
2.1.1.1. Use Case Diagram

Figure 2.1.1.1 - Use case diagram


2.1.1.2. USE CASE: Recognize Speech


Primary Actor: User
Stakeholders and Interests:
 User – wants the system to recognize what he/she is saying.
Pre-conditions:
 The system is provided with the details of the context in which the user wants his/her
speech to be recognized.
 The system has started and is running, and the mic is on and functioning.
Success guarantees:
 The system successfully recognizes what was said by the user.
Trigger:
 The microphone records a continuous audio signal, which can be speech.
Main Success Scenario:
1. The user utters a set of words.
2. The system pauses.
3. The system gets the details of the context of the speech recognition.
4. The words are recognized based on the details.
5. The recognized set of words is recorded.
6. The system resumes listening.
Extensions:
1. a. If the words spoken by the user are not provided in the context details, those
words will be marked as unclear words.
Open issues:
 The words spoken can be wrongly recognized.
 Noise can be recognized as words spoken by the user.

2.1.1.3. USE CASE: Process speech


Stakeholders and Interests:
 User
 Executing system – the system on which this system is running.


Pre-conditions:
 The details of processing speech are provided.
Success Guarantees:
 A processed speech result is obtained.
Main Success Scenario:
1. The recognized words are obtained – include – Recognize speech.
2. The details regarding processing the words are obtained.
3. The recognized words are processed based on the details.
4. The processed speech result is output.

2.1.1.4. USE CASE: Execute function


Primary Actor: Executing System
Stakeholders and Interests:
 User – requires the system to ensure the correct function related to the speech
result is executed.
 Executing System – executes the function related to the speech result.
Pre-conditions:
 The details of execution are provided.
Success Guarantees:
 The function related to the words the user utters is executed.
Trigger:
 The microphone receives an audio signal which can be speech.
Main Success Scenario:
1. The processed speech result is obtained – include – Process speech.
2. The provided execution details are obtained.
3. A function related to the processed speech result is passed to the executing system
to process.
Open Issues:
 The provided function may not be supported by the executing system.
 There may be no function related to a particular speech result.


2.1.1.5. USE CASE: Provide details


Primary Actor: Developer
Stakeholders and Interests:
 User – wants the system to recognize and execute functions related to a set of
words the user utters.
 Developer – wants to provide the necessary details to the system to ensure it
functions as expected.
 Executing System – requires the necessary details to execute any function the
system needs to execute.
Success Guarantees:
 The details the developer wanted to provide have been provided and can be
loaded into the system when it starts.
Main Success Scenario:
1. The developer uses the provided interfaces to provide the system with the
necessary details.
2. The details are placed in the pre-defined location on disk.
3. Necessary changes are made to the system’s configuration.

2.1.1.6. USE CASE: Provide recognition details


Extended Main Success Scenario:
1. A. The developer uses the provided interfaces to provide the system with details
of the context in which speech needs to be recognized.

2.1.1.7. USE CASE: Provide process details


Extended Main Success Scenario:
1. A. The developer uses the provided interfaces to provide the system with details
regarding the processing of speech results obtained.

2.1.1.8. USE CASE: Provide execution details


Extended Main Success Scenario:
1. A. The developer uses the provided interfaces to provide the system with details
of the functions the executing system has to execute for a given processed speech
result.


2.1.2. Class diagram


The following class diagrams describe the design of the system. Implementation details
are not included in these class diagrams. The class diagram has been divided into 5 parts.

Figure 2.1.2.1 - Class diagram (1) – The classes ‘Response’ and ‘ResponseEngineProcess’ are described in Figure 2.1.2.2 -
Class diagram (2) and Figure 2.1.2.3 - Class diagram (3) respectively. The classes ‘Queue’ and ‘PrintWriter’ are from the Java API.


Figure 2.1.2.2 - Class diagram (2) – The interface ‘ResponseEngineInterface’ is described in Figure 2.1.2.3 - Class diagram
(3). The classes ‘Configuration’ and ‘LiveSpeechRecognizer’ are from the sphinx API, and the interface ‘Runnable’ is from
the Java API.


Figure 2.1.2.3 - Class diagram (3) – The interfaces ‘ModuleSet’ and ‘SystemEngineInterface’ are described
in Figure 2.1.2.5 - Class diagram (5) and Figure 2.1.2.4 - Class diagram (4) respectively. ‘BlockingQueue’,
‘Runnable’ and ‘List’ are from the Java API.


Figure 2.1.2.4 - Class diagram (4) – The interface ‘ModuleSet’ is described in Figure 2.1.2.5 - Class diagram
(5). ‘BlockingQueue’, ‘Runnable’ and ‘PrintWriter’ are from the Java API.


Figure 2.1.2.5 - Class diagram (5) – ‘Map’, ‘List’ and ‘NodeList’ are from the Java API.


2.2. User Characteristics


The target users of this system are those who need to easily develop and deploy their
own speech recognition system with its own features. Knowledge of speech recognition
is optional; however, a reasonable knowledge of Java is necessary in order to develop
plugins for the system.
If a user only uses a set of plugins developed by a third party, basic computer knowledge
will suffice. However, the system documented in this report places no constraints on
what a loaded plugin may do to the system it is running on, as the goal of the system is
to let the developer implement functions of their own liking. Hence using a third-party
plugin is at the discretion of the user.

2.3. Specific Requirements


2.3.1. Functional Requirements
The speech recognition system requires Java to be installed. It also needs a functioning
microphone connected in order to function.
When a plugin is developed, the plugin modules must be packed in a jar file, and the
packed jar file must be placed in a directory named “Plugin files”. This directory must be
in the same directory as the speech recognition system’s main jar file. The necessary
information must also be included in the “configuration.xml” file, which must be placed
in the same directory as “Plugin files” and must follow the specified schema. The
information that must be included in the “configuration.xml” file is the classes
implementing the specific module interfaces that are to be loaded into the system, and
which of those classes must be set as active when the speech recognition system starts
running.
The modules in the system, which are effectively classes that implement a specified
interface, are referred to by their simple name; when instructions are passed to the
speech recognition system to refer to another module, the name used must be the simple
name. When name conflicts occur, any one of the modules with the same name will be
loaded.
If one module implements more than one module interface, it must be listed in the
“configuration.xml” file under each type of module interface it implements. However, for
each type of module, a new instance of the module will be created.
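
To make the loading mechanism concrete, the following is a minimal sketch (the class
name ModuleLoader is hypothetical, not the system's actual loader) of how a class named
in “configuration.xml” could be loaded from a jar in the “Plugin files” directory using
standard Java reflection facilities:

    import java.io.File;
    import java.net.URL;
    import java.net.URLClassLoader;

    // Hypothetical sketch: load a module class, named in configuration.xml,
    // from a plugin jar placed in the "Plugin files" directory.
    public class ModuleLoader {
        public static Object loadModule(File pluginJar, String className) throws Exception {
            URLClassLoader loader = new URLClassLoader(
                    new URL[] { pluginJar.toURI().toURL() },
                    ModuleLoader.class.getClassLoader());
            Class<?> moduleClass = loader.loadClass(className);
            // A separate instance is created per module type, as specified above.
            return moduleClass.getDeclaredConstructor().newInstance();
        }
    }

A module loaded this way could then be registered under moduleClass.getSimpleName(),
matching the rule that modules are referred to by their simple name.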
2.3.2. External Interfaces
When the speech recognition system is launched, the ‘Output’ window will appear. Only
a System Module (see section 4.2) can write to this window, and only if it implements the
specified interface through which a reference to the stream bound to this window is
passed. Closing this window will not stop the speech recognition system, only minimize
it to the system tray.
In the system tray, an icon appears as shown in Figure 2.3.2.2, from which the ‘Output’
window can be opened and closed as needed. The ‘System Console’ window can also be
opened from here, which displays all the functions of the system. This window can be
used when plugins need to be tested for bugs, as any exception in the functions of the
system will be displayed here. It is also bound to the system’s default output stream.

Figure 2.3.2.1 - The Output window
Figure 2.3.2.2 - System Tray icon and popup menu

Figure 2.3.2.3 - The System Console window

The next option in the system tray popup menu is to pause the system; when selected,
the system will stop after following the next speech result. The reason the system cannot
stop instantaneously is addressed in section 2.5. When deselected, the system will start
from where it stopped.
When Exit is selected from this menu, the speech recognition system will exit.

2.4. Performance Requirements


Java must be installed on the host machine. A minimum of 750 MB of memory is needed
for the speech recognition system to run, and a minimum of 70 MB of disk space is
required.


2.5. Design Constraints


The speech recognizer used in this system, CMUsphinx, does not provide a facility to
switch any of the models it uses during runtime. That is, if the context of the speech
recognition needs to be changed, the API provides no default methods for the purpose.
Hence, in the system implemented here, whenever the context (the set of models used
in the speech recognizer) needs to be switched, the speech recognizer is stopped and
re-instantiated.
The system uses the LiveSpeechRecognizer of the sphinx API, as it is live speech being
considered, though the system can be extended to perform recognition on a different
source, such as an audio stream over a network port. The LiveSpeechRecognizer class
binds the microphone resource to itself. Once the microphone resource is acquired, the
API provides no functionality to close the line and release the resource, which is required
to re-instantiate the recognizer with a new context. As a solution to this problem, code
was introduced into the sphinx API source to close the resource and free it when
necessary.
The recognizer has several states it can be in. When the recognizer is ready to obtain a
new speech result, it is in the READY state. When a speech result is requested, the
recognizer enters the RECOGNIZING state. Once a speech result is obtained, it returns the
result and goes back to the READY state, waiting for another speech result to be
requested. The API does not provide a way to interrupt the recognizer while it is in the
RECOGNIZING state, and the system implemented here likewise does not introduce a
method for when such an interruption is needed.


3. Design Strategy

First of all, the system is implemented in Java, as Java is platform independent; this makes
the system portable and lets the user implement what they want without worrying about
the platform it runs on. The primary focus of the system designed here is to provide an
interface any user can use to implement their own system using speech recognition. For
this purpose, a plugin system is implemented. The necessary interfaces are provided; the
user implements the interfaces they need to accomplish their task and places them,
packed as jar files, in a predefined folder. The plugins do not depend on any of the
components of the primary system; this ensures the user cannot alter the core functions
by providing an illegal instruction.
The user can provide three types of details:
1. How speech is recognized (the context in which speech is recognized)
A recognizer needs three important components other than the audio stream.
I. Acoustic model: the model that represents the relationship between an audio
signal and the corresponding linguistic features or phonemes it represents.
II. Dictionary model: a list of the words that will be recognized by the speech
recognizer and the respective phonemes or linguistic features of each word.
III. Language/grammar model: the mapping of the order of words that will be
spoken.
The user provides the models they want to use in the speech recognition and includes
the paths of these resources in the plugins, which the system in turn loads and uses for
the recognition process.
2. How the recognition result is processed.
In this phase the user can provide instructions to the system on what should be done
with the speech result obtained from the recognizer. For example, it can further filter
the response so that the consequent steps can process it more easily, instruct the
system to switch the models used in the recognizer, or simply pass the speech result
on to be processed by the next phase.
3. What is to be done with the processed result.
Here the user can provide information on what the system must do with the result. It
can be virtually anything that can be done programmatically.
To handle the three different types of details a user may provide, three engines are used,
each engine handling a different type of detail. For the processing part, the user is
provided with a set of instructions that can be processed by the system. In order for the
engines to communicate with each other, the speech result is wrapped in an object to
which each engine can add additional information.
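
As a minimal sketch of such a wrapper (the name Response and its fields are illustrative
assumptions, not the system's actual implementation):

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch of the object that wraps a speech result.
    // Each engine can attach additional information as the response
    // travels from the Recognizer Engine to the System Engine.
    public class Response {
        private final String speechResult;                        // raw recognizer hypothesis
        private final Map<String, Object> details = new HashMap<>();

        public Response(String speechResult) {
            this.speechResult = speechResult;
        }

        public String getSpeechResult() {
            return speechResult;
        }

        // e.g. the Response Engine attaches the key the System Engine
        // later uses to select the appropriate System Module.
        public void addDetail(String key, Object value) {
            details.put(key, value);
        }

        public Object getDetail(String key) {
            return details.get(key);
        }
    }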


4. Project plan
4.1. The Engines
The system primarily has three engines.
1. Recognizer Engine
2. Response Engine
3. System Engine

1. Recognizer Engine
The Recognizer Engine’s responsibility is to obtain the speech result from the recognizer
and to switch the models used when requested to do so. The Recognizer Engine obtains
the result, passes it to the Response Engine, and waits for the Response Engine to signal
it to proceed or to switch the models it is using.
2. Response Engine
The Response Engine is responsible for deciding what has to be done with the speech
result obtained, and what is to be done after the speech result is obtained. Its functions
are as follows:
 Instruct the Recognizer Engine to proceed and obtain the next speech result.
 Instruct the Recognizer Engine to change a model it is using.
 Process the speech result.
 Wrap the speech result with the information the System Engine needs to execute
the function related to the speech result.
 Pass the speech result to the System Engine for it to proceed with its functions.
3. System Engine
The System Engine executes the function related to the speech result obtained. Another
responsibility of the System Engine is to identify when a model used in the Recognizer
Engine needs to be rebuilt. In that case, it requests permission from the Response Engine
to proceed with the build; the Response Engine then pauses the Recognizer Engine and
signals the System Engine to proceed with the build. When the building process is
complete, the Response Engine is signaled to proceed with its functions.


4.2. The plugin modules


As outlined in section 3, there are three types of details the user provides to the system
via plugins. There are a total of six module interfaces with different responsibilities that
the user can implement in a plugin; through these interfaces the system obtains the
details it needs to function. The modules are as follows:
1. Acoustic Module: provides the reference to the acoustic model resource. The
acoustic model can also be dynamically rebuilt through this module.
2. Dictionary Module: provides the reference to the dictionary model resource. The
dictionary model can also be dynamically rebuilt through this module.
3. Language Module: provides the reference to the language/grammar model to be
used in the system. The model can also be dynamically rebuilt through this
module.
4. Response Handler Module: provides the steps to be taken by the Response Engine
once a speech result has been obtained, based on the speech result. The set of
steps that can be taken is predefined.
5. Response Generator Module: examines what the speech result contains and
decides what is being said. As the speech result obtained can be a random set of
words, the responsibility of the System Engine is simplified when the result is
filtered and made less random.
6. System Module: takes the action related to the speech result. It is given the
processed speech result and executes the function related to it.
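
As a rough sketch of what these module interfaces might look like (the names and
signatures below are assumptions for illustration; the actual interfaces provided with the
system may differ, and Response and ProcessInstruction refer to the sketches in sections
3 and 4.4):

    import java.util.Queue;

    // Illustrative sketches of the six plugin module interfaces.
    interface AcousticModule {
        String getAcousticModelPath();   // reference to the acoustic model resource
        void rebuildModel();             // dynamically rebuild the model
    }

    interface DictionaryModule {
        String getDictionaryPath();
        void rebuildModel();
    }

    interface LanguageModule {
        String getLanguageModelPath();
        void rebuildModel();
    }

    interface ResponseHandlerModule {
        // Returns the queue of predefined steps the Response Engine
        // Processor should execute for the given response.
        Queue<ProcessInstruction> handle(Response response);
    }

    interface ResponseGeneratorModule {
        // Filters the raw speech result into a less random, processed form.
        Response generate(Response response);
    }

    interface SystemModule {
        // Executes the function related to the processed speech result.
        void execute(Response response);
    }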

4.3. The Recognizer Engine


As briefly described above, the Recognizer Engine obtains the speech result from the
recognizer and passes it to the Response Engine. The speech result is wrapped in an
object to which additional information can be added at later stages; this object is known
as a response in the context of this system.
Another responsibility of the Recognizer Engine is to switch the model being used by the
recognizer. The module that holds the reference to the model is passed to the Recognizer
Engine, which switches the relevant model. Note that the Recognizer Engine does not
hold references to all the models that can be loaded into the recognizer. The sphinx API
does not provide a default method to switch the model being used dynamically; hence
the Recognizer Engine re-instantiates the recognizer with the relevant models.
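
A minimal sketch of such a context switch using the sphinx4 API classes named in the
class diagrams (the wrapper class and placeholder model paths are assumptions, and
releasing the microphone relies on the source modification described in section 2.5):

    import java.io.IOException;
    import edu.cmu.sphinx.api.Configuration;
    import edu.cmu.sphinx.api.LiveSpeechRecognizer;
    import edu.cmu.sphinx.api.SpeechResult;

    // Sketch: stop the current recognizer and re-instantiate it with new models.
    public class RecognizerContextSwitch {
        private LiveSpeechRecognizer recognizer;

        public void switchContext(String acousticPath, String dictionaryPath,
                String languageModelPath) throws IOException {
            if (recognizer != null) {
                // Releasing the microphone requires the modification to the
                // sphinx source described in section 2.5.
                recognizer.stopRecognition();
            }
            Configuration configuration = new Configuration();
            configuration.setAcousticModelPath(acousticPath);
            configuration.setDictionaryPath(dictionaryPath);
            configuration.setLanguageModelPath(languageModelPath);
            recognizer = new LiveSpeechRecognizer(configuration);
            recognizer.startRecognition(true);   // true clears cached audio data
        }

        public String nextHypothesis() {
            SpeechResult result = recognizer.getResult();   // blocks until a result
            return result.getHypothesis();
        }
    }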


Figure 4.3.1 - Recognizer Engine's process cycle (obtain a speech result from the
recognizer; wrap the speech result in a response object; pass the response object to the
Response Engine; wait for the Response Engine to signal to continue recognition; if a
model needs to be switched, do so; repeat).

4.4. The Response Engine

The Response Engine’s function can be described as coordinating the functions of the
Recognizer Engine and the System Engine. The functions of the Response Engine are
coordinated by a Response Engine Processor. The Processor has one of the Response
Handler Modules and one of the Response Generator Modules set as active. The
references of all modules of types Response Handler Module, Response Generator
Module, Acoustic Module, Dictionary Module and Language Module that are loaded into
the system are also stored in the Processor. The list of modules to be loaded into the
system is defined by a configuration file. When the Response Engine is passed the
response object, the Processor takes this response and passes it to the active Response
Handler Module, which returns a process queue to the Processor containing instructions
for the Processor to execute. The instructions the Processor can execute are as follows:

 Pass response to generator – The response is passed to the active Response
Generator Module, which returns the response object with additional information
attached to it.
 Pass response to system – The response is passed to the System Engine.
 Switch Response Handler Module – The active Response Handler Module is
switched to the specified Response Handler Module from the stored module
references.
 Switch Response Generator Module – The active Response Generator Module is
switched to the specified Response Generator Module from the stored module
references.


 Switch Acoustic Module – The specified Acoustic Module is passed to the
Recognizer Engine through the Response Engine, to load the acoustic model the
specified Acoustic Module refers to.
 Switch Dictionary Module – The specified Dictionary Module is passed to the
Recognizer Engine through the Response Engine, to load the dictionary model the
specified Dictionary Module refers to.
 Switch Language Module – The specified Language Module is passed to the
Recognizer Engine through the Response Engine, to load the language model the
specified Language Module refers to.
 Wait for a predefined period of time – A null response object is passed to the
Response Engine after the predetermined period of time if the Processor has not
been instructed to pass a response object to the System Engine within that period.
This instruction can be used to implement functionality such as giving the user a
brief period of time in which to cancel a function related to the speech result
before it is executed. (Note: if a model of the Recognizer Engine is switched to
obtain a different type of recognition result during this period, it can cause
inconsistencies in the result, as the default sphinx API does not provide a method
to interrupt the recognizer once it enters the RECOGNIZING state.)
Once the Processor has completed executing all the instructions provided to it by the
active Response Handler Module, the Response Engine signals the Recognizer Engine to
continue and waits for the next response to be passed to it.
Every time a new response is obtained, before passing it to the Processor, the Response
Engine checks whether the System Engine has placed a request to build; if so, it signals
the System Engine and waits to be signaled back before continuing with its functions.
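
A simplified sketch of the Processor's dispatch loop (the enum ProcessInstruction, the
field names and the SystemEngineStub collaborator are illustrative assumptions; only two
of the instructions are shown):

    import java.util.Queue;

    // Illustrative instruction set and dispatch loop for the Response
    // Engine Processor; names are assumptions, not actual identifiers.
    enum ProcessInstruction {
        PASS_TO_GENERATOR, PASS_TO_SYSTEM,
        SWITCH_RESPONSE_HANDLER, SWITCH_RESPONSE_GENERATOR,
        SWITCH_ACOUSTIC_MODULE, SWITCH_DICTIONARY_MODULE,
        SWITCH_LANGUAGE_MODULE, WAIT
    }

    interface SystemEngineStub { void submit(Response response); }

    class ProcessorSketch {
        private ResponseGeneratorModule activeGenerator;   // assumed field
        private SystemEngineStub systemEngine;             // assumed collaborator

        void execute(Response response, Queue<ProcessInstruction> processQueue) {
            while (!processQueue.isEmpty()) {
                switch (processQueue.poll()) {
                    case PASS_TO_GENERATOR:
                        // The generator filters the result and attaches details.
                        response = activeGenerator.generate(response);
                        break;
                    case PASS_TO_SYSTEM:
                        systemEngine.submit(response);
                        break;
                    default:
                        // Module switches and WAIT are omitted in this sketch.
                        break;
                }
            }
        }
    }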

Figure 4.4.1 - Response Engine's process cycle (get a response from the Recognizer
Engine; if the System Engine has placed a request to build, wait for it to be completed;
pass the obtained response to the Response Engine Processor; wait for the Processor to
complete its instructions; signal the recognizer to continue recognition).


Figure 4.4.2 - Response Engine Processor's process (get response; pass to active Response
Handler Module; get process queue from the active Response Handler Module; execute
all process instructions from the returned queue).

4.5. The System Engine


The System Engine is the part of the system that executes the functions related to a
speech result. It obtains the responses passed from the Response Engine and, based on
the key in the response (which was included during processing in the Response Engine),
hands the response to the appropriate System Module, which executes the function
related to the speech result in the response object. A module selector assists in choosing
the module. If a model loaded in the system needs to be rebuilt, the System Engine
informs the Response Engine to build the relevant model. When the Response Engine
signals the System Engine to continue with the build, the System Module that requested
the build is called to continue with the building procedure; once it is finished, the
Response Engine is signaled back to continue with its functions.
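
A minimal sketch of how the module selector might map the key in a response to a System
Module (the names and the "key" detail below are illustrative assumptions):

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch: map the key attached to a response during
    // processing to the System Module that should handle it.
    class ModuleSelectorSketch {
        private final Map<String, SystemModule> modules = new HashMap<>();

        void register(String key, SystemModule module) {
            modules.put(key, module);
        }

        void dispatch(Response response) {
            // The key was attached in the Response Engine's processing step.
            SystemModule module = modules.get((String) response.getDetail("key"));
            if (module != null) {
                module.execute(response);   // run the function for this result
            }
        }
    }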

Figure 4.5.1 - The System Engine's process cycle (get response; allow the module selector
to choose the appropriate module; pass the response to the module and let it execute
the relevant functions; check if the selected module has placed a request to build a model;
if such a request is placed, inform the Response Engine and wait; when the Response
Engine signals back, allow the module to continue the build; when the module completes
the build, signal the Response Engine to continue).


4.6. The System as a whole

Figure 4.6.1 - Simplified model of the speech recognition system (the System Engine
comprises the Module Selector engine component and the System Module plugins; the
Response Engine comprises the Response Engine Processor engine component with the
Response Generator Module and Response Handler Module plugins; the Recognizer
Engine works with the Acoustic Module, Dictionary Module and Language Module
plugins).

The three engines are designed as singletons to avoid conflicts over resources. Also note
that although the Acoustic Module, Dictionary Module and Language Module relate to
the Recognizer Engine, all loaded modules are stored in the Response Engine Processor.
The plugin modules may also communicate among themselves to improve their
functionality.


5. Future work

One of the primary focuses in improving the speech recognition system is to improve how
configuration details are provided and managed, which can also include functionality to
load modules during runtime. A graphical user interface that automates the process of
building plugins is also planned, which will eliminate the need for programming
knowledge to implement simple plugins.
Another aspect that will be taken into consideration is providing functionality for other
audio sources, such as audio streams or audio files, for which the sphinx API provides
facilities. This will allow the system to be used in network systems, servers, etc.

6. Conclusion

Giving machines the ability to communicate with humans in humans’ natural medium of
communication has always been a fascinating prospect, and modern technologies have
brought humans closer to realizing this dream than ever before. Yet providing the
intelligence a machine needs to be able to communicate flawlessly with humans remains
the greatest challenge at this stage. The Speech Recognition System designed and
documented in this report is an attempt to provide users a simple interface through which
to provide their own details for recognizing speech and having computers do their
bidding.


7. References
 JavaTM Speech API Programmer's Guide, Sun Microsystems, Inc. Retrieved July 8,
2015, from http://www.ling.helsinki.fi/kit/2004s/ctl310gen/L7-Speech/JSAPI/index.html

 CMUSphinx Wiki. Retrieved July 8, 2015, from http://cmusphinx.sourceforge.net/wiki/