
Speech Recognition System

A PROJECT REPORT SUBMITTED BY

Mohammed Flaeel Ahmed Shariff

(s/11/523)

to the

DEPARTMENT OF STATISTICS AND COMPUTER SCIENCE

In partial fulfillment of the requirement

for the award of the degree of

Bachelor of Science

of the

UNIVERSITY OF PERADENIYA
SRI LANKA

2015
CS304 – Project Report Speech Recognition System

Declaration
I hereby declare that the project work entitled “Speech Recognition System”, submitted
to the University of Peradeniya, is a record of original work done by me under the
guidance of ………………………………………………………., Staff Member, Department of
Statistics and Computer Science, Faculty of Science, University of Peradeniya, and that
this project work has not formed the basis for the award of any degree, diploma or
fellowship, or of any similar project.

Signature: ………………………………. Date: …………………………….

Name: M.F.Ahmed Shariff

Registration Number: s/11/523

Certified By:

Supervisor:……………………….
Signature:…………………………
Date:………………………………..


Acknowledgement
I express my sincere gratitude to the staff members of the Department of Statistics and
Computer Science, Faculty of Science, University of Peradeniya, for their support and
guidance in successfully completing this project.

Date: 30/11/2015
Name: M.F.Ahmed Shariff


Abstract

The Speech Recognition System documented in this report uses CMUSphinx as the base API
to obtain speech recognition results and is implemented in Java. The primary goal of the
system is to let the user define how speech is recognized (by providing models for the
recognizer), how the speech result is processed, and which functions are executed as a
consequence. The user provides these details in the form of plugins, which are classes
that implement the provided interfaces and are packed in a jar file. The classes to be
loaded as modules must be listed in the configuration file. Using the provided interfaces
and the plugin system, a user can implement a broad range of functions with little effort.


Contents
1. Introduction
2. Software requirements and specifications
2.1. Product perspective
2.1.1. Use Cases
2.1.1.1. Use case diagram
2.1.1.2. USE CASE: Recognize Speech
2.1.1.3. USE CASE: Process speech
2.1.1.4. USE CASE: Execute function
2.1.1.5. USE CASE: Provide details
2.1.1.6. USE CASE: Provide recognition details
2.1.1.7. USE CASE: Provide process details
2.1.1.8. USE CASE: Provide execution details
2.1.2. Class diagram
2.2. User Characteristics
2.3. Specific Requirements
2.3.1. Functional Requirements
2.3.2. External Interfaces
2.4. Performance Requirements
2.5. Design Constraints
3. Design Strategy
4. Project plan
4.1. The Engines
4.2. The plugin modules
4.3. The Recognizer Engine
4.4. The Response Engine
4.5. The System Engine
4.6. The System as a whole
5. Future work
6. Conclusion
7. References


List of Figures
Figure 2.1.1.1.1 - Use case diagram
Figure 2.1.2.1 - Class diagram (1)
Figure 2.1.2.2 - Class diagram (2)
Figure 2.1.2.3 - Class diagram (3)
Figure 2.1.2.4 - Class diagram (4)
Figure 2.1.2.5 - Class diagram (5)
Figure 2.3.2.1 - The Output window
Figure 2.3.2.2 - System Tray icon and popup menu
Figure 2.3.2.3 - The System Console window
Figure 4.3.1 - Recognizer Engine's process cycle
Figure 4.4.1 - Response Engine's process cycle
Figure 4.4.2 - Response Engine Processor's process
Figure 4.5.1 - The System Engine's process cycle
Figure 4.6.1 - Simplified model of the speech recognition system


1. Introduction

Today many technologies allow us to communicate with machines in the natural human
form, speech. Yet the traditional means of communicating with a machine or computer,
such as switches and keyboards, still dominate because of the complexity of implementing
a successful speech recognition system with a broad range of functionality. Recognizing
speech successfully is one part of the problem; the other part is being able to do
something with what is said. Few systems offer speech recognition with the flexibility
that a mouse or a keyboard has when interacting with a computer; that is, what we can do
using our voice alone tends to be somewhat limited. The system designed for this project
is an attempt to provide a system that can be easily adapted to broaden the range of
functionality of a speech recognition system.
To elaborate, say a system is designed simply to type what is being said, which can be
done using the existing APIs. If the user wants to use it for another purpose, such as
giving commands to the computer, adapting the system to suit the latter need would be a
tedious task. If the user is even more ambitious and wants to automate functions around
the house, such as switching on lights or other appliances, the adaptation process
becomes even more complex.
The system designed here provides an interface for easily building a system that does
the bidding of the user. The systems a user develops, which are essentially simple
instructions, can be executed by providing them as plugins to the system. Through plugins
the user provides details such as the context in which speech must be recognized (in
other words, the exact set of words the recognizer should be looking for), how the
recognized words should be processed, and what should be done with the processed
results.
The speech recognition API used in this project is CMUSphinx, an open source speech
recognition API developed by Carnegie Mellon University and one of the leading open
source speech recognizers available today. The system is implemented entirely in Java.


2. Software requirements and specifications

2.1. Product Perspective
2.1.1. Use Cases
2.1.1.1. Use Case Diagram

Figure 2.1.1.1.1 - Use case diagram


2.1.1.2. USE CASE: Recognize Speech

Primary Actor: User
Stakeholders and Interests:
• User – wants the system to recognize what he/she is saying.
Pre-conditions:
• The system has been provided with the details of the context in which the user wants
his/her speech to be recognized.
• The system has started and is running, and the microphone is on and functioning.
Success guarantees:
• The system successfully recognizes what was said by the user.
Trigger:
• The microphone records a continuous audio signal, which can be speech.
Main Success Scenario:
1. The user utters a set of words.
2. The system pauses.
3. The system gets the details of the context of the speech recognition.
4. The words are recognized based on the details.
5. The recognized set of words is recorded.
6. The system resumes listening.
Extensions:
1. a. If the words spoken by the user are not provided in the context details, those
words will be marked as unclear words.
Open issues:
• The words spoken can be wrongly recognized.
• Any noise can be recognized as words spoken by the user.

2.1.1.3. USE CASE: Process speech

Stakeholders and Interests:
• User
• Executing system – the system on which the speech recognition system is running.


Pre-conditions:
• The details of processing speech are provided.
Success Guarantees:
• A processed speech result is obtained.
Main Success Scenario:
1. The recognized words are obtained – include – Recognize speech.
2. The details regarding processing the words are obtained.
3. The recognized words are processed based on the details.
4. The processed speech result is output.

2.1.1.4. USE CASE: Execute function

Primary Actor: Executing System.
Stakeholders and Interests:
• User – requires the system to ensure the correct function related to the speech
result is executed.
• Executing System – executes the function related to the speech result.
Pre-conditions:
• The details of execution are provided.
Success Guarantees:
• The function related to the words the user utters is executed.
Trigger:
• The microphone receives an audio signal, which can be speech.
Main Success Scenario:
1. The processed speech result is obtained – include – Process speech.
2. The provided execution details are obtained.
3. A function related to the processed speech result is passed to the executing system
to process.
Open Issues:
• The provided function may not be supported in the executing system.
• There may be no function related to a particular speech result.


2.1.1.5. USE CASE: Provide details

Primary Actor: Developer
Stakeholders and Interests:
• User – wants the system to recognize and execute functions related to a set of
words the user utters.
• Developer – wants to provide the necessary details to the system to ensure it
functions as expected.
• Executing System – requires the necessary details to execute any function the
system needs to execute.
Success Guarantees:
• The details the developer wanted to provide have been provided and can be
loaded into the system when it starts.
Main Success Scenario:
1. The developer uses the provided interfaces to provide the system with the necessary
details.
2. The details are placed in the pre-defined location on the disk.
3. The necessary changes are made to the system’s configuration.

2.1.1.6. USE CASE: Provide recognition details

Extended Main Success Scenario:
1. A. The developer uses the provided interfaces to provide the system with details
of the context in which speech needs to be recognized.

2.1.1.7. USE CASE: Provide process details

Extended Main Success Scenario:
1. A. The developer uses the provided interfaces to provide the system with details
regarding the processing of speech results obtained.

2.1.1.8. USE CASE: Provide execution details

Extended Main Success Scenario:
1. A. The developer uses the provided interfaces to provide the system with details
of the functions the executing system has to execute for a given processed speech
result.


2.1.2. Class diagram

The following class diagram describes the design of the system. Implementation details
are not included in this class diagram. The class diagram has been divided into five parts.

Figure 2.1.2.1 - Class diagram (1) – The classes ‘Response’ and ‘ResponseEngineProcess’ are described in Figure 2.1.2.2 - Class
diagram (2) and Figure 2.1.2.3 - Class diagram (3) respectively. The classes ‘Queue’ and ‘PrintWriter’ are from the Java API.


Figure 2.1.2.2 - Class diagram (2) – The interface ‘ResponseEngineInterface’ is described in Figure 2.1.2.3 - Class diagram
(3). The classes ‘Configuration’ and ‘LiveSpeechRecognizer’ are from the sphinx API, and the interface ‘Runnable’ is from
the Java API.


Figure 2.1.2.3 - Class diagram (3) – The interfaces ‘ModuleSet’ and ‘SystemEngineInterface’ are described
in Figure 2.1.2.5 - Class diagram (5) and Figure 2.1.2.4 - Class diagram (4) respectively. ‘BlockingQueue’,
‘Runnable’ and ‘List’ are from the Java API.


Figure 2.1.2.4 - Class diagram (4) – The interface ‘ModuleSet’ is described in Figure 2.1.2.5 - Class diagram
(5). ‘BlockingQueue’, ‘Runnable’ and ‘PrintWriter’ are from the Java API.


Figure 2.1.2.5 - Class diagram (5) – ‘Map’, ‘List’ and ‘NodeList’ are from the Java API.


2.2. User Characteristics

The target users of this system are those who need to easily develop and deploy their
own speech recognition system with its own features. Knowledge of speech recognition is
optional; however, developing plugins for the system requires a reasonable knowledge of
Java.
If a user only uses a set of plugins developed by a third party, basic computer knowledge
will suffice. However, the system documented in this report places no constraints on what
a loaded plugin may do to the system it is running on, as the goal is to let the developer
implement functions of their own liking. Hence the use of a third party plugin is at the
discretion of the user.

2.3. Specific Requirements

2.3.1. Functional Requirements
The speech recognition system requires Java to be installed and a functioning microphone
to be connected.
When a plugin is developed, the plugin modules must be packed in a jar file, and the jar
file must be placed in a directory named “Plugin files”. This directory must be in the
same directory as the speech recognition system’s main jar file. The necessary
information must also be included in the “configuration.xml” file, which must be placed in
the same directory as the “Plugin files” directory. The “configuration.xml” file must follow
the specified schema. It must list the classes that implement each module interface and
are to be loaded into the system, and state which of these classes is to be set as active
when speech recognition starts running.
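The report does not reproduce the schema of “configuration.xml”, so the following is only a minimal sketch of how such a file could be read with the DOM classes of the Java API. The element name `module` and the attributes `type`, `class` and `active` are assumptions made for illustration, as are the class names `ModuleEntry` and `ConfigurationReader`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** One module entry of the configuration file (hypothetical shape). */
final class ModuleEntry {
    final String type;      // assumed type label, e.g. "system" or "language"
    final String className; // fully qualified name of a class inside a plugin jar
    final boolean active;   // true if this module is active when recognition starts

    ModuleEntry(String type, String className, boolean active) {
        this.type = type;
        this.className = className;
        this.active = active;
    }
}

final class ConfigurationReader {
    /** Parses the XML text and returns one entry per <module> element. */
    static List<ModuleEntry> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("module");
            List<ModuleEntry> entries = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // A missing "active" attribute is read as "" and parses to false.
                entries.add(new ModuleEntry(
                        e.getAttribute("type"),
                        e.getAttribute("class"),
                        Boolean.parseBoolean(e.getAttribute("active"))));
            }
            return entries;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Unreadable configuration", ex);
        }
    }
}
```

A real reader would additionally validate the file against the specified schema before trusting its contents.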
The modules in the system, which are effectively classes that implement a specified
interface, are referred to by their simple names. When an instruction passed to the
speech recognition system refers to another module, it must use that module’s simple
name. When name conflicts occur, any one of the modules with the same name will be
loaded.
If one module implements more than one module interface, it must be listed in the
“configuration.xml” file under each type of module interface it implements. For each
type of module, however, a new instance of the module is created.
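The loading rules above can be sketched with the standard `URLClassLoader`. This is an illustration under assumptions rather than the report's implementation: the class name `PluginLoader` and its two methods are hypothetical, and the sketch resolves a simple-name conflict by letting the later entry replace the earlier one, which is one possible reading of "any one of the modules ... will be loaded".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PluginLoader {
    /** Builds a class loader over every *.jar found directly in pluginDir. */
    static URLClassLoader loaderFor(Path pluginDir) {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(pluginDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new URLClassLoader(urls.toArray(new URL[0]),
                PluginLoader.class.getClassLoader());
    }

    /**
     * Creates one new instance per listed class for the given module type
     * and indexes the instances by simple class name.
     */
    static <T> Map<String, T> instantiate(ClassLoader loader, Class<T> moduleType,
                                          List<String> classNames) {
        Map<String, T> bySimpleName = new HashMap<>();
        for (String name : classNames) {
            try {
                Class<?> cls = Class.forName(name, true, loader);
                T module = moduleType.cast(cls.getDeclaredConstructor().newInstance());
                // On a simple-name conflict the later entry replaces the earlier one.
                bySimpleName.put(cls.getSimpleName(), module);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load module " + name, e);
            }
        }
        return bySimpleName;
    }
}
```

Calling `instantiate` once per module type gives a class that is listed under two types two separate instances, matching the rule stated above.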
2.3.2. External Interfaces
When the speech recognition system is launched, the ‘Output’ window appears. Only a
System Module (see section 4.2) can write to this window, and only if it implements the
specified interface that passes it the reference to the stream bound to this window.
Closing this window does not stop the speech recognition system; it only minimizes it to
the system tray.
The system tray icon, shown in Figure 2.3.2.2, provides a popup menu from which the
‘Output’ window can be opened and closed as needed. The ‘System console’ window can also
be opened from here; it displays the activity of the system. This window can be used
when plugins need to be tested for bugs, as any exception raised in the functions of the
system is displayed here. It is also bound to the system’s default output stream.

Figure 2.3.2.1 - The Output window

Figure 2.3.2.2 - System Tray icon and popup menu

Figure 2.3.2.3 - The System Console window

The next option in the system tray popup menu pauses the system. When it is selected, the
system stops after the next speech result has been obtained. The reason it cannot stop
instantaneously is addressed in section 2.5. When the option is deselected, the system
resumes from where it stopped.
When exit is selected from this menu, the speech recognition system exits.

2.4. Performance Requirements

Java must be installed on the host machine. A minimum of 750 MB of memory is needed for
the speech recognition system to run, along with a minimum of 70 MB of disk space.


2.5. Design Constraints

The speech recognizer used in this system, the CMUSphinx speech recognition system,
does not provide a facility to switch any of the models it uses during runtime. That is, if
the context of the speech recognition needs to be changed, the API provides no default
method for this purpose. Hence, in the system implemented here, whenever the context
(that is, the set of models used in the speech recognizer) needs to be switched, the
speech recognizer is stopped and re-instantiated.
The system uses the LiveSpeechRecognizer of the sphinx API, as it is live speech that is
being considered, though the system can be extended to perform recognition on a different
source, such as an audio stream over a network port. The LiveSpeechRecognizer class
binds the microphone resource to itself. Once the microphone resource is acquired, the
API provides no functionality to close the line and release the resource, which is
required in order to re-instantiate the recognizer with a new context. As a solution to
this problem, code was added to the sphinx API source to close the resource and free it
when necessary.
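The stop-and-re-instantiate approach can be illustrated without the sphinx library. In the sketch below, `ContextRecognizer` is a hypothetical stand-in for the patched `LiveSpeechRecognizer`, `RecognitionContext` is an assumed holder for the three model paths, and `RecognizerHolder` is an invented name; none of these come from the report or from the sphinx API.

```java
import java.util.function.Function;

/** Stand-in for a recognizer that owns the microphone until it is closed. */
interface ContextRecognizer {
    void close(); // releases the microphone line
}

/** The three model locations that make up a recognition context. */
final class RecognitionContext {
    final String acousticModelPath;
    final String dictionaryPath;
    final String languageModelPath;

    RecognitionContext(String acousticModelPath, String dictionaryPath,
                       String languageModelPath) {
        this.acousticModelPath = acousticModelPath;
        this.dictionaryPath = dictionaryPath;
        this.languageModelPath = languageModelPath;
    }
}

final class RecognizerHolder {
    private final Function<RecognitionContext, ContextRecognizer> factory;
    private ContextRecognizer current;

    RecognizerHolder(Function<RecognitionContext, ContextRecognizer> factory) {
        this.factory = factory;
    }

    /** Closes the old recognizer (freeing the microphone), then builds a new one. */
    ContextRecognizer switchContext(RecognitionContext next) {
        if (current != null) {
            current.close();
        }
        current = factory.apply(next);
        return current;
    }
}
```

The ordering matters: the old recognizer must release the microphone before the factory tries to acquire it for the new context.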
The recognizer can be in one of several states. When the recognizer is ready to obtain a
new speech result, it is in the READY state. When a speech result is requested, the
recognizer enters the RECOGNIZING state. Once a speech result is obtained, it returns the
result and goes back to the READY state, waiting for another speech result to be
requested. The API provides no means of interrupting the recognizer while it is in the
RECOGNIZING state, and the system implemented here does not introduce one either.
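The consequence of this constraint for pausing can be shown with a small state machine. This is a sketch only: the state names READY and RECOGNIZING come from the text, while the class `PausableRecognizer` and its methods are hypothetical.

```java
enum RecognizerState { READY, RECOGNIZING }

final class PausableRecognizer {
    private RecognizerState state = RecognizerState.READY;
    private boolean pauseRequested;

    /** A pause can be requested at any time but only takes effect in READY. */
    void requestPause() { pauseRequested = true; }

    void resume() { pauseRequested = false; }

    /** Returns true if recognition started, false if busy or paused. */
    boolean beginRecognition() {
        if (state != RecognizerState.READY || pauseRequested) {
            return false;
        }
        state = RecognizerState.RECOGNIZING;
        return true;
    }

    /** Called when the recognizer returns a result; goes back to READY. */
    void resultObtained() { state = RecognizerState.READY; }

    RecognizerState state() { return state; }
}
```

A pause requested during RECOGNIZING leaves the state unchanged; it only prevents the next recognition from starting, which is why the system stops after the next speech result rather than immediately.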


3. Design Strategy

First of all, the system is implemented in Java, as it is platform independent. This makes
the system portable and lets users implement what they want without worrying about the
platform it runs on. The primary focus of the system designed here is to provide an
interface any user can use to implement their own system using speech recognition. For
this purpose, a plugin system is implemented. The necessary interfaces are provided; the
user implements the interfaces needed to accomplish their task and places the
implementations, packed as jar files, in a predefined folder. The plugins do not depend on
any of the components of the primary system; this ensures that the user cannot alter the
core functions by providing an illegal instruction.
The user can provide three types of details:
1. How speech is recognized (the context in which speech is recognized)
A recognizer needs three important components other than the audio stream.
I. Acoustic model: the model that represents the relationship between an audio
signal and the linguistic features, or phonemes, it corresponds to.
II. Dictionary model: a list of the words that the speech recognizer will recognize,
together with the phonemes or linguistic features of each word.
III. Language/grammar model: a description of the order in which words are expected
to be spoken.
The user provides the models to be used in speech recognition and includes the paths of
these resources in the plugins; the system then loads them and uses them in the
recognition process.
2. How the recognition result is processed.
In this phase the user can instruct the system on what should be done with the speech
result obtained from the recognizer. For example, the result can be filtered further so
that it is easier to process in the subsequent steps, the system can be instructed to
switch the models used in the recognizer, or the speech result can simply be passed on
to the next phase.
3. What is to be done with the result that was processed.
Here the user provides information on what the system must do with the result. It can be
virtually anything that can be done programmatically.
To handle the three types of details a user may provide, three engines are used, each
handling a different type of detail. For the processing part, the user is provided with a
set of instructions that the system can process. So that the engines can communicate
with each other, the speech result is wrapped in an object to which each engine can add
information.
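The wrapper object described above could look roughly like the following. The class diagram names a `Response` class, but its members are not given in the text, so the fields and methods here are assumptions chosen for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class Response {
    private final String speechResult;                 // text returned by the recognizer
    private final Map<String, Object> details = new LinkedHashMap<>();

    Response(String speechResult) {
        this.speechResult = speechResult;
    }

    String speechResult() { return speechResult; }

    /** Lets an engine attach extra information; returns this for chaining. */
    Response with(String key, Object value) {
        details.put(key, value);
        return this;
    }

    Object detail(String key) { return details.get(key); }

    /** Read-only view of everything the engines have attached so far. */
    Map<String, Object> details() { return Collections.unmodifiableMap(details); }
}
```

For example, a later stage might call `response.with("key", "lights")` so that the next engine can tell which function the result relates to; the key name is again only illustrative.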


4. Project plan
4.1. The Engines
The system has three primary engines.
1. Recognizer Engine
2. Response Engine
3. System Engine

1. Recognizer Engine
The Recognizer Engine’s responsibility is to obtain the speech result from the recognizer
and to switch the models used when requested to do so. The Recognizer Engine obtains the
result, passes it to the Response Engine, and waits for the Response Engine to signal it
either to proceed or to switch the models it is using.
2. Response Engine
The Response Engine is responsible for deciding what has to be done with the speech
result obtained, and what is to be done after the speech result is obtained. Its
functions are as follows:
• Instruct the Recognizer Engine to proceed and obtain the next speech result.
• Instruct the Recognizer Engine to change a model it is using.
• Process the speech result.
• Wrap the speech result with the information the System Engine needs to execute the
function related to the speech result.
• Pass the speech result to the System Engine for it to proceed with its functions.
3. System Engine
The System Engine executes the function related to the speech result obtained. Another
responsibility of the System Engine is to identify whether a model used in the Recognizer
Engine needs to be rebuilt. In that case, it asks the Response Engine for permission to
proceed with the build; the Response Engine then pauses the Recognizer Engine and
signals the System Engine to proceed with the build. When the building process is
complete, the Response Engine is signaled to proceed with its functions.


4.2. The plugin modules

As outlined in section 3, the user provides three types of details to the system via
plugins. There are six module interfaces in total, each with different responsibilities,
that the user can implement in a plugin. Through these interfaces the system obtains the
details it needs to function. The modules are as follows:
1. Acoustic Module: This module provides the reference to the acoustic model
resource. The acoustic model can also be dynamically rebuilt through this module.
2. Dictionary Module: This module provides the reference to the dictionary
model resource. The dictionary model can also be dynamically rebuilt through
this module.
3. Language Module: This module provides the reference to the
language/grammar model to be used in the system. The model can also be
dynamically rebuilt through this module.
4. Response Handler Module: This module provides the steps to be taken by the
Response Engine once a speech result has been obtained, based on the speech
result. The set of steps that can be taken is predefined.
5. Response Generator Module: This module examines what the speech result
contains and decides what is being said. As the speech result obtained can be a
random set of words, the responsibility of the System Engine is simplified when
the result is filtered and is less random.
6. System Module: This is the module that takes the action related to the speech
result. It is given the processed speech result and executes the function related
to it.
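The six module types could be expressed as Java interfaces along the following lines. Only the module names come from the text; every method name and signature below is an assumption made for illustration, speech results are reduced to plain strings, and `LightSwitchModule` is an invented example plugin class.

```java
import java.util.Queue;

/** Shared shape of the three model modules (method names are assumptions). */
interface ModelModule {
    String modelPath(); // where the recognizer can find the model resource
    void rebuild();     // regenerate the model resource on request
}

interface AcousticModule extends ModelModule {}

interface DictionaryModule extends ModelModule {}

interface LanguageModule extends ModelModule {}

/** Decides which predefined steps the Response Engine should take. */
interface ResponseHandlerModule {
    Queue<String> stepsFor(String speechResult);
}

/** Filters the raw speech result and decides what was said. */
interface ResponseGeneratorModule {
    String interpret(String speechResult);
}

/** Executes the function related to a processed speech result. */
interface SystemModule {
    void execute(String processedResult);
}

/** Example plugin class: a System Module that toggles a pretend light. */
final class LightSwitchModule implements SystemModule {
    private boolean on;

    @Override
    public void execute(String processedResult) {
        on = "lights on".equals(processedResult);
    }

    boolean isOn() { return on; }
}
```

A plugin jar would contain classes like `LightSwitchModule`, each listed in the configuration file under the module type it implements.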

4.3. The Recognizer Engine

As briefly described above, the Recognizer Engine obtains the speech result from the
recognizer and passes it to the Response Engine. The speech result is wrapped in an
object to which additional information can be added at later stages; this object is known
as a response in the context of this system.
Another responsibility of the Recognizer Engine is to switch the model being used by the
recognizer. The module that holds the reference to the model is passed to the Recognizer
Engine, which switches the relevant model. Note that the Recognizer Engine does not hold
references to all the models that can be loaded into the recognizer. The sphinx API does
not provide a default method to switch the model being used dynamically. Hence the
Response Engine will re-instantiate the recognizer with the relevant models.
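The obtain, pass and wait cycle of the Recognizer Engine can be sketched with a `BlockingQueue` for the hand-off and a `Semaphore` for the "continue" signal. This is one possible arrangement, not the report's code: the class `RecognizerEngineSketch` is hypothetical, the recognizer is replaced by a `Supplier<String>`, the response wrapper is omitted, and model switching is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class RecognizerEngineSketch implements Runnable {
    private final Supplier<String> recognizer;            // stand-in for the real recognizer
    private final BlockingQueue<String> toResponseEngine; // hand-off to the Response Engine
    private final Semaphore proceed = new Semaphore(0);   // "continue recognition" signal
    private final int cycles;

    RecognizerEngineSketch(Supplier<String> recognizer,
                           BlockingQueue<String> toResponseEngine, int cycles) {
        this.recognizer = recognizer;
        this.toResponseEngine = toResponseEngine;
        this.cycles = cycles;
    }

    /** Called by the Response Engine when the next result may be obtained. */
    void signalProceed() { proceed.release(); }

    @Override
    public void run() {
        try {
            for (int i = 0; i < cycles; i++) {
                String result = recognizer.get(); // obtain the speech result
                toResponseEngine.put(result);     // pass it to the Response Engine
                proceed.acquire();                // wait for the signal to continue
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs two cycles against a fake recognizer and returns what was received. */
    static List<String> demo() {
        Iterator<String> words = Arrays.asList("lights on", "lights off").iterator();
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        RecognizerEngineSketch engine = new RecognizerEngineSketch(words::next, queue, 2);
        Thread thread = new Thread(engine);
        thread.start();
        List<String> seen = new ArrayList<>();
        try {
            for (int i = 0; i < 2; i++) {
                seen.add(queue.take());  // the Response Engine receives a result
                engine.signalProceed();  // and lets the recognizer continue
            }
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }
}
```

Because the engine blocks on the semaphore after each hand-off, it never requests a new speech result while the previous one is still being processed.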


[Cycle diagram: obtain speech result from recognizer → wrap speech result in a response object → pass response object to Response Engine → wait for Response Engine to signal to continue recognition → if a model needs to be switched, do so → repeat]

Figure 4.3.1 - Recognizer Engine's process cycle

4.4. The Response Engine

The Response Engine’s function can be described as coordinating the functions of the
Recognizer Engine and System Engine. The functions of the Response Engine are
themselves coordinated by a Response Engine Processor. The Processor has one of the
Response Handler Modules and one of the Response Generator Modules set as active. It
also stores the references to all loaded modules of the types Response Handler Module,
Response Generator Module, Acoustic Module, Dictionary Module and Language Module.
The list of modules to be loaded into the system is defined by a configuration file. When
the Response Engine is passed a response object, the Processor takes this response and
passes it to the active Response Handler Module. The Response Handler Module returns a
process queue to the Processor, containing instructions for the Processor to execute. The
instructions the Processor can execute are as follows:

• Pass response to generator - The response is passed to the active Response
Generator Module, which returns the response object with additional
information attached to it.
• Pass response to system - The response is passed to the System Engine.
• Switch Response Handler Module - The active Response Handler Module is
switched to the specified Response Handler Module from the stored module
references.
• Switch Response Generator Module - The active Response Generator Module is
switched to the specified Response Generator Module from the stored module
references.


• Switch Acoustic Module - The specified Acoustic Module is passed to the
Recognizer Engine through the Response Engine to load the acoustic model the
specified Acoustic Module refers to.
• Switch Dictionary Module - The specified Dictionary Module is passed to the
Recognizer Engine through the Response Engine to load the dictionary model the
specified Dictionary Module refers to.
• Switch Language Module - The specified Language Module is passed to the
Recognizer Engine through the Response Engine to load the language model the
specified Language Module refers to.
• Wait for a predefined period of time - A null response object is passed to the
Response Engine once the predefined period has elapsed, if the Processor was not
instructed to pass a response object to the System Engine within that period.
This instruction can be used to implement functionality such as giving the user a
brief period in which to cancel a function related to the speech result before it
is executed. (Note: if a model of the Recognizer Engine was switched to obtain a
different type of recognition result during the period, it can cause
inconsistencies in the result, as the default sphinx API does not provide a
method to interrupt the recognizer once it enters the RECOGNIZING state.)
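The instruction list above maps naturally onto an enum and a loop that drains the process queue. The sketch below is an illustration under assumptions: `Step`, `Step.Kind` and `ProcessorSketch` are hypothetical names, responses are plain strings, the System Engine and the module switches are replaced by lists that record what was requested, and the timed wait is not implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.UnaryOperator;

final class Step {
    enum Kind {
        PASS_TO_GENERATOR, PASS_TO_SYSTEM,
        SWITCH_HANDLER, SWITCH_GENERATOR,
        SWITCH_ACOUSTIC, SWITCH_DICTIONARY, SWITCH_LANGUAGE,
        WAIT
    }

    final Kind kind;
    final String target; // simple name of the module to switch to, or null

    Step(Kind kind, String target) {
        this.kind = kind;
        this.target = target;
    }

    static Step of(Kind kind) { return new Step(kind, null); }
}

final class ProcessorSketch {
    final List<String> sentToSystem = new ArrayList<>();   // stands in for the System Engine
    final List<String> switchRequests = new ArrayList<>(); // stands in for module switching
    private final UnaryOperator<String> activeGenerator;

    ProcessorSketch(UnaryOperator<String> activeGenerator) {
        this.activeGenerator = activeGenerator;
    }

    /** Executes every instruction in the queue, in order, for one response. */
    void run(String response, Queue<Step> steps) {
        String current = response;
        Step step;
        while ((step = steps.poll()) != null) {
            switch (step.kind) {
                case PASS_TO_GENERATOR:
                    current = activeGenerator.apply(current);
                    break;
                case PASS_TO_SYSTEM:
                    sentToSystem.add(current);
                    break;
                case WAIT:
                    // The timed wait is omitted in this sketch.
                    break;
                default:
                    // All SWITCH_* kinds: record which module was requested.
                    switchRequests.add(step.kind + " -> " + step.target);
                    break;
            }
        }
    }
}
```

Keeping the instruction set closed, as an enum does here, reflects the statement that the set of steps a Response Handler Module may request is predefined.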
Once the Processor has finished executing all the instructions provided to it by the
active Response Handler Module, the Response Engine signals the Recognizer Engine
to continue and waits for the next response to be passed to it.
Every time a new response is obtained, before passing it to the Processor, the Response
Engine checks whether the System Engine has placed a request to build. If so, it signals
the System Engine and waits to be signaled back before continuing with its functions.

[Cycle diagram: get a response from Recognizer Engine → if System Engine has placed a request to build, wait for it to be completed → pass the obtained response to the Response Engine Processor → wait for the Processor to complete its instructions → signal the recognizer to continue recognition → repeat]

Figure 4.4.1 - Response Engine's process cycle


[Flow diagram: get response → pass to active Response Handler Module → get process queue from the active Response Handler Module → execute all process instructions from returned queue]

Figure 4.4.2 - Response Engine Processor's process

4.5. The System Engine

The System Engine is the part of the system that executes the functions related to a
speech result. It obtains the responses passed from the Response Engine and, based on the
key in the response (which was included during processing in the Response Engine), hands
the response to the appropriate System Module, which executes the function related to
the speech result in the response object. A module selector assists in choosing the
module. If a model loaded in the system needs to be rebuilt, the System Engine informs
the Response Engine that the relevant model is to be built. When the Response Engine
signals the System Engine to go ahead, the System Module that requested the build is
called to carry out the building procedure. Once it is finished, the Response Engine is
signaled back to continue with its functions.
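Key-based selection of a System Module can be sketched as a map lookup. The class `ModuleSelector` below is hypothetical, System Modules are reduced to `Consumer<String>` for brevity, and returning `false` for an unknown key is an assumed way of handling the open issue that a speech result may have no related function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

final class ModuleSelector {
    private final Map<String, Consumer<String>> modulesByKey = new HashMap<>();

    /** Associates a key (as placed in the response) with a System Module. */
    void register(String key, Consumer<String> systemModule) {
        modulesByKey.put(key, systemModule);
    }

    /**
     * Hands the speech result to the module registered for the key.
     * Returns false when no module is related to the key.
     */
    boolean dispatch(String key, String speechResult) {
        Consumer<String> module = modulesByKey.get(key);
        if (module == null) {
            return false;
        }
        module.accept(speechResult);
        return true;
    }
}
```

The build-request handshake described in the same paragraph is not covered by this sketch.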

[Cycle diagram: get response → allow the module selector to choose the appropriate module → pass the response to the module and let it execute the relevant functions → check if the module that was selected has placed a request to build a model → if such a request is placed, inform the Response Engine and wait → when Response Engine signals back, allow the module to continue the build → when module completes the build, signal Response Engine to continue → repeat]

Figure 4.5.1 - The System Engine's process cycle


4.6. The System as a whole

[Diagram: System Engine with its Module Selector and System Module; Response Engine with its Response Engine Processor, Response Generator Module and Response Handler Module; Recognizer Engine with Acoustic Module, Dictionary Module and Language Module. Legend: engine, engine component, plugin module]

Figure 4.6.1 - Simplified model of the speech recognition system

The three engines are designed as singletons to avoid conflicts over resources. Note
that although the Acoustic Module, Dictionary Module and Language Module are related to
the Recognizer Engine, all loaded modules are stored in the Response Engine Processor.
The plugin modules may also communicate among themselves to improve their functionality.
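As an illustration of the singleton design, the following Java sketch shows one common way to guarantee a single engine instance, the initialization-on-demand holder idiom. The report does not say which singleton variant is used, so the idiom is an assumption, as are the names PluginModule, ResponseEngineProcessor, ResponseEngine and their methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; the report only states that the engines are singletons.

/** Anything loadable into the system; only a name is assumed here. */
interface PluginModule {
    String name();
}

/** Engine component that keeps every loaded module, whichever engine uses it. */
final class ResponseEngineProcessor {
    private final List<PluginModule> loaded = new ArrayList<>();

    synchronized void store(PluginModule module) {
        loaded.add(module);
    }

    /** Returns a snapshot of the names of all modules stored so far. */
    synchronized List<String> loadedNames() {
        List<String> names = new ArrayList<>();
        for (PluginModule module : loaded) {
            names.add(module.name());
        }
        return names;
    }
}

/** Singleton engine: the nested holder is initialized lazily and only once by the JVM. */
final class ResponseEngine {
    private final ResponseEngineProcessor processor = new ResponseEngineProcessor();

    private ResponseEngine() { } // prevents instantiation from outside

    private static final class Holder {
        static final ResponseEngine INSTANCE = new ResponseEngine();
    }

    static ResponseEngine getInstance() {
        return Holder.INSTANCE;
    }

    ResponseEngineProcessor processor() {
        return processor;
    }
}

final class SingletonDemo {
    public static void main(String[] args) {
        // Both calls reach the same engine, so the stored module is visible to the second.
        ResponseEngine.getInstance().processor().store(() -> "Dictionary Module");
        System.out.println(ResponseEngine.getInstance().processor().loadedNames());
    }
}
```

Because every caller receives the same instance, modules that belong to the Recognizer Engine can still be stored in one shared place without two copies of an engine competing for the same resource.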


5. Future work

One of the primary focuses in improving the speech recognition system is to improve how
configuration details are provided and managed, which can also include functionality to
load modules during runtime. A graphical user interface that automates the process of
building plugins is also planned, which will eliminate the need for programming knowledge
to implement simple plugins.
Another aspect that will be taken into consideration is support for other audio sources
such as audio streams or audio files, for which the Sphinx API provides functionality.
This will allow the system to be deployed in network systems, servers, etc.

6. Conclusion

Giving machines the ability to communicate with humans in the humans' natural medium of
communication has always been a fascinating prospect, and modern technologies have
brought humans closer to realizing this dream than ever before. Yet providing the
intelligence a machine needs to communicate flawlessly with humans remains the greatest
challenge in realizing this dream at this stage. The Speech Recognition System designed
and documented in this report is an attempt to give users a simple interface through
which they can provide their own details for recognizing speech and have computers do
their bidding.


7. References
• Java™ Speech API Programmer's Guide, Sun Microsystems, Inc. Retrieved: July 8, 2015,
from: http://www.ling.helsinki.fi/kit/2004s/ctl310gen/L7-Speech/JSAPI/index.html

• CMUSphinx Wiki. Retrieved: July 8, 2015, from: http://cmusphinx.sourceforge.net/wiki/
