Speech Recognition Report
{PT-CSE-425G}
On
“VOICE RECOGNITION”
Bachelor of Technology
In
Computer Science &
Engineering (Batch: 2021-25)
I have not submitted the matter embodied in this project report for any other
degree or diploma.
This is to certify that NIDHI of CSE, VIth Semester, 2021-2025, KIIT College of
Engineering, Gurugram, has successfully completed the project work entitled "Speech
Recognition using Python" for the completion of the Bachelor of Technology (Computer
Science and Engineering) degree as prescribed by Gurugram University, Gurugram.
This project report is a record of authentic work carried out by her under the guidance of
Dr. Kanika Kaur, Dr. Atul Kumar and Dr. Seema Sharma.
She has worked under our guidance. The performance of the student is satisfactory.
H.O.D. and Professor
I take immense pleasure in thanking Prof. (Dr.) S. S. Agrawal, Director General (KIIT
Group of Colleges), Prof. (Dr.) Mahavir Singh (Principal), and Prof. (Dr.) Kanika
Kaur (H.O.D.) for permitting me to carry out this project work.
I wish to express my sense of gratitude to our project supervisors, Dr. Atul Kumar and
Dr. Seema Sharma, for their guidance, which helped me complete the project work.
Finally, yet importantly, I would like to express my heartfelt thanks to our beloved parents
for their blessings, and to our friends and classmates for their help and their wishes for
the successful completion of this project.
ABSTRACT
This project is designed and developed keeping this factor in mind, as a small
effort towards achieving this aim. Our project is capable of recognizing speech
and converting it into text.
TABLE OF CONTENTS
1.1 INTRODUCTION
Fig: 1.1 Unified Framework
• In many areas of the country there are a lot of people who cannot read or write, so
this project is very helpful for such people.
• In today's world everybody has a mobile phone and wants to search for a lot of things.
With this project, users simply speak what they want to search, and the corresponding
results open in the browser window.
• In this project, we made our machine recognize speech passed as an audio file, as
well as dissect the speech based on the requirement.
• Our aim is to make search fast, efficient, and reliable for every person by
implementing basic search commands, correcting the user's vocabulary easily, and
further implementing a speaking mode like Siri on iPhones.
2. Objectives
Speech recognition is the process of recognizing the words spoken by a human,
converting this speech into text, and analysing this text to produce the results
required by the user. The performance of such a system depends on a number of
factors, such as the speed of the words spoken by the user, the vocabulary, and the
background noise caused by the environment. The SpeechRecognition package
provided on PyPI can help reduce factors such as background noise, which makes
the speech suitable for processing and for performing the tasks given to this system,
such as word recognition and web searches.
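As a rough illustration of the pipeline described above, the sketch below recognizes speech from a recording, prints the text, and triggers a simple web search. It assumes the SpeechRecognition package is installed; the file name command.wav and the "search" keyword are placeholders, not part of this report.

# Rough sketch of the pipeline: recognize speech, convert it to text,
# then act on the result (here, a web search).
# Assumptions: pip install SpeechRecognition; "command.wav" is a placeholder file.
import webbrowser
import speech_recognition as sr

recognizer = sr.Recognizer()

with sr.AudioFile("command.wav") as source:
    audio = recognizer.record(source)          # read the whole recording

try:
    text = recognizer.recognize_google(audio)  # speech -> text via the Google Web Speech API
    print("You said:", text)
    if "search" in text.lower():               # simple word recognition -> web search
        query = text.lower().replace("search", "", 1).strip()
        webbrowser.open("https://www.google.com/search?q=" + query)
except sr.UnknownValueError:
    print("Could not understand the audio")
except sr.RequestError as e:
    print("API request failed: {0}".format(e))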
1.4 Methodology:
1.5 Scope
• The speech recognition system in this project has capabilities similar to the systems
used by iPhones and Google, but it cannot be as effective as the functionality
provided by those systems.
• This project is a basic implementation of speech-to-text conversion that also
performs the basic tasks given by the user to the system.
CHAPTER 2
LITERATURE REVIEW
2.1 HISTORY
The first speech recognition systems were focused on numbers, not words.
In 1952, Bell Laboratories designed the "Audrey" system, which could
recognize a single voice speaking digits aloud. Ten years later, IBM
introduced "Shoebox", which understood 16 words in English.
Across the globe, other nations developed hardware that could recognize sounds
and speech, and by the end of the '60s the technology could support words with four
vowels and nine consonants.
1970s
Speech recognition made several meaningful advancements in this decade,
mostly due to the US Department of Defense and DARPA. The
Speech Understanding Research (SUR) program they ran was one of the largest of
its kind in the history of speech recognition. Carnegie Mellon's "Harpy" speech system
came from this program and was capable of understanding over 1,000 words,
which is about the same as a three-year-old's vocabulary.
Also significant in the '70s was Bell Laboratories' introduction of a
system that could interpret multiple voices.
1980s
The '80s saw speech recognition vocabularies grow from a few hundred words to
several thousand words. One of the breakthroughs came from a statistical
method known as the Hidden Markov Model (HMM). Instead of just
using words and looking for sound patterns, the HMM estimated the
probability of unknown sounds actually being words.
1990s
Speech recognition was propelled forward in the '90s in large part because of the
personal computer. Faster processors made it possible for software like
Dragon Dictate to become more widely used. BellSouth introduced the voice
portal (VAL), a dial-in interactive voice recognition system. This
system gave birth to the myriad of phone tree systems that are still in
existence today.
2000s
• By the early 2000s, speech recognition technology had achieved close to
80 percent accuracy.
• For most of the decade there were not many advancements, until Google
arrived with the launch of Google Voice Search.
• As an application, it put speech recognition into the hands of lakhs of
people.
• It was also significant because the processing power could be
offloaded to Google's data centres.
• Not only that, the Google application was collecting data from many billions
of searches, which could help it predict what a person is actually
saying.
• At that time, Google's English voice search system included 240 billion words
from user searches.
2010s
In 2011, Apple launched Siri, which was similar to Google's Voice Search.
The early part of the decade saw an explosion of other voice recognition
applications.
And with Amazon's Alexa and Google Home, we have seen consumers becoming more
and more comfortable talking to machines.
Today, some of the largest technology companies are competing for the speech
accuracy title. In 2015, IBM achieved a word error rate of 6.8%.
In 2016, Microsoft surpassed IBM with a 5.8% claim. Shortly after that, IBM
improved their rate to 5.4%. However, it is Google that claims the lowest rate, at
4.8 percent.
The Future
The technology to support speech applications is today both relatively inexpensive and
powerful. With the advances in artificial intelligence and the increasing amounts of
speech data that can easily be mined, it is now possible that voice becomes the next
dominant interface.
At Sonix, we also applaud the many companies before us that propelled speech
recognition to where it is today. We automate the transcription workflow and make it
fast, easy and more affordable.
We could not do this without the work that has been done before us.
• Analysis:
Basically, this refers to how the words are spoken: connected or isolated. An
isolated-word speech recognition system requires the speaker to pause between
the words they speak; it handles single words. A connected-word speech
recognition system does not require the speaker to pause briefly between words;
it generally handles full-length sentences in which the words are then artificially
separated by silence.
Speaking Style:
Generally, this covers whether the speech is in continuous or spontaneous form.
Continuous speech is spoken in a natural manner, and systems are evaluated on
speech read from prepared scripts. Spontaneous or extemporaneously generated
speech is much harder to handle than speech read from a written script, as it tends
to be peppered with disfluencies like "uuh" and "uum", incomplete sentences,
spluttering, stuttering, sneezing and coughing, and the vocabulary is essentially
unlimited. So the system must be trained to be able to cope with unknown and
hidden words.
Vocabulary:
It is much simpler to discriminate among a small set of words, but the error rate
increases as the size of the vocabulary increases.
For example, the 10 digits from 0 to 9 can easily be recognized correctly, whereas
vocabularies of size 100, 4,000 and 15,000 have error rates of about 3%, 6% and
40% respectively. A vocabulary is hard to recognize if it contains easily
confused words.
Enrollment:
This is of two kinds:
1) Speaker dependent 2) Speaker independent
In a speaker-dependent system the user must provide various samples of his or her
speech before the system can be used; such a system is meant for use by
only a single speaker. A speaker-independent system, on the other hand, is
intended to be used by any speaker.
Fig: 2.3 Speech Recognition Process
2.4 APPLICATION
SYSTEM DEVELOPMENT
1. Speech Synthesis
1. Evaluation of Synthetic Speech:
GRAPH 3.1
(Line graph showing transcription accuracy by speaking rate for expert and
non-expert users of text-to-speech synthesizers)
3.2 Packages Used :
3. webbrowser: With this package we can make use of our default
browser to locate, retrieve and display data. The URL and the
query are passed to the webbrowser module, and based on the URL and the
query provided, the particular webpage opens (a short sketch follows below).
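A minimal sketch of this usage, with an assumed base URL and query (both placeholders, not taken from the report):

# Open a search results page in the default browser using the standard
# library webbrowser module. The URL and query below are illustrative.
import webbrowser
from urllib.parse import quote_plus

url = "https://www.google.com/search?q="
query = "speech recognition in python"

webbrowser.open_new(url + quote_plus(query))   # opens the result page in the default browser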
Recognizer Class: All the major processing for speech recognition occurs in the
Recognizer class. The main purpose of a Recognizer instance is to recognize speech,
and it provides various methods which help in recognizing speech from an audio
source.
The path of the audio file can be passed as an argument to the AudioFile class,
which also provides a context manager that helps in reading and working with the
file's contents.
The context manager is responsible for opening the audio file and storing the file's
data in the AudioFile instance. The record() method is then used to capture the data
from the entire audio file and store it in an AudioData instance.
recognize_google() is used to recognize the speech in the audio.
The displayed results depend on the internet connection speed, and the speech-to-text
conversion depends immensely on the accent and the speed of the speaker. Because
we used an audio file, our speech recognition system caught some words differently
owing to the vocabulary of the speaker.
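A short sketch of this file-based flow; the file name harvard.wav is a placeholder for any WAV/AIFF/FLAC recording, not a file from this report:

# File-based recognition: open the audio file, record it into an AudioData
# instance, and transcribe it with the Google Web Speech API.
import speech_recognition as sr

r = sr.Recognizer()

with sr.AudioFile("harvard.wav") as source:   # context manager opens the file
    audio = r.record(source)                  # AudioData for the entire file

print(r.recognize_google(audio))              # print the transcription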
Fig: Usage of offset and duration
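As the figure suggests, the record() method also accepts offset and duration keyword arguments, which capture only a portion of the file. A sketch using the same placeholder file:

# Capture only part of the file: skip the first 4 seconds, then record 3 seconds.
import speech_recognition as sr

r = sr.Recognizer()

with sr.AudioFile("harvard.wav") as source:
    audio = r.record(source, offset=4, duration=3)

print(r.recognize_google(audio))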
3.2.2 The Effect of Noise on Speech Recognition
All audio recordings contain some level of noise, and unhandled noise can greatly
reduce the accuracy of speech recognition applications.
This file has the phrase "smell during periods" spoken with a loud sound in the
background, so the speech cannot be recognized properly.
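One way to reduce the effect of such noise is the Recognizer's adjust_for_ambient_noise() method, sketched below; jackhammer.wav is a placeholder name for a noisy recording.

# Calibrate the recognizer on the first half second of the noisy file,
# then record and transcribe the rest.
import speech_recognition as sr

r = sr.Recognizer()

with sr.AudioFile("jackhammer.wav") as source:
    r.adjust_for_ambient_noise(source, duration=0.5)   # calibrate on the first 0.5 s
    audio = r.record(source)

try:
    print(r.recognize_google(audio))
except sr.UnknownValueError:
    print("Speech could not be recognized over the noise")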
PyAudio is installed to access the microphone, which allows the user to perform
real-time speech recognition. With an instance of speech_recognition's Microphone
class, the microphone can be used.
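A sketch of this real-time flow, assuming PyAudio is installed alongside SpeechRecognition:

# Real-time recognition from the default microphone (requires: pip install pyaudio).
import speech_recognition as sr

r = sr.Recognizer()
mic = sr.Microphone()

with mic as source:
    r.adjust_for_ambient_noise(source)   # brief calibration against room noise
    print("Speak now...")
    audio = r.listen(source)             # record until a pause is detected

try:
    print("You said:", r.recognize_google(audio))
except sr.UnknownValueError:
    print("Could not understand the audio")
except sr.RequestError as e:
    print("API request failed: {0}".format(e))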
If the user guesses the fruit name correctly, the game announces the win;
otherwise it prints a message to try again if there are any attempts remaining.
3.2.4.1 Working :
This function first checks the correctness of both of its arguments and
raises a TypeError if either of them is invalid.
The listen() method is then used to listen to the input from the microphone.
The first for loop of the program runs for the number of guesses given to the
user. The other for loop, inside the first one, attempts to recognize the
input each time through the recognize_speech_mic() function and stores the
dictionary returned from this function in a variable.
If the system recognizes the word spoken by the user, i.e. the transcription key
is not null, then the user's speech has been transcribed and the inner loop breaks
out. If the speech is not transcribed and an API error occurred, the loop also
breaks out. If the API request was successful but the speech was not recognized,
the else branch is executed, which asks the user to speak the word again.
If the inner loop breaks out without any errors, the returned dictionary is correct;
if an error occurred, the error message is displayed, which ends the program.
If no error occurred when the inner loop broke out, the transcription is compared
to the word selected by the system, and the lower() method is used to convert the
strings to lowercase, which removes wrong answers caused only by differences in
capitalization.
If the user makes a guess that matches the system's word, the user wins the game;
otherwise the outer loop continues based on the attempts left, and if the user fails
on the last attempt, the user loses the game.
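A condensed sketch of the game logic described above. The helper name recognize_speech_mic() and the overall flow follow the description; the fruit list, the number of guesses and the number of re-prompts are illustrative assumptions.

# "Guess the fruit" sketch: the system picks a fruit, the user guesses by voice.
import random
import speech_recognition as sr


def recognize_speech_mic(recognizer, microphone):
    """Listen once and return a dict with 'success', 'error' and 'transcription' keys."""
    if not isinstance(recognizer, sr.Recognizer):
        raise TypeError("`recognizer` must be a Recognizer instance")
    if not isinstance(microphone, sr.Microphone):
        raise TypeError("`microphone` must be a Microphone instance")

    with microphone as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)

    response = {"success": True, "error": None, "transcription": None}
    try:
        response["transcription"] = recognizer.recognize_google(audio)
    except sr.RequestError:
        response["success"] = False                        # API was unreachable
        response["error"] = "API unavailable"
    except sr.UnknownValueError:
        response["error"] = "Unable to recognize speech"   # speech was unintelligible
    return response


if __name__ == "__main__":
    fruits = ["apple", "banana", "mango", "orange", "grape"]   # assumed word list
    num_guesses, num_prompts = 3, 2                            # assumed limits

    recognizer, microphone = sr.Recognizer(), sr.Microphone()
    answer = random.choice(fruits)
    print("Guess the fruit! You have {} attempts.".format(num_guesses))

    for attempt in range(num_guesses):          # outer loop: one pass per guess
        for _ in range(num_prompts):            # inner loop: re-prompt if nothing was heard
            print("Speak your guess:")
            guess = recognize_speech_mic(recognizer, microphone)
            if guess["transcription"] or not guess["success"]:
                break                           # transcribed, or API error: stop prompting
            print("I didn't catch that. Please speak again.")

        if guess["error"]:
            print("ERROR: {}".format(guess["error"]))
            break                               # end the game on any remaining error

        print("You said: {}".format(guess["transcription"]))
        if guess["transcription"].lower() == answer.lower():
            print("You win! The fruit was '{}'.".format(answer))
            break
        if attempt < num_guesses - 1:
            print("Incorrect. Try again.")
        else:
            print("You lose. The fruit was '{}'.".format(answer))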
Output For Guess The Fruit Using Speech Recognition Through Microphone:
3.3 System Development Approach :
(PERFORMANCE ANALYSIS)
1. System Requirements:
1. Minimum Requirements:
a. 1.6 GHz processor
b. 128 MB RAM
c. Microphone for good audio
2. Recommended Requirements:
a. 2.4 GHz processor
b. More than 128 MB RAM
c. 10% memory consumption
d. Best-quality microphone
Sound cards:
A proper sound driver must be installed. Since speech requires low
bandwidth, high-quality sound cards should be used.
Microphones:
Microphones are the most important tools for real-time
speech-to-text conversion. The pre-installed ones therefore
cannot be used, as they are more prone to background
noise and of poorer quality in terms of captured speech.
Computer Processor:
A speech recognition application depends heavily on processing
speed. Handling the input from the user can take some time
if the processing speed is low, so the user spends more time
waiting than performing the task, which makes the application
less feasible to use.
The program imports the speech recognition library, which handles requests
from the user to perform a web search or to search a query on YouTube.
For performing the web search we used the Recognizer class of the speech
recognition package and created three instances of this class:
the first instance is used to recognize text for the YouTube search, the second
instance is used for the web search, and the third instance is used to listen to speech.
We take input from the user's microphone and, based on the words spoken
(e.g. "web search" and "video"), we search the web and YouTube respectively.
This system is designed to recognize speech and also has the capability
to convert speech to text. This software, named 'SPEECH RECOGNITION
SYSTEM', has the capability to write spoken words as text.
import speech_recognition as sr
import webbrowser as wb

# three Recognizer instances: r1 for the YouTube search, r2 for the web search,
# r3 for listening to the initial command
r1 = sr.Recognizer()
r2 = sr.Recognizer()
r3 = sr.Recognizer()

with sr.Microphone() as source:
    print('[search python : search YouTube]')
    print('Speak Now!! \n')
    audio = r3.listen(source)

if 'python' in r2.recognize_google(audio):
    r2 = sr.Recognizer()
    url = 'https://www.edureka.co/'
    with sr.Microphone() as source:
        print('\n search the query \n')
        audio = r2.listen(source)
        try:
            get = r2.recognize_google(audio)
            print(get)
            wb.get().open_new(url + get)
        except sr.UnknownValueError:
            print('Unable to recognize')
        except sr.RequestError as e:
            print('failed: {0}'.format(e))

if 'video' in r1.recognize_google(audio):
    r1 = sr.Recognizer()
    url = 'https://www.youtube.com/results?search_query='
    with sr.Microphone() as source:
        print('\n search the query \n')
        audio = r2.listen(source)
        try:
            get = r1.recognize_google(audio)
            print(get)
            wb.get().open_new(url + get)
        except sr.UnknownValueError:
            print('Unable to recognize')
        except sr.RequestError as e:
            print('failed: {0}'.format(e))
GRAPH 4.1
(Source: Microsoft)
CHAPTER 5
(CONCLUSION)
1. Advantages of Software:
In many areas of the country there are a lot of people who cannot read or
write, so this project is very helpful for such people. In today's world
everybody has a mobile phone and wants to search for a lot of things.
With this project, they simply speak what they want to search, and the
corresponding results open in the browser window.
2. Disadvantages:
1. Low accuracy because of its limited ability.
2. Fails in noisy environments.
3. Depends majorly on the Google API, so it is not an original standalone system.
4. Only limited operations can be performed.
5.3 Conclusion:
BOOKS:
1. G. L. Clapper, "Automatic word recognition", IEEE Spectrum, pp. 57-59, Aug. 1971.
2. M. B. Herscher, "Real-time interactive speech technology at Threshold Technology",
Workshop Voice Technol. Interactive Real Time Command Control Syst. Appl., Dec. 1977.
3. J. W. Gleen, "Template estimation for word recognition", Proc. Conf. Pattern
Recog. Image Processing, pp. 514-516, June 1978.
Internet:
1. https://pypi.org/project/SpeechRecognition/
2. https://www.researchgate.net/publication/337155654_A_Study_on_Automatic_Speech_Recognition
3. https://www.ijedr.org/papers/IJEDR1404035.pdf