Ann LA2 Project

This document discusses a student project to implement speech recognition using neural networks. It outlines the aim to accurately convert speech to text independently of speaker or recording device. It describes the key steps of speech recognition including preprocessing the speech signal, extracting features, and using algorithms like hidden Markov models, dynamic time warping, and artificial neural networks to classify the speech and output text. Hardware and software requirements for speech recognition systems are also listed.


Nitte Meenakshi Institute of Technology

Department of Electronics and Communication Engineering


Learning Activity - 2
USING ALGORITHM AND FLOW CHART, IMPLEMENT
SPEECH RECOGNITION USING NEURAL NETWORK

Submitted By: Sayantan Das - 1NT18EC198
Soumojit Gain - 1NT18EC163
Suryansh Gupta - 1NT18EC200
Shreya Thimmaya - 1NT18EC154
5th Semester, ECE
Submitted To: Dr. Jayavrinda Vrindavanam
Associate Professor, NMIT
PROJECT DETAILS
AIM -
The aim of the system is to accurately and efficiently convert a speech
signal into a text transcription of the spoken words, independent of the
speaker, the environment, or the device used to record the speech
(i.e. the microphone).
THEORY -
The process begins when a speaker decides what to say and actually speaks
a sentence, producing a speech waveform that embodies the words of the
sentence as well as the extraneous sounds and pauses in the spoken input.
The software first converts the speech signal into a sequence of feature
vectors measured throughout the duration of the signal. Then, using a
syntactic decoder, it generates a valid sequence of word representations.
ESSENTIAL SOFTWARE AND HARDWARE USED
• Based upon stated preferences and system specifications, the following conditions have been established:
• 1. Continuous speech recognition software is preferred, rather than the slower, more unnatural and
lower-priced discrete speech recognition software also on the market.
• 2. The application must run on a Pentium-powered PC under Windows 95, and be capable of integration with
Microsoft Word 97.
• 3. The software program must be easily and successfully installed by any intermediate-level computer user in the
office. The program must be one that can be learned and customized reasonably quickly by nearly anyone in
the office.
All four programs run on Pentium-powered PCs using Windows 95, 98 or NT 4.0 and require 16-bit SoundBlaster-
compatible sound cards. Random access memory (RAM) requirements are higher for all of these programs when
run under Windows NT.
• 1. Dragon Systems' NaturallySpeaking requires a Pentium/133 MHz processor or higher, 32 MB of RAM, and
180 MB of hard disk space.
• 2. IBM ViaVoice 98 requires a Pentium/166 MHz with MMX (multimedia extensions) or higher, 32 MB of RAM,
180 MB of hard disk space, and 256 KB of L2 cache.
• 3. L&H Voice Xpress Plus requires a Pentium/166 MHz with MMX, 40 MB of RAM, and 130 MB of hard disk
space.
• 4. Philips FreeSpeech98 requires a Pentium/166 MHz processor, 32 MB of RAM, and 150 MB of hard disk
space.
• A microphone is necessary for capturing spoken words.
Past Present And Future Of Speech Recognition
IMPLEMENTATION
Algorithm:-
There are mainly three algorithms used for speech recognition. They are given
below:
• 1. Hidden Markov Model(HMM)
• 2. Dynamic Time Warping(DTW)
• 3. Artificial Neural Networks(ANN)
• HIDDEN MARKOV MODEL (HMM)
A hidden Markov model (HMM) is a statistical Markov model in which the
system being modelled is assumed to be a Markov process with unobserved
(hidden) states. It can be represented as the simplest dynamic Bayesian
network. It can be thought of as a black box, where the sequence of output
symbols generated over time is observable, but the sequence of states
visited over time is hidden from view. This is why it is called a hidden
Markov model.

When an HMM is applied to speech recognition, the states are interpreted as
acoustic models, indicating what sounds are likely to be heard during their
corresponding segments of speech, while the transitions provide temporal
constraints, indicating how the states may follow each other in sequence.
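The hidden/observable distinction above can be sketched with a toy two-state HMM decoded by the Viterbi algorithm. All numbers below are made-up illustrative values, not trained acoustic models:

```python
import numpy as np

# Toy HMM: two hidden "phone" states emitting one of three acoustic symbols.
states = ["s1", "s2"]
start = np.array([0.6, 0.4])                # P(initial state)
trans = np.array([[0.7, 0.3],               # P(next state | current state)
                  [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1],           # P(symbol | state)
                 [0.1, 0.3, 0.6]])

def viterbi(obs):
    """Most likely hidden-state path for an observed symbol sequence."""
    T = len(obs)
    v = np.zeros((T, len(states)))          # best path probability so far
    back = np.zeros((T, len(states)), dtype=int)
    v[0] = start * emit[:, obs[0]]
    for t in range(1, T):
        for j in range(len(states)):
            scores = v[t - 1] * trans[:, j]
            back[t, j] = np.argmax(scores)
            v[t, j] = scores[back[t, j]] * emit[j, obs[t]]
    # trace back from the best final state
    path = [int(np.argmax(v[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [states[i] for i in reversed(path)]

print(viterbi([0, 1, 2]))   # → ['s1', 's1', 's2']
```

In a real recognizer the symbols would be acoustic feature vectors and the states would be sub-word units, but the decoding recurrence is the same.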
DYNAMIC TIME WARPING
• The simplest way to recognize an isolated word sample is to compare it
against a number of stored word templates and determine which the “best
match” is. This goal is complicated by a number of factors. First, different
samples of a given word will have somewhat different durations. This
problem can be eliminated by simply normalizing the templates and the
unknown speech so that they all have an equal duration. However, another
problem is that the rate of speech may not be constant throughout the
word; in other words, the optimal alignment between a template and the
speech sample may be nonlinear.
• Dynamic Time Warping (DTW) is an efficient method for finding this optimal
nonlinear alignment.
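The DTW recurrence described above can be sketched as follows. The scalar sequences and distance function are illustrative; a real system compares frames of feature vectors:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences.

    Each cell D[i, j] holds the cheapest cumulative cost of aligning
    a[:i] with b[:j], allowing stretches and compressions on either axis.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of: advance a, advance b, or advance both
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two "utterances" of the same word, spoken at different rates
slow = [1, 1, 2, 3, 3, 4]
fast = [1, 2, 3, 4]
print(dtw_distance(slow, fast))   # 0.0 — DTW absorbs the tempo difference
```

Note how simple duration normalization could not achieve this: the tempo difference here is local, exactly the nonlinear case the text describes.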
ARTIFICIAL NEURAL NETWORKS (ANN)
• A neural network can be defined as a model of reasoning based on the human
brain. The brain consists of a densely interconnected set of nerve cells, or basic
information-processing units, called neurons. By using multiple neurons
simultaneously, the brain can perform its functions much faster than the fastest
computers in existence today.

• Each neuron has a very simple structure, but an army of such elements constitutes a
tremendous processing power.
• The feedforward network is the first and simplest form of ANN. In this network,
information flows in only one direction, i.e. forward, from the input nodes via the
hidden nodes to the output nodes. Learning is the adaptation of the free
parameters of the neural network through a continuous process of stimulation by
the embedding environment. The back-propagation algorithm emerged to train a
new class of layered feedforward networks called Multi-Layer Perceptrons (MLP).
An MLP generally contains at least two layers of perceptrons: one input layer, one
or more hidden layers and an output layer. The hidden layer plays a very important
role and acts as a feature extractor.
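The feedforward pass and back-propagation update described above can be sketched with a tiny MLP learning XOR. The architecture, learning rate and iteration count are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # input -> hidden
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # hidden -> output

losses = []
for _ in range(4000):
    h = sigmoid(X @ W1 + b1)            # forward pass, hidden layer
    out = sigmoid(h @ W2 + b2)          # forward pass, output layer
    losses.append(float(np.mean((out - y) ** 2)))
    # backward pass: chain rule through the sigmoid at each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= h.T @ d_out; b2 -= d_out.sum(axis=0)    # gradient-descent step
    W1 -= X.T @ d_h;   b1 -= d_h.sum(axis=0)

print("loss:", round(losses[0], 3), "->", round(losses[-1], 3))
print("predictions:", np.round(out.ravel(), 2))
```

XOR is the classic example because it is not linearly separable: the hidden layer must extract intermediate features, which is exactly the "feature extractor" role the text assigns to it.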
Structure of Speech Recognition
The structure of a standard speech recognition system is illustrated in the figure. The elements
are as follows:
• Raw speech - Speech is typically sampled at a high frequency, e.g., 16 kHz over a
microphone or 8 kHz over a telephone. This yields a sequence of amplitude values over
time.
• Signal analysis - Raw speech should be initially transformed and compressed, in order to
simplify subsequent processing. Many signal analysis techniques are available which can
extract useful features and compress the data by a factor of ten, without losing any
important information. Among the most popular:
• Fourier analysis (FFT) - yields discrete frequencies over time, which can be
interpreted visually. Frequencies are often distributed using a Mel scale,
which is linear in the low range but logarithmic in the high range,
corresponding to physiological characteristics of the human ear.
• Perceptual Linear Prediction (PLP) - is also physiologically motivated, but
yields coefficients that cannot be interpreted visually.
• Linear Predictive Coding (LPC) - yields coefficients of a linear equation
that approximate the recent history of the raw speech values.
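The Mel scale mentioned above is commonly implemented with the formula 2595 · log10(1 + f/700), which is approximately linear below about 1 kHz and logarithmic above it. A small sketch (the band count is an arbitrary choice for illustration):

```python
import numpy as np

def hz_to_mel(f):
    """Common Mel-scale formula: near-linear at low frequencies,
    logarithmic at high frequencies, mirroring the ear's resolution."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Band edges of a 10-band Mel filterbank over 0-8000 Hz
# (the 16 kHz sampling case mentioned above, so Nyquist = 8 kHz):
# equally spaced in Mel, hence increasingly wide in Hz.
edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 12))
print(np.round(edges).astype(int))
```

The printed edges cluster tightly at low frequencies and spread out at high ones, which is why FFT bins are pooled this way before further processing.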
FLOWCHART OF THE SYSTEM
• The general structure of the speech recognition program is shown in the figure
below. The input of the system is the speech signal. The preprocessing block
performs denoising and end-point detection. After preprocessing, the signal is
sent to the feature extraction block; in this project three methods - LPC, MFCC
and Spectrogram - are used for feature extraction. Finally, the last block decides
whether or not there is a match.
[Figure: block diagram of the system. Technique labels from the figure include:
close-talking microphone, microphone array, auditory models (EIH, SMC, PLP),
adaptive filtering, noise subtraction, comb filtering, spectral mapping, cepstral
mean normalization, RASTA, noise addition, HMM (de)composition (PMC), model
transformation (MLLR), Bayesian adaptive learning, frequency weighting measure,
weighted cepstral distance, cepstrum projection measure, word spotting,
utterance verification, language model adaptation.]

AUTOMATIC SPEECH RECOGNITION
• Automatic speech recognition (ASR) can be defined as the independent, computer-driven
transcription of spoken language into readable text in real time. ASR is technology that
allows a computer to identify the words that a person speaks into a microphone or
telephone and convert them to written text. The ultimate goal of ASR research is to allow a
computer to recognize, in real time and with 100% accuracy, all words that are intelligibly
spoken by any person, independent of vocabulary size, noise, speaker characteristics or
accent. The goal of an ASR system is to accurately and efficiently convert a speech signal
into a text transcription of the spoken words, independent of the speaker,
environment or the device used to record the speech (i.e. the microphone).
• This process begins when a speaker decides what to say and actually speaks a sentence.
The software then produces a speech wave form, which embodies the words of the
sentence as well as the extraneous sounds and pauses in the spoken input. Next, the
software attempts to decode the speech into the best estimate of the sentence. First it
converts the speech signal into a sequence of vectors which are measured throughout the
duration of the speech signal. Then, using a syntactic decoder it generates a valid
sequence of representations.
WORKING
• The GALAXY-II conversational system at MIT: Galaxy is a client-server
architecture developed at MIT for accessing online information using spoken
dialogue [9]. It has served as the testbed for developing human language
technologies. The boxes in this figure represent the various human language
technology servers as well as information and domain servers.
RESULT
• Ibrahim Patel (2010), "Speech Recognition Using HMM with MFCC - an analysis
using Frequency Spectral Decomposition Technique". Technique: resolution
decomposition with separating-frequency spectral mapping. Result: shows an
improvement in the quality metrics of speech recognition with respect to
computational time and learning accuracy for a speech recognition system.
• Kavita Sharma (2012), "Speech Denoising using Different Types of Filters".
Technique: FIR, IIR and wavelet filters. Result: use of filters allows estimation
of clean speech and noise for speech enhancement in speech recognition.
• Bhupinder Singh (2012), "Speech Recognition with Hidden Markov Model".
Technique: hidden Markov model. Result: developed a voice-based user-machine
interface system.
• Patiyuth Pramkeaw (2012), "Improving MFCC-based speech classification with
FIR filter". Technique: FIR filter. Result: shows improvement in recognition
rates of spoken words.
• Shivanker Dev Dhingra (2013), "Isolated Speech Recognition using MFCC and
DTW". Technique: Dynamic Time Warping (DTW). Result: shows that DTW is the best
non-linear feature matching technique in speech identification, with minimal
error rates and fast computing speed.
CONCLUSION
• For speech recognition, ANN is an effective and efficient approach because of
its multi-layer network. Speech recognition is also used in smartphones:
speech/spoken words are given as input and the speech recognition software
returns the appropriate search results or information that the user wants as
output. Neural networks, with their remarkable ability to derive meaning from
complicated or imprecise data, can be used to extract patterns and detect
trends that are too complex to be noticed by either humans or other computer
techniques. A trained neural network can be thought of as an "expert" in the
category of information it has been given to analyse.
• ANN has:
1. Adaptive learning: an ability to learn how to do tasks based on the data
given for training or initial experience.
2. Self-organisation: an ANN can create its own organisation or representation
of the information it receives during learning time.
3. Real-time operation: ANN computations may be carried out in parallel, and
special hardware devices are being designed and manufactured which take
advantage of this capability.
4. Fault tolerance via redundant information coding: partial destruction of a
network leads to a corresponding degradation of performance; however, some
network capabilities may be retained even with major network damage.
Thus, for speech recognition, the artificial neural network is an efficient and
effective algorithm among all the algorithms considered.
BIBLIOGRAPHY
• http://en.wikipedia.org/wiki/Speech_recognition
• http://en.wikipedia.org/wiki/Artificial_neural_network
• http://www.researchgate.net/
• YouTube
• Pahini A. Trivedi, "Introduction to Various Algorithms of Speech Recognition:
Hidden Markov Model, Dynamic Time Warping and Artificial Neural Networks".
• Prof. Pisal Ranjeet, Thite Prakash, Satpute Amruta and Shingade Monali,
"Automatic Speech Recognition System".
• Prerana Das, Kakali Acharjee, Pranab Das and Vijay Prasad, "Voice Recognition
System: Speech-to-Text".
