


ISSN: 2413-6999
Journal of Information, Communication, and Intelligence Systems (JICIS)
Volume 4, Issue 5, December 2018

Frame Blocking and Windowing Speech Signal

Oday Kamil Hamid

Abstract— The key objective of this research is frame blocking and windowing. A speech signal is a slowly time-varying signal in the sense that, when examined over a short period of time (between 10 to 30 ms), its characteristics are short-time stationary. This is not the case if we look at a speech signal under a longer time perspective (approximately time T > 0.5 s); in this case the signal's characteristics are non-stationary, meaning that they change to reflect the different sounds spoken by the talker. For this reason we use frame blocking and windowing to be able to use a speech signal and interpret its characteristics in a proper manner. In this project the speech signal is blocked into frames of N samples with adjacent frames being separated by M (M < N), where N = 256 samples correspond to (≈23 ms), M (overlapping) = 50% (128 samples, 11.37 ms), and the signal is sampled at 11.25 kHz; the Hamming window is then used because it is the most widely used window in speech processing.
The proposed speaker recognition systems are examined through theoretical analysis and computer simulation using the Matlab version 6 programming language and Sound Forge 5 as a speech analyzer under the Microsoft Windows 2007 operating system.

Oday K. Hamid, Dept. of Computer Techniques Engineering, Dijlah University College, Baghdad, Iraq (e-mail: [email protected]).

I. Introduction

Speech recognition is a topic that is very useful in many applications and environments in our daily life. Generally, a speech recognizer is a machine which understands humans and their spoken words in some way and can act thereafter. It can be used, for example, in a car environment to voice-control non-critical operations, such as dialing a phone number; another possible scenario is on-board navigation, presenting the driving route to the driver. By applying voice control, traffic safety will be increased.
A different aspect of speech recognition is to make daily chores easier for people with functional disabilities or other kinds of handicap; here voice control could be helpful. With their voice they could operate the light switch, turn the coffee machine on or off, or operate some other domestic appliances. This leads to the discussion about intelligent homes, where these operations can be made available for the common man as well as for the handicapped [1].
With the information presented so far, one question comes naturally: how is speech recognition done? To get an idea of how speech recognition problems can be approached today, a review of some research highlights will be presented. The earliest attempts to devise systems for automatic speech recognition by machine were made in the 1950's, when various researchers tried to exploit the fundamental ideas of acoustic-phonetics. In 1952, at Bell Laboratories, Davis, Biddulph, and Balashek built a system for isolated digit recognition for a single speaker; the system relied heavily on measuring spectral resonances during the vowel region of each digit. In 1959 another attempt was made by Forgie and Forgie at MIT Lincoln Laboratories: ten vowels embedded in a /b/-vowel-/t/ format were recognized in a speaker-independent manner. In the 1970's speech recognition research achieved a number of significant milestones: first, the area of isolated word or discrete utterance recognition became a viable and usable technology based on the fundamental studies by Velichko and Zagoruyko in Russia, Sakoe and Chiba in Japan, and Itakura in the United States. The Russian studies helped advance the use of pattern recognition ideas in speech recognition, the Japanese research showed how dynamic programming methods could be successfully applied, and Itakura's research showed how the idea of linear predictive coding (LPC) could be applied.
The purpose of this research is to get a deeper theoretical and practical understanding of speech recognition. The work started by cutting the speech data signal into frames before analysis; the frame size is 10-30 ms and frames can be overlapped, with the overlapping region normally ranging from 0 to 50% of the frame size, and Matlab is then used to process the speech signal. In the future it could be possible to use this information to create a chip that could be used as a new interface to humans; for example, it would be desirable to get rid of all remote controls in the home and just tell the TV, stereo, or any desired device what to do with the voice [2].

II. Theory

Framing

Decompose the speech signal into a series of overlapping frames:
– Traditional methods for spectral evaluation are reliable in the case of a stationary signal (i.e., a signal whose statistical characteristics are invariant with respect to time)


• This implies that the region must be short enough for the behavior (periodicity or noise-like appearance) of the signal to be approximately constant
• In a sense, the speech region has to be short enough so that it can reasonably be assumed to be stationary
• Stationary in that region: i.e., the signal characteristics (whether periodicity or noise-like appearance) are uniform in that region
Frame duration ranges are between 10 ~ 25 ms in the case of speech processing [3].

Frame blocking and Windowing

Due to the differences in phonemes' spectral features, changes in prosody, and random variations in the vocal tract, speech is a non-stationary signal. However, in a short time interval (generally from 10 to 20 ms) it is assumed that the speech signal is stationary, and therefore it is analyzed over these short-time windows. So the frame blocking procedure consists essentially of dividing the speech signal into short frames of N samples, which overlap adjacent frames by M samples.
In order to minimize spectral distortions when blocking the speech signal, each frame is multiplied with a Hamming window of the form

w(n) = 0.54 − 0.46 cos(2πn/(N − 1)),  0 ≤ n ≤ N − 1

where N is the duration (in samples) of the speech frame. The output y(n) of the windowed signal becomes:

y(n) = x(n) w(n),  0 ≤ n ≤ N − 1

❧ This windowing function acts as a low pass filter, enhancing the signal at the window center and smoothing it at the edges.
❧ The frame size (N samples) is chosen with adjacent frames separated by m samples.
❧ e.g., for an 11.5 kHz sampling signal, an 8 ms window has N = 256 samples, with a neighboring shift of m = 128 samples [4].

Time frame and overlap
❧ Since our ear cannot respond to very fast changes of speech data content, we normally cut the speech data into frames before analysis
❧ Frame size is 10~30 ms
❧ Frames can be overlapped: normally the overlapping region ranges from 0 to 75% of the frame size

Figure (1) (sampled speech signal)
Figure (2) (frame length of 256 samples and overlap of 128 samples) [5].

Frame shifting
It is normal to use overlapping windows to ensure better temporal continuity in the transform domain. An overlap of half the window size (or less) is typical.
• Frame rate: the number of frames computed per second, in general 33 to 100 frames per second in short-term speech processing [6].
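As an illustration of the frame blocking and shifting described above, the following minimal NumPy sketch cuts a signal into frames of N = 256 samples with a shift of M = 128 samples and reports the frame duration, shift, and frame rate for an assumed 11.25 kHz sampling rate. The paper's own processing was done in Matlab; the random signal here merely stands in for recorded speech.

```python
import numpy as np

def frame_signal(x, N=256, M=128):
    """Cut signal x into overlapping frames of N samples, shifted by M samples."""
    num_frames = 1 + (len(x) - N) // M      # only frames that fit entirely inside x
    return np.stack([x[l * M : l * M + N] for l in range(num_frames)])

fs = 11250                                  # assumed sampling rate, 11.25 kHz as used in this paper
x = np.random.randn(2 * fs)                 # stand-in for 2 s of recorded speech
frames = frame_signal(x)

print(frames.shape)                              # (number of frames, 256)
print("frame length:", 1000 * 256 / fs, "ms")    # ~22.8 ms, i.e. the ~23 ms quoted above
print("frame shift :", 1000 * 128 / fs, "ms")    # ~11.4 ms
print("frame rate  :", fs / 128, "frames/s")     # ~88 frames/s, inside the 33-100 range above
```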

88
ISSN: 2413-6999
Journal of Information, Communication, and Intelligence Systems (JICIS)
Volume 4, Issue 5, December 2018

Framing and windowing–short-term processing


A frame-based analysis is essential for speech signals, as shown in figure (3).

Function of window
– Rectangular window:
• h[n] = 1, 0 ≤ n ≤ L−1, and 0 otherwise
– Hamming window (raised cosine window):
• h[n] = 0.54 − 0.46 cos(2πn/(L−1)), 0 ≤ n ≤ L−1, and 0 otherwise
– The rectangular window gives equal weight to all L samples in the window (n, ..., n−L+1)
– The Hamming window gives most weight to the middle samples and tapers off strongly at the beginning and the end of the window [7].

Figure (3) (frame analysis)
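A small numerical check of the two window definitions above (an illustrative Python fragment, not code from the paper): the rectangular window weights every sample equally, while the Hamming window weight falls from roughly 1.0 at the centre to roughly 0.08 at the edges.

```python
import numpy as np

L = 256                      # window length in samples
n = np.arange(L)

h_rect = np.ones(L)                                      # rectangular: equal weight everywhere
h_hamm = 0.54 - 0.46 * np.cos(2 * np.pi * n / (L - 1))   # Hamming, exactly as defined above
                                                         # (np.hamming(L) gives the same coefficients)
print(h_rect[0], h_rect[L // 2], h_rect[-1])   # 1.0, 1.0, 1.0
print(h_hamm[0], h_hamm[L // 2], h_hamm[-1])   # ~0.08, ~1.0, ~0.08: tapered ends, full middle
```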

Windowing and the types of window


Since speech is non-stationary, we are interested in short-term estimates of parameters such as the Fourier spectrum. This requires that a speech segment be chosen for analysis. We are effectively cross-multiplying the signal by a window function.

Rectangular window

w(n) = 1,  0 ≤ n ≤ N − 1
w(n) = 0,  otherwise

• Just extracts the frame part of the signal without further processing.
• Its frequency response has high side lobes:
– Main lobe: spreads the narrow-band power of the signal out over a wider frequency range, and thus reduces the local frequency resolution
– Side lobes: swap energy from different and distant frequencies of xm[n], which is called leakage
However, it is desirable to use a tapered window such as the:

Hamming window

w(n) = 0.54 − 0.46 cos(2πn/(N − 1)),  0 ≤ n ≤ N − 1
w(n) = 0,  otherwise

Windows in STFT

For Xn(e^jω) to represent the short-time spectral properties of x(n) inside the window, the window transform W(e^jω) should be much narrower in frequency than the significant spectral regions of X(e^jω), i.e., almost an impulse in frequency. Consider rectangular and Hamming windows, where the width of the main spectral lobe is inversely proportional to the window length and the side-lobe levels are essentially independent of the window length.
• Rectangular Window: flat window of length N samples; the first zero in the frequency response occurs at FS/N, with side-lobe levels of −14 dB or lower.
• Hamming Window: raised cosine window of length N samples; the first zero in the frequency response occurs at 2FS/N, with side-lobe levels of −40 dB or lower, as shown in figure (4) below:

Figure (4) Frequency response

• 500-sample windows (50 msec)
• periodicity can be seen in time and in frequency
• a strong first formant (300-400 Hz), a strong resonance at 2200 Hz, and a resonance at 3800 Hz can be seen, as shown in figure (5) [8].
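The lobe structure quoted above can be verified numerically. The sketch below (an illustrative NumPy estimate, not code from the paper) locates the first spectral zero of each window, in units of FS/N, and its peak side-lobe level from a zero-padded FFT of the window coefficients.

```python
import numpy as np

def window_response(w, nfft=8192):
    """First spectral zero (in units of FS/N) and peak side-lobe level (dB) of a window."""
    N = len(w)
    W = np.abs(np.fft.rfft(w, nfft))
    W_db = 20 * np.log10(W / W.max() + 1e-12)
    first_null = int(np.argmax(np.diff(W_db) > 0))   # bin just past the first zero of the response
    return first_null * N / nfft, W_db[first_null:].max()

N = 256
rect = np.ones(N)
hamm = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(N) / (N - 1))

print(window_response(rect))   # first zero near 1.0 * FS/N, peak side lobe roughly -13 dB
print(window_response(hamm))   # first zero near 2.0 * FS/N, peak side lobes around -41 dB or lower
```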


Figure (5) Frequency response

Sampling rate in frequency

Figure (6) sampling theorem

"Total" Sampling Rate of STFT

• The "total" sampling rate for the STFT is the product of the sampling rates in time and frequency, i.e.,
SR = SR(time) × SR(frequency) = 2B × L samples/sec
where B is the frequency bandwidth of the window (Hz) and L is the time width of the window (samples).
• For most windows of interest, B is a multiple of FS/L, i.e., B = C·FS/L (Hz), with C = 1 for the rectangular window and C = 2 for the Hamming window, so that
SR = 2C·FS samples/second.
• We can define an 'oversampling rate' of
SR/FS = 2C = oversampling rate of the STFT as compared to the conventional sampling representation of x(n).
For the rectangular window 2C = 2; for the Hamming window 2C = 4, so the range of oversampling is 2-4. This oversampling gives a very flexible representation of the speech signal.
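The arithmetic above can be written out directly; the short sketch below (illustrative only) evaluates SR = 2B·L = 2C·FS for both windows, using the 11.25 kHz sampling rate assumed elsewhere in this paper.

```python
FS = 11250                      # waveform sampling rate (Hz), as assumed elsewhere in this paper
L = 256                         # time width of the analysis window (samples)

for name, C in [("rectangular", 1), ("hamming", 2)]:
    B = C * FS / L              # frequency bandwidth of the window (Hz), B = C*FS/L
    SR = 2 * B * L              # total STFT sampling rate, SR = SR(time) x SR(frequency)
    print(name, "B =", B, "Hz,  SR =", SR, "samples/s,  oversampling =", SR / FS)
# rectangular: SR = 2*FS = 22500 (oversampling 2); hamming: SR = 4*FS = 45000 (oversampling 4)
```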

Short-term Energy

The long-term definition of signal energy is

E = Σ x²(m), with the sum taken over all samples m (from −∞ to ∞).

There is little or no utility in this definition for time-varying signals, such as speech. For a short-term speech signal (the n-th frame of speech after framing and windowing):


Xn(m) = x(m) w(n − m),  n − N + 1 ≤ m ≤ n

where w(n) is the window, n is the sample on which the analysis window is centered, and N is the window size. The window jumps/slides across the sequence of squared values, selecting the interval for processing, as shown below in figure (7) [9].

Figure (7) Original sample and windowed sample
III. Database

Any speech or speaker recognition system depends on the type of data input to the system, so some elements must be available in order to obtain good data:
• High-quality microphones used in recording for both training and testing sessions.
• Ideal recordings must be made in rooms with little or no background noise or reverberation for both training and testing sessions.
• Collect a large database over many attempts.
• Use modern recording programs, because they have the capability of cutting or synthesizing voices and give a good representation of the signal's shape.
The database had been recorded by using a high-quality microphone to record the voice (الخير); this word was recorded by using the Sound Forge program, and figure (3.1) shows the recorded signal displayed on this program's screen.
Data is sampled at 11.25 kHz (sampling rate), with 16-bit sample value A/D conversion, but unfortunately we did not have the chance of recording in a place suitable for such a purpose, and the amount of database collected from each speaker is considered small for the suggested systems.
Preprocessing speech signal

The basic idea of the preprocessing is not to use the high-dimensional, redundant speech signal for the recognition procedure, but to describe it by a low-dimensional set of features that are typical mainly of the speaker's identity. The speech signal coming from the microphone passes through these steps:
1. Reading the database (converting it from an analog to a digital signal x(n)).

Frame blocking

Since speech is time-varying, in that the vocal-tract configuration changes over time, an accurate set of predictor coefficients is adaptively determined over short time frames (typically 10 ms to 30 ms) during which time-invariance is assumed [31]. So for this reason the continuous speech signal is blocked into frames of N samples, with adjacent frames being separated by M (M < N). In this research N = 256 samples correspond to (~23 msec) and M (overlapping) = 50% (128 samples, 11.37 msec). The first frame consists of the first N samples; the second frame begins M samples after the first frame and overlaps it by N − M samples. Similarly, the third frame begins 2M samples after the first frame (or M samples after the second frame) and overlaps it by N − 2M samples. This process continues until all the speech is accounted for within one or more frames. If we denote the l-th frame of speech by x_l(n), and there are L frames within the speech signal, then x_l(n) = x(Ml + n), n = 0, 1, ..., N−1, l = 0, 1, ..., L−1. That is, the first frame of speech, x_0(n), encompasses speech samples x(0), x(1), ..., x(N−1); the second frame of speech, x_1(n), encompasses samples x(M), x(M+1), ..., x(M+N−1); and the L-th frame of speech, x_{L−1}(n), encompasses speech samples x(M(L−1)), x(M(L−1)+1), ..., x(M(L−1)+N−1), as shown in figure (8).

Figure (8): Blocking of speech into overlapping frames.
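The indexing just described can be checked with a few lines of NumPy (an illustrative sketch, not the paper's Matlab code); it builds x_l(n) = x(Ml + n) for N = 256 and M = 128 and confirms that adjacent frames share N − M samples.

```python
import numpy as np

N, M = 256, 128                     # ~23 ms frames with a ~11.37 ms shift at 11.25 kHz
x = np.random.randn(4000)           # stand-in for the recorded utterance
L = 1 + (len(x) - N) // M           # number of complete frames in the signal

# x_l(n) = x(M*l + n), n = 0..N-1, l = 0..L-1, exactly as described above
frames = np.array([x[M * l : M * l + N] for l in range(L)])

# adjacent frames overlap by N - M samples: the tail of frame l is the head of frame l+1
assert np.array_equal(frames[0, M:], frames[1, :N - M])
print(frames.shape)                 # (L, 256)
```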


Windowing

The next step in the processing is to window each individual frame. The most widely used windows in speech processing are the rectangular window, which weights all samples in the analysis frame equally, and the Hamming window, which is used to taper the segment because the prediction residual must be minimized at the beginning and end of the frame.
There is an important difference between the rectangular window and the Hamming window: the bandwidth of the Hamming window is about twice the bandwidth of a rectangular window of the same length, as shown in figures (14) and (15). It is also clear that the Hamming window gives much greater attenuation outside the pass band than the comparable rectangular window.

• Hamming Window
This window was used in this research (see figure (9)) and is defined as:

w(n) = 0.54 − 0.46 cos(2πn/(N − 1)),  0 ≤ n ≤ N − 1
w(n) = 0,  otherwise

where n is the sample index and N is the length of the window in samples [10].

Figure (9): Hamming window

Windows are used in signal analysis so as to minimize the signal discontinuities at the beginning and end of each frame. The concept here is to minimize the spectral distortion by using the window to taper the signal to zero at the beginning and end of each frame [6].
In this research a typical Hamming window was used as in equation (3-2):

x̃_l(n) = x_l(n) w(n),  0 ≤ n ≤ N − 1,  with N = 256.     (3-2)

IV. Evaluation test for the proposed method

1- Convert the analog signal to a digital signal x(n), as shown in figure (10) below.

Figure (10) sampled speech signal

2- Then the original sampled speech signal was cut into frames, each frame having 256 samples as discussed before; figure (11) shows frame number 25.

Figure (11) frame number 25

3- Apply the Hamming window to each frame, as shown in figure (12). Figure (13) shows both the frame and the windowed frame; observe how the Hamming window tapers the beginning and end of the frame.

Figure (12) windowed frame


Figure (13) frame and windowed frame
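Steps 1-3 can be sketched end to end as follows. This is an illustrative Python fragment, not the paper's Matlab/Sound Forge procedure; the file name "alkhair.wav" is hypothetical and merely stands for the recorded word, assumed to be a mono 16-bit recording at 11.25 kHz.

```python
import numpy as np
from scipy.io import wavfile

# "alkhair.wav" is a hypothetical file name standing in for the recorded word;
# the recording is assumed to be mono 16-bit PCM sampled at 11.25 kHz, as stated above.
fs, x = wavfile.read("alkhair.wav")                 # step 1: the digitized signal x(n)
x = x.astype(np.float64) / 32768.0                  # scale int16 samples to roughly [-1, 1)

N, M = 256, 128
w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(N) / (N - 1))

l = 25                                              # frame number 25, as in figure (11)
frame = x[l * M : l * M + N]                        # step 2: cut frame l out of the signal
windowed = frame * w                                # step 3: apply the Hamming window

# the windowed frame is tapered toward zero at both ends (cf. figures (12) and (13))
print(np.abs(windowed[:5]).max(), np.abs(windowed[N // 2 - 5 : N // 2 + 5]).max())
```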

The Hamming window is used instead of the rectangular window because of the assumption of periodicity made by the DFT; the reason becomes clearer in the frequency domain.
A window (on its own) tends to have an averaging effect; thus it has a low-pass spectral characteristic. Ideally, we want:
• to preserve spectral detail
• to produce little spectral distortion
The log magnitude spectrum of a rectangular window can be compared with that of a Hamming window:
• The Hamming window has a wider main lobe, but much better attenuation of side lobes (typically 20-30 dB better than rectangular).
For a designed window, we wish for:
- a narrow-bandwidth main lobe
- large attenuation in the magnitudes of the side lobes
However, this is a trade-off!
Notice that:
1. A narrow main lobe will resolve the sharp details of the speech signal (the frequency response of the framed signal) as the convolution proceeds in the frequency domain.
2. The attenuated side lobes prevent noise from other parts of the spectrum from corrupting the true spectrum at a given frequency.
3. The bandwidth of the Hamming window is about twice the bandwidth of a rectangular window of the same length, as shown in figures (14) and (15).

Figure (14): frequency response with N=51, for hamming window

Figure (15): frequency response with N=51, for rectangular window
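Figures (14) and (15) can be approximated with the short matplotlib sketch below (illustrative only, assuming N = 51 as in the figures); it overlays the log-magnitude responses of the two windows, showing the wider Hamming main lobe and its much lower side lobes.

```python
import numpy as np
import matplotlib.pyplot as plt

N = 51                                              # window length used in figures (14) and (15)
n = np.arange(N)
windows = {
    "Hamming (cf. figure 14)": 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1)),
    "rectangular (cf. figure 15)": np.ones(N),
}

nfft = 4096
for name, w in windows.items():
    W = np.abs(np.fft.rfft(w, nfft))
    plt.plot(np.linspace(0, 0.5, len(W)), 20 * np.log10(W / W.max() + 1e-12), label=name)

plt.xlabel("normalized frequency (cycles/sample)")
plt.ylabel("log magnitude (dB)")
plt.legend()
plt.show()
# the Hamming main lobe is roughly twice as wide, while its side lobes sit 20-30 dB lower
```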


REFERENCES

1. J. P. Hosom, R. Cole and M. Fanty, "Speech Recognition Using Neural Networks", Center for Spoken Language Understanding (CSLU), Oregon Graduate Institute of Science and Technology, July 6, 1999, http://cslu.cse.ogi.edu/corpora/available/.
2. B. S. Atal, "Automatic Recognition of Speakers From Their Voices", Proceedings of the IEEE, Vol. 64, pp. 460-475, April 1976.
3. B. S. Atal, "Automatic Speaker Recognition Based Upon Pitch Contours", Journal of the Acoustical Society of America (JASA), Vol. 52, pp. 1687-1697, 1972.
4. L. R. Rabiner and B. H. Juang, "Fundamentals of Speech Recognition", Prentice-Hall, New Jersey, 1993.
5. M. N. AL-Trfi, "Speaker Recognition Based Upon Phonemes Using Wavelet Packet Transform", M.Sc. Thesis, College of Engineering, University of Baghdad, 2000.
6. M. N. Do, "An Automatic Speaker Recognition System", Swiss Federal Institute of Technology, Lausanne (EPFL), http://lcavwww.epfl.ch/~mindho/asr_project/.html, January 2000.
7. R. D. Rodman, "Speaker Recognition of Disguised Voices", www.csc.Ncsu.edu/factly/rodman/, 1997, and references therein.
8. A. H. Al-Nakkash, "A Novel Approach For Speakers Recognition Using Vector Quantization Technique", M.Sc. Thesis, University of Technology, Baghdad, 2001.
9. H. Fenglie and W. Bingxi, "An Integrated System for Text-Independent Speaker Recognition Using Binary Neural Network Classifiers", Proceedings of ICASSP 2001.
10. K. G. Margaritis, "Development of a Text-Dependent Speaker Identification System with the OGI Toolkit", 2nd Hellenic Conf. on AI, SETN-2002, 11-12 April 2002, Thessaloniki, Greece, Proceedings, Companion Volume, pp. 525-530.


