Frame Blocking and Windowing Speech Signal: December 2018
Author: Oday Kamil, Dijlah University College
All content following this page was uploaded by Oday Kamil on 15 March 2019.
Abstract: The key objective of this research is frame blocking and windowing. A speech signal is a slowly time-varying signal in the sense that, when examined over a short period of time (between 10 and 30 ms), its characteristics are approximately stationary. This is not the case if we look at a speech signal over a longer time span (approximately T > 0.5 s). In this case the signal's characteristics are non-stationary, meaning that they change to reflect the different sounds spoken by the talker. For this reason we use frame blocking and windowing to be able to use a speech signal and interpret its characteristics in a proper manner. In this project the speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N), where N = 256 samples (≈ 23 ms) and M = 128 samples (50% overlap, 11.37 ms). The signal is sampled at 11.25 kHz, and a Hamming window is then applied because it is the most widely used window in speech processing.

The proposed speaker recognition systems are examined through theoretical analysis and computer simulation using the Matlab version 6 programming language and Sound Forge 5 as a speech analyzer under the Microsoft Windows 2007 operating system.

Oday K. Hamid, Dept. of Computer Techniques Engineering, Dijlah University College, Baghdad, Iraq (e-mail: [email protected]).

I. Introduction
Speech recognition is useful in many applications and environments in daily life. Generally, a speech recognizer is a machine that understands humans and their spoken words in some way and can act on them. It can be used, for example, in a car environment to voice-control non-critical operations such as dialing a phone number. Another possible scenario is on-board navigation, presenting the driving route to the driver. By applying voice control, traffic safety will be increased.

A different aspect of speech recognition is to help people with functional disabilities or other kinds of handicap. To make their daily chores easier, voice control could be helpful. With their voice they could operate the light switch, turn the coffee machine off or on, or operate other domestic appliances. This leads to the discussion about intelligent homes, where these operations can be made available to the common man as well as to the handicapped [1].

With the information presented so far, one question comes naturally: how is speech recognition done? To understand how speech recognition problems can be approached today, a review of some research highlights is presented. The earliest attempts to devise systems for automatic speech recognition by machine were made in the 1950s, when various researchers tried to exploit the fundamental ideas of acoustic-phonetics. In 1952, at Bell Laboratories, Davis, Biddulph, and Balashek built a system for isolated digit recognition for a single speaker. The system relied heavily on measuring spectral resonances during the vowel region of each digit. In 1959 another attempt was made by Forgie and Forgie at MIT Lincoln Laboratories, in which ten vowels embedded in a /b/-vowel-/t/ format were recognized in a speaker-independent manner. In the 1970s speech recognition research achieved a number of significant milestones. First, the area of isolated word or discrete utterance recognition became a viable and usable technology based on the fundamental studies by Velichko and Zagoruyko in Russia, Sakoe and Chiba in Japan, and Itakura in the United States. The Russian studies helped advance the use of pattern recognition ideas in speech recognition, the Japanese research showed how dynamic programming methods could be successfully applied, and Itakura's research showed how the idea of linear predictive coding (LPC) could be used in speech recognition.

The purpose of this research is to gain a deeper theoretical and practical understanding of speech recognition. The work starts by cutting the speech signal into frames before analysis. The frame size is 10 to 30 ms and frames can be overlapped; normally the overlapping region ranges from 0 to 50% of the frame size. Matlab is then used to process the speech signal. In the future it could be possible to use this information to create a chip that could serve as a new interface to humans. For example, it would be desirable to get rid of all remote controls in the home and just tell the TV, stereo, or any other device what to do by voice [2].

II. Theory
Framing
Decompose the speech signal into a series of overlapping frames
– Traditional methods for spectral evaluation are reliable in the case of a stationary signal (i.e., a signal whose statistical characteristics are invariant with respect to time)
ISSN: 2413-6999
Journal of Information, Communication, and Intelligence Systems (JICIS)
Volume 4, Issue 5, December 2018
• This implies that the region is short enough for the behavior of the signal (periodicity or noise-like appearance) to be approximately constant
• In a sense, the speech region has to be short enough that it can reasonably be assumed to be stationary
• Stationary in that region means that the signal characteristics (whether periodicity or noise-like appearance) are uniform in that region
Frame durations range between 10 and 25 ms in the case of speech processing [3].

Time frame and overlap
❧ Since our ear cannot respond to very fast changes of speech data content, we normally cut the speech data into frames before analysis
❧ Frame size is 10~30 ms
❧ Frames can be overlapped: normally the overlapping region ranges from 0 to 75% of the frame size.
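The frame blocking described above can be sketched in a few lines of Python. This is a minimal illustration, not the Matlab code used in the paper: the function name `frame_signal`, the synthetic ramp signal, and the 11,250 Hz sampling rate are assumptions chosen to match the abstract's figures of N = 256 samples per frame and a shift of M = 128 samples (50% overlap).

```python
def frame_signal(samples, frame_len=256, hop=128):
    """Split a sequence into overlapping frames of frame_len samples.

    Consecutive frames start hop samples apart; an incomplete frame at the
    end of the signal is dropped. (Hypothetical helper, not from the paper.)
    """
    if not 0 < hop <= frame_len:
        raise ValueError("hop must satisfy 0 < hop <= frame_len")
    if len(samples) < frame_len:
        return []
    n_frames = 1 + (len(samples) - frame_len) // hop
    return [samples[i * hop : i * hop + frame_len] for i in range(n_frames)]


# Assumed sampling rate; chosen so that 128 samples last about 11.37 ms.
SAMPLE_RATE_HZ = 11250

# Synthetic stand-in for a recorded speech signal: a simple ramp.
signal = [float(i) for i in range(1024)]
frames = frame_signal(signal, frame_len=256, hop=128)

frame_ms = 1000.0 * 256 / SAMPLE_RATE_HZ  # duration of one frame
hop_ms = 1000.0 * 128 / SAMPLE_RATE_HZ    # shift between adjacent frames
```

With a 50% overlap every sample except those in the first and last half-frame belongs to two frames, which is what lets a later tapering window de-emphasize frame edges without losing information.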
Function of window
– Rectangular window:
• h[n] = 1, 0 ≤ n ≤ L-1, and 0 otherwise
– Hamming window (raised cosine window):
• h[n] = 0.54 - 0.46 cos(2πn/(L-1)), 0 ≤ n ≤ L-1, and 0 otherwise
– The rectangular window gives equal weight to all L samples in the window (n, ..., n-L+1)
– The Hamming window gives most weight to the middle samples and tapers off strongly at the beginning and the end of the window
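The two window definitions above translate directly into code. The sketch below is illustrative only; the function names `rectangular_window` and `hamming_window` are hypothetical, and the handling of L = 1 (returning a single weight of 1) is an assumption added to avoid a division by zero.

```python
import math


def rectangular_window(L):
    """h[n] = 1 for 0 <= n <= L-1: equal weight for every sample."""
    return [1.0] * L


def hamming_window(L):
    """h[n] = 0.54 - 0.46*cos(2*pi*n/(L-1)) for 0 <= n <= L-1."""
    if L == 1:
        return [1.0]  # assumed convention for the degenerate case
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (L - 1)) for n in range(L)]


rect = rectangular_window(256)
hamm = hamming_window(256)
```

The Hamming weights start and end at 0.54 - 0.46 = 0.08 and rise toward 1 near the middle of the window, which is the tapering behavior described in the list above.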
Figure (3) (frame analysis)
[7].
Hamming window
Short-term Energy
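A minimal sketch of short-term energy, assuming the common definition as the sum of squared windowed samples within one frame. The function name `short_term_energy` and the sample values are hypothetical and are not taken from the paper.

```python
def short_term_energy(frame, window=None):
    """Sum of squared windowed samples of one frame.

    If no window is given, a rectangular window (all weights 1) is assumed.
    """
    if window is None:
        window = [1.0] * len(frame)
    if len(window) != len(frame):
        raise ValueError("window and frame must have the same length")
    return sum((x * w) ** 2 for x, w in zip(frame, window))


# Hypothetical frames: a loud (voiced-like) one and a quiet (silence-like) one.
loud = [0.5, -0.5, 0.5, -0.5]
quiet = [0.01, -0.01, 0.01, -0.01]

energy_loud = short_term_energy(loud)
energy_quiet = short_term_energy(quiet)
```

Computed frame by frame, a measure of this kind is typically large in loud, voiced segments and small in silence, which is why it is often used to separate speech from background.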
There is an important difference between the rectangular window and the Hamming window: the bandwidth of the Hamming window is about twice the bandwidth of a rectangular window of the same length. It is also clear that the Hamming window gives much greater attenuation outside the passband than the comparable rectangular window.

Hamming Window
The Hamming window, which was used in this research, is shown in figure (9) and is defined as:
w(n) = 0.54 - 0.46 cos(2πn/(N-1)), 0 ≤ n ≤ N-1
n = sample index within the window.
N = number of samples in the window (window length) [10].

Figure (9): Hamming window (vertical axis: Volt; horizontal axis: Time, from 0 to N)

2- Then the original sampled speech signal was cut into frames, each frame having 256 samples as discussed before. Figure (11) shows frame number 25.
Figure (11): Frame number 25
3- A Hamming window was applied to each frame, as shown in figure (12).
Figure (13) shows both the frame and the windowed samples; observe how the Hamming window tapers the beginning and end of the frame.
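Steps 2 and 3 can be illustrated with a short Python sketch. It is not the paper's Matlab code: the 440 Hz tone standing in for recorded speech, the 11,250 Hz sampling rate, and the 128-sample shift between frames are all assumptions made for the example.

```python
import math

FRAME_LEN = 256          # samples per frame, as in the text
HOP = 128                # assumed shift between frames (50% overlap)
SAMPLE_RATE_HZ = 11250   # assumed sampling rate
TONE_HZ = 440.0          # hypothetical test tone, not from the paper

# Synthetic stand-in for the sampled speech signal: one second of a tone.
signal = [math.cos(2.0 * math.pi * TONE_HZ * t / SAMPLE_RATE_HZ)
          for t in range(SAMPLE_RATE_HZ)]

# Step 2: cut out frame number 25 (counting frames from 1).
frame_number = 25
start = (frame_number - 1) * HOP
frame = signal[start:start + FRAME_LEN]

# Step 3: multiply the frame sample by sample with a Hamming window.
window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (FRAME_LEN - 1))
          for n in range(FRAME_LEN)]
windowed = [x * w for x, w in zip(frame, window)]
```

The first and last samples of `windowed` are scaled to 8% of their original amplitude while samples near the middle of the frame are left almost unchanged, which is the tapering effect the text describes for figure (13).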
REFERENCES