
Speech Signal Analysis 1

Hao Tang

Automatic Speech Recognition—ASR Lecture 2

20 January 2022

Hao Tang Speech Signal Analysis 1

Announcement

Please sign up for Piazza

(https://piazza.com/ed.ac.uk/spring2022/infr11033)

No labs next week. The labs will start in week 3.

[Figure: waveform → Signal Analysis → acoustic features → ASR system]

waveform

Speech is a kind of sound wave.

If we want to study speech, we need to be able to record, replay, and visualize it.


Phonautograph (1857)


Phonograph (1877)


Carbon Microphone (1877)

[Figure: carbon microphone, with labels: sound waves, diaphragm (flexible electrode), carbon granules, fixed electrode, voltage source (battery), signal]

Wave Samples

Sound waves are sampled and quantized.

The typical sampling rate is 16,000 Hz. Each sample is
typically a 16-bit integer.
We will use x[t] to denote the t-th sample in the signal x.
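To make "sampled and quantized" concrete, here is a minimal Python sketch that samples a sine wave at 16,000 Hz and rounds each sample to a 16-bit signed integer. The 440 Hz tone, the half-scale amplitude, and the helper name `sample_and_quantize` are illustrative assumptions, not part of the lecture. Real recordings come from an analog-to-digital converter rather than a formula.

```python
import math

SAMPLE_RATE = 16_000               # samples per second, typical for ASR
INT16_MIN, INT16_MAX = -32768, 32767

def sample_and_quantize(freq_hz, duration_s, amplitude=0.5, rate=SAMPLE_RATE):
    """Sample a sine wave at `rate` Hz and quantize each sample to a 16-bit integer.

    `amplitude` is relative to full scale (1.0 maps to 32767).
    Returns a list x where x[t] is the t-th sample.
    """
    num_samples = int(round(duration_s * rate))
    x = []
    for t in range(num_samples):
        value = amplitude * math.sin(2 * math.pi * freq_hz * t / rate)  # sample the wave at time t / rate
        q = int(round(value * INT16_MAX))                               # quantize to an integer level
        x.append(max(INT16_MIN, min(INT16_MAX, q)))                     # clip to the 16-bit range
    return x

x = sample_and_quantize(440.0, 0.010)   # hypothetical 440 Hz tone, 10 ms long
print(len(x))                           # 160
print(x[:4])
```

At 16 kHz, 10 ms corresponds to 160 samples, so `x[t]` for t = 0, ..., 159 indexes the signal exactly as in the notation above.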


Line Plots and Vectors

[Figure: line plot of a waveform]

$$x = \begin{bmatrix} -0.53 & -0.32 & 0.02 & 0.44 & \cdots & 0.18 \end{bmatrix}$$

Common Preprocessing in the Time Domain

Dithering

$$y[t] = x[t] + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, 1)$$

Add a little Gaussian noise to the signal.

This keeps the signal from being exactly zero, which matters because we will be taking logarithms at some point, and log 0 is undefined.

Removing DC offset

$$y[t] = x[t] - \frac{1}{T} \sum_{i=0}^{T-1} x[i]$$

This ensures that the signal has zero mean, which most subsequent processing assumes.
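A minimal Python sketch of both steps follows, assuming independent N(0, 1) noise per sample as on the slide. The function names and the seeded random generator are illustrative choices. The example shows why dithering helps before a logarithm: digital silence has zero energy, while the dithered version does not.

```python
import random

def dither(x, sigma=1.0, seed=0):
    """y[t] = x[t] + eps with eps ~ N(0, sigma^2); sigma=1.0 gives N(0, 1) as on the slide."""
    rng = random.Random(seed)               # seeded so the example is reproducible
    return [v + rng.gauss(0.0, sigma) for v in x]

def remove_dc_offset(x):
    """y[t] = x[t] - (1/T) * sum_i x[i], so that y has zero mean."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]

# Hypothetical input: 400 samples of digital silence.
silence = [0.0] * 400
energy = sum(v * v for v in silence)        # exactly 0, so math.log(energy) would fail

dithered = dither(silence)
energy_dithered = sum(v * v for v in dithered)  # strictly positive, so its log is defined

# Hypothetical input with a constant DC offset of 100.
offset = [100.0 + v for v in dithered]
centered = remove_dc_offset(offset)
print(abs(sum(centered) / len(centered)) < 1e-9)  # True
```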


Common Preprocessing in the Time Domain

Pre-emphasis

$$y[t] = x[t] - 0.97 \cdot x[t-1]$$

This emphasizes the high-frequency components.

We will come back to this after we have discussed frequency analysis.
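A minimal Python sketch of pre-emphasis follows. The slide does not say what happens at t = 0, where x[t − 1] does not exist. This sketch assumes the missing sample is 0, which is one common convention. Feeding it a constant signal and an alternating signal hints at the frequency-domain behavior worked out later: the slowly varying signal nearly disappears, while the rapidly varying one is amplified.

```python
def pre_emphasis(x, alpha=0.97):
    """y[t] = x[t] - alpha * x[t - 1].

    Assumption: the sample before x[0] is taken to be 0, so y[0] = x[0].
    """
    if not x:
        return []
    return [x[0]] + [x[t] - alpha * x[t - 1] for t in range(1, len(x))]

slow = [1.0] * 8                          # constant: the lowest possible frequency
fast = [(-1.0) ** t for t in range(8)]    # alternating: the highest possible frequency
print(pre_emphasis(slow)[1:])   # every value is about 0.03
print(pre_emphasis(fast)[1:])   # values alternate around -1.97 and 1.97
```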


Ohm’s Acoustic Law (1843)

If you hear a pitch of a certain frequency, then there must be energy of that frequency present in the sound wave.


Periodicity in Speech


Discrete Fourier Transform

$$X[k] = \sum_{t=0}^{T-1} x[t]\, e^{-i2\pi tk/T} \quad \text{for } k = 0, \ldots, T-1, \text{ and } i = \sqrt{-1}$$
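The definition can be evaluated directly, term by term. The sketch below is a straightforward O(T²) Python translation of the sum; the function name `dft` is our own. It is meant for checking small examples; in practice a fast Fourier transform would be used instead.

```python
import cmath
import math

def dft(x):
    """X[k] = sum_{t=0}^{T-1} x[t] * exp(-i 2 pi t k / T), for k = 0, ..., T-1."""
    T = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * k / T) for t in range(T))
            for k in range(T)]

impulse = [1.0, 0.0, 0.0, 0.0]
constant = [1.0, 1.0, 1.0, 1.0]
print([round(abs(v), 6) for v in dft(impulse)])   # [1.0, 1.0, 1.0, 1.0]
print([round(abs(v), 6) for v in dft(constant)])  # [4.0, 0.0, 0.0, 0.0]
```

An impulse contains every frequency equally, while a constant signal has all its energy in bin k = 0.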

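$$X[k] = \begin{bmatrix} e^{i2\pi k \cdot 0/T} \\ e^{i2\pi k \cdot 1/T} \\ \vdots \\ e^{i2\pi k \cdot (T-1)/T} \end{bmatrix}^{*} \begin{bmatrix} x[0] \\ x[1] \\ \vdots \\ x[T-1] \end{bmatrix}$$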


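$$v_k \triangleq \begin{bmatrix} e^{i2\pi k \cdot 0/T} & e^{i2\pi k \cdot 1/T} & \cdots & e^{i2\pi k \cdot (T-1)/T} \end{bmatrix}^\top$$

so that $X[k] = v_k^* x$, where $^*$ denotes the conjugate transpose.

Euler's formula:

$$e^{i\theta} = \cos\theta + i \sin\theta$$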
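The vector form says that X[k] is a complex dot product between v_k and x. The Python sketch below builds each v_k from Euler's formula, computes X[k] = v_k^* x by conjugating v_k, multiplying elementwise by x, and summing, and checks the result against the sum in the definition. Helper names are our own.

```python
import cmath
import math

def basis_vector(k, T):
    """v_k[t] = e^{i 2 pi k t / T} = cos(2 pi k t / T) + i sin(2 pi k t / T) (Euler's formula)."""
    return [complex(math.cos(2 * math.pi * k * t / T), math.sin(2 * math.pi * k * t / T))
            for t in range(T)]

def conj_dot(u, x):
    """u^* x: conjugate the entries of u, multiply elementwise by x, and sum."""
    return sum(a.conjugate() * b for a, b in zip(u, x))

def dft_vector_form(x):
    """X[k] = v_k^* x for k = 0, ..., T-1."""
    T = len(x)
    return [conj_dot(basis_vector(k, T), x) for k in range(T)]

def dft_sum_form(x):
    """X[k] = sum_t x[t] e^{-i 2 pi t k / T}, straight from the definition."""
    T = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * k / T) for t in range(T)) for k in range(T)]

x = [0.5, -1.0, 2.0, 0.25, 3.0]
diff = max(abs(a - b) for a, b in zip(dft_vector_form(x), dft_sum_form(x)))
print(diff < 1e-9)   # True: both forms agree up to rounding
```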


Fourier Basis

[Figure: real parts $\Re\{v_0\}, \Re\{v_1\}, \ldots, \Re\{v_4\}$ of the first five Fourier basis vectors]

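The larger the k, the higher the frequency, up to $k = T/2$. Beyond that the frequencies fold back: $v_{T-k}$ is the complex conjugate of $v_k$, so the two oscillate at the same rate.

$$v_k = \begin{bmatrix} e^{i2\pi k \cdot 0/T} & e^{i2\pi k \cdot 1/T} & \cdots & e^{i2\pi k \cdot (T-1)/T} \end{bmatrix}^\top$$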


The set $\{v_0/\sqrt{T}, v_1/\sqrt{T}, \ldots, v_{T-1}/\sqrt{T}\}$ is an orthonormal basis, because

$$v_m^* v_n = \begin{cases} 0 & \text{if } m \neq n \\ T & \text{if } m = n \end{cases}$$

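The Fourier transform is a change of coordinates: because the basis is orthogonal, $x$ can be recovered from its spectrum as $x = \frac{1}{T} \sum_{k=0}^{T-1} X[k]\, v_k$.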
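The orthogonality relation can be checked numerically. Since $v_m^* v_n$ is T when m = n and 0 otherwise, expanding x in this basis gives $x = \frac{1}{T} \sum_k X[k]\, v_k$. This is what makes the DFT a change of coordinates that loses no information. The Python sketch below uses a hypothetical T = 8 and an arbitrary test signal, and verifies both facts up to floating-point rounding.

```python
import cmath
import math

def v(k, T):
    """Fourier basis vector v_k with entries e^{i 2 pi k t / T}."""
    return [cmath.exp(2j * math.pi * k * t / T) for t in range(T)]

def conj_dot(u, w):
    """u^* w = sum_t conj(u[t]) * w[t]."""
    return sum(a.conjugate() * b for a, b in zip(u, w))

T = 8
basis = [v(k, T) for k in range(T)]

# Gram matrix of the basis: entry (m, n) is v_m^* v_n, expected T on the diagonal, 0 elsewhere.
gram = [[conj_dot(basis[m], basis[n]) for n in range(T)] for m in range(T)]

# Forward transform: coordinates X[k] = v_k^* x.
x = [0.3, -1.2, 0.7, 2.0, -0.5, 0.0, 1.1, -0.4]
X = [conj_dot(basis[k], x) for k in range(T)]

# Inverse transform: x[t] = (1/T) * sum_k X[k] v_k[t].
x_rec = [sum(X[k] * basis[k][t] for k in range(T)) / T for t in range(T)]
```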


Discrete Fourier Transform

$$X[k] = \sum_{t=0}^{T-1} x[t]\, e^{-i2\pi tk/T} = v_k^* x$$

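$X[k]$ is a complex number.

$X[k]$ is the (complex) dot product of the complex sinusoid $v_k$ with the signal $x$, so it tells us how similar $x$ is to $v_k$.

Small values of $k$ in $X$ are low-frequency components, and values of $k$ near $T/2$ are high-frequency components. For a real signal, $X[T-k]$ is the complex conjugate of $X[k]$, so the bins above $T/2$ mirror those below.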
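To connect the index k with a physical frequency: with a sampling rate of $f_s$ Hz and T samples, $v_k$ completes k cycles over the T samples, so bin k corresponds to $k f_s / T$ Hz for $k \le T/2$. The Python sketch below assumes $f_s$ = 16,000 Hz, T = 400 (a 25 ms window), and a hypothetical 1,000 Hz cosine. The largest |X[k]| for k ≤ T/2 lands at k = 25, that is, 25 × 16000 / 400 = 1000 Hz. The mirror bin T − 25 has the same magnitude because x is real.

```python
import cmath
import math

RATE = 16_000    # sampling rate f_s in Hz (assumed)
T = 400          # number of samples (25 ms at 16 kHz)
F0 = 1000.0      # frequency of the hypothetical test tone in Hz

x = [math.cos(2 * math.pi * F0 * t / RATE) for t in range(T)]

def X_at(k):
    """X[k] = v_k^* x: how similar x is to the complex sinusoid v_k."""
    return sum(x[t] * cmath.exp(-2j * math.pi * t * k / T) for t in range(T))

def bin_frequency(k):
    """Frequency in Hz represented by bin k (meaningful for 0 <= k <= T/2)."""
    return k * RATE / T

magnitudes = [abs(X_at(k)) for k in range(T // 2 + 1)]
peak = max(range(len(magnitudes)), key=magnitudes.__getitem__)
print(peak, bin_frequency(peak))   # 25 1000.0
```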


Discrete Fourier Transform

$$X = \mathcal{F}\{x\}$$

The DFT decomposes a signal into frequency components.

$X$ is also called the spectrum of $x$.
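As a small illustration of the DFT decomposing a signal into frequency components, the Python sketch below builds a hypothetical signal from two sinusoids over T = 64 samples. One completes 3 cycles and the other completes 10 cycles. The sketch then lists the bins with k ≤ T/2 where |X[k]| is not numerically zero. Only the two components show up, with magnitudes proportional to their amplitudes.

```python
import cmath
import math

T = 64
# Hypothetical signal: 3 cycles with amplitude 1.0 plus 10 cycles with amplitude 0.5.
x = [math.cos(2 * math.pi * 3 * t / T) + 0.5 * math.sin(2 * math.pi * 10 * t / T)
     for t in range(T)]

# Spectrum X = F{x}, computed directly from the definition.
X = [sum(x[t] * cmath.exp(-2j * math.pi * t * k / T) for t in range(T)) for k in range(T)]

active = [k for k in range(T // 2 + 1) if abs(X[k]) > 1e-6]
print(active)                                      # [3, 10]
print(round(abs(X[3]), 6), round(abs(X[10]), 6))   # 32.0 16.0
```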


Properties of DFT

Linearity

$$\mathcal{F}\{a_1 x_1 + a_2 x_2\} = a_1 \mathcal{F}\{x_1\} + a_2 \mathcal{F}\{x_2\}$$
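Linearity is easy to spot-check numerically. The Python sketch below compares $\mathcal{F}\{a_1 x_1 + a_2 x_2\}$ with $a_1 \mathcal{F}\{x_1\} + a_2 \mathcal{F}\{x_2\}$ for two arbitrary length-4 signals and hypothetical coefficients $a_1 = 2$, $a_2 = -3$.

```python
import cmath
import math

def dft(x):
    """Direct evaluation of X[k] = sum_t x[t] e^{-i 2 pi t k / T}."""
    T = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * k / T) for t in range(T)) for k in range(T)]

x1 = [1.0, 2.0, 0.0, -1.0]
x2 = [0.5, -0.5, 3.0, 1.0]
a1, a2 = 2.0, -3.0

lhs = dft([a1 * u + a2 * w for u, w in zip(x1, x2)])        # F{a1 x1 + a2 x2}
rhs = [a1 * p + a2 * q for p, q in zip(dft(x1), dft(x2))]   # a1 F{x1} + a2 F{x2}
print(max(abs(p - q) for p, q in zip(lhs, rhs)) < 1e-9)     # True
```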


Shift Theorem

If $y[t] = x[t-1]$, with indices taken modulo $T$ (a circular shift, so $y[0] = x[T-1]$), then $Y[k] = e^{-i2\pi k/T} X[k]$.


Proof of the Shift Theorem

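$$\begin{aligned}
Y[k] &= \sum_{t=0}^{T-1} y[t]\, e^{-i2\pi tk/T} \\
&= \sum_{t=0}^{T-1} x[t-1]\, e^{-i2\pi tk/T} \\
&= e^{-i2\pi k/T} \sum_{t=0}^{T-1} x[t-1]\, e^{-i2\pi (t-1)k/T} \\
&= e^{-i2\pi k/T} X[k]
\end{aligned}$$

The last step substitutes $s = t - 1$. With indices taken modulo $T$, $s$ runs over $0, \ldots, T-1$, and $e^{-i2\pi sk/T}$ is unchanged when $s$ is replaced by $s + T$, so the sum is exactly $X[k]$.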
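The theorem can also be checked numerically. The shift is circular, y[t] = x[(t − 1) mod T], so the last sample wraps around to the front, which is exactly what the index substitution in the proof requires. The Python sketch below uses an arbitrary test signal and confirms that Y[k] matches $e^{-i2\pi k/T} X[k]$ at every k. It also checks, at k = 1, that the opposite sign $e^{+i2\pi k/T}$ does not match.

```python
import cmath
import math

def dft(x):
    """Direct evaluation of X[k] = sum_t x[t] e^{-i 2 pi t k / T}."""
    T = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * k / T) for t in range(T)) for k in range(T)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
T = len(x)
y = [x[(t - 1) % T] for t in range(T)]      # circular delay by one sample: [5, 1, 2, 3, 4]

X, Y = dft(x), dft(y)
minus = [cmath.exp(-2j * math.pi * k / T) * X[k] for k in range(T)]
plus = [cmath.exp(2j * math.pi * k / T) * X[k] for k in range(T)]

print(max(abs(a - b) for a, b in zip(Y, minus)) < 1e-9)   # True
print(abs(Y[1] - plus[1]) > 1e-3)                          # True: the + sign does not match
```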


Pre-emphasis
Definition

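$$y[t] = x[t] - 0.97 \cdot x[t-1]$$

DFT of pre-emphasis

$$\begin{aligned}
Y[k] &= X[k] - 0.97 \cdot e^{-i2\pi k/T} X[k] \\
&= \left(1 - 0.97 \cdot e^{-i2\pi k/T}\right) X[k]
\end{aligned}$$

The gain $|1 - 0.97 \cdot e^{-i2\pi k/T}|$ equals 0.03 at $k = 0$ and increases to 1.97 at $k = T/2$. In other words, pre-emphasis attenuates the low-frequency region and emphasizes the high-frequency region.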
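The gain $|1 - 0.97 \cdot e^{-i2\pi k/T}|$ can be tabulated directly. The Python sketch below does so for a hypothetical T = 16 and also checks the frequency-domain identity on an arbitrary signal. Because the shift theorem holds for circular shifts, the check uses a circular version of pre-emphasis, y[t] = x[t] − 0.97 · x[(t − 1) mod T]. The ordinary version differs only in how y[0] is computed.

```python
import cmath
import math

ALPHA = 0.97
T = 16

def gain(k):
    """|1 - 0.97 e^{-i 2 pi k / T}|: how much pre-emphasis scales bin k."""
    return abs(1 - ALPHA * cmath.exp(-2j * math.pi * k / T))

gains = [gain(k) for k in range(T // 2 + 1)]
print(round(gains[0], 4), round(gains[T // 2], 4))   # 0.03 1.97

def dft(x):
    """Direct evaluation of X[k] = sum_t x[t] e^{-i 2 pi t k / n}."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * k / n) for t in range(n)) for k in range(n)]

x = [math.sin(0.7 * t) + 0.1 * t for t in range(T)]      # arbitrary test signal
y = [x[t] - ALPHA * x[(t - 1) % T] for t in range(T)]   # circular pre-emphasis
X, Y = dft(x), dft(y)
H = [1 - ALPHA * cmath.exp(-2j * math.pi * k / T) for k in range(T)]
print(max(abs(Y[k] - H[k] * X[k]) for k in range(T)) < 1e-9)   # True
```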
[Figure: spectra with the low-frequency and high-frequency regions marked]

Short-Time Fourier Transform

Speech is non-stationary: its frequency content changes over time.

Extract spectra with a sliding window, typically with a 25 ms window size and a 10 ms hop.

Display the spectra as a heat map.
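At a 16,000 Hz sampling rate, a 25 ms window is 400 samples and a 10 ms hop is 160 samples. The Python sketch below slices a signal into such frames. It assumes, as one common convention among several, that a trailing partial frame is dropped rather than zero-padded. With that choice, one second of audio yields 1 + ⌊(16000 − 400)/160⌋ = 98 frames. The function name `frame_signal` is hypothetical.

```python
def frame_signal(x, rate=16_000, win_ms=25, hop_ms=10):
    """Split x into overlapping frames of win_ms milliseconds, hop_ms apart.

    Assumption: a trailing frame that would run past the end of x is dropped.
    """
    win = rate * win_ms // 1000     # 400 samples at 16 kHz
    hop = rate * hop_ms // 1000     # 160 samples at 16 kHz
    return [x[start:start + win] for start in range(0, len(x) - win + 1, hop)]

x = list(range(16_000))             # one second of placeholder "samples"
frames = frame_signal(x)
print(len(frames), len(frames[0]))  # 98 400
print(frames[1][0])                 # 160: the second frame starts one hop later
```

Each frame is then turned into one spectrum, and stacking the spectra column by column gives the heat map.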

Sound Spectrograph (1946)


Fast Fourier Transform (1965)

The algorithm that we know of today was proposed in 1965.

It was applied to speech on a computer around 1969.
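The lecture does not spell out the FFT itself. For reference, one well-known variant is the recursive radix-2 Cooley–Tukey algorithm. It splits the signal into even- and odd-indexed samples, transforms each half, and combines them with twiddle factors $e^{-i2\pi k/T}$, which reduces the cost from O(T²) to O(T log T). The Python sketch below assumes the length T is a power of two and is meant for illustration, not speed.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT computing X[k] = sum_t x[t] e^{-i 2 pi t k / T}.

    Assumes len(x) is a power of two.
    """
    T = len(x)
    if T == 0 or T & (T - 1):
        raise ValueError("length must be a power of two")
    if T == 1:
        return [complex(x[0])]
    even = fft(x[0::2])                   # DFT of x[0], x[2], x[4], ...
    odd = fft(x[1::2])                    # DFT of x[1], x[3], x[5], ...
    X = [0j] * T
    for k in range(T // 2):
        twiddle = cmath.exp(-2j * math.pi * k / T) * odd[k]
        X[k] = even[k] + twiddle          # first half of the spectrum
        X[k + T // 2] = even[k] - twiddle # second half reuses the same products
    return X

def dft(x):
    """Direct O(T^2) evaluation of the same sum, for comparison."""
    T = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * k / T) for t in range(T)) for k in range(T)]

x = [math.sin(0.3 * t) + 0.5 * (t % 3) for t in range(16)]
print(max(abs(a - b) for a, b in zip(fft(x), dft(x))) < 1e-9)   # True
```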


Windowing

$$y[t] = x[t] \cdot w[t]$$

[Figure: Hamming, Hann, and rectangular windows]

The signal $w$ is called a window.

Windowing is an elementwise product.
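The three windows named on the slide have simple closed forms. The Python sketch below uses the common symmetric definitions for a window of N samples: $w[t] = 0.54 - 0.46\cos(2\pi t/(N-1))$ for Hamming, $w[t] = 0.5 - 0.5\cos(2\pi t/(N-1))$ for Hann, and $w[t] = 1$ for the rectangle. Some toolkits use N instead of N − 1 (a periodic window), so treat the exact denominator as an assumption. Applying a window is then an elementwise product.

```python
import math

def hamming(N):
    """Symmetric Hamming window of length N (N >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * t / (N - 1)) for t in range(N)]

def hann(N):
    """Symmetric Hann window of length N (N >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / (N - 1)) for t in range(N)]

def rectangle(N):
    """Rectangular window: leaves the frame unchanged."""
    return [1.0] * N

def apply_window(x, w):
    """y[t] = x[t] * w[t] (elementwise product)."""
    if len(x) != len(w):
        raise ValueError("signal and window must have the same length")
    return [a * b for a, b in zip(x, w)]

N = 5
print([round(v, 4) for v in hamming(N)])   # [0.08, 0.54, 1.0, 0.54, 0.08]
print([round(v, 4) for v in hann(N)])      # [0.0, 0.5, 1.0, 0.5, 0.0]
frame = [2.0] * N
print([round(v, 4) for v in apply_window(frame, hann(N))])  # [0.0, 1.0, 2.0, 1.0, 0.0]
```

The Hann window tapers all the way to zero at the edges, while the Hamming window stops at 0.08.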

[Figure: windowing followed by the DFT]

Spectrogram

Dithering, removing DC offset, pre-emphasis
→ windowing → discrete Fourier transform (DFT)

Windowing followed by the DFT, applied frame by frame with a sliding window, is the short-time Fourier transform (STFT).
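Putting the steps together, here is a compact Python sketch of the whole pipeline. It applies dithering with N(0, 1) noise, DC-offset removal, and pre-emphasis with 0.97. It then cuts 25 ms frames with a 10 ms hop, applies a Hamming window, and takes the DFT of each frame, keeping bins 0 to T/2 and the log of the power |X[k]|². The Hamming window, the log power, dropping the trailing partial frame, and y[0] = x[0] in pre-emphasis are assumptions for illustration. The direct DFT keeps the sketch self-contained but slow; a real front end would use an FFT.

```python
import cmath
import math
import random

def spectrogram(x, rate=16_000, win_ms=25, hop_ms=10, alpha=0.97, seed=0):
    """Log power spectrogram: one row per frame, one entry per bin k = 0, ..., T/2."""
    rng = random.Random(seed)
    x = [v + rng.gauss(0.0, 1.0) for v in x]                          # dithering, eps ~ N(0, 1)
    mean = sum(x) / len(x)
    x = [v - mean for v in x]                                         # remove DC offset
    x = [x[0]] + [x[t] - alpha * x[t - 1] for t in range(1, len(x))]  # pre-emphasis, y[0] = x[0]
    T = rate * win_ms // 1000                                         # window length in samples
    hop = rate * hop_ms // 1000                                       # hop size in samples
    w = [0.54 - 0.46 * math.cos(2 * math.pi * t / (T - 1)) for t in range(T)]  # Hamming window
    rows = []
    for start in range(0, len(x) - T + 1, hop):                       # sliding window (STFT)
        frame = [x[start + t] * w[t] for t in range(T)]               # windowing
        row = []
        for k in range(T // 2 + 1):                                   # DFT, bins 0..T/2 only
            X_k = sum(frame[t] * cmath.exp(-2j * math.pi * t * k / T) for t in range(T))
            row.append(math.log(abs(X_k) ** 2))                       # log power
        rows.append(row)
    return rows

# Hypothetical input: 0.1 s of a 1,000 Hz tone at 16 kHz.
tone = [1000.0 * math.sin(2 * math.pi * 1000 * t / 16_000) for t in range(1600)]
spec = spectrogram(tone)
peaks = [max(range(len(row)), key=row.__getitem__) for row in spec]
print(len(spec), len(spec[0]))   # 8 201
print(peaks)                     # [25, 25, 25, 25, 25, 25, 25, 25]
```

Every frame peaks at bin 25, which corresponds to 25 × 16000 / 400 = 1000 Hz. The dithering step is what guarantees that the logarithm never sees an exact zero.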


Chapters 1–5, Oppenheim, Willsky, and Nawab, “Signals and Systems,” 1997

Chapter 2, O’Shaughnessy, “Speech Communications: Human and Machine,” 2000
