Asr02 Signal

Speech recognition


Speech Signal Analysis 1

Hao Tang

Automatic Speech Recognition—ASR Lecture 2


20 January 2022

Hao Tang Speech Signal Analysis 1


Announcement

Please sign up for Piazza (https://piazza.com/ed.ac.uk/spring2022/infr11033).

No labs next week. The labs will start in week 3.



waveform → Signal Analysis → acoustic features → ASR system



waveform

Speech is a kind of sound wave.


If we want to study speech, we need to be able to record,
replay, and visualize speech.



Phonautograph (1857)



Phonograph (1877)



Carbon Microphone (1877)

[Figure: carbon microphone, labeled: sound waves, diaphragm (flexible electrode), carbon granules, fixed electrode, voltage source (battery), signal]



Wave Samples

Sound waves are sampled and quantized.


The typical sampling rate is 16,000 Hz. Each sample is
typically a 16-bit integer.
We will use x[t] to denote the t-th sample in the signal x.
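
As a rough illustration of sampling and quantization, the sketch below samples a sine tone at 16,000 Hz and rounds each sample to a 16-bit integer. It assumes NumPy; the 440 Hz tone, the 0.5 amplitude, and the helper name `sample_and_quantize` are hypothetical choices, not part of the lecture.

```python
import numpy as np

SAMPLE_RATE = 16_000  # samples per second, the typical rate quoted above

def sample_and_quantize(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Sample a sine tone and quantize it to 16-bit integers."""
    n = int(round(duration_s * sample_rate))
    t = np.arange(n) / sample_rate                  # sampling instants in seconds
    wave = 0.5 * np.sin(2 * np.pi * freq_hz * t)    # amplitude within [-0.5, 0.5]
    return np.round(wave * 32767).astype(np.int16)  # scale and round to int16

x = sample_and_quantize(440.0, 0.01)  # 10 ms of a hypothetical 440 Hz tone
print(len(x), x.dtype)                # x[t] is the t-th sample
```

At 16,000 Hz, 10 ms of audio corresponds to 160 samples.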



Line Plots and Vectors

 
\[ \begin{bmatrix} -0.53 & -0.32 & 0.02 & 0.44 & \cdots & 0.18 \end{bmatrix} \]



Common Preprocessing in the Time Domain

Dithering

\[ y[t] = x[t] + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, 1) \]

Add a little Gaussian noise to the signal.


This keeps the signal from being exactly zero, since we will be taking a logarithm
at some point.
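
A minimal sketch of dithering, assuming NumPy and assuming the samples are on the 16-bit integer scale, so that unit-variance noise is small relative to speech. The function name `dither` and the fixed seed are hypothetical.

```python
import numpy as np

def dither(x, rng):
    """Add unit-variance Gaussian noise: y[t] = x[t] + eps, eps ~ N(0, 1)."""
    x = np.asarray(x, dtype=np.float64)
    return x + rng.standard_normal(x.shape)

rng = np.random.default_rng(0)   # fixed seed, only for repeatability
silence = np.zeros(400)          # digital silence: every sample is zero
y = dither(silence, rng)

# The log energy of pure silence would be log(0); after dithering it is finite.
log_energy = np.log(np.sum(y ** 2))
```

If the samples were instead floats in [-1, 1], the noise would presumably need to be scaled down accordingly.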

Removing DC offset
\[ y[t] = x[t] - \frac{1}{T} \sum_{i=1}^{T} x[i] \]

Ensure that the signal has mean zero.


Most processing assumes that the signal has zero mean.
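
Removing the DC offset amounts to subtracting the sample mean. A small sketch, assuming NumPy; the function name `remove_dc_offset` and the five sample values are made up for illustration.

```python
import numpy as np

def remove_dc_offset(x):
    """Subtract the mean: y[t] = x[t] - (1/T) * sum_i x[i]."""
    x = np.asarray(x, dtype=np.float64)
    return x - x.mean()

x = np.array([3.0, 5.0, 4.0, 6.0, 2.0])  # hypothetical samples with mean 4
y = remove_dc_offset(x)                   # the result has mean zero
```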



Common Preprocessing in the Time Domain

Pre-emphasis

y [t] = x[t] − 0.97 · x[t − 1]

Emphasize the high-frequency components.


We will come back to this after we have talked about frequency
analysis.
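
In the time domain, the effect of pre-emphasis can be seen on two extreme signals. The sketch below assumes NumPy; keeping the first sample unchanged is an assumption about the boundary, and the function name `pre_emphasis` is hypothetical.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """y[t] = x[t] - alpha * x[t-1]; the first sample is kept as is (an assumption)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

constant = np.ones(5)                                 # slowest-varying signal
alternating = np.array([1.0, -1.0, 1.0, -1.0, 1.0])   # fastest-varying signal
y_const = pre_emphasis(constant)    # after the first sample, values shrink to 0.03
y_alt = pre_emphasis(alternating)   # after the first sample, values grow to +/-1.97
```

The constant signal is almost cancelled while the alternating one is nearly doubled, which is the sense in which high frequencies are emphasized.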



Ohm’s Acoustic Law (1843)

If you hear a pitch of a certain frequency, then there must be energy
of that frequency present in the sound wave.



Periodicity in Speech





Discrete Fourier Transform

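\[ X[k] = \sum_{t=0}^{T-1} x[t] \, e^{-i 2\pi t k / T} \quad \text{for } k = 0, \dots, T-1, \text{ and } i = \sqrt{-1} \]

\[ X[k] = \begin{bmatrix} e^{i 2\pi k \cdot 0 / T} \\ e^{i 2\pi k \cdot 1 / T} \\ \vdots \\ e^{i 2\pi k \cdot (T-1) / T} \end{bmatrix}^{*} \begin{bmatrix} x[0] \\ x[1] \\ \vdots \\ x[T-1] \end{bmatrix} \]

\[ v_k \triangleq \begin{bmatrix} e^{i 2\pi k \cdot 0 / T} & e^{i 2\pi k \cdot 1 / T} & \cdots & e^{i 2\pi k \cdot (T-1) / T} \end{bmatrix}^{\top} \]

\[ e^{i\theta} = \cos\theta + i \sin\theta \]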



Fourier Basis

[Figure: real parts of the first five basis vectors, $\Re\{v_0\}, \Re\{v_1\}, \Re\{v_2\}, \Re\{v_3\}, \Re\{v_4\}$]



Fourier Basis

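The larger the k, the higher the frequency.

\[ v_k = \begin{bmatrix} e^{i 2\pi k \cdot 0 / T} & e^{i 2\pi k \cdot 1 / T} & \cdots & e^{i 2\pi k \cdot (T-1) / T} \end{bmatrix}^{\top} \]

The set $\{v_0/\sqrt{T}, v_1/\sqrt{T}, \dots, v_{T-1}/\sqrt{T}\}$ is an orthonormal basis, since each $v_k$ has squared norm $T$.

\[ v_m^{*} v_n = \begin{cases} 0 & \text{if } m \neq n \\ T & \text{if } m = n \end{cases} \]

The Fourier transform is a change of coordinates.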
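
The orthogonality relation and the change-of-coordinates view can be checked numerically. This is a small sketch assuming NumPy, with T = 8 and the helper name `fourier_basis` chosen only for illustration. Each vector is divided by the square root of T, which is what gives it unit length.

```python
import numpy as np

def fourier_basis(T):
    """Return a T x T matrix whose k-th column is v_k, with v_k[t] = exp(i*2*pi*k*t/T)."""
    t = np.arange(T).reshape(-1, 1)   # time index along rows
    k = np.arange(T).reshape(1, -1)   # frequency index along columns
    return np.exp(2j * np.pi * k * t / T)

T = 8
V = fourier_basis(T)
gram = V.conj().T @ V                 # entry (m, n) is v_m^* v_n
U = V / np.sqrt(T)                    # columns scaled to unit length

# Change of coordinates: express x in the basis, then reconstruct it.
x = np.random.default_rng(0).standard_normal(T)
coords = U.conj().T @ x               # coordinates of x in the Fourier basis
x_back = U @ coords                   # back to the original coordinates
```

Here `gram` comes out as T times the identity, and `x_back` matches `x` up to rounding error.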



Discrete Fourier Transform

\[ X[k] = \sum_{t=0}^{T-1} x[t] \, e^{-i 2\pi t k / T} = v_k^{*} x \]

X [k] is a complex number.


X [k] is a (complex) dot product of a complex sinusoid vk and
the signal x.
X [k] tells us how similar x is to vk .
The large k’s in X are high-frequency components, while the
small k’s in X are low-frequency components.
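
The dot-product view translates directly into a naive DFT. The sketch below assumes NumPy and compares the result against `np.fft.fft`, which uses the same sign convention. The function name `dft` and the three-cycle cosine test signal are hypothetical.

```python
import numpy as np

def dft(x):
    """Naive O(T^2) DFT: X[k] = v_k^* x for k = 0, ..., T-1."""
    x = np.asarray(x, dtype=np.complex128)
    T = len(x)
    t = np.arange(T)
    X = np.empty(T, dtype=np.complex128)
    for k in range(T):
        v_k = np.exp(2j * np.pi * k * t / T)  # complex sinusoid with k cycles
        X[k] = np.vdot(v_k, x)                # vdot conjugates its first argument
    return X

T = 16
t = np.arange(T)
x = np.cos(2 * np.pi * 3 * t / T)   # hypothetical signal: 3 cycles over T samples
X = dft(x)
matches_fft = np.allclose(X, np.fft.fft(x))
```

For this real-valued signal the magnitude peaks at k = 3 and again at k = T - 3 = 13. For real signals the second half of the spectrum mirrors the first, so the highest distinct frequency sits at k = T/2.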



Discrete Fourier Transform

\[ X = \mathcal{F}\{x\} \]

DFT decomposes a signal into frequency components.


X is also called the spectrum of x.



Properties of DFT

Linearity

\[ \mathcal{F}\{a_1 x_1 + a_2 x_2\} = a_1 \mathcal{F}\{x_1\} + a_2 \mathcal{F}\{x_2\} \]

Shift Theorem

If $y[t] = x[t-1]$, then $Y[k] = e^{-i 2\pi k / T} X[k]$.

Proof of the Shift Theorem

\begin{align*}
Y[k] &= \sum_{t=0}^{T-1} y[t] \, e^{-i 2\pi t k / T} \\
&= \sum_{t=0}^{T-1} x[t-1] \, e^{-i 2\pi t k / T} \\
&= e^{-i 2\pi k / T} \sum_{t=0}^{T-1} x[t-1] \, e^{-i 2\pi (t-1) k / T} \\
&= e^{-i 2\pi k / T} X[k]
\end{align*}

The last step treats the time index modulo $T$, so that $x[-1] = x[T-1]$ and the sum still runs over one full period.
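
Both properties can be checked numerically. The sketch below assumes NumPy and random test signals, and it uses a circular one-sample delay (`np.roll`), which is the setting in which the shift theorem holds exactly. Under the forward transform $X[k] = \sum_t x[t] e^{-i 2\pi t k / T}$, the delay shows up as the factor $e^{-i 2\pi k / T}$.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 32
x1 = rng.standard_normal(T)
x2 = rng.standard_normal(T)
a1, a2 = 2.0, -0.5            # arbitrary scalars
k = np.arange(T)

# Linearity: the DFT of a weighted sum is the weighted sum of the DFTs.
lhs = np.fft.fft(a1 * x1 + a2 * x2)
rhs = a1 * np.fft.fft(x1) + a2 * np.fft.fft(x2)
linearity_holds = np.allclose(lhs, rhs)

# Shift theorem with a circular delay: y[t] = x1[(t - 1) mod T].
y = np.roll(x1, 1)
Y = np.fft.fft(y)
predicted = np.exp(-2j * np.pi * k / T) * np.fft.fft(x1)
shift_holds = np.allclose(Y, predicted)
```

A delay changes only the phase of each component, since the factor has magnitude one.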



Pre-emphasis

Definition

\[ y[t] = x[t] - 0.97 \cdot x[t-1] \]

DFT of pre-emphasis

\begin{align*}
Y[k] &= X[k] - 0.97 \cdot e^{-i 2\pi k / T} X[k] \\
&= (1 - 0.97 \cdot e^{-i 2\pi k / T}) X[k]
\end{align*}

The gain $|1 - 0.97 \cdot e^{-i 2\pi k / T}|$ is 0.03 at $k = 0$ and 1.97 at $k = T/2$. In other words, pre-emphasis emphasizes the high-frequency
region.
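
To see how strongly each frequency is scaled, the sketch below evaluates the magnitude of the pre-emphasis factor on the bins from 0 up to T/2. It assumes NumPy, and T = 512 is an arbitrary choice. The magnitude is the same for either sign of the exponent, so the sign convention does not affect the conclusion.

```python
import numpy as np

T = 512
k = np.arange(T // 2 + 1)   # bins from 0 up to the highest frequency, T/2
gain = np.abs(1 - 0.97 * np.exp(-2j * np.pi * k / T))
gain_db = 20 * np.log10(gain)   # the same gain on a decibel scale
```

The gain rises monotonically from 0.03 at k = 0 to 1.97 at k = T/2, so low frequencies are attenuated and high frequencies are boosted.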
[Figure: spectrum with its low-frequency and high-frequency regions labeled]


Short-Time Fourier Transform

Speech is non-stationary.
Extract spectra with a sliding window, typically with a 25 ms
window size and a 10 ms hop.
Display the spectra as a heat map.
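
A minimal sketch of the sliding-window idea, assuming NumPy, a 16 kHz sampling rate, and no padding at the edges, so the signal must be at least one window long. The function name `stft` is hypothetical, and no tapering window is applied here, which makes it a plain rectangular window.

```python
import numpy as np

def stft(x, sample_rate=16_000, win_ms=25, hop_ms=10):
    """Slide a window over x and take the DFT of every frame."""
    win = sample_rate * win_ms // 1000     # 400 samples at 16 kHz
    hop = sample_rate * hop_ms // 1000     # 160 samples at 16 kHz
    n_frames = 1 + (len(x) - win) // hop   # frames that fit entirely
    frames = np.stack([x[i * hop : i * hop + win] for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)     # one spectrum per row

x = np.random.default_rng(0).standard_normal(16_000)  # 1 s of hypothetical noise
S = stft(x)
print(S.shape)   # (frames, frequency bins)
```

With these settings, one second of audio gives 98 frames of 201 frequency bins each. `np.fft.rfft` keeps only bins 0 to T/2, which is enough for a real-valued signal. Plotting `np.abs(S).T` as an image would give the heat map.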

Sound Spectrograph (1946)



Fast Fourier Transform (1965)

The algorithm that we know of today was proposed in 1965.

It was applied to speech on a computer around 1969.

Windowing

\[ y[t] = x[t] \cdot w[t] \]

[Figure: Hamming, Hann, and rectangular windows]

The signal w is called a window.

Windowing is an elementwise product.
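
The three windows named above are available in NumPy (`np.hamming`, `np.hanning`, and a vector of ones for the rectangle). The sketch below applies each to a frame. The 400-sample length assumes a 25 ms frame at 16 kHz, and the random frame is a stand-in for real speech.

```python
import numpy as np

N = 400   # 25 ms at 16 kHz
windows = {
    "hamming": np.hamming(N),
    "hann": np.hanning(N),
    "rectangle": np.ones(N),
}

frame = np.random.default_rng(0).standard_normal(N)   # hypothetical frame
# Windowing is an elementwise product: y[t] = x[t] * w[t].
windowed = {name: frame * w for name, w in windows.items()}
```

The Hann window falls to 0 at both ends and the Hamming window to 0.08, while the rectangular window leaves the frame unchanged.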

[Figure: each frame goes through windowing and then the DFT]



Spectrogram

dithering, removing DC offset, pre-emphasis


windowing
Discrete Fourier transform (DFT)
Short-time Fourier transform (STFT)
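
The steps listed above can be chained into one function. This is only a sketch under several assumptions: NumPy, a 16 kHz sampling rate, samples on the 16-bit integer scale, preprocessing applied to the whole signal rather than per frame, a Hamming window, and log power as the displayed quantity. The function name `log_spectrogram` and the 1 kHz test tone are hypothetical.

```python
import numpy as np

def log_spectrogram(x, sample_rate=16_000, win_ms=25, hop_ms=10, alpha=0.97, seed=0):
    """Dither, remove DC, pre-emphasize, window, DFT; returns (frames, bins)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=np.float64)
    x = x + rng.standard_normal(x.shape)                      # dithering
    x = x - x.mean()                                          # removing DC offset
    x = np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))      # pre-emphasis
    win = sample_rate * win_ms // 1000
    hop = sample_rate * hop_ms // 1000
    n_frames = 1 + (len(x) - win) // hop                      # no edge padding
    frames = np.stack([x[i * hop : i * hop + win] for i in range(n_frames)])
    frames = frames * np.hamming(win)                         # windowing
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # DFT of each frame
    return np.log(power)                                      # log power (assumed)

t = np.arange(16_000) / 16_000
x = 1000 * np.sin(2 * np.pi * 1000 * t)   # hypothetical 1 kHz tone, 1 s long
S = log_spectrogram(x)
peak_bin = int(np.argmax(S.mean(axis=0)))
```

Each bin spans 16000 / 400 = 40 Hz, so the 1 kHz tone is expected to peak at bin 25.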



Further Reading

Chapters 1–5, Oppenheim, Willsky, and Nawab, “Signals and Systems,” 1997

Chapter 2, O’Shaughnessy, “Speech Communications: Human and Machine,” 2000
