Chapter6 - SPEECH SIGNAL PROCESSING

The document discusses speech signal processing. It covers an introduction to speech signals, including their basic properties and overview. Time-domain features and applications are also covered, including voiced/unvoiced/silence segmentation and pitch estimation. Voiced/unvoiced discrimination and pitch estimation algorithms are discussed in detail.

Uploaded by

Quyền Phan


COURSE:

DIGITAL SIGNAL PROCESSING

Instructor: Ninh Khanh Duy

CHAPTER 6:
SPEECH SIGNAL PROCESSING

Lecture 6.1: Introduction to speech signals

Lecture 6.2: Time-domain features and applications
Lecture 6.3: Frequency-domain features and applications
Duration: 6 periods
Lecture 6.1
Introduction to speech signals

! Outline:
1. Overview of speech signals
2. Basic properties of speech signals
Overview of speech signals

! Speech signals are obtained by a digital recording process (sampling, quantizing, coding) of acoustic waves

Example utterance: “Các bạn trẻ …” ("Young people …")

! Speech signals encode the speaker's message, which includes linguistic information such as phonemes, sentence types, etc.
Overview of speech signals

! The acoustic wave at the mouth and nose is the output of the airflow travelling from the lungs through the vocal tract
Mechanisms of phones and voicing

[Figure: air flow, vocal cords/folds, example phones /s/ and /a/]

" Speech (output signal): includes different phones and voicing
" Resonance cavities (system) ⇒ different phones: /a/, /m/, /s/, /z/
" Air flow after the vocal cords (input signal) ⇒ different voicing:
• Vocal cords vibrate: quasi-periodic pulses ⇒ voiced phones: /a/, /m/
• Vocal cords do not vibrate: turbulence ⇒ unvoiced phones: /s/, /p/, /k/
Basic properties of speech signals

! Randomness
" Speech (like most real-world signals) is random: its future values cannot be predicted with certainty from its past values
# By contrast, a deterministic signal has a rule that gives the precise value of the signal at each instant of time
" The value x(t) of a signal at any instant of time is a random variable
# The actual value of the signal is known only after observation
" A signal is assumed to be generated by a random process whose structure can be characterized and described
Basic properties of speech signals

! Variability: the signal depends on
" the microphone used for recording
" the speaker (voice)
" the physical/emotional state of the same speaker
Basic properties of speech signals

! Characteristics vary slowly in time

" Time- and frequency-related features are fairly stable within short segments of 10-50 ms (roughly the time needed to pronounce a phoneme)
Short-time processing technique

! Divide the signal into consecutive frames, each with a fixed duration (e.g., 25 ms)

! Extract features frame by frame

! Combine the extracted features into a feature sequence (the time axis is now the frame index)
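
The three steps above can be sketched as follows. This is a minimal illustration, assuming non-overlapping frames and a signal held in a plain Python list; the function names `split_into_frames` and `feature_sequence` are illustrative and do not come from the slides.

```python
def split_into_frames(signal, sample_rate, frame_ms=25.0):
    """Split a list of samples into consecutive, non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame duration too short for this sample rate")
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def feature_sequence(frames, feature):
    """Apply a per-frame feature function; the result is indexed by frame."""
    return [feature(frame) for frame in frames]


# Toy input: 1 s of zeros at 16 kHz, cut into 25 ms frames.
toy = [0.0] * 16000
frames = split_into_frames(toy, 16000, 25.0)
# Example feature: peak absolute amplitude of each frame.
peaks = feature_sequence(frames, lambda f: max(abs(v) for v in f))
```

Overlapping frames (a hop shorter than the frame length) are also common in practice; the sketch keeps the simpler non-overlapping case described on the slide.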
Homework

1. Read Sections 2 & 3 of “CS425 Audio and Speech Processing_Hodgkinson_2012”

2. Write a program to compute the energy and power of a recorded signal, following formulas (2.1) & (2.2) on page 25 of the textbook “Applied Digital Signal Processing -Theory and Practice_Manolakis-Ingle_2011”
Lecture 6.2
Time-domain features and applications

! Outline:
1. Voiced/Unvoiced/Silence segmentation
2. Time-domain pitch estimation
Introduction to Voiced/Unvoiced/Silence classification

! A recorded signal includes speech and silence regions
" Speech: regions that exhibit voice activity (phones being produced)
" Silence: regions that contain no phones, only environmental noise

! A speech region is divided into voiced and unvoiced segments
" Voiced: exhibits strong periodicity, resulting from vibration of the vocal folds
" Unvoiced: exhibits weak or no periodicity, because the vocal folds do not vibrate

Speech/Silence discrimination

! Problem statement
" Input: a signal

" Output: the signal with vertical boundaries between speech and
silence regions

! Constraint
" A silence region must last at least 300 ms, so that very short pauses within speech are not marked as silence
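
One way to apply the 300 ms constraint is as a post-processing step on per-frame decisions. The sketch below assumes non-overlapping frames of `frame_ms` milliseconds and a list of booleans (True = speech); the function name and the choice to leave silence at the very start or end of the recording untouched are assumptions, not part of the slides.

```python
import math


def enforce_min_silence(is_speech, frame_ms, min_silence_ms=300.0):
    """Relabel interior silence runs shorter than min_silence_ms as speech."""
    min_frames = math.ceil(min_silence_ms / frame_ms)
    out = list(is_speech)
    i = 0
    while i < len(out):
        if out[i]:
            i += 1
            continue
        # Find the end of the current run of silence frames.
        j = i
        while j < len(out) and not out[j]:
            j += 1
        # Only pauses with speech on both sides count as "short pauses".
        interior = i > 0 and j < len(out)
        if interior and (j - i) < min_frames:
            out[i:j] = [True] * (j - i)
        i = j
    return out


# 25 ms frames: a 100 ms pause is absorbed, a 300 ms pause is kept.
labels = [True] * 5 + [False] * 4 + [True] * 5 + [False] * 12 + [True] * 3
cleaned = enforce_min_silence(labels, frame_ms=25.0)
```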
Speech/Silence discrimination

! Observation

The level of silence is usually lower than that of speech segments, except when
" Environmental noise has a higher level than unvoiced fricatives (e.g., /s/, /f/)

" The recording environment has a high noise level, i.e. a low Signal-to-Noise Ratio (SNR)

⇒ Use signal level as the discrimination criterion

Speech/Silence discrimination

! Candidate attribute functions

" Short-Time Energy (STE): sum of the squared waveform values over the finite number of samples belonging to a frame (20-25 ms)

n: frame index
m: sample index
N: frame length (samples)
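
A minimal sketch of the STE of one frame, following the verbal definition above. It assumes the frame has already been cut out of the signal as a list of N samples; the function name is illustrative.

```python
def short_time_energy(frame):
    """Sum of the squared sample values of one frame."""
    return sum(v * v for v in frame)


# A low-level (silence-like) frame versus a high-level (speech-like) frame.
quiet = [0.01, -0.02, 0.01, -0.01]
loud = [0.5, -0.6, 0.4, -0.5]
ste_quiet = short_time_energy(quiet)
ste_loud = short_time_energy(loud)
```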
Speech/Silence discrimination

! Candidate attribute functions

" Magnitude Average (MA): sum of the absolute waveform values over the finite number of samples belonging to a frame

n: frame index
m: sample index
N: frame length (samples)
" In practice, the N samples centered around n are used instead, from n−N/2 to n+N/2−1
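
The centered window and the MA can be sketched as below. The helper names are illustrative, and clipping the window at the signal boundaries is an assumption (the slides do not say how edges are handled). The MA is computed as a plain sum, as in the definition above; dividing by N would only rescale it.

```python
def centered_frame(signal, n, frame_len):
    """Samples n - N/2 .. n + N/2 - 1, clipped to the signal bounds."""
    half = frame_len // 2
    start = max(0, n - half)
    stop = min(len(signal), n + frame_len - half)
    return signal[start:stop]


def magnitude_average(frame):
    """Sum of the absolute sample values of one frame."""
    return sum(abs(v) for v in frame)


signal = list(range(10))
window = centered_frame(signal, 5, 4)   # samples 3, 4, 5, 6
ma = magnitude_average([1, -2, 3])
```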

Speech/Silence discrimination

! Candidate attribute functions

" Short-Time Energy (STE) vs. Magnitude Average (MA)

Both functions reflect the waveform envelope, but STE emphasizes large values
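
A small numeric check of the statement above, using illustrative helper names: doubling the amplitude of a frame doubles the MA but quadruples the STE, which is why STE emphasizes large values.

```python
def ste(frame):
    """Short-time energy: sum of squares."""
    return sum(v * v for v in frame)


def ma(frame):
    """Magnitude average: sum of absolute values."""
    return sum(abs(v) for v in frame)


frame = [1, -2, 3]
scaled = [2 * v for v in frame]

ste_ratio = ste(scaled) / ste(frame)  # grows with the square of the gain
ma_ratio = ma(scaled) / ma(frame)     # grows linearly with the gain
```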
Speech/Silence discrimination

! Algorithm in general
" Compare the attribute function with a threshold to classify each frame as speech or silence

" The threshold is determined from given training signals with different environmental noise levels
Speech/Silence discrimination

! Algorithm to find the threshold

" The threshold can be set manually or automatically

" It can be based on the distribution (histogram) of the feature data (STE/MA) of frames belonging to speech or silence (no labels needed), or on a binary search (labels needed)

" Or it can be based on simple statistics (mean & standard deviation) of each class (labels needed), assuming a normal distribution
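
The slides do not give a specific formula for the statistics-based threshold. One possible heuristic, shown here purely as an assumption, places the threshold T at the same number of standard deviations from both class means, i.e. (T − μ_sil)/σ_sil = (μ_sp − T)/σ_sp. The function names are illustrative.

```python
import statistics


def threshold_from_stats(silence_values, speech_values):
    """Threshold equidistant (in standard deviations) from both class means."""
    mu_sil = statistics.mean(silence_values)
    mu_sp = statistics.mean(speech_values)
    sd_sil = statistics.pstdev(silence_values)
    sd_sp = statistics.pstdev(speech_values)
    if sd_sil + sd_sp == 0:
        # Degenerate case: both classes are constant, use the midpoint.
        return (mu_sil + mu_sp) / 2.0
    return (mu_sil * sd_sp + mu_sp * sd_sil) / (sd_sil + sd_sp)


def classify_frames(values, threshold):
    """True where the attribute value exceeds the threshold (speech)."""
    return [v > threshold for v in values]


# Toy labelled STE values for training frames.
silence_ste = [1.0, 2.0, 3.0]
speech_ste = [8.0, 10.0, 12.0]
T = threshold_from_stats(silence_ste, speech_ste)
decisions = classify_frames([1.0, 9.0], T)
```

Because the silence class has the smaller spread in this toy data, the threshold lands closer to the silence mean than to the speech mean.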
Voiced/Unvoiced discrimination

! Problem statement
" Input: a signal containing only speech regions (assuming no silence)

" Output: the signal with vertical boundaries between voiced and unvoiced segments

! If the input signal includes some silence ⇒ no problem, because silence is non-periodic and can be treated as unvoiced
Voiced/Unvoiced discrimination

! Same idea as the previous task

" Look for attributes that contrast strongly between the states to be discriminated

" Set a threshold for each state based on training signals

! Difference
" Combine several features to discriminate voiced vs. unvoiced
Voiced/Unvoiced discrimination

! Discriminatory attributes and functions

" STE or MA: unvoiced segments generally have a lower level than voiced segments
Voiced/Unvoiced discrimination

! Discriminatory attributes and functions

" Zero-Crossing Rate (ZCR): the rate at which the waveform crosses the zero axis

" Unvoiced segments exhibit a denser, more turbulent waveform than voiced segments ⇒ unvoiced segments have a significantly higher ZCR than voiced ones

n: frame index
m: sample index
N: frame length
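
One common way to measure the ZCR of a frame is to count sign changes between adjacent samples. The sketch below makes two assumptions that are not taken from the slides: a sample equal to zero is treated as positive, and the count is divided by the number of adjacent pairs (N − 1) so that the result lies between 0 and 1. Other normalisations are also in use.

```python
def sign(v):
    """Sign of a sample, with 0 treated as positive (an assumption)."""
    return 1 if v >= 0 else -1


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if sign(a) != sign(b)
    )
    return crossings / (len(frame) - 1)


noisy_like = [1, -1, 1, -1]    # changes sign at every step
smooth_like = [1, 2, 3, 4]     # never changes sign
zcr_noisy = zero_crossing_rate(noisy_like)
zcr_smooth = zero_crossing_rate(smooth_like)
```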
Voiced/Unvoiced discrimination

! Normalisation of attribute functions

" Useful when combining (e.g., adding) multiple attribute functions into one

" A single voicing threshold can then be set for the composite function

" Otherwise, a separate threshold must be set for each attribute function
Pitch or Fundamental frequency (F0)

! A feature defined only for periodic signals (e.g., voiced segments)

! Definition

" Fundamental frequency (F0), the inverse of the fundamental period, is the number of signal cycles per second
• For speech: F0 is the vibration frequency of the vocal cords

" Pitch is the perceptual counterpart of F0 (e.g., a high- or low-pitched voice)

! Importance

" The pitch contour conveys the intonation of an utterance (rising/falling)

" For Vietnamese: six tones (ngang, huyền, ngã, hỏi, sắc, nặng)
Pitch/F0 estimation

! Problem statement
" Input: a signal (may include silence, voiced and unvoiced segments)

" Output: the F0 contour of the signal (one F0 value for each frame)

! Constraint
" Valid F0 values for adult voices range from 70 Hz to 400 Hz
Pitch/F0 estimation

! An example F0 contour extracted from signal

Pitch/F0 estimation

! Two time-domain methods

" Short-Time Autocorrelation function (ACF)

" Short-Time Average Magnitude Difference Function (AMDF)

! Both are based on the following property of a periodic signal:

x[m] = x[m + N_T] for all m

N_T: pitch period/fundamental period (in samples)

! Voiced segments of speech are only quasi-periodic

⇒ exact equality “=” never occurs

Autocorrelation function (ACF)

! The ACF of a signal indicates how similar the signal is to a shifted copy of itself

! Definition

r_xx[n] = Σ_m x[m] x[m+n]

n: lag/shift
m: sample index

! Application: for a periodic signal x, the ACF reaches its global maximum at every lag that is an integer multiple of the period
" For a quasi-periodic signal ⇒ local maxima (peaks)
Autocorrelation function (ACF)

! Short-time ACF of a frame:

n: lag (samples)
m: sample index
N: frame length (samples)
! The ACF should be normalized to a maximum value of 1 by dividing it by the largest autocorrelation value, which occurs at lag zero, r_xx[0]

! Complexity per frame: O(N^2)
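
A direct implementation of the short-time ACF makes the quadratic cost visible. The sketch assumes the common form r_xx[n] = Σ x[m]·x[m+n] with m running from 0 to N−1−n (the exact summation limits on the original slide are not known), and the function names are illustrative.

```python
def short_time_acf(frame):
    """r[n] = sum over m = 0 .. N-1-n of x[m] * x[m+n], for n = 0 .. N-1."""
    n_samples = len(frame)
    return [
        sum(frame[m] * frame[m + lag] for m in range(n_samples - lag))
        for lag in range(n_samples)
    ]


def normalised_acf(frame):
    """ACF divided by its value at lag zero, so that the maximum is 1."""
    r = short_time_acf(frame)
    if not r or r[0] == 0:
        return [0.0] * len(r)
    return [v / r[0] for v in r]


acf = short_time_acf([1, 2, 3])
acf_norm = normalised_acf([1, 2, 3])
```

The two nested loops (over lags and over samples) give the O(N^2) cost per frame when all N lags are evaluated.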

Short-Time Autocorrelation function

(Kondoz, 2004)
Short-Time Autocorrelation function

The normalized height of the highest local peak is proportional to the degree of voicing

⇒ it can be used for the V/U decision
Algorithm
(for a frame)

(Trần Văn Tâm, 2019)
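
The flowchart itself is not reproduced here. A per-frame procedure in the same spirit could look like the sketch below, which is not the cited author's algorithm: it searches only the lags that correspond to F0 between 70 Hz and 400 Hz, takes the largest normalised ACF value in that range (rather than doing explicit local-peak detection), and compares it with a voicing threshold. The threshold value 0.3 and the function name are assumptions.

```python
import math


def estimate_f0_acf(frame, sample_rate, f0_min=70.0, f0_max=400.0,
                    voicing_threshold=0.3):
    """Return an F0 estimate in Hz, or 0.0 if the frame is judged unvoiced."""
    n = len(frame)
    energy = sum(v * v for v in frame)  # ACF at lag zero
    if energy == 0:
        return 0.0
    # Lag range that corresponds to the allowed F0 range.
    min_lag = math.ceil(sample_rate / f0_max)
    max_lag = min(math.floor(sample_rate / f0_min), n - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[m] * frame[m + lag] for m in range(n - lag)) / energy
        if r > best_val:
            best_lag, best_val = lag, r
    if best_lag == 0 or best_val < voicing_threshold:
        return 0.0
    return sample_rate / best_lag


# Toy voiced frame: 40 ms of a 100 Hz sine sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 100.0 * m / sample_rate) for m in range(640)]
f0 = estimate_f0_acf(tone, sample_rate)
```

For real speech the frame should span at least two pitch periods of the lowest expected F0, otherwise the peak at the true period may fall outside the frame.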

Short-Time Autocorrelation function

! Autocorrelation peak detection

! Determine a suitable threshold for the V/U decision

! Reduce the scope of the search

" F0 ranges from 70 Hz to 400 Hz ⇒ this bounds the range of lags in which to search for the maximum
Short-Time Autocorrelation function

! Be careful with virtual pitch values
" Lucky frame ⇒ correct F0
" Unlucky frame ⇒ incorrect F0

Average Magnitude Difference Function

! The AMDF of a signal indicates how different the signal is from a shifted copy of itself

! Definition

(n: lag, m: sample index)

! Application: for a periodic signal x, the AMDF is zero at every lag that is an integer multiple of the period of the waveform
" For a quasi-periodic signal ⇒ local minima (dips)
Average Magnitude Difference Function
(Ex. w/ 4 frames)

(Kondoz, 2004)
Average Magnitude Difference Function

! Short-time AMDF of a frame

n: lag (samples)
N: frame length (samples)

! Computationally cheaper than the ACF: it needs only subtractions and absolute values, no multiplications

! Uses a similar algorithm to the ACF and has similar problems
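
A minimal sketch of AMDF-based F0 estimation for one frame. It assumes the form d[n] = (1/(N−n)) Σ |x[m] − x[m+n]| with m from 0 to N−1−n (normalisation conventions vary, and the slide's exact formula is not known), searches the lags corresponding to 70-400 Hz, and picks the deepest dip. No voiced/unvoiced decision is made here; the function names are illustrative.

```python
import math


def short_time_amdf(frame, lag):
    """Mean absolute difference between the frame and its copy shifted by lag."""
    count = len(frame) - lag
    if count <= 0:
        raise ValueError("lag must be smaller than the frame length")
    return sum(abs(frame[m] - frame[m + lag]) for m in range(count)) / count


def estimate_f0_amdf(frame, sample_rate, f0_min=70.0, f0_max=400.0):
    """Return the F0 (Hz) at the lag with the smallest AMDF, or 0.0."""
    min_lag = math.ceil(sample_rate / f0_max)
    max_lag = min(math.floor(sample_rate / f0_min), len(frame) - 1)
    if max_lag < min_lag:
        return 0.0  # frame too short to cover the lag range
    best_lag = min(range(min_lag, max_lag + 1),
                   key=lambda lag: short_time_amdf(frame, lag))
    return sample_rate / best_lag


# Toy voiced frame: 40 ms of a 100 Hz sine sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 100.0 * m / sample_rate) for m in range(640)]
f0 = estimate_f0_amdf(tone, sample_rate)
```

When several multiples of the period fall inside the lag range, their dips have nearly the same depth, so picking the global minimum can return a sub-multiple of the true F0.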

Homework
The members of each group discuss and divide the tasks among themselves, stating clearly which student does which task (no overlaps):
- 1a (speech vs. silence segmentation)
- 1b (voiced vs. unvoiced segmentation)
- 2a (F0 estimation using the autocorrelation function)
- 2b (F0 estimation using the AMDF).
Enter your task (1a/1b/2a/2b) in the group list link.
Deadline: before next week's class.
After this deadline, any student who has not entered a task is considered not to have taken part in the group assignment and will receive 0 points for the midterm exam.
Lecture 6.3
Frequency-domain features & applications

! Outline:
1. Frequency-domain pitch (F0) estimation
Theory of CTFS (continuous-time Fourier series)

A periodic signal x(t) with fundamental period T0 has a line spectrum with uniform spacing F0 = 1/T0 (F0: fundamental frequency of x(t))
Main idea

The spectrum of a periodic signal has a harmonic structure in which the distance between adjacent harmonics equals F0

⇒ The frame-based solution consists of 2 steps:

! Estimate the spectrum using the FFT (a fast algorithm for computing the DFT)

! Detect the spacing between adjacent harmonics (i.e., spectral lines)

Spectrum estimation using FFT

! Important parameters when using the function fft(x,N)

" Window function to reduce spectral leakage (Hamming/Hann)

" Number of FFT points N (number of frequency-domain sampling points)

% Spectral resolution = Sampling frequency / N

% A larger N gives better resolution ⇒ more accurate F0 estimates

% But too large an N ⇒ over-detailed spectrum ⇒ harmonics are harder to detect

% N should be chosen with care

! The log magnitude spectrum should be used, because it reduces the dynamic range between spectral peaks
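
The parameters above can be illustrated with a small, dependency-free sketch. It evaluates the DFT directly with `cmath` instead of calling an FFT routine (slower, but the values are the same as an N-point FFT of the zero-padded frame), applies a Hamming window, and returns the log magnitude in dB. The function names and the small offset added before the logarithm are assumptions.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def log_magnitude_spectrum(frame, n_fft):
    """Log magnitude (dB) of the windowed, zero-padded frame, bins 0..n_fft/2."""
    window = hamming(len(frame))
    x = [v * w for v, w in zip(frame, window)]
    spectrum = []
    for k in range(n_fft // 2 + 1):
        # Zero-padded samples contribute nothing, so only len(frame) terms.
        acc = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n_fft)
                  for m in range(len(x)))
        spectrum.append(20.0 * math.log10(abs(acc) + 1e-12))
    return spectrum


sample_rate = 8000
n_fft = 512
resolution_hz = sample_rate / n_fft  # spacing between frequency bins

# Toy frame: 256 samples of a 1000 Hz sine.
frame = [math.sin(2 * math.pi * 1000.0 * m / sample_rate) for m in range(256)]
spec = log_magnitude_spectrum(frame, n_fft)
peak_bin = max(range(len(spec)), key=spec.__getitem__)
peak_hz = peak_bin * resolution_hz
```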
Harmonics spacing detection

! Detect all harmonic peaks in the estimated spectrum

! Measure F0 as either the common divisor of these harmonic frequencies or the spacing between adjacent harmonics

! Note:
" Harmonic peaks appear more clearly in the low-frequency range (<2 kHz)

! Algorithm:
" Self-proposed (searching for spectral peaks in the low-frequency range)

" Harmonic product spectrum (HPS)

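A minimal sketch of the harmonic product spectrum idea: for each candidate bin k, multiply the magnitudes at k, 2k, ..., Rk and pick the k with the largest product. The sketch is an illustration under several assumptions that do not come from the slides: a Hann window, a direct DFT instead of an FFT call, R = 3 harmonics, and a search limited to 70-400 Hz. All function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame, n_fft, n_bins):
    """|DFT| of the Hann-windowed, zero-padded frame for bins 0..n_bins-1."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    x = [v * w for v, w in zip(frame, window)]
    return [
        abs(sum(x[m] * cmath.exp(-2j * math.pi * k * m / n_fft)
                for m in range(n)))
        for k in range(n_bins)
    ]


def estimate_f0_hps(frame, sample_rate, n_fft, n_harmonics=3,
                    f0_min=70.0, f0_max=400.0):
    """Return the F0 (Hz) whose first n_harmonics multiples are all strong."""
    k_min = max(1, math.ceil(f0_min * n_fft / sample_rate))
    k_max = math.floor(f0_max * n_fft / sample_rate)
    k_max = min(k_max, (n_fft // 2) // n_harmonics)
    if k_max < k_min:
        return 0.0
    mag = magnitude_spectrum(frame, n_fft, n_harmonics * k_max + 1)

    def product(k):
        p = 1.0
        for r in range(1, n_harmonics + 1):
            p *= mag[r * k]
        return p

    best = max(range(k_min, k_max + 1), key=product)
    return best * sample_rate / n_fft


# Toy voiced frame: 50 ms of a 200 Hz tone with 5 harmonics at 8 kHz.
sample_rate = 8000
frame = [
    sum(math.sin(2 * math.pi * 200.0 * h * m / sample_rate) / h
        for h in range(1, 6))
    for m in range(400)
]
f0 = estimate_f0_hps(frame, sample_rate, n_fft=800)
```

The F0 estimate is quantised to the bin spacing (sampling frequency / N), which is one reason the number of FFT points matters for accuracy.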