Speech Signal Processing Core
Valerio Velardo
Sound
Frequency
Amplitude
Phase
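A minimal sketch of how frequency, amplitude, and phase shape a simple wave, assuming a pure sinusoid y(t) = A · sin(2πft + φ). The function name, the 44100 Hz sample rate, and the example values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def sine_wave(frequency_hz, amplitude, phase_rad, duration_s, sample_rate=44100):
    """Return (t, y) with y(t) = A * sin(2*pi*f*t + phi)."""
    n_samples = int(round(duration_s * sample_rate))
    t = np.arange(n_samples) / sample_rate
    y = amplitude * np.sin(2 * np.pi * frequency_hz * t + phase_rad)
    return t, y

frequency_hz = 440.0
period_s = 1.0 / frequency_hz  # period and frequency are reciprocals

# A phase of pi/2 shifts the wave so that it starts at its peak.
t, y = sine_wave(frequency_hz, amplitude=0.5, phase_rad=np.pi / 2, duration_s=0.01)
```

Changing `amplitude` scales the peak value, while changing `phase_rad` shifts the wave along the time axis without altering its shape.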
Frequency and amplitude
● Logarithmic perception
● Two frequencies are perceived as similar if their ratio is a power of 2 (they are one or more octaves apart)
MIDI notes
440 Hz 880 Hz
Pitch-frequency chart
Mapping pitch to frequency
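A minimal sketch of the standard equal-temperament mapping, assuming MIDI note 69 (A4) is tuned to 440 Hz and each semitone multiplies the frequency by 2^(1/12). The function names are illustrative.

```python
import math

def midi_to_frequency(midi_note, a4_hz=440.0):
    """Frequency in Hz of a MIDI note number (69 = A4)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12)

def frequency_to_midi(frequency_hz, a4_hz=440.0):
    """Fractional MIDI note number of a frequency in Hz."""
    return 69 + 12 * math.log2(frequency_hz / a4_hz)

a4 = midi_to_frequency(69)  # 440 Hz
a5 = midi_to_frequency(81)  # 12 semitones higher: one octave, double the frequency
c4 = midi_to_frequency(60)  # middle C
```

Twelve semitones up always doubles the frequency, which matches the logarithmic perception of pitch.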
Cents
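A cent is one hundredth of an equal-tempered semitone, so an octave spans 1200 cents. A minimal sketch of the usual interval formula, 1200 · log2(f2 / f1); the function name is illustrative.

```python
import math

def cents_between(f1_hz, f2_hz):
    """Signed interval from f1 to f2 in cents."""
    return 1200 * math.log2(f2_hz / f1_hz)

octave = cents_between(440.0, 880.0)
semitone = cents_between(440.0, 440.0 * 2 ** (1 / 12))
```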
thesoundofai.slack.com
Intensity, loudness, and timbre
The power of sound!
Sound power
● Logarithmic scale
● Measured in decibels (dB)
● Ratio between two intensity values
● Uses a reference intensity: the threshold of hearing (TOH)
Intensity level
log(1) = 0
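A minimal sketch of the intensity level in decibels, assuming the common definition dB(I) = 10 · log10(I / I_TOH) with a threshold of hearing of 10^-12 W/m². Both the constant and the function name are assumptions made for illustration.

```python
import math

I_TOH = 1e-12  # assumed threshold of hearing, in W/m^2

def intensity_level_db(intensity, reference=I_TOH):
    """Intensity level in dB relative to the reference intensity."""
    return 10 * math.log10(intensity / reference)

at_threshold = intensity_level_db(I_TOH)       # ratio is 1, and log(1) = 0
ten_times = intensity_level_db(10 * I_TOH)     # ten times the intensity adds 10 dB
doubled = intensity_level_db(2 * I_TOH)        # doubling adds about 3 dB
```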
Timbre
● Colour of sound
● Difference between two sounds with the same intensity, frequency, and duration
● Described with words like: bright, dark, dull, harsh, warm
What are the features of timbre?
● Timbre is multidimensional
● Sound envelope
● Harmonic content
● Amplitude / frequency modulation
Sound envelope
● Attack-Decay-Sustain-Release Model
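A minimal sketch of an ADSR envelope built from linear segments. The segment durations, the sustain level, and the function name are hypothetical values chosen for illustration; real instruments rarely follow straight lines.

```python
import numpy as np

def adsr_envelope(attack_s, decay_s, sustain_level, sustain_s, release_s,
                  sample_rate=44100):
    """Piecewise-linear amplitude envelope with values in [0, 1]."""
    n_attack = int(round(attack_s * sample_rate))
    n_decay = int(round(decay_s * sample_rate))
    n_sustain = int(round(sustain_s * sample_rate))
    n_release = int(round(release_s * sample_rate))

    attack = np.linspace(0.0, 1.0, n_attack, endpoint=False)            # rise to peak
    decay = np.linspace(1.0, sustain_level, n_decay, endpoint=False)    # fall to sustain
    sustain = np.full(n_sustain, sustain_level)                         # hold
    release = np.linspace(sustain_level, 0.0, n_release)                # fade out
    return np.concatenate([attack, decay, sustain, release])

envelope = adsr_envelope(0.01, 0.05, 0.6, 0.2, 0.1, sample_rate=1000)
```

Multiplying a tone by `envelope` sample by sample shapes how the sound starts, holds, and dies away.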
Complex sound
● Superposition of sinusoids
● A partial is a sinusoid used to describe a sound
● The lowest partial is called the fundamental frequency
● A harmonic partial is a partial whose frequency is an integer multiple of the fundamental frequency
● Inharmonicity indicates a deviation from a harmonic partial
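The points above can be sketched as a sum of sinusoids. The 1/k amplitudes, the 1% stretch used for the inharmonic set, and the helper names are hypothetical choices for illustration.

```python
import numpy as np

def synthesize(partials, duration_s=1.0, sample_rate=44100):
    """Sum sinusoidal partials given as (frequency_hz, amplitude) pairs."""
    t = np.arange(int(round(duration_s * sample_rate))) / sample_rate
    return sum(a * np.sin(2 * np.pi * f * t) for f, a in partials)

def deviation_from_harmonic(frequency_hz, fundamental_hz):
    """Distance in Hz between a partial and its nearest harmonic."""
    k = max(1, round(frequency_hz / fundamental_hz))
    return frequency_hz - k * fundamental_hz

f0 = 220.0  # fundamental: the lowest partial

# Harmonic partials sit at integer multiples of the fundamental.
harmonic_partials = [(k * f0, 1.0 / k) for k in range(1, 6)]

# Inharmonic partials drift away from those multiples (assumed stretch).
inharmonic_partials = [(k * f0 * (1 + 0.01 * k), 1.0 / k) for k in range(1, 6)]

harmonic_sound = synthesize(harmonic_partials)
inharmonic_sound = synthesize(inharmonic_partials)
```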
Harmonic vs inharmonic instruments
Harmonic content
Frequency modulation
● AKA vibrato
● Periodic variation in frequency
● In music, used for expressive purposes
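A minimal sketch of vibrato as a sinusoidal variation of the instantaneous frequency. The carrier, rate, and depth values and the function name are illustrative assumptions.

```python
import numpy as np

def vibrato_tone(carrier_hz, rate_hz, depth_hz, duration_s, sample_rate=44100):
    """Return (instantaneous_frequency, signal) for a tone with vibrato."""
    t = np.arange(int(round(duration_s * sample_rate))) / sample_rate
    # The frequency oscillates around the carrier by +/- depth_hz.
    inst_freq = carrier_hz + depth_hz * np.sin(2 * np.pi * rate_hz * t)
    # The phase is the running integral of the instantaneous frequency.
    phase = 2 * np.pi * np.cumsum(inst_freq) / sample_rate
    return inst_freq, np.sin(phase)

inst_freq, vibrato = vibrato_tone(carrier_hz=440.0, rate_hz=5.0,
                                  depth_hz=6.0, duration_s=1.0)
```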
Amplitude modulation
● AKA tremolo
● Periodic variation in amplitude
● In music, used for expressive purposes
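A minimal sketch of tremolo as a sinusoidal variation of the amplitude around 1. The rate and depth values and the function name are illustrative assumptions.

```python
import numpy as np

def tremolo_tone(carrier_hz, rate_hz, depth, duration_s, sample_rate=44100):
    """Return (amplitude_envelope, signal) for a tone with tremolo."""
    t = np.arange(int(round(duration_s * sample_rate))) / sample_rate
    # The amplitude oscillates between 1 - depth and 1 + depth.
    amplitude = 1.0 + depth * np.sin(2 * np.pi * rate_hz * t)
    return amplitude, amplitude * np.sin(2 * np.pi * carrier_hz * t)

amplitude, tremolo = tremolo_tone(carrier_hz=440.0, rate_hz=5.0,
                                  depth=0.5, duration_s=1.0)
```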
Timbre recap
● Sound is a wave
● Frequency, intensity, timbre
● Pitch, loudness, timbre
What’s up next?
Understanding audio signals for ML
Audio signal
● Representation of sound
● Encodes all info we need to reproduce sound
Houston, we have a problem!
Analog vs digital
Analog signal
● Sampling
● Quantization
Pulse-code modulation
Sampling
Sampling period
T
Locating samples
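A minimal sketch of locating samples in time, assuming uniform sampling: sample n sits at t_n = n · T, and the sampling rate is the reciprocal of the sampling period T. The names are illustrative.

```python
sample_rate = 44100                  # samples per second (Hz)
sampling_period = 1 / sample_rate    # T, seconds between consecutive samples

def sample_time(n, period=sampling_period):
    """Time in seconds of the n-th sample (n starts at 0)."""
    return n * period

first = sample_time(0)
one_second_in = sample_time(44100)
```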
Sampling rate
Why is the sampling rate 44100 Hz?
Nyquist frequency
Nyquist frequency for CD
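A minimal sketch of the Nyquist frequency, the highest frequency a sampled signal can represent without artifacts, which is half the sampling rate. The function name is illustrative.

```python
def nyquist_frequency(sample_rate_hz):
    """Upper limit of frequencies representable at this sampling rate."""
    return sample_rate_hz / 2

cd_nyquist = nyquist_frequency(44100)  # CD audio
```

For CD audio this gives 22050 Hz, which sits just above the commonly quoted upper limit of human hearing of roughly 20 kHz.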
Aliasing
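A minimal sketch of aliasing: a sinusoid above the Nyquist frequency produces the same samples as a lower-frequency one (up to sign). The 30000 Hz test tone and the helper name are illustrative assumptions.

```python
import numpy as np

def aliased_frequency(frequency_hz, sample_rate_hz):
    """Frequency at which a sampled sinusoid actually appears."""
    return abs(frequency_hz - sample_rate_hz * round(frequency_hz / sample_rate_hz))

sample_rate = 44100
n = np.arange(1000)

# 30000 Hz is above the 22050 Hz Nyquist frequency, so it folds back.
alias_hz = aliased_frequency(30000, sample_rate)

too_high = np.sin(2 * np.pi * 30000 * n / sample_rate)
folded = np.sin(2 * np.pi * alias_hz * n / sample_rate)
indistinguishable = np.allclose(too_high, -folded, atol=1e-6)
```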
Quantization
How do we record sound?
ADC (analog-to-digital converter)
How do we reproduce sound?
DAC (digital-to-analog converter)
What’s up next?
How do we extract audio features?
Previously on Audio Processing for ML
● Time-domain features
● Frequency-domain features
● Time-frequency domain features
Time-domain feature pipeline
● ADC
● Framing
● Aggregation (mean, median, GMM)
● Feature value/vector/matrix
Frames
● 512 samples at 44100 Hz: 512 / 44100 s ≈ 11.6 ms
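A minimal sketch of the time-domain pipeline on an already digitized signal: split it into frames, compute one value per frame, then aggregate. RMS energy is used here only as a stand-in feature, and the non-overlapping framing, test tone, and function names are assumptions; GMM aggregation is left out.

```python
import numpy as np

def frame_signal(signal, frame_size):
    """Split a 1-D signal into non-overlapping frames, dropping the remainder."""
    n_frames = len(signal) // frame_size
    return signal[: n_frames * frame_size].reshape(n_frames, frame_size)

def rms_per_frame(frames):
    """Root-mean-square energy of each frame (illustrative feature)."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

sample_rate = 44100
frame_size = 512
frame_duration_ms = 1000 * frame_size / sample_rate  # about 11.6 ms

t = np.arange(sample_rate) / sample_rate
signal = 0.5 * np.sin(2 * np.pi * 440 * t)  # stand-in for ADC output

frames = frame_signal(signal, frame_size)
rms = rms_per_frame(frames)

# Aggregation collapses the per-frame values into a single descriptor.
summary = {"mean": float(np.mean(rms)), "median": float(np.median(rms))}
```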
Frequency-domain feature pipeline
● ADC
● FT
● Feature computation
● Aggregation (mean, median, GMM)
Overlapping frames
● Frame size K
● Hop length
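A minimal sketch of the frequency-domain pipeline: overlapping frames of size K taken every hop length samples, a Fourier transform per frame, one feature per frame, then aggregation. The Hann window, the spectral centroid feature, the parameter values, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def overlapping_frames(signal, frame_size, hop_length):
    """Stack frames of frame_size samples that start every hop_length samples."""
    n_frames = 1 + (len(signal) - frame_size) // hop_length
    return np.stack([signal[i * hop_length: i * hop_length + frame_size]
                     for i in range(n_frames)])

def spectral_centroid(frames, sample_rate):
    """Magnitude-weighted mean frequency of each frame (illustrative feature)."""
    window = np.hanning(frames.shape[1])  # assumed window, tapers frame edges
    magnitudes = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1 / sample_rate)
    return (magnitudes * freqs).sum(axis=1) / magnitudes.sum(axis=1)

sample_rate = 44100
frame_size = 1024   # K
hop_length = 512    # consecutive frames overlap by K - hop_length samples

t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 1000 * t)  # stand-in for ADC output

frames = overlapping_frames(signal, frame_size, hop_length)
centroids = spectral_centroid(frames, sample_rate)
mean_centroid = float(np.mean(centroids))  # aggregation step
```

For a pure 1000 Hz tone the aggregated centroid lands close to 1000 Hz, which is a quick sanity check on the framing and transform steps.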
● Time-domain features