Audio/Speech Signal Processing
An Overview
Application Fields
Audio Processor: FM Broadcasting
Sound Mixer: Music Recording
Synthesizer: Sound Synthesis
Voice call: Noise reduction and Speech Codecs
Signal Processing Tasks
• Audio/Speech Encoding/Decoding - Codecs
(DFT – Spectral Analysis, Filtering & Modifications)
• Audio effects
(FIR/IIR – Digital Filtering & Spectral Modifications)
Audio/Speech Codecs
Voice Call flow through mobile
Echo Cancellation
Speech Codec
Noise Reduction
Approximate data transfer size for 60 sec Call
Raw Data: (Just analog to digital converted data)
Sampling rate: 8000 samples/sec
Storage space for one sample: 8 bits
Total data size = Number of samples * Storage space for one sample
= Samples/sec * Number of seconds * Storage space
= 8000 * 60 * 8 bits = 3840 Kbits
Bit rate = Samples/sec * Storage space for one sample = 64 Kbits/sec
Encoded/Compressed data: (DSP algorithm over sampled digital data)
Bit rate = 6.5 to 13 Kbits/sec (GSM Speech codecs output)
Data size = Transferred bits/sec * Number of seconds
= Bit rate * Number of seconds = (6.5 to 13) * 60 = 390 to 780 Kbits
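The arithmetic above can be checked with a short Python sketch. The function names are illustrative and not from the slides; 1 Kbit is taken as 1000 bits, and the codec rates are the 6.5 and 13 Kbits/sec endpoints quoted for the GSM speech codecs.

```python
def bit_rate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def data_size_bits(rate_bps, seconds):
    """Total number of bits transferred at a constant bit rate."""
    return rate_bps * seconds


CALL_SECONDS = 60

# Raw data: 8000 samples/sec, 8 bits per sample, one channel.
raw_rate = bit_rate_bps(8000, 8)
raw_size = data_size_bits(raw_rate, CALL_SECONDS)

# Encoded data: the two bit-rate endpoints quoted for GSM speech codecs.
codec_rates = (6500, 13000)
codec_sizes = [data_size_bits(r, CALL_SECONDS) for r in codec_rates]

print(f"Raw: {raw_rate / 1000:g} Kbits/sec, {raw_size / 1000:g} Kbits per call")
for rate, size in zip(codec_rates, codec_sizes):
    print(f"Codec at {rate / 1000:g} Kbits/sec: {size / 1000:g} Kbits "
          f"({raw_size / size:.1f}x smaller than raw)")
```

The same bit_rate_bps helper gives the CD-quality figure used later in the deck: 44100 samples/sec, 16 bits and 2 channels come to about 1411 Kbits/sec.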
Audio Quality Measure
Audio 1: Raw audio at 1411 Kbps
Audio 2: Compressed audio at 128 Kbps
Audio 3: Compressed audio at 32 Kbps
Signal Compression in Frequency domain
Audio/Speech Codecs
Spectrogram: Frequency variation with time
[Figure: spectrograms (frequency vs. time) of 1411 Kbps raw audio, 128 Kbps MP3-encoded audio, and 32 Kbps MP3-encoded audio]
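A spectrogram of this kind can be computed by slicing the signal into overlapping frames and taking the magnitude of each frame's DFT. The following is a minimal pure-Python sketch; the frame length, hop size, Hann window and the 1 KHz test tone are assumptions chosen for illustration, and a real tool would use an FFT library instead of this direct DFT.

```python
import cmath
import math


def hann_window(length):
    """Periodic Hann window, used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]


def dft_magnitudes(frame):
    """Magnitude of the DFT for bins 0 .. N/2 (direct O(N^2) evaluation)."""
    size = len(frame)
    mags = []
    for k in range(size // 2 + 1):
        acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size)
                  for n in range(size))
        mags.append(abs(acc))
    return mags


def spectrogram(x, frame_len=64, hop=32):
    """Return a list of frames; each frame is a list of bin magnitudes."""
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * window[n] for n in range(frame_len)]
        frames.append(dft_magnitudes(frame))
    return frames


# Illustrative input: a 1 KHz tone sampled at 8 KHz.
FS = 8000
tone = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(256)]
spec = spectrogram(tone)

# Bin k corresponds to k * FS / frame_len Hz, so 1 KHz falls in bin 8.
peak_bins = [frame.index(max(frame)) for frame in spec]
print(peak_bins)
```

Rows of the result are time frames and columns are frequency bins, which is the transpose of how the picture is usually drawn.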
Audio and Speech Codecs
Audio Frequency Range: 20Hz – 20KHz
Speech Frequency Range: 300Hz – 3500Hz
Speech Codecs: (Linear Prediction approach)
AMR, G.723
bitrate: about 4.75 - 12.2 Kbits/sec (AMR narrowband modes); 5.3 or 6.3 Kbits/sec (G.723.1)
Sampling rate: 8 - 16 KHz
Audio Codecs : (MDCT, Psychoacoustics analysis)
MP3, AAC
bitrate: 32-768 Kbits/sec
Sampling rate: 8 - 48 KHz
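Speech codecs built on linear prediction model each sample as a weighted sum of the previous few samples. As a rough illustration of the prediction step only (not of AMR or G.723 themselves), the sketch below estimates the weights with the autocorrelation method and the Levinson-Durbin recursion. The function name, the prediction order and the synthetic test signal are assumptions.

```python
def lpc_coefficients(x, order):
    """Estimate a[1..order] such that x[n] ~ sum_k a[k] * x[n - k].

    Uses the autocorrelation method with the Levinson-Durbin recursion.
    Returns (coefficients, residual_energy).
    """
    # Autocorrelation values r[0..order].
    r = [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
         for lag in range(order + 1)]
    if r[0] == 0:
        raise ValueError("signal has zero energy")

    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # signal is already perfectly predicted
        # Reflection coefficient for this order.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err


# Synthetic test signal: x[n] = 0.9 * x[n - 1], starting from 1.0.
signal = [0.9 ** n for n in range(200)]
coeffs, residual = lpc_coefficients(signal, order=2)
print(coeffs)  # first weight close to 0.9, second close to 0
```

A codec would run this on short frames of speech (typically a few tens of milliseconds) and quantise the resulting weights before transmission.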
Audio/Sound Effects – Android Apps
Audio Effects
• Intelligent Loudness Control (Automatic Gain Control)
• Wideband Automatic Noise Removal (WANR)
• Envelope/Stereo Processing
• Voice/Vocal Enhancement
• Bass Enhancement
• Sibilant/Fricative Smoothing
• Dynamic Listening Fatigue Reduction (DLFR)
• Multi-Band Graphic Equalizer (Equalizer)
• Low Pass Filtering
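Several of the effects listed above reduce to digital filtering. As one example, low pass filtering can be sketched as a windowed-sinc FIR filter. The tap count, Hamming window, cutoff and test tones below are assumptions for illustration; they are not taken from any particular Android app.

```python
import math


def lowpass_fir_taps(cutoff_hz, fs_hz, num_taps=101):
    """Design a low-pass FIR filter: ideal sinc response times a Hamming window."""
    fc = cutoff_hz / fs_hz          # cutoff as a fraction of the sampling rate
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        m = n - mid
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise so the gain at 0 Hz is 1


def fir_filter(x, taps):
    """Direct-form convolution; samples before the start are treated as zero."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


FS = 8000
taps = lowpass_fir_taps(cutoff_hz=1000, fs_hz=FS)


def measured_gain(freq_hz):
    """Steady-state gain for a sine tone, skipping the start-up transient."""
    n_samples = len(taps) + 800
    x = [math.sin(2 * math.pi * freq_hz * n / FS) for n in range(n_samples)]
    y = fir_filter(x, taps)
    return rms(y[len(taps):]) / rms(x[len(taps):])


gain_low = measured_gain(200)    # well below the cutoff: passes almost unchanged
gain_high = measured_gain(3000)  # well above the cutoff: strongly attenuated
print(gain_low, gain_high)
```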
Echo Effect: Information in Time domain
Signal delay:
y(t) = x(t) + decay*x(t-delay)
Raw Sound:
Echoed Sound:
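In discrete time the delay formula becomes y[n] = x[n] + decay * x[n - D], where D is the delay in samples (the delay in seconds times the sampling rate). A minimal sketch, with the function name and the example values as assumptions:

```python
def add_echo(x, delay_samples, decay):
    """Return y[n] = x[n] + decay * x[n - delay_samples].

    The output has the same length as the input, so any echo that would
    fall after the last input sample is dropped.
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    y = [float(v) for v in x]
    for n in range(delay_samples, len(x)):
        y[n] += decay * x[n - delay_samples]
    return y


# A single click followed by silence: the echo shows up 2 samples later.
click = [1.0, 0.0, 0.0, 0.0, 0.0]
echoed = add_echo(click, delay_samples=2, decay=0.5)
print(echoed)  # [1.0, 0.0, 0.5, 0.0, 0.0]
```

For audio sampled at 8000 samples/sec, a 0.25 sec echo corresponds to delay_samples = 2000.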
Bass Enhancement: Information in Frequency domain
Subwoofer: reproduces low-pitched audio frequencies known as bass (e.g., drum sounds)
Frequency range : 20-200Hz
Bass system frequency response
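One simple way to emphasise the 20-200 Hz range is to isolate the low frequencies with a low-pass filter and add a scaled copy back to the original signal. The sketch below uses a one-pole low-pass filter; the filter choice, the 200 Hz cutoff, the gain and the test tones are all assumptions, and production bass enhancers are typically more elaborate.

```python
import math


def one_pole_lowpass(x, cutoff_hz, fs_hz):
    """Simple recursive (IIR) low-pass: state moves a fraction alpha toward the input."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)
    y = []
    state = 0.0
    for v in x:
        state += alpha * (v - state)
        y.append(state)
    return y


def bass_boost(x, fs_hz, cutoff_hz=200.0, gain=1.0):
    """Add gain times the low-passed signal back onto the original."""
    low = one_pole_lowpass(x, cutoff_hz, fs_hz)
    return [v + gain * l for v, l in zip(x, low)]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def tone(freq_hz, fs_hz, seconds=1.0):
    count = int(fs_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(count)]


FS = 44100
bass_tone = tone(50, FS)       # inside the 20-200 Hz bass range
treble_tone = tone(5000, FS)   # far above it

bass_gain = rms(bass_boost(bass_tone, FS)) / rms(bass_tone)
treble_gain = rms(bass_boost(treble_tone, FS)) / rms(treble_tone)
print(bass_gain, treble_gain)  # the bass tone is amplified far more than the treble tone
```

Because the boost increases the signal level, a real implementation would also need to guard against clipping, for example by scaling the output.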
Resources
QA Community:
Signal Processing Stack Exchange
http://dsp.stackexchange.com/
Open Source Contribution:
Audacity: Free Audio Editor and Recorder
audacity.sourceforge.net/
FFmpeg (solution to record, convert and stream audio and video)
https://www.ffmpeg.org/
Indian Research Start-ups:
• ATC Labs, Noida
• Violet 3D, Bangalore
• Akshar Speech Technologies, Hyderabad
Research Labs:
• Fraunhofer Institute, Germany
• Dolby Laboratories
• Philips Research
• DTS/SRS Labs
Acknowledgment
Special thanks to Prof. Naren Naik and ATC Labs, Noida, India.
Thanks for your time.