
Audio/Speech Signal Processing: An Overview

This document discusses audio/speech signal processing. It provides an overview of application fields like FM broadcasting, music recording, and sound synthesis. It also describes common signal processing tasks like audio encoding/decoding using codecs, and digital filtering for audio effects. Specific techniques are explained like echo cancellation in voice calls, frequency-domain compression in codecs, and time-domain processing for echo effects. Resources for further learning about audio signal processing are provided.

Uploaded by paul
Copyright © All Rights Reserved

Audio/Speech Signal Processing

An Overview
Application Fields

Audio Processor: FM Broadcasting

Sound Mixer: Music Recording

Synthesizer: Sound Synthesis

Voice Call: Noise Reduction and Speech Codecs

Signal Processing Tasks

• Audio/Speech Encoding/Decoding: Codecs
  (DFT: spectral analysis, filtering and modifications)

• Audio Effects
  (FIR/IIR: digital filtering and spectral modifications)
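The DFT turns a block of N samples into N complex frequency coefficients, X[k] = sum over n of x[n] · exp(−j2πkn/N). The following is a minimal, unoptimized Python sketch of that definition (O(N²) operations); practical codecs rely on fast transforms instead. The function name `naive_dft` and the 1 kHz test tone are illustrative assumptions.

```python
import numpy as np

def naive_dft(x: np.ndarray) -> np.ndarray:
    """Direct evaluation of X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)                        # column of frequency indices
    dft_matrix = np.exp(-2j * np.pi * k * n / N)
    return dft_matrix @ x

# Spectral analysis of a test tone (illustrative values)
fs = 8000                                       # sampling rate in Hz
N = 160                                         # 20 ms block
t = np.arange(N) / fs
x = np.sin(2 * np.pi * 1000 * t)                # 1 kHz tone
X = naive_dft(x)
peak_bin = int(np.argmax(np.abs(X[: N // 2])))  # search positive frequencies only
peak_hz = peak_bin * fs / N                     # bin spacing is fs / N = 50 Hz
print(peak_bin, peak_hz)
```

The result agrees with `np.fft.fft(x)`; the FFT computes the same coefficients in O(N log N) time.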
Audio/Speech Codecs

Voice call flow through a mobile phone (processing blocks):

Echo Cancellation
Speech Codec
Noise Reduction
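Echo cancellation is commonly done with an adaptive filter that learns the loudspeaker-to-microphone echo path from the far-end signal and subtracts its estimate from the microphone signal. The sketch below uses the normalized LMS (NLMS) algorithm, one standard choice; the slide does not say which algorithm the phone pipeline uses, and the echo path, tap count and step size are hypothetical demo values.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, num_taps=32, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the far-end echo from the microphone signal.

    Returns e[n] = mic[n] - w . buf[n], the echo-cancelled signal.
    """
    w = np.zeros(num_taps)     # adaptive FIR estimate of the echo path
    buf = np.zeros(num_taps)   # latest far-end samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        e = mic[n] - w @ buf                   # error = mic minus estimated echo
        w += mu * e * buf / (eps + buf @ buf)  # normalized LMS update
        out[n] = e
    return out

rng = np.random.default_rng(0)
far_end = rng.standard_normal(8000)                    # 1 s at 8 kHz; white noise stands in for speech
echo_path = np.array([0.0, 0.6, 0.3, -0.2, 0.1])       # hypothetical loudspeaker-to-microphone path
mic = np.convolve(far_end, echo_path)[: len(far_end)]  # microphone hears only the echo in this demo
residual = nlms_echo_canceller(far_end, mic)
before = np.mean(mic[-1000:] ** 2)
after = np.mean(residual[-1000:] ** 2)
print(f"echo power before: {before:.3f}, after: {after:.2e}")
```

In a real call the microphone also carries the near-end talker and noise, so practical cancellers add double-talk detection and residual echo suppression on top of this basic loop.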
Approximate Data Transfer Size for a 60-Second Call

Raw data (analog-to-digital converted samples only):

Sampling rate: 8000 samples/sec
Storage per sample: 8 bits

Total data size = Number of samples * Storage per sample
= Samples/sec * Number of seconds * Storage per sample
= 8000 * 60 * 8 bits = 3840 Kbits

Bit rate = Samples/sec * Storage per sample = 8000 * 8 bits/sec = 64 Kbits/sec

Encoded/compressed data (DSP algorithm applied to the sampled digital data):

Bit rate = 6.5 to 13 Kbits/sec (output of GSM speech codecs)

Data size = Bit rate * Number of seconds
= 6.5 * 60 to 13 * 60 = 390 to 780 Kbits
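The size arithmetic for the 60-second call can be checked in a few lines of Python. The 6.5 and 13 Kbits/sec values are the GSM codec rates quoted on the slide; the compression-ratio line is an added illustration.

```python
def data_size_kbits(bit_rate_kbps: float, seconds: float) -> float:
    """Data size = bit rate * duration."""
    return bit_rate_kbps * seconds

sample_rate = 8000        # samples/sec
bits_per_sample = 8       # storage per sample
seconds = 60              # call duration

raw_rate_kbps = sample_rate * bits_per_sample / 1000       # bit rate of raw PCM
raw_size = data_size_kbits(raw_rate_kbps, seconds)
gsm_rates_kbps = (6.5, 13.0)                               # GSM codec output range from the slide
gsm_sizes = [data_size_kbits(r, seconds) for r in gsm_rates_kbps]
ratios = [raw_rate_kbps / r for r in gsm_rates_kbps]       # compression relative to raw data
print(raw_rate_kbps, raw_size, gsm_sizes)                  # 64.0 3840.0 [390.0, 780.0]
```

The ratios come out to roughly 9.8x and 4.9x, which is the saving the speech codec buys over sending raw 8-bit samples.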
Audio Quality Measure

Audio 1: Raw audio at 1411 Kbps
Audio 2: Compressed audio at 128 Kbps
Audio 3: Compressed audio at 32 Kbps
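The 1411 Kbps raw figure matches uncompressed CD-quality stereo PCM. Assuming 44.1 kHz sampling, 16 bits per sample and two channels (the slide does not state these parameters), the raw bit rate and the compression ratios of the two encoded versions work out as follows:

```python
sample_rate = 44_100     # samples/sec per channel (assumed CD-quality audio)
bits_per_sample = 16     # assumed
channels = 2             # assumed stereo

raw_kbps = sample_rate * bits_per_sample * channels / 1000
ratios = {kbps: raw_kbps / kbps for kbps in (128, 32)}
for kbps, ratio in ratios.items():
    print(f"{kbps} Kbps is about {ratio:.1f} times smaller than {raw_kbps:.1f} Kbps raw")
```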
Signal Compression in the Frequency Domain
Audio/Speech Codecs

Spectrogram: frequency variation with time
(Figure panels: 1411 Kbps raw audio; 128 Kbps MP3-encoded audio; 32 Kbps MP3-encoded audio. Axes: frequency vs. time.)
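A spectrogram is built by splitting the signal into short overlapping frames, windowing each frame and taking the magnitude of its DFT. Below is a minimal NumPy sketch; the Hann window, 256-sample frames and 128-sample hop are assumed parameter choices, not the settings used for the slide's figures.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Magnitude STFT: rows are time frames, columns are frequency bins 0..n_fft/2."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 8000
t = np.arange(fs) / fs                       # 1 second
x = np.sin(2 * np.pi * 1000 * t)             # steady 1 kHz tone
S = spectrogram(x)
peak_bins = S.argmax(axis=1)                 # strongest bin in every frame
print(S.shape, peak_bins[0] * fs / 256)      # bin spacing is fs / n_fft = 31.25 Hz
```

Plotting `S.T` with time on the horizontal axis and frequency on the vertical axis gives the kind of picture shown on the slide; a steady tone appears as a horizontal line.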
Audio and Speech Codecs

Audio frequency range: 20 Hz to 20 kHz
Speech frequency range: 300 Hz to 3500 Hz

Speech codecs (linear prediction approach):
AMR, G.723
Bit rate: roughly 4.75 to 12.2 Kbits/sec (AMR narrowband modes)
Sampling rate: 8 to 16 kHz
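Linear prediction models each speech sample as a weighted sum of the previous p samples, so a codec can broadly send predictor coefficients plus a compact description of the prediction residual instead of raw samples. A minimal sketch of the textbook autocorrelation method with the Levinson-Durbin recursion follows; it is for illustration only, not the exact procedure used by AMR or G.723, and the AR(2) test signal is hypothetical.

```python
import numpy as np

def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns a[0..order] with a[0] = 1, defining the prediction-error filter
    e[n] = sum_k a[k] * frame[n - k].
    """
    n = len(frame)
    r = np.array([frame[: n - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                                  # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1 : 0 : -1]
        k = -acc / err                          # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

# Hypothetical AR(2) test signal: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + w[n]
rng = np.random.default_rng(1)
w = rng.standard_normal(20000)
x = np.zeros_like(w)
for n in range(len(w)):
    x[n] = w[n]
    if n >= 1:
        x[n] += 1.3 * x[n - 1]
    if n >= 2:
        x[n] -= 0.6 * x[n - 2]

a = lpc(x, order=2)                              # expected close to [1, -1.3, 0.6]
residual = np.convolve(x, a)[: len(x)]           # prediction error signal
print(np.round(a, 2), np.var(residual) / np.var(x))
```

The residual has far less energy than the signal itself, which is why it can be coded with fewer bits.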

Audio codecs (MDCT, psychoacoustic analysis):
MP3, AAC
Bit rate: 32 to 768 Kbits/sec
Sampling rate: 8 to 48 kHz
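The MDCT used by these codecs (MP3 applies it after a polyphase filter bank) turns 50%-overlapped frames of 2N samples into N coefficients each; the time-domain aliasing each frame introduces cancels when the inverse transforms are overlap-added. The sketch below shows only that analysis/synthesis skeleton with a sine window, using direct O(N²) matrix products. The quantization driven by psychoacoustic analysis, where the actual compression happens, is omitted, and the frame size and function names are illustrative assumptions.

```python
import numpy as np

def mdct_basis(N):
    """MDCT cosine basis: entry [n, k] = cos(pi/N * (n + 0.5 + N/2) * (k + 0.5)), shape (2N, N)."""
    n = np.arange(2 * N) + 0.5 + N / 2
    k = np.arange(N) + 0.5
    return np.cos(np.pi / N * np.outer(n, k))

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)."""
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))

def analyze(x, N):
    """Pad x, cut 50%-overlapped frames of 2N samples, window them, return MDCT coefficients."""
    padded = np.concatenate([np.zeros(N), x, np.zeros(N + (-len(x)) % N)])
    basis, w = mdct_basis(N), sine_window(N)
    starts = range(0, len(padded) - 2 * N + 1, N)
    coeffs = np.stack([(padded[s : s + 2 * N] * w) @ basis for s in starts])
    return coeffs, len(padded)

def synthesize(coeffs, N, padded_len, orig_len):
    """Inverse MDCT of each frame, window again, overlap-add; aliasing cancels between frames."""
    basis, w = mdct_basis(N), sine_window(N)
    y = np.zeros(padded_len)
    for i, X in enumerate(coeffs):
        # IMDCT scaled by 2/N so that the window applied twice gives unit overall gain
        y[i * N : i * N + 2 * N] += w * (basis @ X) * (2 / N)
    return y[N : N + orig_len]

x = np.random.default_rng(2).standard_normal(1000)
N = 64                                   # hypothetical hop size (128-sample frames)
coeffs, padded_len = analyze(x, N)       # a real codec would quantize `coeffs` here
y = synthesize(coeffs, N, padded_len, len(x))
print(coeffs.shape, np.max(np.abs(y - x)))
```

Without quantization the reconstruction is exact to floating-point precision; the coding loss comes only from how coarsely the coefficients are quantized.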
Audio/Sound Effects: Android Apps
Audio Effects

• Intelligent Loudness Control (Automatic Gain Control)

• Wideband Automatic Noise Removal (WANR)

• Envelope/Stereo Processing

• Voice/Vocal Enhancement

• Bass Enhancement

• Sibilant/Fricative Smoothing

• Dynamic Listening Fatigue Reduction (DLFR)

• Multi-Band Graphic Equalizer (Equalizer)

• Low Pass Filtering
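Low-pass filtering, and FIR filtering in general, can be sketched with the windowed-sinc design method. This is one common FIR design chosen for illustration; the 1 kHz cutoff, 101 taps and Hamming window are assumptions, not values from the slides.

```python
import numpy as np

def lowpass_fir(cutoff_hz: float, fs: float, num_taps: int = 101) -> np.ndarray:
    """Windowed-sinc FIR low-pass filter (Hamming window), normalized to unit DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2      # symmetric around the centre tap
    fc = cutoff_hz / fs                                # cutoff in cycles/sample
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)
    return h / h.sum()

fs = 8000
h = lowpass_fir(1000, fs)
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 3000 * t)   # low tone + high tone
y = np.convolve(x, h, mode="same")                               # filtered signal, delay compensated

H = np.abs(np.fft.rfft(h, n=fs))    # with n = fs, bin index equals frequency in Hz
print(H[200], H[3000])
```

Because the taps are symmetric, the filter has linear phase, and `mode="same"` removes its fixed delay, so the 200 Hz tone comes out aligned with the input while the 3 kHz tone is strongly attenuated.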


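Echo Effect: Information in the Time Domain

Signal delay:
y(t) = x(t) + decay * x(t - delay)
where delay is the echo delay and decay is the gain applied to the delayed copy (typically less than 1).

(Figures: raw sound; echoed sound)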
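In discrete time with sampling rate fs, the delay becomes an integer number of samples D = delay * fs, so y[n] = x[n] + decay * x[n - D]. A minimal NumPy sketch, assuming a single echo as in the formula above (the 0.25 s delay and 0.5 decay are illustrative values):

```python
import numpy as np

def add_echo(x: np.ndarray, fs: int, delay_s: float, decay: float) -> np.ndarray:
    """Discrete-time version of y(t) = x(t) + decay * x(t - delay)."""
    d = int(round(delay_s * fs))            # delay in samples
    y = np.concatenate([x, np.zeros(d)])    # leave room for the echo tail
    y[d:] += decay * x                      # add the delayed, attenuated copy
    return y

fs = 8000
impulse = np.zeros(100)
impulse[0] = 1.0
y = add_echo(impulse, fs, delay_s=0.25, decay=0.5)
print(len(y), np.nonzero(y)[0], y[np.nonzero(y)])
```

Feeding in a single impulse makes the effect easy to see: the output contains the original click at sample 0 and a half-amplitude copy 2000 samples (0.25 s) later.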
Bass Enhancement: Information in the Frequency Domain

Subwoofer: reproduces low-pitched audio frequencies known as bass (e.g., drum sounds)
Frequency range: 20 to 200 Hz

(Figure: bass system frequency response)
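One simple way to realise bass enhancement is to isolate the bass band with a low-pass filter and mix an amplified copy back into the signal, which raises the frequency response below the cutoff while leaving higher frequencies almost unchanged. The sketch below does this with a first-order IIR low-pass; it is an illustrative assumption, not the bass system shown on the slide. The 200 Hz cutoff matches the top of the bass range above, and the gain of 1 (about +6 dB at DC) is chosen for the demo.

```python
import numpy as np

def bass_boost(x: np.ndarray, fs: float, cutoff_hz: float = 200.0, gain: float = 1.0) -> np.ndarray:
    """Add back an amplified copy of the low band: y = x + gain * lowpass(x).

    The low band comes from a one-pole IIR low-pass filter
    low[n] = (1 - a) * x[n] + a * low[n - 1], with a = exp(-2*pi*cutoff/fs).
    """
    a = np.exp(-2 * np.pi * cutoff_hz / fs)
    low = np.zeros(len(x))
    state = 0.0
    for n, sample in enumerate(x):
        state = (1 - a) * sample + a * state
        low[n] = state
    return x + gain * low

fs = 44100
dc = bass_boost(np.ones(5000), fs)                      # 0 Hz: deep in the bass band
nyquist = bass_boost(np.array([1.0, -1.0] * 2500), fs)  # fs/2: far above it
print(dc[-1], abs(nyquist[-1]))
```

At DC the output settles at twice the input, while at the Nyquist frequency it is only about 1.4% louder, so the boost is confined to the low end of the spectrum.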


Resources

QA Community:
Signal Processing Stack Exchange
http://dsp.stackexchange.com/

Open Source Contribution:
Audacity: Free Audio Editor and Recorder
audacity.sourceforge.net/

FFmpeg (solution to record, convert and stream audio and video)
https://www.ffmpeg.org/
Resources

Indian Research Start-ups:
• ATC Labs, Noida
• Violet 3D, Bangalore
• Akshar Speech Technologies, Hyderabad

Research Labs:
• Fraunhofer Institute, Germany
• Dolby Laboratories
• Philips Research
• DTS/SRS Labs
Acknowledgment

Special thanks to Prof. Naren Naik and ATC Labs, Noida, India.
Thanks for your time.
