ADAMA SCIENCE AND TECHNOLOGY
UNIVERSITY
DEPARTMENT OF ELECTRONICS AND COMMUNICATION
ENGINEERING
Advanced Digital Signal Processing
Project presentation
Speech Compression Using DCT
By Geleta Aman
ID_No : PGR 35870/16
ABSTRACT
Speech compression is a fundamental aspect of modern
communication systems, enabling efficient transmission and
storage of audio data. The Discrete Cosine Transform (DCT) has emerged
as a powerful tool in speech compression due to its ability to
concentrate signal energy into a reduced set of coefficients. This paper
presents an analysis of speech compression using the DCT, focusing on
the mathematical underpinnings and practical implementation aspects.
The trade-off between compression ratio and quality is examined with
respect to parameters such as the threshold and the quantization step size.
Evaluation metrics including Signal-to-Noise Ratio (SNR) and Mean
Squared Error (MSE) are utilized to assess the fidelity of the
reconstructed speech signal. Through mathematical analysis and
experimental validation, this study highlights the efficacy of DCT-based
speech compression in achieving significant compression ratios while
preserving perceptual quality. The findings contribute to the
understanding and optimization of speech compression techniques,
paving the way for enhanced audio communication systems in various
domains.
INTRODUCTION
The objective of speech is communication, whether face to face or from one
cell phone to another. The large amount of data involved is a major issue for
transmission and storage. Speech compression is the technology of converting
human speech into an efficiently encoded representation that can later be
decoded to produce a close approximation of the original signal. The major
objective of speech compression is to represent speech with as few bits as
possible while maintaining an acceptable level of quality.
A signal can be compressed by removing the redundancy between neighboring samples. In this paper
the compression technique is implemented in two steps. In the first step, a transform is applied to
the speech signal to obtain a new set of data with smaller values and more repetition. The second
step is the coding (compression) step, which represents the data set in its minimal form using
encoding techniques such as run-length encoding, Huffman encoding, or run-length encoding
followed by Huffman encoding. The performance measures compression factor (CF), signal-to-noise
ratio (SNR), peak signal-to-noise ratio (PSNR), normalized root mean square error (NRMSE), and
retained signal energy (RSE) are computed for the reconstructed speech obtained from the
DCT-based compression technique.
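As an illustration of the coding step, the sketch below shows plain run-length encoding applied to a short list of quantized coefficients. The function names and the sample values are hypothetical and chosen only for this example; Huffman coding of the resulting pairs is not shown.

```python
def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, c) for v, c in runs]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for v, c in runs:
        out.extend([v] * c)
    return out


# Made-up quantized coefficients; long runs of zeros are typical after thresholding.
quantized = [12, -3, 0, 0, 0, 0, 1, 0, 0, 0]
encoded = run_length_encode(quantized)
decoded = run_length_decode(encoded)
print(encoded)
```

The encoding is lossless, so decoding returns the original list; the gain comes only from runs of repeated values, which is why it is applied after the transform rather than to the raw samples.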
Objectives
The four specific objectives of speech compression using DCT are:
• Enhancing data storage efficiency by reducing the size of speech
signals
• Minimizing bandwidth requirements for speech transmission
• Mitigating storage and transmission costs
• Preserving essential speech features while reducing redundancy,
enabling efficient utilization of communication resources in various
applications
Statement of the Problem:
Speech compression is a critical aspect of various applications including
telecommunications, multimedia streaming, and storage systems.
Efficient compression techniques are essential to reduce the storage
requirements and bandwidth usage while maintaining acceptable audio
quality. In this context, the utilization of the Discrete Cosine Transform
(DCT) for speech compression presents a promising approach.
SYSTEM DESIGN AND
MATHEMATICAL ANALYSIS
Methodology for compression of speech signal
In this paper we implement a speech compression technique based on the DCT. With the DCT, a
speech frame is represented by its DCT coefficients, so data operations can be performed on the
coefficients alone. The transform and thresholding do not by themselves compress the signal; they
expose information about the signal that allows the data to be compressed by standard encoding
techniques. Compression is achieved by treating small coefficients as insignificant and discarding
them, and then applying a quantization and encoding scheme to the remaining coefficients.
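The thresholding idea can be sketched on a single frame as follows. This is a minimal illustration, not the implementation used in the project: the orthonormal DCT-II matrix, the helper names `dct_matrix` and `hard_threshold`, the 5% relative threshold, and the synthetic frame (built directly from DCT basis rows, which is an idealized case) are all assumptions made for the example.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that X = C @ x and x = C.T @ X."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    m = np.arange(n).reshape(1, -1)   # time index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # the k = 0 row has a different scale
    return c


def hard_threshold(coeffs, fraction):
    """Zero every coefficient whose magnitude is below fraction * max|coeff|."""
    limit = fraction * np.max(np.abs(coeffs))
    return np.where(np.abs(coeffs) >= limit, coeffs, 0.0)


n = 64
C = dct_matrix(n)

# Idealized synthetic frame: two strong components and one weak one.
frame = 1.0 * C[5] + 0.5 * C[9] + 0.01 * C[40]

coeffs = C @ frame                        # forward DCT
kept = hard_threshold(coeffs, fraction=0.05)
reconstructed = C.T @ kept                # inverse DCT of the surviving coefficients

retained_energy = np.sum(kept ** 2) / np.sum(coeffs ** 2)
print(np.count_nonzero(kept), "of", n, "coefficients kept")
print("retained energy fraction:", retained_energy)
```

Because the transform is orthonormal, the energy of the reconstruction error equals the energy of the discarded coefficients, which is what makes small coefficients safe to drop.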
Steps in Speech Compression using DCT:
• Segmentation: Divide the speech signal into small segments or frames. Each frame typically
consists of a few milliseconds of audio data.
• DCT Transformation: Apply DCT to each frame of the speech signal.
• Quantization: Quantize the DCT coefficients by rounding them to a smaller number of bits or
by using a quantization matrix. This step reduces the precision of the coefficients.
• Entropy Coding: Apply entropy coding techniques (e.g., Huffman coding) to further compress
the quantized coefficients.
• Transmission/Storage: Transmit or store the compressed coefficients along with necessary
side information (e.g., frame size, quantization parameters) to reconstruct the speech signal.
• Reconstruction: At the decoder side, inverse the compression process by applying the inverse
steps: entropy decoding, dequantization, inverse DCT, and frame concatenation.
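The steps above can be sketched end to end as shown below, with entropy coding left out for brevity. This is a hedged illustration only: the frame length of 256 samples, the uniform quantization step of 0.02, the 8 kHz test tone, and the helper names `dct_matrix`, `compress`, and `decompress` are assumptions for the example and are not taken from the project code.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that X = C @ x and x = C.T @ X."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def compress(signal, frame_len=256, step=0.02):
    """Segmentation -> DCT -> uniform quantization; returns integer coefficients."""
    n_frames = int(np.ceil(len(signal) / frame_len))
    padded = np.zeros(n_frames * frame_len)
    padded[:len(signal)] = signal                 # zero-pad the last frame
    frames = padded.reshape(n_frames, frame_len)  # one frame per row
    coeffs = frames @ dct_matrix(frame_len).T     # DCT of every frame
    return np.round(coeffs / step).astype(np.int32)


def decompress(q, n_samples, step=0.02):
    """Dequantization -> inverse DCT -> frame concatenation."""
    frame_len = q.shape[1]
    frames = (q * step) @ dct_matrix(frame_len)   # inverse DCT of every frame
    return frames.reshape(-1)[:n_samples]


# Synthetic stand-in for speech: two tones sampled at an assumed 8 kHz.
fs = 8000
t = np.arange(2048) / fs
signal = 0.6 * np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 450 * t)

q = compress(signal)
restored = decompress(q, len(signal))
rmse = np.sqrt(np.mean((signal - restored) ** 2))
print("frames x coefficients:", q.shape, " RMSE:", rmse)
```

In a complete codec, the integer array `q` would be passed to an entropy coder, and the frame length and step size would be stored as side information so the decoder can invert each stage. A larger step gives more zeros (better compression) at the cost of a larger reconstruction error.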
System Block Diagram
[Figure: system block diagram]
MATHEMATICAL ANALYSIS
Mathematical model
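For reference, one common form of the transform used in schemes of this kind is the orthonormal DCT-II and its inverse, written out below for a frame x(n) of length N. Whether the project used exactly this normalization is an assumption; the symbols X(k) and w(k) are introduced here for illustration.

```latex
X(k) = w(k) \sum_{n=0}^{N-1} x(n)\,
       \cos\!\left(\frac{\pi (2n+1) k}{2N}\right),
       \qquad k = 0, 1, \dots, N-1

x(n) = \sum_{k=0}^{N-1} w(k)\, X(k)\,
       \cos\!\left(\frac{\pi (2n+1) k}{2N}\right),
       \qquad n = 0, 1, \dots, N-1

w(k) =
\begin{cases}
  \sqrt{1/N}, & k = 0 \\
  \sqrt{2/N}, & 1 \le k \le N-1
\end{cases}
```

With this scaling the transform is orthonormal, so the signal energy is the same in both domains and the error introduced by discarding or quantizing coefficients carries over unchanged to the reconstructed samples.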
RESULT AND DISCUSSION
Performance evaluation
To evaluate the overall performance of the proposed speech compression
scheme, several objective tests were carried out. The quality of the
reconstructed signal is assessed using the compression factor,
signal-to-noise ratio, PSNR, and mean square error.
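Signal to Noise Ratio (SNR)
\[ \mathrm{SNR} = 10 \log_{10}\left( \frac{\sigma_x^2}{\sigma_e^2} \right) \]
where \(\sigma_x^2\) is the mean square of the speech signal and \(\sigma_e^2\)
is the mean square difference between the original and reconstructed speech
signals.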
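Peak Signal to Noise Ratio (PSNR)
\[ \mathrm{PSNR} = 10 \log_{10}\left( \frac{N X^2}{\lVert x - x' \rVert^2} \right) \]
where N is the length of the reconstructed signal, X is the maximum
absolute value of the signal x (so \(X^2\) is its peak squared value), and
\(\lVert x - x' \rVert^2\) is the energy of the difference between the
original and reconstructed signals.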
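Normalized Root Mean Square Error (NRMSE)
\[ \mathrm{NRMSE} = \sqrt{ \frac{\sum_n \left( x(n) - x'(n) \right)^2}{\sum_n \left( x(n) - \mu_x \right)^2} } \]
where x(n) is the speech signal, x'(n) is the reconstructed speech signal, and
\(\mu_x\) is the mean of the speech signal.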
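The three measures can be computed as in the sketch below. It assumes the usual definitions, namely SNR as 10·log10 of the ratio of mean-square signal to mean-square error, PSNR as 10·log10 of N times the squared peak over the error energy, and NRMSE as the root of the error energy over the energy of the mean-removed signal. The function names and the toy signal are hypothetical.

```python
import numpy as np


def snr_db(x, x_rec):
    """SNR in dB: mean-square signal over mean-square reconstruction error."""
    return 10 * np.log10(np.mean(x ** 2) / np.mean((x - x_rec) ** 2))


def psnr_db(x, x_rec):
    """PSNR in dB: N * (peak |x|)^2 over the energy of the error."""
    peak_sq = np.max(np.abs(x)) ** 2
    return 10 * np.log10(len(x) * peak_sq / np.sum((x - x_rec) ** 2))


def nrmse(x, x_rec):
    """Error energy normalized by the energy of the mean-removed signal."""
    return np.sqrt(np.sum((x - x_rec) ** 2) / np.sum((x - np.mean(x)) ** 2))


# Toy example: the "reconstruction" is the original scaled by 0.9.
x = np.array([1.0, -1.0, 1.0, -1.0])
x_rec = 0.9 * x

print("SNR  (dB):", snr_db(x, x_rec))
print("PSNR (dB):", psnr_db(x, x_rec))
print("NRMSE    :", nrmse(x, x_rec))
```

For this toy signal the error is 10% of each sample, so SNR and PSNR both come out at 20 dB and NRMSE at 0.1; on real speech the peak is usually well above the RMS level, so PSNR is larger than SNR.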
Results
The results obtained for the speech signal using DCT-based compression,
including the error, PSNR, and signal sizes, are summarized in Table 1.
Table 1. Results of DCT-based speech compression
No | Error (MSE) | PSNR    | RMSE     | Size before compression | Size after decompression
1  | 3.0587e+04  | 21.8790 | 174.8914 | 110033                  | 110033
CONCLUSION
In conclusion, speech signal compression can be achieved through
various methods, and one of the simplest and most effective approaches
is the Discrete Cosine Transform (DCT). By applying the DCT, we can
identify the coefficients that fall below a threshold, discard them, and
thereby reduce the size of the signal.
While numerous other transforms and techniques exist for speech signal
compression, the DCT stands out as a simple and widely adopted method.
Its effectiveness lies in its ability to represent the signal compactly
in the frequency domain, enabling significant reductions in data size
while preserving the essential information in the speech signal.
Thank you