DCT Sample Solved
PART A
Answer All Questions. Each Question Carries 3 Marks
1. Specify different quantities used to measure the performance of a data compression technique
Compression ratio - the ratio of the number of bits required to represent the data before compression to the number of bits required to represent the data after compression.
Rate - the average number of bits required to represent a single sample.
In lossy compression, the reconstruction differs from the original data. The difference between the original and the reconstruction is often called the distortion.
Fidelity and quality - when we say that the fidelity or quality of a reconstruction is high, we mean that the difference between the reconstruction and the original is small.
(If asked for 7 marks, examples can be provided.)
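For instance, a small worked example in Python (the image size and compressed size are assumed values, not from the question paper):

import_free_example = True   # no imports needed
original_bits = 256 * 256 * 8          # a 256x256 grayscale image at 8 bits/pixel = 524,288 bits
compressed_bits = 131072               # assumed size after compression
compression_ratio = original_bits / compressed_bits   # 4.0, usually written 4:1
rate = compressed_bits / (256 * 256)                  # 2.0 bits per sample (pixel)
print(compression_ratio, rate)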
2. Explain mathematical model for lossless compression
Physical Models
Physics of the data generation process • In speech-related applications, knowledge
about the physics of speech production can be used to construct a mathematical model
for the sampled speech process • Models for certain telemetry data can also be obtained
through knowledge of the underlying process. • If residential electrical meter readings at hourly intervals were to be coded, knowledge about the living habits of the populace could be used to determine when electricity usage would be high and when it would be low. • Instead of the actual readings, the difference (residual) between the actual readings and those predicted by the model could be coded.
Probability Models
• Ignorance model
assume that each letter that is generated by the source is independent of every
other letter, and each occurs with the same probability
• Probability model
assume that each letter that is generated by the source is independent of every other letter, and each occurs with a different probability. • For a source that generates letters from an alphabet A = {a1, a2, ..., aM}, we can have a probability model P = {P(a1), P(a2), ..., P(aM)}.
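A brief Python sketch (the four-letter alphabet and its probabilities are assumed values, not from the syllabus) showing how a probability model fixes the average number of bits per symbol through the entropy:

import math

# Assumed probability model P = {P(a1), ..., P(aM)} for a four-letter alphabet
P = {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}

# Entropy H = -sum of P(ai) * log2 P(ai), in bits per symbol
H = -sum(p * math.log2(p) for p in P.values())
print(H)                      # 1.75 bits/symbol under this probability model

# Under the ignorance model every letter is equally likely
print(math.log2(len(P)))      # 2.0 bits/symbol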
3. State and prove Kraft-McMillan inequality
Let C be a code with N codewords of lengths l1, l2, ..., lN. If C is uniquely decodable, then

K(C) = Σ (i = 1 to N) 2^(-li) ≤ 1

The Kraft-McMillan inequality thus establishes the relation between a uniquely decodable (UD) code and the lengths of its codewords. The first part provides a necessary condition on the codeword lengths of uniquely decodable codes. The second part shows that we can always find a prefix code that satisfies this necessary condition. Therefore, if we have a uniquely decodable code that is not a prefix code, we can always find a prefix code with the same codeword lengths. In other words, if we have a uniquely decodable code, the codeword lengths have to satisfy the Kraft-McMillan inequality; and, given codeword lengths that satisfy the Kraft-McMillan inequality, we can always find a prefix code with those codeword lengths.
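A minimal Python sketch (the codeword lengths are illustrative assumptions) that checks the Kraft-McMillan sum and constructs a prefix code with the given lengths whenever the inequality holds:

def kraft_sum(lengths):
    # K(C) = sum over codewords of 2^(-li); a UD code must have K(C) <= 1
    return sum(2 ** -l for l in lengths)

def prefix_code(lengths):
    # Canonical construction: assign codewords in order of increasing length
    assert kraft_sum(lengths) <= 1, "no prefix code exists for these lengths"
    codes, value, prev_len = [], 0, 0
    for l in sorted(lengths):
        value <<= (l - prev_len)                     # extend the running value to length l
        codes.append(format(value, "0{}b".format(l)))
        value += 1
        prev_len = l
    return codes

print(kraft_sum([1, 2, 3, 3]))    # 1.0  -> inequality satisfied with equality
print(prefix_code([1, 2, 3, 3]))  # ['0', '10', '110', '111'] - a prefix code with those lengths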
4. Compare Huffman and Arithmetic coding
Arithmetic Coding is more complicated than Huffman coding, but it allows us to code
sequences of symbols. Arithmetic coding is not a good idea if you are going to encode
your message one symbol at a time. As we increase the number of symbols per message,
our results get better and better.
To generate a codeword for a sequence of length m, using the Huffman procedure
requires building the entire code for all possible sequences of length m. If the original
alphabet size was k, then the size of the codebook would be k^m. For the
arithmetic coding procedure, we do not need to build the entire codebook. Instead, we
simply obtain the code for the tag corresponding to a given sequence.
If the alphabet size is relatively large and the probabilities are not too skewed, the
maximum probability pmax is generally small. In these cases, the advantage of
arithmetic coding over Huffman coding is small, and it might not be worth the extra
complexity to use arithmetic coding rather than Huffman coding. However, there are
many sources, such as facsimile, in which the alphabet size is small, and the
probabilities are highly unbalanced. In these cases, the use of arithmetic coding is
generally worth the added complexity.
It is much easier to adapt arithmetic codes to changing input statistics.
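A minimal Python sketch of the interval-narrowing idea behind arithmetic coding (the three-symbol model and its probabilities are assumptions chosen only for illustration); the whole sequence is mapped to a single tag inside the final interval, so no k^m codebook is ever built:

# Assumed cumulative intervals for symbols 'a', 'b', 'c'
model = {"a": (0.0, 0.7), "b": (0.7, 0.9), "c": (0.9, 1.0)}

def arithmetic_tag(sequence):
    low, high = 0.0, 1.0
    for s in sequence:
        s_low, s_high = model[s]
        width = high - low
        high = low + width * s_high     # shrink the interval to the symbol's slice
        low = low + width * s_low
    return (low + high) / 2             # any value in [low, high) identifies the sequence

print(arithmetic_tag("aab"))            # 0.392 - one number codes the whole sequence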
5. Describe LZ77 approach of encoding a string with the help of an example
In the LZ77 approach, the dictionary is simply a portion of the previously encoded
sequence. The encoder examines the input sequence through a sliding window. The
window consists of two parts, a search buffer that contains a portion of the recently
encoded sequence, and a look-ahead buffer that contains the next portion of the
sequence to be encoded.
To encode the sequence in the look-ahead buffer, the encoder moves a search pointer
back through the search buffer until it encounters a match to the first symbol in the
look-ahead buffer. The distance of the pointer from the look-ahead buffer is called the
offset. The encoder then examines the symbols following the symbol at the pointer
location to see if they match consecutive symbols in the look-ahead buffer.
The number of consecutive symbols in the search buffer that match consecutive
symbols in the look-ahead buffer, starting with the first symbol, is called the length of
the match. The encoder searches the search buffer for the longest match. Once the
longest match has been found, the encoder encodes it with a triple <o,l,c> where o is
the offset, l is the length of the match, and c is the code word corresponding to the
symbol in the look-ahead buffer that follows the match.
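A simplified LZ77 encoder sketch in Python; the window sizes and the input string are assumptions chosen only for illustration. It emits the <offset, length, next-symbol> triples described above:

def lz77_encode(data, search_size=7, lookahead_size=6):
    i, triples = 0, []
    while i < len(data):
        search_start = max(0, i - search_size)
        best_len, best_off = 0, 0
        # Try every starting position in the search buffer
        for j in range(search_start, i):
            length = 0
            # The match may run on into the look-ahead buffer
            while (length < lookahead_size - 1 and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        next_symbol = data[i + best_len]
        triples.append((best_off, best_len, next_symbol))   # <offset, length, symbol>
        i += best_len + 1
    return triples

print(lz77_encode("cabracadabrarrarrad"))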
MPEG uses I, P, and B pictures. They are arranged in groups, where a group can be open or closed. The pictures are arranged in a certain order, called the coding order, but (after being decoded) they are output and displayed in a different order, called the display order. In a closed group, P and B pictures are decoded only from other pictures in the group. In an open group, they can be decoded from pictures outside the group. Different regions of a B picture may use different pictures for their decoding. A region may be decoded from some preceding pictures, from some following pictures, from both types, or from none. A region in a P picture may use several preceding pictures for its decoding, or use none at all, in which case it is decoded using MPEG's intra methods.
The basic building block of an MPEG picture is the macroblock. It consists of a 16×16 block of luminance (grayscale) samples (divided into four 8×8 blocks) and two 8×8 blocks of the matching chrominance samples. The MPEG compression of a macroblock consists mainly in passing each of the six blocks through a discrete cosine transform, which creates decorrelated values, then quantizing and encoding the results. It is very similar to JPEG compression, the main differences being that different quantization tables and different code tables are used in MPEG for intra and nonintra coding, and the rounding is done differently.
A picture in MPEG is organized in slices. A slice is a contiguous set of macroblocks (in raster order) that have the same grayscale (i.e., luminance component). The concept of slices makes sense because a picture may often contain large uniform areas, causing many contiguous macroblocks to have the same grayscale. A slice can continue from scan line to scan line. When a picture is encoded in nonintra mode (i.e., it is encoded by means of another picture, normally its predecessor), the MPEG encoder generates the differences between the pictures, then applies the DCT to the differences, followed by quantization.
The existence of the hearing threshold suggests an approach to lossy audio compression: simply delete any audio samples that are below the threshold. Since the threshold depends on the frequency, the encoder needs to know the frequency spectrum of the sound being compressed at any time.
The range of audible frequencies can therefore be partitioned into a number of critical
bands that indicate the declining sensitivity of the ear (rather, its declining resolving
power) for higher frequencies. We can think of the critical bands as a measure similar
to frequency. However, in contrast to frequency, which is absolute and has nothing to
do with human hearing, the critical bands are determined according to the sound
perception of the ear. Thus, they constitute a perceptually uniform measure of
frequency.
Two more properties of the human hearing system are used in audio compression: frequency masking and temporal masking. Frequency masking (also known as auditory masking) occurs when a sound that we can normally hear (because it is loud enough) is masked by another sound with a nearby frequency. The masking sound raises the normal threshold in its vicinity (the dashed curve), with the result that a nearby sound "x", which would normally be audible because it is above the threshold, is now masked and is inaudible. A good lossy audio compression method should identify this case and delete the signals corresponding to sound "x", because it cannot be heard anyway. This is one way to lossily compress sound. The width of the frequency masking range depends on the frequency. It varies from about 100 Hz for the lowest audible frequencies to more than 4 kHz for the highest.
Temporal masking may occur when a strong sound A of frequency f is preceded or
followed in time by a weaker sound B at a nearby (or the same) frequency. If the time
interval between the sounds is short, sound B may not be audible.
Part B
(Answer any one question from each module. Each question carries 14 Marks)
10. (a) Explain mathematical model for lossy compression and lossless compression
(10)
Mathematical model for lossless compression
If the experiment is a source that puts out symbols Ai from a set A , then the entropy is
a measure of the average number of binary symbols needed to code the output of the
source.
Physical Models • Physics of the data generation process • In speech-related
applications, knowledge about the physics of speech production can be used to
construct a mathematical model for the sampled speech process • Models for certain
telemetry data can also be obtained through knowledge of the underlying process
• If residential electrical meter readings at hourly intervals were to be coded, knowledge about the living habits of the populace could be used to determine when electricity usage would be high and when it would be low. • Instead of the actual readings,
the difference (residual) between the actual readings and those predicted by the model
could be coded.
Probability Models • Ignorance model • assume that each letter that is generated by the
source is independent of every other letter, and each occurs with the same probability •
Probability model • assume that each letter that is generated by the source is independent of every other letter, and each occurs with a different probability. • For a source that generates letters from an alphabet A = {a1, a2, ..., aM}, we can have a probability model P = {P(a1), P(a2), ..., P(aM)}.
Mathematical model for lossy compression
• Uniform distribution - this is an ignorance model. If we do not know anything about the distribution of the source output, except possibly the range of values, we can use the uniform distribution to model the source.
• Laplacian distribution - many sources that we deal with have distributions that are quite peaked at zero. For example, speech consists mainly of silence, so samples of speech will be zero or close to zero with high probability. Image pixels themselves do not have any attraction to small values; however, there is a high degree of correlation among pixels, so a large number of the pixel-to-pixel differences will have values close to zero. In these situations a Gaussian distribution is not a very close match to the data, and the sharply peaked Laplacian distribution is a better model.
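A small numeric sketch in Python (the common unit variance is an arbitrary assumption) showing why the Laplacian density models zero-peaked sources better than the Gaussian:

import math

sigma = 1.0   # assumed common standard deviation for both densities

def gaussian_pdf(x):
    return math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def laplacian_pdf(x):
    b = sigma / math.sqrt(2)              # Laplacian with variance sigma^2 has scale b
    return math.exp(-abs(x) / b) / (2 * b)

for x in (0.0, 0.5, 2.0):
    print(x, round(gaussian_pdf(x), 3), round(laplacian_pdf(x), 3))
# At x = 0 the Laplacian density (about 0.707) is far more peaked than the Gaussian
# (about 0.399), matching data such as speech samples or pixel-to-pixel differences.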
1. Models
If the experiment is a source that puts out symbols Ai from a set A , then the entropy is
a measure of the average number of binary symbols needed to code the output of the
source.
Probability Models • Ignorance model • assume that each letter that is generated by the
source is independent of every other letter, and each occurs with the same probability •
Probability model • assume that each letter that is generated by the source is
independent of every other letter, and each occurs with the different probability • For a
source that generates letters from an alphabet A= a1, a2,….aM , we can have a
probability model P= Pa1, Pa2,…….PaM .
2. Coding
Uniquely Decodable Codes
A code is distinct if each codeword is distinguishable from every other codeword (i.e., the mapping from source messages to codewords is one-to-one). A distinct code is uniquely decodable if every codeword is still identifiable when embedded in a sequence of codewords, that is, if the original source sequence can be reconstructed perfectly from the encoded binary sequence. Unique decodability ensures that codewords can be recognized unambiguously in the received signal, so that the decoding process is the exact inverse of the encoding process. Formally, a code is uniquely decodable if the extension mapping C+ : AX+ → AZ+ (from sequences of source symbols to sequences of code symbols) is one-to-one.
A prefix code is a variable-size code that satisfies the prefix property. This property
requires that once a certain bit pattern has been assigned as the code of a symbol, no
other codes should start with that pattern (the pattern cannot be the prefix of any other
code).
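A minimal Python sketch (the example codes are assumed, not from the answer above) that tests the prefix property just described:

def is_prefix_code(codewords):
    # A code is a prefix code if no codeword is a prefix of another codeword
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_code(["0", "10", "110", "111"]))   # True  -> instantaneously decodable
print(is_prefix_code(["0", "01", "11"]))           # False -> "0" is a prefix of "01"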
13. (a) With the help of a flowchart, discuss RLE text compression for the text data given below
‘ABBBBBBBBBCDEEEEF’ (10)
The compressed text will be
A@9BCD@4EF
(b) Calculate the compression ratio for the example while taking repetitions = 4 (4)
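A Python sketch of the run-length scheme used above (it assumes the '@countX' convention shown in the answer and replaces only runs of four or more repeated characters), followed by the compression-ratio computation for part (b); with one byte per character the ratio works out to 17/10 = 1.7:

def rle_compress(text, threshold=4):
    out, i = [], 0
    while i < len(text):
        run = 1
        while i + run < len(text) and text[i + run] == text[i]:
            run += 1
        if run >= threshold:
            out.append("@{}{}".format(run, text[i]))   # e.g. nine B's -> "@9B"
        else:
            out.append(text[i] * run)                  # short runs are copied as-is
        i += run
    return "".join(out)

original = "ABBBBBBBBBCDEEEEF"
compressed = rle_compress(original)
print(compressed)                        # A@9BCD@4EF
print(len(original), len(compressed))    # 17 10
print(len(original) / len(compressed))   # compression ratio = 1.7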
In Huffman coding, codes are assigned to characters in such a way that the code assigned to one character is not the prefix of the code assigned to any other character; the algorithm builds the code tree in a bottom-up manner.
The principal difference between Huffman coding and Shannon-Fano coding is that Huffman coding constructs an optimal variable-length prefix code from the symbol probabilities, whereas Shannon-Fano coding only guarantees codeword lengths that satisfy the Kraft inequality and form a prefix code.
Huffman coding works by translating the characters contained in a data file into a binary code. The most frequently occurring characters in the file are given the shortest binary codes, and the least frequently occurring characters are given the longest binary codes.
The Shannon-Fano algorithm also uses the probabilities of the data to encode it, but it does not ensure optimal code generation. It is a technique for constructing prefix codes from a group of symbols and their probabilities. Huffman coding relies on the prefix-code condition, while Shannon-Fano coding uses the cumulative distribution function.
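A compact Python sketch of bottom-up Huffman code construction (the symbol frequencies are assumed values): the two least-frequent subtrees are merged repeatedly, so the most frequent symbols end up with the shortest codes:

import heapq

def huffman_codes(freq):
    # Each heap entry: (weight, tie-breaker, {symbol: code-so-far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # the two least-frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Assumed frequencies: 'a' is most common and receives the shortest code
print(huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))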
(b) How does Huffman coding handle the unpredictability of the input data stream? (4)
15. (a) Explain in detail the working of LZ78 with an example and dictionary tree
(10)
The LZ77 approach implicitly assumes that like patterns will occur close together.
It makes use of this structure by using the recent past of the sequence as the dictionary
for encoding.
This means that any pattern that recurs over a period longer than that covered by the coder window will not be captured. For example, with a periodic sequence whose period is nine symbols, if the search buffer had been just one symbol longer, the sequence could have been significantly compressed.
The LZ78 algorithm solves this problem by dropping the reliance on the search buffer
and keeping an explicit dictionary.
This dictionary has to be built at both the encoder and decoder, and care must be taken
that the dictionaries are built in an identical manner.
The inputs are coded as a double < i, c> with i being an index corresponding to the
dictionary entry that was the longest match to the input, and c being the code for the
character in the input following the matched portion of the input.
As in the case of LZ77, the index value of 0 is used in the case of no match. This double
then becomes the newest entry in the dictionary.
each new entry into the dictionary is one new symbol concatenated with an existing
dictionary entry
Consider, for example, encoding the sequence wabba␢wabba␢wabba␢wabba␢woo␢woo␢woo, where ␢ stands for space. Initially, the dictionary is empty, so the first few symbols encountered are encoded with the index value set to 0. The first three encoder outputs are <0, C(w)>, <0, C(a)>, <0, C(b)>. The fourth symbol is a b, which is the third entry in the dictionary. If we append the next symbol, we would get the pattern ba, which is not in the dictionary, so we encode these two symbols as <3, C(a)> and add the pattern ba as the fourth entry in the dictionary.
Continuing in this fashion, the encoder output and the dictionary develop entry by entry; a small encoder sketch is given below.
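A short LZ78 encoder sketch in Python (the input string follows the wabba example above, with '_' written in place of the space symbol); it emits the <index, character> doubles and grows the dictionary exactly as described:

def lz78_encode(data):
    dictionary = {}            # pattern -> index (1-based); index 0 means "no match"
    output, current = [], ""
    for ch in data:
        if current + ch in dictionary:
            current += ch                                    # keep extending the longest match
        else:
            output.append((dictionary.get(current, 0), ch))  # emit <index, character>
            dictionary[current + ch] = len(dictionary) + 1   # newest dictionary entry
            current = ""
    if current:                                              # flush a trailing match, if any
        output.append((dictionary[current], ""))
    return output, dictionary

pairs, table = lz78_encode("wabba_wabba_wabba_wabba_woo_woo_woo")
print(pairs[:5])   # [(0, 'w'), (0, 'a'), (0, 'b'), (3, 'a'), (0, '_')]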
(b) Illustrate with an example how the compression factor of LZW differs from that of LZ78 (4)
Compression ratio = size of the original image / size of the compressed image. Using LZW, compression of 60-70% can typically be achieved for monochrome images and text files with repeated data.
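A short LZW encoder sketch in Python for contrast (the input string is an assumption): unlike LZ78, the dictionary is primed with every single character, so the output is a stream of indices only, with no explicit character field:

def lzw_encode(data):
    # Dictionary initialized with all single characters of the input alphabet
    dictionary = {ch: i for i, ch in enumerate(sorted(set(data)))}
    output, current = [], ""
    for ch in data:
        if current + ch in dictionary:
            current += ch
        else:
            output.append(dictionary[current])        # indices only - no character field
            dictionary[current + ch] = len(dictionary)
            current = ch
    output.append(dictionary[current])
    return output

print(lzw_encode("wabba_wabba_wabba_wabba_woo_woo_woo"))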
16. (a) How do quantization and coding help in compression, and what is their role in JPEG?
(6)
Quantization
After each 8×8 data unit of DCT coefficients Gij is computed, it is quantized. This is
the step where information is lost (except for some unavoidable loss because of finite
precision calculations in other steps). Each number in the DCT coefficients matrix is
divided by the corresponding number from the particular “quantization table” used, and
the result is rounded to the nearest integer. Three such tables are needed, for the three
color components. The JPEG standard allows for up to four tables, and the user can
select any of the four for quantizing each color component. The 64 numbers that
constitute each quantization table are all JPEG parameters. In principle, they can all be
specified and fine-tuned by the user for maximum compression. JPEG software
normally uses the following two approaches:
1. Default quantization tables. Two such tables, for the luminance (grayscale) and
the chrominance components, are the result of many experiments performed by the
JPEG committee. They are included in the JPEG standard and are reproduced here
This is how JPEG reduces the DCT coefficients with high spatial frequencies.
2. A simple quantization table Q is computed, based on one parameter R specified by the user. A simple expression such as Qij = 1 + (i + j) × R guarantees that the quantization coefficients (QCs) start small at the upper-left corner and get bigger toward the lower-right corner.
If the quantization is done correctly, very few nonzero numbers will be left in the DCT
coefficients matrix, and they will typically be concentrated in the upper-left region.
These numbers are the output of JPEG, but they are further compressed before being
written on the output stream. In the JPEG literature this compression is called "entropy coding."
Entropy coding uses three techniques to compress the 8×8 matrix of integers:
1. The 64 numbers are collected by scanning the matrix in zigzag order. This produces a
string of 64 numbers that starts with some nonzeros and typically ends with many
consecutive zeros. Only the nonzero numbers are output (after further compressing
them) and are followed by a special end-of-block (EOB) code. This way there is no need to output the trailing zeros (we can say that the EOB is the run-length encoding of all the trailing zeros).
2. The nonzero numbers are compressed using Huffman coding.
3. The first of those numbers (the DC coefficient) is treated differently from the others
(the AC coefficients).
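A Python sketch of the quantization and zigzag steps described above, using the simple table Qij = 1 + (i + j) × R from approach 2; the DCT coefficient block and the value of R are made-up numbers, not JPEG's default tables:

# Assumed 8x8 block of DCT coefficients (made-up values) and user parameter R
N, R = 8, 2
G = [[200 - 12 * (i + j) if i + j < 6 else 0 for j in range(N)] for i in range(N)]

Q = [[1 + (i + j) * R for j in range(N)] for i in range(N)]   # simple table Qij = 1 + (i+j)R
quantized = [[round(G[i][j] / Q[i][j]) for j in range(N)] for i in range(N)]

# Zigzag scan: walk the anti-diagonals i + j = s, alternating direction
zigzag = []
for s in range(2 * N - 1):
    diag = [(i, s - i) for i in range(N) if 0 <= s - i < N]
    if s % 2 == 0:
        diag.reverse()
    zigzag.extend(quantized[i][j] for i, j in diag)

# Drop the trailing zeros and mark the end of block, as entropy coding does
while zigzag and zigzag[-1] == 0:
    zigzag.pop()
print(zigzag + ["EOB"])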
Coding
Each 8×8 matrix of quantized DCT coefficients contains one DC coefficient [at
position (0, 0), the top left corner] and 63 AC coefficients. The DC coefficient is a
measure of the average value of the 64 original pixels constituting the data unit. In a
continuous-tone image the average values of the pixels in adjacent data units are close.
DC coefficient of a data unit is a multiple of the average of the 64 pixels constituting
the unit. This implies that the DC coefficients of adjacent data units don’t differ much.
JPEG outputs the first one (encoded), followed by differences (also encoded) of the DC
coefficients of consecutive data units. If the first three 8×8 data units of an image have
quantized DC coefficients of 1118, 1114, and 1119, then the JPEG output for the first
data unit is 1118 (Huffman encoded, see below) followed by the 63 (encoded) AC
coefficients of that data unit. The output for the second data unit will be 1114 - 1118 =
-4 (also Huffman encoded), followed by the 63 (encoded) AC coefficients of that data
unit, and the output for the third data unit will be 1119 - 1114 = 5 (also Huffman
encoded), again followed by the 63 (encoded) AC coefficients of that data unit.
17. (a) With the help of equations discuss Composite and Components Video (7)
(b) Differentiate the major changes in MPEG - 2 and MPEG-4 Video (7)
MPEG-2 extends the basic MPEG system to provide compression support for TV
quality transmission of digital video. To understand why video compression is so
important, one has to consider the vast bandwidth required to transmit uncompressed
digital TV pictures
Because the MPEG-2 standard provides good compression using standard algorithms,
it has become the standard for digital TV. It has the following features:
• Traditionally, methods for compressing video have been based on pixels. Each video frame is a rectangular set of pixels, and the algorithm looks for correlations between pixels in a frame and between frames. The compression paradigm adopted for MPEG-4, in contrast, is based on objects ("coding of audiovisual objects").
• Defining objects, such as a flower, a face, or a vehicle, and then describing how each
object should be moved and manipulated in successive frames.
• A flower may open slowly, a face may turn, smile, and fade, a vehicle may move
toward the viewer and appear bigger.
• MPEG-4 includes an object description language that provides for a compact description of both objects and their movements and interactions.
• Another important feature of MPEG-4 is interoperability. This term refers to the ability to exchange any type of data, be it text, graphics, video, or audio.
• Interoperability is possible only in the presence of standards. All devices that produce data, deliver it, and consume (play, display, or print) it must obey the same rules and read and write the same file structures.
OR
18. (a) Describe in details about functionalities for MPEG-4 (8)
Content-based multimedia access tools
• The MPEG-4 standard should provide tools for accessing and organizing audiovisual
data.
• Such tools may include indexing, linking, querying, browsing, delivering files, and
deleting them
Frame Segmentation: The current frame is divided into equal-size, nonoverlapping blocks. The blocks may be squares or rectangles.
• Large blocks reduce the chance of finding a match, and small blocks result in many motion vectors. In practice, block sizes that are integer powers of 2, such as 8 or 16, are used.
Search Threshold: Each block B in the current frame is first compared to its
counterpart C in the preceding frame. If they are identical, or if the difference
between them is less than a preset threshold, the encoder assumes that the block hasn’t
been moved.
Block Search:
• This is a time-consuming process
• If B is the current block in the current frame, then the previous frame has to be
searched for a block identical to or very close to B.
• The search is normally restricted to a small area (called the search area) around B,
defined by the maximum displacement parameters dx and dy.
• These parameters specify the maximum horizontal and vertical distances, in pixels,
between B and any matching block in the previous frame.
• If B is a square with side b, the search area will contain (b + 2dx)(b + 2dy) pixels and will consist of (2dx + 1)(2dy + 1) distinct, overlapping b×b squares.
• The mean absolute difference measure averages the absolute pixel differences |Bij − Cij| over the block; the mean square difference is a similar measure, where the square, rather than the absolute value, of each pixel difference is used.
• The pel difference classification (PDC) measure counts how many differences |Bij −
Cij | are smaller than the PDC parameter p.
• The integral projection measure computes the sum of a row of B and subtracts from it the sum of the corresponding row of C. The absolute value of the difference is added to the absolute value of the difference of the column sums.
Suboptimal Search Methods: These methods search some, instead of all, the
candidate blocks in the (b+2dx)(b+2dy) area. They speed up the search for a
matching block, at the expense of compression efficiency.
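A brief Python sketch of an exhaustive block search using the mean absolute difference as the matching measure; the frames, block size, and displacement limits are toy values chosen only for illustration:

def mad(frame, prev, bi, bj, ci, cj, b):
    # Mean absolute difference between block B at (bi, bj) in the current frame
    # and a candidate block C at (ci, cj) in the previous frame
    total = sum(abs(frame[bi + i][bj + j] - prev[ci + i][cj + j])
                for i in range(b) for j in range(b))
    return total / (b * b)

def best_motion_vector(frame, prev, bi, bj, b, dx, dy):
    rows, cols = len(prev), len(prev[0])
    best, best_score = (0, 0), float("inf")
    # Exhaustively test the (2dx + 1)(2dy + 1) candidate positions around the block
    for di in range(-dy, dy + 1):
        for dj in range(-dx, dx + 1):
            ci, cj = bi + di, bj + dj
            if 0 <= ci <= rows - b and 0 <= cj <= cols - b:
                score = mad(frame, prev, bi, bj, ci, cj, b)
                if score < best_score:
                    best_score, best = score, (di, dj)
    return best, best_score

# Toy frames: the current frame is the previous frame shifted right by one pixel
prev = [[10 * i + j for j in range(8)] for i in range(8)]
frame = [[prev[i][max(j - 1, 0)] for j in range(8)] for i in range(8)]
print(best_motion_vector(frame, prev, 2, 2, 4, 2, 2))   # ((0, -1), 0.0)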
Motion Vector Correction:
• Once a block C has been selected as the best match for B, a motion vector is computed
as the difference between the upper-left corner of C and the upper-left corner of B.
• Regardless of how the matching was determined, the motion vector may be wrong
because of noise, local minima in the frame, or because the matching algorithm is not
perfect.
• It is possible to apply smoothing techniques to the motion vectors after they have been
calculated, in an attempt to improve the matching. Spatial correlations in the image
suggest that the motion vectors should also be correlated.
Coding Motion Vectors: Two properties of motion vectors help in encoding them:
(1) They are correlated and (2) their distribution is nonuniform
• No single method has proved ideal for encoding the motion vectors.
• Two different methods may perform better:
• Predict a motion vector based on its predecessors in the same row and its predecessors
in the same column of the current frame. Calculate the difference between the
prediction and the actual vector, and Huffman encode it. This algorithm is important. It
is used in MPEG and other compression methods.
• Group the motion vectors in blocks. If all the vectors in a block are identical, the block is encoded by encoding this vector once. Other blocks are encoded as in the first method above. Each encoded block starts with a code identifying its type.
Coding the Prediction Error:
• Motion compensation is lossy, since a block B is normally matched to a somewhat
different block C.
• Compression can be improved by coding the difference between the current uncompressed and compressed frames on a block-by-block basis, and only for blocks that differ significantly.
• The difference is written on the output, following each frame, and is used by the
decoder to improve the frame after it has been decoded.
19. (a) How can the limitations of the Human Auditory System be exploited in audio compression? (7)
The frequency range of the human ear is from about 20 Hz to about 20,000 Hz, but the
ear’s sensitivity to sound is not uniform. It depends on the frequency, and experiments
indicate that in a quiet environment the ear’s sensitivity is maximal for frequencies in
the range 2 kHz to 4 kHz. The existence of the hearing threshold suggests an approach
to lossy audio compression. Just delete any audio samples that are below the threshold.
Since the threshold depends on the frequency, the encoder needs to know the frequency spectrum of the sound being compressed at any time.
The range of audible frequencies can therefore be partitioned into a number of critical
bands that indicate the declining sensitivity of the ear (rather, its declining resolving
power) for higher frequencies. We can think of the critical bands as a measure similar
to frequency. However, in contrast to frequency, which is absolute and has nothing to
do with human hearing, the critical bands are determined according to the sound
perception of the ear. Thus, they constitute a perceptually uniform measure of
frequency.
Two more properties of the human hearing system are used in audio compression: frequency masking and temporal masking. Frequency masking (also known as auditory masking) occurs when a sound that we can normally hear (because it is loud enough) is masked by another sound with a nearby frequency. The masking sound raises the normal threshold in its vicinity (the dashed curve), with the result that a nearby sound "x", which would normally be audible because it is above the threshold, is now masked and is inaudible. A good lossy audio compression method should identify this case and delete the signals corresponding to sound "x", because it cannot be heard anyway. This is one way to lossily compress sound. The width of the frequency masking range depends on the frequency. It varies from about 100 Hz for the lowest audible frequencies to more than 4 kHz for the highest.
Temporal masking may occur when a strong sound A of frequency f is preceded or
followed in time by a weaker sound B at a nearby (or the same) frequency. If the time
interval between the sounds is short, sound B may not be audible.
(b) Discuss the complexity of Layer III compared to others in MPEG Audio
Coding(7)
In the Layer 1 and Layer 2 encoding schemes, the subbands at lower frequencies have a wider bandwidth than the critical bands. This makes it difficult to accurately judge the mask-to-signal ratio. A simple way to increase the spectral resolution would be to decompose the signal directly into a higher number of bands. The spectral decomposition in the Layer III algorithm is performed in two stages.
The Layer III algorithm specifies two sizes for the MDCT, 6 or 18.
OR
20. (a) Discuss Format of Compressed Data and encoding in layer I and II (9)
● Thus, at low frequencies the critical band can have a bandwidth as low as 100 Hz, while at higher frequencies the bandwidth can be as large as 4 kHz. This increase of the threshold has major implications for compression. Here a tone at 1 kHz has raised the threshold of audibility so that the adjacent tone above it in frequency is no longer audible.
● At the same time, while the tone at 500 Hz is audible, because of the increase in the
threshold the tone can be quantized more crudely.
● This is because the increase in the threshold allows us to introduce more quantization noise at that frequency.
Temporal Masking
The temporal masking effect is the masking that occurs when a sound raises the
audibility threshold for a brief interval preceding and following the sound.
● In Figure 16.3 we show the threshold of audibility close to a masking sound. Sounds
that occur in an interval around the masking sound (both after and before the masking
tone) can be masked.
If the masked sound occurs prior to the masking tone, this is called premasking or
backward masking, and if the sound being masked occurs after the masking tone this
effect is called postmasking or forward masking.
● The forward masking remains in effect for a much longer time interval than the
backward masking.