Data Compression UNIT 1

Data Compression
• Data compression is defined as the process whereby information is encoded using fewer bits than it originally occupied. This mainly happens through methods that eliminate duplication and other extraneous information.
• Compression techniques are useful for reducing file sizes for storage, minimizing bandwidth during transmission, and enabling faster uploading/downloading of web content over the internet. Data compression is the process of reducing the size of data files by encoding, restructuring, or modifying the data. It's a way to store and transmit data more efficiently.
• Data compression involves various algorithms and standards that determine how data is compacted. These
different algorithms dictate the methods and rules for reducing the size of files or data streams.
• Compression reduces the cost of storage, increases the speed of algorithms, and reduces the transmission cost. Compression is achieved by removing redundancy, that is, the repetition of unnecessary data.
• “Data compression” is also referred to as “bit rate reduction”, as it is the process of encoding information using fewer bits than the original representation. It reduces the size of a data file by eliminating redundancies and representing data more efficiently, effectively lowering the “bit rate” needed to store or transmit the information.
• In data compression, "bit rate" refers to the number of bits used to represent data per unit
of time, essentially measuring how much information is compressed into a given
timeframe, typically expressed in bits per second (bps) and often used to describe the
quality of compressed audio or video files where a higher bit rate indicates better fidelity
but also a larger file size; essentially.
• When compressing data, a lower bit rate means more compression is applied, resulting in
a smaller file size but potentially lower quality.
• The primary goal of data compression is to achieve bit rate reduction, meaning to
represent data with fewer bits while still maintaining the essential information.
• Compression algorithms identify patterns and redundancies in data, then replace them
with more efficient codes, effectively reducing the number of bits needed to represent
the information.
Advantages OR Need of Data Compression
• Reduced storage space: Compressed files take up significantly less physical storage on hard drives, allowing
for more data to be stored on the same device.
• Faster data transfer: Smaller file sizes transmit quicker over networks, resulting in faster download and
upload speeds, especially beneficial for large files.
• Lower bandwidth usage: By requiring less data to be transferred, compression reduces the strain on network
bandwidth, leading to cost savings for data transmission.
• Cost efficiency: Combining reduced storage needs with lower bandwidth usage translates to significant cost
savings for data storage and transmission.
• Improved network performance: Faster data transfer due to compression can improve the overall
performance of networks and cloud services.
• Efficient backups and archiving: Compression allows for more data to be backed up or archived within a
smaller storage space.
• Data sharing facilitation: Smaller file sizes make it easier to share large datasets across networks, enabling
quicker collaboration.
• Improved Performance: In applications like web browsing, streaming, and gaming, compressed files load
faster, enhancing user experience.
USES:
1) File Compression
• File compression is the most familiar use of data compression. It is used to reduce the size of files, making them more manageable for storage and faster to transfer. Common file compression formats include ZIP, RAR, etc.
• Imagine you have a PowerPoint presentation with numerous high-resolution images, embedded videos, and detailed charts. This large file in its original state could be a hassle to send: it might take a long time to upload, eat up your email storage space, and be challenging for your colleague to download. However, by using a common file compression format like ZIP, you can easily address these issues.
2) Video Compression
• Video compression is the process of reducing the size of a video file while retaining acceptable visual quality. It is essential for streaming videos on the internet, as it allows for more efficient storage and transmission.
• It makes it easier to stream video on the web, especially for users with slow internet speeds, and it allows streaming platforms to deliver high-quality content efficiently.
3) Image Compression: Used in photography and graphic design, it allows for faster loading of digital images and efficient storage of photographs. JPEG is a widely used image compression format that balances image quality against file size.
4) Audio Compression: A technique that reduces the size of audio files; the term is also used for reducing the dynamic range of an audio signal, which makes vocals, instruments, and other sounds smoother and more balanced.
5. Telecommunication
• In telecommunications, data compression reduces the bandwidth required to transmit data. It allows for
faster internet connections, efficient data transfer over networks, and smooth video conferencing. In
video conferencing, the ability to transmit high-quality audio and video data over networks efficiently
is crucial.
Applications of Data Compression
• Multimedia Files: Compressing audio, video and image files to reduce size while maintaining quality,
e.g. MP3 and JPEG formats.
• Email Attachments: This can be done through compressing large attachments to fit within email size
limits like ZIP files.
• Medical Imaging: Medical images are compressed to occupy less space and to ease transmission without losing diagnostic detail, e.g., the DICOM format for medical scans.
• Backup and Recovery: Data is compressed during backup to save space and speed up recovery, since compressed archives can be used for storing the data.
• Database Management: Database records can be compressed to improve query performance and save storage; columnar data compression is an example of this.
• Streaming Services: Video and audio streaming services compress content so that minimal bandwidth is used, e.g., the H.264 codec for video streaming.
• Mobile Applications: App sizes are reduced so that apps download quickly and conserve storage space, by compressing images and other bundled information.
• Document Management: Text documents together with PDFs are compressed by converting them
into a smaller file size that makes it easy to handle, share e.g., using PDF compression tools.
Process or Working of Data Compression
Data compression is the process of reducing the size of a file or data to save storage space and improve
transmission speed. The process involves encoding information using fewer bits than the original representation.
1. Input Data: The original data (text, image, audio, or video) is taken as input.
2. Preprocessing: The data is analyzed to identify redundant or irrelevant parts that can be removed or encoded efficiently.
3. Encoding (Compression Algorithm): A compression algorithm (lossless or lossy) is applied to reduce the data size. Common algorithms include Huffman coding, Run-Length Encoding (RLE), and Lempel-Ziv-Welch (LZW).
4. Compressed Output: The reduced-size data is stored or transmitted. The compressed file is smaller in size than the original.
5. Decompression (Reconstruction): The compressed data is decoded using the corresponding decompression algorithm. Lossless compression restores the original data exactly, while lossy compression approximates the original (a minimal round-trip sketch follows this list).
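To make the compress-store-decompress cycle above concrete, here is a minimal Python sketch using the standard zlib module (a lossless, DEFLATE-based compressor); the sample input is an arbitrary assumption chosen only to show the round trip.

```python
import zlib

# Arbitrary, highly redundant sample input (assumed for illustration only).
original = b"AAAA BBBB AAAA BBBB " * 100

# Encoding step: the lossless algorithm produces a smaller byte string.
compressed = zlib.compress(original, 9)

# Decompression step: lossless compression restores the original data exactly.
restored = zlib.decompress(compressed)

print("original size  :", len(original), "bytes")
print("compressed size:", len(compressed), "bytes")
print("exact reconstruction:", restored == original)  # True for a lossless scheme
```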
Morse Code
• Morse code is considered an early example of data compression because it assigns shorter code
sequences (dots and dashes) to frequently used letters, effectively reducing the amount of data
needed to transmit text by taking advantage of letter frequency patterns, similar to how modern
compression algorithms work by giving shorter codes to more common data elements; essentially, it
compresses text by representing common letters with fewer symbols.
• In order to reduce the average time required to send a message, Samuel Morse assigned shorter sequences to letters that occur more frequently, such as e (·) and a (·−), and longer sequences to letters that occur less frequently, such as q (−−·−) and j (·−−−).
Compression And Reconstruction
• Compression techniques are essential for efficient data storage and transmission. There
are two forms of compression: lossless and lossy.
Lossless Compression
• Lossless compression techniques, as their name implies, involve no loss of information. If data have been losslessly compressed, the original data can be recovered exactly from the compressed data.
• Lossless compression is generally used for applications that cannot tolerate any difference between
the original and reconstructed data.
• Text compression is an important area for lossless compression. It is very important that the
reconstruction is identical to the text original, as very small differences can result in statements with
very different meanings.
• The main advantage of lossless data compression is that we can restore the original data in its
original form after the decompression.
• For example, suppose we compressed a radiological image in a lossy fashion, and the difference
between the reconstruction and the original was visually undetectable. If this image was later
enhanced, the previously undetectable differences may cause the appearance of artifacts that could
seriously mislead the radiologist. Because the price for this kind of mishap may be a human life, it
makes sense to be very careful about using a compression scheme that generates a reconstruction
that is different from the original.
• Lossless data compression is mainly used for sensitive documents and confidential information. Some of the most important lossless data compression techniques are Run-Length Encoding (RLE), Huffman coding, Lempel-Ziv-Welch (LZW), and arithmetic coding.
• Suitable for text and archives: ideal for text-based files, software installations, and backups.
• Smaller file size reduction: reduces file size without compromising quality, though typically by less than lossy compression.
Lossy Compression
• Lossy compression permanently discards information that is judged less important or imperceptible, so higher compression is achieved at the cost of exact reconstruction.
• Mainly MP3 audio, MPEG video, and JPEG image formats make use of lossy data compression. Some very crucial techniques of lossy data compression are (a minimal quantization sketch follows this list):
• Transform Coding
• Quantization
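As a minimal sketch of the quantization idea listed above (not any specific codec), the Python snippet below rounds samples to a step size: the reconstruction only approximates the original, which is exactly the lossy trade-off. The sample values and step size are assumptions.

```python
# Scalar quantization: one basic building block of lossy compression schemes.
step = 4                                            # assumed quantizer step size

original = [12, 13, 15, 22, 23, 27, 31, 40]         # assumed sample values
indices = [round(x / step) for x in original]       # what would be stored or sent
reconstructed = [i * step for i in indices]         # decoder output

print(reconstructed)                                      # approximation of the input
print([x - y for x, y in zip(original, reconstructed)])   # per-sample distortion
```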
Lossy Compression vs. Lossless Compression
• Lossy compression eliminates data that is not noticeable; lossless compression does not eliminate any data.
• In lossy compression, a file cannot be restored or rebuilt in its original form; in lossless compression, a file can be restored to its original form.
• Lossy compression reduces the size of data more than lossless compression does.
• Algorithms used in lossy compression: transform coding, Discrete Cosine Transform, Discrete Wavelet Transform, fractal compression, etc. Algorithms used in lossless compression: Run-Length Encoding, Lempel-Ziv-Welch, Huffman coding, arithmetic coding, etc.
• Lossy compression is used for images, audio, and video; lossless compression is used for text and medical images.
• File quality is low with lossy compression and high with lossless compression.
• Lossy compression is also termed irreversible compression; lossless compression is also termed reversible compression.
Measures of Performance of Compression Algorithms
A compression algorithm can be evaluated in a number of different ways. Here, the amount of compression achieved and how closely the reconstruction resembles the original are the important criteria on which the performance metrics are based. Let us take each one in turn.
1. Compression Ratio
• Data compression ratio, also known as compression power, is a measurement of the relative reduction in
size of data representation produced by a data compression algorithm.
• The compression ratio in data compression is the ratio of the uncompressed size of a file to its compressed
size. It's usually expressed as a ratio or percentage.
• Example: If a file is 10 MB uncompressed and 2 MB compressed, the compression ratio is 5. This can be written as 5:1 or 5/1. If a file is 1,000 KB uncompressed and 500 KB compressed, the compression ratio is 2:1, corresponding to a 50% reduction in size.
• Compression ratio is a universal measurement used in many fields, including video streaming,
file storage, and data transfers.
• To express the reduction as a percentage, subtract the compressed file size from the original file size, then divide by the original file size and multiply by 100.
• Compression ratio therefore measures how much a file or data set is reduced in size through compression, expressed as a ratio or percentage. For example, if a file is originally 1,000 kilobytes (KB) and after compression becomes 500 KB, the ratio is 2:1 and the reduction is 50%.
• Suppose storing an image made up of a square array of 256×256 pixels requires 65,536 bytes.
The image is compressed and the compressed version requires 16,384 bytes. We would say that
the compression ratio is 4:1. We can also represent the compression ratio by expressing the
reduction in the amount of data required as a percentage of the size of the original data. In this
particular example the compression ratio calculated in this manner would be 75%.
• Sometimes the space saving is given instead, which is defined as the reduction in size relative to the uncompressed size:
Space saving = 1 − (compressed size / uncompressed size)
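The small Python sketch below works through the compression ratio and space saving defined above, using the 256×256 image example from these notes (65,536 bytes compressed to 16,384 bytes).

```python
def compression_metrics(original_size, compressed_size):
    """Return (compression ratio, space saving as a percentage)."""
    ratio = original_size / compressed_size
    space_saving = (1 - compressed_size / original_size) * 100
    return ratio, space_saving

# The 256 x 256 image example: 65,536 bytes compressed to 16,384 bytes.
ratio, saving = compression_metrics(65_536, 16_384)
print(f"compression ratio: {ratio:.0f}:1")   # 4:1
print(f"space saving     : {saving:.0f}%")   # 75%
```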
2. DISTORTION
• In order to determine the efficiency of a compression algorithm, we have to have some way of quantifying the difference between the original and the reconstructed data.
• The difference between the original and the reconstruction is called ‘distortion’.
• The most basic distortion measure is the numerical difference between the original and reconstructed
data. However, this may not be the best measure for images, as a small numerical difference may not
be noticeable to the human eye.
• For data that will be perceived by humans, such as images, videos, or audio, the distortion measure
should be based on human perception. For example, in audio compression, perceptual models are used
to estimate what aspects of speech the human ear can hear.
3. Compression rate:
• It is the average number of bits required to represent a single sample.
• Example: In the case of the compressed image considered earlier, if we assume 8 bits per pixel in the original image and the average number of bits per pixel in the compressed representation is 2, then we would say that the compression rate is 2 bits/pixel.
Fidelity and Quality:
• In data compression, "fidelity“ and “Quality” refers to the degree to which a
compressed data accurately represents the original data, essentially measuring how
close the reconstructed data is to the original data.
• It indicates the overall level of acceptable distortion or loss in the compressed data.
• When we say that the fidelity or quality of a reconstruction is high, we mean that
the difference between the reconstruction and the original is small.
• High fidelity implies minimal loss of information during compression due to good
balance between data size reduction and preserving important details.
1. Fidelity
• Refers to the accuracy and faithfulness of the compressed data compared to the original.
• High fidelity means the compressed version is nearly identical to the original.
• It is often used in lossless compression, where no data is lost, ensuring 100% fidelity.
• In lossy compression, fidelity depends on how much important information is retained.
• PNG image retains all pixel data (high fidelity)
2. Quality
• Refers to the perceived experience or subjective assessment of the compressed data.
• High quality means the compressed version looks, sounds, or performs well even if some data
is lost.
• Used mainly in lossy compression, where some data is removed but the result is still acceptable
for human perception.
• JPEG image may lose details but still look good (high quality)
Compression Factor
The Compression Factor in data compression is the ratio of the compressed size to the original
size of the data. It is a measure of how much the data has been reduced during compression.
MODELLING AND CODING
• The development of data compression algorithms for a variety of data can be divided into two phases
i.e., modelling and coding. In data compression, modeling and coding are processes that work together
to reduce the size of data by encoding it using fewer bits.
• The first phase is usually referred to as Modeling. In this phase we try to extract information about any
redundancy that exists in the data and describe the redundancy in the form of a model.
• The model acts as the basis of any data compression algorithm and the performance of any algorithm
depends on how well the model is being formed. Modelling is the process of extracting information
from the source to help the coder choose the right codes.
• The second phase is called Coding. It is the process of encoding information using fewer bits than the
original representation. This can be done by restructuring, modifying, or encoding the data.
• In the coding process, the description of the model and a “description” of how the data differ from the model are generally encoded using binary digits (bits). The difference between the data and the model is often referred to as the residual.
• Many compression algorithms use predictive coding, where they predict the next data point based on
previous data, and the "residual" is the difference between the actual data and the predicted value.
• In lossy compression, some of the residual information might be discarded to achieve higher compression
ratios, while in lossless compression, all residual information is preserved for perfect reconstruction.
• The model is constructed by exploiting the characteristics of the data and the structure or redundancy in the data that need to be compressed. There are a number of ways to characterize the data, and different characterizations will lead to different compression schemes.
• After constructing the compression model, we use the model to obtain compression by encoding the symbols in the data using codewords having fewer bits.
• Data redundancy is considered statistical in nature because it often arises from inherent patterns and correlations within data, where certain information is repeated or highly predictable based on other data points. This means that by analyzing statistical relationships, we can often identify and reduce redundancy within the data to create a model for data compression.
Example 1 of modelling and coding :
Consider the following sequence of numbers:
27 28 29 28 26 27 29 28 30 32 34 36 38
The sequence is plotted in the accompanying figure.
• Suppose we send the first value, then in place of subsequent values we send the difference
between it and the previous value. The sequence of transmitted values would be
27 1 1 −1 −2 1 2 −1 2 2 2 2 2
• The number of distinct values has been reduced. Fewer bits are required to represent each number and compression is achieved. The decoder adds each received value to the previous decoded value to obtain the corresponding reconstruction (a small sketch of this scheme follows below).
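A minimal Python sketch of this differencing (predictive) model and its decoder; it reproduces the transmitted sequence shown above and recovers the original values exactly.

```python
sequence = [27, 28, 29, 28, 26, 27, 29, 28, 30, 32, 34, 36, 38]

def encode(values):
    """Send the first value, then each value minus its predecessor (the residuals)."""
    return [values[0]] + [curr - prev for prev, curr in zip(values, values[1:])]

def decode(residuals):
    """Add each received residual to the previously decoded value."""
    out = [residuals[0]]
    for diff in residuals[1:]:
        out.append(out[-1] + diff)
    return out

encoded = encode(sequence)
print(encoded)                      # [27, 1, 1, -1, -2, 1, 2, -1, 2, 2, 2, 2, 2]
assert decode(encoded) == sequence  # exact (lossless) reconstruction
```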
• NOTE: Often we will encounter sources that generate some symbols more often than others.
In these situations, it will be advantageous to assign binary codes of different lengths to
different symbols. This is discussed in the second example.
Example 2 of modelling and coding :
• Suppose we have the following sequence:
abarayaranbarraybranbfarbfaarbfaaarbaway
• which is typical of all sequences generated by a source. Notice that the sequence is made up of
eight different symbols. In order to represent eight symbols, we need to use 3 bits per symbol.
• Suppose instead we used the code shown in Table 1.1. Notice that we have assigned a codeword
with only a single bit to the symbol that occurs most often, and correspondingly longer
codewords to symbols that occur less often.
• If we substitute the codes for each symbol, we will use 106 bits to encode the entire sequence. As there are 41 symbols in the sequence, this works out to approximately 2.58 bits per symbol. This means we have obtained a compression ratio of 1.16:1.
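Since Table 1.1 is not reproduced in these notes, the sketch below uses an assumed prefix code in the same spirit (shortest codeword for the most frequent symbol) applied to the sequence as printed above; the exact bit counts therefore differ slightly from the 106-bit / 2.58 bits-per-symbol figures quoted, but the comparison against 3 bits/symbol shows the same effect.

```python
from collections import Counter

sequence = "abarayaranbarraybranbfarbfaarbfaaarbaway"
freq = Counter(sequence)            # 'a' is by far the most frequent symbol

# Assumed variable-length prefix code (Table 1.1 is not reproduced here):
# more frequent symbols get shorter codewords.
code = {"a": "0", "r": "10", "b": "110",
        "y": "11100", "f": "11101", "n": "11110", "w": "11111"}

variable_bits = sum(freq[s] * len(code[s]) for s in freq)
fixed_bits = 3 * len(sequence)      # fixed-length coding at 3 bits/symbol

print("fixed-length   :", fixed_bits, "bits")
print("variable-length:", variable_bits, "bits")
print(f"compression ratio ~ {fixed_bits / variable_bits:.2f}:1")
```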
• In data compression, a "codeword" refers to a single, specific binary string assigned to represent a
piece of data (like a character or symbol), while "codes" refers to the entire set of these codewords
used by a compression algorithm to encode data, essentially the mapping between original data and
their compressed representations; each individual mapping within this set is called a codeword.
Codeword:
• A single binary string that represents a specific piece of data within a compression scheme.
Codes: The entire collection of codewords used by a compression algorithm, defining the mapping
between original data and their compressed representations.
Codeword: In a Huffman coding scheme, "01" might be a codeword representing the letter "e".
Codes: The complete set of codewords used by the Huffman algorithm, including "01" for "e", "10" for "t", "11" for "a", etc.
• In probability, a random experiment is a procedure that produces a definite outcome that
cannot be predicted with certainty. An event is a subset of the sample space, which is the
set of all possible outcomes for a random experiment.
• A sample space is a collection or a set of possible outcomes of a random experiment.
• Random experiment: An action that can be repeated under similar conditions but produces an unpredictable outcome each time, like tossing a coin or rolling a die.
• Event: A specific outcome or collection of outcomes from a random experiment, like getting heads on a coin toss or rolling a number greater than 4 on a die.
• Example:
Random experiment: Rolling a six-sided die
Event: Getting an even number (considered as the event "E" which would include the
outcomes 2, 4, and 6)
• For tossing a coin the sample space is = {H,T}; probability of getting a head is denoted as
P(H)=0.5 and probability of getting a tail is denoted as P(T)=0.5
Conditional probability
• Conditional probability is one type of probability in which the possibility of an event depends upon
the existence of a previous event. Conditional probability is the likelihood of an outcome occurring
based on a previous outcome in similar circumstances. In probability notation, this is denoted as A
given B, expressed as P(A|B), indicating that the probability of event A is dependent on the
occurrence of event B.
• Conditional probability is known as the possibility of an event or outcome happening, based on the
existence of a previous event or outcome.
• The main difference between the probability and the conditional probability is that probability is the
likelihood of occurrence of an event say A, whereas the conditional probability defines the
probability of an event by assuming another event has already occurred, i.e. in the conditional
probability of A given B, the event B is assumed to have already occurred.
• Conditional probability is required when some events may occur in relation to the occurrence of
another event.
Example 1: We roll a six-sided die. What is the probability that the roll is a 6, given that the outcome is an even number?
The possible outcomes when rolling a die are: 1, 2, 3, 4, 5, 6. The even numbers are: 2, 4, 6 (3
outcomes). Only one of these is a 6.
So our answer = 1/3
Example 2: Let’s consider two events in tossing two coins be,
A: Getting a head on the first coin.
B: Getting a head on the second coin.
Sample space for tossing two coins is:
S = {HH, HT, TH, TT}
Conditional probability of getting a head on the second coin (B) given that we got a head on the first
coin (A) is = P(B|A)
Since the coins are independent (one coin’s outcome does not affect the other), P(B|A) = P(B) = 0.5
(50%), which is the probability of getting a head on a single coin toss.
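Both conditional-probability examples above can be checked by directly enumerating the sample space, as in this small Python sketch.

```python
from fractions import Fraction

# Two-coin example: enumerate the sample space and count outcomes.
sample_space = ["HH", "HT", "TH", "TT"]
A = [s for s in sample_space if s[0] == "H"]     # head on the first coin
A_and_B = [s for s in A if s[1] == "H"]          # head on both coins
print("P(B|A) =", Fraction(len(A_and_B), len(A)))                      # 1/2

# Die example: P(roll is 6 | roll is even)
even_outcomes = [2, 4, 6]
favourable = [x for x in even_outcomes if x == 6]
print("P(6|even) =", Fraction(len(favourable), len(even_outcomes)))    # 1/3
```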
Mathematical Preliminaries for Lossless
Compression
➢Information Theory
a) Self Information
b) Entropy
INFORMATION THEORY
• Information theory was introduced by Claude Elwood Shannon for the development of lossless data
compression methods.
• Shannon's information theory, in the context of data compression, is a mathematical framework that
defines the theoretical limit of how much data can be compressed without losing information, based
on the concept of "entropy" which measures the randomness or uncertainty within the data source.
• In Shannon's information theory, "entropy" refers to a mathematical measure of the uncertainty or
randomness within a message source, essentially representing the average amount of information
contained in each event within a set of possible outcomes, and is considered a fundamental concept
for understanding how efficiently data can be compressed; the higher the entropy, the more
uncertain the information is and the more bits are needed to represent it effectively.
• Information theory gives the mathematical framework for compression codes. Statistical encoding schemes use the frequency of each character of the source for compression and, by encoding the most frequent characters with the shortest codewords, come close to the entropy limit.
• Shannon defined a quantity called self-information. Suppose we have an event A, which is a set of
outcomes of some random experiment. "self-information" refers to the amount of information
contained within a single event, based on its probability.
• If P(A) is the probability that the event A will occur, then the self-information associated with A is given by:
i(A) = −log_b P(A) = log_b (1 / P(A)), where b is the base of the logarithm.
• Recall that log(1) = 0, and −log(x) increases as x decreases from one to zero. Therefore, if the probability
of an event is low, the amount of self-information associated with it is high; if the probability of an event is
high, the information associated with it is low.
• Likewise, a totally random string of letters will contain more information (in the mathematical sense) than a well-thought-out treatise on information theory.
• Suppose A and B are two independent events. The self-information associated with the occurrence of both event A and event B is:
i(AB) = −log_b [P(A) P(B)] = −log_b P(A) − log_b P(B) = i(A) + i(B)
• Note that to calculate the information in bits, we need to take the logarithm base 2 of the
probabilities.
Example of Self-Information:
Consider a coin that is biased so that a head is much less likely than a tail. Mathematically, the occurrence of a head then conveys much more information than the occurrence of a tail, because the less probable outcome has the larger self-information.
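A small Python sketch of self-information for a fair and a biased coin; the biased-coin probabilities (1/8 for heads, 7/8 for tails) are assumed values for illustration, since the original figure is not reproduced here.

```python
import math

def self_information(p):
    """i(A) = -log2 P(A), measured in bits."""
    return -math.log2(p)

# Fair coin: each outcome carries exactly 1 bit of information.
print(self_information(0.5))            # 1.0

# Biased coin (assumed P(H) = 1/8, P(T) = 7/8): the rare outcome (heads)
# conveys far more information than the common one (tails).
print(self_information(1 / 8))          # 3.0 bits
print(self_information(7 / 8))          # ~0.193 bits
```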
Entropy & Self Information
• Self Information is calculated for a particular event while Entropy is calculated for entire
random experiment.
• Entropy is the average number of bits required to represent or transmit an event drawn from
the probability distribution for the random variable.
• "self-information" refers to the amount of information contained within a single event, based
on its probability, while "entropy" is the average self-information across all possible events
within a system, essentially representing the overall uncertainty or surprise value of a random
variable; meaning self-information is a measure of the surprise of a single event, while entropy
is the average surprise across all possible events.
• Entropy is the average self-information associated with the random experiment, and is denoted by H:
H = Σ P(Ai) · i(Ai) = −Σ P(Ai) log_b P(Ai)
• Here, A1, A2, … depict the independent events in the random experiment. Thus, Ai represents a set of outcomes of some experiment A, and
A1 ∪ A2 ∪ A3 ∪ … ∪ An = A
The entropy for this source is 3.25 bits. This means that the best scheme we could find for coding this sequence could only code it at 3.25 bits/sample.
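The source whose entropy works out to 3.25 bits is not reproduced in these notes, so the Python sketch below simply shows how entropy is computed from any probability model; the two distributions are assumed examples.

```python
import math

def entropy(probabilities):
    """H = -sum(P(Ai) * log2 P(Ai)): the average self-information, in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A uniform source over 8 symbols: entropy is log2(8) = 3 bits/symbol.
print(entropy([1 / 8] * 8))                      # 3.0

# A skewed source needs fewer bits per symbol on average.
print(entropy([0.5, 0.25, 0.125, 0.125]))        # 1.75
```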
Joint probability
• The term joint probability refers to a statistical measure that calculates the likelihood of two
events occurring together and at the same point in time.
• Put simply, a joint probability is the probability of event Y occurring at the same time that
event X occurs.
• When the two events are independent of one another (i.e., they aren't conditional and don't rely on each other), the joint probability is simply the product of the individual probabilities: P(X and Y) = P(X) · P(Y). More generally, P(X and Y) = P(X) · P(Y|X).
Ques. The joint probabilities of the transmitted message are given below. Find H(X) and H(Y).
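The joint-probability table referred to in the question is not reproduced here, so the sketch below uses an assumed 2x2 joint distribution to show the procedure: sum the rows and columns to get the marginals P(X) and P(Y), then apply the entropy formula.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed joint probability matrix P(X = i, Y = j) (for illustration only).
P = [[0.25, 0.10],
     [0.15, 0.50]]

p_x = [sum(row) for row in P]             # marginal distribution of X (row sums)
p_y = [sum(col) for col in zip(*P)]       # marginal distribution of Y (column sums)

print("H(X) =", round(entropy(p_x), 3), "bits")
print("H(Y) =", round(entropy(p_y), 3), "bits")
```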
Types of Models for Data Compression
• Having a good model for the data can be useful in estimating the entropy of the source.
• Good models for sources lead to more efficient compression algorithms.
• In general, in order to develop techniques that manipulate data using mathematical operations,
we need to have a mathematical model for the data.
• Obviously, the better the model (i.e., the closer the model matches the aspects of reality that are
of interest to us), the more likely it is that we will come up with a satisfactory technique.
• There are several approaches to building mathematical models:
1) Physical Model
2) Probability Model
3) Markov Model
4) Composite Source Model
1. Physical Models
• If we know something about the physics of the data generation process, we can use that
information to construct a model.
• It is based on the underlying physical principles and mechanisms that create data in a
system, essentially looking at the natural laws and processes that govern how data is
produced
• For example, in speech-related applications, knowledge about the physics of speech
production can be used to construct a mathematical model for the sampled speech process.
Sampled speech can then be encoded using this model
• In general, however, the physics of data generation is simply too complicated to understand,
let alone use to develop a model.
• If the physics of the problem is too complicated, we can also obtain a model based on
empirical observation of the statistics of the data.
2. Probability Model
• The simplest statistical model for the source is to assume that each letter that is generated by the source is independent of every other letter, and each occurs with the same probability.
• We could call this the ignorance model, as it would generally be useful only when we know nothing about the source.
• The next step up in complexity is to keep the independence assumption but remove the equal
probability assumption and assign a probability of occurrence to each letter in the alphabet.
• For a source that generates letters from an alphabet A = {a1, a2, …, am}, we can have a probability model P = {P(a1), P(a2), …, P(am)}.
• Given a probability model (and the independence assumption), we can compute the entropy of the
source. Using the probability model, we can also construct some very efficient codes to represent the
letters in A.
• If the assumption of independence does not fit with our observation of the data, we can generally find
better compression schemes if we discard this assumption. When we discard the independence
assumption, we have to come up with a way to describe the dependence of elements of the data
sequence on each other.
3. Markov Model
• One of the most popular ways of representing dependence in the data is through the use of
Markov models, named after the Russian mathematician Andrei Andrevich Markov.
• It is a stochastic model that depicts a sequence of possible events where predictions or
probabilities for the next state are based on its previous states
• For models used in lossless compression, we use a specific type of Markov process called a discrete time Markov chain. Let {xn} be a sequence of observations. This sequence is said to follow a kth-order Markov model if
P(xn | xn−1, …, xn−k) = P(xn | xn−1, …, xn−k, xn−k−1, …)
• In other words, knowledge of the past k symbols is equivalent to the knowledge of the entire past history of the process. The values taken on by the set {xn−1, …, xn−k} are called the states of the process.
• The most commonly used Markov model is the first-order Markov model, for which
P(xn | xn−1) = P(xn | xn−1, xn−2, xn−3, …)
• This first-order Markov model equation depicts a sequence of possible events where the probability of the next (future) state depends solely on the previous (current) state, not on the states before it.
• In simple words, the probability that the (n+1)th step will be x depends only on the nth step, not on the complete sequence of steps that came before n.
Markov Chains are characterized by:
States: The distinct conditions or positions that the system can be in.
Transition Probabilities: The probabilities of moving from one state to another, which are
typically represented in a matrix form.
State Transition Diagram: The diagram representing the states involved in the Markov model. It is basically a directed graph in which each arrow represents the transition from one state to another. The sum of the weights of the outgoing arrows from any state is always equal to one.
• Note: Markov models are particularly useful in text compression, where the probability of the
next letter is heavily influenced by the preceding letters.
The Markov chain diagram shown here depicts a simple chain of states, with arrows indicating transition probabilities.
Example: A weather model where states are "Sunny", "Rainy", "Cloudy", and
arrows show the probability of moving from one state to another from one day to
the next.
• For example, consider a binary image. The image has only two types of pixels, white pixels and
black pixels.
• We know that the appearance of a white pixel as the next observation depends, to some extent, on
whether the current pixel is white or black.
• Therefore, we can model the pixel process as a discrete time Markov chain.
• Define two states Sw and Sb. Here, Sw corresponds to the case where the current pixel is a white pixel, and Sb corresponds to the case where the current pixel is a black pixel.
• We define the transition probabilities P(w|b) and P(b|w), and the probability of being in each state, P(Sw) and P(Sb).
• The Markov model can then be represented by the state diagram shown in Figure.
The entropy of a finite state process with states Si is simply the average value of the entropy at each state:
H = Σ P(Si) · H(Si)
This results in an entropy for the Markov model of 0.107 bits, about a half of the entropy
obtained using the probability model.
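The figure with the actual probabilities is not reproduced in these notes; the sketch below assumes the values commonly used for this textbook example (P(Sw) = 30/31, P(Sb) = 1/31, P(b|w) = 0.01, P(w|b) = 0.3), which reproduce the 0.107-bit entropy quoted above.

```python
import math

def binary_entropy(p):
    """Entropy of a two-outcome source with probabilities p and 1 - p."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Assumed state and transition probabilities for the binary-image example.
P_Sw, P_Sb = 30 / 31, 1 / 31      # probability of being in the white / black state
P_b_given_w = 0.01                # P(next pixel black | current pixel white)
P_w_given_b = 0.30                # P(next pixel white | current pixel black)

H_Sw = binary_entropy(P_b_given_w)        # entropy in state Sw
H_Sb = binary_entropy(P_w_given_b)        # entropy in state Sb

H_markov = P_Sw * H_Sw + P_Sb * H_Sb      # average entropy over the states
print(round(H_markov, 3), "bits")         # ~0.107
```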
4. Composite Source Model
• In many applications, it is not easy to use a single model to describe the source. In such
cases, we can define a composite source, which can be viewed as a combination or
composition of several sources, with only one source being active at any given time.
• A composite source can be represented as a number of individual sources Si, each with its own model Mi, and a switch that selects a source Si with probability Pi.
• This is an exceptionally rich model and can be used to describe some very complicated processes.
Coding in Data Compression
• When we talk about coding we mean the assignment of binary sequences to elements of an
alphabet.
• The set of binary sequences is called a code, and the individual members of the set are called
codewords.
• An alphabet is a collection of symbols called letters. For example, the alphabet used in writing
most books consists of the 26 lowercase letters, 26 uppercase letters, and a variety of punctuation
marks.
• The 7-bit ASCII code for the letter a is 1100001, the letter A is coded as 1000001, and the letter “,” is coded as 0101100. Notice that the ASCII code uses the same number of bits to represent each symbol. Such a code is called a fixed-length code.
• If we want to reduce the number of bits required to represent different messages, we need to use a
different number of bits to represent different symbols.
• In data compression, a "fixed length code" assigns the same number of bits to each data symbol,
while a "variable length code" allows different numbers of bits to represent different symbols, with
more frequent symbols typically getting shorter codes, leading to better compression efficiency.
• Fixed length codes provide a consistent bit representation for each symbol, whereas variable length
codes adapt the bit length based on the symbol's probability of occurrence.
In the given table, Code 1 depicts a fixed-length code, whereas Code 2 and Code 3 depict variable-length codes.
Uniquely Decodable Codes
• The average length of the code is not the only important point in designing a “good” code.
• Consider the following example. Suppose our source alphabet consists of four letters a1, a2, a3,
and a4, with probabilities P(a1) = 1/2 , P(a2) = 1/4 , and P(a3) = P(a4) = 1/8. The entropy for this
source is 1.75 bits/symbol.
• The average length l for each code is given by:
l = Σ P(ai) · n(ai)
where n(ai) is the number of bits in the codeword for letter ai, and the average length is given in bits/symbol (a small worked sketch appears at the end of this discussion).
• Consider the codes for this source in Table.
• Based on the average length, Code 1 appears to be the best code. However, to be useful, a code
should have the ability to transfer information in an unambiguous manner.
• This is obviously not the case with Code 1. Both a1 and a2 have been assigned the codeword 0. When a 0 is received, there is no way to know whether an a1 was transmitted or an a2.
• We would like each symbol to be assigned a unique codeword.
• Code 2 does not seem to have the problem of ambiguity; each symbol is assigned a distinct codeword. However, suppose we want to encode the sequence a2 a1 a1.
• Using Code 2, we would encode this with the binary string 100. However, when the string 100 is received at the decoder, there are several ways in which the decoder can decode this string. The string 100 can be decoded as a2 a1 a1, or as a2 a3.
• This means that once a sequence is encoded with Code 2, the original sequence cannot be
recovered with certainty.
• Hence, both Code 1 and Code 2 are not uniquely decodable.
• There is necessity of unique decodability from the code; that is, any given sequence of codewords
can be decoded in one, and only one, way.
• A uniquely decodable code is a code that can be decoded without ambiguity: any sequence of codewords corresponds to one, and only one, sequence of source symbols. (A code in which no codeword is a prefix of another is one way to guarantee this, but it is not the only way.)
• Distinct: A code is distinct if each codeword is different from every other codeword.
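Tying the average-length formula to this example, the sketch below uses the stated probabilities together with assumed codeword assignments in the spirit of the (unshown) table; note that the prefix code's average length of 1.75 bits/symbol matches the entropy quoted above, while the shorter codes pay for their brevity with ambiguity.

```python
# Probabilities of the four-letter source described above.
probs = {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}

# Assumed codeword assignments (the notes' table is not reproduced here).
code1 = {"a1": "0", "a2": "0", "a3": "1", "a4": "10"}       # not even distinct
code2 = {"a1": "0", "a2": "1", "a3": "00", "a4": "11"}      # distinct, not uniquely decodable
code3 = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}   # prefix code

def average_length(code):
    """l = sum of P(ai) * n(ai), in bits/symbol."""
    return sum(probs[s] * len(cw) for s, cw in code.items())

for name, code in [("Code 1", code1), ("Code 2", code2), ("Code 3", code3)]:
    print(name, average_length(code), "bits/symbol")   # 1.125, 1.25, 1.75
```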
Test of Uniquely decodable codes
• In the context of unique decodability in coding theory, a "dangling suffix“ (DS) refers to the
remaining part of a longer codeword when it starts with a shorter codeword as a prefix, essentially
the portion of the longer codeword that cannot be uniquely decoded due to the prefix ambiguity.
• The test for unique decodability requires examining the dangling suffixes initially generated
by codeword pairs in which one codeword is the prefix of the other.
• Form pairs of codewords. For instance, consider two codewords A and B, where A is k bits long and B is n bits long (k < n). If the first k bits of B are identical to A, then A is called a prefix of B, and the remaining last n − k bits are called the dangling suffix.
• Example: A = 010, B = 01001, the dangling suffix is 01.
• If the dangling suffix is itself a codeword, then the code is not uniquely decodable.
• If, after considering all possible pairs of codewords and the dangling suffixes generated from them, no dangling suffix turns out to be a codeword, then the given code is uniquely decodable.
STEPS (see the sketch below):
i) Form all possible pairs of the given codewords. For each pair, perform the following steps:
a) If no codeword is a prefix of another, the given code is uniquely decodable.
b) Otherwise, if a prefix exists, find the dangling suffix.
c) Check whether the dangling suffix is identical to any of the codewords.
If yes, the code is not uniquely decodable; if not, add the DS to the list of codewords.
ii) Repeat the above steps until no new dangling suffix is generated (stop when only previously seen dangling suffixes appear).
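A small Python sketch of the dangling-suffix test described in these steps; the example codeword sets at the bottom are assumed, textbook-style codes, not the ones from the notes' tables.

```python
def dangling_suffix(a, b):
    """If codeword/suffix a is a proper prefix of b, return the dangling suffix."""
    if b.startswith(a) and len(b) > len(a):
        return b[len(a):]
    return None

def is_uniquely_decodable(codewords):
    """Dangling-suffix test: generate suffixes from prefix pairs, then keep
    comparing suffixes against codewords until nothing new appears."""
    codewords = set(codewords)
    suffixes = set()
    for a in codewords:                         # step (i): all codeword pairs
        for b in codewords:
            ds = dangling_suffix(a, b)
            if ds is not None:
                suffixes.add(ds)
    while True:
        new = set()
        for s in suffixes:
            if s in codewords:                  # step (c): a DS equals a codeword
                return False
            for c in codewords:                 # compare DS with codewords both ways
                for ds in (dangling_suffix(s, c), dangling_suffix(c, s)):
                    if ds is not None and ds not in suffixes:
                        new.add(ds)
        if not new:                             # step (ii): no new dangling suffixes
            return True
        suffixes |= new

print(is_uniquely_decodable(["0", "1", "00", "11"]))      # False
print(is_uniquely_decodable(["0", "10", "110", "111"]))   # True (prefix code)
print(is_uniquely_decodable(["0", "01", "011", "0111"]))  # True (not prefix, still UD)
```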
Example 1: Determine whether the code is uniquely decodable or not.
PREFIX CODES
• The test for unique decodability requires examining the dangling suffixes initially generated by
codeword pairs in which one codeword is the prefix of the other.
• If the dangling suffix is itself a codeword, then the code is not uniquely decodable. One type of
code in which we will never face the possibility of a dangling suffix being a codeword is a
code in which no codeword is a prefix of the other.
• In this case, the set of dangling suffixes is the null set, and we do not have to worry about
finding a dangling suffix that is identical to a codeword. A code in which no codeword is a
prefix to another codeword is called a prefix code.
• Prefix codes are also called instantaneous codes.
• These are variable length codes.
• All prefix codes are uniquely decodable.
• A simple way to check if a code is a prefix code is to draw the rooted binary tree corresponding
to the code.
• Draw a tree that starts from a single node (the root node) and has a maximum of two possible
branches at each node. One of these branches corresponds to a 1 and the other branch
corresponds to a 0.
• To check whether a code is a prefix code, we draw the tree with the root node at the top; the left branch corresponds to a 0 and the right branch corresponds to a 1. Using this convention, we can draw the binary trees for Code 2, Code 3, and Code 4 as shown in the figure.
• Note that apart from the root node, the trees have two kinds of nodes—nodes that give rise to
other nodes and nodes that do not.
• The first kind of nodes are called internal nodes, and the second kind are called external nodes
or leaves.
• In a prefix code, the codewords are only associated with the external nodes. A code that is
not a prefix code, such as Code 4, will have codewords associated with internal nodes.
• The code for any symbol can be obtained by traversing the tree from the root to the external
node corresponding to that symbol.
• Each branch on the way contributes a bit to the codeword: a 0 for each left branch and a 1 for
each right branch.
[Figure: binary trees for Code 2 (not a prefix code), Code 3 (prefix code), and Code 4 (not a prefix code)]
• In Code 2, not all of the codewords are leaf (external) nodes; therefore, it is not a prefix code.
• In Code 3, the codewords for a1, a2, a3, and a4 are all external nodes; therefore, it is a prefix code.
• In Code 4, some codewords are associated with internal nodes; therefore, as noted above, Code 4 is not a prefix code (a code that is not a prefix code may still be uniquely decodable).
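Finally, a short sketch of the prefix-code property itself; the codeword sets are assumed examples in the style of the codes discussed above.

```python
def is_prefix_code(codewords):
    """A prefix code: no codeword is a prefix of any other codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_code(["0", "1", "00", "11"]))      # False: "0" is a prefix of "00"
print(is_prefix_code(["0", "10", "110", "111"]))   # True: all codewords are leaves
print(is_prefix_code(["0", "01", "011", "0111"]))  # False: "0" is a prefix of "01"
```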