
Image compression is the process of reducing the size of an image file without significantly degrading its visual quality. It is an essential technique used in various applications, such as digital photography, image storage, transmission over networks, and multimedia systems. The primary goal of image compression is to minimize the file size while preserving the essential information and perceptual fidelity of the image.

Compressing images is an important step before processing larger images or videos. Compression is carried out by an encoder, which outputs a compressed form of the image. Mathematical transforms play a vital role in this process.

Need
Image compression is essential in our digital lives because large image files can cause slow
website loading times, difficulties in sharing images online, and limited storage space. By
compressing images, their size is reduced, making it easier to store and transmit them.

1. Reduced Storage Requirements: Image compression reduces file sizes, allowing for
efficient storage of images on devices with limited storage capacity.
2. Bandwidth Efficiency: Compressed images require less bandwidth, resulting in faster
upload and download times during image transmission over networks.
3. Faster Processing: Smaller file sizes from compression lead to faster loading times and
improved performance in image processing tasks.
4. Cost Reduction: Compression reduces storage, network infrastructure, and data
transfer costs associated with images.
5. Improved User Experience: Smaller compressed images result in quicker website
loading, enhanced multimedia streaming, and better user satisfaction.
6. Compatibility: Compression adapts images to meet the size and format limitations of
different devices and platforms.
7. Archiving and Preservation: Image compression reduces storage requirements,
making it more feasible to archive and preserve large collections of images over time.

Classification
There are two main types of image compression: lossless compression and lossy compression.

Lossless Compression
● Lossless compression algorithms reduce the file size of an image without any loss of
information. The compressed image can be perfectly reconstructed to its original form.
● This method is commonly used in scenarios where preserving every detail is crucial,
such as medical imaging or scientific data analysis.
● Lossless compression achieves compression by exploiting redundancy and eliminating
repetitive patterns in the image data.
● Some common lossless compression algorithms include:
○ Run-Length Encoding (RLE): This algorithm replaces consecutive repetitions of
the same pixel value with a count and the pixel value itself.
○ Huffman coding: It assigns variable-length codes to different pixel values based
on their frequency of occurrence in the image.
○ Lempel-Ziv-Welch (LZW): This algorithm replaces repetitive sequences of pixels
with shorter codes, creating a dictionary of commonly occurring patterns.
● Lossless compression techniques typically achieve modest compression ratios
compared to lossy compression but ensure exact data preservation.
● Common lossless image formats include:
○ RAW - these file types tend to be quite large. Additionally, there are
different versions of RAW, and you may need specific software to edit the files.
○ PNG - compresses images by finding repeated patterns in the image data and
encoding them compactly. The compression is fully reversible, so when a PNG
file is opened, the image is recovered exactly.
○ BMP - a format developed by Microsoft. It is lossless but not frequently
used.

Lossy Compression
● Lossy compression algorithms achieve higher compression ratios by discarding some
information from the image that is less perceptually significant.
● This method is widely used in applications such as digital photography, web images, and
multimedia streaming, where a small loss in quality is acceptable to achieve significant
file size reduction.
● Lossy compression techniques exploit the limitations of human visual perception and the
characteristics of natural images to remove or reduce redundant or less noticeable
details.
● The algorithms achieve this by performing transformations on the image data and
quantizing it to reduce the number of distinct values.
● The main steps involved in lossy compression are:
○ Transform Coding: The image is transformed from the spatial domain to a
frequency domain using techniques like Discrete Cosine Transform (DCT) or
Wavelet Transform. These transforms represent the image data in a more
compact manner by concentrating the energy in fewer coefficients.
○ Quantization: In this step, the transformed coefficients are quantized, which
involves reducing the precision or dividing the range of values into a finite set of
discrete levels. Higher levels of quantization lead to greater compression but also
more loss of information. The quantization process is typically designed to
allocate more bits to visually important coefficients and fewer bits to less
important ones.
○ Entropy Encoding: The quantized coefficients are further compressed using
entropy coding techniques like Huffman coding or Arithmetic coding. These
coding schemes assign shorter codes to more frequently occurring coefficients,
resulting in additional compression.
● The amount of compression achieved in lossy compression is customizable based on
the desired trade-off between file size reduction and visual quality. Different compression
algorithms and settings can be used to balance the compression ratio and the
perceptual impact on the image.
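
To make the quantization step concrete, here is a minimal Python sketch of uniform quantization of transform coefficients. The function names and the step size are illustrative assumptions, not part of any particular standard:

```python
import numpy as np

def quantize(coeffs: np.ndarray, step: float) -> np.ndarray:
    """Uniform quantization: larger steps mean fewer distinct
    levels, more compression, and more loss."""
    return np.round(coeffs / step).astype(np.int32)

def dequantize(levels: np.ndarray, step: float) -> np.ndarray:
    """Approximate reconstruction; the rounding error is the
    information permanently discarded by lossy compression."""
    return levels.astype(np.float64) * step

coeffs = np.array([52.3, -3.1, 0.4, 17.8])
levels = quantize(coeffs, step=4.0)     # [13, -1, 0, 4]
approx = dequantize(levels, step=4.0)   # [52.0, -4.0, 0.0, 16.0]
```

The rounding in `quantize` is exactly where information is lost; increasing `step` raises the compression ratio and the reconstruction error together.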

Methods of compression

Run-length Coding
Run-length coding is a simple and effective technique used in image compression, especially for
scenarios where the image contains long sequences of identical or highly similar pixels. It
exploits the redundancy present in such sequences to achieve compression.

The basic idea behind run-length coding is to represent consecutive repetitions of the same
pixel value with a count and the pixel value itself, instead of explicitly storing each pixel
individually. By doing so, run-length coding reduces the amount of data required to represent
these repetitive patterns.

Say you have a picture of red and white stripes containing 12 white pixels followed by 12 red pixels. Normally, the data would be written as WWWWWWWWWWWWRRRRRRRRRRRR, with W representing a white pixel and R a red pixel. Run-length encoding would store it as 12W 12R: much smaller and simpler, while still keeping the data unaltered.

Here's how run-length coding works for image compression:

Run-length Encoding (RLE):


● Scanning: The image is scanned row by row or column by column. The scanning
direction is not crucial, but it should be consistent for encoding and decoding.
● Finding Runs: During scanning, the algorithm identifies runs, which are sequences of
consecutive pixels with the same value. The length of each run is determined.
● Encoding: For each run, the algorithm stores the length of the run (count) and the pixel
value. This information is typically represented using a pair of values: <count, value>.
The count is usually represented using a fixed number of bits or a variable-length code,
depending on the specific implementation.
● Storing the Encoded Data: The encoded run-length information is stored, usually in a
compressed form, as a sequence of <count, value> pairs or a compressed bitstream.

Run-length Decoding:
● Retrieving the Encoded Data: The encoded run-length data is retrieved from storage.
● Decoding: The decoding process involves reconstructing the original image from the
run-length data. Starting from the first <count, value> pair, the algorithm repeats the
value count times to obtain the sequence of pixels. This process is repeated for each
<count, value> pair until the entire image is reconstructed.
The decoding step essentially reverses the encoding process, reconstructing the original
image by expanding the compressed run-length representation.
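
The encoding and decoding steps above can be expressed as a short Python sketch (the function names are illustrative):

```python
from itertools import groupby

def rle_encode(pixels):
    """Scan the pixel sequence and emit a (count, value) pair
    for each run of identical values."""
    return [(len(list(run)), value) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand each (count, value) pair back into `count`
    repetitions of `value`, reversing the encoding."""
    return [value for count, value in pairs for _ in range(count)]

row = ["W"] * 12 + ["R"] * 12       # the striped-row example above
encoded = rle_encode(row)           # [(12, 'W'), (12, 'R')]
assert rle_decode(encoded) == row   # lossless: exact reconstruction
```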

The original data is not instantly accessible: the entire stream must be decoded before any part of it can be accessed. In addition, the size of the decoded data cannot be known in advance.
Run-length coding is particularly effective for images with areas of solid color or regions with
uniform patterns, such as line drawings, text, or simple graphics. However, it may not be as
efficient for more complex and detailed images, as they tend to have fewer long runs of identical
pixels.

Run-length coding is often used in conjunction with other compression techniques, such as
Huffman coding or arithmetic coding, to achieve higher compression ratios. By combining
run-length coding with these entropy encoding techniques, the frequency of occurrence of
different runs or pixel values can be exploited to assign shorter codes to more frequent patterns,
resulting in additional compression.

Shannon-Fano Coding


Shannon-Fano coding is a technique used for entropy encoding in image compression. It
assigns variable-length codes to different symbols (in this case, pixel values) based on their
probability of occurrence. The codes are designed in a way that ensures a prefix-free property,
meaning that no code is a prefix of another code. Shannon-Fano coding is a precursor to
Huffman coding and provides a foundation for understanding its concepts.

Here's how Shannon-Fano coding works for image compression:


● Probability Calculation:
○ Frequency Counting: The first step is to determine the frequency of occurrence
for each unique pixel value in the image. This is done by counting the number of
times each pixel value appears in the image.
○ Probability Calculation: Once the frequencies are determined, probabilities can
be calculated by dividing each frequency by the total number of pixels in the
image. These probabilities represent the likelihood of encountering each pixel
value.
● Sorting:
○ The pixel values are sorted based on their probabilities in descending order. This
step is essential for subsequent recursive splitting and assigning codes.
● Recursive Splitting:
○ Starting with the sorted list, the pixel values are divided into two groups such that
the sum of probabilities in one group is as close as possible to the sum of
probabilities in the other group. This division is performed recursively until each
group contains only a single pixel value or until the splitting is no longer possible.
● Code Assignment:
○ The codes are assigned to the pixel values based on the recursive splitting. The
assignment process follows a pattern where the left group is assigned a "0" as
the prefix, and the right group is assigned a "1" as the prefix. The splitting and
assignment process continues recursively for each group until individual codes
are assigned to all pixel values.

Note that if two or more pixel values have the same probability, a tie-breaking rule is needed to make the encoding deterministic. For example, tied pixel values can be sorted based on their original order in the image or some other fixed rule.
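
Here is a minimal Python sketch of the recursive splitting and code assignment described above, assuming the symbol list is already sorted by probability in descending order (the function names and sample pixel values are illustrative):

```python
from collections import Counter

def shannon_fano(symbols, prefix=""):
    """symbols: list of (value, probability) pairs, sorted by
    probability in descending order. Returns {value: bitstring}."""
    if len(symbols) == 1:
        return {symbols[0][0]: prefix or "0"}
    # Recursive splitting: find the split point that makes the two
    # groups' probability sums as close as possible.
    total = sum(p for _, p in symbols)
    acc, split, best = 0.0, 1, float("inf")
    for i in range(1, len(symbols)):
        acc += symbols[i - 1][1]
        if abs(2 * acc - total) < best:
            split, best = i, abs(2 * acc - total)
    codes = shannon_fano(symbols[:split], prefix + "0")        # left group
    codes.update(shannon_fano(symbols[split:], prefix + "1"))  # right group
    return codes

pixels = [0, 0, 0, 0, 255, 255, 128, 64]
freq = Counter(pixels)
probs = sorted(((v, c / len(pixels)) for v, c in freq.items()),
               key=lambda vp: vp[1], reverse=True)
codes = shannon_fano(probs)  # {0: '0', 255: '10', 128: '110', 64: '111'}
```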

Shannon-Fano coding is a basic technique that assigns codes based on probabilities but does
not guarantee the optimal code lengths. Huffman coding, which is an extension of
Shannon-Fano coding, provides a more efficient encoding scheme by considering the
probabilities and constructing a binary tree where shorter codes are assigned to more frequent
symbols.

In image compression, Shannon-Fano coding is often used as a stepping stone or foundation for more advanced entropy encoding techniques like Huffman coding or Arithmetic coding. By assigning variable-length codes based on pixel probabilities, Shannon-Fano coding contributes to the overall compression of image data by efficiently representing frequent pixel values with shorter codes.

Huffman coding
Huffman coding is a widely used entropy encoding technique for image compression. It assigns
variable-length codes to different symbols (in this case, pixel values) based on their probabilities
or frequencies of occurrence. Huffman coding achieves efficient compression by assigning
shorter codes to more frequently occurring symbols.

Here's how Huffman coding works for image compression:


● Probability Calculation:
○ Frequency Counting: The first step is to determine the frequency of occurrence
for each unique pixel value in the image. This is done by counting the number of
times each pixel value appears in the image.
○ Probability Calculation: Once the frequencies are determined, probabilities can
be calculated by dividing each frequency by the total number of pixels in the
image. These probabilities represent the likelihood of encountering each pixel
value.
● Construction of Huffman Tree:
○ Symbol Creation: Each unique pixel value is treated as a symbol.
○ Node Creation: A leaf node is created for each symbol, containing the symbol
value and its probability.
○ Combining Nodes: The nodes are sorted based on their probabilities, and the two
nodes with the lowest probabilities are combined to create a new parent node.
The probability of the parent node is the sum of the probabilities of its child
nodes.
○ Tree Formation: The process of combining nodes is repeated iteratively until all
nodes are combined into a single root node. This results in the creation of a
binary tree known as the Huffman tree.
● Code Assignment:
○ Traversing the Huffman Tree: Starting from the root node, a traversal is
performed through the Huffman tree. Moving to the left child represents a binary
digit "0," while moving to the right child represents a binary digit "1."
○ Code Assignment: The codes are assigned by concatenating the binary digits
encountered during the traversal from the root to each leaf node. The codes for
frequently occurring symbols are shorter, while the codes for less frequent
symbols are longer.
○ Building Codebook: The assigned codes for each symbol are stored in a
codebook, which is used for encoding and decoding.
● Encoding:
○ Replace each pixel value with its corresponding Huffman code from the
codebook to generate a sequence of Huffman codes representing the
compressed image.
● Decoding:
○ Traverse the sequence of Huffman codes from left to right, starting from the root
of the Huffman tree. Decode each code to its corresponding pixel value,
reconstructing the original image.
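
The tree construction and code assignment can be sketched in Python with a min-heap that repeatedly merges the two lowest-probability nodes (a minimal illustration rather than a production codec):

```python
import heapq
from collections import Counter

def huffman_codes(pixels):
    """Return {pixel value: bitstring}. Heap entries are
    (frequency, tie-breaker, tree), where a tree is either a
    leaf pixel value or a (left, right) tuple."""
    freq = Counter(pixels)
    heap = [(c, i, v) for i, (v, c) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Combine the two nodes with the lowest frequencies.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):    # internal node
            walk(node[0], code + "0")  # left child -> "0"
            walk(node[1], code + "1")  # right child -> "1"
        else:                          # leaf: a pixel value
            codes[node] = code or "0"
    walk(heap[0][2], "")
    return codes

pixels = [0, 0, 0, 0, 255, 255, 128, 64]
codes = huffman_codes(pixels)
bitstream = "".join(codes[p] for p in pixels)  # encoded image
```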

Huffman coding achieves efficient compression by assigning shorter codes to more frequently
occurring symbols, which results in a reduction of the overall number of bits required to
represent the image. This technique is widely used in image compression algorithms such as
JPEG (Joint Photographic Experts Group) and is known for its simplicity and effectiveness in
achieving good compression ratios while preserving visual quality.

Scalar and vector quantization


Scalar Quantization:
Scalar quantization, also referred to as scalar quantization coding (SQC), is a technique used in image compression to reduce the number of bits required to represent an image by quantizing individual pixel values. It assigns discrete levels or values to each pixel based on a quantization table or codebook.

Here's how scalar quantization works for image compression:


● Quantization Table or Codebook Generation:
○ Range Determination: The range of pixel values in the image is determined,
typically by examining the minimum and maximum pixel values.

○ Division into Intervals: The range is divided into a set of non-overlapping intervals
or levels. The number of levels determines the number of bits used to represent
each pixel value.
● Pixel Quantization:
○ For each pixel in the image, the quantization process involves mapping its
original value to the nearest level in the quantization table or codebook. This
mapping is performed based on the proximity of the pixel value to the available
levels.
● Encoding:
○ The quantized values, which are represented by the assigned levels, are
encoded and stored using the corresponding number of bits for each quantized
pixel value.
● Decoding:
○ During decoding, the encoded quantized values are retrieved, and the inverse
process is applied. The quantized values are mapped back to their original pixel
values based on the inverse mapping of the quantization table or codebook.
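
A minimal Python sketch of these steps, assuming evenly spaced levels and using NumPy for the nearest-level search (the function names are illustrative):

```python
import numpy as np

def build_levels(image: np.ndarray, n_levels: int) -> np.ndarray:
    """Divide the pixel range into n_levels evenly spaced
    reconstruction values (a simple quantization codebook)."""
    return np.linspace(image.min(), image.max(), n_levels)

def scalar_quantize(image: np.ndarray, levels: np.ndarray) -> np.ndarray:
    """Map every pixel to the index of its nearest level; each
    index needs only ceil(log2(n_levels)) bits."""
    return np.abs(image[..., None] - levels).argmin(axis=-1)

def scalar_dequantize(indices: np.ndarray, levels: np.ndarray) -> np.ndarray:
    """Inverse mapping: replace each index with its level value."""
    return levels[indices]

img = np.random.randint(0, 256, size=(4, 4))
levels = build_levels(img, n_levels=8)    # 8 levels -> 3 bits per pixel
idx = scalar_quantize(img, levels)
approx = scalar_dequantize(idx, levels)   # quantized image
```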

Scalar quantization provides a simple and efficient means of reducing the number of bits
required to represent an image. However, it may introduce quantization errors and loss of fine
details since each pixel is quantized independently.

Vector Quantization:
Vector quantization (VQ) is a technique used in image compression that extends the concept of
scalar quantization by grouping multiple pixels together into blocks or vectors. It aims to capture
the statistical dependencies and similarities among neighboring pixels, resulting in improved
compression performance and preservation of local image features.

Here's how vector quantization works for image compression:


● Block Division:
○ The image is divided into non-overlapping blocks or vectors, each containing
multiple pixels. The size of the blocks can vary depending on the specific
implementation.
● Codebook Generation:
○ For vector quantization, a codebook is generated by applying clustering
algorithms such as k-means to the blocks in the image. The codebook represents
a set of representative vectors that will be used for quantization.
● Vector Quantization:
○ Each block in the image is quantized by finding the codebook entry that best
approximates the block's content. The quantization is performed by assigning the
index of the closest codebook entry to the block.
● Encoding:
○ The indices of the selected codebook entries for each block are encoded and
stored, typically using a variable-length code or a fixed number of bits.
● Decoding:
○ During decoding, the encoded indices are retrieved, and the corresponding
codebook entries are used to reconstruct the quantized blocks.
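
A minimal Python sketch of block division, codebook generation, and quantization, assuming a grayscale image and scikit-learn's KMeans for the clustering step (the block size and codebook size are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans  # assumes scikit-learn is installed

def vq_compress(image, block=4, k=64):
    """Split the image into block x block vectors, learn a k-entry
    codebook with k-means, and keep one index per block."""
    h, w = image.shape
    vectors = (image[:h - h % block, :w - w % block]
               .reshape(h // block, block, w // block, block)
               .swapaxes(1, 2)
               .reshape(-1, block * block))
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(vectors)
    return km.labels_, km.cluster_centers_  # block indices + codebook

def vq_decompress(indices, codebook, shape, block=4):
    """Rebuild the image by pasting the codebook entry for each index."""
    rows, cols = shape[0] // block, shape[1] // block
    blocks = codebook[indices].reshape(rows, cols, block, block)
    return blocks.swapaxes(1, 2).reshape(rows * block, cols * block)

img = np.random.randint(0, 256, size=(64, 64)).astype(float)
idx, codebook = vq_compress(img, block=4, k=32)
approx = vq_decompress(idx, codebook, img.shape, block=4)
```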

Vector quantization offers improved compression performance compared to scalar quantization, as it takes into account the correlation among neighboring pixels. It captures the statistical structure of the image more effectively and can preserve image details and features better. However, vector quantization requires more computational complexity and memory to generate and store the codebook compared to scalar quantization.

Compression Standards - JPEG/MPEG


Compression Standards for JPEG (Joint Photographic Experts Group) and MPEG (Moving
Picture Experts Group) are widely used in image and video compression, respectively. These
standards define the encoding and decoding processes for efficient compression and
decompression, ensuring interoperability across different devices and platforms.

JPEG Compression Standard:


The JPEG compression standard is primarily designed for still image compression. It provides a
lossy compression method that achieves high compression ratios while maintaining acceptable
image quality.

The main components of the JPEG compression standard include:


● Color Space Conversion: The input image is typically converted from the RGB color
space to the YCbCr color space, which separates the luminance (Y) and chrominance
(Cb and Cr) components. This color space transformation exploits the fact that the
human visual system is more sensitive to changes in brightness (luminance) than in
color (chrominance).
● Discrete Cosine Transform (DCT): The image is divided into blocks, typically 8x8
pixels, and a two-dimensional DCT is applied to each block. The DCT transforms the
spatial image data into frequency components, separating the low-frequency and
high-frequency information.
● Quantization: The DCT coefficients are quantized, reducing the precision of the
frequency components. This quantization step introduces loss of information, resulting in
a lossy compression scheme. The quantization parameters can be adjusted to control
the trade-off between compression ratio and image quality.
● Entropy Encoding: The quantized DCT coefficients are further compressed using
entropy encoding techniques such as Huffman coding. Huffman coding assigns shorter
codes to more frequently occurring coefficients, reducing the overall bit rate required for
encoding.
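
The DCT-plus-quantization core of this pipeline can be sketched in Python with SciPy (assuming SciPy is available; the flat quantization matrix below is a simplification of JPEG's psychovisually tuned tables, which allocate coarser steps to high frequencies):

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    """2-D type-II DCT of an 8x8 block, as used in JPEG."""
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(coeffs):
    """Inverse 2-D DCT."""
    return idct(idct(coeffs, axis=0, norm="ortho"), axis=1, norm="ortho")

Q = np.full((8, 8), 16.0)  # placeholder uniform quantization matrix

def jpeg_like_roundtrip(block):
    """DCT -> quantize -> dequantize -> inverse DCT for one block;
    the rounding in the quantization step is where information is lost."""
    coeffs = dct2(block - 128.0)         # level shift, then transform
    quantized = np.round(coeffs / Q)     # lossy step
    return idct2(quantized * Q) + 128.0  # approximate reconstruction

block = np.random.randint(0, 256, size=(8, 8)).astype(float)
approx = jpeg_like_roundtrip(block)  # close to, but not equal to, block
```
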
MPEG Compression Standard:
The MPEG compression standard is designed for compressing digital video sequences. It provides predominantly lossy compression methods that exploit spatial and temporal redundancy, enabling efficient video storage and transmission.

The main components of the MPEG compression standard include:


● Intra-frame and Inter-frame Compression: MPEG uses a combination of intra-frame
and inter-frame compression techniques. Intra-frame compression compresses
individual frames using similar techniques as JPEG, treating each frame as a separate
image. Inter-frame compression exploits temporal redundancy by encoding the
difference between consecutive frames, known as motion compensation.
● Motion Compensation: In inter-frame compression, motion compensation is used to
estimate and encode the motion vectors between frames. By predicting the motion
between frames, only the differences (residuals) need to be encoded, resulting in
efficient compression.
● Discrete Cosine Transform (DCT): Similar to JPEG, MPEG applies DCT to the blocks
within frames or the residuals to transform the spatial information into frequency
components.
● Quantization and Entropy Encoding: The DCT coefficients or residuals are quantized
and entropy encoded using techniques such as Huffman coding or arithmetic coding.
● Bitrate Control: MPEG provides various profiles and levels that define different
compression capabilities and target applications. Bitrate control mechanisms help
achieve a desired compression level while maintaining a specified bitrate for video
streaming or storage.
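
A minimal Python sketch of exhaustive block-matching motion estimation, the core of motion compensation (block and search-window sizes are illustrative):

```python
import numpy as np

def motion_vector(ref, cur, by, bx, block=16, search=8):
    """Find the displacement (dy, dx), within +/-search pixels, at
    which the reference frame best matches the current block,
    scored by the sum of absolute differences (SAD)."""
    target = cur[by:by + block, bx:bx + block]
    best, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue  # candidate block falls outside the frame
            sad = np.abs(ref[y:y + block, x:x + block] - target).sum()
            if sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv  # the encoder stores this vector plus the residual

prev_frame = np.random.rand(64, 64)
cur_frame = np.roll(prev_frame, shift=(2, 3), axis=(0, 1))  # simulated motion
mv = motion_vector(prev_frame, cur_frame, by=16, bx=16)     # (-2, -3)
```

Only the motion vector and the (often near-zero) residual between the predicted and actual block need to be encoded, which is why inter-frame compression is so effective for video.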

The MPEG family includes standards such as MPEG-1, MPEG-2, and MPEG-4, each offering different levels of compression efficiency and supporting different video applications, from low-quality video streaming to high-definition video storage. (MPEG-7, by contrast, is a standard for multimedia content description rather than compression.)

Video compression
Video compression is the process of reducing the size of video data while maintaining an
acceptable level of visual quality. It involves applying various techniques to exploit spatial and
temporal redundancies in video sequences, resulting in efficient storage, transmission, and
streaming of videos.

Video compression techniques aim to achieve a balance between compression efficiency and
visual quality. The choice of compression algorithm, parameters, and settings depends on the
specific requirements, such as target bitrate, resolution, desired quality, and available resources.

Common video compression standards include MPEG-2, MPEG-4, H.264/AVC, H.265/HEVC (High-Efficiency Video Coding), and VP9. These standards have been widely adopted in various applications, including video streaming platforms, video conferencing systems, digital television, and video storage devices.
Object Recognition
Object recognition refers to the process of identifying and classifying objects within digital
images or video frames. It is a fundamental task in computer vision, a field of study that focuses
on enabling computers to understand and interpret visual information. Object recognition
algorithms analyze visual data and extract meaningful features to make sense of the objects
present in the scene.

Object recognition can be approached using different techniques, ranging from traditional
computer vision methods to more advanced deep learning-based approaches. Traditional
methods often rely on handcrafted features and classifiers, while deep learning methods
leverage the power of neural networks to automatically learn discriminative features and
classifiers from large amounts of labeled data.

Computer Vision
Computer vision is a field of study and research that focuses on enabling computers to gain a
high-level understanding of visual information from digital images or video. It involves
developing algorithms and techniques that allow computers to analyze, interpret, and make
sense of visual data, mimicking human visual perception and understanding.

The main goals of computer vision include:


● Image and Video Understanding: Computer vision aims to enable machines to
understand and interpret the content of images and videos. This includes tasks such as
object detection and recognition, scene understanding, image segmentation, tracking,
and motion analysis.
● Feature Extraction and Representation: Computer vision algorithms extract
meaningful features from images or videos that can capture relevant information for
further analysis. These features can be low-level visual cues like edges, colors, and
textures, or higher-level semantic features that represent objects, shapes, or structures.
● Object Detection and Recognition: Computer vision algorithms can detect and
recognize objects within images or videos. This involves identifying specific objects or
classes of objects and localizing their positions or regions of interest. Object recognition
can be performed using machine learning techniques, such as support vector machines
(SVMs), convolutional neural networks (CNNs), or deep learning architectures.
● Scene and Context Understanding: Computer vision algorithms aim
to comprehend the overall scene context, including the relationships between objects,
spatial layout, and semantic understanding of the scene. This involves tasks such as
scene classification, scene segmentation, and understanding the interactions between
objects within the scene.
Computer vision algorithms utilize a range of techniques, including image processing, pattern
recognition, machine learning, deep learning, and probabilistic models. These algorithms
leverage mathematical and statistical methods to analyze visual data and extract meaningful
information.
Object recognition techniques
Object recognition techniques are methods used in computer vision to identify and classify
objects within images or video frames. These techniques aim to mimic human visual perception
and enable machines to understand and interpret visual information. Here are some commonly
used object recognition techniques:

● Template Matching:
○ Template matching compares a predefined template image with sub-regions of
the input image to find matching patterns. It involves calculating the similarity
between the template and image patches using metrics like correlation or sum of
squared differences. Template matching is straightforward but can be sensitive to
variations in scale, rotation, and lighting conditions (a minimal OpenCV sketch
appears after this list).
● Feature-Based Methods:
○ Feature-based methods extract distinctive features from images and use them to
recognize objects. Examples of feature descriptors include Scale-Invariant
Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Oriented
FAST and Rotated BRIEF (ORB). These methods detect keypoints in images and
compute descriptors that represent the local visual characteristics of the
keypoints. Object recognition is then performed by matching and comparing
these features across images.
● Deep Learning:
○ Deep learning, particularly convolutional neural networks (CNNs), has
revolutionized object recognition. CNNs are capable of automatically learning
hierarchical features from raw image data. Training involves feeding labeled
images to the network, and it learns to recognize objects by adjusting the weights
of its layers. Deep learning-based object recognition models, such as YOLO (You
Only Look Once), Faster R-CNN (Region-based Convolutional Neural Networks),
and SSD (Single Shot MultiBox Detector), have achieved impressive results in
terms of accuracy and real-time performance.
● Histogram-based Methods:
○ Histogram-based methods utilize color and texture information to recognize
objects. These methods analyze the distribution of color or texture features in
images and use statistical measures to compare and classify objects. Examples
include color histograms, local binary patterns (LBPs), and histogram of oriented
gradients (HOG). Histogram-based methods are effective for simple object
recognition tasks but may struggle with complex scenes or object variations.
● Ensemble Techniques:
○ Ensemble techniques combine multiple object recognition models or classifiers to
improve overall performance. This can involve techniques such as ensemble
averaging, boosting, or bagging. By combining the predictions of multiple models,
ensemble techniques can enhance robustness, accuracy, and generalization of
object recognition systems.
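
As an illustration of the first technique above, here is a minimal template-matching sketch using OpenCV (assuming opencv-python is installed; the file names and the 0.8 threshold are placeholders):

```python
import cv2

# 'scene.png' and 'template.png' are placeholder file names.
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

# Slide the template over the scene and score every position with
# normalized cross-correlation (robust to uniform lighting changes).
scores = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(scores)

if max_val > 0.8:   # match threshold is application-dependent
    x, y = max_loc  # top-left corner of the best match
    h, w = template.shape
    print(f"Object found at ({x}, {y}) with score {max_val:.2f}")
```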
