Unit 1 MMDCS

Data compression reduces the size of data for storage and transmission, with two main types: lossless, which preserves original quality, and lossy, which sacrifices some quality for greater size reduction. Techniques include Huffman coding, arithmetic coding, and dictionary methods, each with their own advantages and disadvantages. Context-based compression further enhances efficiency by utilizing contextual information to predict and encode data patterns.


UNIT:1

Data Compression:

Data compression is the process of encoding information using fewer bits than the original representation. Its primary goal is to reduce the size of data to save storage space or transmission time.

Types:

There are two main types of data compression:

1. **Lossless Compression**: This method reduces file size without losing
any information, allowing the original data to be perfectly reconstructed
from the compressed data. Examples include ZIP, PNG, and FLAC.
2. **Lossy Compression**: This method reduces file size by permanently
eliminating certain information, especially redundant or less important
data, which may not be noticeable to the user. This results in a
reduction in quality, but a significant decrease in file size. Examples
include JPEG, MP3, and MPEG.

Compression algorithms exploit patterns and redundancies in data to encode it more efficiently.
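To see lossless compression in action, Python's built-in zlib module (an implementation of DEFLATE) round-trips data exactly; the exact compressed size depends on how redundant the input is:

```python
import zlib

# Highly repetitive input: compression exploits the redundancy.
data = b"ABRACADABRA" * 100

packed = zlib.compress(data)
restored = zlib.decompress(packed)

assert restored == data          # lossless: exact reconstruction
assert len(packed) < len(data)   # redundancy removed
print(len(data), "->", len(packed))
```

A lossy codec (JPEG, MP3) has no such round-trip guarantee: decompressing yields an approximation of the original.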

Differences:

| Feature | Lossless Compression | Lossy Compression |
|---|---|---|
| Quality of Data | Preserves original quality | Reduces quality by removing data |
| Use Case | Text files, software, scientific data | Images (JPEG), audio (MP3), video (MPEG) |
| Compression Ratio | Lower (compressed size 50-70% of original) | Higher (compressed size 10-20% of original) |
| Reconstruction | Original data can be perfectly reconstructed | Original data cannot be perfectly reconstructed |
| Examples | ZIP, PNG, FLAC | JPEG, MP3, MPEG |

Diagram:
        +---------------------+
        |    Original Data    |
        +---------------------+
                   |
                   v
        +---------------------+
        | Compression Method  |
        +---------------------+
           |               |
           v               v
+----------------+  +----------------+
|    Lossless    |  |     Lossy      |
|  Compression   |  |  Compression   |
+----------------+  +----------------+
        |                  |
        v                  v
+----------------+  +----------------+
|   Compressed   |  |   Compressed   |
| Lossless Data  |  |   Lossy Data   |
+----------------+  +----------------+
        |                  |
        v                  v
+----------------+  +----------------+
| Original Data  |  |  Approximate   |
| (Reconstructed)|  | Data (Degraded)|
+----------------+  +----------------+

Original Data: The initial set of data before compression.

Compression Method: The process used to compress the data.

Lossless Compression: The method that allows exact data reconstruction.

Lossy Compression: The method that discards some data to reduce file size.

Compressed Data: The result of the compression method.

Original Data (Reconstructed): The result of decompressing lossless data, identical to the original.

Approximate Data (Degraded Quality): The result of decompressing lossy data, similar but not identical to the original.

Advantages:

Lossless Compression

- **Data Integrity**: Perfectly reconstructs original data.

- **Versatility**: Suitable for a wide range of data types.

- **Preservation**: Ideal for archival and backups.

- **Reversibility**: Fully reversible process.


- **No Quality Loss**: Maintains original quality.

Lossy Compression

- **High Compression Ratio**: Significantly reduces file sizes.

- **Storage Efficiency**: Saves substantial storage space.

- **Bandwidth Efficiency**: Improves download/upload times.

- **Cost Savings**: Reduces storage and transmission costs.

- **Acceptable Quality Trade-off**: Quality loss is often imperceptible.

Disadvantages:

Lossless Compression

- **Lower Compression Ratios**: Less efficient at reducing file sizes than lossy compression.

- **Storage Requirements**: Requires more storage space than lossy compression.

- **Bandwidth Requirements**: Requires more bandwidth for transmission than lossy compression.

Lossy Compression

- **Quality Loss**: Permanently removes some data, leading to a loss in quality.

- **Irreversible**: Original data cannot be perfectly reconstructed.

- **Not Suitable for All Data Types**: Inappropriate for text files, executable
files, and critical data where quality cannot be compromised.

Basics of Huffman coding:


Huffman coding is a popular algorithm used for lossless data compression. It
creates variable-length codes based on the frequencies of the characters in
the input data. Here’s a basic overview:

Steps in Huffman Coding:

1. **Frequency Calculation**: Calculate the frequency of each character in the input data.

2. **Building the Huffman Tree**:

- Create a leaf node for each character and build a priority queue (min-heap)
where each node is ordered by the frequency of the character.

- Extract the two nodes with the smallest frequencies from the queue.

- Create a new internal node with these two nodes as children, and the
frequency equal to the sum of the two nodes’ frequencies.

- Insert this new node back into the queue.

- Repeat the process until only one node remains in the queue. This node
becomes the root of the Huffman tree.

3. **Generate Huffman Codes**: Assign binary codes to each character by traversing the Huffman tree. Typically, a left edge represents a '0' and a right edge represents a '1'.

4. **Encoding**: Replace each character in the input data with its corresponding Huffman code.

5. **Decoding**: Use the Huffman tree to decode the binary string back into
the original characters.

Example:

Let’s say we want to compress the string “ABRACADABRA”.


1. **Frequency Calculation**:

```

A: 5

B: 2

R: 2

C: 1

D: 1

```

2. **Building the Huffman Tree**:

- Create a priority queue with leaf nodes: `A(5), B(2), R(2), C(1), D(1)`.

- Extract `C(1)` and `D(1)`, create a new node `CD(2)`.

- Extract `B(2)` and `R(2)`, create a new node `BR(4)`.

- Extract `CD(2)` and `BR(4)`, create a new node `CD_BR(6)`.

- Extract `A(5)` and `CD_BR(6)`, create the root node `A_CD_BR(11)`.

3. **Generate Huffman Codes** (one valid assignment, reading a left edge as '0' and a right edge as '1'):

```

A: 0

B: 101

R: 100

C: 111

D: 110

```

4. **Encoding**:

```
Original string: ABRACADABRA

Encoded string: 01011000111011001011000

```

The 11-character string (88 bits in 8-bit ASCII) compresses to 23 bits.

5. **Decoding**:

- Traverse the Huffman tree according to the encoded binary string to reconstruct the original string.
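The steps above can be sketched in Python using the standard library's heapq as the priority queue. This is a minimal illustration (function name and tie-breaking scheme are my own); the exact 0/1 assignment depends on tie-breaking, but the code lengths are always optimal:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table for `text` (a minimal sketch)."""
    freq = Counter(text)
    # Heap entries are (frequency, tiebreak, tree); a tree is either a
    # character (leaf) or a (left, right) tuple (internal node).
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two smallest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))
        tiebreak += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node
            walk(node[0], prefix + "0")      # left edge -> '0'
            walk(node[1], prefix + "1")      # right edge -> '1'
        else:                                # leaf: record the code
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("ABRACADABRA")
encoded = "".join(codes[c] for c in "ABRACADABRA")
print(len(encoded))  # 23 bits, versus 88 bits of 8-bit ASCII
```

A real compressor would also serialize the tree (or the code table) alongside the bitstream so the decoder can rebuild it.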

Advantages of Huffman Coding:

- **Optimality**: Produces the smallest possible average code length for a given set of symbols and their frequencies (among symbol-by-symbol prefix codes).

- **Efficiency**: Suitable for data with varying character frequencies.

Disadvantages of Huffman Coding:

- **Dependency on Frequency**: Less effective if character frequencies are uniform.

- **Overhead**: Requires storing the Huffman tree or the codes along with
the compressed data for decoding.

Huffman coding is widely used in various applications, including file compression formats like ZIP and image formats like JPEG.

Arithmetic coding:

Arithmetic coding is another method of lossless data compression, distinct from Huffman coding. It allows for more efficient compression by encoding sequences of symbols (like characters or pixels) into a single fractional number between 0 and 1.

Basic Concept:

Arithmetic coding works by partitioning the interval [0, 1] into sub-intervals, each representing a symbol or sequence of symbols from the input data. The size of each sub-interval corresponds to the probability of the symbol(s) it represents.

Steps in Arithmetic Coding:

1. **Initialization**:

- Initialize a range [low, high] to [0, 1].

- Divide the range into sub-intervals proportional to the probabilities of the symbols.

2. **Encoding**:

- For each symbol in the input sequence, narrow down the range [low, high]
to the sub-interval that represents the symbol’s probability.

- Update [low, high] based on the cumulative probabilities of the symbols encountered so far.

3. **Decoding**:

- Start with the encoded value and the same probabilities used for
encoding.

- Reconstruct the original sequence by iteratively determining which symbol(s) correspond to the current value within the current range [low, high].

Example:

Let's walk through arithmetic coding with a simple example using the string "ABBACAB".

Step-by-Step Encoding
1. **Frequency Calculation**:

```

A: 3, B: 3, C: 1

```

2. **Probability Calculation**:

- Calculate probabilities based on frequencies:

```

P(A) = 3/7, P(B) = 3/7, P(C) = 1/7

```

3. **Constructing the Cumulative Probability Range**:

- Assign cumulative ranges to each symbol:

```

A: [0, 3/7)

B: [3/7, 6/7)

C: [6/7, 1)

```

4. **Encoding "ABBACAB"**:

- Start with the range [0, 1). At each step, narrow the current range to the current symbol's sub-interval: low' = low + width × r_low and high' = low + width × r_high, where width = high − low and [r_low, r_high) is the symbol's cumulative range. (Values below are rounded to five decimal places.)

- Encode 'A': [0, 1) → [0, 0.42857).

- Encode 'B': [0, 0.42857) → [0.18367, 0.36735).

- Encode 'B': [0.18367, 0.36735) → [0.26239, 0.34111).

- Encode 'A': [0.26239, 0.34111) → [0.26239, 0.29613).

- Encode 'C': [0.26239, 0.29613) → [0.29131, 0.29613).

- Encode 'A': [0.29131, 0.29613) → [0.29131, 0.29337).

- Encode 'B': [0.29131, 0.29337) → [0.29219, 0.29308).

- Any single value in the final range, e.g. 0.2925, encodes the entire string.

Decoding:

- Start with the encoded value and reconstruct the sequence using the
same cumulative ranges and probabilities used in encoding.
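Both directions can be sketched with exact rational arithmetic (Python's fractions module) to sidestep rounding issues; the helper names below are my own:

```python
from fractions import Fraction as F

# Cumulative ranges from the example: A:[0,3/7), B:[3/7,6/7), C:[6/7,1)
RANGES = {"A": (F(0), F(3, 7)), "B": (F(3, 7), F(6, 7)), "C": (F(6, 7), F(1))}

def encode(text):
    low, width = F(0), F(1)
    for ch in text:
        r_lo, r_hi = RANGES[ch]
        # Narrow [low, low + width) to the symbol's sub-interval.
        low, width = low + width * r_lo, width * (r_hi - r_lo)
    return low, low + width  # any value in [low, high) encodes `text`

def decode(value, length):
    out = []
    for _ in range(length):
        for ch, (r_lo, r_hi) in RANGES.items():
            if r_lo <= value < r_hi:                    # which sub-interval?
                out.append(ch)
                value = (value - r_lo) / (r_hi - r_lo)  # rescale and repeat
                break
    return "".join(out)

low, high = encode("ABBACAB")
assert decode((low + high) / 2, 7) == "ABBACAB"
print(float(low), float(high))  # roughly 0.29219 and 0.29308
```

Practical coders avoid unbounded fractions by renormalizing the range and emitting bits incrementally, but the interval-narrowing logic is the same.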

Advantages of Arithmetic Coding

- **Efficiency**: Can achieve better compression ratios than Huffman coding, especially for sequences with uneven symbol frequencies.

- **Adaptability**: Can adapt to changes in symbol frequencies without re-encoding entire blocks of data.

- **Suitability for Continuous Data**: Can handle data where symbols are not discrete but continuous (like pixels in images).

Disadvantages of Arithmetic Coding

- **Complexity**: More complex to implement compared to Huffman coding.

- **Floating-point Arithmetic**: Requires careful handling of floating-point operations, which may introduce rounding errors.

Arithmetic coding is widely used in various applications where high compression efficiency is critical, such as in video and image compression algorithms like JPEG2000.

Dictionary techniques:
Dictionary techniques in data compression refer to methods that
leverage dictionaries or tables to store and manage repetitive patterns or
sequences in data efficiently. These techniques are particularly effective for
compressing data with frequent repetitions or patterns, such as text, images,
and multimedia.

Types of Dictionary Techniques:

1. **Static Dictionary Compression**:

- **Lempel-Ziv (LZ) Algorithms**: LZ77 and LZ78 are fundamental static dictionary techniques.

- **LZ77**: Uses a sliding window to find and replace repeated patterns with references to previous occurrences.

- **LZ78**: Builds a dictionary dynamically during compression and decompression to replace repeated sequences.

- **LZMA (Lempel-Ziv-Markov chain algorithm)**: Combines LZ77 with a range of compression filters and a final coder based on range encoding.

2. **Dynamic Dictionary Compression**:

- **Lempel-Ziv-Welch (LZW)**: Similar to LZ78 but maintains a dynamic dictionary during compression without transmitting it. Widely used in GIF and UNIX compress utilities.

- **DEFLATE**: Combines LZ77 and Huffman coding; widely used in ZIP files and PNG images.

3. **Adaptive Dictionary Compression**:

- **Prediction by Partial Matching (PPM)**: A statistical data compression technique that builds and adapts a model of the source data.

- **Burrows-Wheeler Transform (BWT)**: Reorders the characters into runs of similar characters, making the data easier to compress using move-to-front and Huffman coding.
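The reordering idea behind BWT can be sketched in a few lines. This naive rotation sort is for illustration only (production implementations use suffix arrays), and the '$' sentinel is assumed absent from the input:

```python
def bwt(s):
    # Append a sentinel ('$', assumed absent from the input and smaller
    # than every input character), sort all rotations of the string,
    # then take the last column of the sorted rotation table.
    s = s + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in rotations)

print(bwt("banana"))  # -> annb$aa: like characters cluster into runs
```

The transform itself compresses nothing; it just groups similar characters so that a later stage (move-to-front plus Huffman or run-length coding) compresses better.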
How Dictionary Techniques Work:

- **Dictionary Initialization**: The compression process starts with an empty dictionary or a predefined set of symbols.

- **Dictionary Maintenance**: During compression, as patterns or sequences repeat, they are added to the dictionary.

- **Encoding**: Replaces repetitive sequences with references to the dictionary entries or codes that represent those sequences.

- **Decoding**: Rebuilds the original data by referencing the dictionary entries or codes provided during compression.
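The maintenance, encoding, and decoding steps above can be sketched for LZW. This is a minimal illustration; real implementations pack codes into bit streams rather than returning a list of integers:

```python
def lzw_encode(data):
    # Start with a dictionary of all single characters.
    dictionary = {chr(i): i for i in range(256)}
    next_code, w, out = 256, "", []
    for ch in data:
        wc = w + ch
        if wc in dictionary:
            w = wc                       # keep extending the current match
        else:
            out.append(dictionary[w])    # emit code for the longest match
            dictionary[wc] = next_code   # learn the new sequence
            next_code += 1
            w = ch
    if w:
        out.append(dictionary[w])
    return out

def lzw_decode(codes):
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = chr(codes[0])
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                            # code not yet known: the special
            entry = w + w[0]             # case where it must be w + w[0]
        out.append(entry)
        dictionary[next_code] = w + entry[0]  # mirror the encoder's learning
        next_code += 1
        w = entry
    return "".join(out)

msg = "TOBEORNOTTOBEORTOBEORNOT"
assert lzw_decode(lzw_encode(msg)) == msg
```

Note that the decoder rebuilds the same dictionary on its own, which is why LZW never needs to transmit it.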

Advantages of Dictionary Techniques:

- **High Compression Ratios**: Effective in reducing data size, especially for repetitive or patterned data.

- **Efficiency**: Can handle various types of data efficiently, including text, images, and multimedia.

- **Adaptability**: Some techniques (like adaptive methods) can adjust to changing patterns in the data, improving compression performance over time.

Disadvantages of Dictionary Techniques:

- **Complexity**: Implementing and managing dictionaries can be computationally intensive.

- **Memory Overhead**: Requires additional memory to store dictionaries, especially in dynamic or adaptive techniques.

- **Compression Speed**: Depending on the algorithm, compression speed may vary, with some techniques being slower due to dictionary management.

Applications:
- **Text Compression**: Efficient for compressing text documents, where
words and phrases often repeat.

- **Image Compression**: Useful in formats like GIF (LZW) and PNG (DEFLATE), where repeated pixel patterns can be represented compactly.

- **Video Compression**: Applied in video codecs like H.264 and HEVC to reduce data size for transmission and storage.

- **Archiving**: Widely used in file compression utilities like ZIP, RAR, and 7z
for storing and transferring files efficiently.

Dictionary techniques play a crucial role in modern data compression standards, balancing compression efficiency, computational complexity, and adaptability to various data types and patterns.

CONTEXT-BASED COMPRESSION:

Context-based compression is a data compression technique that utilizes contextual information or models to predict and encode data more efficiently. Unlike simple statistical methods that treat data independently, context-based compression considers the surrounding context or history of the data to make better predictions about its structure and redundancy.

Key Concepts of Context-Based Compression:

1. **Context Modeling**:

- **Adaptive Models**: These models adapt and update based on previously processed data. They capture dependencies and patterns that occur within a specific context or window of the data.

- **Static Models**: These models are fixed and predefined based on statistical properties of the data. They provide a baseline for comparison with adaptive models.
2. **Prediction and Encoding**:

- **Prediction**: Using the context or model, predict the most likely symbols or sequences that will occur next in the data.

- **Encoding**: Replace predictable patterns with shorter codes or references to improve compression ratios. The encoding process may vary depending on the specific algorithm used.

3. **Types of Context-Based Compression Techniques**:

- **Prediction by Partial Matching (PPM)**: A classic context-based method that uses statistical models to predict upcoming symbols based on a history of previous symbols. It adapts its model dynamically as more data is processed.

- **Context-Tree Weighting (CTW)**: Explores all possible contexts up to a certain depth and assigns weights to each context based on observed data.

- **Context-Adaptive Binary Arithmetic Coding (CABAC)**: Used in video compression standards like H.264/AVC and H.265/HEVC, CABAC adapts to the local context of binary data to achieve high compression efficiency.
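As an illustrative sketch of context modeling (the class and method names here are my own invention, not from any standard), an order-1 model conditions its prediction on the single previous symbol:

```python
from collections import defaultdict, Counter

class Order1Model:
    """Order-1 context model: counts which symbols follow each context."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, context, symbol):
        self.counts[context][symbol] += 1

    def predict(self, context):
        # Most frequent symbol seen after `context`, or None if unseen.
        followers = self.counts[context]
        return followers.most_common(1)[0][0] if followers else None

model = Order1Model()
text = "abracadabra"
for prev, cur in zip(text, text[1:]):
    model.update(prev, cur)

print(model.predict("a"))  # -> b ('b' follows 'a' most often in this text)
```

A real context-based compressor such as PPM feeds these conditional counts into an arithmetic coder, giving the predicted symbols short effective code lengths, rather than merely guessing the next symbol.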

Advantages of Context-Based Compression

- **High Compression Ratios**: Context-based techniques can exploit complex dependencies and patterns in data, leading to superior compression compared to simpler methods.

- **Adaptability**: Models can adapt to changes in data patterns over time, improving compression efficiency.

- **Wide Applicability**: Effective for compressing a wide range of data types, including text, images, audio, and video.

Disadvantages of Context-Based Compression

- **Complexity**: Implementing and managing context-based models can be computationally intensive and may require significant memory resources.

- **Training Overhead**: Some methods may require initial training or learning periods to build accurate models, which can impact compression speed.

- **Algorithm-specific**: Different context-based algorithms may perform differently depending on the characteristics of the data being compressed.

Applications

- **Text Compression**: Efficient for compressing natural language text, where the context of words and phrases influences compression effectiveness.

- **Image and Video Compression**: Used in modern video codecs like H.264/AVC, H.265/HEVC, and VP9 to exploit spatial and temporal dependencies in image and video data.

- **Data Transmission**: Beneficial for reducing bandwidth requirements in data transmission applications, such as internet protocols and wireless communications.

APPLICATIONS:

Data compression finds application in numerous fields and scenarios where reducing the size of data is beneficial. Here are some key applications of data compression:

1. **Data Storage and Archiving**
   - File Compression: Compressing files before storage reduces disk space requirements. Formats like ZIP, RAR, and 7z are commonly used for this purpose.
   - Backup Compression: Compressing backup data saves storage space and reduces the time required for backup operations.
2. **Data Transmission**
   - Network Communication: Compressing data before transmission reduces bandwidth usage and speeds up data transfer. This is crucial for internet protocols (HTTP, FTP), email (SMTP, IMAP), and cloud services.
   - Streaming Media: Video and audio streaming services use compression to deliver content efficiently over networks while maintaining acceptable quality.
3. **Multimedia and Entertainment**
   - Image Compression: Formats like JPEG, PNG, and GIF compress images for storage and web display without significant loss of quality.
   - Audio Compression: Formats like MP3, AAC, and OGG compress audio files for storage and streaming while maintaining perceptible quality.
   - Video Compression: Codecs like H.264/AVC, H.265/HEVC, VP9, and AV1 compress video data for efficient storage, streaming, and broadcasting.
4. **Database Systems**
   - Data Compression in Databases: Database management systems (DBMS) employ compression techniques to reduce storage requirements and improve query performance.
5. **Embedded Systems and IoT**
   - Resource-constrained Devices: Devices with limited storage and processing capabilities use compression to store and transmit data efficiently. This includes sensors, IoT devices, and embedded systems.
6. **Genomics and Bioinformatics**
   - DNA Sequencing Data: Compression techniques are used to store and analyze large volumes of genomic data efficiently, enabling faster processing and reducing storage costs.
7. **Scientific and Engineering Applications**
   - Simulation and Modeling: Scientific simulations and engineering models generate vast amounts of data. Compression helps in storing and analyzing this data more effectively.
   - Sensor Networks: Data from environmental monitoring, industrial sensors, and IoT devices benefit from compression to optimize storage and communication bandwidth.
8. **Programming and Development**
   - Executable Code: Code and binaries can be compressed to reduce file size for distribution and deployment.
   - Version Control Systems: Compressing repositories and version history in systems like Git and SVN reduces storage and improves performance.
9. **Medical Imaging**
   - DICOM Compression: Medical imaging formats use compression to store and transmit images efficiently while maintaining diagnostic quality.
10. **Cloud Computing**
    - Cloud Storage and Computing: Data compression is utilized in cloud services to optimize storage costs, improve data transfer speeds, and enhance scalability.
