Unit 1 MMDCS
Data Compression:
Types: lossless compression and lossy compression.
Difference:
| Aspect | Lossless Compression | Lossy Compression |
|---|---|---|
| Use Case | Text files, software, scientific data | Images (JPEG), audio (MP3), video (MPEG) |
Diagram: Original Data → Compression Method → lossless or lossy branch → compressed Data (figure not recoverable from the source).
Lossy Compression: a method that discards some data to reduce file size.
Degraded Quality: the result of decompressing lossy data; similar to, but not identical to, the original.
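A minimal sketch of the difference, using Python's standard `zlib` module (DEFLATE) as the lossless codec and a toy uniform quantizer as the lossy step. The quantizer and its `step` parameter are illustrative assumptions, not a real image or audio codec.

```python
import zlib

def lossless_roundtrip(data: bytes) -> bytes:
    """Compress with DEFLATE (zlib), then decompress: output equals input."""
    return zlib.decompress(zlib.compress(data))

def lossy_roundtrip(samples: list[int], step: int) -> list[int]:
    """Toy lossy scheme (uniform quantization, an illustrative assumption).

    Only samples // step would be stored; reconstruction multiplies back by
    step, so any detail smaller than step is lost for good.
    """
    quantized = [s // step for s in samples]   # smaller values, fewer bits to store
    return [q * step for q in quantized]       # approximate reconstruction

text = b"ABRACADABRA " * 20
print(lossless_roundtrip(text) == text)        # True: exact copy
print(len(text), len(zlib.compress(text)))     # compressed size is much smaller

pixels = [12, 13, 15, 200, 201, 203]
print(lossy_roundtrip(pixels, 8))              # [8, 8, 8, 200, 200, 200]
```

The lossless result is bit-for-bit identical; the lossy result is close to the input but not identical, which is the "degraded quality" described above.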
Advantages:
Lossless Compression
Lossy Compression
Disadvantages:
Lossless Compression
Lossy Compression
- **Not Suitable for All Data Types**: Inappropriate for text files, executable
files, and critical data where quality cannot be compromised.
Huffman coding:
- Create a leaf node for each character and build a priority queue (min-heap)
in which nodes are ordered by character frequency.
- Extract the two nodes with the smallest frequencies from the queue.
- Create a new internal node with these two nodes as children, and the
frequency equal to the sum of the two nodes’ frequencies.
- Repeat the process until only one node remains in the queue. This node
becomes the root of the Huffman tree.
5. **Decoding**: Use the Huffman tree to decode the binary string back into
the original characters.
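The tree-building steps above can be sketched in Python with the standard `heapq` module. This is a minimal illustration, not a reference implementation: the helper names (`build_codes`, `encode`, `decode`) are hypothetical, and a running counter breaks ties between equal frequencies. The exact codewords depend on how those ties are broken, but every Huffman code for a given frequency table produces the same total encoded length.

```python
import heapq
from collections import Counter
from itertools import count

def build_codes(text: str) -> dict[str, str]:
    """Build a Huffman code table for the characters in `text`."""
    freq = Counter(text)
    tiebreak = count()  # unique counter so heapq never compares tree nodes
    # Heap entries: (frequency, tiebreak, node). A leaf node is a character;
    # an internal node is a (left, right) tuple.
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct symbol still needs a 1-bit code
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # smallest frequency
        f2, _, right = heapq.heappop(heap)  # second smallest
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    root = heap[0][2]

    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):   # internal node: 0 = left, 1 = right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                         # leaf: record the character's code
            codes[node] = prefix

    walk(root, "")
    return codes

def encode(text: str, codes: dict[str, str]) -> str:
    """Replace each character with its codeword and concatenate."""
    return "".join(codes[ch] for ch in text)

def decode(bits: str, codes: dict[str, str]) -> str:
    """Read bits until they match a codeword; the prefix property makes
    the first match the correct one."""
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)

codes = build_codes("ABRACADABRA")
bits = encode("ABRACADABRA", codes)
print(codes)
print(len(bits))  # 23 bits, versus 11 * 3 = 33 for a fixed 3-bit code
print(decode(bits, codes))  # ABRACADABRA
```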
Example:
```
A: 5
B: 2
R: 2
C: 1
D: 1
```
- Create a priority queue with leaf nodes: `A(5), B(2), R(2), C(1), D(1)`.
```
A: 0
B: 101
R: 100
C: 111
D: 110
```
3. **Encoding**:
```
Original string: ABRACADABRA
```
5. **Decoding**:
- **Overhead**: Requires storing the Huffman tree or the codes along with
the compressed data for decoding.
Arithmetic coding:
1. **Initialization**:
- Start with the interval [low, high) = [0, 1) and a known probability for each symbol.
2. **Encoding**:
- For each symbol in the input sequence, narrow the current range [low, high)
to the sub-interval assigned to that symbol; the new width is the old width
multiplied by the symbol's probability.
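3. **Decoding**:
- Start with the encoded value and the same probabilities used for
encoding. At each step, output the symbol whose sub-interval contains the
value, then narrow the range to that sub-interval.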
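The three steps above can be sketched with exact fractions from Python's `fractions` module. This is a simplified illustration under stated assumptions: the helper names are hypothetical, the decoder is told the message length (real coders use an end-of-message symbol or a length header), and practical implementations use fixed-precision integer arithmetic with renormalization instead of unbounded fractions.

```python
from fractions import Fraction

def cumulative_ranges(probs: dict[str, Fraction]) -> dict[str, tuple[Fraction, Fraction]]:
    """Give each symbol a sub-interval [lo, hi) of [0, 1) as wide as its probability."""
    ranges, start = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (start, start + p)
        start += p
    return ranges

def encode(message: str, ranges) -> tuple[Fraction, Fraction]:
    low, high = Fraction(0), Fraction(1)            # 1. initialization
    for sym in message:                              # 2. narrow once per symbol
        width = high - low
        sym_lo, sym_hi = ranges[sym]
        low, high = low + width * sym_lo, low + width * sym_hi
    return low, high  # any value in [low, high) identifies the message

def decode(value: Fraction, length: int, ranges) -> str:
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(length):                          # 3. one symbol per step
        width = high - low
        target = (value - low) / width               # position inside current range
        for sym, (sym_lo, sym_hi) in ranges.items():
            if sym_lo <= target < sym_hi:
                out.append(sym)
                low, high = low + width * sym_lo, low + width * sym_hi
                break
    return "".join(out)

probs = {"A": Fraction(3, 7), "B": Fraction(3, 7), "C": Fraction(1, 7)}
ranges = cumulative_ranges(probs)
low, high = encode("ABBACAB", ranges)
print(low, high)               # 240633/823543 241362/823543
tag = (low + high) / 2         # one value inside the final interval
print(decode(tag, 7, ranges))  # ABBACAB
```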
# Example:
Step-by-Step Encoding
1. **Frequency Calculation**:
```
A: 3, B: 3, C: 1
```
2. **Probability Calculation**:
```
A: 3/7, B: 3/7, C: 1/7
```
3. **Cumulative Ranges**:
```
A: [0, 3/7)
B: [3/7, 6/7)
C: [6/7, 1)
```
4. **Encoding “ABBACAB”**:
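Applying the narrowing rule with the cumulative ranges above gives the following intervals. This is a worked calculation added for illustration, using exact fractions; with $w = \text{high} - \text{low}$ and a symbol range $[c_{lo}, c_{hi})$, each step computes $\text{low}' = \text{low} + w\,c_{lo}$ and $\text{high}' = \text{low} + w\,c_{hi}$.

```latex
\[
\begin{aligned}
\text{A} &: [0,\ \tfrac{3}{7}) \\
\text{B} &: [\tfrac{9}{49},\ \tfrac{18}{49}) \\
\text{B} &: [\tfrac{90}{343},\ \tfrac{117}{343}) \\
\text{A} &: [\tfrac{90}{343},\ \tfrac{711}{2401}) \\
\text{C} &: [\tfrac{4896}{16807},\ \tfrac{4977}{16807}) \\
\text{A} &: [\tfrac{4896}{16807},\ \tfrac{34515}{117649}) \\
\text{B} &: [\tfrac{240633}{823543},\ \tfrac{241362}{823543}) \approx [0.29219,\ 0.29308)
\end{aligned}
\]
```

The final width is $(3/7)^6 \cdot (1/7) = 729/823543$, the product of the symbol probabilities. Any value in the final interval, for example 0.2925, encodes "ABBACAB" when the decoder knows the message has 7 symbols.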
Decoding:
- Start with the encoded value and reconstruct the sequence using the
same cumulative ranges and probabilities used in encoding.
- **Suitability for Continuous-Tone Data**: Handles sources with many distinct,
unevenly distributed symbol values (like pixel intensities in images) well,
because it is not limited to a whole number of bits per symbol.
Dictionary techniques:
Dictionary techniques in data compression build a dictionary (table) of
patterns or sequences that recur in the data and replace later occurrences
with short references to dictionary entries. These techniques are particularly
effective for compressing data with frequent repetitions or patterns, such as
text, images, and multimedia.
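LZW (Lempel-Ziv-Welch) is one well-known dictionary technique; these notes do not name it, so it is used here only as an example. The sketch below is a simplified version under stated assumptions: it works on strings whose characters have code points below 256, and it outputs plain integer codes without packing them into variable-width bit fields as real formats do.

```python
def lzw_compress(data: str) -> list[int]:
    # The dictionary starts with every single-character string (codes 0..255).
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    output = []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                   # keep extending the match
        else:
            output.append(dictionary[current])    # emit longest known phrase
            dictionary[candidate] = next_code     # learn the new phrase
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

def lzw_decompress(codes: list[int]) -> str:
    if not codes:
        return ""
    # The decoder rebuilds the same dictionary, so it is never transmitted.
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:                   # phrase being defined right now
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out)

text = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(text)
print(len(text), len(codes))   # 24 characters become 16 codes
print(lzw_decompress(codes) == text)
```

Repeated phrases such as "TOBEOR" are emitted as single codes on their second and later appearances, which is where the savings come from.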
Applications:
- **Text Compression**: Efficient for compressing text documents, where
words and phrases often repeat.
- **Image Compression**: Useful in formats like GIF (LZW) and PNG (DEFLATE),
where repeated pixel patterns can be represented compactly.
- **Archiving**: Widely used in file compression utilities like ZIP, RAR, and 7z
for storing and transferring files efficiently.
1. **Context Modeling**:
Applications:
6. Audio Compression: Formats like MP3, AAC, and OGG compress audio
files for storage and streaming while preserving perceived audio quality.