File Compression

1
What is file compression?

• Compression creates smaller files by eliminating or
reducing redundancy
• For example, let’s say “AAAADDDDDDD” represents a
letter on your computer
• Compression software would refer to this as “4A7D”
(4 characters instead of 11), saving about 64% of the space

2
Symbols and Encoding
• We assume a piece of source data that is subject to
compression (such as text, image, etc.) is
represented by a sequence of symbols. Each symbol
is encoded in a computer by a code (or codeword,
value), which is a bit string.
• Example:
– English text: abc (symbols), ASCII (coding)
– Chinese text: 多媒體 (symbols), BIG5 (coding)
– Image: color (symbols), RGB (coding)

3
Character distribution
• Some symbols are used more frequently than others.
• In English text, ‘e’ and space occur most often.
• Fixed-length encoding: use the same number of bits
to represent each symbol.
• With fixed-length encoding, to represent n symbols,
we need ⌈log2 n⌉ bits for each code (log2 n rounded up
to a whole number of bits).

4
Finding redundancy
• Most files are fairly redundant
• Instead of listing the same information over
and over, compression programs list the
information once and refer back to it when the
information appears again

5
Example

• “Ask not what your country can do for you —

ask what you can do for your country.”
• This quote has 17 words, with 61 letters, 16
spaces, one dash, and one period.
• Let’s say each character uses one unit of
memory; that’s a total of 79 units.
• Notice that many words appear several times.

6
• “Ask not what your country can do for you — ask what you
can do for your country.”
• Below is a sample dictionary and the compressed sentence.

[Figure: sample dictionary and compressed sentence]

• The file would now be 74 units long (37 for the
dictionary; 37 for the sentence). Applied to the whole
speech, the space savings would be much larger.

7
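
The idea of listing each word once and referring back to it can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scheme from the slide: the quote is lowercased and stripped of punctuation, every distinct word gets a dictionary entry, and the names `dictionary_compress` and `dictionary_decompress` are hypothetical. It therefore does not reproduce the 74-unit figure.

```python
def dictionary_compress(text):
    """Replace each word by the index of its first occurrence in a word list."""
    dictionary = []   # distinct words, in order of first appearance
    position = {}     # word -> index in dictionary
    tokens = []       # the sentence, rewritten as dictionary indices
    for word in text.split(" "):
        if word not in position:
            position[word] = len(dictionary)
            dictionary.append(word)
        tokens.append(position[word])
    return dictionary, tokens


def dictionary_decompress(dictionary, tokens):
    """Rebuild the original text from the word list and the index sequence."""
    return " ".join(dictionary[i] for i in tokens)


# Simplified copy of the quote: lowercase, no punctuation (an assumption of this sketch).
quote = "ask not what your country can do for you ask what you can do for your country"
dictionary, tokens = dictionary_compress(quote)
print(dictionary)
print(tokens)
```

The 17 words of the quote reduce to 9 dictionary entries plus 17 small indices, and decompression returns the text unchanged.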
Compression Techniques
• Lossless
– Data can be completely recovered after decompression
– Recovered data is identical to the original
– Exploits redundancy in data
• Lossy
– Data cannot be completely recovered after decompression
– Some information is lost forever
– Gives more compression than lossless
– Discards “insignificant” data components

8
File Compression
• Lossless compression loses no data and is
used for data backup.

9
File Compression (continued)

• Lossy compression is used for applications like sound
and video compression and causes minor loss of data.

10
File Compression (continued)

• The compression ratio is the ratio of the number of
bits in the original data to the number of bits in the
compressed data. For instance, if a data file contains
500,000 bytes and the compressed data contains 100,000
bytes, the compression ratio is 5:1.
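
As a small worked example, the ratio and the corresponding space saving can be computed in Python. The function names are illustrative; the second call reuses the “AAAADDDDDDD” to “4A7D” example from the start of the deck.

```python
def compression_ratio(original_size, compressed_size):
    """Original size divided by compressed size (5.0 means 5:1)."""
    return original_size / compressed_size


def space_saving(original_size, compressed_size):
    """Fraction of the original size that compression removed."""
    return 1 - compressed_size / original_size


print(compression_ratio(500_000, 100_000))        # the 5:1 example
print(round(space_saving(11, 4) * 100), "% saved")  # 11 characters down to 4
```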

11
RUN-LENGTH ENCODING
Data files frequently contain the same character
repeated many times in a row.

Example of run-length encoding. Each run of zeros is
replaced by two characters in the compressed file: a zero
to indicate that compression is occurring, followed by the
number of zeros in the run.
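
A minimal Python sketch of this zero-run scheme follows. It assumes the data is a list of byte values and caps each run at 255 so that the count fits in one byte. The function names and the sample data are made up for illustration.

```python
def rle_zeros_encode(data, max_run=255):
    """Replace every run of zeros by the pair (0, run length)."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 0
            # Count consecutive zeros, stopping at max_run so the count fits in a byte.
            while i < len(data) and data[i] == 0 and run < max_run:
                run += 1
                i += 1
            out += [0, run]
        else:
            out.append(data[i])
            i += 1
    return out


def rle_zeros_decode(encoded):
    """Expand every (0, run length) pair back into a run of zeros."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == 0:
            out.extend([0] * encoded[i + 1])
            i += 2
        else:
            out.append(encoded[i])
            i += 1
    return out


sample = [17, 8, 54, 0, 0, 0, 97, 5, 16, 0, 45, 23, 0, 0, 0, 0, 0, 3, 67]
packed = rle_zeros_encode(sample)
print(packed)
```

Note that a single isolated zero grows from one value to two, so the scheme only pays off when zeros tend to occur in runs.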

12
Run Length Encoding
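Original (38 symbols):
CTAAAAAGGGTCGTTTTTTGCCCGGGGGCCTCCCCCCC

Run-length encoded (21 symbols):
CT5A3GTCG6TG3C5GCCT7C

Runs of three or more identical symbols are written as a
count followed by the symbol; shorter runs are left unchanged.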
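
The encoding in this example can be reproduced with a short Python sketch. It assumes that runs shorter than three symbols are copied as they are and that the symbols themselves are never digits. The names `rle_encode`, `rle_decode` and `min_run` are illustrative.

```python
from itertools import groupby


def rle_encode(text, min_run=3):
    """Write runs of at least min_run symbols as <count><symbol>; copy shorter runs."""
    out = []
    for symbol, group in groupby(text):
        n = len(list(group))
        out.append(f"{n}{symbol}" if n >= min_run else symbol * n)
    return "".join(out)


def rle_decode(encoded):
    """Inverse of rle_encode, assuming the symbols are not digits."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch                      # collect a (possibly multi-digit) count
        else:
            out.append(ch * (int(count) if count else 1))
            count = ""
    return "".join(out)


dna = "CTAAAAAGGGTCGTTTTTTGCCCGGGGGCCTCCCCCCC"
encoded = rle_encode(dna)
print(encoded, len(dna), "->", len(encoded))
```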

13
Run Length Encoding (Contd.)
WWWBWWWWWBWWWBWWWWBWWWWWBWWWBWW
WWWBWWBWWWWWWBBBWWWWWWWBWBWWWWW
WWBWWBBWWWWWBWWWWBWWWWBWWWWB

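WWWBWWWWWBWWWBWWWWB…

3WB5WB3WB4WB… (each run of W is written as a count followed by W)

3151314… (possible optimization: W and B alternate, so only the run lengths are stored, but…)

#W3151314… (…the optimization requires an escape character, here “#”, followed by the symbol of the first run)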
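
One way to read the last line is sketched below in Python: for data with exactly two alternating symbols, only the run lengths are stored, prefixed by “#” and the symbol of the first run. This is an interpretation of the slide, not a documented format. It assumes every run is shorter than 10 so that each count is a single digit, and the function names are hypothetical.

```python
from itertools import groupby


def encode_two_symbol(text):
    """Store '#', the first symbol, then one digit per run (runs must be < 10)."""
    runs = [len(list(group)) for _, group in groupby(text)]
    if any(n > 9 for n in runs):
        raise ValueError("this sketch only handles runs shorter than 10")
    return "#" + text[0] + "".join(str(n) for n in runs)


def decode_two_symbol(encoded, symbols="WB"):
    """Rebuild the text by alternating between the two symbols."""
    if encoded[0] != "#":
        raise ValueError("missing escape character")
    current = encoded[1]
    other = symbols.replace(current, "")
    out = []
    for digit in encoded[2:]:
        out.append(current * int(digit))
        current, other = other, current   # the next run uses the other symbol
    return "".join(out)


row = "WWWBWWWWWBWWWBWWWWB"
print(encode_two_symbol(row))
```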

14
Variable - Length Encoding

• Variable-length encoding is a scheme in which the codes
are of different lengths. More frequently occurring symbols
are given shorter codes, and less frequently occurring
symbols are given longer codes.
• Example:
– Morse code (the letters e and t are the most frequent in
English, so they are assigned a dot (.) and a dash (-))
– Huffman encoding: a variable-length encoding in which
the lengths of the codes are based on the probability of
occurrence of the symbols (binary tree structure)

15
Variable - Length Encoding

Fixed-length encoding with a 5-bit binary representation (A = 1, B = 2, ..., Z = 26):

ABRACADABRA

00001 00010 10010 00001 00011 00001 00100 00001 00010 10010 00001 : 55 bits

D appears once but requires the same number of bits as A (which appears 5 times).

Encode more frequently used characters with as few bits as possible:

A: 0, B: 1, R: 01, C: 10, D: 11

0 1 01 0 10 0 11 0 1 01 0 : 15 bits + 10 delimiters

The delimiters are needed because the bits alone are ambiguous: 01 could be R or A followed by B.
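
Both encodings of ABRACADABRA can be reproduced with a few lines of Python. The helper names are illustrative. The sketch also shows why the short code needs delimiters: without them, two different messages give the same bits.

```python
def fixed_5bit(text):
    """Encode A..Z as their alphabet position (A = 1) in 5 bits each."""
    return " ".join(format(ord(ch) - ord("A") + 1, "05b") for ch in text)


# The short but ambiguous code from the slide.
short_code = {"A": "0", "B": "1", "R": "01", "C": "10", "D": "11"}


def encode(text, code, sep=""):
    """Concatenate the code words, optionally separated by a delimiter."""
    return sep.join(code[ch] for ch in text)


message = "ABRACADABRA"
fixed = fixed_5bit(message)
print(fixed, ":", len(fixed.replace(" ", "")), "bits")

delimited = encode(message, short_code, sep=" ")
print(delimited, ":", len(delimited.replace(" ", "")), "bits +", delimited.count(" "), "delimiters")

# Without delimiters, "AB" and "R" are indistinguishable.
print(encode("AB", short_code) == encode("R", short_code))
```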

16
Variable - Length Encoding
Delimiters are not needed if no code is a prefix of another.

A: 11, B: 00, C: 010, D: 10, R: 011

ABRACADABRA

1100011110101110110001111

Represent this code in a trie.

A trie with M leaf (external) nodes can be used to encode a message with M different characters.
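
The prefix property can be checked and used with a short Python sketch. The function names are illustrative. Decoding reads bits until they match a code word, which is unambiguous precisely because no code word is a prefix of another.

```python
prefix_code = {"A": "11", "B": "00", "C": "010", "D": "10", "R": "011"}


def is_prefix_free(code):
    """True if no code word is a prefix of another (checked on sorted neighbours)."""
    words = sorted(code.values())
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def encode(text, code):
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    """Read bits left to right, emitting a character whenever a code word is complete."""
    inverse = {word: ch for ch, word in code.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code word")
    return "".join(out)


bits = encode("ABRACADABRA", prefix_code)
print(bits, len(bits), "bits")
print(decode(bits, prefix_code))
```

This code needs 25 bits and no delimiters, compared with 55 bits for the fixed-length encoding.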

[Figure: tries with the characters A, B, C, D, R at the leaves]

17
Variable - Length Encoding
[Figure: two tries, labeled (a) and (b), with the characters A, B, C, D, R at the leaves]

The code for each character is determined by the path from the root to the
character, with 0 for “left” and 1 for “right”.

(a) produces 00 010 011 10 11 (the codes for B, C, R, D, A given earlier)

(b) produces 0 10 110 1110 1111
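
Reading codes off a trie can be sketched in Python with nested tuples, where each internal node is a `(left, right)` pair and each leaf is a character. Trie (a) below follows the code A: 11, B: 00, C: 010, D: 10, R: 011 given earlier. The placement of the letters in trie (b) is an assumption made for this sketch, since only its code lengths can be read from the slide.

```python
def codes_from_trie(node, prefix=""):
    """Walk the trie, appending 0 for a left branch and 1 for a right branch."""
    if isinstance(node, str):          # leaf: the path so far is this character's code
        return {node: prefix}
    left, right = node
    codes = codes_from_trie(left, prefix + "0")
    codes.update(codes_from_trie(right, prefix + "1"))
    return codes


trie_a = (("B", ("C", "R")), ("D", "A"))
# Hypothetical letter placement for the skewed trie (b).
trie_b = ("A", ("R", ("B", ("D", "C"))))

codes_a = codes_from_trie(trie_a)
codes_b = codes_from_trie(trie_b)
print(codes_a)
print(codes_b)
```

Listing the codes from the leftmost leaf to the rightmost gives the bit strings shown for (a) and (b).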

18
Huffman Coding (cont’d)
• Optimal code: minimizes the average number of
code symbols per source symbol.
• Forward Pass
1. Sort the symbols by probability
2. Combine the two lowest probabilities
3. Repeat step 2 until only two
probabilities remain.

19
Huffman Coding (cont’d)
• Backward Pass
Assign code symbols (0 and 1) going backwards through
the combinations, from the last one to the first.
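
The two passes can be sketched in Python with a priority queue from the standard `heapq` module. This is a minimal illustration, not the exact procedure from the slides: it uses symbol counts instead of probabilities, keeps merging until a single tree remains (the final merge joins the last two entries), and breaks ties by insertion order. The function name `huffman_codes` is an illustrative choice.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Build a Huffman code for the symbols of text; returns {symbol: bit string}."""
    tiebreak = count()  # unique counter so the heap never has to compare tree nodes
    heap = [(freq, next(tiebreak), sym) for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate case: one distinct symbol
        return {heap[0][2]: "0"}

    # Forward pass: repeatedly combine the two lowest frequencies.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    # Backward pass: walk back down from the last combination, assigning 0 and 1.
    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix

    assign(heap[0][2], "")
    return codes


message = "ABRACADABRA"
codes = huffman_codes(message)
encoded = "".join(codes[ch] for ch in message)
print(codes)
print(len(encoded), "bits")
```

For ABRACADABRA the result is 23 bits. The individual code words depend on how ties are broken, but the total length does not.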
