Data Compression
The Derby University
Schools of Art, Design & Technology
Division of Electronics and Sound
9/12/2010
CONTENTS
LOSSLESS COMPRESSION
LOSSY COMPRESSION
COMPRESSION TECHNIQUES
HUFFMAN CODING
LEMPEL-ZIV-WELCH COMPRESSION
CONCLUSION

TABLES AND FIGURES
Table 1
Table 2
Table 3
Table 4
Table 5
Figure 1
Connection Speed   Actual Speed   Transfer Time
Modem (56Kb)       7KB/s          4 hours
ADSL (128Kb)       16KB/s         1.7 hours
ADSL-1 (8Mb)       1MB/s          1.7 minutes
ADSL-2 (12Mb)      1.5MB/s        1.1 minutes
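As a rough check, the times in the table are consistent with transferring a payload of about 100 MB; the file size is an assumption here, since the report does not state it. A quick sketch of the arithmetic:

```python
# Transfer-time arithmetic behind the table above.
# Assumption: the file being moved is 100 MB (100,000 KB); the report does
# not state the size, but this value reproduces the listed times.

def transfer_seconds(file_kb: float, speed_kb_per_s: float) -> float:
    """Time in seconds to move file_kb kilobytes at the given throughput."""
    return file_kb / speed_kb_per_s

FILE_KB = 100_000  # assumed 100 MB payload

# Effective throughputs taken from the table (KB/s).
modem = transfer_seconds(FILE_KB, 7)      # ~4 hours
adsl = transfer_seconds(FILE_KB, 16)      # ~1.7 hours
adsl1 = transfer_seconds(FILE_KB, 1000)   # ~1.7 minutes
adsl2 = transfer_seconds(FILE_KB, 1500)   # ~1.1 minutes

print(round(modem / 3600, 1), "hours")  # 4.0 hours
```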
When it comes to data storage for personal or business use, data compression can make a
huge difference: when archiving data, compression reduces both cost and space.
There are two types of data compression, lossless and lossy, and both can be utilised in
many different applications.
Lossless compression
Lossless compression typically achieves around a 2:1 ratio. It is important when dealing
with critical information such as documents, applications and executable programs; for
these, the file after decompression must be identical to the original.
A disadvantage is that compression may make little difference in size, or the output can
even approach twice the size of the original file, so the compression ratio is comparatively
low.
Lossy compression
Lossy compression is used when some of the data can be discarded without a detrimental
effect on the final result. It is mainly used for audio, video and pictures, where the loss
of some information is insignificant. The advantages of this method are a higher compression
ratio and smaller file sizes; the disadvantage is that the discarded data cannot be
recovered, so the method is only suitable for particular applications.
Compression techniques
Huffman coding
Understanding what information is, and how it can be quantified, underpins information and
communication networks. In 1952, David A. Huffman was at the forefront of the development
of data compression; he understood the importance of information and how it could be
exploited.
In Huffman's concept, the probabilities of all the symbols sum to one; from this he devised
the method now referred to as Huffman coding. The method works on frequency of occurrence,
assigning a value to each ASCII symbol according to how often it repeats, see Fig (1).
Huffman coding is a lossless data compression method; for a given set of symbol frequencies
it generates the least possible amount of code.
Huffman coding is widely incorporated into popular software capable of running on
multi-platform systems, and is still in use today.
An example is given to better understand this coding method.
Example:
This example works on a very small scale to show how the method operates. The word
allarit is selected for Huffman coding.
A standard ASCII table, Table-1, is used to show the value of each letter with its
corresponding binary equivalent.
TABLE 1
Character   ASCII   Binary
a           097     01100001
l           108     01101100
r           114     01110010
i           105     01101001
t           116     01110100
The standard ASCII value of each letter is eight bits, which totals 56 bits to make up
the word allarit.
The word allarit in terms of bits, without data compression, is:
(01100001011011000110110001100001011100100110100101110100)
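This 56-bit string can be reproduced with a short sketch (plain Python, just to check the arithmetic):

```python
# Encode each character of "allarit" as its 8-bit ASCII value and
# concatenate, reproducing the 56-bit uncompressed string above.
word = "allarit"
bits = "".join(format(ord(ch), "08b") for ch in word)

print(len(bits))  # 56
print(bits)
```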
TABLE 2
Character   Frequency   Probability    Value in bits
a           2           2/7 = 0.285    8*2 = 16 bits
l           2           2/7 = 0.285    8*2 = 16 bits
r           1           1/7 = 0.142    8 bits
i           1           1/7 = 0.142    8 bits
t           1           1/7 = 0.142    8 bits
The third step is to draw the Huffman tree. To do this, the letters are arranged in
descending order of probability: (a, l, i, r, t).
First, the two letters with the lowest assigned values are branched into a new node whose
value is the sum of the two previous nodes, and this is repeated until a single root node
remains.
The fourth step is to assign the binary values (0) and (1) to each branch: for every left
branch put a zero (0) and for every right branch put a one (1), following the branches
until each letter is reached, see Fig (1).
FIGURE 1: Huffman tree for the word allarit (figure not reproduced)
TABLE 3
Character   Huffman code
a           00
l           01
i           10
r           110
t           111
(Code values reconstructed; any valid Huffman tree for these frequencies produces codes of these lengths.)
The maximum bit length of the Huffman binary codes in Table-3 is three bits. The total
comes to 16 bits, compared with 56 bits originally, which makes the encoded word roughly
29 percent of its uncompressed size, a reduction of about 71 percent.
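The construction above can be sketched with a generic heap-based Huffman coder (an illustration, not the report's own code; tie-breaking can produce different codes of the same lengths, so only the total bit count is fixed):

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict:
    """Build a Huffman code table from symbol frequencies in `text`."""
    freq = Counter(text)
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest-probability node -> 0
        f2, _, right = heapq.heappop(heap)  # next lowest             -> 1
        merged = {ch: "0" + c for ch, c in left.items()}
        merged.update({ch: "1" + c for ch, c in right.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("allarit")
encoded = "".join(codes[ch] for ch in "allarit")
print(len(encoded))  # 16 bits, down from 56
```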
Lempel-Ziv-Welch compression
Abraham Lempel and Jacob Ziv were the first to introduce Lempel-Ziv data compression (LZ77
and LZ78). In 1984, Terry A. Welch published an improved version, Lempel-Ziv-Welch (LZW),
a derivative of LZ78. Lempel-Ziv-Welch (LZW) is a lossless data compression method and is
used for various data formats, e.g. GIF and TIFF.
Terry Welch designed the algorithm so that the encoder and decoder produce identical
dictionaries. The LZW algorithm starts with a preset symbol dictionary of the 256
individual byte values (entries 0-255). As the data stream is processed, the preset
dictionary is extended with entries 256-4095 for newly seen redundant strings, referred to
as substrings; each dictionary code is 12 bits.
LZW creates a table of strings of highly redundant data and assigns each string an index
ID. If a string of data repeats, it is replaced by its substring index ID, reducing the
size of the initial data.
Example:
For demonstration purposes, suppose a file contains the string (ALLARIT), which has been
assigned a substring index ID, see Table-4 below. Replacing every occurrence of the string
(ALLARIT) with its substring ID would reduce the file size by a great deal; the higher the
substring redundancy, the higher the resulting compression ratio.
TABLE 4
String      String ID
ALLARIT     275
Another example, Table-5, illustrates step by step how data is encoded using the LZW
method:
TABLE 5 (table not reproduced)
This process could generate a huge dictionary if the data were highly repetitive, but
because of the 4 Kbyte dictionary size limit implemented by Terry Welch, once the
dictionary is full no further entries can be added. LZW lossless compression is of great
benefit for encoding English text, where file size can be reduced by more than fifty
percent.
Decoding takes place in much the same way as encoding. The LZW algorithm does not need the
string table to be transmitted for decompression; instead, the output stream of the
compression algorithm is fed in as input data and the table is rebuilt, producing a
dictionary identical to the compression table. This method is simple and increases the
speed of both the encoding and decoding processes.
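The encode/decode round trip can be sketched with a minimal, textbook-style LZW pair (an illustration, not Welch's exact implementation; for simplicity it works on strings and lets the dictionary grow without the 12-bit limit):

```python
def lzw_encode(data: str) -> list:
    """LZW compress: emit dictionary indices; new entries start at 256."""
    table = {chr(i): i for i in range(256)}  # preset single-byte dictionary
    next_code = 256
    out, current = [], ""
    for ch in data:
        if current + ch in table:
            current += ch                    # grow the matched substring
        else:
            out.append(table[current])       # emit longest known string
            table[current + ch] = next_code  # learn the new substring
            next_code += 1
            current = ch
    if current:
        out.append(table[current])
    return out

def lzw_decode(codes: list) -> str:
    """Rebuild the same dictionary from the code stream; no table is sent."""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # The one tricky case: a code not yet in the table (cScSc pattern).
        entry = table.get(code, prev + prev[0])
        out.append(entry)
        table[next_code] = prev + entry[0]   # learn as the encoder did
        next_code += 1
        prev = entry
    return "".join(out)

encoded = lzw_encode("ALLARITALLARIT")
print(len(encoded))                                # 11 codes for 14 bytes
print(lzw_decode(encoded) == "ALLARITALLARIT")     # True
```

The second occurrence of ALLARIT is covered by substrings learned during the first, which is why the code stream is shorter than the input.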
Conclusion
The evolution of information processing has resulted in great discoveries and inventions.
At present there is an information overload, due to technological advances in multimedia
sharing, online gaming, virtual reality and the interaction of people around the world,
regardless of time.
Data compression has made a big difference when it comes to storing or transferring
information. There is no absolute preference between lossless and lossy compression; both
do their job well.
Lossless compression is important when archiving critical information, but its compression
ratio is limited. Lossy compression has a greater compression ratio but sacrifices
fidelity.
The Huffman method is a very efficient compression for documents and program files, with a
typical reduction in file size of 20 to 30 percent. It produces the least amount of code
for the given symbol frequencies, uses little memory, and compression and decompression
are fast. Its disadvantage is that little about it can be changed, because data integrity
must be preserved.
The Lempel-Ziv-Welch compression is a lossless method that is ideal for text and graphical
information where a higher compression ratio is required. It works best with files that
have lots of repetition, such as text and monochrome images, and LZW compression is fast.
Its disadvantage is that a file's size can grow, even nearly double, if there is no
repetition.
The information revolution has changed the way we share information, but current methods
still leave something to be desired. Perhaps soon we will witness a more spectacular
discovery in data compression, one which overcomes all the issues we face with current
methods; only time will tell what the future holds.