Lossless Data Compression Algorithm Abraham Lempel Jacob Ziv Terry Welch LZ78

Lempel-Ziv-Welch (LZW) is a lossless data compression algorithm that builds a dictionary of strings as it encodes the input data. It replaces strings in the input with codes for strings in the dictionary, allowing longer and longer strings to be encoded with each code. The decompressor builds the same dictionary from the codes to reconstruct the original input data. The algorithm achieves compression by coding repeated strings more concisely than if they were encoded character by character.

Uploaded by

rlnandha_2006

Lempel-Ziv-Welch

Lempel-Ziv-Welch (LZW) is a universal lossless data compression algorithm created by
Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an
improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978.
The algorithm is designed to be fast to implement but is not usually optimal because it
performs only limited analysis of the data.

The algorithm in a nutshell

The algorithm takes a string as input and processes it using a dictionary.
The string is encoded (and the dictionary grows) as the string is being processed. Initially, the
dictionary contains all the possible characters (the alphabet) with their corresponding encodings.
The algorithm takes the longest word w in the dictionary that matches the next characters
of the string and encodes this part of the string with the code of w. It then takes the character
c that follows w in the string and adds wc to the dictionary. This repeats until the string is
consumed. The idea is that, as the string is being processed, we populate the dictionary with
longer strings, allowing encoding of bigger chunks of the string at each replacement.

Alternative description of the algorithm

The compressor algorithm builds a string translation table from the text being compressed. The
string translation table maps fixed-length codes (usually 12-bit) to strings. The string table is
initialized with all single-character strings (256 entries in the case of 8-bit characters). As the
compressor character-serially examines the text, it stores every unique two-character string into
the table as a code/character concatenation, with the code mapping to the corresponding first
character. As each two-character string is stored, the first character is sent to the output.
Whenever a previously-encountered string is read from the input, the longest such previously-
encountered string is determined, and then the code for this string concatenated with the
extension character (the next character in the input) is stored in the table. The code for this
longest previously-encountered string is output and the extension character is used as the
beginning of the next word.
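
The code/character representation of the string table described above can be sketched as follows. This is a minimal illustration, not Welch's implementation: the function name lzw_compress_pairs is hypothetical, single bytes are assumed to use their own value (0-255) as their code, and the 12-bit limit on code size is deliberately left out.

```python
def lzw_compress_pairs(data: bytes) -> list:
    """Compress bytes, storing each table entry as (prefix code, next byte)."""
    table = {}          # (prefix_code, next_byte) -> code for the longer string
    next_code = 256     # codes 0..255 are the single-byte strings (assumption)
    output = []
    it = iter(data)
    try:
        w = next(it)    # code of the current matched string
    except StopIteration:
        return output   # empty input produces no codes
    for c in it:
        key = (w, c)
        if key in table:
            w = table[key]          # the longer string is already known
        else:
            output.append(w)        # emit the longest known string
            table[key] = next_code  # store code + extension character
            next_code += 1
            w = c                   # extension character starts the next word
    output.append(w)
    return output


print(lzw_compress_pairs(b"ABABABA"))  # [65, 66, 256, 258]
```

Storing a pair of integers instead of the whole string keeps every table entry the same size, which is what makes the fixed-length table in the description practical.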

The decompressor algorithm only requires the compressed text as an input, since it can build an
identical string table from the compressed text as it is recreating the original text. However, an
abnormal case shows up whenever the sequence character/string/character/string/character
(with the same character for each character and string for each string) is encountered in the input
and character/string is already stored in the string table. When the decompressor reads the code
for character/string/character in the input, it cannot resolve it because it has not yet stored this
code in its table. This special case can be dealt with because the decompressor knows that the
extension character is the previously-encountered character.[1]

Algorithm

Compressor algorithm:

w = NIL;
add all possible charcodes to the dictionary
for (every character c in the uncompressed data) do
    if ((w + c) exists in the dictionary) then
        w = w + c;
    else
        add the dictionary code for w to output;
        add (w + c) to the dictionary;
        w = c;
    endif
done
add the dictionary code for w to output;
display output;
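
The compressor pseudocode above translates almost line for line into Python. The sketch below is only an illustration: the function name lzw_compress and the alphabet parameter are assumptions, codes are handed out in the order the characters appear in alphabet, and every input character is assumed to belong to that alphabet.

```python
def lzw_compress(data: str, alphabet: str) -> list:
    """Return the list of dictionary codes that encode `data`."""
    # Initial dictionary: one code per single character, numbered from 0.
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    w = ""
    output = []
    for c in data:
        if w + c in dictionary:
            w = w + c                            # keep extending the match
        else:
            output.append(dictionary[w])         # emit the longest match
            dictionary[w + c] = len(dictionary)  # next free code
            w = c
    if w:
        output.append(dictionary[w])             # flush the final match
    return output


print(lzw_compress("ABABABA", "AB"))  # [0, 1, 2, 4]
```

With the two-letter alphabet A = 0, B = 1, the seven input characters are reduced to four codes because AB (code 2) and ABA (code 4) are added to the dictionary while encoding.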

Decompressor algorithm:

add all possible charcodes to the dictionary
read a code k;
entry = dictionary entry for k;
output entry;
w = entry;
while (read a code k) do
    if (index k exists in dictionary) then
        entry = dictionary entry for k;
    else if (k == currSizeDict) then
        entry = w + w[0];
    else
        signal invalid code;
    endif
    output entry;
    add w + entry[0] to the dictionary;
    w = entry;
done
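
A Python rendering of the decompressor pseudocode might look like the sketch below. The name lzw_decompress and the alphabet parameter are assumptions made for illustration; the initial dictionary is assumed to number the characters of alphabet from 0, matching whatever the compressor used.

```python
def lzw_decompress(codes: list, alphabet: str) -> str:
    """Rebuild the original string from a list of LZW codes."""
    dictionary = {i: ch for i, ch in enumerate(alphabet)}
    if not codes:
        return ""
    w = dictionary[codes[0]]
    result = [w]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        elif k == len(dictionary):
            # The code refers to the entry that is about to be created:
            # it must be w followed by the first character of w.
            entry = w + w[0]
        else:
            raise ValueError(f"invalid code: {k}")
        result.append(entry)
        dictionary[len(dictionary)] = w + entry[0]
        w = entry
    return "".join(result)


print(lzw_decompress([0, 1, 2, 4], "AB"))  # ABABABA
```

The last code in this usage example, 4, is not yet in the dictionary when it arrives, so it exercises the `k == len(dictionary)` branch.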

Example

This example shows the LZW algorithm in action, showing the status of the output and the
dictionary at every stage, both in encoding and decoding the message. In order to keep things
clear, let us assume that we're dealing with a simple alphabet - capital letters only, and no
punctuation or spaces. This example has been constructed to give reasonable compression on a
very short message; when used on real data, repetition is generally less pronounced, and so the
initial parts of a message will see little compression. As the message grows, however, the
compression ratio tends asymptotically to the maximum.[2] A message to be sent might then look
like the following:

TOBEORNOTTOBEORTOBEORNOT#

The # is a marker used to show that the end of the message has been reached. Clearly, then, we
have 27 symbols in our alphabet (the 26 capital letters A through Z, plus the # character). A
computer will render these as strings of bits; 5-bit strings are needed to give sufficient
combinations to encompass the entire dictionary. As the dictionary grows, the strings will need
to grow in length to accommodate the additional entries. A 5-bit string gives 2^5 = 32 possible
combinations of bits, and so when the 33rd dictionary word is created, the algorithm will have to
start using 6-bit strings (for all strings, including those which were previously represented by
only five bits). Note that since the all-zero string 00000 is used, and is labeled "0", the 33rd
dictionary entry will be labeled 32. The initial dictionary, then, will consist of the following:

# = 00000
A = 00001
B = 00010
C = 00011
.
.
.
Z = 11010
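
The initial dictionary and the bit-width argument can be reproduced with a few lines of Python. This is a sketch under the numbering used in the listing above (# = 0, A = 1, ..., Z = 26); the helper name bits_needed is hypothetical.

```python
ALPHABET = "#ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # 27 symbols, '#' is code 0


def bits_needed(n_entries: int) -> int:
    """Smallest code width that can label entries 0 .. n_entries - 1."""
    return max(1, (n_entries - 1).bit_length())


# Initial dictionary: each symbol mapped to its 5-bit code.
initial = {ch: format(i, "05b") for i, ch in enumerate(ALPHABET)}

print(initial["#"], initial["A"], initial["Z"])  # 00000 00001 11010
print(bits_needed(27), bits_needed(32), bits_needed(33))  # 5 5 6
```

Five bits are enough for up to 32 entries (labels 0 to 31); the 33rd entry, labeled 32, is the first that needs a sixth bit.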

Encoding

If we weren't using LZW, and just sent the message as it stands (25 symbols at 5 bits each), it
would require 125 bits. We will be able to compare this figure to the LZW output later. We are
now in a position to apply LZW to the message.

Symbol:   Bit Code:        New Dictionary Entry:
          (= output)

T         20 = 10100
O         15 = 01111       27: TO   <--- the initial 27 symbols are numbered 0-26, so the first new entry is 27
B          2 = 00010       28: OB
E          5 = 00101       29: BE
O         15 = 01111       30: EO   <--- codes after this one are 6 bits wide
R         18 = 010010      31: OR
N         14 = 001110      32: RN
O         15 = 001111      33: NO
T         20 = 010100      34: OT
TO        27 = 011011      35: TT
BE        29 = 011101      36: TOB
OR        31 = 011111      37: BEO
TOB       36 = 100100      38: ORT
EO        30 = 011110      39: TOBE
RN        32 = 100000      40: EOR
OT        34 = 100010      41: RNO
#          0 = 000000      42: OT#

The same steps, laid out by current sequence and next character, are somewhat clearer:

Current    Next   Output Value    Extended
Sequence   Char   (# of bits)     Dictionary

NULL       T
T          O      20 = 5 bits     27: TO    <-- This IS the 28th entry, but the initial entries are numbered 0-26 so this is #27.
O          B      15 = 5 bits     28: OB
B          E       2 = 5 bits     29: BE
E          O       5 = 5 bits     30: EO
O          R      15 = 5 bits     31: OR
R          N      18 = 6 bits     32: RN    <-- Starting at R, 6 bits are used.
N          O      14 = 6 bits     33: NO
O          T      15 = 6 bits     34: OT
T          T      20 = 6 bits     35: TT
TO         B      27 = 6 bits     36: TOB
BE         O      29 = 6 bits     37: BEO
OR         T      31 = 6 bits     38: ORT
TOB        E      36 = 6 bits     39: TOBE
EO         R      30 = 6 bits     40: EOR
RN         O      32 = 6 bits     41: RNO
OT         #      34 = 6 bits     42: OT#
#                  0 = 6 bits

The number of bits used for each output is floor(lg2(init_dict_size + num_chars_output)) + 1,
where num_chars_output counts the codes already output before the current one. For example:

O (fifth output): floor(lg2(27 + 4)) + 1 = 5 bits -> 01111
R (sixth output): floor(lg2(27 + 5)) + 1 = 6 bits -> 010010

Total Length = 5*5 + 12*6 = 97 bits.
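
The total can be checked by applying the width rule floor(lg2(init_dict_size + num_chars_output)) + 1 to each of the 17 values in the Output Value column. The sketch below assumes that rule exactly as stated in the table; the names code_width and bitstream are illustrative only.

```python
INIT_DICT_SIZE = 27  # '#' plus the letters A-Z

# Output values taken from the "Output Value" column of the table above.
codes = [20, 15, 2, 5, 15, 18, 14, 15, 20, 27, 29, 31, 36, 30, 32, 34, 0]


def code_width(codes_already_output: int) -> int:
    # For n > 0, n.bit_length() equals floor(lg2(n)) + 1.
    return (INIT_DICT_SIZE + codes_already_output).bit_length()


widths = [code_width(k) for k in range(len(codes))]
total_bits = sum(widths)

# Concatenate the codes, each padded to its own width.
bitstream = "".join(format(c, f"0{w}b") for c, w in zip(codes, widths))

print(widths.count(5), widths.count(6), total_bits)  # 5 12 97
print(25 * 5 - total_bits)  # 28 bits saved against 25 symbols at 5 bits each
```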

In using LZW we have made a saving of 28 bits out of 125; that is, we have reduced the message by
about 22%. If the message were longer, then the dictionary words would begin to represent
longer and longer sections of text, allowing repeated words to be sent very compactly.

Decoding

Imagine now that we have received the message produced above, and wish to decode it. We need
to know in advance the initial dictionary used, but we can reconstruct the additional entries as we
go, since they are always simply concatenations of previous entries.

Bits:            Output:   New Entry:
                           Full:       Partial:

10100  = 20      T                     27: T?
01111  = 15      O         27: TO      28: O?
00010  = 2       B         28: OB      29: B?
00101  = 5       E         29: BE      30: E?
01111  = 15      O         30: EO      31: O?     <--- codes after this one are 6 bits wide
010010 = 18      R         31: OR      32: R?
001110 = 14      N         32: RN      33: N?
001111 = 15      O         33: NO      34: O?
010100 = 20      T         34: OT      35: T?
011011 = 27      TO        35: TT      36: TO?    <--- for 35, only add the 1st element of the next dictionary word
011101 = 29      BE        36: TOB     37: BE?
011111 = 31      OR        37: BEO     38: OR?
100100 = 36      TOB       38: ORT     39: TOB?
011110 = 30      EO        39: TOBE    40: EO?
100000 = 32      RN        40: EOR     41: RN?
100010 = 34      OT        41: RNO     42: OT?
000000 = 0       #

The only slight complication comes if the newly-created dictionary word is sent immediately. In
the decoding example above, when the decoder receives the first symbol, T, it knows that
symbol 27 begins with a T, but what does it end with? The problem is illustrated below. We are
decoding part of a message that reads ABABA:

Bits:            Output:   New Entry:
                           Full:         Partial:

.
.
.
011101 = 29      AB        46: (word)    47: AB?
101111 = 47      AB?       <--- what do we do here?

At first glance, this may appear to be asking the impossible of the decoder. We know ahead of
time that entry 47 should be ABA, but how can the decoder work this out? The critical step is to
note that 47 is built out of 29 plus whatever comes next. 47, therefore, ends with "whatever
comes next". But, since it was sent immediately, it must also start with "whatever comes next",
and so must end with the same symbol it starts with, namely A. This trick allows the decoder to
see that 47 must be ABA.

More generally, the situation occurs whenever the encoder encounters input of the form
cScSc, where c is a single character, S is a string and cS is already in the dictionary. The encoder
outputs the symbol for cS and puts a new symbol for cSc in the dictionary. Next it sees cSc in
the input and sends the new symbol it has just inserted into the dictionary. By the reasoning
presented in the above example, this is the only case where the newly-created symbol is sent
immediately.
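
The cScSc case can be observed directly by having a decoder record every code that arrives before its dictionary entry exists. The sketch below is an illustration only: decode_with_trace is a hypothetical name, and the two-letter alphabet (A = 0, B = 1) is chosen to keep the trace short.

```python
def decode_with_trace(codes: list, alphabet: str):
    """Decode LZW codes; also return (code, string) for each not-yet-known code."""
    dictionary = {i: ch for i, ch in enumerate(alphabet)}
    w = dictionary[codes[0]]
    text = [w]
    special = []
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        elif k == len(dictionary):
            # The entry being built right now: it starts with w, and it must
            # end with its own first character, which is w[0].
            entry = w + w[0]
            special.append((k, entry))
        else:
            raise ValueError(f"invalid code: {k}")
        text.append(entry)
        dictionary[len(dictionary)] = w + entry[0]
        w = entry
    return "".join(text), special


# Codes for "ABABABA": the tail "ABABA" has the form cScSc with c = A, S = B.
print(decode_with_trace([0, 1, 2, 4], "AB"))  # ('ABABABA', [(4, 'ABA')])
```

Code 4 is received one step before the decoder would normally finish building entry 4, and it is resolved as AB followed by A, exactly as in the argument above.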
