Fast Lempel-ZIV (LZ'78) Algorithm Using Codebook Hashing: Megha Atwal, Lovnish Bansal
220 www.erpublication.org
Figure 1: The Lempel-Ziv Algorithm Family [6].
A. Principal
LZ78 is a dictionary-based compression algorithm that maintains an
explicit dictionary. The codewords output by the algorithm consist of two
elements: an index referring to the longest matching dictionary entry and
the first non-matching symbol. In addition to outputting the codeword for
storage/transmission, the algorithm also adds the index and symbol pair to
the dictionary. When a symbol that is not yet in the dictionary is
encountered, the codeword has the index value 0 and the symbol is added to
the dictionary as well. With this method, the algorithm gradually builds
up a dictionary [10].
This simplified pseudo-code version of the algorithm does not prevent the
dictionary from growing forever. There are various solutions to limit
dictionary size, the easiest being to stop adding entries and continue
like a static dictionary coder, or to throw the dictionary away and start
from scratch after a certain number of entries has been reached.
IV. OBJECTIVE
VI. RESULTS
A. Test Case #1
Test Case #1 uses a string of 100,000 characters as input to the LZ78
encoder. The following sub-sections illustrate the results of Test Case #1
using LZ78 with and without hashing. The GUI was built using Visual
Studio 2012.
Fig. 2: Test Case #1 LZ78 Coding (Without Hashing)
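The two encoder variants compared in these test cases can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the paper does not give its hashing scheme, so a built-in dict stands in for the hashed codebook and a plain list scan stands in for the unhashed one. All function names are hypothetical. Both variants emit the (index, symbol) codewords described under "A. Principal" and differ only in how the codebook is searched.

```python
from typing import List, Tuple

# A codeword is (index of longest matching entry, first non-matching symbol).
Codeword = Tuple[int, str]


def lz78_encode_hashed(text: str) -> List[Codeword]:
    """LZ78 encoder whose codebook is a hash table (assumption: Python dict)."""
    codebook = {}  # phrase -> 1-based index; index 0 means the empty phrase
    output = []
    phrase = ""
    for symbol in text:
        candidate = phrase + symbol
        if candidate in codebook:  # hash lookup, expected constant time
            phrase = candidate     # keep extending the current match
        else:
            output.append((codebook.get(phrase, 0), symbol))
            codebook[candidate] = len(codebook) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit (index of its prefix, last symbol).
        output.append((codebook.get(phrase[:-1], 0), phrase[-1]))
    return output


def lz78_encode_linear(text: str) -> List[Codeword]:
    """Same encoder, but the codebook is a list searched by a linear scan."""
    codebook = []  # entry at position i has index i + 1

    def index_of(p: str) -> int:
        return codebook.index(p) + 1 if p else 0  # linear scan

    output = []
    phrase = ""
    for symbol in text:
        candidate = phrase + symbol
        if candidate in codebook:  # linear scan over all entries
            phrase = candidate
        else:
            output.append((index_of(phrase), symbol))
            codebook.append(candidate)
            phrase = ""
    if phrase:
        output.append((index_of(phrase[:-1]), phrase[-1]))
    return output


def lz78_decode(codewords: List[Codeword]) -> str:
    """Rebuild the text by regenerating the same dictionary on the fly."""
    phrases = [""]  # index 0 is the empty phrase
    pieces = []
    for index, symbol in codewords:
        phrase = phrases[index] + symbol
        pieces.append(phrase)
        phrases.append(phrase)
    return "".join(pieces)


if __name__ == "__main__":
    message = "ABABABA"
    codewords = lz78_encode_hashed(message)
    print(codewords)               # [(0, 'A'), (0, 'B'), (1, 'B'), (3, 'A')]
    print(lz78_decode(codewords))  # ABABABA
```

Because both variants make the same matching decisions, they produce identical codewords. Only the lookup cost changes, which is consistent with the claim that hashing reduces encoding time without changing the compression ratio.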
International Journal of Engineering and Technical Research (IJETR)
ISSN: 2321-0869, Volume-3, Issue -3, March 2015
The Encode button encodes the entered text using LZ78 with hashing (as the
Use Hashing checkbox is checked). The rest of the encoding and decoding
process is the same as explained in the previous section for LZ78 without
hashing.
B. Test Case #2
Unlike the previous test case, Test Case #2 uses a string of 200,000
characters as input to the LZ78 encoder and performs encoding with and
without hashing. The following two figures show the test results.
Fig. 4: Test Case #2 LZ78 Coding (Without Hashing)
Fig. 5: Test Case #2 LZ78 Coding (With Hashing)
VII. COMPARISON
We compare the above test cases using two parameters, viz., encoding time
and compression ratio. Encoding time should be as small as possible, and
the compression ratio should ideally be less than 100%. A lower ratio
implies that the encoded sequence length is less than the original message
length (in binary) and that the data is compressed.
The following tables summarize the results obtained from the previous test
cases for base-2 encoding only.
Table 1.1: Compression ratio comparison for different message lengths.
Input Binary      Encoded           Code Book    Compression
Message Length    Message Length    Entries      Ratio
801,096           611,609           39,833       76.35 %
1,602,200         1,170,977         72,337       73.09 %
2,402,448 1.562 seconds 319.818 seconds
VIII. CONCLUSION
In this paper we presented a source coding scheme that we call Hashed
Lempel-Ziv coding, as an extension of the LZ78 coding scheme, without
sacrificing the coding efficiency and the compression ratio attained by
the original LZ78 algorithm.
In addition to outputting the codeword for storage/transmission, the
algorithm also adds the index and symbol pair to the dictionary. When a
symbol that is not yet in the dictionary is encountered, the codeword has
the index value 0 and the symbol is added to the dictionary as well. With
this method, the algorithm gradually builds up a dictionary.
But LZ78 has several weaknesses. First of all, the dictionary grows
without bounds. Various methods have been introduced to prevent this, the
easiest being either to become static once the dictionary is full or to
throw away the dictionary and start creating a new one from scratch. There
are also more sophisticated techniques to prevent the
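The two simplest ways of bounding the dictionary mentioned above, freezing it once it is full or discarding it and starting again, might look like the following sketch. It is an illustration under stated assumptions rather than the scheme used in the paper: the names `lz78_encode_bounded`, `lz78_decode_bounded`, `max_entries` and `policy` are hypothetical, and a Python dict again stands in for the hashed codebook. The decoder has to apply the same rule at the same moment so that both sides keep identical dictionaries.

```python
from typing import List, Tuple

Codeword = Tuple[int, str]  # (index of longest match, first non-matching symbol)


def lz78_encode_bounded(text: str, max_entries: int,
                        policy: str = "reset") -> List[Codeword]:
    """LZ78 encoder with a size-limited codebook.

    policy="freeze": stop adding entries once the codebook is full.
    policy="reset":  throw the full codebook away and start from scratch.
    """
    if policy not in ("freeze", "reset"):
        raise ValueError("policy must be 'freeze' or 'reset'")
    codebook = {}  # phrase -> 1-based index; 0 means the empty phrase
    output = []
    phrase = ""
    for symbol in text:
        candidate = phrase + symbol
        if candidate in codebook:
            phrase = candidate
            continue
        output.append((codebook.get(phrase, 0), symbol))
        if len(codebook) < max_entries:
            codebook[candidate] = len(codebook) + 1
        elif policy == "reset":
            codebook.clear()
        # policy == "freeze": the full codebook is left unchanged
        phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit (index of its prefix, last symbol).
        output.append((codebook.get(phrase[:-1], 0), phrase[-1]))
    return output


def lz78_decode_bounded(codewords: List[Codeword], max_entries: int,
                        policy: str = "reset") -> str:
    """Decoder that mirrors the encoder's freeze/reset rule step by step."""
    if policy not in ("freeze", "reset"):
        raise ValueError("policy must be 'freeze' or 'reset'")
    phrases = [""]  # index 0 is the empty phrase
    pieces = []
    for index, symbol in codewords:
        phrase = phrases[index] + symbol
        pieces.append(phrase)
        if len(phrases) - 1 < max_entries:
            phrases.append(phrase)
        elif policy == "reset":
            phrases = [""]
    return "".join(pieces)


if __name__ == "__main__":
    message = "ABABABA"
    for policy in ("freeze", "reset"):
        codewords = lz78_encode_bounded(message, max_entries=2, policy=policy)
        restored = lz78_decode_bounded(codewords, max_entries=2, policy=policy)
        print(policy, codewords, restored == message)
```

A smaller `max_entries` caps memory use and lookup cost but generally yields more codewords for the same input, so the limit trades compression ratio against resource use.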
REFERENCES
[1] A.-R. Elabdalla and M. Irshid, "An efficient bit-wise source encoding
technique based on source mapping," in Devices, Circuits and
Systems, 2000. Proceedings of the 2000 Third IEEE International
Caracas Conference on, 2000.
[2] D. Kirovski and Z. Landau, "Generalized Lempel-Ziv compression
for audio," Audio, Speech, and Language Processing, IEEE
Transactions on, vol. 15, no. 2, pp. 509-518, 2007.
[3] M. B. B. Loranca and L. A. O. Santos, "Multiple correspondences and
log-linear adjustment to test single correspondence relationships and
categorization of qualitative data in e-commerce," in Electronics,
Communications and Computers, 2004. CONIELECOMP 2004. 14th
International Conference on, 2004.
[4] C. E. Shannon, "A mathematical theory of communication," ACM
SIGMOBILE Mobile Computing and Communications Review, vol. 5,
no. 1, pp. 3-55, 2001.
[5] J. Ziv and A. Lempel, "Compression of individual sequences via
variable-rate coding," Information Theory, IEEE Transactions on,
vol. 24, no. 5, pp. 530-536, 1978.
[6] K. Sayood, Introduction to data compression, Newnes, 2012.
[7] T. Weissman, N. Merhav and A. Somekh-Baruch, "Twofold universal
prediction schemes for achieving the finite-state predictability of a
noisy individual binary sequence," Information Theory, IEEE
Transactions on, vol. 47, no. 5, pp. 1849-1866, 2001.
[8] T. A. Welch, "A technique for high-performance data compression,"
Computer, vol. 17, no. 6, pp. 8-19, 1984.
[9] M. E. Hellman, "An extension of the Shannon theory approach to
cryptography," Information Theory, IEEE Transactions on, vol. 23,
no. 3, pp. 289-294, 1977.
[10] R. N. Williams, "An extremely fast Ziv-Lempel data compression
algorithm," in Data Compression Conference, 1991. DCC'91., 1991.