Improved Dynamic Bit Reduction Algorithm: A Hybrid Approach for Text Data Compression
Abstract: Data compression is an area that deserves the utmost attention, and many different methodologies have been defined for it. Choosing the best compression algorithm is therefore important: with a whole range of data compression techniques available, working both online and offline, it becomes genuinely difficult to decide which technique serves best. Hence the need to choose the right method for text compression, and for an algorithm that can reveal the best tool among the given ones. The goal is a data compression algorithm that consumes less time while providing a higher compression ratio than existing techniques. In this paper we present a hybrid approach to compressing text data, a combination of the Dynamic Bit Reduction method and Huffman coding.
Keywords: Text data compression, Dynamic Bit Reduction method, Huffman coding, lossless data compression.
1. INTRODUCTION
Data compression is a process by which a file (text, audio, or video) is transformed into another (compressed) file, such that the original file can be fully recovered from the compressed file without any loss of actual information. This is useful when one wants to save storage space: for example, before storing a 4 MB file, it may be preferable to compress it to a smaller size.
Compressed files are also much more easily exchanged over the Internet, since they upload and download much faster; we only require the ability to reconstitute the original file from the compressed version at any time. Data compression is a method of encoding data that allows a substantial reduction in the total number of bits needed to store or transmit a file. The more information being dealt with, the more it costs in terms of storage and transmission. In short, data compression is the process of encoding data in fewer bits than the original representation, so that it takes less storage space and less transmission time when communicating over a network [1].
Data compression is possible because most real-world data is highly redundant. It is basically defined as a technique that reduces the size of data by applying methods that are either lossy or lossless [1]. A compression program converts data from an easy-to-use format to one optimized for compactness; likewise, a decompression program returns the information to its original form.
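As a concrete illustration of this round trip, the following minimal sketch (using Python's standard zlib module; the sample text is our own, not from the paper) compresses a byte string and recovers it exactly:

    import zlib

    text = b"the quick brown fox jumps over the lazy dog " * 100

    compressed = zlib.compress(text)        # encode in fewer bits
    restored = zlib.decompress(compressed)  # reconstitute the original

    assert restored == text                 # lossless: nothing is lost
    print(len(text), "->", len(compressed), "bytes")

The redundancy in the repeated phrase is exactly what makes the large size reduction possible.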
Figure 1: Data compression is divided into lossy compression and lossless compression.
Lossy data compression is frequently used on the Internet, especially in streaming media and telephony applications. Some examples of lossy compression algorithms are JPEG, MPEG and MP3. Most lossy techniques suffer from generation loss, meaning that the quality of the data decreases each time the file is repeatedly compressed and decompressed. Lossy image compression can be used in digital cameras to increase storage capacity with minimal degradation of picture quality.
Lossless compression is used when it is important that the original data and the decompressed
data be identical. Lossless text data compression algorithms usually exploit statistical
redundancy in such a way so as to represent the sender's data more concisely without any error or
any sort of loss of important information contained within the text input data. Since most of the
real-world data has statistical redundancy, therefore lossless data compression is possible. For
instance, in English text the letter 'a' is much more common than the letter 'z', and the probability that the letter 't' will be followed by the letter 'z' is very small. So this type of
redundancy can be removed using lossless compression. Lossless compression methods may be
categorized according to the type of data they are designed to compress. Compression algorithms
are basically used for the compression of text, images and sound. Most lossless compression
programs use two different kinds of algorithms: one which generates a statistical model for the
input data and another which maps the input data to bit strings using this model in such a way
that frequently encountered data will produce shorter output than improbable (less frequent) data.
The advantage of lossless methods over lossy methods is that lossless compression reproduces the original input data exactly. The performance of algorithms can be compared using parameters such as compression ratio and saving percentage. With lossless data compression the original message can be exactly decoded. Lossless data compression
works by finding repeated patterns in a message and encoding those patterns in an efficient
manner. For this reason, lossless data compression is also referred to as redundancy reduction.
Because redundancy reduction is dependent on patterns in the message, it does not work well on
random messages. Lossless data compression is ideal for text.
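Since Huffman coding is one half of the hybrid approach presented here, a minimal sketch of the classic algorithm may help. This is a generic textbook implementation in Python, not the authors' code; it builds the statistical model (a frequency count) and then maps symbols to bit strings so that frequent symbols receive shorter codes:

    import heapq
    from collections import Counter

    def huffman_codes(text):
        # Statistical model: how often each symbol occurs in the input.
        freq = Counter(text)
        # Heap entries: (frequency, tie-breaker, {symbol: code-so-far}).
        heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        if len(heap) == 1:  # degenerate case: only one distinct symbol
            return {s: "0" for s in heap[0][2]}
        counter = len(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)
            f2, _, c2 = heapq.heappop(heap)
            # Merge the two rarest subtrees, prepending one bit to every code.
            merged = {s: "0" + c for s, c in c1.items()}
            merged.update({s: "1" + c for s, c in c2.items()})
            heapq.heappush(heap, (f1 + f2, counter, merged))
            counter += 1
        return heap[0][2]

    sample = "this is an example of huffman coding"
    codes = huffman_codes(sample)
    encoded = "".join(codes[ch] for ch in sample)
    print(len(encoded), "bits versus", 8 * len(sample), "uncompressed bits")

Frequent symbols such as the space character end up with the shortest codes, which is exactly the statistical-model-plus-mapping split described above.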
2. LITERATURE REVIEW
This section surveys the various techniques available for data compression, analyzing their results and conclusions.
R. S. Brar and B. Singh (2013), “A survey on different compression techniques and bit
reduction algorithm for compression of text data”. International Journal of Advanced
Research in Computer Science and Software Engineering (IJARCSSE) Volume 3, Issue 3,
March 2013. This paper provides a survey of different basic lossless and lossy data compression techniques. On the basis of these techniques, the authors propose a bit reduction algorithm for the compression of text data, based on a number theory system and a file differential technique, which is a simple compression and decompression scheme with low time complexity. Future work can be done on the coding of special characters that are not present on the keyboard, in order to achieve better results.
S. Porwal, Y. Chaudhary, J. Joshi and M. Jain (2013), “Data Compression Methodologies
for Lossless Data and Comparison between Algorithms”. International Journal of
Engineering Science and Innovative Technology (IJESIT) Volume 2, Issue 2, March 2013.
This paper presents lossless data compression methodologies and compares their performance. Huffman and arithmetic coding are compared according to their performance. The authors find that the arithmetic encoding methodology is more powerful than the Huffman encoding methodology: comparing the two techniques, they conclude that the compression ratio of arithmetic encoding is better, and that arithmetic encoding also reduces channel bandwidth and transmission time.
RESULTS
Compression Ratio: the size of the compressed file expressed as a percentage of the size of the original file, so a lower value indicates better compression.
Saving Percentage: the shrinkage of the source file as a percentage, i.e. ((original size - compressed size) / original size) x 100.
Table 5.1: Compression results of the proposed system

Original Size   Compressed Size   Compression Ratio (%)   Saving Percentage (%)
87              26                29.8                    70.1
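Both metrics follow directly from the file sizes. A quick sketch (sizes taken from Table 5.1; their unit is not stated in this excerpt) checks the row:

    def compression_ratio(original, compressed):
        # Compressed size as a percentage of the original: lower is better.
        return 100.0 * compressed / original

    def saving_percentage(original, compressed):
        # Shrinkage of the source file as a percentage: higher is better.
        return 100.0 * (original - compressed) / original

    print(compression_ratio(87, 26))  # 29.88... (reported as 29.8 in Table 5.1)
    print(saving_percentage(87, 26))  # 70.11... (reported as 70.1)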
From Table 5.1, it is clear that the compression ratio achieved by the proposed system is lower than that of the existing techniques, which means it results in more savings of storage space. The proposed system performs better than the existing systems (Bit Reduction and Huffman coding alone) because it applies the dynamic Bit Reduction technique in the first phase and Huffman coding in the second phase, further improving performance and achieving better compression results.
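This excerpt does not reproduce the hybrid's code. Purely as an illustration of the general bit-reduction idea (our assumption of what such a first phase could look like, not the authors' Dynamic Bit Reduction method itself), a pass might map the k distinct symbols of an ASCII text to fixed-width codes of ceil(log2(k)) bits before handing the bit string to the Huffman phase:

    import math

    def bit_reduce(text):
        # The text's own alphabet: k distinct symbols need only
        # ceil(log2(k)) bits each, instead of a full 8-bit ASCII byte.
        alphabet = sorted(set(text))
        width = max(1, math.ceil(math.log2(len(alphabet))))
        index = {s: i for i, s in enumerate(alphabet)}
        bits = "".join(format(index[s], "0{}b".format(width)) for s in text)
        return bits, alphabet, width

    bits, alphabet, width = bit_reduce("compression")
    print(width, "bits per symbol instead of 8;", len(bits), "bits in total")

In a two-phase scheme like the one described above, the output of this first phase would then be compressed further by Huffman coding.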
FUTURE WORK
The Improved Dynamic Bit Reduction Algorithm works only with text data written in a single language; it could also be tested on multilingual data, i.e. text data written in multiple languages in a single file. In other words, the system currently works only on ASCII data, and extending it to work with Unicode data is left for future work.
REFERENCES
[1] R. S. Brar and B. Singh, “A survey on different compression techniques and bit reduction algorithm for compression of text data”, International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE), Vol. 3, Issue 3, March 2013.
[2] S. Porwal, Y. Chaudhary, J. Joshi and M. Jain, “Data Compression Methodologies for Lossless Data and Comparison between Algorithms”, International Journal of Engineering Science and Innovative Technology (IJESIT), Vol. 2, Issue 2, March 2013.
[3] S. Shanmugasundaram and R. Lourdusamy, “A Comparative Study of Text Compression Algorithms”, International Journal of Wisdom Based Computing, Vol. 1(3), Dec 2011.
[4] S. Kapoor and A. Chopra, “A Review of Lempel Ziv Compression Techniques”, IJCST, Vol. 4, Issue 2, April-June 2013.
[5] S. R. Kodituwakku and U. S. Amarasinghe, “Comparison of Lossless Data Compression Algorithms for Text Data”, Indian Journal of Computer Science & Engineering, Vol. 1, No. 4.
[6] R. Kaur and M. Goyal, “An Algorithm for Lossless Text Data Compression”, International Journal of Engineering Research & Technology (IJERT), Vol. 2, Issue 7, July 2013.
[7] H. Altarawneh and M. Altarawneh, “Data Compression Techniques on Text Files: A Comparison Study”, International Journal of Computer Applications, Vol. 26, No. 5, July 2011.
[8] U. Khurana and A. Koul, “Text Compression and Superfast Searching”, Thapar Institute of Engineering and Technology, Patiala, Punjab, India 147004.
[9] A. Singh and Y. Bhatnagar, “Enhancement of data compression using Incremental Encoding”, International Journal of Scientific & Engineering Research, Vol. 3, Issue 5, May 2012.
[10] A. J. Mann, “Analysis and Comparison of Algorithms for Lossless Data Compression”, International Journal of Information and Computation Technology, ISSN 0974-2239, Vol. 3, No. 3 (2013), pp. 139-146.
Amandeep Singh Sidhu (M.Tech), Er. Meenakshi Garg (M.Tech)
Department of Computer Science & Engineering, Guru Kashi University, Talwandi Sabo, Bathinda, Punjab, India
[email protected]