Sublinear Algorithms For Approximating String Compressibility
Rakesh K R Rollno:40 Semester 7 Computer Science & Engineering Government Engineering College Idukki
CONTENTS
1. Introduction
2. Approximating Compressibility
3. Run-Length Encoding
4. Lempel-Ziv
5. Conclusion
10/15/2012
INTRODUCTION
Compression: the process of encoding information so that fewer bits are needed to store it.
Storage systems
Machine learning
There are two main types of compression schemes:
1. Lossy compression
2. Lossless compression
Approximating compressibility
Run-length encoding
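Run-length encoding (RLE) is probably the simplest method of compression: each maximal run of identical characters is replaced by a pair (character, run length).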
Example: the string aaaaaaaaaabbbbbbbccdddddeeef is compressed to (a,10)(b,7)(c,2)(d,5)(e,3)(f,1).
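A minimal Python sketch of this encoding, reproducing the example above. The names `rle_encode` and `rle_decode` are illustrative, and the list of (character, count) pairs is an assumed in-memory representation rather than a bit-level format.

```python
from itertools import groupby


def rle_encode(w: str) -> list[tuple[str, int]]:
    """Replace each maximal run of identical characters by (character, run length)."""
    return [(ch, sum(1 for _ in run)) for ch, run in groupby(w)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand each (character, run length) pair back into the run."""
    return "".join(ch * length for ch, length in pairs)


w = "a" * 10 + "b" * 7 + "c" * 2 + "d" * 5 + "e" * 3 + "f"
print(rle_encode(w))  # [('a', 10), ('b', 7), ('c', 2), ('d', 5), ('e', 3), ('f', 1)]
```

`itertools.groupby` groups consecutive equal characters, which is exactly the notion of a run, so no manual index bookkeeping is needed.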
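The cost of the run-length encoding, denoted by $C_{\mathrm{RLE}}(w)$, is the sum over all runs of $\log(\ell+1) + \log|\Sigma|$, where $\ell$ is the length of the run and $\Sigma$ is the alphabet:
$$C_{\mathrm{RLE}}(w) = \sum_{\text{runs}} \left( \log(\ell + 1) + \log|\Sigma| \right)$$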
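The cost can be computed directly from the runs. This sketch assumes each run of length ℓ costs log2(ℓ+1) bits for the length plus log2|Σ| bits for the character, where Σ is the alphabet; base-2 logarithms and the absence of rounding up to whole bits are assumptions, and `rle_cost` and `alphabet_size` are hypothetical names.

```python
import math
from itertools import groupby


def rle_cost(w: str, alphabet_size: int) -> float:
    """Sum over runs of log2(run length + 1) + log2(alphabet size).

    Assumes base-2 logarithms and does not round up to whole bits.
    """
    return sum(
        math.log2(sum(1 for _ in run) + 1) + math.log2(alphabet_size)
        for _, run in groupby(w)
    )


print(rle_cost("aaab", alphabet_size=2))  # 5.0
```

For "aaab" the run "aaa" costs log2(4) + log2(2) = 3 and the run "b" costs log2(2) + log2(2) = 2. A single run of 1023 identical characters costs only log2(1024) + log2(2) = 11, which is why strings with few, long runs are cheap under RLE.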
Algorithm for checking whether RLE is suitable for compressing a given string:
1. Select substrings uniformly and independently at random.
2. Determine how many characters are consecutively repeated in each substring, i.e. the lengths of the runs.
3. If the runs are short, the RLE cost may be high.
4. Otherwise, the RLE cost is low and RLE can be used to compress the string.
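These steps can be sketched as a simple sampling test. This is a hypothetical heuristic, not the estimator analyzed in the paper this talk is based on: it samples random windows, counts how often adjacent characters differ (each difference starts a new run), scales that count up to estimate the number of runs, and calls RLE suitable when the estimated average run length reaches a threshold. The names `estimate_num_runs` and `rle_looks_suitable`, the window size, the sample count and the threshold `min_avg_run` are all illustrative assumptions.

```python
import random


def estimate_num_runs(w: str, num_samples: int, window: int, rng: random.Random) -> float:
    """Estimate the number of runs in w from num_samples random windows.

    Each adjacent pair (w[i-1], w[i]) with different characters marks the
    start of a new run; the fraction of such pairs seen in the samples is
    scaled up to all n - 1 pairs. Pairs near the ends of w are sampled
    slightly less often, so this is a rough estimate.
    """
    n = len(w)
    if n <= 1:
        return float(n)
    window = max(2, min(window, n))
    boundaries = pairs = 0
    for _ in range(num_samples):
        start = rng.randrange(n - window + 1)  # window fits inside w
        for i in range(start + 1, start + window):
            pairs += 1
            if w[i] != w[i - 1]:
                boundaries += 1
    return 1 + (n - 1) * boundaries / pairs


def rle_looks_suitable(w: str, num_samples: int = 50, window: int = 16,
                       min_avg_run: float = 4.0, seed: int = 0) -> bool:
    """Hypothetical test: RLE is deemed suitable when the estimated average
    run length n / (number of runs) is at least min_avg_run."""
    if not w:
        return False
    rng = random.Random(seed)
    runs = estimate_num_runs(w, num_samples, window, rng)
    return len(w) / runs >= min_avg_run


print(rle_looks_suitable("a" * 1000))  # True: one long run
print(rle_looks_suitable("ab" * 500))  # False: every run has length 1
```

The test reads at most `num_samples * window` characters no matter how long the string is, which is what makes it sublinear; the price is that it only returns an estimate.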
Lempel-Ziv Encoding
Lempel-Ziv (LZ77) encoding is a lossless compression technique that achieves compression by replacing repeated occurrences of data with references to a single copy of that data occurring earlier in the input stream. A match is encoded by a pair of numbers called a length-distance pair, which is equivalent to the statement "each of the next length characters is equal to the character exactly distance characters behind it in the uncompressed stream".
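A minimal greedy LZ77-style encoder and decoder in Python, emitting either a literal character or a (length, distance) pair. The 4096-character search window, the minimum match length of 2 and the brute-force match search are simplifying assumptions made for readability; `lz77_encode` and `lz77_decode` are illustrative names.

```python
def lz77_encode(data: str, window: int = 4096, min_match: int = 2) -> list:
    """Greedy LZ77-style parse: each token is a literal character or a
    (length, distance) pair meaning "copy length characters starting
    distance characters back"."""
    tokens = []
    i, n = 0, len(data)
    while i < n:
        best_len, best_dist = 0, 0
        # Brute-force search of the preceding window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            # The match may extend past position i (an overlapping copy).
            while i + length < n and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_len, best_dist))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lz77_decode(tokens: list) -> str:
    """Rebuild the string; copies are done one character at a time so that
    overlapping matches (distance < length) work."""
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)
        else:
            length, distance = tok
            for _ in range(length):
                out.append(out[-distance])
    return "".join(out)


print(lz77_encode("abcabcabc"))  # ['a', 'b', 'c', (6, 3)]
```

The pair (6, 3) says "copy the next 6 characters from 3 positions back", even though only 3 characters precede it; copying character by character makes this overlap work. Likewise "aaaa" encodes as `['a', (3, 1)]`.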
Algorithm for checking whether LZ77 is suitable for compressing a given string:
1. Select parts of the string uniformly and independently at random.
2. Determine the number of distinct substrings of a given length.
3. If this number is large, the LZ77 cost may be high.
4. Otherwise, the LZ77 cost is low and LZ77 can be used to compress the string.
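A hypothetical sketch of the distinct-substring test. `distinct_substrings` computes the exact count by reading the whole string, for reference, while `lz77_looks_suitable` only looks at `num_samples` randomly chosen substrings of length `ell` and treats a low fraction of distinct samples as a sign that the string is repetitive. The function names, parameters and threshold are illustrative assumptions, and this simple ratio is not the estimator with provable guarantees from the paper.

```python
import random


def distinct_substrings(w: str, ell: int) -> int:
    """Exact number of distinct length-ell substrings (reads the whole string)."""
    return len({w[i:i + ell] for i in range(len(w) - ell + 1)})


def lz77_looks_suitable(w: str, ell: int = 8, num_samples: int = 200,
                        max_distinct_ratio: float = 0.5, seed: int = 0) -> bool:
    """Hypothetical test: sample length-ell substrings at random start positions
    and deem the string LZ77-friendly when few of the samples are distinct."""
    if len(w) < ell:
        raise ValueError("string shorter than ell")
    rng = random.Random(seed)
    starts = [rng.randrange(len(w) - ell + 1) for _ in range(num_samples)]
    samples = {w[p:p + ell] for p in starts}
    return len(samples) / num_samples <= max_distinct_ratio


print(distinct_substrings("abcabcabc", 3))  # 3
print(lz77_looks_suitable("ab" * 500))      # True: only 2 distinct substrings exist
```

A string in which every character is different gives almost entirely distinct samples, so the test reports that LZ77 is unlikely to help.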
Conclusion
References
1. Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, Adam Smith. Sublinear Algorithms for Approximating String Compressibility.
2. http://www.google.com
3. http://en.wikipedia.org
4. http://www.cse.psu.edu/~sofya/
Thanks