Sublinear Algorithms For Approximating String Compressibility

We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and a variant of Lempel-Ziv (LZ77), and present sublinear algorithms for approximating compressibility with respect to both schemes. We also give several lower bounds that show that our algorithms for both schemes cannot be improved significantly.

Uploaded by: Rakesh K R
Copyright: © Attribution Non-Commercial (BY-NC)

Sublinear Algorithms for Approximating String Compressibility

Rakesh K R
Roll no: 40, Semester 7
Computer Science & Engineering
Government Engineering College Idukki

CONTENTS
Introduction
Approximating Compressibility
Run-Length Encoding
Lempel-Ziv
Conclusion

10/15/2012

Government Engineering College Idukki

INTRODUCTION
Compression
Compression is the process of encoding information so that fewer bits are needed to store it.

Applications of compression techniques:

Communications
Storage systems
Machine learning

INTRODUCTION
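There are two main types of compression schemes:
1. Lossy compression
2. Lossless compression

In general, the compressibility of a string is measured by its Kolmogorov complexity. Kolmogorov complexity is not computable, so here compressibility is measured with respect to a fixed compression scheme.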

Approximating compressibility

Run-length encoding
Run-length encoding is probably the simplest method of compression.

The general idea behind this method is to replace consecutive repeated occurrences of a symbol by a single occurrence of the symbol followed by the number of occurrences.

E.g. original string: aaaaaaaaaabbbbbbbccdddddeeef

compressed: (a,10)(b,7)(c,2)(d,5)(e,3)(f,1)
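
The replacement rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation; the function name rle_encode is an assumption made for this example.

```python
from itertools import groupby


def rle_encode(text):
    """Return the maximal runs of `text` as (symbol, run_length) pairs.

    `rle_encode` is a hypothetical helper name used only for this sketch.
    """
    # groupby yields one group per maximal run of identical characters.
    return [(symbol, sum(1 for _ in run)) for symbol, run in groupby(text)]


if __name__ == "__main__":
    # The slide's example, built run by run so the lengths are explicit.
    original = "a" * 10 + "b" * 7 + "c" * 2 + "d" * 5 + "e" * 3 + "f"
    print(rle_encode(original))
```

Each pair corresponds to one maximal run, so a string with few long runs yields few pairs and a string that alternates symbols yields one pair per character.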

Run-length encoding: goal

Run-length encoding: cost


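A string $w$ over the alphabet $\Sigma$ can be partitioned into maximal runs of identical characters. Each run is written as a pair $(\sigma, \ell)$, where $\sigma$ is a symbol in $\Sigma$ and $\ell$ is the length of the run. The number of bits needed to represent this pair is $\lceil \log(\ell+1) \rceil + \lceil \log|\Sigma| \rceil$.

The cost of the run-length encoding, denoted by $C_{RLE}(w)$, is the sum over all the runs of $\lceil \log(\ell+1) \rceil + \lceil \log|\Sigma| \rceil$.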
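
As a sketch of how this cost could be computed exactly by reading the whole string, the snippet below sums the per-run cost. It assumes base-2 logarithms rounded up to whole bits; the names rle_cost and ceil_log2 are hypothetical and chosen for this example only.

```python
from itertools import groupby


def ceil_log2(m):
    """Ceiling of log2(m) for an integer m >= 1, using integer arithmetic."""
    return (m - 1).bit_length()


def rle_cost(text, alphabet_size):
    """Sum over all maximal runs of ceil(log(l + 1)) + ceil(log(alphabet_size)).

    Assumption: logs are base 2 and each term is rounded up to whole bits.
    """
    symbol_bits = ceil_log2(alphabet_size)
    total = 0
    for _, run in groupby(text):
        run_length = sum(1 for _ in run)
        total += ceil_log2(run_length + 1) + symbol_bits
    return total


if __name__ == "__main__":
    # Two runs over a binary alphabet: (a,3) costs 2 + 1 bits, (b,1) costs 1 + 1.
    print(rle_cost("aaab", 2))
```

This exact computation takes linear time; it is useful mainly as a baseline against which a sampling-based estimate can be compared.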

Checking whether RLE is suitable


Input: Substrings of the original string
Algorithm: Sublinear algorithm for approximating string compressibility
Output: Approximate cost of compressing the string using RLE

Checking whether RLE is suitable

Algorithm for checking whether RLE is suitable for compressing a string:

1. Select substrings uniformly and independently at random.
2. Determine how many characters repeat consecutively in each substring.
3. If the runs are short, the RLE cost is likely to be high.
4. Otherwise, the RLE cost is low, and RLE can be used to compress the string.
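
The steps above can be sketched as a sampling estimator. This is a simplified illustration under stated assumptions, not the exact algorithm from the referenced paper: it samples random positions, reads a bounded window around each one to find the run containing it, and charges each position its share of the run's cost. The names estimate_rle_cost, samples and max_scan are hypothetical, and runs too long to fit in the window are treated as contributing nothing.

```python
import random


def ceil_log2(m):
    """Ceiling of log2(m) for an integer m >= 1."""
    return (m - 1).bit_length()


def estimate_rle_cost(text, alphabet_size, samples=200, max_scan=64, rng=None):
    """Estimate the RLE cost of `text` from `samples` random positions.

    A position in a run of length l is charged (bits for that run) / l, so the
    charges over a whole run add up to the run's cost. Positions whose run does
    not fit in the scan window are charged 0 (an assumption of this sketch).
    """
    rng = rng or random.Random()
    n = len(text)
    if n == 0:
        return 0.0
    symbol_bits = ceil_log2(alphabet_size)
    total = 0.0
    for _ in range(samples):
        i = rng.randrange(n)
        left = i
        while left > 0 and i - left < max_scan and text[left - 1] == text[i]:
            left -= 1
        right = i
        while right < n - 1 and right - i < max_scan and text[right + 1] == text[i]:
            right += 1
        if i - left == max_scan or right - i == max_scan:
            continue  # long run: per-character cost is small, so skip it
        run_length = right - left + 1
        total += (ceil_log2(run_length + 1) + symbol_bits) / run_length
    return n * total / samples


if __name__ == "__main__":
    print(estimate_rle_cost("ab" * 500, 2))   # short runs: high estimated cost
    print(estimate_rle_cost("a" * 1000, 2))   # one long run: low estimated cost
```

Comparing the estimate with the uncompressed size (the string length times the bits per symbol) gives a rough signal for steps 3 and 4: an estimate close to or above the uncompressed size suggests RLE is a poor fit.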

Lempel-Ziv Encoding
Lempel-Ziv encoding is a lossless compression technique that achieves compression by replacing repeated occurrences of data with references to a single copy of that data appearing earlier in the input data stream. A match is encoded by a pair of numbers called a length-distance pair, which is equivalent to the statement "each of the next length characters is equal to the characters exactly distance characters behind it in the uncompressed stream".
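
To make the length-distance pair concrete, here is a minimal sketch of a greedy encoder and the matching decoder. It is a toy illustration, not the specific LZ77 variant analysed in the referenced paper; the names lz77_encode and lz77_decode, the (distance, length) token order, and the minimum match length of 2 are all assumptions of this example.

```python
def lz77_encode(text):
    """Greedy parse into literals and (distance, length) back-references."""
    tokens = []
    i = 0
    while i < len(text):
        best_length, best_distance = 0, 0
        # Try every earlier start position and keep the longest match.
        for start in range(i):
            length = 0
            while i + length < len(text) and text[start + length] == text[i + length]:
                length += 1
            if length > best_length:
                best_length, best_distance = length, i - start
        if best_length >= 2:
            tokens.append((best_distance, best_length))
            i += best_length
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def lz77_decode(tokens):
    """Rebuild the text: each copied character equals the one `distance` back."""
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])
        else:
            out.append(token)
    return "".join(out)


if __name__ == "__main__":
    encoded = lz77_encode("abcabcabc")
    print(encoded)
    print(lz77_decode(encoded))
```

Copying one character at a time in the decoder is what allows a match to overlap the position being written, as in the example where a reference of length 6 reaches back only 3 characters. The quadratic search in the encoder is for clarity only.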

Checking whether LZ77 is suitable


Input: Substrings of the original string
Algorithm: Sublinear algorithm for approximating string compressibility
Output: Approximate cost of compressing the string using LZ77

Lempel-Ziv Encoding:

Algorithm for checking whether LZ77 is suitable for compressing a string:

1. Select parts of the string uniformly and independently at random.
2. Determine the number of distinct substrings of a given length.
3. If there are many distinct substrings, the LZ77 cost is likely to be high.
4. Otherwise, the LZ77 cost is low, and LZ77 can be used to compress the string.
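
Steps 1 and 2 can be illustrated with a small sampling routine. This is only a heuristic sketch: it counts the distinct substrings seen in a random sample, which is a lower bound on the true number of distinct substrings and not the estimator from the referenced paper. The names sampled_distinct_substrings, length and samples are hypothetical.

```python
import random


def sampled_distinct_substrings(text, length, samples=200, rng=None):
    """Count distinct substrings of `length` among randomly sampled positions.

    Heuristic only: the result never exceeds the true number of distinct
    substrings of that length, and it can be far smaller for diverse strings.
    """
    rng = rng or random.Random()
    last_start = len(text) - length
    if length <= 0 or last_start < 0:
        return 0
    seen = set()
    for _ in range(samples):
        start = rng.randrange(last_start + 1)
        seen.add(text[start:start + length])
    return len(seen)


if __name__ == "__main__":
    print(sampled_distinct_substrings("a" * 1000, 8))     # highly repetitive
    print(sampled_distinct_substrings("ab" * 500, 4))     # at most two patterns
```

A count that stays small relative to the number of samples suggests heavy repetition, which is the situation in which references to earlier copies pay off.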

Conclusion

Reference

1. Sublinear Algorithms for Approximating String Compressibility. Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, Adam Smith.
2. http://www.google.com
3. http://en.wikipedia.org
4. http://www.cse.psu.edu/~sofya/

Thanks
