Data compression algorithms shrink digital files to cut storage and transmission costs. There are two main types: lossless techniques reconstruct the original data exactly, while lossy techniques allow small changes in exchange for much higher compression ratios. Common techniques include run-length encoding, Huffman coding, and Lempel-Ziv-Welch (LZW) compression. JPEG is a widely used industry standard that applies lossy compression well suited to images, while MPEG does the same for video.
Lecture 10 - Data Compression
Data Compression
By Fareed Ahmed Jokhio
• Data transmission and storage cost money.
• The more information being dealt with, the more it costs.
• In spite of this, most digital data are not stored in the most compact form.
• Rather, they are stored in whatever way makes them easiest to use, such as: ASCII text from word processors, binary code that can be executed on a computer, individual samples from a data acquisition system, etc.
• Typically, these easy-to-use encoding methods require data files about twice as large as actually needed to represent the information.
• Data compression is the general term for the various algorithms and programs developed to address this problem.
• A compression program is used to convert data from an easy-to-use format to one optimized for compactness.
• Likewise, an uncompression program returns the information to its original form.
• We examine five techniques for data compression in this lecture.
• The first three are simple encoding techniques, called: run-length, Huffman, and delta encoding.
• The last two are elaborate procedures that have established themselves as industry standards: LZW and JPEG.

Data Compression Strategies
• The table below shows two different ways that data compression algorithms can be categorized.
• In (a), the methods have been classified as either lossless or lossy.
• A lossless technique means that the restored data file is identical to the original.
• This is absolutely necessary for many types of data, for example: executable code, word processing files, tabulated numbers, etc.
• You cannot afford to misplace even a single bit of this type of information.
• In comparison, data files that represent images and other acquired signals do not have to be kept in perfect condition for storage or transmission.
• All real-world measurements inherently contain a certain amount of noise.
• If the changes made to these signals resemble a small amount of additional noise, no harm is done.
• Compression techniques that allow this type of degradation are called lossy.
• This distinction is important because lossy techniques are much more effective at compression than lossless methods.
• The higher the compression ratio, the more noise is added to the data.
• Images transmitted over the World Wide Web are an excellent example of why data compression is important.
• Suppose we need to download a digitized color photograph over a computer's 33.6 kbps modem.
• If the image is not compressed (a TIFF file, for example), it will contain about 600 kbytes of data.
• If it has been compressed using a lossless technique (such as that used in the GIF format), it will be about one-half this size, or 300 kbytes.
• If lossy compression has been used (a JPEG file), it will be about 50 kbytes.
• The point is, the download times for these three equivalent files are 142 seconds, 71 seconds, and 12 seconds, respectively.
• That's a big difference! JPEG is the best choice for digitized photographs, while GIF is used with drawn images, such as company logos that have large areas of a single color.
• Our second way of classifying data compression methods is shown in the table below.
• Most data compression programs operate by taking a group of data from the original file, compressing it in some way, and then writing the compressed group to the output file.
• For instance, one of the techniques in this table is CS&Q, short for coarser sampling and/or quantization.
• Suppose we are compressing a digitized waveform, such as an audio signal that has been digitized to 12 bits.
• We might read two adjacent samples from the original file (24 bits), discard one of the samples completely, discard the least significant 4 bits from the other sample, and then write the remaining 8 bits to the output file.
• With 24 bits in and 8 bits out, we have implemented a 3:1 compression ratio using a lossy algorithm.
• While this is rather crude in itself, it is very effective when used with a technique called transform compression.
• As we will discuss later, this is the basis of JPEG.
• The table below shows CS&Q to be a fixed-input fixed-output scheme.
• That is, a fixed number of bits are read from the input file and a smaller fixed number of bits are written to the output file.
• Other compression methods allow a variable number of bits to be read or written.
• As you go through the description of each of these compression methods, refer back to this table to understand how it fits into this classification scheme.
• Why are JPEG and MPEG not listed in this table?
• These are composite algorithms that combine many of the other techniques.
• They are too sophisticated to be classified into these simple categories.
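
The CS&Q example described above (two 12-bit samples in, one 8-bit value out) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `csq_compress` and `csq_expand` are hypothetical, the slides do not say which sample of each pair is discarded (the second is dropped here), and the slides do not describe how the data is restored, so the expansion step is one plausible choice rather than the lecture's method.

```python
def csq_compress(samples):
    """Lossy CS&Q sketch: read 12-bit samples in pairs, write one 8-bit code per pair."""
    codes = []
    # Step through the input two samples at a time; an unpaired trailing
    # sample (odd-length input) is simply dropped in this sketch.
    for i in range(0, len(samples) - 1, 2):
        kept = samples[i] & 0xFFF   # assumption: keep the first sample of the pair
        codes.append(kept >> 4)     # discard the 4 least significant bits
    return codes


def csq_expand(codes):
    """Assumed restoration: shift each code back to 12 bits and repeat it twice."""
    restored = []
    for code in codes:
        approx = code << 4          # the low 4 bits are gone for good
        restored.extend([approx, approx])
    return restored


original = [0xABC, 0x123, 0xFFF, 0x000]     # four 12-bit samples = 48 bits
codes = csq_compress(original)              # two 8-bit codes = 16 bits
ratio = (len(original) * 12) / (len(codes) * 8)
restored = csq_expand(codes)
print(codes, ratio, restored != original)
```

With four input samples the sketch reads 48 bits and writes 16, which is the 3:1 ratio quoted in the slides, and the restored list differs from the original, which is what makes the scheme lossy.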
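
The download times quoted earlier for the 600, 300, and 50 kbyte versions of the photograph can be checked with a few lines of Python. This is a rough sketch that assumes 1 kbyte = 1000 bytes, 8 bits per byte, and a link that runs at its full 33.6 kbps with no protocol overhead; the function name `download_seconds` is hypothetical.

```python
def download_seconds(size_kbytes, link_kbps=33.6):
    """Ideal transfer time in seconds, ignoring protocol overhead."""
    bits = size_kbytes * 1000 * 8       # assumption: 1 kbyte = 1000 bytes
    return bits / (link_kbps * 1000)    # link speed in bits per second


# Sizes taken from the slides: uncompressed TIFF, lossless GIF, lossy JPEG.
for label, size in [("TIFF", 600), ("GIF", 300), ("JPEG", 50)]:
    print(f"{label}: {download_seconds(size):.1f} s")
```

Under these assumptions the results come out at roughly 142.9, 71.4, and 11.9 seconds, which agrees with the 142, 71, and 12 seconds in the slides once the figures are truncated or rounded.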