Understanding Advanced Data Compression: F5 White Paper
Nearly all WAN optimization appliances store and reuse previously transferred network data to achieve high compression ratios, and they rely on advanced compression routines to improve application performance. How they achieve these gains varies widely, and the limitations of certain routines can significantly affect the improvements and benefits that WAN application delivery services provide.
by Lori MacVittie
Senior Technical Marketing Manager, Application Services
White Paper
Understanding Advanced Data Compression
Contents
Introduction
Implementation Approaches
Packets versus Sessions
Dictionary Size
Heaps’ Law
Zipf’s Law
Conclusion
Introduction
Organizations have turned to WAN optimization as a way to meet the challenges of assuring application performance and ensuring timely transfer of large data sets across constrained network links. Many WAN optimization solutions focus entirely on network-layer optimizations and operate on rigid configurations. Not only are these solutions inflexible, but they also lack optimizations that can further enhance the performance of applications commonly delivered over WAN links.
Source: “Keys to Unlocking IT Value Through WAN Optimization,” Dr. Jim Metzler
Implementation Approaches
Packets versus Sessions
To date, most network compression systems have been packet-based. Packet-based compression systems buffer packets destined for a remote network that has a decompressor. These packets are compressed either one at a time or as a group and then sent to the decompressor, where the process is reversed (see Figure 1). Packet-based compression has been available for many years and can be found in routers, VPN clients, and Juniper Networks WX and WXC application acceleration appliances.
Figure 1: Packet compressor
Unlike previous compression solutions, the F5® BIG-IP® Local Traffic Manager™ (LTM) product with the BIG-IP® WAN Optimization Module™ (WOM) operates at the session layer (see Figure 2). This enables BIG-IP LTM with the WAN Optimization Module to apply compression across a completely homogeneous data set while addressing all application types, resulting in higher compression ratios than comparable packet-based systems.
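As a rough illustration of why session-level compression tends to beat packet-level compression, the sketch below compares the two using Python's standard zlib module. It assumes a stream of repetitive HTTP-style request headers split into hypothetical 1,460-byte packets. The packet-based variant compresses each packet with a fresh compressor; the session-based variant keeps one compressor, and therefore one history, for the whole session. This is a simplified model, not F5's implementation.

```python
import zlib

MSS = 1460  # hypothetical per-packet payload size (a common TCP MSS on Ethernet)


def packets(data: bytes, size: int = MSS) -> list[bytes]:
    """Split a byte stream into fixed-size packet payloads."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def packet_based(data: bytes) -> int:
    """Compress each packet independently; no history is shared between packets."""
    return sum(len(zlib.compress(p)) for p in packets(data))


def session_based(data: bytes) -> int:
    """Compress the whole session with one compressor whose history spans packets."""
    comp = zlib.compressobj()
    total = 0
    for p in packets(data):
        # Z_SYNC_FLUSH emits this packet's output now (so it could be sent)
        # without resetting the compressor's history.
        total += len(comp.compress(p)) + len(comp.flush(zlib.Z_SYNC_FLUSH))
    return total + len(comp.flush())  # terminate the stream


STREAM = (b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n"
          b"User-Agent: demo\r\nAccept: */*\r\n\r\n") * 200

if __name__ == "__main__":
    print("raw bytes:    ", len(STREAM))
    print("packet-based: ", packet_based(STREAM))
    print("session-based:", session_based(STREAM))
```

With repetitive input like this, the session-based total should come out well below the packet-based total, because later packets can refer back to data carried by earlier ones.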
Figure 2: Session compressor
Dictionary Size
One limitation all compression routines share is limited storage space. Some routines, such as the DEFLATE routine used by GNU zip (gzip), keep as little as 32 kilobytes (KB) of previously seen data. Other techniques, such as disk-based compression systems, can store as much as 1 terabyte of data. To understand the impact of dictionary size, a basic understanding of cache management is required.

Similar to requests to a website, not all bytes transferred on the network repeat with the same frequency. Some byte patterns occur with great frequency because they are part of a popular document or a common network protocol. Other byte patterns occur only once and are never repeated. The relationship between frequently repeating byte sequences and less frequently repeating ones is described by both Zipf’s and Heaps’ laws.

Zipf’s and Heaps’ laws are linguistics-derived mathematical equations used to predict the repetitiveness of a vocabulary subset in a finite text. Both laws also apply outside linguistics, where they describe observed patterns of repetitiveness in data. Both are often used in data deduplication and compression algorithms as aids to predict and optimize the elimination of repeating byte patterns.
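To make the cache-management analogy concrete, the following sketch models a compression dictionary as a least-recently-used (LRU) cache of byte patterns and replays a synthetic request stream in which pattern r is requested with probability proportional to 1/r (a Zipf-like popularity curve). The LRU policy, the synthetic workload, and the helper names are assumptions for illustration; the white paper does not say how any product evicts stored data.

```python
import random
from collections import OrderedDict


def lru_hits(requests, capacity: int) -> int:
    """Count how many requests are served from an LRU cache of the given capacity."""
    cache: OrderedDict = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used pattern
    return hits


def zipf_requests(n: int, vocab: int, seed: int = 1) -> list[int]:
    """Draw n pattern IDs; pattern r is requested with probability proportional to 1/r."""
    rng = random.Random(seed)
    ranks = range(1, vocab + 1)
    return rng.choices(ranks, weights=[1.0 / r for r in ranks], k=n)


if __name__ == "__main__":
    reqs = zipf_requests(50_000, 100_000)
    for capacity in (100, 1_000, 10_000):
        ratio = lru_hits(reqs, capacity) / len(reqs)
        print(f"capacity {capacity:>6}: hit ratio {ratio:.2f}")
```

Because LRU is a stack algorithm, a larger cache never scores fewer hits on the same request stream; what matters in practice is how quickly the extra hits taper off as capacity grows.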
Heaps’ Law
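Heaps’ law states that the number of unique words V in a collection of N words grows much more slowly than N itself. It is usually written as
$$V \approx K N^{\beta}$$
where K and β are constants fitted to the data. The exponent β is typically close to 0.5, which gives the approximation \(V \approx \sqrt{N}\) (up to the constant K). Plotted on log-log axes, data that exhibits Heaps’ law forms a line with a slope of approximately 0.5.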
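The exponent β can be estimated by counting distinct tokens at several prefix lengths N and fitting a straight line to log V against log N. The sketch below does this with an ordinary least-squares fit. The helper names are illustrative, and the demo uses an artificial token stream (token i is ⌊√i⌋), constructed so that new tokens appear at a square-root rate, rather than real network data.

```python
import math


def vocabulary_growth(tokens, checkpoints) -> list[tuple[int, int]]:
    """Return (N, V) pairs: V is the number of distinct tokens among the first N."""
    seen = set()
    targets = sorted(checkpoints)
    points, t = [], 0
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if t < len(targets) and n == targets[t]:
            points.append((n, len(seen)))
            t += 1
    return points


def loglog_slope(points) -> float:
    """Least-squares slope of log V against log N, i.e. the Heaps' exponent beta."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    checkpoints = (100, 1_000, 10_000, 100_000)
    tokens = [math.isqrt(i) for i in range(100_000)]  # hypothetical sqrt-growth stream
    points = vocabulary_growth(tokens, checkpoints)
    print(points)
    print(f"estimated beta = {loglog_slope(points):.3f}")
```

For this artificial stream the fitted slope comes out very close to 0.5. For comparison, a stream in which every token is new gives a slope of 1, and a stream that repeats a single token gives 0.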
Zipf’s Law
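Zipf’s law provides a mathematical formula for the frequency distribution of words in a language. Let r be the rank of a word in the frequency table (r = 1 for the most frequent word), freq(r) the number of times the word of rank r occurs, N the total number of words in the collection, and A a constant. Then
$$r \cdot \mathrm{freq}(r) \approx A \cdot N$$
In other words, the frequency of any word in a collection is inversely proportional to its rank in the frequency table: the most frequent word occurs about twice as often as the second most frequent, about three times as often as the third, and so on. Plotted on log-log axes, frequency against rank forms a line with a slope of approximately -1.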
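A rank-frequency table and its log-log slope are straightforward to compute. The sketch below counts tokens with collections.Counter, ranks them by frequency, and fits a least-squares line to log freq(r) against log r. It also reports r · freq(r) / N for each rank, which Zipf’s law predicts to be roughly the constant A. The demo corpus is synthetic and hypothetical: the word of rank r is simply repeated ⌊1000 / r⌋ times.

```python
import math
from collections import Counter


def rank_frequency(tokens) -> list[int]:
    """Token counts sorted from most to least frequent; index 0 holds rank 1."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_slope(freqs: list[int]) -> float:
    """Least-squares slope of log freq(r) against log r; Zipf's law predicts about -1."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def zipf_constants(freqs: list[int]) -> list[float]:
    """r * freq(r) / N for each rank r; roughly constant (= A) under Zipf's law."""
    n = sum(freqs)
    return [r * f / n for r, f in enumerate(freqs, start=1)]


if __name__ == "__main__":
    corpus = [f"w{r}" for r in range(1, 51) for _ in range(1000 // r)]
    freqs = rank_frequency(corpus)
    print(f"slope = {zipf_slope(freqs):.3f}")
    print("A estimates:", [round(a, 3) for a in zipf_constants(freqs)[:5]])
```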
reason gzip and bzip2 perform so well despite lacking a substantial data store is
that the most frequently occurring sequences of bytes represent the majority of
bytes on a network.
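A short calculation shows why a small store can go a long way. Assume, purely for illustration, that V distinct byte patterns of equal length occur with Zipf frequencies proportional to 1/r. The top k patterns then account for H_k / H_V of all occurrences, where H_n = 1 + 1/2 + … + 1/n is the n-th harmonic number. The sketch computes that share directly; the function names are hypothetical.

```python
def harmonic(n: int) -> float:
    """n-th harmonic number, H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / r for r in range(1, n + 1))


def top_k_share(k: int, vocab: int) -> float:
    """Share of all occurrences contributed by the k most frequent of `vocab`
    Zipf-distributed patterns (frequency of rank r proportional to 1/r)."""
    return harmonic(k) / harmonic(vocab)


if __name__ == "__main__":
    vocab = 1_000_000
    for k in (1_000, 10_000, 100_000):
        print(f"top {k:>7} of {vocab} patterns: {top_k_share(k, vocab):.0%} of occurrences")
```

Under these assumptions the 1,000 most frequent patterns, one tenth of one percent of the vocabulary, already account for roughly half of all occurrences.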
Unlike block-based systems, BIG-IP LTM with WOM matches and compresses the entire repeating pattern. In the previous examples, instead of matching only 256 bytes of data, BIG-IP LTM with WOM is able to match and reduce all 392 bytes of repetitive data. This level of granularity enables BIG-IP LTM with WOM to achieve greater compression than competing block-based systems, not only on documents but also on application-layer protocol headers.
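The difference between block-based and byte-granular matching can be sketched in a few lines. The example below assumes a hypothetical block-based system that hashes fixed, aligned 256-byte blocks, and compares it with a naive longest-match search of the kind LZ77-style compressors perform. It reproduces the 256-versus-392-byte case under the assumption that a 392-byte pattern repeats at a block boundary and is followed by different data each time. It is an illustration only, not a description of how BIG-IP LTM with WOM or any competing product is built.

```python
import hashlib
import random

BLOCK = 256  # assumed block size of the hypothetical block-based system


def block_matched(history: bytes, new: bytes, block: int = BLOCK) -> int:
    """Bytes of `new` a fixed-block system can eliminate: only whole, aligned blocks seen before."""
    seen = {hashlib.sha256(history[i:i + block]).digest()
            for i in range(0, len(history), block)}
    return sum(block for i in range(0, len(new), block)
               if len(new[i:i + block]) == block
               and hashlib.sha256(new[i:i + block]).digest() in seen)


def longest_match(history: bytes, new: bytes) -> int:
    """Length of the longest prefix of `new` found anywhere in `history` (byte granularity)."""
    lo, hi = 0, len(new)
    while lo < hi:  # binary search works because any prefix of a match is also a match
        mid = (lo + hi + 1) // 2
        if new[:mid] in history:
            lo = mid
        else:
            hi = mid - 1
    return lo


rng = random.Random(0)
PATTERN = bytes(rng.getrandbits(8) for _ in range(392))  # the repeating 392-byte sequence
HISTORY = PATTERN + b"\x00" * 120  # earlier traffic: pattern, then other data
NEW = PATTERN + b"\xff" * 120      # later traffic: same pattern, different trailing data

if __name__ == "__main__":
    print("block-based match:  ", block_matched(HISTORY, NEW), "bytes")  # 256
    print("byte-granular match:", longest_match(HISTORY, NEW), "bytes")  # 392
```

The second 256-byte block of NEW mixes the tail of the pattern with new data, so its hash never matches and the block-based model loses those 136 repeated bytes.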
well. This feature is advantageous to organizations that have multiple WAN links with varying speeds: CPU saver mode reduces the risk of suboptimal WAN optimization caused by differences in link characteristics.
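One simple way to pick a compression setting per link, in the spirit of the adaptive behavior described here, is to estimate how much uncompressed data each option can deliver per second and choose the best. The sketch assumes compression and transmission are pipelined, so the slower of the two stages sets the pace. The codec names and their ratio and throughput figures are placeholders invented for the example, not measurements of any product, and the selection rule is an assumption rather than F5's algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codec:
    name: str
    ratio: float            # compressed size / original size (1.0 means no compression)
    throughput_mbps: float  # uncompressed input the CPU can process, in Mbps


def effective_mbps(codec: Codec, link_mbps: float) -> float:
    """Uncompressed Mbps delivered: limited by the CPU or the link, whichever is slower."""
    return min(codec.throughput_mbps, link_mbps / codec.ratio)


def choose_codec(codecs: list[Codec], link_mbps: float) -> Codec:
    """Pick the codec that delivers the most uncompressed data per second on this link."""
    return max(codecs, key=lambda c: effective_mbps(c, link_mbps))


# Placeholder profiles for illustration only; these are not measured values.
CODECS = [
    Codec("none", 1.0, float("inf")),
    Codec("fast", 0.5, 2_000.0),
    Codec("strong", 0.25, 200.0),
]

if __name__ == "__main__":
    for link in (10, 500, 10_000):
        print(f"{link:>6} Mbps link -> {choose_codec(CODECS, link).name}")
```

Under these placeholder numbers, heavy compression wins on a slow link, lighter compression on a medium link, and on a link faster than the CPU can compress, sending data uncompressed comes out ahead.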
BIG-IP LTM with WOM provides specific policies for CIFS file sharing, for traffic between servers running Microsoft Exchange Server and clients running Microsoft Office Outlook, and for web applications. These optimization policies reduce protocol chattiness and add web-application-specific acceleration options that can improve the response time and overall performance of applications delivered via the WAN. These optimizations and acceleration techniques are possible because of TMOS, which enables WAN optimization and application acceleration solutions to share a unified internal architecture. This architecture makes it possible to apply multiple techniques to the same data, ensuring that applications perform as well as possible.
TDR, as implemented in BIG-IP LTM with WOM, has been optimized to maintain high throughput. While the Riverbed Steelhead 5520 peaks at 540 Mbps, BIG-IP LTM with WOM can sustain speeds of up to 10,000 Mbps on a single appliance (the BIG-IP 8900 platform). When TDR is coupled with symmetric adaptive compression, BIG-IP LTM with WOM can sustain up to 10,600 Mbps on the same appliance.
Conclusion
Achieving substantial application performance gains through compression requires both a good compression algorithm and a system architecture designed for performance. The compression system must precisely match repetitive patterns to achieve high compression ratios. When possible, the most efficient compression algorithm for the network link should be applied automatically. The system must manage stored data and incoming application traffic to maximize effectiveness, and it should optimize and accelerate the performance of applications commonly accessed via a WAN link (see Figure 6). Finally, the system must do all this quickly, to minimize latency and keep the network link full.
Raw Data → Application Layer Acceleration → Data De-duplication → Symmetric Adaptive Compression → SSL Encryption → TCP Optimization → Bandwidth Allocation → Optimized Data
Figure 6: How BIG-IP LTM with WOM optimizes applications and data transfers
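Figure 6 places compression ahead of SSL encryption, and the order matters: well-encrypted data looks random and leaves a compressor almost nothing to remove. The sketch below demonstrates this with zlib and a deliberately toy stand-in for a cipher (XOR with a seeded pseudo-random keystream). The stand-in is not secure and is not SSL/TLS; it only mimics the property that ciphertext appears random. The payload and helper names are hypothetical.

```python
import random
import zlib


def toy_cipher(data: bytes, seed: int = 7) -> bytes:
    """Stand-in for encryption: XOR with a seeded pseudo-random keystream.
    NOT secure; applying it twice with the same seed restores the input."""
    rng = random.Random(seed)
    return bytes(b ^ rng.getrandbits(8) for b in data)


PAYLOAD = b"Subject: Quarterly report\r\nFrom: alice@example.com\r\n" * 500

# Order used in Figure 6: compress first, then encrypt.
C_THEN_E = toy_cipher(zlib.compress(PAYLOAD))
# Reversed order: the compressor only ever sees random-looking bytes.
E_THEN_C = zlib.compress(toy_cipher(PAYLOAD))

if __name__ == "__main__":
    print("original:             ", len(PAYLOAD))
    print("compress then encrypt:", len(C_THEN_E))
    print("encrypt then compress:", len(E_THEN_C))
```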
BIG-IP LTM with WOM and the TDR feature were designed from the ground up to meet these demands: a system must not only provide significant compression to improve data transfer rates but also accelerate and optimize applications delivered over the WAN at the same time. By leveraging the capabilities of a unified application delivery platform, BIG-IP LTM with WOM can apply compression algorithms dynamically, optimize and accelerate web application and email access, reduce bandwidth utilization, and minimize the time required to transfer large data sets across constrained WAN links.
F5 Networks, Inc. 401 Elliott Avenue West, Seattle, WA 98119 888-882-4447 www.f5.com
© 2010 F5 Networks, Inc. All rights reserved. F5, F5 Networks, the F5 logo, BIG‑IP, FirePass, iControl, TMOS, and VIPRION are trademarks
or registered trademarks of F5 Networks, Inc. in the U.S. and in certain other countries. CS01-00009 0510