Quick HTTP Survey
James Gettys
Visiting Scientist, World Wide Web Consortium
Digital Equipment Corporation
[email protected]
In this paper, we present the final results, and some of the thought processes that we went through while testing and optimizing our implementations. Our hope is that our experience may guide others through their own implementation efforts and help them avoid some non-obvious performance pits we fell into. Further information, and the data itself (and later data collection runs), can be found on the Web [25].

1.1 Changes to HTTP

HTTP/1.1 [4] is an upward compatible protocol to HTTP/1.0 [3]. Both HTTP/1.0 and HTTP/1.1 use the TCP protocol [12] for data transport. However, the two versions of HTTP use TCP differently.

HTTP/1.0 opens and closes a new TCP connection for each operation. Since most Web objects are small, this practice means a high fraction of packets are simply TCP control packets used to open and close a connection. Furthermore, when a TCP connection is first opened, TCP employs an algorithm known as slow start [11]. Slow start uses the first several data packets to probe the network to determine the optimal transmission rate. Again, because Web objects are small, most objects are transferred before their TCP connection completes the slow start algorithm. In other words, most HTTP/1.0 operations use TCP at its least efficient. The results have been major problems due to resulting congestion and unnecessary overhead [6].

HTTP/1.1 leaves the TCP connection open between consecutive operations. This technique is called "persistent connections," which both avoids the costs of multiple opens and closes and reduces the impact of slow start. Persistent connections are more efficient than the current practice of running multiple short TCP connections in parallel.

By leaving the TCP connection open between requests, many packets can be avoided, while avoiding multiple RTTs due to TCP slow start. The first few packet exchanges of a new TCP connection are either too fast or too slow for that path. If these exchanges are too fast for the route (common in today's Internet), they contribute to Internet congestion.

Conversely, since most connections are in slow start at any given time in HTTP/1.0 not using persistent connections, keeping a dialup PPP link busy has required running multiple TCP connections simultaneously (typical implementations have used 4 TCP connections). This can exacerbate the congestion problem further.

The "Keep-Alive" extension to HTTP/1.0 is a form of persistent connections. HTTP/1.1's design differs in minor details from Keep-Alive to overcome a problem discovered when Keep-Alive is used with more than one proxy between a client and a server.

Persistent connections allow multiple requests to be sent without waiting for a response; multiple requests and responses can be contained in a single TCP segment. This can be used to avoid many round trip delays, improving performance and reducing the number of packets further. This technique is called "pipelining" in HTTP.

HTTP/1.1 also enables transport compression of data types, so that clients can retrieve HTML (or other) documents using data compression; HTTP/1.0 does not have sufficient facilities for transport compression. Further work is continuing in this area [26].

The major HTTP/1.1 design goals therefore include:

- lower HTTP's load on the Internet for the same amount of "real work", while solving the congestion caused by HTTP

- HTTP/1.0's caching is primitive and error prone; HTTP/1.1 enables applications to work reliably with caching

- end user performance must improve, or it is unlikely that HTTP/1.1 will be deployed

HTTP/1.1 provides significant improvements to HTTP/1.0 to allow applications to work reliably in the face of caching, and to allow applications to mark more content cacheable. Today, caching is often deliberately defeated in order to achieve reliability. This paper does not explore these effects.

HTTP/1.1 does not attempt to solve some commonly seen problems, such as transient network overloads at popular web sites with topical news (e.g. the Shoemaker-Levy comet impact on Jupiter), but should at least help these problems.

This paper presents measured results of the consequences of HTTP/1.1 transport protocol additions. Many of these additions have been available as extensions to HTTP/1.0, but this paper shows the possible synergy when the extensions to the HTTP protocol are used in concert, and in combination with changes in content.

1.2 Range Requests and Validation

To improve the perceived response time, a browser needs to learn basic size information of each object in a page (required for page layout) as soon as possible. The first bytes typically contain the image size. To achieve better concurrency and retrieve the first few bytes of embedded links while still receiving the bytes for the master document, HTTP/1.0 browsers usually use multiple TCP connections. We believe that by using range requests, HTTP/1.1 clients can achieve similar or better results over a single connection.

HTTP/1.1 defines as part of the standard (and most current HTTP/1.0 servers already implement) byte range facilities that allow a client to perform partial retrieval of objects. The initial intent of range requests was to allow caching proxies to finish interrupted transfers by requesting only the bytes of the document they currently do not hold in their cache.

To solve the problem that browsers need the size of embedded objects, we believe that the natural revalidation request for HTTP/1.1 will combine both cache validation headers and an If-Range request header, to prevent large objects from monopolizing the connection to the server. The range requested should be large enough to usually return any embedded metadata for the object for the common data types. This capability of HTTP/1.1 is implicit in its caching and range request design.
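As a concrete illustration of this idiom, a revalidation request that also asks for the first part of an object might look like the following (a sketch only: the URL, entity tag, and range size are hypothetical):

    GET /images/banner.gif HTTP/1.1
    Host: www.example.com
    If-None-Match: "etag-v1"
    Range: bytes=0-1023

If the object is unchanged, the server can answer 304 (Not Modified) with no body; if it has changed, the server can answer 206 (Partial Content) carrying only the first kilobyte of the new version, typically enough to include the image dimensions needed for page layout.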
When a browser revisits a page, it has a very good idea what the type of any embedded object is likely to be, and can therefore both make a validation request and simultaneously request the metadata of the embedded object if there has been any change. The metadata is much more valuable than the embedded image data. Subsequently, the browser might generate requests for the rest of the object, or for enough of each object to allow for progressive display of image data types (e.g. progressive PNG, GIF or JPEG images), or to multiplex between multiple large images on the page. We call this style of use of HTTP/1.1 "poor man's multiplexing."

We believe cache validation combined with range requests will likely become a very common idiom of HTTP/1.1.

1.3 Changes to Web Content

Roughly simultaneously with the deployment of the HTTP/1.1 protocol (but not dependent upon it), the Web will see the deployment of Cascading Style Sheets (CSS) [30] and new image and animation formats such as Portable Network Graphics (PNG) [20] and Multiple-image Network Graphics (MNG) [31].

In the scientific environment where the Web was born, people were generally more concerned with the content of their documents than the presentation. In a research report, the choice of fonts matters less than the results being reported, so early versions of Hypertext Markup Language (HTML) sufficed for most scientists. However, when non-scientific communities discovered the Web, the perceived limitations of HTML became a source of frustration. Web page designers with a background in paper-based desktop publishing wanted more control over the presentation of their documents than HTML was meant to provide. Cascading Style Sheets (CSS) offer many of the capabilities requested by page designers, but are only now seeing widespread implementation.

In the absence of style sheets, authors have had to meet design challenges by twisting HTML out of shape, for instance, by studding their pages with small images that do little more than display text. In this section of the study, we estimate how Web performance will be affected by the introduction of CSS. We will not discuss other benefits to be reaped with style sheets, such as greater accessibility, improved printing, and easier site management.

On the web, most images are in GIF format. A new image format, PNG, has several advantages over GIF. PNG images render more quickly on the screen and - besides producing higher quality, cross-platform images - PNG images are usually smaller than GIF images.

MNG is an animation format in the PNG family, which - along with other advantages - is more compact than animated GIF.

2 Prior Work

Padmanabhan and Mogul [1] show results from a prototype implementation which extended HTTP to support both persistent connections and pipelining, and study latencies, throughput, and system overhead issues involved in persistent connections. This analysis formed the basic data and justification behind HTTP/1.1's persistent connection and pipelining design. HTTP/1.1 primarily relies on pipelining rather than introducing new HTTP methods to achieve the performance benefits documented below. As this paper makes clear, both pipelining and persistent connections are needed to achieve high performance over a single HTTP connection.

Pipelining, or batching, has been successfully used in a number of other systems, notably graphics protocols such as the X Window System [15] or Trestle [16], in its original RPC based implementation.

Touch, Heidemann, and Obraczka [5] explore a number of possible changes that might help HTTP behavior, including the sharing of TCP control blocks [19] and Transaction TCP (T/TCP) [17], [18]. The extended length of deployment of changes to TCP argued against any dependency of HTTP/1.1 on either of these; however, we believe that both mechanisms may improve performance, independently of the improvements made by HTTP/1.1. T/TCP might help reduce latency when revisiting a Web server after the server has closed its connection. Sharing of TCP control blocks would primarily help HTTP/1.0, however, since HTTP/1.1 limits the number of connections between a client/server pair.

In independent work, Heidemann [7] describes the interactions of persistent connections with Nagle's algorithm. His experience is confirmed by our experience described in this paper, and by the experience of one of the authors with the X Window System, which caused the original introduction of the ability to disable Nagle's algorithm into BSD derived TCP implementations.

Simon Spero analyzed HTTP/1.0 performance [6] and prepared a proposal for a replacement for HTTP. HTTP/1.1, however, was constrained to maintain upward compatibility with HTTP/1.0. Many of his suggestions are worthwhile and should be explored further.

Style sheets have a long history in the Web [30]. We believe that the character of our results will likely be similar for other style sheet systems. However, we are not aware of any prior work investigating the network performance consequences of style sheets.

3 Test Setup

3.1 Test Web Site

We synthesized a test web site by combining data (HTML and GIF image data) from two very heavily used home pages (Netscape and Microsoft) into one, hereafter called "Microscape". The initial layout of the Microscape web site was a single page containing typical HTML totaling 42KB with 42 inlined GIF images totaling 125KB. The embedded images range in size from 70B to 40KB; most are small, with 19 images less than 1KB, 7 images between 1KB and 2KB, and 6 images between 2KB and 3KB. While the resulting HTML page is larger, and contains more images than might be typical, such pages can be found on the Web.

3.2 First Time Retrieval Test

The first time retrieval test is equivalent to a browser visiting a site for the first time, e.g. its cache is empty and it has to retrieve the top page and all the embedded objects. In HTTP, this is equivalent to 43 GET requests.
3.3 Revalidate Test

This test is equivalent to revisiting a home page where the contents are already available in a local cache. The initial page and all embedded objects are validated, resulting in no actual transfer of the HTML or the embedded objects. In HTTP, this is equivalent to 43 Conditional GET requests. HTTP/1.1 supports two mechanisms for cache validation: entity tags, which are a guaranteed unique tag for a particular version of an object, and date stamps. HTTP/1.0 only supports the latter.

HTTP/1.0 support was provided by an old version of libwww (version 4.1D) which supported plain HTTP/1.0 with multiple simultaneous connections between two peers and no persistent cache. In this case we simulated the cache validation behavior by issuing HEAD requests on the images instead of Conditional GET requests. The profile of the HTTP/1.0 revalidation requests in the initial tests therefore was a total of 43 requests associated with the top page: one GET (HTML) and 42 HEAD requests (images). The HTTP/1.1 implementation of libwww (version 5.1) differs from the HTTP/1.0 implementation. It uses a fully HTTP/1.1 compliant persistent cache, generating 43 Conditional GET requests with appropriate cache validation headers, to make the test more similar to likely browser behavior. Therefore the numbers of packets in the results reported below for HTTP/1.0 are higher than those of the correct cache validation data reported for HTTP/1.1.

3.4 Network Environments Tested

In order to measure the performance in commonly used network environments, we used the following three combinations of bandwidth and latency (Table 1):

  Channel                        Connection                       RTT      MSS
  High bandwidth, low latency    LAN - 10 Mbit Ethernet           <1 ms    1460
  High bandwidth, high latency   WAN - MA (MIT/LCS) to CA (LBL)   ~90 ms   1460
  Low bandwidth, high latency    PPP - 28.8 Kbps dialup modem     ~150 ms  536

  Table 1 - Network Environments Tested

  Component               Type and Version
  WAN Client Hardware     tum.ee.lbl.gov, Digital AlphaStation 3000, Digital UNIX 4.0
  PPP Client Hardware     big.w3.org, Dual PentiumPro PC, Windows NT Server 4.0
  HTTP Server Software    Jigsaw 1.06 and Apache 1.2b10
  HTTP Client Software    libwww robot, Netscape Communicator 4.0 beta 5, and Microsoft Internet Explorer 4.0 beta 1 on Windows NT

  Table 2 - Applications, Machines, and OSs

None of the machines were under significant load while the tests were run. The server is identical through our final tests - only the client changes connectivity and behavior. Both Jigsaw and libwww are currently available with HTTP/1.1 implementations without support for the features described in this paper, and Apache is in beta release. During the experiments changes were made to all three applications. These changes will be made available through normal release procedures for each of the applications.

4 Initial Investigations and Tuning

The HTTP/1.0 robot was set to use plain HTTP/1.0 requests using one TCP connection per request. We set the maximum number of simultaneous connections to 4, the same as Netscape Navigator's default (and hard wired maximum, it turns out).

After testing HTTP/1.0, we ran the robot as a simple HTTP/1.1 client using persistent connections. That is, the request/response sequence looks identical to HTTP/1.0, but all communication happens on the same TCP connection instead of 4, hence serializing all requests. The result, as seen in Table 3, was a significant saving in TCP packets using HTTP/1.1 but also a big increase in elapsed time.
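In code, this persistent-connection (but unpipelined) behavior is a strict request/response alternation on a single socket. A minimal sketch, using Python's http.client as a stand-in for the robot (the host and paths are hypothetical):

    import http.client

    conn = http.client.HTTPConnection("www.example.com")  # one persistent connection
    for path in ["/", "/img/a.gif", "/img/b.gif"]:
        conn.request("GET", path)        # the next request is not sent until...
        response = conn.getresponse()    # ...the previous response is fully read
        response.read()
    conn.close()

Each request therefore pays at least one full round trip, which is why the packet count drops sharply while elapsed time grows.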
To pipeline requests, the client buffers them and must decide when to flush the output buffer. First we implemented a version with two mechanisms:

1. The buffer was flushed if the data in the output buffer reached a certain size. We experimented with the output buffer size and found that 1024 bytes is a good compromise. In case the MTU is 536 or 512 we will produce two full TCP segments, and if the MTU is 1460 (Ethernet size) then we can nicely fit into one segment.

2. We introduced a timer in the output buffer stream which would time out after a specified period of time and force the buffer to be flushed. It is not clear what the optimal flush time-out period is, but it is likely a function of the network load and connectivity. Initially we used a 1 second delay for the initial results in Table 3, but used a 50 ms delay for all later tests. Further work is required to understand where we should set such a timer, which might also take into account the RTT for this particular connection or other factors, to support old clients which do not explicitly flush the buffer.
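A minimal sketch of this double flush policy (in Python; the 1024 byte threshold and 50 ms delay are the values discussed above, everything else is hypothetical):

    import threading

    class PipelinedOutput:
        # Buffers pipelined requests; flushes when the buffer reaches
        # flush_size bytes (mechanism 1) or after flush_delay seconds
        # (mechanism 2). flush() can also be called explicitly.
        def __init__(self, sock, flush_size=1024, flush_delay=0.050):
            self.sock = sock
            self.flush_size = flush_size
            self.flush_delay = flush_delay
            self.buf = bytearray()
            self.timer = None
            self.lock = threading.Lock()

        def write(self, data):
            with self.lock:
                self.buf += data
                if len(self.buf) >= self.flush_size:   # mechanism 1: size threshold
                    self._flush_locked()
                elif self.timer is None:               # mechanism 2: flush timer
                    self.timer = threading.Timer(self.flush_delay, self.flush)
                    self.timer.start()

        def flush(self):
            with self.lock:
                self._flush_locked()

        def _flush_locked(self):
            if self.timer is not None:
                self.timer.cancel()
                self.timer = None
            if self.buf:
                self.sock.sendall(bytes(self.buf))
                self.buf.clear()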
validation responses) into fewer packets even on a high-speed
HlTP/l.I HlTF’/I.I network, and saving CPU time for the server.
HTTP’1’o Persistent Pipeline
In order to test this, we turned the Nagle algorithm off in both
Max simultaneoussockets 6 1 1 the client and the server. This was the first change to the server -
all other changes were made in the client. In our initial tests, we
Total number of socketsused 40 1 1 did not observe significant problems introduced by Nagle’s
algorithm, though with hindsight, this was the result of our
Packetsfrom client to server 226 70 25
pipelined implementation and the specific test cases chosen,
271
since with effective buffering, the segment sizes are large,
Packetsfrom serverto client 153 58
avoiding Nagle’s algorithm. In later experiments in which the
Tot01number of packets 497 223 83 buffering behavior of the implementations were changed, we did
observe significant (sometimes dramatic) transmission delays
Total elapsedtime [sets] 1.85 4.13 3.02 due to Nagle; we recommend therefore that HTTP/l.1
implementations that buffer output disable Nagle’s algorithm
Table 3 - Jigsaw - Initial High Bandwidth, Low Latency (set the TCPNODELAY socket option). This confirms the
Cache Revalidation Test experiences of Heidemann [7].
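Disabling Nagle's algorithm is a one-line socket option in most APIs; in Python, for example (host and port are placeholders):

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # turn off Nagle's algorithm
    s.connect(("www.example.com", 80))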
We also performed some tests against the Apache 1.2b2 server, which also supports HTTP/1.1, and observed essentially similar results to Jigsaw. Its output buffering in that initial beta test release was not yet as good as our revised version of Jigsaw, and in that release it processes at most five requests before terminating a TCP connection. When using pipelining, the number of HTTP requests served is often a poor indicator for when to close the connection. We discussed these results with Dean Gaudet and others of the Apache group, and similar changes were made to the Apache server; our final results below use a version of Apache 1.2b10.

Implementations need to close connections carefully. HTTP/1.0 implementations were able to naively close both halves of the TCP connection simultaneously when finishing the processing of a request. A pipelined HTTP/1.1 implementation can cause major problems if it does so.
The scenario is as follows: an HTTP server can close its connection between any two responses. An HTTP/1.1 client talking to an HTTP/1.1 server starts pipelining a batch of requests, for example 15 requests, on an open TCP connection. The server might decide that it will not serve more than 5 requests per connection, and closes the TCP connection in both directions after it successfully has served the first five requests. The remaining 10 requests that are already sent from the client will, along with client generated TCP ACK packets, arrive on a closed port on the server. This "extra" data causes the server's TCP to issue a reset; this forces the client TCP stack to pass the last ACK'ed packet to the client application and discard all other packets. This means that HTTP responses that are either being received or already have been received successfully but haven't been ACK'ed will be dropped by the client TCP. In this situation the client does not have any means of finding out which HTTP messages were successful or even why the server closed the connection. The server may have generated a "Connection: Close" header in the 5th response, but the header may have been lost due to the TCP reset, if the server's sending side is closed before the receiving side of the connection. Servers must therefore close each half of the connection independently.
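A sketch of such an independent half-close on the server side, assuming Python's socket API (the drain loop is a simplification of what a production server would do):

    import socket

    def close_server_side(sock):
        # Close only our sending half; the client sees end-of-file on its
        # read side, but the responses already in flight can still arrive.
        sock.shutdown(socket.SHUT_WR)
        # Keep reading (and discarding) anything the client already sent,
        # so its data never arrives on a fully closed port and provokes a
        # TCP reset that would destroy undelivered responses.
        while sock.recv(4096):
            pass
        sock.close()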
TCP's congestion control algorithms [11] work best when there are enough packets in a connection that TCP can determine the approximate optimal maximum rate at which to insert packets into the Internet. Observed packet trains in the Internet have been getting shorter [13], almost certainly due to HTTP/1.0's behavior, as demonstrated in the data above, where a single connection rarely involves more than 10 packets, including TCP open and close. Some IP switch technology exploits packet trains to enable faster IP routing. In the tests above, the packet trains are significantly longer, but not as long as one might first expect, since fewer, larger packets are transmitted due to pipelining.

The HTTP/1.1 proposed standard specification does specify at most two connections to be established between a client/server pair. (If you get a long, dynamically generated document, a second connection might be required to fetch embedded objects.) Dividing the mean length of packet trains down by a factor of two diminishes the benefits to the Internet (and possibly to the end user, due to slow start) substantially. Range requests need to be exploited to enable good interactive feel in Web browsers while using a single connection. Connections should be maintained as long as makes reasonable engineering sense [9], to pick up the user's "click ahead" while following links.

5 After Initial Tuning Tests

To make our final round of tests as close as possible to likely real implementations, we took the opportunity to change the HTTP/1.1 version of the robot to issue full HTTP/1.1 cache validation requests. These use If-None-Match headers and opaque validators, rather than the HEAD requests used in our HTTP/1.0 version of the robot. With the optimized clients and servers, we then took a complete set of data, for both the first time retrieval and cache validation tests, in the three network environments.

It was easiest to implement full HTTP/1.1 caching semantics by enabling persistent caching in libwww. This had unexpected consequences due to libwww's implementation of persistent caching, which is written for ease of porting and implementation rather than performance. Each cached object contains two independent files: one containing the cacheable message headers and the other containing the message body. This would be an area that one would optimize carefully in a product implementation; the overhead in our implementation became a performance bottleneck in our HTTP/1.1 tests. Time and resources did not permit optimizing this code. Our final measurements use correct HTTP/1.1 cache validation requests, and run with a persistent cache on a memory file system to reduce the disk performance problems that we observed.

The measurements in Table 4 through Table 9 are a consistent set of data taken just before publication. While Jigsaw had outperformed Apache in the first round of tests, Apache now outperforms Jigsaw (which ran interpreted in our tests). Results of runs generally resembled each other. For the WAN test, however, the higher the latency, the better HTTP/1.1 performed. The data below was taken when the Internet was particularly quiet.

5.1 Changing Web Content Representation

After having determined that HTTP/1.1 outperforms HTTP/1.0, we decided to try other means of optimizing the performance. We therefore investigated how much we would gain by using data compression of the HTTP message body. That is, we do not compress the HTTP headers, but only the body, using the "Content-Encoding" header to describe the encoding mechanism. We use the zlib compression library [23] version 1.04, which is a freely available C based code base. It has a stream based interface which interacts nicely with the libwww stream model. Note that the PNG library also uses zlib, so common implementations will share the same data compression code. Implementation took at most a day or two.

The client indicates that it is capable of handling the "deflate" content coding by sending an "Accept-Encoding: deflate" header in the requests. In our test, the server does not perform on-the-fly compression but sends out a pre-computed deflated version of the Microscape HTML page. The client performs on-the-fly inflation and parses the inflated HTML using its normal HTML parser.

Note that we only compress the HTML page (the first GET request) and not any of the following images, which are already compressed using various other compression algorithms (GIF).

The zlib library has several flags for how to optimize the compression algorithm; however, we used the default values for both deflating and inflating. In our case this compressed the Microscape HTML page by more than a factor of three, from 42K to 11K. This is a typical factor of gain using this algorithm on HTML files. This means that we decrease the overall payload by about 31K, or approximately 19%.
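With zlib's stream interface the deflate side is only a few lines; a Python sketch of the idea (our tests used the C library through libwww, and the file name here is hypothetical):

    import zlib

    with open("microscape.html", "rb") as f:
        html = f.read()

    deflated = zlib.compress(html)         # default compression level, as in our tests
    restored = zlib.decompress(deflated)   # the client inflates on the fly
    assert restored == html
    print(len(html), "->", len(deflated))  # roughly 3:1 on HTML in our experience

The "deflate" content coding is the zlib (RFC 1950) framing of DEFLATE (RFC 1951) data [22] [23], which is what zlib.compress produces.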
6 Measurements

The data shown in these tables are a summary of the more detailed data acquisition overview. In all cases, the traces were taken on the client side, as this is where the interesting delays are. Each run was repeated 5 times in order to make up for network fluctuations, except Table 10 and Table 11, which were repeated three times. In the tables below, Pa = Packets and Sec = Seconds. %ov is the percentage of overhead bytes due to TCP/IP packet headers.
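For reference, the overhead column is consistent with counting 40 bytes of TCP/IP headers per packet (our reading of the tables, not something the tables state explicitly):

    %ov = 100 * (40 * Pa) / (Bytes + 40 * Pa)

For example, the HTTP/1.0 first time retrieval row of Table 4 gives 100 * (40 * 510.2) / (216289 + 40 * 510.2) ≈ 8.6.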
                                     First Time Retrieval           Cache Validation
                                     Pa     Bytes     Sec    %ov    Pa     Bytes   Sec    %ov
  HTTP/1.0                           510.2  216289    0.97   8.6    374.8  61117   0.78   19.7
  HTTP/1.1                           281.0  191843    1.25   5.5    133.4  17694   0.89   23.2
  HTTP/1.1 Pipelined                 181.8  191551    0.68   3.7    32.8   17694   0.54   6.9
  HTTP/1.1 Pipelined w. compression  148.8  159654    0.71   3.6    32.6   17687   0.54   6.9

  Table 4 - Jigsaw - High Bandwidth, Low Latency

                                     First Time Retrieval           Cache Validation
                                     Pa     Bytes     Sec    %ov    Pa     Bytes   Sec    %ov
  HTTP/1.0                           489.4  215536    0.72   8.3    365.4  60605   0.41   19.4
  HTTP/1.1                           244.2  189023    0.81   4.9    98.4   14009   0.40   21.9
  HTTP/1.1 Pipelined                 175.8  189607    0.49   3.6    29.2   14009   0.23   7.7
  HTTP/1.1 Pipelined w. compression  139.8  156834    0.41   3.4    28.4   14002   0.23   7.5

  Table 5 - Apache - High Bandwidth, Low Latency

                                     First Time Retrieval           Cache Validation
                                     Pa     Bytes     Sec    %ov    Pa     Bytes   Sec    %ov
  HTTP/1.0                           559.6  248655.2  4.09   8.3    370.0  61887   2.64   19.3
  HTTP/1.1                           309.4  191436.0  6.14   6.1    101.2  14255   4.43   22.6
  HTTP/1.1 Pipelined                 221.4  191180.6  2.23   4.4    29.8   15352   0.86   7.2
  HTTP/1.1 Pipelined w. compression  182.0  159170.0  2.11   4.4    29.0   15088   0.83   7.2

  Table 7 - Apache - High Bandwidth, High Latency

                                     First Time Retrieval           Cache Validation
                                     Pa     Bytes     Sec    %ov    Pa     Bytes   Sec    %ov
  HTTP/1.1                           309.6  190687    63.8   6.1    89.2   17528   12.9   16.9
  HTTP/1.1 Pipelined                 284.4  190735    53.3   5.6    31.0   17598   5.4    6.6
  HTTP/1.1 Pipelined w. compression  234.2  159449    47.4   5.5    31.0   17591   5.4    6.6

  Table 8 - Jigsaw - Low Bandwidth, High Latency

                                     First Time Retrieval           Cache Validation
                                     Pa     Bytes     Sec    %ov    Pa     Bytes   Sec    %ov
  HTTP/1.1                           308.6  187869    65.6   6.2    89.0   13843   11.1   20.5

  Table 9 - Apache - Low Bandwidth, High Latency

                                     First Time Retrieval           Cache Validation
                                     Pa     Bytes     Sec    %ov    Pa     Bytes   Sec    %ov
  Netscape Navigator¹                339.4  201807    58.8   6.3    108    19282   14.9   18.3
  Internet Explorer²                 360.3  199934    63.0   6.7    301.0  61009   17.0   16.5

  Table 10 - Netscape Navigator and Internet Explorer - Low Bandwidth, High Latency

  ¹ The measurements were performed using max 6 (default) simultaneous connections and HTTP/1.0 Keep-Alive headers.

  ² As with Netscape Communicator, Internet Explorer uses the HTTP/1.0 Keep-Alive mechanism to allow multiple HTTP messages to be transmitted on the same TCP connection. The total number of connections used in the test case is 6.
For the first time retrieval test, bandwidth savings due to pipelining and persistent connections of HTTP/1.1 are only a few percent. Elapsed time on both WAN and LAN roughly halved.

An HTTP/1.1 implementation that does not implement pipelining will perform worse (have higher elapsed time) than an HTTP/1.0 implementation using multiple connections.

The mean number of packets in a TCP session increased between a factor of two and a factor of ten. The mean size of a packet in our traffic roughly doubled. However, if style sheets see widespread use, do not expect as large an improvement in the number of packets in a TCP session, as style sheets may eliminate unneeded image transfers, shortening packet trains.

Since the fewer TCP segments were significantly bigger and could almost always fill a complete Ethernet segment, server performance also increases when using pipelined requests, even though only the client changed behavior.

A separate ACK packet is subject to the delayed acknowledgement algorithm, which may delay the packet up to 200 ms. Another strategy, which we have not tried, would be to always flush the buffer after processing the first segment if no new segments are available, though this would often cost an extra packet.

We observed these delayed ACKs in our traces on the first packet sent from the server to the client. In the Pipelining case, the HTML text (sent in the clear) did not contain enough information to force a new batch of requests. In the Pipelining and HTML compression case, the first packet contains approximately 3 times as much HTML, so the probability of having enough requests to immediately send a new batch is higher.

The exact results may depend on how the slow start algorithm is implemented on the particular platform. Some TCP stacks implement slow start using one TCP segment whereas others implement it using two packets.
In our tests, the client always decompresses compressed data on the fly. This test does not take into account the time it would take to compress an HTML object on the fly and whether this would take longer than the time gained by transmitting fewer packets. Further experiments are needed to see if compression of dynamically generated content would save CPU time over transferring the data. Static content can be compressed in advance and may not take additional resources on the server.

8.2 Summary of Compression Performance

Transport compression helped in all environments and enabled significant savings (about 16% of the packets and 12% of the elapsed time in our first time retrieval test); the decompression time for the client is more than offset by the savings in data transmission. Deflate compression is more efficient than the data compression algorithms used in modems (see section 8.3). Your mileage will vary depending on precise details of your Internet connection.

For clients that do not load images, transport compression should provide a major gain. Faster retrieval of HTML pages will also help time to render significantly, for all environments.

8.3 Further Compression Experiments

We also performed a simple test confirming that zlib compression is significantly better than the data compression found in current modems [24]. The compression used the zlib compression algorithm, and the test was done on the HTML page of the Microscape test site. We performed the HTML retrieval (a single HTTP GET request) only, with no embedded objects. The test was run over standard 28.8Kbps modems.

                            Jigsaw           Apache
                            Pa      Sec      Pa      Sec
  Uncompressed HTML         67      12.21    67      12.13
  Compressed HTML           21.0    4.35     21.0    4.43
  Saved using compression   68.7%   64.4%    68.7%   64.5%

The default compression provided by zlib gave results very similar to requesting the best possible compression to minimize size.

Case of HTML tags can affect compression. Compression is significantly worse (.35 rather than .27) if mixed case HTML tags are used. The best compression was found if all HTML tags were uniformly lower case (since the compression dictionary can then reuse what are common English words). HTML tool writers should beware of this result, and we recommend HTML tags be uniformly lower case for best performance of compressed documents.

9 Impact of Changing Web Content

In the preceding section, the compression experiments did not take advantage of knowledge of the content that was transmitted. By examining the content (text and images) of documents, that content can be re-expressed in more compact and powerful formats while retaining visual fidelity. This section explores how CSS, PNG and MNG may be used to compress content. We converted the images in our test page to PNG, animations to MNG, and where possible replaced images with HTML and CSS.

9.1 Replacing Images with HTML and CSS

While CSS gives page designers and readers greater control of page presentation, it has the added value of speeding up page downloads. First of all, modularity in style sheets means that the same style sheet may apply to many documents, thus reducing the need to send redundant presentation information over the network.

Second, CSS can eliminate small images used to represent symbols (such as bullets, arrows, spacers, etc.) that appear in fonts for the Unicode character set. Replacing images with CSS reduces the number of separate resources referenced, and therefore reduces the protocol requests and possible name resolutions required to retrieve them.

Third, CSS gives designers greater control over the layout of page elements, which will eliminate the practice of using invisible images for layout purposes. Images may now be images - be seen and not waited for.

The Microscape test page contains 40 static GIF images, many of which may be replaced by HTML+CSS equivalents. Figure 1 shows one such image that requires 682 bytes.

  Figure 1 - "solutions" GIF

The image depicts a word ("solutions") using a certain font (a bold, oblique sans-serif, approximately 20 pixels high) and color combination (white on a yellowish background) and surrounds it with some space. Using HTML+CSS, the same content can be represented with the following phrase:

  CSS:

    P.banner {
        color: white;
        background: #FC0;
        font: bold oblique 20px sans-serif;
        padding: 0.2em 10em 0.2em 1em;
    }

  HTML:

    <P CLASS=banner>solutions

The HTML and CSS version only takes up around 150 bytes. When displayed in a browser that supports CSS, the output is similar to the image. Differences may occur due to unavailability of fonts, of anti-aliasing, and the use of non-pixel units.

Replacing this image with HTML and CSS has two implications for performance. First, the number of bytes needed to represent the content is reduced by a factor of more than 4, even before any transport compression is applied. Second, there is no need to fetch the external image, and since HTML and CSS can coexist in the same file, one HTTP request is saved.

Trying to replicate all 40 images on the Microscape test page reveals that:
- 22 of the 40 images can be represented in HTML+CSS. Encoded in GIF, these images take up 14791 bytes, and the HTML+CSS replacement is approximately 3200 bytes, a savings factor of around 4.6. This factor will increase further if compression is applied to the HTML+CSS code.

- Further, 3 images can be reduced to roughly half their size by converting part of their content to HTML+CSS. Their current size is 7541 bytes, and the HTML+CSS demo replacement is 610 bytes.

- The elimination of 22 HTTP requests would save approximately 4600 bytes transmitted and approximately 4300 bytes received, presuming typical lengths of the requests (210 bytes) and responses (192 bytes for cache validation, and ~240 bytes for an actual successful GET request). This slightly overstates the savings; many style sheets will be stored separately from the documents and cached independently.

- 14 of the 40 images, taking up 80601 bytes, cannot be represented in HTML+CSS1. These are photographs, non-textual graphics, or textual effects beyond CSS (e.g. rotated text). However, these images can be converted to PNG.

It should be noted that the HTML+CSS sizes are estimates based on a limited test set, but the results indicate that style sheets may make a very significant impact on bandwidth (and end user delays) of the web. At the time of writing, no CSS browser can render all the replacements correctly.

9.2 Converting Images from GIF to PNG and MNG

The 40 static GIF images on the test page totaled 103,299 bytes, much larger than the size of the HTML file. Converting these images to PNG using a standard batch process (giftopnm, pnmtopng) resulted in a total of 92,096 bytes, saving 11,203 bytes.

The savings are modest because many of the images are very small. PNG does not perform as well on the very low bit depth images in the sub-200 byte category, because its checksums and other information make the file a bit bigger even though the actual image data is often smaller.

The two GIF animations totaled 24,988 bytes. Conversion to MNG gave a total of 16,329 bytes, a saving of 8,659 bytes.

It is clear that this sample is too small to draw any conclusions on typical savings (~19% of the image bytes, or ~10% of the total payload bandwidth, in this sample) due to PNG and MNG. Note that the converted PNG and MNG files contain gamma information, so that they display the same on all platforms; this adds 16 bytes per image. GIF images do not contain this information.

A very few images in our data set accounted for much of the total size. Over half of the data was contained in a single image and two animations. Care in selection of images is clearly very important to good design.

10 Implementation Experience

Pipelining implementation details can make a very significant difference in network traffic, and bear some careful thought, understanding, and testing. To take full advantage of pipelining, applications need explicit interfaces to flush buffers and other minor changes.

The read buffering of an implementation, and the details of how urgently data is read from the operating system, can be very significant to getting optimal performance over a single connection using HTTP/1.1. If too much data accumulates in a socket buffer, TCP may delay ACKs by 200 ms. Opening multiple connections in HTTP/1.0 resulted in more socket buffers in the operating system, which as a result imposed lower requirements of speed on the application, while keeping the network busy.

We estimate two people for two months implemented the work reported on here, starting from working HTTP/1.0 implementations. We expect others leveraging from the experience reported here might accomplish the same result in much less time, though of course we may be more expert than many due to our involvement in HTTP/1.1 design.

10.1 Tools

Our principal data gathering tool is the widely available tcpdump program [14]; on Windows we used Microsoft's NetMon program. We also used Tim Shepard's xplot program [8] to graphically plot the dumps; this was very useful to find a number of problems in our implementation not visible in the raw dumps. We looked at data in both directions of the TCP connections. In the detailed data summary, there are direct links to all dumps in xplot formats. The tcpshow program [21] was very useful when we needed to see the contents of packets to understand what was happening.

11 Future Work

We believe the CPU time savings of HTTP/1.1 are very substantial, due to the great reduction in TCP opens and closes and savings in packet overhead, and could now be quantified for Apache (currently the most popular Web server on the Internet). HTTP/1.1 will increase the importance of reducing the parsing and data transport overhead of the very verbose HTTP protocol, which, for many operations, has been swamped by the TCP open and close overhead required by HTTP/1.0. Optimal server implementations for HTTP/1.1 will likely be significantly different than current servers.

Connection management is worth further experimentation and modeling. Padmanabhan [1] gives some guidance on how long connections should be kept open, but this work needs updating to reflect current content and usage of the Web, which have changed significantly since completion of the work.

Persistent connections, pipelining, transport compression, as well as the widespread adoption of style sheets (e.g. CSS) and more compact image representations (e.g. PNG), will increase the relative overhead of the very verbose HTTP text based protocol. These are most critical for high latency and low bandwidth environments such as cellular telephones and other wireless devices.
A binary encoding or tokenized compression of HTTP and/or a replacement for HTTP will become more urgent given these changes in the infrastructure of the Web.

We have not investigated perceived time to render (our browser has not yet been optimized to use HTTP/1.1 features), but with the range request techniques outlined in this paper, we believe HTTP/1.1 can perform well over a single connection. PNG also provides time to render benefits relative to GIF. The best strategies to optimize time to render are clearly significantly different from those used by HTTP/1.1.

Serious analysis of trace data is required to quantify the actual expected bandwidth gains from transport compression. At best, the results here can motivate such research.

Future work worth investigating includes other compression algorithms and the use of compression dictionaries optimized for HTML and CSS1 text.

12 Conclusions

For HTTP/1.1 to outperform HTTP/1.0 in elapsed time, an implementation must implement pipelining. Properly buffered pipelined implementations will gain additional performance and reduce network traffic further.

The savings in terms of number of packets of HTTP/1.1 are truly dramatic. Bandwidth savings due to HTTP/1.1 and associated techniques are more modest (between 2% and 40% depending on the techniques used). Therefore, the HTTP/1.1 work on caching is as important as the improvements reported in this paper to save total bandwidth on the Internet. Network overloads caused by information of topical interest also strongly argue for good caching systems. A back of the envelope calculation shows that if all techniques described in this paper were applied, our test page might be downloaded over a modem in approximately 60% of the time of HTTP/1.0 browsers, without significant change to the visual appearance. The addition of transport compression in HTTP/1.1 provided the largest bandwidth savings, followed by style sheets, and finally image format conversion, for our test page.

We believe HTTP/1.1 will significantly change the character of traffic on the Internet (given HTTP's dominant fraction of Internet traffic). It will result in significantly larger mean packet sizes, more packets per TCP connection, and drastically fewer packets contributing to congestion (by elimination of most packets due to TCP open and close, and of packets transmitted before the congestion state of the network is known).

Due to pipelining, HTTP/1.1 changes dramatically the "cost" and performance of HTTP, particularly for revalidating cached items. As a result, we expect that applications will significantly change their behavior. For example, caching proxies intended to enable disconnected operation may find it feasible to perform much more extensive cache validation than was feasible with HTTP/1.0. Researchers and product developers should be very careful when extrapolating from current Internet and HTTP server log data to future web or Internet traffic, and should plan to rework any simulations as these improvements to web infrastructure deploy.

Changes in web content enabled by the deployment of style sheets and of more compact image, graphics and animation representations will also significantly improve network and perceived performance during the period that HTTP/1.1 is being deployed. To our surprise, style sheets promise to be the biggest possibility of major network bandwidth improvements, whether deployed with HTTP/1.0 or HTTP/1.1, by significantly reducing the need for inlined images to provide graphic elements, and the resulting network traffic. Use of style sheets whenever possible will result in the greatest observed improvements in downloading new web pages, without sacrificing sophisticated graphics design.

References

[4] Fielding, R., J. Gettys, J.C. Mogul, H. Frystyk, T. Berners-Lee, "RFC 2068 - Hypertext Transfer Protocol -- HTTP/1.1," UC Irvine, Digital Equipment Corporation, MIT.

[5] Touch, J., J. Heidemann, K. Obraczka, "Analysis of HTTP Performance," USC/Information Sciences Institute, June 1996.

[6] Spero, S., "Analysis of HTTP Performance Problems," http://www.w3.org/Protocols/HTTP/1.0/HTTPPerformance.html, July 1994.

[7] Heidemann, J., "Performance Interactions Between P-HTTP and TCP Implementations," ACM Computer Communication Review, 27(2), 65-73, April 1997.

[8] Shepard, T., "TCP Packet Trace Analysis," S.M. thesis. Source for this very useful program is available at ftp://mercury.lcs.mit.edu/pub/shep. The thesis can be ordered from MIT/LCS Publications; ordering information can be obtained from +1 617 253 5851 or by sending mail to [email protected]. Ask for MIT/LCS/TR-494.
[9] Mogul, J., "The Case for Persistent-Connection HTTP," Western Research Laboratory Research Report 95/4, http://www.research.digital.com/wrl/publications/abstracts/95.4.html, Digital Equipment Corporation, May 1995.

[10] Lie, H., B. Bos, "Cascading Style Sheets, level 1," W3C Recommendation, World Wide Web Consortium, 17 Dec 1996.

[11] Jacobson, Van, "Congestion Avoidance and Control," Proceedings of ACM SIGCOMM '88, pages 314-329, Stanford, CA, August 1988.

[12] Postel, Jon B., "Transmission Control Protocol," RFC 793, Network Information Center, SRI International, September 1981.

[13] Paxson, V., "Growth Trends in Wide-Area TCP Connections," IEEE Network, Vol. 8 No. 4, pp. 8-17, July 1994.

[14] Jacobson, V., C. Leres, and S. McCanne, tcpdump, available at ftp://ftp.ee.lbl.gov/tcpdump.tar.Z.

[15] Scheifler, R.W., J. Gettys, "The X Window System," ACM Transactions on Graphics #63, Special Issue on User Interface Software.

[20] Boutell, T., T. Lane et al., "PNG (Portable Network Graphics) Specification," W3C Recommendation, October 1996; RFC 2083, Boutell.Com Inc., January 1997. http://www.w3.org/pub/WWW/Graphics/PNG has extensive PNG information.

[21] Ryan, M., tcpshow, I.T. NetworX Ltd., 67 Merrion Square, Dublin 2, Ireland, June 1996.

[22] Deutsch, P., "DEFLATE Compressed Data Format Specification version 1.3," RFC 1951, Aladdin Enterprises, May 1996.

[23] Deutsch, L. Peter, Jean-Loup Gailly, "ZLIB Compressed Data Format Specification version 3.3," RFC 1950, Aladdin Enterprises, Info-ZIP, May 1996.

[26] Mogul, Jeffrey, Fred Douglis, Anja Feldmann, Balachander Krishnamurthy, "Potential benefits of delta-encoding and data compression for HTTP," Proceedings of ACM SIGCOMM '97, Cannes, France, September 1997.

[27] Nielsen, Henrik Frystyk, "Libwww - the W3C Sample Code Library," World Wide Web Consortium, April 1997. Source code is available at http://www.w3.org/Library.

[28] Baird-Smith, Anselm, "Jigsaw: An object oriented server," World Wide Web Consortium, February 1997. Source and other information are available at http://www.w3.org/Jigsaw.

[29] The Apache Group, "The Apache Web Server Project." The Apache Web server is the most common Web server on the Internet at the time of this paper's publication. Full source is available at http://www.apache.org.

[30] A Web page pointing to style sheet information in general can be found at http://www.w3.org/Style/.

[31] Multiple-image Network Graphics Format (MNG), version 19970427, ftp://swrinde.nde.swri.edu/pub/mng/documents/draft-mng-19970427.html.

Acknowledgments

Our thanks to Dean Gaudet of the Apache group for his timely cooperation to optimize Apache's HTTP/1.1 implementation.

Ian Jacobs wrote a more approachable summary of this work; some of it was incorporated into our introduction.

Our thanks to John Heidemann of ISI for pointing out one Solaris TCP performance problem we had missed in our traces. Our thanks to Jerry Chu of Sun Microsystems for his help working around the TCP problems we uncovered.

Digital Equipment Corporation supported Jim Gettys' participation.

The World Wide Web Consortium supported this work.