0% found this document useful (0 votes)

13 views16 pages

Web Page Design and Download Time: Jing Zhi

Uploaded by

ssuryak889

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

13 views16 pages

Web Page Design and Download Time: Jing Zhi

Uploaded by

ssuryak889

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 16

Web Page Design and Download Time

Jing Zhi
Keynote Systems

Statistical analysis of Web page download time measurements suggests that some relatively simple
formulae can be derived to project page download times based on Web page composition and TCP
connect time for a browser/server pair.

Introduction subset containing approximately 90% of available

measurements.
Many factors contribute to Web site performance, most
We find that these experimental results represent and
of which are at least partially outside the control of the
explain the basic mechanisms that regulate many of
site designer. Web page download times depend on
the performance characteristics observed while
page design, on Web server and client hardware and
working with the extensive set of measurements
software configurations, and on the performance
collected annually by the author’s company.
characteristics of the Internet route connecting a client
to the site [Neil2000]. Of these, only page design is
truly under the site designer’s control. Components of Web Page Download Time
However, if we assume a user with a high-speed Before we can describe how we measure Web
connection and a site whose servers are not download performance, we include some background
overloaded — as is typical of a business to business on the components that constitute Web page
(“B2B”) interaction — then only two significant factors download. These will give us useful handles on
remain: site design and Internet latency between the different types of performance that we must build into
client and the server [Sper1995] [Heid1997] our models.
[To uc1998]. In this paper we analyze measurement
data based on test pages to explore various Packets
relationships between Web page design and page
download time. Concentrating on information about On the Internet, all information is carried in packets.
the page and measures of Internet round trip time, we Network transfer times are not affected by the type of
develop several specialized formulae to predict typical content being transmitted in those packets, but they
page download times in a B2B environment. are strongly influenced by the number of packets, and
possibly even by their sizes. Also, the time required to
After some introductory discussion of Web download
set up a flow of packets is much larger than the
components and experimental setup, the first part of
amount of time between successive packets in a single
this paper identifies packet count, rather than page
connection.
size, as the crucial predictor of download time. We
indicate how to calculate packet count in the absence The basic performance principle is therefore to make
of packet sniffer software. fewer requests and transmit fewer packets. From this
principle, we can derive two basic design rules for well-
Next, we explore page download time as a function of
performing Web pages. First, reduce the overall size
page size, in a single-threaded environment. Here we
of the page, thereby reducing the number of bytes (and
build two different linear models to understand the bulk
packets) to be transferred over the Internet. Second,
of page download for simple test pages
limit the number of embedded objects on the page,
(“Experiment A”).
such as images, each of which must be requested and
Then, we investigate how multi-threading improves transferred separately from server to browser.
performance for more complex pages. It is particularly
challenging to build accurate models for multi-threaded Web Page Download Components
behavior since the distribution of download activity
among threads can be altered substantially by a single To study Web page download time systematically, we
Web page element that loads unusually slowly. A consider its components, illustrated in Figure 1:
more complicated model is built and discussed for • DNS Lookup (DNS)
pages with one to 64 embedded images
• TCP Connection (TCP)
(“Experiment B”).
• Redirection
Since Internet performance varies erratically over time • First Packet Download (FPD)
even for a fixed client-server pair, we discuss tradeoffs • Base Page Download (BPD)
that can be made in modeling performance based on
• Content Download (CD)
all available measurements vs. a more well-behaved
1
COMPONENT CLIENT SERVER
DNS Lookup
TCP Connection Send SYN
Send SYN+ACK
Receive SYN+ACK
Send ACK
Redirection Send HTTP request
Send redirection response
First Packet Send HTTP request
Download
Perform server processing to locate or construct response
Send HTTP response & first (or first two) HTML packets
Receive 1st packet
Base Page Send remaining HTML packets …
Download
Receive last HTML packet
Content Send HTTP GET requests
Download
TCP Connection, Redirection, First Packet Download, and Download phases for each
Receive requested content
Figure 1. Web page response time components

The first five components together determine how long wild variability of the Internet makes it difficult to build
it takes to load the HTML for a base page. such models.
The First Packet Download is the time between the For example, consider that one variable our model
completion of the TCP connection with the destination uses to predict total page download time is the TCP
server and the reception of the first HTML packet for connect time measured for the base page only. In
the base page. It includes the client HTTP request and reality, the many different TCP connections obtained
server HTTP response time. In addition to reflecting during the content download process might each
Internet latency, this measurement can also be an perform quite differently. When such differences occur
indicator of server performance and could differentiate in practice, a single TCP time measured for the base
between HTTP Performance and TCP stack speed. page cannot tell the whole picture about the behavior
The last component, called Content Download, is a of the TCP connections for multiple page elements.
little different from the first five. It lumps into one Despite such difficulties, we found a good fit to
number the cumulative time for downloading all the modeled behavior.
embedded objects (such as images) on the page. This
process may actually include all five of the previous Measurement Data
component types, for each of the embedded Web
objects. For example, one of our test pages includes Each of the components described in the previous
64 images, so the Content Download component for section was captured for downloads of the test pages
that page includes the total time required for DNS described in the following sections. Downloads were
Lookup, TCP Connection, First Packet Download, and taken from test computers (called “agents”) at three
download of additional packets for all 64 images. different city/backbone combinations:
Although this description implies a huge overhead for • New York City / AT&T
each embedded element, in reality, many components • Houston / Qwest
of Content Download are either eliminated (by reusing • San Francisco / Sprint
cached DNS values and recycling existing TCP Measurements were taken at five-minute intervals, so
connections) or processed in parallel (by each page should have 288 measurements from each
multithreading). agent per day. In general, we have found that Internet
performance differs remarkably not just over time, but
Modeling Challenges also depending on the location and backbone from
which content is requested. For this reason, we chose
This study investigates subsets of these components
a variety of testing locations and used many repetitions
together with page design factors as predictors of total
of measurements to confirm patterns.
download time (including base page and content.) The

2
Agents have good Internet connectivity (a dedicated avoids multithreaded browser behavior. In these
T1/T3). Except as described later, the agents run a pages, we vary the size both of the base page and of
proprietary Keynote browser designed to mimic the the image, as shown in the table in Figure 2.
experience of business users.
Because this study focuses on the variability due to the Pages Used for Experiment B
Internet rather than that due to the server, the test Next, we study multithreading as a mechanism to
pages and all embedded objects (image files) were speed up downloads of complicated pages. We start
hosted on the same server (www6.keynote.com). with one of the pages used before (B.1 = A.7), and
This design decision has the side effect of eliminating vary the number of pieces into which we chop up the
the redirection time component from our study. To same image.
include redirection time in our results would require As we will show, the benefits of multithreading are
adding another variable to those we consider for each eventually outweighed as the number of images — and
measurement. Since in practice the presence of a thus the likelihood of complications — increases.
redirection component is an exception, rather than the
norm, it is more practical to construct general models The pages used in this experiment are described in the
that do not include it, adding a separate estimate for table in Figure 3. Each page uses the same total
redirection time only when it is applicable. number of bytes (85,049), about 1K of which are in the
base page (“HTML Size”). The remaining bytes are
Pages Used for Experiment A divided equally among multiple images in every page
other than B.1, which contains a single image.
The first models focus on the effects of page size. We
used pages with at most one embedded object. This

Page HTML # of Total Total Web Page URL

Index Size Images Content Bytes
A.1 62,300 0 0 62,300 www6.keynote.com/test/page/11.html
A.2 39,077 1 23,223 62,300 www6.keynote.com/test/page/12.html
A.3 20,893 1 41,407 62,300 www6.keynote.com/test/page/13.html
A.4 183 1 62,117 62,300 www6.keynote.com/test/page/14.html
A.5 183 1 18,118 18,301 www6.keynote.com/test/page/21.html
A.6 183 1 41,407 41,590 www6.keynote.com/test/page/22.html
A.7 1,136 1 83,913 85,049 www6.keynote.com/test/page/24.html
A.8 183 1 101,388 101,571 www6.keynote.com/test/page/25.html
A.9 183 1 120,665 120,848 www6.keynote.com/test/page/26.html
Figure 2: Test page information (Experiment A)

Page HTML # of Total Total Web Page URL

Index Size Images Content Bytes
B.1 1,136 1 83,913 85,049 www6.keynote.com/test/page/24.html
B.2 1,441 2 83,608 85,049 www6.keynote.com/test/page/32.html
B.4 1,293 4 83,756 85,049 www6.keynote.com/test/page/33.html
B.8 938 8 84,111 85,049 www6.keynote.com/test/page/34.html
B.16 1,049 16 84,000 85,049 www6.keynote.com/test/page/35.html
B.32 1,753 32 83,296 85,049 www6.keynote.com/test/page/36.html
B.64 2,233 64 82,816 85,049 www6.keynote.com/test/page/37.html
Figure 3: Test page information (Experiment B)

3
First Finding: Packet Count Is Much More B.64.1 was constructed with larger images (2262
Important Than Content Size bytes) that also fit into two packets. The two tests are
summarized in Figure 4.
Before we begin to build the quantitative relationships The histograms in Figure 5 show the measurement
found by experiments A and B, we cite a more general results for B.64 and B.64.1. These results show that,
finding. For a given server-agent pair, the size (in despite a 75% increase in content between B.64 and
bytes) of a page is a much poorer predictor of B.64.1, there is no substantial change in download
download time than the number of packets over which time. Thus, even though downloading B.64.1 requires
the page is transmitted. Because of this finding, we many bigger packets than downloading B.64, the fact
replace page size with packet count as an independent that they can be downloaded in the same number of
variable in our models. packets accounts for their similar download behavior.
Here, we summarize the finding and introduce the
method to calculate the number of packets required for Calculating Packet Count
content of a given size.
For the sake of reference, we include a brief summary
of how to calculate the packet count for given content.
Page # of Image # of Total In our experiments, sniffers validated all packet counts.
HTML
Index Images Size Packets Content
The network packet size depends on the Maximum
B.64 2,233 64 1294 64×2 82,816 Transmission Unit (MTU), which is the greatest amount
B.64.1 2,233 64 2262 64×2 144,768 of data or packet size that can be transferred in one
physical frame on the network.
Figure 4: Comparing packet count and page size
Packet size is configurable for both servers and clients.
When applying a sniffer to the download of page B.64, On the network at speeds above 128Kbps, all
we found that each image (of size 1294 bytes) was just connections are made with an MTU of 1500 bytes as
large enough that together with its overhead, it did not default.
fit into a single packet. As a result, 128 packets were
required to send the 64 images. Another test page

Figure 5: Content download times for sites B.64.1 and B.64

4
The Maximum Segment Size (MSS), the largest Model 1: Individual Models Per Agent
“chunk” size of data that TCP will send to the other To build the most basic model, we used test pages A1-
end, would be 1460 bytes, subtracting 20-byte TCP A9. Note that two kinds of payload are considered:
header and 20-byte IP header from the MTU. Usually pages A1-A3 contain significant amounts of HTML,
when a TCP connection is established, each end can while pages A2-A9 include significant image content.
announce its MSS. Agents for this study were homed
at dedicated T1/T3 connections and used the usual Measurements for this experiment were taken over one
MTU of 1500 bytes. Thus, they had TCP packet sizes weekend day to minimize fluctuation in Internet
of 1460 bytes. latency, and to make the relationships as clear as
possible.
The number of packets for an HTTP request may be
calculated by the formula, Model 1: Measurement Results
# Packet = [Payload MSS ] (1) To establish overall relationships, we based our model
on the simple averages of the approximately 288
where [ ] stands for rounding a number up to the measurements per agent per target. Average
nearest integer. performance per element is shown in Figure 6.
The Payload here is defined as the total number of Consider the numbers of packets shown in Figure 6.
data bytes required to transmit a Web page from the These were determined using a packet sniffer. Since
server through the Internet at the HTTP level, exclusive our Base Page Download component (the PTT in the
of TCP/IP header information. The payload also case of HTML) does not include the first packet, the
includes the HTTP response header, which is about numbers shown for HTML are one less than the
290 bytes for HTML pages. number of packets for the whole HTML page.
To obtain Payload Transmission Time (PTT) from the
Experiment A: The Effects of Total total content download time (CD), we subtract TCP and
Download Bytes and Packets FPD of base HTML file from the total CD.
In Figure 7, we plot the Payload Transmission Time
Introduction against the Number of Payload Packets for the three
Keynote agents. On visual inspection, the
In this experiment, we investigate the dependence of relationships look very close to linear within each
download time on page size, as measured in packets. agent. This linear relationship is analyzed further in
We will take two approaches to this problem. First, in the discussion of Model 1.
Model 1, we perform separate linear regressions for
each of three agents, examining results for their Model 1: Statistical Model
similarities and differences. We consider a basic linear least square regression
Given this experience, we construct a more general model for each agent individually, where the Number of
Model 2, which uses the generic component of TCP Packets (P) is the predictor (independent variable) and
connect time in a way that generalizes our analysis to the Payload Transmission Time (PTT) is the response
Web clients located anywhere on the Internet. In each (dependent variable). We capture the different Internet
case, we mention novel results and provide additional connectivity for each agent into that agent’s regression
validation in support of methodology and conclusions. coefficients, so that we avoid additional dependent
variables related to Internet performance. Later
As a first finding, we observe that base page HTML
models will take a more generalized approach.
and images download very similarly. Thus, we will
refer to either as payload, and will investigate payload To reduce the variance of the Payload Transmission
download as a function of packets. Time, we use the arithmetic mean of PTT for each
page as the response variable. According to the Law
For the download of either a base page or of an
embedded object, two main phases may be of Large Numbers, this averaging also allows us to
treat the residual values as normally distributed. This
distinguished. First, a connection is established
assumption is quite reasonable even though individual
(possibly after DNS lookup and redirection),
culminating in transmission of a first “batch” of one or Internet measurements are notoriously not normal.
two packets. Next, a steady stream of packets begins The model for each agent can be stated as follows:
flowing. The boundary we place between these PTT = α + β ⋅ P + ε (2)
phases derives partially from the fact that we measure
at the client. where ε is a random error term with mean 0 and
Whether we measure base page or embedded objects, variance σ ; and all the error terms are uncorrelated.
for this experiment we seek to model the time taken for The fitted coefficients for each agent are listed in
the second phase. We call this the Payload Figure 8.
Transmission Time (PTT.)

5
PTT in Seconds
Page Content Total Observed San
Index Type Bytes #Packets Houston New York Francisco
Qwest AT&T
Sprint

A.2 Image 23223 0.214 0.395 0.086

16
A.3 Image 41407 0.290 0.591 0.121
28
A.4 Image 62117 0.412 0.824 0.152
42
A.5 Image 18118 0.164 0.317 0.057
12
A.6 Image 41407 0.295 0.611 0.111
28
A.7 Image 83913 0.530 1.107 0.214
57
A.8 Image 101388 0.633 1.296 0.226
69
A.9 Image 120665 0.774 1.600 0.296
82
A.1 HTML 62300 0.384 0.796 0.144
42
A.2 HTML 39077 0.242 0.508 0.099
26
A.3 HTML 20893 0.142 0.307 0.058
14
Figure 6: Payload transmission time for individual elements

Payload Transmission Time vs # of Packets

1.8

1.6 Houston Qwest

1.4 New York AT&T
PTT in Seconds

1.2 San Francisco Sprint

0.8

0.6

0.4

0.2

0
0 10 20 30 40 50 60 70 80 90

Figure 7: Payload transmission time vs. number of packets

Houston Qwest New York AT&T SF Sprint

α̂ 0.0431 0.0748 0.0204

β̂ 0.0087 0.0181 0.0032

2
R 0.9909 0.9952 0.9815
TCP 0.0415 0.0936 0.0079
Figure 8: Fitted regression model coefficients and additional information

6
2
In Figure 8, R , we also show the coefficient of Because we have already factored out the TCP
determination, is an indicator of how well the data fits connect times and any server delays contained within
the least square model. It can be interpreted as the the “First Packet Download” component, a positive
proportion of variability of the dependent variable that intercept α, suggests another overhead factor is
is explained by the independent predictor variable. present.
2
The closer the R is to 1, the better the linear A likely explanation for this is the TCP slow start
association between the two variables is. We observe process that applies at the beginning of packet flow.
2
immediately that all the R values are very close to 1, As described, for example, in [Stev1997], the first
indicating very strong linear relationships (as would be batch of packets in a TCP transmission may contain
expected based on a visual inspection of Figure 7). only one or two packets, after which the number of
Figure 8 also lists an average TCP round trip time, packets sent in each batch increases.
which is seen to correlate with both sets of fitted Alternately, network congestion may require that the
coefficients. packet transmission rate be adjusted downward.
Model 1: Discussion
In summary, it seems that our linear model’s β̂ fits the
Observe how the coefficients β̂ are multiples of TCP steady state of flow that is achieved after slow start,
time. This fits well with theories that try to explain while α captures the overhead involved in ramping up
download times as multiples of trips across the to this steady-state packet flow. In Figure 9, the dotted
Internet. However, notice also that the approximate lines illustrate our model, while the curved lines
relationship illustrate the underlying behavior.

β̂ ≈ 0.2 * TCP It is interesting to see that just like the β̂ ’s, the α̂ ’s
of the other two agents fails to hold for the SF Sprint also appear related to the TCP round trip time. As with
2
agent. Comparing the values of R , we see that the β̂ , the α̂ is a larger multiple TCP in the case of SF
model fits slightly less well for the SF Sprint than for Sprint, where TCP is less. This suggests that in this
the other two agents. A likely explanation is that, as
case, α̂ is dominated by outlier “noise” that drives
measured by TCP round trip time, San Francisco averages up.
Sprint is much closer than the other two agents to the
target Web site server. When page download times The slope β̂ is interpreted as the increase in page
are short, small variations in network conditions tend to download time in seconds per additional packet (P). In
have a relatively larger impact on overall page practice, both server and network conditions will affect
download times. the latency per packet transmitted. In this experiment,
One added complication is that in the model above, the because all three agents measured the same Web
“intercept” terms α̂ of the regression models are very server, we can reasonably expect the differences in the
much nonzero. These are certainly statistically β̂ values to be related mainly to the differences in the
significant according to the t-test.
round trip times between the agents and the target
So what accounts for these terms? Theoretically, PTT Web site servers.
should be zero when P goes to zero.

PTT β1
β2

α1
α2

Number of Packets (P)

Figure 9: Relationship of the linear model and true behavior

7
A C

B D

Figure 10: Diagnostics for Residuals

Additional Validation of Model 1 The advantages of these generalizations are clear: not
only is it easier to apply this model to measurements
Under the least square model assumption, the
from other agents of other targets, but the model
residuals should look approximately like a sample of
explains total time in terms of the easily
independent random normal noise with common
understandable notion of a TCP round trip. Also, the
variance, and with no pattern relative to the predictor or
model relates much more directly to user experience
fitted PTT.
by studying individual times instead of averages.
For the statistician, in Figure 10 we plot the diagnostic
However, this also raises significant challenges. Most
graphs of the fitted model residuals for the New York
notably, Internet statistics are tremendously variable,
AT&T agent. We can see the linear regression model
and the usual statistics (mean and especially standard
is basically valid. However, there is a slight curvature
deviation) are dominated by outliers. For this reason,
in the plots of residuals vs. fitted values (Figure 10: A &
we use least trimmed squares regression (LTS)
C). The curvature doesn’t decrease much even if we
[Burn1992], a highly robust method for fitting a linear
try higher ordered regression models. While this
regression model. Parameters fitted by LTS minimize
demonstrates the usual pattern in Internet statistics of
the sum of certain fractions of smallest squared
outliers behaving qualitatively differently than the
residuals.
majority of measurements, further investigation may
uncover another cause. The LTS approach has the effect of explaining the
“well-behaved” majority of measurements, while not
Model 2: A Single Model For Multiple Agents even considering the worst outliers. While it is
Using TCP Time unpleasant to have to dismiss outliers, in practice, one
finds that models based on outliers are both unstable
From the previous model, we know of a reasonable and usually a poor fit. In addition, linear models built
predictor of PTT after α̂ s and β̂ s are fitted. Both of using outliers tend to be so contorted as to obscure
meaningful explanations of behavior.
the parameters are related to the network round trip
times. So in this section we generalize the previous Another difficulty arises from the fact that only one TCP
model in two ways. First, we use the same coefficients time is measured per page, while different TCP round
for all agents, basing these on the common trips may behave very differently even over such a
independent variable of TCP time. Second, we now fit small time span as a single page download. This is
to individual points, rather than to averages over part of the inaccuracy that must be accepted when
pages. building a model on so few variables.

8
Figure 11: Histograms of TCP time by agent

Figure 12: PTT as a function of TCP time, for different packet counts

Model 2: Measurement Results measurements from different agents in each sub-

The histogram in Figure 11 shows TCP Connection graph. Some outliers fall outside of the displayed
Time for each agent. As with most Internet component window. While it is hard to see here, the TCP time for
measurements, each histogram is far from normal, each agent-target pair has a roughly lognormal
having a large right skew. In fact, a more detailed distribution.
graph would show the histograms to be closer to Model 2: Statistical Model
lognormal, meaning that the logarithms of TCP times
We build the linear model (lines in Figure 12) based on
would appear close to normal. Again, to get around
the HTML of test pages A1 – A3, as determined by the
this difficulty, we use least trimmed squares
Base Page Download component. Independent
regression.
variables are TCP time, the number of packets, and
We find that for a fixed number of packets, there is a the interaction (i.e. product) of these two terms.
very clear near-linear relationship between the PTT As before, the number of packets used here is the
and TCP time. This is illustrated in Figure 12. The number of packets to transmit the base page minus
lines shown are the robust least trimmed square (LTS) one, to exclude the packet captured in the component
regression fitting lines. Observe that Figure 12 mixes First Packet Download.

9
The fitted linear model was determined as: illustrated in Figure 9. Also, we emphasize that our
“robust” fit excludes all those troublesome outliers,
PTTˆ = 0. 4947 ⋅ TCP + 0.0021 ⋅ P + 0. 1722 ⋅ TCP ⋅ P (3)
which while very different, are certainly not easy to
The Robust Multiple R-Squared for this is 0.9962 and ignore! In essence, we have modeled how things are
all the coefficients are statistically significant. About “supposed” to work.
90% observations determine the LTS estimates.
Further Validation of Model 2
Corresponding to Model 1’s term with coefficient β̂ , In order to see how well the PTT could be predicted by
here we have the term including TCP⋅ P. Just as this more general model, we took a further sample of
Model 1 had different values of α̂ and β̂ for each measurements. Measurement agents were selected
from both domestic and international locations, so that
agent, here we must include more than one additional the measured TCP Connection Time components
term: both the TCP and P terms together capture the would vary a lot from agent to agent.
effect of slow start, as well as the different balance
between linear and constant terms seen for faster This sample also included measurements of a page
connections. that was not one of our original test pages, and that
was served by another company’s server at a different
Model 2: Discussion Internet location.
How can we interpret the coefficients in equation (3)? In this case, we used Base Page Download Time as a
The coefficient for TCP⋅ P is very straightforward. It proxy for PTT. In Figure 13, we show sample data,
models how efficiently the chunk of payload packets predicted PTT, and the residuals (or observed minus
flows from server to client (agent). The P term could predicted). The predicted times are calculated from
be related to delay at the server side in waiting for ACK the number of packets (determined by TCP packet
packets before continuation of packet delivery. sniffer or formula (1)) and measured TCP time.
The TCP term is somewhat consistent with how the In Figures 13 and 14 (which includes more data points
intercept term α in Model 1 increases with the network than the Figure 13), the goodness of fit may be
round trip time (as varying from agent to agent.) assessed informally by comparing the observed and
We should point out that this model may break down predicted PTTs, which appear to agree quite well.
when the total payload is very small. This corresponds
to the difference between Model 1 and reality

# of Packets 26 42 26 42 26 42 19 42 42 19 26 42
TCP 0.06 0.07 0.11 0.17 0.17 0.12 0.31 0.16 0.23 0.67 0.69 0.54
Observed PTT 0.33 0.58 0.6 1.01 1.02 1.06 1.15 1.41 1.72 2.46 3.58 4.45
Predicted PTT 0.35 0.63 0.60 1.40 0.90 1.02 1.21 1.32 1.87 2.56 3.49 4.26
Residual -0.02 -0.05 0.00 -0.39 0.12 0.04 -0.06 0.09 -0.15 -0.10 0.09 0.19
Figure 13: Correspondence between modeled and observed times for typical sample data.

Observed and Predicted PTT

5
Observation
4 Prediction

0
0 1 2 3 4 5 6
Predicted PTT in Seconds

Figure 14: Graph of modeled and observed sample times

10
Experiment B: Number of Embedded (“threads”). For example, Internet Explorer uses 2
Images concurrent threads per server, and Netscape uses up
to 6 (usually 4)[Wang1998]. The agents used in this
testing use 4 concurrent threads. When the client
Introduction (browser or measurement agent) can send multiple
Multithreading adds another layer of complexity into requests in parallel, and the server is able to accept
performance modeling. First, we show how delivery of those parallel connections, the total content download
the same image content varies depending on how time is usually reduced. The actual reduction depends
many pieces it is sliced into, both in terms of on the number of concurrent connections (“threads”)
performance and in terms of reliability. Next, we used, the bandwidth of the browser’s connection to the
characterize the performance gain over single- Internet, and the mix of element sizes being
threaded browsing as a quantitative gain in efficiency. downloaded. While browsers connected via a dial-up
We put all of this knowledge together in a model for connection will usually gain nothing from parallel
multithreaded performance, which we follow with a threads (because the 56Kbs connection is a
discussion of tradeoffs between robust and regular bottleneck), browsers connected via high-speed
least-squares modeling. connections will benefit.
This section presents the most far-reaching results of
Measurements and Results
the study and suggests directions for future work.
We fix our testing page total download sizes to 85,049
Multithreading bytes in our experiment and change the number of
embedded images. As test pages, we use the seven
Most Web pages have more than one embedded pages named B1 to B64, which have from 1 to 64
image or program element. When downloading a Web
embedded images. The measured Content Download
page containing multiple embedded elements, Time includes the time required (usually short) to
browsers establish a separate TCP connection for decode the file after the HTML base page has been
each element, unless both client and server support
received, and the time to request and receive all
persistent (“keep-alive”) connections, and the objects referenced images.
are located on the same server as the HTML index file
or “base page” [HTTP1.0][HTTP1.1]. In our Figure 15 shows the results. Note how performance
experiment, we assume the worst-case scenario, in improves with multithreading as one moves from one
which each embedded object requires the browser to to four images, and then how performance worsens
open a new TCP connection. with more images as both the data transmission
overhead and the probability of encountering problems
When downloading Web page content elements, increase.
browsers can use multiple concurrent connections

Figure 15: Content download time vs. number of images

11
Figure 16: Content download time vs. linear scale number of images

To show quantitative relationships more clearly, Figure this should be a concern. Other types of errors shown
16 illustrates the same data as Figure 15, except with occurred during base page download, and bear no
the x-axis linear in the number of images, rather than apparent relationship to the number of images.
on an exponential scale as in Figure 15. Number of
The page with 4 images has the smallest mean Images 1 2 4 8 16 32 64
download time according to the Houston Qwest or New Connection
York AT&T measurements. The total time to download Refused 0 0 0 0 1 0 0
the 64-image page increases dramatically. The
Connection
download times almost double compared to the site Timed Out 2 2 2 5 3 4 2
with 32 images. In the case of New York AT&T, 1.55
additional seconds are required for such overhead as Content Error 0 4 8 13 39 67 114
additional TCP/IP connections and slow starts. Unknown Error 0 0 3 0 0 0 0
Even if the client and server use persistent
Total Errors 2 6 13 18 43 71 116
connections, the overhead of HTTP headers is an
inevitable factor increasing download times. The Figure 17: One-week table of measurement errors
HTTP header for each TCP/IP connection request is
about 290 bytes, so the site with 64 images will have In our general experience, the occurrence of errors is
290 · 64 = 18,560 additional bytes. More importantly, strongly associated with the occurrence of outliers,
as explained in the “Finding 1” section above, this which in turn associate with generally poor
overhead necessitates additional packets to carry the performance. Thus, before we even begin the
images – a total of 128 rather than closer to 64 packets quantitative analysis, we can draw important
for the other pages. conclusions for page design.

Errors When attempting to optimize performance of a page

with multiple elements, be sure to include enough
We also tracked errors over a continuous week of
elements so as to keep all browser threads
measurements; the results are shown in Figure 17.
occupied. Ideally, no single element should be so
We observe that content errors increase roughly
much larger than others that it forces its thread to
linearly with the number of images. These errors occur
take longer than other threads. However, when the
when the base page loads correctly, but one or more of
number of elements per server far exceeds the
the content elements fail. This could be due to TCP
number of concurrent threads supported by the
connection failure, for example.
browser, additional elements increase overhead,
As page complexity increases (for example with a lager and slow down the overall download time.
number of images), a connection time out error
becomes more likely. Although the association of time
outs with image count is not so clear here, in general,

12
# of Estimated Content Predicted Content Download Time (No Concurrent Connection)
Images Packets, based on size
Houston Qwest New York AT&T San Francisco Sprint
1 57 0.64 1.29 0.22
2 56 0.73 1.46 0.24
4 56 0.93 1.89 0.30
8 56 1.25 2.57 0.34
16 48 2.06 3.96 0.45
32 32 3.54 6.78 0.71
64 64 7.01 13.55 1.60
Figure 18: Predicted Content Download Time With No Concurrent Connection.

Concurrent Connection Efficiency might possibly be explained is that the connection

between the agent and the server might have been so
The content download time with no concurrent good that the server itself became a larger part of the
connection can be predicted by the formula: bottleneck. In this case, multithreading at the client
n· (TCP +FPD) + PTT (4) does not speed up the server’s delivery of packets as
where the PTT is predicted from the adjusted total much.
content download packets and n is the number of
embedded images. The predicted values for the three Models for Multithreaded Performance
agents are illustrated in Figure 18:
As more than one embedded image is added to the
Obviously, the predicted content download times with base page, the Internet download time becomes much
no concurrent connection are much longer than actual more complicated. This is illustrated in detail in Figure
measurements. But how much benefit do the pages 20, which is a graph of a download of test page B.32.
gain from client multithreading? The components of download time are coded using
We define the Concurrent Connection Efficiency as different shades of gray. For each element, most time
the ratio of Content Download Time with no Concurrent is spent either forming TCP connections or waiting for
Connections (predicted) over Content Download Time the first packet.
with Concurrent Connections (measured). The perfect In this case, the browser uses 4 parallel threads. New
efficiency would be lesser of n and the number of threads are spawned for individual images only after
concurrent connections (4 in these tests). the complete base page has been received and
The images in the pages with 16, 32 and 64 images parsed. In addition, individual connections are
are equal sized. According to the data in the figure 19, established for each object. Many of these behaviors
we could see that these three pages benefit more from resemble Netscape more closely than Microsoft’s
the concurrent connection than other pages. New York Internet Explorer. A subsequent paper will build similar
AT&T, with a longer Internet round trip time, achieves performance models more closely tailored to IE.
more Concurrent Connection Efficiency than the other
two. San Francisco Sprint, with smallest latency,
benefits least from the multithreading. One way this

# of Content Download Time (4 Thread Connection) Concurrent Connection Efficiency

Images Houston Qwest New York AT&T SF Sprint Houston Qwest New York AT&T SF Sprint
1 0.63 1.29 0.24 1.01 1.00 0.95
2 0.50 0.87 0.20 1.45 1.68 1.24
4 0.40 0.70 0.22 2.30 2.71 1.36
8 0.54 0.91 0.25 2.30 2.83 1.35
16 0.75 1.32 0.33 2.74 3.01 1.38
32 1.14 1.84 0.48 3.12 3.68 1.50
64 2.27 3.40 0.82 3.08 3.98 1.95
Figure 19: Content Download Time Observations and Concurrent Connection Efficiency

13
Figure 20: Detailed download performance for test page B.32 (i.e. with 32 images)

Notice in Figure 20 that despite the identical size and 4 images, the number of threads used equals the
packet counts of the images, groups of four images number of images.
processed simultaneously tend to drift apart. As a To build this model, we make two small
result, it is quite common for some threads to process simplifications. First, we treat the variable Images
more images than others. This is only one of the per Thread (abbreviated "O" for Objects) as if each
difficulties in modeling multithreaded content thread handled the same number of images. In
download. reality, a single slow image may delay its thread so
We may model the total content download time (CD) much that other threads may end up having to handle
in terms of a single thread with the maximal more than their "fair quota" of images. Second, we
workload, basing our model on the table in Figure 21. are modeling the maximum time of a known number
The abbreviated variable names P/O, P, and O of threads in a linear model, even though taking a
shown in this table will be used later in our derived maximum (even of several linear variables) is not a
model formulae. Note that when there are fewer than linear operation.

Maximum # of Packets Maximum # of Packets

# of # of Images per Thread
Page Index per Image per Thread # of Threads in Use
Images (O)
(P/O) (P)
B.1 1 58 58 1 1
B.2 2 30 30 1 2
B.4 4 17 17 1 4
B.8 8 9 18 2 4
B.16 16 4 16 4 4
B.32 32 2 16 8 4
B.64 64 2 32 16 4
Figure 21: Test page characteristics for Experiment B

14
In fact, the fitting of the model corrects for these Finally, we have added a separate O term in the way
simplifications implicitly. First, for example, for page that interaction terms in Model 2 also used single-
B.8, we might find that the average number of images variable terms.
handled by the slowest thread was 2.1, rather than the By looking at Figure 15, the Page B.4 with a relatively
2 in our model. If so, the ratio 2.1 / 2 would be small number of Packets per Thread (P) and the least
included in the fitted linear coefficient for Images per number of images per thread (O) needs shortest
Thread. Second, in this same example, the expected response time to download the content. This agrees
time for the slowest of the four threads for downloads with formula (6).
of B.8 would be strictly greater than the expected time
As in Model 2, we model the content download time by
of individual threads handling 2 images. Again, this
using LTS robust regression to fit the coefficients of
overhead would be included in coefficients fitted to the
terms in (6) based on the individual observations of the
model.
B.1-B.64 Web pages on one weekend measured by
So we emphasize that while the model uses predictor agents at the usual three locations. LTS results are
variables that are based on simplifying assumptions, shown in equation (7).
the fitting of the model uses real measurements, and
so counterbalances the assumptions. Thus, as we see Discussion
below, the predicted total download time behaves quite
accurately. On the other hand, it is not appropriate to R-Squared is close to 1 after 10% of the observations
interpret the fitted model coefficients as modeling the are thrown away. As in the earlier discussion, this
expected download time per image for an average means that the majority (~90%) of measurements
thread, for example. behave in an easily understandable way.
The intercept in the model could be the time the client
The Generalized Model parses the HTML after downloading the whole HTML
text file. Alternately, it could just fill in for some of the
Content Download Time (CD) can be modeled as:
random fluctuation that is not explained by other
CD ∝ TCP ⋅ O + FPD ⋅ O + f ( P / O) ⋅ O (5) coefficients.
This notation means that we will fit constants to The coefficient of TCP· P is pretty close to the one in
multiply by each of the three terms listed on the right Model 2. Theoretically, the true coefficients of TCP· O
side of the formula to model Content Download Time. and FPD· O should be 1, since these cover the same
The first two terms are related to the overhead of TCP tasks that need to be completed either when sending
connection and server processing time to download an image or the base page. However, the fitted
each image. As before, we do not obtain TCP and coefficients are slightly less than 1. This suggests
FPD times for each image separately, but rather use some obscuring of individual time contributions such as
those times measured for the base page of a by multithreading.
download. As is seen clearly in Figure 20 above, both In this linear model, all of the TCP slow start would be
of these components vary across elements, but this counted in term O, which certainly has a nonzero
way, we are able to build a simpler, though somewhat coefficient.
less precise, model. Thus, we find that all of the results seem have sensible
The last term is a content data transmission time, such interpretations. Note that other regressions were
as was the main result of Model 2. There we pointed attempted using additional variables (such as the
out the use for possible additional terms, such as number of threads in use), but that these models did
intercept and non-interaction terms. However as a not come out as significantly more accurate than the
start, we simplify f(P/O) to a linear function, allowing us results already discussed.
to rewrite the previous model as Later work will concentrate on more general page
CD ∝ TCP ⋅ O + FPD ⋅ O + TCP ⋅ P + O (6) designs. For real Web pages, a difficulty in applying
this model is that each embedded page element can
Here we have also replaced the term (P / O) ⋅ O by P, have a different size, so that it is not clear which subset
indicating the amount of packets traveling within a of elements will be handled by the slowest thread.
thread. Also, we have incorporated the multiplier TCP Nevertheless, we anticipate that substantial progress
into this term, to account for the Internet round-trips can be made even in the multithreaded cases.
needed to schedule additional packet flow.

CD = 0.1273 + 0.8684 ⋅ TCP ⋅ O + 0.9182 ⋅ FPD ⋅ O + 0.1939 ⋅ TCP ⋅ P + 0.0135 ⋅ O (7)

Scale estimate of residuals: 0.07151 Total number of observations: 6044
Robust Multiple R-Squared: 0.9782 Observations used in LTS regression: 5439.

15
Results of Non-Robust Regression Acknowledgement
The results discussed in the previous section were only I am indebted to Chris Overton for his expert guidance
the most clear of several attempts. Originally, a larger in the experiment design and in the research work of
combination of predictor terms was used, and less the field, and for his assistance with the creation of this
meaningful (and less significant) terms were omitted to paper. I would also like to thank Shawn White for his
produce the previous results. assistance with the measurement setup for this work,
A different approach attempted was to build a model and Dr. Jeff Buzen, Richard Gimarc, and Chris Loosley
using linear regression on all of the data points. As for helpful comments and suggestions on earlier drafts
bad as outliers are in linear statistics such as the of this paper.
mean, they can wreak much greater havoc when given
even greater weight, such as in least squares References
regression. While we will not even publish the
coefficients returned by regression (for fear of [Burn1992] Burns,P.J, A Genetic Algorithm for Robust
2
misleading the reader), suffice it to say that the R for Regression Estimation. (Statsci Technical Note)
the full fit was only about 0.66. [Heid1997] John Heidemann, Katia Obraczka, and Joe
So in summary, the roughly 10% of measurements Touch. Modeling the Performance of HTTP Over
omitted from the robust regression not only brought the Several Transport Protocols. ACM/IEEE Transactions
2
R down over ten times as far from perfect, but they on Networking, 5 5, 616-630, October, 1997
managed to distort wildly the regression model. It is
clear that any model that seeks to avoid trimming must [HTTP1.0] Hypertext Transfer Protocol HTTP/1.0
use a different technique than least squares regression Specifications 1998
of linear data. However, since the linear model fits [HTTP1.1] Hypertext Transfer Protocol HTTP/1.1
most of the data so well, and since it has such an Specifications
intuitively appealing interpretation, the most attractive
approach is to study the outliers as a sub-population, [Neil2000] Jakob Nielsen, Designing Web Usability:
and in fact as several sub-populations, if warranted. The Practice of Simplicity, New Riders Publishing 2000
[Sper1995] Simon E Spero Analysis of HTTP
Conclusion Performance problems
http://www.ibiblio.org/mdma-release/http-prob.html
This paper represents part of an ongoing effort to
understand better the billions of measurements taken [Stev1997] W. Richard Stevens; TCP/IP Illustrated,
annually by Keynote Systems. Despite the Volume 1: The Protocols (Addison-Wesley)
complexities of dealing with heavy -tailed Internet
[Touc1998] Joe Touch, John Heidemann, and Katia
statistics and such technologies as multithreading, we
Obraczka. Analysis of HTTP Performance. Research
feel that the results discussed here show significant
Report 98-463, USC/Information Sciences Institute,
promise toward building more granular quantitative
August, 1998
performance models for entire Web pages based
http://www.isi.edu/lsam/publications/http-perf/.
solely on page design and simple measures of Internet
latency as taken only for a base page. Subsequent [Wang1998] Zhe Wang and Pei Cao, “Persistent
work should generalize these results to more kinds of Connection Behavior of Popular Browsers”
pages as well as other browser technologies. http://www.cs.wisc.edu/~cao/papers/persistent-
Implications useful for site designers include the goal connection.html
of keeping all available threads “busy” for the same
amount of time. For example, this might be improved
by separating a page’s single largest image into
parallelizable pieces.
On the other hand, our performance optimum at four
images is dependent on equal image sizing as well as
on a lack of connection recycling. If a browser keeps
connections to a server open for possible additional
element downloads, this would tend to tilt upward the
number of images that optimize performance. In any
case, large numbers of images cause undesirable
overhead, and image sizes should be understood
primarily in terms of the number of packets they
require.