doi:10.1145/1461928.1461944

Improving Performance on the Internet
anything, networks want to minimize traffic coming into their networks that they don’t get paid for. As a result, peering points are often overburdened, causing packet loss and service degradation.

[…] cut. According to TeleGeography, the cuts reduced bandwidth connectivity between Europe and the Middle East by 75%.8

Internet protocols such as BGP (Border Gateway Protocol, the Inter- […]

[…] mand. Broadband adoption continues to rise, in terms of both penetration and speed, as ISPs invest in last-mile infrastructure. AT&T just spent approximately $6.5 billion to roll out its U-verse service, while Verizon is
spending $23 billion to wire 18 million homes with FiOS (Fiber-optic Service) by 2010.6,7 Comcast also recently announced it plans to offer speeds of up to 100Mbps within a year.3

Demand drives this last-mile boom: Pew Internet’s 2008 report shows that one-third of U.S. broadband users have chosen to pay more for faster connections.4 Akamai Technologies’ data, shown in Figure 1, reveals that 59% of its global users have broadband connections (with speeds greater than 2Mbps), and 19% of global users have “high broadband” connections greater than 5Mbps—fast enough to support DVD-quality content.2 The high-broadband numbers represent a 19% increase in just three months.

A Question of Scale

Along with the greater demand and availability of broadband comes a rise in user expectations for faster sites, richer media, and highly interactive applications. The increased traffic loads and performance requirements in turn put greater pressure on the Internet’s internal infrastructure—the middle mile. In fact, the fast-rising popularity of video has sparked debate about whether the Internet can scale to meet the demand.

Consider, for example, delivering a TV-quality stream (2Mbps) to an audience of 50 million viewers, approximately the audience size of a popular TV show. The scenario produces aggregate bandwidth requirements of 100Tbps. This is a reasonable vision for the near term—the next two to five years—but it is orders of magnitude larger than the biggest online events today, leading to skepticism about the Internet’s ability to handle such demand. Moreover, these numbers are just for a single TV-quality show. If hundreds of millions of end users were to download Blu-ray-quality movies regularly over the Internet, the resulting traffic load would go up by an additional one or two orders of magnitude.
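The arithmetic behind these figures is easy to check. The following Python sketch uses only the numbers quoted above; the 10x to 100x scaling factor for higher-quality video is an assumption for illustration, not a measurement.

```python
# Back-of-the-envelope check of the aggregate bandwidth figures quoted above.

def aggregate_tbps(stream_mbps: float, concurrent_viewers: float) -> float:
    """Aggregate demand, in terabits per second, for a given per-stream
    rate (in Mbps) and number of concurrent viewers."""
    return stream_mbps * 1e6 * concurrent_viewers / 1e12

# A 2Mbps TV-quality stream to 50 million concurrent viewers:
print(aggregate_tbps(2, 50e6))     # 100.0 Tbps, as stated in the text

# Scaling the per-stream rate by 10x-100x (toward higher-quality video)
# scales the aggregate linearly:
print(aggregate_tbps(20, 50e6))    # 1000.0 Tbps
print(aggregate_tbps(200, 50e6))   # 10000.0 Tbps
```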
Figure 1: Broadband penetration by country.

Broadband
Ranking  Country        % > 2Mbps
—        Global         59%
1        South Korea    90%
2        Belgium        90%
3        Japan          87%
4        Hong Kong      87%
5        Switzerland    85%
6        Slovakia       83%
7        Norway         82%
8        Denmark        79%
9        Netherlands    77%
10       Sweden         75%
…
20       United States  71%

Fast Broadband
Ranking  Country        % > 5Mbps
—        Global         19%
1        South Korea    64%
2        Japan          52%
3        Hong Kong      37%
4        Sweden         32%
5        Belgium        26%
6        United States  26%
7        Romania        22%
8        Netherlands    22%
9        Canada         18%
10       Denmark        18%

Source: Akamai’s State of the Internet Report, Q2 2008

Another interesting side effect of the growth in video and rich media file sizes is that the distance between server and end user becomes critical to end-user performance. This is the result of a somewhat counterintuitive phenomenon that we call the Fat File Paradox: given that data packets can traverse networks at close to the speed of light, why does it take so long for a “fat file” to cross the country, even if the network is not congested?

It turns out that because of the way the underlying network protocols work, latency and throughput are directly coupled. TCP, for example, allows only small amounts of data to be sent at a time (that is, the TCP window) before having to pause and wait for acknowledgments from the receiving end. This means that throughput is effectively throttled by network round-trip time (latency), which can become the bottleneck for file download speeds and video viewing quality.

Packet loss further complicates the problem, since these protocols back off and send even less data before waiting for acknowledgment if packet loss is detected. Longer distances increase the chance of congestion and packet loss to the further detriment of throughput.

Figure 2 illustrates the effect of distance (between server and end user) on throughput and download times. Five or 10 years ago, dial-up modem speeds would have been the bottleneck on these files, but as we look at the Internet today and into the future, middle-mile distance becomes the bottleneck.
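The coupling between latency, loss, and throughput can be made concrete with a rough model. The Python sketch below is illustrative only: the window size, round-trip times, and loss rate are assumed values, and the loss-limited bound is the widely cited Mathis et al. approximation rather than anything specific to the systems described here.

```python
from math import sqrt

def window_limited_throughput_mbps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound when a sender can keep at most `window_bytes` in flight:
    at best one full window is delivered per round trip."""
    return window_bytes * 8 / rtt_s / 1e6

def loss_limited_throughput_mbps(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Mathis et al. approximation for steady-state TCP throughput under a
    random loss rate `loss` (a rule of thumb, not an exact result)."""
    return (mss_bytes * 8 / rtt_s) * (1.22 / sqrt(loss)) / 1e6

# Assumed numbers: a 64KB window over a 5 ms metro path vs. a 100 ms
# cross-country path.
print(window_limited_throughput_mbps(64 * 1024, 0.005))  # ~104.9 Mbps
print(window_limited_throughput_mbps(64 * 1024, 0.100))  # ~5.2 Mbps

# With 0.1% packet loss and a 1,460-byte segment, distance hurts even more:
print(loss_limited_throughput_mbps(1460, 0.005, 0.001))  # ~90 Mbps
print(loss_limited_throughput_mbps(1460, 0.100, 0.001))  # ~4.5 Mbps
```

Either way, the longer the round trip, the lower the achievable throughput, which is exactly the Fat File Paradox described above.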
Figure 2: Effect of distance on throughput and download times.

Four Approaches to Content Delivery

Given these bottlenecks and scalability challenges, how does one achieve the levels of performance and reliability required for effective delivery of content and applications over the Internet? There are four main approaches to distributing content servers in a content-delivery architecture: centralized hosting, “big data center” CDNs (content-delivery networks), highly distributed CDNs, and peer-to-peer networks.

Centralized Hosting. Traditionally architected Web sites use one or a small number of collocation sites to host content. Commercial-scale sites generally have at least two geographically dispersed mirror locations to provide additional performance (by being closer to different groups of end users), reliability (by providing redundancy), and scalability (through greater capacity).

This approach is a good start, and for small sites catering to a localized audience it may be enough. The performance and reliability fall short of expectations for commercial-grade sites and applications, however, as the end-user experience is at the mercy of the unreliable Internet and its middle-mile bottlenecks.

There are other challenges as well: site mirroring is complex and costly, as is managing capacity. Traffic levels fluctuate tremendously, so the need to provision for peak traffic levels means that expensive infrastructure will sit underutilized most of the time. In addition, accurately predicting traffic demand is extremely difficult, and a centralized hosting model does not provide the flexibility to handle unexpected surges.

“Big Data Center” CDNs. Content-delivery networks offer improved […] away from most users and still deliver content from the wrong side of the middle-mile bottlenecks.

It may seem counterintuitive that having a presence in a couple dozen major backbones isn’t enough to achieve commercial-grade performance. In fact, even the largest of those networks controls very little end-user access traffic. For example, the top 30 networks combined deliver only 50% of end-user traffic, and it drops off quickly from there, with a very long tail distribution over the Internet’s 13,000 networks. Even with connectivity to all the biggest backbones, data must travel through the morass of the middle mile to reach most of the Internet’s 1.4 billion users.

A quick back-of-the-envelope calculation shows that this type of architecture hits a wall in terms of scalability as we move toward a video world. Consider a generous forward projection on such an architecture—say, 50 high-capacity data centers, each with 30 outbound connections, 10Gbps each. This gives an upper bound of 15Tbps total capacity for this type of network, far short of the 100Tbps needed to support video in the near term.

Highly Distributed CDNs. Another approach to content delivery is to leverage a very highly distributed network—one with servers in thousands of networks, rather than dozens. On the surface, this architecture may appear quite similar to the “big data center” CDN. In reality, however, it is a fundamentally different approach to content-server placement, with a difference of two orders of magnitude in the degree of distribution.

By putting servers within end-user ISPs, for example, a highly distributed CDN delivers content from the right side of the middle-mile bottlenecks, eliminating peering, connectivity, routing, and distance problems, and reducing the number of Internet components depended on for success. Moreover, this architecture scales. It can achieve a capacity of 100Tbps, for example, with deployments of 20 servers, each capable of delivering 1Gbps, in 5,000 edge locations.

On the other hand, deploying a highly distributed CDN is costly and time consuming, and comes with its own set of challenges. Fundamentally, the network must be designed to scale efficiently from a deployment and management perspective. This necessitates development of a number of technologies, including:

- Sophisticated global-scheduling, mapping, and load-balancing algorithms
- Distributed control protocols and reliable automated monitoring and alerting systems
- Intelligent and automated failover and recovery methods
- Colossal-scale data aggregation and distribution technologies (designed to handle different trade-offs between timeliness and accuracy or completeness)
- Robust global software-deployment mechanisms
- Distributed content freshness, integrity, and management systems
- Sophisticated cache-management protocols to ensure high cache-hit ratios

These are nontrivial challenges, and we present some of our approaches later on in this article.
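One classic building block for the mapping, load-balancing, and cache-management items listed above is consistent hashing, which keeps the content-to-server assignment stable as servers join and leave. The sketch below is a minimal, generic illustration of that technique, not a description of Akamai's actual algorithms; the server names and replica count are invented for the example.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring mapping content URLs to edge servers.

    Each server is placed at several points ("virtual nodes") on a hash
    ring; a URL maps to the first server clockwise from its own hash.
    Adding or removing one server remaps only a small fraction of URLs,
    which keeps cache-hit ratios high as the deployment changes.
    """

    def __init__(self, servers, replicas=100):
        self._ring = []                      # sorted list of (hash, server)
        for server in servers:
            for i in range(replicas):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, url: str) -> str:
        idx = bisect.bisect(self._keys, self._hash(url)) % len(self._ring)
        return self._ring[idx][1]

# Hypothetical edge servers deployed inside an end-user ISP:
ring = ConsistentHashRing(["edge-a.example", "edge-b.example", "edge-c.example"])
print(ring.server_for("http://www.example.com/video/segment-42.ts"))
```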
Using the DIMES Project data sets that describe the structure of the Internet, Chris Harrison of Carnegie Mellon University created this visualization illustrating how cities across the globe are interconnected (by router configuration and not physical backbone). In total, there are 89,344 connections.
Peer-to-Peer Networks. Because a highly distributed architecture is critical to achieving scalability and performance in video distribution, it is natural to consider a P2P (peer-to-peer) architecture. P2P can be thought of as taking the distributed architecture to its logical extreme, theoretically providing nearly infinite scalability. Moreover, P2P offers attractive economics under current network pricing structures.

In reality, however, P2P faces some serious limitations, most notably because the total download capacity of a P2P network is throttled by its total uplink capacity. Unfortunately, for consumer broadband connections, uplink speeds tend to be much lower than downlink speeds: Comcast’s standard high-speed Internet package, for example, offers 6Mbps for download but only 384Kbps for upload (one-sixteenth of download throughput).

This means that in situations such as live streaming where the number of uploaders (peers sharing content) is limited by the number of downloaders (peers requesting content), average download throughput is equivalent to the average uplink throughput and thus cannot support even mediocre Web-quality streams. Similarly, P2P fails in “flash crowd” scenarios where there is a sudden, sharp increase in demand, and the number of downloaders greatly outstrips the capacity of uploaders in the network.

Somewhat better results can be achieved with a hybrid approach, leveraging P2P as an extension of a distributed delivery network. In particular, P2P can help reduce overall distribution costs in certain situations. Because the capacity of the P2P network is limited, however, the architecture of the non-P2P portion of the network still governs overall performance and scalability.

Each of these four network architectures has its trade-offs, but ultimately, for delivering rich media to a global Web audience, a highly distributed architecture provides the only robust solution for delivering commercial-grade performance, reliability, and scale.

Application Acceleration

Historically, content-delivery solutions have focused on the offloading and delivery of static content, and thus far we have focused our conversation on the same. As Web sites become increasingly dynamic, personalized, and application-driven, however, the ability to accelerate uncacheable content becomes equally critical to delivering a strong end-user experience.

Ajax, Flash, and other RIA (rich Internet application) technologies work to enhance Web application responsiveness on the browser side, but ultimately, these types of applications all still require significant numbers of round-trips back to the origin server. This makes them highly susceptible to all the bottlenecks I’ve mentioned before: peering-point congestion, network latency, poor routing, and Internet outages.

Speeding up these round-trips is a complex problem, but many optimizations are made possible by using a highly distributed infrastructure.

Optimization 1: Reduce transport-layer overhead. Architected for reliability over efficiency, protocols such as TCP have substantial overhead. They require multiple round-trips (between the two communicating parties) to set
up connections, use a slow initial rate of data exchange, and recover slowly from packet loss. In contrast, a network that uses persistent connections and optimizes parameters for efficiency (given knowledge of current network conditions) can significantly improve performance by reducing the number of round-trips needed to deliver the same set of data.

Optimization 2: Find better routes. In addition to reducing the number of round-trips needed, we would also like to reduce the time needed for each round-trip—each journey across the Internet. At first blush, this does not seem possible. All Internet data must be routed by BGP and must travel over numerous autonomous networks. BGP is simple and scalable but not very efficient or robust. By leveraging a highly distributed network—one that offers potential intermediary servers on many different networks—you can actually speed up uncacheable communications by 30% to 50% or more, by using routes that are faster and much less congested. You can also achieve much greater communications reliability by finding alternate routes when the default routes break.

Optimization 3: Prefetch embedded content. You can do a number of additional things at the application layer to improve Web application responsiveness for end users. One is to prefetch embedded content: while an edge server is delivering an HTML page to an end user, it can also parse the HTML and retrieve all embedded content before it is requested by the end user’s browser.

The effectiveness of this optimization relies on having servers near end users, so that users perceive a level of application responsiveness akin to that of an application being delivered directly from a nearby server, even though, in fact, some of the embedded content is being fetched from the origin server across the long-haul Internet. Prefetching by forward caches, for example, does not provide this performance benefit because the prefetched content must still travel over the middle mile before reaching the end user. Also, note that unlike link prefetching (which can also be done), embedded content prefetching does not expend extra bandwidth resources and does not request extraneous objects that may not be requested by the end user.

With current trends toward highly personalized applications and user-generated content, there’s been growth in either uncacheable or long-tail (that is, not likely to be in cache) embedded content. In these situations, prefetching makes a huge difference in the user-perceived responsiveness of a Web application.

Optimization 4: Assemble pages at the edge. The next three optimizations involve reducing the amount of content that needs to travel over the middle mile. One approach is to cache page fragments at edge servers and dynamically assemble them at the edge in response to end-user requests. Pages can be personalized (at the edge) based on characteristics including the end user’s location, connection speed, cookie values, and so forth. Assembling the page at the edge not only offloads the origin server, but also results in much lower latency to the end user, as the middle mile is avoided.
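As a concrete illustration of edge assembly, the sketch below stitches cached fragments into a personalized page so that only the uncacheable fragment ever needs to cross the middle mile. It is a simplified, generic sketch: the fragment names, template, and personalization rule are invented, and real deployments typically use a fragment-markup standard rather than hand-written templates.

```python
# Simplified sketch of edge-side page assembly: cached fragments are
# combined with one per-user fragment at the edge server.

FRAGMENT_CACHE = {
    "header": "<header>Example Store</header>",
    "catalog": "<main>...product listing...</main>",
    "footer": "<footer>(c) Example, Inc.</footer>",
}

PAGE_TEMPLATE = "{header}{greeting}{catalog}{footer}"

def fetch_from_origin(fragment: str, user: dict) -> str:
    # Placeholder for the one personalized, uncacheable fragment;
    # in practice this is the only piece fetched across the long haul.
    return f"<p>Hello, {user['name']} in {user['country']}!</p>"

def assemble_page(user: dict) -> str:
    """Assemble a page at the edge from cached fragments plus one
    per-user fragment generated (or fetched) on demand."""
    return PAGE_TEMPLATE.format(
        header=FRAGMENT_CACHE["header"],
        greeting=fetch_from_origin("greeting", user),
        catalog=FRAGMENT_CACHE["catalog"],
        footer=FRAGMENT_CACHE["footer"],
    )

print(assemble_page({"name": "Alice", "country": "US"}))
```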
Optimization 5: Use compression and delta encoding. Compression of HTML and other text-based components can reduce the amount of content traveling over the middle mile to one-tenth of the original size. The use of delta encoding, where a server sends only the difference between a cached HTML page and a dynamically generated version, can also greatly cut down on the amount of content that must travel over the long-haul Internet.

While these techniques are part of the HTTP/1.1 specification, browser support is unreliable. By using a highly distributed network that controls both endpoints of the middle mile, compression and delta encoding can be successfully employed regardless of the browser. In this case, performance is improved because very little data travels over the middle mile. The edge server then decompresses the content or applies the delta encoding and delivers the complete, correct content to the end user.

Optimization 6: Offload computations to the edge. The ability to distribute applications to edge servers provides the ultimate in application performance and scalability. Akamai’s network enables distribution of J2EE applications to edge servers that create virtual application instances on demand, as needed. As with edge page assembly, edge computation enables complete origin server offloading, resulting in tremendous scalability and extremely low application latency for the end user.

While not every type of application is an ideal candidate for edge computation, large classes of popular applications—such as contests, product catalogs, store locators, surveys, product configurators, games, and the like—are well suited for edge computation.

Putting It All Together

Many of these techniques require a highly distributed network. Route optimization, as mentioned, depends on the availability of a vast overlay network that includes machines on many different networks. Other optimizations such as prefetching and page assembly are most effective if the delivering server is near the end user. Finally, many transport and application-layer optimizations require bi-nodal connections within the network (that is, you control both endpoints). To maximize the effect of this optimized connection, the endpoints should be as close as possible to the origin server and the end user.

Note also that these optimizations work in synergy. TCP overhead is in large part a result of a conservative approach that guarantees reliability in the face of unknown network conditions. Because route optimization gives us high-performance, congestion-free paths, it allows for a much more aggressive and efficient approach to transport-layer optimizations.

Highly Distributed Network Design

It was briefly mentioned earlier that building and managing a robust, highly distributed network is not trivial. At Akamai, we sought to build a system with extremely high reliability—no downtime, ever—and yet scalable enough to be managed by a relatively small operations staff, despite operating in a highly heterogeneous and unreliable environment. Here are some insights into the design methodology.

The fundamental assumption behind Akamai’s design philosophy is that a significant number of component or other failures are occurring at all times in the network. Internet systems present numerous failure modes, such as machine failure, data-center failure, connectivity failure, software failure, and network failure—all occurring with greater frequency than one might think. As mentioned earlier, for example, there are many causes of large-scale network outages—including peering problems, transoceanic cable cuts, and major virus attacks.

Designing a scalable system that works under these conditions means embracing the failures as natural and expected events. The network should continue to work seamlessly despite these occurrences. We have identified some practical design principles that result from this philosophy, which we share here.1

Principle 1: Ensure significant redundancy in all systems to facilitate failover. Although this may seem obvious and simple in theory, it can be challenging in practice. Having a highly distributed network enables a great deal of redundancy, with multiple backup possibilities ready to take over if a component fails. To ensure robustness of all systems, however, you will likely need to work around the constraints of existing protocols and interactions with third-party software, as well as balance trade-offs involving cost.

For example, the Akamai network relies heavily on DNS (Domain Name System), which has some built-in constraints that affect reliability. One example is DNS’s restriction on the size of responses, which limits the number of IP addresses that we can return to a relatively static set of 13. The Generic Top Level Domain servers, which supply the critical answers to akamai.net queries, required more reliability, so we took several steps, including the use of IP Anycast.

We also designed our system to take into account DNS’s use of TTLs (time to live) to fix resolutions for a period of time. Though the efficiency gained through TTL use is important, we need to make sure users aren’t being sent to servers based on stale data. Our approach is to use a two-tier DNS—employing longer TTLs at a global level and shorter TTLs at a local level—allowing less of a trade-off between DNS efficiency and responsiveness to changing conditions. In addition, we have built in appropriate failover mechanisms at each level.

Principle 2: Use software logic to provide message reliability. This design principle speaks directly to scalability. Rather than building dedicated links between data centers, we use the public Internet to distribute data—including control messages, configurations, monitoring information, and customer content—throughout our network. We improve on the performance of existing Internet protocols—for example, by using multirouting and limited retransmissions with UDP (User Datagram Protocol) to achieve reliability without sacrificing latency. We also use software to route data through intermediary servers to ensure communications (as described in Optimization 2), even when major disruptions (such as cable cuts) occur.
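A minimal sketch of the "limited retransmissions over UDP" idea in Principle 2 appears below. It is illustrative only: the relay addresses, message format, acknowledgment convention, and retry budget are assumptions for the example, not Akamai's protocol.

```python
import socket

def send_with_limited_retries(message: bytes, relays, retries=2, timeout=0.2):
    """Send `message` over UDP to intermediary relays, retrying a bounded
    number of times while waiting briefly for a short acknowledgment.
    Returns the address of the first relay that acknowledged, or None.

    Unlike TCP, this bounds the added latency: once the retry budget for
    one path is spent, we simply move on to the next relay.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for relay in relays:
            for _ in range(1 + retries):
                sock.sendto(message, relay)
                try:
                    ack, sender = sock.recvfrom(64)
                    if sender == relay and ack == b"ACK":
                        return relay
                except socket.timeout:
                    continue  # timed out; retry, then fall through to next relay
    return None

# Hypothetical intermediary servers on different networks:
relays = [("203.0.113.10", 4000), ("198.51.100.20", 4000)]
print(send_with_limited_retries(b"config-update-17", relays))
```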
Principle 3: Use distributed control for coordination. Again, this principle is important both for fault tolerance and scalability. One practical example is the use of leader election, where leadership evaluation can depend on many factors including machine status, connectivity to other machines in the network, and monitoring capabilities. When connectivity of a local lead server degrades, for example, a new server is automatically elected to assume the role of leader.

Principle 4: Fail cleanly and restart. Based on the previous principles, the network has already been architected to handle server failures quickly and seamlessly, so we are able to take a more aggressive approach to failing problematic servers and restarting them from a last known good state. This sharply reduces the risk of operating in a potentially corrupted state. If a given machine continues to require restarting, we simply put it into a “long sleep” mode to minimize impact to the overall network.

Principle 5: Phase software releases. After passing the quality assurance (QA) process, software is released to the live network in phases. It is first deployed to a single machine. Then, after performing the appropriate checks, it is deployed to a single region, then possibly to additional subsets of the network, and finally to the entire network. The nature of the release dictates how many phases and how long each one lasts. The previous principles, particularly use of redundancy, distributed control, and aggressive restarts, make it possible to deploy software releases frequently and safely using this phased approach.

Principle 6: Notice and proactively quarantine faults. The ability to isolate faults, particularly in a recovery-oriented computing system, is perhaps one of the most challenging problems and an area of important ongoing research. Here is one example. Consider a hypothetical situation where requests for a certain piece of content with a rare set of configuration parameters trigger a latent bug. Automatically failing the servers affected is not enough, as requests for this content will then be directed to other machines, spreading the problem. To solve this problem, our caching algorithms constrain each set of content to certain servers so as to limit the spread of fatal requests. In general, no single customer’s content footprint should dominate any other customer’s footprint among available servers. These constraints are dynamically determined based on current levels of demand for the content, while keeping the network safe.

Practical Results and Benefits

Besides the inherent fault-tolerance benefits, a system designed around these principles offers numerous other benefits.

Faster software rollouts. Because the network absorbs machine and regional failures without impact, Akamai is able to safely but aggressively roll out new software using the phased rollout approach. As a benchmark, we have historically implemented approximately 22 software releases and 1,000 customer configuration releases per month to our worldwide network, without disrupting our always-on services.

Minimal operations overhead. A large, highly distributed, Internet-based network can be very difficult to maintain, given its sheer size, number of network partners, heterogeneous nature, and diversity of geographies, time zones, and languages. Because the Akamai network design is based on the assumption that components will fail, however, our operations team does not need to be concerned about most failures. In addition, the team can aggressively suspend machines or data centers if it sees any slightly worrisome behavior. There is no need to rush to get components back online right away, as the network absorbs the component failures without impact to overall service.

This means that at any given time, it takes only eight to 12 operations staff members, on average, to manage our network of approximately 40,000 devices (consisting of more than 35,000 servers plus switches and other networking hardware). Even at peak times, we successfully manage this global, highly distributed network with fewer than 20 staff members.

Lower costs, easier to scale. In addition to the minimal operational staff needed to manage such a large network, this design philosophy has had several implications that have led to reduced costs and improved scalability. For example, we use commodity hardware instead of more expensive, more reliable servers. We deploy in third-party data centers instead of having our own. We use the public Internet instead of having dedicated links. We deploy in greater numbers of smaller regions—many of which host our servers for free—rather than in fewer, larger, more “reliable” data centers where congestion can be greatest.

Conclusion

Even though we’ve seen dramatic advances in the ubiquity and usefulness of the Internet over the past decade, the real growth in bandwidth-intensive Web content, rich media, and Web- and IP-based applications is just beginning. The challenges presented by this growth are many: as businesses move more of their critical functions online, and as consumer entertainment (games, movies, sports) shifts to the Internet from other broadcast media, the stresses placed on the Internet’s middle mile will become increasingly apparent and detrimental. As such, we believe the issues raised in this article and the benefits of a highly distributed approach to content delivery will only grow in importance as we collectively work to enable the Internet to scale to the requirements of the next generation of users.

References
1. Afergan, M., Wein, J., LaMeyer, A. Experience with some principles for building an Internet-scale reliable system. In Proceedings of the 2nd Conference on Real, Large Distributed Systems 2. (These principles are laid out in more detail in this 2005 research paper.)
2. Akamai Report: The State of the Internet, 2nd quarter, 2008; http://www.akamai.com/stateoftheinternet/. (These and other recent Internet reliability events are discussed in Akamai’s quarterly report.)
3. Anderson, N. Comcast at CES: 100 Mbps connections coming this year. ars technica (Jan. 8, 2008); http://arstechnica.com/news.ars/post/20080108-comcast-100mbps-connections-coming-this-year.html.
4. Horrigan, J.B. Home Broadband Adoption 2008. Pew Internet and American Life Project; http://www.pewinternet.org/pdfs/PIP_Broadband_2008.pdf.
5. Internet World Statistics. Broadband Internet Statistics: Top World Countries with Highest Internet Broadband Subscribers in 2007; http://www.internetworldstats.com/dsl.htm.
6. Mehta, S. Verizon’s big bet on fiber optics. Fortune (Feb. 22, 2007); http://money.cnn.com/magazines/fortune/fortune_archive/2007/03/05/8401289/.
7. Spangler, T. AT&T: U-verse TV spending to increase. Multichannel News (May 8, 2007); http://www.multichannel.com/article/CA6440129.html.
8. TeleGeography. Cable cuts disrupt Internet in Middle East and India. CommsUpdate (Jan. 31, 2008); http://www.telegeography.com/cu/article.php?article_id=21528.

Tom Leighton co-founded Akamai Technologies in August 1998. Serving as chief scientist and as a director to the board, he is Akamai’s technology visionary, as well as a key member of the executive committee setting the company’s direction. He is an authority on algorithms for network applications. Leighton is a Fellow of the American Academy of Arts and Sciences, the National Academy of Science, and the National Academy of Engineering.

A previous version of this article appeared in the October 2008 issue of ACM Queue magazine.

© 2009 ACM 0001-0782/09/0200 $5.00