Understanding Software Dynamics (Preview)

This document provides an overview of measuring performance in complex software environments like datacenters. It discusses key concepts like transaction latency, tail latency of the slowest transactions, and how hardware is utilized. The goal is to understand why some transactions are unexpectedly slow by observing program dynamics and making informed estimates of how long each part of a program should take. Reducing these occasional slow transactions can improve hardware efficiency and response times for users.



Part I

Measurement
Understanding variation is the key to success in quality and business.
—W. Edwards Deming

Measurement is the act of ascertaining the size, amount, or degree of something.


Careful measurements are the underpinning of understanding software performance.

This first part describes a complex hardware and software environment, the book’s emphasis on transaction latency, the concept of latency distributions, and the consequences of long 99th percentile latencies.

Our overall goal is to understand the root causes of variance in transaction latency—the apparently random, unexpectedly long response times in complex software.

The datacenter environment is a superset of the environment you might have set up when exploring the performance of database transactions, desktop software delays, dedicated controller delays, or game delays. This part also introduces the important practice of estimating within a factor of 10 how long pieces of code should take. As an underpinning for the rest of the book, it leads readers through detailed measurements of CPU, memory, disk, and network latencies. These chapters use pre-supplied but flawed programs that every reader can run to get some insight, and then can modify as directed to fix the flaws and gain substantially more insight. The resulting measurements will start to show the sources of latency variation in simple programs.

The first part serves to bring readers with varying backgrounds to a common base of knowledge about performance measurement, user- and kernel-mode software interactions, cross-thread and cross-program software interference, and interactions between complex software and computer hardware. At the end of this part, every reader will be able to make informed estimates of how long a piece of code should take.

Humble Bundle Pearson Software Development — © Pearson. Do Not Distribute.




Chapter 1
My Program Is Too Slow

Someone walks into my office and says “My program is too slow.” After a pause, I ask “How slow should it be?”

A good programmer has a ready answer to this question, as she describes the work to be done and estimates of how long each portion should take. Perhaps she says “This database query accesses 10,000 records of which about 1,000 turn out to be relevant; each access should take about 10 milliseconds and they are spread across 20 disks, so 10,000 accesses should be about 5 seconds total. There is no network activity and the CPU processing and memory use are small and simple—all much faster than the disk access time. The actual query is taking about 15 seconds, which is too slow.”
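The arithmetic behind such an estimate is worth making explicit. Here is a minimal sketch in Python, using the numbers from the example above (the variable names are mine, not from the book):

```python
# Back-of-envelope estimate for the database query described above:
# total disk time for all accesses, divided by the parallelism of the disks.
accesses = 10_000        # records touched by the query
per_access_s = 0.010     # about 10 ms per disk access
disks = 20               # accesses spread evenly across 20 disks

estimated_s = accesses * per_access_s / disks
print(estimated_s)       # 5.0 seconds, matching the estimate in the text
```

The measured 15 seconds is three times this estimate, and that gap between expectation and observation is exactly what justifies digging further.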

A sloppier programmer might answer “I wrote 1,000 lines of code all night using lots of existing
libraries, and it all works but takes about 15 seconds per query, and I want it to take 1/10 of a
second. One of those libraries must be too slow; how can I find it?” When asked, he has no idea
whether 1/10 of a second is a reasonable expectation, no idea how long each library call should
take, no idea if he is using the libraries appropriately, and no designed-in way to observe the
dynamics of his code to determine where the time really goes. We will explore all these issues in
this book.

1.1 Datacenter Context


We introduce some terms and concepts from a complex software environment. Your environment may be much simpler, but the ideas carry over almost exactly. The terminology is from datacenters, but the ideas also apply to database, desktop, vehicle, gaming, and other time-constrained environments.

A transaction or query or request is an input message to a computer system that must be dealt with as a single unit of work. Each computer processing transactions is termed a server. The latency or response time of a transaction is the time elapsed between sending a message and receiving its result. The offered load is the number of transactions sent per second; when this exceeds the number of transactions processed per second, response time suffers, sometimes dramatically. A service is a collection of programs that handle one particular kind of transaction. Large datacenters process transactions for dozens of different services simultaneously, and each service has a different offered load and a different latency goal.
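To get a feel for why response time suffers “sometimes dramatically” as offered load approaches capacity, a standard queueing approximation helps. The sketch below uses the M/M/1 model, which is my illustration rather than anything from this book: mean response time is 1/(mu - lam) for service rate mu and offered load lam.

```python
# M/M/1 queueing approximation: mean response time = 1 / (mu - lam),
# with mu the service rate and lam the offered load, both in
# transactions per second. As lam approaches mu, latency blows up.
def mean_response_time_s(offered_load_tps: float, service_rate_tps: float) -> float:
    assert offered_load_tps < service_rate_tps, "overloaded: latency grows without bound"
    return 1.0 / (service_rate_tps - offered_load_tps)

mu = 1000.0   # assumed server capacity: 1,000 transactions/second
for lam in (500.0, 900.0, 990.0):
    ms = mean_response_time_s(lam, mu) * 1000
    print(f"{lam:.0f} tps offered -> {ms:.0f} ms mean response time")
```

Raising the load from 50% to 99% of capacity, which is less than a 2x increase in traffic, multiplies the mean response time by 50.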




Transaction latency is not constant—it has a probability distribution taken over thousands of
transactions per second. Tail latency refers to the slowest transactions in this distribution. A
simple way to summarize the tail latency is to state the 99th percentile latency—the time that is
exceeded by the slowest 1% of all transactions, i.e., by 50 transactions every second if the offered
load is 5,000 transactions per second.
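The percentile computation itself is simple. A sketch using the nearest-rank method (the sample latencies below are synthetic, purely for illustration):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value that is
    greater than or equal to p percent of all the samples."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))   # 1-indexed rank
    return ordered[rank - 1]

# 100 synthetic latencies of 1, 2, ..., 100 ms.
latencies_ms = list(range(1, 101))
print(percentile(latencies_ms, 99))   # 99: only the slowest 1% exceed this
```

At an offered load of 5,000 transactions per second, the transactions above this threshold are exactly the 50 per second mentioned in the text.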

By the dynamics of a program or collection of programs we mean the activity over time—what pieces of code run when, what they wait for, what memory space they take, and how different programs affect each other. As programmers, we imagine in our heads simple dynamics for a program, but in reality the program may (occasionally) behave much differently than that picture and perform much more slowly than expected. If we can observe the true dynamics, we can adjust our mental picture and usually improve the code’s performance with simple changes.

We are interested in user-facing transactions in complex software—the datacenter half of cell phones, for example. We are particularly interested in transactions that are usually fast but occasionally take much longer—enough that the end user sees an annoying delay. In datacenters, the hardware budget for each service is often determined by how many transactions per second each server can “handle.” This target number is determined empirically by increasing the offered load until some tail-latency time constraint is exceeded, and then the target load is backed off a little.

If we can understand and then reduce the number of too-long transactions, the same hardware can handle larger loads within the tail-latency goal, at no additional cost. This is worth a lot of money. A skilled and somewhat lucky performance engineer can occasionally make a simple software change that saves enough money to pay for 10 years of salary. Companies and customers like such people.

Time-constrained transaction software is fundamentally different from batch or offline software (or most benchmarks). The important metric for transaction software is response time, while the important metric for batch software is usually efficient hardware utilization. For transactions, it is not the average response time that matters, but the slowest times, the tail latency.

In a datacenter, a higher average latency but shorter tail latency is usually preferred
over a lower average latency and longer tail latency. Most commuters prefer the same
thing—a route that takes a few minutes longer but always takes about the same time
is better than a slightly faster route that occasionally has unpredictable hour-long
delays.

For batch software, having the CPUs 98% busy on average can be good; for transaction software, 98% busy is a disaster, and even CPUs 50% busy on average might be too much, because it produces long response times whenever the offered load spikes for a few seconds to 3x above the average. When I first joined Google in 2004, the average datacenter CPU was 9% busy and 91% idle. The 9% busy was too low. Increasing that to 18% without increasing tail latency doubled the efficiency of all those datacenters. Doubling again to 36% busy would be good, but doubling a third time to 72% busy would likely ruin too many transactions’ time constraints.
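A crude way to see this tension is to ask what each average-busy level in the paragraph above becomes during a brief 3x load spike. Demand above 100% means some transactions queue while the spike lasts; whether that is acceptable depends on the service’s tail-latency goal. The 3x factor is the spike mentioned in the text; the rest is my illustration.

```python
# What does each average CPU-busy level become during a brief 3x load spike?
# Demand over 100% cannot be served immediately, so requests queue up.
spike_factor = 3
for avg_busy_pct in (9, 18, 36, 50, 72):
    demand_pct = avg_busy_pct * spike_factor
    print(f"{avg_busy_pct:2d}% average busy -> {demand_pct:3d}% demanded during a 3x spike")
```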

In looking at the performance of complex transaction-oriented software, we assume in this book that the programs involved fundamentally work and that on average they work quickly enough.




We won’t discuss designing or debugging such software, nor understanding or improving its
average performance. We also assume that always-slow transactions have been identified and
fixed in offline test/debug environments that have no time constraints, leaving us just with
occasionally slow transactions. We focus on the mechanisms that make occasional transactions
slow, on how to observe these mechanisms, and on how to interpret the observations.

When you use a cell phone to send a text message, read a post, search the web, look at a map,
stream a video, use an app, or even dial a telephone number, there is a datacenter somewhere
that responds to your requests. If these responses are annoyingly slow and some competing
app or service is faster, you may well switch to that one, or at least use the slow one less often.
Everyone in a time-constrained ecosystem has an incentive, often financial, to reduce annoying
delays. Few people have the skills to do so.

It is the goal of this book to teach a few more people how.

1.2 Datacenter Hardware


Large datacenters have something like 10,000 servers in a building, with each server a PC about the size of a desktop PC but without the case. Instead, about 50 server boards are mounted in a rack, and there are 200 racks spread around a very large room. A typical server has 1–4 CPU chip sockets with 4–50 CPU cores each, a boatload¹ of RAM, a couple of disks or solid-state drives (SSDs), and a network connection to a datacenter-wide switching fabric set up so that any server can communicate with any other server, and at least some of the servers can also communicate with the Internet and hence your phone. Outside the building, there are big generators that can run the entire building, including air conditioning, for days or weeks when there is a power outage. Inside, there are batteries that can run the servers and network switches for tens of seconds while the generators start up.

Each server runs multiple programs. It usually doesn’t make business sense to dedicate some servers to just doing email, others to just map tiles, and others to just instant messages. Instead, each server runs multiple programs, and each of those programs likely has multiple threads. For example, an email server program might have 100 worker threads processing email requests for several thousand users simultaneously, most of whom are typing or reading, with many of the active threads waiting for disk accesses or for other software layers. The worker threads take incoming requests, do whatever is asked, respond, and then go on to another pending request from another user. During the busiest hour of the day almost all the worker threads are busy, while during the slowest hour of the day at least half of them will be idle, waiting for work. There is a constant boom-and-bust cycle of offered work at almost all time scales—microsecond, millisecond, second, and minute. There is even a seven-day cycle with lower activity on Saturday and Sunday (for Western work weeks).

To control response times, it is important to have spare hardware resources available for user-facing transactions, since the user load tends to spike now and then based on physical-world events. But it is also economical to have some non-user-facing batch programs to run when there are otherwise idle processors. In addition to user-facing foreground programs and batch

¹ A computer science technical term, 10¹².

