Scaling
Which scalability advice is relevant to applications that can fit on a single server or a handful of servers?
Most people will never maintain systems at an extremely large scale, and the tactics used at very large
and popular companies shouldn’t always be emulated. We’ll try to cover a range of strategies in this
chapter. We’ve built or helped build many applications, ranging from those that use a single server or a
handful of servers to those that use thousands. Choosing the appropriate strategy for your application is
often the key to saving money and time that can be invested elsewhere.
MySQL has been criticized for being hard to scale, and sometimes that’s true, but usually you can make
MySQL scale well if you choose the right architecture and implement it well. Scalability is not always a
well-understood topic, however, so we’ll begin by clearing up the confusion.
What Is Scalability?
People often use terms such as “scalability,” “high availability,” and “performance” as synonyms in casual
conversation, but they’re completely different. As we explained in Chapter 3, we define performance as
response time. Scalability can be defined precisely too; we’ll explore that more fully in a moment, but in
a nutshell it’s the system’s ability to deliver equal bang for the buck as you add resources to perform
more work. Poorly scalable systems reach a point of diminishing returns and can’t grow further.
(Recall the definition: performance is measured by the time required to complete a task. In other
words, performance is response time. This is a very important principle. We measure performance in
terms of tasks and time, not resources. A database server's purpose is to execute SQL statements, so the
tasks we care about are queries or statements.)
Capacity is a related concept. The system’s capacity is the amount of work it can perform in a given
amount of time. However, capacity must be qualified. The system’s maximum throughput is not the
same as its capacity. Most benchmarks measure a system’s maximum throughput, but you can’t push
real systems that hard. If you do, performance will degrade and response times will become
unacceptably large and variable. We define the system’s actual capacity as the throughput it can achieve
while still delivering acceptable performance. This is why benchmark results usually shouldn’t be
reduced to a single number.
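This distinction is easy to express in code. The following is a minimal sketch; the sample numbers and the 500 ms threshold are illustrative assumptions, not measurements from a real system.

```python
# Actual capacity is the highest throughput at which response time is
# still acceptable, not the peak throughput a benchmark can reach.

SLO_MS = 500  # acceptable 95th-percentile response time (an assumption)

# (queries per second, 95th-percentile response time in ms) from a load test
samples = [
    (1000, 80),
    (2000, 120),
    (4000, 210),
    (6000, 480),
    (8000, 1900),  # past the knee: more throughput, unacceptable latency
]

max_throughput = max(qps for qps, _ in samples)
capacity = max(qps for qps, p95 in samples if p95 <= SLO_MS)

print(max_throughput)  # 8000 -- what a naive benchmark reports
print(capacity)        # 6000 -- what the system can sustain acceptably
```

The two numbers differ precisely because pushing the system to its maximum throughput makes response times blow up, which is why a single benchmark number is misleading.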
Capacity and scalability are independent of performance. To make an analogy with cars on a highway:
• Scalability is the degree to which you can add more cars and more lanes without slowing traffic.
In this analogy, scalability depends on factors such as how well the interchanges are designed, how many
cars have accidents or break down, and whether the cars drive at different speeds or change lanes a lot
—but generally not on how powerful the cars’ engines are. This is not to say that performance doesn’t
matter, because it does. We’re just pointing out that systems can be scalable even if they aren’t high-
performance.
From the 50,000-foot view, scalability is the ability to add capacity by adding resources.
Even if your MySQL architecture is scalable, your application might not be. If it’s hard to increase
capacity for any reason, your application isn’t scalable overall. We defined capacity in terms of
throughput a moment ago, but it’s worth looking at capacity from the same 50,000-foot view. From this
vantage point, capacity simply means the ability to handle load, and it’s useful to think of load from
several different angles:
Quantity of data
The sheer volume of data your application can accumulate is one of the most common scaling
challenges. This is particularly an issue for many of today’s web applications, which never delete any
data. Social networking sites, for example, typically never delete old messages or comments.
Number of users
Even if each user has only a small amount of data, if you have a lot of users it adds up—and the data size
can grow disproportionately faster than the number of users. Many users generally means more
transactions too, and the number of transactions might not be proportional to the number of users.
Finally, many users (and more data) can mean increasingly complex queries, especially if queries depend
on the number of relationships among users. (The number of relationships is bounded by ( N * (N–1) ) /
2, where N is the number of users.)
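The quadratic growth of that bound is worth seeing with concrete numbers:

```python
# The potential number of user-to-user relationships is N * (N - 1) / 2,
# so it grows quadratically while the user count grows linearly.

def max_relationships(n: int) -> int:
    return n * (n - 1) // 2

for n in (100, 1_000, 10_000):
    print(n, max_relationships(n))
# 100 users    ->         4,950 possible relationships
# 1,000 users  ->       499,500
# 10,000 users ->    49,995,000
```

A hundredfold increase in users yields roughly a ten-thousandfold increase in possible relationships, which is why relationship-dependent queries can dominate scaling costs.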
User activity
Not all user activity is equal, and user activity is not constant. If your users suddenly become more
active, for example because of a new feature they like, your load can increase significantly. User activity
isn’t just a matter of the number of page views, either—the same number of page views can cause more
work if part of the site that requires a lot of work to generate becomes more popular. Some users are
much more active than others, too: they might have many more friends, messages, or photos than the
average user.
If there are relationships among users, the application might need to run queries and computations on
entire groups of related users. This is more complex than just working with individual users and their
data. Social networking sites often face challenges due to popular groups or users who have many
friends.
A Formal Definition
It’s worth exploring a mathematical definition of scalability, as it will enable you to think clearly about
the higher-level concepts. If you don’t have that grounding, you might not understand or be able to
communicate scalability precisely. Don’t worry, this won’t involve advanced mathematics—you’ll be able
to understand it intuitively even if you’re not a math whiz.
The key is the phrase we used earlier: “equal bang for the buck.” Another way to say this is that
scalability is the degree to which the system provides an equal return on investment (ROI) as you add
resources to handle the load and increase capacity. Let’s suppose that we have a system with one server,
and we can measure its maximum capacity. Figure 11-1 illustrates this scenario.
Now suppose we add a second server and the system can do twice as much work. This is linear
scalability: we doubled the number of servers and, as a result, doubled the system’s capacity. Most
systems aren’t linearly scalable; they often scale a bit like Figure 11-3 instead.
Most systems provide slightly less than linear scalability at small scaling factors, and the deviation from
linearity becomes more obvious at higher scaling factors. In fact, most systems eventually reach a point
of maximum throughput, beyond which additional investment provides a negative return—add more
workload and you’ll actually reduce the system’s throughput!
How is this possible? Many models of scalability have been created over the years, with varying degrees
of success and realism. The scalability model that we refer to here is based on some of the underlying
mechanisms that influence systems as they scale. It is Dr. Neil J. Gunther’s Universal Scalability Law (USL).
Dr. Gunther has written about it at length in his books, including Guerrilla Capacity Planning (Springer).
We will not go deeply into the mathematics here, but if you are interested, his book and the training
courses offered by his company, Performance Dynamics, might be good resources for you.
The short introduction to the USL is that the deviation from linear scalability can be modeled by two
factors: a portion of the work cannot be done in parallel, and a portion of the work requires crosstalk.
Modeling the first factor results in the well-known Amdahl’s Law, which causes throughput to level off.
When part of the task can’t be parallelized, no matter how much you divide and conquer, the task takes
at least as long as the serial portion.
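The shape of both laws is easy to see numerically. In this sketch, sigma models the serialized fraction (Amdahl's Law) and kappa models crosstalk; the coefficient values are illustrative assumptions, not measurements.

```python
import math

SIGMA = 0.05   # fraction of work that is serialized (assumed)
KAPPA = 0.001  # crosstalk (coherency) penalty per pair (assumed)

def amdahl(n: int) -> float:
    # Amdahl's Law: speedup levels off but never declines.
    return n / (1 + SIGMA * (n - 1))

def usl(n: int) -> float:
    # The USL adds the crosstalk term, so throughput eventually declines.
    return n / (1 + SIGMA * (n - 1) + KAPPA * n * (n - 1))

# The USL predicts peak throughput near sqrt((1 - sigma) / kappa).
n_star = math.sqrt((1 - SIGMA) / KAPPA)
print(round(n_star))  # 31 servers for these coefficients

for n in (1, 8, 31, 64):
    print(n, round(amdahl(n), 1), round(usl(n), 1))
```

With these coefficients the modeled system peaks around 31 servers; at 64 servers Amdahl's Law still shows a (small) gain, but the USL's crosstalk term has already turned the return on investment negative.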
It is possible to measure a system and use regression to determine the amount of seriality and crosstalk
it exhibits.
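Such a regression is straightforward because the USL becomes linear in its coefficients after a simple transformation. The sketch below fits synthetic measurements generated from known coefficients, so the fit recovers them exactly; real measurements would carry noise.

```python
# Recover the USL's seriality (sigma) and crosstalk (kappa) coefficients
# from capacity measurements via ordinary least squares.

TRUE_SIGMA, TRUE_KAPPA = 0.04, 0.0005  # assumed "ground truth"

def usl(n, sigma, kappa):
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

nodes = [1, 2, 4, 8, 16, 32]
measured = [usl(n, TRUE_SIGMA, TRUE_KAPPA) for n in nodes]

# Transform: n / C(n) - 1 = sigma * (n - 1) + kappa * n * (n - 1),
# which is linear in sigma and kappa.
x1 = [n - 1 for n in nodes]
x2 = [n * (n - 1) for n in nodes]
y = [n / c - 1 for n, c in zip(nodes, measured)]

# Solve the 2x2 normal equations in closed form.
a = sum(v * v for v in x1)
b = sum(u * v for u, v in zip(x1, x2))
c = sum(v * v for v in x2)
d = sum(u * v for u, v in zip(x1, y))
e = sum(u * v for u, v in zip(x2, y))

det = a * c - b * b
sigma = (d * c - e * b) / det
kappa = (a * e - d * b) / det

print(round(sigma, 4), round(kappa, 6))  # recovers 0.04 and 0.0005
```

In practice you would measure throughput at several cluster sizes, fit sigma and kappa this way, and use the fitted curve to estimate where your system's throughput will peak.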
Another framework for understanding scalability problems is the theory of constraints, which explains
how to improve a system’s throughput and efficiency by reducing dependent events and statistical
variations.
This is all a lot of theory, but how well does it work in practice?
In practice, the USL breaks down when a workload interacts with the system it runs on in subtle ways,
and there are other cases too where the model doesn’t describe a system’s behavior very well.
Scaling MySQL
General Observations
Placing all of your application’s data in a single MySQL instance simply will not scale well. Sooner or later
you’ll hit performance bottlenecks. The traditional solution in many types of applications is to buy more
powerful servers. This is what’s known as “scaling vertically” or “scaling up.” The opposite approach is to
divide your work across many computers, which is usually called “scaling horizontally” or “scaling out.”
We’ll discuss how to combine scale-out and scale-up solutions with consolidation, and how to scale with
clustering solutions. Finally, most applications also have some data that’s rarely or never needed and
that can be purged or archived. We call this approach “scaling back,” just to give it a name that matches
the other strategies.
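Scaling back can be as simple as periodically moving cold rows into an archive table. A minimal sketch, assuming a hypothetical messages table with a created_at column and a matching messages_archive table; a fixed cutoff keeps the copy and the delete consistent:

```sql
-- Hypothetical schema; run inside one transaction so the copy and
-- delete see the same rows.
SET @cutoff = NOW() - INTERVAL 1 YEAR;

INSERT INTO messages_archive
SELECT * FROM messages WHERE created_at < @cutoff;

DELETE FROM messages WHERE created_at < @cutoff;
```

In production you would typically archive in batches to keep transactions small and avoid long lock waits.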
Planning for Scalability
People usually start to think about scalability when the server has difficulty keeping up with increased
load.
If your application is highly scalable, you can simply plug in more servers to handle the load, and the
performance problems will disappear.
The hardest part of scalability planning is estimating how much load you’ll need to
handle.
You also need to estimate your schedule approximately right—that is, you need to know where the
“horizon” is. For some applications, a simple prototype could work fine for a few months, giving you a
chance to raise capital and build a more scalable architecture. For other applications, you might need
your current architecture to provide enough capacity for two years.
• How complete is your application’s functionality? A lot of the scaling solutions we suggest can make it
harder to implement certain features. If you haven’t yet implemented some of your application’s core
features, it might be hard to see how you can build them in a scaled application. Likewise, it could be
hard to decide on a scaling solution before you’ve seen how these features will really work.
• What is your expected peak load? Your application should work even at this load. What would happen
if your site made the front page of Yahoo! News or Slashdot? Even if your application isn’t a popular
website, you can still have peak loads. For example, if you’re an online retailer, the
holiday season—especially the infamous online shopping days in the few weeks before Christmas—is
often a time of peak load. In the US, Valentine’s Day and the weekend before Mother’s Day are also
peak times for online florists.
• If you rely on every part of your system to handle the load, what will happen if part of it fails? For
example, if you rely on replicas to distribute the read load, can you still keep up if one of them fails? Will
you need to disable some functionality to do so? You can build in some spare capacity to help alleviate
these concerns.
Optimize performance
You can often get significant performance improvements from relatively simple changes, such as
indexing tables correctly or switching from MyISAM to the InnoDB storage engine. If you’re facing
performance limitations now, one of the first things you should do is enable and analyze the slow query
log. See Chapter 3 for more on this topic.
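As a concrete starting point, the slow query log can be enabled at runtime with standard MySQL system variables; the one-second threshold and the file path here are illustrative choices, not recommendations:

```sql
-- Enable the slow query log without restarting the server.
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1;  -- log statements slower than 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
```

To make the settings survive a restart, put the equivalent options in the server's configuration file as well.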
There is a point of diminishing returns. After you’ve fixed most of the major problems, it gets harder and
harder to improve performance. Each new optimization makes less of a difference and requires more
effort, and the changes often make your application much more complicated.
Upgrading your servers, or adding more of them, can sometimes work well. Especially for an
application that’s early in its lifecycle, it’s often a good idea to buy a few more servers or get some more
memory. The alternative might be to try to keep the application running on a single server. It can be
more practical just to buy some more hardware than to change your application’s design, especially if
time is critical and developers are scarce.