Unit 5 Scaling
Structure:-
5.1 Introduction
5.2 Objectives
5.3 Scaling primitives
5.4 Scaling Strategies
5.4.1 Proactive Scaling
5.4.2 Reactive Scaling
5.4.3 Combinational Scaling
5.5 Auto Scaling in Cloud
5.6 Types of Scaling
5.6.1 Vertical Scaling or Scaling Up
5.6.2 Horizontal Scaling or Scaling Out
5.1 INTRODUCTION
In this unit we will focus on the various methods and algorithms used in the
process of scaling. We will discuss various types of scaling, their usage and a
few examples. We will also discuss the importance of various techniques in
saving cost and human effort by using the concepts of cloud scaling in highly
dynamic situations. The suitability of scaling techniques in different scenarios
is also discussed in detail.
5.2 OBJECTIVES
After going through this unit you should be able to:
➔ describe scaling and its advantages;
1. Minimum cost: After scaling up, the user pays only for the additional
hardware actually used. Buying hardware of the same scale would cost much
more than what the user pays, and that comparison does not even include
maintenance and other overheads. Further, when the resources are no longer
required, they can be returned to the service provider, resulting in cost
savings.
2. Ease of use: Cloud resources can be scaled up or down in just a few
minutes (sometimes dynamically) through the service provider's application
interface.
RESOURCE PROVISIONING,
LOAD BALANCING AND
SECURITY
In the case of clouds, virtual environments are used for resource
allocation. These virtual machines make clouds elastic: they can be configured
according to the workload of the applications in real time.
[Figures: costs and workload plotted against time, with checkpoints marked]
On the other hand, scaling saves the cost of setting up hardware for short-lived
peaks or dips in load. In general, most cloud service providers offer scaling
itself free of charge and charge only for the additional resources used. Scaling
is also a common service provided by almost all cloud platforms. Moreover, the
user saves money by scaling down when the usage of resources declines.
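The cost argument above can be made concrete with a small calculation. The sketch below compares keeping peak-sized capacity running all day with scaling to each hour's demand. The demand profile and the price per node-hour are hypothetical numbers chosen only for illustration, and the model ignores the maintenance and setup overheads mentioned earlier.

```python
def fixed_capacity_cost(hourly_nodes, price_per_node_hour):
    """Cost when capacity is sized for the peak hour and kept for the whole day."""
    return max(hourly_nodes) * len(hourly_nodes) * price_per_node_hour


def scaled_cost(hourly_nodes, price_per_node_hour):
    """Cost when nodes are scaled up and down to match each hour's demand."""
    return sum(hourly_nodes) * price_per_node_hour


# Hypothetical day: 2 nodes needed for 21 hours, 10 nodes during a 3-hour peak.
demand = [2] * 21 + [10] * 3
PRICE = 10  # hypothetical price per node-hour, in cents

print(fixed_capacity_cost(demand, PRICE))  # 2400
print(scaled_cost(demand, PRICE))          # 720
```

Prices are kept as integer cents so the arithmetic stays exact.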
Let us now see what the strategies for scaling are, how one can achieve scaling
in a cloud environment and what its types are. In general, scaling is categorized
based on how the scaling decision is taken. The three main strategies for
scaling are discussed below.
5.4.2 Reactive Scaling
Reactive scaling continuously monitors the workload and adapts smoothly to
changes in it at minimum cost. It enables users to scale computing resources
up or down rapidly. In simple words, when hardware such as the CPU, RAM or any
other resource reaches its highest utilization, the service provider adds more
of that resource to the environment. Auto scaling works on the policies that
users/resource managers define for traffic and scaling. One major concern with
reactive scaling is a quick change in load: users experience lags while the
infrastructure is being scaled.
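A reactive policy can be sketched as a simple threshold rule evaluated on the currently observed utilization. In this minimal Python sketch, the class `ReactivePolicy`, its field names, the threshold values and the step size are all hypothetical; real cloud platforms express such policies through their own configuration.

```python
from dataclasses import dataclass


@dataclass
class ReactivePolicy:
    """Hypothetical user-defined reactive scaling policy."""
    scale_up_at: float    # utilization (0..1) above which nodes are added
    scale_down_at: float  # utilization (0..1) below which nodes are removed
    step: int             # nodes added or removed per decision
    min_nodes: int = 1
    max_nodes: int = 100


def reactive_decision(policy: ReactivePolicy, nodes: int, utilization: float) -> int:
    """Return the new node count, based only on the utilization observed now."""
    if utilization > policy.scale_up_at:
        return min(policy.max_nodes, nodes + policy.step)
    if utilization < policy.scale_down_at:
        return max(policy.min_nodes, nodes - policy.step)
    return nodes  # inside the band: leave capacity unchanged


policy = ReactivePolicy(scale_up_at=0.80, scale_down_at=0.30, step=2)
print(reactive_decision(policy, nodes=4, utilization=0.92))  # 6
print(reactive_decision(policy, nodes=4, utilization=0.50))  # 4
print(reactive_decision(policy, nodes=4, utilization=0.10))  # 2
```

Because the rule acts only after utilization has already crossed a threshold, the new nodes arrive after the load change, which is the lag described above.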
[Figure: load versus time of day]
5.4.3 Combinational Scaling
Till now we have seen need-based (reactive) and forecast-based (proactive)
scaling techniques. However, for better performance and a low cooldown period,
we can also combine the reactive and proactive scaling strategies when we have
some prior knowledge of the traffic. This helps us schedule timely scaling for
the expected load. On the other hand, we also keep a provision for load-based
scaling apart from the predicted load on the application. This way, the
problems of both sudden and expected traffic surges are addressed.
Given below is the comparison between proactive and reactive scaling
strategies.
Parameters  | Proactive Scaling | Reactive Scaling
Suitability | For applications whose load increases in an expected/known manner | For applications whose load increases in an unexpected/unknown manner
Working     | User sets the threshold, but a downtime is required | User-defined threshold values optimize the resources
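One possible way to combine the two strategies is to take the larger of a forecast-based (proactive) node count and a load-based (reactive) node count. The sketch below assumes a hypothetical hourly forecast table `FORECAST_RPS` and a hypothetical per-node capacity of 300 requests per second; it illustrates the idea and is not a specific provider's algorithm.

```python
import math

# Hypothetical forecast: expected requests per second for each hour of the day.
FORECAST_RPS = {hour: 400 for hour in range(24)}
FORECAST_RPS.update({9: 1500, 10: 1800, 11: 1800, 12: 1200})


def proactive_nodes(hour: int, capacity_per_node: float) -> int:
    """Nodes scheduled in advance from the forecast (the proactive part)."""
    return math.ceil(FORECAST_RPS[hour] / capacity_per_node)


def reactive_nodes(observed_rps: float, capacity_per_node: float) -> int:
    """Nodes needed for the load actually observed now (the reactive part)."""
    return math.ceil(observed_rps / capacity_per_node)


def combinational_nodes(hour: int, observed_rps: float,
                        capacity_per_node: float = 300.0, min_nodes: int = 1) -> int:
    """Honour the forecast baseline, but let observed load raise it for surprises."""
    return max(min_nodes,
               proactive_nodes(hour, capacity_per_node),
               reactive_nodes(observed_rps, capacity_per_node))


print(combinational_nodes(hour=10, observed_rps=900))   # 6: expected peak, forecast dominates
print(combinational_nodes(hour=3, observed_rps=2000))   # 7: unexpected surge, load dominates
print(combinational_nodes(hour=3, observed_rps=100))    # 2: quiet hour, forecast baseline
```

Taking the maximum means the scheduled baseline is always honoured while an unexpected surge can still raise capacity above it; other combination rules are possible.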
In a cloud, auto scaling can be achieved using user-defined policies, machine
health checks and schedules. Parameters such as request count, CPU usage and
latency are the key inputs for decision making in autoscaling. A policy here
refers to the set of instructions the cloud follows in a particular scenario
(for scaling up or scaling down). Autoscaling in the cloud is done on the basis
of the following parameters.
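To make the notion of a policy more concrete, the sketch below evaluates a hypothetical policy over the three parameters named above (request count, CPU usage and latency). The rule used here, scale up if any metric crosses its upper limit and scale down only if every metric is below its lower limit, is an illustrative assumption, as are the limit values and the `MetricRule` name.

```python
from dataclasses import dataclass


@dataclass
class MetricRule:
    """Hypothetical limits for one metric in an auto scaling policy."""
    name: str
    upper: float  # crossing this limit asks for a scale-up
    lower: float  # a scale-down needs every metric below its lower limit


def evaluate_policy(rules, metrics):
    """Return 'scale-up', 'scale-down' or 'none' for one set of observed metrics."""
    if any(metrics[rule.name] > rule.upper for rule in rules):
        return "scale-up"
    if all(metrics[rule.name] < rule.lower for rule in rules):
        return "scale-down"
    return "none"


rules = [
    MetricRule("request_count", upper=5000, lower=1000),
    MetricRule("cpu_percent", upper=75, lower=25),
    MetricRule("latency_ms", upper=300, lower=50),
]
print(evaluate_policy(rules, {"request_count": 3000, "cpu_percent": 82, "latency_ms": 120}))  # scale-up
print(evaluate_policy(rules, {"request_count": 400, "cpu_percent": 10, "latency_ms": 30}))    # scale-down
print(evaluate_policy(rules, {"request_count": 400, "cpu_percent": 40, "latency_ms": 30}))    # none
```

Requiring all metrics to agree before scaling down is a conservative choice that makes an unwanted scale-down less likely.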
The process of auto scaling also requires a cooldown period before services
resume after a scaling event takes place. No two scaling events are triggered
concurrently, so as to maintain integrity. The cooldown period gives the effect
of an autoscaling event a specified time interval to be reflected in the system,
and it prevents integrity issues in the cloud environment.
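The cooldown rule can be sketched as a small gate that accepts a scaling event only if enough time has passed since the last accepted one. The `CooldownGate` class and the 300-second period below are hypothetical; time is passed in explicitly to keep the example deterministic.

```python
class CooldownGate:
    """Rejects scaling events that arrive before the cooldown period has elapsed."""

    def __init__(self, cooldown_seconds: float):
        self.cooldown_seconds = cooldown_seconds
        self.last_event_at = None  # time of the last accepted scaling event

    def try_scale(self, now: float) -> bool:
        if self.last_event_at is not None and now - self.last_event_at < self.cooldown_seconds:
            return False  # still cooling down: this trigger is ignored
        self.last_event_at = now  # accept the event and restart the cooldown
        return True


gate = CooldownGate(cooldown_seconds=300)
print(gate.try_scale(now=0))    # True: the first event is accepted
print(gate.try_scale(now=120))  # False: inside the 300 s cooldown
print(gate.try_scale(now=300))  # True: the cooldown has elapsed
```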
[Figure: costs and workload versus time]
Consider a more specific scenario in which the resource requirement is high
for some time duration, e.g., during holidays or weekends. Here, scheduled
scaling can be performed: the time and the scale/magnitude/threshold of scaling
can be defined in advance to meet the specific requirements, based on previous
knowledge of the traffic. The threshold level is also an important parameter in
auto scaling: a low threshold results in underutilization of the cloud
resources, while a high threshold results in higher latency in the cloud. If,
after nodes are added in a scale-up, the incoming requests per second per node
drop below the scale-down threshold, a scale-down is triggered, which can in
turn trigger another scale-up. This alternating scale-up/scale-down cycle is
known as the ping-pong effect. To avoid both underscaling and overscaling, load
testing is recommended so that the service level agreements (SLAs) are met. In
addition, the scaling process is required to satisfy the following properties.
1. After a scale-up, the number of incoming requests per second per node must
be greater than the scale-down threshold.
2. After a scale-down, the number of incoming requests per second per node must
be less than the scale-up threshold.
In both scenarios, these conditions reduce the chances of the ping-pong effect.
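The two properties can be checked directly for a proposed node count. This sketch assumes the total request rate stays constant during the scaling step and borrows T_U = 290 and T_D = 230 from the percentage-scaling example in this unit; the function names are hypothetical.

```python
def scale_up_keeps_property(total_rps: float, nodes_after: int, t_down: float) -> bool:
    """Property 1: after a scale-up, RPS per node must stay above the scale-down threshold."""
    return total_rps / nodes_after > t_down


def scale_down_keeps_property(total_rps: float, nodes_after: int, t_up: float) -> bool:
    """Property 2: after a scale-down, RPS per node must stay below the scale-up threshold."""
    return total_rps / nodes_after < t_up


T_U, T_D = 290, 230

print(scale_up_keeps_property(1695, nodes_after=7, t_down=T_D))   # True  (about 242.14 > 230)
print(scale_up_keeps_property(1695, nodes_after=12, t_down=T_D))  # False (141.25 would trigger a scale-down)
print(scale_down_keeps_property(2000, nodes_after=8, t_up=T_U))   # True  (250 < 290)
print(scale_down_keeps_property(2000, nodes_after=6, t_up=T_U))   # False (about 333.33 would trigger a scale-up)
```

A proposal that fails either check is exactly the kind of step that starts a ping-pong cycle.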
Now we know what scaling is and how it affects the applications hosted on the
cloud. Let us now discuss how auto scaling can be performed in fixed amounts
as well as by a percentage of the current capacity.
--------------------------------------------------------------------------------------------
Algorithm : 1
--------------------------------------------------------------------------------------------
Input : SLA specific application
Parameters:
N_min - minimum number of nodes
D - scale down value.
U - scale up value.
T_U - scale up threshold
T_D - scale down threshold
Let T (SLA) return the maximum incoming requests per second (RPS) per node
for the specific SLA.
Let N_c and RPS_n represent the current number of nodes and incoming
requests per second per node respectively.
Repeat:
N_(c_old) ← N_c
N_c ← max(N_min, N_c - D)
RPS_n ← RPS_n x N_(c_old) / N_c
Until RPS_n ≥ T_D or N_c = N_min
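A runnable Python sketch of fixed amount scaling is given below. Two points are assumptions: the scale-up loop is written as the mirror image of the scale-down loop (adding U nodes until RPS_n falls below T_U), and the scale-down loop stops once RPS_n is no longer below T_D, which is when a further scale-down stops being needed. The thresholds in the usage lines (T_U = 350, T_D = 250) and the function names are hypothetical.

```python
def scale_up_fixed(n_c: int, rps_n: float, u: int, t_up: float):
    """Fixed amount scale-up: add U nodes at a time until RPS_n < T_U.

    Assumption: written as the mirror image of the scale-down loop.
    """
    if u < 1 or t_up <= 0:
        raise ValueError("U must be at least 1 and T_U must be positive")
    while rps_n >= t_up:
        n_old = n_c
        n_c += u
        rps_n = rps_n * n_old / n_c  # the same total load spread over more nodes
    return n_c, rps_n


def scale_down_fixed(n_c: int, rps_n: float, d: int, t_down: float, n_min: int):
    """Fixed amount scale-down: remove D nodes at a time until RPS_n >= T_D or N_c = N_min."""
    if d < 1:
        raise ValueError("D must be at least 1")
    while rps_n < t_down and n_c > n_min:
        n_old = n_c
        n_c = max(n_min, n_c - d)
        rps_n = rps_n * n_old / n_c  # the same total load spread over fewer nodes
    return n_c, rps_n


# Hypothetical thresholds: T_U = 350, T_D = 250, N_min = 1, U = D = 2.
print(scale_up_fixed(4, 1800 / 4, u=2, t_up=350))    # (6, 300.0)
n, rps = scale_down_fixed(19, 4000 / 19, d=2, t_down=250, n_min=1)
print(n, round(rps, 2))                              # 15 266.67
```

The scale-up call matches the step from 4 to 6 nodes at 1800 RPS in the example that follows; the scale-down call removes nodes in steps of 2 until the per-node load is back above T_D.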
Now, let us discuss how this algorithm works in detail. Let the values of a few …
Table 1
RPS (total) | Nodes added | N_c | RPS_n
450         | 0           | 4   | 112.5
1800        | 2           | 6   | 300
2510        | 2           | 8   | 313.75
3300        | 2           | 10  | 330.00
4120        | 2           | 12  | 343.33
5000        | 2           | 14  | 357.14
Similarly, in case of scaling down, let initially RPS = 8000 and N_c = 19. Now
RPS is reduced to 6200 and, consequently, RPS_n reaches T_D; here an autoscaling
request is initiated that deletes D = 2 nodes. Table 2 lists all the
parameters as per the scale-down requirements.
Table 2
RPS (total) | Nodes removed | N_c | RPS_n
8000        | 0             | 19  | 421.05
6200        | 2             | 17  | 364.7
4850        | 2             | 15  | 323.33
3500        | 2             | 13  | 269.23
2650        | 2             | 11  | 240.90
1900        | 2             | 9   | 211.11
The tables above show the stepwise increase/decrease in cloud capacity with
respect to the change in load on the application (requests per second per
node).
Percentage Scaling:
The algorithm given below scales the number of nodes up or down by a
percentage of the current capacity, based on the scale-up and scale-down
thresholds.
-----------------------------------------------------------------------------------------------
Algorithm : 2
-----------------------------------------------------------------------------------------------
Input : SLA specific application
Parameters:
N_min - minimum number of nodes
D - scale down value (percentage of N_c).
U - scale up value (percentage of N_c).
T_U - scale up threshold
T_D - scale down threshold
Let T (SLA) return the maximum requests per second (RPS) per node for the
specific SLA.
Let N_c and RPS_n represent the current number of nodes and incoming
requests per second per node respectively.
N_c ← N_c + max(1, N_c x U/100)
Repeat:
N_(c_old) ← N_c
N_c ← max(N_min, N_c - max(1, N_c x D/100))
RPS_n ← RPS_n x N_(c_old) / N_c
Until RPS_n ≥ T_D or N_c = N_min
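The percentage variant differs from fixed amount scaling only in its step size, max(1, N_c x U/100) or max(1, N_c x D/100). The sketch below assumes the step is rounded down to a whole number of nodes, which matches the single-node steps in the worked example with D = 8, and stops each loop once its threshold condition is no longer met. The function names and the input loads are hypothetical; the parameter values come from the worked example.

```python
import math


def percent_step(n_c: int, pct: float) -> int:
    """max(1, N_c x pct / 100), rounded down to a whole number of nodes (assumed)."""
    return max(1, math.floor(n_c * pct / 100))


def scale_up_percent(n_c: int, rps_n: float, u_pct: float, t_up: float):
    """Add max(1, N_c x U/100) nodes at a time until RPS_n < T_U."""
    if t_up <= 0:
        raise ValueError("T_U must be positive")
    while rps_n >= t_up:
        n_old = n_c
        n_c += percent_step(n_c, u_pct)
        rps_n = rps_n * n_old / n_c  # the same total load spread over more nodes
    return n_c, rps_n


def scale_down_percent(n_c: int, rps_n: float, d_pct: float, t_down: float, n_min: int):
    """Remove max(1, N_c x D/100) nodes at a time until RPS_n >= T_D or N_c = N_min."""
    while rps_n < t_down and n_c > n_min:
        n_old = n_c
        n_c = max(n_min, n_c - percent_step(n_c, d_pct))
        rps_n = rps_n * n_old / n_c  # the same total load spread over fewer nodes
    return n_c, rps_n


# Parameters from the worked example: U = 1, D = 8, N_min = 1, T_U = 290, T_D = 230.
print(percent_step(19, 8))                                  # 1  (19 x 8/100 = 1.52)
print(percent_step(50, 8))                                  # 4
n, rps = scale_up_percent(7, 2190 / 7, u_pct=1, t_up=290)
print(n, round(rps, 2))                                     # 8 273.75
n, rps = scale_down_percent(19, 4000 / 19, d_pct=8, t_down=230, n_min=1)
print(n, round(rps, 2))                                     # 17 235.29
```

For large clusters the percentage step grows with N_c (4 nodes at N_c = 50 with D = 8), which is what lets this variant react in proportion to the current capacity.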
The detailed example is explained using Table 3, which gives details of
upscaling with D = 8, U = 1, N_min = 1, T_D = 230 and T_U = 290. Similarly, in
case of scaling down, the initial RPS = 5000 and N_c = 19; here RPS reduces to
4140 and RPS_n falls to T_D, requesting a scale-down that deletes
max(1, 19 x 8/100) = max(1, 1.52) nodes, i.e., 1 node when rounded down to a
whole node.
Table 3
RPS (total) | Nodes added | N_c | RPS_n
500         | 0           | 6   | 83.33
1695        | 1           | 7   | 242.14
2190        | 1           | 8   | 273.75
2600        | 1           | 9   | 288.88
3430        | 1           | 10  | 343.00
3940        | 1           | 11  | 358.18
4420        | 1           | 12  | 368.33
4960        | 1           | 13  | 381.53
5500        | 1           | 14  | 392.85
5950        | 1           | 15  | 396.6
Scaling down with the same algorithm is detailed in Table 4 below.
Table 4
RPS (total) | Nodes removed | N_c | RPS_n
5000        | 0             | 19  | 263.15
3920        | 1             | 18  | 217.77
3510        | 1             | 17  | 206.47
3200        | 1             | 16  | 200
2850        | 1             | 15  | 190
2600        | 1             | 14  | 185.71
2360        | 1             | 13  | 181.53
2060        | 1             | 12  | 171.66
1810        | 1             | 11  | 164.5
1500        | 1             | 10  | 150
Comparing Algorithms 1 and 2, it is clear that the values of U and D are on
the higher side in the case of Algorithm 2. In this scenario, hardware
utilization is higher and the cloud has a smaller footprint.
2) In Algorithm 1 for fixed amount auto scaling, calculate the values in the
table if U = 3.
Let us now discuss the types of scaling, i.e., how we view the cloud
infrastructure when enhancing or reducing capacity. In general, we scale the
cloud either vertically, by provisioning more resources to the existing
machines, or horizontally, by installing more machines.
Vertical scaling in the cloud refers to either scaling up, i.e., enhancing the
computing resources, or scaling down, i.e., reducing the computing resources
for an application. In vertical scaling, the number of VMs stays constant, but
the quantity of resources allocated to each of them is increased or decreased.
No infrastructure is added and the application code is not changed. Vertical
scaling is limited by the capacity of the physical machine or server running in
the cloud. If one has to upgrade the hardware of an existing cloud environment,
this can be achieved with minimal changes.
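Vertical scaling can be pictured as resizing a single VM while the number of VMs stays the same. In this minimal sketch, `VirtualMachine`, `scale_vertically` and the host size of 8 vCPUs are hypothetical; the host check reflects the limit imposed by the physical machine.

```python
from dataclasses import dataclass


@dataclass
class VirtualMachine:
    name: str
    vcpus: int


def scale_vertically(vm: VirtualMachine, new_vcpus: int, host_vcpus: int) -> VirtualMachine:
    """Resize one VM; the number of VMs does not change (vertical scaling)."""
    if new_vcpus < 1:
        raise ValueError("a VM needs at least one vCPU")
    if new_vcpus > host_vcpus:
        # Vertical scaling is bounded by the physical machine hosting the VM.
        raise ValueError(f"host has only {host_vcpus} vCPUs")
    return VirtualMachine(vm.name, new_vcpus)


vm_a = VirtualMachine("A", vcpus=2)
vm_b = scale_vertically(vm_a, new_vcpus=4, host_vcpus=8)
print(vm_b)  # VirtualMachine(name='A', vcpus=4)
```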
[Figure: vertical scaling, from A (2 CPUs) to B (4 CPUs)]
An IT resource (a virtual server with two CPUs) is scaled up by replacing it with a more
powerful IT resource with increased capacity for data storage (a physical server with four CPUs).
[Figure: horizontal scaling on pooled physical servers, from A, to A and B, to A, B and C]
An IT resource (Virtual Server A) is scaled out by adding more of the same IT resources (Virtual Servers B and C).
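Horizontal scaling, by contrast, changes the number of identical servers rather than the size of each one. The `ServerPool` class below is a hypothetical sketch; server names follow the A, B, C labels of the figure, and total capacity grows in proportion to the number of servers.

```python
from dataclasses import dataclass, field


@dataclass
class ServerPool:
    """A pool of identical virtual servers; horizontal scaling adds or removes copies."""
    vcpus_per_server: int
    servers: list = field(default_factory=lambda: ["A"])

    def scale_out(self, count: int) -> None:
        for _ in range(count):
            # Name new servers B, C, ... as in the figure (illustrative naming only).
            self.servers.append(chr(ord("A") + len(self.servers)))

    def scale_in(self, count: int) -> None:
        # Remove servers from the end, but always keep at least one running.
        del self.servers[max(1, len(self.servers) - count):]

    @property
    def total_vcpus(self) -> int:
        return self.vcpus_per_server * len(self.servers)


pool = ServerPool(vcpus_per_server=2)
pool.scale_out(2)
print(pool.servers, pool.total_vcpus)  # ['A', 'B', 'C'] 6
```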
SUMMARY
In the end, we are now aware of various types of scaling, scaling strategies and
their use in real situations. Cloud service providers such as Amazon AWS and
Microsoft Azure, and IT giants like Google, offer scaling services that adapt to
application requirements. These services are of great help to entrepreneurs who
run small to medium businesses and seek IT infrastructure support. We have also
discussed various advantages of cloud scaling for business applications.
SOLUTION/ANSWERS
Answers to CYPs 1.
3) Write differences between proactive and reactive scaling: Reactive scaling
works only for the actual variation of load on the application, whereas the
combination works for both expected and real traffic. A good estimate of load
improves the performance of combinational scaling.
Answers to CYPs 2.
1) Explain the concept of fixed amount auto scaling: Fixed amount scaling is a
simple approach to scaling in a cloud environment. Here the resources are scaled
up/down by a user-defined number of nodes. In fixed amount scaling, resource
utilization is not optimized. It can happen that a single node would solve the
resource crunch, but the user-defined number is very high, leading to
underutilized resources. Therefore, percentage-based scaling is a better
technique for optimal resource usage.
2) In Algorithm 1 for fixed amount auto scaling, calculate the values in the table if U = 3:
For the given U = 3, the following calculations are made.
RPS (total) | Nodes added | N_c | RPS_n
450         | 0           | 4   | 112.5
1800        | 3           | 7   | 257.14
2510        | 3           | 10  | 251
3300        | 3           | 13  | 253.84
4120        | 3           | 16  | 257.50
5000        | 3           | 19  | 263.15
3) What is a cool down period: When auto scaling takes place in the cloud, a small
time interval (pause) prevents the next auto scaling event from being triggered.
This helps maintain the integrity of the cloud environment for applications.
Once the cooldown period is over, the next auto scaling event can be accepted.