Predicting The End-to-End Tail Latency of Containerized Microservices in The Cloud

Abstract—Large-scale web services are increasingly adopting cloud-native principles of application design to better utilize the advantages of cloud computing. This involves building an application using many loosely coupled, service-specific components (microservices) that communicate via lightweight APIs, and utilizing containerization technologies to deploy, update, and scale these microservices quickly and independently. However, managing the end-to-end tail latency of requests flowing through the microservices is challenging in the absence of accurate performance models that can capture the complex interplay of microservice workflows with cloud-induced performance variability and inter-service performance dependencies. In this paper, we present performance characterization and modeling of containerized microservices in the cloud. Our modeling approach aims at enabling cloud platforms to combine resource usage metrics collected from multiple layers of the cloud environment, and apply machine learning techniques to predict the end-to-end tail latency of microservice workflows. We implemented and evaluated our modeling approach on NSF Cloud's Chameleon testbed using KVM for virtualization, Docker Engine for containerization and Kubernetes for container orchestration. Experimental results with an open-source microservices benchmark, Sock Shop, show that our modeling approach achieves high prediction accuracy even in the presence of multi-tenant performance interference.

Keywords-microservices; containers; cloud computing; performance modeling;

I. INTRODUCTION

Large-scale web services (e.g., Netflix, Microsoft Bing, Uber, Spotify) are increasingly adopting cloud-native principles and design patterns such as microservices and containers to better utilize the advantages of the cloud computing delivery model, which include greater agility in software deployment, automated scalability, and portability across cloud environments [24, 30]. In a microservices architecture, an application is built using a combination of loosely coupled and service-specific software containers that communicate using APIs, instead of using a single, tightly coupled monolith of code. This development methodology, combined with recent advancements in containerization technologies, makes an application easier to enhance, maintain, and scale. However, it is challenging to manage the end-to-end tail latency (e.g., 95th percentile latency) of requests flowing through the microservice architecture, which could result in poor user experiences and loss of revenue [32, 46].

Containerized microservices deployed in a public cloud are scaled automatically based on user-specified static thresholds for per-microservice resource utilization [1, 2, 6]. However, this places a significant burden on application owners who are concerned about the end-to-end tail latency (e.g., 95th percentile latency) [28]. Setting appropriate resource utilization thresholds on various microservices to meet the end-to-end tail latency in such a complex distributed system is difficult and error-prone in the absence of accurate performance models.

There are many challenges in modeling the end-to-end tail latency of containerized microservices. First, a microservice architecture is characterized by complex request execution paths spanning many microservices, forming a directed acyclic graph (DAG) with complex interactions across the service topology [28, 29, 39]. Second, the tail latency is highly sensitive to any variance in the system, which could be related to the application, OS or hardware [32]. Third, in a cloud environment where microservices run as containers hosted on a cluster of virtual machines (VMs), application performance can often degrade in unpredictable ways [18, 21, 24, 44].

Traditionally, analytical models based on queuing theory have been widely applied for performance prediction and resource provisioning of monolithic (3-tier) applications [40, 41]. However, such techniques can become intractable when dealing with the scale and complexity of microservice architecture, and the presence of cloud-induced performance variability. Furthermore, analytical modeling is a white-box approach that often requires intrusive instrumentation of application code for workload profiling and expert knowledge about the application structure and data flow between various components [25]. Such an approach can be impractical from a cloud provider's perspective since customer applications appear with limited visibility to the cloud providers.

There are black-box modeling approaches that relate observable resource usage metrics [36, 42] or resource allocation metrics [43] with the performance of monolithic applications hosted in virtualized computing environments. More recent studies [19, 26] focused on runtime trace analysis
Urgaonkar et al. [41] designed a dynamic server provisioning technique for multi-tier server clusters. The technique decomposes the per-tier average delay targets to be certain percentages of the end-to-end delay constraint. Singh et al. [38] applied a k-means clustering algorithm and a G/G/1 queuing model to predict the server capacity for a given workload mix. Although these approaches were effective for multi-tier monolithic applications, they can become intractable when dealing with complex microservice architecture in a cloud environment. The complexity introduced by having many moving parts with complex interactions and the presence of cloud-induced performance variability [21, 44] pose significant challenges in modeling the system behavior, identifying critical resource bottlenecks and managing them effectively.

Blackbox modeling techniques have been widely adopted in cluster resource allocation and management [31, 36, 42, 43]. Nguyen et al. [36] applied online profiling and polynomial curve fitting to provide a black-box performance model of an application's SLO violation rate for a given resource pressure. Wajahat et al. [42] presented an application-agnostic, neural network based auto-scaler for minimizing SLA violations of diverse applications. Wang et al. [43] applied fuzzy model predictive control and Lama et al. [31] proposed self-adaptive neural fuzzy control techniques for dynamic resource management of monolithic cloud applications. However, these studies do not address the modeling inaccuracies caused by performance interference in the cloud, and the complexity introduced by microservice architecture.

A few studies have focused on managing the end-to-end performance objectives of large-scale web services and analyzing their complex performance behavior [27, 28, 39]. Guo et al. [27] highlighted how the complex interactions between various components of large-scale web services not only lead to sharp degradation in performance, but also trigger cascading behaviors that result in wide-spread application outages. Jalaparti et al. [28] presented Kwiken, a framework that decomposes the problem of minimizing latency over a general processing DAG in a large web service into a manageable optimization over individual stages. Suresh et al. [39] presented Wisp, a resource management framework that applies a combination of techniques, including estimating local workload models based on measurements of immediate neighborhoods, distributed rate control and metadata propagation, to achieve end-to-end throughput and latency objectives in service-oriented architectures. These approaches are complementary to our work as they focus on solutions that need to be adopted at the application layer of the cloud computing stack, and require expert knowledge about the application. On the other hand, our performance modeling approach does not require intrusive instrumentation of application code for profiling or expert knowledge about the data flow between various components.

Figure 2: Workflow DAGs.

IV. PLATFORM

A. Experimental Testbed

We set up a cloud prototype testbed, which closely resembles real-world cloud platforms such as Google Kubernetes Engine [6] and Amazon Elastic Container Service [2]. Our testbed consists of a physical layer of bare metal servers, a VM layer built on top of the physical layer, and a container layer built on top of the VM layer.

Physical Servers. We used four bare metal servers leased from the NSF Chameleon Cloud [3] testbed. Each server was equipped with dual-socket Intel Xeon E5-2670 v3 Haswell processors (each with 12 cores @ 2.3GHz) and 128 GiB of RAM. Each server was connected to a Dell switch at 10Gbps, with 40Gbps of bandwidth to the core network from each switch.

VMs. We set up 16 VMs on top of the bare metal servers by using KVM for server virtualization. Each VM was configured with four vCPUs, 8GB RAM and 30GB disk space.

Containers. We set up a 16-VM Kubernetes cluster for container orchestration and management. Docker (version 18.03.1-ce) was used as the container runtime engine on each VM. Kubernetes pod networking was set up using the Calico CNI (Container Network Interface) network plugin [11]. We use the terms pod and container interchangeably in this paper, since we use a one-container-per-pod model, which is the most common Kubernetes use case.

B. Workloads

For performance characterization, we used Sock Shop [14], an open-source microservices benchmark that is particularly tailored for container platforms. Sock Shop emulates an e-commerce website as shown in Figure 1, with the specific aim of aiding the demonstration and testing of existing microservice and cloud-native technologies. A recent study suggests that Sock Shop closely reflects how typical microservices applications are currently being developed and delivered into production, as reported by practitioners and industry experts [17]. We used the Locust tool [9] to generate user traffic for the Sock Shop benchmark. The workload traffic is composed of a number of concurrent clients that generate HTTP-based REST API calls to Sock Shop.
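To give a concrete flavor of this setup, the following is a minimal sketch of a Locust user class in the spirit of our traffic generator; the endpoint paths, task weights and wait times are illustrative assumptions rather than our exact workload definition, and the current Locust HttpUser API is assumed.

```python
# Hypothetical sketch of a Locust load script for Sock Shop.
# Endpoint paths, task weights and wait times are illustrative assumptions.
from locust import HttpUser, task, between

class SockShopUser(HttpUser):
    # Each simulated client pauses 1-3 seconds between consecutive requests.
    wait_time = between(1, 3)

    @task(3)
    def browse_catalogue(self):
        self.client.get("/catalogue")

    @task(1)
    def view_cart(self):
        self.client.get("/cart")

    @task(1)
    def view_orders(self):
        self.client.get("/orders")
```

With a recent Locust release, a run along the lines of `locust -f sockshop_load.py --headless -u 30 -r 5 --host http://<frontend-address>` would then emulate 30 concurrent clients.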
Figure 3: 95th percentile latency (ms) of the orders and cart workflows vs. CPU utilization of (a) the orders microservice, (b) the cart microservice, and (c) the frontend microservice.

Figure 4: Parallel coordinates plot showing the impact of performance interference on the multivariate relationship between CPU utilization and end-to-end tail latency of the orders workflow: (a) without interference, (b) with interference on cart, (c) with interference on frontend.
To create a controlled interference workload for our experiments, we used the STREAM memory bandwidth benchmark [33]. STREAM is a synthetic benchmark program geared towards measuring memory bandwidth (in MB/s) corresponding to computation rate for simple vector kernels. We run the benchmark inside a Docker container and deploy it as a batch job in Kubernetes.

V. PERFORMANCE CHARACTERIZATION

One of the challenges that complicate performance characterization of a microservice architecture is that request execution workflows can form directed acyclic graph (DAG) structures spanning many microservices. As a result, the end-to-end latency of a workflow is impacted by the performance behavior of multiple microservices in a complex way. We use the term workflow to represent an application-specific group of requests that are associated with a particular API endpoint, which is usually in the form of an HTTP URI. For instance, in case of the Sock Shop benchmark shown in Figure 1, the HTTP URIs for workflows involved with processing orders are [base url: /GET /Orders] and [base url: /POST /Orders]. The exact structure of the DAG for request workflows is often unknown, since it depends on multiple factors such as the APIs invoked at each encountered microservice, the supplied arguments, the content of caches, as well as the use of load balancing along the service graph [39]. We used a visualization and monitoring tool, Weave Scope [16], to map the DAG structure of the orders and cart workflows as shown in Figure 2.

A. End-to-end Tail Latency

First, we analyze the impact of the CPU utilization of individual microservices on the end-to-end tail latency of two different workflows, viz. orders and cart, in the Sock Shop benchmark. For this purpose, we run experiments with various workload intensities by varying the number of concurrent clients in the workload generator from 5 to 50, while setting the total number of generated requests to be 50000. We also vary the number of pods allocated to the cart, orders and frontend microservices to include various combinations of scaling configurations. The CPU utilization of a particular microservice is measured as the average CPU utilization of all the pods allocated to that microservice.

As shown in Figures 3 (a), (b) and (c), the end-to-end tail latency of the various workflows has a non-linear relationship with the CPU utilization of individual microservices. We observe that the 95th percentile latency of the two workflows
Figure 5: Parallel coordinates plot showing the impact of performance interference on the multivariate relationship between CPU utilization and end-to-end tail latency of the cart workflow: (a) without interference, (b) with interference on cart, (c) with interference on frontend.
tail latency of the orders workflow is influenced by the
CPU utilization of multiple microservices. However, their
multivariate relationship changes significantly depending on
the performance interference experienced by various mi-
croservices. For example, in the case of no interference,
the 95th percentile latency of orders workflow is greater
VI. PERFORMANCE MODELING

Our modeling approach combines the resource usage metrics at the container/pod level with VM-level resource usage and hardware performance counter values to construct machine learning (ML) based performance models for individual workflows. It does not rely on any expert application knowledge.

A. Data Collection

In this paper, we use CPU utilization as a resource metric for the microservices since CPU is a major resource bottleneck in most web applications. We use docker stats [4] to measure pod-level CPU utilization. To capture the impact of performance interference due to the contention of processor resources, such as the last level cache (LLC) and memory bandwidth, we also collect the CPU utilization and cycles-per-instruction (CPI) metrics of the VMs that host the microservices. For each experiment, we measure the end-to-end tail latency of various workflows as reported by the Locust [9] tool. The collected data is used to train our machine learning based performance models.
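As a rough illustration of this collection step, the sketch below polls per-container CPU utilization with the docker stats CLI [4]; the helper name and output handling are our own assumptions, and per-microservice values would then be averaged over that microservice's pods.

```python
# Hypothetical sketch: sample per-container CPU utilization via `docker stats`.
# The helper name and parsing details are illustrative assumptions.
import subprocess

def sample_pod_cpu():
    """Return {container_name: cpu_percent} from a single docker stats snapshot."""
    out = subprocess.check_output(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}} {{.CPUPerc}}"],
        text=True,
    )
    samples = {}
    for line in out.splitlines():
        name, cpu = line.rsplit(" ", 1)
        samples[name] = float(cpu.rstrip("%"))   # e.g. "12.34%" -> 12.34
    return samples
```

VM-level CPU utilization and CPI would be gathered separately, for example from hypervisor-level monitoring and hardware performance counters.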
Figure 7: Prediction accuracy of the ML models (LR, SVR, DT, RF, NN) for the orders workflow with Pod CPU, Pod CPU+VM CPU, and Pod CPU+VM CPI input features: (a) mean absolute percentage error, (b) R2 score.

The models are built and trained by using scikit-learn [12], a machine learning library in Python.
Figure 9: Predicted vs. measured tail latency (ms): (a) linear regression with Pod CPU, (b) linear regression with Pod CPU and VM CPI, (c) neural network with Pod CPU, (d) neural network with Pod CPU and VM CPI.
Feature Selection. The input features of our ML models include the pod-level CPU utilization of the microservices, including front-end, orders, cart, cart-db, and the CPU utilization or CPI of the VMs that host these microservices.

Hyper-parameters. The hyper-parameters of each model are set to the default values provided by scikit-learn. We observe that the prediction accuracy of the deep NN model is highly sensitive to the number of hidden layers and the size (number of neurons) of each hidden layer. Hence, we tuned these parameters through an exhaustive search for various combinations of the input feature space and the targeted workflow for the prediction of end-to-end tail latency. The optimal number of hidden layers for our NN model is three, and the optimal number of neurons in these three hidden layers is summarized in Table I.

Table I: Optimal number of neurons in the three hidden layers of NN models for the orders and cart workflows.
  Input Feature       orders     cart
  Pod CPU             (6,3,5)    (8,5,6)
  Pod CPU+VM CPU      (4,6,3)    (3,6,8)
  Pod CPU+VM CPI      (9,6,4)    (5,7,5)

C. Prediction Accuracy

In this section, we evaluate the prediction accuracy of various ML models (LR, SVR, DT, RF, NN) and three modeling approaches. First, the Pod CPU approach includes pod-level CPU utilization metrics in the input feature space. Second, the Pod CPU+VM CPU approach includes both pod-level and VM-level CPU utilization metrics. Third, the Pod CPU+VM CPI approach includes pod-level CPU utilization and VM-level CPI metrics in the input feature space. The models are evaluated with 10-fold cross validation on the collected dataset. As a result, 90% of the data is used for training and 10% of the data is used for testing in each of the 10 iterations of cross-validation. We utilize commonly used metrics such as the mean absolute percentage error (MAPE) and the coefficient of determination, R2. MAPE is calculated as $\frac{1}{n}\sum_{i=1}^{n}\left|\frac{y-\hat{y}}{y}\right|$, where $y$ and $\hat{y}$ are the measured and predicted values of the end-to-end tail latency, respectively. R2 is a statistical measure of how well the regression predictions approximate the real data points. An R2 of 1 indicates that the regression predictions perfectly fit the data.
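A minimal sketch of this evaluation procedure with scikit-learn [12] is given below; the feature matrix and target vector are placeholders, and the hidden-layer sizes are taken from one row of Table I purely for illustration.

```python
# Hypothetical sketch: 10-fold cross-validated MAPE and R2 for one workflow.
# X holds the input features (e.g., pod-level CPU, VM-level CPU or CPI) and
# y the measured 95th percentile latency; both are placeholders here.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score

def evaluate(X, y):
    y = np.asarray(y, dtype=float)
    # Hidden-layer sizes follow the Pod CPU+VM CPU row for orders in Table I.
    model = MLPRegressor(hidden_layer_sizes=(4, 6, 3), max_iter=5000, random_state=0)
    y_pred = cross_val_predict(model, X, y, cv=10)
    mape = np.mean(np.abs((y - y_pred) / y))
    return mape, r2_score(y, y_pred)
```

The same routine applies to the other models (e.g., LinearRegression, SVR, DecisionTreeRegressor, RandomForestRegressor) by swapping the estimator.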
Figures 7 (a) and (b) show that, compared to the Pod CPU based modeling approach, the Pod CPU+VM CPU and Pod CPU+VM CPI approaches achieve significant improvements in the prediction accuracy of each ML model for the orders workflow. This is because VM-level CPU utilization can capture inter-pod CPU contention within a VM. Furthermore, the VM-level CPI metric can capture the contention of shared processor resources between multiple pods within a VM as well as across VMs. Such inter-VM resource contention may arise when the concerned VMs are colocated on the same physical machine. The improvements in prediction accuracy in terms of MAPE due to the Pod CPU+VM CPU and Pod CPU+VM CPI approaches are up to 36% and 38%, respectively. The largest improvement is observed in the case of the NN model. We also observe that the NN model outperforms all other models in prediction accuracy since a neural network is a universal function approximator. On the other hand, the LR model shows the worst prediction accuracy. This is because a linear regression model cannot capture the non-linearity of tail latency. Overall, we observed similar results in the latency prediction of the cart workflow, as shown in Figure 8.

Figure 9 plots the cross-validated predictions vs. the measured values of end-to-end tail latency of the orders workflow in order to graphically illustrate the different R2 values for the LR and NN models. Theoretically, if a model could explain 100% of the variance in the observed data, the predicted values would always equal the measured values and, therefore, all the data points would fall on the fitted regression line. The more variance that is accounted for by the regression model, the closer the data points will fall to the fitted regression line. The proportion of variance accounted for by the LR model with Pod CPU, the LR model with Pod CPU+VM CPI, the NN model with Pod CPU, and the NN model with Pod CPU+VM CPI approaches are 42%, 66%, 71% and 89%, respectively.

VII. OPTIMIZATION FOR RESOURCE SCALING
Although existing cloud platforms [1, 2, 5, 6] provide mechanisms for auto-scaling microservices, they expect application owners to specify thresholds for various microservice load metrics to enable auto-scaling features. For example, the auto-scaling feature [7] in Kubernetes determines the allocation of containers/pods to a microservice by using the formula:

desiredReplicas = currentReplicas * (currentMetricValue / desiredMetricValue)    (1)

If the desiredMetricValue (threshold) is specified as an average CPU utilization of 50% for a particular microservice, and the current average CPU utilization is 100%, then the number of pods allocated to that microservice will be doubled. Furthermore, any scaling is performed only if the ratio of currentMetricValue and desiredMetricValue drops below 0.9 or increases above 1.1 (10% tolerance by default).
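For illustration, a small Python sketch of this scaling rule (Equation 1 with the default 10% tolerance) is shown below; the function name and the rounding up to a whole number of replicas are our own assumptions for readability.

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     desired_metric: float,
                     tolerance: float = 0.1) -> int:
    """Hypothetical sketch of the scaling rule in Equation 1.

    Rounding up to a whole replica count is an assumption for illustration;
    the tolerance check mirrors the default 10% band.
    """
    ratio = current_metric / desired_metric
    # No scaling while the ratio stays within the tolerance band [0.9, 1.1].
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

# Example from the text: 50% threshold, 100% observed utilization -> 2 pods.
print(desired_replicas(current_replicas=1, current_metric=100, desired_metric=50))
```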
It is challenging and burdensome for application owners to determine the resource utilization thresholds for various microservices in order to meet the application's end-to-end performance target. Setting inappropriate thresholds may lead to overprovisioning or underprovisioning of resources. We propose that cloud platforms should automatically determine these thresholds based on user-provided performance SLO targets. For this purpose, we study the feasibility of utilizing the proposed performance models in making efficient resource scaling decisions by formulating a constrained nonlinear optimization problem.

A) Problem Formulation. Consider that the performance SLO target in terms of the end-to-end tail latency for a workflow is specified. For a given workload condition, we aim to find the highest resource utilization values of the relevant microservices at which the given SLO targets will not be violated. These optimal utilization values can be calculated periodically and set as the thresholds (desiredMetricValue) for making resource scaling decisions. These thresholds will help in determining which microservices should be scaled, and how many pods should be allocated to each microservice based on Equation 1. This approach aims to avoid resource overprovisioning while providing a performance guarantee to the given workflow.

Table II: Notation used in the Resource Scaling Optimization Problem
  S_j              Set of microservices relevant to workflow j
  SLO_j^target     Tail latency target of workflow j
  x_i              Average pod-level CPU utilization in microservice i
  x                Vector of average pod-level CPU utilizations of the microservices relevant to the target workflow
  r_j(x)           Predicted tail latency of workflow j

We formulate the optimization problem as follows:

$\max \sum_{i \in S_j} x_i$    (2)

$\text{s.t.} \quad r_j(\mathbf{x}) \le SLO_j^{target}$    (3)

$\mathbf{x} = (x_i)_{i \in S_j}$    (4)

where the symbol notations are described in Table II. The objective function in Equation 2 aims to maximize the pod-level resource usage, i.e., the sum of average CPU utilization in the set of microservices that are relevant to the target workflow. The relevance of a microservice to a workflow can be determined either by analyzing the workflow DAG, or through machine learning based feature selection as described in Section VI-B. Consider that r_j(x) is the tail latency predicted by the machine learning model for workflow j. The inequality constraint in Equation 3 ensures that the SLO target of workflow j will not be violated. The optimization problem is nonlinear since the workflow tail latency r_j(x) included in the constraint in Equation 3 has a nonlinear relationship with the average CPU utilization of the various microservices.

In the formulation of the optimization problem, application-layer metrics (e.g., the number of concurrent clients), VM-level CPU utilization and CPI metrics are not included as variables, although the tail latency prediction r_j(x) depends on these metrics as well. Instead, the values of these metrics are fixed according to their observed values at the time of solving the optimization problem, and are treated as constants for that instance of optimization. As a result, the solutions to the optimization problem will only include pod-level CPU utilization values, which can be directly used as thresholds for making resource scaling decisions. This allows the resource scaling mechanism to be practical and simple to implement.

B) Solution. We apply a non-linear optimization technique, the trust-region interior point method [13, 20], to solve this problem. This optimization technique provides two main benefits. First, it is efficient for large scale problems. Second, the gradient of the constraint function, which is required for optimization, can be approximated through finite difference methods in this optimization technique [13]. This property is desirable since the machine learning models for workflow tail latency are blackbox functions, whose gradient cannot be directly calculated.
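As a concrete illustration, a minimal sketch of this step using the SciPy interface cited in [13] is given below; the trained latency model, the list of relevant microservices, and the fixed context features (client count, VM-level metrics) are assumptions standing in for our actual artifacts, and the constraint gradient is left to SciPy's default finite-difference approximation.

```python
# Hypothetical sketch: derive pod-level CPU utilization thresholds by
# maximizing total utilization subject to the predicted tail latency SLO.
# `latency_model` stands in for a trained scikit-learn regressor and
# `fixed_context` for the observed client count and VM-level metrics.
import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

def find_thresholds(latency_model, fixed_context, n_services, slo_target_ms):
    def predicted_latency(x):
        # Assemble the full feature vector: pod CPU variables + fixed context.
        features = np.concatenate([x, fixed_context]).reshape(1, -1)
        return float(latency_model.predict(features)[0])

    # Keep the predicted 95th percentile latency at or below the SLO target.
    slo_constraint = NonlinearConstraint(predicted_latency, -np.inf, slo_target_ms)

    result = minimize(
        fun=lambda x: -np.sum(x),          # maximize total pod-level CPU utilization
        x0=np.full(n_services, 30.0),      # e.g., start from 30% utilization each
        method="trust-constr",             # SciPy's trust-region constrained solver
        bounds=[(0.0, 100.0)] * n_services,
        constraints=[slo_constraint],      # gradient approximated by finite differences
    )
    return result.x                        # per-microservice CPU utilization thresholds
```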
C) Feasibility Study. As a case study, we apply the optimization technique to calculate the desired CPU utilization values (thresholds) for various relevant microservices, when a workload of 30 concurrent clients is applied to the Sock Shop benchmark, and a performance SLO target of 240 ms is specified for the 95th percentile latency of the orders workflow. For this optimization, we utilize our Neural Network model for the orders workflow with pod-level CPU utilization, VM-level CPI metrics and the number of concurrent clients as the input features.
Figure 10 (a) compares the current (measured) CPU utilization of the microservices relevant to the orders workflow and their desired CPU utilization values, when only one pod is allocated to each microservice. Based on Equation 1, the optimal resource scaling option is to allocate an additional pod to the frontend microservice. As shown in Figure 10 (b), we validate the optimality of this resource scaling option by comparing the tail latency of the orders workflow for various possible resource scaling configurations. We observe that the resource scaling configuration suggested by our optimization technique is able to meet the performance SLO target while allocating the minimum number of pods in total.

Figure 10: Optimization of CPU utilization thresholds for efficient resource scaling with a workload of 30 concurrent clients, and an SLO target of 240 ms for the 95th percentile latency of the orders workflow. (a) Current vs. desired average CPU utilization of various microservices; here, one pod is allocated to each microservice. (b) Tail latency of the orders workflow for various resource scaling configurations; the configuration suggested by the optimization of CPU utilization thresholds is (1,1,2), i.e., one pod for cart, one pod for orders and two pods for frontend. All other microservices are provisioned with one pod.

VIII. CONCLUSIONS AND FUTURE WORK

We present the performance characterization and modeling of containerized microservices in the cloud. Our modeling approach utilizes machine learning and multi-layer data collected from the cloud environment to predict the end-to-end tail latency of microservice workflows even in the presence of cloud-induced performance interference. We also demonstrate the feasibility of utilizing the proposed models in making efficient resource scaling decisions. We envision that our performance modeling and resource scaling optimization approach can enable cloud platforms to automatically scale microservice-based applications based on user-provided performance SLO targets. This will remove the burden of determining resource utilization thresholds for numerous microservices from the cloud users, which is prevalent in existing cloud platforms. In future, we will extend our work to include diverse microservice-based applications with different resource bottlenecks. We will also evaluate the effectiveness of the proposed resource scaling system in the face of dynamic workloads.

ACKNOWLEDGMENT

Results presented in this paper were obtained using the Chameleon testbed supported by the National Science Foundation. The research is partially supported by NSF CREST Grant HRD-1736209. We thank the anonymous reviewers for their many suggestions for improving this paper. In particular, we thank our shepherd, Prof. Maarten van Steen.

REFERENCES

[1] Amazon Elastic Container Service. https://aws.amazon.com/ecs/.
[2] Amazon Elastic Container Service for Kubernetes. https://aws.amazon.com/eks/.
[3] Chameleon: A configurable experimental environment for large-scale cloud research. https://www.chameleoncloud.org.
[4] Docker stats. https://docs.docker.com/engine/reference/commandline/stats/.
[5] Google App Engine flexible environment. https://cloud.google.com/appengine/docs/flexible/.
[6] Google Kubernetes Engine. https://cloud.google.com/kubernetes-engine/.
[7] Kubernetes horizontal autoscaling. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details.
[8] Kubernetes: Production-grade container orchestration. https://kubernetes.io/.
[9] Locust: An open source load testing tool. https://locust.io.
[10] Microservices: An application revolution powered by the cloud. https://azure.microsoft.com/en-us/blog/microservices-an-application-revolution-powered-by-the-cloud/.
[11] Project Calico. https://www.projectcalico.org/.
[12] Scikit-learn: Machine learning in Python. http://scikit-learn.org/stable/.
[13] SciPy optimization library. https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.
[14] Sock Shop microservice demo application. https://microservices-demo.github.io.
[15] virt-top. https://linux.die.net/man/1/virt-top.
[16] Weave Scope. https://www.weave.works/docs/scope/latest/introducing/.
[17] C. M. Aderaldo, N. C. Mendonça, C. Pahl, and P. Jamshidi. Benchmark requirements for microservices architecture research. In IEEE/ACM 1st International Workshop on Establishing the Community-Wide Infrastructure for Architecture-Based Software Engineering (ECASE), 2017.
[18] A. Balalaie, A. Heydarnoori, and P. Jamshidi. Microservices architecture enables devops: Migration to a cloud-native architecture. IEEE Software, 33(3), 2016.
[19] S. Barakat. Monitoring and analysis of microservices performance. Journal of Computer Science and Control Systems, 10:19–22, 05 2017.
[20] R. H. Byrd, M. E. Hribar, and J. Nocedal. An interior point algorithm for large-scale nonlinear programming. SIAM J. on Optimization, 9(4):877–900, Apr. 1999.
[21] X. Chen, L. Rupprecht, R. Osman, P. Pietzuch, F. Franciosi, and W. Knottenbelt. CloudScope: Diagnosing and managing performance interference in multi-tenant clouds. In 2015 IEEE 23rd International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), 2015.
[22] N. Dragoni, S. Giallorenzo, A. L. Lafuente, M. Mazzara, F. Montesi, R. Mustafin, and L. Safina. Microservices: yesterday, today, and tomorrow. In Present and Ulterior Software Engineering, pages 195–216. Springer, 2017.
[23] S. Eranian. perfmon2: the hardware-based performance monitoring interface for Linux. http://perfmon2.sourceforge.net/.
[24] M. Fazio, A. Celesti, R. Ranjan, C. Liu, L. Chen, and M. Villari. Open issues in scheduling microservices in the cloud. IEEE Cloud Computing, 3(5):81–88, 2016.
[25] I. Giannakopoulos, D. Tsoumakos, and N. Koziris. Towards an adaptive, fully automated performance modeling methodology for cloud applications. In IEEE International Conference on Cloud Engineering (IC2E), 2018.
[26] M. Gribaudo, M. Iacono, and D. Manini. Performance evaluation of massively distributed microservices based applications. In European Council for Modelling and Simulation (ECMS), 2017.
[27] Z. Guo, S. McDirmid, M. Yang, L. Zhuang, P. Zhang, Y. Luo, T. Bergan, M. Musuvathi, Z. Zhang, and L. Zhou. Failure recovery: When the cure is worse than the disease. In 14th Workshop on Hot Topics in Operating Systems (HotOS), Santa Ana Pueblo, NM, 2013. USENIX.
[28] V. Jalaparti, P. Bodik, S. Kandula, I. Menache, M. Rybalkin, and C. Yan. Speeding up distributed request-response workflows. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, 2013.
[29] D. Jiang, G. Pierre, and C.-H. Chi. Autonomous resource provisioning for multi-service web applications. In Proceedings of the 19th ACM International Conference on World Wide Web (WWW), 2010.
[30] G. Kakivaya, L. Xun, R. Hasha, S. B. Ahsan, T. Pfleiger, R. Sinha, A. Gupta, M. Tarta, M. Fussell, V. Modi, M. Mohsin, R. Kong, A. Ahuja, O. Platon, A. Wun, M. Snider, C. Daniel, D. Mastrian, Y. Li, A. Rao, V. Kidambi, R. Wang, A. Ram, S. Shivaprakash, R. Nair, A. Warwick, B. S. Narasimman, M. Lin, J. Chen, A. B. Mhatre, P. Subbarayalu, M. Coskun, and I. Gupta. Service Fabric: A distributed platform for building microservices in the cloud. In Proceedings of the Thirteenth EuroSys Conference, 2018.
[31] P. Lama and X. Zhou. Autonomic provisioning with self-adaptive neural fuzzy control for percentile-based delay guarantee. ACM Transactions on Autonomous and Adaptive Systems, 2011.
[32] J. Li, N. K. Sharma, D. R. K. Ports, and S. D. Gribble. Tales of the tail: Hardware, OS, and application-level sources of tail latency. In Proceedings of the ACM Symposium on Cloud Computing (SoCC), 2014.
[33] J. D. McCalpin. Memory bandwidth and machine balance in current high performance computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, 2(19–25), 1995.
[34] N. Meinshausen and P. Bühlmann. Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4):417–473, 2010.
[35] D. Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239):2, 2014.
[36] H. Nguyen, Z. Shen, X. Gu, S. Subbiah, and J. Wilkes. AGILE: Elastic distributed resource scaling for infrastructure-as-a-service. In Proceedings of the 10th International Conference on Autonomic Computing (ICAC), 2013.
[37] J. Rao and C.-Z. Xu. Online capacity identification of multi-tier websites using hardware performance counters. IEEE Trans. on Parallel and Distributed Systems, 2009.
[38] R. Singh, U. Sharma, E. Cecchet, and P. Shenoy. Autonomic mix-aware provisioning for non-stationary data center workloads. In Proc. IEEE Int'l Conf. on Autonomic Computing (ICAC), pages 21–30, 2010.
[39] L. Suresh, P. Bodik, I. Menache, M. Canini, and F. Ciucu. Distributed resource management across process boundaries. In Proceedings of the 2017 Symposium on Cloud Computing (SoCC '17). ACM Press, 2017.
[40] B. Urgaonkar, G. Pacifici, P. Shenoy, M. Spreitzer, and A. Tantawi. An analytical model for multi-tier internet services and its applications. In Proceedings
of the ACM SIGMETRICS International Conference
on Measurement and Modeling of Computer Systems,
2005.
[41] B. Urgaonkar, P. Shenoy, A. Chandra, P. Goyal, and
T. Wood. Agile dynamic provisioning of multi-tier
internet applications. ACM Trans. Auton. Adapt. Syst.,
3(1), Mar. 2008.
[42] M. Wajahat, A. Gandhi, A. Karve, and A. Kochut.
Using machine learning for black-box autoscaling.
In 2016 Seventh International Green and Sustainable
Computing Conference (IGSC), 2016.
[43] L. Wang, J. Xu, H. A. Duran-Limon, and M. Zhao.
QoS-driven cloud resource management through fuzzy
model predictive control. In IEEE International Con-
ference on Autonomic Computing (ICAC), 2015.
[44] Y. Xu, Z. Musgrave, B. Noble, and M. Bailey. Bobtail:
Avoiding long tails in the cloud. In Presented as part
of the 10th USENIX Symposium on Networked Systems
Design and Implementation (NSDI 13), 2013.
[45] Q. Zhang, L. Cherkasova, and E. Smirni. A regression-
based analytic model for dynamic resource provision-
ing of multi-tier Internet applications. In Proc. IEEE
Int’l Conference on Autonomic Computing (ICAC),
2007.
[46] Y. Zhang, D. Meisner, J. Mars, and L. Tang. Treadmill:
Attributing the source of tail latency through precise
load testing and statistical inference. In ACM/IEEE
43rd Annual International Symposium on Computer
Architecture (ISCA), 2016.