
2019 IEEE International Conference on Cloud Engineering (IC2E)

Predicting the End-to-End Tail Latency of Containerized Microservices in the Cloud

Joy Rahman Palden Lama


Dept. of Computer Science Dept. of Computer Science
University of Texas at San Antonio University of Texas at San Antonio
San Antonio, Texas-78249 San Antonio, Texas-78249
Email: [email protected] Email: [email protected]

Abstract—Large-scale web services are increasingly adopting cloud-native principles of application design to better utilize the advantages of cloud computing. This involves building an application using many loosely coupled service-specific components (microservices) that communicate via lightweight APIs, and utilizing containerization technologies to deploy, update, and scale these microservices quickly and independently. However, managing the end-to-end tail latency of requests flowing through the microservices is challenging in the absence of accurate performance models that can capture the complex interplay of microservice workflows with cloud-induced performance variability and inter-service performance dependencies. In this paper, we present performance characterization and modeling of containerized microservices in the cloud. Our modeling approach aims at enabling cloud platforms to combine resource usage metrics collected from multiple layers of the cloud environment, and apply machine learning techniques to predict the end-to-end tail latency of microservice workflows. We implemented and evaluated our modeling approach on NSF Cloud's Chameleon testbed using KVM for virtualization, Docker Engine for containerization and Kubernetes for container orchestration. Experimental results with an open-source microservices benchmark, Sock Shop, show that our modeling approach achieves high prediction accuracy even in the presence of multi-tenant performance interference.

Keywords-microservices; containers; cloud computing; performance modeling;

I. INTRODUCTION

Large-scale web services (e.g., Netflix, Microsoft Bing, Uber, Spotify) are increasingly adopting cloud-native principles and design patterns such as microservices and containers to better utilize the advantages of the cloud computing delivery model, which include greater agility in software deployment, automated scalability, and portability across cloud environments [24, 30]. In a microservices architecture, an application is built using a combination of loosely coupled and service-specific software containers that communicate using APIs, instead of using a single, tightly coupled monolith of code. This development methodology, combined with recent advancements in containerization technologies, makes an application easier to enhance, maintain, and scale. However, it is challenging to manage the end-to-end tail latency (e.g., 95th percentile latency) of requests flowing through the microservice architecture, which could result in poor user experiences and loss of revenue [32, 46].

Containerized microservices deployed in a public cloud are scaled automatically based on user-specified static thresholds for per-microservice resource utilization [1, 2, 6]. However, this places a significant burden on application owners who are concerned about the end-to-end tail latency (e.g., 95th percentile latency) [28]. Setting appropriate resource utilization thresholds on various microservices to meet the end-to-end tail latency in such a complex distributed system is difficult and error-prone in the absence of accurate performance models.

There are many challenges in modeling the end-to-end tail latency of containerized microservices. First, a microservice architecture is characterized by complex request execution paths spanning many microservices, forming a directed acyclic graph (DAG) with complex interactions across the service topology [28, 29, 39]. Second, the tail latency is highly sensitive to any variance in the system, which could be related to the application, OS or hardware [32]. Third, in a cloud environment where microservices run as containers hosted on a cluster of virtual machines (VMs), application performance can often degrade in unpredictable ways [18, 21, 24, 44].

Traditionally, analytical models based on queuing theory have been widely applied for performance prediction and resource provisioning of monolithic (3-tier) applications [40, 41]. However, such techniques can become intractable when dealing with the scale and complexity of microservice architecture, and the presence of cloud-induced performance variability. Furthermore, analytical modeling is a white-box approach that often requires intrusive instrumentation of application code for workload profiling and expert knowledge about the application structure and data flow between various components [25]. Such an approach can be impractical from a cloud provider's perspective, since customer applications appear with limited visibility to the cloud providers.

There are black-box modeling approaches that relate observable resource usage metrics [36, 42] or resource allocation metrics [43] with the performance of monolithic applications hosted in virtualized computing environments. More recent studies [19, 26] focused on runtime trace analysis tools and simulation based approaches to analyze the performance of microservice-based applications. However, none of these works study the impact of cloud-induced performance interference on microservice-based applications, and the resulting inaccuracies in performance modeling. In this paper, we observe that the end-to-end tail latency of microservice workflows is highly sensitive to performance interference in the cloud. Furthermore, we show that the tail latency of microservice workflows can be accurately predicted even in the presence of performance interference, with the help of machine learning and multi-layer data collected from the cloud environment.

Figure 1: Monolithic vs microservice architecture. (a) Monolith. (b) Microservices.

978-1-7281-0218-4/19/$31.00 ©2019 IEEE

DOI 10.1109/IC2E.2019.00034
In particular, we make the following contributions.

1. We quantify the impact of the resource utilization and performance interference experienced by various microservices on the end-to-end tail latency of various request workflows in a web application. Since CPU is a major bottleneck for most web applications, we use CPU utilization as a resource metric in this paper, and focus on the performance interference caused by contention in shared processor resources such as the LLC (last level cache) and memory bandwidth. However, our approach can be easily extended to include other resource metrics.

2. We propose a modeling approach that combines multi-layer data, including container-level metrics, VM-level metrics and a hardware performance counter based metric, CPI (clock cycles per instruction), to accurately predict end-to-end tail latency in the presence of performance interference in the cloud.

3. We apply several machine learning based modeling techniques, and compare their accuracy in predicting the end-to-end performance of containerized microservices.

4. We demonstrate the feasibility of utilizing the proposed performance models in making efficient resource scaling decisions. For this purpose, we formulate resource scaling of microservices as a constrained nonlinear optimization problem, and solve it to calculate appropriate resource utilization thresholds on various microservices, so that they can be scaled efficiently to meet a performance SLO (service level objective) target.

5. We implement and evaluate the proposed techniques using a representative microservices benchmark, Sock Shop [14], on the NSF Chameleon cloud [3] testbed. The Sock Shop benchmark is containerized with Docker [35] and deployed in a cluster of VMs managed by Kubernetes [8], an open-source container orchestration engine.

The rest of this paper is organized as follows. Section II provides the background on microservice architecture. Related work is discussed in Section III. Section IV describes the testbed setup and benchmarks used. Section V presents the performance characterization of containerized microservices. Section VI presents the performance modeling approach. Section VII discusses resource scaling optimization based on the proposed models. Section VIII concludes the paper.

II. BACKGROUND ON MICROSERVICE ARCHITECTURE

Microservice architecture aims to overcome various limitations of the traditional monolithic architecture for software development [10, 22]. Figure 1 illustrates the difference between a multi-tier monolithic architecture and a microservice architecture in the context of an e-commerce application that takes orders from customers, verifies the product catalogue, processes payments and ships orders. In a monolithic architecture, the web application is divided into technology-specific tiers such as a frontend web tier for serving web contents, an application tier composed of numerous tightly coupled components for implementing the entire business logic, and a shared database tier for data persistence. A monolithic application is often simple to design. However, in order to update one component, the entire application has to be redeployed. Furthermore, each component within a tier cannot be scaled independently based on its resource requirements. On the other hand, a microservice architecture splits the application into many smaller self-contained components, called microservices, that serve specific business functions and communicate with each other via lightweight language-agnostic APIs. Each microservice has its own code and database, without any shared component with other services. This facilitates flexibility in application deployment and enhanced scalability, since each component of an application can be updated and scaled independently. In essence, microservice architecture is a variant of Service-Oriented Architecture (SOA) that emphasizes fine-grained, lightweight services.

III. RELATED WORK

Performance modeling and dynamic resource provisioning of Internet applications has been an important research topic for many years [31, 36, 37, 40, 41, 43, 45]. There are traditional analytical modeling approaches based on queueing theory [40, 41], and hybrid approaches that combine queueing theory with machine learning techniques [38, 45].
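Contribution 4 above formulates threshold selection as a constrained optimization. As a toy illustration of the shape of that problem (not the paper's actual formulation — the latency predictor and pod-cost functions below are hypothetical stand-ins), one can search per-microservice utilization thresholds that satisfy a latency SLO at minimum provisioning cost:

```python
import itertools

def pick_thresholds(predict_p95, pods_needed, candidates, slo_ms):
    """Exhaustive search over per-microservice CPU-utilization thresholds.

    predict_p95: callable(dict name -> threshold) -> predicted p95 latency (ms)
    pods_needed: callable(dict name -> threshold) -> total pods to provision
    candidates:  dict name -> list of candidate thresholds (%)
    Returns the SLO-feasible threshold combination needing the fewest pods.
    """
    names = sorted(candidates)
    best, best_cost = None, float("inf")
    for combo in itertools.product(*(candidates[n] for n in names)):
        thresholds = dict(zip(names, combo))
        if predict_p95(thresholds) > slo_ms:
            continue  # violates the tail-latency SLO
        cost = pods_needed(thresholds)
        if cost < best_cost:
            best, best_cost = thresholds, cost
    return best
```

An exhaustive search is only tractable for a handful of services and candidate thresholds; the paper instead solves a constrained nonlinear optimization (Section VII).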

Urgaonkar et al. [41] designed a dynamic server provisioning technique for multi-tier server clusters. The technique decomposes the per-tier average delay targets to be certain percentages of the end-to-end delay constraint. Singh et al. [38] applied a k-means clustering algorithm and a G/G/1 queuing model to predict the server capacity for a given workload mix. Although these approaches were effective for multi-tier monolithic applications, they can become intractable when dealing with a complex microservice architecture in a cloud environment. The complexity introduced by having many moving parts with complex interactions, and the presence of cloud-induced performance variability [21, 44], pose significant challenges in modeling the system behavior, identifying critical resource bottlenecks and managing them effectively.

Blackbox modeling techniques have been widely adopted in cluster resource allocation and management [31, 36, 42, 43]. Nguyen et al. [36] applied online profiling and polynomial curve fitting to provide a black-box performance model of an application's SLO violation rate for a given resource pressure. Wajahat et al. [42] presented an application-agnostic, neural network based auto-scaler for minimizing SLA violations of diverse applications. Wang et al. [43] applied fuzzy model predictive control, and Lama et al. [31] proposed self-adaptive neural fuzzy control techniques, for dynamic resource management of monolithic cloud applications. However, these studies do not address the modeling inaccuracies caused by performance interference in the cloud, and the complexity introduced by microservice architecture.

A few studies have focused on managing the end-to-end performance objectives of large-scale web services and analyzing their complex performance behavior [27, 28, 39]. Guo et al. [27] highlighted how the complex interactions between various components of large-scale web services not only lead to sharp degradation in performance, but also trigger cascading behaviors that result in wide-spread application outages. Jalaparti et al. [28] presented Kwiken, a framework that decomposes the problem of minimizing latency over a general processing DAG in a large web service into a manageable optimization over individual stages. Suresh et al. [28] presented Wisp, a resource management framework that applies a combination of techniques, including estimating local workload models based on measurements of immediate neighborhoods, distributed rate control and metadata propagation, to achieve end-to-end throughput and latency objectives in Service-Oriented architectures. These approaches are complementary to our work, as they focus on solutions that need to be adopted at the application layer of the cloud computing stack, and require expert knowledge about the application. On the other hand, our performance modeling approach does not require intrusive instrumentation of application code for profiling or expert knowledge about the data flow between various components.

Figure 2: Workflow DAGs.

IV. PLATFORM

A. Experimental Testbed

We set up a cloud prototype testbed which closely resembles real-world cloud platforms such as Google Kubernetes Engine [6] and Amazon Elastic Container Service [2]. Our testbed consists of a physical layer of bare metal servers, a VM layer built on top of the physical layer, and a container layer built on top of the VM layer.

Physical Servers. We used four bare metal servers leased on the NSF Chameleon Cloud [3] testbed. Each server was equipped with dual-socket Intel Xeon E5-2670 v3 Haswell processors (each with 12 cores @ 2.3 GHz) and 128 GiB of RAM. Each server was connected to a Dell switch at 10 Gbps, with 40 Gbps of bandwidth to the core network from each switch.

VMs. We set up 16 VMs on top of the bare metal servers by using KVM for server virtualization. Each VM was configured with four vCPUs, 8 GB RAM and 30 GB of disk space.

Containers. We set up a 16-VM Kubernetes cluster for container orchestration and management. Docker (version 18.03.1-ce) was used as the container runtime engine on each VM. Kubernetes pod networking was set up using the Calico CNI (Container Network Interface) network plugin [11]. We use the terms pod and container interchangeably in this paper, since we use a one-container-per-pod model, which is the most common Kubernetes use case.

B. Workloads

For performance characterization, we used Sock Shop [14], an open-source microservices benchmark that is particularly tailored for container platforms. Sock Shop emulates an e-commerce website as shown in Figure 1, with the specific aim of aiding the demonstration and testing of existing microservice and cloud-native technologies. A recent study suggests that Sock Shop closely reflects how typical microservices applications are currently being developed and delivered into production, as reported by practitioners and industry experts [17]. We used the Locust tool [9] to generate user traffic for the Sock Shop benchmark. The workload traffic is composed of a number of concurrent clients that generate HTTP-based REST API calls to Sock Shop. To create a controlled

Figure 3: Impact of CPU utilization on the tail latency of various workflows. (a) CPU utilization of orders microservice. (b) CPU utilization of cart microservice. (c) CPU utilization of frontend microservice. (Each panel plots the 95th percentile latency (ms) of the orders and cart workflows against CPU utilization (%).)
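The 95th percentile latencies reported throughout the paper can be computed from raw per-request latency samples with a nearest-rank percentile. A minimal stdlib sketch (not the paper's measurement code, which relies on the load generator's reporting):

```python
import math

def tail_latency(samples, pct=95):
    """Nearest-rank percentile of a list of latency samples (e.g. in ms)."""
    if not samples:
        raise ValueError("no samples")
    ranked = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ranked)))  # 1-based rank
    return ranked[rank - 1]
```

For 100 samples, the 95th percentile under this definition is the 95th smallest value; interpolating definitions (e.g. the default in numerical libraries) can differ slightly.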

Figure 4: Parallel coordinates plot showing the impact of performance interference on the multivariate relationship between CPU utilization and end-to-end tail latency of orders workflow. (a) without interference. (b) with interference on cart. (c) with interference on frontend. (Axes: CPU utilization (%) of the cart, frontend, order and user microservices, and the 95th percentile orders workflow latency (ms).)
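Each line in a parallel coordinates plot like this corresponds to one experiment run: a vector of per-microservice CPU utilizations plus the measured end-to-end tail latency. A sketch of how such multivariate samples can be flattened into feature rows for plotting or model training (field names are illustrative, not the paper's schema):

```python
def make_sample(num_clients, pod_cpu, vm_metrics, p95_latency_ms):
    """Flatten one experiment run into a (feature_row, target) pair.

    pod_cpu:    dict mapping microservice name -> average pod CPU utilization (%)
    vm_metrics: dict mapping VM name -> CPU utilization (%) or CPI
    """
    row = {"clients": num_clients}
    # Sort keys so every run yields features in a stable column order.
    row.update({"pod_cpu:" + name: util for name, util in sorted(pod_cpu.items())})
    row.update({"vm:" + name: value for name, value in sorted(vm_metrics.items())})
    return row, p95_latency_ms
```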

interference workload for our experiments, we used the STREAM memory bandwidth benchmark [33]. STREAM is a synthetic benchmark program geared towards measuring memory bandwidth (in MB/s) corresponding to the computation rate for simple vector kernels. We run the benchmark inside a Docker container and deploy it as a batch job in Kubernetes.

V. PERFORMANCE CHARACTERIZATION

One of the challenges that complicate the performance characterization of a microservice architecture is that request execution workflows can form directed acyclic graph (DAG) structures spanning many microservices. As a result, the end-to-end latency of a workflow is impacted by the performance behavior of multiple microservices in a complex way. We use the term workflow to represent an application-specific group of requests that are associated with a particular API endpoint, which is usually in the form of an HTTP URI. For instance, in the case of the Sock Shop benchmark shown in Figure 1, the HTTP URIs for the workflows involved in processing orders are [base url: / GET / Orders] and [base url: / POST / Orders]. The exact structure of the DAG for request workflows is often unknown, since it depends on multiple factors such as the APIs invoked at each encountered microservice, the supplied arguments, the content of caches, as well as the use of load balancing along the service graph [39]. We used a visualization and monitoring tool, weavescope [16], to map the DAG structure of the orders and cart workflows, as shown in Figure 2.

A. End-to-end Tail Latency

First, we analyze the impact of the CPU utilization of individual microservices on the end-to-end tail latency of two different workflows, viz. orders and cart, in the Sock Shop benchmark. For this purpose, we run experiments with various workload intensities by varying the number of concurrent clients in the workload generator from 5 to 50, while setting the total number of generated requests to 50000. We also vary the number of pods allocated to the cart, orders and frontend microservices to include various combinations of scaling configurations. The CPU utilization of a particular microservice is measured as the average CPU utilization of all the pods allocated to that microservice. As shown in Figures 3 (a), (b) and (c), the end-to-end tail latency of the various workflows has a non-linear relationship with the CPU utilization of individual microservices. We observe that the 95th percentile latency of the two workflows

Figure 5: Parallel coordinates plot showing the impact of performance interference on the multivariate relationship between CPU utilization and end-to-end tail latency of cart workflow. (a) without interference. (b) with interference on cart. (c) with interference on frontend. (Axes: CPU utilization (%) of the cart, frontend, order and user microservices, and the 95th percentile cart workflow latency (ms).)
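The experiment sweeps in this section (5–50 concurrent clients issuing a fixed total number of requests) can be mimicked with a small stdlib load driver. This is a generic sketch, not the Locust setup the paper actually uses, and `request_fn` is a stand-in for a real HTTP call against a workflow endpoint:

```python
import concurrent.futures
import time

def run_load(request_fn, num_clients, total_requests):
    """Issue total_requests calls through num_clients concurrent workers
    and return the observed per-request latencies (in seconds)."""
    def timed_call(_):
        start = time.perf_counter()
        request_fn()  # e.g. an HTTP GET against a workflow endpoint
        return time.perf_counter() - start

    with concurrent.futures.ThreadPoolExecutor(max_workers=num_clients) as pool:
        return list(pool.map(timed_call, range(total_requests)))
```

The returned list can then be reduced to any tail-latency statistic for the run, and the sweep repeated for each client count and scaling configuration.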

increase significantly even at low CPU utilization values of the orders and cart microservices. On the other hand, only high CPU utilization values (>70%) of the frontend microservice have a significant impact on the 95th percentile latency. For example, the tail latency of the orders workflow reaches 200 ms at 49%, 57% and 106% CPU utilization of the orders, cart and frontend microservices respectively.

Figure 6: Impact of performance interference on the end-to-end tail latency of various workflows. (Distribution of the 95th percentile latency (ms) of the orders and cart workflows with interference on cart, with interference on frontend, and without interference.)

B. Impact of Performance Interference

Next, we analyze the impact of performance interference in a cloud environment on the multivariate relationship between the CPU utilization of various microservices and the end-to-end tail latency of particular request workflows. For the sake of clarity, we present our analysis using the top four microservices from the Sock Shop benchmark, ranked according to their CPU utilization values. To induce performance interference, we colocate pods running the memory-intensive STREAM [33] benchmark on the VMs that host the pods running the cart and frontend microservices respectively. The intensity of interference is fixed by running four pods for each interfering workload. The workload intensities and the scaling configurations for the orders, cart and frontend microservices are varied similar to the previous experiment.

As shown in Figures 4 (a), (b) and (c), the end-to-end tail latency of the orders workflow is influenced by the CPU utilization of multiple microservices. However, their multivariate relationship changes significantly depending on the performance interference experienced by various microservices. For example, in the case of no interference, the 95th percentile latency of the orders workflow is greater than 300 ms when the CPU utilization measured at the cart, frontend, orders and user microservices is 67%, 110%, 55% and 41% respectively. However, a similar tail latency of the orders workflow was observed at much lower CPU utilization values when one of the microservices experienced performance interference. Similar results were obtained for the cart workflow, as shown in Figures 5 (a), (b) and (c). This implies that the CPU utilization of microservices measured at the pod level is insufficient for accurately predicting the end-to-end tail latency of various workflows.

Figure 6 shows the distribution of the 95th percentile latency of various workflows under three different scenarios, i.e., with interference on cart, with interference on frontend, and without interference. The variation in the latency observed within each case is mainly due to the varying workload intensities in these experiments. On average, the performance degradation observed by the orders and cart workflows due to interference on the cart microservice is 22% and 79% respectively. On the other hand, the average performance degradation of the two workflows due to interference on the frontend microservice is 6% and 18% respectively. These results demonstrate the complex interplay between performance interference, inter-service performance dependency and the end-to-end tail latency of various workflows.

VI. PERFORMANCE MODELING WITH MACHINE LEARNING

In this section, we present our approach to address the challenges of predicting the end-to-end tail latency of complex workflows in a microservice architecture in the face of diverse performance interference patterns. Our approach

combines the resource usage metrics at the container/pod level with VM-level resource usage and hardware performance counter values to construct machine learning (ML) based performance models for individual workflows. Our modeling approach does not rely on any expert application knowledge. Hence, it can be easily extended to fit the needs of diverse applications.

A. Data Collection

In this paper, we use CPU utilization as a resource metric for the microservices, since CPU is a major resource bottleneck in most web applications. We use docker stats [4] to measure pod-level CPU utilization. To capture the impact of performance interference due to the contention of processor resources, such as the last level cache (LLC) and memory bandwidth, we utilize the CPU utilization and CPI metrics associated with the VMs that host the various microservices as pods. We use the virt-top [15] tool to measure VM-level CPU utilization. CPI is measured on a per-cgroup basis by using the perf event [23] tool, and each cgroup is mapped to a VM. For data collection, we conduct extensive experiments on our cloud prototype testbed by varying the number of concurrent clients, and the performance interference levels experienced by different microservices in the Sock Shop benchmark. We also vary the number of pods allocated to the microservices. For each experiment, we measure the end-to-end tail latency of various workflows as reported by the Locust [9] tool. The collected data is used to train our machine learning based performance models.

B. Machine Learning Models

We build performance models for predicting the end-to-end tail latency of each microservice workflow by applying various machine learning (ML) techniques including Linear Regression (LR), Support Vector Regression (SVR), Decision Tree (DT), Random Forest (RF) and a deep Neural Network (NN) based regression (more specifically, a multi-layer perceptron with multiple hidden layers). The ML models are built and trained by using scikit-learn [12], a machine learning library in Python.

Figure 7: Prediction accuracy of various ML models (LR, SVR, DT, RF, NN) for orders. (a) Mean absolute percentage error. (b) R2 score. (Bars compare the Pod_CPU, Pod_CPU+VM_CPI and Pod_CPU+VM_CPU feature sets.)

Figure 8: Prediction accuracy of various ML models for cart. (a) Mean absolute percentage error. (b) R2 score.

Feature Selection. The input features of our ML models include the number of concurrent clients, pod-level resource metrics and VM-level resource metrics. The pod-level metrics include the average CPU utilization of the load-balanced pods for each microservice. The VM-level metrics include the CPU utilization or the CPI of the VMs that host the pods. To reduce our feature space and avoid potential over-fitting issues, we apply a popular feature selection technique called stability selection [34]. In particular, we use the scikit-learn [12] library's randomized lasso technique, which works by subsampling the training data and computing a Lasso estimate where the penalty of a random subset of coefficients has been scaled. By performing this operation several times, the method assigns high scores to features that are repeatedly selected across randomizations. The features selected for the orders workflow are the number of concurrent clients, the pod-level CPU utilization of the microservices including front-end, orders, users, shipping, payment, cart, users-db, orders-db, cart-db, and the CPU utilization or CPI of the VMs that host these microservices. Similarly, the features selected for the cart workflow are the number of concurrent clients,

   

Figure 9: Cross-validated predictions of tail latency in orders workflow. (a) Linear regression with Pod CPU. (b) Linear regression with Pod CPU and VM CPI. (c) Neural network with Pod CPU. (d) Neural network with Pod CPU and VM CPI. (Each panel plots predicted tail latency (ms) against measured tail latency (ms).)
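The two accuracy metrics used in this evaluation can be stated compactly. A self-contained sketch of MAPE and the R2 score (generic definitions for illustration, not the scikit-learn routines the authors used):

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes nonzero y_true)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Lower MAPE is better; R2 closer to 1 means the predictions account for more of the variance in the measured tail latencies.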

the pod-level CPU utilization of the microservices including front-end, orders, cart, cart-db, and the CPU utilization or CPI of the VMs that host these microservices.

Hyper-parameters. The hyper-parameters of each model are set to the default values provided by scikit-learn. We observe that the prediction accuracy of the deep NN model is highly sensitive to the number of hidden layers and the size (number of neurons) of each hidden layer. Hence, we tuned these parameters through an exhaustive search over various combinations of the input feature space and the targeted workflow for the prediction of end-to-end tail latency. The optimal number of hidden layers for our NN model is three, and the optimal number of neurons in these three hidden layers is summarized in Table I.

Table I: Optimal number of neurons in the three hidden layers of the NN models for the orders and cart workflows.

    Input Feature       orders     cart
    Pod CPU             (6,3,5)    (8,5,6)
    Pod CPU+VM CPU      (4,6,3)    (3,6,8)
    Pod CPU+VM CPI      (9,6,4)    (5,7,5)

C. Prediction Accuracy

In this section, we evaluate the prediction accuracy of the various ML models (LR, SVR, DT, RF, NN) and three modeling approaches. First, the Pod CPU approach includes pod-level CPU utilization metrics in the input feature space. Second, the Pod CPU+VM CPU approach includes both pod-level and VM-level CPU utilization metrics. Third, the Pod CPU+VM CPI approach includes pod-level CPU utilization and VM-level CPI metrics in the input feature space. The models are evaluated with 10-fold cross-validation on the collected dataset. As a result, 90% of the data is used for training and 10% for testing in each of the 10 iterations of cross-validation. We utilize commonly used metrics such as the mean absolute percentage error (MAPE) and the coefficient of determination, R². MAPE is calculated as (1/n) Σⁿᵢ₌₁ |(y − ŷ)/y|, where y and ŷ are the measured and predicted values of the end-to-end tail latency respectively. R² is a statistical measure of how well the regression predictions approximate the real data points. An R² of 1 indicates that the regression predictions perfectly fit the data.

Figures 7 (a) and (b) show that, compared to the Pod CPU based modeling approach, the Pod CPU+VM CPU and Pod CPU+VM CPI approaches achieve a significant improvement in the prediction accuracy of each ML model for the orders workflow. This is because VM-level CPU utilization can capture inter-pod CPU contention within a VM. Furthermore, the VM-level CPI metric can capture the contention for shared processor resources between multiple pods within a VM as well as across VMs. Such inter-VM resource contention may arise when the concerned VMs are colocated in the same physical machine. The improvements in prediction accuracy in terms of MAPE due to the Pod CPU+VM CPU and Pod CPU+VM CPI approaches are up to 36% and 38% respectively. The largest improvement is observed in the case of the NN model. We also observe that the NN model outperforms all other models in prediction accuracy, since a neural network is a universal function approximator. On the other hand, the LR model shows the worst prediction accuracy. This is because a linear regression model cannot capture the non-linearity of tail latency. Overall, we observed similar results in the latency prediction of the cart workflow, as shown in Figure 8.

Figure 9 plots the cross-validated predictions vs. the measured values of the end-to-end tail latency of the orders workflow, in order to graphically illustrate the different R² values for the LR and NN models. Theoretically, if a model could explain 100% of the variance in the observed data, the predicted values would always equal the measured values and, therefore, all the data points would fall on the fitted regression line. The more variance that is accounted for by the regression model, the closer the data points fall to the fitted regression line. The proportion of variance accounted for by the LR model with Pod CPU is the lowest, while the LR model with Pod CPU+VM CPI, the NN model with Pod CPU, and the NN model with Pod CPU+VM CPI account for 66%, 71% and 89% of the variance respectively.

VII. OPTIMIZATION FOR RESOURCE SCALING

Although existing cloud platforms [1, 2, 5, 6] provide mechanisms for auto-scaling microservices, they expect application owners to specify thresholds for various microservice load metrics to enable auto-scaling features. For example, the auto-scaling feature [7] in Kubernetes determines the allocation of containers/pods to a microservice by using the formula:

    desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)    (1)

If the desiredMetricValue (threshold) is specified as an average CPU utilization of 50% for a particular microservice, and the current average CPU utilization is 100%, then the number of pods allocated to that microservice will be doubled. Furthermore, any scaling is performed only if the ratio of currentMetricValue to desiredMetricValue drops below 0.9 or rises above 1.1 (a 10% tolerance by default).

It is challenging and burdensome for application owners to determine the resource utilization thresholds for various microservices in order to meet the application's end-to-end performance target. Setting inappropriate thresholds may lead to overprovisioning or underprovisioning of resources. We propose that cloud platforms should automatically determine these thresholds based on user-provided performance SLO targets. For this purpose, we study the feasibility of utilizing the proposed performance models in making efficient resource scaling decisions by formulating a constrained nonlinear optimization problem.

A) Problem Formulation. Consider that the performance SLO target, in terms of the end-to-end tail latency, is specified for a workflow. For a given workload condition, we aim to find the highest resource utilization values of the relevant microservices at which the given SLO targets will not be violated. These optimal utilization values can be calculated periodically and set as the thresholds (desiredMetricValue) for making resource scaling decisions. These thresholds help determine which microservices should be scaled, and how many pods should be allocated to each microservice based on Equation 1. This approach aims to avoid resource overprovisioning while providing a performance guarantee to the given workflow.

Table II: Notation used in the Resource Scaling Optimization Problem

    Symbol        Description
    Sj            Set of microservices relevant to workflow j
    SLOj^target   Tail latency target of workflow j
    xi            Average pod-level CPU utilization in microservice i
    x             Vector of the average pod-level CPU utilizations of the various microservices relevant to the target workflow
    rj(x)         Predicted tail latency of workflow j

We formulate the optimization problem as follows:

    max  Σ_{i∈Sj} xi                  (2)
    s.t. rj(x) ≤ SLOj^target          (3)
         x = (xi)_{i∈Sj}              (4)

where the symbol notations are described in Table II. The objective function in Equation 2 aims to maximize the pod-level resource usage, i.e., the sum of the average CPU utilizations of the set of microservices that are relevant to the target workflow. The relevance of a microservice to a workflow can be determined either by analyzing the workflow DAG, or through machine learning based feature selection as described in Section VI-B. Consider that rj(x) is the tail latency predicted by the machine learning model for workflow j. The inequality constraint in Equation 3 ensures that the SLO target of workflow j will not be violated. The optimization problem is nonlinear, since the workflow tail latency rj(x) in the constraint of Equation 3 has a nonlinear relationship with the average CPU utilizations of the various microservices.

In the formulation of the optimization problem, application-layer metrics (e.g., the number of concurrent clients), VM-level CPU utilization, and CPI metrics are not included as variables, although the tail latency prediction rj(x) depends on these metrics as well. Instead, the values of these metrics are fixed according to their observed values at the time of solving the optimization problem, and are treated as constants for that instance of optimization. As a result, the solutions to the optimization problem will only include pod-level CPU utilization values, which can be directly used as thresholds for making resource scaling decisions. This keeps the resource scaling mechanism practical and simple to implement.

B) Solution. We apply a non-linear optimization technique, the trust-region interior point method [13, 20], to solve this problem. This optimization technique provides two main benefits. First, it is efficient for large-scale problems. Second, the gradient of the constraint function, which is required for optimization, can be approximated through finite difference methods in this optimization technique [13]. This property is desirable since the machine learning models for workflow tail latency are black-box functions, whose gradient cannot be directly calculated.

C) Feasibility Study. As a case study, we apply the optimization technique to calculate the desired CPU utilizations (thresholds) for the various relevant microservices, when a workload of 30 concurrent clients is applied to the SockShop benchmark and a performance SLO target of 240 ms is specified for the 95th percentile latency of the orders workflow. For this optimization, we utilize our Neural Network model for the orders workflow, with pod-level CPU utilization, VM-level CPI metrics, and the number of concurrent clients as the input features.
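To make the formulation concrete, the sketch below solves a toy instance of Equations 2-4 by exhaustive grid search. The toy predictor stands in for the learned NN model rj(x), and the grid search stands in for the trust-region interior-point solver used in the paper; all constants and names here are illustrative assumptions, not the paper's implementation.

```python
from itertools import product

def solve_thresholds(predict_latency, slo, n_services, grid=None):
    # Maximize total pod-level CPU utilization (Eq. 2) subject to the
    # predicted tail latency staying within the SLO target (Eq. 3).
    # Exhaustive search over a coarse utilization grid stands in for
    # the trust-region interior-point solver used in the paper.
    if grid is None:
        grid = [round(0.1 * k, 1) for k in range(1, 10)]  # 10%..90%
    best, best_sum = None, -1.0
    for x in product(grid, repeat=n_services):
        if predict_latency(x) <= slo and sum(x) > best_sum:
            best, best_sum = x, sum(x)
    return best

# Hypothetical stand-in for the learned predictor r_j(x): tail latency
# grows nonlinearly with the utilization of each relevant microservice.
def toy_model(x):
    return 80.0 + 120.0 * sum(u ** 3 for u in x)

# SLO target of 240 ms, three relevant microservices.
thresholds = solve_thresholds(toy_model, slo=240.0, n_services=3)
print(thresholds)  # a feasible utilization vector with maximal total
```

The returned utilization vector plays the role of the per-microservice desiredMetricValue thresholds; a real deployment would re-solve periodically as the observed workload metrics (held constant here, as in the paper's formulation) change.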

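The Kubernetes scaling rule of Equation 1, together with the default 10% tolerance band described earlier, can be sketched as follows (a simplified model of the HPA algorithm, not actual Kubernetes source code):

```python
import math

def desired_replicas(current_replicas, current_metric, desired_metric,
                     tolerance=0.1):
    # Equation 1: desiredReplicas =
    #   ceil(currentReplicas * currentMetricValue / desiredMetricValue),
    # skipped while the metric ratio stays inside the +/-10% band.
    ratio = current_metric / desired_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: no scaling action
    return math.ceil(current_replicas * ratio)

# One pod at 100% average CPU against a 50% threshold -> scale to 2 pods.
print(desired_replicas(1, 100.0, 50.0))  # 2
```

With the thresholds produced by the optimization above as desiredMetricValue, this rule yields the per-microservice pod counts directly, which is how Figure 10 (b)'s suggested configuration is derived.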
[Figure 10 (a): bar chart of average CPU utilization (%) per microservice, comparing desired vs. measured values.]

(a) Current vs. desired average CPU utilization of various microservices. Here, one pod is allocated to each microservice.

[Figure 10 (b): 95th percentile latency (ms) for each resource scaling configuration (cart, orders, frontend), with the SLO target and the measured latency marked.]

(b) Tail latency of the orders workflow for various resource scaling configurations. The configuration suggested by the optimization of CPU utilization thresholds is (1,1,2), i.e., one pod for cart, one pod for orders, and two pods for frontend. All other microservices are provisioned with one pod.

Figure 10: Optimization of CPU utilization thresholds for efficient resource scaling with a workload of 30 concurrent clients and an SLO target of 240 ms for the 95th percentile latency of the orders workflow.

Figure 10 (a) compares the current (measured) CPU utilization of the microservices relevant to the orders workflow with their desired CPU utilization values, when only one pod is allocated to each microservice. Based on Equation 1, the optimal resource scaling option is to allocate an additional pod to the frontend microservice. As shown in Figure 10 (b), we validate the optimality of this resource scaling option by comparing the tail latency of the orders workflow for the various possible resource scaling configurations. We observe that the resource scaling configuration suggested by our optimization technique is able to meet the performance SLO target while allocating the minimum total number of pods.

VIII. CONCLUSIONS AND FUTURE WORK

We present the performance characterization and modeling of containerized microservices in the cloud. Our modeling approach utilizes machine learning and multi-layer data collected from the cloud environment to predict the end-to-end tail latency of microservice workflows, even in the presence of cloud-induced performance interference. We also demonstrate the feasibility of utilizing the proposed models in making efficient resource scaling decisions. We envision that our performance modeling and resource scaling optimization approach can enable cloud platforms to automatically scale microservice-based applications based on user-provided performance SLO targets. This will remove from cloud users the burden of determining resource utilization thresholds for numerous microservices, which is prevalent in existing cloud platforms. In future work, we will extend our work to include diverse microservice-based applications with different resource bottlenecks. We will also evaluate the effectiveness of the proposed resource scaling system in the face of dynamic workloads.

ACKNOWLEDGMENT

Results presented in this paper were obtained using the Chameleon testbed supported by the National Science Foundation. The research is partially supported by NSF CREST Grant HRD-1736209. We thank the anonymous reviewers for their many suggestions for improving this paper. In particular we thank our shepherd, Prof. Maarten van Steen.

REFERENCES

[1] Amazon Elastic Container Service. https://aws.amazon.com/ecs/.
[2] Amazon Elastic Container Service for Kubernetes. https://aws.amazon.com/eks/.
[3] Chameleon: A configurable experimental environment for large-scale cloud research. https://www.chameleoncloud.org.
[4] Docker stats. https://docs.docker.com/engine/reference/commandline/stats/.
[5] Google App Engine flexible environment. https://cloud.google.com/appengine/docs/flexible/.
[6] Google Kubernetes Engine. https://cloud.google.com/kubernetes-engine/.
[7] Kubernetes horizontal pod autoscaling. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details.
[8] Kubernetes: Production-grade container orchestration. https://kubernetes.io/.
[9] Locust: An open source load testing tool. https://locust.io.
[10] Microservices: an application revolution powered by the cloud. https://azure.microsoft.com/en-us/blog/microservices-an-application-revolution-powered-by-the-cloud/.
[11] Project Calico. https://www.projectcalico.org/.
[12] Scikit-learn: Machine learning in Python. http://scikit-learn.org/stable/.
[13] SciPy optimization library. https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.
[14] Sock Shop microservices demo application. https://microservices-demo.github.io.
[15] virt-top. https://linux.die.net/man/1/virt-top.

[16] Weave Scope. https://www.weave.works/docs/scope/latest/introducing/.
[17] C. M. Aderaldo, N. C. Mendonça, C. Pahl, and P. Jamshidi. Benchmark requirements for microservices architecture research. In IEEE/ACM 1st International Workshop on Establishing the Community-Wide Infrastructure for Architecture-Based Software Engineering (ECASE), 2017.
[18] A. Balalaie, A. Heydarnoori, and P. Jamshidi. Microservices architecture enables DevOps: Migration to a cloud-native architecture. IEEE Software, 33(3), 2016.
[19] S. Barakat. Monitoring and analysis of microservices performance. Journal of Computer Science and Control Systems, 10:19-22, May 2017.
[20] R. H. Byrd, M. E. Hribar, and J. Nocedal. An interior point algorithm for large-scale nonlinear programming. SIAM J. on Optimization, 9(4):877-900, Apr. 1999.
[21] X. Chen, L. Rupprecht, R. Osman, P. Pietzuch, F. Franciosi, and W. Knottenbelt. CloudScope: Diagnosing and managing performance interference in multi-tenant clouds. In 2015 IEEE 23rd International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), 2015.
[22] N. Dragoni, S. Giallorenzo, A. L. Lafuente, M. Mazzara, F. Montesi, R. Mustafin, and L. Safina. Microservices: yesterday, today, and tomorrow. In Present and Ulterior Software Engineering, pages 195-216. Springer, 2017.
[23] S. Eranian. perfmon2: the hardware-based performance monitoring interface for Linux. http://perfmon2.sourceforge.net/.
[24] M. Fazio, A. Celesti, R. Ranjan, C. Liu, L. Chen, and M. Villari. Open issues in scheduling microservices in the cloud. IEEE Cloud Computing, 3(5):81-88, 2016.
[25] I. Giannakopoulos, D. Tsoumakos, and N. Koziris. Towards an adaptive, fully automated performance modeling methodology for cloud applications. In IEEE International Conference on Cloud Engineering (IC2E), 2018.
[26] M. Gribaudo, M. Iacono, and D. Manini. Performance evaluation of massively distributed microservices based applications. In European Council for Modelling and Simulation (ECMS), 2017.
[27] Z. Guo, S. McDirmid, M. Yang, L. Zhuang, P. Zhang, Y. Luo, T. Bergan, M. Musuvathi, Z. Zhang, and L. Zhou. Failure recovery: When the cure is worse than the disease. In 14th Workshop on Hot Topics in Operating Systems (HotOS), Santa Ana Pueblo, NM, 2013. USENIX.
[28] V. Jalaparti, P. Bodik, S. Kandula, I. Menache, M. Rybalkin, and C. Yan. Speeding up distributed request-response workflows. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, 2013.
[29] D. Jiang, G. Pierre, and C.-H. Chi. Autonomous resource provisioning for multi-service web applications. In Proceedings of the 19th ACM International Conference on World Wide Web (WWW), 2010.
[30] G. Kakivaya, L. Xun, R. Hasha, S. B. Ahsan, T. Pfleiger, R. Sinha, A. Gupta, M. Tarta, M. Fussell, V. Modi, M. Mohsin, R. Kong, A. Ahuja, O. Platon, A. Wun, M. Snider, C. Daniel, D. Mastrian, Y. Li, A. Rao, V. Kidambi, R. Wang, A. Ram, S. Shivaprakash, R. Nair, A. Warwick, B. S. Narasimman, M. Lin, J. Chen, A. B. Mhatre, P. Subbarayalu, M. Coskun, and I. Gupta. Service Fabric: A distributed platform for building microservices in the cloud. In Proceedings of the Thirteenth EuroSys Conference, 2018.
[31] P. Lama and X. Zhou. Autonomic provisioning with self-adaptive neural fuzzy control for percentile-based delay guarantee. ACM Transactions on Autonomous and Adaptive Systems, 31 pages, under 2nd reviewing after revision, 2011.
[32] J. Li, N. K. Sharma, D. R. K. Ports, and S. D. Gribble. Tales of the tail: Hardware, OS, and application-level sources of tail latency. In Proceedings of the ACM Symposium on Cloud Computing (SoCC), 2014.
[33] J. D. McCalpin. Memory bandwidth and machine balance in current high performance computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, 2(19-25), 1995.
[34] N. Meinshausen and P. Bühlmann. Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4):417-473, Aug. 2010.
[35] D. Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239):2, 2014.
[36] H. Nguyen, Z. Shen, X. Gu, S. Subbiah, and J. Wilkes. AGILE: Elastic distributed resource scaling for infrastructure-as-a-service. In Proceedings of the 10th International Conference on Autonomic Computing (ICAC), 2013.
[37] J. Rao and C.-Z. Xu. Online capacity identification of multi-tier websites using hardware performance counters. IEEE Trans. on Parallel and Distributed Systems, 2009.
[38] R. Singh, U. Sharma, E. Cecchet, and P. Shenoy. Autonomic mix-aware provisioning for non-stationary data center workloads. In Proc. IEEE Int'l Conf. on Autonomic Computing (ICAC), pages 21-30, 2010.
[39] L. Suresh, P. Bodik, I. Menache, M. Canini, and F. Ciucu. Distributed resource management across process boundaries. In Proceedings of the 2017 Symposium on Cloud Computing (SoCC '17). ACM Press, 2017.
[40] B. Urgaonkar, G. Pacifici, P. Shenoy, M. Spreitzer, and A. Tantawi. An analytical model for multi-tier internet services and its applications. In Proceedings

of the ACM SIGMETRICS International Conference
on Measurement and Modeling of Computer Systems,
2005.
[41] B. Urgaonkar, P. Shenoy, A. Chandra, P. Goyal, and
T. Wood. Agile dynamic provisioning of multi-tier
internet applications. ACM Trans. Auton. Adapt. Syst.,
3(1), Mar. 2008.
[42] M. Wajahat, A. Gandhi, A. Karve, and A. Kochut.
Using machine learning for black-box autoscaling.
In 2016 Seventh International Green and Sustainable
Computing Conference (IGSC), 2016.
[43] L. Wang, J. Xu, H. A. Duran-Limon, and M. Zhao.
QoS-driven cloud resource management through fuzzy
model predictive control. In IEEE International Con-
ference on Autonomic Computing (ICAC), 2015.
[44] Y. Xu, Z. Musgrave, B. Noble, and M. Bailey. Bobtail:
Avoiding long tails in the cloud. In Presented as part
of the 10th USENIX Symposium on Networked Systems
Design and Implementation (NSDI 13), 2013.
[45] Q. Zhang, L. Cherkasova, and E. Smirni. A regression-
based analytic model for dynamic resource provision-
ing of multi-tier Internet applications. In Proc. IEEE
Int’l Conference on Autonomic Computing (ICAC),
2007.
[46] Y. Zhang, D. Meisner, J. Mars, and L. Tang. Treadmill:
Attributing the source of tail latency through precise
load testing and statistical inference. In ACM/IEEE
43rd Annual International Symposium on Computer
Architecture (ISCA), 2016.
