
2024 IEEE 40th International Conference on Data Engineering (ICDE)

Resource Allocation with Service Affinity in Large-Scale Cloud Environments

Zuzhi Chen∗, Fuxin Jiang∗, Binbin Chen∗, Yu Li∗, Yunkai Zhang†, Chao Huang∗, Rui Yang∗,
Fan Jiang∗, Jianjun Chen∗, Wu Xiang∗, Guozhu Cheng∗, Rui Shi∗, Ning Ma‡, Wei Zhang§, Tieying Zhang∗ (Corresponding Author)

∗ByteDance Inc.  †University of California, Berkeley  ‡Xi'an Jiaotong University  §South China University of Technology
∗{chenzuzhi, jiangfuxin, chenbinbin.1996, liyu.xjtu1998, huangchao.thss15, yangrui.emma, jiangfan.2017, jianjun.chen, xiangwu, chengguozhu, shirui, tieying.zhang}@bytedance.com, †yunkai [email protected], ‡[email protected], §[email protected]

979-8-3503-1715-2/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICDE60146.2024.00397

Abstract—Containerization has garnered substantial favor among cloud service providers. Nevertheless, the notable network overhead incurred between containers has prompted concerns within the community. In cloud resource scheduling, collocating service containers that frequently communicate on the same machine – termed "service affinity" – is instrumental in enhancing application performance. In response to this concern, we present a solution that harnesses service affinity and collocates containers to enhance overall system performance and stability. To maximize the benefits of collocating containers, it is necessary to compute a new schedule that optimally and efficiently maximizes service affinity, especially within the expansive domain of industry-scale cloud environments. In pursuit of this, we leverage the skewness property of affinity and machine learning to fuse solver-based algorithms, thereby assuring both quality and efficiency for problems at scale. Our methodology encompasses the partitioning of a given task into discrete subproblems, with a keen focus on resolving the most critical ones. Via a graph neural network classifier, we assign each subproblem to be solved independently using methods based on off-the-shelf solvers in our algorithm pool – namely, MIP-based or column generation. This strategic approach enables the efficient computation of a schedule for a cloud cluster that fully optimizes the overall service affinity. We further propose a heuristic algorithm to compute executable container migration plans for practical use, facilitating the transition to the new placement where service affinity is well optimized. Our solution has been deployed in our large-scale production environment, covering over a million cores within ByteDance. Through this successful real-world production deployment, our approach exhibits an average improvement in end-to-end latency of 23.75% and a reduction in request error rates of 24.09% compared to the original system.

Index Terms—resource allocation, service affinity, optimization, solver

I. INTRODUCTION

Containerization has been widely adopted in cloud computing due to its scalability, lightweight nature, fault isolation, and various other advantages [1]–[4]. One of the most popular applications of containerization is the microservice architecture, widely embraced by major Internet companies. In a microservice architecture, applications often consist of dozens to hundreds of containerized services, each encompassing a specific functionality. These services are highly interconnected and engage in frequent data exchange. Despite the many benefits of containerization, one primary concern that troubles many companies is the substantial network overhead incurred by frequent remote calls between services. Furthermore, a recent trend has witnessed a shift from merely containerizing computing components to also migrating database and storage components into containers [5], [6]. These include NoSQL databases such as caching components like Redis [7], [8] and message queues like Kafka [9]. These data systems find extensive use in various data-intensive applications, where minimizing latency is paramount for enhancing user experiences. To mitigate this issue, one solution is to optimize the scheduling of containers so as to maximize the traffic that can be exchanged within the local machine, which can greatly reduce network overhead and enhance service performance.

The scheduling of containers¹ to physical machines is a fundamental resource allocation challenge in cloud computing [10]–[12]. Consider two services in a microservice cluster: when the collocation of their containers on the same machine can yield benefits, we refer to these services as having an affinity relation. Collocating containers with an affinity relation yields multiple advantages, including the potential for network bandwidth savings [10], [13], enhanced service performance [10], [14], and reduced resource consumption [15], [16].

[Fig. 1: Comparison of collocating and not collocating two containers with a service affinity relation: (a) end-to-end latency; (b) stability – error rate.]

Fig. 1(a) and (b) demonstrate the benefits to service performance and stability of collocating service containers with an affinity relation on the same machine. By leveraging inter-process communication (IPC) between collocated containers

¹For convenience, in this paper, the term container is extended to refer to any object that can be assigned to machines in the cloud, like "Container" in Yarn and "Pod" in Kubernetes.
instead of remote procedure calls (RPC) over the network, we can significantly reduce network latency associated with network I/O, minimize data transfer overhead between different hosts, and lower request error rates related to network congestion, packet loss, or connectivity issues.

Why Not Consolidation? We define consolidation as the process of merging several services into one, and collocation as the deployment of containers from different services on the same machine. Network efficiency is undoubtedly higher under consolidation, as it entirely eliminates network costs. However, extensively adopting consolidation is often infeasible in practice for the following reasons:
• Microservices with different programming languages and compilation environments cannot be easily consolidated. Merging different microservices into a single entity requires substantial effort to deal with various compatibility, environment, and service level agreement (SLA) challenges, which is very time-consuming.
• Consolidation undermines the advantages of the microservice architecture. It intensifies the coupling between services, thereby limiting the flexibility that containerization could offer in disaster recovery, independent scaling and upgrades, and other operational aspects.
On the other hand, collocation does not encounter such complexities. It retains the independence of microservices and requires only the infrastructure team to implement.

In this paper, we propose Resource Allocation with Service Affinity (RASA), a constrained optimization problem that aims to find a container-to-machine mapping maximizing an objective function that accurately characterizes the overall utility of collocating containers. Section II defines this objective function as the total gained affinity.

Despite the aforementioned benefits, collocation comes with associated challenges that need to be addressed before implementing it in practice. There are two main challenges:
• The first challenge lies in mathematically defining the problem and accurately modeling the concept of affinity so as to optimize it effectively. Prior studies [10], [11], [13] simply treat affinity as a Boolean relation, overlooking its full potential for optimization. In our work, we propose a model that approximates affinity based on traffic and represents the affinity relation as a graph. By transforming the optimization of affinity into a scheduling problem, we can effectively enhance network performance.
• The second challenge is how to efficiently solve the proposed scheduling problem in a large-scale cloud environment. In the context of an optimization algorithm, "solution quality" pertains to the objective value of the solution, while "time efficiency" relates to the algorithm's running time. The optimization of a schedule that maximizes actual benefits necessitates that the algorithm ensures both solution quality and time efficiency. However, in practical scenarios, such as those encountered at ByteDance, where a single cluster can comprise thousands of services and machines, efficient heuristics often yield solutions with low overall affinity, while sophisticated solver-based algorithms may take days or even weeks to produce high-quality solutions. Consequently, ensuring both quality and efficiency in large-scale RASA problems poses a significant challenge.

In this paper, we present a comprehensive solution for optimizing the affinity of a large-scale containerized cluster. Our solution revolves around consistently optimizing container placement within the given cluster. Our contributions are summarized as follows:
• Present the problem definition and mathematical formulation of RASA. We formally define and mathematically formulate the problem of optimizing service affinity (termed RASA). This includes defining service affinity precisely, framing it as a scheduling problem with typical constraints, and introducing the objective function.
• Propose a novel algorithm to efficiently optimize the schedule of industrial-scale clusters². Our algorithm is a three-phase approach. In the first phase, we analyze service affinities as a graph, utilizing affinity skewness and graph partitioning techniques to identify key subproblems. This significantly reduces the scale of the problems at hand. In the second phase, we employ graph learning to select the appropriate solver-based approach to strike a balance between quality and efficiency, enabling us to address industrial-scale clusters previously deemed intractable. In the final phase, we propose a migration path algorithm to calculate the order of container deletions and creations necessary for transitioning the containers' placement to align with the new mapping.
• Show the advantages of our solution via extensive evaluations in both experimental and production environments. In our experiments, our algorithm on average not only outperforms the state-of-the-art by 17.66% in terms of the optimization objective function, known as total gained affinity, but also achieves this with much less computation time. This showcases that the RASA algorithm excels in both quality and efficiency. In our real-world evaluations, our solution is integrated with Kubernetes and deployed in production clusters with over one million cores, resulting in a 23.75% reduction in latency and a 24.09% decrease in request errors. This demonstrates that our solution effectively enhances service performance and cluster stability.

²The source code of our RASA algorithm and the datasets are available in the GitHub repository [17].

II. PROBLEM FORMULATION

A. The Basics

Given a cluster, assume there are $N$ services and $M$ machines. Let $\mathcal{S}$ and $\mathcal{M}$ represent the sets of services and machines, respectively. To meet the SLA (Service Level Agreement), each service $s \in \mathcal{S}$ needs to instantiate $d_s$ homogeneous containers in this cluster. Fig. 2(a) illustrates the fundamental concepts of services, containers, and machines. It is important to note that the concept of affinity discussed here pertains to the service-to-service level rather than the service-to-machine level. In concise terms, this concept is referred

to as "service affinity." This paper focuses on service affinity resulting from frequent data communication between services. Fig. 2(b) illustrates the affinity relations between services.

[Fig. 2: Illustration of key concepts: (a) container, service, and machine; (b) an affinity graph.]

B. Modelling the Affinity

To describe the complex affinity topology among services, we define affinity graphs. To quantify the utility we obtain from service affinity, we introduce the concepts of total affinity and gained affinity. Affinity is an abstract concept. In this paper, since affinity arises from frequent data communication between services, we can intuitively use the volume of traffic between two services as the affinity between them, aiding reader comprehension.

An affinity graph is a weighted undirected graph $G = \langle V, E \rangle$, where each vertex of $V$ represents a service, as illustrated in Fig. 2(b). If $(u, v) \in E$, then the services $u$ and $v$ have an affinity relation.

The weight of an edge between two services describes the degree of their affinity. A higher weight between two services indicates that collocating their containers yields more benefits. The values of the weights should be explicitly designed for the corresponding scenario. Once again, in this paper, the affinity we focus on arises from frequent data communication between services. Therefore, in our real-world experiments, we use a metrics monitoring system to track the volume of traffic between any two services within a given cluster, and we use this volume of traffic as the weight of the edge between the services.

We call the total weight of an affinity graph $G$ the total affinity. For simplicity, we normalize the total affinity to 1.0. As a comparison, we quantify the realized utility from a given mapping of containers to machines as the gained affinity. As a concrete example, in this paper, since the affinity we consider is defined to encourage frequent data communication among services, utility is the amount of traffic that can be localized on each machine. In our case, we consider the gained affinity to be the maximum amount of traffic that can be shared within the same machine under traffic load balancing [18]. Here, we present its formal definition.

Definition 1 (The Gained Affinity): Consider a machine $m \in \mathcal{M}$. If the affinity between services $s$ and $s'$ is $w_{s,s'}$, and the numbers of containers that the two services schedule on machine $m$ are $x_{s,m}$ and $x_{s',m}$ respectively, then the gained affinity of services $s$ and $s'$ on machine $m$ is

$$a_{s,s',m} = w_{s,s'} \cdot \min\left(\frac{x_{s,m}}{d_s}, \frac{x_{s',m}}{d_{s'}}\right). \tag{1}$$

The overall gained affinity is the sum of all $a_{s,s',m}$, $\forall (s,s') \in E$ and $m \in \mathcal{M}$, where $\mathcal{M}$ is the set of all machines.

The gained affinity between services $s$ and $s'$ is denoted as $\sum_{m \in \mathcal{M}} a_{s,s',m}$, which signifies the maximum ratio of traffic between $s$ and $s'$ that can be transferred within the same machine. For instance, in Fig. 2(a), consider $s$ as Service A and $s'$ as Service B. The gained affinity between Service A and B is 50%, indicating that up to 50% of the traffic can be transferred within the same machine (marked with a red dashed line). We refer to this as localized traffic. The more traffic between Service A and Service B that is localized, the more requests contribute to reducing latency and error rates between these two services. To optimize service affinity, it is natural to consider the overall gained affinity as the objective function for optimization, since it quantifies the actual benefits.

In practice, there is flexibility to fine-tune affinity for better optimization. For instance, the cluster manager can set up multiple priority levels and ask each microservice developer to specify the priority of network performance for their services. If the priority is high, then we assign a higher weight to the traffic as the affinity of their services; otherwise, we assign a lower weight to the traffic as the affinity of their services towards other services.
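To make Definition 1 concrete, the following sketch computes Eq. (1) and the overall gained affinity for a toy placement mirroring Fig. 2(a). This is our illustration, not code from the paper's repository [17]; the data layout (plain dicts keyed by service and machine names) is an assumption.

```python
def gained_affinity(x, w, d):
    """Overall gained affinity of a placement.

    x: dict (service, machine) -> number of containers placed there
    w: dict frozenset({s, s2}) -> affinity weight w_{s,s'}
    d: dict service -> total container count d_s
    """
    machines = {m for (_, m) in x}
    total = 0.0
    for pair, weight in w.items():
        s, s2 = tuple(pair)
        for m in machines:
            # Eq. (1): a_{s,s',m} = w_{s,s'} * min(x_{s,m}/d_s, x_{s',m}/d_{s'})
            total += weight * min(x.get((s, m), 0) / d[s],
                                  x.get((s2, m), 0) / d[s2])
    return total

# Services A and B have two containers each; one container of each shares
# machine m1, so 50% of the A-B traffic is localized, as in Fig. 2(a).
x = {("A", "m1"): 1, ("A", "m2"): 1, ("B", "m1"): 1, ("B", "m3"): 1}
print(gained_affinity(x, {frozenset({"A", "B"}): 1.0}, {"A": 2, "B": 2}))  # 0.5
```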
C. Formulation of RASA

RASA is an optimization problem involving the scheduling of containers to machines in order to maximize a chosen utility function while satisfying various constraints. Let $x$ be a matrix of size $N \times M$, where $x_{s,m}$ denotes the number of service $s$'s containers assigned to machine $m$. The RASA problem is to maximize the utility by optimizing $x$ (notations are summarized in Tab. I):

$$\max_{a,\,x} \quad \sum_{(s,s') \in E} \sum_{m \in \mathcal{M}} a_{s,s',m} \tag{2}$$
$$\text{s.t.} \quad \sum_{m \in \mathcal{M}} x_{s,m} = d_s, \quad \forall s \in \mathcal{S} \tag{3}$$
$$\sum_{s \in \mathcal{S}} x_{s,m} \cdot R^{S}_{r,s} \le R^{M}_{r,m}, \quad \forall r \in \mathcal{R},\ m \in \mathcal{M} \tag{4}$$
$$\sum_{s \in A_k} x_{s,m} \le h_k, \quad \forall A_k \in \mathcal{A},\ m \in \mathcal{M} \tag{5}$$
$$b_{s,m} \cdot d_s \ge x_{s,m}, \quad \forall s \in \mathcal{S},\ m \in \mathcal{M} \tag{6}$$
$$w_{s,s'} \cdot \frac{x_{s,m}}{d_s} \ge a_{s,s',m}, \quad \forall (s,s') \in E,\ m \in \mathcal{M} \tag{7}$$
$$w_{s,s'} \cdot \frac{x_{s',m}}{d_{s'}} \ge a_{s,s',m}, \quad \forall (s,s') \in E,\ m \in \mathcal{M} \tag{8}$$
$$x_{s,m} \in \mathbb{N}, \quad \forall s \in \mathcal{S},\ m \in \mathcal{M}. \tag{9}$$

Here, $\{a, x\}$ are the decision variables, and $\{R^S, R^M, b, d, h, w\}$ are the given parameters. Constraint (9) restricts $x$ to non-negative integers.

TABLE I: Summary of notations

| Notation | Description |
|---|---|
| $\mathcal{S}$ | Set of all services, where $N = |\mathcal{S}|$ |
| $\mathcal{M}$ | Set of all machines, where $M = |\mathcal{M}|$ |
| $\mathcal{R}$ | Set of all resource types |
| $\mathcal{A}$ | Set of anti-affinity sets |
| $x_{s,m}$ | Number of containers service $s$ places on machine $m$ |
| $d_s$ | Number of containers for service $s$ |
| $R^{S}_{r,s}$ | Requested type-$r$ resource of each container of service $s$ |
| $R^{M}_{r,m}$ | Total type-$r$ resource of machine $m$ |
| $b_{s,m}$ | $b_{s,m} = 1$ if machine $m$ can host containers of service $s$; 0 otherwise |
| $h_k$ | Maximum number of containers of anti-affinity set $A_k \in \mathcal{A}$ that a single machine can host |
| $w_{s,s'}$ | Weight of edge $(s, s')$ in the affinity graph |
| $a_{s,s',m}$ | Gained affinity of $s$ and $s'$ on machine $m$ |

Optimization Objective. To maximize the actual benefits obtained from collocating containers, we use the overall gained affinity, defined in Definition 1, as the optimization objective function, as it quantifies the utility from collocation.

Constraints. A container can only be placed on a machine if it satisfies various scheduling constraints. In practical scenarios, we consider the following constraints:
• SLA constraints concern the need to create a sufficient number of service instances (i.e., $d_s$) for each service $s$ to adhere to the service level agreements (SLAs). The SLA constraint corresponds to (3) in the above formulation, where $d_s$ is a constant predefined by users.
• Resource constraints require that if we want to place a container on a machine, the requested resources of the container must not exceed the machine's available resources. In practical applications, multiple resource types need to be considered, including CPU, memory, network, and disk. We use $\mathcal{R}$ to represent the list of resource types. For $r \in \mathcal{R}$, we use $R^{S}_{r,s}$ to denote the requested $r$-th resource of a container of service $s$ and $R^{M}_{r,m}$ to denote the total $r$-th resource capacity of machine $m$. The resource constraints correspond to (4) in the formulation, which prevents the total requested resources of all containers hosted on a machine from exceeding the total available resources of that machine.
• Anti-affinity constraints state that, given a set of services $A_k \in \mathcal{A}$, for any machine the number of containers from the service set $A_k$ should not exceed a certain predefined threshold, denoted $h_k$. Anti-affinity constraints prevent too many containers with a certain feature from being concentrated on a single machine. These constraints are often designed for the purposes of disaster control, fault tolerance, isolation, and security. Note that if $A_k$ consists of only one service $s$, the constraint prevents service $s$ from placing too many containers on one machine. This is often referred to as service-to-machine anti-affinity. We use the set $\mathcal{A}$ to represent all anti-affinity sets, and the anti-affinity constraints are expressed by (5).
• Schedulable constraints determine whether the containers of a service $s$ can be hosted by a machine $m$. For a given cluster, we use a binary matrix $b_{N \times M}$ to represent the schedulable relations, i.e., machine $m$ can host a container of service $s$ if and only if $b_{s,m} = 1$. Schedulable constraints are commonly related to compatibility issues in practical applications. For example, if machine $m$ does not support the IPv4 network stack but service $s$ relies on the IPv4 protocol for communication, the deployment of containers of that service on that machine is not allowed, and we set $b_{s,m} = 0$. We abstract all such compatibility requirements as schedulable constraints and formulate them as (6).
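For readers who want to experiment with the formulation, the sketch below encodes (2)–(9) with the open-source PuLP modeler. This is a minimal illustration under assumed data structures (plain dicts), not the production implementation, which feeds the model to Gurobi [32] and is only practical at small scale (see Section IV-C1).

```python
import pulp

def solve_rasa_mip(S, M, E, w, d, R_s, R_m, resources, anti_sets, h, b):
    """Sketch of the MIP (2)-(9). S, M: service/machine lists; E: affinity
    edges (s, s2); w[(s, s2)]: edge weight; d[s]: container count;
    R_s[(r, s)] / R_m[(r, m)]: requested/total resources; anti_sets[k]: the
    anti-affinity set A_k capped by h[k]; b[(s, m)]: 1 if m can host s."""
    prob = pulp.LpProblem("RASA", pulp.LpMaximize)
    x = pulp.LpVariable.dicts("x", [(s, m) for s in S for m in M],
                              lowBound=0, cat="Integer")   # (9)
    a = pulp.LpVariable.dicts("a", [(s, s2, m) for (s, s2) in E for m in M],
                              lowBound=0)
    # Objective (2): maximize the overall gained affinity.
    prob += pulp.lpSum(a.values())
    for s in S:                           # SLA constraints (3)
        prob += pulp.lpSum(x[(s, m)] for m in M) == d[s]
    for r in resources:                   # resource constraints (4)
        for m in M:
            prob += pulp.lpSum(R_s[(r, s)] * x[(s, m)] for s in S) <= R_m[(r, m)]
    for k, A_k in enumerate(anti_sets):   # anti-affinity constraints (5)
        for m in M:
            prob += pulp.lpSum(x[(s, m)] for s in A_k) <= h[k]
    for s in S:                           # schedulable constraints (6)
        for m in M:
            prob += x[(s, m)] <= b[(s, m)] * d[s]
    for (s, s2) in E:                     # (7)-(8): linearize the min in Eq. (1)
        for m in M:
            prob += a[(s, s2, m)] <= (w[(s, s2)] / d[s]) * x[(s, m)]
            prob += a[(s, s2, m)] <= (w[(s, s2)] / d[s2]) * x[(s2, m)]
    prob.solve()
    return {(s, m): int(x[(s, m)].value()) for s in S for m in M}
```

Note that because the objective maximizes $a$ while (7)–(8) bound it from above, $a_{s,s',m}$ takes the value of the min term in Eq. (1) at optimality, so no explicit min operator is needed.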

[Fig. 3: Workflow of the entire system.]

III. SYSTEM OVERVIEW

A. System Design

In this section, we provide an overview of our system, which comprises three primary components, as illustrated in Fig. 3:
• Data Collector. For each cluster, a data collection program gathers information at a given moment. This includes the service list, machine list, current container deployments, and traffic metrics. This data forms the cluster state, serving as input for our RASA algorithm.
• Workflow-Controlling Periodic Task (CronJob). The CronJob is responsible for orchestrating the workflow of the entire system. It triggers data collection and the RASA algorithm, and manages container reallocation operations.
• RASA Algorithm. Our algorithm, referred to as the RASA algorithm, plays the central role. It determines the new mapping of containers to machines to maximize service affinity and computes the container migrations necessary to transition to the new cluster state.

The workflow for fully optimizing the cluster is as follows:
• First, the CronJob triggers the data collection module of the cluster, obtaining a cluster state that includes service information, machine details, and traffic data.
• Second, the CronJob triggers the decision-making program and feeds the cluster state to the RASA algorithm.
• Third, the RASA algorithm calculates a new container-to-machine mapping and a migration plan, which includes instructions for deleting and creating containers to align with the new mapping, and then returns them.
• Lastly, the CronJob reallocates the containers according to the migration plan.

Following the aforementioned process, a full optimization of the cluster is completed. However, in practice, the cluster's state may change for various reasons, such as application updates or user modifications. After these changes, the overall gained affinity may no longer be satisfactory. To address this, we continuously optimize the cluster by configuring the
CronJob to run every half an hour. This approach ensures that the overall gained affinity remains consistently high, allowing us to maximize the benefits of collocation.

B. Trade-Offs in Other Metrics

Our approach may trade off other cluster features, such as load balancing, as we reallocate containers to optimize service affinity. These trade-offs are inevitable in order to optimize network performance. However, in practice, we can effectively manage the compromises resulting from our approach.

First, the extent of the side effects our approach has on the overall cluster is not significant. As discussed in Section IV-B2, a few services account for the majority of the cluster's traffic, and we focus optimization and reallocation efforts on containers associated with these services. In practice, we observe that in each execution, less than 5% of the total containers are relocated. This minimal proportion of containers has negligible impact on the overall cluster.

Second, we have implemented extra mechanisms to prevent extreme cases on other metrics:
• Resource utilization: Load balance is maintained by the default scheduler of the cluster, and mild imbalance is acceptable. Even if our approach causes highly skewed loads on some machines, we have a rollback mechanism that rolls back the reallocation and utilizes the default scheduler to reschedule the containers on the skewed machines. Furthermore, to prevent these containers from causing excessive imbalance and churn again, we tag them as unschedulable for three days.
• Churn: Churn refers to the rate of container movements. The trade-off in churn is small for the following reasons: i) the maximum number of moved containers is relatively small; ii) we focus on optimizing stateless services in the cluster, which have a negligible moving cost; iii) in practice, the half-hourly CronJob only dry-runs if the gained affinity does not show a significant improvement (i.e., an improvement of over 3%) in the new schedule. Therefore, in practice, the real execution of the reallocation happens only a few times a day.
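The half-hourly loop of Section III-A together with the dry-run guard above can be summarized in a few lines. The four callables are placeholders for the cluster-specific integrations (data collector, RASA solver, affinity evaluator, and executor); their names and signatures are our assumptions, not APIs from the paper.

```python
import time

IMPROVEMENT_THRESHOLD = 0.03   # reallocate only if gained affinity improves >3%
PERIOD_SECONDS = 1800          # the CronJob runs every half an hour

def cronjob_tick(collect_state, run_rasa, evaluate_gained_affinity, execute):
    state = collect_state()                          # services, machines, traffic
    new_mapping, migration_plan = run_rasa(state)    # Section IV
    current = evaluate_gained_affinity(state["mapping"], state)
    proposed = evaluate_gained_affinity(new_mapping, state)
    if proposed < current * (1 + IMPROVEMENT_THRESHOLD):
        return "dry-run"             # improvement too small: avoid churn
    execute(migration_plan)          # batched delete/create commands (Alg. 2)
    return "executed"

def control_loop(*hooks):
    while True:
        cronjob_tick(*hooks)
        time.sleep(PERIOD_SECONDS)
```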
IV. RASA ALGORITHM

The RASA algorithm is the core component of our entire solution, as it determines the mapping of containers to machines, thereby directly influencing the actual benefits we can obtain. In this section, we provide an overview of our RASA algorithm and then dive into the finer details.

A. Algorithm Overview

We devised the mathematical programming formulation for RASA in Section II-C. A common approach is to employ off-the-shelf solvers to solve the formulation and obtain promising results; we refer to this as the solver-based approach. However, while the solver-based approach can achieve optimal optimization quality, it often suffers from an exponential time complexity [19]–[22] that is unacceptable for large-scale problems. To address this, we propose the RASA algorithm.

Fig. 3 shows an overview of our algorithm. We maintain a scheduling algorithm pool consisting of two solver-based methods – namely, column generation and MIP-based algorithms. The first step is service partitioning, where a multi-stage partitioning algorithm splits the input data and produces several subproblems. To achieve better optimality, the second step is algorithm selection, where a GCN-based classifier selects the most appropriate algorithm from the pool for each subproblem. From there, each subproblem is solved independently with its selected algorithm, and we then combine the solutions of the subproblems into an overall placement of containers. Finally, a heuristic algorithm is employed to calculate a migration path comprising batches of delete and create commands. This path facilitates the transition from the current mapping to the new mapping while ensuring compliance with SLA and resource requirements.

[Fig. 4: Workflow of the multi-stage service partitioning with subproblem representations (each leaf node) and an example.]

B. Service Partitioning

At ByteDance, each cluster comprises hundreds or even thousands of services. Computing an optimal solution that optimizes the overall gained affinity of such large-scale clusters can be time-consuming. Partitioning is a commonly used technique to deal with this. The most prevalent approach is equal partitioning, which divides the problem into homogeneous subproblems [23]–[26]. However, this method is not optimal for problems with skewed properties, where certain subproblems are more important and require greater attention, while others are less significant and can be ignored. To deal with this, we propose a multi-stage service partitioning technique, where sets of services are iteratively partitioned into more disjoint sets, each representing a subproblem. The procedure of multi-stage partitioning can be represented by a hierarchy tree, as illustrated in Fig. 4. We now dive into the features we consider and the algorithm we use at each stage.

1) Non-Affinity Partitioning: The first stage partitions the original service set into two disjoint sets, the affinity set and the non-affinity set. Services with no affinity relations with other services belong to the non-affinity set. The services in the non-affinity set can never contribute to the gained affinity, so collocating containers of these services is not necessary.

2) Master-Affinity Partitioning: The second stage sets apart the non-master services from the affinity set. We define the total affinity of a service $s$ as $T(s) = \sum_{s' \in N(s)} w_{s,s'}$, where $N(s)$ is the neighborhood of vertex $s$. Without loss of generality, we assume that the services
are indexed in order of decreasing total affinity. Given $\alpha \in [0, 1]$, we call the top $\alpha N$ services with the largest total affinity the master services, and call their complement the non-master services. The master service set and the non-master service set are disjoint. In Section V-B, we explain how we determine the value of $\alpha N$ for the master partitioning step in our real-world deployment.

Note that the master services are only a small subset of all services while taking up a large portion of the total affinity in several practical cases. A power-law function often approximates this skewed distribution of affinity.

Assumption 4.1: The total affinity of the $s$th service satisfies $T(s) \propto \frac{1}{s^{\beta}}$ for some constant $\beta > 1$, for all $s = 1, \cdots, N$.

Prior works provide both empirical [3], [27], [28] and theoretical [29]–[31] evidence that Assumption 4.1 holds in network applications, and it is further confirmed by our practical cluster data, as shown in Fig. 5. With this assumption, we prove the following lemma.³

Lemma 1: Under Assumption 4.1 with a power of $\beta > 1$, for any $\epsilon \in (0, 1]$, let $\gamma = (\beta - 1)(1 - \epsilon)$. Then the total affinity of the last $N - O(\ln^{1-\epsilon} N)$ services is bounded by $O\!\left(\frac{1}{\ln^{\gamma} N}\right)$.

Given any $\epsilon \in (0, 1]$, if we let $\alpha = O\!\left(\frac{\ln^{1-\epsilon} N}{N}\right)$, Lemma 1 implies that scheduling only the top $O(\ln^{1-\epsilon} N)$ services leads to just a small loss in the objective, which is $o(1)$. In other words, the set of non-master services can contribute only minimal affinity to the gained affinity. Thus we can ignore these services to greatly reduce the time complexity.

³The full proof can be found in [18].

[Fig. 5: Fitting exponential (rate 0.86) and power-law (exponent 1.56) distributions to the total affinity distribution of 40 services in a production cluster.]
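A sketch of the master/non-master split under this skewness property is shown below. The default ratio uses the empirical setting $\alpha = 45 \cdot (\ln^{0.66} N)/N$ reported in Section V-B; the data layout and function name are our illustration.

```python
import math

def split_master_services(total_affinity, alpha=None):
    """Split services into (master, non-master) sets by total affinity T(s).

    total_affinity: dict service -> T(s), the sum of incident edge weights.
    alpha: master ratio; defaults to 45 * ln^0.66(N) / N (Section V-B).
    """
    n = len(total_affinity)
    if alpha is None:
        alpha = 45 * math.log(n) ** 0.66 / n
    k = max(1, round(alpha * n))
    ranked = sorted(total_affinity, key=total_affinity.get, reverse=True)
    return set(ranked[:k]), set(ranked[k:])
```

By Lemma 1, dropping the non-master set loses only $o(1)$ of the objective while removing the vast majority of services from the optimization.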
3) Compatibility Partitioning: The third stage isolates services with different compatibility requirements, as defined by the matrix $b$. A machine $m$ is compatible with a service $s$ if and only if machine $m$ can host the containers of service $s$. Since services with no intersecting compatible machines can never be placed together, their containers can be scheduled separately with no loss in the objective. Compatibility partitioning is, in fact, the decomposition of the compatibility matrix $b$. For example, if $b = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$, then compatibility partitioning divides the services into two disjoint service sets, with $A$ and $B$ as their compatibility matrices, respectively. This stage partitions the master services into several even smaller disjoint service sets, without hurting optimality.
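Compatibility partitioning amounts to finding the connected components of the bipartite service–machine compatibility graph; the block-diagonal decomposition of $b$ falls out of the component labels. The union-find sketch below is one way to compute it; the representation and names are ours.

```python
from collections import defaultdict

def compatibility_partition(compatible):
    """Group services whose compatible machine sets transitively overlap.

    compatible: dict service -> set of machines it can run on (rows of b).
    Services in different groups share no machine, so they can be scheduled
    separately with no loss in the objective.
    """
    machine_to_services = defaultdict(list)
    for s, machines in compatible.items():
        for m in machines:
            machine_to_services[m].append(s)
    parent = {s: s for s in compatible}
    def find(s):                          # union-find with path halving
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s
    for services in machine_to_services.values():
        for s in services[1:]:            # services sharing a machine
            parent[find(s)] = find(services[0])   # land in the same block
    groups = defaultdict(set)
    for s in compatible:
        groups[find(s)].add(s)
    return list(groups.values())
```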
4) Loss-Minimization Balanced Partitioning: We end up with several service sets after the previous three stages. However, the scale of each set could still be massive. Thus, for each large service set $S_l$ at this stage, we further seek a balanced disjoint partition of the service set while minimizing the total affinity (weight) between different subsets. We refer to this as loss-minimization balanced partitioning. Here, "loss-minimization" means that the total affinity between services from different subsets is minimized. "Balanced" means that the numbers of services in different subsets are close. Specifically, we consider a partition balanced if the number of services in the largest subset does not exceed twice the number of services in the smallest subset after the partitioning.

To achieve this, we propose a heuristic that partitions the service set into balanced sets while minimizing the loss of affinity. Given a service set $S_l$ and its affinity graph $G_l$, we follow the process below $|E|$ times ($|E|$ is the number of edges of the graph $G_l$), generating a new partition each time:
i) Randomly sample $h$ services from $S_l$.
ii) From each of the $h$ services in the affinity graph $G_l$, apply the breadth-first search algorithm.
iii) For each service $s$ that is not among the $h$ sampled services, if it is first visited by $s'$ from the $h$ services, then in this partition, $s$ and $s'$ are placed in the same subset. This process results in $h$ disjoint service subsets of $S_l$, forming a partition of $S_l$.

After the above process, we obtain $|E|$ ways of partitioning the service set $S_l$. We first filter out the partitions not satisfying the balanced condition. Then, we select the partition that minimizes the loss of affinity between different subsets as the final partition for the service set $S_l$. The resulting partition demonstrates a clear balanced feature, while the loss-minimization aspect stems from the intuition that each subset in a good partition contains a set of services within a neighborhood in $G_l$. In large-scale industrial scenarios, this heuristic algorithm excels due to its simplicity and parallelizable nature, enabling efficient performance without significant loss of affinity after the partitioning process.
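A compact sketch of this multi-source BFS heuristic follows: each trial grows $h$ subsets from random seeds, unbalanced trials are discarded, and the surviving partition with the smallest cross-subset affinity wins. The parameter names (h, trials) and data layout are our assumptions.

```python
import random
from collections import deque

def bfs_partition(services, adj, w, h, trials):
    """Loss-minimization balanced partitioning (Section IV-B4).

    adj: dict service -> iterable of neighbors in the affinity graph G_l;
    w: dict frozenset({u, v}) -> edge weight; h: subsets per partition;
    trials: number of candidate partitions (|E| in the paper).
    """
    best, best_loss = None, float("inf")
    for _ in range(trials):
        seeds = random.sample(list(services), h)
        label = {s: i for i, s in enumerate(seeds)}
        queue = deque(seeds)
        while queue:                    # multi-source BFS: first visitor claims s
            u = queue.popleft()
            for v in adj[u]:
                if v not in label:
                    label[v] = label[u]
                    queue.append(v)
        sizes = [sum(1 for s in label if label[s] == i) for i in range(h)]
        if min(sizes) == 0 or max(sizes) > 2 * min(sizes):
            continue                    # not balanced: largest subset > 2x smallest
        # Affinity lost = total weight of edges crossing subset boundaries.
        loss = sum(weight for edge, weight in w.items()
                   if len({label.get(u) for u in edge}) > 1)
        if loss < best_loss:
            best, best_loss = label, loss
    return best                         # dict service -> subset index, or None
```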
5) Summary of Service Partitioning: To summarize, referring to Fig. 4, the services from the non-affinity set and the non-master-affinity set contain only a minimal amount of total affinity, according to the analyses in IV-B1 and IV-B2, and are therefore deemed trivial services. Conversely, the services from the descendant sets of the master-affinity set are considered crucial services, as they encompass most of the overall affinity. Note that under Assumption 4.1, the crucial services are relatively small in scale.

To construct subproblems, we first need to set aside the trivial services. Our method is to construct a new machine set: for a machine $m$ with total resource $R^{M}_{m}$, if a container of a trivial service $s$ is initially hosted by machine $m$, then we construct a new machine with total resource $R^{M}_{m} - R^{S}_{s}$. After we construct the new machine set, for each type of machine specification, a specific number of machines with that specification are assigned to each crucial service set, proportional to the ratio of the resources requested by that service set relative to the total resources requested by all crucial service sets. Each crucial service set and its assigned machines form a new subproblem. For trivial services, no further operations are needed. It is important to note that our algorithm may not be able to successfully deploy all containers in each subproblem. However, a small number of failed deployments is considered acceptable, as they will be managed by the default scheduler in the cluster.

C. Scheduling Algorithm Pool

After the service partitioning step, we end up with several subproblems. Meanwhile, our scheduling algorithm pool possesses two algorithms tailored to different types of problems: the MIP-based algorithm and the column generation algorithm. For each algorithm, we briefly describe how it works, its characteristics, and the features of its target subproblems.

1) MIP-Based Algorithm (MIP): RASA can naturally be formulated as a mixed integer program (MIP), as shown in Expressions (2)–(9). The MIP-based algorithm for solving RASA feeds this MIP formulation directly into an off-the-shelf mathematical programming solver [32], [33]. Note that the principles behind these solvers can be quite complicated and are thus out of the scope of this paper.⁴

Characteristics: The MIP-based algorithm guarantees an optimal solution (within a tolerance) but has a runtime exponential in the input size, rendering it acceptable only for small-scale problems and impractical for industry-scale applications.

Targets: If a subproblem is relatively small in scale yet has a significant total affinity, employing the MIP-based algorithm is a favorable option.

2) Column Generation Algorithm (CG): To illustrate the principles of the column generation algorithm, we introduce the concepts of patterns and the cutting stock formulation of RASA. A pattern $p \in \mathbb{N}^N$ represents a feasible placement of service containers on a machine, satisfying the resource, anti-affinity, and schedulable constraints. The cutting stock formulation is an equivalent formulation of the MIP formulation (Expressions (2)–(9)) of RASA. In this formulation, decision variables determine the pattern used by each machine. Given a pattern set $P_m$ of machine $m$, which consists of feasible patterns on machine $m$, let $p^{(l)} = \left(p^{(l)}_1, p^{(l)}_2, \ldots, p^{(l)}_N\right) \in P_m$ be the $l$th pattern of machine $m$, and let $y_{m,l}$ be a binary decision variable denoting whether the container placement on machine $m$ follows the pattern $p^{(l)}$ or not.

Algorithm 1 presents the framework of the column generation algorithm.⁵ In each iteration of the while loop, SolveCuttingStock solves the cutting stock formulation. Note that SolveCuttingStock relaxes the integrality constraints on the decision variables and produces a fractional solution $y$ for time efficiency. Then, GenPattern solves the formulation for generating feasible patterns and produces new patterns $P'$ for the next iteration. The algorithm repeats this process until no more patterns with negative reduced cost are found⁶, or the runtime exceeds the time-out parameter $t_{max}$ (i.e., IsTerminate). The final step is to Round $y$ to obtain an integral solution $x$.

Algorithm 1: Column Generation for RASA
  Input: Parameters of Expressions (2)–(9): $\{d, R^S, R^M, b, h, w\}$, $t_{max}$
  Output: Scheduling decision $x \in \mathbb{N}^{N \times M}$
  begin
    // Initialize patterns, let $P = \{P_1, \ldots, P_M\}$
1   $P_m \leftarrow \mathrm{diag}(b_{1,m}, \ldots, b_{N,m})$, $\forall m \in \mathcal{M}$
2   while not IsTerminate($y$, $P'$, $t_{max}$) do
      // Solve the cutting stock formulation
3     $y \leftarrow$ SolveCuttingStock($P$, $d$)
      // Generate new patterns
4     $P' \leftarrow$ GenPattern($d$, $R^S$, $R^M$, $b$, $h$, $P$)
5     $P \leftarrow P \cup P'$
6   $x \leftarrow$ Round($y$, $d$, $R^S$, $R^M$, $b$, $h$, $w$, $P$)
7   return $x$

In summary, the column generation algorithm solves RASA by iteratively generating patterns and solving the cutting stock formulation. The pattern generation process aims to improve the currently obtained patterns and generate high-quality patterns with a high gained service affinity. The cutting stock formulation generates the final solution by utilizing a small set of patterns. These techniques effectively reduce the problem size compared to solving the original MIP formulation, without significantly compromising optimality.

Characteristics: The column generation algorithm can solve large-scale MIPs and often performs efficiently in practice [37], [39]–[41]. Thus, we consider column generation to have sub-optimal optimization quality and an acceptable computation time.

Targets: For a subproblem of medium scale with non-negligible total affinity, the column generation algorithm is a promising option, since it strikes a good balance between efficiency and optimality.

3) Summary of Algorithm Pool: Considering the NP-hard nature of the RASA problem [18], the worst-case time complexity of both algorithms in our pool is exponential in the input size [18], [21]. However, in practice, these algorithms exhibit varying levels of efficiency and solution quality on different subproblems, depending on factors such as problem scale and affinity structure. By considering the features of each subproblem, assigning the most appropriate algorithm to it can ensure that the algorithm achieves both efficiency and quality.

⁴For more details, we refer readers to [34], [35] and [36].
⁵The omitted formulations of cutting stock and pattern generation can be found in our GitHub repository [17].
⁶For more details, please refer to [37] and [38].
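Stripped of the solver internals, the control flow of Algorithm 1 is a plain generate-and-solve loop. In the skeleton below, the three solver callbacks (cutting stock LP, pricing, rounding) are injected as parameters, since their formulations are omitted here as well [17]; all names are ours.

```python
import time

def column_generation(machines, hostable, d, t_max,
                      solve_cutting_stock, gen_patterns, round_solution):
    """Skeleton of Algorithm 1. hostable: dict machine -> set of services it
    can host (from b). Patterns are dicts service -> container count."""
    # Line 1: seed each machine's pool with trivial single-container patterns
    # (the diag(b_{1,m}, ..., b_{N,m}) initialization).
    patterns = {m: [{s: 1} for s in hostable[m]] for m in machines}
    deadline = time.monotonic() + t_max
    y = solve_cutting_stock(patterns, d)          # relaxed LP, fractional y
    while True:
        new = gen_patterns(patterns, y)           # pricing: negative reduced cost
        if not new or time.monotonic() >= deadline:   # IsTerminate
            break
        for m, pats in new.items():
            patterns[m].extend(pats)
        y = solve_cutting_stock(patterns, d)
    return round_solution(y, patterns)            # integral schedule x
```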
D. Algorithm Selection

With a set of subproblems after partitioning and the two scheduling algorithms, the next step is to select an appropriate algorithm for each subproblem. The algorithm selection takes a subproblem as input and uses a graph learning model to select the more appropriate algorithm between CG and MIP.

1) Graph Classifier: Empirically, selecting between CG and MIP should factor in the finer graphical structure of the subproblem. In this case, simple heuristics would fail, since it is practically infeasible to design rules that capture all the structural information of the input. Motivated by the surge of interest in graph learning [42]–[45] in recent years, we propose a classifier based on graph convolutional networks (GCN) to select the appropriate algorithm.

For a subproblem $k$ with a service set $S_k \subseteq \mathcal{S}$, let $G[S_k] = \langle S_k, E_k \rangle$ be the sub-graph induced by $S_k$ in the affinity graph $G$, and let $F_k$ be a matrix of size $N \times 2$ whose $s$th row $[r_s, d_s]$ represents the resource demand and the container count of service $s$. We define $\hat{G}_k = \langle S_k, E_k, F_k \rangle$ as the feature graph of subproblem $k$, which is used to select an algorithm for the subproblem.

Definition 2 (Graph Classification): Given a set of feature graphs $\mathcal{G} = \{\hat{G}_1, \hat{G}_2, \ldots, \hat{G}_K\}$ and their labels $\{\ell_1, \ell_2, \ldots, \ell_K\}$, where $\ell_k \in L = \{\mathrm{CG}, \mathrm{MIP}\}$, $\forall k \in [K]$, we need to learn a function $f : \mathcal{G} \rightarrow L$ such that $f(\hat{G}_k)$ approximates $\ell_k$, $\forall k \in \{1, 2, \ldots, K\}$.

We propose to parameterize $f$ with the following GCN model: given a subproblem's feature graph $\hat{G}$ as input, it is first processed by a two-layer GCN with ReLU as the activation function. Then, graph readout is applied to obtain a hidden vector. Lastly, a linear layer with the softmax function calculates the probability of selecting each label based on the hidden vector.

To learn $f$, we obtain our training set by randomly sampling 1000 subproblems and their corresponding feature graphs from four real clusters.⁷ To label a subproblem, we attempt each subproblem with the two candidate algorithms and choose the one that returns the better objective within a one-minute time limit.

⁷Denoted as T1–T4, which are different from the testing datasets M1–M4 in Section V.
subproblem with the two candidate algorithms and choose the 1 xcurr ← xorig
one that returns better objective within one-minute time limit. 2 migration path ← [ ]
E. Migration Path 3 while xcurr = xnew do
After the algorithm selection step, we have solved all // Get a list of containers to
delete
subproblems and obtained a new mapping of containers to
4 ldelete ← [ ]
machines. However, transitioning to this new mapping neces- for each m ∈ M do
 ← ldelete +
sitates the reallocation of a portion of containers within the ldelete

5
cluster. Container reallocation involves two steps: deleting the delete, SelectDelete (m, xcurr , xnew ) , m
container from its original machine and then creating it on
the target machine. This reallocation process is subject to two 6 p ← p + [ldelete ]
// Get a list of containers to
specific requirements: create
• SLA constraints, which can be temporarily relaxed, allow 7 lcreate ← [ ]
each service s to maintain at least 75% of its containers for each m ∈ M do
alive during reallocation. 8
 ← lcreate +
lcreate

• Satisfying the resource constraints in Section II-C. create, SelectCreate (m, xcurr , xnew ) , m
The first requirement restricts us from deleting all containers
9 p ← p + [lcreate ]
at once and subsequently creating new ones. The second
requirement necessitates the deletion of some containers first 10 return p
to free up resources before new ones can be created. To effec- Each set of the final migration path list contains a series of
tively employ the RASA algorithm in practical scenarios, we delete or create commands, which can be executed in parallel
must determine the optimal sequence for deleting and creating on different machines. However, it is important to note that the
containers while ensuring compliance with the aforementioned commands in the i-th set can only be executed after completing
two requirements during the reallocation. This problem is all the commands in the i − 1-th set.
known as the migration path problem.
F. Running Example
Algorithm 2 details the process of computing a migration
Our approach begins with service partitioning to reduce the
path using the original and new mappings of containers to
number of services that need to be considered in optimization.
machines. The migration path consists of a list of command
Fig. 4 shows our four-partitioning process, where we illustrate
sets containing commands to delete or create containers on
the properties of affinity relations in a cluster and show how
specific machines. For instance, (delete, c1 , m2 ) refers to
we can leverage them to reduce unnecessary computations.
delete the container c1 on machine m2 .
Initially, we perform non-affinity partitioning to identify
In each iteration, the algorithm generates two command sets:
services lacking affinity with other services. These services do
one for deleting containers (ldelete ) and another for creating
not need to be reallocated for collocation. For the remaining
7 Denoted as T1 - T4, which are different from the testing datasets M1 - services, we observe a common property of affinity in real-
M4 in Section V. world scenarios, which is skewness. This means that a few

5287

Authorized licensed use limited to: Beijing Information Science & Tech Univ. Downloaded on November 06,2024 at 11:11:34 UTC from IEEE Xplore. Restrictions apply.
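The two selection functions of Algorithm 2 reduce to min/max lookups over offline ratios. The sketch below illustrates them at service granularity; the signatures and bookkeeping dicts (deleted, free_resource, demand) are our assumptions, and a production version would track individual containers.

```python
def offline_ratio(s, deleted, d):
    """off_s: fraction of s's containers deleted and not yet recreated."""
    return deleted[s] / d[s]

def select_delete(m, curr, new, deleted, d):
    """Among services with surplus containers on m, pick the one with the
    LOWEST offline ratio, spreading deletions across services."""
    cands = [s for s in d if curr.get((s, m), 0) > new.get((s, m), 0)]
    return min(cands, key=lambda s: offline_ratio(s, deleted, d), default=None)

def select_create(m, curr, new, deleted, d, free_resource, demand):
    """Pick a deleted-but-not-recreated container destined for m that fits
    its free resources, preferring the HIGHEST offline ratio so services
    nearest the 75%-alive SLA floor recover first."""
    cands = [s for s in d
             if new.get((s, m), 0) > curr.get((s, m), 0)
             and deleted[s] > 0
             and demand[s] <= free_resource[m]]
    return max(cands, key=lambda s: offline_ratio(s, deleted, d), default=None)
```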
F. Running Example

Our approach begins with service partitioning to reduce the number of services that need to be considered in the optimization. Fig. 4 shows our four-stage partitioning process, where we illustrate the properties of affinity relations in a cluster and show how we can leverage them to reduce unnecessary computation.

Initially, we perform non-affinity partitioning to identify services lacking affinity with other services. These services do not need to be reallocated for collocation. For the remaining services, we observe a common property of affinity in real-world scenarios: skewness. This means that a few services contribute significantly to the overall affinity, while the remaining majority have minimal impact and can be disregarded. Based on this observation, we further divide the services into two subsets. Next, we perform compatibility partitioning, exemplified by IPv4 and IPv6 support. If one service requires IPv4-compatible machines and another requires IPv6 support, they cannot be deployed on the same machine. Attempting to collocate them is unnecessary; thus, we separate them into two subproblems. In some cases, even after the previous steps, a subproblem still contains a large number of services. To address this, we introduce a final partitioning step that partitions a service set into multiple smaller sets while minimizing the affinity between different service sets.

After partitioning, we obtain multiple subproblems, each of which we solve independently. Two common algorithms, MIP-based and column generation, are used to solve the formulations. These two algorithms each have their own advantages and are suitable for different problem structures, so we train a classifier model to select the better algorithm for each subproblem. Comparison experiments in Section V-C validate the effectiveness of this selection approach. The solutions of all subproblems are combined to form the final solution, in which some containers are moved to other machines in the cluster. Finally, we reallocate these containers accordingly.

V. EVALUATIONS

We begin by describing the experimental setup in Section V-A. Then, we examine the effectiveness of service partitioning and algorithm selection in Sections V-B and V-C, respectively. Next, we provide the results on gained affinity and running times in Sections V-D and V-E. Finally, in Section V-F, we present the benefits of the deployed solution in the production environment at ByteDance.

Note that in Sections V-B to V-E, we evaluate the algorithms designed for solving the optimization problem of RASA; the primary criteria for comparison are the optimization objective (total gained affinity) and algorithm running times. Therefore, we only ran the different algorithms and ablation experiments in simulation, without executing any deployments. On the other hand, in Section V-F, to assess the overall effectiveness of our approach, we deployed our solution in a production environment and present results on end-to-end latency and request error rate.

A. Experimental Setup

The experiments validate that the RASA algorithm can efficiently compute a mapping of containers to machines that improves the cluster's overall gained affinity. We introduce the datasets and baselines first.

Datasets. We conduct experiments on four microservice clusters containing service, machine, and traffic (or affinity) data. All these data are collected from real traces of microservice clusters at ByteDance. Tab. II summarizes the datasets.

TABLE II: Scales of experimental datasets

| Cluster Name | #Services | #Containers | #Machines |
|---|---|---|---|
| Microservice Cluster 1 (M1) | 5,904 | 25,640 | 977 |
| Microservice Cluster 2 (M2) | 10,180 | 152,833 | 5,284 |
| Microservice Cluster 3 (M3) | 547 | 3,485 | 96 |
| Microservice Cluster 4 (M4) | 10,682 | 113,261 | 4,365 |

Baselines. We compare the following algorithms⁸:
• POP: An algorithm from [23] that efficiently solves granular resource allocation problems. Since our problem involves services with interactions, it is not considered granular, and therefore POP is not directly applicable. Nevertheless, we include POP as a baseline because it represents one of the state-of-the-art approaches for solving large-scale optimization problems using solvers.
• K8S+: An online algorithm from [14] that simulates Kubernetes scheduling processing – filter and score. We use a scoring function that considers service affinity.
• APPLSCI19: An extension of the offline heuristic algorithm in [46], which is based on min-weight graph partitioning and heuristic packing techniques.
• RASA: The full approach we proposed in Section IV.
• ORIGINAL: The original assignments from the model in ByteDance production, which combines the idea of first-fit with K8S's filter-and-score process.

⁸We used Gurobi 9.5 [32] as the solver for all the solver-relevant algorithms.

B. Comparison on Service Partitioning

Different Partitioning Algorithms. To demonstrate the effectiveness of the service partitioning step, we compare different service partitioning algorithms:
• NO-PARTITION: Considers the entire problem without partitioning the services.
• RANDOM-PARTITION: Uniformly randomly partitions the service set in the service partitioning step.
• KAHIP: Adopts KaHIP [47], [48], the state of the art for the min-weight balanced graph cut problem, to partition the services.
• MULTI-STAGE-PARTITION: Our service partitioning algorithm as described in Section IV-B, which adopts a multi-stage service partitioning technique.

[Fig. 6: Comparison of the gained affinity (%) of the four partitioning algorithms on M1–M4 under a one-minute time-out; OOT marks runs that exceed the time-out.]

Fig. 6 shows the gained affinities under the different service partitioning methods. On average, our method outperforms RANDOM-PARTITION and KAHIP by 52.25% and 12.69%, respectively. NO-PARTITION succeeds only on one small-scale cluster (M3). Our MULTI-STAGE-PARTITION outperforms all the other methods. Furthermore, we evaluated the optimality loss and time overhead associated with our MULTI-STAGE-PARTITION method. The results [18] show that the loss is generally below 12%, and the time overhead constitutes less than 10% of the total execution time of the RASA algorithm.

Different Master Ratios. Master-affinity partitioning plays a crucial role in reducing computation time, so it is worth analyzing its properties further. Fig. 7 illustrates how the gained affinity and the total affinity of the master services vary with different master ratios under the one-minute time-out constraint.
Our analysis and Lemma 1 indicate that the master ratio α schedule on average. RASA also outperforms POP, K8 S +
should be set to O((ln1− N )/N ). In practice, we empirically and A PPL S CI 19 by 54.91%, 54.69% and 17.66% on average,
set the master ratio α = 45 · (ln0.66 N )/N . Fig. 7 also plots respectively. In summary, RASA achieves the best optimiza-
the master ratios we use in our algorithm. Our chosen master tion quality for all clusters under a one-minute time-out.
ratio is close to the optimal value for all clusters. POP fails due to its random partitioning, which causes sig-
In general, as the master ratio increases, the total affinity of nificant loss. O RIGINAL and K8 S + also perform suboptimally
master services quickly approaches 1.0, and the gained affinity since they are both online heuristic algorithms with limited
increases to a peak before either plateauing for small and ability to optimize schedules. As for A PPL S CI 19, though the
medium-scale clusters or decreasing for large-scale clusters. graph partitioning performs well, the heuristic packing after
The decrease in gained affinity is due to the limited time each partitioning frequently fails since the original algorithm
frame of 1 minute for solving large-scale clusters, which is can only deal with one machine size. The heuristic packing
insufficient for our algorithm to explore the search space and did not consider problems with multiple machine types.
obtain a good solution fully.
C. Comparison on Algorithm Selection

To demonstrate the effectiveness of the algorithm selection step, we compare different algorithm selection settings (the HEURISTIC rule is sketched in code after this list):
• CG: Our approach, except that the graph classifier is replaced by labeling every subproblem with CG.
• MIP: Our approach, except that the graph classifier is replaced by labeling every subproblem with MIP.
• HEURISTIC: An empirical heuristic that calculates the average container number of all services and the average machine number of all machine types. If the former is greater than the latter, we select CG; otherwise, we select MIP.
• MLP-BASED: The approach that takes the mean value of each feature over all services and then processes it with a multi-layer perceptron (MLP) [49]. This method completely ignores the topology of the affinity graph.
• GCN-BASED: The approach described in Section IV-D, which adopts the GCN-based algorithm selection.
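The HEURISTIC rule admits a direct transcription (a sketch; the two input lists are illustrative representations of the cluster snapshot, not our actual data structures):

```python
from statistics import mean

def heuristic_select(containers_per_service: list[int],
                     machines_per_type: list[int]) -> str:
    """HEURISTIC baseline: pick CG or MIP from two cluster-level averages."""
    avg_containers = mean(containers_per_service)  # avg containers per service
    avg_machines = mean(machines_per_type)         # avg machines per machine type
    # If the former is greater than the latter, select CG; otherwise MIP.
    return "CG" if avg_containers > avg_machines else "MIP"
```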
Fig. 8 shows the gained affinities under different algorithm selection methods. Except for GCN-BASED, no method achieves the best gained affinity across all clusters. We find that exclusively selecting either CG or MIP does not yield the best optimization results, as CG outperforms MIP in certain clusters and underperforms it in others. HEURISTIC is derived from our practical experience and works well for all clusters except M2 and M4. The reason is that M2 and M4 are large clusters with complex graph structures and feature matrices, and the rule is not expressive enough to capture all these features. Similarly, the MLP-BASED approach ignores the affinity topology that GCN-BASED exploits and fails to identify the best choice for clusters like M1 and M3. To summarize, GCN-BASED is a general algorithm selection approach that works well across different datasets.
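For readers unfamiliar with the classifier, the following is a minimal NumPy sketch of a GCN-style graph classifier in the spirit of Kipf and Welling [44]; the single hidden layer, mean pooling, and two-way readout are illustrative simplifications, not the exact architecture of Section IV-D:

```python
import numpy as np

def gcn_layer(a_hat: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN layer: H' = ReLU(A_hat @ H @ W)."""
    return np.maximum(a_hat @ h @ w, 0.0)

def select_algorithm(adj: np.ndarray, feats: np.ndarray,
                     w1: np.ndarray, w2: np.ndarray) -> str:
    """Label one subproblem's affinity graph as 'CG' or 'MIP'."""
    # Symmetrically normalized adjacency with self-loops:
    # A_hat = D^{-1/2} (A + I) D^{-1/2}
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    a_hat = a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    h = gcn_layer(a_hat, feats, w1)  # node embeddings
    graph_emb = h.mean(axis=0)       # mean pooling over nodes
    logits = graph_emb @ w2          # two-class linear readout
    return "CG" if logits[1] > logits[0] else "MIP"
```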
D. Optimization Quality Results

To get a fair comparison of the optimization quality, we compare POP, APPLSCI19, K8S+ and RASA under identical time-out requirements. Considering the service level objective (SLO) in practical scenarios, we set the time-out to one minute. If an algorithm cannot produce a result within one minute, we mark it as OOT (Out of Time).
Fig. 9 illustrates the gained affinity of different algorithms. For the Microservice dataset, RASA improves the gained affinity by more than 13.83× on average compared to the ORIGINAL schedule. RASA also outperforms POP, K8S+ and APPLSCI19 by 54.91%, 54.69% and 17.66% on average, respectively. In summary, RASA achieves the best optimization quality for all clusters under a one-minute time-out.

POP fails due to its random partitioning, which causes significant loss. ORIGINAL and K8S+ also perform suboptimally, since they are both online heuristic algorithms with limited ability to optimize schedules. As for APPLSCI19, although its graph partitioning performs well, the heuristic packing after each partitioning frequently fails: the original algorithm can only deal with one machine size, so its packing step does not handle problems with multiple machine types.

E. Efficiency Results

We now investigate the runtime of different algorithms from two aspects: (1) the minimum possible runtime, and (2) the solution quality under the same time-out constraint. For reference, from our practical experience, an algorithm that produces a schedule with a runtime (considering the 95th percentile, p95) under 60 seconds is practically valuable. Those with a runtime under 5 minutes could cause minor errors when scheduling, due to nonnegligible changes in the cluster snapshot, while those with a runtime over 5 minutes are considered impractical.

For RASA and POP, the algorithm iteratively refines its solution until it converges to optimality. If we halt the program in the middle of execution, the algorithm can still return the current best result. Thus, by setting a time-out parameter, we can control the maximum runtime of RASA and POP, which allows us to plot how their optimization quality changes over runtime. In contrast, manipulating the maximum runtime is rather difficult for APPLSCI19 and K8S+, since these two algorithms do not produce any feasible solution unless they execute to completion.
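The anytime behavior of the solver-based methods can be exposed through the solver's time limit. A minimal gurobipy sketch (the model below is a toy placeholder, not our RASA formulation) shows how a time limit bounds the runtime while the incumbent solution remains retrievable:

```python
import gurobipy as gp
from gurobipy import GRB

m = gp.Model("toy")                      # placeholder model
x = m.addVars(1000, vtype=GRB.BINARY)
m.setObjective(x.sum(), GRB.MAXIMIZE)
m.addConstr(x.sum() <= 500)

m.Params.TimeLimit = 60                  # halt after 60 seconds
m.optimize()

# Even when the status is GRB.TIME_LIMIT, the best incumbent found
# so far is still available.
if m.SolCount > 0:
    print(m.ObjVal)                      # current best objective
```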

Fig. 10 reveals the relation between the optimization quality and the runtime for RASA and POP. A point closer to the top-left is desirable, meaning its quality is superior and its runtime is shorter. As shown, RASA outperforms the other algorithms in terms of both quality and efficiency. Note that the improvement of the solution over time is not significant for either RASA or POP, but for completely opposite reasons. For RASA, partitioning is able to separate out small subproblems that have high affinity; even a minimal time limit already achieves satisfactory results, and increasing the time limit further does not yield significant improvements. For POP, however, the subproblems after partitioning remain large in scale, resulting in inferior solutions compared to RASA, and increasing the time limit leads to only marginal improvements since the search space is too large.

F. Performance Improvements in Production

The solution described in Section III has been deployed in production at ByteDance for several critical clusters, covering resources totaling more than a million CPU cores. In this section, we present the results observed from our deployed solution in production environments. In the production setting, we adopted an altered RPC framework that allows collocated containers to communicate through inter-process communication (IPC) instead of using the network.
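As an illustration of collocation-aware communication (a hypothetical sketch, not ByteDance's actual RPC framework; the socket path and replica lookup are invented for the example), a client can prefer a local IPC endpoint whenever the callee has a collocated replica:

```python
import socket

IPC_PATH = "/var/run/svc/{service}.sock"   # hypothetical per-service socket

def open_channel(service: str, local_replicas: set,
                 remote_addr: tuple) -> socket.socket:
    """Prefer IPC when a replica of `service` runs on this machine."""
    if service in local_replicas:
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect(IPC_PATH.format(service=service))   # stays on-host
    else:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect(remote_addr)                        # crosses the network
    return s
```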
Fig. 7: The gained affinity and the total affinity of master services under different given master ratios (one-minute time-out constraint), together with our chosen master ratios.

Fig. 8: Comparison of the gained affinity of different algorithm selections under a one-minute time-out.

Fig. 9: Gained affinity comparisons of different algorithms for RASA under a one-minute time-out.

To validate the effectiveness of improving service performance and stability through the deployment of RASA, we present the results of end-to-end latency and request error rate (all metrics are normalized with a maximum value of 1.0) for both the WITH RASA and WITHOUT RASA cases. WITH RASA refers to containers whose placement is optimized with the RASA algorithm, where more of them are collocated. In contrast, WITHOUT RASA refers to the containers without service affinity optimization, which is essentially the ORIGINAL algorithm described in Section V-A. To show the optimization upper bound that could be achieved, we also present results of ONLY COLLOCATED. For a service pair, certain requests between them are routed between collocated containers on the same machine. ONLY COLLOCATED considers solely the latency and error rate of these collocated containers, providing a close approximation to the scenario where all containers are collocated on a single machine.

Fig. 11 and 12 demonstrate the improvements in end-to-end latency and request error rate achieved by the RASA algorithm for four critical business service pairs in production. Each subplot shows the average metrics of all containers of the service pair with (solid line) and without (blue dashed line) the RASA algorithm optimizing its placement. The relative improvements in latency range from 16.77% to 72.16%, and the relative improvements in error rate range from 13.27% to 64.42%. These improvements arise because the RASA algorithm enables more containers to be collocated, allowing more requests to benefit from collocation and resulting in improved average latency and error rates.

Fig. 13 shows the improvements of all services that we have currently optimized in a cluster. We utilize a weighted metric encompassing all the services considered in our RASA algorithm. The weight assigned to each service pair in the metric is based on its queries per second (QPS) relative to the total QPS of all services. The WITH RASA weighted latency and error rate demonstrate a significant 23.75% improvement and a 24.09% reduction, respectively, compared to WITHOUT RASA. This further validates that with the optimization of the RASA algorithm and more containers collocated, we achieve greater improvements in overall latency and error rate.

Furthermore, for the four individual service pairs as well as for the overall cluster, the average absolute gap of WITH RASA to ONLY COLLOCATED is less than 10% for both latency and error rate. These results show that the affinity has been sufficiently optimized.
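The weighted metric behind Fig. 13 is straightforward to state (a sketch assuming per-service-pair QPS and metric samples; the dictionary fields are illustrative):

```python
def qps_weighted_metric(pairs: list) -> float:
    """QPS-weighted average of a per-pair metric (e.g., latency or error rate).

    Each entry: {"qps": float, "metric": float}. The weight of a pair is
    its QPS share of the total QPS over all considered service pairs.
    """
    total_qps = sum(p["qps"] for p in pairs)
    return sum(p["qps"] / total_qps * p["metric"] for p in pairs)

# Example: the busier pair dominates the weighted value.
print(qps_weighted_metric([{"qps": 900.0, "metric": 0.2},
                           {"qps": 100.0, "metric": 0.8}]))  # 0.26
```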
VI. RELATED WORK

Resource allocation for clusters. Resource allocation for clusters has been studied extensively in the past years. Most of the previous research focuses on heuristic algorithms [50]–[59]. An example is Eigen [59], which proposes a hierarchical resource management system and three heuristic resource optimization algorithms to improve the resource allocation ratio without hurting resource availability.

More recently, adopting solvers in resource allocation has gained popularity due to their capability of producing high-quality solutions [10]–[12], [23], [60]–[63]. [10] uses MIP to model the scheduling of containers for long-running applications and adopts an ILP-based solver to solve the MIP. [12] studies the stochastic bin packing problem derived from container scheduling, reformulating the problem and employing a solver-based cutting stock approach. The most recent work [64] studies the tenant placement problem in a Database-as-a-Service cluster with a focus on minimizing the probability of failovers; the authors propose mathematical programming models for this problem and utilize solvers to solve them, demonstrating significant advantages over the previous state of the art. Due to the poor efficiency of solvers, only a small portion of solver-based research targets large-scale clusters, and such work is often equipped with acceleration techniques to meet efficiency requirements. For instance, RAS [11] uses MIP to model the capacity reservation problem of large-scale clusters; multi-phase solving and variable aggregation techniques are applied to meet the SLO of solving within one hour. POP [23] proposes a general solution for granular resource allocation problems that can produce high-quality solutions efficiently.

Fig. 10: The optimization quality (concerning total gained affinity) under different runtimes.

Fig. 11: Comparison of (normalized) end-to-end latency for four critical service pairs in production.

Fig. 12: Comparison of (normalized) request error rate for four critical service pairs in production.

Fig. 13: Comparison of weighted end-to-end latency and error rate for considered services in production.
Resource allocation with service affinity. Previous works on service affinity in the cloud often refer to collocating the containers of services or virtual machines on the same machine or the same group of machines to minimize cross-group network communication. For example, [65] studies the virtual machine placement problem and [11] studies the capacity reservation problem in the cloud; both take reducing cross-datacenter traffic into account. Moreover, [10], [13], [46], [66] study container scheduling, where they instead consider minimizing the inter-machine traffic between containers. Besides that, a popular container orchestration system, Kubernetes, provides an affinity feature [14], [67] that allows developers to customize their affinity requirements.

VII. CONCLUSION

Service affinity can greatly improve stability and enhance system performance. However, such a topic with tremendous business and environmental impact remains largely understudied. To fill this gap, we present a formulation of this problem as Resource Allocation with Service Affinity (RASA). On top of it, we propose a novel approach that utilizes a multi-stage partitioning technique to divide a given task into several subproblems. With a GCN classifier, each subproblem, based on its scale and impact on the objective, is assigned to be solved by an algorithm from the candidate pool. Notably, we show that exact algorithms only need to be applied to a small fraction of services with top affinities to guarantee asymptotic optimality. We further propose a heuristic algorithm to compute a migration path that can be directly executed, transitioning to the new placement where service affinity is well optimized. Experimental results show that our algorithm achieves both quality and efficiency on real traces, and the successful large-scale production deployment shows that our solution significantly improves stability and service performance. With the trend of moving data systems into containers, our solution can also improve the performance of these data systems.

In general, adopting solvers for large-scale problems is rare due to efficiency concerns, despite their enormous potential. In future work, we aim to explore more high-quality, high-efficiency solver-based algorithms in databases, cloud computing, and distributed systems.

REFERENCES

[1] Y. Gan, Y. Zhang, D. Cheng, A. Shetty, P. Rathi, N. Katarki, A. Bruno, J. Hu, B. Ritchken, B. Jackson, K. Hu, M. Pancholi, Y. He, B. Clancy, C. Colen, F. Wen, C. Leung, S. Wang, L. Zaruvinsky, M. Espinosa, R. Lin, Z. Liu, J. Padilla, and C. Delimitrou, “An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems,” in ASPLOS. ACM, 2019, pp. 3–18.
[2] W. Hasselbring and G. Steinacker, “Microservice architectures for scalability, agility and reliability in e-commerce,” in ICSA Workshops. IEEE Computer Society, 2017, pp. 243–246.
[3] H. Aragon, S. Braganza, E. F. Boza, J. Parrales, and C. L. Abad, “Workload characterization of a software-as-a-service web application implemented with a microservices architecture,” in WWW (Companion Volume). ACM, 2019, pp. 746–750.
[4] L. Baresi and M. Garriga, “Microservices: The evolution and extinction of web services?” in Microservices, Science and Engineering. Springer, 2020, pp. 3–28.
[5] “DOCKER: Persist the db,” https://docs.docker.com/get-started/05_persisting_data, 2024.
[6] “KUBERNETES: Persistent volumes,” https://kubernetes.io/docs/concepts/storage/persistent-volumes, 2024.
[7] “AWS: Amazon memorydb for redis,” https://aws.amazon.com/cn/memorydb/, 2024.
[8] “REDIS: Run redis stack on docker,” https://redis.io/docs/stack/get-started/install/docker, 2024.
[9] “APACHE KAFKA: Quick start - docker,” https://developer.confluent.io/quickstart/kafka-docker, 2024.
[10] P. Garefalakis, K. Karanasos, P. R. Pietzuch, A. Suresh, and S. Rao, “Medea: Scheduling of long running applications in shared production clusters,” in EuroSys. ACM, 2018, pp. 4:1–4:13.
[11] A. Newell, D. Skarlatos, J. Fan, P. Kumar, M. Khutornenko, M. Pundir, Y. Zhang, M. Zhang, Y. Liu, L. Le, B. Daugherty, A. Samudra, P. Baid, J. Kneeland, I. Kabiljo, D. Shchukin, A. Rodrigues, S. Michelson, B. Christensen, K. Veeraraghavan, and C. Tang, “RAS: Continuously optimized region-wide datacenter resource allocation,” in SOSP. ACM, 2021, pp. 505–520.
[12] J. Yan, Y. Lu, L. Chen, S. Qin, Y. Fang, Q. Lin, T. Moscibroda, S. Rajmohan, and D. Zhang, “Solving the batch stochastic bin packing problem in cloud: A chance-constrained optimization approach,” in KDD. ACM, 2022, pp. 2169–2179.
[13] C. Mommessin, R. Yang, N. V. Shakhlevich, X. Sun, S. Kumar, J. Xiao, and J. Xu, “Affinity-aware resource provisioning for long-running applications in shared clusters,” CoRR, vol. abs/2208.12738, 2022.
[14] “KUBERNETES: Assigning pods to nodes,” https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node, 2024.
[15] S. Sudevalayam and P. Kulkarni, “Affinity-aware modeling of cpu usage for provisioning virtualized applications,” in 2011 IEEE 4th International Conference on Cloud Computing, 2011, pp. 139–146.
[16] J. Sonnek, J. Greensky, R. Reutiman, and A. Chandra, “Starling: Minimizing communication overhead in virtualized computing platforms using decentralized affinity-aware migration,” in 2010 39th International Conference on Parallel Processing, 2010, pp. 228–237.
[17] “Service-affinity-scheduling,” https://github.com/bytedance/Service-Affinity-Scheduling, 2024.
[18] “Supplementary materials - resource allocation with service affinity in large-scale cloud environments,” https://github.com/bytedance/Service-Affinity-Scheduling/blob/main/supplementary-materials/supplementary-materials.pdf, 2024.
[19] “SCIP: Solving constraint integer programs,” https://www.scipopt.org/, 2024.
[20] “Branch and cut in cplex,” https://www.ibm.com/docs/en/icos/12.10.0?topic=concepts-branch-cut-in-cplex, 2024.
[21] A. Basu, M. Conforti, M. D. Summa, and H. Jiang, “Complexity of branch-and-bound and cutting planes in mixed-integer optimization - II,” Comb., vol. 42, no. 6, pp. 971–996, 2022.
[22] J. E. Mitchell, “Integer programming: Branch and cut algorithms,” in Encyclopedia of Optimization. Springer, 2009, pp. 1643–1650.
[23] D. Narayanan, F. Kazhamiaka, F. Abuzaid, P. Kraft, A. Agrawal, S. Kandula, S. P. Boyd, and M. Zaharia, “Solving large-scale granular resource allocation problems efficiently with POP,” in SOSP. ACM, 2021, pp. 521–537.
[24] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein, “Distributed graphlab: A framework for machine learning in the cloud,” Proc. VLDB Endow., vol. 5, no. 8, pp. 716–727, 2012.
[25] F. Bourse, M. Lelarge, and M. Vojnovic, “Balanced graph edge partition,” in KDD. ACM, 2014, pp. 1456–1465.
[26] S. Papadimitriou, J. Sun, C. Faloutsos, and P. S. Yu, “Hierarchical, parameter-free community discovery,” in ECML/PKDD (2), ser. Lecture Notes in Computer Science, vol. 5212. Springer, 2008, pp. 170–187.
[27] R. Mayer and H. Jacobsen, “Hybrid edge partitioner: Partitioning large power-law graphs under memory constraints,” in SIGMOD Conference. ACM, 2021, pp. 1289–1302.
[28] Z. Wei, X. He, X. Xiao, S. Wang, Y. Liu, X. Du, and J. Wen, “Prsim: Sublinear time simrank computation on large power-law graphs,” in SIGMOD Conference. ACM, 2019, pp. 1042–1059.
[29] R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal, “Random graph models for the web graph,” in FOCS. IEEE Computer Society, 2000, pp. 57–65.
[30] L. A. Adamic and B. A. Huberman, “The nature of markets in the world wide web,” Quarterly Journal of Electronic Commerce, vol. 1, no. 1, pp. 5–12, 2000.
[31] M. E. Newman, “Power laws, pareto distributions and zipf’s law,” Contemporary Physics, vol. 46, no. 5, pp. 323–351, 2005.
[32] “GUROBI: Gurobi optimizer reference manual,” https://www.gurobi.com, 2024.
[33] “GOOGLE OR-TOOLS,” https://developers.google.com/optimization, 2024.
[34] P. Laborie, J. Rogerie, P. Shaw, and P. Vilím, “IBM ILOG CP optimizer for scheduling - 20+ years of scheduling with constraints at IBM/ILOG,” Constraints An Int. J., vol. 23, no. 2, pp. 210–250, 2018.
[35] W. E. Hart, J. Watson, and D. L. Woodruff, “Pyomo: Modeling and solving mathematical programs in python,” Math. Program. Comput., vol. 3, no. 3, pp. 219–260, 2011.
[36] G. L. Nemhauser, M. W. P. Savelsbergh, and G. Sigismondi, “Minto, a mixed integer optimizer,” Oper. Res. Lett., vol. 15, no. 1, pp. 47–58, 1994.
[37] M. E. Lübbecke and J. Desrosiers, “Selected topics in column generation,” Oper. Res., vol. 53, no. 6, pp. 1007–1023, 2005.
[38] K. Lin, M. Ehrgott, and A. Raith, “Integrating column generation in a method to compute a discrete representation of the non-dominated set of multi-objective linear programmes,” 4OR, vol. 15, no. 4, pp. 331–357, 2017.
[39] H. Dyckhoff, “A new linear programming approach to the cutting stock problem,” Oper. Res., vol. 29, no. 6, pp. 1092–1104, 1981.
[40] J. Gondzio, P. González-Brevis, and P. A. Munari, “New developments in the primal-dual column generation technique,” Eur. J. Oper. Res., vol. 224, no. 1, pp. 41–51, 2013.
[41] C. Barnhart, E. L. Johnson, G. L. Nemhauser, M. W. P. Savelsbergh, and P. H. Vance, “Branch-and-price: Column generation for solving huge integer programs,” Oper. Res., vol. 46, no. 3, pp. 316–329, 1998.
[42] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?” in ICLR. OpenReview.net, 2019.
[43] Z. Zhang, P. Cui, and W. Zhu, “Deep learning on graphs: A survey,” IEEE Trans. Knowl. Data Eng., vol. 34, no. 1, pp. 249–270, 2022.
[44] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in ICLR (Poster). OpenReview.net, 2017.
[45] J. B. Lee, R. A. Rossi, and X. Kong, “Graph classification using structural attention,” in KDD. ACM, 2018, pp. 1666–1674.
[46] Y. Hu, C. de Laat, and Z. Zhao, “Optimizing service placement for microservice architecture in clouds,” Applied Sciences, vol. 9, no. 21, p. 4663, 2019.
[47] P. Sanders and C. Schulz, “Think locally, act globally: Highly balanced graph partitioning,” in Proceedings of the 12th International Symposium on Experimental Algorithms (SEA’13), ser. LNCS, vol. 7933. Springer, 2013, pp. 164–175.
[48] S. Schlag, C. Schulz, D. Seemaier, and D. Strash, “Scalable edge partitioning,” in Proceedings of the 21st Workshop on Algorithm Engineering and Experimentation (ALENEX). SIAM, 2019, pp. 211–225.
[49] M. Kubat, “Neural networks: A comprehensive foundation by Simon Haykin, Macmillan, 1994, ISBN 0-02-352781-7,” Knowl. Eng. Rev., vol. 13, no. 4, pp. 409–412, 1999.
[50] W. Khallouli and J. Huang, “Cluster resource scheduling in cloud computing: literature review and research challenges,” J. Supercomput., vol. 78, no. 5, pp. 6898–6943, 2022.

[51] C. Delimitrou and C. Kozyrakis, “Paragon: Qos-aware scheduling for heterogeneous datacenters,” in ASPLOS. ACM, 2013, pp. 77–88.
[52] A. Rahimikhanghah, M. Tajkey, B. Rezazadeh, and A. M. Rahmani, “Resource scheduling methods in cloud and fog computing environments: A systematic literature review,” Clust. Comput., vol. 25, no. 2, pp. 911–945, 2022.
[53] H. M. Demoulin, J. Fried, I. Pedisich, M. Kogias, B. T. Loo, L. T. X. Phan, and I. Zhang, “When idling is ideal: Optimizing tail-latency for heavy-tailed datacenter workloads with perséphone,” in SOSP. ACM, 2021, pp. 621–637.
[54] K. Ousterhout, P. Wendell, M. Zaharia, and I. Stoica, “Sparrow: distributed, low latency scheduling,” in SOSP. ACM, 2013, pp. 69–84.
[55] S. Venkataraman, A. Panda, G. Ananthanarayanan, M. J. Franklin, and I. Stoica, “The power of choice in data-aware cluster scheduling,” in OSDI. USENIX Association, 2014, pp. 301–316.
[56] O. Hadary, L. Marshall, I. Menache, A. Pan, E. E. Greeff, D. Dion, S. Dorminey, S. Joshi, Y. Chen, M. Russinovich, and T. Moscibroda, “Protean: VM allocation service at scale,” in OSDI. USENIX Association, 2020, pp. 845–861.
[57] S. A. Jyothi, C. Curino, I. Menache, S. M. Narayanamurthy, A. Tumanov, J. Yaniv, R. Mavlyutov, I. Goiri, S. Krishnan, J. Kulkarni, and S. Rao, “Morpheus: Towards automated slos for enterprise clusters,” in OSDI. USENIX Association, 2016, pp. 117–134.
[58] E. Boutin, J. Ekanayake, W. Lin, B. Shi, J. Zhou, Z. Qian, M. Wu, and L. Zhou, “Apollo: Scalable and coordinated scheduling for cloud-scale computing,” in OSDI. USENIX Association, 2014, pp. 285–300.
[59] J. Y. Li, J. Zhang, W. Zhou, Y. Liu, S. Zhang, Z. Xue, D. Xu, H. Fan, F. Zhou, and F. Li, “Eigen: End-to-end resource optimization for large-scale databases on the cloud,” Proc. VLDB Endow., vol. 16, no. 12, pp. 3795–3807, 2023. [Online]. Available: https://doi.org/10.14778/3611540.3611565
[60] A. Tumanov, J. Cipar, G. R. Ganger, and M. A. Kozuch, “Alsched: algebraic scheduling of mixed workloads in heterogeneous clouds,” in SoCC. ACM, 2012, p. 25.
[61] L. Suresh, J. Loff, F. Kalim, S. A. Jyothi, N. Narodytska, L. Ryzhyk, S. Gamage, B. Oki, P. Jain, and M. Gasch, “Building scalable and flexible cluster managers using declarative programming,” in OSDI. USENIX Association, 2020, pp. 827–844.
[62] C. Curino, D. E. Difallah, C. Douglas, S. Krishnan, R. Ramakrishnan, and S. Rao, “Reservation-based scheduling: If you’re late don’t blame us!” in SoCC. ACM, 2014, pp. 2:1–2:14.
[63] A. Tumanov, T. Zhu, J. W. Park, M. A. Kozuch, M. Harchol-Balter, and G. R. Ganger, “Tetrisched: Global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters,” in EuroSys. ACM, 2016, pp. 35:1–35:16.
[64] A. C. König, Y. Shan, K. Newatia, L. Marshall, and V. Narasayya, “Solver-in-the-loop cluster resource management for database-as-a-service,” Proc. VLDB Endow., vol. 16, no. 13, pp. 4254–4267, 2023.
[65] X. Meng, V. Pappas, and L. Zhang, “Improving the scalability of data center networks with traffic-aware virtual machine placement,” in INFOCOM. IEEE, 2010, pp. 1154–1162.
[66] Z. Wu, Y. Deng, H. Feng, Y. Zhou, G. Min, and Z. Zhang, “Blender: A container placement strategy by leveraging zipf-like distribution within containerized data centers,” IEEE Transactions on Network and Service Management, 2021.
[67] “KUBERNETES: Production-grade container orchestration,” https://kubernetes.io, 2024.

