2019 ASPLOS PARTIES Slides

This document discusses a resource partitioning technique called PARTIES for colocating multiple latency-critical applications on the same server. It presents a characterization of interactive applications and their sensitivity to different resource interference. PARTIES dynamically partitions resources like CPU, memory and cache to provide quality of service guarantees for multiple latency-critical applications without requiring prior knowledge about the applications.

PARTIES:

QOS-AWARE RESOURCE PARTITIONING


FOR MULTIPLE INTERACTIVE SERVICES

Shuang Chen, Christina Delimitrou, José F. Martínez

Cornell University
COLOCATION OF APPLICATIONS

[Figure: a multicore server colocating best-effort and latency-critical applications. Each core (P) has private caches; all cores share the last-level cache.]
Motivation• Characterization• PARTIES• Evaluation • Conclusions


Page 1 of 15
PRIOR WORK
§ Interference during colocation
§ Scheduling [Nathuji’10, Mars’13, Delimitrou’14]
• Avoid co-scheduling of apps that may interfere
- May require offline knowledge
- Limit colocation options
§ Resource partitioning [Sanchez’11, Lo’15]
• Partition shared resources
- At most 1 LC app + multiple best-effort jobs

TRENDS IN DATACENTERS
§ Monolith: 1 LC + many BE
§ Microservices: many LC + many BE
§ More LC jobs, and all have QoS targets
MAIN CONTRIBUTIONS
§ Workload characterization
• The impact of resource sharing
• The effectiveness of resource isolation
• Relationship between different resources

§ PARTIES: first QoS-aware resource manager for colocation of many LC services
• Dynamic partitioning of 9 shared resources
• No a priori application knowledge
• 61% higher throughput under QoS constraints
• Adapts to varying load patterns

INTERACTIVE LC APPLICATIONS
Table 1: Latency-Critical Applications

| Application | Memcached | Xapian | NGINX | Moses | MongoDB | Sphinx |
|---|---|---|---|---|---|---|
| Domain | Key-value store | Web search | Web server | Real-time translation | Persistent database | Speech recognition |
| Target QoS | 600us | 5ms | 10ms | 15ms | 300ms | 2.5s |
| Max Load | 1,280,000 | 8,000 | 560,000 | 2,800 | 240 | 14 |
| User / Sys / IO CPU% | 13 / 78 / 0 | 42 / 23 / 0 | 20 / 50 / 0 | 50 / 14 / 0 | 0.3 / 0.2 / 57 | 85 / 0.6 / 0 |
| LLC MPKI | 0.55 | 0.03 | 0.06 | 10.48 | 0.01 | 6.28 |
| Memory Capacity | 9.3 GB | 0.02 GB | 1.9 GB | 2.5 GB | 18 GB | 1.4 GB |
| Memory Bandwidth | 0.6 GB/s | 0.01 GB/s | 0.6 GB/s | 26 GB/s | 0.03 GB/s | 3.1 GB/s |
| Disk Bandwidth | 0 MB/s | 0 MB/s | 0 MB/s | 0 MB/s | 5 MB/s | 0 MB/s |
| Network Bandwidth | 3.0 Gbps | 0.07 Gbps | 6.2 Gbps | 0.001 Gbps | 0.01 Gbps | 0.001 Gbps |

Max load: max RPS under QoS target when running alone
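The QoS targets above are tail-latency bounds, so "max RPS under QoS target" is found by raising the offered load until the measured tail latency exceeds the target. A minimal sketch of that check, using the nearest-rank percentile (the helper names are ours, not from the talk):

```python
import math

def tail_latency(samples_ms, pct=99.0):
    """Nearest-rank percentile of a batch of per-request latencies."""
    ordered = sorted(samples_ms)
    k = math.ceil(pct / 100.0 * len(ordered)) - 1  # 0-based rank
    return ordered[max(k, 0)]

def meets_qos(samples_ms, target_ms, pct=99.0):
    """True if the pct-th percentile latency stays within the QoS target."""
    return tail_latency(samples_ms, pct) <= target_ms
```

For example, with Xapian's 5ms target, a load level counts toward max load only if `meets_qos(samples, 5.0)` holds over the measurement window.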

INTERFERENCE STUDY

[Heatmap: impact of resource interference. Each row corresponds to a type of resource (hyperthread, CPU, power, LLC capacity, LLC bandwidth, memory bandwidth, memory capacity, disk bandwidth, network bandwidth); each column to an application (Memcached, Xapian, NGINX, Moses, MongoDB, Sphinx). Cell values are the maximum percentage of max load for which the server can satisfy QoS while the LC application runs under that type of interference; smaller values and darker colors mean the application is more sensitive (0% = extremely sensitive, 100% = not sensitive at all).]

• Applications are sensitive to interference on resources they use heavily
• Applications with strict QoS targets are more sensitive

ISOLATION MECHANISMS
• Core mapping (hyperthreads, core counts): cgroup
• Memory capacity: cgroup
• Disk bandwidth: cgroup
• Core frequency (power): ACPI frequency driver
• LLC capacity (cache capacity, cache bandwidth, memory bandwidth): Intel CAT
• Network bandwidth: qdisc
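Of these mechanisms, Intel CAT partitions the LLC by assigning each class of service a contiguous bitmask of cache ways, written through the Linux resctrl interface or the `pqos` tool. A minimal sketch of building such a mask; the helper name is our own:

```python
def cat_waymask(num_ways, first_way=0):
    """Contiguous way bitmask, as Intel CAT requires (e.g. 4 ways -> 0xF)."""
    if num_ways <= 0:
        raise ValueError("need at least one cache way")
    # Set num_ways consecutive bits, then shift them to the starting way.
    return ((1 << num_ways) - 1) << first_way
```

For instance, giving an app ways 2 through 7 of a 20-way LLC yields `cat_waymask(6, first_way=2)` == `0xFC`.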
RESOURCE FUNGIBILITY
[Figure: allocations meeting Xapian's QoS target, stand-alone (left) vs. with memory interference (right). Axes: number of cores (1-13) against cache ways (1-19); each X marks a (cores, cache ways) allocation that satisfies QoS. Many distinct combinations work, and under interference the feasible region shrinks.]

§ Resources are fungible


• More flexibility in resource allocation
• Simplifies resource manager

PARTIES: DESIGN PRINCIPLES
§ PARTIES
• PARTitioning for multiple InteractivE Services

§ Design principles
• LC apps are equally important
• Allocation should be dynamic and fine-grained
• No a priori application knowledge or offline profiling
• Recover quickly from incorrect decisions
• Migration is used as a last resort

PARTIES
§ Main loop (server side)
  • Poll tail latency from the client-side latency monitor every 100ms
  • QoS violations? Upsize the current resource
  • Excess resources (latency slack)? Downsize and return resources to the unallocated pool
§ 5 knobs organized into 2 wheels
  • Compute wheel: cores (C), core frequency (F), LLC capacity ($)
  • Storage wheel: memory capacity (M), disk bandwidth (D)
§ Start from a random resource; if adjusting it shows no benefit, follow the wheel to visit the next resource

[Figure: per-app upsizing and downsizing wheels, with "no benefit" transitions moving between the compute and storage resources.]
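The loop above can be sketched as a small per-app controller. This is our toy reconstruction of the slide's description; the class name, resource names, and slack threshold are assumptions, not values from the talk:

```python
from collections import deque

class WheelController:
    """Toy sketch of PARTIES-style per-app control (names/thresholds assumed).

    Each interval, measured tail latency is compared against the QoS
    target: a violation upsizes the current resource, large slack
    downsizes it, and when an adjustment brings no benefit the
    controller rotates to the next resource on the wheel.
    """
    SLACK = 0.8  # below 80% of the target -> give resources back

    def __init__(self, qos_target, wheel):
        self.qos_target = qos_target
        self.wheel = deque(wheel)  # resources visited in wheel order

    def decide(self, tail_latency):
        ratio = tail_latency / self.qos_target
        if ratio > 1.0:
            return ("upsize", self.wheel[0])
        if ratio < self.SLACK:
            return ("downsize", self.wheel[0])
        return ("hold", None)

    def no_benefit(self):
        """Adjusting the current resource did not help: try the next one."""
        self.wheel.rotate(-1)
```

With a 10ms target and wheel `["cores", "llc_ways", "frequency"]`, a 12ms reading yields `("upsize", "cores")`; after a no-benefit rotation the same reading yields `("upsize", "llc_ways")`.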
METHODOLOGY
§ Platform: Intel E5-2699 v4
• Single socket with 22 cores (8 IRQ cores)
§ Virtualization
• LXC 2.0.7
§ Load generators
• Open loop
• Request inter-arrival distribution: exponential
• Request popularity: Zipfian
§ Testing strategy
• Constant load: 30s warmup, 1m measurement (x5)
• Varying load simulates diurnal load patterns
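The load-generator setup above (open loop, exponential inter-arrivals, Zipfian popularity) can be sketched as follows; the function names are ours:

```python
import random

def open_loop_arrivals(rate_rps, duration_s, seed=42):
    """Open loop: arrival times are drawn up front, independent of how
    quickly the server responds, so queueing delay is not hidden."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_rps)  # exponential inter-arrival times
        if t >= duration_s:
            return arrivals
        arrivals.append(t)

def zipfian_keys(num_requests, num_keys, s=0.99, seed=42):
    """Sample keys with Zipfian popularity: the rank-r key has weight 1/r^s."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** s) for rank in range(1, num_keys + 1)]
    return rng.choices(range(num_keys), weights=weights, k=num_requests)
```

An open-loop client matters here because requests keep arriving while the server is overloaded, so queueing delay shows up in the measured tail latency instead of being masked by a closed-loop client that waits for responses.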
CONSTANT LOADS: MEMCACHED, XAPIAN & NGINX

[Figure: heatmaps comparing Unmanaged, Oracle, Heracles, and PARTIES when colocating Memcached, Xapian, and NGINX. Each heatmap plots max load of Memcached or NGINX (y-axis, 10-100%) against max load of Xapian (x-axis, 10-80%); cell values give the highest load of the remaining application for which all colocated services still meet QoS.]

§ Oracle
  • Offline profiling
  • Always finds the global optimum
§ Heracles
  • No partitioning between BE jobs
  • Suspends BE upon QoS violation
  • No interaction between resources
MORE EVALUATION
Constant loads
§ All 2- and 3-app mixes under PARTIES
§ Comparison with Heracles for 2- to 6-app mixes

Diurnal load pattern


§ Colocation of Memcached, Xapian and Moses

PARTIES overhead
§ Convergence time for 2- to 6-app mixes

CONCLUSIONS
§ Need to manage multiple LC apps
§ Insights
• Resource partitioning
• Resource fungibility
§ PARTIES
• Partition 9 shared resources
• No offline knowledge required
• 61% higher throughput under QoS targets
• Adapts to varying load patterns

PARTIES:
QOS-AWARE RESOURCE PARTITIONING
FOR MULTIPLE INTERACTIVE SERVICES
http://tiny.cc/parties

Shuang Chen, Christina Delimitrou, José F. Martínez


Cornell University
