Cloud Scheduling
Papers:
- M. Zaharia et al., "Improving MapReduce Performance in Heterogeneous Environments", OSDI 2008
- M. Isard et al., "Quincy: Fair Scheduling for Distributed Computing Clusters", SOSP 2009
- A. Batsakis et al., "CA-NFS: A Congestion-Aware Network File System", FAST 2009
CS 525
Motivation - MapReduce
Straggler Task
- Performs poorly due to faulty hardware or misconfiguration
- To minimize job response time, Hadoop runs a speculative copy (backup task) of a straggler

Speculative Execution: Assumptions
- Cluster nodes are homogeneous
- Tasks progress at a constant rate
- The copy, sort, and reduce phases each take the same amount (1/3) of a reduce task's work
- Tasks finish in waves, so a low progress score indicates a straggler

Problems
- Multiple VMs run on the same physical host, across multiple hardware generations
- The copy phase of a reduce task is slowest due to network communication
- Tasks from different generations run concurrently
- Too many speculative tasks can run (up to 80% of reducers)
- The wrong (fast, new) task can be selected as a straggler
- A speculative task can be assigned to a slow node
LATE Scheduler
Speculatively execute (back up) the task with the largest estimated time left
Progress Rate = Progress Score / Execution Time
Estimated Time Left = (1 - Progress Score) / Progress Rate
[Figure: task progress over time (min). Figures are borrowed from the authors' slides.]
LATE Example
- A job with 5 tasks runs on Node 1, Node 2, and Node 3
- After 2 minutes:
  - Task at Progress = 66%: estimated time left = (1 - 0.66) / (1/3) = 1 min
  - Task at Progress = 5.3%: estimated time left = (1 - 0.05) / (1/1.9) = 1.8 min
- (The arithmetic is reproduced in the sketch below)
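To make the estimate concrete, here is a minimal Python sketch of LATE's heuristic applied to the numbers above; the function and task names are illustrative, not Hadoop's API.

```python
def progress_rate(progress_score, execution_time):
    # Progress Rate = Progress Score / Execution Time
    return progress_score / execution_time

def estimated_time_left(progress_score, rate):
    # Estimated Time Left = (1 - Progress Score) / Progress Rate
    return (1.0 - progress_score) / rate

# The two running tasks from the example above (progress rates as given):
tasks = {
    "task_A": estimated_time_left(0.66, 1 / 3),    # (1 - 0.66) / (1/3) ~= 1.0 min
    "task_B": estimated_time_left(0.05, 1 / 1.9),  # (1 - 0.05) / (1/1.9) ~= 1.8 min
}

# LATE backs up the task with the LARGEST estimated time left, even though
# its progress score alone would not mark it as the worst straggler.
straggler = max(tasks, key=tasks.get)
print(straggler, tasks[straggler])  # task_B ~1.8
```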
Looking Forward
- Assuming a constant progress rate may be incorrect
- The impact of heterogeneity already appears in a task's past progress
Thresholds
- SlowTaskThreshold (25th percentile): speculate only the tasks that hurt the response time the most
- SlowNodeThreshold (25th percentile): based on total work performed, do not launch speculative tasks on slow nodes
- SpeculativeCap (20%): avoid unnecessary speculation, limiting contention that hurts throughput (see the sketch below)
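A hedged sketch of how these three thresholds might gate a speculation decision; the percentile and cap values come from the slide, while the function and its inputs are illustrative assumptions.

```python
import numpy as np

SLOW_TASK_PCTL = 25     # SlowTaskThreshold: 25th percentile of progress rates
SLOW_NODE_PCTL = 25     # SlowNodeThreshold: 25th percentile of total work done
SPECULATIVE_CAP = 0.20  # at most 20% of task slots run speculative copies

def may_speculate(task_rate, all_task_rates, node_work, all_node_work,
                  running_speculative, total_slots):
    # Cap the number of concurrently running speculative tasks.
    if running_speculative >= SPECULATIVE_CAP * total_slots:
        return False
    # Never launch a speculative copy on a node that has performed little work.
    if node_work < np.percentile(all_node_work, SLOW_NODE_PCTL):
        return False
    # Speculate only tasks progressing slower than the 25th percentile;
    # among those, LATE picks the one with the largest estimated time left.
    return task_rate < np.percentile(all_task_rates, SLOW_TASK_PCTL)
```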
Experimental Environments
Environments
- Amazon EC2 (200-250 nodes)
- 2 replicas of each chunk in the Hadoop Distributed File System
- Up to 2 mappers and 2 reducers per node (Hadoop default)
Heterogeneity Setup
Heterogeneity is created through contention on resources (multiple VMs sharing a physical host)
[Figures: best-case and average-case results]
Remarks
Contributions
- Analysis of how heterogeneity makes Hadoop's speculation worse than in its native (homogeneous) setting
- LATE speculatively executes, on fast nodes, the tasks that hurt the response time the most
- Considers heterogeneity
Limitations
- No consideration of data locality or fairness
- Tasks may require different amounts of computation
- Speculation reduces the overall throughput of the cloud
Discussion Points
Quincy: Fair Scheduling for Distributed Computing Clusters
Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, Andrew Goldberg (Microsoft Research)
Motivation
- Scheduling concurrent jobs on clusters
- Sharing the cluster among short jobs
- High bandwidth between computers is expensive
- Computations are placed close to their input data
Cluster Architecture
Queue-based scheduler
Min-cost flow
Flow network: a directed graph in which each edge e has a non-negative integer capacity y_e and a cost, and each node has an integer supply
Feasible flow: assigns a non-negative integer flow f_e <= y_e to every edge e such that, for every node v, the incoming flow plus v's supply equals the outgoing flow (a toy instance follows)
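As a concrete illustration, here is a toy scheduling instance expressed as min-cost flow and solved with networkx; the graph shape loosely follows Quincy's (tasks, computers, an unscheduled node U, a sink S), but the costs and sizes are made up.

```python
import networkx as nx

G = nx.DiGraph()
tasks, computers = ["w1", "w2"], ["C1", "C2"]

for t in tasks:
    G.add_node(t, demand=-1)                   # each task emits 1 unit of flow
    G.add_edge(t, "U", capacity=1, weight=10)  # leaving a task unscheduled is costly
    for c in computers:
        # Cheap edge = data-local placement; expensive edge = remote placement.
        cost = 1 if (t, c) in {("w1", "C1"), ("w2", "C2")} else 5
        G.add_edge(t, c, capacity=1, weight=cost)

for c in computers:
    G.add_edge(c, "S", capacity=1, weight=0)   # each computer runs one task
G.add_edge("U", "S", capacity=len(tasks), weight=0)
G.add_node("S", demand=len(tasks))             # the sink absorbs all task flow

flow = nx.min_cost_flow(G)  # flow["w1"]["C1"] == 1 means: schedule w1 on C1
```

Changing an edge cost (e.g., raising the penalty on task-to-U edges) changes the schedule globally, which is how Quincy trades off locality, fairness, and preemption in one optimization.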
Data Locality
Application data is stored on the computing nodes, so scheduling computations close to their data is crucial for performance. Hadoop's placement preference, in order (sketched below):
1. A computer storing one of the replicas
2. A computer on the same rack
3. A random computer
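A minimal sketch of this three-step fallback, assuming hypothetical data structures (a Task record with replica_hosts, and a rack_of map); Hadoop's real scheduler works per heartbeat, but the preference order is the same.

```python
import random
from collections import namedtuple

Task = namedtuple("Task", "replica_hosts")  # hypothetical task record

def pick_computer(task, free_computers, rack_of):
    # 1. Prefer a free computer storing one of the input replicas.
    for c in free_computers:
        if c in task.replica_hosts:
            return c
    # 2. Otherwise a free computer on the same rack as a replica.
    replica_racks = {rack_of[h] for h in task.replica_hosts}
    for c in free_computers:
        if rack_of[c] in replica_racks:
            return c
    # 3. Otherwise any free computer (remote read).
    return random.choice(free_computers)

rack_of = {"c1": "r1", "c2": "r1", "c3": "r2"}
print(pick_computer(Task({"c3"}), ["c1", "c2", "c3"], rack_of))  # c3 (local)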
Coarse-grain sharing: with N computers and J jobs, each job gets at least N/J computers
- Cannot adjust to changes in a varying workload
- Low system throughput and resource utilization under nonuniform workloads

Fine-grain sharing:
- Multiplex all computers in the cluster between jobs
- When a task completes, its computer may be assigned to another job
- A job uses N/J computers at a time, but the set in use varies over its lifetime
[Figure: Quincy flow network, with worker tasks w1-w6, root task r1, unscheduled node U1, cluster aggregator X, racks R1-R2, computers C1-C6, and sink S]
Fairness Policies
- Q: Quincy, unfair sharing, without preemption
- QF: Quincy with fairness, without preemption
- QP: Quincy with preemption, unfair sharing
- QFP: Quincy with fairness and preemption
Evaluation
- Cluster of 240 computers
- Runs the Dryad distributed execution engine
- Applications:
Running Times
Makespan: total time taken by an experiment until the last job completes.
Data transfer
Discussion
CA-NFS: A Congestion-Aware Network File System

Problems in NFS
Congestion of Resources
- It is difficult to represent the congestion of multiple resources as a unified metric

False Assumptions
- Selfish clients want to maximize their own throughput
- All client requests have the same priority
- A client's benefit increases by maximizing throughput, even under congestion
CA-NFS Overview
Congestion-Aware NFS
- Resource usage is monitored and expressed as a price
- Asynchronous operations can be deferred depending on server and client states (i.e., on the price)
- Asynchronous operations should not interfere with on-demand synchronous operations
Pricing Mechanism
Congestion Price
- P_i: price of resource i
- P_max: maximum price, representing a bottlenecked resource
- u_i: utilization of resource i (0 < u_i < 1)
- k_i: performance degradation parameter of the congested resource
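A sketch of an exponential congestion-pricing function consistent with the symbols above; the exact functional form CA-NFS uses is in the paper, so treat this particular formula as an illustrative assumption.

```python
def price(u_i, k_i, p_max):
    """Price of resource i at utilization u_i (0 < u_i < 1).

    Grows slowly while the resource is lightly used and sharply as
    u_i -> 1; k_i controls how quickly performance degrades under
    congestion. price(0) == 0 and price(1) == p_max for any k_i > 1.
    """
    return p_max * (k_i ** u_i - 1) / (k_i - 1)

print(price(0.50, 64, 100))  # ~11: a half-utilized resource is still cheap
print(price(0.95, 64, 100))  # ~81: a near-saturated resource approaches P_max
```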
Pricing
- The price is increased or decreased according to the resource usage of the client and the server
- An increased price indicates congestion of a resource

Scheduling
- By comparing the advertised server price with its local price, a client schedules its asynchronous operations (a minimal sketch follows)
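A minimal sketch of that comparison, assuming the two prices are already computed; the deferral/acceleration actions are modeled with plain lists, and the function name is illustrative.

```python
deferred, flushed = [], []

def schedule_async_write(write, server_price, client_price):
    if server_price > client_price:
        # Server is the more congested side: hold the write in client
        # memory (write deferral) and send it when the price drops.
        deferred.append(write)
    else:
        # Client resources are the scarcer ones: flush to the server
        # immediately (write acceleration) to free client memory.
        flushed.append(write)

schedule_async_write(b"block-1", server_price=80, client_price=30)  # deferred
schedule_async_write(b"block-2", server_price=10, client_price=30)  # flushed
```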
Server Price
- Based on server memory, disk, and network utilization

Client Price
- Based on client memory

Write Acceleration
- Clients flush writes immediately when the server load is low
- Saves client memory (more cache hits) and reduces write latency

Write Deferral
- Clients keep writes in local memory when the server load is high
- Saves server memory and I/O (disk and memory)
- Increases write latency and client memory usage
Resource Utilization Metrics
- CPU: utilization at a given time
- Network: average bandwidth over hundreds of milliseconds
- Server disk: sampling the length of the device's dispatch queue at regular small intervals (sketched below)
- Memory: projected cache hit rates calculated from the distribution of read requests
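As one concrete example of the disk metric, a sketch of sampling the dispatch queue on Linux, where /sys/block/<dev>/inflight reports in-flight read and write requests; the device name and sampling parameters are placeholders.

```python
import time

def avg_dispatch_queue(dev="sda", samples=100, interval=0.01):
    # Sample the number of in-flight requests at regular small intervals
    # and average them as a proxy for disk utilization.
    total = 0
    for _ in range(samples):
        with open(f"/sys/block/{dev}/inflight") as f:
            reads, writes = map(int, f.read().split())
        total += reads + writes
        time.sleep(interval)
    return total / samples
```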
Discussion Points
Limitations
- Not scalable: works only for a small number of clients
- The price can fluctuate
- Multiple clients on separate VMs can run on the same node
- Multiple servers can run on separate nodes holding replicated data