
Cloud Scheduling

Improving MapReduce Performance in Heterogeneous Environments, M. Zaharia et al., OSDI 2008
Quincy: Fair Scheduling for Distributed Computing Clusters, M. Isard et al., SOSP 2009
CA-NFS: A Congestion-Aware Network File System, A. Batsakis et al., FAST 2009

Presented by Ghazale Hosseinabadi and Wucherl Yoo



Improving MapReduce Performance in Heterogeneous Environments


Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, Ion Stoica - OSDI 2008


Motivation - MapReduce

Figures are borrowed from Google

Motivation - Hadoop Scheduling

Straggler Task
- A task that performs poorly due to faulty hardware or misconfiguration

Speculative Execution
- Goal: minimize job response time
- Run a speculative copy (backup task) of each straggler

Assumptions
- Cluster nodes are homogeneous
- Tasks progress at a constant rate
- The copy, sort, and reduce phases each account for the same amount (1/3) of a reduce task's work
- Tasks finish in waves, so a low progress score indicates a straggler

Problem - Broken Assumptions

Heterogeneous Cluster Nodes
- Multiple VMs on the same physical host
- Multiple hardware generations

Non-Linear Progress of Tasks
- The copy phase of a reduce task is the slowest due to network communication

Problems
- Tasks from different hardware generations run concurrently
- Too many speculative tasks can run (up to 80% of reducers)
- The wrong (fast, new) task can be selected as a straggler
- A speculative task can be assigned to a slow node

LATE Scheduler

Longest Approximate Time to End

Speculate (backup) the task with the largest estimated time left
Progress Rate = Progress Score / Execution Time

Estimated Time Left = (1 - Progress Score) / Progress Rate

Figures are borrowed from the authors' slides
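To make the estimate concrete, here is a minimal Python sketch of the two formulas above; the function name and the units (minutes) are illustrative, not from the paper's code.

```python
# Minimal sketch of LATE's time-left estimate; names and units are
# illustrative, not the authors' code.

def estimated_time_left(progress_score: float, execution_time: float) -> float:
    """Estimate a task's remaining time from its past progress.

    progress_score: fraction of the task completed so far (0..1).
    execution_time: how long the task has been running (minutes).
    """
    progress_rate = progress_score / execution_time   # progress per minute
    return (1.0 - progress_score) / progress_rate

# A task that is 66% done after 2 minutes has rate 0.33/min,
# so its estimated time left is (1 - 0.66) / 0.33 ~ 1 minute.
print(estimated_time_left(0.66, 2.0))  # ~1.03
```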

Progress Rate Example

Hadoop's heuristic: speculate a task whose progress rate is far below the average.

A job with 3 tasks on 3 nodes:
- Node 1: 1 task/min
- Node 2: 3x slower
- Node 3: 1.9x slower

[Figure: task progress over time (minutes) on the three nodes]
Figures are borrowed from the authors' slides

Progress Rate Example (continued)

A job with 5 tasks on 3 nodes:
- Node 2's task: time left 1 min, progress rate = 0.33
- Node 3's task: time left 1.8 min, progress rate = 0.53

Node 2 has the slowest progress rate, so Hadoop picks its task, but it should have speculated Node 3's task, which will actually finish later.

[Figure: task progress over time (minutes)]
Figures are borrowed from the authors' slides

LATE Example

A job with 5 tasks on 3 nodes:
- Node 2's task: progress = 66%, estimated time left = (1 - 0.66) / (1/3) ~ 1 min
- Node 3's task: progress = 5.3%, estimated time left = (1 - 0.053) / (1/1.9) ~ 1.8 min

LATE picks Node 3's task, the one with the largest estimated time left.

[Figure: task progress over time (minutes)]
Figures are borrowed from the authors' slides

Rationales in the LATE Scheduler

The Past Progress of a Task Predicts Its Future Progress
- Assuming a constant progress rate may be incorrect
- The impact of heterogeneity shows up in a task's past progress

Looking Forward
- Try to speculate the tasks that improve response time the most

Details on the LATE Scheduler

Prioritize Tasks to Speculate
- Rank by how much each task hurts the response time (largest estimated time left)
- SlowTaskThreshold (25th percentile): only tasks with a sufficiently low progress rate are candidates

Select a Fast Node to Run On
- Based on total work performed
- SlowNodeThreshold (25th percentile): do not launch speculative tasks on slow nodes

Cap the Number of Speculative Tasks
- SpeculativeCap (20%): avoids unnecessary speculation, limiting contention and throughput loss

The Chosen Thresholds Are Not Sensitive
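Putting these rules together, here is a hedged Python sketch of the speculation decision; the data structures and threshold handling are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of LATE's speculation decision (assumed data
# structures; not the authors' implementation).
import numpy as np

SPECULATIVE_CAP = 0.20   # max fraction of slots running backup tasks
SLOW_TASK_PCTL = 25      # speculate only tasks below this progress-rate percentile
SLOW_NODE_PCTL = 25      # never run backups on nodes below this work percentile

def pick_task_to_speculate(tasks, node, all_nodes, running_backups, total_slots):
    """tasks: running tasks, each a dict with 'progress' (0..1) and 'runtime'.
    node / all_nodes: dicts with 'total_work' performed so far."""
    if running_backups >= SPECULATIVE_CAP * total_slots:
        return None  # speculation budget exhausted
    work = [n["total_work"] for n in all_nodes]
    if node["total_work"] < np.percentile(work, SLOW_NODE_PCTL):
        return None  # the requesting node is itself slow
    rates = [t["progress"] / t["runtime"] for t in tasks]
    slow_cutoff = np.percentile(rates, SLOW_TASK_PCTL)
    candidates = [t for t, r in zip(tasks, rates) if r < slow_cutoff]

    def time_left(t):
        rate = t["progress"] / t["runtime"]
        return (1.0 - t["progress"]) / rate

    # Speculate the candidate with the largest estimated time left.
    return max(candidates, key=time_left, default=None)
```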

Experimental Environments

Environments
- Amazon EC2 (200-250 nodes)
- 2 replicas of each chunk on the Hadoop Distributed File System
- Up to 2 mappers and 2 reducers per node (Hadoop default)

Heterogeneity Setup
- Assign a varying number of VMs to each node, creating contention on resources
- A background job intentionally creates stragglers

EC2 Sort with Stragglers

[Figure: normalized response time (worst, best, average) for No Backups, Hadoop Native, and the LATE Scheduler]

Average 58% speedup over native Hadoop, 220% over no backups

Figures are borrowed from the authors' slides

EC2 Sort without Stragglers

[Figure: normalized response time (worst, best, average) for No Backups, Hadoop Native, and the LATE Scheduler]

Average 27% speedup over native Hadoop, 31% over no backups

Figures are borrowed from the authors' slides

Remarks

Contributions
- Analysis of how heterogeneity breaks Hadoop's assumptions and makes its native speculation perform poorly
- LATE speculatively executes the tasks that hurt response time the most, and runs the backups on fast nodes
- Explicitly considers heterogeneity

Limitations
- No consideration of data locality or fairness
- Tasks may require different amounts of computation
- Speculation reduces the overall throughput of the cloud

Discussion Points

- The evaluation workload is chosen in favor of LATE; LATE does not always improve performance
- Would preemption of tasks help?
- How many tasks should be assigned to a node?

Quincy: Fair Scheduling for Distributed Computing Clusters

Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, Andrew Goldberg (Microsoft Research)

Outline

- Cluster architecture
- Queue-based scheduling
- Min-cost flow
- Flow-based scheduling
- Fairness and locality constraints
- Evaluation

Motivation

- Scheduling concurrent jobs on clusters, and sharing the cluster among short jobs
- Examples: a 250-computer Microsoft cluster, academic clusters
- Goal: fair sharing
- Application data is stored on the computers themselves
- High bandwidth between computers is expensive, so computations are placed close to their input data
- Fairness and locality conflict

Cluster Architecture

Queue-based scheduler

Min-Cost Flow

A flow network is a directed graph in which each edge e has a capacity y_e and a per-unit cost p_e, and each node v has an integer supply ε_v, with Σ_v ε_v = 0.

A feasible flow assigns each edge a non-negative integer flow f_e <= y_e such that, at every node v, the supply plus the incoming flow equals the outgoing flow.

A min-cost feasible flow is a feasible flow that minimizes Σ_e f_e * p_e.

Data Locality

- Application data is stored on the computing nodes, so scheduling computations close to their data is crucial for performance
- Hadoop's placement preference, in order: a computer storing one of the replicas, then a computer on the same rack, then a random computer

Fairness in a Shared Cluster

- N computers, J jobs: each job should get at least N/J computers
- Fine-grain sharing: multiplex all the computers in the cluster between the jobs; when a task completes, its computer may be assigned to another job; a job uses N/J computers at a time, but the set in use varies over its lifetime
- Why not static allocation? Workloads vary, a static split cannot adjust to workload changes, and it yields low system throughput and resource utilization under nonuniform workloads

Encoding Scheduling as a Flow Network

[Figure: the scheduling flow network. Task nodes (w1-w6) connect to preferred computers (C1-C6), rack aggregators (R1, R2), the cluster aggregator (X), and per-job unscheduled nodes (U1); computers and unscheduled nodes drain into the sink S.]
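As a toy illustration of the min-cost-flow encoding (not Quincy's actual graph construction, and with invented costs), the sketch below routes two tasks onto two computers using networkx:

```python
# Toy min-cost-flow scheduling in the spirit of Quincy, using networkx.
# Node names and edge costs are invented for illustration.
import networkx as nx

G = nx.DiGraph()
# Each task supplies one unit of flow; the sink demands all of it.
for t in ("w1", "w2"):
    G.add_node(t, demand=-1)
G.add_node("S", demand=2)

# Task -> computer edges: cheap where the task's data lives, pricier
# elsewhere; very expensive to stay unscheduled (node U).
G.add_edge("w1", "C1", capacity=1, weight=1)   # w1's data is on C1
G.add_edge("w1", "C2", capacity=1, weight=5)
G.add_edge("w2", "C1", capacity=1, weight=5)
G.add_edge("w2", "C2", capacity=1, weight=1)   # w2's data is on C2
for t in ("w1", "w2"):
    G.add_edge(t, "U", capacity=1, weight=100)

# Each computer has one slot; all flow drains to the sink.
for c in ("C1", "C2"):
    G.add_edge(c, "S", capacity=1, weight=0)
G.add_edge("U", "S", capacity=2, weight=0)

flow = nx.min_cost_flow(G)
print(flow["w1"], flow["w2"])  # each task routed to its data-local computer
```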

Fairness Policies

- Q: Quincy, unfair sharing, without preemption
- QF: Quincy with fairness, without preemption
- QP: Quincy with preemption, unfair sharing
- QFP: Quincy with fairness and preemption

Evaluation

- Cluster of 240 computers running the Dryad distributed execution engine
- Applications: Sort, DatabaseJoin, PageRank, WordCount, Prime

Running Times

Makespan: the total time taken by an experiment until the last job completes.

Data Transfer

Discussion

- Correlated constraints
- Multi-dimensional capacities: CPU, disk I/O, memory
- Fair sharing of the network and other resources

CA-NFS: A Congestion-Aware Network File System


Alexandros Batsakis, NetApp and Johns Hopkins University; Randal Burns, Johns Hopkins University; Arkady Kanevsky, James Lentini, Thomas Talpey FAST 2009


Problems in NFS

Congestion of Resources
- Selfish clients try to maximize their own throughput
- It is difficult to represent the congestion of multiple resources as a single unified metric

False Assumptions
- All client requests have the same priority
- Clients benefit from maximizing throughput even under congestion

CA-NFS Overview

Congestion-Aware NFS

Measure Congestion for Multiple Resources
- Resource usage is monitored and expressed as a price

Schedule Client Operations to React to Congestion
- Asynchronous operations can be deferred or accelerated depending on server and client state (that is, on the price)
- Asynchronous operations should not interfere with on-demand synchronous operations

Pricing Mechanism

Congestion Price
- P_i: price of resource i; P_max: maximum price, representing a bottlenecked resource
- u_i: utilization of resource i (0 < u_i < 1)
- k_i: performance-degradation parameter of the congested resource

Figures are borrowed from the authors' slides
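The pricing function itself appears only in the authors' figure, which did not survive extraction. As a hedged sketch, an exponential curve over the parameters above captures the intended behavior: the price rises slowly at low utilization and sharply near saturation. The exact functional form here is an assumption.

```python
# Hedged sketch of a CA-NFS-style congestion price. The exponential
# form is an assumption standing in for the authors' figure; it uses
# the slide's parameters: u in (0, 1), k > 1, and P_max.
def congestion_price(u: float, k: float, p_max: float) -> float:
    """Price grows slowly at low utilization, steeply near u = 1."""
    return p_max * (k**u - 1.0) / (k - 1.0)

# A larger k makes the resource look cheap until it is nearly saturated:
for u in (0.2, 0.5, 0.8, 0.95):
    print(u, round(congestion_price(u, k=64.0, p_max=100.0), 1))
```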

Scheduling

Pricing
- Prices increase or decrease according to client and server resource usage
- An increased price signals congestion of a resource

Client Reaction to Prices
- By comparing the advertised server price with its local price, the client schedules asynchronous operations:
  - Accelerate or defer asynchronous writes
  - Issue read-ahead aggressively or prudently

Pricing Example (1)

[Figure borrowed from the authors' slides]

Pricing Example (2)

[Figure borrowed from the authors' slides]

Scheduling Asynchronous Writes

Server Price
- Based on server memory, disk, and network utilization

Client Price
- Based on client memory

Write Acceleration
- Clients flush writes immediately when server load is low
- Saves client memory (more cache hits) and reduces write latency

Write Deferral
- Clients keep writes in local memory when server load is high
- Saves server memory and I/O (disk and memory)
- Increases write latency and client memory usage
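A minimal sketch of the accelerate-or-defer decision, assuming the client simply compares the advertised server price against its locally computed price; the function name and comparison rule are illustrative, not from the paper:

```python
# Illustrative accelerate-or-defer decision for asynchronous writes,
# assuming client and server each price their own resources.
def schedule_async_write(server_price: float, client_price: float) -> str:
    """Compare the server's advertised price with the client's local price.

    A cheap server (low price) means low load: flush now, freeing client
    memory and reducing write latency. An expensive server means
    congestion: buffer the write locally for now.
    """
    if server_price <= client_price:
        return "accelerate"  # flush writes to the server immediately
    return "defer"           # keep writes in client memory

print(schedule_async_write(server_price=12.0, client_price=40.0))  # accelerate
print(schedule_async_write(server_price=85.0, client_price=40.0))  # defer
```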

Examples of Resource Monitoring

- CPU: utilization at a given time
- Client and server network: average bandwidth over hundreds of milliseconds
- Server disk: sampling the length of the device dispatch queue at small regular intervals
- Client and server memory: projected cache hit rates, calculated from the distribution of read requests
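For a flavor of how such utilizations might be sampled and fed into the pricing function, here is a hedged sketch using the psutil library. The mapping from raw statistics to the utilizations u_i is an assumption; CA-NFS itself instruments the NFS client and server directly rather than using psutil.

```python
# Hedged sketch: sampling resource utilizations with psutil and turning
# them into congestion prices. CA-NFS instruments NFS internals directly;
# psutil and this mapping are stand-ins for illustration.
import psutil

def sample_utilizations() -> dict:
    return {
        "cpu": psutil.cpu_percent(interval=0.5) / 100.0,    # CPU utilization
        "memory": psutil.virtual_memory().percent / 100.0,  # memory pressure proxy
    }

def local_price(utilizations, k=64.0, p_max=100.0):
    # Price each resource with the exponential curve from the earlier
    # sketch; the overall price is set by the bottleneck (max) resource.
    def price(u):
        return p_max * (k**u - 1.0) / (k - 1.0)
    return max(price(u) for u in utilizations.values())

print(local_price(sample_utilizations()))
```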

Experimental Results - File Server

[Figure: file-server benchmark results, borrowed from the authors' slides]

Discussion Points

Limitations
- Not scalable: works only for a small number of clients
- Prices can fluctuate

Questions
- Would CA-NFS apply to utility computing in the cloud, where multiple clients on separate VMs can run on the same node and multiple servers run on separate nodes with replicated data?
- Is pricing sufficient to unify heterogeneous resources?
- How could the single-server topology be extended for scalability?
- How could fairness or prioritized service be provided?
