Job Aware Scheduling Algorithm For MapReduce Framework
Abstract—The MapReduce framework has received wide acclaim over the past few years for large scale computing, and has become a standard paradigm for batch oriented workloads. As the adoption of this paradigm has increased rapidly, scheduling of MapReduce jobs has become a problem of great interest in the research community. We propose an approach which tries to maintain harmony among the jobs running on the cluster, and in turn decrease their runtime. In our model, the scheduler is made aware of the different types of jobs running on the cluster. The scheduler tries to allocate a task on a node only if the incoming task does not affect the tasks already running on that node. From the list of available pending tasks, our algorithm selects the one that is most compatible with the tasks already running on that node. We bring up heuristic and machine learning based solutions to our approach and try to maintain a resource balance on the cluster by not overloading any of the nodes, thereby reducing the overall runtime of the jobs. The results show a runtime saving of around 21% in the case of the heuristic based approach and around 27% in the case of the machine learning based approach when compared to Yahoo's Capacity scheduler.

Keywords-Cloud Computing; Job Scheduling; Machine Learning; MapReduce;
I. INTRODUCTION

Cloud computing has emerged as an advanced form of distributed computing, parallel processing and grid computing. It is a new and promising paradigm that delivers IT services as computing utilities. As Clouds are designed to provide services to external users, providers need to be compensated for sharing their resources and capabilities [1]. Since these computing resources are finite, there is a need for efficient resource allocation algorithms for cloud platforms. Efficient resource/data allocation would help reduce the number of virtual machines used, and in turn reduce the carbon footprint, leading to substantial energy savings [2]. Scheduling in MapReduce can be seen as analogous to this problem. If the scheduling algorithms are designed intelligently enough to avoid overloading any node and to utilize most of the resources on a particular node, the runtime of the jobs can be lowered considerably, again leading to substantial energy savings. This paper deals with the scheduling of jobs on a MapReduce cluster without degrading their runtime, while still maintaining the cost savings that providers expect.

Recently, MapReduce [3] has become a standard programming model for large scale data analysis. It has seen tremendous growth in recent years, especially for text indexing, log processing, web crawling, data mining, machine learning, etc. [4]. MapReduce is best suited for batch-oriented jobs, which tend to run for hours to days over a large dataset on the limited resources of a cluster. Hence, an effective scheduling mechanism is vital to make sure that the resources are used judiciously.

In this paper, we present an approach that takes into account the interoperability of MapReduce tasks running on a node of the cluster. Our algorithms try to ensure that a task running on a node does not affect the performance of the other tasks. This requires the scheduler to be aware of the resource usage information of each task running on the cluster. We present a heuristic approach as well as a machine learning based approach for task scheduling. Our algorithm selects, from the list of pending tasks, the task that is most compatible with the tasks already running on the node.

II. PROPOSED ALGORITHMS

In our approach, we monitor resource usage down to the level of each task and each node in the cluster, since the performance of tasks and nodes is vital in any distributed environment. Our algorithm tries to maintain stability at the node and cluster level through intelligent scheduling of the tasks. The uniqueness of this scheduler lies in its ability to take into account the resource usage pattern of a job before its tasks are scheduled on the cluster.

A. Task Characteristics

Based on the resources a task uses, it can be broadly classified as cpu-intensive, memory-intensive, disk-intensive or network-intensive. In a practical scenario, it might not be possible to categorize a task as belonging to exactly one of these categories. A task will have attributes of more than one of the categories mentioned above, and to describe its true nature perfectly it should be characterized as a weighted linear combination of parameters from each of these categories. We represent the true and complete nature of a task through its TaskVector $\vec{T}_k$, defined as:

$$\vec{T}_k = E_{cpu}\, e_1 + E_{mem}\, e_2 + E_{disk}\, e_3 + E_{nw}\, e_4 \quad (1)$$

where $E_x$ (with $x \in \{cpu, mem, disk, nw\}$) is the resource usage pattern for the cpu, memory, disk and network of a particular task.
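As a concrete illustration of equation (1), the sketch below models a TaskVector as a plain four-component vector in Java, together with the vector operations the later equations need. This is a minimal sketch; the class and method names are illustrative choices, not code from the implementation.

/**
 * Minimal sketch of a TaskVector per equation (1): a task's resource usage
 * pattern along four axes (cpu, memory, disk, network), each component
 * assumed to be normalized to a 0-100 scale as described in the text.
 */
public class TaskVector {
    public final double cpu, mem, disk, nw;

    public TaskVector(double cpu, double mem, double disk, double nw) {
        this.cpu = cpu; this.mem = mem; this.disk = disk; this.nw = nw;
    }

    /** Component-wise sum, used later to build the compound vector of equation (2). */
    public TaskVector add(TaskVector o) {
        return new TaskVector(cpu + o.cpu, mem + o.mem, disk + o.disk, nw + o.nw);
    }

    /** Component-wise difference, used for availability (7) and unused resources (8). */
    public TaskVector subtract(TaskVector o) {
        return new TaskVector(cpu - o.cpu, mem - o.mem, disk - o.disk, nw - o.nw);
    }

    /** Euclidean magnitude |T|. */
    public double magnitude() {
        return Math.sqrt(cpu * cpu + mem * mem + disk * disk + nw * nw);
    }

    /** Dot product, needed for the cosine similarity of equation (9). */
    public double dot(TaskVector o) {
        return cpu * o.cpu + mem * o.mem + disk * o.disk + nw * o.nw;
    }
}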
1) Machine Learning based Algorithm: The classifier decides whether a given task $task_k$ is compatible with a TaskTracker $TT_i$, based on a feature set $F$. By Bayes' theorem, the posterior probability that $task_k$ is compatible given $F$ is

$$P(task_k = compatible \mid F) = \frac{P(F \mid task_k = compatible) \times P(task_k = compatible)}{P(F)} \quad (3)$$

The quantity $P(F \mid task_k = compatible)$ thus becomes

$$P(F \mid task_k = compatible) = P(f_1, f_2, \ldots, f_4 \mid task_k = compatible) \quad (4)$$

where $f_1, f_2, \ldots, f_4$ are the features of the classifier ($\{\Phi, \Sigma, \vec{T}_k, \vec{T}_{compound}(i)\}$). We assume that all the features are independent of each other (the Naive-Bayes assumption). Thus,

$$P(F \mid task_k = compatible) = \prod_{j=1}^{4} P(f_j \mid task_k = compatible) \quad (5)$$

The above equation forms the foundation of learning in our classifier. The classifier uses the results of decisions made in the past to make the current decision. This is achieved by keeping track of past decisions and their outcomes in the form of posterior probabilities.

If this posterior probability is greater than or equal to the administrator configured Minimum Acceptance Probability $C_{ml}$, then $task_k$ is considered for scheduling on $TT_i$ (the Task Assignment algorithm in Figure 1).

1: TaskAssignment(task_k, TT)
2: Φ = getHardwareSpecifications(TT)
3: Σ = getNetworkDistance(task_k, TT)
4: T_k = getTaskVector(task_k)
5: T_compound = getVectorsOfTasksOnTaskTracker(TT)
6: compatibility = classifier(task_k, {Φ, Σ, T_k, T_compound})
7: if compatibility ≥ C_ml then
8: return TRUE
9: else
10: return FALSE
11: end if

Figure 1. Task Assignment algorithm following the machine learning approach

Our algorithm trains the classifier incrementally after every task completes. Every TaskTracker monitors itself for race conditions on its resources by checking whether it is overloaded, and after the completion of every task it sends feedback corresponding to the previous decision made for it. If negative feedback is received from a TaskTracker pertaining to a previous task allocation, the classifier is re-trained at the JobTracker (Figure 2) to avoid such mistakes in the future. This helps keep the classifier model updated with the current state of the cluster.

Figure 2. Task Selection algorithm: the received task is tested for compatibility on an incremental Naive-Bayes classifier and is accepted if the posterior probability is greater than or equal to C_ml.
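To make the incremental learning concrete, the following is a minimal sketch of such a classifier in Java. It assumes the four features have been discretized into integer bins; the class, its hand-rolled counts, and the Laplace smoothing are illustrative assumptions, not the paper's code (the reference list includes Weka [7], so the actual implementation may well have used an updateable Naive-Bayes from that toolkit).

import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch of an incremental Naive-Bayes compatibility test in the
 * spirit of equations (3)-(5). Class 1 = compatible, class 0 = incompatible.
 */
public class IncrementalNaiveBayes {
    private final int numFeatures;
    // counts[c][j] maps a bin of feature j to its observation count under class c.
    private final Map<Integer, Integer>[][] counts;
    private final int[] classTotals = new int[2];

    @SuppressWarnings("unchecked")
    public IncrementalNaiveBayes(int numFeatures) {
        this.numFeatures = numFeatures;
        counts = new HashMap[2][numFeatures];
        for (int c = 0; c < 2; c++)
            for (int j = 0; j < numFeatures; j++)
                counts[c][j] = new HashMap<>();
    }

    /** Unnormalized P(class = c | f): prior times product of likelihoods, per (3) and (5). */
    private double score(int c, int[] f) {
        double p = (classTotals[c] + 1.0) / (classTotals[0] + classTotals[1] + 2.0);
        for (int j = 0; j < numFeatures; j++) {
            int n = counts[c][j].getOrDefault(f[j], 0);
            p *= (n + 1.0) / (classTotals[c] + 2.0); // Laplace-smoothed likelihood
        }
        return p;
    }

    /** Posterior P(compatible | f), normalized over both classes (the P(F) denominator). */
    public double posteriorCompatible(int[] f) {
        double sYes = score(1, f), sNo = score(0, f);
        return sYes / (sYes + sNo);
    }

    /** Incremental training step, e.g. on feedback received from a TaskTracker. */
    public void update(int[] f, boolean compatible) {
        int c = compatible ? 1 : 0;
        classTotals[c]++;
        for (int j = 0; j < numFeatures; j++)
            counts[c][j].merge(f[j], 1, Integer::sum);
    }
}

A posterior at or above C_ml (0.45 in Table I) corresponds to accepting the task, mirroring the threshold test in Figure 1, while update() is the re-training step triggered by TaskTracker feedback.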
2) Heuristic-based Algorithm: In this section we present a heuristic-based algorithm to decide whether a task is compatible with a particular TaskTracker. Whenever the Task Selection algorithm queries for the compatibility of a task on a TaskTracker, this algorithm runs the Task Compatibility Test.

Task Compatibility Test: The compound TaskVector of the tasks currently running on the TaskTracker $TT_i$ is given by equation (2):

$$\vec{T}_{compound}(i) = \vec{T}_1 + \vec{T}_2 + \ldots + \vec{T}_n \quad (2)$$

Each TaskVector $\vec{T}_k$ in the Task Queue, for all $k \in [1, n]$, is represented as follows:

$$\vec{T}_k = E_{cpu}\, e_1 + E_{mem}\, e_2 + E_{disk}\, e_3 + E_{nw}\, e_4 \quad (6)$$

We obtain $\vec{T}_{availability}$ by calculating the difference between the total resources (denoted by $\vec{T}_R$, a vector with the maximum magnitude, i.e. 100, for each component) and $\vec{T}_{compound}(i)$:

$$\vec{T}_{availability} = \vec{T}_R - \vec{T}_{compound} \quad (7)$$

Assuming that $\vec{T}_k$ is scheduled on the TaskTracker, the amount of unused resources on it is calculated through the following equation:

$$\vec{T}_{unused} = \vec{T}_{availability} - \vec{T}_k \quad (8)$$
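As a concrete sketch of equations (2), (7) and (8), the following illustrative helper builds on the TaskVector class sketched in Section II-A; the names and the fixed 100-per-component total are assumptions drawn from the text, not the actual implementation.

import java.util.List;

/** Illustrative helper computing the vectors of equations (2), (7) and (8). */
public class ResourceAccounting {
    /** Total resources T_R: each component at its maximum magnitude of 100. */
    static final TaskVector TOTAL_RESOURCES = new TaskVector(100, 100, 100, 100);

    /** Equation (2): compound vector of the tasks running on a TaskTracker. */
    static TaskVector compound(List<TaskVector> runningTasks) {
        TaskVector sum = new TaskVector(0, 0, 0, 0);
        for (TaskVector t : runningTasks) sum = sum.add(t);
        return sum;
    }

    /** Equation (7): availability = T_R - T_compound. */
    static TaskVector availability(List<TaskVector> runningTasks) {
        return TOTAL_RESOURCES.subtract(compound(runningTasks));
    }

    /** Equation (8): unused = availability - the candidate task's vector. */
    static TaskVector unused(List<TaskVector> runningTasks, TaskVector candidate) {
        return availability(runningTasks).subtract(candidate);
    }
}

Note that a component of the unused vector going negative directly signals the overload case discussed next.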
Though there are many similarity/distance calculation functions, we have chosen cosine similarity because it is easy to measure the similarity or dissimilarity of the resource usage patterns of tasks through the angle between their vectors. Since the resource usage patterns of $\vec{T}_k$ and $\vec{T}_{compound}(i)$ should be dissimilar (the tasks should not conflict with each other), the patterns of $\vec{T}_k$ and $\vec{T}_{availability}(i)$ should be similar. The angle (in degrees) between the vectors is given by

$$angle = \arccos \frac{\vec{T}_k \cdot \vec{T}_{availability}(i)}{|\vec{T}_k| \, |\vec{T}_{availability}(i)|} \quad (9)$$

The smaller the angle, the lower the dissimilarity between $\vec{T}_k$ and $\vec{T}_{availability}(i)$ (and the greater the dissimilarity between $\vec{T}_k$ and $\vec{T}_{compound}(i)$), and hence the better the compatibility. At the same time, the smaller $|\vec{T}_{unused}|$ is, the higher the resource utilization on that particular TaskTracker. A negative $\vec{T}_{unused}$ (when the resource requirement pattern of $\vec{T}_k$ exceeds the available resources $\vec{T}_{availability}(i)$) indicates that the task would overload the node; hence, the lower the $|\vec{T}_{unused}|$ value, the better.

Bringing both criteria into a single equation, we get

$$value_{combination} = \alpha \times angle + (1 - \alpha) \times |\vec{T}_{unused}| \quad (10)$$

The values $\alpha$ and $C_{heu}$ are administrator configured values which can be set based on the requirements of a particular cluster. If $value_{combination}$ is less than or equal to the Maximum Value for Acceptance $C_{heu}$, then $\vec{T}_k$ is considered for scheduling on $TT_i$ (Figure 3).

1: TaskAssignment(task_k, TT)
2: T_k = getTaskVector(task_k)
3: T_compound = getVectorsOfTasksOnTaskTracker(TT)
4: T_availability = T_R - T_compound
5: T_unused = T_availability - T_k
6: angle = arccos((T_k · T_availability) / (|T_k| |T_availability|))
7: value_combination = α × angle + (1 - α) × |T_unused|
8: if value_combination ≤ C_heu then
9: return TRUE
10: else
11: return FALSE
12: end if

Figure 3. Task Assignment algorithm following the heuristic-based approach
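Putting equations (9) and (10) together, a minimal sketch of the Task Compatibility Test of Figure 3 could look as follows. The threshold and weight values mirror Table I, but all names are illustrative, reusing the TaskVector sketch from Section II-A.

/** Illustrative sketch of the heuristic Task Compatibility Test (Figure 3). */
public class HeuristicCompatibility {
    static final double ALPHA = 0.5;   // weight between angle and unused resources (Table I)
    static final double C_HEU = 60.0;  // Maximum Value for Acceptance (Table I)

    /** Returns true if the candidate task is considered compatible with the TaskTracker. */
    static boolean isCompatible(TaskVector candidate, TaskVector availability) {
        TaskVector unused = availability.subtract(candidate); // equation (8)

        // Equation (9): angle in degrees between candidate and availability vectors.
        double cosine = candidate.dot(availability)
                / (candidate.magnitude() * availability.magnitude());
        double angle = Math.toDegrees(Math.acos(cosine));

        // Equation (10): weighted combination of the angle and |T_unused|.
        double value = ALPHA * angle + (1 - ALPHA) * unused.magnitude();
        return value <= C_HEU;
    }
}

For example, with α = 0.5, an angle of 30 degrees and |T_unused| = 40 give value_combination = 0.5 × 30 + 0.5 × 40 = 35, which is below C_heu = 60, so the task would be accepted.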
III. EVALUATION AND RESULTS

We have implemented our algorithms as scheduler plugins for the MapReduce framework, customizing the method 'List<Task> assignTasks(TaskTrackerStatus)' of the org.apache.hadoop.mapred.TaskScheduler class. Upon job submission, our algorithm calculates the TaskVector of the job by running a sample of its map/reduce tasks. The TaskVector is constructed separately for the 'map' and 'reduce' phases; the 'sort' and 'shuffle' phases are merged into the reduce phase. In our experiments, five map/reduce tasks are run and the average amongst them is taken to find the $\vec{T}_k$. The TaskVector is captured by monitoring the resource usage related to cpu, memory, disk and network through the 'atop' [8] utility.
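For orientation, the following skeleton shows the shape of such a plugin against the 0.20-era Hadoop API quoted above. The compatibility helpers mentioned in the comments are placeholders for the algorithms of Section II, not Hadoop APIs, and this is a sketch rather than the actual plugin.

package org.apache.hadoop.mapred;
// Scheduler plugins of that era live in this package so they can access
// package-private types such as TaskScheduler, Task and JobInProgress.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Skeleton of a job aware scheduler plugin in the shape the text describes. */
public class JobAwareTaskScheduler extends TaskScheduler {

    @Override
    public List<Task> assignTasks(TaskTrackerStatus tracker) throws IOException {
        List<Task> assigned = new ArrayList<Task>();
        // Hypothetical flow: walk the pending-task queue and assign the task
        // that passes the compatibility test for this TaskTracker, e.g.
        //   if (isCompatible(taskVector(t), availability(tracker))) { ... }
        return assigned;
    }

    @Override
    public Collection<JobInProgress> getJobs(String queueName) {
        return new ArrayList<JobInProgress>(); // queue bookkeeping elided in this sketch
    }
}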
A. Testing Environment

Our testing environment consisted of 12 nodes (one master and 11 slaves). The master node was designated to run the JobTracker and NameNode [9], while the slaves ran TaskTrackers and DataNodes (the slave daemons of HDFS). The nodes were heterogeneous, with Intel Core 2 Duo 2.4 GHz processors, a single hard disk with capacity ranging from 80 GB to 2 TB, and 2 or 4 GB of RAM. The nodes were interconnected with a gigabit ethernet switch. All the nodes were installed with the CentOS release 5.5 (Final) operating system and Java 1.6.0_26.

B. Experiments Description

There are very few schedulers which are even minimally resource aware, of which the Fair [10] and Capacity [11] schedulers are widely acclaimed. The Capacity scheduler, developed by Yahoo, is one of the most successful schedulers currently used in the industry for real-world applications, with resource division at the cluster level. Hence, we have chosen it as the baseline for testing our algorithms. Table I provides the Hadoop and algorithm parameters used for testing. Experiments were conducted on jobs like terasort, grep, web crawling, wordcount and video file format conversion, which are close to real-world applications.

Table I. HADOOP AND ALGORITHM PARAMETERS

HDFS block size: 64 MB
Speculative execution: enabled
Heartbeat interval: 4 seconds
Number of map task slots per node: 4
Number of reduce task slots per node: 2
Replication factor: 3
Number of queues in Capacity scheduler: 3
Cluster resources allocated to each queue in Capacity scheduler: 33.3%
alpha (α): 0.5
Maximum Value for Acceptance (C_heu): 60
Minimum Acceptance Probability (C_ml): 0.45

C. Comparison on runtime of the jobs

We compare the overall runtime of the jobs by varying the number of input jobs given to the scheduler. From our experiments, we conclude that the overall runtime saved could go up to 21% with the heuristic based approach and 27% with the machine learning based approach when compared to the Capacity scheduler. The amount of savings also varies with the number of jobs given to the cluster.
As the number of jobs increases, the scheduler finds it easier to find diverse jobs that can be scheduled together without conflicting; hence the savings in overall runtime also increase with the number of jobs, as can be seen from Figures 4 and 5.
IV. RELATED WORK

In [4], a machine learning based algorithm is proposed which incorporates a Naive-Bayes classifier with MapReduce related features like 'used map and reduce slots', 'map/reduce time average', etc.

In [14], the authors present a design of an agile data center with integrated server and storage virtualization, along with the implementation of an end-to-end management layer. They also propose a novel VectorDot scheme (similar to our vector model) to address the complexity introduced by the data center topology and the multidimensional nature of the loads on resources.

In [15], the authors present an approach known as 'RS Maximizer' to maximize the utilization of each resource set, which can optimize the performance of the Hadoop job at hand and also obtain potential energy savings by identifying unused resources.
V. CONCLUSION AND FUTURE WORK

In this paper, we presented scheduling algorithms that try to avoid race conditions for resources on the nodes of a cluster. We discussed the importance of having information regarding every task and node of the cluster. In this context, we presented the Task Selection algorithm and the Task Assignment algorithm, which select the task that is best suited to a particular node. We have achieved our aim of avoiding overloading any node of the cluster and utilizing the maximum resources on a particular node, thereby decreasing race conditions for resources and the overall runtime of the jobs.

Since our algorithm mainly focuses on Task Assignment, it can also be plugged in in conjunction with a Fair or Capacity scheduler. This would add the Fair and Capacity schedulers' job selection features to our algorithm. Though the proposed algorithms are designed specifically for the MapReduce framework, they can very well be implemented in any distributed environment. For example, we can incorporate our resource-aware algorithms to effectively schedule virtual machines on a given number of physical machines.

Future work of our research includes tuning of the MapReduce framework with different configuration parameters for finding the best runtime of the jobs through machine learning techniques. We also plan to propose scale up/down algorithms on top of our resource-aware scheduling, to switch virtual machines/nodes on or off based on the resource usage of the cluster and thereby save energy.

ACKNOWLEDGMENT

The authors would like to thank the students and faculty of the Search and Information Extraction Lab (SIEL) for their support and constant feedback throughout our work.

REFERENCES

[1] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6):599-616, 2009.

[2] Nitesh Maheshwari, Radheshyam Nanduri, and Vasudeva Varma. Dynamic energy efficient data placement and cluster reconfiguration algorithm for MapReduce framework. Future Generation Computer Systems, 28(1):119-127, 2012.

[3] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113, 2008.

[4] Jaideep Dhok, Nitesh Maheshwari, and Vasudeva Varma. Learning based opportunistic admission control algorithm for MapReduce as a service. In ISEC '10: Proceedings of the 3rd India Software Engineering Conference, pages 153-160. ACM, 2010.

[5] JobTracker Architecture. http://hadoop.apache.org/common/docs/current/mapred_tutorial.html.

[6] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification (2nd Edition). Wiley-Interscience, November 2000.

[7] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA data mining software. SIGKDD Explorations, 11(1), 2009.

[8] 'Atop' utility. http://www.atoptool.nl/.

[9] Hadoop Distributed File System. http://hadoop.apache.org/common/docs/current/hdfs_design.html.

[10] Fair Scheduler. http://hadoop.apache.org/common/docs/r0.20.2/fair_scheduler.html.

[11] Capacity Scheduler. http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html.

[12] Quan Chen, Daqiang Zhang, Minyi Guo, Qianni Deng, and Song Guo. SAMR: A self-adaptive MapReduce scheduling algorithm in heterogeneous environment. In Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on, 2010.

[13] J. Polo, D. Carrera, Y. Becerra, V. Beltran, J. Torres, and E. Ayguadé. Performance management of accelerated MapReduce workloads in heterogeneous clusters. In Parallel Processing (ICPP), 2010 39th International Conference on, pages 653-662, 2010.

[14] Aameek Singh, Madhukar Korupolu, and Dushmanta Mohapatra. Server-storage virtualization: integration and load balancing in data centers. In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC '08, pages 53:1-53:12, Piscataway, NJ, USA, 2008. IEEE Press.

[15] Karthik Kambatla, Abhinav Pathak, and Himabindu Pucha. Towards optimizing Hadoop provisioning in the cloud. In Proceedings of the 2009 Conference on Hot Topics in Cloud Computing, HotCloud'09, pages 22-22, Berkeley, CA, USA, 2009. USENIX Association.