Job Aware Scheduling Algorithm For MapReduce Framework
Abstract—The MapReduce framework has received wide acclaim over the past few years for large scale computing, and has become a standard paradigm for batch oriented workloads. As the adoption of this paradigm has increased rapidly, scheduling of MapReduce jobs has become a problem of great interest in the research community. We propose an approach which tries to maintain harmony among the jobs running on the cluster, and in turn decrease their runtime. In our model, the scheduler is made aware of the different types of jobs running on the cluster. The scheduler tries to allocate a task on a node only if the incoming task does not affect the tasks already running on that node. From the list of available pending tasks, our algorithm selects the one that is most compatible with the tasks already running on that node. We bring up heuristic and machine learning based solutions to our approach and try to maintain a resource balance on the cluster by not overloading any of the nodes, thereby reducing the overall runtime of the jobs. The results show a runtime saving of around 21% in the case of the heuristic based approach and around 27% in the case of the machine learning based approach when compared to Yahoo's Capacity scheduler.

Keywords-Cloud Computing; Job Scheduling; Machine Learning; MapReduce;
I. INTRODUCTION

Cloud computing has emerged as an advanced form of distributed computing, parallel processing and grid computing. It is a new and promising paradigm that delivers IT services as computing utilities. As Clouds are designed to provide services to external users, providers need to be compensated for sharing their resources and capabilities [1]. Since these computing resources are finite, there is a need for efficient resource allocation algorithms for cloud platforms. Efficient resource/data allocation would help reduce the number of virtual machines used, and in turn reduce the carbon footprint, leading to substantial energy savings [2]. Scheduling in MapReduce can be seen as analogous to this problem. If the scheduling algorithms are designed intelligently enough to avoid overloading any node and to utilize most of the resources on a particular node, the runtime of the jobs can be lowered considerably, again leading to substantial energy savings. This paper deals with the scheduling of jobs on a MapReduce cluster without degrading their runtime, while still maintaining the cost savings that providers expect.

Recently, MapReduce [3] has become a standard programming model for large scale data analysis. It has seen tremendous growth in recent years, especially for text indexing, log processing, web crawling, data mining, machine learning, etc. [4]. MapReduce is best suited for batch-oriented jobs, which tend to run for hours to days over a large dataset on the limited resources of a cluster. Hence, an effective scheduling mechanism is vital to make sure that the resources are used judiciously.

In this paper, we present an approach that takes into account the interoperability of MapReduce tasks running on a node of the cluster. Our algorithms try to ensure that a task running on a node does not affect the performance of the other tasks. This requires the scheduler to be aware of the resource usage information of each task running on the cluster. We present a heuristic approach as well as a machine learning based approach for task scheduling. Our algorithm selects, from the list of pending tasks, the task that is most compatible with the tasks already running on the node.

II. PROPOSED ALGORITHMS

In our approach, we monitor resource usage down to the level of each task and each node in the cluster, since the performance of tasks and nodes is vital in any distributed environment. Our algorithm tries to maintain stability at the node and cluster level through intelligent scheduling of the tasks. The uniqueness of this scheduler lies in its ability to take into account the resource usage pattern of a job before its tasks are scheduled on the cluster.

A. Task Characteristics

Based on the resources a task uses, it can be broadly classified as cpu-intensive, memory-intensive, disk-intensive or network-intensive. In a practical scenario, it might not be possible to categorize a task as belonging to exactly one of these categories. A task will have attributes of more than one of the categories mentioned above, and to describe its true nature perfectly it should be characterized as a weighted linear combination of parameters from each of these categories. We represent the true and complete nature of a task through its TaskVector $\vec{T}_k$, defined as:

$$\vec{T}_k = E_{cpu}\, e_1 + E_{mem}\, e_2 + E_{disk}\, e_3 + E_{nw}\, e_4 \quad (1)$$

where $E_x$ (with $x \in \{cpu, mem, disk, nw\}$) is the resource usage pattern for the cpu, memory, disk and network of a particular task.
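As a concrete illustration of equation (1), the sketch below models a TaskVector as a plain four-component vector in Java, together with the vector operations the later equations need. This is a minimal sketch; the class and method names are illustrative choices, not code from the implementation.

/**
 * Minimal sketch of a TaskVector per equation (1): a task's resource usage
 * pattern along four axes (cpu, memory, disk, network), each component
 * assumed to be normalized to a 0-100 scale as described in the text.
 */
public class TaskVector {
    public final double cpu, mem, disk, nw;

    public TaskVector(double cpu, double mem, double disk, double nw) {
        this.cpu = cpu; this.mem = mem; this.disk = disk; this.nw = nw;
    }

    /** Component-wise sum, used later to build the compound vector of equation (2). */
    public TaskVector add(TaskVector o) {
        return new TaskVector(cpu + o.cpu, mem + o.mem, disk + o.disk, nw + o.nw);
    }

    /** Component-wise difference, used for availability (7) and unused resources (8). */
    public TaskVector subtract(TaskVector o) {
        return new TaskVector(cpu - o.cpu, mem - o.mem, disk - o.disk, nw - o.nw);
    }

    /** Euclidean magnitude |T|. */
    public double magnitude() {
        return Math.sqrt(cpu * cpu + mem * mem + disk * disk + nw * nw);
    }

    /** Dot product, needed for the cosine similarity of equation (9). */
    public double dot(TaskVector o) {
        return cpu * o.cpu + mem * o.mem + disk * o.disk + nw * o.nw;
    }
}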
1) Machine Learning based Algorithm: The classifier decides whether a given task $task_k$ is compatible with a TaskTracker $TT_i$, based on a feature set $F$. By Bayes' theorem, the posterior probability that $task_k$ is compatible given $F$ is

$$P(task_k = compatible \mid F) = \frac{P(F \mid task_k = compatible) \times P(task_k = compatible)}{P(F)} \quad (3)$$

The quantity $P(F \mid task_k = compatible)$ thus becomes

$$P(F \mid task_k = compatible) = P(f_1, f_2, \ldots, f_4 \mid task_k = compatible) \quad (4)$$

where $f_1, f_2, \ldots, f_4$ are the features of the classifier ($\{\Phi, \Sigma, \vec{T}_k, \vec{T}_{compound}(i)\}$). We assume that all the features are independent of each other (the Naive-Bayes assumption). Thus,

$$P(F \mid task_k = compatible) = \prod_{j=1}^{4} P(f_j \mid task_k = compatible) \quad (5)$$

The above equation forms the foundation of learning in our classifier. The classifier uses the results of decisions made in the past to make the current decision. This is achieved by keeping track of past decisions and their outcomes in the form of posterior probabilities.

If this posterior probability is greater than or equal to the administrator configured Minimum Acceptance Probability $C_{ml}$, then $task_k$ is considered for scheduling on $TT_i$ (the Task Assignment algorithm in Figure 1).

1: TaskAssignment(task_k, TT)
2: Φ = getHardwareSpecifications(TT)
3: Σ = getNetworkDistance(task_k, TT)
4: T_k = getTaskVector(task_k)
5: T_compound = getVectorsOfTasksOnTaskTracker(TT)
6: compatibility = classifier(task_k, {Φ, Σ, T_k, T_compound})
7: if compatibility ≥ C_ml then
8: return TRUE
9: else
10: return FALSE
11: end if

Figure 1. Task Assignment algorithm following the machine learning approach

Our algorithm trains the classifier incrementally after every task completes. Every TaskTracker monitors itself for race conditions on its resources by checking whether it is overloaded, and after the completion of every task it sends feedback corresponding to the previous decision made for it. If negative feedback is received from a TaskTracker pertaining to a previous task allocation, the classifier is re-trained at the JobTracker (Figure 2) to avoid such mistakes in the future. This helps keep the classifier model updated with the current state of the cluster.

Figure 2. Task Selection algorithm: the received task is tested for compatibility on an incremental Naive-Bayes classifier and is accepted if the posterior probability is greater than or equal to C_ml.
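To make the incremental learning concrete, the following is a minimal sketch of such a classifier in Java. It assumes the four features have been discretized into integer bins; the class, its hand-rolled counts, and the Laplace smoothing are illustrative assumptions, not the paper's code (the reference list includes Weka [7], so the actual implementation may well have used an updateable Naive-Bayes from that toolkit).

import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch of an incremental Naive-Bayes compatibility test in the
 * spirit of equations (3)-(5). Class 1 = compatible, class 0 = incompatible.
 */
public class IncrementalNaiveBayes {
    private final int numFeatures;
    // counts[c][j] maps a bin of feature j to its observation count under class c.
    private final Map<Integer, Integer>[][] counts;
    private final int[] classTotals = new int[2];

    @SuppressWarnings("unchecked")
    public IncrementalNaiveBayes(int numFeatures) {
        this.numFeatures = numFeatures;
        counts = new HashMap[2][numFeatures];
        for (int c = 0; c < 2; c++)
            for (int j = 0; j < numFeatures; j++)
                counts[c][j] = new HashMap<>();
    }

    /** Unnormalized P(class = c | f): prior times product of likelihoods, per (3) and (5). */
    private double score(int c, int[] f) {
        double p = (classTotals[c] + 1.0) / (classTotals[0] + classTotals[1] + 2.0);
        for (int j = 0; j < numFeatures; j++) {
            int n = counts[c][j].getOrDefault(f[j], 0);
            p *= (n + 1.0) / (classTotals[c] + 2.0); // Laplace-smoothed likelihood
        }
        return p;
    }

    /** Posterior P(compatible | f), normalized over both classes (the P(F) denominator). */
    public double posteriorCompatible(int[] f) {
        double sYes = score(1, f), sNo = score(0, f);
        return sYes / (sYes + sNo);
    }

    /** Incremental training step, e.g. on feedback received from a TaskTracker. */
    public void update(int[] f, boolean compatible) {
        int c = compatible ? 1 : 0;
        classTotals[c]++;
        for (int j = 0; j < numFeatures; j++)
            counts[c][j].merge(f[j], 1, Integer::sum);
    }
}

A posterior at or above C_ml (0.45 in Table I) corresponds to accepting the task, mirroring the threshold test in Figure 1, while update() is the re-training step triggered by TaskTracker feedback.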
2) Heuristic-based Algorithm: In this section we present a heuristic-based algorithm to decide whether a task is compatible with a particular TaskTracker. Whenever the Task Selection algorithm queries for the compatibility of a task on a TaskTracker, this algorithm runs the Task Compatibility Test.

Task Compatibility Test: The compound TaskVector of the tasks currently running on the TaskTracker $TT_i$ is given by equation (2):

$$\vec{T}_{compound}(i) = \vec{T}_1 + \vec{T}_2 + \ldots + \vec{T}_n \quad (2)$$

Each TaskVector $\vec{T}_k$ in the Task Queue, for all $k \in [1, n]$, is represented as follows:

$$\vec{T}_k = E_{cpu}\, e_1 + E_{mem}\, e_2 + E_{disk}\, e_3 + E_{nw}\, e_4 \quad (6)$$

We obtain $\vec{T}_{availability}$ by calculating the difference between the total resources (denoted by $\vec{T}_R$, a vector with the maximum magnitude, i.e. 100, for each component) and $\vec{T}_{compound}(i)$:

$$\vec{T}_{availability} = \vec{T}_R - \vec{T}_{compound} \quad (7)$$

Assuming that $\vec{T}_k$ is scheduled on the TaskTracker, the amount of unused resources on it is calculated through the following equation:

$$\vec{T}_{unused} = \vec{T}_{availability} - \vec{T}_k \quad (8)$$
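As a concrete sketch of equations (2), (7) and (8), the following illustrative helper builds on the TaskVector class sketched in Section II-A; the names and the fixed 100-per-component total are assumptions drawn from the text, not the actual implementation.

import java.util.List;

/** Illustrative helper computing the vectors of equations (2), (7) and (8). */
public class ResourceAccounting {
    /** Total resources T_R: each component at its maximum magnitude of 100. */
    static final TaskVector TOTAL_RESOURCES = new TaskVector(100, 100, 100, 100);

    /** Equation (2): compound vector of the tasks running on a TaskTracker. */
    static TaskVector compound(List<TaskVector> runningTasks) {
        TaskVector sum = new TaskVector(0, 0, 0, 0);
        for (TaskVector t : runningTasks) sum = sum.add(t);
        return sum;
    }

    /** Equation (7): availability = T_R - T_compound. */
    static TaskVector availability(List<TaskVector> runningTasks) {
        return TOTAL_RESOURCES.subtract(compound(runningTasks));
    }

    /** Equation (8): unused = availability - the candidate task's vector. */
    static TaskVector unused(List<TaskVector> runningTasks, TaskVector candidate) {
        return availability(runningTasks).subtract(candidate);
    }
}

Note that a component of the unused vector going negative directly signals the overload case discussed next.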
Though there are many similarity/distance calculation functions, we have chosen cosine similarity because it is easy to measure the similarity or dissimilarity of the resource usage patterns of tasks through the angle between their vectors. Since the resource usage patterns of $\vec{T}_k$ and $\vec{T}_{compound}(i)$ should be dissimilar (the tasks should not conflict with each other), the patterns of $\vec{T}_k$ and $\vec{T}_{availability}(i)$ should be similar. The angle (in degrees) between the vectors is given by

$$angle = \arccos \frac{\vec{T}_k \cdot \vec{T}_{availability}(i)}{|\vec{T}_k| \, |\vec{T}_{availability}(i)|} \quad (9)$$

The smaller the angle, the lower the dissimilarity between $\vec{T}_k$ and $\vec{T}_{availability}(i)$ (and the greater the dissimilarity between $\vec{T}_k$ and $\vec{T}_{compound}(i)$), and hence the better the compatibility. At the same time, the smaller $|\vec{T}_{unused}|$ is, the higher the resource utilization on that particular TaskTracker. A negative $\vec{T}_{unused}$ (when the resource requirement pattern of $\vec{T}_k$ exceeds the available resources $\vec{T}_{availability}(i)$) indicates that the task would overload the node; hence, the lower the $|\vec{T}_{unused}|$ value, the better.

Bringing both criteria into a single equation, we get

$$value_{combination} = \alpha \times angle + (1 - \alpha) \times |\vec{T}_{unused}| \quad (10)$$

The values $\alpha$ and $C_{heu}$ are administrator configured values which can be set based on the requirements of a particular cluster. If $value_{combination}$ is less than or equal to the Maximum Value for Acceptance $C_{heu}$, then $\vec{T}_k$ is considered for scheduling on $TT_i$ (Figure 3).

1: TaskAssignment(task_k, TT)
2: T_k = getTaskVector(task_k)
3: T_compound = getVectorsOfTasksOnTaskTracker(TT)
4: T_availability = T_R - T_compound
5: T_unused = T_availability - T_k
6: angle = arccos((T_k · T_availability) / (|T_k| |T_availability|))
7: value_combination = α × angle + (1 - α) × |T_unused|
8: if value_combination ≤ C_heu then
9: return TRUE
10: else
11: return FALSE
12: end if

Figure 3. Task Assignment algorithm following the heuristic-based approach
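Putting equations (9) and (10) together, a minimal sketch of the Task Compatibility Test of Figure 3 could look as follows. The threshold and weight values mirror Table I, but all names are illustrative, reusing the TaskVector sketch from Section II-A.

/** Illustrative sketch of the heuristic Task Compatibility Test (Figure 3). */
public class HeuristicCompatibility {
    static final double ALPHA = 0.5;   // weight between angle and unused resources (Table I)
    static final double C_HEU = 60.0;  // Maximum Value for Acceptance (Table I)

    /** Returns true if the candidate task is considered compatible with the TaskTracker. */
    static boolean isCompatible(TaskVector candidate, TaskVector availability) {
        TaskVector unused = availability.subtract(candidate); // equation (8)

        // Equation (9): angle in degrees between candidate and availability vectors.
        double cosine = candidate.dot(availability)
                / (candidate.magnitude() * availability.magnitude());
        double angle = Math.toDegrees(Math.acos(cosine));

        // Equation (10): weighted combination of the angle and |T_unused|.
        double value = ALPHA * angle + (1 - ALPHA) * unused.magnitude();
        return value <= C_HEU;
    }
}

For example, with α = 0.5, an angle of 30 degrees and |T_unused| = 40 give value_combination = 0.5 × 30 + 0.5 × 40 = 35, which is below C_heu = 60, so the task would be accepted.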
III. EVALUATION AND RESULTS

We have implemented our algorithms as scheduler plugins for the MapReduce framework, customizing the method 'List<Task> assignTasks(TaskTrackerStatus)' of the org.apache.hadoop.mapred.TaskScheduler class. Upon job submission, our algorithm calculates the TaskVector of the job by running a sample of its map/reduce tasks. The TaskVector is constructed separately for the 'map' and 'reduce' phases; the 'sort' and 'shuffle' phases are merged into the reduce phase. In our experiments, five map/reduce tasks are run and the average amongst them is taken to find the $\vec{T}_k$. The TaskVector is captured by monitoring the resource usage related to cpu, memory, disk and network through the 'atop' [8] utility.
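For orientation, the following skeleton shows the shape of such a plugin against the 0.20-era Hadoop API quoted above. The compatibility helpers mentioned in the comments are placeholders for the algorithms of Section II, not Hadoop APIs, and this is a sketch rather than the actual plugin.

package org.apache.hadoop.mapred;
// Scheduler plugins of that era live in this package so they can access
// package-private types such as TaskScheduler, Task and JobInProgress.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Skeleton of a job aware scheduler plugin in the shape the text describes. */
public class JobAwareTaskScheduler extends TaskScheduler {

    @Override
    public List<Task> assignTasks(TaskTrackerStatus tracker) throws IOException {
        List<Task> assigned = new ArrayList<Task>();
        // Hypothetical flow: walk the pending-task queue and assign the task
        // that passes the compatibility test for this TaskTracker, e.g.
        //   if (isCompatible(taskVector(t), availability(tracker))) { ... }
        return assigned;
    }

    @Override
    public Collection<JobInProgress> getJobs(String queueName) {
        return new ArrayList<JobInProgress>(); // queue bookkeeping elided in this sketch
    }
}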
A. Testing Environment

Our testing environment consisted of 12 nodes (one master and 11 slaves). The master node was designated to run the JobTracker and NameNode [9], while the slaves ran TaskTrackers and DataNodes (the slave daemons of HDFS). The nodes were heterogeneous, with Intel Core 2 Duo 2.4 GHz processors, a single hard disk with capacity ranging from 80 GB to 2 TB, and 2 or 4 GB of RAM. The nodes were interconnected with a gigabit ethernet switch. All the nodes were installed with the CentOS release 5.5 (Final) operating system and Java 1.6.0_26.

B. Experiments Description

There are very few schedulers which are even minimally resource aware, of which the Fair [10] and Capacity [11] schedulers are widely acclaimed. The Capacity scheduler, developed by Yahoo, is one of the most successful schedulers currently used in the industry for real-world applications, with resource division at the cluster level. Hence, we have chosen it as the baseline for testing our algorithms. Table I provides the Hadoop and algorithm parameters used for testing. Experiments were conducted on jobs like terasort, grep, web crawling, wordcount and video file format conversion, which are close to real-world applications.

Table I. HADOOP AND ALGORITHM PARAMETERS

HDFS block size: 64 MB
Speculative execution: enabled
Heartbeat interval: 4 seconds
Number of map task slots per node: 4
Number of reduce task slots per node: 2
Replication factor: 3
Number of queues in Capacity scheduler: 3
Cluster resources allocated to each queue in Capacity scheduler: 33.3%
alpha (α): 0.5
Maximum Value for Acceptance (C_heu): 60
Minimum Acceptance Probability (C_ml): 0.45

C. Comparison on runtime of the jobs

We compare the overall runtime of the jobs by varying the number of input jobs given to the scheduler. From our experiments, we conclude that the overall runtime saved could go up to 21% with the heuristic based approach and 27% with the machine learning based approach when compared to the Capacity scheduler. The amount of savings also varies with the number of jobs given to the cluster.
As the number of jobs increases, the scheduler finds it easier to find diverse jobs that can be scheduled together without conflicting; hence the savings in overall runtime also increase with the number of jobs, as can be seen from Figures 4 and 5.
IV. RELATED WORK

In [4], a machine learning based algorithm is proposed which incorporates a Naive-Bayes classifier with MapReduce related features like 'used map and reduce slots', 'map/reduce time average', etc.

In [14], the authors present a design of an agile data center with integrated server and storage virtualization, along with the implementation of an end-to-end management layer. They also propose a novel VectorDot scheme (similar to our vector model) to address the complexity introduced by the data center topology and the multidimensional nature of the loads on resources.

In [15], the authors present an approach known as 'RS Maximizer' to maximize the utilization of each resource set, which can optimize the performance of the Hadoop job at hand and also obtain potential energy savings by identifying unused resources.
V. CONCLUSION AND FUTURE WORK

In this paper, we presented scheduling algorithms that try to avoid race conditions for resources on the nodes of a cluster. We discussed the importance of having information regarding every task and node of the cluster. In this context, we presented the Task Selection algorithm and the Task Assignment algorithm, which select the task that is best suited to a particular node. We have achieved our aim of avoiding overloading any node of the cluster and utilizing the maximum resources on a particular node, thereby decreasing race conditions for resources and the overall runtime of the jobs.

Since our algorithm mainly focuses on Task Assignment, it can also be plugged in in conjunction with a Fair or Capacity scheduler. This would add the Fair and Capacity schedulers' job selection features to our algorithm. Though the proposed algorithms are designed specifically for the MapReduce framework, they can very well be implemented in any distributed environment. For example, we can incorporate our resource-aware algorithms to effectively schedule virtual machines on a given number of physical machines.

Future work of our research includes tuning of the MapReduce framework with different configuration parameters for finding the best runtime of the jobs through machine learning techniques. We also plan to propose scale up/down algorithms on top of our resource-aware scheduling, to switch virtual machines/nodes on or off based on the resource usage of the cluster and thereby save energy.

ACKNOWLEDGMENT

The authors would like to thank the students and faculty of the Search and Information Extraction Lab (SIEL) for their support and constant feedback throughout our work.

REFERENCES

[1] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6):599-616, 2009.

[2] Nitesh Maheshwari, Radheshyam Nanduri, and Vasudeva Varma. Dynamic energy efficient data placement and cluster reconfiguration algorithm for MapReduce framework. Future Generation Computer Systems, 28(1):119-127, 2012.

[3] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113, 2008.

[4] Jaideep Dhok, Nitesh Maheshwari, and Vasudeva Varma. Learning based opportunistic admission control algorithm for MapReduce as a service. In ISEC '10: Proceedings of the 3rd India Software Engineering Conference, pages 153-160. ACM, 2010.

[5] JobTracker Architecture. http://hadoop.apache.org/common/docs/current/mapred_tutorial.html.

[6] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification (2nd Edition). Wiley-Interscience, November 2000.

[7] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA data mining software. SIGKDD Explorations, 11(1), 2009.

[8] 'Atop' utility. http://www.atoptool.nl/.

[9] Hadoop Distributed File System. http://hadoop.apache.org/common/docs/current/hdfs_design.html.

[10] Fair Scheduler. http://hadoop.apache.org/common/docs/r0.20.2/fair_scheduler.html.

[11] Capacity Scheduler. http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html.

[12] Quan Chen, Daqiang Zhang, Minyi Guo, Qianni Deng, and Song Guo. SAMR: A self-adaptive MapReduce scheduling algorithm in heterogeneous environment. In Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on, 2010.

[13] J. Polo, D. Carrera, Y. Becerra, V. Beltran, J. Torres, and E. Ayguadé. Performance management of accelerated MapReduce workloads in heterogeneous clusters. In Parallel Processing (ICPP), 2010 39th International Conference on, pages 653-662, 2010.

[14] Aameek Singh, Madhukar Korupolu, and Dushmanta Mohapatra. Server-storage virtualization: integration and load balancing in data centers. In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC '08, pages 53:1-53:12, Piscataway, NJ, USA, 2008. IEEE Press.

[15] Karthik Kambatla, Abhinav Pathak, and Himabindu Pucha. Towards optimizing Hadoop provisioning in the cloud. In Proceedings of the 2009 Conference on Hot Topics in Cloud Computing, HotCloud'09, pages 22-22, Berkeley, CA, USA, 2009. USENIX Association.