
2011 Third IEEE International Conference on Cloud Computing Technology and Science

Job Aware Scheduling Algorithm for MapReduce Framework

Radheshyam Nanduri, Nitesh Maheshwari, Reddyraja. A and Vasudeva Varma


Search and Information Extraction Lab (SIEL)
IIIT Hyderabad, India
Email: {radheshyam.nanduri, nitesh.maheshwari, reddy.raja}@research.iiit.ac.in, [email protected]

Abstract—The MapReduce framework has received wide acclaim over the past few years for large scale computing, and it has become a standard paradigm for batch oriented workloads. As the adoption of this paradigm has increased rapidly, scheduling of MapReduce jobs has become a problem of great interest in the research community. We propose an approach which tries to maintain harmony among the jobs running on the cluster, and in turn decrease their runtime. In our model, the scheduler is made aware of the different types of jobs running on the cluster. The scheduler tries to allocate a task on a node only if the incoming task does not affect the tasks already running on that node. From the list of available pending tasks, our algorithm selects the one that is most compatible with the tasks already running on that node. We bring up heuristic and machine learning based solutions to our approach and try to maintain a resource balance on the cluster by not overloading any of the nodes, thereby reducing the overall runtime of the jobs. The results show a saving in runtime of around 21% in the case of the heuristic based approach and around 27% in the case of the machine learning based approach when compared to Yahoo's Capacity scheduler.

Keywords-Cloud Computing; Job Scheduling; Machine Learning; MapReduce;

I. INTRODUCTION

Cloud computing has emerged as an advanced form of distributed computing, parallel processing and grid computing. It is a new and promising paradigm that delivers IT services as computing utilities. As Clouds are designed to provide services to external users, providers need to be compensated for sharing their resources and capabilities [1]. Since these computing resources are finite, there is a need for efficient resource allocation algorithms for cloud platforms. Efficient resource/data allocation helps reduce the number of virtual machines used and in turn reduces the carbon footprint, leading to substantial energy savings [2]. Scheduling in MapReduce can be seen as analogous to this problem. If the scheduling algorithms are designed intelligently, so that no node is overloaded and most of the resources on each node are utilized, the runtime of the jobs can be lowered considerably, again leading to substantial energy savings. This paper deals with the scheduling of jobs on a MapReduce cluster without degrading their runtime while still maintaining the cost savings that providers expect.

Recently, MapReduce [3] has become a standard programming model for large scale data analysis. It has seen tremendous growth in recent years, especially for text indexing, log processing, web crawling, data mining, machine learning, etc. [4]. MapReduce is best suited for batch-oriented jobs which tend to run for hours to days over a large dataset on the limited resources of a cluster. Hence, an effective scheduling mechanism is vital to make sure that the resources are used cogently.

In this paper, we present an approach that takes into account the interoperability of the MapReduce tasks running on a node of the cluster. Our algorithms try to ensure that a task running on a node does not affect the performance of the other tasks. This requires the scheduler to be aware of the resource usage of each task running on the cluster. We present a heuristic approach as well as a machine learning based approach for task scheduling. Our algorithm selects, from the list of pending tasks, the task that is most compatible with the tasks already running on the node.

II. PROPOSED ALGORITHMS

In our approach, we monitor resource usage down to the level of each task and each node in the cluster, as the performance of tasks and nodes is vital in any distributed environment. Our algorithm tries to maintain stability at node and cluster level through intelligent scheduling of the tasks. The uniqueness of this scheduler lies in its ability to take into account the resource usage pattern of a job before its tasks are scheduled on the cluster.

A. Task Characteristics

Based on the resources a task uses, it can be broadly classified as cpu-intensive, memory-intensive, disk-intensive or network-intensive. In a practical scenario, it might not be possible to place a task in exactly one of these categories. A task will have attributes of more than one category, and to describe its true nature perfectly it should be characterized as a weighted linear combination of parameters from each of these categories. We represent the true and complete nature of a task through its TaskVector T_k, defined as

\vec{T}_k = E_{cpu}\,\hat{e}_1 + E_{mem}\,\hat{e}_2 + E_{disk}\,\hat{e}_3 + E_{nw}\,\hat{e}_4   (1)

where E_x (x ∈ {cpu, mem, disk, nw}) is the resource usage pattern for the cpu, memory, disk and network of a particular job respectively, with 0 ≤ E_x ≤ 100, and \hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4 are basis vectors.
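To make the vector model concrete, the sketch below gives one possible Java representation of a TaskVector, with the operations the rest of the paper relies on: component-wise addition (used for T_compound in equation (2)), subtraction (equations (7) and (8)), magnitude, the angle between two vectors (equation (9)), and averaging over an initial sample of tasks (Section II-B). The class and method names are our own illustration; the paper does not prescribe an implementation.

```java
// Illustrative sketch only; names are hypothetical, not from the paper.
// Each component is a resource usage pattern in [0, 100], as in eq. (1).
import java.util.List;

public class TaskVector {
    public final double cpu, mem, disk, nw;  // E_cpu, E_mem, E_disk, E_nw

    public TaskVector(double cpu, double mem, double disk, double nw) {
        this.cpu = cpu; this.mem = mem; this.disk = disk; this.nw = nw;
    }

    // Component-wise addition, as used for T_compound in eq. (2).
    public TaskVector add(TaskVector o) {
        return new TaskVector(cpu + o.cpu, mem + o.mem, disk + o.disk, nw + o.nw);
    }

    // Component-wise subtraction, as used in eqs. (7) and (8).
    public TaskVector subtract(TaskVector o) {
        return new TaskVector(cpu - o.cpu, mem - o.mem, disk - o.disk, nw - o.nw);
    }

    public double dot(TaskVector o) {
        return cpu * o.cpu + mem * o.mem + disk * o.disk + nw * o.nw;
    }

    public double magnitude() {
        return Math.sqrt(this.dot(this));
    }

    // Angle in degrees between two vectors, as in eq. (9).
    public static double angleDegrees(TaskVector a, TaskVector b) {
        double cos = a.dot(b) / (a.magnitude() * b.magnitude());
        return Math.toDegrees(Math.acos(Math.max(-1.0, Math.min(1.0, cos))));
    }

    // Average over the vectors measured on the initial sample of tasks,
    // giving the approximate TaskVector used for the rest of the job.
    public static TaskVector average(List<TaskVector> samples) {
        TaskVector sum = new TaskVector(0, 0, 0, 0);
        for (TaskVector v : samples) sum = sum.add(v);
        int n = samples.size();
        return new TaskVector(sum.cpu / n, sum.mem / n, sum.disk / n, sum.nw / n);
    }
}
```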
B. Task Selection Algorithm

The JobTracker (master node) [5] receives an incoming job through the JobClient (client). The received job is queued up in the Pending Job List (J). Our Task Selection algorithm takes the pending jobs from J and splits each into its sub-units (map and reduce tasks).

TaskVector (T_k) Calculation: Before scheduling any task of a particular job, the algorithm calculates the TaskVector for the task. As a task can be either a map or a reduce, each task has its own corresponding Map-TaskVector (T_k_map) and Reduce-TaskVector (T_k_reduce). We assume the Map-TaskVector of a task of a particular job to be logically equivalent to the Map-TaskVector of the whole job, since the same code runs for each map task of the job; the same holds for the reduce tasks. Ideally, MapReduce jobs run over a large dataset with thousands of map tasks and hence are usually very long running. To calculate T_k, an initial sample of map/reduce tasks is executed. The algorithm employs an event capturing mechanism on the TaskTrackers [5] which listens to events related to cpu, memory, disk and network to monitor the resource usage characteristics of that particular task, and creates a TaskVector. The TaskVectors calculated through these few initial map/reduce tasks are averaged to get T_k. After T_k is calculated, the remaining tasks of that particular job are ready for scheduling. The initial sample of tasks run to calculate the TaskVector process different data splits and may also be scheduled on different nodes. To overcome the minor differences in the generated TaskVectors due to the heterogeneity of the cluster and of the data splits processed by the tasks, we use the average TaskVector T_k, which works as an approximate TaskVector for the rest of the tasks in that particular job.

Whenever a TaskTracker TT (slave node) has an empty slot for a task, our Task Selection algorithm checks whether the TaskVector List T contains T_k_map for a job J_k from J. If T does not contain T_k_map, it schedules the TaskVector Calculation algorithm on TT to calculate T_k_map. Otherwise, if T contains T_k_map, the algorithm schedules the TaskVector Calculation algorithm on TT to calculate T_k_reduce in a similar fashion, duly taking into consideration that all the map tasks of J_k have finished. But if the TaskVectors for a job's map and reduce components have already been calculated, the algorithm queues up the remaining map/reduce tasks in the Task Queue. This queue is filled with tasks asynchronously as new jobs arrive at the JobTracker. Our algorithm then takes the Task Queue as input and picks the task task_k that arrived first. This task_k is submitted to the Task Assignment algorithm along with the details of TT to get approval for its scheduling.
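The selection flow described above can be summarized in the following self-contained sketch. All class and method names here (onEmptySlot, scheduleVectorCalculation, and so on) are our own assumptions for illustration; they are not APIs from the paper or from Hadoop. It uses the illustrative TaskVector class from Section II-A.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the Task Selection flow; all names are assumed.
public class TaskSelectionSketch {
    enum Phase { MAP, REDUCE }

    static class PendingTask {
        final String jobId;
        final Phase phase;
        PendingTask(String jobId, Phase phase) { this.jobId = jobId; this.phase = phase; }
    }

    // TaskVector List T: jobId -> averaged vector per phase, filled in by
    // the TaskVector Calculation runs on sampled tasks.
    private final Map<String, TaskVector> mapVectors = new HashMap<>();
    private final Map<String, TaskVector> reduceVectors = new HashMap<>();
    private final Deque<PendingTask> taskQueue = new ArrayDeque<>(); // FIFO

    // Invoked when TaskTracker 'tt' reports an empty slot while job
    // 'jobId' is pending.
    void onEmptySlot(String tt, String jobId, boolean allMapsFinished) {
        if (!mapVectors.containsKey(jobId)) {
            // No Map-TaskVector yet: run sample map tasks on tt and
            // record their averaged TaskVector.
            scheduleVectorCalculation(tt, jobId, Phase.MAP);
        } else if (!reduceVectors.containsKey(jobId) && allMapsFinished) {
            // Map vector known: measure the reduce vector once all the
            // map tasks of the job have finished.
            scheduleVectorCalculation(tt, jobId, Phase.REDUCE);
        } else if (!taskQueue.isEmpty()) {
            // Both vectors known: submit the earliest-arrived task to the
            // Task Assignment algorithm (Section II-C) for approval.
            PendingTask first = taskQueue.peekFirst();
            if (taskAssignment(first, tt)) {
                taskQueue.removeFirst();
                launch(first, tt);
            }
        }
    }

    void scheduleVectorCalculation(String tt, String jobId, Phase p) { /* omitted */ }
    boolean taskAssignment(PendingTask t, String tt) { return true; /* Fig. 1 or 3 */ }
    void launch(PendingTask t, String tt) { /* omitted */ }
}
```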
C. Task Assignment Algorithm

The Task Assignment algorithm is the second part of our scheduling algorithm. It is primarily responsible for deciding whether a task can be scheduled on a particular TaskTracker. The main objective of this algorithm is to avoid overloading any of the TaskTrackers by meticulously scheduling only compatible tasks on a particular node. By a compatible task, we mean a task that does not affect the tasks already running on that node. We present two approaches to our algorithm: a machine learning based approach and a heuristic-based approach.

1) Machine Learning Approach: In this section, we present a machine learning based approach for our algorithm. We employ an automatically supervised Incremental Naive-Bayes classifier [6], [7] to decide whether a task is compatible on a particular TaskTracker. We use an Incremental Naive-Bayes classifier since the features used in our algorithm are independent of each other. Moreover, this classifier is fast and consumes little memory and cpu, which avoids any overhead on the scheduler. Whenever the Task Selection algorithm queries for the compatibility of a task on a TaskTracker, our algorithm computes the compatibility through the outcome of the classifier. We consider the following features to train the classifier:

• Hardware specifications of the TaskTracker (Φ)
• Network distance between the node of task execution and the corresponding data split of the task (Σ)
• TaskVector of the incoming task (T_k)
• TaskVectors of the tasks running on the TaskTracker (T_compound(i))

T_compound(i) is the vector addition of the vectors of all the tasks currently running on the TaskTracker TT_i, given by

\vec{T}_{compound}(i) = \vec{T}_1 + \vec{T}_2 + \ldots + \vec{T}_n   (2)

where n is the number of tasks currently running on that TaskTracker.

The features discussed above form the feature set F = {Φ, Σ, T_k, T_compound(i)}. Whenever the Task Selection algorithm queries the Task Assignment algorithm for the compatibility of a task task_k on a TaskTracker TT, the algorithm tests the compatibility on an Incremental Naive-Bayes classifier model. task_k = compatible denotes the event that task_k would be compatible with the other tasks running on TT. The probability P(task_k = compatible | F) is conditional on the feature set F. The classifier uses the prior knowledge it has accumulated to make decisions about the compatibility of a task. To achieve this, we compute the posterior probability P(task_k = compatible | F) using Bayes' theorem:

P(task_k = compatible \mid F) = \frac{P(F \mid task_k = compatible) \times P(task_k = compatible)}{P(F)}   (3)

The quantity P(F | task_k = compatible) thus becomes

P(F \mid task_k = compatible) = P(f_1, f_2, \ldots, f_4 \mid task_k = compatible)   (4)

where f_1, f_2, ..., f_4 are the features of the classifier ({Φ, Σ, T_k, T_compound(i)}). We assume that all the features are independent of each other (the Naive-Bayes assumption). Thus,

P(F \mid task_k = compatible) = \prod_{j=1}^{4} P(f_j \mid task_k = compatible)   (5)

This equation forms the foundation of learning in our classifier. The classifier uses the results of decisions made in the past to make the current decision. This is achieved by keeping track of past decisions and their outcomes in the form of posterior probabilities.

If this posterior probability is greater than or equal to the administrator-configured Minimum Acceptance Probability C_ml, then task_k is considered for scheduling on TT_i (algorithm in Figure 1).

1: TaskAssignment(task_k, TT)
2: Φ = getHardwareSpecifications(TT)
3: Σ = getNetworkDistance(task_k, TT)
4: T_k = getTaskVector(task_k)
5: T_compound = getVectorsOfTasksOnTaskTracker(TT)
6: compatibility = classifier(task_k, {Φ, Σ, T_k, T_compound})
7: if compatibility ≥ C_ml then
8:   return TRUE
9: else
10:  return FALSE
11: end if

Figure 1. Task Assignment algorithm following the machine learning approach.

Figure 2. Task Selection algorithm: the received task is tested for compatibility on an Incremental Naive-Bayes classifier and is accepted if the posterior probability is greater than or equal to C_ml.

Our algorithm trains the classifier incrementally after every task completes. Every TaskTracker monitors itself for race conditions on its resources by checking whether it is overloaded, and after the completion of every task it sends feedback corresponding to the previous decision made on it. If negative feedback pertinent to a previous task allocation is received from a TaskTracker, the classifier is re-trained at the JobTracker (Figure 2) to avoid such mistakes in the future. This keeps the classifier model updated with the current state of the cluster.
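As a minimal sketch of what equations (3)-(5) amount to in code, the counts-based incremental Naive-Bayes classifier below supports exactly the two operations the scheduler needs: a posterior query and an incremental training step driven by TaskTracker feedback. It assumes the features have been discretized into integer buckets and uses Laplace smoothing; the paper itself relies on an existing incremental Naive-Bayes implementation [6], [7], so this illustrates the mechanics rather than reproducing the authors' code.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal incremental Naive-Bayes sketch for the compatibility decision
// of eqs. (3)-(5). Feature discretization is our own simplifying
// assumption.
public class CompatibilityClassifier {
    private final int numFeatures;
    private long compatibleCount = 0, incompatibleCount = 0;
    // Per feature index: bucket value -> observation count, per class.
    private final Map<Integer, Map<Integer, Long>> countsCompatible = new HashMap<>();
    private final Map<Integer, Map<Integer, Long>> countsIncompatible = new HashMap<>();

    public CompatibilityClassifier(int numFeatures) { this.numFeatures = numFeatures; }

    // P(task = compatible | F), eqs. (3)-(5), with Laplace smoothing.
    public double posteriorCompatible(int[] f) {
        double pc = likelihood(f, countsCompatible, compatibleCount) * prior(compatibleCount);
        double pi = likelihood(f, countsIncompatible, incompatibleCount) * prior(incompatibleCount);
        return pc / (pc + pi); // denominator P(F) handled by normalization
    }

    // Incremental training step, called with the TaskTracker's feedback
    // after a task completes (negative feedback = the node was overloaded).
    public void train(int[] f, boolean wasCompatible) {
        Map<Integer, Map<Integer, Long>> counts =
                wasCompatible ? countsCompatible : countsIncompatible;
        for (int j = 0; j < numFeatures; j++) {
            counts.computeIfAbsent(j, k -> new HashMap<>()).merge(f[j], 1L, Long::sum);
        }
        if (wasCompatible) compatibleCount++; else incompatibleCount++;
    }

    private double prior(long classCount) {
        return (classCount + 1.0) / (compatibleCount + incompatibleCount + 2.0);
    }

    // Product over features of P(f_j | class), eq. (5).
    private double likelihood(int[] f, Map<Integer, Map<Integer, Long>> counts,
                              long classCount) {
        double p = 1.0;
        for (int j = 0; j < numFeatures; j++) {
            long c = counts.getOrDefault(j, Map.of()).getOrDefault(f[j], 0L);
            p *= (c + 1.0) / (classCount + 2.0); // Laplace smoothing
        }
        return p;
    }
}
```

A task would then be scheduled when posteriorCompatible(features) ≥ C_ml (0.45 in Table I).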
2) Heuristic-based Algorithm: In this section we present a heuristic-based algorithm to decide whether a task is compatible on a particular TaskTracker. Whenever the Task Selection algorithm queries for the compatibility of a task on a TaskTracker, this algorithm runs the Task Compatibility Test.

Task Compatibility Test: The compound TaskVector of the tasks currently running on the TaskTracker TT_i is given by equation (2):

\vec{T}_{compound}(i) = \vec{T}_1 + \vec{T}_2 + \ldots + \vec{T}_n

Each TaskVector T_k in the Task Queue, for all k ∈ [1, n], is represented as

\vec{T}_k = E_{cpu}\,\hat{e}_1 + E_{mem}\,\hat{e}_2 + E_{disk}\,\hat{e}_3 + E_{nw}\,\hat{e}_4   (6)

We obtain T_availability by taking the difference of the total resources (denoted by T_R, the vector with maximum magnitude, i.e. 100 for each component) and T_compound(i):

\vec{T}_{availability} = \vec{T}_R - \vec{T}_{compound}   (7)

Assuming that T_k is scheduled on the TaskTracker, the amount of unused resources on it is calculated through the following equation:

\vec{T}_{unused} = \vec{T}_{availability} - \vec{T}_k   (8)
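To make equations (7) and (8) concrete, consider a small numeric instance (all values assumed purely for illustration):

```latex
% Two tasks running on TT_i (illustrative values):
\vec{T}_1 = (40, 30, 10, 5), \qquad \vec{T}_2 = (20, 10, 30, 15)
\vec{T}_{compound}(i) = \vec{T}_1 + \vec{T}_2 = (60, 40, 40, 20)            % eq. (2)
\vec{T}_{availability}(i) = \vec{T}_R - \vec{T}_{compound}(i)
                          = (100, 100, 100, 100) - (60, 40, 40, 20)
                          = (40, 60, 60, 80)                                % eq. (7)
% For an incoming task with \vec{T}_k = (30, 20, 10, 10):
\vec{T}_{unused} = \vec{T}_{availability}(i) - \vec{T}_k = (10, 40, 50, 70) % eq. (8)
```

All components of T_unused are non-negative here, so scheduling T_k would not overload the node; whether the task is actually accepted depends on the combined criterion of equation (10) below.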
Though there are many similarity/distance functions, we have chosen cosine similarity, as it makes it easy to measure the similarity or dissimilarity of the resource usage patterns of tasks through the angle between their vectors. Since the resource usage patterns of T_k and T_compound(i) should be dissimilar (tasks should not conflict with each other), the pattern should be similar in the case of T_k and T_availability(i). The angle (in degrees) between the vectors is given by

angle = \arccos\left( \frac{\vec{T}_k \cdot \vec{T}_{availability}(i)}{|\vec{T}_k| \, |\vec{T}_{availability}(i)|} \right)   (9)

The lesser the angle, the lesser the dissimilarity between T_k and T_availability(i) (and the greater the dissimilarity between T_k and T_compound(i)), and hence the better the compatibility. At the same time, the lesser |T_unused|, the higher the resource utilization on that particular TaskTracker. A negative T_unused (when the resource requirement pattern of T_k is greater than the available resources T_availability(i)) indicates that the task would overload the node; hence the lower the |T_unused| value, the better.

Bringing these into a single equation, we get

value_{combination} = \alpha \times angle + (1 - \alpha) \times |\vec{T}_{unused}|   (10)

The values α and C_heu are administrator-configured and can be set based on the requirements of a particular cluster. If value_combination is less than or equal to the Maximum Value for Acceptance C_heu, then T_k is considered for scheduling on TT_i (Figure 3).

1: TaskAssignment(task_k, TT)
2: T_k = getTaskVector(task_k)
3: T_compound = getVectorsOfTasksOnTaskTracker(TT)
4: T_availability = T_R − T_compound
5: T_unused = T_availability − T_k
6: angle = arccos((T_k · T_availability) / (|T_k| |T_availability|))
7: value_combination = α × angle + (1 − α) × |T_unused|
8: if value_combination ≤ C_heu then
9:   return TRUE
10: else
11:  return FALSE
12: end if

Figure 3. Task Assignment algorithm following the heuristic-based approach.
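Read together, equations (7)-(10) and Figure 3 reduce to a few lines of arithmetic. The sketch below implements the Task Compatibility Test on top of the illustrative TaskVector class from Section II-A; the constants mirror T_R, α and C_heu, with α and C_heu taking the values from Table I. The class and constant names are our own.

```java
// Sketch of the heuristic Task Compatibility Test of Figure 3, built on
// the illustrative TaskVector class from Section II-A.
public class HeuristicTaskAssignment {
    // T_R: the total-resources vector, 100 for each component.
    static final TaskVector TOTAL_RESOURCES = new TaskVector(100, 100, 100, 100);
    static final double ALPHA = 0.5; // weight between angle and |T_unused| (Table I)
    static final double C_HEU = 60;  // Maximum Value for Acceptance (Table I)

    // Returns TRUE when the incoming task's vector tk may be scheduled on
    // a TaskTracker whose running tasks sum to 'compound' (eq. (2)).
    public static boolean isCompatible(TaskVector tk, TaskVector compound) {
        TaskVector availability = TOTAL_RESOURCES.subtract(compound);    // eq. (7)
        TaskVector unused = availability.subtract(tk);                   // eq. (8)
        double angle = TaskVector.angleDegrees(tk, availability);        // eq. (9)
        double value = ALPHA * angle + (1 - ALPHA) * unused.magnitude(); // eq. (10)
        return value <= C_HEU;
    }
}
```

For the illustrative vectors used after equation (8) (T_k = (30, 20, 10, 10), T_availability = (40, 60, 60, 80)), angle ≈ 37.3° and |T_unused| ≈ 95.4, so value_combination ≈ 0.5 × 37.3 + 0.5 × 95.4 ≈ 66.3 > C_heu = 60, and that task would be rejected on that node.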
III. EVALUATION AND RESULTS

We have implemented our algorithms as scheduler plugins for the MapReduce framework. We customized the method 'List<Task> assignTasks(TaskTrackerStatus)' of the org.apache.hadoop.mapred.TaskScheduler class. Upon job submission, our algorithm calculates the TaskVector of the job by running a sample of its map/reduce tasks. The TaskVector is constructed separately for the 'map' and 'reduce' phases; the 'sort' and 'shuffle' phases are merged with the reduce phase. In our experiments, five map/reduce tasks are run and their average is taken to find T_k. The TaskVector is captured by monitoring the resource usage related to cpu, memory, disk and network through the 'atop' [8] utility.
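A skeletal sketch of what such a plugin's entry point might look like follows. It mirrors only the method shape quoted above ('List<Task> assignTasks(TaskTrackerStatus)'); the nested stub types stand in for the real org.apache.hadoop.mapred classes, whose exact contract varies across Hadoop versions, and the wiring to the compatibility test is our own illustration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Skeletal sketch of a scheduler plugin entry point; stub types stand in
// for the real org.apache.hadoop.mapred classes.
public class JobAwareSchedulerSketch {
    static class Task { }
    static class TaskTrackerStatus { }

    // Pluggable compatibility test: the classifier of Figure 1 or the
    // heuristic of Figure 3.
    interface CompatibilityTest {
        boolean isCompatible(Task task, TaskTrackerStatus tracker);
    }

    private final Deque<Task> taskQueue = new ArrayDeque<>();
    private final CompatibilityTest test;

    JobAwareSchedulerSketch(CompatibilityTest test) { this.test = test; }

    // Called on each TaskTracker heartbeat that reports a free slot.
    public List<Task> assignTasks(TaskTrackerStatus tracker) {
        Task first = taskQueue.peekFirst(); // earliest-arrived pending task
        if (first != null && test.isCompatible(first, tracker)) {
            taskQueue.removeFirst();
            return Collections.singletonList(first);
        }
        // Leave the slot empty rather than overload the node.
        return Collections.emptyList();
    }
}
```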

A. Testing Environment

Our testing environment consisted of 12 nodes (one master and 11 slaves). The master node was designated to run the JobTracker and NameNode [9], while the slaves ran TaskTrackers and DataNodes (the slave daemons of HDFS). The nodes were heterogeneous, with Intel Core 2 Duo 2.4 GHz processors, a single hard disk with capacity ranging from 80 GB to 2 TB, and 2 or 4 GB of RAM. The nodes were interconnected with a gigabit ethernet switch. All the nodes were installed with the CentOS release 5.5 (Final) operating system and Java 1.6.0_26.

B. Experiments Description

There are very few schedulers which are even minimally resource aware, of which the Fair [10] and Capacity [11] schedulers are widely acclaimed. The Capacity scheduler, developed by Yahoo, is one of the most successful schedulers currently used in the industry for real-world applications, with resource division at the cluster level. Hence, we have chosen it as the baseline for testing our algorithms. Table I provides the Hadoop and algorithm parameters used for testing. Experiments were conducted on jobs like terasort, grep, web crawling, wordcount and video file format conversion, which are close to real-world applications.

Table I
HADOOP AND ALGORITHM PARAMETERS

Parameter                                                         | Value
HDFS block size                                                   | 64 MB
Speculative execution                                             | enabled
Heartbeat interval                                                | 4 seconds
Number of map task slots per node                                 | 4
Number of reduce task slots per node                              | 2
Replication factor                                                | 3
Number of queues in Capacity scheduler                            | 3
Cluster resources allocated for each queue in Capacity scheduler  | 33.3%
alpha, α                                                          | 0.5
Maximum Value for Acceptance, C_heu                               | 60
Minimum Acceptance Probability, C_ml                              | 0.45

C. Comparison on runtime of the jobs

We compare the overall runtime of the jobs by varying the number of input jobs given to the scheduler. From our experiments, we conclude that the overall runtime saved can go up to 21% with the heuristic based approach and 27% with the machine learning based approach when compared to the Capacity scheduler. The amount of savings also varies with the number of jobs given to the cluster. As the number of jobs increases, the scheduler finds it easier to find diverse jobs to be scheduled together so that they do not conflict. As the number of jobs increases, the savings in the overall runtime also increase, as can be seen from Figures 4 and 5.

Figure 4. Comparison of runtime (in hours) of the jobs between the Capacity and heuristic based algorithms: the saving in the runtime of the jobs increases as the number of jobs increases.

Figure 5. Comparison of runtime (in hours) of the jobs between the Capacity and machine learning based algorithms: the saving in the runtime of the jobs increases as the number of jobs increases.

D. Comparison on resource usage

We present the effect of our scheduling algorithms on the cpu usage of a TaskTracker. We monitored a random TaskTracker for its resource requirement. Figures 6 and 7 show a snapshot of the cpu requirement on a particular TaskTracker for a certain random period of time. From Figures 6 and 7, we can see that the cpu requirement of the tasks running on the TaskTracker under the Capacity scheduler reaches up to 250% (resource requirement, not resource usage). This situation occurs because the Capacity scheduler is unaware of the nature of the jobs. On the contrary, our algorithms try not to schedule similar tasks on a TaskTracker and hence do not overload it, except for minor infrequent surges. We also observe that the overall utilization of the node is increased. A similar effect of our algorithms on other resources, such as memory, disk and network, has been observed.

Figure 6. Comparison of cpu requirement on a TaskTracker between the Capacity and heuristic based algorithms: the cpu requirement mostly stays below 100% in the case of the heuristic based algorithm, except for a few surges. The time stamp is shown in minutes.

Figure 7. Comparison of cpu requirement on a TaskTracker between the Capacity and machine learning based algorithms: the cpu requirement mostly stays below 100% in the case of the machine learning based algorithm, except for a few surges. The time stamp is shown in minutes.

IV. RELATED WORK

In [12], the authors discuss a self-adaptive MapReduce scheduling algorithm which classifies nodes into map-slow and reduce-slow nodes by using historical information. Whenever a task is found to be running slowly, the algorithm selects the slow tasks and launches backup tasks on a faster node. In our approach, we try to understand the task compatibility on a node before a task is scheduled, to avoid the 'slow task' condition for that particular task on that node; and after a task is scheduled on a particular node, we monitor that node for overload conditions and re-train the classifier accordingly to enhance future decision making.

In [13], the authors show how the MapReduce framework can be leveraged to run heterogeneous sets of workloads, including accelerated and non-accelerated applications, on top of heterogeneous clusters composed of regular nodes and accelerator-enabled systems. The authors propose an 'adaptive scheduler' which provides dynamic resource allocation across jobs and hardware affinity when possible, and would even be able to spread a job's tasks across accelerated and non-accelerated nodes in order to meet performance goals in extreme conditions.

In [4], the authors present a utility based Job Admission algorithm for a MapReduce framework exposed as Software as a Service. The proposed Admission Control algorithm accepts jobs based on the amount of utility they generate for the MapReduce service provider.
A machine learning based algorithm is proposed which incorporates a Naive-Bayes classifier with MapReduce related features like 'used map and reduce slots', 'map/reduce time average', etc.

In [14], the authors present the design of an agile data center with integrated server and storage virtualization, along with the implementation of an end-to-end management layer. They also propose a novel VectorDot scheme (similar to our vector model) to address the complexity introduced by the data center topology and the multidimensional nature of the loads on resources.

In [15], the authors present an approach known as 'RS Maximizer' to maximize the utilization of each resource set, which can optimize the performance of the Hadoop job at hand and also obtain potential energy savings by identifying unused resources.

V. CONCLUSION AND FUTURE WORK

In this paper, we presented scheduling algorithms that try to avoid race conditions for resources on the nodes of a cluster. We discussed the importance of having information about every task and node of the cluster. In this context, we presented the Task Selection algorithm and the Task Assignment algorithm, which select the task that is best suited for a particular node. We have achieved our aim of avoiding overloading any node on the cluster and utilizing the maximum resources on each node, thereby decreasing both the race conditions for resources and the overall runtime of the jobs.

Since our algorithm mainly focuses on task assignment, it can also be plugged in in conjunction with a Fair or Capacity scheduler. This would add the Fair and Capacity schedulers' job selection features to our algorithm. Though the proposed algorithms are designed specifically for the MapReduce framework, they can very well be implemented in any distributed environment. For example, we can incorporate our resource-aware algorithms to effectively schedule virtual machines on a given number of physical machines.

Future work of our research includes tuning the MapReduce framework with different configuration parameters to find the best runtime of the jobs through machine learning techniques. We also plan to propose scale up/down algorithms with our resource-aware scheduling that switch virtual machines/nodes on or off based on the resource usage of the cluster, to save energy.

ACKNOWLEDGMENT

The authors would like to thank the students and faculty of the Search and Information Extraction Lab (SIEL) for their support and constant feedback throughout our work.

REFERENCES

[1] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6):599-616, 2009.

[2] Nitesh Maheshwari, Radheshyam Nanduri, and Vasudeva Varma. Dynamic energy efficient data placement and cluster reconfiguration algorithm for MapReduce framework. Future Generation Computer Systems, 28(1):119-127, 2012.

[3] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113, 2008.

[4] Jaideep Dhok, Nitesh Maheshwari, and Vasudeva Varma. Learning based opportunistic admission control algorithm for MapReduce as a service. In ISEC '10: Proceedings of the 3rd India Software Engineering Conference, pages 153-160. ACM, 2010.

[5] JobTracker architecture. http://hadoop.apache.org/common/docs/current/mapred_tutorial.html.

[6] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification (2nd Edition). Wiley-Interscience, November 2000.

[7] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA data mining software. SIGKDD Explorations, 11(1), 2009.

[8] 'atop' utility. http://www.atoptool.nl/.

[9] Hadoop Distributed File System. http://hadoop.apache.org/common/docs/current/hdfs_design.html.

[10] Fair Scheduler. http://hadoop.apache.org/common/docs/r0.20.2/fair_scheduler.html.

[11] Capacity Scheduler. http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html.

[12] Quan Chen, Daqiang Zhang, Minyi Guo, Qianni Deng, and Song Guo. SAMR: A self-adaptive MapReduce scheduling algorithm in heterogeneous environment. In Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on, 2010.

[13] J. Polo, D. Carrera, Y. Becerra, V. Beltran, J. Torres, and E. Ayguade. Performance management of accelerated MapReduce workloads in heterogeneous clusters. In Parallel Processing (ICPP), 2010 39th International Conference on, pages 653-662, 2010.

[14] Aameek Singh, Madhukar Korupolu, and Dushmanta Mohapatra. Server-storage virtualization: integration and load balancing in data centers. In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC '08, pages 53:1-53:12, Piscataway, NJ, USA, 2008. IEEE Press.

[15] Karthik Kambatla, Abhinav Pathak, and Himabindu Pucha. Towards optimizing Hadoop provisioning in the cloud. In Proceedings of the 2009 Conference on Hot Topics in Cloud Computing, HotCloud'09, pages 22-22, Berkeley, CA, USA, 2009. USENIX Association.
