04 MapRed 6 JobExecutionOnYarn
MapReduce on YARN
Job Execution
Originals of slides and source code for examples: http://www.coreservlets.com/hadoop-tutorial/
Also see the customized Hadoop training courses (onsite or at public venues) – http://courses.coreservlets.com/hadoop-training.html
YARN
• Yet Another Resource Negotiator (YARN)
• Responsible for
– Cluster Resource Management
– Scheduling
• Various applications can run on YARN
– MapReduce is just one choice
– http://wiki.apache.org/hadoop/PoweredByYarn
• Also referred to as MapReduce 2.0 or NextGen
MapReduce
– Some of these names are misleading, as YARN is not
tied to MapReduce
YARN vs. Old MapReduce
• Prior to YARN Hadoop had JobTracker and
TaskTracker daemons
– JobTracker was responsible for handling resources and
for monitoring/managing tasks' progress
• Dealing with failed tasks
• Task Bookkeeping
• JobTracker-based approach had drawbacks
– Scalability bottleneck around 4,000+ nodes
– Inflexible cluster resource sharing and allocation
• Slot-based approach (e.g. 10 slots per machine no matter
how small or big those tasks are)
• In 2010 Yahoo! started designing next
generation MapReduce => YARN
MapReduce Job Submission
• Use org.apache.hadoop.mapreduce.Job
class to configure the job
• Submit the job to the cluster and wait for it
to finish.
– job.waitForCompletion(true)
• The YARN protocol is activated when the
mapreduce.framework.name property in
mapred-site.xml is set to yarn
• Client code in client JVM
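A minimal driver sketch along these lines (the WordCountMapper/WordCountReducer classes and the input/output arguments are hypothetical placeholders, not from these slides):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class JobDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Normally set once in mapred-site.xml; shown here only to illustrate the property
        conf.set("mapreduce.framework.name", "yarn");

        Job job = Job.getInstance(conf, "example job");
        job.setJarByClass(JobDriver.class);
        job.setMapperClass(WordCountMapper.class);    // hypothetical mapper class
        job.setReducerClass(WordCountReducer.class);  // hypothetical reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit to the cluster and block until the job finishes
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }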
Task Assignment
• Only applicable to non-Uber Jobs
• MRAppMaster negotiates containers for map and
reduce tasks with the Resource Manager; the
requests carry:
– data locality information (hosts & racks), which was computed by
the InputFormat and stored inside the InputSplits
– memory requirements for a task
• The scheduler on the Resource Manager uses the
provided information to decide where to place
these tasks
– Attempts locality: ideally tasks are placed on the same nodes
where the data to process resides; plan B is to place them within
the same rack
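As a rough illustration of where those two pieces of information come from (a sketch only, not the actual MRAppMaster negotiation code; the ContainerRequestHints class name is made up):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.InputSplit;

    // Sketch only: the real negotiation happens inside MRAppMaster over the YARN
    // ResourceManager protocol; this just surfaces the two inputs mentioned above.
    public class ContainerRequestHints {
      static void printHints(InputSplit split, Configuration conf)
          throws IOException, InterruptedException {
        // Hosts that hold the split's data, as computed by the InputFormat
        for (String host : split.getLocations()) {
          System.out.println("preferred host: " + host);
        }
        // Per-map-task memory request, in MB
        System.out.println("requested memory: "
            + conf.getInt("mapreduce.map.memory.mb", 1024) + " MB");
      }
    }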
Fine-Grained Memory Model for
Tasks
• In YARN, administrators and developers
have a lot of control over memory
management
– NodeManager
• Typically there is one NodeManager per machine
– Task Containers that run Map and Reduce tasks
• Multiple tasks can execute on a single NodeManager
– Scheduler
• Minimum and maximum allocation controls
– JVM Heap
• Allocate memory for your code
– Virtual Memory
• Prevent tasks from monopolizing machines
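To make the relationship between these knobs concrete, here is a hedged example of how they might be set (the property names are the standard Hadoop 2 ones; the values are purely illustrative and would normally live in yarn-site.xml and mapred-site.xml, not in job code):

    yarn.nodemanager.resource.memory-mb=8192    (total memory one NodeManager may hand to containers)
    yarn.scheduler.minimum-allocation-mb=1024   (smallest container the scheduler will grant)
    yarn.scheduler.maximum-allocation-mb=8192   (largest container the scheduler will grant)
    mapreduce.map.memory.mb=1536                (container size requested for each map task)
    mapreduce.reduce.memory.mb=3072             (container size requested for each reduce task)
    mapreduce.map.java.opts=-Xmx1228m           (JVM heap inside the map container)
    mapreduce.reduce.java.opts=-Xmx2457m        (JVM heap inside the reduce container)
    yarn.nodemanager.vmem-pmem-ratio=2.1        (virtual memory allowed per unit of physical memory)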
Memory Model: JVM Heap
• Recall that mapreduce.map.memory.mb and
mapreduce.reduce.memory.mb properties
set the limit for map and reduce containers
Container’s Memory Usage = JVM Heap Size + JVM Perm Gen +
Native Libraries + Memory used by spawned processes
Example: mapreduce.map.java.opts=-Xmx2G
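As a worked example (a common sizing convention, assumed here rather than stated in the slides): if the map container is 2048 MB, the heap is usually set somewhat below that so the perm gen, native libraries, and any spawned processes still fit under the container limit:

    mapreduce.map.memory.mb=2048
    mapreduce.map.java.opts=-Xmx1638m    (roughly 80% of the container, leaving ~400 MB headroom)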
Task Execution
• MRAppMaster requests that Node Manager(s)
start container(s)
– Containers and Node Manager(s) have already been
chosen in the previous step
• For each task, a Node Manager starts the
container – a Java process with YarnChild as
the main class
• YarnChild executes in a dedicated JVM
– Separates user code from the long-running Hadoop daemons
• YarnChild copies the required resources locally
– Configuration, jars, etc.
• YarnChild executes the map or reduce task
MapReduce Task Execution Components
[Diagram: on the client node, your code runs the Job object in the client JVM
(1: run job); the Job gets a new application from the Resource Manager (2) and
submits the application (4); the Resource Manager on the management node then
starts the MRAppMaster container (5)]
Status Updates
• Tasks report status to MRAppMaster
– Via the dedicated 'umbilical' interface
– Report every 3 seconds
• MRAppMaster accumulates and aggregates the
information to assemble current status of the job
– Determines if the job is done
• Client (Job object) polls MRAppMaster for status
updates
– Every second by default; configure via the
mapreduce.client.progressmonitor.pollinterval property
• Resource Manager Web UI displays all the
running YARN applications, where each one is a
link to the Web UI of its Application Master
– In this case the MRAppMaster Web UI
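For reference, this polling is essentially what Job.waitForCompletion(true) does internally; a hedged sketch of monitoring a submitted job by hand through the public Job API:

    import org.apache.hadoop.mapreduce.Job;

    public class ProgressMonitor {
      // Sketch: poll a submitted job's status roughly the way Job.waitForCompletion()
      // does internally. Assumes 'job' was started with job.submit().
      public static void monitor(Job job) throws Exception {
        while (!job.isComplete()) {
          System.out.printf("map %.0f%%  reduce %.0f%%%n",
              job.mapProgress() * 100, job.reduceProgress() * 100);
          Thread.sleep(1000);  // default mapreduce.client.progressmonitor.pollinterval
        }
        System.out.println("Job " + (job.isSuccessful() ? "succeeded" : "failed"));
      }
    }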
MapReduce Status Updates
[Diagram: a YarnChild task JVM (MapTask or ReduceTask) on node X sends status
updates to MRAppMaster running under another Node Manager; the client polls
MRAppMaster for status]
Source: Tom White. Hadoop: The Definitive Guide. O'Reilly Media. 2012
Failures
• Failures can occur in
– Tasks
– Application Master – MRAppMaster
– Node Manager
– Resource Manager
Task Failures
• Most likely offender and easiest to handle
• A task's exceptions and JVM crashes are
propagated to the MRAppMaster
– The attempt (not the task) is marked as 'failed'
• Hanging tasks are noticed and killed
– The attempt is marked as failed
– Control via the mapreduce.task.timeout property
• A task is considered failed after 4 failed
attempts (by default)
– Set for map tasks via mapreduce.map.maxattempts
– Set for reduce tasks via mapreduce.reduce.maxattempts
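These thresholds can be tuned per job; for example, to tolerate slower tasks and allow one extra retry (illustrative values; the defaults are a 600,000 ms timeout and 4 attempts):

    mapreduce.task.timeout=1200000
    mapreduce.map.maxattempts=5
    mapreduce.reduce.maxattempts=5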
Node Manager Failure
• A failed Node Manager stops sending heartbeat
messages to the Resource Manager
• The Resource Manager will blacklist a Node
Manager that hasn't reported within 10
minutes
– Configure via property:
• yarn.resourcemanager.nm.liveness-monitor.expiry-
interval-ms
– Usually there is no need to change this setting
• Tasks on a failed Node Manager are
recovered and placed on healthy Node
Managers
Resource Manager Failures
• The most serious failure = downtime
– Jobs or tasks cannot be launched
• Resource Manager was designed to
automatically recover
– Incomplete implementation at this point
– Saves state into a persistent store, configured via the
yarn.resourcemanager.store.class property
– The only stable option for now is the in-memory store
• org.apache.hadoop.yarn.server.resourcemanager.recovery.MemStore
– A ZooKeeper-based implementation is coming
• You can track progress via
https://issues.apache.org/jira/browse/MAPREDUCE-4345
Job Scheduling
• By default the FIFO scheduler is used
– First In First Out
– Supports basic priority model: VERY_LOW, LOW,
NORMAL, HIGH, and VERY_HIGH
– Two ways to specify priority
• mapreduce.job.priority property
• job.setPriority(JobPriority.HIGH)
• Specify scheduler via
yarn.resourcemanager.scheduler.class
property
– CapacityScheduler
– FairScheduler
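A small sketch showing both ways to request HIGH priority mentioned above (only honored by schedulers that support priorities; the PriorityExample class name is made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobPriority;

    public class PriorityExample {
      public static Job highPriorityJob() throws Exception {
        Configuration conf = new Configuration();
        // Option 1: via the property
        conf.set("mapreduce.job.priority", "HIGH");

        Job job = Job.getInstance(conf, "high-priority job");
        // Option 2: via the Job API (equivalent to the property above)
        job.setPriority(JobPriority.HIGH);
        return job;
      }
    }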
Job Completion
• After a MapReduce application completes (or fails)
– MRAppMaster and YarnChild JVMs are shut down
– Management and metrics information is sent from
MRAppMaster to the MapReduce History Server
• History Server has a Web UI similar to those of the
YARN Resource Manager and MRAppMaster
– By default it runs on port 19888
• http://localhost:19888/jobhistory
– Resource Manager UI automatically proxies to the proper
location: MRAppMaster while an application is running, and the
History Server after the application completes
• You may see odd behavior (blank pages) if you access an application
while it is handing off from MRAppMaster to the History Server
Wrap-Up
Questions?
More info:
http://www.coreservlets.com/hadoop-tutorial/ – Hadoop programming tutorial
http://courses.coreservlets.com/hadoop-training.html – Customized Hadoop training courses, at public venues or onsite at your organization
http://courses.coreservlets.com/Course-Materials/java.html – General Java programming tutorial
http://www.coreservlets.com/java-8-tutorial/ – Java 8 tutorial
http://www.coreservlets.com/JSF-Tutorial/jsf2/ – JSF 2.2 tutorial
http://www.coreservlets.com/JSF-Tutorial/primefaces/ – PrimeFaces tutorial
http://coreservlets.com/ – JSF 2, PrimeFaces, Java 7 or 8, Ajax, jQuery, Hadoop, RESTful Web Services, Android, HTML5, Spring, Hibernate, Servlets, JSP, GWT, and other Java EE training