MapReduce
You can run a MapReduce job with a single line of code: JobClient.runJob(conf). It’s very short,
but it conceals a great deal of processing behind the scenes. This section uncovers the steps Hadoop
takes to run a job.
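As a concrete starting point, here is a minimal, self-contained driver sketch using the classic org.apache.hadoop.mapred API (Hadoop 1.x). The class name MinimalDriver and the choice of identity map and reduce functions are illustrative only; a real job would plug in its own mapper and reducer.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class MinimalDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MinimalDriver.class);
            conf.setJobName("minimal-job");

            // Identity map and reduce simply pass records through unchanged.
            conf.setMapperClass(IdentityMapper.class);
            conf.setReducerClass(IdentityReducer.class);
            conf.setOutputKeyClass(LongWritable.class); // TextInputFormat keys are byte offsets
            conf.setOutputValueClass(Text.class);

            FileInputFormat.addInputPath(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            // The single line that hides all of the machinery described in this section.
            JobClient.runJob(conf);
        }
    }

Everything that follows, from job submission through task execution, happens behind that final runJob() call.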
The client node runs the JobClient, the part of the Hadoop setup responsible for interacting with the cluster, so the JobClient must run on the machine from which jobs are submitted. The JobClient is a Java program that carries out the whole submission process. It interacts with the JobTracker, another Java program whose main class is named JobTracker, and the JobTracker in turn communicates with multiple TaskTrackers, Java programs whose main class is named TaskTracker. The JobTracker runs on a single node, while TaskTrackers run on many nodes in parallel. At the highest level, there are four independent entities:
• The client, which submits the MapReduce job to the JobTracker, where it is placed in the JobTracker's queue. Several checks are performed here, such as whether the output directory is already present and whether the input files exist. After these verifications the JobTracker picks the next job from its queue and assigns its tasks to TaskTrackers.
• The jobtracker, which coordinates the job run. The jobtracker is a Java application whose
main class is JobTracker.
• The tasktrackers, which run the tasks that the job has been split into. Tasktrackers are Java applications whose main class is TaskTracker. A single TaskTracker node has multiple slots for running map and reduce tasks. Each TaskTracker regularly reports its free slots to the JobTracker, which assigns tasks to it accordingly. All TaskTrackers send regular progress reports to the JobTracker, which combines them, and the JobTracker sends the completion report to the client.
• The distributed filesystem, which is used for sharing job files between the other entities.
Job Submission
The JobClient carries out the job submission process. It:
• Asks the jobtracker for a new job ID (by calling getNewJobId() on JobTracker) (step 2).
• Checks the output specification of the job. For example, if the output directory has not been specified or it already exists, the job is not submitted and an error is thrown to the MapReduce program (a sketch of this check follows the list).
• Computes the input splits for the job. If the splits cannot be computed, because the input
paths don’t exist, for example, then the job is not submitted and an error is thrown to
the MapReduce program.
• Copies the resources needed to run the job, including the job JAR file, the configuration
file, and the computed input splits, to the jobtracker’s filesystem in a directory named
after the job ID. The job JAR is copied with a high replication factor (controlled by the
mapred.submit.replication property, which defaults to 10) so that there are lots of copies
across the cluster for the tasktrackers to access when they run tasks for the job (step 3).
• Tells the jobtracker that the job is ready for execution (by calling submitJob() on
JobTracker) (step 4).
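The following is a hedged sketch of the kind of output check described above; it is illustrative, not the actual JobClient source, and the class and method names are invented for this example.

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class OutputSpecCheck {
        // Rejects the job if no output path is set or if it already exists,
        // mirroring the verification performed at submission time.
        static void checkOutputSpec(JobConf conf) throws IOException {
            Path out = FileOutputFormat.getOutputPath(conf);
            if (out == null) {
                throw new IOException("Output directory not set in JobConf.");
            }
            FileSystem fs = out.getFileSystem(conf);
            if (fs.exists(out)) {
                // Failing early avoids silently overwriting another job's results.
                throw new IOException("Output directory " + out + " already exists.");
            }
        }
    }

Failing at submission time, rather than after hours of map and reduce work, is the design rationale behind this check.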
Job Initialization
1. When the JobTracker receives a call to its submitJob() method, it puts the job into an internal queue from which the job scheduler will pick it up and initialize it.
2. Initialization involves creating an object to represent the job being run, which encapsulates
its tasks, and bookkeeping information to keep track of the tasks’ status and progress (step
5).
3. To create the list of tasks to run, the job scheduler first retrieves the input splits computed
by the JobClient from the shared filesystem (step 6). It then creates one map task for
each split.
4. The number of reduce tasks to create is determined by the mapred.reduce.tasks property in the JobConf, which is set by the setNumReduceTasks() method (by default it is 1, but it can be set higher, depending on the size of the cluster, to get the advantage of true parallelism in the reduce phase as well), and the scheduler simply creates this number of reduce tasks to be run. Tasks are given IDs at this point. A configuration sketch follows this list.
5. The JobTracker also creates setup and cleanup tasks, which need to run on the TaskTracker nodes before and after the job's map and reduce tasks.
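Here is a short configuration sketch, assuming the classic JobConf API; the value 4 is arbitrary and the class name ReduceTaskConfig is invented for this example.

    import org.apache.hadoop.mapred.JobConf;

    public class ReduceTaskConfig {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Default is 1 reducer; raising it exploits parallelism in the reduce phase.
            conf.setNumReduceTasks(4);                           // backs mapred.reduce.tasks
            System.out.println(conf.get("mapred.reduce.tasks")); // prints "4"
        }
    }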
Task Assignment
1. At this step the JobTracker needs to know which TaskTrackers are busy and which have free slots.
2. Tasktrackers run a simple loop that periodically sends heartbeat method calls to the
jobtracker. Heartbeats tell the jobtracker that a tasktracker is alive, but they also double as
a channel for messages. As a part of the heartbeat, a tasktracker will indicate whether it is
ready to run a new task, and if it is, the jobtracker will allocate it a task, which it
communicates to the tasktracker using the heartbeat return value (step 7).
3. Before it can choose a task for the tasktracker, the jobtracker must choose a job to select
the task from. There are various scheduling algorithms, but the default one simply
maintains a priority list of jobs. Having chosen a job, the jobtracker now chooses a task for
the job.
4. Tasktrackers have a fixed number of slots for map tasks and for reduce tasks: for example,
a tasktracker may be able to run two map tasks and two reduce tasks simultaneously. (The
precise number depends on the number of cores and the amount of memory on the
tasktracker; see “Memory” ) The default scheduler fills empty map task slots before reduce
task slots, so if the tasktracker has at least one empty map task slot, the jobtracker will
select a map task; otherwise, it will select a reduce task.
5. To choose a reduce task, the jobtracker simply takes the next in its list of yet-to-be-run
reduce tasks, since there are no data locality considerations. For a map task, however, it
takes account of the tasktracker’s network location and picks a task whose input split is as
close as possible to the tasktracker. In the optimal case, the task is data-local, that is,
running on the same node that the split resides on. Alternatively, the task may be rack-
local: on the same rack, but not the same node, as the split. Some tasks are neither data-
local nor rack-local and retrieve their data from a different rack from the one they are running on. You can tell the proportion of each type of task by looking at a job’s counters. A schematic sketch of this locality preference follows the list.
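The following is a schematic sketch of that locality preference, not actual JobTracker code; the stub types and field names are invented for illustration.

    import java.util.List;

    class MapTaskStub {
        String splitHost; // node that holds the task's input split
        String splitRack; // rack of that node
    }

    class LocalitySketch {
        // Prefer a data-local task, then a rack-local one, then any remaining task.
        static MapTaskStub chooseMapTask(List<MapTaskStub> pending, String ttHost, String ttRack) {
            MapTaskStub rackLocal = null, offRack = null;
            for (MapTaskStub t : pending) {
                if (t.splitHost.equals(ttHost)) {
                    return t;                                 // data-local: best case
                }
                if (rackLocal == null && t.splitRack.equals(ttRack)) {
                    rackLocal = t;                            // first rack-local candidate
                }
                if (offRack == null) {
                    offRack = t;                              // first candidate of any kind
                }
            }
            return rackLocal != null ? rackLocal : offRack;   // fall back in locality order
        }
    }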
Task Execution
1. Now that the tasktracker has been assigned a task, the next step is for it to run the task.
First, it localizes the job JAR by copying it from the shared filesystem (HDFS) to the
tasktracker’s filesystem. It also copies any files needed from the distributed cache by the
application to the local disk; see “Distributed Cache” (step 8). Second, it creates a local
working directory for the task, and un-jars the contents of the JAR into this directory. Third,
it creates an instance of TaskRunner to run the task.
2. TaskRunner launches a new Java Virtual Machine (step 9) to run each task in (step 10), so
that any bugs in the user-defined map and reduce functions don’t affect the tasktracker (by
causing it to crash or hang, for example). It is, however, possible to reuse the JVM between tasks; see “Task JVM Reuse” (a configuration sketch follows this list).
3. The child process communicates with its parent through the umbilical interface. This way
it informs the parent of the task’s progress every few seconds until the task is complete.
4. TaskTrackers send regular status updates back to the JobTracker.
5. After the last reduce task has finished, the TaskTracker cleans up the intermediate data that was created while the tasks were running.
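As a configuration sketch, assuming the classic JobConf API of Hadoop 1.x, JVM reuse can be requested per job; the class name JvmReuseConfig and the value 10 are illustrative.

    import org.apache.hadoop.mapred.JobConf;

    public class JvmReuseConfig {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Run up to 10 tasks of this job per launched JVM; -1 means reuse without limit.
            conf.setNumTasksToExecutePerJvm(10); // backs mapred.job.reuse.jvm.num.tasks
            System.out.println(conf.get("mapred.job.reuse.jvm.num.tasks"));
        }
    }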
MapReduce jobs are long-running batch jobs, taking anything from minutes to hours to run.
Because this is a significant length of time, it’s important for the user to get feedback on how the
job is progressing. A job and each of its tasks have a status, which includes such things as the state
of the job or task (e.g., running, successfully completed, failed), the progress of maps and reduces,
the values of the job’s counters, and a status message or description (which may be set by user
code). These statuses change over the course of the job, so how do they get communicated back to
the client? When a task is running, it keeps track of its progress, that is, the proportion of the task
completed. For map tasks, this is the proportion of the input that has been processed.
For reduce tasks, it’s a little more complex, but the system can still estimate the proportion of the
reduce input processed. It does this by dividing the total progress into three parts, corresponding
to the three phases of the shuffle (see “Sort, shuffle and reduce” ). For example, if the task has run
the reducer on half its input, then the task’s progress is ⅚, since it has completed the copy and sort
phases (⅓ each) and is halfway through the reduce phase (⅙).
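Since the status message and counters mentioned above may be set by user code, here is a hedged sketch of a mapper doing exactly that through the old-API Reporter; the class name StatusMapper and the counter names are invented for this example.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class StatusMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, LongWritable, Text> {
        private long records;

        public void map(LongWritable key, Text value,
                        OutputCollector<LongWritable, Text> output, Reporter reporter)
                throws IOException {
            output.collect(key, value); // identity map: pass the record through
            if (++records % 10000 == 0) {
                reporter.setStatus("processed " + records + " records"); // user-set status message
                reporter.incrCounter("app", "records", 10000);           // shows up in the job's counters
            }
        }
    }

Calls on the Reporter travel over the same umbilical channel described under Task Execution, which is how the status reaches the tasktracker, the jobtracker, and finally the client.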