MapReduce and YARN
Weather sensors collect readings every hour at many locations across the globe, gathering a large
volume of log data. This data is a good candidate for analysis with MapReduce because the analysis
involves processing the whole dataset, and the data is semi-structured and record-oriented.
Data Format:
The data used in the example is from the National Climatic Data Center, or NCDC.
There is a directory for each year from 1901 to 2001, each containing a gzipped file for each
weather station with its readings for that year.
There are tens of thousands of weather stations, so the whole dataset is made up of a large
number of relatively small files.
Format of a National Climatic Data Center record
0057
332130 # USAF weather station identifier
99999 # WBAN weather station identifier
19500101 # observation date
0300 # observation time
4
+51317 # latitude (degrees x 1000)
+028783 # longitude (degrees x 1000)
FM-12
+0171 # elevation (meters)
99999
V020
320 # wind direction (degrees)
1 # quality code
N
0072
1
00450 # sky ceiling height (meters)
1 # quality code
C
N
010000 # visibility distance (meters)
1 # quality code
N
9
-0128 # air temperature (degrees Celsius x 10)
1 # quality code
-0139 # dew point temperature (degrees Celsius x 10)
1 # quality code
10268 # atmospheric pressure (hectopascals x 10)
1 # quality code
It’s generally easier and more efficient to process a smaller number of relatively large files, so
the data is preprocessed so that each year’s readings are concatenated into a single file.
The query: What’s the highest recorded global temperature for each year in the dataset?
MapReduce works by breaking the processing into two phases: the map phase and the reduce phase.
Each phase has key-value pairs as input and output, the types of which may be chosen by the
programmer.
The map function pulls out the year and the air temperature, because these are the only fields useful for the query.
In this case, the map function is just a data preparation phase, setting up the data in such a
way that the reduce function can do its work on it: finding the maximum temperature for each
year.
The map function is also a good place to drop bad records: filter out temperatures that are
missing, suspect, or erroneous.
To visualize the way the map works, consider the following sample lines of input data:
0067011990999991950051507004...9999999N9+00001+99999999999...
0043011990999991950051512004...9999999N9+00221+99999999999...
0043011990999991950051518004...9999999N9-00111+99999999999...
0043012650999991949032412004...0500001N9+01111+99999999999...
0043012650999991949032418004...0500001N9+00781+99999999999...
These lines are presented to the map function as the key-value pairs:
(0, 0067011990999991950051507004...9999999N9+00001+99999999999...)
(106, 0043011990999991950051512004...9999999N9+00221+99999999999...)
(212, 0043011990999991950051518004...9999999N9-00111+99999999999...)
(318, 0043012650999991949032412004...0500001N9+01111+99999999999...)
(424, 0043012650999991949032418004...0500001N9+00781+99999999999...)
The keys are the line offsets within the file, which are ignored by the map function.
The map function merely extracts the year and the air temperature and emits them as its output:
(1950, 0)
(1950, 22)
(1950, −11)
(1949, 111)
(1949, 78)
• The output from the map function is processed by the MapReduce framework before being sent to
the reduce function. This processing sorts and groups the key-value pairs by key, so the reduce
function sees each year with a list of all its temperature readings:
(1949, [111, 78])
(1950, [0, 22, −11])
• The reduce function iterates through each list and picks the maximum reading, giving the final output:
(1949, 111)
(1950, 22)
The Java driver for this MaxTemperature job (imports are omitted; the mapper and reducer classes are sketched after the listing):
public class NewMaxTemperature
{
public static void main(String[] args) throws Exception
{
Job job = Job.getInstance(new Configuration()); // Job.getInstance() replaces the deprecated new Job()
job.setJarByClass(NewMaxTemperature.class);
job.setMapperClass(NewMaxTemperatureMapper.class);
job.setReducerClass(NewMaxTemperatureReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0])); // input path passed on the command line
FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path passed on the command line
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
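The mapper and reducer classes referenced by the driver (NewMaxTemperatureMapper and NewMaxTemperatureReducer) are not reproduced in these notes. The following is a minimal sketch of what they might look like, using the org.apache.hadoop.mapreduce API; the substring offsets follow the fixed-width NCDC record layout described earlier and are illustrative.
// Each public class below would live in its own .java file.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class NewMaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    private static final int MISSING = 9999; // sentinel value NCDC uses for a missing reading

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        String year = line.substring(15, 19); // year field of the NCDC record
        int airTemperature;
        if (line.charAt(87) == '+') // parseInt doesn't like leading plus signs
        {
            airTemperature = Integer.parseInt(line.substring(88, 92));
        }
        else
        {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) // drop missing or suspect readings
        {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}

public class NewMaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
    {
        int maxValue = Integer.MIN_VALUE; // running maximum for this year
        for (IntWritable value : values)
        {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}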
WORD COUNT
public class WordCountMR
{
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>
{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context) throws IOException,
InterruptedException
{
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens())
{
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWritable>
{
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException,
InterruptedException
{
// sum all the counts emitted by the mappers for this word
int sum = 0;
for (IntWritable val : values)
{
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
}
YARN
YARN provides APIs for requesting and working with cluster resources, but these APIs are not typically
used directly by user code.
Users write to higher-level APIs provided by distributed computing frameworks (MapReduce, Spark, and so
on), which themselves are built on YARN and hide the resource management details from the user.
Pig, Hive, and Crunch are all examples of processing frameworks that run on MapReduce, Spark, or Tez (or
on all three), and don’t interact with YARN directly.
YARN Applications
Anatomy of a YARN Application Run
YARN provides its core services via two types of long-running daemon:
A resource manager (one per cluster) to manage the use of resources across the cluster, and
Node managers running on all the nodes in the cluster to launch and monitor containers.
A container executes an application-specific process with a constrained set of resources (memory, CPU, and
so on).
Resource Requests
An application can make all of its requests up front, or it can take a more dynamic approach,
requesting further resources as its needs change.
Spark takes the first approach, starting a fixed number of executors on the cluster.
MapReduce has two phases: the map task containers are requested up front, but the reduce task
containers are not started until later.
Also, if any tasks fail, additional containers will be requested so the failed tasks can be rerun.
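To make the idea of a resource request concrete, the sketch below shows roughly how an application master might ask YARN for a single container using the AMRMClient helper API. It is illustrative only: the memory and vcore values, priority, and registration arguments are made up, it would only work when run inside a container launched by YARN as an application master, and, as noted above, most applications rely on a framework to do this for them.
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ResourceRequestSketch
{
    public static void main(String[] args) throws Exception
    {
        // Client-side helper an application master uses to talk to the resource manager.
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();

        // Register this application master with the resource manager (host/port/URL are placeholders).
        rmClient.registerApplicationMaster("", 0, "");

        // Ask for one container with 1024 MB of memory and 1 virtual core (illustrative values).
        Resource capability = Resource.newInstance(1024, 1);
        ContainerRequest request = new ContainerRequest(capability, null, null, Priority.newInstance(0));
        rmClient.addContainerRequest(request);

        // Containers are granted asynchronously: the application master picks them up on
        // subsequent allocate() heartbeats and then launches processes in them via the node managers.
        rmClient.allocate(0.0f);
    }
}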
Application Lifespan
Categorizing applications in terms of how they map to the jobs that users run, we have:
The simplest case is one application per user job, which is the approach that MapReduce
takes.
The second model is to run one application per workflow or user session of (possibly
unrelated) jobs.
MapReduce 1 vs. YARN
MapReduce 1      YARN
Slot             Container
In MapReduce 1, there are two types of daemon that control the job execution process:
A jobtracker, which coordinates all the jobs run on the system by scheduling tasks to run on
tasktrackers (this role is taken by the resource manager and the application master in YARN).
Tasktrackers, which run tasks and send progress reports to the jobtracker (the node manager's role in YARN).
Scalability
YARN can run on larger clusters than MapReduce 1.
MapReduce 1 hits scalability bottlenecks in the region of 4,000 nodes and 40,000 tasks.
YARN is designed to scale up to 10,000 nodes and 100,000 tasks.
Availability
With the jobtracker’s responsibilities split between the resource manager and application master in
YARN, making the service highly available is much simpler.
High availability is provided first for the resource manager, and then for YARN applications (on a
per-application basis).
Utilization
In YARN, a node manager manages a pool of resources, rather than a fixed number of designated slots.
Resources in YARN are fine grained, so an application can make a request for what it needs, rather than
for an indivisible slot, which may be too big (which is wasteful of resources) or too small (which may
cause a failure) for the particular task.
Multitenancy
In some ways, the biggest benefit of YARN is that it opens up Hadoop to other types of distributed
application beyond MapReduce. MapReduce is just one YARN application among many.
MAP REDUCE
Anatomy of a MapReduce Job Run
You can run a MapReduce job with a single method call: submit() on a Job object.
Alternatively, you can call waitForCompletion(), which submits the job if it hasn't been submitted
already and then waits for it to finish.
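In code, the two entry points look roughly like this (a fragment in the style of the MaxTemperature driver shown earlier; the job name is illustrative and the configuration calls are omitted):
Job job = Job.getInstance(new Configuration(), "example job"); // job name is illustrative
// ... set mapper, reducer, input and output paths ...
job.submit(); // submits the job and returns immediately
// or, more commonly (instead of submit()):
boolean success = job.waitForCompletion(true); // submits if necessary, then blocks and reports progress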
There are five independent entities:
• The client, which submits the MapReduce job.
• The YARN resource manager, which coordinates the allocation of compute resources on the
cluster.
• The YARN node managers, which launch and monitor the compute containers on machines in
the cluster.
• The MapReduce application master, which coordinates the tasks running the MapReduce job.
The application master and the MapReduce tasks run in containers that are scheduled
by the resource manager and managed by the node managers.
• The distributed filesystem , HDFS, which is used for sharing job files between the other entities.
Job Submission
The submit() method on Job creates an internal JobSubmitter instance and calls
submitJobInternal() on it.
Having submitted the job, waitForCompletion() polls the job's progress once per second.
When the job completes successfully, the job counters are displayed.
Otherwise, the error that caused the job to fail is logged to the console.
Among other things, submitJobInternal() checks the job's output specification, computes the input
splits, and copies the resources needed to run the job (the job JAR file, the configuration file, and
the computed input splits) to the shared filesystem in a directory named after the job ID.
It then submits the job by calling submitApplication() on the resource manager.
Job Initialization
When the resource manager receives a call to its submitApplication() method, it hands off
the request to the YARN scheduler.
The scheduler allocates a container, and the resource manager then launches the application
master’s process there, under the node manager’s management.
The application master for MapReduce jobs is a Java application whose main class is
MRAppMaster.
It initializes the job by creating a number of bookkeeping objects to keep track of the job’s
progress.
Next, it retrieves the input splits computed in the client from the shared filesystem.
It then creates a map task object for each split, as well as a number of reduce task objects
determined by the mapreduce.job.reduces property.
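For example (an illustrative fragment in the style of the driver shown earlier), the number of reduce tasks can be set on the Job, which writes the mapreduce.job.reduces property; the value 4 is made up:
job.setNumReduceTasks(4); // sets mapreduce.job.reduces = 4
// equivalently at submission time (requires ToolRunner/GenericOptionsParser in the driver):
//   hadoop jar job.jar Driver -D mapreduce.job.reduces=4 <input> <output>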
Task Assignment
The application master requests containers for all the map and reduce tasks in the job from the
resource manager.
Requests for map tasks are made first and with a higher priority than those for reduce tasks.
Requests for reduce tasks are not made until 5% of the map tasks have completed (a threshold
controlled by the mapreduce.job.reduce.slowstart.completedmaps property).
Task Execution
The application master starts the container by contacting the node manager.
The task is then executed by a Java application whose main class is YarnChild. Before it can run the
task, YarnChild localizes the resources that the task needs, including the job configuration and JAR
file, and any files from the distributed cache.
YarnChild runs in a dedicated JVM, so that any bugs in the user-defined map and reduce
functions (or even in YarnChild itself) don't affect the node manager.
Progress and Status Updates
A job and each of its tasks have a status, which includes state of the job or task (e.g., running,
successfully completed, failed), the progress of maps and reduces, the values of the job’s
counters, and a status message or description.
Job Completion
When the application master receives a notification that the last task for a job is complete, it
changes the status for the job to “successful.”
When the Job polls for status, it prints a message about job completion and returns from the
waitForCompletion() method.
Job statistics and counters are printed to the console at this point.
On job completion, the application master and the task containers clean up their working state
(so intermediate output is deleted).
Failures
In the real world, user code can be buggy, processes can crash, and machines can fail. One of the
major benefits of using Hadoop is its ability to handle such failures and allow your job to complete
successfully.
Task Failure
Sudden exit of the task JVM—perhaps there is a JVM bug that causes the JVM to exit.
In this case, the node manager notices that the process has exited and informs the
application master so it can mark the attempt as failed.
Hanging tasks: the application master notices that it hasn't received a progress update for a
while and proceeds to mark the task as failed.
The timeout is normally 10 minutes (set via the mapreduce.task.timeout property), after which the
task JVM process is killed automatically.
Application Master Failure
YARN imposes a limit on the maximum number of attempts for any YARN application master
running on the cluster.
The default value is 2, and the limit is set by the yarn.resourcemanager.am.max-attempts property.
Node Manager Failure
If a node manager fails by crashing or running very slowly, it will stop sending heartbeats to the
resource manager (or send them very infrequently).
The resource manager will notice a node manager that has stopped sending heartbeats for 10 minutes.
The failed node is then removed from its pool of nodes to schedule containers on.
Any task or application master running on the failed node manager will be recovered.
Node managers may be blacklisted if the number of failures for the application is high, even if
the node manager itself has not failed.
The user may set the threshold with the mapreduce.job.maxtaskfailures.per.tracker job
property.
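A sketch of how these failure-related limits can be expressed as configuration properties (the values shown are the usual defaults, given for illustration; yarn.resourcemanager.am.max-attempts is a cluster-wide setting that normally lives in yarn-site.xml rather than a job configuration):
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
conf.setLong("mapreduce.task.timeout", 600000);              // hanging-task timeout in milliseconds (10 minutes)
conf.setInt("mapreduce.job.maxtaskfailures.per.tracker", 3); // task failures on one node before it is blacklisted for this job
conf.setInt("yarn.resourcemanager.am.max-attempts", 2);      // maximum application master attempts (cluster-wide)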
Resource Manager Failure
Failure of the resource manager is serious, because without it, neither jobs nor task containers
can be launched.
To achieve high availability, a pair of resource managers is run in an active-standby configuration.
Information about all the running applications is stored in a highly available state store (backed
by ZooKeeper or HDFS), so that the standby can recover the core state of the failed active
resource manager.
When the new resource manager starts, it reads the application information from the state
store, then restarts the application masters for all the applications.
The transition of a resource manager from standby to active is handled by a failover controller
(which by default uses ZooKeeper leader election).
Shuffle and Sort
MapReduce makes the guarantee that the input to every reducer is sorted by key.
The process by which the system performs the sort—and transfers the map outputs to the
reducers as inputs—is known as the shuffle.
Matrix Multiplication using MapReduce
To multiply matrix A (elements Aij) by matrix B (elements Bjk), each mapper tags every element with
the coordinates of the result cells it contributes to:
Mapper for Matrix A: (k, v) = ((i, k), (A, j, Aij)) for all columns k of B
Mapper for Matrix B: (k, v) = ((i, k), (B, j, Bjk)) for all rows i of A
Computing the mapper output for Matrix A and Matrix B, we can observe that 4 keys are common to
both: (1, 1), (1, 2), (2, 1), and (2, 2) (the example uses 2 x 2 matrices).
For each common key, make a separate list of the Matrix A and Matrix B values taken from the mapper
step above; the reducer then multiplies the A and B entries that share the same j index and sums the
products to produce one cell of the result.
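Since the example matrices themselves are not reproduced in these notes, the following small worked example uses made-up 2 x 2 values to show the mechanics.
Suppose A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]].
The mapper for A emits, for A11 = 1: ((1, 1), (A, 1, 1)) and ((1, 2), (A, 1, 1)); for A12 = 2: ((1, 1), (A, 2, 2)) and ((1, 2), (A, 2, 2)); and similarly for A21 and A22.
The mapper for B emits, for B11 = 5: ((1, 1), (B, 1, 5)) and ((2, 1), (B, 1, 5)); for B21 = 7: ((1, 1), (B, 2, 7)) and ((2, 1), (B, 2, 7)); and similarly for B12 and B22.
After the shuffle, the reducer for key (1, 1) receives the lists A: [(1, 1), (2, 2)] and B: [(1, 5), (2, 7)]. Pairing entries with the same j and summing the products gives 1 x 5 + 2 x 7 = 19, so C11 = 19.
Repeating this for the other three keys gives C12 = 22, C21 = 43, and C22 = 50, i.e. C = [[19, 22], [43, 50]].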