Apache Hadoop YARN
One of the keys to Hadoop MapReduce's performance is the lack of data motion, i.e. compute is moved to the data rather than data being moved over the network to the compute nodes. Specifically, MapReduce tasks can be scheduled on the same physical nodes on which their data resides in HDFS, which exposes the underlying storage layout across the cluster. This significantly reduces network I/O and keeps most of the I/O on the local disk or within the same rack – a core advantage.
This separation of concerns has significant benefits, particularly for end-users: they can focus entirely on their application via the API and let the combination of the MapReduce framework and the MapReduce system deal with the ugly details such as resource management, fault tolerance and scheduling.
The current Apache Hadoop MapReduce System is composed of the JobTracker, which
is the master, and the per-node slaves called TaskTrackers.
The JobTracker is responsible for resource management (managing the worker nodes, i.e. the TaskTrackers), tracking resource consumption/availability, and also for job life-cycle management (scheduling individual tasks of the job, tracking progress, providing fault tolerance for tasks, etc.).
For a while, we have understood that the Apache Hadoop MapReduce framework needed an overhaul. In particular, with regard to the JobTracker, we needed to address several aspects: scalability, cluster utilization, the ability for customers to control upgrades to the stack (i.e. customer agility) and, equally importantly, support for workloads other than MapReduce itself.
We’ve done running repairs over time, including recent support for JobTracker availability and resiliency to HDFS issues (both of which are available in Hortonworks Data Platform v1, i.e. HDP1), but lately these fixes have come at an ever-increasing maintenance cost and still did not address the core issues of non-MapReduce support and customer agility.
MapReduce is great for many applications, but not for everything; other programming models better serve requirements such as graph processing (Google Pregel / Apache Giraph) and iterative modeling (MPI). When all of an enterprise’s data is already available in HDFS, having multiple paths for processing it is critical.
Providing these alternatives within Hadoop enables organizations to see an increased return on their Hadoop investments by lowering operational costs for administrators, reducing the need to move data between HDFS and other storage systems, etc.
In the current system, the JobTracker views the cluster as composed of nodes (managed by individual TaskTrackers) with distinct map slots and reduce slots, which are not fungible. Utilization issues occur because map slots might be ‘full’ while reduce slots are empty (and vice versa). Fixing this was necessary to ensure the entire system could be used to its maximum capacity.
Reference
http://hortonworks.com/blog/introducing-apache-hadoop-yarn/
YARN is part of the next-generation Hadoop cluster compute environment. It creates a generic and flexible resource-management framework to administer the compute resources in a Hadoop cluster. The YARN application framework allows multiple applications to negotiate resources for themselves and perform their application-specific computations on a shared cluster. Thus, resource allocation lies at the heart of YARN.
YARN ultimately opens up Hadoop to additional compute frameworks, like Tez, so that applications can optimize compute for their specific requirements.
The YARN Resource Manager service is the central controlling authority for resource
management and makes allocation decisions. It exposes a Scheduler API that is
specifically designed to negotiate resources and not schedule tasks. Applications can
request resources at different layers of the cluster topology such as nodes, racks etc.
The scheduler determines how much and where to allocate based on resource
availability and the configured sharing policy.
Currently, there are two sharing policies – fair scheduling and capacity scheduling.
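To make this concrete, here is a minimal sketch (in Java, against the AMRMClient library that ships with YARN) of how an application might express such a topology-aware request. The node and rack names are hypothetical placeholders, and a real ApplicationMaster would register with the ResourceManager before asking for containers.

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class TopologyAwareRequest {

  /** Ask the scheduler for one container, preferring specific nodes and racks. */
  static void requestContainer(AMRMClient<ContainerRequest> amrmClient) {
    // Resource requirement: 1 GB of memory and 1 virtual core.
    Resource capability = Resource.newInstance(1024, 1);
    Priority priority = Priority.newInstance(0);

    // Hypothetical placement preferences. The scheduler decides how much to
    // grant, and where, based on availability and the configured sharing policy.
    String[] nodes = { "node1.example.com", "node2.example.com" };
    String[] racks = { "/rack1" };

    amrmClient.addContainerRequest(
        new ContainerRequest(capability, nodes, racks, priority));
  }
}

Note that the request only states what is needed and where it would preferably run; the actual grant may come back on different nodes, which is exactly the division of labor described above.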
Thus, the API reflects the ResourceManager’s role as the resource allocator. This API design is also crucial for ResourceManager scalability because it limits the complexity of its operations to the size of the cluster and not to the number of tasks running on the cluster. The actual task-scheduling decisions are delegated to the ApplicationMaster, which runs the application logic and decides when, where and how many tasks to run within its resource allocation.
We envision YARN to be the cluster operating system. This two-step approach may be slower than custom scheduling logic, but we believe such problems can be alleviated by careful design and engineering. Having the custom scheduling logic reside inside the application allows the application to run on any YARN cluster. This is important for creating a vibrant YARN application ecosystem (Tez is a good example) that can be easily deployed on any YARN cluster. Developing YARN scheduling libraries will reduce the developer effort needed to create application-specific schedulers, and YARN-103 is a step in that direction.
The ResourceManager and the per-node slave, the NodeManager (NM), form the new, generic system for managing applications in a distributed manner.
The ResourceManager is the ultimate authority that arbitrates resources among all the
applications in the system. The per-application ApplicationMaster is, in effect,
a framework specific entity and is tasked with negotiating resources from the
ResourceManager and working with the NodeManager(s) to execute and monitor the
component tasks.
The NodeManager is the per-machine slave, which is responsible for launching the
applications’ containers, monitoring their resource usage (cpu, memory, disk, network)
and reporting the same to the ResourceManager.
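As a rough sketch of how an application enters this system, the snippet below uses the standard YarnClient API to ask the ResourceManager for a new application and submit it along with the launch specification for its ApplicationMaster. The application name, the AM command and the container size are hypothetical placeholders, and local resources and environment are left empty for brevity.

import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitApplication {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Ask the ResourceManager for a new application id.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("my-yarn-app");              // hypothetical name

    // Launch context for the ApplicationMaster container; the command below is
    // a placeholder for whatever starts your framework's AM process.
    ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
        Collections.emptyMap(),                                 // local resources
        Collections.emptyMap(),                                 // environment
        Collections.singletonList("java -Xmx512m MyAppMaster"), // AM command
        null, null, null);
    appContext.setAMContainerSpec(amContainer);
    appContext.setResource(Resource.newInstance(1024, 1));      // AM container size

    ApplicationId appId = yarnClient.submitApplication(appContext);
    System.out.println("Submitted application " + appId);
  }
}

Once submitted, the ResourceManager launches the ApplicationMaster, which then negotiates its own containers and works with the NodeManagers as described above.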
One of the crucial implementation details for MapReduce within the new
YARN system that I’d like to point out is that we have reused the existing
MapReduce framework without any major surgery. This was very important to
ensure compatibility for existing MapReduce applications and users. More on this
later.
Reference
http://hortonworks.com/blog/apache-hadoop-yarn-background-and-an-overview/
In YARN, the ResourceManager is, primarily, a pure scheduler. In essence, it’s strictly
limited to arbitrating available resources in the system among the competing
applications – a market maker if you will. It optimizes for cluster utilization (keep all
resources in use all the time) against various constraints such as capacity guarantees,
fairness, and SLAs. To allow for different policy constraints, the ResourceManager has a pluggable scheduler that allows different algorithms, such as capacity and fair scheduling, to be used as necessary.
ApplicationMaster
Many will draw parallels between YARN and the existing Hadoop MapReduce system
(MR1 in Apache Hadoop 1.x). However, the key difference is the new concept of
an ApplicationMaster.
• Scale: The ApplicationMaster provides much of the functionality of the traditional ResourceManager so that the entire system can scale more dramatically. In tests, we’ve already successfully simulated 10,000-node clusters composed of modern hardware without significant issue. This is one of the key reasons we chose to design the ResourceManager as a pure scheduler, i.e. it doesn’t attempt to provide fault tolerance for resources. We shifted that to become a primary responsibility of the ApplicationMaster instance. Furthermore, since there is an instance of an ApplicationMaster per application, the ApplicationMaster itself isn’t a common bottleneck in the cluster.
• Open: Moving all application framework specific code into the ApplicationMaster
generalizes the system so that we can now support multiple frameworks such as
MapReduce, MPI and Graph Processing.
• Move all complexity (to the extent possible) to the ApplicationMaster, while providing enough functionality to allow application-framework authors sufficient flexibility and power.
It’s useful to remember that, in reality, every application has its own instance of an ApplicationMaster. However, it’s completely feasible to implement an ApplicationMaster to manage a set of applications (e.g. an ApplicationMaster for Pig or Hive that manages a set of MapReduce jobs). Furthermore, this concept has been stretched to manage long-running services which manage their own applications (e.g. launching HBase in YARN via a hypothetical HBaseAppMaster).
Resource Model
YARN supports a very general resource model for applications. An application (via the ApplicationMaster) can request resources with highly specific requirements such as:
• resource-name (a hostname, a rack name, or * to indicate no preference)
• memory (in MB)
• CPU (number of cores)
In order to meet those goals, the central Scheduler (in the ResourceManager) has
extensive information about an application’s resource needs, which allows it to make
better scheduling decisions across all applications in the cluster. This leads us to
the ResourceRequest and the resulting Container.
Essentially an application can ask for specific resource requests via the
ApplicationMaster to satisfy its resource needs. The Scheduler responds to a resource
request by granting a container, which satisfies the requirements laid out by the
ApplicationMaster in the initial ResourceRequest.
Let’s walk through each component of the ResourceRequest to understand this better (a small code sketch follows the list).
• resource-name is the hostname or rack name on which the resource is desired, or * to indicate no preference.
• priority is the intra-application priority for this request (to stress, this isn’t across multiple applications).
• resource-requirement is the required capability, such as memory and CPU.
• number-of-containers is the number of containers, matching the above specification, that are needed.
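For illustration, the sketch below constructs the underlying ResourceRequest record directly so the four components are visible side by side. In practice most applications go through the higher-level AMRMClient shown earlier; the concrete numbers here are arbitrary.

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class ResourceRequestExample {
  public static void main(String[] args) {
    // resource-requirement: 2 GB of memory and 1 virtual core per container.
    Resource capability = Resource.newInstance(2048, 1);

    // priority: intra-application priority for this request.
    Priority priority = Priority.newInstance(1);

    // resource-name: a specific host, a rack, or ResourceRequest.ANY ("*")
    // for no locality preference; number-of-containers: 3.
    ResourceRequest request = ResourceRequest.newInstance(
        priority, ResourceRequest.ANY, capability, 3);

    System.out.println(request);
  }
}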
Essentially, the Container is the resource allocation, which is the successful result of the ResourceManager granting a specific ResourceRequest. A Container grants the application the right to use a specific amount of resources (memory, CPU etc.) on a specific host.
The ApplicationMaster has to take the Container and present it to the NodeManager managing the host on which the container was allocated in order to use the resources for launching its tasks. Of course, the Container allocation is verified in secure mode to ensure that ApplicationMaster(s) cannot fake allocations in the cluster.
YARN allows applications to launch any process and, unlike existing Hadoop
MapReduce in hadoop-1.x (aka MR1), it isn’t limited to Java applications alone.
The YARN Container launch specification API is platform-agnostic and contains:
• The command line to launch the process within the container.
• Environment variables.
• Local resources necessary on the machine prior to launch, such as jars, shared objects, auxiliary data files etc.
• Security-related tokens.
This allows the ApplicationMaster to work with the NodeManager to launch containers
ranging from simple shell scripts to C/Java/Python processes on Unix/Windows to full-
fledged virtual machines (e.g. KVMs).
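The sketch below shows one plausible way an ApplicationMaster might hand such a launch specification to a NodeManager using the NMClient library. The command is a placeholder, the resource and environment maps are left empty for brevity, and the Container object is assumed to come from a prior allocation.

import java.util.Collections;
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LaunchContainer {

  /** Present an allocated Container to its NodeManager along with a launch spec. */
  static void launch(NMClient nmClient, Container container) throws Exception {
    // Placeholder command; a real application would run its task binary here.
    List<String> commands = Collections.singletonList("/bin/date");

    ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
        Collections.emptyMap(),   // local resources (jars, data files, ...)
        Collections.emptyMap(),   // environment variables
        commands,                 // command line
        null,                     // auxiliary service data
        null,                     // security tokens
        null);                    // application ACLs

    nmClient.startContainer(container, ctx);
  }

  public static void main(String[] args) {
    NMClient nmClient = NMClient.createNMClient();
    nmClient.init(new YarnConfiguration());
    nmClient.start();
    // launch(nmClient, containerFromAllocation);  // container comes from the RM
  }
}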
Let’s walk through an application execution sequence, from application submission through completion (a code sketch of the ApplicationMaster’s side follows the list):
1. A client program submits the application, including the specification needed to launch the application-specific ApplicationMaster itself.
2. The ResourceManager negotiates a container in which to start the ApplicationMaster and then launches it.
3. The ApplicationMaster, on boot-up, registers with the ResourceManager; the registration allows the client to query the ResourceManager for details that let it communicate directly with its own ApplicationMaster.
4. During normal operation, the ApplicationMaster negotiates appropriate resource containers via the resource-request protocol.
5. On successful container allocations, the ApplicationMaster launches each container by providing the container launch specification to the NodeManager.
6. The application code executing within the container then provides necessary information (progress, status etc.) to its ApplicationMaster via an application-specific protocol.
7. During the application execution, the client that submitted the program communicates directly with the ApplicationMaster to get status, progress updates etc.
8. Once the application is complete, and all necessary work has been finished, the ApplicationMaster deregisters with the ResourceManager and shuts down, allowing its own container to be repurposed.
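The skeleton below sketches the ApplicationMaster’s side of this sequence (steps 3, 4, 5 and 8) using the AMRMClient library. It is a minimal illustration rather than a production ApplicationMaster: the container size, priority and single-container exit condition are arbitrary, and the actual container launch is delegated to something like the NMClient sketch shown earlier.

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SkeletonAppMaster {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
    rm.init(new YarnConfiguration());
    rm.start();

    // Step 3: register with the ResourceManager.
    rm.registerApplicationMaster("", 0, "");

    // Step 4: negotiate containers via the resource-request protocol.
    rm.addContainerRequest(new ContainerRequest(
        Resource.newInstance(1024, 1), null, null, Priority.newInstance(0)));

    int launched = 0;
    while (launched < 1) {
      // Heartbeat/allocate call; the response carries newly granted containers.
      AllocateResponse response = rm.allocate(0.1f);
      for (Container container : response.getAllocatedContainers()) {
        // Step 5: hand the container and a launch spec to its NodeManager
        // (see the NMClient sketch above).
        System.out.println("Allocated " + container.getId());
        launched++;
      }
      Thread.sleep(1000);
    }

    // Step 8: deregister so the AM's own container can be repurposed.
    rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
  }
}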
In our next post in this series, we dive deeper into the guts of the YARN system, particularly the ResourceManager – stay tuned!
Reference
http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/
Conclusion
In YARN, the ResourceManager is primarily limited to scheduling, i.e. it only arbitrates available resources in the system among the competing applications and does not concern itself with per-application state management. Because of this clear separation of responsibilities, coupled with the modularity described above and the powerful scheduler API discussed in the previous post, the ResourceManager is able to address the most important design requirements: scalability and support for alternate programming paradigms.
Reference
http://hortonworks.com/blog/apache-hadoop-yarn-resourcemanager/
The NodeManager (NM) is YARN’s per-node agent and takes care of the individual compute nodes in a Hadoop cluster. Its responsibilities include keeping up to date with the ResourceManager (RM), managing the life-cycle of containers, monitoring the resource usage (memory, CPU) of individual containers, tracking node health, managing logs, and running auxiliary services that may be exploited by different YARN applications.
1. NodeStatusUpdater
On startup, this component registers with the RM and sends information about the
resources available on the nodes. Subsequent NM-RM communication is to provide
updates on container statuses – new containers running on the node, completed
containers, etc.
In addition, the RM may signal the NodeStatusUpdater to kill already-running containers.
2. ContainerManager
Interacts with the underlying operating system to securely place files and directories
needed by containers and subsequently to launch and clean up processes
corresponding to containers in a secure manner.
3. NodeHealthCheckerService
Provides functionality for checking the health of the node by frequently running a configured script. It also monitors the health of the disks, specifically by periodically creating temporary files on them.
4. Security
A. ApplicationACLsManager: The NM needs to gate user-facing APIs, such as the display of container logs on the web UI, so that they are accessible only to authorized users. This component maintains the ACL list for each application and enforces it whenever such a request is received.
B. ContainerTokenSecretManager: verifies incoming requests to ensure that all operations are properly authorized by the RM.
5. WebServer
Exposes the list of applications, the containers running on the node at a given point in time, node-health-related information and the logs produced by the containers.
6. Container Launch
To launch a container, the NM receives the container launch specification described earlier (command line, environment variables, local resources and security tokens), localizes the required resources on the node, and then spawns and manages the container process.
7. Log Aggregation
Logs for all the containers belonging to a single application that ran on this NM are aggregated and written out to a single (possibly compressed) log file at a configured location in the file system. Users can access these logs via the YARN command-line tools (e.g. yarn logs -applicationId <app-id>), the web UI or directly from the file system.
Conclusion
In YARN, the NodeManager is primarily limited to managing abstract containers, i.e. only the processes corresponding to containers, and does not concern itself with per-application state management such as MapReduce tasks. It also does away with the notion of named slots, such as map and reduce slots. Because of this clear separation of responsibilities, coupled with the modular architecture described above, the NM can scale much more easily and its code is much more maintainable.
Reference
http://hortonworks.com/blog/apache-hadoop-yarn-nodemanager/