Hadoop

YARN (Yet Another Resource Negotiator) is the resource management layer of Hadoop. It addresses the scaling challenges of earlier Hadoop versions and supports multi-tenant environments by managing resources for diverse applications. Its key components, the Resource Manager, Node Manager, and Application Master, work together to allocate and monitor resources through containers. YARN extends Hadoop's capabilities by letting tools such as Hive, Pig, and Spark run on the same cluster, and it provides high availability and fault tolerance.

Uploaded by Jay Sharma
Copyright © All Rights Reserved

YARN: The Resource Management Layer of Hadoop
Understanding the Need for YARN

Scaling Challenges
Hadoop 1.0 was designed for MapReduce batch processing and lacked the flexibility to handle real-time and interactive workloads. A single JobTracker handled both cluster resource management and job scheduling, and each node's capacity was split into fixed map and reduce slots, so resources were used inefficiently and scalability was limited.

Multi-Tenant Environments
As organizations adopted Hadoop for diverse applications, they needed a resource management system that could support multiple users and processing frameworks on the same cluster.
Key Components of YARN Architecture

1 Resource Manager
Acts as the central coordinator, responsible for managing cluster resources, accepting applications, negotiating resource allocation, and monitoring application progress.

2 Node Manager
Runs on each data node, responsible for managing resources on the node, starting and stopping containers, and reporting status to the Resource Manager.

3 Application Master
Represents an application, responsible for negotiating resources from the Resource Manager, managing the execution of application tasks, and monitoring their progress.
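To make the division of labor concrete, here is a minimal in-memory simulation. It is not YARN's API: real Application Masters talk to the Resource Manager and Node Managers through Hadoop's Java client libraries (for example `AMRMClient` and `NMClient`). The classes `NodeManager`, `ResourceManager`, and `ApplicationMaster` below and their methods are hypothetical, and the first-fit placement is an assumption for illustration; the real schedulers are far more elaborate. Node Managers register their capacity, the Application Master requests containers from the Resource Manager, and then it launches tasks on the Node Managers that host the granted containers.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Hypothetical stand-in for the Node Manager on one worker node."""
    node_id: str
    free_mb: int
    free_vcores: int
    running: dict = field(default_factory=dict)  # container id -> task name

    def start_container(self, container_id: str, task: str) -> None:
        # The Application Master asks the Node Manager to launch a granted container.
        self.running[container_id] = task


class ResourceManager:
    """Hypothetical central coordinator: tracks node capacity and grants containers."""

    def __init__(self) -> None:
        self.nodes: list = []
        self.next_id = 1

    def register_node(self, node: NodeManager) -> None:
        # Node Managers register with the Resource Manager and report their capacity.
        self.nodes.append(node)

    def allocate(self, app_id: str, count: int, mb: int, vcores: int) -> list:
        """Grant up to `count` containers, placing each on the first node with room."""
        grants = []
        for _ in range(count):
            node = next((n for n in self.nodes
                         if n.free_mb >= mb and n.free_vcores >= vcores), None)
            if node is None:
                break  # cluster is full; a real scheduler would keep the rest pending
            node.free_mb -= mb
            node.free_vcores -= vcores
            grants.append((f"{app_id}_container_{self.next_id}", node))
            self.next_id += 1
        return grants


class ApplicationMaster:
    """Hypothetical per-application master: negotiates containers, then launches tasks."""

    def __init__(self, app_id: str, rm: ResourceManager) -> None:
        self.app_id = app_id
        self.rm = rm

    def run(self, tasks: list, mb: int, vcores: int) -> dict:
        placement = {}
        grants = self.rm.allocate(self.app_id, len(tasks), mb, vcores)
        for task, (container_id, node) in zip(tasks, grants):
            node.start_container(container_id, task)
            placement[task] = node.node_id
        return placement  # tasks without a grant stay unscheduled in this sketch


rm = ResourceManager()
rm.register_node(NodeManager("node1", free_mb=4096, free_vcores=4))
rm.register_node(NodeManager("node2", free_mb=4096, free_vcores=4))
am = ApplicationMaster("app_1", rm)
print(am.run(["map0", "map1", "map2"], mb=2048, vcores=1))
# {'map0': 'node1', 'map1': 'node1', 'map2': 'node2'}
```

In this sketch the Resource Manager only grants containers; launching them is left to the Application Master and the Node Manager, mirroring the split of responsibilities listed above.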
YARN Resource Model and Scheduling

1 Resource Allocation

2 Resource Negotiation

3 Resource Scheduling

4 Resource Monitoring

YARN defines a resource model based on containers. A container is an allocation of resources (memory and virtual CPU cores) on a single node, in which an application task runs. The Resource Manager's scheduler allocates containers to applications according to their requests and priorities. The scheduler is pluggable, so the policy can be chosen to suit the workload; Hadoop ships with the FIFO, Capacity, and Fair schedulers.
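The container resource model can be illustrated with a two-dimensional resource vector (memory and vcores). The sketch below assumes, as a simplification, that the scheduler rounds each request up to a multiple of the minimum allocation and rejects requests above the maximum. The limits used are the documented defaults of `yarn.scheduler.minimum-allocation-mb` (1024), `yarn.scheduler.maximum-allocation-mb` (8192), and the corresponding `-vcores` settings (1 and 4). The exact normalization rules depend on the scheduler and its resource calculator, and the helper names here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A container's resource vector: memory in MB and virtual cores."""
    memory_mb: int
    vcores: int


# Assumed limits, mirroring the yarn-default.xml defaults for
# yarn.scheduler.minimum/maximum-allocation-mb and -vcores.
MIN_ALLOC = Resource(1024, 1)
MAX_ALLOC = Resource(8192, 4)


def round_up(value: int, step: int) -> int:
    """Round `value` up to the next multiple of `step`."""
    return math.ceil(value / step) * step


def normalize(ask: Resource) -> Resource:
    """Round a request up to a multiple of the minimum allocation; reject oversized asks."""
    if ask.memory_mb > MAX_ALLOC.memory_mb or ask.vcores > MAX_ALLOC.vcores:
        raise ValueError(f"request {ask} exceeds maximum allocation {MAX_ALLOC}")
    return Resource(
        max(MIN_ALLOC.memory_mb, round_up(ask.memory_mb, MIN_ALLOC.memory_mb)),
        max(MIN_ALLOC.vcores, round_up(ask.vcores, MIN_ALLOC.vcores)),
    )


def containers_per_node(node: Resource, ask: Resource) -> int:
    """How many normalized containers fit on a node; the scarcer resource is the limit."""
    c = normalize(ask)
    return min(node.memory_mb // c.memory_mb, node.vcores // c.vcores)


print(normalize(Resource(1500, 1)))                               # Resource(memory_mb=2048, vcores=1)
print(containers_per_node(Resource(16384, 8), Resource(3000, 2)))  # 4
```

The second example shows why container sizing matters: a 3000 MB request becomes 3072 MB, but with 2 vcores per container the node's 8 vcores, not its memory, cap it at four containers.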
The Application Master and Container Concept

Application Master
Each application has a dedicated Application Master that acts as its central point of control. It negotiates resources from the Resource Manager, monitors the execution of tasks, and handles application-specific logic. The Application Master itself runs in a container.

Container
Containers provide a lightweight, isolated environment for application tasks. Each container encapsulates a fixed amount of resources, such as memory and CPU (virtual cores), so that tasks run independently without interfering with other applications.
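As an illustration of the application-specific logic an Application Master handles, the following sketch tracks tasks and retries failed ones in fresh containers. The `run_tasks` function and its `launch` callback are hypothetical: `launch` stands in for requesting a container and running a task in it. The limit of four attempts is an assumption loosely modeled on MapReduce's default of four attempts per task (`mapreduce.map.maxattempts`).

```python
from collections import deque
from typing import Callable


def run_tasks(tasks: list, launch: Callable[[str, int], bool],
              max_attempts: int = 4) -> dict:
    """Hypothetical Application Master loop: run each task, retrying failures.

    `launch(task, attempt)` represents obtaining a container and running the task;
    it returns True if the container exited successfully.
    """
    pending = deque((task, 1) for task in tasks)
    status = {}
    while pending:
        task, attempt = pending.popleft()
        if launch(task, attempt):
            status[task] = "SUCCEEDED"
        elif attempt < max_attempts:
            pending.append((task, attempt + 1))  # request a fresh container and retry
        else:
            status[task] = "FAILED"  # give up after the last allowed attempt
    return status


# Simulated launcher: "t2" fails only on its first attempt, "t3" always fails.
def fake_launch(task: str, attempt: int) -> bool:
    return task != "t3" and not (task == "t2" and attempt == 1)


print(run_tasks(["t1", "t2", "t3"], fake_launch))
# {'t1': 'SUCCEEDED', 't2': 'SUCCEEDED', 't3': 'FAILED'}
```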
YARN Container Scheduling and Allocation

Queue Management
YARN organizes applications into queues, enabling prioritization and fair resource allocation based on user or application needs.

Capacity Scheduling
The Capacity Scheduler allows administrators to define capacities for different queues, guaranteeing each queue a configured percentage of cluster resources. A queue can also use idle capacity beyond its guarantee, up to a configurable maximum.
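Capacity Scheduler queues form a hierarchy under `root`, and each queue's capacity is set as a percentage of its parent (for example through `yarn.scheduler.capacity.root.prod.capacity`), with sibling capacities summing to 100%. The hypothetical helper below converts such a configuration into each queue's guaranteed share of the whole cluster; the queue names are made up for illustration.

```python
from collections import defaultdict


def absolute_capacities(capacities: dict) -> dict:
    """Map dotted queue paths (e.g. "root.prod.etl") to their share of the cluster, in %.

    Each value in `capacities` is the queue's capacity as a percentage of its parent.
    """
    # Check that the children of every parent queue add up to 100%.
    totals = defaultdict(float)
    for path, pct in capacities.items():
        totals[path.rsplit(".", 1)[0]] += pct
    for parent, total in totals.items():
        if abs(total - 100.0) > 1e-6:
            raise ValueError(f"children of {parent} sum to {total}%, not 100%")

    # Walk queues from shallow to deep so each parent is resolved before its children.
    absolute = {"root": 100.0}
    for path in sorted(capacities, key=lambda p: p.count(".")):
        parent = path.rsplit(".", 1)[0]
        absolute[path] = absolute[parent] * capacities[path] / 100.0
    return absolute


queues = {"root.prod": 70, "root.dev": 30, "root.prod.etl": 60, "root.prod.reports": 40}
shares = absolute_capacities(queues)
print(shares["root.prod.etl"], shares["root.dev"])  # 42.0 30.0
```

So a leaf queue configured at 60% of a parent that itself holds 70% of the cluster is guaranteed 42% of the cluster's resources.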

Fair Scheduler
The Fair Scheduler aims to allocate resources so that all running applications receive, on average, an equal share over time. No application is starved for resources, and short jobs can finish in reasonable time even while long jobs are running.
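The core idea of fair sharing can be illustrated with max-min fairness, computed by "water-filling": applications that need less than an equal share get their full demand, and the leftover is split among the rest. This is a simplified, unweighted sketch, not the Fair Scheduler's actual algorithm, which also supports queue weights, minimum shares, and preemption; the function name is hypothetical.

```python
def fair_shares(capacity: float, demands: dict) -> dict:
    """Max-min fair split of `capacity` among applications with the given demands."""
    shares = {app: 0.0 for app in demands}
    remaining = capacity
    unsatisfied = set(demands)
    while unsatisfied and remaining > 0:
        equal = remaining / len(unsatisfied)
        # Apps whose whole demand fits within an equal share are fully satisfied.
        small = {a for a in unsatisfied if demands[a] <= equal}
        if not small:
            # Everyone left wants more than an equal share: split what remains evenly.
            for a in unsatisfied:
                shares[a] = equal
            break
        for a in small:
            shares[a] = float(demands[a])
            remaining -= demands[a]
        unsatisfied -= small
    return shares


print(fair_shares(100, {"A": 10, "B": 50, "C": 60}))
# {'A': 10.0, 'B': 45.0, 'C': 45.0}
```

Application A needs less than a third of the cluster, so it receives its full demand; the remaining 90 units are split evenly between B and C, and no application is starved.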
YARN High Availability and Fault Tolerance

1 High Availability

2 Resource Manager Redundancy

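3 Node Manager Failure Handling

4 Application Recovery

YARN incorporates mechanisms for high availability and fault tolerance so that the cluster keeps operating when components fail. Resource Manager high availability runs two or more Resource Managers in an active/standby arrangement: if the active Resource Manager fails, a standby takes over, typically using ZooKeeper for leader election and for storing application state. With recovery enabled, the new active Resource Manager reloads that state, so submitted applications are not lost. Node Managers are not replicated: if a Node Manager stops sending heartbeats, the Resource Manager marks the node as lost, and the affected Application Masters rerun the lost tasks on other nodes.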
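The failover and recovery behavior can be sketched as a small simulation. Everything here is hypothetical: `StateStore` stands in for the shared state store (real deployments typically use ZooKeeper), and `FailoverController` stands in for the leader election that promotes a standby Resource Manager; the tick-based timeout is an assumption for illustration. In a real cluster these behaviors are enabled through settings such as `yarn.resourcemanager.ha.enabled` and `yarn.resourcemanager.recovery.enabled`.

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Hypothetical shared store that outlives any single Resource Manager."""
    apps: dict = field(default_factory=dict)  # application id -> state


class RMInstance:
    """Hypothetical Resource Manager process that is either active or standby."""

    def __init__(self, name: str, store: StateStore) -> None:
        self.name = name
        self.store = store
        self.active = False
        self.apps: dict = {}

    def become_active(self) -> None:
        # Application recovery: reload the state persisted by the previous active RM.
        self.active = True
        self.apps = dict(self.store.apps)

    def submit(self, app_id: str) -> None:
        if not self.active:
            raise RuntimeError(f"{self.name} is standby and rejects submissions")
        self.apps[app_id] = "RUNNING"
        self.store.apps[app_id] = "RUNNING"  # persist so a standby can recover it


class FailoverController:
    """Promotes the standby after the active RM misses `timeout` consecutive heartbeats."""

    def __init__(self, active: RMInstance, standby: RMInstance, timeout: int = 3) -> None:
        self.active, self.standby, self.timeout = active, standby, timeout
        self.missed = 0
        active.become_active()

    def tick(self, active_alive: bool) -> str:
        self.missed = 0 if active_alive else self.missed + 1
        if self.missed >= self.timeout:
            self.active.active = False  # demote the failed RM (real systems also fence it)
            self.active, self.standby = self.standby, self.active
            self.active.become_active()
            self.missed = 0
        return self.active.name


store = StateStore()
rm1, rm2 = RMInstance("rm1", store), RMInstance("rm2", store)
controller = FailoverController(rm1, rm2, timeout=3)
rm1.submit("app_1")
for alive in [True, False, False, False]:
    current = controller.tick(alive)
print(current, rm2.apps)  # rm2 {'app_1': 'RUNNING'}
```

Because the application state lives in the shared store rather than in the failed process, the promoted Resource Manager knows about `app_1` without the client resubmitting it.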
Integrating YARN with Other Hadoop Ecosystem Tools

Hive
Hive runs its queries on YARN: each query is compiled into distributed jobs (MapReduce, Tez, or Spark) whose tasks run in YARN containers across the Hadoop cluster.

Pig
Pig compiles Pig Latin scripts into jobs such as MapReduce or Tez and relies on YARN for resource allocation and task execution, allowing users to process large datasets with Pig Latin scripting.

Spark
Spark, known for its fast in-memory processing, can use YARN as its cluster manager: YARN allocates the containers in which Spark's executors (and, in cluster mode, its driver) run.
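To see how a Spark executor maps onto a YARN container, the sketch below estimates the container memory YARN would grant. It assumes Spark's documented default memory overhead for executors on YARN (10% of `spark.executor.memory`, with a minimum of 384 MiB) and a simplified rounding up to a multiple of a 1024 MB minimum allocation. Other settings that enlarge the request, such as off-heap or PySpark memory, are ignored, and the function name is hypothetical.

```python
import math


def executor_container_mb(executor_memory_mb: int,
                          overhead_factor: float = 0.10,
                          min_overhead_mb: int = 384,
                          yarn_min_alloc_mb: int = 1024) -> int:
    """Estimate the YARN container memory for one Spark executor (simplified)."""
    # Overhead Spark requests on top of the executor heap.
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    requested_mb = executor_memory_mb + overhead_mb
    # Assumed scheduler behavior: round up to a multiple of the minimum allocation.
    return math.ceil(requested_mb / yarn_min_alloc_mb) * yarn_min_alloc_mb


for heap in (1024, 2048, 4096):
    print(heap, "->", executor_container_mb(heap))
# 1024 -> 2048
# 2048 -> 3072
# 4096 -> 5120
```

Under these assumptions, a 4096 MB executor heap occupies a 5120 MB container, so overhead and rounding are worth accounting for when sizing executors.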
Optimizing YARN Performance and Best Practices

1 Resource Allocation
Fine-tune container sizes and resource quotas for different applications to optimize performance based on workload requirements.

2 Scheduling Policies
Experiment with different schedulers and their settings to find the best balance between fairness, efficiency, and priority.

3 Monitoring and Logging
Implement comprehensive monitoring and logging to identify potential bottlenecks, resource contention, and application performance issues.
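For monitoring, the Resource Manager exposes a REST API. The sketch below reads the cluster-wide metrics endpoint (`/ws/v1/cluster/metrics`, served on the Resource Manager web port, 8088 by default) and derives memory and vcore utilization from it. The `summarize` helper, its 90% threshold, and the rule that high utilization plus pending applications signals contention are assumptions for illustration, and the sample numbers are made up.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url: str) -> dict:
    """Read cluster-wide metrics from the Resource Manager REST API."""
    with urlopen(f"{rm_url}/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def summarize(metrics: dict, threshold: float = 0.9) -> dict:
    """Compute utilization and flag likely contention (busy cluster with queued apps)."""
    mem = metrics["allocatedMB"] / metrics["totalMB"] if metrics["totalMB"] else 0.0
    cpu = (metrics["allocatedVirtualCores"] / metrics["totalVirtualCores"]
           if metrics["totalVirtualCores"] else 0.0)
    return {
        "memory_utilization": round(mem, 3),
        "vcore_utilization": round(cpu, 3),
        "contention": max(mem, cpu) >= threshold and metrics["appsPending"] > 0,
    }


# Hypothetical sample values; against a live cluster, pass the result of
# fetch_cluster_metrics("http://<rm-host>:8088") instead.
sample = {"allocatedMB": 94208, "totalMB": 102400,
          "allocatedVirtualCores": 40, "totalVirtualCores": 64, "appsPending": 3}
print(summarize(sample))
# {'memory_utilization': 0.92, 'vcore_utilization': 0.625, 'contention': True}
```

Tracking these numbers over time, rather than as a single snapshot, makes it easier to tell a brief spike from sustained contention that calls for resizing containers or queues.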
Conclusion
YARN has transformed Hadoop by providing a flexible and efficient
resource management layer. It enables a diverse set of
applications to run seamlessly on the Hadoop cluster, facilitating
data processing, analytics, and machine learning tasks with
improved performance and scalability.
