
Big Data Notes

The document is an assignment on Big Data Analytics focusing on Hadoop and YARN, detailing their components, functionalities, and benefits. It explains the Hadoop ecosystem, including HDFS, MapReduce, and Hive, highlighting Hive's features and limitations. The assignment aims to familiarize students with resource management in a Hadoop cluster using YARN.

Uploaded by

Pradeep Kumar

BIG DATA ANALYTICS

Assignment - 01
● Access a Hadoop cluster with YARN installed, get familiar
with Hadoop ecosystem components, and basic knowledge of
cluster resource management.

NAME:- PRADEEP KUMAR
NAME:- SAMI AKHTAR
NAME:- TANVI
NAME:- MANVIR SINGH
NAME:- SUDHA KUMARI
BRANCH:- B.TECH CSE(DS)

SUB POINTS
• Introduction.
• What is Hadoop?
• Hadoop Ecosystem Components.
• What is YARN?
• Components of YARN.
• How YARN Works.
• Benefits of YARN.
• Introduction.

• This presentation introduces the Hadoop cluster and resource
management with YARN.
• We'll explore the fundamentals of the Hadoop ecosystem components
and cluster resource management using YARN.
• What is Hadoop?

• Hadoop is an open-source framework for distributed storage
and processing of large datasets.
• It's designed to handle big data tasks across a cluster of
commodity hardware.
• It is often described as the backbone of a big data analytics
(BDA) system.
• Hadoop uses HDFS (Hadoop Distributed File System) for storage.
• Hadoop is written in Java and is not an OLAP (online analytical
processing) system.
• Hadoop Ecosystem Components.

• Hadoop consists of various components that work together to enable
distributed computing:
1. Hadoop Distributed File System (HDFS): Stores data across multiple
machines in a Hadoop cluster.
2. Yet Another Resource Negotiator (YARN): Manages resources and
schedules tasks across the cluster.
3. MapReduce: A programming model for processing and generating large
datasets in parallel.
4. Hadoop Common: Provides libraries and utilities needed by other
Hadoop modules.
5. Hive, Pig, Spark, etc.: Higher-level data processing tools that run on
top of Hadoop.
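
The MapReduce programming model in point 3 can be illustrated with a small word-count sketch. This is plain Python running on one machine, not the Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration. On a real cluster the map and reduce steps would run in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce one after another on a list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big clusters", "data lives in HDFS"]
    print(word_count(sample))
```

Each map_phase call only looks at its own line, which is what lets the map step be spread across machines; only the shuffle step needs to bring matching keys together.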
• What is YARN?

• YARN (Yet Another Resource Negotiator) is the resource
management layer of Hadoop.
• It separates the resource management and job scheduling
functions from the MapReduce processing engine.
• YARN allows multiple data processing engines like
MapReduce, Spark, and others to run simultaneously on
the same cluster.
• Components of YARN.

• YARN has three main components:
1. ResourceManager (RM): Manages and allocates cluster resources.
2. NodeManager (NM): Manages resources on individual cluster
nodes.
3. ApplicationMaster (AM): Controls and manages the execution of a
specific application's tasks.
• How YARN Works.

• When a job is submitted to the cluster, the ResourceManager
allocates a container in which the job's ApplicationMaster starts.
• NodeManagers on individual nodes monitor resource usage and
report back to the ResourceManager.
• ApplicationMasters negotiate resources with the
ResourceManager and coordinate task execution on specific
nodes.
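
The interaction between the three components can be sketched as a toy simulation. This is a simplified model, not the YARN API: the classes only borrow the component names, resources are reduced to memory in MB, and the first-fit placement rule is an assumption made for illustration (real YARN schedulers are more elaborate).

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy NodeManager: tracks the free memory of one node."""
    name: str
    free_mb: int

    def report(self):
        # Stands in for the periodic status report sent to the ResourceManager.
        return self.name, self.free_mb


@dataclass
class ResourceManager:
    """Toy ResourceManager: grants containers on the first node with room."""
    nodes: list = field(default_factory=list)

    def allocate(self, memory_mb):
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                return node.name  # the "container" is just the host name here
        return None  # no node has enough free memory right now


class ApplicationMaster:
    """Toy ApplicationMaster: asks for one container per task."""

    def __init__(self, rm, tasks, memory_mb):
        self.rm = rm
        self.tasks = tasks
        self.memory_mb = memory_mb

    def run(self):
        # Record which node (if any) each task was placed on.
        return {task: self.rm.allocate(self.memory_mb) for task in self.tasks}


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("node1", 2048), NodeManager("node2", 1024)])
    am = ApplicationMaster(rm, ["map-0", "map-1", "map-2", "map-3"], 1024)
    print(am.run())
    print([node.report() for node in rm.nodes])
```

In this run the fourth task gets no container because both nodes are full, which mirrors how an ApplicationMaster has to wait when the cluster has no free capacity.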
• Benefits of YARN.

YARN provides several benefits:
• Efficient resource utilization: Allows multiple frameworks to share
cluster resources.
• Scalability: Can handle large clusters with thousands of nodes.
• Flexibility: Supports various data processing frameworks.
• Cost-effective: Hadoop is open source and stores data on commodity
hardware, so it is cost-effective compared to a traditional
relational database management system.
What is Hive?
• Hive is a data warehouse system used to analyze structured data. It is built
on top of Hadoop and was originally developed by Facebook.
• Hive provides the functionality of reading, writing, and managing large datasets
residing in distributed storage. It runs SQL-like queries written in HQL (Hive Query
Language), which are internally converted to MapReduce jobs.
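
To give a feel for what converting a query into a MapReduce job means, the sketch below evaluates a GROUP BY query by hand as a map step followed by a reduce step. The table name sales and its columns region and amount are hypothetical, and this is a conceptual illustration in plain Python, not how Hive's query compiler is implemented.

```python
from collections import defaultdict

# Hypothetical query that this sketch mimics:
#   SELECT region, SUM(amount) FROM sales GROUP BY region;


def map_row(row):
    """Map: turn one table row into a (group key, value) pair."""
    return row["region"], row["amount"]


def run_group_by_sum(rows):
    """Group the mapped pairs by key, then reduce each group with SUM."""
    groups = defaultdict(list)
    for row in rows:
        key, value = map_row(row)
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    sales = [
        {"region": "north", "amount": 120},
        {"region": "south", "amount": 80},
        {"region": "north", "amount": 30},
    ]
    print(run_group_by_sum(sales))
```

The GROUP BY column becomes the key emitted by the map step, and the aggregate function (SUM here) becomes the reduce step.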

Features of Hive
• Hive is fast and scalable.
• It provides SQL-like queries (i.e., HQL) that are implicitly transformed to MapReduce
or Spark jobs.
• It is capable of analyzing large datasets stored in HDFS.
• It allows different storage types such as plain text, RCFile, and HBase. It uses
indexing to accelerate queries.
• It can operate on compressed data stored in the Hadoop ecosystem.
• It supports user-defined functions (UDFs), through which users can add their own functionality.
Limitations of Hive
• Hive is not capable of handling real-time data.
• It is not designed for online transaction processing.
• Hive queries have high latency.
THANK YOU
