Hadoop YARN

The document provides an overview of Big Data technologies, focusing on MapReduce, YARN, and NoSQL databases. It explains the architecture and differences between Hadoop 1 and Hadoop 2, as well as the features and benefits of Apache Spark. Additionally, it discusses various types of NoSQL databases, their advantages and disadvantages, and applications in real-world scenarios.

Uploaded by

Arut Jothi

UNIT-III

Dr.G.Arutjothi
Assistant Professor
MapReduce, YARN & NoSQL

Big Data Technologies and Databases

Introduction to MapReduce
• A programming model for processing large datasets
• Uses parallel computing across clusters
• Consists of Map and Reduce functions
Processing Data with Hadoop using MapReduce
• Data is split into chunks and processed in parallel
• Mappers perform initial processing
• Reducers aggregate and output results
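The split, map, and reduce steps above can be sketched in plain Python. This is a single-process illustration of the programming model only, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample chunks are assumptions made for this example, and a real cluster would run the mappers and reducers on different machines.

```python
from collections import defaultdict


def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_pairs):
    """Shuffle: group emitted values by key so each reducer sees one word."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: aggregate all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    mapped = []
    for chunk in chunks:  # on a cluster, each chunk would go to its own mapper
        mapped.extend(map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, already split into two chunks.
chunks = ["big data big clusters", "data moves to clusters"]
counts = word_count(chunks)
print(counts)
```

The shuffle step is not listed on the slide, but it is what connects the two phases: it routes every value with the same key to the same reducer.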
Introduction to YARN & Architecture
• YARN: Yet Another Resource Negotiator
• Splits resource management and job scheduling/monitoring into separate daemons, decoupling them from data processing
• Components: ResourceManager, NodeManager, ApplicationMaster
Managing Resources & Applications with Hadoop YARN
• YARN dynamically allocates resources to applications
• Ensures efficient resource utilization
• Supports multiple processing frameworks
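The idea of one resource manager handing out capacity to several frameworks can be sketched as a toy model. Everything below is an assumption made for illustration: the class names only echo YARN's component names, the first-fit policy is a stand-in for YARN's real schedulers, and memory is the only resource tracked. It does not use or imitate the YARN API.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy worker node that tracks its free memory in MB."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Toy scheduler: first-fit placement of containers on nodes."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, container_mb):
        for node in self.nodes:
            if node.free_mb >= container_mb:
                node.free_mb -= container_mb
                node.containers.append((app_id, container_mb))
                return node.name
        return None  # no capacity now; a real scheduler would queue the request

    def release(self, app_id):
        """Return every container held by app_id to its node."""
        for node in self.nodes:
            kept = []
            for owner, mb in node.containers:
                if owner == app_id:
                    node.free_mb += mb
                else:
                    kept.append((owner, mb))
            node.containers = kept


rm = ResourceManager([NodeManager("node-1", 4096), NodeManager("node-2", 2048)])
placements = [
    rm.allocate("mapreduce-job", 3072),
    rm.allocate("spark-job", 2048),
    rm.allocate("spark-job", 2048),  # cluster is full at this point
]
print(placements)

rm.release("mapreduce-job")  # finished job frees its memory
after_release = rm.allocate("spark-job", 2048)
print(after_release)
```

The third request is refused until the first job releases its container, which is the dynamic allocation described above: capacity belongs to the cluster, not to any one framework.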
Difference between Hadoop 1 and Hadoop 2
Sr. No. | Key | Hadoop 1 | Hadoop 2
1 | New components and APIs | Has fewer components and APIs than Hadoop 2. | Has more components and APIs than Hadoop 1, such as the YARN API, the YARN framework, and an enhanced Resource Manager.
2 | Support | Supports only the MapReduce processing model. | Supports the MapReduce model as well as other distributed computing models.
3 | Resource management | MapReduce is responsible for both processing and cluster resource management. | YARN handles cluster resource management, while processing is handled by the different processing models.
4 | Scalability | Less scalable; limited to 4,000 nodes per cluster. | Scalable up to 10,000 nodes per cluster.
5 | Implementation | Uses slots that can run a Map task or a Reduce task only. | Uses containers that can run generic tasks.
6 | Windows support | No support for Microsoft Windows provided by Apache. | Support for Microsoft Windows is available in Hadoop 2.
Big Data Technologies
• Big data technologies like Hadoop, Spark, and NoSQL are essential for managing the vast amounts of data generated by various sources.
• Hadoop is an open-source framework for distributed storage and processing of large data sets across clusters of computers.
• Its ecosystem includes HDFS for storage, MapReduce for processing, and Hive for data warehousing.
• Spark is an analytics engine that offers fast, in-memory processing and supports a variety of workloads.
• NoSQL databases, like MongoDB, Cassandra, and Couchbase, offer flexibility and scalability for diverse data types.
• Understanding these technologies is crucial for making informed decisions about data architecture and strategy, and for improving business intelligence, operational efficiency, and innovation.
Exploring Apache Spark: Key Features and Performance Benefits
• Apache Spark is a big data engine built around in-memory computing, which lets it process large datasets faster than disk-based frameworks like Hadoop MapReduce.
• It offers a unified framework that can handle various workloads (batch, streaming, SQL, and machine learning), simplifying architecture and reducing overhead.
• Developers can write Spark applications in Java, Scala, Python, or R, making it accessible to a wider audience.
• Its rich ecosystem includes integrations with Hadoop, Apache Hive, and NoSQL databases, as well as libraries for machine learning, graph processing, and SQL-based queries.
• Spark's horizontal scalability and fault tolerance make it an attractive choice for mission-critical applications.
Big Data Technologies & NoSQL Databases

• NoSQL: Non-relational databases designed for Big Data
• NoSQL databases are designed to handle various data models, offering flexibility and performance that traditional SQL databases may struggle to match.
• Scalable and flexible data models
• Used in distributed systems
Understanding NoSQL Databases: Types and Use Cases
There are four types of NoSQL databases:
1. Document Stores,
2. Key-value Stores,
3. Column-family Stores, and
4. Graph Databases.

* Document Stores store data in documents, ideal for complex data structures and varied formats.
* Key-value Stores store data as a collection of key-value pairs, ideal for caching, session management, and rapid lookups.
* Column-family Stores group data by column families rather than by row, which is efficient for analytical queries and large-scale data warehousing.
* Graph Databases represent and analyze relationships between interconnected data points, excelling in social networks, fraud detection, and recommendation engines.
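The four data models can be compared side by side using ordinary Python structures. This is only a sketch of the shape of the data in each model, not the API of any particular database; the keys, names, and values are hypothetical sample data.

```python
# Document store: one self-contained, nested record per document.
document = {
    "_id": "user:42",
    "name": "Asha",
    "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

# Key-value store: an opaque value looked up by key only.
key_value = {"session:9f3c": "user:42"}

# Column-family store: row key -> column family -> column -> value.
column_family = {
    "user:42": {
        "profile": {"name": "Asha", "city": "Chennai"},
        "activity": {"last_login": "2024-01-15"},
    }
}

# Graph database: nodes plus explicit edges (relationships).
graph_edges = [
    ("user:42", "FOLLOWS", "user:7"),
    ("user:7", "FOLLOWS", "user:99"),
]


def followed_by(user, edges):
    """Return the users that `user` follows."""
    return [dst for src, rel, dst in edges if src == user and rel == "FOLLOWS"]


def friends_of_friends(user, edges):
    """Two-hop traversal, the kind of query graph stores are built for."""
    return [
        fof
        for friend in followed_by(user, edges)
        for fof in followed_by(friend, edges)
    ]


total_qty = sum(item["qty"] for item in document["orders"])
print(total_qty)
print(key_value["session:9f3c"])
print(column_family["user:42"]["profile"]["city"])
print(friends_of_friends("user:42", graph_edges))
```

Each access pattern matches a use case from the list above: nested reads on a document, a direct key lookup for a session, a read of one column family, and a relationship traversal.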
Features & Types of NoSQL Databases
• Types: Document-based, Key-Value, Column-family, Graph databases
• Schema-less structure
• High availability and partition tolerance
Advantages & Disadvantages of NoSQL
• Advantages: Scalable, flexible schema, fast performance
• Disadvantages: Eventual consistency, complex querying, limited standardization
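Eventual consistency, listed above as a disadvantage, is easiest to see in a toy model. The sketch below is an assumption-laden simplification: the Replica and EventuallyConsistentStore classes are hypothetical, writes always land on the first replica, and an explicit sync() call stands in for background replication. Real systems differ in how and when they propagate writes.

```python
class Replica:
    """One copy of the data."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy model: a write reaches other replicas only when sync() runs."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # writes not yet propagated, as (key, value)

    def write(self, key, value):
        self.replicas[0].data[key] = value  # acknowledged immediately
        self.pending.append((key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        """Stand-in for background replication."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("cart:42", ["A1"])

stale = store.read("cart:42", replica_index=2)  # replica has not caught up yet
store.sync()
fresh = store.read("cart:42", replica_index=2)  # all replicas now agree
print(stale, fresh)
```

The write is fast because it does not wait for every replica, which is the trade behind the "fast performance" advantage; the cost is the window in which a reader can see old data.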
Applications of NoSQL Databases
• Social media (Facebook, Twitter)
• E-commerce (Amazon, eBay)
• Real-time analytics
• Internet of Things (IoT) applications
