

Big Data Management & Analytics


PGDM Trimester III

Lecture by
Dr. Ruchi Garg
BIMTECH
Greater Noida
Layout

 Big Data Analytics
 World of HADOOP
Big Data Analytics

 Descriptive: What happened? History, e.g. the footfall in a mall. Hindsight.
 Diagnostic: Why did it happen? Identifies the drivers of change, e.g. why footfall in the mall dropped. Insight.
 Predictive: What might happen? Uses AI tools, e.g. by how much sales in the mall will decrease. Foresight.
 Prescriptive: What needs to be done? E.g. which offers to make.
 Cognitive: AI and analytical tools that derive the solution themselves. The most critical level. Example?
World of HADOOP

 Hadoop is an open-source software platform for distributed storage and distributed processing of very large data sets on computer clusters.
HADOOP

 Hadoop HDFS to store data across slave machines
 Hadoop YARN for resource management in the Hadoop cluster
 Hadoop MapReduce to process data in a distributed fashion
HADOOP

 The Hadoop Distributed File System (HDFS) is Hadoop's storage layer. Data, housed on multiple servers, is divided into blocks based on file size; these blocks are then distributed and stored across slave machines.
 HDFS in the Hadoop architecture divides large data into different blocks. Each block holds 128 MB of data by default and is replicated three times.
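As a concrete illustration, the sketch below uses Hadoop's Java FileSystem API to list a file's blocks and the DataNodes holding each replica. The NameNode address and file path are assumptions made up for the example:

```java
// Minimal sketch: inspecting the blocks of an HDFS file.
// The cluster address and file path are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed NameNode address
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/sales.csv"); // hypothetical file
        FileStatus status = fs.getFileStatus(file);

        // Each BlockLocation is one block (128 MB by default); getHosts()
        // lists the DataNodes holding its replicas (three by default).
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.printf("offset=%d length=%d replicas=%s%n",
                    b.getOffset(), b.getLength(),
                    String.join(",", b.getHosts()));
        }
        fs.close();
    }
}
```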
HDFS

[Figure: HDFS block replication — blocks A, B, C, and D distributed across DataNodes on three racks]

 In this example, blocks A, B, C, and D are replicated three times and placed on different racks. If DataNode 7 crashes, we still have two copies of block C: on DataNode 4 of Rack 1 and on DataNode 9 of Rack 3.
YARN

 YARN (Yet Another Resource Negotiator) is the resource manager that arbitrates all available cluster resources. It also follows the master/slave approach: YARN has one ResourceManager (master) per cluster and one NodeManager (slave) per node.
YARN

 The ResourceManager keeps metadata about which jobs are running on which node and manages how much memory and CPU each consumes; it therefore has a holistic view of the total CPU and RAM consumption of the whole cluster.

 The NodeManager is the per-machine agent that is responsible for monitoring its node's resource usage (CPU, memory, disk, network) and reporting it to the ResourceManager.
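To make this reporting concrete, here is a minimal sketch that asks the ResourceManager for each NodeManager's reported usage through the YarnClient API; the ResourceManager address is an assumption:

```java
// Minimal sketch: per-node resource usage as seen by the ResourceManager,
// mirroring the heartbeat data the NodeManagers report.
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterView {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        conf.set(YarnConfiguration.RM_ADDRESS, "resourcemanager:8032"); // assumed

        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(conf);
        yarn.start();

        // One NodeReport per NodeManager: used vs. total resources
        for (NodeReport node : yarn.getNodeReports(NodeState.RUNNING)) {
            System.out.printf("%s used=%s capability=%s%n",
                    node.getNodeId(),
                    node.getUsed(),        // memory/vcores currently consumed
                    node.getCapability()); // total memory/vcores on the node
        }
        yarn.stop();
    }
}
```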
YARN

 The elements of YARN include:
 ResourceManager (one per cluster)
 ApplicationMaster (one per application)
 NodeManager (one per node)
YARN

 ResourceManager: manages the resource allocation in the cluster and is responsible for tracking how many resources are available in the cluster, including each NodeManager's contribution.
 ApplicationMaster: manages the resource needs of an individual application. It connects with the NodeManager to execute and monitor tasks.
 NodeManager: tracks running jobs and sends signals (heartbeats) to the ResourceManager to relay the status of a node. It also monitors each container's resource utilization.
 Container: houses a collection of resources such as RAM, CPU, and network bandwidth.
Steps to Running an Application in YARN

 The client submits an application to the ResourceManager.
 The ResourceManager allocates a container.
 The ApplicationMaster contacts the related NodeManager because it needs to use the containers.
 The NodeManager launches the container.
 The container executes the ApplicationMaster.
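A minimal sketch of the first steps from the client's side, using the YarnClient API; the application name, launch command, and container sizes are placeholder assumptions:

```java
// Minimal sketch: a client submitting an application to the
// ResourceManager, which then allocates a container for the
// ApplicationMaster. Command and resource sizes are hypothetical.
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitApp {
    public static void main(String[] args) throws Exception {
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new YarnConfiguration());
        yarn.start();

        YarnClientApplication app = yarn.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ctx.setApplicationName("demo-app"); // placeholder name

        // What the ApplicationMaster container should run (placeholder command)
        ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
                null, null,
                Collections.singletonList("echo hello-yarn"),
                null, null, null);
        ctx.setAMContainerSpec(amContainer);

        // RAM (MB) and vcores the RM should reserve for the AM container
        ctx.setResource(Resource.newInstance(1024, 1));

        yarn.submitApplication(ctx); // step 1: client -> ResourceManager
        yarn.stop();
    }
}
```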
Map Phase

 In the map phase, data stored in blocks is read, processed, and given a key-value pair. This phase is responsible for running a particular task on one or multiple splits of the input.
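A minimal sketch of a map task, using the canonical word-count example (the tokenization logic is illustrative):

```java
// Word-count mapper: each line of an input split becomes (word, 1) pairs.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
        // Emit (word, 1) for every token on the line
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                ctx.write(word, ONE);
            }
        }
    }
}
```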
Reduce Phase

 The reduce phase receives the key-value pairs from the map phase. The pairs are aggregated into smaller sets, and an output is produced. Processes such as shuffling and sorting occur in the reduce phase.
 The mapper function handles the input data and runs a function on every input split (these runs are known as map tasks). There can be one or multiple map tasks, based on the size of the file and the configuration setup. The data is then sorted, shuffled, and moved to the reduce phase, where a reduce function aggregates the data and provides the output.
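The matching word-count reducer sketch: after shuffle and sort, each word arrives with its list of counts, which the reducer aggregates into a single total:

```java
// Word-count reducer: the shuffled/sorted (word, [1,1,...]) groups
// arrive here and are aggregated into (word, count).
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context ctx)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get(); // add up all the 1s emitted for this word
        }
        ctx.write(word, new IntWritable(sum));
    }
}
```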
References

 https://www.simplilearn.com/tutorials/hadoop-tutorial/hadoop-architecture
 https://towardsdatascience.com/the-world-of-hadoop-d1e5f5eb98d
Thank You
