Module 2 Hadoop

1. The document discusses the architecture of Hadoop clusters, including core components such as HDFS, YARN, and MapReduce.
2. It describes the roles of master nodes, such as the NameNode and JobTracker, which coordinate the slave nodes that store data and process tasks.
3. When a file is stored in Hadoop, the client writes blocks to DataNodes, and those blocks are then replicated across the cluster according to the replication factor.

Uploaded by

additiladdha


Introduction to Hadoop
By: Dr. Rashmi L Malghan
Common Types of Multiprocessor Architecture
1) Shared Memory (SM): a common central memory is shared by multiple processors.

2) Shared Disk (SD): multiple processors share a common collection of disks, but each processor has its own private memory.

3) Shared Nothing (SN): neither memory nor disk is shared among the processors.
Parallel Computing vs Distributed Computing
Introduction:
• HDFS (Hadoop Distributed File System): This is a distributed file system that stores data in blocks
across the slave nodes. The master node runs a service called NameNode. The NameNode manages the file
system namespace, the metadata of files and directories, and the mapping of blocks to slave nodes.
The slave nodes run a service called DataNode, which stores the actual data blocks and serves read
and write requests from clients.
• YARN (Yet Another Resource Negotiator): This is a framework for resource management and job
scheduling in Hadoop. The master node runs a service called ResourceManager, which allocates
resources to different applications and monitors their progress. The slave nodes run a service
called NodeManager, which launches and monitors the tasks assigned by the ResourceManager.
• MapReduce: This is a programming model for parallel processing of large data sets. In classic
MapReduce (Hadoop 1), the master node runs a service called JobTracker. The JobTracker schedules map and
reduce tasks over smaller chunks (input splits) of the data and assigns them to the slave nodes. The slave
nodes run a service called TaskTracker, which executes the map and reduce tasks on the data chunks and
reports their progress back to the JobTracker. In Hadoop 2, YARN takes over these scheduling roles.
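The map → shuffle → reduce flow that the JobTracker/TaskTracker (or, in Hadoop 2, YARN) coordinates can be illustrated with the classic word-count job. The sketch below runs in a single process and only simulates the programming model in memory. It does not use any Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all intermediate values by key, sorted by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(splits: list[list[str]]) -> dict[str, int]:
    # In Hadoop, each split would be handled by a separate map task on a slave node;
    # here all splits are processed sequentially in one process.
    intermediate = [pair for split in splits for line in split for pair in map_phase(line)]
    grouped = shuffle(intermediate)
    # In Hadoop, each key group would go to a reduce task.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = [["Hadoop stores data", "HDFS stores blocks"],
              ["YARN schedules jobs", "Hadoop runs jobs"]]
    print(word_count(splits))
```

Splitting the work into independent map calls is what lets the framework run them on many slave nodes at once. Only the shuffle step needs data to move between nodes.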
Hadoop Architecture:

• Master Nodes: These are the main servers.
• Usually only a few master nodes (3 to 6 at most) are created in a Hadoop cluster.
• They coordinate the work of the slave nodes.
• Services: the NameNode service runs on the master node, and the DataNode service runs on the slave nodes.
• DataNode: a service of HDFS.
• NodeManager: a service of YARN.
MasterNode (NameNode): stores metadata, e.g., file name, format, directory, etc.
DataNode: Slave Node

• Install Hadoop: by default, 2 components will be installed (HDFS & YARN).
• Additional programs will be installed in the Hadoop environment, like MapReduce, Spark, Hive, etc.
• These tools help to facilitate data management, streaming, etc.
Hadoop Cluster
1. The architecture of Hadoop Cluster

2. Core Components of Hadoop Cluster

3. Workflow of How a File Is Stored in Hadoop

1. Hadoop Cluster
• These clusters run on low-cost commodity computers.
• Hadoop clusters are often referred to as "shared nothing" systems
because the only thing shared between nodes is the network
that connects them.
• Large Hadoop clusters are arranged in several racks.
• Network traffic between nodes in the same rack is much more
desirable than network traffic across racks, because bandwidth within
a rack is higher than bandwidth between racks.
• Example: Yahoo's Hadoop cluster, with more than 10,000 machines
running Hadoop and nearly 1 petabyte of user data.
• A small Hadoop cluster includes a single master node and multiple
worker (slave) nodes.
• As discussed earlier, the entire cluster contains two layers: the
MapReduce layer and the HDFS layer.
• In a small cluster, the master node runs a JobTracker, TaskTracker,
NameNode and DataNode.
• A slave (worker) node runs a DataNode and a TaskTracker.
• A slave node can also act as a data-only or compute-only node.
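A tree-distance model of the network makes the preference for same-rack traffic concrete. Each node is written as a path `/datacenter/rack/host`. The distance is the number of hops up to the closest common ancestor, counted from both nodes. This sketch is loosely modeled on how Hadoop's rack awareness scores node locations, but the function name and the example paths are hypothetical and are not a Hadoop API.

```python
def network_distance(a: str, b: str) -> int:
    """Tree distance between two nodes given as '/datacenter/rack/host' paths.

    Each hop from a node to its parent counts as 1, so the result is the sum of
    both nodes' hop counts up to their closest common ancestor.
    """
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


if __name__ == "__main__":
    here = "/dc1/rack1/node1"
    for other in ["/dc1/rack1/node1", "/dc1/rack1/node2", "/dc1/rack2/node3"]:
        print(other, network_distance(here, other))
```

A smaller distance means the traffic stays behind one rack switch instead of crossing the core switches. This is why Hadoop tries to keep work and data close together.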
Hadoop Cluster – 3 Components: Client, Master & Slave
A Hadoop cluster would consist of:

A maximum of 110 racks

40 slave machines per rack

A rack switch at the top of each rack

Each slave machine (rack server in a rack) has two cables coming out of it, one from each end.
The cables are connected to the rack switch at the top, which means the top-of-rack switch
will have around 80 ports (40 machines × 2 cables).
Globally, there are 8 core switches.

Each rack switch has uplinks connected to the core switches. These uplinks connect
all the racks with uniform bandwidth and form the cluster.

In the cluster, a few machines act as the NameNode and the JobTracker.
They are referred to as masters.
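The sizing figures above can be checked with a few lines of arithmetic. The sketch takes the text's example layout as its assumptions: 110 racks, 40 slave machines per rack, and two cables per machine. The function and parameter names are hypothetical. These are illustrative numbers for one example cluster, not limits imposed by Hadoop.

```python
def cluster_sizing(racks: int = 110, slaves_per_rack: int = 40,
                   cables_per_slave: int = 2) -> dict[str, int]:
    """Back-of-the-envelope sizing for the example rack layout (assumed values)."""
    return {
        # Every rack holds the same number of slave machines.
        "total_slaves": racks * slaves_per_rack,
        # Server-facing ports needed on each top-of-rack switch (uplinks not counted).
        "ports_per_rack_switch": slaves_per_rack * cables_per_slave,
        # Cables running from all slave machines to their rack switches.
        "total_server_cables": racks * slaves_per_rack * cables_per_slave,
    }


if __name__ == "__main__":
    print(cluster_sizing())
```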
Cluster: Core Components
1. Client
2. Masters: NameNode, Secondary Node (Secondary NameNode) & JobTracker
2.1 NameNode:

• The NameNode oversees the health of the DataNodes and
coordinates access to the data stored in them.

• The NameNode keeps track of all file-system-related
information, such as:
• Which section (block) of a file is saved in which part of the
cluster
• The last access time for the files
• User permissions, i.e., which users have access to
the file
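The NameNode's metadata can be pictured as in-memory maps from files to blocks and from blocks to DataNodes. The following is a simplified, hypothetical model whose class and field names are illustrative, not Hadoop's internal classes. It covers the three examples in the list: block locations, last access time, and user permissions.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    owner: str
    permissions: str                                 # e.g. "rw-r--r--" (illustrative)
    blocks: list[str] = field(default_factory=list)  # block IDs, in file order
    last_access: float = 0.0                         # Unix timestamp of last read


class NameNodeMetadata:
    """Hypothetical in-memory model of NameNode metadata (not Hadoop's API)."""

    def __init__(self) -> None:
        self.files: dict[str, FileMeta] = {}
        self.block_locations: dict[str, list[str]] = {}  # block ID -> DataNodes

    def add_file(self, path: str, owner: str, permissions: str) -> None:
        self.files[path] = FileMeta(owner=owner, permissions=permissions)

    def add_block(self, path: str, block_id: str, datanodes: list[str]) -> None:
        self.files[path].blocks.append(block_id)
        self.block_locations[block_id] = list(datanodes)

    def locate(self, path: str) -> dict[str, list[str]]:
        """Report which DataNodes hold each block of the file and record the access."""
        meta = self.files[path]
        meta.last_access = time.time()
        return {b: self.block_locations[b] for b in meta.blocks}


if __name__ == "__main__":
    nn = NameNodeMetadata()
    nn.add_file("/data/file.txt", owner="alice", permissions="rw-r--r--")
    nn.add_block("/data/file.txt", "blk_A", ["dn1", "dn2", "dn3"])
    print(nn.locate("/data/file.txt"))
```

Keeping only metadata in these maps, and never the file contents, is what lets a single NameNode hold the namespace of a large cluster in RAM.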
2.2 JobTracker:

JobTracker: Coordinates the parallel processing of data using MapReduce.

2.3 Secondary Node (Secondary NameNode): ▪ The job of the Secondary Node is to contact the NameNode
periodically, after a certain time interval (by default, 1 hour).

▪ The NameNode keeps all filesystem metadata in RAM and logs changes to disk, but while running
it has no capability to merge (checkpoint) that metadata into a compact file on disk.

▪ If the NameNode crashes, everything in its RAM is lost, and there is no live backup
of the filesystem metadata.

▪ What the Secondary Node does is contact the NameNode every hour and pull a copy
of the metadata (the fsimage and edit log) out of the NameNode.

▪ It merges this information into a clean file and sends it back to the NameNode,
while keeping a copy for itself.

▪ Hence, the Secondary Node is not a backup; rather, it does the job of
housekeeping.

▪ In case of NameNode failure, the saved metadata can be used to rebuild it easily.
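The checkpoint that the Secondary Node performs amounts to replaying a log of metadata changes onto the last saved snapshot, producing a new, compact snapshot. In HDFS these two pieces are the fsimage and the edit log. The sketch below models only the idea, using a dictionary and a list of hypothetical operations (`create`, `delete`, `rename`). It is not how Hadoop stores or serializes either file.

```python
def checkpoint(fsimage: dict[str, int], edit_log: list[tuple]) -> dict[str, int]:
    """Merge an edit log into a metadata snapshot and return the new snapshot.

    fsimage maps path -> size in bytes (a stand-in for real file metadata).
    Each edit is ("create", path, size), ("delete", path) or ("rename", src, dst).
    """
    merged = dict(fsimage)  # leave the original snapshot untouched
    for op, *args in edit_log:
        if op == "create":
            path, size = args
            merged[path] = size
        elif op == "delete":
            merged.pop(args[0], None)
        elif op == "rename":
            src, dst = args
            merged[dst] = merged.pop(src)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return merged


if __name__ == "__main__":
    image = {"/a.txt": 100}
    edits = [("create", "/b.txt", 200), ("rename", "/a.txt", "/c.txt"), ("delete", "/b.txt")]
    print(checkpoint(image, edits))
```

After a checkpoint, the edit log can start again from empty, because its changes are now part of the snapshot. This is why the Secondary Node's work is housekeeping rather than a live standby.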

3: Slave
➢ Slave nodes are the majority of machines in a Hadoop
cluster and are responsible for:
➢ Storing the data
➢ Processing the computation

➢ Each slave runs both a DataNode and a TaskTracker

daemon, which communicate with their masters.

➢ The TaskTracker daemon is a slave to the JobTracker.

➢ The DataNode daemon is a slave to the NameNode.
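Slave daemons communicate with their masters through periodic heartbeats, and a master that stops hearing from a slave treats it as dead. The sketch below shows only that idea. The class name, the `timeout` value, and the use of explicit timestamps are hypothetical. Real Hadoop heartbeat intervals and dead-node timeouts come from its configuration.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat from each slave and reports which ones look dead."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                # seconds of silence before a node counts as dead
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        # Called when a slave daemon (e.g. a DataNode) reports in to its master.
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> list[str]:
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=30.0)
    monitor.heartbeat("dn1", now=0.0)
    monitor.heartbeat("dn2", now=0.0)
    monitor.heartbeat("dn1", now=25.0)
    print(monitor.dead_nodes(now=40.0))
```

In HDFS, when the NameNode decides a DataNode is dead, it re-replicates that node's blocks elsewhere to restore the replication factor.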

Loading a File into the Hadoop Cluster
How does the client know which DataNodes to load the blocks onto?
1) This is where the NameNode comes into the picture.
2) The NameNode uses its rack-awareness intelligence
to decide which DataNodes to provide.
3) For each data block (in this case Block-A,
Block-B and Block-C), the client contacts the NameNode,
and in response the NameNode sends an ordered list of 3
DataNodes.
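The ordered list of 3 DataNodes can be built with HDFS's commonly documented default rule for a replication factor of 3:

* the first replica goes on the writer's own node, or on a random node if the writer is not a DataNode;
* the second replica goes on a node in a different rack;
* the third replica goes on a different node in that same remote rack.

This is a simplified sketch. The function name, the `topology` input format, and the random choices are assumptions, and the real placement policy also takes node load and free space into account.

```python
import random
from typing import Optional


def choose_datanodes(topology: dict[str, list[str]],
                     writer: Optional[str] = None,
                     rng: Optional[random.Random] = None) -> list[str]:
    """Return an ordered list of 3 DataNodes for one block (simplified sketch).

    topology maps a rack name to the DataNodes in that rack.
    """
    if rng is None:
        rng = random.Random()
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # Replica 1: the writer's own node if it is a DataNode, else a random DataNode.
    first = writer if writer in rack_of else rng.choice(sorted(rack_of))
    # Replicas 2 and 3: two different nodes on one remote rack.
    remote_racks = sorted(r for r, nodes in topology.items()
                          if r != rack_of[first] and len(nodes) >= 2)
    if not remote_racks:
        raise ValueError("need another rack with at least 2 DataNodes")
    remote = rng.choice(remote_racks)
    second, third = rng.sample(sorted(topology[remote]), 2)
    return [first, second, third]


if __name__ == "__main__":
    topo = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"], "rack3": ["dn5", "dn6"]}
    for block in ["Block-A", "Block-B", "Block-C"]:
        print(block, choose_datanodes(topo, writer="dn1"))
```

Each call produces the list for one block, so the three blocks in the example need three calls. This rule spreads the copies over two racks, so the block survives the loss of a whole rack. It also costs only one cross-rack transfer per block.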
Block replication 1. The client writes the data block directly to one DataNode.
2. The DataNodes then replicate the block to the other DataNodes
in the list, each one forwarding it to the next.
3. Only when a block has been written to all 3 DataNodes does the
cycle repeat for the next block.
4. In Hadoop Gen 1, there is only one NameNode.
5. In Hadoop Gen 2, there is an active/passive model for the
NameNode, where one more node, the "Passive" (standby) NameNode,
comes into the picture.
6. The default setting for Hadoop is to keep 3 copies of
each block in the cluster.
7. This setting can be configured with the "dfs.replication"
parameter in the hdfs-site.xml file.
8. Note that the client writes the block directly to the
DataNode, without any intervention of the NameNode in
this process.
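Steps 1 to 3 and the replication factor can be combined in one small simulation. The file is split into fixed-size blocks. Then, block by block, each block is written to the first DataNode in its list and passed along the pipeline. The block size here is a tiny hypothetical value so the example stays readable; HDFS defaults are far larger, e.g. 128 MB in Hadoop 2. The `replication` argument stands in for the `dfs.replication` setting, and none of these functions are Hadoop APIs.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks; the last block may be smaller."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def pipeline_write(block: bytes, pipeline: list[str],
                   storage: dict[str, list[bytes]]) -> None:
    """Simulate the pipeline: store the block on each DataNode in list order."""
    for datanode in pipeline:
        storage.setdefault(datanode, []).append(block)


def write_file(data: bytes, block_size: int, pipelines: list[list[str]],
               replication: int = 3) -> dict[str, list[bytes]]:
    """Write every block of a file; pipelines[i] is the DataNode list for block i."""
    storage: dict[str, list[bytes]] = {}
    blocks = split_into_blocks(data, block_size)
    if len(pipelines) < len(blocks):
        raise ValueError("need one DataNode list per block")
    for block, pipeline in zip(blocks, pipelines):
        # Only the first `replication` DataNodes receive a copy. The loop moves on
        # to the next block only after all copies of this one are stored.
        pipeline_write(block, pipeline[:replication], storage)
    return storage


if __name__ == "__main__":
    lists = [["dn1", "dn3", "dn4"], ["dn2", "dn4", "dn5"], ["dn1", "dn5", "dn6"]]
    print(write_file(b"abcdefghij", block_size=4, pipelines=lists))
```

Changing `replication` from 3 to 2 in this sketch halves nothing but the copies per block. This mirrors how lowering `dfs.replication` trades fault tolerance for storage space.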
