Module 2

Introduction to Hadoop
By: Shavantrevva S Bilakeri
Common Types of Multiprocessor Architecture
1) Shared Memory (SM): a common central memory is shared by multiple processors.
2) Shared Disk (SD): multiple processors share a common collection of disks, with each processor having its own private memory.
3) Shared Nothing (SN): neither memory nor disk is shared among the processors.
Hadoop Cluster
1. The Architecture of a Hadoop Cluster
2. Core Components of a Hadoop Cluster
3. Workflow of How a File is Stored in Hadoop
1. Hadoop Cluster
• These clusters run on low-cost commodity computers.
• Hadoop clusters are often referred to as "shared nothing" systems because the only thing shared between nodes is the network that connects them.
• Large Hadoop clusters are arranged in several racks.
• Network traffic between nodes in the same rack is much more desirable than network traffic across racks.
• Example: Yahoo's Hadoop cluster has more than 10,000 machines running Hadoop and nearly 1 petabyte of user data.
• A small Hadoop cluster includes a single master node and multiple worker (slave) nodes.
• As discussed earlier, the entire cluster contains two layers: one is the MapReduce layer and the other is the HDFS layer (sketched after this list).
• The master node consists of a JobTracker and a NameNode.
• A slave or worker node consists of a DataNode and a TaskTracker.
• It is also possible for a slave or worker node to be a data-only or compute-only node.
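A rough sketch of how these daemons map onto the two layers (node names are illustrative):

  Master node:   NameNode (HDFS layer)  +  JobTracker (MapReduce layer)
  Slave node 1:  DataNode (HDFS layer)  +  TaskTracker (MapReduce layer)
  Slave node 2:  DataNode (HDFS layer)  +  TaskTracker (MapReduce layer)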
Hadoop Cluster – 3 Components: Client, Master & Slave
A Hadoop cluster would consist of:

 Up to 110 racks (maximum)

 40 slave machines per rack

 A rack switch at the top of each rack

 Each slave machine (a rack server in a rack) has cables coming out of it from both ends

 The cables are connected to the rack switch at the top, which means the top-of-rack switch has around 80 ports (see the quick sizing note after this list)

 Globally, 8 core switches

 Each rack switch has uplinks connected to the core switches, thereby connecting all the racks with uniform bandwidth and forming the cluster

 In the cluster, a few machines act as the NameNode and the JobTracker. They are referred to as Masters.


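A quick sizing check based on the numbers above: 110 racks × 40 slave machines per rack = 4,400 slave machines at full scale, and the two cable ends per machine explain the roughly 80 ports (40 × 2) on each top-of-rack switch.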
Cluster: Core Components
1. Client
2. Masters: NameNode, Secondary NameNode & JobTracker
2.1 NameNode:

• The NameNode oversees the health of the DataNodes and coordinates access to the data stored in them.

• The NameNode keeps track of all the filesystem-related information, such as:
• Which section of a file is saved in which part of the cluster (see the fsck sketch after this list)
• Last access time for the files
• User permissions, i.e. which users have access to a file
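To see this block-location bookkeeping in practice, you can query the NameNode with the standard fsck tool; a minimal sketch (the file path is a hypothetical example):

  hadoop fsck /user/data/file.txt -files -blocks -locations

For each block of the file, this prints the list of DataNodes holding a replica, which is exactly the "which section of a file is saved in which part of the cluster" information the NameNode tracks.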
2.2 JobTracker:

• The JobTracker coordinates the parallel processing of data using MapReduce.


2.3 Secondary NameNode:

 The job of the Secondary NameNode is to contact the NameNode periodically, after a certain time interval (by default 1 hour; see the configuration note after this list).

 The NameNode, which keeps all filesystem metadata in RAM, has no capability to persist that metadata onto disk.

 If the NameNode crashes, you lose everything held in RAM and you don't have any backup of the filesystem.

 What the Secondary NameNode does is contact the NameNode every hour and pull a copy of the metadata information out of it.

 It shuffles and merges this information into a clean file folder and sends it back to the NameNode, while keeping a copy for itself.

 Hence the Secondary NameNode is not a backup; rather, it does the job of housekeeping.

 In case of a NameNode failure, the saved metadata can rebuild it easily.


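The 1-hour interval mentioned above is configurable. A minimal sketch, assuming Hadoop 1.x property names (in Hadoop 2.x the equivalent property is dfs.namenode.checkpoint.period in hdfs-site.xml):

  <!-- core-site.xml (Hadoop 1.x): checkpoint interval for the Secondary NameNode, in seconds -->
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>
  </property>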
3. Slave
 Slave nodes are the majority of machines in a Hadoop cluster and are responsible to:
 Store the data
 Process the computation

 Each slave runs both a DataNode and a TaskTracker daemon, which communicate with their masters.

 The TaskTracker daemon is a slave to the JobTracker.

 The DataNode daemon is a slave to the NameNode.
Loading a File into the Hadoop Cluster
How does the Client know to which DataNodes it should load the blocks?
1) This is where the NameNode comes into the picture.
2) The NameNode uses its Rack Awareness intelligence to decide which DataNodes to provide (see the configuration sketch after this list).
3) For each data block (in this case Block A, Block B and Block C), the Client contacts the NameNode, and in response the NameNode sends an ordered list of 3 DataNodes.
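Rack Awareness is not automatic: the NameNode learns the rack topology from an administrator-supplied mapping script. A minimal configuration sketch, assuming Hadoop 1.x property names (the script path is a hypothetical example; in Hadoop 2.x the property is net.topology.script.file.name):

  <!-- core-site.xml: script that maps a DataNode's IP address or hostname to a rack ID -->
  <property>
    <name>topology.script.file.name</name>
    <value>/etc/hadoop/rack-topology.sh</value>
  </property>

The script receives one or more IP addresses or hostnames and prints a rack path such as /rack1 for each; the NameNode then uses these rack IDs when building the ordered list of DataNodes for a block.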
Block Replication
1. The Client writes the data block directly to one DataNode.
2. The DataNodes then replicate the block to the other DataNodes.
3. Only when one block has been written to all 3 DataNodes does the cycle repeat for the next block.
4. In Hadoop Gen 1 there is only one NameNode.
5. In Hadoop Gen 2 there is an active-passive model for the NameNode, where one more node, a "Passive Node", comes into the picture.
6. The default setting for Hadoop is to keep 3 copies of each block in the cluster.
7. This setting can be configured with the "dfs.replication" parameter in the hdfs-site.xml file (see the sketch after this list).
8. Note that the Client writes the block directly to the DataNode without any intervention by the NameNode in this process.
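The replication factor mentioned in point 7 lives in hdfs-site.xml; a minimal sketch with the stated default of 3:

  <!-- hdfs-site.xml: number of replicas kept for each data block -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>

And the write flow in points 1-3 and 8 is hidden behind a single client call. A minimal client-side sketch in Java against the standard org.apache.hadoop.fs API (the cluster URI and file path are hypothetical examples); the FileSystem library asks the NameNode for the ordered DataNode list per block and streams each block to the first DataNode, which replicates it onward:

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsWriteExample {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Connect to the cluster; the URI is a hypothetical example.
          FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
          // create() starts the block-placement flow described above:
          // the NameNode supplies an ordered list of DataNodes per block,
          // and the client stream writes each block to the first DataNode only.
          try (FSDataOutputStream out = fs.create(new Path("/user/data/file.txt"))) {
              out.writeUTF("hello hadoop");
          }
          fs.close();
      }
  }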
Parallel Computing vs Distributed Computing
