HDFS - Rackawareness

Rack awareness in Hadoop places replicas of data blocks across different racks to improve data reliability, availability, and performance. The NameNode maintains the rack IDs of each data node to choose nearby nodes on the same or different racks for read/write requests. The replica placement policy aims to store no more than one replica on a node and no more than two replicas on the same rack to reduce network traffic while ensuring fault tolerance if an entire rack fails.

Uploaded by

sowjanya kandukuri


HDFS – Rack Awareness

Rack awareness in Hadoop is the concept of choosing closer DataNodes based on rack information. By default, a Hadoop installation assumes that all nodes belong to the same rack.

Its goal is to reduce network traffic while reading and writing HDFS files in large Hadoop clusters.

For a read/write request, the NameNode chooses DataNodes that are on the same rack as the client node or on a nearby rack. The NameNode obtains this rack information by maintaining the rack ID of each DataNode.
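As a rough illustration of how rack IDs might be supplied, the sketch below mimics the kind of mapping a Hadoop topology script performs: it takes DataNode addresses and returns one rack ID for each. The IP addresses, rack names, and the function name `resolve_racks` are made up for this example. The fallback `/default-rack` reflects the default of treating all unlisted nodes as one rack. In a real cluster, a script like this is usually registered through the `net.topology.script.file.name` property; check the documentation for your Hadoop version.

```python
# Hypothetical address-to-rack table; a real cluster defines its own.
RACK_BY_HOST = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
}

DEFAULT_RACK = "/default-rack"  # used when a host is not listed


def resolve_racks(hosts):
    """Return the rack ID for each host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print one rack ID per requested host, separated by spaces.
    print(" ".join(resolve_racks(["10.0.1.11", "10.0.2.21", "10.0.9.99"])))
```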
Why Rack Awareness?

The main purposes of rack awareness are to:

•Improve data reliability and data availability.

•Improve cluster performance.

•Prevent data loss if an entire rack fails.

•Make better use of network bandwidth.

•Keep bulk data flows within a rack when possible.

Why Rack Awareness?

Hadoop keeps multiple copies of all data stored in HDFS. If Hadoop is aware of the rack topology, copies of the data can be kept on different racks. Then, if an entire rack fails for some reason, the data can be retrieved from a different rack.

Replication of data blocks across multiple racks in HDFS via rack awareness is governed by a policy called the Replica Placement Policy.

The policy states: “No more than one replica is placed on one node, and no more than two replicas are placed on the same rack.”
Replica placement via Rack Awareness in Hadoop

The main purpose of replica placement via rack awareness is to improve data reliability, availability, and performance.

A simple policy is to place each replica on a different rack. This prevents data loss when an entire rack fails and allows the use of bandwidth from multiple racks when reading a file.

On multi-rack clusters, block replication follows the policy below:

You should not place more than one replica on one node. You should also not place more than two replicas on the same rack. This has the limitation that the number of racks used for block replication should always be less than the total number of block replicas.
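The two rules can be expressed as a small check. This is only a sketch: the function name `satisfies_policy` and the `(rack, node)` representation are assumptions made for illustration, not part of Hadoop.

```python
from collections import Counter


def satisfies_policy(replicas):
    """Check a placement against the two rules quoted above.

    `replicas` is a list of (rack, node) pairs, one pair per replica.
    """
    per_node = Counter(replicas)                       # replicas on each node
    per_rack = Counter(rack for rack, _ in replicas)   # replicas on each rack
    one_per_node = all(count <= 1 for count in per_node.values())
    two_per_rack = all(count <= 2 for count in per_rack.values())
    return one_per_node and two_per_rack
```

For instance, a placement of `("/rack1", "n1")`, `("/rack2", "n4")`, `("/rack2", "n5")` passes, while three replicas on `/rack1` do not.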
For example:

When the Hadoop framework creates a new block, it places the first replica on the local node, the second on a different rack, and the third on a different node in the same rack as the second.

When re-replicating a block, if the number of existing replicas is one, place the second on a different rack.

If the number of existing replicas is two and both are on the same rack, place the third on a different rack.
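The cases above can be sketched as one function that picks the target for the next replica. This is a simplified, deterministic illustration, not Hadoop's actual placement code: the name `choose_next`, the dictionary-based topology, and the "first free node" tie-breaking are all assumptions. When two replicas already sit on different racks, it places the third next to the most recent one.

```python
def choose_next(topology, existing, writer=None):
    """Pick a (rack, node) target for the next replica of a block.

    topology: dict mapping rack ID -> list of node names (hypothetical layout).
    existing: list of (rack, node) pairs that already hold a replica.
    writer:   (rack, node) of the client, if it runs on a DataNode.
    """
    used = set(existing)

    def first_free(racks):
        # Return the first node on the given racks that holds no replica yet.
        for rack in racks:
            for node in topology[rack]:
                if (rack, node) not in used:
                    return (rack, node)
        return None

    if not existing:
        # First replica: the writer's own node when possible.
        return writer if writer is not None else first_free(sorted(topology))

    used_racks = [rack for rack, _ in existing]
    other_racks = [r for r in sorted(topology) if r not in used_racks]

    if len(existing) == 1 or len(set(used_racks)) == 1:
        # One replica, or all replicas on a single rack: use a different rack.
        return first_free(other_racks)

    # Replicas already span two racks: reuse the rack of the latest replica.
    return first_free([used_racks[-1]])
```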
How does Hadoop decide where to store the replicas of created blocks?

What is a rack?

A rack is a collection of 30-40 DataNodes (machines) in a Hadoop cluster, located in a single data center or location. The DataNodes in a rack are connected to the NameNode through a traditional network design via a network switch. A large Hadoop cluster will have multiple racks.
What is rack awareness in Hadoop HDFS?

Rack awareness is the process of making Hadoop aware of which machine is part of which rack and how these racks are connected to each other within the Hadoop cluster. In a Hadoop cluster, the NameNode keeps the rack IDs of all the DataNodes. The NameNode uses this rack information to choose the closest DataNode when storing data blocks. In simple terms, knowing how the DataNodes are distributed across the racks, that is, knowing the cluster topology, is called rack awareness in Hadoop. Rack awareness is important because it ensures data reliability and helps to recover data in case of a rack failure.
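One way to picture "closest" is as a distance in the topology tree. The sketch below assumes each node is identified by a path such as `/rack1/node1` (a two-level layout chosen for illustration) and counts the steps from each node up to their closest common ancestor. The function names `distance` and `closest` are hypothetical.

```python
def distance(a, b):
    """Steps between two topology paths such as "/rack1/node1".

    Each path component is one level of the tree. The result is the number
    of edges from each node up to their closest common ancestor, summed.
    """
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def closest(client, datanodes):
    """Return the DataNode path with the smallest distance to the client."""
    return min(datanodes, key=lambda dn: distance(client, dn))
```

Under this scheme the same node is at distance 0, a node on the same rack at distance 2, and a node on another rack at distance 4.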
Rack Awareness Example

The default replication factor is 3, but it can be configured.

At the time of creation of a new block: the first replica is stored on the closest local node. The second is stored on a different rack altogether. The third replica is stored on the same rack as the second, but on a different node.

At the time of re-replicating a block: if the number of existing replicas is one, the second replica is stored on a different rack. If the number of existing replicas is two and both are on the same rack, the third replica is stored on a different rack.

A simple way of storing data block replicas is to place each one on a separate rack; however, this could increase the latency of read/write operations.

So the replication policy is designed to reduce the network bandwidth used when reading data, since the replicas are placed on only 2 unique racks, while still ensuring fault tolerance.
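To make the trade-off concrete, the following sketch counts how many transfers leave a rack when one block is written. It assumes the replicas are written one after another in a chain (client to replica 1, replica 1 to replica 2, and so on), which is an assumption of this example. The function name and rack names are hypothetical.

```python
def cross_rack_hops(client_rack, replica_racks):
    """Count transfers that cross a rack boundary when a block is written
    along the chain: client -> replica 1 -> replica 2 -> ..."""
    path = [client_rack] + list(replica_racks)
    return sum(1 for src, dst in zip(path, path[1:]) if src != dst)


# Client on /rack1, first replica on the client's own rack in both cases.
two_racks = cross_rack_hops("/rack1", ["/rack1", "/rack2", "/rack2"])
three_racks = cross_rack_hops("/rack1", ["/rack1", "/rack2", "/rack3"])
print(two_racks, three_racks)  # prints: 1 2
```

With replicas on two racks only one transfer crosses racks; spreading them over three racks doubles that.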
Advantages of implementing Rack Awareness in
Hadoop
•Rack awareness in Hadoop helps optimize replica placement thus ensuring high
reliability and fault tolerance.
•Rack awareness ensures that the Read/Write requests to replicas are placed to
the closest rack or the same rack. This maximizes the reading speed and
minimizes the writing cost.
•Rack Awareness maximizes the network bandwidth by block transfers within the
rack. Data access needs are catered to keeping in mind minimum network travel
so as to reduce the network overheads.
•Rack Awareness helps the NameNode to assign the task to the nodes closer to
data in the network topology.
•MapReduce jobs can also benefit from rack awareness. By knowing where
the data required by the map is located, it can run the map task on that
particular machine itself, thereby saving a lot of bandwidth and time.
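The last two points can be sketched as a locality preference: run the task on a node that holds the data, otherwise on a node in the same rack, otherwise anywhere. The function names, the locality labels, and the `rack_of` mapping are assumptions made for illustration, not a scheduler API.

```python
def locality(task_node, replica_nodes, rack_of):
    """Classify where a map task would run relative to its input block."""
    if task_node in replica_nodes:
        return "node-local"
    if rack_of[task_node] in {rack_of[n] for n in replica_nodes}:
        return "rack-local"
    return "off-rack"


def pick_node(free_nodes, replica_nodes, rack_of):
    """Prefer a node-local slot, then rack-local, then any free node."""
    order = {"node-local": 0, "rack-local": 1, "off-rack": 2}
    return min(free_nodes,
               key=lambda n: order[locality(n, replica_nodes, rack_of)])
```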
Hadoop Arch – Rack Awareness
Algorithm
Advantages of Rack Awareness in Hadoop

Let’s now discuss some advantages of rack awareness in Hadoop HDFS:
Provide higher bandwidth and low latency – This policy maximizes network bandwidth by transferring blocks within a rack rather than between racks. YARN is able to optimize MapReduce job performance by assigning tasks to nodes that are closer to their data in terms of network topology.
Minimize the writing cost and maximize read speed – The rack awareness policy places read/write requests on replicas that are in the same rack. This minimizes writing cost and maximizes reading speed.
Advantages of Rack Awareness in Hadoop

•Provides data protection against rack failure – The NameNode assigns the 2nd and 3rd replicas of a block to nodes in a different rack from the first replica. Thus, it provides data protection even against rack failure. However, this is possible only if Hadoop has been configured with knowledge of its rack configuration.
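A minimal sketch of the rack-failure argument: a block survives the loss of a rack as long as at least one replica lives elsewhere. The function names and the `(rack, node)` pairs are hypothetical.

```python
def survives_rack_failure(replicas, failed_rack):
    """True if at least one replica lives outside the failed rack.

    `replicas` is a list of (rack, node) pairs.
    """
    return any(rack != failed_rack for rack, _ in replicas)


def survives_any_single_rack_failure(replicas):
    """True if the block stays readable whichever one of its racks fails."""
    return all(survives_rack_failure(replicas, rack) for rack, _ in replicas)
```

A placement spread over two racks passes this check, while three replicas on one rack (which is what happens when no rack information is configured) do not.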
Rack awareness in Hadoop is the concept of choosing a nearby DataNode (the one closest to the client that raised the read/write request), thereby reducing network traffic. Hadoop supports the configuration of rack awareness to ensure that one replica of each data block is placed on a different rack. From a performance standpoint, although copies of the data are spread across racks, they span no more than two racks, which keeps bandwidth utilization and latency low. This makes write operations faster while still providing fault tolerance. It also preserves data availability if there is a partition within the cluster or in the event of a network switch failure.
