HDFS - Rack Awareness
Rack Awareness
To serve read/write requests from a client node, the NameNode chooses DataNodes on the same rack as the client or on a nearby rack. The HDFS NameNode gets this rack information by keeping the rack ID of each DataNode.
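A tree distance makes the idea of a "closest" DataNode concrete. The sketch below is a simplified model and makes two assumptions. First, each node is named by a path such as /rack1/node1. Second, distance counts the hops up to the closest common ancestor and back down, which follows the idea behind Hadoop's network topology. The function names distance and sort_by_proximity are hypothetical and are not Hadoop API.

```python
# Simplified model of topology-based replica selection (not Hadoop's code).
# Assumption: nodes are named by paths like "/rack1/node1".

def distance(a: str, b: str) -> int:
    """Hops from node a up to the closest common ancestor, then down to b."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def sort_by_proximity(client: str, replicas: list[str]) -> list[str]:
    """Order replica locations so the one closest to the client comes first."""
    return sorted(replicas, key=lambda r: distance(client, r))


client = "/rack1/node1"
replicas = ["/rack2/node5", "/rack1/node2", "/rack1/node1"]
print([(r, distance(client, r)) for r in replicas])
# [('/rack2/node5', 4), ('/rack1/node2', 2), ('/rack1/node1', 0)]
print(sort_by_proximity(client, replicas))
# ['/rack1/node1', '/rack1/node2', '/rack2/node5']
```

In this model, the same node is at distance 0, another node on the same rack is at distance 2, and a node on another rack is at distance 4. A reader served in this order hits the local copy first and an off-rack copy last.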
Why Rack Awareness?
The main purpose of rack-aware replica placement is to improve data reliability, availability, and network bandwidth utilization.
A simple policy is to place each replica on a different rack. This prevents data loss when an entire rack fails, and it allows bandwidth from multiple racks to be used when reading a file.
On multi-rack clusters, block replication follows the policy below:
When the Hadoop framework creates a new block, it places the first replica on the local node, the second on a node in a different rack, and the third on a different node in the same rack as the second.
When re-replicating a block, if the number of existing replicas is one, place the second on a different rack.
When the number of existing replicas is two and both replicas are on the same rack, place the third on a different rack.
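The placement rules above can be sketched as follows. This is a simplified illustration and not HDFS's BlockPlacementPolicyDefault. The cluster map, node names and function names are hypothetical, and nodes are chosen at random among the eligible ones. The sketch ignores node load, free space, and writers running outside the cluster. The rules do not cover the case of two existing replicas already on different racks, so for that case the sketch assumes any unused node is acceptable.

```python
import random

# Hypothetical cluster map: rack id -> DataNodes on that rack.
CLUSTER = {
    "/rack1": ["dn1", "dn2", "dn3"],
    "/rack2": ["dn4", "dn5", "dn6"],
    "/rack3": ["dn7", "dn8", "dn9"],
}
RACK_OF = {node: rack for rack, nodes in CLUSTER.items() for node in nodes}


def pick(candidates, exclude):
    """Pick a random node from candidates that does not already hold a replica."""
    choices = [n for n in candidates if n not in exclude]
    if not choices:
        raise RuntimeError("no eligible DataNode")
    return random.choice(choices)


def nodes_off_rack(rack):
    """All DataNodes that are not on the given rack."""
    return [n for r, nodes in CLUSTER.items() if r != rack for n in nodes]


def place_new_block(writer):
    """New block: local node, then a different rack, then the second's rack."""
    first = writer
    second = pick(nodes_off_rack(RACK_OF[first]), {first})
    third = pick(CLUSTER[RACK_OF[second]], {first, second})
    return [first, second, third]


def choose_extra_replica(existing):
    """Re-replication: choose one more target given the existing replicas."""
    used = set(existing)
    racks = {RACK_OF[n] for n in existing}
    if len(existing) == 1 or (len(existing) == 2 and len(racks) == 1):
        # One replica, or two on the same rack: go to a different rack.
        return pick(nodes_off_rack(next(iter(racks))), used)
    # Not covered by the rules above (assumption): any unused node.
    return pick(RACK_OF.keys(), used)


print(place_new_block("dn1"))  # e.g. ['dn1', 'dn5', 'dn6']; choices are random
```

Because the three replicas of a new block span exactly two racks, losing any single rack still leaves at least one copy.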
How does Hadoop decide where to store the replicas of newly created blocks?
What is a rack?
Rack awareness is the process of making Hadoop aware of which machine belongs to which rack and how these racks are connected to each other within the Hadoop cluster. In a Hadoop cluster, the NameNode keeps the rack IDs of all the DataNodes and uses this rack information to choose the closest DataNode when storing data blocks. In simple terms, rack awareness in Hadoop means knowing how the DataNodes are distributed across the racks, that is, knowing the cluster topology. Rack awareness is important because it ensures data reliability and helps recover data in case of a rack failure.
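Hadoop learns rack IDs from a mapping supplied by the administrator. One common way is a topology script named by the net.topology.script.file.name property. Hadoop calls the script with host names or IP addresses as arguments, and the script prints a rack ID for each one. Below is a minimal sketch of such a script. It assumes a hand-maintained lookup table (the addresses and rack names are made up) and falls back to /default-rack for unknown hosts.

```python
import sys

# Hypothetical host -> rack table; a real cluster would fill this from
# its own inventory. These addresses and rack names are made up.
RACK_TABLE = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack ID per host, in the same order as the input."""
    return [RACK_TABLE.get(h, DEFAULT_RACK) for h in hosts]


if __name__ == "__main__":
    # Host names or IP addresses arrive as command-line arguments;
    # print one rack ID per argument.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

Keeping the output order identical to the argument order matters, because the caller matches rack IDs to hosts by position.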
Rack Awareness Example
The default replication factor is 3, but it can be configured (for example, through the dfs.replication property).
A simple way of storing data block replicas is to place each one on a separate rack. However, this can increase the latency of read/write operations, because every write must transfer the block across multiple racks.
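Counting cross-rack hops shows the write-cost trade-off. HDFS writes a block through a pipeline in which each DataNode forwards the data to the next one. The sketch below uses a hypothetical five-node rack map. It compares one replica per rack against the default layout, which places the first replica on the local node and the other two on a single remote rack.

```python
# Hypothetical rack assignment for five DataNodes.
RACK_OF = {
    "dn1": "/rack1", "dn2": "/rack1",
    "dn4": "/rack2", "dn5": "/rack2",
    "dn7": "/rack3",
}


def cross_rack_hops(pipeline):
    """Count pipeline hops (node i -> node i+1) that cross a rack boundary."""
    return sum(1 for a, b in zip(pipeline, pipeline[1:]) if RACK_OF[a] != RACK_OF[b])


unique_racks = ["dn1", "dn4", "dn7"]    # one replica on each rack
default_policy = ["dn1", "dn4", "dn5"]  # local node, then two nodes on one remote rack
print(cross_rack_hops(unique_racks))    # 2
print(cross_rack_hops(default_policy))  # 1
```

Both layouts survive the loss of one rack, but the default layout sends the block across the rack boundary only once.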
Let’s now discuss some advantages of Rack Awareness in Hadoop HDFS:
Higher bandwidth and lower latency: This policy maximizes network bandwidth by transferring blocks within a rack rather than between racks. YARN can improve MapReduce job performance by assigning tasks to nodes that are closer to their data in terms of network topology.
Lower write cost and higher read speed: The rack awareness policy directs read/write requests to replicas on the same rack. This minimizes the cost of writes and maximizes read speed.
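Scheduling tasks close to their data can be sketched with three locality levels: node-local, rack-local and off-switch. This is a simplified illustration and not YARN's scheduler. The rack map and the functions locality and best_node are hypothetical.

```python
# Hypothetical rack map for a small cluster.
RACK_OF = {"dn1": "/rack1", "dn2": "/rack1", "dn4": "/rack2", "dn7": "/rack3"}
RANK = {"node-local": 0, "rack-local": 1, "off-switch": 2}


def locality(task_node, replica_nodes):
    """Classify how close a task's node is to the block it will read."""
    if task_node in replica_nodes:
        return "node-local"   # a replica is on the same machine
    if any(RACK_OF[r] == RACK_OF[task_node] for r in replica_nodes):
        return "rack-local"   # a replica is on another node in the same rack
    return "off-switch"       # every replica is on a different rack


def best_node(free_nodes, replica_nodes):
    """Prefer node-local, then rack-local, then off-switch nodes."""
    return min(free_nodes, key=lambda n: RANK[locality(n, replica_nodes)])


replicas = ["dn1", "dn4"]
print(locality("dn2", replicas))            # rack-local
print(best_node(["dn7", "dn2"], replicas))  # dn2
```

Preferring the lowest rank keeps most reads inside a rack, which is the bandwidth and latency benefit described above.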
Advantages of Rack Awareness in Hadoop