Name: Sandeep Kumar Das

PRN: 20020343071

Tech Class: C

Operation: Fault Tolerance Case


Types of faults in HDFS:

1. DataNode failure

The Hadoop file system follows a master/slave architecture in which the NameNode acts as the master and the DataNodes act as slaves. The NameNode is critical because it is the central component of HDFS: if it goes down, the whole Hadoop cluster becomes inaccessible and is considered dead. DataNodes store the actual data and work as instructed by the NameNode. A Hadoop cluster can have many DataNodes but only one active NameNode.

In HDFS, each DataNode sends a heartbeat and a block report to the NameNode. Receipt of a heartbeat implies that the DataNode is functioning properly, and a block report contains a list of all the blocks stored on that DataNode.

By default a DataNode sends a heartbeat every 3 seconds. If the NameNode stops receiving heartbeats from a DataNode for the configured timeout (about 10 minutes by default), it assumes that the DataNode is either dead or non-functional.

As soon as a DataNode is declared dead, the NameNode re-replicates all the blocks it hosted onto other DataNodes, using the replicas that were created initially. This is how the NameNode handles DataNode failures.
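
A minimal sketch of this dead-node detection, written in Java with hypothetical class and method names (the real logic lives inside the NameNode, e.g. its HeartbeatManager), could look like this:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HeartbeatMonitor {
    // Assumed defaults: dfs.heartbeat.interval is 3 s and a DataNode is
    // declared dead after roughly 10.5 minutes without a heartbeat.
    private static final long DEAD_NODE_TIMEOUT_MS = 10 * 60_000L + 30_000L;

    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    // Called each time a DataNode reports in.
    public void recordHeartbeat(String dataNodeId) {
        lastHeartbeat.put(dataNodeId, System.currentTimeMillis());
    }

    // A DataNode is considered dead once no heartbeat has arrived within
    // the timeout; the NameNode would then schedule re-replication of the
    // blocks that node hosted.
    public boolean isDead(String dataNodeId) {
        Long last = lastHeartbeat.get(dataNodeId);
        return last == null
                || System.currentTimeMillis() - last > DEAD_NODE_TIMEOUT_MS;
    }
}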

2. Rack failure

A rack failure brings down all the DataNodes mounted in that rack at once. HDFS guards against this with rack awareness: the default block placement policy stores the replicas of each block on at least two different racks (for a replication factor of 3, one replica on the local rack and two on a remote rack), so the data stays readable even if an entire rack is lost.

3. NameNode failure

If the NameNode fails, the whole Hadoop cluster stops working. There is usually no data loss; only the cluster's work is shut down, because the NameNode is the single point of contact for all DataNodes, and if the NameNode fails all communication stops.
Available solutions to handle NameNode failure in Hadoop 1

To handle this single point of failure, we can use an additional setup that keeps a backup of the NameNode metadata. If the primary NameNode fails, we can switch to the secondary (backup) copy and bring the Hadoop cluster back up from it; note that in Hadoop 1 this failover is manual, not automatic.

Available solutions to handle NameNode failure in Hadoop 2

HDFS High Availability (HA) of the NameNode was introduced in Hadoop 2. In this setup, two separate machines are configured as NameNodes: one is always active and the other is on standby. The active NameNode handles all client requests in the cluster, while the standby acts as a slave and maintains enough state to provide a fast failover if the active NameNode goes down.
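
As a sketch, the client-side settings for such an HA pair can be expressed with Hadoop's Configuration API. The nameservice name "mycluster" and the host names below are placeholders, not values from this document:

import org.apache.hadoop.conf.Configuration;

public class HaConfigSketch {
    public static Configuration haConf() {
        Configuration conf = new Configuration();
        // One logical nameservice backed by two NameNodes, nn1 and nn2.
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        // RPC addresses of the two NameNodes (host names assumed).
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020");
        // Client-side proxy that retries against whichever NameNode is active.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        // Let a ZooKeeper-based controller promote the standby automatically.
        conf.setBoolean("dfs.ha.automatic-failover.enabled", true);
        return conf;
    }
}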

Fault Tolerance

Fault tolerance means keeping the data available even in case of some failures.

A distributed system works on a cluster of computers.

Most of the time a distributed system will spread the data, in partitions, over the various machines in the cluster.

If 1-2 machines in the cluster fail, we may not be able to read some of the data. This is a fault which we want to be able to tolerate.

Ways for achieving Fault-Tolerance in Hadoop HDFS

1. Replication Mechanism

Replication means making multiple copies of the data and keeping them on separate systems. If we have 3 copies of a partition stored on 3 different machines, we can survive 2 failures: even if 2 of the copies are lost, we can still read our data from the 3rd system. The term for the number of copies maintained is the replication factor.

Replication factor = 2 means 2 copies of each partition are maintained.

Replication factor = 3 means each block is replicated 3 times across different DataNodes. This is the default in a Hadoop system.
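
For illustration, the replication factor of an existing file can be changed through Hadoop's FileSystem API; the file path below is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // dfs.replication defaults to 3
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/example.txt"); // hypothetical path
        // Ask for 3 copies of this file; the NameNode creates or removes
        // replicas asynchronously to reach the requested factor.
        boolean accepted = fs.setReplication(file, (short) 3);
        System.out.println("Replication change accepted: " + accepted);
        fs.close();
    }
}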

2. Erasure Coding

Erasure coding is a method used for fault tolerance that durably stores data with significant
space savings compared to replication.

RAID (Redundant Array of Independent Disks) is a well-known use of erasure coding. Erasure coding works by striping a file into small units (cells) and storing them on different disks.

For each stripe of the original dataset, a certain number of parity cells are calculated and stored. If one of the machines fails, the lost cells can be reconstructed from the remaining data and parity cells. With the Reed-Solomon RS-6-3 scheme used by HDFS (6 data cells plus 3 parity cells per stripe), erasure coding reduces the storage overhead to 50%, compared with 200% for 3-way replication.
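
The toy example below illustrates the parity idea with a single XOR parity cell, in the style of RAID-5. This is only a conceptual sketch; HDFS actually uses Reed-Solomon coding, which tolerates more than one lost cell:

public class XorParityDemo {
    public static void main(String[] args) {
        // Three equal-length data cells of one stripe.
        byte[][] data = {
                "data-block-1".getBytes(),
                "data-block-2".getBytes(),
                "data-block-3".getBytes(),
        };
        int len = data[0].length;

        // The parity cell is the byte-wise XOR of all data cells.
        byte[] parity = new byte[len];
        for (byte[] cell : data) {
            for (int i = 0; i < len; i++) parity[i] ^= cell[i];
        }

        // Simulate losing the first cell, then rebuild it by XOR-ing the
        // surviving cells with the parity cell.
        byte[] recovered = parity.clone();
        for (int i = 0; i < len; i++) {
            recovered[i] ^= data[1][i];
            recovered[i] ^= data[2][i];
        }
        System.out.println(new String(recovered)); // prints "data-block-1"
    }
}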
