
HOMEWORK-2(22-11-2022)

Q1. What are HDFS and YARN?


HDFS is the basic storage component of Hadoop; it stands for Hadoop Distributed File System. YARN is the acronym for Yet Another Resource Negotiator; in MapReduce version 1, scalability became a bottleneck once a cluster grew beyond 4,000 nodes, and YARN was introduced to remove that limitation.
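
As a minimal sketch of how a client reaches HDFS through the Hadoop Java API (hdfs://namenode:9000 is a hypothetical address; substitute your cluster's fs.defaultFS):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsConnect {
    public static void main(String[] args) throws Exception {
        // Hypothetical NameNode address; replace with your cluster's fs.defaultFS.
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf)) {
            System.out.println("Connected to " + fs.getUri());
        }
    }
}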

Q2. What are the various Hadoop daemons and their roles in a Hadoop cluster?
The NameNode is the master daemon: it maintains and manages the DataNodes, records metadata, and receives heartbeats and block reports from the DataNodes. The DataNode is the slave daemon: it stores the actual data and serves read and write requests from clients.
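
This master/slave split is visible from a client: because the NameNode learns about every DataNode through heartbeats and block reports, it can serve a live-node report on request. A minimal sketch using the Hadoop Java API (exact class locations vary slightly between Hadoop versions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;

public class ListDataNodes {
    public static void main(String[] args) throws Exception {
        // Assumes fs.defaultFS in the classpath configuration points at the NameNode.
        FileSystem fs = FileSystem.get(new Configuration());

        // The NameNode tracks every DataNode via heartbeats/block reports,
        // so a client can ask it for the list of live slave daemons.
        if (fs instanceof DistributedFileSystem) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            for (DatanodeInfo dn : dfs.getDataNodeStats(DatanodeReportType.LIVE)) {
                System.out.println(dn.getHostName() + " capacity=" + dn.getCapacity());
            }
        }
    }
}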

Q3. Why does one remove or add nodes in a Hadoop cluster frequently?
A striking feature of the Hadoop framework is the ease with which it scales in step with rapid growth in data volume. For this reason, one of the most common tasks of a Hadoop administrator is to commission (add) and decommission (remove) DataNodes in a Hadoop cluster, as sketched below.
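
Commissioning and decommissioning are driven by the include/exclude files named by the dfs.hosts and dfs.hosts.exclude properties in hdfs-site.xml; after editing those files, the administrator tells the NameNode to re-read them, usually with the hdfs dfsadmin -refreshNodes command. A hedged sketch of issuing the same refresh programmatically:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.tools.DFSAdmin;
import org.apache.hadoop.util.ToolRunner;

public class RefreshNodes {
    public static void main(String[] args) throws Exception {
        // After adding/removing hostnames in the files referenced by
        // dfs.hosts / dfs.hosts.exclude, ask the NameNode to re-read them.
        Configuration conf = new Configuration();
        int rc = ToolRunner.run(new DFSAdmin(conf), new String[] {"-refreshNodes"});
        System.exit(rc);
    }
}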

Q4. What happens when two clients try to access the same file in the HDFS?
HDFS provides support only for exclusive writes, so when one client is already writing a file, another client cannot open the same file in write mode.
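
A sketch of this behavior with the Hadoop Java API (single JVM for brevity; in a real cluster the two writers would be separate clients, and /tmp/demo.txt is a hypothetical path):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExclusiveWriteDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/demo.txt"); // hypothetical path

        // The first create() acquires the write lease on the file.
        FSDataOutputStream writer = fs.create(p);

        // While that lease is held, a second create() on the same path is
        // rejected by the NameNode (an AlreadyBeingCreatedException surfaces
        // as the IOException caught below).
        try {
            fs.create(p).close();
        } catch (IOException e) {
            System.out.println("Second writer rejected: " + e.getMessage());
        }

        writer.close();
    }
}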

Q5. How does Name Node tackle Data Node failures?


Data blocks on the failed DataNode are replicated onto other DataNodes according to the replication factor specified in the hdfs-site.xml file. Once the failed DataNode comes back, the NameNode manages the replication factor again, removing the now-surplus replicas. This is how the NameNode handles DataNode failure.
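
That same replication factor can be inspected and adjusted per file from the Java API; a minimal sketch (assumes /tmp/demo.txt, a hypothetical path, already exists):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Mirrors the dfs.replication value from hdfs-site.xml (HDFS default: 3).
        System.out.println("Configured factor: " + conf.getInt("dfs.replication", 3));

        FileSystem fs = FileSystem.get(conf);
        Path p = new Path("/tmp/demo.txt"); // hypothetical, assumed to exist
        System.out.println("Current factor: " + fs.getFileStatus(p).getReplication());

        // Raising or lowering the factor makes the NameNode re-replicate
        // or delete block copies until the new target is met.
        fs.setReplication(p, (short) 2);
    }
}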

Q6. What will you do when Name Node is down?


When the NameNode goes down, the file system goes offline. There is an optional Secondary NameNode that can be hosted on a separate machine, but it only creates checkpoints of the namespace by merging the edits file into the fsimage file; it does not provide any real redundancy. The NameNode must be restarted, or restored on new hardware from the latest checkpoint, before the cluster is usable again.

Q7. How is HDFS fault tolerant?


HDFS is highly fault tolerant. It uses replication to handle faults: client data is replicated multiple times (the default replication factor is 3) on different DataNodes in the HDFS cluster, so if any DataNode goes down, the data can still be accessed from the other DataNodes, as in the sketch below.
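
A minimal sketch of requesting that factor explicitly when writing a file (assumes a reachable cluster; /tmp/replicated.txt is a hypothetical path):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicatedWrite {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Ask for 3 replicas at create time (the HDFS default). Every block
        // of this file is stored on 3 different DataNodes, so losing any
        // single DataNode still leaves 2 readable copies.
        try (FSDataOutputStream out = fs.create(new Path("/tmp/replicated.txt"), (short) 3)) {
            out.writeUTF("survives a DataNode failure");
        }
    }
}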
Q8. Why do we use HDFS for applications having large data sets and not when
there are a lot of small files?
HDFS is more efficient for large data sets maintained in a single file than for the same data stored as small chunks across many files. Because the NameNode stores the file system's metadata in RAM, the amount of memory limits the number of files an HDFS file system can hold, so a very large number of small files exhausts NameNode memory long before the cluster's disks fill up.
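
A back-of-the-envelope sketch of the effect, using the common rule of thumb of roughly 150 bytes of NameNode heap per file, directory, or block object (an assumed figure; the exact cost varies by Hadoop version):

public class NameNodeMemoryEstimate {
    public static void main(String[] args) {
        // Rule-of-thumb heap cost per file/directory/block object (assumption).
        final long BYTES_PER_OBJECT = 150;

        // Scenario A (hypothetical): ~1 TB as one million 1 MB files
        // -> 2 million objects (one file entry plus one block each).
        long smallFiles = 1_000_000L * 2 * BYTES_PER_OBJECT;

        // Scenario B: the same ~1 TB as eight 128 GB files of 128 MB blocks
        // -> 8 file entries + 8 * 1024 = 8192 blocks.
        long largeFiles = (8 + 8192) * BYTES_PER_OBJECT;

        System.out.printf("small files: ~%d MB of NameNode heap%n", smallFiles / (1024 * 1024));
        System.out.printf("large files: ~%d KB of NameNode heap%n", largeFiles / 1024);
    }
}

The same volume of data costs about 286 MB of NameNode heap as small files versus about 1.2 MB as large files, which is why HDFS favors few large files.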

Q9. How do you define “block” in HDFS? What is the default block size in
Hadoop 1 and in Hadoop 2? Can it be changed?
Each file in HDFS is stored as one or more "blocks". The default block size is 64 MB in Hadoop 1 and 128 MB in Hadoop 2. Yes, it can be changed, via the dfs.blocksize property (dfs.block.size in Hadoop 1) in hdfs-site.xml or per file at creation time.
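
A minimal sketch of changing and reading the block size from the Java API (setting the property in code is equivalent to setting it in hdfs-site.xml):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Equivalent to setting dfs.blocksize in hdfs-site.xml;
        // affects files created from now on, not existing ones.
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024); // 256 MB

        FileSystem fs = FileSystem.get(conf);
        System.out.println("Default block size: " + fs.getDefaultBlockSize(new Path("/")));
    }
}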
