
Introduction

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is now an Apache Hadoop subproject. The project URL is http://hadoop.apache.org/hdfs/.

NameNode and DataNodes


HDFS has a master/slave architecture. An HDFS cluster consists
of a single NameNode, a master server that manages the file
system namespace and regulates access to files by clients. In
addition, there are a number of DataNodes, usually one per node
in the cluster, which manage storage attached to the nodes that
they run on. HDFS exposes a file system namespace and allows
user data to be stored in files. Internally, a file is split into one or
more blocks and these blocks are stored in a set of DataNodes.
The NameNode executes file system namespace operations like
opening, closing, and renaming files and directories. It also
determines the mapping of blocks to DataNodes. The DataNodes
are responsible for serving read and write requests from the file
system's clients. The DataNodes also perform block creation,
deletion, and replication upon instruction from the NameNode.
The NameNode and DataNode are pieces of software designed to
run on commodity machines. These machines typically run a
GNU/Linux operating system (OS).
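
To make this concrete, the short Java sketch below uses the standard org.apache.hadoop.fs.FileSystem API to create a directory and write a small file. The NameNode executes the namespace operations (mkdirs, create), while the file's contents are written in blocks to DataNodes. The cluster address hdfs://namenode-host:8020 and the paths are placeholders, not values from this document.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode URI; use your cluster's fs.defaultFS value.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020"), conf);

        // Namespace operations such as mkdirs and create are handled by the NameNode.
        Path dir = new Path("/user/example");
        fs.mkdirs(dir);

        // The file's data is split into blocks and stored on DataNodes.
        try (FSDataOutputStream out = fs.create(new Path(dir, "hello.txt"))) {
            out.writeBytes("hello hdfs\n");
        }
        fs.close();
    }
}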

HDFS is built using the Java language; any machine that supports
Java can run the NameNode or the DataNode software.

Usage of the highly portable Java language means that HDFS can
be deployed on a wide range of machines. A typical deployment
has a dedicated machine that runs only the NameNode software.
Each of the other machines in the cluster runs one instance of the
DataNode software. The architecture does not preclude running
multiple DataNodes on the same machine, but in a real
deployment that is rarely the case. The existence of a single
NameNode in a cluster greatly simplifies the architecture of the
system. The NameNode is the arbitrator and repository for all
HDFS metadata. The system is designed in such a way that user
data never flows through the NameNode.
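
As a hedged illustration of that division of labor, the sketch below asks the NameNode (through the FileSystem API) for the block locations of a file. Only metadata comes back from the NameNode; the block contents themselves would be streamed directly from the listed DataNodes. The file path and cluster address are placeholders.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode URI; adjust for your deployment.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020"), conf);

        Path file = new Path("/user/example/hello.txt");
        FileStatus status = fs.getFileStatus(file);

        // Metadata query answered by the NameNode: which DataNodes hold each block.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}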
All HDFS communication protocols are layered on top of the
TCP/IP protocol. A client establishes a connection to a
configurable TCP port on the NameNode machine. It talks the
ClientProtocol with the NameNode. The DataNodes talk to the
NameNode using the DataNode Protocol. A Remote Procedure Call
(RPC) abstraction wraps both the ClientProtocol and the
DataNode Protocol. By design, the NameNode never initiates any
RPCs. Instead, it only responds to RPC requests issued by
DataNodes or clients.
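
As a rough sketch of how a client is pointed at that configurable TCP port: the fs.defaultFS property names the NameNode's RPC endpoint, and metadata calls made through the resulting FileSystem handle travel over the ClientProtocol wrapped in Hadoop RPC. The host name and port below (8020 is a commonly used default) are assumptions, not values from this document.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsClientConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS points clients at the NameNode's RPC endpoint;
        // 8020 is a common default port, but clusters may configure another.
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

        // Metadata calls through this handle go to the NameNode over the
        // ClientProtocol; the NameNode only responds, never initiates RPCs.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to " + fs.getUri());
        fs.close();
    }
}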
