Hadoop Cluster - Architecture, Core Components
What is Hadoop?
Hadoop Cluster:
Any set of loosely or tightly connected computers that work
together as a single system is called a cluster. Put simply,
a computer cluster used for Hadoop is called a Hadoop cluster.
A Hadoop cluster is a special type of computational cluster
designed for storing and analyzing vast amounts of
unstructured data in a distributed computing
environment. These clusters run on low-cost
commodity computers.
Hadoop clusters are often referred to as "shared nothing" systems
because the only thing that is shared between nodes is the network
that connects them.
Example: Yahoo's Hadoop cluster has more than 10,000 machines
running Hadoop and nearly 1 petabyte of user data.
A Hadoop cluster has 3 components:
Client
Master
Slave
Client: It is neither master nor slave. Its role is to load
the data into the cluster, submit MapReduce jobs describing how
the data should be processed, and then retrieve the results
after job completion.
Masters: The masters consist of 3 components:
NameNode
Secondary NameNode
JobTracker
NameNode: The NameNode does NOT store the files themselves, only
the files' metadata.
The NameNode oversees the health of the DataNodes and coordinates
access to the data stored on them.
The NameNode keeps track of all file system related information, such as:
--Which section of a file is saved in which part of the cluster
--Last access time for the files
--User permissions, i.e. which users have access to the file
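The three kinds of metadata listed above can be pictured with a small sketch. It is only an illustration of the idea, not how HDFS is implemented: the names `FileMetadata`, `ToyNameNode`, `locate`, the file path and the DataNode names are all hypothetical.

```python
from dataclasses import dataclass
import time


@dataclass
class FileMetadata:
    """Hypothetical per-file record; the file contents are NOT stored here."""
    owner: str
    readers: set            # users allowed to read the file
    block_locations: dict   # block id -> names of DataNodes holding a copy
    last_access: float = 0.0


class ToyNameNode:
    """Toy stand-in for a NameNode: it holds metadata only."""

    def __init__(self):
        self.files = {}

    def add_file(self, path, owner, readers, block_locations):
        # The owner is always allowed to read their own file.
        self.files[path] = FileMetadata(
            owner, set(readers) | {owner}, dict(block_locations)
        )

    def locate(self, path, user):
        """Check permissions, record the access time, return block locations."""
        meta = self.files[path]
        if user not in meta.readers:
            raise PermissionError(f"{user} may not read {path}")
        meta.last_access = time.time()
        return meta.block_locations


namenode = ToyNameNode()
namenode.add_file(
    "/logs/day1.txt",
    owner="alice",
    readers=["bob"],
    block_locations={"blk_1": ["datanode1", "datanode2"], "blk_2": ["datanode3"]},
)
locations = namenode.locate("/logs/day1.txt", "bob")
```

A client that receives `locations` would then read each block directly from one of the listed DataNodes; the toy NameNode itself never handles file contents.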
JobTracker:
JobTracker coordinates the parallel processing of data using
MapReduce.
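As a rough illustration of what "coordinating parallel processing with MapReduce" means, here is a toy word count. It runs in one process with a thread pool standing in for the slave nodes; `map_task`, `reduce_task` and `run_job` are hypothetical names, and a real JobTracker also handles scheduling, data locality and failures, which this sketch ignores.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_task(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word, 1) for word in chunk.split()]


def reduce_task(word, counts):
    """Reduce step: add up all the counts collected for one word."""
    return word, sum(counts)


def run_job(chunks):
    """Toy coordinator: run map tasks in parallel, group by key, then reduce."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        mapped = list(pool.map(map_task, chunks))

    # Group intermediate pairs by word so each reduce call sees one key.
    grouped = defaultdict(list)
    for pairs in mapped:
        for word, one in pairs:
            grouped[word].append(one)

    return dict(reduce_task(word, counts) for word, counts in grouped.items())


result = run_job(["big data big", "data cluster", "big cluster"])
```

Each string in the input list plays the role of one block of a file; in a real cluster each block would be processed on a different slave node.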
Secondary NameNode:
Don't be misled by the name "Secondary".
The Secondary NameNode is NOT a backup or high availability node
for the NameNode. It does housekeeping for the NameNode: it periodically
merges the NameNode's edit log into the file system image (checkpointing).
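The Secondary NameNode's housekeeping can be pictured as a checkpoint: the changes recorded in an edit log are folded into a snapshot of the file system metadata. The sketch below assumes a much simplified model in which the image is a dictionary and the log is a list of tuples; the function name `checkpoint` and the operation names are hypothetical.

```python
def checkpoint(fsimage, edit_log):
    """Return a new image with every logged edit applied, in order."""
    merged = dict(fsimage)  # work on a copy; the old image stays untouched
    for op, path, value in edit_log:
        if op == "create":
            merged[path] = value
        elif op == "delete":
            merged.pop(path, None)
    return merged


fsimage = {"/a.txt": ["blk_1"]}
edit_log = [
    ("create", "/b.txt", ["blk_2"]),
    ("delete", "/a.txt", None),
]
new_image = checkpoint(fsimage, edit_log)
```

After a checkpoint the edit log can start again from empty, which keeps it from growing without bound. This is housekeeping only; it does not make the Secondary NameNode a standby for the NameNode.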
Slaves: Slave nodes make up the majority of machines in a
Hadoop cluster and are responsible for:
Storing the data
Running the computation
Each slave runs both a DataNode and a TaskTracker
daemon, which communicate with their respective masters.
The TaskTracker daemon is a slave to the JobTracker and
the DataNode daemon is a slave to the NameNode.
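One common way for a slave daemon to communicate with its master is a periodic heartbeat, which also lets the master judge which slaves are healthy. The sketch below shows the idea only; the class `ToyMaster`, the slave names and the 10-second timeout are assumptions chosen for the example, not Hadoop's actual settings.

```python
class ToyMaster:
    """Toy master that tracks the last heartbeat time of each slave."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}

    def heartbeat(self, slave, now):
        """Called by a slave daemon to report that it is still alive."""
        self.last_seen[slave] = now

    def live_slaves(self, now):
        """Slaves whose most recent heartbeat is within the timeout window."""
        return sorted(
            slave
            for slave, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )


master = ToyMaster(timeout_seconds=10)
master.heartbeat("slave1", now=0)
master.heartbeat("slave2", now=0)
master.heartbeat("slave1", now=8)   # slave2 stays silent from here on
alive = master.live_slaves(now=15)
```

Times are passed in explicitly so the example is deterministic. A master that finds a slave missing from `alive` would stop sending it work and rely on copies of its data held on other nodes.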