Module 2
Module 2
To Hadoop
B Y:
S H AVA N T R E V VA S B I L A K E R I
Common Types of Architecture-
Multiprocessor
1) Shared Memory (SM): Common Central Memory – Shared by multiple processors
3) Shared Nothing (SN) : Neither memory nor Disk – Shared among multiple
processors.
Hadoop Cluster
1. The architecture of Hadoop Cluster
2. Core Components of Hadoop Cluster
3. Work-flow of How File is Stored in Hadoop
1. Hadoop Cluster • These clusters run on low cost commodity computers.
• Hadoop clusters are often referred to as "shared nothing" systems
because the only thing that is shared between nodes is the network
that connects them.
• Large Hadoop Clusters are arranged in several racks.
• Network traffic between different nodes in the same rack is much
more desirable than network traffic across the racks.
• Example: Yahoo's Hadoop cluster. They have more than 10,000
machines running Hadoop and nearly 1 petabyte of user data.
• A small Hadoop cluster includes a single master node and
multiple worker or slave node.
• As discussed earlier, the entire cluster contains two layers.
• One of the layer of MapReduce Layer and another is of HDFS
Layer.
• The master node consists of a JobTracker, NameNode.
• A slave or worker node consists of a DataNode and TaskTracker.
• It is also possible that slave node or worker node is only data or
compute node.
Hadoop Cluster – 3 Components: Client, Master & Slave
Hadoop Cluster would consists of
Each slave machine(rack server in a rack) has cables coming out it from both
the ends
Cables are connected to rack switch at the top which means that top rack
switch will have around 80 ports
The rack switch has uplinks connected to core switches and hence connecting
In the cluster, you have few machines to act as Name node and as JobTracker.
It shuffle and merge this information into clean file folder and
sent to back again to NameNode, while keeping a copy for itself.