Module 2 Hadoop
Introduction to Hadoop
By: Dr. Rashmi L Malghan
Common Types of Multiprocessor Architecture
1) Shared Memory (SM): a common central memory is shared by multiple processors.
2) Shared Disk (SD): multiple processors share a common collection of disks, but each processor has its own private memory.
3) Shared Nothing (SN): neither memory nor disk is shared among the processors.
Parallel Computing vs Distributed Computing
Introduction:
• HDFS (Hadoop Distributed File System): This is a distributed file system that stores data in blocks
across the slave nodes. The master node runs a service called NameNode, which manages the file
system namespace, the metadata of the files and directories, and the mapping of blocks to slave nodes.
The slave nodes run a service called DataNode, which stores the actual data blocks and serves read
and write requests from the clients.
• YARN (Yet Another Resource Negotiator): This is a framework for resource management and job
scheduling in Hadoop. The master node runs a service called ResourceManager, which allocates
resources to different applications and monitors their progress. The slave nodes run a service
called NodeManager, which launches and monitors the tasks assigned by the ResourceManager.
• MapReduce: This is a programming model for parallel processing of large data sets. In Hadoop 1, the
master node runs a service called JobTracker, which splits the input data into smaller chunks and
assigns them to the slave nodes. The slave nodes run a service called TaskTracker, which executes the
map and reduce tasks on the data chunks and reports the results back to the JobTracker. In Hadoop 2
and later, YARN takes over these scheduling duties.
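The MapReduce model described above can be illustrated with a minimal single-process sketch in Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for illustration, and the loop in `run_job` stands in for the work that a real cluster would distribute across slave nodes.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map task: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce task: combine all values for one key into a single result."""
    return key, sum(values)

def run_job(chunks):
    """Hypothetical driver: runs map on each chunk, then shuffle, then reduce."""
    mapped = []
    for chunk in chunks:  # on a cluster, each chunk would go to a different node
        mapped.extend(map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Two input chunks, standing in for two splits of a larger file.
chunks = ["big data big cluster", "data node data block"]
word_counts = run_job(chunks)
print(word_counts)
```

Each chunk is mapped independently, which is what allows the map tasks to run in parallel on different machines; only the shuffle step needs to bring together values that share a key.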
Hadoop Architecture:
Each slave machine (a rack server in a rack) has cables coming out of it from both
ends.
The cables are connected to the rack switch at the top of the rack, which means that the
top-of-rack switch has around 80 ports.
Globally, there are 8 core switches.
The rack switch has uplinks connected to the core switches, which connect it to
all other racks with uniform bandwidth, forming the cluster.
In the cluster, a few machines act as the NameNode and the JobTracker.
They are referred to as Masters.
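The port count above can be turned into a rough estimate of rack size. The sketch below assumes that "cables from both ends" means two cables per slave machine and that each cable occupies one port on the rack switch; neither assumption is stated explicitly in the slides, and the function name `servers_per_rack` is hypothetical.

```python
def servers_per_rack(switch_ports, cables_per_server=2):
    """Estimate how many servers one rack switch can connect.

    Assumes every server cable uses exactly one switch port and ignores
    ports reserved for uplinks to the core switches.
    """
    return switch_ports // cables_per_server

rack_switch_ports = 80  # "around 80 ports" from the slide
estimated_servers = servers_per_rack(rack_switch_ports)
print(estimated_servers)  # 40
```

Under these assumptions a rack holds roughly 40 slave machines; the real figure would be lower if some of the 80 ports are set aside for uplinks.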
Cluster : Core Components
1. Client
2. Masters: NameNode, Secondary NameNode & JobTracker
2.1 NameNode:
▪ It shuffles and merges this information into a clean file folder and
sends it back to the NameNode, while keeping a copy for itself.