Presented By:
Kalai Selvi (2015272013)
Piyush Jangir (2015272053)
Introduction
Apache Hadoop 1.0 vs 2.0
HDFS
MapReduce
Master-Slave Architecture
Limitations in Hadoop 1.0
YARN
References
An open-source software framework designed for the
storage and processing of large-scale data on
clusters of commodity hardware
Created by Doug Cutting and Mike Cafarella.
Cutting named the project after his son's toy
elephant.
The core of Apache Hadoop consists of a storage
part, known as the Hadoop Distributed File
System (HDFS), and a processing part called
MapReduce.
Architecture
HDFS
Responsible for storing data on the cluster
Data files are split into blocks and distributed
across the nodes in the cluster
Each block is replicated multiple times
Default replication is 3-fold
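The splitting and 3-fold replication described above can be sketched as follows. This is a toy illustration, not HDFS code: the block size, node names, and round-robin placement are invented for the example (the real HDFS default block size is 64 MB in Hadoop 1.x, and placement is rack-aware).

```python
BLOCK_SIZE = 4  # bytes, for illustration only; real HDFS blocks are tens of MB
NODES = ["node1", "node2", "node3", "node4"]
REPLICATION = 3  # HDFS default replication factor

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a file's bytes into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, nodes=NODES, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (round-robin)."""
    placement = {}
    for i, _ in enumerate(blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello hadoop!")
placement = place_blocks(blocks)
```

Each block ends up on three distinct nodes, so the file survives the loss of any two nodes.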
Distributing computation
across nodes
A method for distributing computation across
multiple nodes
Each node processes the data that is stored at
that node
Consists of two main phases
Map
Reduce
The reduce task is always performed after the map phase.
Map: takes a set of data and breaks it down into
tuples (key/value pairs)
Reduce: takes the output from a map as input and
combines those tuples into a smaller set of
tuples
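The two phases can be shown with the classic word-count example. This is a minimal single-process sketch; real Hadoop ships map tasks to the nodes holding the data and shuffles the intermediate tuples to the reducers.

```python
from collections import defaultdict

def map_phase(line: str):
    # Map: break the input into (key, value) tuples, here (word, 1)
    return [(word, 1) for word in line.split()]

def reduce_phase(pairs):
    # Reduce: combine tuples sharing a key into a smaller set of tuples
    grouped = defaultdict(int)
    for word, count in pairs:
        grouped[word] += count
    return dict(grouped)

tuples = map_phase("to be or not to be")
counts = reduce_phase(tuples)
# counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```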
Master-Slave Architecture
NameNode
Stores metadata for the files, like the directory
structure
Handles creation of more replica blocks when
necessary after a DataNode failure
DataNode
Stores the actual data in HDFS
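The re-replication behaviour described above can be sketched as follows. The names and data structures here are invented for illustration, not Hadoop APIs: the NameNode tracks which nodes hold each block, and when a DataNode fails, any block that drops below the target replication is copied to another live node.

```python
TARGET_REPLICATION = 3  # HDFS default replication factor

# Toy block map: block id -> set of DataNodes holding a replica
block_map = {
    "blk_1": {"node1", "node2", "node3"},
    "blk_2": {"node1", "node3", "node4"},
}

def handle_datanode_failure(block_map, failed_node, live_nodes):
    """Drop the failed node's replicas and re-replicate onto live nodes."""
    for block, holders in block_map.items():
        holders.discard(failed_node)
        for node in live_nodes:
            if len(holders) >= TARGET_REPLICATION:
                break
            if node not in holders:
                holders.add(node)

handle_datanode_failure(block_map, "node1", ["node2", "node3", "node4", "node5"])
```

After the failure of node1, both blocks are back at three replicas, none of them on the failed node.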
JobTracker
Splits a job into smaller tasks and sends them
to the TaskTracker process on each node
TaskTracker
Reports back to the JobTracker on job progress,
sends data, or requests new tasks
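The division of labour above can be sketched like this. Function and tracker names are invented for the example; in real Hadoop the JobTracker assigns tasks based on data locality and heartbeats, not simple round-robin.

```python
def job_tracker(input_splits, task_trackers):
    """Master side: divide a job's input splits among the TaskTrackers."""
    assignments = {t: [] for t in task_trackers}
    for i, split in enumerate(input_splits):
        tracker = task_trackers[i % len(task_trackers)]
        assignments[tracker].append(split)
    return assignments

def task_tracker_report(name, assignments):
    """Worker side: report back how many tasks this tracker is running."""
    return {"tracker": name, "tasks": len(assignments[name])}

assignments = job_tracker(["split0", "split1", "split2"], ["tt1", "tt2"])
report = task_tracker_report("tt1", assignments)
```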
Scalability: the JobTracker runs on a single machine and
performs several tasks:
resource management, job scheduling, and monitoring
Availability: in Hadoop 1.0, the JobTracker is a single point
of failure. If the JobTracker fails, all running jobs must
restart.
Poor resource utilization: in Hadoop 1.0, each TaskTracker
has a predefined number of map slots and reduce slots.
Utilization suffers because the map slots may be full
while the reduce slots sit empty (and vice versa).
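The fixed-slot problem can be made concrete with a toy scheduler. The slot counts and workload below are invented for illustration: with separate map and reduce slots, a map-heavy job leaves every reduce slot idle even while map tasks queue up, which is exactly the waste YARN's generic containers remove.

```python
MAP_SLOTS, REDUCE_SLOTS = 4, 4  # fixed per-TaskTracker slots in Hadoop 1.0

def schedule(map_tasks: int, reduce_tasks: int):
    """Fill each slot type only from its own task queue."""
    used_map = min(map_tasks, MAP_SLOTS)
    used_reduce = min(reduce_tasks, REDUCE_SLOTS)
    idle = (MAP_SLOTS - used_map) + (REDUCE_SLOTS - used_reduce)
    return used_map, used_reduce, idle

# Map-heavy job: 8 pending map tasks, no reduce tasks yet
used_map, used_reduce, idle = schedule(8, 0)
# 4 map slots busy, 4 reduce slots idle, while 4 map tasks still wait
```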
http://hortonworks.com/apache/yarn/#secti
on_2
http://saphanatutorial.com/how-yarn-
overcomes-mapreduce-limitations-in-
hadoop-2-0/
http://www.slideshare.net/emcacademics/mil
ind-hadoop-trainingbrazil
https://en.wikipedia.org/wiki/Apache_Hadoo
p