Hadoop Intro
Agenda
Introduction to Hadoop
Hadoop Architecture
Characteristics
Hadoop Features
What is Hadoop?
An open-source framework for distributed processing of
large data sets across clusters of commodity hardware
The technology that powers Yahoo, Facebook, Twitter, Walmart, and others
Hive was launched, providing SQL support for Hadoop
Hadoop Components
Hadoop consists of three key parts: HDFS (storage), MapReduce (processing), and YARN (resource management)
Hadoop Nodes
• Master node: Resource Manager, NameNode
• Slave node: Node Manager, DataNode
Basic Hadoop Architecture
• The USER submits work to the MASTER(S)
• The MASTER(S) divide the work into pieces of Sub Work
• 100 SLAVES each process one piece of Sub Work
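The master/slave division of work can be sketched in a few lines of Python. This is only an illustration of the idea, not Hadoop's API: the function names `split_work`, `do_sub_work`, and `run_job` are hypothetical, threads on one machine stand in for slave nodes, and word counting is an assumed example job.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_work(lines, num_workers):
    """Master step: divide the input into one piece of sub-work per worker."""
    return [lines[i::num_workers] for i in range(num_workers)]


def do_sub_work(chunk):
    """Worker step: count the words in one piece of sub-work."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def run_job(lines, num_workers=4):
    """Master: hand out the sub-work, then merge the partial results."""
    chunks = split_work(lines, num_workers)
    # Threads stand in for slave nodes (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(do_sub_work, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    data = ["hadoop stores data", "hadoop processes data", "data is big"]
    print(run_job(data, num_workers=2))
```

In a real cluster the pieces of sub-work run on separate machines, so adding slaves increases the amount of data that can be processed in parallel.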
Hadoop Characteristics
Open Source
Distributed Processing
Fault Tolerance
Easy to Use
Reliability
Economic
High Availability
Scalability
Open Source
• Can be modified
• Interoperable
• Affordable
• No vendor lock-in
• Community
Distributed Processing
Centralized Processing vs. Distributed Processing
Fault Tolerance
Scalability
Open Source + Commodity Hardware = Economic
Easy to Use
• Move computation to the data, not data to the computation
• Data is processed on the nodes where it is stored
[Figure: the Algorithm is sent to each of the Servers and runs next to the Data held there]
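As a rough illustration of processing data on the nodes that hold it, the sketch below assigns each data block to a node that already stores a copy. The function `assign_tasks`, the block and node names, and the `busy_nodes` parameter are all hypothetical. It is also a simplification: this sketch raises an error when no local node is free, which is not a description of how Hadoop's scheduler behaves.

```python
def assign_tasks(block_locations, busy_nodes=()):
    """Pick, for each block, a node that already stores it.

    block_locations: dict mapping block id -> list of nodes holding a copy.
    busy_nodes: nodes that cannot take work right now (hypothetical input).
    Returns a dict mapping block id -> chosen node.
    """
    load = {}
    assignment = {}
    for block, nodes in sorted(block_locations.items()):
        candidates = [n for n in nodes if n not in busy_nodes]
        if not candidates:
            raise RuntimeError(f"no free node holds {block}")
        # Prefer the least-loaded node among those that hold the data;
        # ties are broken by node name to keep the result deterministic.
        chosen = min(candidates, key=lambda n: (load.get(n, 0), n))
        assignment[block] = chosen
        load[chosen] = load.get(chosen, 0) + 1
    return assignment


if __name__ == "__main__":
    blocks = {
        "block-1": ["node-a", "node-b"],
        "block-2": ["node-a", "node-c"],
        "block-3": ["node-b", "node-c"],
    }
    for block, node in assign_tasks(blocks).items():
        print(f"run the algorithm for {block} on {node}")
```

Only the small algorithm travels over the network; the large data blocks stay where they are.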
Summary