
Intro to Hadoop

Agenda
• Introduction to Hadoop
• Hadoop nodes & daemons
• Hadoop Architecture
• Characteristics
• Hadoop Features
What is Hadoop?
An open source framework that allows distributed processing of large data-sets across a cluster of commodity hardware.
What is Hadoop?
The technology that empowers Yahoo, Facebook, Twitter, Walmart and others.
What is Hadoop?
An Open Source framework that allows distributed processing of large data-sets across a cluster of commodity hardware.

Open Source
• Source code is freely available
• It may be redistributed and modified
What is Hadoop?
An open source framework that allows Distributed Processing of large data-sets across a cluster of commodity hardware.

Distributed Processing
• Data is processed in a distributed manner on multiple nodes / servers
• Multiple machines process the data independently
What is Hadoop?
An open source framework that allows distributed processing of large data-sets across a Cluster of commodity hardware.

Cluster
• Multiple machines connected together
• Nodes are connected via LAN
What is Hadoop?
An open source framework that allows distributed processing of large data-sets across a cluster of Commodity Hardware.

Commodity Hardware
• Economic / affordable machines
• Typically low-performance hardware
What is Hadoop?
• Open source framework written in Java
• Inspired by Google's MapReduce programming model as well as its file system (GFS)
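To make the programming model concrete, here is a minimal sketch of the classic word-count job in Java, using Hadoop's Mapper and Reducer base classes. The class names are illustrative, not from the slides: the mapper emits (word, 1) pairs and the reducer sums the counts per word.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emits (word, 1) for every word in its input split
public class WordCountMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer tokens = new StringTokenizer(value.toString());
    while (tokens.hasMoreTokens()) {
      word.set(tokens.nextToken());
      context.write(word, ONE);
    }
  }
}

// Reducer: sums the counts emitted for each word
class WordCountReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    context.write(key, new IntWritable(sum));
  }
}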
Hadoop History
• 2002: Doug Cutting started working on Nutch
• 2003: Google published the GFS paper
• 2004: Google published the MapReduce paper
• 2005: Doug Cutting added DFS & MapReduce to Nutch
• 2006: Development of Hadoop started as a Lucene sub-project
• 2007: The New York Times converted 4TB of image archives over 100 EC2 instances
• 2008: Hadoop became a top-level Apache project and defeated supercomputers as the fastest system to sort a terabyte of data; Facebook launched Hive, SQL support for Hadoop
• 2009: Doug Cutting joined Cloudera
Hadoop Components
Hadoop consists of three key parts: HDFS (storage), MapReduce (processing), and YARN (resource management).
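As a small illustration of the storage part, the following Java sketch reads a text file from HDFS through Hadoop's FileSystem API. The NameNode address and the file path are hypothetical placeholders; in a real deployment, fs.defaultFS normally comes from core-site.xml.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Reads a text file from HDFS line by line
public class HdfsRead {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical NameNode address, set here only for illustration
    conf.set("fs.defaultFS", "hdfs://namenode:9000");

    try (FileSystem fs = FileSystem.get(conf);
         BufferedReader in = new BufferedReader(
             new InputStreamReader(fs.open(new Path("/data/sample.txt"))))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}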
Hadoop Nodes
Nodes are of two types:
• Master Node
• Slave Node

Hadoop Daemons
• Master Node: NameNode, ResourceManager
• Slave Node: DataNode, NodeManager
Basic Hadoop Architecture
The user submits work to the master(s). The master splits the work into many sub-works and distributes them across the slaves (e.g., 100 slaves), where each sub-work is processed in parallel.
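In code, the "user" side of this picture is a driver program that describes the work and hands it to the cluster; splitting it into sub-works (map and reduce tasks) and scheduling them on the slaves is done by the framework. A minimal sketch, reusing the hypothetical word-count classes from the earlier example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver: describes the work and submits it to the master;
// the framework splits it into map/reduce tasks across the slaves
public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCountMapper.class);   // from the earlier sketch
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}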
Hadoop Characteristics
• Open Source
• Distributed Processing
• Fault Tolerance
• Reliability
• Economic
• Easy to use
• High Availability
• Scalability
Open Source
• Source code is freely available
• Can be redistributed
• Can be modified

Open source brings with it: free availability, transparency, affordability, interoperability, a community, and no vendor lock-in.
Distributed Processing
• Data is processed in a distributed manner on the cluster
• Multiple nodes in the cluster process data independently

Unlike centralized processing, where a single machine handles all the data, distributed processing spreads the work across the cluster.
Fault Tolerance
• Failures of nodes are recovered automatically
• The framework takes care of hardware failures as well as task failures
Reliability
• Data is reliably stored on the cluster of machines despite machine failures
• Failure of nodes doesn't cause data loss
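Reliability comes from replication: each block is stored on several DataNodes (three by default, controlled by dfs.replication). As a sketch, the Java snippet below raises the replication factor of a single file so it can survive more simultaneous node failures; the file path is a hypothetical placeholder.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Raises the replication factor of one file
public class SetReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/data/critical.txt");  // hypothetical path
    short replicas = 5;                          // default is usually 3 (dfs.replication)
    boolean ok = fs.setReplication(file, replicas);
    System.out.println("replication updated: " + ok);
    fs.close();
  }
}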
High Availability
• Data is highly available and accessible despite hardware failure
• There is no downtime for the end-user application due to unavailable data
Scalability
• Vertical scalability: new hardware can be added to existing nodes (scale up)
• Horizontal scalability: new nodes can be added on the fly (scale out)
Economic
• No need to purchase a costly license
• No need to purchase costly hardware

Open Source + Commodity Hardware = Economic
Easy to Use
• Distributed computing challenges are handled by the framework
• Clients just need to concentrate on business logic
Data Locality
• Move computation to data instead of data to computation
• Data is processed on the nodes where it is stored
• Traditional approach: data is shipped from storage servers to app servers for processing; in Hadoop, the algorithm is shipped to the servers that store the data
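The scheduler can move computation to the data because the NameNode knows which hosts store each block of a file. The following Java sketch prints that information via FileSystem.getFileBlockLocations; the file path is a hypothetical placeholder.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Prints which hosts hold each block of a file: the information
// used to run map tasks on the nodes that already store the data
public class ShowBlockLocations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus status = fs.getFileStatus(new Path("/data/sample.txt")); // hypothetical path
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.println("offset " + block.getOffset()
          + " -> hosts " + String.join(", ", block.getHosts()));
    }
    fs.close();
  }
}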
Summary
• Every day we generate 2.3 trillion GBs of data
• Hadoop handles huge volumes of data efficiently
• Hadoop uses the power of distributed computing
• HDFS & YARN are two main components of Hadoop
• It is highly fault tolerant, reliable & available