Lesson 1
Hadoop
"Big Data Analytics", Ch.02 L01: Introduction To Hadoop
2019
Raj Kamal and Preeti Saxena, © McGraw-Hill Higher Edu. India
Big Data Programming Model
• Pieces of code are distributed to the
computing nodes along with the data
• Distributed data storage systems do not
use the concept of joins
• Hadoop provides that model
Big Data Distributed Computing
Model in Hadoop
• A distributed model that requires no
sharing between data nodes
• The multiple tasks of an application are
also distributed: they run on the
machines associated with multiple data
nodes and execute at the same time, in
parallel
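The parallel, share-nothing execution described above can be sketched in a few lines. This is a minimal illustration and not Hadoop code: the `partitions` list, `local_task` and `run_in_parallel` are hypothetical names, and threads stand in for the separate machines of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each inner list stands in for the data
# held by one data node.
partitions = [
    [3, 1, 4],
    [1, 5, 9],
    [2, 6, 5],
]

def local_task(partition):
    """Task that reads only its own partition (nothing is shared)."""
    return sum(partition)

def run_in_parallel(parts):
    # One worker per partition; threads stand in for separate machines.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(local_task, parts))

partial_sums = run_in_parallel(partitions)
total = sum(partial_sums)  # combine the per-node results
print(partial_sums, total)
```

Each task touches only its own partition, so no coordination is needed until the partial results are combined.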
Big Data Storage Model in
Hadoop
• Data is partitioned into data blocks,
which are written at one set of nodes
• The blocks are replicated at multiple
nodes to guard against network faults
(when a network fault occurs, a node
holding a replica makes the data
available)
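The storage model above (partition into blocks, replicate, read from a surviving replica) can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and node names are hypothetical, the round-robin placement is a stand-in for Hadoop's actual rack-aware placement policy, and the replication factor of 3 is chosen for the example.

```python
def split_into_blocks(data, block_size):
    """Partition a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, nodes, replication):
    """Round-robin placement (an assumption for this sketch).

    Block b is stored on `replication` distinct nodes, which requires
    replication <= len(nodes).
    """
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

def node_serving(block_id, placement, failed_nodes):
    """Return the first reachable node that holds a replica of the block."""
    for node in placement[block_id]:
        if node not in failed_nodes:
            return node
    raise RuntimeError(f"all replicas of block {block_id} are unavailable")

blocks = split_into_blocks(b"abcdefghij", block_size=4)
nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
placement = place_replicas(len(blocks), nodes, replication=3)

# Simulate a fault on node1: block 0 is still served by another replica.
server = node_serving(0, placement, failed_nodes={"node1"})
print(blocks, placement[0], server)
```

With node1 unreachable, the read falls through to the next node in the replica list, which is the behaviour the slide describes.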
Big Data Computing Model
• Fault tolerant due to replication
• Follows the CAP theorem: of the three
properties (consistency, availability
and partition tolerance), a distributed
system can guarantee at most two at the
same time
Hadoop
• Hadoop consists of two components:
one is the data store, held in blocks
across the clusters, and the other is
the computation, which runs at each
individual cluster in parallel with the
others
• The Hadoop system uses the Big Data
programming and storage models
Hadoop
• Jobs or tasks are assigned and
scheduled on the same servers that hold
the data
• The system provides faster results
from Big Data, including unstructured
data
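Scheduling a task on a server that already holds its data can be sketched as below. This is an illustrative toy and not the Hadoop scheduler: `schedule`, the block identifiers and the node names are all hypothetical, and a node is assumed to be able to run more than one task.

```python
def schedule(tasks, block_locations, free_nodes):
    """Assign each task to a free node, preferring one that stores its block.

    Returns {task: (node, is_local)}.
    """
    assignment = {}
    for task, block in tasks.items():
        # Free nodes that already store the block the task needs.
        local = [n for n in block_locations[block] if n in free_nodes]
        # Prefer a local node; otherwise fall back to any free node,
        # which means the block must travel over the network.
        node = local[0] if local else sorted(free_nodes)[0]
        assignment[task] = (node, bool(local))
    return assignment

block_locations = {"blk_1": ["node1", "node2"], "blk_2": ["node3"]}
tasks = {"task_a": "blk_1", "task_b": "blk_2"}
free_nodes = {"node2", "node4"}

assignment = schedule(tasks, block_locations, free_nodes)
print(assignment)
```

Here `task_a` runs where its block lives, while `task_b` has no free local node and must read its block remotely, which is the slower case that local scheduling tries to avoid.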
Hadoop Infrastructure
• Execution of instructions in two
interrelated entities, such as a query
and the database
• Cloud for clusters
• A cluster consists of sets of computers
or PCs
Hadoop Platform
• Provides a low-cost Big Data
platform, which is open source and
uses cloud services
Hadoop
• Processing terabytes of data takes
just a few minutes
• Hadoop enables distributed processing
of large datasets (above 10 million
bytes) across clusters of computers
using a programming model called
MapReduce
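The MapReduce programming model named above can be illustrated with a word count written in plain Python. This is a single-process sketch of the idea (map, group by key, reduce), not the Hadoop API; the function names and the two input lines are made up for the example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

lines = ["big data needs Hadoop", "Hadoop stores big data"]

# In a cluster, each line (or block of lines) would be mapped on a different node.
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call depends only on its own input line, the map phase can run in parallel on the nodes that hold the data; only the grouped intermediate pairs need to move between nodes.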
Hadoop System Characteristics
• Scalable
• Self-manageable
• Self-healing
• Distributed file system
Scalability
• Means the system can be scaled up
(enhanced) by adding storage and
processing units as per the requirements
Self Manageability
• Means the creation of storage and
processing resources that are used,
scheduled, and reduced or increased
by the system itself
Self Healing
• Means faults are taken care of by the
system itself
• Keeps the system functioning and its
resources available
• The software detects and handles
failures at the task level, and also
enables task execution to continue
after a communication failure
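Task-level failure handling of the kind described above can be sketched as a retry loop that moves a failed task to another node. This is a toy illustration, not Hadoop's mechanism: `run_with_retry`, `flaky_task`, the node names and the limit of three attempts are all assumptions made for the example.

```python
def run_with_retry(task, nodes, max_attempts=3):
    """Try the task on successive nodes until one attempt succeeds.

    Returns (result, number_of_attempts_used).
    """
    errors = []
    for attempt, node in enumerate(nodes[:max_attempts], start=1):
        try:
            return task(node), attempt
        except ConnectionError as exc:
            # Record the failure and move on to the next node.
            errors.append((node, str(exc)))
    raise RuntimeError(f"task failed on all attempted nodes: {errors}")

def flaky_task(node):
    """Hypothetical task that cannot reach node1."""
    if node == "node1":
        raise ConnectionError("node1 unreachable")
    return f"done on {node}"

result, attempts = run_with_retry(flaky_task, ["node1", "node2", "node3"])
print(result, attempts)
```

The caller never sees the failure on node1; the task is simply re-run elsewhere, which is the sense in which the system heals itself.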
Hadoop Hardware Need
• The hardware scales up from a single
server to thousands of machines that
make up the clusters
• Each cluster stores a large number of
data blocks in racks. The default data
block size is 64 MB (Hadoop 1.x; Hadoop
2.x and later default to 128 MB).
• IBM BigInsights, built on Hadoop,
deploys a default block size of 128 MB.
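The effect of the block size can be checked with a little arithmetic. The sketch below counts the blocks needed for a file at the two block sizes mentioned above; the file sizes (1 GB and 200 MB) and the helper name `blocks_needed` are chosen for illustration only.

```python
MB = 1024 * 1024

def blocks_needed(file_size_bytes, block_size_bytes):
    """Number of blocks for a file; the last block may be partly filled."""
    # Integer ceiling division avoids floating-point rounding.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes

one_gb = 1024 * MB
blocks_64 = blocks_needed(one_gb, 64 * MB)    # 1024 / 64
blocks_128 = blocks_needed(one_gb, 128 * MB)  # 1024 / 128

# A 200 MB file at 64 MB per block: three full blocks plus one partial block.
blocks_200mb = blocks_needed(200 * MB, 64 * MB)
last_block_mb = 200 - 3 * 64

print(blocks_64, blocks_128, blocks_200mb, last_block_mb)
```

Doubling the block size halves the number of blocks to track for the same file, at the cost of coarser units of parallel work.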
Big Data analytics applications
• Software applications that leverage
large-scale data
• The applications analyze Big Data
using massive parallel processing
frameworks
• Hadoop provides that framework
Hadoop Framework
• Provides the features of a
distributed, flexible, scalable and
fault-tolerant computing system with
high computing power
• Provides an efficient platform for the
distributed storage and processing of
large amounts of data
Hadoop Big Data storage and
cluster computing
• Manages large-sized structured and
unstructured data alike, in different
formats such as XML, JSON and text,
efficiently and effectively
• Performs better with clusters of many
servers when the focus is on
horizontal scalability
Figure 2.1 Core components of Hadoop
Hadoop
• Open source framework
• Java and Linux based: Hadoop uses Java
interfaces
• The base is Linux, but Hadoop supports
its own set of shell commands
Figure 2.2 Hadoop main components and
ecosystem components
Summary
We learnt
• The Hadoop distributed model, with
pieces of code as well as the data at
the computing nodes, which requires no
sharing between data nodes
• Hadoop distributes the multiple tasks
of an application, which run on the
associated machines and execute at the
same time, in parallel
Summary
We learnt
• Partitionability
• Replication of Data
• Java, Linux based, Hadoop Shell
Command Codes
• Hadoop Core Components and
Ecosystem Tools
End of Lesson 1 on
Hadoop