BIGDATA AND HADOOP - Unit II

Hadoop is a software framework designed for distributed processing of large datasets, featuring HDFS for file storage and MapReduce for data processing. It offers advantages such as fault tolerance, cost-effective storage, and the ability to handle both structured and unstructured data. The document also outlines the architecture of Hadoop, its components, and various ecosystem tools that enhance its functionality.

Uploaded by

sahilsharma747392

Big Data & Hadoop

UNIT II
• Need of Hadoop
• Data Center vs Hadoop
• Overview of Hadoop Daemons
• Hadoop Cluster and Racks
• Learning Linux required for Hadoop
• Hadoop ecosystem tools overview
• Big data Hadoop opportunities
Introduction

Hadoop is a software framework that is optimized for the distributed processing of very large datasets. Its two main components are the Hadoop Distributed File System (HDFS), which stores files, and MapReduce, which processes the stored data.
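To make the storage/processing split concrete, the classic word-count job can be imitated in plain Python. This is a single-process sketch of the map → shuffle → reduce flow only, under the assumption that the input is a list of text lines; it does not use Hadoop's Java MapReduce API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input line
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort: group all values emitted for the same key, sorted by key
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine all values for one key into a single result
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Run map over every line, group by word, then reduce each group
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    data = ["Hadoop stores data", "MapReduce processes data"]
    print(word_count(data))
    # {'data': 2, 'hadoop': 1, 'mapreduce': 1, 'processes': 1, 'stores': 1}
```

On a real cluster the map calls run in parallel on the nodes holding the input blocks, and the shuffle moves data over the network; here everything happens in one process purely for illustration.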
Advantages of using Hadoop:
1. It stores both structured and unstructured data as-is, in its original form.
2. It is fault tolerant: data is replicated across nodes, so the failure of any node is recovered from automatically.
3. It processes large, complex datasets quickly by moving computation to the nodes where the data is stored.
4. It works in a distributed manner, meaning multiple tasks are executed in parallel at the same time.
5. Hadoop offers a cost-effective data storage solution, since it runs on clusters of commodity machines.
6. Data is reliably stored on a cluster of machines despite machine failures.
Data Center vs Hadoop
A data center is where the data for a particular site is physically stored. Whenever you fire a query or type Facebook.com, the request goes to the data center, and the response packets are then delivered to your system.

Hadoop is a tool, or process, through which we can access data and process it. In other words, it is a mechanism for performing operations on data.
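Hadoop Components & Daemons
• There are 2 layers in Hadoop:
• HDFS layer
• MapReduce layer
• There are 5 daemons (daemons are processes that run in the background) that run in these 2 layers:
a) NameNode – runs on the master node.
b) DataNode – runs on the slave nodes.
c) JobTracker – runs on the master node and schedules MapReduce jobs.
d) TaskTracker – runs on the slave nodes and executes MapReduce tasks.
e) Secondary NameNode – usually runs on a separate machine (other than the master and slave nodes). Despite its name, it is not a hot backup of the NameNode: it periodically merges the NameNode's edit log into the file-system image (checkpointing), which keeps the edit log small and shortens NameNode restarts.
Note: JobTracker and TaskTracker belong to Hadoop 1.x (MapReduce v1). In Hadoop 2.x and later, YARN replaces them with the ResourceManager (master) and NodeManagers (slaves).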
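The Secondary NameNode's real job is checkpointing: it replays the NameNode's edit log onto the last saved file-system image (fsimage) to produce a fresh image. The sketch below is a toy model under assumed data shapes (the namespace as a dict of path → size, edits as (operation, path, size) tuples); real HDFS stores fsimage and edits as binary files and supports many more operations, and `checkpoint` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified model: namespace = path -> size in bytes,
# edit-log entry = (operation, path, size).
FsImage = Dict[str, int]
EditLog = List[Tuple[str, str, int]]


def checkpoint(fsimage: FsImage, edits: EditLog) -> Tuple[FsImage, EditLog]:
    """Replay the edit log on a copy of the fsimage; return (new_fsimage, empty_log)."""
    merged = dict(fsimage)  # work on a copy, leave the original image untouched
    for op, path, size in edits:
        if op == "create":
            merged[path] = size
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    # After a checkpoint the NameNode can start again with an empty edit log
    return merged, []


if __name__ == "__main__":
    image = {"/data/a.txt": 100}
    log = [("create", "/data/b.txt", 50), ("delete", "/data/a.txt", 0)]
    new_image, new_log = checkpoint(image, log)
    print(new_image, new_log)  # {'/data/b.txt': 50} []
```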
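Architecture
NameNode
Functions of the NameNode:
1. Manages the DataNodes.
2. Records the metadata of all the files stored in the cluster (file names, directory structure, and which blocks of each file are held on which DataNodes).
3. Receives periodic heartbeats from the DataNodes to confirm that they are live.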
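The heartbeat check amounts to remembering when each DataNode last reported in and flagging those that have been silent too long. The class name and the 10-second timeout below are illustrative assumptions, not HDFS's actual API or default values; a real NameNode would also start re-replicating the blocks that a dead DataNode held.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Toy model of a NameNode tracking DataNode heartbeats (hypothetical class)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, datanode: str, now: Optional[float] = None) -> None:
        # Record the time of the latest heartbeat received from a DataNode
        self.last_seen[datanode] = time.monotonic() if now is None else now

    def dead_nodes(self, now: Optional[float] = None) -> List[str]:
        # A DataNode counts as dead if it has been silent longer than the timeout
        current = time.monotonic() if now is None else now
        return sorted(
            node for node, seen in self.last_seen.items()
            if current - seen > self.timeout_s
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_s=10.0)
    monitor.heartbeat("DataNode-1", now=0.0)
    monitor.heartbeat("DataNode-2", now=0.0)
    monitor.heartbeat("DataNode-1", now=8.0)  # DataNode-2 stays silent
    print(monitor.dead_nodes(now=12.0))       # ['DataNode-2']
```

Passing `now` explicitly keeps the example deterministic; without it the monitor uses the process's monotonic clock.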
Functions of the DataNodes:
1. Store the actual data.
2. Serve the low-level read and write requests from the file system's clients.
[Figure: one NameNode managing DataNode-1, DataNode-2, DataNode-3, ..., DataNode-N]
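A file stored in HDFS is cut into fixed-size blocks, and each block is kept on several DataNodes. The sketch assumes the 128 MB default block size of Hadoop 2.x and later and a replication factor of 3; the round-robin placement is a deliberate simplification and not HDFS's rack-aware placement policy, and both function names are hypothetical.

```python
from typing import Dict, List

MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 128 * MB) -> List[int]:
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full, rest = divmod(file_size, block_size)
    # Every block is full-size except possibly the last one
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = 3) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct DataNodes, round-robin (simplified)."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    return {
        block: [datanodes[(block + i) % len(datanodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300 * MB)
    print([b // MB for b in blocks])  # [128, 128, 44]
    nodes = ["DataNode-1", "DataNode-2", "DataNode-3", "DataNode-4"]
    print(place_replicas(len(blocks), nodes))
```

Because every block has copies on three different DataNodes, losing any one DataNode still leaves two readable copies of each block.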
Hadoop Cluster and Racks
A rack is a physical collection of nodes in a Hadoop cluster (perhaps 30 to 40 nodes). A large Hadoop cluster consists of many racks. Using this rack information, the NameNode chooses the closest DataNode when performing reads and writes, which maximizes performance and reduces network traffic.
A Hadoop cluster contains multiple racks, and each rack holds many DataNodes. Communication between DataNodes on the same rack is much faster than communication between DataNodes on two different racks.
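The "closest DataNode" idea can be sketched as a distance function over (rack, node) pairs: the same node is cheapest, then the same rack, then another rack. The 0/2/4 distance values, the `Location` type and the function names are assumptions of this simplified model (roughly, counting hops up and down the network tree), not Hadoop's actual topology classes.

```python
from typing import List, Tuple

# Hypothetical topology model: each DataNode is identified by (rack, node)
Location = Tuple[str, str]


def distance(a: Location, b: Location) -> int:
    """Simplified network distance: 0 same node, 2 same rack, 4 different racks."""
    if a == b:
        return 0
    if a[0] == b[0]:
        return 2
    return 4


def closest_replica(client: Location, replicas: List[Location]) -> Location:
    """Pick the replica with the smallest network distance to the client."""
    # min() keeps the first replica listed when several are equally close
    return min(replicas, key=lambda r: distance(client, r))


if __name__ == "__main__":
    client = ("rack1", "node3")
    replicas = [("rack2", "node7"), ("rack1", "node5"), ("rack3", "node1")]
    print(closest_replica(client, replicas))  # ('rack1', 'node5')
```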
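Learning Linux required for Hadoop
The HDFS shell commands below mirror familiar Linux file commands (ls, cat, mv, rm, mkdir, du, head).
1) Command for uploading a file to HDFS
• hadoop fs -put <localsrc> ... <dst>
This command is used to upload a file from the local file system to HDFS. Multiple files can be uploaded with this command by separating the file names with a space.
2) Command for downloading a file from HDFS
• hadoop fs -get <src> ... <localdst>
This command is used to download a file from HDFS to the local file system. Multiple files can be downloaded with this command by separating the file names with a space.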
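3) Command for viewing the contents of a file
• hadoop fs -cat <path> ...
4) Command for moving files from source to destination
• hadoop fs -mv <src> ... <dst>
5) Command for removing a file or directory in HDFS
• hadoop fs -rm <path> ...
Note: -rm on its own deletes files. To delete a directory together with its contents, use hadoop fs -rm -r; to delete an empty directory, use hadoop fs -rmdir.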
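6) Command for copying files from the local file system to HDFS
• hadoop fs -copyFromLocal <localsrc> ... <dst>
(Similar to -put, except that the source must be on the local file system.)
7) Command to display the size of a file or directory
• hadoop fs -du <path>
8) Command to view the contents of a directory
• hadoop fs -ls <path>
9) Command to create a directory in HDFS
• hadoop fs -mkdir <path> (add -p to also create missing parent directories)
10) Command to display the beginning of a file
• hadoop fs -head <path> (prints the first kilobyte of the file)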
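A typical session chains several of these commands (create a directory, upload, list). The helper below is a minimal sketch that builds the `hadoop fs` argument lists and, by default, only prints them (dry run); it assumes the `hadoop` binary is on PATH when `dry_run=False`, and the helper names and HDFS paths are hypothetical.

```python
import shlex
import subprocess
from typing import List


def hdfs_command(action: str, *args: str) -> List[str]:
    """Build the argument list for `hadoop fs -<action> <args...>`."""
    return ["hadoop", "fs", f"-{action}", *args]


def run_hdfs(action: str, *args: str, dry_run: bool = True) -> str:
    """Return the command line (dry run) or run it and return its stdout."""
    cmd = hdfs_command(action, *args)
    if dry_run:
        return shlex.join(cmd)
    # Assumes the `hadoop` executable is installed and on PATH
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical paths, printed rather than executed
    print(run_hdfs("mkdir", "-p", "/user/student/input"))
    print(run_hdfs("put", "notes.txt", "/user/student/input"))
    print(run_hdfs("ls", "/user/student/input"))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.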
Hadoop ecosystem tools overview
[Figure: HDFS and MapReduce at the core, surrounded by ecosystem tools: Flume & Sqoop, Pig, Hive, Mahout, Custom MR, Impala, HBase, Spark, Storm, ZooKeeper, Oozie and Ambari]
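• 1. Flume & Sqoop (data ingestion applications): Flume is used for log collection; Sqoop transfers data between SQL (relational) databases and Hadoop.
• 2. Pig: data processing/analysis tool with its own programming language (Pig Latin).
• 3. Hive: provides SQL-like query functionality (HiveQL) over data in Hadoop.
• 4. Mahout: a machine learning library that provides ML algorithms.
• 5. Custom MR: custom MapReduce programs written in Java, etc.
• Disadvantages of MapReduce
a. Slow, because intermediate results are written to disk
b. Batch processing only (not suited to real-time data)

Alternatives to MapReduce
6. Impala: SQL-like interface similar to Hive, but it does not use MapReduce; instead it has its own engine for accessing data in the cluster.
7. HBase: a NoSQL database, i.e. data is stored as key-value pairs.
8. Spark & Storm: process real-time (streaming) data.
9. ZooKeeper: used for coordination and management of distributed services.
10. Oozie: a workflow scheduler for Hadoop jobs.
11. Ambari: web-based GUI for provisioning, managing, and monitoring Hadoop clusters.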