Hadoop

An open-source framework that allows distributed processing of large data sets across clusters of commodity hardware.

It provides a scalable, fault-tolerant, and cost-effective solution for handling big data.

The key components of the Hadoop framework are the Hadoop Distributed File System (HDFS), Hadoop YARN, and MapReduce.
Key features of Hadoop:
 Distributed processing
 Fault tolerance
 Easy to use
 Economic
 Scalability
 Open source

Nodes: Master Node and Slave (Worker) Node


 The NameNode manages the distributed file system and knows where data blocks are stored inside the cluster.

 The ResourceManager manages the YARN jobs and takes care of scheduling and executing processes on the worker nodes.

 The DataNode manages the physical data stored on the node.

 The NodeManager manages the execution of tasks on the node.


[Diagram: a job is divided into sub-works that are distributed across the nodes of the cluster]


1. Prerequisites
2. Hadoop Installation
3. SSH Configuration
4. Configuration
5. Hadoop Daemons Setup
6. Verify the Cluster Setup
7. Scaling the Cluster
 GNU/Linux based operating system

 Java installation

 Hardware requirements such as RAM, hard disk drives for data storage, and processors
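Quick checks for these prerequisites on each node, a minimal sketch (exact package names and outputs vary by distribution):

# Check the Java installation (Hadoop 3.1.x runs on Java 8)
java -version

# Check available memory, disk space, and CPU cores
free -h
df -h
nproc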
wget http://apache.cs.utah.edu/hadoop/common/current/hadoop-3.1.2.tar.gz

tar -xzf hadoop-3.1.2.tar.gz

mv hadoop-3.1.2 hadoop
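Optionally, environment variables are commonly added to ~/.bashrc so that the hdfs, yarn, and start-*.sh commands used later resolve on the PATH; the path below assumes the hadoop directory created above lives under /home/hadoop:

# Point HADOOP_HOME at the unpacked installation and expose its scripts
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin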
Enable passwordless SSH access between all machines in the cluster to facilitate communication and remote execution.

Generate an SSH key pair on the machine designated as the master node:
• ssh-keygen -t rsa (press Enter 4 times; do this on all the nodes)
• id_rsa.pub (in the .ssh directory; copy all the public keys into a new file named authorized_keys)

Test SSH connectivity by logging into each machine from the master node using SSH, without requiring a password.
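One common way to carry out these steps, a sketch assuming a hadoop user and worker hostnames node1 and node2 (substitute your own):

# On the master node: generate the key pair (accept the defaults)
ssh-keygen -t rsa

# Authorize the key locally and copy it to each worker
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh-copy-id hadoop@node1
ssh-copy-id hadoop@node2

# Verify passwordless login from the master
ssh hadoop@node1 hostname
ssh hadoop@node2 hostname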
Configure the Hadoop files on both master and slave nodes for clustering
 “etc/hadoop” (the Hadoop configuration files are located here)
 Core Hadoop configuration files
▪ hadoop-env.sh (set the JAVA_HOME path)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre

▪ core-site.xml [set the default file system to HDFS (fs.defaultFS) and the path to the tmp folder (hadoop.tmp.dir)]
<property>
<name>fs.defaultFS</name>
<value>hdfs://master_node_IP:9000</value>
</property>
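The hadoop.tmp.dir property mentioned above can be set in the same file; a sketch assuming /home/hadoop/hadooptmp as the temporary directory (any local path writable by the Hadoop user works):
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadooptmp</value>
</property>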
▪ hdfs-site.xml (set the HDFS replication factor and the data directories)
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/path/to/data/dir1,/path/to/data/dir2</value>
</property>
On the master node, also set the NameNode metadata directory (dfs.namenode.name.dir).
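A sketch of that master-node property, assuming /home/hadoop/namenode as the NameNode metadata directory:
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/namenode</value>
</property>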

▪ yarn-site.xml (set the ResourceManager properties)
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master_node_IP</value>
</property>

▪ mapred-site.xml (set the MapReduce framework to YARN)
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
 slave / worker nodes
Add the list of all worker nodes in the cluster to /hadoop/etc/hadoop/workers:
Node1
Node2
(In Hadoop 3.x this file is named workers; a separate slaves file is not required.)
 /home/hadoop/hadoop/etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1536</value>
</property>

<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>1536</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>128</value>
</property>

<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
 /hadoop/etc/hadoop/mapred-site.xml
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>512</value>
</property>

<property>
<name>mapreduce.map.memory.mb</name>
<value>256</value>
</property>

<property>
<name>mapreduce.reduce.memory.mb</name>
<value>256</value>
</property>

Summary of memory settings (values in MB):
yarn.nodemanager.resource.memory-mb      1536
yarn.scheduler.maximum-allocation-mb     1536
yarn.scheduler.minimum-allocation-mb     128
yarn.app.mapreduce.am.resource.mb        512
mapreduce.map.memory.mb                  256
mapreduce.reduce.memory.mb               256
 bin/hdfs namenode -format (format the HDFS file system before first use)

 On the master node, start the NameNode daemon and the ResourceManager daemon by running the appropriate scripts (start-dfs.sh and start-yarn.sh, respectively).
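A sketch of this sequence, run from the Hadoop installation directory on the master node (the JobHistoryServer step is optional but serves the history UI mentioned in the next section):

# Format the HDFS NameNode (first time only; this erases existing HDFS metadata)
bin/hdfs namenode -format

# Start HDFS daemons (NameNode on the master, DataNodes on the workers)
sbin/start-dfs.sh

# Start YARN daemons (ResourceManager on the master, NodeManagers on the workers)
sbin/start-yarn.sh

# Optionally start the MapReduce JobHistoryServer
bin/mapred --daemon start historyserver

# List the running Java daemons to confirm they started
jps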
Access the Hadoop web interfaces to ensure the daemons are running correctly. The primary interfaces are:
 NameNode UI (HDFS status): http://localhost:9870/
 ResourceManager UI (YARN status): http://localhost:8088/
 JobHistoryServer UI: http://localhost:19888/
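A quick check from the master node that the daemons and UIs respond, assuming the default ports above:

# HTTP status of each web UI (expect 200)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870/
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088/
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:19888/

# HDFS capacity report, including the list of live DataNodes
bin/hdfs dfsadmin -report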

Test HDFS functionality by creating directories, uploading files, and listing or retrieving data from HDFS using the Hadoop command-line interface (CLI).
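A minimal sketch of such a test, assuming a hadoop user and a local file named sample.txt (both are placeholders):

# Create a directory in HDFS and upload a local file
bin/hdfs dfs -mkdir -p /user/hadoop/test
bin/hdfs dfs -put sample.txt /user/hadoop/test/

# List and read the data back
bin/hdfs dfs -ls /user/hadoop/test
bin/hdfs dfs -cat /user/hadoop/test/sample.txt

# Retrieve the file from HDFS to the local file system
bin/hdfs dfs -get /user/hadoop/test/sample.txt /tmp/sample_copy.txt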

Submit sample MapReduce jobs to validate the functionality of the MapReduce framework.
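For example, the pi estimator bundled in the Hadoop examples jar can be submitted; the jar version below matches the hadoop-3.1.2 release installed earlier:

# Estimate pi with 4 map tasks and 100 samples per map
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar pi 4 100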
 To expand the cluster, repeat the steps for setting up
additional worker nodes, ensuring they have the
same Hadoop installation, configuration files, and
SSH access.
 Update the configuration files (workers, yarn-
site.xml) on the master node to include the new
worker nodes.
 Restart the necessary Hadoop daemons (start-dfs.sh
and start-yarn.sh) to incorporate the changes.
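A sketch of bringing one new worker online, assuming it is named node3 and already has the same installation, configuration, and SSH access; starting its daemons directly is an alternative to a full restart with start-dfs.sh and start-yarn.sh:

# On the master: add the new node to the workers file
echo "node3" >> /home/hadoop/hadoop/etc/hadoop/workers

# On the new node: start the DataNode and NodeManager daemons
bin/hdfs --daemon start datanode
bin/yarn --daemon start nodemanager

# On the master: confirm the new node has joined the cluster
bin/hdfs dfsadmin -report
bin/yarn node -list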
