Course: Big Data Mining
Installation Guide
1. Hadoop
a. Install Java:
i. Update the package index: sudo apt-get update
ii. Install the default JDK from the Ubuntu repositories:
sudo apt-get install default-jdk
iii. Check version: java -version
b. Adding a dedicated Hadoop user
i. Add group: sudo addgroup hadoop
ii. Add user: sudo adduser --ingroup hadoop hduser => enter Y
c. Installing SSH:
i. Install: sudo apt-get install ssh
ii. Check: which ssh
which sshd
d. Create and Set Up SSH Certificates
i. Switch to hduser: su hduser (then generate the keys as sketched below)
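The original steps stop at switching users; a minimal sketch of the usual passwordless-SSH setup that Hadoop needs (assuming the default key path ~/.ssh/id_rsa) is:
ssh-keygen -t rsa -P ""    # generate a key pair with an empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # authorize the key for localhost logins
ssh localhost    # verify the login now works without a password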
e. Install Hadoop 3.3.0
i. Download Hadoop (the prebuilt binary tarball, not the -src source tarball, since the steps below rely on the ready-made bin/, sbin/, and etc/hadoop/ directories):
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
ii. Extract Hadoop: tar xvzf hadoop-3.3.0.tar.gz
iii. Create the installation directory /usr/local/hadoop:
sudo mkdir -p /usr/local/hadoop => enter password
iv. Check whether hduser has sudo rights: sudo -v
v. Add hduser to the sudo group: sudo adduser hduser sudo
vi. Move the extracted files into /usr/local/hadoop (run from inside the extracted hadoop-3.3.0 directory):
cd hadoop-3.3.0
sudo mv * /usr/local/hadoop
sudo chown -R hduser:hadoop /usr/local/hadoop
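A quick optional check that the move succeeded (not in the original steps):
ls /usr/local/hadoop    # should list bin, sbin, etc, share, among others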
vii. Set up the configuration files
1. ~/.bashrc
Open ~/.bashrc: nano ~/.bashrc
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
Save the file, then reload it: source ~/.bashrc
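A quick way to confirm the new variables are active (an optional check, not in the original steps):
echo $HADOOP_INSTALL    # should print /usr/local/hadoop
hadoop version    # should report Hadoop 3.3.0 once $HADOOP_INSTALL/bin is on PATH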
2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Set JAVA_HOME in hadoop-env.sh: nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
Add the above line to the hadoop-env.sh file, then reload it:
source /usr/local/hadoop/etc/hadoop/hadoop-env.sh
3. /usr/local/hadoop/etc/hadoop/core-site.xml:
This file overrides the default settings that Hadoop starts with. First create the temporary directory it will use:
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
Open the file: nano /usr/local/hadoop/etc/hadoop/core-site.xml
and enter the following in between the <configuration></configuration> tag:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
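Note: the /app/hadoop/tmp directory created above is normally wired in through the hadoop.tmp.dir property, which the snippet above does not set. A sketch of the extra property (using the standard Hadoop property name and the directory created earlier) is:
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>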
4. /usr/local/hadoop/etc/hadoop/mapred-site.xml
Check the /usr/local/hadoop/etc/hadoop/ folder. On Hadoop 3.x this file usually already exists as mapred-site.xml; if only mapred-site.xml.template is present, copy it:
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
Then configure it:
Open the file: nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
then add the configuration:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml
The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file needs to be configured for each host in the cluster that is being used. It specifies the directories used for namenode and datanode storage on that host. Before editing this file, we need to create two directories that will hold the namenode and datanode data for this Hadoop installation. This can be done with the following commands:
sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
sudo chown -R hduser:hadoop /usr/local/hadoop_store
Open the file: nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
and add the configuration:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified at create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
</configuration>
viii. Format the New Hadoop Filesystem
hdfs namenode -format
ix. Start Hadoop
Move to the sbin directory: cd /usr/local/hadoop/sbin
Start HDFS: start-dfs.sh
Start YARN: start-yarn.sh
Check the running daemons: jps
Stop: stop-dfs.sh
stop-yarn.sh
Web UIs: http://localhost:9870 (HDFS NameNode)
http://localhost:8088/cluster (YARN ResourceManager)
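As a quick smoke test (not part of the original steps), create a home directory in HDFS and copy a file into it:
hdfs dfs -mkdir -p /user/hduser
hdfs dfs -put /etc/hosts /user/hduser/
hdfs dfs -ls /user/hduser    # the copied file should be listed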
2. Spark
a. Install Java JDK
i. sudo apt update
ii. sudo apt install default-jdk
iii. Check version: java -version
b. Install Scala
sudo apt install scala
Check version: scala -version
c. Install Apache Spark
wget https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
Extract it: tar xvzf spark-3.1.2-bin-hadoop3.2.tgz
Move the extracted directory to /opt/spark: sudo mv spark-3.1.2-bin-hadoop3.2 /opt/spark
Create the environment variables:
nano ~/.bashrc
add configuration:
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
Save the file, then reload it: source ~/.bashrc
Start Apache Spark: start-master.sh
Start spark-shell in terminal: spark-shell
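To confirm Spark runs end to end without writing code (an optional check; run-example ships in the binary distribution's bin/ directory, which is already on PATH), execute the bundled SparkPi example and look for a line like "Pi is roughly 3.14..." in the output:
run-example SparkPi 10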
3. IDE
a. Eclipse: Ubuntu Software => search Eclipse => install
b. Eclipse for Scala: Eclipse => Help => Eclipse Marketplace => search Scala => Go =>
Scala IDE 4.7.x => Install
c. IntelliJ IDEA: Ubuntu Software => search IntelliJ => install
4. Creating a project in Eclipse
MSc. Hồ Ngọc Trung Kiên