Install Hadoop

The document provides instructions for installing Java, Hadoop, and Hive on a single node Linux system. It includes steps to install Java and set the JAVA_HOME environment variable. It then describes downloading and configuring Hadoop, including editing configuration files and setting up directories. Steps are provided for running Hadoop daemons and verifying the installation. Finally, it briefly outlines installing Hive and configuring it to work with the Hadoop installation.

Installing the JDK (OpenJDK 8)

sudo apt update


sudo apt install openjdk-8-jdk

Setting the JAVA_HOME Environment Variable:


sudo nano /etc/environment
Add the following lines:
JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
PATH="/usr/lib/jvm/java-8-openjdk-amd64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"

source /etc/environment

echo $JAVA_HOME
java -version
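
Before moving on, a minimal sketch to sanity-check the variable (the helper function here is illustrative, not part of any Hadoop tooling; the path comes from the /etc/environment lines above):

```shell
# Sketch: check that a candidate JAVA_HOME actually contains an
# executable bin/java before Hadoop tries to use it.
check_java_home() {
  [ -n "$1" ] && [ -x "$1/bin/java" ]
}

if check_java_home "$JAVA_HOME"; then
  echo "JAVA_HOME OK: $JAVA_HOME"
else
  echo "JAVA_HOME not usable; re-check /etc/environment"
fi
```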

Install ssh
sudo apt install openssh-server openssh-client
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
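
The start scripts rely on passwordless SSH to localhost, and sshd silently rejects key files with loose permissions. A quick sketch to check what the steps above produced:

```shell
# Sketch: show the permissions on the SSH key files created above.
# sshd ignores authorized_keys that are group/world writable.
for f in ~/.ssh/id_rsa ~/.ssh/authorized_keys; do
  if [ -f "$f" ]; then
    echo "$f -> $(stat -c '%a' "$f")"
  else
    echo "$f -> missing"
  fi
done
# A prompt-free login confirms the setup end to end:
ssh -o BatchMode=yes localhost true 2>/dev/null \
  && echo "passwordless ssh OK" \
  || echo "ssh login not yet working"
```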

Install Hadoop:
wget https://dlcdn.apache.org/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz
tar xzf hadoop-2.10.1.tar.gz
mv hadoop-2.10.1 hadoop

Configure Hadoop in the directory:

cd hadoop/etc/hadoop

nano core-site.xml
----- core-site.xml ---------
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

nano hdfs-site.xml
---- hdfs-site.xml ----
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/ubuntu/hadoop/hdfs/namenode</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/ubuntu/hadoop/hdfs/datanode</value>
</property>
</configuration>
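
The namenode/datanode directories referenced in hdfs-site.xml must exist before HDFS is formatted. A minimal sketch, assuming the home directory layout used in the config above:

```shell
# Sketch: create the local storage directories named in hdfs-site.xml.
# Adjust HDFS_DIR if your hadoop directory lives elsewhere.
HDFS_DIR="$HOME/hadoop/hdfs"
mkdir -p "$HDFS_DIR/namenode" "$HDFS_DIR/datanode"
ls -d "$HDFS_DIR"/*
```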

nano yarn-site.xml
--- yarn-site.xml ----
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

## Create mapred-site.xml from the template


cp mapred-site.xml.template mapred-site.xml
nano mapred-site.xml
----- mapred-site.xml -----
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

---------------
-- Environment Setup --
cp ~/.bashrc ~/.bashrc0
nano ~/.bashrc

Add the following lines to the end of the file:


export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=~/hadoop
export PATH=$PATH:${JAVA_HOME}/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar

source ~/.bashrc
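
After sourcing ~/.bashrc, a quick sketch to confirm the Hadoop directories actually landed on PATH (HADOOP_HOME as exported above):

```shell
# Sketch: verify that $HADOOP_HOME/bin ended up on PATH.
export HADOOP_HOME="$HOME/hadoop"
export PATH="$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "hadoop bin on PATH" ;;
  *)                      echo "hadoop bin missing from PATH" ;;
esac
# → hadoop bin on PATH
```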

nano ~/hadoop/etc/hadoop/hadoop-env.sh
----- hadoop-env.sh ------
Change the JAVA_HOME line to: export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

-------------
- Name Node Setup (format the NameNode; first run only):
hdfs namenode -format
- Start the HDFS daemons:
start-dfs.sh
- Start the YARN daemons:
start-yarn.sh
- Or start everything at once:
start-all.sh

- Check the running daemons: jps
- Accessing Hadoop on Browser
http://localhost:50070/
- Verify All Applications for Cluster
http://localhost:8088/
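
Before opening a browser, a quick port probe can tell whether the daemons are actually listening (ports are the Hadoop 2.x defaults shown above):

```shell
# Sketch: probe the NameNode (50070) and ResourceManager (8088) web ports.
# Uses bash's /dev/tcp virtual device; prints open/closed per port.
for port in 50070 8088; do
  if (exec 3<>"/dev/tcp/localhost/$port") 2>/dev/null; then
    echo "port $port: open"
  else
    echo "port $port: closed"
  fi
done
```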

--- Compile WordCount.java ---


$ hadoop com.sun.tools.javac.Main WordCount.java
$ jar cf wc.jar WordCount*.class
$ hadoop jar wc.jar WordCount /data/input /data/output1
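
As a sanity check of what the job should produce, here is the same word count computed locally with standard tools (a sketch only; the real job reads /data/input from HDFS, and the sample text here is made up):

```shell
# Sketch: the computation WordCount performs, shown locally with awk.
# Split words onto lines, sort, count duplicates, print "word<TAB>count".
printf 'hello world\nhello hadoop\n' |
  tr ' ' '\n' | sort | uniq -c | awk '{print $2"\t"$1}'
# → hadoop  1
#   hello   2
#   world   1
```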

Eclipse:
Add a Hadoop user library: add the jar files from the following directories:

hadoop/share/hadoop/common/
hadoop/share/hadoop/common/lib
hadoop/share/hadoop/hdfs
hadoop/share/hadoop/yarn
hadoop/share/hadoop/mapreduce
Hadoop cluster installation:
Apache Hadoop Installation on Multi Node Tutorial | CloudDuggu
Tutorial Hadoop multi node installation - intellitech.pro

Apache Hive:
Installation and configuration:
>> https://sparkbyexamples.com/apache-hive/apache-hive-installation-on-hadoop/

Then additionally configure hive.server2 and Beeline:


>> http://www.mtitek.com/tutorials/bigdata/hive/install.php
>> When configuring Beeline, Step 17:
${HADOOP_HOME}/etc/hadoop/core-site.xml
<property>
<name>hadoop.proxyuser.ubuntu.groups</name>
<value>*</value>
</property>

<property>
<name>hadoop.proxyuser.ubuntu.hosts</name>
<value>*</value>
</property>

ubuntu/cntt@2021 >> the Ubuntu OS account used to install Hive


>>> Restart HDFS: stop-dfs.sh ; start-dfs.sh

beeline> !connect jdbc:hive2://localhost:10000


>> ubuntu/cntt@2021

>> Run from the command line:


$HIVE_HOME/bin/beeline -u jdbc:hive2://
0: jdbc:hive2://> SET hive.exec.mode.local.auto=true;

Start HiveServer2
$ mkdir ~/hiveserver2log
$ cd ~/hiveserver2log
Any one of the following commands starts HiveServer2 (run only one):
$ nohup hiveserver2 &
$ nohup hive --service hiveserver2 &
$ nohup hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.root.logger=INFO,console &
Then follow the log:
$ tail -f ~/hiveserver2log/nohup.out

HiveServer2 web UI: http://localhost:10002

Start Hive MetaStore


$ mkdir ~/hivemetastorelog
$ cd ~/hivemetastorelog
$ nohup hive --service metastore &
$ tail -f ~/hivemetastorelog/nohup.out

Hive Tutorial:
https://www.guru99.com/hive-tutorials.html

https://sparkbyexamples.com/apache-hive-tutorial/
