Installing Hadoop on Ubuntu
Prerequisites
• Access to a terminal window/command line
• Sudo or root privileges on local/remote machines
At the moment, Apache Hadoop 3.x fully supports Java 8. The OpenJDK 8 package in
Ubuntu contains both the runtime environment and development kit.
The OpenJDK or Oracle Java version can affect how elements of a Hadoop ecosystem
interact. To install a specific Java version, check out our detailed guide on how to install
Java on Ubuntu.
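On a fresh Ubuntu system, OpenJDK 8 can be installed directly from the default repositories:
sudo apt update
sudo apt install openjdk-8-jdk -y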
Once the installation process is complete, verify the current Java version:
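java -version
javac -version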
The output confirms the installed Java version.
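Hadoop uses SSH to communicate with its nodes. If OpenSSH is not yet present, the server and client packages are available in the standard Ubuntu repositories:
sudo apt install openssh-server openssh-client -y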
If you have installed OpenSSH for the first time, use this opportunity to implement these
vital SSH security recommendations.
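Create a dedicated non-root user for the Hadoop environment with the adduser command:
sudo adduser hdoop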
The username, in this example, is hdoop. You are free to use any username and
password you see fit. Switch to the newly created user and enter the corresponding
password:
su - hdoop
The user now needs to be able to SSH to the localhost without being prompted for a
password.
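Generate an SSH key pair and define its storage location; the empty passphrase (-P '') is what keeps the connection password-free:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa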
The system proceeds to generate and save the SSH key pair.
Use the cat command to store the public key as authorized_keys in the .ssh directory:
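cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys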
Set the permissions for your user with the chmod command:
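chmod 0600 ~/.ssh/authorized_keys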
The new user is now able to SSH without needing to enter a password every time. Verify
everything is set up correctly by using the hdoop user to SSH to localhost:
ssh localhost
After an initial prompt, the Hadoop user is now able to establish an SSH connection to
the localhost seamlessly.
The steps outlined in this tutorial use the binary download for Hadoop Version 3.2.1. On the official Apache Hadoop download page, select your preferred option, and you are presented with a mirror link that allows you to download the Hadoop tar package.
Note: It is sound practice to verify Hadoop downloads originating from mirror sites. The
instructions for using GPG or SHA-512 for verification are provided on the official
download page.
Use the provided mirror link and download the Hadoop package with
the wget command:
wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
Once the download is complete, extract the files to initiate the Hadoop installation:
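tar xzf hadoop-3.2.1.tar.gz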
The Hadoop binary files are now located within the hadoop-3.2.1 directory.
This setup, also called pseudo-distributed mode, allows each Hadoop daemon to run as
a single Java process. A Hadoop environment is configured by editing a set of
configuration files:
• bashrc
• hadoop-env.sh
• core-site.xml
• hdfs-site.xml
• mapred-site.xml
• yarn-site.xml
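Start with the .bashrc shell configuration file; open it in a text editor of your choice (nano is used here for illustration):
sudo nano .bashrc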
Define the Hadoop environment variables by adding the following content to the end of
the file:
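The paths below assume the hadoop-3.2.1 directory was extracted into the hdoop home directory; adjust them to match your system:
export HADOOP_HOME=/home/hdoop/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin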
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Once you add the variables, save and exit the .bashrc file.
It is vital to apply the changes to the current running environment by using the following
command:
source ~/.bashrc
When setting up a single node Hadoop cluster, you need to define which Java
implementation is to be utilized. Use the previously created $HADOOP_HOME variable to
access the hadoop-env.sh file:
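sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh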
Uncomment the $JAVA_HOME variable (i.e., remove the # sign) and add the full path to
the OpenJDK installation on your system. If you have installed the same version as
presented in the first part of this tutorial, add the following line:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
The path needs to match the location of the Java installation on your system.
If you need help to locate the correct Java path, run the following command in your
terminal window:
which javac
The resulting output provides the path to the Java binary directory.
Use the provided path to find the OpenJDK directory with the following command:
readlink -f /usr/bin/javac
The section of the path just before /bin/javac needs to be assigned to
the $JAVA_HOME variable.
To set up Hadoop in a pseudo-distributed mode, you need to specify the URL for your
NameNode, and the temporary directory Hadoop uses for the map and reduce process.
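Open the core-site.xml file in a text editor; under the standard layout:
sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml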
Add the following configuration to override the default values for the temporary
directory and add your HDFS URL to replace the default local file system setting:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdoop/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
This example uses values specific to the local system. You should use values that
match your system's requirements. The data needs to be consistent throughout the
configuration process.
Do not forget to create a Linux directory in the location you specified for your temporary
data.
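With the values used above, that would be:
mkdir /home/hdoop/tmpdata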
Use the following command to open the hdfs-site.xml file for editing:
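sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml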
Add the following configuration to the file and, if needed, adjust the NameNode and
DataNode directories to your custom locations:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hdoop/dfsdata/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hdoop/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
If necessary, create the specific directories you defined for the dfs.namenode.name.dir and dfs.datanode.data.dir values.
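Next, open the mapred-site.xml file for editing:
sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml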
Add the following configuration to change the default MapReduce framework name
value to yarn:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
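The yarn-site.xml file defines the settings for YARN's ResourceManager and NodeManagers. Open it the same way:
sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Append the following configuration: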
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
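It is important to format the NameNode before starting the Hadoop services for the first time:
hdfs namenode -format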
The shutdown notification signifies the end of the NameNode format process.
Navigate to the hadoop-3.2.1/sbin directory and execute the following command to start the NameNode and DataNode:
./start-dfs.sh
Once the NameNode, DataNodes, and Secondary NameNode are up and running, start the
YARN resource manager and node managers by typing:
./start-yarn.sh
As with the previous command, the output informs you that the processes are starting.
Type this simple command to check if all the daemons are active and running as Java
processes:
jps
If everything is working as intended, the list includes the NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager processes, along with Jps itself.
Use your preferred browser and navigate to your localhost URL. The default port 9870 gives you access to the NameNode user interface:
http://localhost:9870
The NameNode user interface provides a comprehensive overview of the entire cluster.
The default port 9864 is used to access individual DataNodes directly from your
browser:
http://localhost:9864
Use port 8088 to access the YARN Resource Manager:
http://localhost:8088
The Resource Manager is an invaluable tool that allows you to monitor all running
processes in your Hadoop cluster.
Conclusion
You have successfully installed Hadoop on Ubuntu and deployed it in a pseudo-distributed
mode. A single node Hadoop deployment is an excellent starting point to
explore basic HDFS commands and acquire the experience you need to design a fully
distributed Hadoop cluster.