Hadoop Installation
This guide provides a comprehensive walkthrough for installing and setting up Apache Hadoop
on a Windows system. Hadoop is an open-source framework designed for distributed storage and
processing of large datasets using clusters of computers.
1. Install Java
Before installing Hadoop, it is essential to install Java, as Hadoop runs on the Java platform. We
will be using Java 8, which is widely supported and stable for Hadoop.
II. Scroll to the Java SE Development Kit 8 section and click the link to download the
Windows x64 Installer.
2) During installation, change the destination path to the folder you created (for example, C:\java).
a) If Java was installed in C:\Program Files\Java, cut the entire JDK folder (e.g.,
jdk1.8.0_351) and paste it inside your C:\java folder for consistency.
Fig 1.2 Sample setup
1. Open the Start menu and search for Environment Variables.
2. Click “Edit the system environment variables”, then click the Environment Variables
button.
4. Still in System Variables, select the Path variable and click Edit:
1. Open Command Prompt and type the following command:
java -version
2. You should see the installed Java version displayed.
Example output:
java version "1.8.0_351"
3. Java is now successfully installed and configured.
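The version check above can also be scripted. The following is a minimal Python sketch, assuming the output has the same shape as the example shown in this guide; the function names parse_java_version and is_java_8 are illustrative and not part of any Java or Hadoop tooling.

```python
import re


def parse_java_version(output):
    """Return the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def is_java_8(version):
    """Java 8 reports itself with the legacy 1.8.x numbering."""
    return version is not None and version.startswith("1.8.")


# Sample text copied from the example output in this guide.
sample = 'java version "1.8.0_351"'
version = parse_java_version(sample)
print(version, is_java_8(version))
```

To check a real installation, the text could be captured with subprocess.run(["java", "-version"], capture_output=True, text=True). Note that java -version writes to standard error, so the text is in the stderr field of the result.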
Now that Java is installed, let's move on to installing Apache Hadoop. In this guide, we'll be
using Hadoop version 3.3.6, which is the latest stable release at the time of writing.
2.1 Download Hadoop:
1. Visit the official Hadoop release page:
https://hadoop.apache.org/release/3.3.6.html
2. On the right-hand side, click the “Download tar.gz” button to download the Hadoop
compressed file.
1. After the download is complete, extract the entire tar.gz file using tools like WinRAR or
7-Zip.
2. Create a new folder in the C drive and name it something like:
C:\Hadoop_test
3. Cut and paste the extracted Hadoop folder (e.g., hadoop-3.3.6) into this new
Hadoop_test folder:
Final path should be: C:\Hadoop_test\hadoop-3.3.6
Fig 2.2 Hadoop folder setup
Just like we did for Java, now it's time to configure the environment variables for Hadoop.
1. Open the Start menu and search for Environment Variables.
2. Click “Edit the system environment variables”, then click the Environment Variables
button.
After installing Hadoop and setting the HADOOP_HOME environment variable, you
now need to configure Hadoop by editing some XML configuration files and preparing system
folders.
1. Go to:
C:\Hadoop_test\hadoop-3.3.6\etc\hadoop
2. Open the file hadoop-env.cmd in a text editor (Right-click → Edit).
3. Find the line that sets the JAVA_HOME and change it to your actual Java path. Example:
set JAVA_HOME=C:\java\jdk1.8.0_351
(Replace this with your own JDK path.)
1. Open Environment Variables (Search → “Edit the system environment variables”).
2. Under User Variables:
■ C:\Hadoop_test\hadoop-3.3.6\bin
■ C:\Hadoop_test\hadoop-3.3.6\sbin
○ namenode
○ datanode
● C:\Hadoop_test\hadoop-3.3.6\data\namenode
● C:\Hadoop_test\hadoop-3.3.6\data\datanode
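The two data folders can be created by hand in File Explorer, or with a short script. This is a minimal Python sketch, assuming the folder layout used in this guide; create_data_dirs is a hypothetical helper name, and the demonstration runs in a temporary directory so it does not touch a real installation.

```python
from pathlib import Path
import tempfile


def create_data_dirs(hadoop_home):
    """Create data/namenode and data/datanode under hadoop_home; return the paths."""
    created = []
    for name in ("namenode", "datanode"):
        folder = Path(hadoop_home) / "data" / name
        # parents=True also creates the intermediate "data" folder;
        # exist_ok=True makes the call safe to repeat.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demonstrated in a temporary directory. On a real machine, pass the Hadoop
# folder from this guide (C:\Hadoop_test\hadoop-3.3.6) instead.
with tempfile.TemporaryDirectory() as tmp:
    for folder in create_data_dirs(tmp):
        print(folder.name, folder.is_dir())
```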
3.4.1. core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
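To confirm what this file tells Hadoop, the property can be read back and split into its parts. The sketch below is only an illustration in Python, using the XML from this section as an inline string; read_properties is a hypothetical helper, not a Hadoop API.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Same content as the core-site.xml shown in this section.
CORE_SITE = """<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>"""


def read_properties(xml_text):
    """Map each <name> to its <value> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


props = read_properties(CORE_SITE)
uri = urlparse(props["fs.defaultFS"])
# The scheme selects HDFS; host and port say where the NameNode listens.
print(uri.scheme, uri.hostname, uri.port)
```

The same helper should work on the other *-site.xml files in this guide, since they share the configuration/property/name/value layout.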
3.4.2. hdfs-site.xml
Replace with:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:/Hadoop_test/hadoop-3.3.6/data/namenode</value> <!-- use your own path -->
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:/Hadoop_test/hadoop-3.3.6/data/datanode</value> <!-- use your own path -->
</property>
</configuration>
In each value, give the path to your own namenode and datanode folders.
Fig 3.3 Hdfs-site.xml file
Repeat the same changes in the httpfs-site.xml file.
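Because the paths in hdfs-site.xml depend on where Hadoop was extracted, it can help to generate the file content from a single base folder. The following is a rough Python sketch under that assumption; build_hdfs_site is a hypothetical helper, and the property names are the three used in this section.

```python
import xml.etree.ElementTree as ET
from pathlib import PureWindowsPath


def build_hdfs_site(hadoop_home):
    """Return hdfs-site.xml text whose data paths sit under hadoop_home."""
    data = PureWindowsPath(hadoop_home) / "data"
    settings = {
        "dfs.replication": "1",
        # as_posix() turns backslashes into the forward slashes used in this guide.
        "dfs.namenode.name.dir": (data / "namenode").as_posix(),
        "dfs.datanode.data.dir": (data / "datanode").as_posix(),
    }
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


print(build_hdfs_site(r"C:\Hadoop_test\hadoop-3.3.6"))
```

The printed text can be pasted into hdfs-site.xml in place of the hand-edited version; the paths should match the namenode and datanode folders created earlier.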
3.4.3. mapred-site.xml
Then paste:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
3.4.4. yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
The default Hadoop distribution does not include the necessary Windows binaries.
Follow these steps:
2. Download the fixed bin folder for Windows from the following link:
https://drive.google.com/file/d/1nCN_jK7EJF2DmPUUxgOggnvJ6k6tksYz/view?pli=1
2. If it shows an error like “msvcr120.dll missing”, download the file from a trusted source, such as Microsoft's Visual C++ Redistributable page:
https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170
● NameNode UI → http://localhost:9870
● ResourceManager UI → http://localhost:8088
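Before opening these pages in a browser, a quick check can show whether anything is listening on the two ports. This is a minimal Python sketch, assuming the default ports listed above; is_port_open is an illustrative helper, and a reachable port only shows that some process accepted the connection, not that Hadoop is healthy.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


for name, port in (("NameNode UI", 9870), ("ResourceManager UI", 8088)):
    state = "reachable" if is_port_open("localhost", port) else "not reachable"
    print(f"{name} on port {port}: {state}")
```

If a port is reported as not reachable, the corresponding Hadoop service has probably not started yet or has stopped with an error.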