Hadoop Installation

This installation guide provides detailed steps for setting up Apache Hadoop on a Windows system, starting with the installation of Java 8. It includes instructions for downloading, extracting, and configuring Hadoop version 3.3.6, as well as setting necessary environment variables and configuring XML files. The guide concludes with steps to format the NameNode, start Hadoop daemons, and access the web interfaces for monitoring.

Uploaded by

Jeya

Installation Guide

This guide provides a comprehensive walkthrough for installing and setting up Apache Hadoop
on a Windows system. Hadoop is an open-source framework designed for distributed storage and
processing of large datasets using clusters of computers.

1. Install Java

Before installing Hadoop, it is essential to install Java, as Hadoop runs on the Java platform. We
will be using Java 8, which is widely supported and stable for Hadoop.

1.1 Download Java:

I. Visit the official Oracle Java download page:
   https://www.oracle.com/java/technologies/downloads/#java8

II.​ Scroll to the Java SE Development Kit 8 section and click the link to download the
Windows x64 Installer.​

III.​ Oracle requires you to sign in:​

A.​ If you already have an Oracle account, log in.​

B.​ If not, create a new account — it’s free.


Fig 1.1 Installation of Java​

1.2 Install and Organize Java:

1)​ Create a folder named java directly in your C drive:​


C:\java​

2)​ During installation, change the destination path to the folder you created:​

a)​ Set the path to: C:\java​

3)​ After installation:​

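a) If Java was installed in C:\Program Files\Java, cut the entire JDK folder (e.g.,
jdk1.8.0_351) and paste it inside your C:\java folder. A JDK path without spaces also
avoids a commonly reported failure in Hadoop's Windows scripts when JAVA_HOME contains a
space.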
​ ​ ​ ​ Fig 1.2 Sample setup

1.3 Set Environment Variables:

1.​ Open the Start menu and search for Environment Variables.​

2.​ Click “Edit the system environment variables”, then click the Environment Variables
button.​

3. Under User Variables:

   ○ Click New and enter the following:

      ■ Variable Name: JAVA_HOME

      ■ Variable Value: C:\java\jdk1.8.0_351 (adjust to match your JDK folder path)

Fig 1.3 Setting of java path in user variable

4. Under System Variables, select the Path variable and click Edit:

   ○ Click New and paste:

      C:\java\jdk1.8.0_351\bin (adjust to match your JDK folder path)

Fig 1.4 System variable path

5. Click OK to close all dialogs and save changes.

Verify Java Installation

1. Open a new Command Prompt (a window opened before the variables were set will not see
   them) and type the following command: java -version
2. You should see the installed Java version displayed.
   Example output:
   java version "1.8.0_351"
3. Java is now successfully installed and configured.
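The version check above can also be scripted. The following is a minimal Python sketch, not part of the original setup; the function name parse_java_major is hypothetical. It assumes the output format shown in the example, where Java 8 reports itself as "1.8.0_351" and newer releases report strings such as "17.0.2".

```python
import re


def parse_java_major(version_output: str) -> int:
    """Return the major Java version found in `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found in output")
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier use the legacy "1.x" scheme, e.g. "1.8.0_351".
    if first == "1" and second is not None:
        return int(second)
    # Java 9 and later put the major version first, e.g. "17.0.2".
    return int(first)
```

Note that java -version writes to standard error rather than standard output, so a script that captures the text (for example with subprocess.run) should read the stderr stream before passing it to this function.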

2. Install Hadoop (Version 3.3.6)

Now that Java is installed, let's move on to installing Apache Hadoop. In this guide, we'll be
using Hadoop version 3.3.6, which is the latest stable release at the time of writing.

2.1 Download Hadoop:

1. Visit the official Hadoop release page:
   https://hadoop.apache.org/release/3.3.6.html

2. On the right-hand side, click the “Download tar.gz” button to download the Hadoop
compressed file.

Fig 2.1 Installation of Hadoop

2.2 Extract and Organize

1.​ After the download is complete, extract the entire tar.gz file using tools like WinRAR or
7-Zip.​

2.​ Create a new folder in the C drive and name it something like:​
C:\Hadoop_test​

3.​ Cut and paste the extracted Hadoop folder (e.g., hadoop-3.3.6) into this new
Hadoop_test folder:​
Final path should be: C:\Hadoop_test\hadoop-3.3.6
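If you prefer a script to WinRAR or 7-Zip, the extraction step can be sketched with Python's standard tarfile module. This is an alternative to the tools named above, not the method this guide uses; the function name extract_hadoop is hypothetical, and the sketch assumes the archive has a single top-level folder such as hadoop-3.3.6.

```python
import tarfile
from pathlib import Path


def extract_hadoop(archive: Path, dest: Path) -> Path:
    """Extract a .tar.gz archive into dest and return the top-level folder."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Assumption: the first member name starts with the top-level folder.
        top_level = tar.getnames()[0].split("/")[0]
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support safer extraction filters.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest / top_level
```

Calling extract_hadoop(Path("hadoop-3.3.6.tar.gz"), Path("C:/Hadoop_test")) would then return the folder that this guide refers to as C:\Hadoop_test\hadoop-3.3.6, provided the archive sits in the current directory.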

​ ​ ​ ​ Fig 2.2 Hadoop folder setup

2.3 Set Hadoop Environment Variable

Just like we did for Java, now it's time to configure the environment variables for Hadoop.

1.​ Open the Start menu and search for Environment Variables.​

2.​ Click “Edit the system environment variables”, then click the Environment Variables
button.​

3.​ Under User Variables:​

○​ Click New and enter the following:​

■​ Variable Name: HADOOP_HOME​

■​ Variable Value: C:\Hadoop_test\hadoop-3.3.6​


(Make sure the path matches your actual folder)​
4. Click OK to save the new variable.

Fig 2.3 Creating variable and value in user variable

5. Under System Variables:

   ○ Select the Path variable and click Edit.

      ■ Confirm that the Java bin path added in step 1.3 (e.g., C:\java\jdk1.8.0_351\bin) is
        listed; if it is not, click New and paste it.

6. Click OK to save.

Fig 2.4 Path in System variable


3. Configure Hadoop

After installing Hadoop and setting the HADOOP_HOME environment variable, you
now need to configure Hadoop by editing some XML configuration files and preparing system
folders.

3.1 Update Java Path in Hadoop Configuration

1.​ Go to:​
C:\Hadoop_test\hadoop-3.3.6\etc\hadoop
2. Open the file hadoop-env.cmd in a text editor (Right-click → Edit).
3. Find the line that sets JAVA_HOME and change it to your actual Java path. Example:
   set JAVA_HOME=C:\java\jdk1.8.0_351
   (replace the path with your own JDK folder)
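The same edit can be sketched in Python for anyone who wants to script it. This is a minimal illustration under stated assumptions: the function name set_java_home is hypothetical, the file is assumed to contain at most one relevant "set JAVA_HOME=" line, and the check for spaces reflects the commonly reported problem with paths such as C:\Program Files rather than a documented rule.

```python
import re

# Matches a line such as "set JAVA_HOME=%JAVA_HOME%" without consuming the line ending.
_JAVA_HOME_LINE = re.compile(
    r"^[ \t]*set[ \t]+JAVA_HOME=[^\r\n]*", re.IGNORECASE | re.MULTILINE
)


def set_java_home(cmd_text: str, jdk_path: str) -> str:
    """Return hadoop-env.cmd text whose JAVA_HOME line points at jdk_path."""
    if " " in jdk_path:
        # Assumption: spaces in JAVA_HOME tend to break the Windows scripts.
        raise ValueError("JDK path contains spaces: " + jdk_path)
    new_line = "set JAVA_HOME=" + jdk_path
    if _JAVA_HOME_LINE.search(cmd_text):
        # A function replacement keeps backslashes in the path literal.
        return _JAVA_HOME_LINE.sub(lambda _match: new_line, cmd_text, count=1)
    # No existing line: append one at the end of the file.
    return cmd_text.rstrip("\n") + "\n" + new_line + "\n"
```

To apply it, read C:\Hadoop_test\hadoop-3.3.6\etc\hadoop\hadoop-env.cmd as text, pass the contents through the function, and write the result back.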

​ ​ Fig 3.1 Java path in hadoop-env.cmd

3.2 Set Hadoop Environment Variables

1. Open Environment Variables (Search → “Edit the system environment variables”).
2. Under User Variables:

   ○ If HADOOP_HOME was not already created in section 2.3, click New → Name: HADOOP_HOME,
     Value: C:\Hadoop_test\hadoop-3.3.6

3. Under System Variables:

   ○ Select Path → Click Edit → Click New, and add each of the following as a separate entry:

      ■ C:\Hadoop_test\hadoop-3.3.6\bin

      ■ C:\Hadoop_test\hadoop-3.3.6\sbin
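A quick way to confirm these settings is to inspect the environment from a script. The sketch below is illustrative only: the function name missing_path_entries is hypothetical, and it takes the environment as a plain dictionary so that it can be tried on any machine. It uses Python's ntpath module, which applies Windows path rules regardless of the operating system running the script.

```python
import ntpath


def _normalize(path: str) -> str:
    """Lower-case a Windows path and drop trailing separators for comparison."""
    return ntpath.normcase(ntpath.normpath(path.strip()))


def missing_path_entries(env: dict) -> list:
    """Return the Hadoop folders (bin, sbin) that are not listed on Path."""
    # Windows variable names are case-insensitive, so compare in upper case.
    upper = {key.upper(): value for key, value in env.items()}
    hadoop_home = upper.get("HADOOP_HOME")
    if not hadoop_home:
        raise KeyError("HADOOP_HOME is not set")
    on_path = {
        _normalize(entry) for entry in upper.get("PATH", "").split(";") if entry.strip()
    }
    wanted = [ntpath.join(hadoop_home, sub) for sub in ("bin", "sbin")]
    return [folder for folder in wanted if _normalize(folder) not in on_path]
```

On the machine being configured, calling missing_path_entries(dict(os.environ)) from a newly opened Command Prompt should return an empty list once both entries have been added.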

3.3 Create Data Directories

Create folders to store NameNode and DataNode data:

1.​ Navigate to:​


C:\Hadoop_test\hadoop-3.3.6​

2.​ Create a new folder: data​

3.​ Inside data, create two folders:​

○​ namenode​

○​ datanode​

So the full paths will be:

●​ C:\Hadoop_test\hadoop-3.3.6\data\namenode​
●​ C:\Hadoop_test\hadoop-3.3.6\data\datanode
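The folder creation can also be done in one step from a script. A minimal Python sketch follows; the function name create_data_dirs is hypothetical, and the folder names match the ones used in this guide.

```python
from pathlib import Path


def create_data_dirs(hadoop_home: Path) -> dict:
    """Create data/namenode and data/datanode under hadoop_home."""
    dirs = {name: hadoop_home / "data" / name for name in ("namenode", "datanode")}
    for directory in dirs.values():
        # exist_ok lets the function be re-run without raising an error.
        directory.mkdir(parents=True, exist_ok=True)
    return dirs
```

For the layout used here, the call would be create_data_dirs(Path("C:/Hadoop_test/hadoop-3.3.6")).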

​ ​ Fig 3.2 Creation of datanode and namenode folder

3.4 Configure Hadoop XML Files

Go to C:\Hadoop_test\hadoop-3.3.6\etc\hadoop and update the following files:

3.4.1. core-site.xml

Replace the content with:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
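To double-check what a Hadoop configuration file actually contains, it can be read back with Python's standard XML parser. This is a minimal sketch, not a Hadoop API: the function names read_hadoop_config and namenode_address are hypothetical, and the sketch only understands the simple name/value layout shown above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def read_hadoop_config(xml_text: str) -> dict:
    """Return the name/value pairs of a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("root element must be <configuration>")
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = value.strip()
    return props


def namenode_address(props: dict) -> tuple:
    """Split fs.defaultFS (e.g. hdfs://localhost:9000) into host and port."""
    parsed = urlparse(props["fs.defaultFS"])
    return parsed.hostname, parsed.port
```

For the core-site.xml shown above, namenode_address would yield the host localhost and the port 9000, which is the address HDFS clients use to reach the NameNode.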

3.4.2. hdfs-site.xml

Replace with:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:/Hadoop_test/hadoop-3.3.6/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:/Hadoop_test/hadoop-3.3.6/data/datanode</value>
</property>
</configuration>

In the two <value> elements, give the paths of your own namenode and datanode folders. Do
not paste any notes into the file itself.

Fig 3.3 hdfs-site.xml file
Repeating these changes in httpfs-site.xml is optional: that file configures the separate
HttpFS gateway, and HDFS reads the settings above from hdfs-site.xml.
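Because the two directory values differ on every machine, it can help to generate hdfs-site.xml instead of editing it by hand. The sketch below is one possible approach, with a hypothetical function name build_hdfs_site; it converts Windows backslashes to the forward slashes used in the example values above.

```python
import xml.etree.ElementTree as ET


def build_hdfs_site(namenode_dir: str, datanode_dir: str, replication: int = 1) -> str:
    """Return hdfs-site.xml content for a single-node setup."""
    settings = {
        "dfs.replication": str(replication),
        "dfs.namenode.name.dir": namenode_dir.replace("\\", "/"),
        "dfs.datanode.data.dir": datanode_dir.replace("\\", "/"),
    }
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):
        # Pretty-printing is available from Python 3.9 onwards.
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")
```

The returned string can be written to C:\Hadoop_test\hadoop-3.3.6\etc\hadoop\hdfs-site.xml, replacing the existing content.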

3.4.3. mapred-site.xml

If the file doesn’t exist, copy mapred-site.xml.template and rename it to mapred-site.xml.

Then paste:

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

3.4.4. yarn-site.xml

Paste the following:

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
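The property names and the class name in yarn-site.xml are case-sensitive and easy to mistype. The following Python sketch checks a dictionary of properties for the shuffle settings; the function name check_shuffle_config is hypothetical, and the expected key yarn.nodemanager.aux-services.mapreduce_shuffle.class and class org.apache.hadoop.mapred.ShuffleHandler are the standard Hadoop names.

```python
SHUFFLE_SERVICE = "mapreduce_shuffle"
SHUFFLE_CLASS_KEY = "yarn.nodemanager.aux-services.mapreduce_shuffle.class"
SHUFFLE_CLASS = "org.apache.hadoop.mapred.ShuffleHandler"


def check_shuffle_config(props: dict) -> list:
    """Return a list of problems found in the YARN shuffle settings."""
    problems = []
    services = [
        item.strip()
        for item in props.get("yarn.nodemanager.aux-services", "").split(",")
        if item.strip()
    ]
    if SHUFFLE_SERVICE not in services:
        problems.append("aux-services does not list " + SHUFFLE_SERVICE)
    actual = props.get(SHUFFLE_CLASS_KEY)
    if actual is None:
        problems.append("missing property " + SHUFFLE_CLASS_KEY)
    elif actual != SHUFFLE_CLASS:
        problems.append("unexpected class name " + actual)
    return problems
```

An empty list means both settings look right; any other result names the property to correct.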

3.5 Fix Missing bin Files (winutils.exe)

The default Hadoop distribution does not include the necessary Windows binaries.
Follow these steps:

1.​ Delete the existing bin folder inside C:\Hadoop_test\hadoop-3.3.6.​

2. Download the fixed bin folder for Windows from the following link:
   https://drive.google.com/file/d/1nCN_jK7EJF2DmPUUxgOggnvJ6k6tksYz/view?pli=1

3. Extract and place the bin folder into C:\Hadoop_test\hadoop-3.3.6.
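After replacing the bin folder, a short check can confirm that the Windows binaries are in place. This Python sketch is illustrative: the function name missing_windows_binaries is hypothetical, and including hadoop.dll in the default list is an assumption (this guide itself only names winutils.exe).

```python
from pathlib import Path


def missing_windows_binaries(hadoop_home: Path,
                             required=("winutils.exe", "hadoop.dll")) -> list:
    """Return the required files that are absent from hadoop_home/bin."""
    bin_dir = hadoop_home / "bin"
    # Assumption: hadoop.dll is expected alongside winutils.exe.
    return [name for name in required if not (bin_dir / name).is_file()]
```

An empty result for Path("C:/Hadoop_test/hadoop-3.3.6") suggests the replacement folder was copied to the right place.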


3.6 Fix winutils.exe Error (msvcr120.dll)

1.​ Try to run winutils.exe from:​


C:\Hadoop_test\hadoop-3.3.6\bin\winutils.exe​

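2. If it shows an error like “msvcr120.dll missing”, the Microsoft Visual C++ runtime is not
   installed. msvcr120.dll ships with the Visual C++ 2013 Redistributable, so installing that
   package from Microsoft (see section 3.7) is the safer fix.

3. Alternatively, download the DLL file from a trusted site and copy it to:

   C:\Windows\System32

4. Re-run winutils.exe; the error should be gone.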

3.7 Install Microsoft C++ Redistributable

1. Visit the Microsoft VC++ download page:

   https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170

2. Download and install the x64 version.

3.8 Format the NameNode

1. Open Command Prompt.

2. Run the following command: hdfs namenode -format
3. You should see a log line reporting that the storage directory “has been successfully
   formatted”.
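Besides reading the console output, the result of formatting can be checked on disk. The sketch below assumes the usual layout in which a formatted NameNode directory contains a current\VERSION file with key=value lines such as clusterID; the function name read_namenode_version is hypothetical.

```python
from pathlib import Path


def read_namenode_version(namenode_dir: Path) -> dict:
    """Return the fields of current/VERSION, or {} if the file is absent."""
    version_file = namenode_dir / "current" / "VERSION"
    if not version_file.is_file():
        return {}
    fields = {}
    for line in version_file.read_text().splitlines():
        line = line.strip()
        # Skip blank lines and the leading comment line.
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields
```

An empty dictionary for C:\Hadoop_test\hadoop-3.3.6\data\namenode suggests the format step did not complete or wrote to a different directory than the one set in hdfs-site.xml.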

3.9 Start Hadoop Daemons

Navigate to:
C:\Hadoop_test\hadoop-3.3.6\sbin

1. Start HDFS: start-dfs.cmd
   This will start the NameNode and DataNode.

2. Start YARN: start-yarn.cmd
   This starts the ResourceManager and NodeManager.
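One way to confirm that all four daemons came up is the jps tool that ships with the JDK, which lists running Java processes as "process-id name" pairs. Using jps is an addition to this guide, and the function name missing_daemons below is hypothetical; the sketch only parses text that has already been captured from jps.

```python
EXPECTED_DAEMONS = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}


def missing_daemons(jps_output: str) -> set:
    """Return the expected Hadoop daemons that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each useful line looks like "<pid> <name>".
        if len(parts) >= 2 and parts[0].isdigit():
            running.add(parts[1])
    return EXPECTED_DAEMONS - running
```

If the returned set is not empty, the console windows opened by start-dfs.cmd and start-yarn.cmd usually show why the named daemon stopped.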

4. Access Web Interfaces

After starting the services, open your browser and check:

● NameNode UI → http://localhost:9870

● ResourceManager UI → http://localhost:8088
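Before opening a browser, the two ports can be probed from a script. This is a minimal Python sketch with hypothetical function names (port_is_open, check_web_uis); it only tests whether something accepts TCP connections on ports 9870 and 8088, not whether the pages render correctly.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_web_uis(host: str = "localhost", ports: dict = None) -> dict:
    """Map each web interface name to whether its port accepts connections."""
    if ports is None:
        ports = {"NameNode UI": 9870, "ResourceManager UI": 8088}
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

A False value for either interface usually means the corresponding daemon is not running yet or has already exited.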

​ ​ ​ ​ ​ Fig 4.1 Localhost:8088



​ ​ ​ Fig 4.2 Localhost:9870
