Hadoop Installation


EXP NO: 1A INSTALLATION OF JAVA

DATE:

AIM:

To set up a Java development environment for working with Hadoop.


STEPS:

1. sudo apt-get update

2. In this step, we install JDK 1.8 on the machine.

The Oracle JDK is the official JDK; however, Oracle no longer provides it as a default
installation for Ubuntu. You can still install it using apt-get.

To install any version, first execute the following commands:

a. sudo apt-get install python-software-properties
(on newer Ubuntu releases, the package that provides add-apt-repository is software-properties-common)

b. sudo add-apt-repository ppa:webupd8team/java

c. sudo apt-get update

Then, depending on the version you want to install, execute one of the following commands:

Oracle JDK 7: sudo apt-get install oracle-java7-installer

Oracle JDK 8: sudo apt-get install oracle-java8-installer

Follow these steps to set up Java on Windows and validate the installation.
Download Java for Windows 10

Download the latest Java Development Kit installation file for Windows 10 to have the latest
features and bug fixes.

1. Using your preferred web browser, navigate to the Oracle Java Downloads page.
2. On the Downloads page, click the x64 Installer download link under
the Windows category. At the time of writing this article, Java version 17 is the latest
long-term support Java version.

Wait for the download to complete.

Install Java on Windows 10

After downloading the installation file, proceed with installing Java on your Windows system.

Follow the steps below:

Step 1: Run the Downloaded File

Double-click the downloaded file to start the installation.

Step 2: Configure the Installation Wizard

After running the installation file, the installation wizard welcome screen appears.

1. Click Next to proceed to the next step.


2. Choose the destination folder for the Java installation files or stick to the default path.
Click Next to proceed.

3. Wait for the wizard to finish the installation process until the Successfully Installed message
appears. Click Close to exit the wizard.

Set Environment Variables for Java

Set Java environment variables to enable program compiling from any directory. To do so,
follow the steps below:

Step 1: Add Java to System Variables

1. Open the Start menu and search for environment variables.

2. Select the Edit the system environment variables result.


3. In the System Properties window, under the Advanced tab, click Environment Variables…

4. Under the System variables category, select the Path variable and click Edit:
5. Click the New button and enter the path to the Java bin directory:

Step 2: Add JAVA_HOME Variable

Some applications require the JAVA_HOME variable. Follow the steps below to create the
variable:

1. In the Environment Variables window, under the System variables category, click
the New… button to create a new variable.
2. Name the variable as JAVA_HOME.

3. In the variable value field, paste the path to your Java jdk directory and click OK.

4. Confirm the changes by clicking OK in the Environment Variables and System
Properties windows.

Test the Java Installation

Run the java -version command in the command prompt to make sure Java is installed correctly.

If it is, the command prints the installed Java version.
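
As a rough illustration of that check, the sketch below pulls the major version number out of a `java -version` banner. It is written for a POSIX shell (Ubuntu, or a Unix-like shell on Windows), and the function name `java_major` is an assumption made here for illustration; it is not part of Java.

```shell
# Hypothetical helper: print the major Java version found in a version banner.
# Reads text such as:  java version "1.8.0_291"   or   openjdk version "17.0.2"
java_major() {
    sed -n 's/.*version "\([0-9][0-9._]*\).*/\1/p' | head -n 1 |
        awk -F'[._]' '{ if ($1 == 1) print $2; else print $1 }'
}

# Usage (java -version writes its banner to stderr, hence 2>&1):
#   java -version 2>&1 | java_major
```

Old-style version strings start with "1." (1.7, 1.8), so the sketch reports the second field for those and the first field otherwise.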

RESULT:

Thus the installation of Java has been executed successfully.


EXP NO: 1B INSTALLATION OF HADOOP

DATE:

AIM:

Downloading and installing Hadoop; understanding the different Hadoop modes, startup
scripts, and configuration files.

PROCEDURE:

Hadoop software can be installed in three modes of operation:

• Stand-Alone Mode: Hadoop is distributed software designed to run on a cluster of
commodity machines. However, we can install it on a single node in stand-alone
mode. In this mode, Hadoop runs as a single monolithic Java process. This
mode is extremely useful for debugging: you can first test-run your MapReduce
application on small data before executing it on a cluster with big data.

• Pseudo-Distributed Mode: In this mode also, Hadoop is installed on a
single node, but the Hadoop daemons run on that machine as separate
Java processes. Hence all the daemons, namely NameNode, DataNode,
SecondaryNameNode, JobTracker, and TaskTracker, run on a single machine.

• Fully Distributed Mode: In this mode, the daemons NameNode,
JobTracker, and SecondaryNameNode (optional; it can run on a separate node) run
on the master node. The daemons DataNode and TaskTracker run on the slave nodes.

Note: JobTracker and TaskTracker are the Hadoop 1.x daemon names. In Hadoop 2.x, the
version installed below, YARN's ResourceManager and NodeManager take over these roles.

Hadoop Installation: Ubuntu Operating System in Stand-Alone Mode

STEPS:

1. Set up a new user account for the Hadoop installation. This step is optional but
recommended, because a separate account keeps the Hadoop installation apart from other
software installations.

• sudo adduser hadoop_dev (Upon executing this command, you will be
prompted to enter a new password for this user. Enter the password
and the other details, and confirm them at the end.)

• su - hadoop_dev (Switches from the current user to the newly
created user, i.e. hadoop_dev.)

2. Download the latest Hadoop distribution.

• Visit the Apache Hadoop download page and choose one of the mirror sites. You can copy
the download link and use wget to download it from the command prompt:

wget http://apache.mirrors.lucidnetworks.net/hadoop/common/hadoop-2.7.0/hadoop-2.7.0.tar.gz

3. Untar the file:

tar xvzf hadoop-2.7.0.tar.gz

4. Rename the folder to hadoop2:

mv hadoop-2.7.0 hadoop2
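
Steps 2 to 4 can be sketched as one small script. The mirror host and the version number are the ones used above; the variable and function names are hypothetical, and the mirror may no longer carry this release, in which case the URL would need adjusting.

```shell
# Sketch of steps 2-4. Names below are illustrative assumptions.
HADOOP_VERSION="2.7.0"
MIRROR="http://apache.mirrors.lucidnetworks.net/hadoop/common"

# Build the tarball URL for a given Hadoop version.
hadoop_tarball_url() {
    printf '%s/hadoop-%s/hadoop-%s.tar.gz\n' "$MIRROR" "$1" "$1"
}

# Download, unpack, and rename the distribution in the current directory.
fetch_hadoop() {
    url=$(hadoop_tarball_url "$HADOOP_VERSION") || return 1
    wget "$url" || return 1                                # step 2: download
    tar xzf "hadoop-$HADOOP_VERSION.tar.gz" || return 1    # step 3: unpack
    mv "hadoop-$HADOOP_VERSION" hadoop2                    # step 4: rename
}
```

Running `fetch_hadoop` from /home/hadoop_dev would leave the installation in /home/hadoop_dev/hadoop2, the path the remaining steps assume.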

5. Edit the configuration file /home/hadoop_dev/hadoop2/etc/hadoop/hadoop-env.sh
and set JAVA_HOME in that file.

• vim /home/hadoop_dev/hadoop2/etc/hadoop/hadoop-env.sh
• Uncomment JAVA_HOME and update it to the following line:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle

(Check the location of your own Java installation and set this value accordingly. Recent
versions of Hadoop require JDK 1.7 or later.)
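
One way to find the value for JAVA_HOME is to start from the java binary on the PATH. The sketch below is an illustration only: the function name `java_home_from_bin` is hypothetical, and it assumes the usual layout in which the binary lives in `<JAVA_HOME>/bin/java` or `<JAVA_HOME>/jre/bin/java`.

```shell
# Hypothetical helper: derive a JAVA_HOME candidate from the path of a java binary,
# e.g. /usr/lib/jvm/java-8-oracle/jre/bin/java -> /usr/lib/jvm/java-8-oracle
java_home_from_bin() {
    home=${1%/bin/java}     # drop the trailing /bin/java
    home=${home%/jre}       # drop a trailing /jre, found in JDK 8 layouts
    printf '%s\n' "$home"
}

# Usage: resolve the symlink chain behind `java` (readlink -f is the GNU form
# available on Ubuntu), then derive the directory:
#   java_home_from_bin "$(readlink -f "$(command -v java)")"
```

The printed directory is only a candidate; confirm that it contains bin/java before writing it into hadoop-env.sh.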

6. Verify that the installation is successful
(first change to the installation directory: cd /home/hadoop_dev/hadoop2/):

• bin/hadoop (running this command should print a list of the available
options)

7. This finishes the Hadoop setup in stand-alone mode.

8. Let us run a sample Hadoop program that is provided in the download
package:

$ mkdir input (create the input directory)

$ cp etc/hadoop/*.xml input (copy all the XML files to the input folder)

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'

(find every string in the input files that matches the pattern 'dfs[a-z.]+' and write
the matches, with their counts, to the output directory)

$ cat output/* (look at the result in the output directory that Hadoop creates
for you)
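
To get a feel for what the sample grep job computes, the same counting can be approximated with ordinary command-line tools. This is a local sketch, not the Hadoop implementation; the function name `count_matches` is hypothetical, and it relies on the `-o` and `-h` options of GNU grep as shipped with Ubuntu.

```shell
# Hypothetical helper: count every regex match in the files of a directory.
#   $1 = directory of input files, $2 = extended regular expression
# Prints "<count><TAB><match>" lines, most frequent match first.
count_matches() {
    grep -ohE -- "$2" "$1"/* |          # one match per line, no file names
        sort | uniq -c |                # group identical matches and count them
        awk '{ print $1 "\t" $2 }' |    # normalise to count<TAB>match
        sort -k1,1nr -k2,2              # highest count first
}

# Usage, from the hadoop2 directory after step 8:
#   count_matches input 'dfs[a-z.]+'
```

Comparing this listing with `cat output/*` is a quick sanity check that the stand-alone job processed the files in the input directory.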

Hadoop Installation: Pseudo-Distributed Mode (Locally)

Steps for Installation

1. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/core-site.xml as below:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

Note: This change sets the NameNode host and port.

2. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/hdfs-site.xml as below:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

Note: This change sets the default replication count for blocks used by HDFS.
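
Both files in steps 1 and 2 hold a single property, so they can also be generated from the shell instead of being edited by hand. The following is a minimal sketch under that assumption; `write_site_xml` is a hypothetical helper, it overwrites the target file, and it does no XML escaping of its arguments.

```shell
# Hypothetical helper: write a one-property Hadoop configuration file.
#   $1 = output file, $2 = property name, $3 = property value
write_site_xml() {
    cat > "$1" <<EOF
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Usage with the values from steps 1 and 2:
#   conf=/home/hadoop_dev/hadoop2/etc/hadoop
#   write_site_xml "$conf/core-site.xml" fs.defaultFS hdfs://localhost:9000
#   write_site_xml "$conf/hdfs-site.xml" dfs.replication 1
```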

3. We need to set up password-less login so that the master can open a password-less
SSH session to start the daemons on all the slaves.

Check whether an SSH server is running on your host:

a. ssh localhost (enter your password; if you are able to log in, the SSH server is
running)

b. If you are unable to log in in step a, install SSH as follows:

sudo apt-get install ssh

c. Set up password-less login as below:

i. ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

ii. cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

4. We can run Hadoop jobs locally or on YARN in this mode. In this section, we
will focus on running the jobs locally.
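
Appending the public key with `cat ... >>` in step 3 adds a duplicate entry each time it is repeated, and the SSH server typically ignores an authorized_keys file whose permissions are too open. The sketch below shows one possible way to make that step repeatable; the helper name `authorize_key_once` is an assumption, not an OpenSSH tool.

```shell
# Hypothetical helper: add a public key to an authorized_keys file only once.
#   $1 = public key file, $2 = authorized_keys file
authorize_key_once() {
    key=$(cat "$1") || return 1
    mkdir -p "$(dirname "$2")" && touch "$2" || return 1
    # -F fixed string, -x whole-line match, -q quiet: append only if absent
    grep -qxF -- "$key" "$2" || printf '%s\n' "$key" >> "$2"
    chmod 600 "$2"      # readable and writable by the owner only
}

# Usage with the key generated in step 3:
#   authorize_key_once ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```

Afterwards, `ssh localhost` should log in without asking for a password.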

5. Format the file system. Formatting the NameNode resets the metadata that describes the
DataNodes. As a result, all the information on the DataNodes is lost, and they can be
reused for new data:

a. bin/hdfs namenode -format

6. Start the daemons:

a. sbin/start-dfs.sh (starts NameNode and DataNode)

You can check whether the NameNode has started successfully by using the following
web interface: http://0.0.0.0:50070 .

If you are unable to see this, check the logs in the /home/hadoop_dev/hadoop2/logs
folder.

7. You can check whether the daemons are running by issuing the jps command.
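
The jps listing can also be checked by a script rather than by eye. As a small sketch, the hypothetical function `missing_daemons` below reads jps-style output (one "<pid> <name>" pair per line) and prints every expected daemon that is not listed.

```shell
# Hypothetical helper: print each expected daemon name missing from jps output.
#   stdin = output of `jps`; arguments = daemon names that should be running
missing_daemons() {
    running=$(awk '{ print $2 }')       # keep only the process names
    for name in "$@"; do
        # -x matches the whole line, so NameNode does not match SecondaryNameNode
        printf '%s\n' "$running" | grep -qx -- "$name" || printf '%s\n' "$name"
    done
}

# Usage after sbin/start-dfs.sh:
#   jps | missing_daemons NameNode DataNode
```

No output means every daemon named on the command line was found.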

8. This finishes the installation of Hadoop in pseudo-distributed mode.

9. Let us run the same example we ran in stand-alone mode:

i) Create a new directory on HDFS:

bin/hdfs dfs -mkdir -p /user/hadoop_dev

Copy the input files for the program to HDFS:

bin/hdfs dfs -put etc/hadoop input

Run the program:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'

ii) View the output on HDFS:

bin/hdfs dfs -cat output/*

10. Stop the daemons when you are done executing the jobs, with the below command:

sbin/stop-dfs.sh
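
The name of the examples jar contains the Hadoop version, so it has to match the release that was actually downloaded. One way to avoid typing the version is to look the jar up, as in this sketch; `find_examples_jar` is a hypothetical helper, and it assumes the share/hadoop/mapreduce layout used in the commands above.

```shell
# Hypothetical helper: print the path of the MapReduce examples jar under a
# Hadoop installation directory ($1), whatever its version number is.
find_examples_jar() {
    for jar in "$1"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
        [ -f "$jar" ] && { printf '%s\n' "$jar"; return 0; }
    done
    return 1    # nothing matched
}

# Usage, from the hadoop2 directory:
#   bin/hadoop jar "$(find_examples_jar .)" grep input output 'dfs[a-z.]+'
```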

Hadoop Installation: Pseudo-Distributed Mode (YARN)

Steps for Installation

1. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/mapred-site.xml as below (if the file
does not exist yet, create it by copying mapred-site.xml.template from the same directory):

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

2. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/yarn-site.xml as below:

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

Note: This configuration tells MapReduce how to do its shuffle. In this case it uses
mapreduce_shuffle.

3. Format the NameNode:

bin/hdfs namenode -format

4. Start the daemons using the command:

sbin/start-yarn.sh

This starts the daemons ResourceManager and NodeManager. The example in step 6 also
reads from and writes to HDFS, so the HDFS daemons must be running as well
(sbin/start-dfs.sh, as in the previous section).

Once this command is run, you can check whether ResourceManager is running by visiting
the following URL in a browser: http://0.0.0.0:8088 . If you are unable to see this, check
the logs in the directory /home/hadoop_dev/hadoop2/logs

5. To check whether the services are running, issue a jps command. The following shows all
the services necessary to run YARN on a single server:

$ jps
15933 Jps
15567 ResourceManager
15785 NodeManager

6. Let us run the same example as we ran before:

i) Create a new directory on HDFS:

bin/hdfs dfs -mkdir -p /user/hadoop_dev

Copy the input files for the program to HDFS:

bin/hdfs dfs -put etc/hadoop input

ii) Run the program:

bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'

iii) View the output on hdfs:

bin/hdfs dfs -cat output/*

7. Stop the daemons when you are done executing the jobs, with the below command:

sbin/stop-yarn.sh

This completes the installation part of Hadoop.

RESULT:

Thus the installation of Hadoop has been executed successfully.
