BDAO


Ex No: 1

Date:

1. Downloading and installing Hadoop; understanding different Hadoop modes, startup scripts, and configuration files.

Aim
To download and install Hadoop, and to understand the different Hadoop modes, startup scripts, and configuration files.
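
Hadoop can run in three modes: standalone (local) mode, the default, where everything runs in a single JVM without HDFS; pseudo-distributed mode, where each daemon (NameNode, DataNode, ResourceManager, NodeManager) runs in its own JVM on a single machine; and fully distributed mode, where the daemons are spread across a cluster. The procedure below produces a single-node, pseudo-distributed installation.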

Algorithm: Install and Configure Apache Hadoop on Linux

Step 1: Install Java


1.1. Execute "sudo apt install default-jdk default-jre -y" to install the latest Java version.
1.2. Verify the installed Java version with "java -version."

Step 2: Create Hadoop User and Configure Password-less SSH


2.1. Add a new user "hadoop" with "sudo adduser hadoop."
2.2. Add the "hadoop" user to the sudo group with "sudo usermod -aG sudo hadoop."
2.3. Switch to the created user with "sudo su - hadoop."
2.4. Install OpenSSH server and client with "apt install openssh-server openssh-client -y."
2.5. Generate SSH key pairs with "ssh-keygen -t rsa."
2.6. Append the public key to authorized_keys with "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys" (sudo is unnecessary here; the shell performs the redirection as the current user).
2.7. Set permissions on authorized_keys with "chmod 640 ~/.ssh/authorized_keys."
2.8. Verify password-less SSH with "ssh localhost."

Step 3: Install Apache Hadoop


3.1. Log in as the "hadoop" user with "sudo su - hadoop."
3.2. Download a stable Hadoop release (3.3.1 is used here) from Apache Hadoop's official download page with "wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz."
3.3. Extract the downloaded file with "tar -xvzf hadoop-3.3.1.tar.gz."
3.4. Move the extracted directory to "/usr/local/hadoop" with "sudo mv hadoop-3.3.1
/usr/local/hadoop."
3.5. Create a directory for system logs with "sudo mkdir /usr/local/hadoop/logs."
3.6. Change ownership of the Hadoop directory with "sudo chown -R hadoop:hadoop
/usr/local/hadoop."
Step 4: Configure Hadoop
4.1. Edit "~/.bashrc" to configure Hadoop environment variables with "sudo nano
~/.bashrc."
4.2. Add the specified environment variable lines and save the file.
4.3. Activate the environment variables with "source ~/.bashrc."

Step 5: Configure Java Environment Variables


5.1. Find the Java path with "which javac."
5.2. Find the OpenJDK directory with "readlink -f /usr/bin/javac."
5.3. Edit "hadoop-env.sh" with "sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh."
5.4. Add the specified Java environment variable lines and save the file.
5.5. Download the Javax activation file to Hadoop's lib directory.
5.6. Verify the Hadoop version with "hadoop version."
5.7. Edit "core-site.xml" to specify the NameNode URL.
5.8. Create directories for node metadata and change ownership.
5.9. Edit "hdfs-site.xml" to define node metadata locations.
5.10. Edit "mapred-site.xml" to configure MapReduce settings.
5.11. Edit "yarn-site.xml" for YARN-related settings.
5.12. Log in as the hadoop user.
5.13. Validate the Hadoop configuration and format the HDFS NameNode.

Step 6: Start Hadoop Components


6.1. Start the NameNode and DataNode with "start-dfs.sh."
6.2. Start the YARN resource and node managers with "start-yarn.sh."
6.3. Verify running components with "jps."

Step 7: Access Apache Hadoop Web Interface


7.1. Access the Hadoop NameNode in a web browser at "http://server-IP:9870," where "server-IP" is your server's IP address.
Procedure

1. Install Java

1.1 Install the latest version of Java.

$ sudo apt install default-jdk default-jre -y

1.2 Verify the installed version of Java.

$ java -version
2. Create Hadoop User and Configure Password-less SSH

2.1. Add a new user hadoop.

$ sudo adduser hadoop

2.2 Add the hadoop user to the sudo group.

$ sudo usermod -aG sudo hadoop


2.3 Switch to the created user.
$ sudo su - hadoop

2.4 Install the OpenSSH server and client.

$ sudo apt install openssh-server openssh-client -y

2.5 Generate public and private key pairs.

$ ssh-keygen -t rsa

2.6 Add the generated public key from id_rsa.pub to authorized_keys.


$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

2.7 Change the permissions of the authorized_keys file.

$ chmod 640 ~/.ssh/authorized_keys

2.8 Verify that password-less SSH works (accept the host key fingerprint if prompted on the first connection).

$ ssh localhost
3. Install Apache Hadoop

3.1 Log in as the hadoop user.

$ sudo su - hadoop

3.2 Download a stable version of Hadoop (3.3.1 is used here).


$ wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz

3.3 Extract the downloaded file.

$ tar -xvzf hadoop-3.3.1.tar.gz


3.4 Move the extracted directory to /usr/local/hadoop.

$ sudo mv hadoop-3.3.1 /usr/local/hadoop

3.5 Create a directory to store system logs.

$ sudo mkdir /usr/local/hadoop/logs


3.6 Change the ownership of the hadoop directory.

$ sudo chown -R hadoop:hadoop /usr/local/hadoop

4. Configure Hadoop
4.1 Edit the ~/.bashrc file to configure the Hadoop environment variables.
$ sudo nano ~/.bashrc

Add the following lines to the file. Save and close the file.

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Activate the environment variables.


$ source ~/.bashrc
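
To confirm the variables are active, a quick sanity check (not part of the original procedure) is to inspect them in the current shell:

$ echo $HADOOP_HOME   # should print /usr/local/hadoop
$ which hadoop        # should print /usr/local/hadoop/bin/hadoop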

5. Configure Java Environment Variables


Hadoop's components, such as YARN, HDFS, and MapReduce, all run on the Java virtual machine. To configure them, you need to define Java environment variables in the hadoop-env.sh configuration file.

5.1 Find the Java path.

$ which javac

5.2 Find the OpenJDK directory.

$ readlink -f /usr/bin/javac
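
The readlink output ends in /bin/javac; JAVA_HOME is that path with the trailing /bin/javac removed. For example, on Ubuntu with OpenJDK 11 (your path may differ):

$ readlink -f /usr/bin/javac
/usr/lib/jvm/java-11-openjdk-amd64/bin/javac
# JAVA_HOME is therefore /usr/lib/jvm/java-11-openjdk-amd64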

5.3 Edit the hadoop-env.sh file.

$ sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Add the following lines, setting JAVA_HOME to the directory found in step 5.2 (without the trailing /bin/javac). Then save and close the file.

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_CLASSPATH+=" $HADOOP_HOME/lib/*.jar"
5.4 Browse to the hadoop lib directory.

$ cd /usr/local/hadoop/lib

5.5 Download the Javax activation file. (JCenter has been sunset; if this URL is unavailable, the same artifact can be fetched from Maven Central at https://repo1.maven.org/maven2/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar.)

$ sudo wget https://jcenter.bintray.com/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar

5.6 Verify the Hadoop version.

$ hadoop version
5.7 Edit the core-site.xml configuration file to specify the URL for your NameNode. Add the following lines. Save and close the file.

$ sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://0.0.0.0:9000</value>
<description>The default file system URI</description>
</property>
</configuration>
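
Note: fs.default.name is a deprecated property name; on Hadoop 2.x and later the equivalent, preferred key is fs.defaultFS:

<property>
<name>fs.defaultFS</name>
<value>hdfs://0.0.0.0:9000</value>
</property>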


5.8 Create directories for storing node metadata and change their ownership to hadoop.

a) $ sudo mkdir -p /home/hadoop/hdfs/{namenode,datanode}

b) $ sudo chown -R hadoop:hadoop /home/hadoop/hdfs


5.9 Edit the hdfs-site.xml configuration file to define the location for storing node metadata and the fsimage file. Add the following lines. Save and close the file.

$ sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hdfs/datanode</value>
</property>
</configuration>
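
Note: dfs.replication is 1 because a single-node cluster has only one DataNode to hold each block replica. dfs.name.dir and dfs.data.dir are deprecated aliases; the current property names are dfs.namenode.name.dir and dfs.datanode.data.dir.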


5.10 Edit the mapred-site.xml configuration file to define MapReduce values. Add the following lines. Save and close the file.

$ sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
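
Note: setting mapreduce.framework.name to yarn makes MapReduce jobs run on YARN rather than in the local runner. On Hadoop 3.x, the Apache single-node guide also sets the MapReduce classpath explicitly; if example jobs fail with class-not-found errors, a property along these lines (a sketch, not required in every setup) can help:

<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>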


5.11 Edit the yarn-site.xml configuration file and define YARN-related settings. Add the following lines. Save and close the file.

$ sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>


5.12 Log in as the hadoop user.

$ sudo su - hadoop

5.13 Validate the Hadoop configuration and format the HDFS NameNode. Run the format only once: reformatting an existing NameNode erases the HDFS metadata.

$ hdfs namenode -format


6. Start the Apache Hadoop Cluster

6.1 Start the NameNode and DataNode.

$ start-dfs.sh

6.2 Start the YARN resource and node managers.

$ start-yarn.sh

6.3 Verify all the running components.


$ jps
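
On a healthy single-node setup, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (plus Jps itself). As an optional smoke test, one of the bundled example jobs can be run; the jar path below assumes the Hadoop 3.3.1 layout used in this install:

$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar pi 2 5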
7. Access Apache Hadoop Web Interface

You can access the Hadoop NameNode web interface on your browser via http://server-IP:9870. For example:

http://127.0.0.2:9870
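
The YARN ResourceManager web interface is served on port 8088 by default, e.g. http://server-IP:8088.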

Result:
The successful completion of these steps ensures that Apache Hadoop is installed, configured, and running on the Linux system, ready to process and manage big data workloads.
