Bigdata Lab File

A university lab file covering the basic practicals needed to become familiar with the subject.

Uploaded by

Grette
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
18 views

Bigdata Lab File

University lab file that contains all basic practicals one needs to go through to get to know the subject.

Uploaded by

Grette
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 20


Experiment 1

Objective: In this practical, you will learn how to download, install, and configure Hadoop, one of the most
popular distributed storage and processing frameworks. You will also gain an understanding of different
Hadoop modes, explore startup scripts, and work with configuration files.
Prerequisites:
• A Linux-based operating system (e.g., Ubuntu) or access to a virtual machine with Linux installed.
• Basic command-line skills and familiarity with common Linux commands.
• Java Development Kit (JDK) 8 or higher installed.
Materials:
• A computer with internet access.
• Hadoop distribution (Hadoop can be downloaded from the official Apache Hadoop website).
Procedure:
1. Downloading and Installing Hadoop:
1.1. Open a terminal on your Linux system.
1.2. Download the Hadoop distribution from the official Apache Hadoop website:
https://hadoop.apache.org/releases.html.
1.3. Choose the latest stable version and download the binary distribution, for example with wget or
curl (older releases such as 3.3.1 are served from the Apache archive rather than the mirrors):
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
1.4. Extract the downloaded Hadoop archive:
tar -xzvf hadoop-3.3.1.tar.gz
1.5. Move the extracted Hadoop directory to a suitable location (e.g., /usr/local):
sudo mv hadoop-3.3.1 /usr/local
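1.6. Optionally, make the hadoop and hdfs commands available from any directory by adding them to your
shell environment. A minimal sketch, assuming the /usr/local/hadoop-3.3.1 installation path used above;
append these lines to ~/.bashrc and run source ~/.bashrc to apply them to the current session:
export HADOOP_HOME=/usr/local/hadoop-3.3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin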

2. Understanding Different Hadoop Modes:


Hadoop can operate in three different modes:
• Local (Standalone) Mode: the default mode; Hadoop runs as a single Java process without HDFS,
which is useful for debugging.
• Pseudo-Distributed Mode: every daemon runs on a single machine, simulating a cluster; useful for
development and testing.
• Fully-Distributed Mode: deploys Hadoop on a cluster of multiple machines.
2.1. Open the Hadoop configuration file, hadoop-env.sh, located in the etc/hadoop directory and set the
JAVA_HOME environment variable to point to your JDK installation:
export JAVA_HOME=/path/to/your/jdk
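If you are unsure where your JDK lives, on most Linux systems you can resolve it from the java binary;
the following prints the full path to the java executable, from which you drop the trailing /bin/java
to obtain JAVA_HOME:
readlink -f $(which java)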
2.2. Explore the core-site.xml and hdfs-site.xml configuration files in the etc/hadoop directory.
Understand how they define properties such as the default filesystem URI and NameNode/DataNode storage
settings; a minimal example follows.
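For reference, a minimal pseudo-distributed setup might look like the following; port 9000 and a
replication factor of 1 are common single-node choices, not requirements. In core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
And in hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>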
3. Startup Scripts:
3.1. Navigate to the sbin directory in your Hadoop installation (e.g., /usr/local/hadoop-3.3.1/sbin).
3.2. If this is the first time you are starting HDFS, format the NameNode once with
hdfs namenode -format (this initializes the HDFS metadata directories). Then run the following command
to start the Hadoop NameNode and DataNode in pseudo-distributed mode:
./start-dfs.sh
3.3. Open a web browser and visit the Hadoop NameNode web interface at http://localhost:9870 to check
the cluster status.
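You can also verify that the daemons are running with the JDK's jps tool; in pseudo-distributed mode
you should typically see NameNode, DataNode, and SecondaryNameNode listed:
jps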
3.4. Use the following command to stop the Hadoop cluster:
./stop-dfs.sh

4. Configuration Files:
4.1. Explore other configuration files in the etc/hadoop directory, such as mapred-site.xml and
yarn-site.xml. Understand their purposes and how they affect Hadoop behavior.
4.2. Modify the configuration files to change various Hadoop settings. For example, increase the
replication factor in the hdfs-site.xml file, as sketched below.
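As an illustration, raising dfs.replication to 2 asks HDFS to keep two copies of each block (only
meaningful once at least two DataNodes are available); restart the daemons with ./stop-dfs.sh and
./start-dfs.sh for the change to take effect:
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>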

5. Conclusion:
By completing this practical, you have learned how to download, install, and configure Hadoop. You've also
gained an understanding of different Hadoop modes, worked with startup scripts, and explored
configuration files. This knowledge is essential for working with Hadoop and distributed data processing.
Experiment 2

Objective: In this practical, you will learn how to perform basic file management tasks in Hadoop, such as adding files
and directories, retrieving files, and deleting files using the Hadoop Distributed File System (HDFS).

Prerequisites:

• Hadoop installed and configured (you can refer to the previous practical for installation and configuration).

• A running Hadoop cluster in Pseudo-Distributed or Fully-Distributed mode.

• Familiarity with basic Hadoop commands (e.g., hadoop fs, hdfs dfs).

Materials:

• Access to a Hadoop cluster.

Procedure:

1. Adding Files and Directories:

1.1. Open a terminal on your local machine.

1.2. Use the hadoop fs or hdfs dfs command to add a local file to the HDFS. For example, to add a file named
example.txt from your local system to HDFS:

hadoop fs -copyFromLocal /path/to/local/example.txt /user/yourusername/

1.3. Check if the file has been successfully copied to HDFS:

hadoop fs -ls /user/yourusername/

1.4. To create a directory in HDFS, use the following command:

hadoop fs -mkdir /user/yourusername/new_directory
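If intermediate directories do not exist yet, the -p flag creates them as needed. A hypothetical
example with a nested path:

hadoop fs -mkdir -p /user/yourusername/new_directory/sub_directory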

2. Retrieving Files:

2.1. Retrieve a file from HDFS to your local filesystem using the hadoop fs or hdfs dfs command. For example, to
retrieve example.txt from HDFS to your local directory:

hadoop fs -copyToLocal /user/yourusername/example.txt /path/to/local/

2.2. Verify that the file has been copied to your local directory.
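For example, list the retrieved file on the local filesystem:

ls -l /path/to/local/example.txt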

3. Deleting Files:

3.1. Use the hadoop fs or hdfs dfs command to delete a file in HDFS. For example, to delete example.txt:

hadoop fs -rm /user/yourusername/example.txt
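To delete a directory together with its contents, add the -r flag; note that, depending on cluster
configuration, deleted files may first be moved to a trash directory, which the -skipTrash option
bypasses:

hadoop fs -rm -r /user/yourusername/new_directory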

3.2. Confirm that the file has been deleted:

hadoop fs -ls /user/yourusername/

4. Conclusion:

By completing this practical, you have learned how to perform basic file management tasks in Hadoop using HDFS.
You can add files and directories, retrieve files, and delete files as needed. These fundamental file management skills
are crucial when working with Hadoop for distributed data storage and processing.
Experiment 3
Experiment 4
Experiment 5
Experiment 6
Experiment 7
Experiment 8
