

GURU TEGH BAHADUR INSTITUTE OF TECHNOLOGY

Big Data Analytics


Practical File

Submitted by:
Name: Upanshu Jha
Branch: AI-DS
Enrollment no: 01313211921

INDEX

Upanshu Jha 01313211921 AI-DS



S.No. | Topic | Page No. | Date of Experiment | Date of Submission | Teacher’s Signature


EXPERIMENT 1

Aim: Install Apache Hadoop

Introduction: Apache Hadoop software is an open source framework that allows


for the distributed storage and processing of large datasets across clusters of
computers using simple programming models. Hadoop is designed to scale up
from a single computer to thousands of clustered computers, with each machine
offering local computation and storage. In this way, Hadoop can efficiently store
and process large datasets ranging in size from gigabytes to petabytes of data.

Procedure:

Step 1: Download and install Java


Hadoop is built on Java, so you must have Java installed on your PC. You can get
the most recent version of Java from the official website. After downloading,
follow the installation wizard to install Java on your system.
JDK: https://www.oracle.com/java/technologies/javase-downloads.html

Step 2: Download Hadoop


Hadoop can be downloaded from the Apache Hadoop website. Make sure to have
the latest stable release of Hadoop. Once downloaded, extract the contents to a
convenient location.
Hadoop: https://hadoop.apache.org/releases.html

Step 3: Set Environment Variables


You must configure environment variables after downloading and unpacking
Hadoop. Open the Start menu, type “Edit the system environment variables,” and
select the result. This will launch the System Properties dialog box. Click the
“Environment Variables” button to open it.
Click “New” under System Variables to add a new variable. Enter the variable
name “HADOOP_HOME” and the path to the Hadoop folder as the variable value.
Then press “OK.”
Then, under System Variables, locate the “Path” variable and click “Edit.” Click
“New” in the Edit Environment Variable window and enter “%HADOOP_HOME%\bin”
as the variable value. To close all the windows, use the “OK” button.
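On Linux or macOS the same step is done with shell exports instead of the System Properties dialog; a minimal sketch, assuming Hadoop was extracted to /usr/local/hadoop (a hypothetical path — substitute your own):

```shell
# Hypothetical install path -- replace with wherever you extracted Hadoop
export HADOOP_HOME=/usr/local/hadoop
# Equivalent of adding %HADOOP_HOME%\bin to Path on Windows
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
# Sanity check: the variable should expand to the bin folder
echo "$HADOOP_HOME/bin"   # prints /usr/local/hadoop/bin
```

Adding these lines to ~/.bashrc (or ~/.zshrc) makes them persist across sessions.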

Step 4: Setup Hadoop


You must configure Hadoop in this phase by modifying several configuration files.
Navigate to the “etc/hadoop” folder in the Hadoop folder. You must make changes
to three files:
● core-site.xml
● hdfs-site.xml
● mapred-site.xml
Open each file in a text editor and edit the following properties:

In core-site.xml


In hdfs-site.xml

In mapred-site.xml

Save the changes in each file.
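The properties typically edited in these three files look like the following. The values shown are a common minimal single-node setup and should be treated as illustrative assumptions, not the only valid settings.

In core-site.xml:

```xml
<configuration>
  <!-- Default filesystem URI; port 9000 is a common single-node choice -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

In hdfs-site.xml:

```xml
<configuration>
  <!-- Replication factor 1 suits a single-node cluster (default is 3) -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

In mapred-site.xml:

```xml
<configuration>
  <!-- Run MapReduce jobs on YARN rather than the local runner -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```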

Step 5: Format Hadoop NameNode


You must format the NameNode before you can start Hadoop. Navigate to the
Hadoop bin folder using a command prompt. Execute this command:

hdfs namenode -format
Step 6: Start Hadoop


To start Hadoop, open a command prompt and navigate to the Hadoop sbin folder.
Run the following command:

start-all.cmd

This command will start all the required Hadoop services, including the
NameNode, DataNode, ResourceManager, and NodeManager. Wait for a few minutes
until all the services are started.

Step 7: Verify Hadoop Installation

To ensure that Hadoop is properly installed, open a web browser and go to


http://localhost:50070/ (or http://localhost:9870/ on Hadoop 3.x). This will launch
the web interface for the Hadoop NameNode. You should see a page with Hadoop
cluster information.

● Remember to get the most recent stable version of Hadoop, install Java,
configure Hadoop, format the NameNode, and start Hadoop services.
Finally, check the NameNode web interface to ensure that Hadoop is
properly installed.


EXPERIMENT 2

Aim: To study all Hadoop commands

Introduction :
There are three components of Hadoop:

1. Hadoop HDFS - Hadoop Distributed File System (HDFS) is the storage unit.
2. Hadoop MapReduce - Hadoop MapReduce is the processing unit.
3. Hadoop YARN - Yet Another Resource Negotiator (YARN) is a resource
management unit.

Basic Hadoop Commands

To use the HDFS commands, first you need to start the Hadoop services using the
following command:


sbin/start-all.sh
To check that the Hadoop services are up and running, use the following command:

jps

Commands:
1. ls:

This command is used to list all the files. Use ls -R for a recursive listing; it
is useful when we want to see the hierarchy of a folder. Syntax:
bin/hdfs dfs -ls <path>

Example:
bin/hdfs dfs -ls /

It will print all the directories present in HDFS. The bin directory contains
executables, so bin/hdfs means we want the hdfs executable, and dfs selects the
Distributed File System commands.

2. ls -R:

Use the -R flag to display the files and subdirectories inside a directory
recursively. Syntax:
bin/hdfs dfs -ls -R <path>

3. mkdir: To create a directory. In Hadoop dfs there is no home directory by


default. So let’s first create it. Syntax:
bin/hdfs dfs -mkdir <folder name>

Example:
bin/hdfs dfs -mkdir /geeks

4. copyFromLocal (or) put: To copy files/folders from local file system to hdfs
store. This is the most important command. Local filesystem means the files
present on the OS.

Syntax:
bin/hdfs dfs -copyFromLocal <local file path> <hdfs destination path>

Example: Let’s suppose we have a file AI.txt on the Desktop which we want to copy
to the folder geeks present on hdfs:
bin/hdfs dfs -copyFromLocal ../Desktop/AI.txt /geeks


5. copyToLocal (or) get: To copy files/folders from hdfs store to local file system.

Syntax:
bin/hdfs dfs -copyToLocal <hdfs source path> <local destination path>

Example:
bin/hdfs dfs -copyToLocal /geeks/myfile.txt ../Desktop/hero

myfile.txt from the geeks folder will be copied to the folder hero present on the
Desktop.

Note: Observe that we don’t write bin/hdfs while checking the things present on
the local filesystem.

6. put — this command is used to copy the data from the local file system to
HDFS.

hadoop fs -put <Local File Path> <HDFS file path>


We can verify the same from HDFS WebUI.

7. get — this command is used to copy the data from HDFS to the local file
system. This command is the reverse of the ‘put’ command.

hadoop fs -get <HDFS file path> <Local File Path>

We can verify the same from our local file system.
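Since put and get are inverse copy operations, a round trip through them preserves the file byte for byte. A local sketch of that round trip, using a plain directory under /tmp to stand in for HDFS (all paths here are illustrative assumptions):

```shell
# Stand-in directories: "cluster" plays the role of HDFS, "local" the local FS
mkdir -p /tmp/hdfs_demo/cluster /tmp/hdfs_demo/local
printf 'sample data\n' > /tmp/hdfs_demo/local/AI.txt

# put: local -> HDFS (here sketched with a plain cp into the stand-in)
cp /tmp/hdfs_demo/local/AI.txt /tmp/hdfs_demo/cluster/AI.txt
# get: HDFS -> local, the reverse direction
cp /tmp/hdfs_demo/cluster/AI.txt /tmp/hdfs_demo/local/AI_copy.txt

# The round trip preserves the content exactly
cmp /tmp/hdfs_demo/local/AI.txt /tmp/hdfs_demo/local/AI_copy.txt && echo "round trip OK"
```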


8. cat — command used to view the data from the file in HDFS

hadoop fs -cat <HDFS file path with file name>

9. mv — this command is used to move a file from one location in HDFS to
another location in HDFS.

hadoop fs -mv <Source HDFS path> <Destination HDFS path>

We can verify the same from Web UI.


10. cp — this command is used to copy a file from one location in HDFS to
another location within HDFS only.

hadoop fs -cp <Source HDFS path> <Destination HDFS path>

We can verify the same from Web UI.

11. moveFromLocal — this command is used for moving a file or directory from
the local file system to HDFS.


hadoop fs -moveFromLocal <Local File Path> <HDFS file path>

12. moveToLocal — this command is intended to move a file or directory from
HDFS to the local file system. It is not yet implemented.

hadoop fs -moveToLocal <HDFS file path> <Local File Path>

13. rm — this command is used to delete/remove a file from HDFS.

hadoop fs -rm <HDFS file path>


14. tail — this command is used to read the tail/end part of a file from HDFS. It
has an additional parameter [-f], which is used to follow and show data as it is
appended to the file.

hadoop fs -tail [-f] <HDFS file path>
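hadoop fs -tail prints the last kilobyte of the file. That behaviour can be sketched locally with the standard tail utility (the file path under /tmp is an assumption for illustration):

```shell
# Generate a sample file bigger than 1 KB (seq 1 500 yields roughly 1.9 KB)
seq 1 500 > /tmp/tail_demo.txt
# hadoop fs -tail shows the last 1 KB; tail -c 1024 is the local analogue
tail -c 1024 /tmp/tail_demo.txt > /tmp/tail_last_kb.txt
# The extracted tail is exactly 1024 bytes
wc -c < /tmp/tail_last_kb.txt
```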

15. expunge — this command is used to empty the trash.

hadoop fs -expunge

16. chown — we should use this command when we want to change the owner of a
file or directory in HDFS.

hadoop fs -chown [-R] <new owner>[:<new group>] <HDFS file path>


We can verify whether the owner changed using the hadoop fs -ls command or from
the WebUI.

17. chgrp — we should use this command when we want to change the group of a
file or directory in HDFS.

hadoop fs -chgrp [-R] <new group> <HDFS file path>

We can verify whether the group changed using the hadoop fs -ls command or from
the WebUI.


18. setrep — this command is used to change the replication factor of a file in
HDFS.

hadoop fs -setrep <Replication Factor> <HDFS file path>

We can check it from the WebUI.
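The practical effect of the replication factor is on raw storage, which scales linearly with it. A quick sketch of the arithmetic, with illustrative numbers (the file size and factor below are assumptions):

```shell
FILE_SIZE_MB=256   # hypothetical file size
REPLICATION=3      # HDFS's default replication factor
# Raw cluster storage consumed = logical size x replication factor
echo "$((FILE_SIZE_MB * REPLICATION)) MB raw"   # prints: 768 MB raw
```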

19. du — this command is used to check the amount of disk usage of the file or
directory.


hadoop fs -du <HDFS file path>
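Like the Unix du, hadoop fs -du reports the size in bytes of each path. A local sketch of the byte-counting idea (the path is an assumption for illustration):

```shell
# A file with exactly 10 bytes of content
printf '0123456789' > /tmp/du_demo.txt
# hadoop fs -du would report this path's size in bytes; wc -c is the local analogue
wc -c < /tmp/du_demo.txt   # byte count: 10
```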

20. df — this command is used to show the capacity, free space, and size of the
HDFS file system. It has an additional parameter [-h] to convert the output to a
human-readable format.

hadoop fs -df [-h] <HDFS file path>

21. fsck — this command is used to check the health of the files present in the
HDFS file system.

hadoop fsck <HDFS file path>


It also has several options that modify its behaviour.

22. touchz — this command creates a new empty file (size 0) at the specified path.

hadoop fs -touchz <HDFS file path>


The new file can be seen in the WebUI.

23. test — this command answers various questions about <HDFS path> (for
example, whether it exists or is a directory), reporting the result via its exit
status.

hadoop fs -test -[defsz] <HDFS file path>
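The exit-status convention is the same one the shell's own test utility uses: 0 means "yes". A local sketch of the pattern with -e (path exists); the paths under /tmp are assumptions:

```shell
rm -f /tmp/no_such_file_demo        # make sure this path is absent
touch /tmp/exists_demo.txt          # and this one present
# Exit status 0 = "yes", non-zero = "no" -- the convention hadoop fs -test -e follows
test -e /tmp/exists_demo.txt; echo "exists? exit=$?"     # prints: exists? exit=0
test -e /tmp/no_such_file_demo; echo "missing? exit=$?"  # prints: missing? exit=1
```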

24. text — this is a simple command, used to print the data of an HDFS file on the
console.

hadoop fs -text <HDFS file path>

25. stat — this command provides the stat of the file or directory in HDFS.


hadoop fs -stat <HDFS file path>

It can format the output with specifiers such as %b (file size in bytes), %g
(group), %n (name), %o (block size), %r (replication), %u (owner), and %y
(modification date). By default, it uses ‘%y’.

26. usage — displays the usage for a given command, or for all commands if none
is specified.

hadoop fs -usage <command>


27. help — displays help for a given command, or for all commands if none is specified.

hadoop fs -help <command>

28. chmod — is used to change the permissions of a file in the HDFS file system.

hadoop fs -chmod [-R] <mode> <HDFS file path>

The old and new permissions can be compared using hadoop fs -ls or from the
WebUI.

29. appendToFile — this command is used to append one or more files from the
local file system to a file in HDFS.

hadoop fs -appendToFile <Local file path1> <Local file path2> <HDFS file path>
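The effect is a concatenation onto the end of the target, in the order the sources are listed; locally the same idea is cat src1 src2 >> target (file names below are assumptions):

```shell
printf 'header\n' > /tmp/append_target.txt   # stands in for the HDFS target file
printf 'line1\n'  > /tmp/append_a.txt        # first local source
printf 'line2\n'  > /tmp/append_b.txt        # second local source
# appendToFile semantics: sources appended, in order, to the target
cat /tmp/append_a.txt /tmp/append_b.txt >> /tmp/append_target.txt
cat /tmp/append_target.txt   # prints: header, line1, line2 (one per line)
```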


30. checksum — this command is used to retrieve the checksum of a file in the
HDFS file system.

hadoop fs -checksum <HDFS file Path>
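The point of a checksum is that identical content always yields an identical value, so two copies can be compared without transferring them. A local sketch using the standard cksum utility (HDFS itself uses a CRC-based composite checksum; cksum here only illustrates the idea):

```shell
# Two files with identical content
printf 'hadoop\n' > /tmp/cksum_a.txt
printf 'hadoop\n' > /tmp/cksum_b.txt
# Identical content -> identical checksum, so the copies compare cheaply
sum_a=$(cksum < /tmp/cksum_a.txt)
sum_b=$(cksum < /tmp/cksum_b.txt)
[ "$sum_a" = "$sum_b" ] && echo "checksums match"
```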

31. count — it counts the number of directories, files, and bytes at a particular path.

hadoop fs -count [options] <HDFS directory path>

This command also has a few options to modify the output as needed.


32. find — this command is used to find files in the HDFS file system. We need
to provide the expression we are looking for, and can also provide a path if we
want to look for the file in a particular directory.

hadoop fs -find <HDFS directory path> <Expression>

33. getmerge — this command is used to merge the contents of a directory from
HDFS to a file in the local file system.

hadoop fs -getmerge <HDFS directory> <Local file path>


The merged file can be seen in the local file system.
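getmerge concatenates every file under the HDFS directory into one local file; a local sketch of the same concatenation (directory and part-file names are assumptions):

```shell
mkdir -p /tmp/getmerge_demo
printf 'part one\n' > /tmp/getmerge_demo/part-00000
printf 'part two\n' > /tmp/getmerge_demo/part-00001
# getmerge semantics: concatenate the directory's files into a single local file
cat /tmp/getmerge_demo/part-* > /tmp/merged_output.txt
cat /tmp/merged_output.txt   # prints: part one / part two
```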
