BDA Practical File
Submitted by:
Name: Upanshu Jha
Branch: AI-DS
Enrollment no: 01313211921
INDEX
EXPERIMENT 1
Procedure:
1. Install Java (JDK 8 or later) and set the JAVA_HOME environment variable.
2. Download the latest stable Hadoop release and extract the archive to a convenient location.
Hadoop: https://hadoop.apache.org/releases.html
3. Edit the configuration files under the etc/hadoop directory:
In core-site.xml
In hdfs-site.xml
In mapred-site.xml
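For reference, a minimal single-node (pseudo-distributed) setup commonly uses entries like the following in the three files above; the host, port, and replication values shown are typical defaults, not the only valid choices:

```xml
<!-- core-site.xml: default file system URI (typical single-node value) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: replication factor of 1, since there is only one DataNode -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- mapred-site.xml: run MapReduce jobs on YARN -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```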
Before starting Hadoop for the first time, format the NameNode:
bin/hdfs namenode -format
To start Hadoop, open a command prompt and navigate to the Hadoop sbin folder.
Run the following command (use start-all.sh on Linux):
start-all.cmd
This command will start all the required Hadoop services, including the
NameNode, DataNode, and the YARN ResourceManager and NodeManager. Wait
for a few minutes until all the services are started.
● Remember to get the most recent stable version of Hadoop, install Java,
configure Hadoop, format the NameNode, and start the Hadoop services.
Finally, check the NameNode web interface (http://localhost:9870 by default
in Hadoop 3.x) to ensure that Hadoop is properly installed.
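A quick way to confirm that the daemons are up is the jps tool shipped with the JDK. The sketch below assumes a single-node Hadoop 3.x installation; the process IDs are illustrative and older Hadoop versions show different daemon names:

```shell
# List running JVM processes; a healthy single-node setup typically shows
# the five Hadoop daemons alongside their (machine-specific) process IDs:
jps
# 12345 NameNode
# 12346 DataNode
# 12347 SecondaryNameNode
# 12348 ResourceManager
# 12349 NodeManager
```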
EXPERIMENT 2
Introduction :
There are three components of Hadoop: HDFS (storage), MapReduce
(processing), and YARN (resource management).
To use the HDFS commands, first you need to start the Hadoop services using the
following command:
sbin/start-all.sh
To check that the Hadoop services are up and running, use the following command:
jps
Commands:
1. ls:
This command is used to list all the files. Use lsr for a recursive approach. It
is useful when we want a hierarchy of a folder. Syntax:
bin/hdfs dfs -ls <path>
Example:
bin/hdfs dfs -ls /
It will print all the directories present in HDFS. The bin directory contains
executables, so bin/hdfs means we want the hdfs executable, and dfs refers to
its Distributed File System commands.
2. ls -R: recursively lists all files and subdirectories under a path.
Example:
bin/hdfs dfs -ls -R /
4. copyFromLocal (or) put: To copy files/folders from the local file system to the
HDFS store. This is the most important command. Local file system means the files
present on the OS.
Syntax:
bin/hdfs dfs -copyFromLocal <local source> <hdfs destination>
Example: Let’s suppose we have a file AI.txt on the Desktop which we want to copy
to the folder geeks present on HDFS:
bin/hdfs dfs -copyFromLocal ../Desktop/AI.txt /geeks
5. copyToLocal (or) get: To copy files/folders from the HDFS store to the local file system.
Syntax:
bin/hdfs dfs -copyToLocal <hdfs source> <local destination>
Example:
bin/hdfs dfs -copyToLocal /geeks/myfile.txt ../Desktop/hero
myfile.txt from the geeks folder will be copied to the folder hero present on the Desktop.
Note: Observe that we don’t write bin/hdfs while checking the things present on
the local file system.
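A minimal round trip combining the two commands above can be sketched as follows, assuming a running single-node cluster; the directory /user/demo and the file names are illustrative:

```shell
# Create a target directory in HDFS
hadoop fs -mkdir -p /user/demo
# Upload a local file, then list the directory to confirm
hadoop fs -copyFromLocal notes.txt /user/demo
hadoop fs -ls /user/demo
# Download the file back under a different local name
hadoop fs -copyToLocal /user/demo/notes.txt notes_copy.txt
```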
6. put — this command is used to copy data from the local file system to HDFS.
hadoop fs -put <local source> <hdfs destination>
7. get — this command is used to copy data from HDFS to the local file
system. This command is the reverse of the ‘put’ command.
hadoop fs -get <hdfs source> <local destination>
8. cat — this command is used to view the data of a file in HDFS.
hadoop fs -cat <hdfs file path>
10. cp — this command is used to copy a file from one location to another
within HDFS only.
hadoop fs -cp <hdfs source> <hdfs destination>
11. moveFromLocal — this command is used for moving a file or directory from
the local file system to HDFS; the local copy is deleted after the move.
hadoop fs -moveFromLocal <local source> <hdfs destination>
12. moveToLocal — this command is intended for moving a file or directory from
HDFS to the local file system. It is not yet implemented.
14. tail — this command is used to read the tail/end part of a file from HDFS. It
has an additional parameter [-f] that is used to show data appended to the file.
hadoop fs -tail [-f] <hdfs file path>
15. expunge — this command is used to empty the HDFS trash, permanently
deleting the files it holds.
hadoop fs -expunge
16. chown — we should use this command when we want to change the owner of a
file or directory in HDFS.
hadoop fs -chown [-R] <new owner> <hdfs path>
We can verify whether the owner changed using the hadoop fs -ls command or from
the WebUI.
17. chgrp — we should use this command when we want to change the group of a
file or directory in HDFS.
hadoop fs -chgrp [-R] <new group> <hdfs path>
We can verify whether the group changed using the hadoop fs -ls command or from
the WebUI.
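Taken together, a change-and-verify session might look like the following sketch, assuming a running cluster, sufficient privileges, and an existing file /data/report.txt; the user and group names are illustrative:

```shell
# Change owner and group in one step using the owner:group form
hadoop fs -chown hduser:analysts /data/report.txt
# The owner and group columns of the -ls listing confirm the change
hadoop fs -ls /data/report.txt
```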
18. setrep — this command is used to change the replication factor of a file in
HDFS. The optional [-w] flag waits until replication completes.
hadoop fs -setrep [-w] <replication factor> <hdfs path>
19. du — this command is used to check the amount of disk usage of a file or
directory.
hadoop fs -du [-h] <hdfs path>
20. df — this command is used to show the capacity, free space, and size of the
HDFS file system. It has an additional parameter [-h] to convert the data to a
human-readable format.
hadoop fs -df [-h]
21. fsck — this command is used to check the health of the files present in the
HDFS file system. Note that it is run as hdfs fsck, not as a hadoop fs subcommand.
hdfs fsck <hdfs path>
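A slightly fuller invocation, assuming a running cluster and permission to scan the namespace, reports per-file block details; the flags shown are standard fsck options:

```shell
# Check the whole namespace and report files, their blocks,
# and the DataNodes on which each block is stored
hdfs fsck / -files -blocks -locations
```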
22. touchz — this command creates a new file of size 0 in the specified directory of HDFS.
hadoop fs -touchz <hdfs file path>
23. test — this command answers various questions about an <HDFS path>, with the
result reported via its exit status (e.g. -e checks existence, -d checks for a
directory, -z checks for a zero-length file).
hadoop fs -test -[defsz] <hdfs path>
24. text — this is a simple command, used to print the data of an HDFS file on the
console, decompressing it if necessary.
hadoop fs -text <hdfs file path>
25. stat — this command provides statistics about a file or directory in HDFS,
such as its size, owner, and modification date.
hadoop fs -stat [format] <hdfs path>
26. usage — displays the usage for a given command, or for all commands if none is
specified.
hadoop fs -usage <command>
27. help — displays help for a given command, or for all commands if none is specified.
hadoop fs -help <command>
28. chmod — this command is used to change the permissions of a file in the HDFS
file system; the mode can be given in octal or symbolic form.
hadoop fs -chmod [-R] <mode> <hdfs path>
Example:
hadoop fs -chmod 755 /geeks
We can verify the old and new permissions with hadoop fs -ls.
29. appendToFile — this command is used to append one or more files from the
local file system to a single file in HDFS.
hadoop fs -appendToFile <Local file path1> <Local file path2> <HDFS file path>
30. checksum — this command is used to fetch the checksum of a file in the
HDFS file system.
hadoop fs -checksum <hdfs file path>
31. count — it counts the number of directories, files, and bytes at a particular
path. It also has a few options, such as -q for quota information and -h for
human-readable sizes, to modify the output as needed.
hadoop fs -count [-q] [-h] <hdfs path>
32. find — this command is used to find files in the HDFS file system. We
need to provide the expression we are looking for, and can also provide a path
if we want to look for the file in a particular directory.
hadoop fs -find <hdfs path> -name <expression>
33. getmerge — this command is used to merge the contents of a directory from
HDFS into a single file in the local file system.
hadoop fs -getmerge <hdfs directory> <local file>
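As a closing illustration, getmerge can be combined with the upload commands covered earlier. The sketch assumes a running cluster; the directory /user/demo/logs and the file names are illustrative:

```shell
# Upload two small local files, then merge them back into one local file
hadoop fs -mkdir -p /user/demo/logs
hadoop fs -put part1.txt part2.txt /user/demo/logs
hadoop fs -getmerge /user/demo/logs merged.txt
```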