

GURU TEGH BAHADUR INSTITUTE OF TECHNOLOGY

Big Data Analytics


Practical File

Submitted by:
Name: Upanshu Jha
Branch: AI-DS
Enrollment no: 01313211921

INDEX

Upanshu Jha 01313211921 AI-DS



S.No. | Topic | Page No. | Date of Experiment | Date of Submission | Teacher’s Signature


EXPERIMENT 1

Aim: Install Apache Hadoop

Introduction: Apache Hadoop software is an open source framework that allows


for the distributed storage and processing of large datasets across clusters of
computers using simple programming models. Hadoop is designed to scale up
from a single computer to thousands of clustered computers, with each machine
offering local computation and storage. In this way, Hadoop can efficiently store
and process large datasets ranging in size from gigabytes to petabytes of data.

Procedure:

Step 1: Download and install Java


Hadoop is built on Java, so you must have Java installed on your PC. You can get
the most recent version of Java from the official website. After downloading,
follow the installation wizard to install Java on your system.
JDK: https://www.oracle.com/java/technologies/javase-downloads.html

Step 2: Download Hadoop


Hadoop can be downloaded from the Apache Hadoop website. Make sure to have
the latest stable release of Hadoop. Once downloaded, extract the contents to a
convenient location.
Hadoop: https://hadoop.apache.org/releases.html

Step 3: Set Environment Variables


You must configure environment variables after downloading and unpacking
Hadoop. Open the Start menu, type “Edit the system environment variables,” and
select the result. This will launch the System Properties dialog box. Click the
“Environment Variables” button to open it.
Click “New” under System Variables to add a new variable. Enter the variable
name “HADOOP_HOME” and the path to the Hadoop folder as the variable value.
Then press “OK.”
Then, under System Variables, locate the “Path” variable and click “Edit.” Click
“New” in the Edit Environment Variable window and enter “%HADOOP_HOME%\bin”
as the variable value. To close all the windows, use the “OK” button.
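On Linux or macOS the same step is done with shell exports instead of the System Properties dialog; a minimal sketch, assuming Hadoop was extracted to /usr/local/hadoop (a hypothetical path — substitute your own):

```shell
# Hypothetical install path -- replace with wherever you extracted Hadoop
export HADOOP_HOME=/usr/local/hadoop
# Equivalent of adding %HADOOP_HOME%\bin to Path on Windows
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
# Sanity check: the variable should expand to the bin folder
echo "$HADOOP_HOME/bin"   # prints /usr/local/hadoop/bin
```

Adding these lines to ~/.bashrc (or ~/.zshrc) makes them persist across sessions.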

Step 4: Setup Hadoop


You must configure Hadoop in this phase by modifying several configuration files.
Navigate to the “etc/hadoop” folder in the Hadoop folder. You must make changes
to three files:
● core-site.xml
● hdfs-site.xml
● mapred-site.xml
Open each file in a text editor and edit the following properties:

In core-site.xml


In hdfs-site.xml

In mapred-site.xml

Save the changes in each file.
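The properties typically edited in these three files look like the following. The values shown are a common minimal single-node setup and should be treated as illustrative assumptions, not the only valid settings.

In core-site.xml:

```xml
<configuration>
  <!-- Default filesystem URI; port 9000 is a common single-node choice -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

In hdfs-site.xml:

```xml
<configuration>
  <!-- Replication factor 1 suits a single-node cluster (default is 3) -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

In mapred-site.xml:

```xml
<configuration>
  <!-- Run MapReduce jobs on YARN rather than the local runner -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```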

Step 5: Format Hadoop NameNode


You must format the NameNode before you can start Hadoop. Navigate to the
Hadoop bin folder using a command prompt. Execute this command:

hdfs namenode -format
Step 6: Start Hadoop


To start Hadoop, open a command prompt and navigate to the Hadoop sbin folder.
Run the following command:

start-all.cmd

This command will start all the required Hadoop services, including the
NameNode, DataNode, ResourceManager, and NodeManager. Wait for a few minutes
until all the services are started.

Step 7: Verify Hadoop Installation

To ensure that Hadoop is properly installed, open a web browser and go to


http://localhost:50070/ (or http://localhost:9870/ on Hadoop 3.x). This will launch
the web interface for the Hadoop NameNode. You should see a page with Hadoop
cluster information.

● Remember to get the most recent stable version of Hadoop, install Java,
configure Hadoop, format the NameNode, and start Hadoop services.
Finally, check the NameNode web interface to ensure that Hadoop is
properly installed.


EXPERIMENT 2

Aim: To study all Hadoop commands

Introduction :
There are three components of Hadoop:

1. Hadoop HDFS - Hadoop Distributed File System (HDFS) is the storage unit.
2. Hadoop MapReduce - Hadoop MapReduce is the processing unit.
3. Hadoop YARN - Yet Another Resource Negotiator (YARN) is a resource
management unit.

Basic Hadoop Commands

To use the HDFS commands, first you need to start the Hadoop services using the
following command:


sbin/start-all.sh
To check that the Hadoop services are up and running, use the following command:

jps

Commands:
1. ls:

This command is used to list all the files. Use ls -R for a recursive listing; it
is useful when we want to see the hierarchy of a folder. Syntax:
bin/hdfs dfs -ls <path>

Example:
bin/hdfs dfs -ls /

It will print all the directories present in HDFS. The bin directory contains
executables, so bin/hdfs means we want the hdfs executable, and dfs selects the
Distributed File System commands.

2. ls -R:

Use the -R flag to display the files and subdirectories inside a directory
recursively. Syntax:
bin/hdfs dfs -ls -R <path>

3. mkdir: To create a directory. In Hadoop dfs there is no home directory by


default. So let’s first create it. Syntax:
bin/hdfs dfs -mkdir <folder name>

Example:
bin/hdfs dfs -mkdir /geeks

4. copyFromLocal (or) put: To copy files/folders from local file system to hdfs
store. This is the most important command. Local filesystem means the files
present on the OS.

Syntax:
bin/hdfs dfs -copyFromLocal <local file path> <hdfs destination path>

Example: Let’s suppose we have a file AI.txt on the Desktop which we want to copy
to the folder geeks present on hdfs:
bin/hdfs dfs -copyFromLocal ../Desktop/AI.txt /geeks


5. copyToLocal (or) get: To copy files/folders from hdfs store to local file system.

Syntax:
bin/hdfs dfs -copyToLocal <hdfs source path> <local destination path>

Example:
bin/hdfs dfs -copyToLocal /geeks/myfile.txt ../Desktop/hero

myfile.txt from the geeks folder will be copied to the folder hero present on the
Desktop.

Note: Observe that we don’t write bin/hdfs while checking the things present on
the local filesystem.

6. put — this command is used to copy the data from the local file system to
HDFS.

hadoop fs -put <Local File Path> <HDFS file path>


We can verify the same from HDFS WebUI.

7. get — this command is used to copy the data from HDFS to the local file
system. This command is the reverse of the ‘put’ command.

hadoop fs -get <HDFS file path> <Local File Path>

We can verify the same from our local file system.
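Since put and get are inverse copy operations, a round trip through them preserves the file byte for byte. A local sketch of that round trip, using a plain directory under /tmp to stand in for HDFS (all paths here are illustrative assumptions):

```shell
# Stand-in directories: "cluster" plays the role of HDFS, "local" the local FS
mkdir -p /tmp/hdfs_demo/cluster /tmp/hdfs_demo/local
printf 'sample data\n' > /tmp/hdfs_demo/local/AI.txt

# put: local -> HDFS (here sketched with a plain cp into the stand-in)
cp /tmp/hdfs_demo/local/AI.txt /tmp/hdfs_demo/cluster/AI.txt
# get: HDFS -> local, the reverse direction
cp /tmp/hdfs_demo/cluster/AI.txt /tmp/hdfs_demo/local/AI_copy.txt

# The round trip preserves the content exactly
cmp /tmp/hdfs_demo/local/AI.txt /tmp/hdfs_demo/local/AI_copy.txt && echo "round trip OK"
```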


8. cat — command used to view the data from the file in HDFS

hadoop fs -cat <HDFS file path with file name>

9. mv — this command is used to move a file from one location in HDFS to
another location in HDFS.

hadoop fs -mv <Source HDFS path> <Destination HDFS path>

We can verify the same from Web UI.


10. cp — this command is used to copy a file from one location in HDFS to
another location within HDFS only.

hadoop fs -cp <Source HDFS path> <Destination HDFS path>

We can verify the same from Web UI.

11. moveFromLocal — this command is used for moving a file or directory from
the local file system to HDFS.


hadoop fs -moveFromLocal <Local File Path> <HDFS file path>

12. moveToLocal — this command is intended to move a file or directory from
HDFS to the local file system. It is not yet implemented.

hadoop fs -moveToLocal <HDFS file path> <Local File Path>

13. rm — this command is used to delete/remove a file from HDFS.

hadoop fs -rm <HDFS file path>


14. tail — this command is used to read the tail/end part of a file from HDFS. It
has an additional parameter [-f], which is used to follow and show data as it is
appended to the file.

hadoop fs -tail [-f] <HDFS file path>
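hadoop fs -tail prints the last kilobyte of the file. That behaviour can be sketched locally with the standard tail utility (the file path under /tmp is an assumption for illustration):

```shell
# Generate a sample file bigger than 1 KB (seq 1 500 yields roughly 1.9 KB)
seq 1 500 > /tmp/tail_demo.txt
# hadoop fs -tail shows the last 1 KB; tail -c 1024 is the local analogue
tail -c 1024 /tmp/tail_demo.txt > /tmp/tail_last_kb.txt
# The extracted tail is exactly 1024 bytes
wc -c < /tmp/tail_last_kb.txt
```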

15. expunge — this command is used to empty the trash.

hadoop fs -expunge

16. chown — we should use this command when we want to change the owner of a
file or directory in HDFS.

hadoop fs -chown [-R] <new owner>[:<new group>] <HDFS file path>


We can verify whether the owner changed using the hadoop fs -ls command or from
the WebUI.

17. chgrp — we should use this command when we want to change the group of a
file or directory in HDFS.

hadoop fs -chgrp [-R] <new group> <HDFS file path>

We can verify whether the group changed using the hadoop fs -ls command or from
the WebUI.


18. setrep — this command is used to change the replication factor of a file in
HDFS.

hadoop fs -setrep <Replication Factor> <HDFS file path>

We can check it from the WebUI.
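The practical effect of the replication factor is on raw storage, which scales linearly with it. A quick sketch of the arithmetic, with illustrative numbers (the file size and factor below are assumptions):

```shell
FILE_SIZE_MB=256   # hypothetical file size
REPLICATION=3      # HDFS's default replication factor
# Raw cluster storage consumed = logical size x replication factor
echo "$((FILE_SIZE_MB * REPLICATION)) MB raw"   # prints: 768 MB raw
```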

19. du — this command is used to check the amount of disk usage of the file or
directory.


hadoop fs -du <HDFS file path>
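Like the Unix du, hadoop fs -du reports the size in bytes of each path. A local sketch of the byte-counting idea (the path is an assumption for illustration):

```shell
# A file with exactly 10 bytes of content
printf '0123456789' > /tmp/du_demo.txt
# hadoop fs -du would report this path's size in bytes; wc -c is the local analogue
wc -c < /tmp/du_demo.txt   # byte count: 10
```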

20. df — this command is used to show the capacity, free space, and size of the
HDFS file system. It has an additional parameter [-h] to convert the output to a
human-readable format.

hadoop fs -df [-h] <HDFS file path>

21. fsck — this command is used to check the health of the files present in the
HDFS file system.

hadoop fsck <HDFS file path>


It also has several options that modify its behaviour.

22. touchz — this command creates a new empty file (size 0) at the specified path.

hadoop fs -touchz <HDFS file path>


The new file can be seen in the WebUI.

23. test — this command answers various questions about <HDFS path> (for
example, whether it exists or is a directory), reporting the result via its exit
status.

hadoop fs -test -[defsz] <HDFS file path>
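The exit-status convention is the same one the shell's own test utility uses: 0 means "yes". A local sketch of the pattern with -e (path exists); the paths under /tmp are assumptions:

```shell
rm -f /tmp/no_such_file_demo        # make sure this path is absent
touch /tmp/exists_demo.txt          # and this one present
# Exit status 0 = "yes", non-zero = "no" -- the convention hadoop fs -test -e follows
test -e /tmp/exists_demo.txt; echo "exists? exit=$?"     # prints: exists? exit=0
test -e /tmp/no_such_file_demo; echo "missing? exit=$?"  # prints: missing? exit=1
```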

24. text — this is a simple command, used to print the data of an HDFS file on the
console.

hadoop fs -text <HDFS file path>

25. stat — this command provides the stat of the file or directory in HDFS.


hadoop fs -stat <HDFS file path>

It can format the output with specifiers such as %b (file size in bytes), %g
(group), %n (name), %o (block size), %r (replication), %u (owner), and %y
(modification date). By default, it uses ‘%y’.

26. usage — displays the usage for a given command, or for all commands if none
is specified.

hadoop fs -usage <command>


27. help — displays help for a given command, or for all commands if none is specified.

hadoop fs -help <command>

28. chmod — is used to change the permissions of a file in the HDFS file system.

hadoop fs -chmod [-R] <mode> <HDFS file path>

The old and new permissions can be compared using hadoop fs -ls or from the
WebUI.

29. appendToFile — this command is used to append one or more files from the
local file system to a file in HDFS.

hadoop fs -appendToFile <Local file path1> <Local file path2> <HDFS file path>
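The effect is a concatenation onto the end of the target, in the order the sources are listed; locally the same idea is cat src1 src2 >> target (file names below are assumptions):

```shell
printf 'header\n' > /tmp/append_target.txt   # stands in for the HDFS target file
printf 'line1\n'  > /tmp/append_a.txt        # first local source
printf 'line2\n'  > /tmp/append_b.txt        # second local source
# appendToFile semantics: sources appended, in order, to the target
cat /tmp/append_a.txt /tmp/append_b.txt >> /tmp/append_target.txt
cat /tmp/append_target.txt   # prints: header, line1, line2 (one per line)
```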


30. checksum — this command is used to retrieve the checksum of a file in the
HDFS file system.

hadoop fs -checksum <HDFS file Path>
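The point of a checksum is that identical content always yields an identical value, so two copies can be compared without transferring them. A local sketch using the standard cksum utility (HDFS itself uses a CRC-based composite checksum; cksum here only illustrates the idea):

```shell
# Two files with identical content
printf 'hadoop\n' > /tmp/cksum_a.txt
printf 'hadoop\n' > /tmp/cksum_b.txt
# Identical content -> identical checksum, so the copies compare cheaply
sum_a=$(cksum < /tmp/cksum_a.txt)
sum_b=$(cksum < /tmp/cksum_b.txt)
[ "$sum_a" = "$sum_b" ] && echo "checksums match"
```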

31. count — it counts the number of directories, files, and bytes at a particular path.

hadoop fs -count [options] <HDFS directory path>

This command also has a few options to modify the output as needed.


32. find — this command is used to find files in the HDFS file system. We need
to provide the expression we are looking for, and can also provide a path if we
want to look for the file in a particular directory.

hadoop fs -find <HDFS directory path> <Expression>

33. getmerge — this command is used to merge the contents of a directory from
HDFS to a file in the local file system.

hadoop fs -getmerge <HDFS directory> <Local file path>


The merged file can be seen in the local file system.
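getmerge concatenates every file under the HDFS directory into one local file; a local sketch of the same concatenation (directory and part-file names are assumptions):

```shell
mkdir -p /tmp/getmerge_demo
printf 'part one\n' > /tmp/getmerge_demo/part-00000
printf 'part two\n' > /tmp/getmerge_demo/part-00001
# getmerge semantics: concatenate the directory's files into a single local file
cat /tmp/getmerge_demo/part-* > /tmp/merged_output.txt
cat /tmp/merged_output.txt   # prints: part one / part two
```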
