Big Data Analytics Lab Experiments

BIG DATA ANALYTICS

Course Code : 21CA32D4 L-T-P-S : 2-0-2-0

EXPERIMENT-1:

(i) Perform setting up and installing Hadoop in its three operating modes:

Standalone, Pseudo-distributed, Fully distributed.

(ii) Use web-based tools to monitor your Hadoop setup.

Hadoop architecture

HDFS (Hadoop Distributed File System) is the storage layer of Hadoop. It is designed to run on
commodity hardware and follows a distributed file system design. HDFS stores data in large
blocks rather than in many small blocks, which suits large sequential reads and writes.

HDFS provides fault tolerance and high availability to the storage layer of a Hadoop cluster.
The data storage nodes in HDFS are:

• NameNode (Master)

• DataNode (Slave)

NameNode: The NameNode works as the master in a Hadoop cluster and guides the DataNodes
(slaves). It is mainly used for storing the metadata, i.e. the data about the data. Metadata
includes the transaction logs that keep track of user activity in the cluster, as well as file
names, sizes, and location information (block numbers, block IDs) of the DataNodes, which the
NameNode uses to find the closest DataNode for faster communication. The NameNode instructs the
DataNodes to carry out operations such as delete, create, and replicate.
DataNode: DataNodes work as slaves. They are mainly used for storing the data in a Hadoop
cluster; the number of DataNodes can range from one to 500 or more. The more DataNodes there
are, the more data the cluster can store, so DataNodes should have a high storage capacity to
hold a large number of file blocks.
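On a running installation, the roles described above can be inspected directly; for example, the following command asks the NameNode to report the DataNodes it knows about and their storage usage:

$ hdfs dfsadmin -report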

(i) Installing Hadoop in its three operating modes: Standalone, Pseudo-distributed, Fully distributed.
Hadoop is a framework written in Java for running applications on large clusters of
commodity hardware and incorporates features similar to those of the Google File System (GFS)
and of the MapReduce computing paradigm. Hadoop’s HDFS is a highly fault-tolerant distributed
file system and, like Hadoop in general, designed to be deployed on low-cost hardware. It provides
high throughput access to application data and is suitable for applications that have large data sets.

Hadoop can be installed in 3 Modes


➢ Local (Standalone)
➢ Pseudo-distributed
➢ Fully distributed

INSTALLING Hadoop 3.3.6 on Ubuntu 20.04 (Pseudo-distributed Mode)


➢ Adding a dedicated Hadoop system user

We will use a dedicated Hadoop user account for running Hadoop. While that is not required, it is
recommended because it helps to separate the Hadoop installation from other software applications
and user accounts running on the same machine.

Settings → System Settings → User Accounts → Unlock → click the + button

Choose the account type: Administrator or Standard

You can set the password here: click on Account Disabled → enter the password → confirm the password
→ click Change

From the console, open a terminal

# To create a group and user

$ sudo addgroup hadoop

$ sudo adduser --ingroup hadoop hduser

# To change the password of the new user

$ sudo su

Enter your login password

# passwd hduser

Enter the new password and confirm it

# exit   # to come out of the root shell

The prerequisites needed to install Hadoop on Ubuntu are pretty simple


Java 11 and SSH

STEP 1 : Install JAVA

# Update the source list

$ sudo apt-get update

# Check whether Java is already installed

$ java -version

# Install the OpenJDK 11 JDK

$ sudo apt-get install openjdk-11-jdk

The full JDK will be placed under /usr/lib/jvm/. The directory name carries an architecture
suffix, for example java-1.11.0-openjdk-i386 (i386 means a 32-bit OS). You can copy it to a name
without the suffix, e.g. java-1.11.0-openjdk:

ivcse@hadoop:~$ cd /usr/lib/jvm/

ivcse@hadoop:/usr/lib/jvm$ cp -r java-1.11.0-openjdk-i386 java-1.11.0-openjdk
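The exact directory name varies with the machine architecture (for example, it may end in -amd64 on a 64-bit system), so check it before copying:

$ ls /usr/lib/jvm/

$ java -version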

STEP 2: Install SSH

$sudo apt-get install ssh

$ssh localhost

# To set up passwordless SSH login

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

$cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys


Now use this command to make sure that your SSH key file has the required permissions:

$ chmod 600 ~/.ssh/authorized_keys

Switch to the new user and verify the passwordless login (see the commands below).
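For example, assuming hduser is the account created earlier (adjust the name if you used a different one):

$ su - hduser

$ ssh localhost   # should log in without asking for a password

$ exit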

STEP 3 : Download the Hadoop tarball from Apache

You can visit the Apache Hadoop website to see a list of versions with their recent change logs
and select the version of your choice.

Google → type Apache Mirrors → select the first site (apache.org) → all Apache software is
displayed → select the hadoop folder → the folders chukwa\, common\ and core\ are displayed →
select the stable release for hadoop-3.3.6 → select hadoop-3.3.6.tar.gz

(or) use the command below

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

Create a new folder (say ABC) in the home directory.

Copy the tar.gz into ABC from the Downloads folder and extract it,

(or) use the tar command to extract the files, as shown below.
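For example, assuming the archive was downloaded to ~/Downloads and the target folder is ~/ABC (names used here only for illustration):

$ mkdir -p ~/ABC

$ tar -xzf ~/Downloads/hadoop-3.3.6.tar.gz -C ~/ABC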

STEP 4 : CONFIGURATION (5 files)

Enter the hadoop-3.3.6 folder; in Hadoop 3.x the configuration files live under etc/hadoop.

Important files: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml


Configuring the ~/.bashrc file

$ gedit ~/.bashrc

export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk

export HADOOP_HOME=/home/ivcse/ABC/hadoop-3.3.6

export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH

(in general, HADOOP_HOME should point to the folder where Hadoop was extracted, e.g. /home/<username>/<folder>/hadoop-3.3.6)
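Reload the environment and check that the hadoop command is now on the PATH before editing the XML files:

$ source ~/.bashrc

$ hadoop version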

etc/hadoop/core-site.xml

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://localhost:8020</value>

</property>

</configuration>

etc/hadoop/mapred-site.xml

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

etc/hadoop/yarn-site.xml
<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

</configuration>

etc/hadoop/hdfs-site.xml

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>

etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk

STEP 6 : Format the NameNode

ivcse@hadoop:~$ hdfs namenode -format

STEP 7 : Start the Processes

Launch the HDFS and YARN services.

hduser@ubuntu:~$ start-dfs.sh

hduser@ubuntu:~$ start-yarn.sh

(or start both at once)

hduser@ubuntu:~$ start-all.sh

# list the running Hadoop daemons

hduser@ubuntu:~$ jps

# stop all daemons when finished

hduser@ubuntu:~$ stop-all.sh
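If the daemons started correctly, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager in addition to Jps. For part (ii) of the experiment, Hadoop's built-in web UIs can be used to monitor the setup; with the default Hadoop 3.x ports these are:

NameNode web UI : http://localhost:9870

ResourceManager web UI : http://localhost:8088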

EXPERIMENT-2:

Implement the following file management tasks in Hadoop:

• Adding files and directories

• Retrieving files

• Deleting files
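A minimal worked session covering these three tasks (sample.txt, /user/hduser and localcopy.txt below are placeholder names used only for illustration):

$ hadoop fs -mkdir -p /user/hduser/input                      # adding a directory

$ hadoop fs -put sample.txt /user/hduser/input                # adding a file

$ hadoop fs -ls /user/hduser/input                            # listing the directory

$ hadoop fs -get /user/hduser/input/sample.txt localcopy.txt  # retrieving a file

$ hadoop fs -rm /user/hduser/input/sample.txt                 # deleting a file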

File management tasks & basic HDFS shell commands


➢ Print hadoop version
Usage : hadoop version

➢ fs : The File System (FS) shell includes various shell-like commands that
directly interact with the Hadoop Distributed File System (HDFS) as well as
other file systems that Hadoop supports.

Usage: hadoop fs (or) hdfs dfs


➢ mkdir : Create a directory in HDFS at given path(s).
Usage: hadoop fs -mkdir [-p] <paths>
Takes path uri’s as argument and creates directories.
Options:

The -p option behavior is much like Unix mkdir -p, creating parent directories
along the path.

Example:

hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2

hadoop fs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir

➢ ls : List the contents of a directory


Usage: hadoop fs -ls [-d] [-h] [-R] <args>
Options:

-d: Directories are listed as plain files.

-h: Format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864).

-R: Recursively list subdirectories encountered.

Example:

hadoop fs -ls /user/hadoop/file1

➢ lsr
Usage: hadoop fs -lsr <args>
Recursive version of ls.

Note: This command is deprecated. Instead use hadoop fs -ls -R

➢ cat
Usage: hadoop fs -cat URI [URI ...]

Copies source paths to stdout.

Example:

hadoop fs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2

hadoop fs -cat file:///file3 /user/hadoop/file4

➢ get
Usage: hadoop fs -get <src> <localdst>
Copy files to the local file system. Example:

hadoop fs -get /user/hadoop/file localfile

hadoop fs -get hdfs://nn.example.com/user/hadoop/file localfile

➢ put
Usage: hadoop fs -put <localsrc> ... <dst>
Copy single src, or multiple srcs, from the local file system to the destination file system.
Also reads input from stdin and writes to the destination file system.

hadoop fs -put localfile /user/hadoop/hadoopfile

hadoop fs -put localfile1 localfile2 /user/hadoop/hadoopdir

hadoop fs -put localfile hdfs://nn.example.com/hadoop/hadoopfile

hadoop fs -put - hdfs://nn.example.com/hadoop/hadoopfile (reads the input from stdin)

➢ copyFromLocal
Usage: hadoop fs -copyFromLocal <localsrc> URI
Similar to put command, except that the source is restricted to a local file
reference.

Options:

The -f option will overwrite the destination if it already exists.

➢ copyToLocal
Usage: hadoop fs -copyToLocal URI <localdst>
Similar to get command, except that the destination is restricted to a local file
reference.

➢ count
Usage: hadoop fs -count <paths>
Count the number of directories, files and bytes under the paths that match the specified file
pattern. The output columns with -count are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME
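For example, with an illustrative path:

hadoop fs -count /user/hadoop/dir1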

➢ cp
Usage: hadoop fs -cp [-f] [-p ] [URI ...] <dest>
Copy files from source to destination (both should be in HDFS). This command allows multiple
sources as well, in which case the destination must be a directory.

Options:

The -f option will overwrite the destination if it already exists.

The -p option will preserve file attributes.

Example:
hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2

hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir

➢ mv
Usage: hadoop fs -mv URI [URI ...] <dest>
Moves files from source to destination (both should be in HDFS). This command allows multiple
sources as well, in which case the destination needs to be a directory. Moving files across file
systems is not permitted.

Example:

hadoop fs -mv /user/hadoop/file1 /user/hadoop/file2

hadoop fs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2

hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1

➢ rm
Usage: hadoop fs -rm [-f] [-r |-R] URI [URI ...]

Delete files specified as args.

Options:

The -f option will not display a diagnostic message or modify the exit status to reflect an error
if the file does not exist.

The -R option deletes the directory and any content under it recursively.

The -r option is equivalent to -R.


Example:

hadoop fs -rm hdfs://nn.example.com/file /user/hadoop/emptydir

➢ rmdir
Usage: hadoop fs -rmdir [--ignore-fail-on-non-empty] URI [URI ...]

Delete a directory.

Options:

--ignore-fail-on-non-empty: When using wildcards, do not fail if a directory still contains files.

Example:

hadoop fs -rmdir /user/hadoop/emptydir

➢ rmr

Usage: hadoop fs -rmr URI [URI ...]

Recursive version of delete.

Note: This command is deprecated. Instead use hadoop fs -rm -r

➢ df
Usage: hadoop fs -df [-h] URI [URI ...]
Displays free space.

Options:

The -h option will format file sizes in a “human-readable” fashion (e.g 64.0m
instead of 67108864)

Example:

hadoop fs -df /user/hadoop/dir1


➢ du
Usage: hadoop fs -du URI [URI ...]
Displays sizes of files and directories contained in the given directory, or the length of a file
in case it is just a file.

Example:

hadoop fs -du /user/hadoop/dir1 /user/hadoop/file1

hdfs://nn.example.com/user/hadoop/dir1

➢ help

Usage: hadoop fs -help


➢ setrep
Usage: hadoop fs -setrep [-R] [-w] <numReplicas> <path>
Changes the replication factor of a file. If path is a directory, then the command recursively
changes the replication factor of all files under the directory tree rooted at path.

Options:

The -w flag requests that the command wait for the replication to complete. This can potentially
take a very long time.

The -R flag is accepted for backwards compatibility. It has no effect.

Example:

hadoop fs -setrep -w 3 /user/hadoop/dir1

➢ tail
Usage: hadoop fs -tail URI
Displays last kilobyte of the file to stdout.

Example:

hadoop fs -tail pathname

➢ checksum
Usage: hadoop fs -checksum URI
Returns the checksum information of a file.

Example:

hadoop fs -checksum hdfs://nn1.example.com/file1

hadoop fs -checksum file:///etc/hosts

➢ chgrp
Usage: hadoop fs -chgrp [-R] GROUP URI [URI ...]

Change group association of files. The user must be the owner of files, or else a super-user.
Additional information is in the Permissions Guide.

Options

The -R option will make the change recursively through the directory structure.

➢ chmod
Usage: hadoop fs -chmod [-R] <MODE[,MODE]... | OCTALMODE>
URI [URI ...]
Change the permissions of files. With -R, make the change recursively through the directory
structure. The user must be the owner of the file, or else a super-user. Additional information
is in the Permissions Guide.

Options

The -R option will make the change recursively through the directory structure.

➢ chown
Usage: hadoop fs -chown [-R] [OWNER][:[GROUP]] URI [URI ...]
Change the owner of files. The user must be a super-user. Additional information
is in the Permissions Guide.

Options

The -R option will make the change recursively through the directory structure.
