
MASTER OF COMPUTER APPLICATIONS

BIG DATA ANALYTICS LAB

LAB REPORT
III SEMESTER

Faculty In-Charge
Prof. R Padmaja
Assistant Professor, MCA Department

SREENIVASA INSTITUTE OF TECHNOLOGY AND MANAGEMENT STUDIES


(Autonomous)
(Approved by AICTE, New Delhi, Affiliated to JNTUA, Anantapuramu)
Murukambattu, Chittoor-517127
2023-2024
SREENIVASA INSTITUTE OF TECHNOLOGY AND MANAGEMENT STUDIES
(Autonomous)

Chittoor-517127

MCA DEPARTMENT

Reg. No:

BIG DATA ANALYTICS LAB

This is to certify that this is the bonafide record of work done in the laboratory by the candidate studying MCA III Semester during the year 2023-2024.

No. of experiments conducted: No. of experiments attended:

Faculty In-Charge HOD

Submitted for the practical exam held on .

Internal Examiner External Examiner


INDEX

S. No   Date   Name of the Experiment   Page No.   Initials

1    Basic HDFS Commands

2    Word Count Program Using Map Reduce Component

3    Weather Data Analysis Using MapReduce

4    Pig Data Processing Operators

5    Pig Eval Functions

6    Pig String Functions

7    Pig Math Functions

8    Weather Dataset Analysis Using Pig Latin

9    Basic Hive Commands

10   Hive Partitions
RUBRICS FOR BIG DATA ANALYTICS LAB

Conduct Experiments (CO1)
  Excellent (3): Student successfully completes the experiment, records the data, analyses the experiment's main topics, and explains the experiment concisely and well.
  Good (2): Student successfully completes the experiment, records the data, and analyses the experiment's main topics.
  Fair (1): Student successfully completes the experiment, records the data, but is unable to analyse it.

Analysis and Synthesis (CO2)
  Excellent (3): Thorough analysis of the problem designed.
  Good (2): Reasonable analysis of the given problem.
  Fair (1): Improper analysis of the given problem.

Design (CO3)
  Excellent (3): Student understands what needs to be tested, designs an appropriate experiment, and explains the experiment concisely and well.
  Good (2): Student understands what needs to be tested and designs an appropriate experiment.
  Fair (1): Student understands what needs to be tested but does not design an appropriate experiment.

Complex Analysis & Conclusion (CO4)
  Excellent (3): Thorough comprehension through analysis/synthesis.
  Good (2): Reasonable comprehension through analysis/synthesis.
  Fair (1): Improper comprehension through analysis/synthesis.

Use modern tools in engineering practice (CO5)
  Excellent (3): Student uses the tools to measure correctly and understands the limitations of the hardware.
  Good (2): Student uses the tools to measure correctly.
  Fair (1): Student uses the tools correctly but is unable to measure properly.

Report Writing (CO6)
  Excellent (3): Status report with a clear and logical sequence of parameters, using excellent language.
  Good (2): Status report with a logical sequence of parameters, using understandable language.
  Fair (1): Status report not properly organized.

Lab safety (CO7)
  Excellent (3): Student demonstrates good understanding of lab safety and follows it.
  Good (2): Student demonstrates good understanding of lab safety.
  Fair (1): Student demonstrates little knowledge of lab safety.

Ability to work in teams (CO8)
  Excellent (3): Performance on teams is excellent, with clear evidence of equal distribution of tasks and effort.
  Good (2): Performance on teams is good, with equal distribution of tasks and effort.
  Fair (1): Performance on teams is acceptable, with one or more members carrying a larger amount of the effort.

Continuous learning (CO9)
  Excellent (3): Highly enthusiastic towards continuous learning.
  Good (2): Interested in continuous learning.
  Fair (1): Inadequate interest in continuous learning.
INDEX SHEET

S. No   Experiment Name

1    Basic HDFS Commands
2    Word Count Program Using Map Reduce Component
3    Weather Data Analysis Using MapReduce
4    Pig Data Processing Operators
5    Pig Eval Functions
6    Pig String Functions
7    Pig Math Functions
8    Weather Dataset Analysis Using Pig Latin
9    Basic Hive Commands
10   Hive Partitions
ATTAINMENT LEVELS

S. No   Experiment Name

1    Basic HDFS Commands
2    Word Count Program Using Map Reduce Component
3    Weather Data Analysis Using MapReduce
4    Pig Data Processing Operators
5    Pig Eval Functions
6    Pig String Functions
7    Pig Math Functions
8    Weather Dataset Analysis Using Pig Latin
9    Basic Hive Commands
10   Hive Partitions

Total Attainment (B1)
1) Basic HDFS Commands

1) ls: To display a list of the contents of a directory

[root@sandbox ~] # hadoop fs -ls /


Found 14 items
drwxr-xr-x - root hdfs 0 2019-10-14 09:08 /MR
drwxrwxrwx - yarn hadoop 0 2016-03-14 14:19 /app-logs

drwxr-xr-x - hdfs hdfs 0 2016-03-14 14:25 /apps

drwxr-xr-x - yarn hadoop 0 2016-03-14 14:19 /ats

drwxr-xr-x - hdfs hdfs 0 2016-03-14 14:50 /demo

drwxr-xr-x - hdfs hdfs 0 2016-03-14 14:19 /hdp

drwxr-xr-x - mapred hdfs 0 2016-03-14 14:19 /mapred

drwxrwxrwx - mapred hadoop 0 2016-03-14 14:19 /mr-history

drwxr-xr-x - root hdfs 0 2019-10-14 10:12 /myfiles

drwxr-xr-x - root hdfs 0 2019-10-14 08:52 /new

drwxr-xr-x - hdfs hdfs 0 2016-03-14 14:42 /ranger

drwxrwxrwx - spark hadoop 0 2019-02-21 11:51 /spark-history

drwxrwxrwx - hdfs hdfs 0 2016-03-14 14:31 /tmp

drwxr-xr-x - hdfs hdfs 0 2019-10-14 08:48 /user

2) mkdir: To Create a Directory in HDFS

[root@sandbox ~]# hadoop fs -mkdir new


[root@sandbox ~]# hadoop fs -ls /
Found 14 items
drwxr-xr-x - root hdfs 0 2019-10-14 09:08 /MR
drwxrwxrwx - yarn hadoop 0 2016-03-14 14:19 /app-logs

drwxr-xr-x - hdfs hdfs 0 2016-03-14 14:25 /apps

drwxr-xr-x - yarn hadoop 0 2016-03-14 14:19 /ats

drwxr-xr-x - hdfs hdfs 0 2016-03-14 14:50 /demo

drwxr-xr-x - hdfs hdfs 0 2016-03-14 14:19 /hdp



drwxr-xr-x - mapred hdfs 0 2016-03-14 14:19 /mapred

drwxrwxrwx - mapred hadoop 0 2016-03-14 14:19 /mr-history

drwxr-xr-x - root hdfs 0 2019-10-14 10:12 /myfiles

drwxr-xr-x - root hdfs 0 2019-10-14 08:52 /new

drwxr-xr-x - hdfs hdfs 0 2016-03-14 14:42 /ranger

drwxrwxrwx - spark hadoop 0 2019-02-21 11:51 /spark-history

drwxrwxrwx - hdfs hdfs 0 2016-03-14 14:31 /tmp

drwxr-xr-x - hdfs hdfs 0 2019-10-14 08:48 /user

3) put (or) copyFromLocal: copies the file or directory from the local file system to HDFS

[root@sandbox ~]# cat > FirstFile


This is a Unix File created in Unix Local system which will be pushed to HDFS
^Z
[1]+ Stopped cat > FirstFile

[root@sandbox ~]# cat FirstFile


This is a Unix File created in Unix Local system which will be pushed to HDFS
[root@sandbox ~]# hadoop fs -put FirstFile new/
[root@sandbox ~]# hadoop fs -ls new/
Found 1 items
-rw-r--r-- 3 root hdfs 78 2019-10-31 10:03 new/FirstFile
[root@sandbox ~]# hadoop fs -cat new/FirstFile
This is a Unix File created in Unix Local system which will be pushed to HDFS
[root@sandbox local]# hadoop fs -copyFromLocal file1 new
[root@sandbox local]# hadoop fs -ls new
Found 2 items
-rw-r--r-- 3 root hdfs 78 2019-10-31 10:03 new/FirstFile
-rw-r--r-- 3 root hdfs 42 2019-10-31 10:21 new/file1
4) get (or) copyToLocal: copies the file or directory from HDFS to the local file system
[root@sandbox ~]# hadoop fs -get new/FirstFile local/
[root@sandbox ~]# cd local
[root@sandbox local]# ls
FirstFile

[root@sandbox local]# cat FirstFile


This is a Unix File created in Unix Local system which will be pushed to HDFS
[root@sandbox ~]# hadoop fs -copyToLocal new/file1 newLocal/
[root@sandbox ~]# ls newLocal
file1
5) cp: copies the file or directory from one location to another in HDFS
[root@sandbox local]# hadoop fs -mkdir /latest
[root@sandbox local]# hadoop fs -cp new/FirstFile /latest
[root@sandbox local]# hadoop fs -ls /latest
Found 1 items
-rw-r--r-- 3 root hdfs 78 2019-10-31 10:16 /latest/FirstFile
[root@sandbox local]# hadoop fs -cat /latest/FirstFile
This is a Unix File created in Unix Local system which will be pushed to HDFS
6) mv: moves the file or directory from one location to another in HDFS
[root@sandbox local]# cat > dummy
This is a dummy file which is created in Unix
^Z
[2]+ Stopped cat > dummy
[root@sandbox local]# cat dummy
This is a dummy file which is created in Unix
[root@sandbox local]# hadoop fs -put dummy new/
[root@sandbox local]# hadoop fs -mv new/dummy /latest/
[root@sandbox local]# hadoop fs -ls latest
[root@sandbox local]# hadoop fs -ls /latest
Found 2 items
-rw-r--r-- 3 root hdfs 78 2019-10-31 10:16 /latest/FirstFile
-rw-r--r-- 3 root hdfs 46 2019-10-31 10:18 /latest/dummy
7) rm, rmdir: removes the file or directory from HDFS
[root@sandbox ~]# hadoop fs -ls new
Found 2 items
-rw-r--r-- 3 root hdfs 78 2019-10-31 10:03 new/FirstFile
-rw-r--r-- 3 root hdfs 42 2019-10-31 10:21 new/file1
[root@sandbox ~]# hadoop fs -rm new/file1
19/10/31 10:33:42 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier
interval = 0 minutes.
Moved: 'hdfs://sandbox.hortonworks.com:8020/user/root/new/file1' to trash at:
hdfs://sandbox.hortonworks.com:8020/user/root/.Trash/Current

[root@sandbox ~]# hadoop fs -ls new


Found 1 items
-rw-r--r-- 3 root hdfs 78 2019-10-31 10:03 new/FirstFile
[root@sandbox ~]# hadoop fs -rmdir new
rmdir: `new': Directory is not empty
[root@sandbox ~]# hadoop fs -rm-r new
-rm-r: Unknown command
[root@sandbox ~]# hadoop fs -rm -r new
19/10/31 10:35:49 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier
interval = 0 minutes.
Moved: 'hdfs://sandbox.hortonworks.com:8020/user/root/new' to trash at:
hdfs://sandbox.hortonworks.com:8020/user/root/.Trash/Current
8) touchz : To Create an empty file in HDFS
[root@sandbox ~]# hadoop fs -ls /rpk
Found 1 items
-rw-r--r-- 3 root hdfs 95 2021-01-18 09:42 /rpk/states.txt
[root@sandbox ~]# hadoop fs -touchz /rpk/empty.txt
[root@sandbox ~]# hadoop fs -ls /rpk
Found 2 items
-rw-r--r-- 3 root hdfs 0 2021-01-30 08:34 /rpk/empty.txt
-rw-r--r-- 3 root hdfs 95 2021-01-18 09:42 /rpk/states.txt
9) count : To Count the number of Directories, files and bytes under the path

[root@sandbox ~]# hadoop fs -count /rpk


1 2 95 /rpk

10) usage : Returns the Usage for an Individual command

[root@sandbox ~]# hadoop fs -usage rm

Usage: hadoop fs [generic options] -rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...

11) help: Displays help for given command or all commands if none is specified
[root@sandbox ~]# hadoop fs -help mkdir
-mkdir [-p] <path> ... :
Create a directory in specified location.
-p Do not fail if the directory already exists

12) du : command to check the file size


[root@sandbox ~]# hadoop fs -du -s /rpk/states.txt

95 /rpk/states.txt

13) appendToFile : Appends the contents of all given local files to the given destination file on HDFS
[root@sandbox ~]# cat > localfile
This is to illustrate appendToFile command
^Z
[1]+ Stopped cat > localfile
[root@sandbox ~]# hadoop fs -appendToFile localfile rpk/empty.txt
[root@sandbox ~]# hadoop fs -cat rpk/empty.txt
This is to illustrate appendToFile command
14) tail : shows the last 1KB of the given file
[root@sandbox ~]# hadoop fs -tail /rpk/states.txt
100,smith,AP
101,jones,TN
102,KIng,AP
103,ram,TN
104,sita,K
105,Lakshman,Kerala
106,aaa,Kerala
15) stat : Prints statistics about the file/directory
[root@sandbox ~]# hadoop fs -stat %b /rpk/states.txt
95
[root@sandbox ~]# hadoop fs -stat %g /rpk/states.txt
hdfs
[root@sandbox ~]# hadoop fs -stat %n /rpk/states.txt
states.txt
[root@sandbox ~]# hadoop fs -stat %o /rpk/states.txt
134217728
[root@sandbox ~]# hadoop fs -stat %r /rpk/states.txt
3
[root@sandbox ~]# hadoop fs -stat %u /rpk/states.txt
root
[root@sandbox ~]# hadoop fs -stat %y /rpk/states.txt
2021-01-18 09:42:33
2 & 3) Word Count Program and Weather Data Analysis Using MapReduce

Step 1: Right click in the Package Explorer window and select New -> Java Project to create a Java project called "Hadoop_Exer".

Step 2: Right click on the "Hadoop_Exer" project and select New -> Package to create a package called "main.java.com.training" in the Java project "Hadoop_Exer".

Step 3: Right click on the "main.java.com.training" package and select New -> Class to create a class called "WordCount" in that package.
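The WordCount source appears in this report only as Eclipse screenshots. For reference, a minimal sketch of a typical Hadoop WordCount program, written against the standard org.apache.hadoop.mapreduce API, is given below; the package name follows Step 2, and the exact code used in the lab may differ.

package main.java.com.training;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every word in the input line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts received for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: args[0] = HDFS input directory (e.g. InputFiles), args[1] = output directory
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Once exported as wc_new.jar (Step 5), the driver takes the HDFS input and output directories as its two arguments.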

Step 4: Right click on the "Hadoop_Exer" project, select Build Path -> Configure Build Path, select the Libraries tab, remove all JAR files and add the external Hadoop JAR files needed to execute the program.

Step 5: Select Export -> Java -> JAR file, and give the JAR file name along with its destination.

Step 6: Open WinSCP to transfer fruits.txt and wc_new.jar from Windows to Unix. Drag wc_new.jar from D:\ to the Unix root directory.

Step 7: Create a directory called "Inputs" in Unix and move the input file "fruits.txt" from the root directory into the Inputs directory.

Step 8: Create a directory called "InputFiles" in Hadoop.



Step 9: Transfer fruits.txt, which is in the Inputs directory, to Hadoop's directory "InputFiles".

Step 10: Type the hadoop jar command (of the general form: hadoop jar <jar file> <main class> <input directory> <output directory>) to execute the code.



Step 11: Check the output folder called "OutputFiles".

Step 12: Display the output.

Step 13: Create another Java class called "WeatherData" in the same package as WordCount.

Step 14: Type the weather data analysis code.
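The weather analysis code also appears only as a screenshot. A minimal sketch is shown below; it assumes the same NCDC record layout that the Pig weather script later in this report uses (year in columns 15-19, air temperature in columns 88-92, quality code in column 92) and computes the maximum temperature recorded in each year. The actual program used in the lab may differ.

package main.java.com.training;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WeatherData {

    // Mapper: extracts (year, temperature) from each record, skipping
    // missing readings (9999) and readings with a bad quality code
    public static class MaxTempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final int MISSING = 9999;

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            if (line.length() < 93) {
                return; // skip short or malformed records
            }
            String year = line.substring(15, 19);
            int airTemperature = Integer.parseInt(line.substring(88, 92));
            String quality = line.substring(92, 93);
            if (airTemperature != MISSING && quality.matches("[01459]")) {
                context.write(new Text(year), new IntWritable(airTemperature));
            }
        }
    }

    // Reducer: keeps the maximum temperature seen for each year
    public static class MaxTempReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int maxValue = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
            }
            context.write(key, new IntWritable(maxValue));
        }
    }

    // Driver: args[0] = HDFS input path (e.g. the WeatherDataset.txt file in InputFiles),
    // args[1] = HDFS output directory
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "max temperature");
        job.setJarByClass(WeatherData.class);
        job.setMapperClass(MaxTempMapper.class);
        job.setCombinerClass(MaxTempReducer.class);
        job.setReducerClass(MaxTempReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}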

Step 15: Transfer the dataset from Windows to the Unix folder "Inputs".

Step 16: Transfer the Unix file "WeatherDataset.txt" from Unix to Hadoop.



Step 17: Check that the dataset exists in the Hadoop folder "InputFiles".

Step 18: Execute the WeatherData program.

Step 19: Display the results.


4) TO ILLUSTRATE THE PIG DATA PROCESSING OPERATORS

1) LOADING AND STORING DATA

LOAD OPERATOR

1) Create a file called sample

[root@sandbox ~]# cat > sample
100 Ram MCA
101 Sita MCA
102 Lakshman MBA
103 Bharat MBA
104 Smith MCA
105 Jones MBA

2) Create a directory called operators and move sample into it
[root@sandbox ~]# mkdir operators
[root@sandbox ~]# mv sample operators
[root@sandbox ~]# ls operators
sample

3) Create a directory called hadoop_dir in Hadoop
[root@sandbox ~]# hadoop fs -mkdir hadoop_dir

4) Transfer the Unix file sample to the Hadoop directory
[root@sandbox ~]# hadoop fs -put operators/sample hadoop_dir

5) Go to Pig
[root@sandbox ~]# pig
grunt>

6) Load the sample file from the Hadoop directory into the Pig relation A
grunt> A = Load 'hadoop_dir/sample' using PigStorage('\t') as (sno:int,sname:charArray,branch:charArray);

7) Display the contents of relation A
grunt> dump A;
(100,Ram,MCA)
(101,Sita,MCA)
(102,Lakshman,MBA)
(103,Bharat,MBA)
(104,Smith,MCA)
(105,Jones,MBA)

STORE OPERATOR

1) Display the contents of States.txt


[root@sandbox ~]# cat states.txt
100,smith,AP
101,jones,TN
102,KIng,AP
103,ram,TN
104,sita,K
105,Lakshman,Kerala
106,aaa,Kerala
2) Create a Directory in Hadoop as rp1
[root@sandbox ~]# hadoop fs -mkdir rp1
3) Transfer the Unix File to Hadoop’s Directory rp1
[root@sandbox ~]# hadoop fs -put states.txt rp1
4) Display the contents of Hadoop directory rp1’s contents
[root@sandbox ~]# hadoop fs -ls rp1
Found 1 items
-rw-r--r-- 3 root hdfs 95 2021-01-28 05:16 rp1/states.txt
5) Load the file into the Pig relation A and display the relation contents
grunt> A = Load 'rp1/states.txt' using PigStorage(',') as (sno:int,sname:chararray,states:chararray);
grunt> dump A;
(100,smith,AP)
(101,jones,TN)
(102,KIng,AP)
(103,ram,TN)
(104,sita,K)
(105,Lakshman,Kerala)
(106,aaa,Kerala)
6) Get only the AP employees and display the results
grunt> B = filter A by states == 'AP';
grunt> dump B;
(100,smith,AP)
(102,KIng,AP)
7) Store the Pig relation contents into the Hadoop directory rp2
grunt> store B into 'rp2';
8) Check the contents of the Hadoop directory
[root@sandbox ~]# hadoop fs -ls rp2
Found 2 items
-rw-r--r-- 3 root hdfs 0 2021-01-28 05:20 rp2/_SUCCESS
-rw-r--r-- 3 root hdfs 25 2021-01-28 05:20 rp2/part-m-00000
9) Display the file contents
[root@sandbox ~]# hadoop fs -cat rp2/part-m-00000
100 smith AP
102 KIng AP

2) FILTERING DATA

FILTER OPERATOR

1) To display only the MCA students using the FILTER operator (relation A here is the sample relation loaded above)

grunt> B = FILTER A BY branch == 'MCA';
grunt> dump B;
(100,Ram,MCA)
(101,Sita,MCA)
(104,Smith,MCA)
2) To display only the MBA students using the FILTER operator
grunt> C = FILTER A BY branch == 'MBA';
grunt> dump C;
(102,Lakshman,MBA)
(103,Bharat,MBA)
(105,Jones,MBA)

3) GROUPING AND JOINING DATA

JOIN Operator

GROUP Operator

COGROUP Operator

CROSS Operator

JOIN OPERATOR
1) Create a file called join1
[root@sandbox ~]# cat > join1
joe 2
hank 4
ali 0
eve 3

hank 2
2) Create another file called join2
[root@sandbox ~]# cat > join2
2 tie
4 coat
1 scarf
3 hat
3) Create a directory called h1 in Hadoop
[root@sandbox ~]# hadoop fs -mkdir h1
4) Push both join1 and join2 to the h1 directory
[root@sandbox ~]# hadoop fs -put join1 h1
[root@sandbox ~]# hadoop fs -put join2 h1
5) Load the join2 file as a Pig relation
grunt> A = LOAD 'h1/join2' USING PigStorage ('\t') AS (id:int,name:charArray);
grunt> DUMP A;
(2,tie)
(4,coat)
(1,scarf)
(3,hat)

6) Load the join1 file as a Pig relation
grunt> B = LOAD 'h1/join1' USING PigStorage ('\t') AS (name:charArray,id:int);
grunt> DUMP B;
(joe,2)
(hank,4)
(ali,0)
(eve,3)
(hank,2)
7) Join relations A and B
grunt> C = JOIN A BY $0, B BY $1;
grunt> DUMP C;
(2,tie,hank,2)
(2,tie,joe,2)
(3,hat,eve,3)
(4,coat,hank,4)

COGROUP OPERATOR:

grunt>D = COGROUP A BY $0, B BY $1;

grunt>DUMP D;

(0,{},{(ali,0)})

(1,{(1,scarf)},{})

(2,{(2,tie)},{(hank,2),(joe,2)})

(3,{(3,hat)},{(eve,3)})

(4,{(4,coat)},{(hank,4)})

GROUP OPERATOR

1) Load the student1 file from Hadoop into the Pig relation "Records"

Records = LOAD '/pigfunctions/student1' using PigStorage('\t') as (name:charArray, subject:charArray, marks:int);

Dump Records;

2) To Group students Subjectwise

sub = GROUP Records by subject;



Dump sub;

Dump Records; // Displays the original records

(james, wp,78)

(smith, dsat,65)

(james, dsat,73)

(jones, unix,89)

(smith, unix,50)

(jones, unix,85)

Dump sub; // Displays subject wise marks

( wp,{(james, wp,78)})

( dsat,{(james, dsat,73),(smith, dsat,65)})

( unix,{(jones, unix,85),(smith, unix,50),(jones, unix,89)})

CROSS OPERATOR

1) Create a file called cross

[root@sandbox ~]# cat>cross
1 2
3 4

2) Create another file called cross1

[root@sandbox ~]# cat>cross1
a b c
e f g

3) Push both cross and cross1 into Hadoop's directory h1

[root@sandbox ~]# hadoop fs -put cross h1

[root@sandbox ~]# hadoop fs -put cross1 h1



4) Load both cross and cross1 from Hadoop's directory into Pig

grunt> G = LOAD 'h1/cross' USING PigStorage ('\t') AS (fname:int,lname:int);

DUMP G;

(1,2)

(3,4)

grunt> H = LOAD 'h1/cross1' USING PigStorage ('\t') AS (fname:chararray,lname:chararray);

DUMP H;

(a,b)

(e,f)

5) Apply the CROSS operator

grunt> I = CROSS G,H;

DUMP I;

(3,4,e,f)

(3,4,a,b)

(1,2,e,f)

(1,2,a,b)

4) SORTING DATA

ORDER Operator

1) Create a File called “order”


[root@sandbox ~]# cat>order
104 ram mca
105 sita mca
102 lakshman mba
103 bharatha mba
2) Push the file into hadoop’s directory h1

[root@sandbox ~]# hadoop fs -put order h1


3) Load the Hadoop file into Pig
grunt> J = LOAD 'h1/order' USING PigStorage ('\t') AS (id:int,name:chararray,branch:chararray);
DUMP J;
(104,ram,mca)
(105,sita,mca)
(102,lakshman,mba)
(103,bharatha,mba)
4) Sort by id using the ORDER operator
grunt> K = ORDER J BY $0;
DUMP K;

(102,lakshman,mba)
(103,bharatha,mba)
(104,ram,mca)
(105,sita,mca)
5) COMBINING AND SPLITTING DATA

UNION Operator for combining Pig relations
SPLIT Operator for splitting Pig relations

UNION OPERATOR

1) Create a file called “union”


[root@sandbox ~]# cat>union
2 3
1 2
2 4
2) Create another file called "union1"
[root@sandbox ~]# cat>union1
z x 8
w y 1
3) Push both union and union1 into hadoop’s directory “h1”
[root@sandbox ~]# hadoop fs -put union h1
[root@sandbox ~]# hadoop fs -put union1 h1
4) Load both union and union1 from Hadoop into Pig
grunt> L = LOAD 'h1/union' USING PigStorage ('\t') AS (id:int,id1:int);
DUMP L;

(2,3)
(1,2)
(2,4)
grunt> M = LOAD 'h1/union1' USING PigStorage ('\t') AS (fname:chararray,lname:chararray,id:int);
DUMP M;
(z,x,8)
(w,y,1)
5) Apply the UNION operator
grunt> N = UNION L,M;
DUMP N;

(z,x,8)
(w,y,1)
(2,3)
(1,2)
(2,4)
SPLIT OPERATOR:

1) Create a file called split


[root@sandbox ~]# cat>split
100 smith 24 hyd
101 ivan 20 chennai
102 bayross 23 delhi
103 tornwhite 25 bay
104 leon 26 chennai
2) Push the file to Hadoop’s Directory “h1”
[root@sandbox ~]# hadoop fs -put split h1
3) Load the file from Hadoop into Pig
grunt> O = LOAD 'h1/split' USING PigStorage ('\t') AS (id:int,name:chararray,age:int,location:chararray);
DUMP O;
(100,smith,24,hyd)
(101,ivan,20,chennai)
(102,bayross,23,delhi)

(103,tornwhite,25,bay)
(104,leon,26,chennai)
4) Apply the SPLIT operator
grunt> SPLIT O INTO Q IF age<=23, R IF age>23;
DUMP Q;
(101,ivan,20,chennai)
(102,bayross,23,delhi)
DUMP R;
(100,smith,24,hyd)
(103,tornwhite,25,bay)
(104,leon,26,chennai)

LIMIT OPERATOR

1) To display the top 2 records of relation A

grunt> A = Load 'hadoop_dir/sample' using PigStorage('\t') as (sno:int,sname:chararray,branch:chararray);
grunt> D = LIMIT A 2;
grunt> dump D;
(100,Ram,MCA)

(101,Sita,MCA)

5) Pig Code to find

1) Total marks obtained by each student
2) Average marks obtained by each student
3) Number of subjects taken by each student
using Pig EVAL FUNCTIONS
$ vi eval1.pig

Records = LOAD '/pigfunctions/student1' using PigStorage('\t') as (name:charArray,subject:charArray,marks:int);
Dump Records;
Name = GROUP Records by name;
Dump Name;
tot = forEach Name generate group,SUM(Records.marks);
Dump tot;
per = forEach Name generate group,AVG(Records.marks);
Dump per;
count = forEach Name generate group,COUNT(Records.marks);
Dump count;

OUTPUT

Dump Records; // Displays the original records

(james, wp,78)

(smith, dsat,65)

(james, dsat,73)

(jones, unix,89)

(smith, unix,50)

(jones, unix,85)

Dump Name; // Displays student wise records

(james,{(james, dsat,73),(james, wp,78)})

(jones,{(jones, unix,85),(jones, unix,89)})

(smith,{(smith, unix,50),(smith, dsat,65)})

Dump tot ; ( To display Total Marks obtained by each student)

(james,151)

(jones,174)

(smith,115)

Dump per ; (Average marks obtained by each student)

(james,75.5)

(jones,87.0)

(smith,57.5)

Dump count ; (Number of Subjects taken by each student)

(james,2)

(jones,2)

(smith,2)

5) Pig Code to find (contd.)

1) Maximum marks secured in each subject
2) Minimum marks secured in each subject
using Pig EVAL FUNCTIONS
$ vi eval2.pig

Records = LOAD '/pigfunctions/student1' using PigStorage('\t') as (name:charArray,subject:charArray,marks:int);
Dump Records;
sub = GROUP Records by subject;
Dump sub;
max = forEach sub generate group,MAX(Records.marks);
Dump max;
min = forEach sub generate group,MIN(Records.marks);
Dump min;

OUTPUT

Dump Records; // Displays the original records

(james, wp,78)

(smith, dsat,65)

(james, dsat,73)

(jones, unix,89)

(smith, unix,50)

(jones, unix,85)

Dump sub; // Displays subject wise marks

( wp,{(james, wp,78)})

( dsat,{(james, dsat,73),(smith, dsat,65)})

( unix,{(jones, unix,85),(smith, unix,50),(jones, unix,89)})

Dump min; // Displays subject wise Minimum mark

( wp,78)

( dsat,65)

( unix,50)

Dump max; // Displays subject wise Maximum marks

( wp,78)

( dsat,73)

( unix,89)

6) To Illustrate the Pig String Functions


A = LOAD '/pigfunctions/student1' using PigStorage('\t') as(name:charArray,subject:charArray,marks:int);

Dump A;

B = forEach A generate name,LOWER(name);

Dump B;

C = forEach A generate name,UPPER(name);

Dump C;

D =forEach A generate name,SUBSTRING(name,1,3);

Dump D;

E = forEach A generate name,INDEXOF(name,'s');

Dump E;

F = forEach A generate name,ENDSWITH(name,'s'),STARTSWITH(name,'j');

Dump F;



OUTPUT

Dump A; // Displays Original Records


(james, wp,78)

(smith, dsat,65)

(james, dsat,73)

(jones, unix,89)

(smith, unix,50)

(jones, unix,85)

Dump B;// Displays names in Lower Case

(james,james)

(smith,smith)

(james,james)

(jones,jones)

(smith,smith)

(jones,jones)

Dump C; // Displays Names in UpperCase

(james,JAMES)

(smith,SMITH)

(james,JAMES)

(jones,JONES)

(smith,SMITH)

(jones,JONES)

Dump D; // Displays the substring of the name string

(james,am)

(smith,mi)

(jones,on)

Dump E; // Displays the index of the letter 's'

(smith,0)

(james,4)

(king,-1)

(jones,4)

(rama,-1)

(sita,0)

Dump F; // Displays whether each name ends with 's' and whether it starts with 'j'

(smith,false,false)

(james,true,true)

(king,false,false)

(jones,true,true)

(rama,false,false)

(sita,false,false)

7) To Illustrate the Pig Math Functions


X = LOAD '/pigex/student4' using PigStorage ('\t') as (name:charArray,GPA:float);

Dump X;

Y = forEach X generate GPA,ROUND(GPA);

Dump Y;

Z = forEach X generate GPA,FLOOR(GPA);

Dump Z;

U = forEach X generate GPA,CEIL(GPA);

Dump U;

OUTPUT

Dump X; // Displays Original data

(smith,7.8)

(james,8.7)

(king,6.9)

(jones,8.5)

(rama,7.7)

(sita,8.9)

Dump Y;// Displays the ROUND values of GPA

(7.8,8)

(8.7,9)

(6.9,7)

(8.5,9)

(7.7,8)

(8.9,9)

Dump Z; // Displays the FLOOR values of GPA

(7.8,7.0)
(8.7,8.0)
(6.9,6.0)
(8.5,8.0)
(7.7,7.0)
(8.9,8.0)

Dump U; // Displays the CEIL values of GPA

(7.8,8.0)
(8.7,9.0)
(6.9,7.0)
(8.5,9.0)
(7.7,8.0)
(8.9,9.0)

8) Pig Code to Analyse the Weather Dataset

$ vi Weather.pig
A = Load 'rp/WeatherDataset.txt' as (line:chararray);
B = foreach A generate SUBSTRING(line,15,19),SUBSTRING(line,88,92),SUBSTRING(line,92,93);
store B into 'rp/revised5';
E = load 'rp/revised5/part-m-00000' using PigStorage('\t') as (year:int,temp:int,quality:int);
filtered_records = FILTER E by temp!=9999 AND quality IN (0,1,4,5,9);
grouped_records = GROUP filtered_records by year;
max_temp = FOREACH grouped_records GENERATE group,MAX(filtered_records.temp);
dump max_temp;

$ pig -x mapreduce Weather.pig


OUTPUT
(1901,250)
(1902,72)
(1903,56)
(1904,44)
(1905,44)

9) To Illustrate Hive Tables

1) Create a file in Unix as “emp”


[root@sandbox ~]# cat emp
100,smith,Manager,50000
101,jones,Asst.Manager,40000
102,KIng,Clerk, 15000
103,ram,SE,25000
104,sita,SE,25000
105,Lakshman,Analyst,30000
2) Push the Unix file “emp” to hdfs directory “rpk”
[root@sandbox ~]# hadoop fs -put emp /rpk/ 3)
Check the contents of the file at HDFS
[root@sandbox ~]# hadoop fs -ls /rpk
Found 2 items
-rw-r--r-- 3 root hdfs 138 2021-02-01 08:53 /rpk/emp
-rw-r--r-- 3 root hdfs 0 2021-01-30 08:34 /rpk/empty.txt
[root@sandbox ~]# hadoop fs -cat /rpk/emp
100,smith,Manager,50000
101,jones,Asst.Manager,40000
102,KIng,Clerk, 15000
103,ram,SE,25000
104,sita,SE,25000
105,Lakshman,Analyst,30000
4) Create a table in Hive (the row format clause tells Hive that the fields in emp are comma-separated)
hive> create table employee(eno int,ename string,desg string,sal double) row format delimited fields terminated by ',';
OK
Time taken: 1.434 seconds
5) Check the schema of the Hive table
hive> describe employee;
OK
eno                     int
ename                   string
desg                    string
sal                     double
Time taken: 0.506 seconds, Fetched: 4 row(s)

6) Load the 'emp' file data into the employee table

hive> load data inpath '/rpk/emp' into table employee;
Loading data to table default.employee
Table default.employee stats: [numFiles=1, totalSize=138]
OK
Time taken: 2.114 seconds
7) Retrieve the records of the Hive table "employee"
hive> select * from employee;
OK
100 smith Manager 50000.0
101 jones Asst.Manager 40000.0
102 KIng Clerk 15000.0
103 ram SE 25000.0
104 sita SE 25000.0
105 Lakshman Analyst 30000.0
Time taken: 0.596 seconds, Fetched: 6 row(s)
8) Get the employees who are working as SE
hive> select * from employee where desg == 'SE';
OK
103 ram SE 25000.0
104 sita SE 25000.0
Time taken: 1.639 seconds, Fetched: 2 row(s)
9) Display the employees who earn a salary greater than 30000
hive> select * from employee where sal > 30000;
OK
100 smith Manager 50000.0
101 jones Asst.Manager 40000.0
Time taken: 0.669 seconds, Fetched: 2 row(s)

10) Change the name of the Hive table
hive> alter table employee rename to EMP;
OK
Time taken: 1.301 seconds
hive> DESCRIBE EMP;
OK
eno int
ename string
desg string
sal double
dept string
Time taken: 0.555 seconds, Fetched: 5 row(s)
11) Change a column name of the employee table
hive> alter table employee change ename EmployeeName string;
OK
Time taken: 0.642 seconds
hive> describe employee;
OK
eno int
employeename string
desg string
sal double
dept string
Time taken: 0.576 seconds, Fetched: 5 row(s)
12) Add a new column to the employee table
hive> alter table employee add columns (dept string);
OK
Time taken: 0.502 seconds
hive> describe employee;
OK
eno                     int
ename                   string
desg                    string
sal                     double
dept                    string
Time taken: 0.468 seconds, Fetched: 5 row(s)
13) Drop the Hive table "employee"
hive> drop table employee;
OK
Time taken: 0.723 seconds
hive> describe employee;
FAILED: SemanticException [Error 10001]: Table not found employee

10) To Create Partitions in Hive

1) Create a Table called “AllStates” in hive


hive> create table AllStates(eno int, ename string, state string) row format delimited fields terminated by ',';
OK
Time taken: 1.298 seconds
2) Check the input file exists in HDFS
[root@sandbox ~]# hadoop fs -ls /rpk
Found 2 items
-rw-r--r-- 3 root hdfs 0 2021-01-30 08:34 /rpk/empty.txt
-rw-r--r-- 3 root hdfs 95 2021-01-18 09:42 /rpk/states.txt
3) Check the contents of the input file "states.txt"
[root@sandbox ~]# hadoop fs -cat /rpk/states.txt
100,smith,AP
101,jones,TN
102,KIng,AP
103,ram,TN
104,sita,K
105,Lakshman,Kerala
106,aaa,Kerala
4) Load the input file from HDFS into Hive
hive> load data inpath '/rpk/states.txt' into table AllStates;
Loading data to table default.allstates
Table default.allstates stats: [numFiles=1, totalSize=95]
OK
Time taken: 2.613 seconds
5) Check the contents of the input file in Hive
hive> select * from AllStates;
OK
100 smith AP
101 jones TN
102 KIng AP
103 ram TN
104 sita K

105 Lakshman Kerala


106 aaa Kerala
Time taken: 0.726 seconds, Fetched: 7 row(s)
6) Create a partitioned table
hive> create table state_part(eno int,ename string) partitioned by (state string);
OK
Time taken: 1.686 seconds
7) Insert data into the partitioned table using dynamic partitioning (depending on the Hive configuration, this may first require setting hive.exec.dynamic.partition.mode to nonstrict)
hive> insert overwrite table state_part partition(state) select eno,ename,state from AllStates;
Query ID = root_20210201083126_fbc2931c-12ea-4f6b-b561-fede08805943
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1612167624046_0002)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 5.91 s
--------------------------------------------------------------------------------
Loading data to table default.state_part partition (state=null)
Time taken for load dynamic partitions : 1769
Loading partition {state=TN}
Loading partition {state=Kerala}
Loading partition {state=AP}
Loading partition {state=K}
Time taken for adding to write entity : 2
Partition default.state_part{state=AP} stats: [numFiles=1, numRows=2, totalSize=19, rawDataSize=17]
Partition default.state_part{state=K} stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]
Partition default.state_part{state=Kerala} stats: [numFiles=1, numRows=2, totalSize=21, rawDataSize=19]
Partition default.state_part{state=TN} stats: [numFiles=1, numRows=2, totalSize=18, rawDataSize=16]

8) To view the partitions

hive> show partitions state_part;

OK

state=AP
state=K

state=Kerala

state=TN

Time taken: 0.798 seconds, Fetched: 4 row(s)
