Manual 5
LAB REPORT
III SEMESTER
Faculty In-Charge
Prof. R Padmaja
Assistant Professor, MCA Department
Chittoor-517127
MCA DEPARTMENT
Reg. No:
This is to certify that this is the bonafide record of work done in the laboratory by the candidate studying MCA III Semester during the year 2023-2024.
10 Hive Partitions
RUBRICS FOR BIG DATA ANALYTICS LAB
Analysis and Synthesis (CO2):
- Thorough analysis of the designed problem
- Reasonable analysis of the given problem
- Improper analysis of the given problem

Design (CO3):
- Student understands what needs to be tested, designs an appropriate experiment, and explains the experiment concisely and well
- Student understands what needs to be tested and designs an appropriate experiment
- Student understands what needs to be tested but does not design an appropriate experiment

Complex Analysis & Conclusion (CO4):
- Thorough comprehension through analysis/synthesis
- Reasonable comprehension through analysis/synthesis
- Improper comprehension through analysis/synthesis

Use modern tools in engineering practice (CO5):
- Student uses the tools to measure correctly and understands the limitations of the hardware
- Student uses the tools to measure correctly
- Student uses the tools correctly but is unable to measure properly

Report Writing (CO6):
- Status report with a clear and logical sequence of parameters using excellent language
- Status report with a logical sequence of parameters using understandable language
- Status report not properly organized

Lab safety (CO7):
- Student demonstrates good understanding of lab safety and follows it
- Student demonstrates good understanding of lab safety
- Student demonstrates little knowledge of lab safety

Ability to work in teams (CO8):
- Performance on teams is excellent, with clear evidence of equal distribution of tasks and effort
- Performance on teams is good, with equal distribution of tasks and effort
- Performance on teams is acceptable, with one or more members carrying a larger amount of the effort
S. No Experiment Name
10 10 5 5 10 40
3) put (or) copyFromLocal: copies a file or directory from the local file system to HDFS
Usage: hadoop fs [generic options] -rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...
11) help: Displays help for given command or all commands if none is specified
[root@sandbox ~]# hadoop fs -help mkdir
-mkdir [-p] <path> ... :
Create a directory in specified location.
-p Do not fail if the directory already exists
Big Data Analytics Lab 2022
95 /rpk/states.txt
13) appendToFile: appends the contents of all given local files to the given destination file on HDFS
[root@sandbox ~]# cat > localfile
This is to illustrate appendToFile command
^Z
[1]+ Stopped cat > localfile
[root@sandbox ~]# hadoop fs -appendToFile localfile rpk/empty.txt
[root@sandbox ~]# hadoop fs -cat rpk/empty.txt
This is to illustrate appendToFile command
14) tail: shows the last 1 KB of the given file
[root@sandbox ~]# hadoop fs -tail /rpk/states.txt
100,smith,AP
101,jones,TN
102,KIng,AP
103,ram,TN
104,sita,K
105,Lakshman,Kerala
106,aaa,Kerala
15) stat: prints statistics about the file/directory
[root@sandbox ~]# hadoop fs -stat %b /rpk/states.txt
95
[root@sandbox ~]# hadoop fs -stat %g /rpk/states.txt
hdfs
[root@sandbox ~]# hadoop fs -stat %n /rpk/states.txt
states.txt
[root@sandbox ~]# hadoop fs -stat %o /rpk/states.txt
134217728
[root@sandbox ~]# hadoop fs -stat %r /rpk/states.txt
3
[root@sandbox ~]# hadoop fs -stat %u /rpk/states.txt
root
[root@sandbox ~]# hadoop fs -stat %y /rpk/states.txt
2021-01-18 09:42:33
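For reference, the format specifiers used with -stat above report the following file properties. The sketch below is plain Python, not Hadoop code; it simply pairs each specifier with its meaning and the sample value shown for states.txt:

```python
# Illustrative only: meaning of each `hadoop fs -stat` format specifier,
# using the states.txt values printed in the transcript above.
stat_formats = {
    "%b": ("file size in bytes", 95),
    "%g": ("group name of the owner", "hdfs"),
    "%n": ("file name", "states.txt"),
    "%o": ("HDFS block size in bytes", 134217728),
    "%r": ("replication factor", 3),
    "%u": ("user name of the owner", "root"),
    "%y": ("modification time", "2021-01-18 09:42:33"),
}

def stat(fmt):
    """Return the sample value recorded for a format specifier."""
    return stat_formats[fmt][1]

if __name__ == "__main__":
    for fmt, (meaning, value) in stat_formats.items():
        print(f"{fmt}: {meaning} -> {value}")
```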
Step 1: Right-click in the Package Explorer window and select New -> Java Project to create a Java project called "Hadoop_Exer".
Step 2: Right-click on the "Hadoop_Exer" project and select New -> Package to create a package called "main.java.com.training" in the "Hadoop_Exer" project.
Step 3: Right-click on the "main.java.com.training1" package and select New -> Class to create a class called "WordCount" in the "main.java.com.training1" package.
Step 4: Right-click on the "Hadoop_Exer" project, select Build Path -> Configure Build Path, select the Libraries tab, remove all JAR files, and add the external JAR files required to run the program.
Step 5: Select Export -> Java -> JAR file, and enter the JAR file name along with its destination.
Step 6: Open WinSCP to transfer fruits.txt and wc_new.jar from Windows to Unix. Drag wc_new.jar from D:\ to Unix's root directory.
Step 7: Create a directory called "Inputs" in Unix and transfer the input file "fruits.txt" from root into the Inputs directory.
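The WordCount job assembled in the steps above follows the usual map/shuffle/reduce pattern. As an illustrative sketch only (plain Python, not the Java WordCount class from Step 3), the same counting logic is:

```python
from collections import Counter

def word_count(lines):
    """Mirror the Mapper/Reducer pair of a WordCount job:
    map each line to its tokens, then reduce by summing per-word counts."""
    counts = Counter()
    for line in lines:               # map phase: emit one count per token
        counts.update(line.split())  # shuffle + reduce: sum counts per word
    return dict(counts)

# A tiny stand-in for fruits.txt (hypothetical sample data).
sample = ["apple banana apple", "banana cherry"]
print(word_count(sample))  # {'apple': 2, 'banana': 2, 'cherry': 1}
```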
Step 12: Create another Java class called "WeatherData" in the same package.
Step 14: Transfer the dataset from Windows to the Unix folder "Inputs".
3) TO ILLUSTRATE THE PIG DATA PROCESSING OPERATORS LOAD and STORE Data
LOAD Operator
STORE OPERATOR
Found 2 items
-rw-r--r-- 3 root hdfs 0 2021-01-28 05:20 rp2/_SUCCESS
-rw-r--r-- 3 root hdfs 25 2021-01-28 05:20 rp2/part-m-00000
8) Display the file contents
[root@sandbox ~]# hadoop fs -cat rp2/part-m-00000
100 smith AP
102 KIng AP
2) FILTERING DATA
FILTER OPERATOR
JOIN Operator
GROUP Operator
COGROUP Operator
CROSS Operator
JOIN OPERATOR
1) Create a file called join1
[root@sandbox ~]# cat > join1
joe 2
hank 4
ali 0
eve 3
hank 2
2) Create another file called join2
[root@sandbox ~]# cat > join2
2 tie
4 coat
1 scarf
3 hat
3) Create a directory called h1 in Hadoop
[root@sandbox ~]# hadoop fs -mkdir h1
4) Push both join1 and join2 to the h1 directory
[root@sandbox ~]# hadoop fs -put join1 h1
[root@sandbox ~]# hadoop fs -put join2 h1
5) Load the join2 file as a Pig relation
grunt> A = LOAD 'h1/join2' USING PigStorage('\t') AS (id:int, name:chararray);
grunt> DUMP A;
(2,tie)
(4,coat)
(1,scarf)
(3,hat)
6) Load the join1 file as a Pig relation
grunt> B = LOAD 'h1/join1' USING PigStorage('\t') AS (name:chararray, id:int);
grunt> DUMP B;
(joe,2)
(hank,4)
(ali,0)
(eve,3)
(hank,2)
7) Join the relations A and B
grunt> C = JOIN A BY $0, B BY $1;
grunt> DUMP C;
(2,tie,hank,2)
(2,tie,joe,2)
(3,hat,eve,3)
(4,coat,hank,4)
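The join result above can be cross-checked with a plain-Python sketch of the same inner join (A on its first field, B on its second); this is illustrative only, not how Pig executes the join, and tuple order in the real output may differ:

```python
def inner_join(A, B):
    """Inner join like `C = JOIN A BY $0, B BY $1;`:
    keep every pair of tuples whose join keys match."""
    out = []
    for a in A:
        for b in B:
            if a[0] == b[1]:       # A's field 0 against B's field 1
                out.append(a + b)  # joined tuples carry fields from both sides
    return out

A = [(2, "tie"), (4, "coat"), (1, "scarf"), (3, "hat")]
B = [("joe", 2), ("hank", 4), ("ali", 0), ("eve", 3), ("hank", 2)]
for row in inner_join(A, B):
    print(row)
```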
COGROUP OPERATOR:
grunt> D = COGROUP A BY $0, B BY $1;
grunt> DUMP D;
(0,{},{(ali,0)})
(1,{(1,scarf)},{})
(2,{(2,tie)},{(hank,2),(joe,2)})
(3,{(3,hat)},{(eve,3)})
(4,{(4,coat)},{(hank,4)})
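Unlike JOIN, COGROUP emits one tuple per key with a bag of matching tuples from each relation, including empty bags for unmatched keys (e.g. ali's 0 and the unmatched scarf above). A plain-Python sketch of that behaviour (illustrative only; bag order in real Pig output may differ):

```python
def cogroup(A, B, ka, kb):
    """COGROUP: one output tuple per key, holding a bag of matching
    tuples from each relation (empty list when a side has no match)."""
    keys = sorted({t[ka] for t in A} | {t[kb] for t in B})
    return [(k,
             [t for t in A if t[ka] == k],
             [t for t in B if t[kb] == k]) for k in keys]

A = [(2, "tie"), (4, "coat"), (1, "scarf"), (3, "hat")]
B = [("joe", 2), ("hank", 4), ("ali", 0), ("eve", 3), ("hank", 2)]
for row in cogroup(A, B, 0, 1):
    print(row)
```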
GROUP OPERATOR
Dump Records;
Dump sub;
(james, wp,78)
(smith, dsat,65)
(james, dsat,73)
(jones, unix,89)
(smith, unix,50)
(jones, unix,85)
( wp,{(james, wp,78)})
CROSS OPERATOR
1 2
3 4
a b c
e f g
4) Load both cross1 and cross2 from Hadoop's directory into Pig
grunt> G = LOAD
DUMP G;
(1,2)
(3,4)
DUMP H;
(a,b)
(e,f)
grunt> I = CROSS G, H;
DUMP I;
(3,4,e,f)
(3,4,a,b)
(1,2,e,f)
(1,2,a,b)
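The CROSS output above is simply the Cartesian product of the two relations, four tuples from 2 x 2 inputs. A plain-Python sketch (illustrative only; Pig's output order is not guaranteed):

```python
from itertools import product

def cross(G, H):
    """CROSS: Cartesian product of two relations, like `I = CROSS G, H;`.
    Every tuple of G is paired with every tuple of H."""
    return [g + h for g, h in product(G, H)]

G = [(1, 2), (3, 4)]
H = [("a", "b"), ("e", "f")]
for row in cross(G, H):
    print(row)
```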
4) SORTING DATA
ORDER Operator
(102,lakshman,mba)
(103,bharatha,mba)
(104,ram,mca)
(105,sita,mca)
5) COMBINING AND SPLITTING DATA
(2,3)
(1,2)
(2,4)
grunt> M = LOAD 'h1/union1' USING PigStorage('\t') AS (fname:chararray, lname:chararray, id:int);
DUMP M;
(z,x,8)
(w,y,1)
5) Apply the UNION operator
grunt> N = UNION L, M;
DUMP N;
(z,x,8)
(w,y,1)
(2,3)
(1,2)
(2,4)
SPLIT OPERATOR:
(103,tornwhite,25,bay)
(104,leon,26,chennai)
4) Apply the SPLIT operator
grunt> SPLIT O INTO Q IF age<=23, R IF age>23;
DUMP Q;
(101,ivan,20,chennai)
(102,bayross,23,delhi)
DUMP R;
(100,smith,24,hyd)
(103,tornwhite,25,bay)
(104,leon,26,chennai)
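SPLIT routes each input tuple into whichever output relation's predicate it satisfies. A plain-Python sketch of the split above (illustrative only, using the age field at position 2):

```python
def split_by_age(rows, cutoff=23):
    """SPLIT O INTO Q IF age<=23, R IF age>23: each row lands in the
    partition whose predicate it satisfies (age is field index 2)."""
    q = [r for r in rows if r[2] <= cutoff]
    r = [r for r in rows if r[2] > cutoff]
    return q, r

O = [(100, "smith", 24, "hyd"), (101, "ivan", 20, "chennai"),
     (102, "bayross", 23, "delhi"), (103, "tornwhite", 25, "bay"),
     (104, "leon", 26, "chennai")]
Q, R = split_by_age(O)
print(Q)
print(R)
```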
LIMIT OPERATOR
(101,Sita,MCA)
Dump Records;
Name;
generate group,COUNT(Records.marks);
Dump count;
OUTPUT
(james, wp,78)
(smith, dsat,65)
(james, dsat,73)
(jones, unix,89)
(smith, unix,50)
(jones, unix,85)
(james,151)
(jones,174)
(smith,115)
(james,75.5)
(jones,87.0)
(smith,57.5)
(james,2)
(jones,2)
(smith,2)
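The three result sets above are SUM, AVG, and COUNT of the marks bag after grouping the records by name. A plain-Python sketch reproducing those numbers (illustrative only, not Pig's execution):

```python
from collections import defaultdict

def aggregate_marks(records):
    """GROUP Records BY name, then SUM / AVG / COUNT over each marks bag."""
    by_name = defaultdict(list)
    for name, subject, marks in records:
        by_name[name].append(marks)
    sums = {n: sum(m) for n, m in by_name.items()}
    avgs = {n: sum(m) / len(m) for n, m in by_name.items()}
    counts = {n: len(m) for n, m in by_name.items()}
    return sums, avgs, counts

records = [("james", "wp", 78), ("smith", "dsat", 65),
           ("james", "dsat", 73), ("jones", "unix", 89),
           ("smith", "unix", 50), ("jones", "unix", 85)]
sums, avgs, counts = aggregate_marks(records)
print(sums)  # {'james': 151, 'smith': 115, 'jones': 174}
```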
as (name:chararray, subject:chararray, marks:int);
Dump min;
OUTPUT
(james, wp,78)
(smith, dsat,65)
(james, dsat,73)
(jones, unix,89)
(smith, unix,50)
(jones, unix,85)
( wp,{(james, wp,78)})
( wp,78)
( dsat,65)
( unix,50)
( wp,78)
( dsat,73)
( unix,89)
Dump A;
Dump B;
Dump C;
Dump D;
Dump E;
OUTPUT
(smith, dsat,65)
(james, dsat,73)
(jones, unix,89)
(smith, unix,50)
(jones, unix,85)
(james,james)
(smith,smith)
(james,james)
(jones,jones)
(smith,smith)
(jones,jones)
(james,JAMES)
(smith,SMITH)
(james,JAMES)
(jones,JONES)
(smith,SMITH)
(jones,JONES)
(james,am)
(smith,mi)
(jones,on)
(james,4)
(king,-1)
(jones,4)
(rama,-1)
(sita,0)
Dump F; Displays, for each name, whether it ends with 's' and whether it starts with 'j'
(smith,false,false)
(james,true,true)
(king,false,false)
(jones,true,true)
(rama,false,false)
(sita,false,false)
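The two result sets above use INDEXOF(name,'s') (which returns -1 when the character is absent), then ENDSWITH(name,'s') and STARTSWITH(name,'j'). Python's string methods behave the same way, so the outputs can be checked with this illustrative sketch:

```python
def string_checks(names):
    """Mirror the Pig built-ins used above: INDEXOF(name,'s') via str.find
    (returns -1 if absent), ENDSWITH(name,'s'), STARTSWITH(name,'j')."""
    return [(n, n.find("s"), n.endswith("s"), n.startswith("j"))
            for n in names]

names = ["smith", "james", "king", "jones", "rama", "sita"]
for row in string_checks(names):
    print(row)
```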
Dump X;
Dump Y;
Dump Z;
Dump U;
OUTPUT
(smith,7.8)
(james,8.7)
(king,6.9)
(jones,8.5)
(rama,7.7)
(sita,8.9)
(7.8,8)
(8.7,9)
(6.9,7)
(8.5,9)
(7.7,8)
(8.9,9)
Dump Z; Displays the FLOOR values of GPA
(7.8,7.0)
(8.7,8.0)
(6.9,6.0)
(8.5,8.0)
(7.7,7.0)
(8.9,8.0)
Dump U; Displays the CEIL values of GPA
(7.8,8.0)
(8.7,9.0)
(6.9,7.0)
(8.5,9.0)
(7.7,8.0)
(8.9,9.0)
$ vi Weather.pig
A = LOAD 'rp/WeatherDataset.txt' AS (line:chararray);
B = FOREACH A GENERATE SUBSTRING(line,15,19), SUBSTRING(line,88,92), SUBSTRING(line,92,93);
STORE B INTO 'rp/revised5';
E = LOAD 'rp/revised5/part-m-00000' USING PigStorage('\t') AS (year:int, temp:int, quality:int);
filtered_records = FILTER E BY temp != 9999 AND quality IN (0,1,4,5,9);
grouped_records = GROUP filtered_records BY year;
max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temp);
DUMP max_temp;
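The Pig script's logic, slice year, temperature, and quality code out of each fixed-width record, drop missing readings (9999) and bad quality codes, then take the maximum temperature per year, can be sketched in plain Python. The sample lines below are hypothetical, padded to match the substring offsets used in the script:

```python
GOOD_QUALITY = {"0", "1", "4", "5", "9"}

def max_temp_per_year(lines):
    """Mirror the Pig script above: extract year [15:19], temp [88:92],
    quality [92:93]; filter temp==9999 and bad quality; max per year."""
    best = {}
    for line in lines:
        year = line[15:19]
        temp = int(line[88:92])
        quality = line[92:93]
        if temp != 9999 and quality in GOOD_QUALITY:
            best[year] = max(best.get(year, temp), temp)
    return best

# Hypothetical fixed-width records padded to the offsets used above.
sample = [
    " " * 15 + "1950" + " " * 69 + "0078" + "1",
    " " * 15 + "1950" + " " * 69 + "9999" + "1",  # missing reading, dropped
    " " * 15 + "1951" + " " * 69 + "0012" + "0",
]
print(max_temp_per_year(sample))  # {'1950': 78, '1951': 12}
```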
sal double
Time taken: 0.506 seconds, Fetched: 4 row(s)
10) Change the name of the Hive table
hive> alter table employee rename to EMP;
OK
Time taken: 1.301 seconds
hive> describe EMP;
OK
eno int
ename string
desg string
sal double
dept string
Time taken: 0.555 seconds, Fetched: 5 row(s)
11) Change a column name of the employee table
hive> alter table employee change ename EmployeeName string;
OK
Time taken: 0.642 seconds
hive> describe employee;
OK
eno int
employeename string
desg string
sal double
dept string
Time taken: 0.576 seconds, Fetched: 5 row(s)
12) Add a new column to the employee table
hive> alter table employee add columns (dept string);
OK
Time taken: 0.502 seconds
hive> describe employee;
OK
eno int
ename string
desg string
sal double
dept string
Time taken: 0.468 seconds, Fetched: 5 row(s)
13) Drop the Hive table "employee"
hive> drop table employee;
OK
Time taken: 0.723 seconds
hive> describe employee;
FAILED: SemanticException [Error 10001]: Table not found employee
OK
state=AP
state=K
state=Kerala
state=TN
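The partition names listed above correspond to one subdirectory per state value under the table's storage directory, which is what makes partition pruning possible. As a hedged illustration (hypothetical local paths, not a real Hive warehouse), a small Python sketch of that layout:

```python
# Illustrative only: Hive stores a table partitioned by `state` as one
# `state=<value>` subdirectory per partition under the table directory;
# `SHOW PARTITIONS` effectively lists those directory names.
import os
import tempfile

table_dir = tempfile.mkdtemp(prefix="employee_part_")  # stand-in table dir
for state in ["AP", "K", "Kerala", "TN"]:
    os.makedirs(os.path.join(table_dir, f"state={state}"))

partitions = sorted(os.listdir(table_dir))
print(partitions)  # ['state=AP', 'state=K', 'state=Kerala', 'state=TN']
```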