Lab07-Apache Pig V1.01


Lab 07 – Apache Pig

Full name:
Student ID:

Important Note: You are required to perform the following tasks and provide a report
based on your observations and understanding of the commands, the output, and any
possible errors or additional information provided by the system.

Note: In the following tasks, “$” means the command should be executed in the Linux
terminal, and “grunt>” means the command should be executed in Pig's interactive shell,
called Grunt.

Tasks:

1. Open terminal

2. Check running services: sudo jps

3. Run Pig in local mode

pig -x local

After a while, the Grunt shell will appear. Grunt is Pig's interactive shell. The
prompt will be “grunt>”.

4. Short Research: What is Pig in Local mode?

5. Execute the following commands and write your understanding about what each
command does based on the previous practices:

a. grunt> ls
b. grunt> cat mysales.txt

c. grunt> quit

6. Run Pig in MapReduce (HDFS) mode

a. $ pig
OR
b. $ pig -x mapreduce

7. Short Research: What is Pig in MapReduce (HDFS) mode?

8. What are the differences between Pig in local mode and Pig in MapReduce (HDFS)
mode?

9. Execute the following commands and write your understanding about what each
command does based on the previous practices:

a. grunt> mkdir /user/myinput
b. grunt> ls /user/myinput

c. grunt> quit

d. Use the commands that you learned for HDFS to create a file named
mysales.txt in “/user/myinput/”. Add the following content to the
created file “/user/myinput/mysales.txt”. Write down all the steps.

Content of the file “/user/myinput/mysales.txt” should be:

E2001,400,4000.00
E2004,300,3000.30
E2011,500,5500.55
E2012,200,2000.20
E2001,100,500.50
E2011,600,7000.70

e. $ pig

f. grunt> cat /user/myinput/mysales.txt
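
One possible way to carry out step 9d from the Linux terminal is sketched below. It is not the only valid answer. It assumes the `hdfs` client is on the PATH and that you have write permission on /user. The guard around the HDFS commands is only there so that the sketch also runs on a machine without Hadoop.

```shell
#!/bin/sh
# Write the six sample records from the lab to a local file.
cat > mysales.txt <<'EOF'
E2001,400,4000.00
E2004,300,3000.30
E2011,500,5500.55
E2012,200,2000.20
E2001,100,500.50
E2011,600,7000.70
EOF

# Upload to HDFS only if the hdfs client is installed (assumption: it is on PATH).
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -mkdir -p /user/myinput              # create the target folder if it is missing
    hdfs dfs -put -f mysales.txt /user/myinput/   # copy the file, overwriting an older copy
    hdfs dfs -cat /user/myinput/mysales.txt       # print the uploaded content to confirm
else
    echo "hdfs not found; mysales.txt was created locally only"
fi
```

After the upload, step 9f (cat in Grunt) should print the same six lines.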

Create a Pig table (relation):

10. To create Table ‘employee’ from mysales.txt

a. grunt> employee = LOAD 'hdfs://quickstart.cloudera:8020/user/myinput'
   USING PigStorage(',') AS (eid:chararray,sales:int,comm:double);

11. To view the content of employee

a. grunt> DUMP employee;

12. To view the structure of employee

a. grunt> DESCRIBE employee;

13. To read data from MapReduce result (tab delimited)

a. grunt> employee2 = LOAD 'hdfs://quickstart.cloudera:8020/outputfolder/part-r-00000'
   USING PigStorage('\t') AS (eid:chararray,comm:double);

b. grunt> DESCRIBE employee2;

c. grunt> DUMP employee2;

Selected Pig table operations

14. To see transactions with sales quantity greater than 400

a. grunt> highSales = FILTER employee BY sales > 400;

b. grunt> DUMP highSales;

15. To sort transactions based on highest commission

a. grunt> highestComm = ORDER employee BY comm DESC;

b. grunt> DUMP highestComm;
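
To check the DUMP output of steps 14 and 15 against an independent calculation, the same filter and sort can be reproduced on a local copy of the data with standard Unix tools. This is only a cross-check sketch and not part of Pig. It assumes a local file mysales.txt with the content from step 9d, which the block recreates. The output file names high_sales.txt and by_comm_desc.txt are made up for illustration.

```shell
#!/bin/sh
# Recreate the sample data from step 9d in the current folder.
cat > mysales.txt <<'EOF'
E2001,400,4000.00
E2004,300,3000.30
E2011,500,5500.55
E2012,200,2000.20
E2001,100,500.50
E2011,600,7000.70
EOF

# Counterpart of: FILTER employee BY sales > 400   (field 2 is sales)
awk -F, '$2 > 400' mysales.txt > high_sales.txt
cat high_sales.txt

# Counterpart of: ORDER employee BY comm DESC      (field 3 is comm)
sort -t, -k3,3nr mysales.txt > by_comm_desc.txt
cat by_comm_desc.txt
```

Because the comparison is strict, the 400-unit row is excluded and only the two E2011 rows (500 and 600 units) should pass the filter.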

Note: Aggregate functions such as SUM work on grouped data, so group the table first.

16. To find total overall sales quantity

a. grunt> employeeGR = GROUP employee ALL;

b. grunt> sumSales = FOREACH employeeGR GENERATE (employee.eid),
   SUM(employee.sales);

c. grunt> DUMP sumSales;

17. To find total commission for each employee

a. grunt> employeeEID = GROUP employee BY eid;

b. grunt> totComm = FOREACH employeeEID GENERATE (employee.eid),
   SUM(employee.comm);

c. grunt> DUMP totComm;
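
The totals from steps 16 and 17 can be verified outside Pig with a short awk sketch. It is a cross-check under the assumption that a local mysales.txt holds the six records from step 9d (the block recreates it). The output file names total_sales.txt and comm_by_eid.txt are made up for illustration.

```shell
#!/bin/sh
# Recreate the sample data from step 9d in the current folder.
cat > mysales.txt <<'EOF'
E2001,400,4000.00
E2004,300,3000.30
E2011,500,5500.55
E2012,200,2000.20
E2001,100,500.50
E2011,600,7000.70
EOF

# Counterpart of GROUP employee ALL + SUM(employee.sales): one grand total.
awk -F, '{ total += $2 } END { print total }' mysales.txt > total_sales.txt
cat total_sales.txt

# Counterpart of GROUP employee BY eid + SUM(employee.comm): one total per eid.
# awk's "for (id in sum)" order is unspecified, so the result is sorted by eid.
awk -F, '{ sum[$1] += $3 }
         END { for (id in sum) printf "%s,%.2f\n", id, sum[id] }' mysales.txt \
    | sort > comm_by_eid.txt
cat comm_by_eid.txt
```

The grand total should be 2100 units, with one commission line for each of the four employee IDs. Pig's DUMP output also includes the eid values and may format the doubles differently, so compare the numbers rather than the exact text.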

Saving Data

18. To see all executed commands

a. grunt> history

19. To save result (HDFS) in CSV (comma delimited format)

a. grunt> STORE totComm INTO 'hdfs://localhost:9000/user/mypigoutput'
   USING PigStorage(',');

The result is stored in the HDFS folder /user/mypigoutput, which is created
automatically. The folder must not already exist, otherwise STORE fails.

20. To save result (LOCAL) in Tab delimited format

a. grunt> STORE totComm INTO 'mypigoutput' USING PigStorage('\t');

The result is stored in the subfolder mypigoutput of the home folder, which is
created automatically.

Running Script (Saving commands)

To save commands as a script, create a script file named myscript.pig and enter the
following statements as its content (without the “grunt>” prompt):

21. If using Pig in HDFS mode:

a. grunt> employee = LOAD 'hdfs://localhost:9000/user/myinput2'
   USING PigStorage(',') AS (eid:chararray,sales:int,comm:double);

b. grunt> DESCRIBE employee;

22. If using Pig in LOCAL mode:

a. grunt> employee = LOAD 'mysales.txt' USING PigStorage(',')
   AS (eid:chararray,sales:int,comm:double);

b. grunt> DESCRIBE employee;

23. To execute myscript.pig in /user/myinput folder using Pig in HDFS mode:

a. grunt> RUN hdfs://localhost:9000/user/myinput/myscript.pig;

b. grunt> DUMP employee;

24. To execute myscript.pig in home folder using Pig in LOCAL mode:

a. grunt> RUN myscript.pig;

b. grunt> DUMP employee;
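
As an alternative to RUN inside Grunt, a script can also be passed to the pig command directly from the Linux terminal (batch mode). The sketch below writes the LOCAL-mode script content from step 22 and runs it only if Pig is available. It assumes that `pig` is on the PATH and that mysales.txt is in the current folder.

```shell
#!/bin/sh
# Write the LOCAL-mode script content from step 22.
# The file holds plain Pig Latin statements, with no shell prompt in front.
cat > myscript.pig <<'EOF'
employee = LOAD 'mysales.txt' USING PigStorage(',')
    AS (eid:chararray, sales:int, comm:double);
DESCRIBE employee;
EOF

# Run in batch mode only if Pig is installed and the input file exists
# (assumptions: pig is on PATH, mysales.txt is in the current folder).
if command -v pig >/dev/null 2>&1 && [ -f mysales.txt ]; then
    pig -x local myscript.pig
else
    echo "pig or mysales.txt not found; myscript.pig was written but not run"
fi
```

In batch mode Pig exits after the script finishes, so the employee relation is not kept for a later DUMP. Add a DUMP statement to the script if you want to see the data.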

25. Save and submit your lab document in PDF with the following filename format.
Submit your report via the submission link for this lab, available on MyTIMeS.

Filename format: Name_ID_Lab07

Note: You can practice more commands using the resources available at the bottom of
this document.

Resources:

1. Apache Pig Tutorial:
https://www.tutorialspoint.com/apache_pig/index.htm
a. Apache Pig Overview:
https://www.tutorialspoint.com/apache_pig/apache_pig_overview.htm
b. Apache Pig – Architecture:
https://www.tutorialspoint.com/apache_pig/apache_pig_architecture.htm

2. Pig Installation in Ubuntu: https://hiberstack.com/install-apache-pig-in-ubuntu/

I. Start Hadoop Server

a. $ start-dfs.sh

b. $ start-yarn.sh

II. Create Pig folder, download and extract latest Pig

a. $ mkdir pig

b. $ cd pig

c. $ wget https://downloads.apache.org/pig/pig-0.17.0/pig-0.17.0.tar.gz
(the size is about 220 MB)

d. $ tar -xvf pig-0.17.0.tar.gz

III. Set and activate environment variables

a. $ cd (go to home folder)

b. $ nano .bashrc

IV. Add the following lines after the Hadoop setup. Check your Java version first
(use javac -version and ls /usr/lib/jvm) and adjust JAVA_HOME to match.
#JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
#Apache Pig Environment Variables
export PIG_HOME=/home/hadoop/pig/pig-0.17.0
export PATH=$PATH:/home/hadoop/pig/pig-0.17.0/bin
export PIG_CLASSPATH=$HADOOP_HOME/conf

V. Refresh environment variables

a. $ source .bashrc

VI. Check Pig version

a. $ pig -version
