Lab07-Apache Pig V1.01
Full name:
Student ID:
Important Note: You are required to perform the following tasks and provide a report
based on your observations and understanding of the commands, the output, and any
possible errors or additional information provided by the system.
Note: In the following tasks, “$” means the command should be executed in the Linux
terminal, and “grunt>” means the command should be executed in Pig's interactive shell,
called Grunt.
Tasks:
1. Open a terminal.
$ pig -x local
After a while the Grunt shell will appear. Grunt is Pig's interactive shell. The
prompt will be “grunt>”.
5. Execute the following commands and write your understanding about what each
command does based on the previous practices:
a. grunt> ls
b. grunt> cat mysales.txt
c. grunt> quit
a. $ pig
OR
b. $ pig -x mapreduce
8. What are the differences between Pig in local mode and Pig in MapReduce (HDFS)
mode?
9. Execute the following commands and write your understanding about what each
command does based on the previous practices:
a. grunt> mkdir /user/myinput
b. grunt> ls /user/myinput
c. grunt> quit
d. Use the HDFS commands that you have learned to create a file named
mysales.txt in “/user/myinput/”. Add the following content to the
created file “/user/myinput/mysales.txt”. Write down all the steps.
E2001,400,4000.00
E2004,300,3000.30
E2011,500,5500.55
E2012,200,2000.20
E2001,100,500.50
E2011,600,7000.70
e. $ pig
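For step d above, one possible sequence is sketched below. It is only one way to do it, assuming HDFS is already running and the `hdfs` client is on the PATH. The local working copy of mysales.txt is an assumption of this sketch, not something the lab prescribes.

```shell
# Write the six sample records to a local file first.
cat > mysales.txt <<'EOF'
E2001,400,4000.00
E2004,300,3000.30
E2011,500,5500.55
E2012,200,2000.20
E2001,100,500.50
E2011,600,7000.70
EOF

# Upload to HDFS only if the hdfs client is available (assumption: HDFS is running).
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -mkdir -p /user/myinput                        # create the target directory
    hdfs dfs -put -f mysales.txt /user/myinput/mysales.txt  # copy the local file, overwriting if present
    hdfs dfs -cat /user/myinput/mysales.txt                 # confirm the content
else
    echo "hdfs command not found; skipping upload"
fi
```

After the upload, `ls /user/myinput` inside Grunt (MapReduce mode) should list the new file.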
Create a Pig table (relation):
Note: Before applying mathematical functions, group the table (relation).
Saving Data
a. grunt> history
Running Script (Saving commands)
To save commands as a script, create a script file named myscript.pig and enter the
following as its content:
b. grunt> DESCRIBE employee;
b. grunt> DESCRIBE employee;
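A minimal sketch of creating and running such a script from the Linux terminal follows. The script body is an assumption for illustration, not the lab's official content: the field names `id`, `qty` and `amount` and the relations `grouped` and `totals` are hypothetical, and mysales.txt is assumed to be in the current directory for local mode.

```shell
# Create myscript.pig with a few illustrative Pig Latin statements.
cat > myscript.pig <<'EOF'
-- Load the comma-separated file into a relation (field names are hypothetical).
employee = LOAD 'mysales.txt' USING PigStorage(',')
           AS (id:chararray, qty:int, amount:double);
DESCRIBE employee;
-- Group first, then apply a mathematical function per group.
grouped = GROUP employee BY id;
totals = FOREACH grouped GENERATE group AS id, SUM(employee.amount) AS total;
DUMP totals;
EOF

# Run the script in local mode only if Pig is installed.
if command -v pig >/dev/null 2>&1; then
    pig -x local myscript.pig
else
    echo "pig command not found; script written but not executed"
fi
```

Running the script this way executes the same statements that would otherwise be typed one by one at the “grunt>” prompt.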
25. Save and submit your lab documents in PDF with the following filename format.
Submit your report via the submission link for this lab, available on MyTIMeS.
Note: You can practice more commands using the resources available at the bottom of this
document.
Resources:
a. $ start-dfs.sh
b. $ start-yarn.sh
a. $ mkdir pig
b. $ cd pig
c. $ wget https://downloads.apache.org/pig/pig-0.17.0/pig-0.17.0.tar.gz
(the size is about 220 MB)
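The downloaded archive has to be unpacked before Pig can be used. The sketch below assumes the archive sits in the current directory (the pig directory created above) and that the extracted folder is named pig-0.17.0, which matches the PIG_HOME path used in this document.

```shell
# Name of the archive fetched by the wget step.
archive=pig-0.17.0.tar.gz

if [ -f "$archive" ]; then
    tar -xzf "$archive"      # unpack into ./pig-0.17.0
    ls pig-0.17.0/bin        # the pig launcher script should be listed here
else
    echo "archive not found: $archive"
fi
```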
b. $ nano .bashrc
IV. Add the following after the Hadoop setup lines. First check your Java version (use
javac -version and ls /usr/lib/jvm) and adjust JAVA_HOME to match.
#JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
#Apache Pig Environment Variables
export PIG_HOME=/home/hadoop/pig/pig-0.17.0
export PATH=$PATH:/home/hadoop/pig/pig-0.17.0/bin
export PIG_CLASSPATH=$HADOOP_HOME/conf
a. $ source .bashrc
a. $ pig -version
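If `pig -version` fails, a quick check of the environment can help locate the problem. The sketch below is an optional helper, not part of the lab steps. The function name `check_tool` is hypothetical, and the list of tools is an assumption based on the setup described in this document.

```shell
# Report whether a command can be found on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the tools this lab relies on; keep going even if one is missing.
for tool in java hadoop pig; do
    check_tool "$tool" || true
done

# Show the variables set in .bashrc (prints a placeholder if unset).
echo "JAVA_HOME=${JAVA_HOME:-<not set>}"
echo "PIG_HOME=${PIG_HOME:-<not set>}"
```

A "missing: pig" line usually means the PATH export in .bashrc was not applied, for example because `source .bashrc` was not run in the current terminal.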