Lab07-Apache Pig V1.01

Uploaded by

Chloe Tee


Lab 07 – Apache Pig

Full name:
Student ID:

Important Note: You are required to perform the following tasks and provide a report
based on your observations and understanding of the commands, the output, and any
possible errors or additional information provided by the system.

Note: in the following tasks, “$” means the command should be executed in Linux
terminal and “grunt>” means the command should be executed in Pig's interactive shell
called Grunt.

Tasks:

1. Open terminal

2. Check running services: $ sudo jps

3. Run Pig in local mode

$ pig -x local

After a short while the Grunt shell appears. Grunt is Pig's interactive shell, and its
prompt is “grunt>”.

4. Short Research: What is Pig in Local mode?

5. Execute the following commands and, based on your previous practice, explain what
each command does:

a. grunt> ls
b. grunt> cat mysales.txt

c. grunt> quit

6. Run Pig in MapReduce (HDFS) mode

a. $ pig
OR
b. $ pig -x mapreduce

7. Short Research: What is Pig in MapReduce (HDFS) mode?

8. What are the differences between Pig in local mode and Pig in MapReduce (HDFS)
mode?

9. Execute the following commands and, based on your previous practice, explain what
each command does:

a. grunt> mkdir /user/myinput
b. grunt> ls /user/myinput

c. grunt> quit

d. Use the HDFS commands you have learned to create a file named mysales.txt in
“/user/myinput/” and add the following content to the created file
“/user/myinput/mysales.txt”. Write down all the steps.

Content of the file “/user/myinput/mysales.txt” should be:

E2001,400,4000.00
E2004,300,3000.30
E2011,500,5500.55
E2012,200,2000.20
E2001,100,500.50
E2011,600,7000.70

e. $ pig

f. grunt> cat /user/myinput/mysales.txt

Create a Pig table (relation):

10. To create Table ‘employee’ from mysales.txt

a. grunt> employee = LOAD 'hdfs://quickstart.cloudera:8020/user/myinput'
       USING PigStorage(',') AS (eid:chararray,sales:int,comm:double);

11. To view the content of employee

a. grunt> DUMP employee;

12. To view the structure of employee

a. grunt> DESCRIBE employee;

13. To read data from a MapReduce job's output (tab-delimited)

a. grunt> employee2 = LOAD 'hdfs://quickstart.cloudera:8020/outputfolder/part-r-00000'
       USING PigStorage('\t') AS (eid:chararray,comm:double);

b. grunt> DESCRIBE employee2;

c. grunt> DUMP employee2;
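As a rough mental model only (Pig actually compiles LOAD into a Hadoop job), the LOAD statements in steps 10 and 13 split every line on the PigStorage delimiter and cast each field to the type declared in the AS clause; a field that cannot be cast becomes null. The Python sketch below mimics this on the sample data from step 9d. The helper names `pig_load` and `cast`, and the mapping of chararray/int/double to str/int/float, are assumptions made for illustration; passing "\t" instead of "," models step 13.

```python
# Rough model of:
#   employee = LOAD ... USING PigStorage(',') AS (eid:chararray,sales:int,comm:double);
# Illustrative only: Pig runs LOAD as part of a Hadoop job, not in Python.

SAMPLE = """E2001,400,4000.00
E2004,300,3000.30
E2011,500,5500.55
E2012,200,2000.20
E2001,100,500.50
E2011,600,7000.70"""

# Assumed mapping from the Pig types used in this lab to Python types.
CASTS = {"chararray": str, "int": int, "double": float}


def cast(value, pig_type):
    """Cast one field; a failed cast becomes None, standing in for Pig's null."""
    try:
        return CASTS[pig_type](value)
    except ValueError:
        return None


def pig_load(text, delimiter, schema):
    """Split each line on the delimiter and cast its fields per the schema."""
    relation = []
    for line in text.splitlines():
        fields = line.split(delimiter)
        relation.append(tuple(cast(f, t) for f, (_, t) in zip(fields, schema)))
    return relation


SCHEMA = [("eid", "chararray"), ("sales", "int"), ("comm", "double")]
employee = pig_load(SAMPLE, ",", SCHEMA)
for row in employee:
    print(row)  # first line printed: ('E2001', 400, 4000.0)
```

The schema is what turns the raw text "400" into a number that FILTER and SUM can use later; without an AS clause, Pig would treat the fields as untyped bytearrays.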

Selected Pig table operations

14. To see transactions with sales quantity greater than 400

a. grunt> highSales = FILTER employee BY sales > 400;

b. grunt> DUMP highSales;
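To check the DUMP output of step 14, the Python sketch below applies the same condition to the six sample rows (hard-coded here as an assumption, already cast to the types from step 10). FILTER keeps only tuples for which the condition is true, so a null sales value would be dropped, and the transaction with exactly 400 is excluded because the comparison is strict.

```python
# Rough model of: highSales = FILTER employee BY sales > 400;
# The rows below are the sample data from step 9d, already cast to
# (eid:chararray, sales:int, comm:double).
employee = [
    ("E2001", 400, 4000.00),
    ("E2004", 300, 3000.30),
    ("E2011", 500, 5500.55),
    ("E2012", 200, 2000.20),
    ("E2001", 100, 500.50),
    ("E2011", 600, 7000.70),
]

# Keep a tuple only when the condition is true; None (Pig's null) never passes.
high_sales = [row for row in employee if row[1] is not None and row[1] > 400]
for row in high_sales:
    print(row)
```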

15. To sort transactions based on highest commission

a. grunt> highestComm = ORDER employee BY comm DESC;

b. grunt> DUMP highestComm;
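Step 15 can be checked the same way. The sketch below sorts the sample rows (hard-coded, as an assumption) by the comm field in descending order, which is what ORDER ... BY comm DESC does to the relation; Python's `sorted` is only a stand-in for Pig's distributed sort.

```python
# Rough model of: highestComm = ORDER employee BY comm DESC;
employee = [
    ("E2001", 400, 4000.00),
    ("E2004", 300, 3000.30),
    ("E2011", 500, 5500.55),
    ("E2012", 200, 2000.20),
    ("E2001", 100, 500.50),
    ("E2011", 600, 7000.70),
]

# Sort by the third field (comm), largest commission first.
highest_comm = sorted(employee, key=lambda row: row[2], reverse=True)
for row in highest_comm:
    print(row)
```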

Note: Aggregate (mathematical) functions such as SUM operate on bags, so group the
relation first.

16. To find total overall sales quantity

a. grunt> employeeGR = GROUP employee ALL;

b. grunt> sumSales = FOREACH employeeGR GENERATE (employee.eid),
       SUM(employee.sales);

c. grunt> DUMP sumSales;
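GROUP ... ALL puts every tuple into a single group whose key is the string 'all', with the original tuples collected in a bag named after the input relation (here `employee`). Inside the FOREACH, `employee.eid` projects the eid column of that bag (a bag of one-field tuples) and SUM adds up `employee.sales`, skipping nulls. The Python sketch below models this on the hard-coded sample rows; using lists to stand in for Pig bags is an assumption of the sketch.

```python
# Rough model of:
#   employeeGR = GROUP employee ALL;
#   sumSales = FOREACH employeeGR GENERATE (employee.eid), SUM(employee.sales);
employee = [
    ("E2001", 400, 4000.00),
    ("E2004", 300, 3000.30),
    ("E2011", 500, 5500.55),
    ("E2012", 200, 2000.20),
    ("E2001", 100, 500.50),
    ("E2011", 600, 7000.70),
]

# GROUP ... ALL: exactly one (group, bag) pair, keyed by 'all'.
employee_gr = [("all", list(employee))]

# For each group: project the eid column as a bag of 1-tuples, and sum sales
# while ignoring None values, as Pig's SUM ignores nulls.
sum_sales = [
    ([(row[0],) for row in bag],
     sum(row[1] for row in bag if row[1] is not None))
    for _, bag in employee_gr
]
print(sum_sales)
```

The single result tuple therefore holds a bag listing every eid plus the overall total of 2100; generating only SUM(employee.sales) would leave just the total.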

17. To find total commission for each employee

a. grunt> employeeEID = GROUP employee BY eid;

b. grunt> totComm = FOREACH employeeEID GENERATE (employee.eid),
       SUM(employee.comm);

c. grunt> DUMP totComm;
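GROUP ... BY eid produces one tuple per distinct eid: the key in a field named `group` and the matching rows in a bag named `employee`. Because `(employee.eid)` projects from the bag, each output tuple of step 17 repeats the id once per transaction; generating `group` instead would emit each id once. The Python sketch below models the grouping and the per-group SUM on the hard-coded sample rows. Pig does not guarantee the order of the groups, whereas this sketch keeps first-seen order.

```python
# Rough model of:
#   employeeEID = GROUP employee BY eid;
#   totComm = FOREACH employeeEID GENERATE (employee.eid), SUM(employee.comm);
from collections import defaultdict

employee = [
    ("E2001", 400, 4000.00),
    ("E2004", 300, 3000.30),
    ("E2011", 500, 5500.55),
    ("E2012", 200, 2000.20),
    ("E2001", 100, 500.50),
    ("E2011", 600, 7000.70),
]

# GROUP ... BY eid: collect the rows for each distinct eid into a bag (a list here).
groups = defaultdict(list)
for row in employee:
    groups[row[0]].append(row)
employee_eid = list(groups.items())

# Per group: the projected eids (one per row) and the total commission, ignoring nulls.
tot_comm = [
    ([(row[0],) for row in bag],
     sum(row[2] for row in bag if row[2] is not None))
    for _, bag in employee_eid
]
for t in tot_comm:
    print(t)
```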

Saving Data

18. To see all executed commands

a. grunt> history

19. To save the result (HDFS) in CSV (comma-delimited) format

a. grunt> STORE totComm INTO 'hdfs://localhost:9000/user/mypigoutput'
       USING PigStorage(',');

The result is stored in the HDFS folder /user/mypigoutput (created automatically).

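20. To save the result (LOCAL) in tab-delimited format

a. grunt> STORE totComm INTO 'mypigoutput' USING PigStorage('\t');

The result is stored in the subfolder mypigoutput of the directory Pig was started from
(your home folder if you launched Pig there); the folder is created automatically.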
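STORE with PigStorage writes one line per tuple, joining the fields with the chosen delimiter (',' in step 19, '\t' in step 20); a null field is written as an empty string, and nested bags and tuples are written in their {(...)} and (...) text forms. Pig also refuses to write into an output folder that already exists, so remove mypigoutput before re-running a STORE step. The Python sketch below models this under stated assumptions: the `render` and `store` helpers are hypothetical, the part file name part-r-00000 is borrowed from step 13, and number formatting may differ slightly from Pig's Java output. The two rows used are the E2004 and E2012 results of step 17.

```python
# Rough model of: STORE totComm INTO 'mypigoutput' USING PigStorage('\t');
import os
import tempfile


def render(value):
    """Render one field as text (assumed to approximate PigStorage's format)."""
    if isinstance(value, list):  # a bag of tuples, e.g. the projected eids
        return "{" + ",".join(render(v) for v in value) + "}"
    if isinstance(value, tuple):  # a tuple nested inside a bag
        return "(" + ",".join(render(v) for v in value) + ")"
    return "" if value is None else str(value)  # null becomes an empty field


def store(relation, out_dir, delimiter):
    """Write one delimited line per tuple into out_dir/part-r-00000."""
    os.makedirs(out_dir)  # like Pig, fail if the output folder already exists
    path = os.path.join(out_dir, "part-r-00000")
    with open(path, "w") as f:
        for row in relation:
            f.write(delimiter.join(render(field) for field in row) + "\n")
    return path


# Two rows of totComm from step 17 (employees with a single transaction).
tot_comm = [([("E2004",)], 3000.3), ([("E2012",)], 2000.2)]
out_dir = os.path.join(tempfile.mkdtemp(), "mypigoutput")
path = store(tot_comm, out_dir, "\t")
with open(path) as f:
    print(f.read(), end="")
```

Calling `store` a second time with the same `out_dir` raises FileExistsError, which mirrors the error Pig reports when the output location already exists.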

Running Script (Saving commands)

To save commands as a script, create a script file named myscript.pig and enter the
following as its content:

21. If using Pig in HDFS mode:

a. employee = LOAD 'hdfs://localhost:9000/user/myinput2' USING
   PigStorage(',') AS (eid:chararray,sales:int,comm:double);

b. DESCRIBE employee;

22. If using Pig in LOCAL mode:

a. employee = LOAD 'mysales.txt' USING PigStorage(',') AS
   (eid:chararray,sales:int,comm:double);

b. DESCRIBE employee;

23. To execute myscript.pig stored in the HDFS folder /user/myinput using Pig in HDFS mode:

a. grunt> RUN hdfs://localhost:9000/user/myinput/myscript.pig;

b. grunt> DUMP employee;

24. To execute myscript.pig stored in your home folder using Pig in LOCAL mode:

a. grunt> RUN myscript.pig;

b. grunt> DUMP employee;

25. Save your lab report as a PDF using the filename format below and submit it via
the submission link for this lab on MyTIMeS.

Filename format: Name_ID_Lab07

Note: you can practice more commands using the resources listed at the bottom of this
document.

Resources:

1. Apache Pig Tutorial:

https://www.tutorialspoint.com/apache_pig/index.htm
a. Apache Pig Overview:
https://www.tutorialspoint.com/apache_pig/apache_pig_overview.htm
b. Apache Pig – Architecture:
https://www.tutorialspoint.com/apache_pig/apache_pig_architecture.htm

2. Pig Installation in Ubuntu: https://hiberstack.com/install-apache-pig-in-ubuntu/

I. Start Hadoop Server

a. $ start-dfs.sh

b. $ start-yarn.sh

II. Create a Pig folder, then download and extract the latest Pig release

a. $ mkdir pig

b. $ cd pig

c. $ wget https://downloads.apache.org/pig/pig-0.17.0/pig-0.17.0.tar.gz
(the download is about 220 MB)

d. $ tar -xvf pig-0.17.0.tar.gz

III. Set and activate environment variables

a. $ cd (go to home folder)

b. $ nano .bashrc

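IV. Add the following after the Hadoop settings. Check your Java version first (use
javac -version and ls /usr/lib/jvm) and adjust JAVA_HOME to match; also adjust the
/home/hadoop paths if your username is not hadoop.
#JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
#Apache Pig Environment Variables
export PIG_HOME=/home/hadoop/pig/pig-0.17.0
export PATH=$PATH:/home/hadoop/pig/pig-0.17.0/bin
#Hadoop 2.x/3.x keeps its configuration files in $HADOOP_HOME/etc/hadoop
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop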

V. Refresh environment variables

a. $ source .bashrc

VI. Check Pig version

a. $ pig -version
