Install PIG

The document provides steps to download, install, and configure Apache Pig on a Linux system. It describes downloading the Pig source and binary files, extracting and moving them to a Pig directory, setting environment variables and properties files, and verifying the installation by checking the Pig version.

Uploaded by

Kajal

Prerequisites

Hadoop and Java must be installed on your system before you install Apache Pig. If either is
missing, install it first.
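
A quick way to confirm both prerequisites is to check that the commands are on the PATH. The sketch below is only an illustration: check_cmd is a hypothetical helper name, and it assumes the Java and Hadoop launchers are called java and hadoop.

```shell
# Report whether a required command is available on PATH.
# check_cmd is a hypothetical helper, not part of Pig or Hadoop.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# Check both prerequisites; keep going so that both results are shown.
check_cmd java   || echo "Install Java before installing Pig."
check_cmd hadoop || echo "Install Hadoop before installing Pig."
```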

Step 1
Open the homepage of the Apache Pig website. Under the News section, click the release page
link.

Step 2
On clicking the specified link, you will be redirected to the Apache Pig Releases page. On this
page, under the Download section, there are two links: Pig 0.8 and later, and Pig 0.7 and
before. Click Pig 0.8 and later; you will be redirected to a page listing a set of mirrors.

Step 3
Choose and click any one of these mirrors.

Step 4
The mirror takes you to the Pig Releases page, which lists the available versions of Apache
Pig. Click the latest version among them.

Step 5
Within this folder, you will find the source and binary files of Apache Pig in various
distributions. Download the tar files of the source and binary files of Apache Pig 0.15:
pig-0.15.0-src.tar.gz and pig-0.15.0.tar.gz.
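
The same two files can also be fetched from the command line. The sketch below only prints the download commands (a dry run) rather than running them. It assumes that release 0.15.0 is kept on the Apache archive server under dist/pig/ and that wget is installed; BASE_URL and pig_download_urls are names introduced here for illustration.

```shell
PIG_VERSION=0.15.0
# Assumed location: the Apache archive keeps old releases under dist/pig/.
BASE_URL="https://archive.apache.org/dist/pig/pig-${PIG_VERSION}"

# Print one URL per tar file (source first, then binary).
pig_download_urls() {
    for f in "pig-${PIG_VERSION}-src.tar.gz" "pig-${PIG_VERSION}.tar.gz"; do
        echo "${BASE_URL}/${f}"
    done
}

# Print the download commands instead of running them.
pig_download_urls | while read -r url; do
    echo "wget $url"
done
```

To perform the download, run the printed commands from the Downloads directory.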

Install Apache Pig

After downloading the Apache Pig software, install it in your Linux environment by following the
steps given below.

Step 1
Create a directory named Pig in the same directory where Hadoop, Java, and other software are
installed. (In this tutorial, the Pig directory is created in the home directory of the user
named Hadoop.)
$ mkdir Pig

Step 2
Extract the downloaded tar files as shown below.
$ cd Downloads/
$ tar zxvf pig-0.15.0-src.tar.gz
$ tar zxvf pig-0.15.0.tar.gz
Step 3
Move the contents of the extracted pig-0.15.0-src directory to the Pig directory created
earlier, as shown below.
$ mv pig-0.15.0-src/* /home/Hadoop/Pig/
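
Extracting an archive produces a directory named after the archive without the .tar.gz suffix, and it is that directory whose contents are moved. The sketch below rehearses the extract-and-move sequence in a throwaway directory with a dummy archive, so it can be tried without the real download; the temporary directory and the placeholder bin/pig file are stand-ins, not the real Pig files.

```shell
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1

# Build a dummy archive with the same top-level directory name as the real one.
mkdir -p pig-0.15.0-src/bin
touch pig-0.15.0-src/bin/pig
tar czf pig-0.15.0-src.tar.gz pig-0.15.0-src
rm -r pig-0.15.0-src

# Extract, then move the contents of the extracted directory into Pig/.
tar zxf pig-0.15.0-src.tar.gz
mkdir Pig
mv pig-0.15.0-src/* Pig/

ls Pig   # prints: bin
```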

Configure Apache Pig

After installing Apache Pig, we have to configure it by editing two files: .bashrc and
pig.properties.

.bashrc file
In the .bashrc file, set the following variables −
• PIG_HOME to Apache Pig's installation folder,
• PATH to include Pig's bin folder, and
• PIG_CLASSPATH to the configuration folder of your Hadoop installation (the directory that
contains the core-site.xml, hdfs-site.xml and mapred-site.xml files). This tutorial uses
$HADOOP_HOME/conf; on Hadoop 2.x and later these files are usually in $HADOOP_HOME/etc/hadoop.
export PIG_HOME=/home/Hadoop/Pig
export PATH=$PATH:/home/Hadoop/Pig/bin
export PIG_CLASSPATH=$HADOOP_HOME/conf
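
Note that the shell does not allow spaces around = in an assignment. The following sketch appends the three settings to a file and loads them to check the result. To stay harmless it writes to a temporary file standing in for ~/.bashrc, and it falls back to a hypothetical HADOOP_HOME of /usr/local/hadoop when that variable is not already set.

```shell
# Hypothetical default, used only if HADOOP_HOME is not already defined.
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}

# Temporary stand-in for ~/.bashrc.
rc_file=$(mktemp)

# The quoted 'EOF' keeps the variables unexpanded until the file is loaded.
cat >> "$rc_file" <<'EOF'
export PIG_HOME=/home/Hadoop/Pig
export PATH=$PATH:$PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/conf
EOF

# Load the settings into the current shell and confirm them.
. "$rc_file"
echo "PIG_HOME=$PIG_HOME"
case ":$PATH:" in
    *":$PIG_HOME/bin:"*) echo "PATH includes $PIG_HOME/bin" ;;
    *)                   echo "PATH does not include $PIG_HOME/bin" ;;
esac
```

With the real ~/.bashrc, open a new terminal or run source ~/.bashrc for the changes to take effect.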

pig.properties file
In the conf folder of Pig, there is a file named pig.properties, in which you can set various
parameters. To list the supported properties, run the following command.
$ pig -h properties
The following properties are supported −
    Logging:
        verbose=true|false; default is false. This property is the same as -v switch
        brief=true|false; default is false. This property is the same as -b switch
        debug=OFF|ERROR|WARN|INFO|DEBUG; default is INFO. This property is the same as -d switch
        aggregate.warning=true|false; default is true.
            If true, prints count of warnings of each type rather than logging each warning.

    Performance tuning:
        pig.cachedbag.memusage=<mem fraction>; default is 0.2 (20% of all memory).
            Note that this memory is shared across all large bags used by the application.
        pig.skewedjoin.reduce.memusage=<mem fraction>; default is 0.3 (30% of all memory).
            Specifies the fraction of heap available for the reducer to perform the join.
        pig.exec.nocombiner=true|false; default is false.
            Only disable combiner as a temporary workaround for problems.
        opt.multiquery=true|false; multiquery is on by default.
            Only disable multiquery as a temporary workaround for problems.
        opt.fetch=true|false; fetch is on by default.
            Scripts containing Filter, Foreach, Limit, Stream, and Union can be dumped
            without MR jobs.
        pig.tmpfilecompression=true|false; compression is off by default.
            Determines whether output of intermediate jobs is compressed.
        pig.tmpfilecompression.codec=lzo|gzip; default is gzip.
            Used in conjunction with pig.tmpfilecompression. Defines compression type.
        pig.noSplitCombination=true|false. Split combination is on by default.
            Determines if multiple small files are combined into a single map.
        pig.exec.mapPartAgg=true|false. Default is false.
            Determines if partial aggregation is done within map phase, before records are
            sent to combiner.
        pig.exec.mapPartAgg.minReduction=<min aggregation factor>. Default is 10.
            If the in-map partial aggregation does not reduce the output num records by this
            factor, it gets disabled.

    Miscellaneous:
        exectype=mapreduce|tez|local; default is mapreduce. This property is the same as -x switch
        pig.additional.jars.uris=<comma separated list of jars>. Used in place of register command.
        udf.import.list=<comma separated list of imports>. Used to avoid package names in UDF.
        stop.on.failure=true|false; default is false. Set to true to terminate on the first error.
        pig.datetime.default.tz=<UTC time offset>. e.g. +08:00. Default is the default timezone
            of the host.
            Determines the timezone used to handle datetime datatype and UDFs.

Additionally, any Hadoop property can be specified.
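
As an illustration of the file format, the sketch below writes a small pig.properties with one key=value pair per line, using four of the properties listed above. The chosen values are arbitrary examples, and the file is written to a temporary directory standing in for Pig's conf folder so that an existing configuration is not overwritten.

```shell
# Temporary stand-in for Pig's conf folder.
conf_dir=$(mktemp -d)

# One property per line, in key=value form.
cat > "$conf_dir/pig.properties" <<'EOF'
exectype=local
stop.on.failure=true
pig.tmpfilecompression=true
pig.tmpfilecompression.codec=gzip
EOF

# Show a single setting, then count the settings in the file.
grep '^exectype=' "$conf_dir/pig.properties"   # prints: exectype=local
grep -c '=' "$conf_dir/pig.properties"         # prints: 4
```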

Verifying the Installation

Verify the installation of Apache Pig by typing the version command. If the installation is
successful, you will get the version of Apache Pig as shown below.
$ pig -version
Apache Pig version 0.15.0 (r1682971)
compiled Jun 01 2015, 11:44:35
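
When the check is part of a script, the version number can be pulled out of that output. The sketch below is one possible way to do it: pig_version_of is a hypothetical helper, and it is fed the sample output shown above so that it runs even where Pig is not installed.

```shell
# Extract the version number from the "Apache Pig version ..." line on stdin.
# pig_version_of is a hypothetical helper, not a Pig command.
pig_version_of() {
    sed -n 's/^Apache Pig version \([0-9][0-9.]*\).*/\1/p'
}

# Sample output copied from the tutorial.
sample='Apache Pig version 0.15.0 (r1682971)
compiled Jun 01 2015, 11:44:35'

printf '%s\n' "$sample" | pig_version_of   # prints: 0.15.0
```

On a machine with Pig installed, the same helper could be used as pig -version 2>&1 | pig_version_of.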
