Apache Pig: Pig Is the Abstraction over MapReduce

Apache Pig is a tool for analyzing structured and semi-structured data that sits on top of Hadoop. It provides a high-level language called Pig Latin for expressing data analysis processes like extraction, transformation, and loading (ETL) of data. Pig Latin scripts are compiled into sequences of MapReduce programs that process large datasets in parallel across Hadoop clusters. Pig aims to make data analysis easier for users by hiding the complexity of writing MapReduce code and allowing SQL-like syntax.

Uploaded by

prerna gupta

Apache PIG

Apache Pig began as a research project at Yahoo for creating and executing MapReduce jobs on
varied datasets. Pig is an abstraction over MapReduce: a tool for analyzing structured and
semi-structured data that runs on top of the Hadoop framework. Its language, Pig Latin, is a
high-level language with SQL-like syntax. With Pig we can perform most of the data manipulation
that we would otherwise program directly in MapReduce, which makes it popular with developers
and analysts who do not want to write low-level Java code for every job. Behind the scenes it is
still MapReduce that actually runs: Pig's infrastructure layer consists of a compiler that
produces sequences of MapReduce programs, for which large-scale parallel implementations
already exist.

Fig: Pig is a tool that sits on top of Hadoop

Pig Latin has the following key properties:

• Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly
parallel" data analysis tasks. Complex tasks comprised of multiple interrelated data
transformations are explicitly encoded as data flow sequences, making them easy to write,
understand, and maintain.
• Optimization opportunities. The way in which tasks are encoded permits the system to
optimize their execution automatically, allowing the user to focus on semantics rather than
efficiency.
• Extensibility. Users can create their own functions to do special-purpose processing.
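
As an illustration of the extensibility point, here is a minimal sketch of what a user-defined function might look like when written in Python, one of the languages Pig's documentation lists for UDFs. The function name `email_domain` and its schema string are hypothetical, and the `try`/`except` fallback is an assumption added only so the file also runs as ordinary Python outside Pig. How the file is registered (typically with a `REGISTER` statement in the Pig script) depends on the Pig version, so check the UDF documentation for your installation.

```python
try:
    # Pig provides an outputSchema decorator for Python UDFs via pig_util.
    from pig_util import outputSchema
except ImportError:
    # Assumption: a stand-in decorator so this sketch runs without Pig installed.
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("domain:chararray")
def email_domain(address):
    """Return the lower-cased domain of an e-mail address, or None if malformed."""
    if address is None or "@" not in address:
        return None
    return address.rsplit("@", 1)[1].lower()
```

Returning `None` for malformed input mirrors how null values are usually passed through a data flow rather than raising an error on a single bad record.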

Programmers can run Pig operations line by line in Pig's Grunt shell, or write a complete Pig
script and run it in one go. Either way, the Pig code is eventually converted into map and
reduce tasks. A component called the Pig Engine accepts Pig Latin scripts or commands as input
and converts them into MapReduce jobs.
A MapReduce job that a programmer would otherwise write in Java can shrink to roughly
one-tenth of the code when expressed in Pig Latin.
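
To make the "converted into map and reduce tasks" step concrete, the sketch below models a word count as three explicit phases in plain Python. It is not Pig's actual generated code, only an assumption-laden illustration of the map, shuffle, and reduce stages that a short Pig Latin script (a load, a tokenize, a group, and a count) would stand in for. All function names here are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Collect values by key, mimicking the shuffle between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts for each word, as a reducer would."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))


lines = ["pig sits on hadoop", "Pig compiles to MapReduce"]
counts = word_count(lines)
```

Writing each phase by hand, even in this toy form, takes noticeably more code than describing the same data flow declaratively, which is the saving the paragraph above refers to.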
The figure below shows the workflow of Apache Pig.
Fig: Workflow of Apache Pig

To start Pig in MapReduce mode, type the following command in the terminal:
pig
This opens Pig's Grunt shell in MapReduce mode, which is the default execution mode.
Pig supports a rich set of operations that can do almost everything hand-written MapReduce
code can. Hand-written Java code will, of course, always offer more flexibility over the
MapReduce jobs being run.
Fig: Dissection of a Relation in Pig
• The above diagram shows the dissection of a relation.
• The data that we load in the Grunt shell using Pig Latin is called a relation.
• Each row is a tuple made up of fields, and a collection of tuples is called a bag.
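
The relation, tuple, field, and bag terms above can be modelled in a few lines of Python. This is only an analogy under stated assumptions: the `Employee` schema and its values are made up for illustration, and a Python list is ordered whereas a Pig bag is an unordered collection.

```python
from collections import namedtuple

# Hypothetical schema for illustration: (name, dept, salary).
Employee = namedtuple("Employee", ["name", "dept", "salary"])

# A bag is a collection of tuples; a relation is the outermost bag.
employees = [
    Employee("asha", "eng", 120),
    Employee("ravi", "eng", 95),
    Employee("meena", "ops", 80),
]

first = employees[0]     # a tuple, i.e. one row of the relation
field = first.salary     # a field, addressed by name
by_position = first[1]   # the same tuple addressed by position, like $1 in Pig Latin
```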

Pig comes with many built-in functions, such as PigStorage, ABS, MAX, and MIN, alongside
relational operators such as GROUP and ORDER BY.
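
As a rough picture of how these pieces combine, the sketch below emulates grouping, ABS, MAX, MIN, and ordering in plain Python. The `readings` data and column meanings are invented for illustration, and the comments name the Pig operation each step loosely corresponds to rather than showing Pig's own implementation.

```python
from collections import defaultdict

# Hypothetical input relation: (city, temperature_change).
readings = [("delhi", -3.5), ("delhi", 2.0), ("pune", 1.5), ("pune", -4.0)]

# GROUP: each key maps to a bag holding the tuples that share that key.
grouped = defaultdict(list)
for city, delta in readings:
    grouped[city].append((city, delta))

# Per group, apply ABS to every value, then take MAX and MIN over the bag.
summary = []
for city, bag in grouped.items():
    magnitudes = [abs(delta) for _, delta in bag]
    summary.append((city, max(magnitudes), min(magnitudes)))

# ORDER BY the largest magnitude, descending.
ordered = sorted(summary, key=lambda row: row[1], reverse=True)
```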

P.S.: For the full list of Pig commands, see the Pig_all_commands.txt file in the zip
folder.
