
Apache PIG

Apache Pig began as a research project at Yahoo for creating and executing MapReduce jobs over different datasets. Pig is an abstraction over MapReduce: a tool for analyzing structured and semi-structured data, used along with the Hadoop framework. Pig provides a high-level language called Pig Latin, and with it we can perform the same kinds of data manipulation that we would normally write as MapReduce programs. Pig is favored by developers and analysts who do not want to write low-level Java code for every job, because Pig Latin offers SQL-like syntax while, in the background, it is still MapReduce that actually runs. At present, Pig's infrastructure layer consists of a compiler that produces sequences of MapReduce programs, for which large-scale parallel implementations already exist.

Fig: Pig is a tool that sits on top of Hadoop
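As a minimal sketch of what such a script looks like, a simple load-filter-store (ETL) step in Pig Latin could be written as follows; the file name, delimiter and field names are made up purely for illustration:

    -- load a tab-separated log file with a declared schema (illustrative names)
    logs = LOAD 'web_logs.txt' USING PigStorage('\t')
           AS (user:chararray, url:chararray, bytes:int);
    -- keep only the large responses
    big_logs = FILTER logs BY bytes > 1024;
    -- write the result back out as comma-separated text
    STORE big_logs INTO 'big_responses' USING PigStorage(',');

When such a script runs, Pig compiles it into one or more MapReduce jobs; the author never writes Mapper or Reducer classes.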

• Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain.
• Optimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency.
• Extensibility. Users can create their own functions (UDFs) to do special-purpose processing, as sketched below.
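As an example of extensibility, a user-defined function written in Java is packaged into a jar and then made available from Pig Latin. The jar name and class path below are hypothetical; they only illustrate the REGISTER/DEFINE pattern:

    -- register a jar containing a custom UDF (hypothetical jar and class names)
    REGISTER 'myudfs.jar';
    DEFINE CleanUrl com.example.pig.CleanUrl();

    logs    = LOAD 'web_logs.txt' AS (user:chararray, url:chararray);
    -- the custom function is then called just like a built-in one
    cleaned = FOREACH logs GENERATE user, CleanUrl(url) AS url;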

Programmers can run Pig operations line by line in Pig's grunt shell, or they can write a complete Pig script and run the whole script at once. Eventually all of these Pig scripts and statements get converted into Map and Reduce tasks. Pig has a component called the Pig Engine which accepts Pig Latin scripts or commands as input and converts them into MapReduce jobs.
A MapReduce job that a programmer would otherwise write in Java typically shrinks to around one tenth of the code when expressed in Pig Latin.
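As a rough illustration of that reduction, the classic word-count job, which takes a few hundred lines of hand-written Java MapReduce code, is only a handful of Pig Latin statements; the input and output paths here are placeholders:

    -- read the input file as lines of text (placeholder path)
    lines   = LOAD 'input.txt' AS (line:chararray);
    -- split each line into words and flatten the resulting bag
    words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    grouped = GROUP words BY word;
    counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS cnt;
    STORE counts INTO 'wordcount_output';

The Pig Engine compiles these few statements into the same kind of map, shuffle and reduce phases that the hand-written job would contain.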
The picture below shows the workflow of Apache Pig.
Fig: Workflow of Apache Pig

To enter the MapReduce mode of Pig, type the following command on the terminal:
pig
Typing the above command opens Pig's grunt shell in MapReduce mode.
Pig supports a rich set of operations that can do almost the same things as hand-coded MapReduce programming can. But, of course, hand coding, i.e. Java programming, will always give greater flexibility over the MapReduce jobs being run.
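For reference, pig -x local starts the same shell in local mode, which is handy for testing on small files before running on the cluster. A short, illustrative session (the file and field names are assumptions) might look like this:

    $ pig                      # grunt shell in MapReduce mode (the default)
    grunt> logs = LOAD 'web_logs.txt' AS (user:chararray, url:chararray, bytes:int);
    grunt> big  = FILTER logs BY bytes > 1024;
    grunt> DUMP big;           -- only now are the MapReduce job(s) actually launched

Pig evaluates lazily: LOAD and FILTER only build a logical plan, and nothing runs on the cluster until a DUMP or STORE asks for output.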
Fig: Dissection of a Relation in Pig
• The above diagram shows the dissection of a relation.
• The data we load in the grunt shell using Pig Latin is called a relation.
• A row is a tuple, which has fields, and a collection of tuples is called a bag.
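To make these terms concrete, here is a small illustrative session; the students.txt file, its delimiter and its contents are invented for the example:

    grunt> students = LOAD 'students.txt' USING PigStorage(',') AS (id:int, name:chararray, marks:int);
    grunt> DESCRIBE students;
    students: {id: int,name: chararray,marks: int}
    grunt> DUMP students;
    (1,Asha,78)
    (2,Ravi,85)

Here students is the relation (a bag of tuples), each output line such as (1,Asha,78) is a tuple, and id, name and marks are its fields.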

Pig comes with a lot of built-in functions and operators. Examples include the PigStorage load/store function, the ABS, MAX and MIN functions, and relational operators such as GROUP BY and ORDER BY.
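A brief sketch that combines several of these, reusing the illustrative logs relation from the earlier examples (relation and field names are still assumptions):

    grunt> by_user = GROUP logs BY user;                                                    -- GROUP BY operator
    grunt> stats   = FOREACH by_user GENERATE group AS user, MAX(logs.bytes) AS max_bytes;  -- MAX built-in function
    grunt> ordered = ORDER stats BY max_bytes DESC;                                         -- ORDER BY operator
    grunt> STORE ordered INTO 'user_stats' USING PigStorage();                              -- PigStorage for storing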

P.S.: For details of the Pig commands themselves, see the Pig_all_commands.txt file in the zip folder.
