
PIG ARCHITECTURE
Group 9
What is Apache Pig?
Apache Pig is a high-level data processing platform that simplifies the analysis of large datasets stored in Hadoop. It provides a scripting language called Pig Latin. All Pig Latin scripts are internally converted into Map and Reduce tasks: Apache Pig has a component known as the Pig Engine that accepts Pig Latin scripts as input and converts them into MapReduce jobs.
Rich set of operators − Pig provides many operators to perform operations such as join, sort, filter, etc.
Ease of programming − Pig Latin is similar to SQL, so it is easy to write a Pig script if you are comfortable with SQL.
Handles all kinds of data − Apache Pig analyzes all kinds of data, both structured and unstructured, and stores the results in HDFS.
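As a quick illustration of these operators, here is a minimal Pig Latin sketch; the file paths, schema, and field names are made up for this example:

    -- load a hypothetical CSV of songs: band name and rating
    songs = LOAD 'hdfs:///data/songs.csv' USING PigStorage(',') AS (band:chararray, rating:int);
    -- keep only highly rated songs
    good = FILTER songs BY rating > 6;
    -- sort by rating, highest first
    ranked = ORDER good BY rating DESC;
    -- write the result back to HDFS
    STORE ranked INTO 'hdfs:///output/ranked_songs';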
Contd....
Scalability: Pig programs are designed to run on Hadoop clusters, which
can scale horizontally by adding more nodes. This allows Pig to handle very
large datasets efficiently.
Parallelization: Pig programs are automatically parallelized by the
underlying execution engine (typically MapReduce or Spark). This enables
parallel processing of data across multiple nodes in a Hadoop cluster,
significantly improving processing speed.
Extensibility: Pig allows users to write user-defined functions (UDFs) in languages such as Java or Python, extending Pig's functionality to handle specific data processing needs (see the sketch after this list).
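The following Pig Latin sketch illustrates both parallelization and extensibility: the PARALLEL clause requests a number of reduce tasks, and a hypothetical Java UDF (myudfs.CleanText, packaged in myudfs.jar) is registered and applied; both names are placeholders for code you would write yourself:

    -- register a hypothetical jar containing a user-defined function
    REGISTER 'myudfs.jar';
    DEFINE CleanText myudfs.CleanText();

    logs = LOAD 'hdfs:///data/logs' AS (line:chararray);
    -- apply the UDF to every record
    cleaned = FOREACH logs GENERATE CleanText(line) AS line;
    -- PARALLEL hints how many reduce tasks to use for this grouping
    grouped = GROUP cleaned BY line PARALLEL 10;
    counts = FOREACH grouped GENERATE group, COUNT(cleaned);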
Why Apache Pig?
Hadoop uses MapReduce to analyze and process big data.
Programmers who are not comfortable with Java often struggle with Hadoop, especially when writing MapReduce tasks.
MapReduce is difficult for non-programmers.
MapReduce code is difficult to maintain and optimize.
Using Pig Latin, programmers can perform MapReduce tasks easily without having to write complex Java code.
Apache Pig uses a multi-query approach, thereby reducing the length of the code. For example, an operation that would require about 200 lines of code (LoC) in Java can be done in as few as 10 LoC in Apache Pig. Ultimately, Apache Pig reduces development time by a factor of almost 16.
Pig Latin is an SQL-like language, so it is easy to learn Apache Pig if you are already familiar with SQL.
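To illustrate the conciseness claim, here is the classic word-count job in a handful of Pig Latin lines (the input and output paths are placeholders); the equivalent hand-written Java MapReduce program typically runs to well over a hundred lines:

    lines  = LOAD 'hdfs:///data/input.txt' AS (line:chararray);
    words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    grpd   = GROUP words BY word;
    counts = FOREACH grpd GENERATE group AS word, COUNT(words) AS cnt;
    STORE counts INTO 'hdfs:///output/wordcount';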
Pig Architecture
Grunt Shell − Pig's interactive shell, used to execute Pig scripts.
Parser − Checks the syntax of the script; the output of the parser is a DAG (directed acyclic graph) of the script's statements.
Optimizer − The DAG is passed to the logical optimizer, where logical optimizations take place.
Compiler − The optimized DAG is converted into a series of MapReduce jobs.
Execution − Results are displayed using the DUMP statement or stored in HDFS using the STORE statement.
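For example, at the Grunt shell the same relation can either be printed to the console or written to HDFS (the paths and schema are placeholders):

    grunt> songs = LOAD 'hdfs:///data/songs.csv' USING PigStorage(',') AS (band:chararray, rating:int);
    grunt> DUMP songs;   -- triggers execution and prints the results to the console
    grunt> STORE songs INTO 'hdfs:///output/songs_copy';   -- triggers execution and writes the results to HDFS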
Apache Pig Data Model
Field (Atom) − A piece of data or a simple atomic value is known as a field.
Example − 'Linkin Park' or '7'
Tuple − An ordered set of fields is known as a tuple.
Example − (Linkin Park, 7)
Bag − A collection of tuples (non-unique) is known as a bag.
Example − {(Linkin Park, 7), (Metallica, 8)}
Map − A set of key-value pairs.
Example − ['Bands'#'Linkin Park', 'Members'#7]
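A short sketch of how these types appear in a Pig Latin schema; the file name and field names are illustrative:

    -- each record has an atomic field, a tuple, a bag of tuples, and a map
    bands = LOAD 'hdfs:///data/bands.txt' AS (
        name:chararray,
        origin:tuple(city:chararray, country:chararray),
        albums:bag{t:(title:chararray, year:int)},
        info:map[]
    );
    -- atomic and complex fields can both be projected; # looks up a key in a map
    names_and_members = FOREACH bands GENERATE name, info#'Members';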
THANK YOU!!!
