Pig Architecture
Pig Architecture
ARCHITECTURE
Group 9
What is Apache Pig
Apache Pig is a high-level data processing platform that simplifies the
process of analyzing large datasets stored in Hadoop. It provides a scripting
language called Pig Latin
Rich set of operators − It provides many operators to perform operations
like join, sort, filer, etc.
Ease of programming − Pig Latin is similar to SQL and it is easy to write a
Pig script if you are good at SQL.
All these scripts are internally converted to Map and Reduce tasks. Apache
Pig has a component known as Pig Engine that accepts the Pig Latin scripts
as input and converts those scripts into MapReduce jobs.
Handles all kinds of data − Apache Pig analyzes all kinds of data, both
structured as well as unstructured. It stores the results in HDFS.
Contd....
Scalability: Pig programs are designed to run on Hadoop clusters, which
can scale horizontally by adding more nodes. This allows Pig to handle very
large datasets efficiently.
Parallelization: Pig programs are automatically parallelized by the
underlying execution engine (typically MapReduce or Spark). This enables
parallel processing of data across multiple nodes in a Hadoop cluster,
significantly improving processing speed.
Extensibility: Pig allows users to define custom functions (UDFs) in
languages like Java or Python. This extensibility enables users to extend
Pig's functionality to handle specific data processing needs.
Why Apache Pig?
Hadoop uses Map Reduce to analyze & process big data
Programmers who are not so good at Java normally used to struggle
working with Hadoop, especially while performing any MapReduce
tasks.
Difficult for Non - programmer
Difficult task to maintain and optimize code
Using Pig Latin, programmers can perform MapReduce tasks easily
without having to type complex codes in Java.
Apache Pig uses multi-query approach, thereby reducing the length of
codes. For example, an operation that would require you to type 200
lines of code (LoC) in Java can be easily done by typing as less as just 10
LoC in Apache Pig. Ultimately Apache Pig reduces the development
time by almost 16 times.
Pig Latin is SQL-like language and it is easy to learn Apache Pig when
you are familiar with SQL.
Pig Architecture
IT is Pig interactive shell
which is used to execute Check syntax of the
all pig script script , The output of the
parser will be a DAG
(directed acyclic graph)