Pig SKB

Apache Pig is a high-level data flow platform for executing MapReduce programs in Hadoop using the Pig Latin language, which simplifies complex programming tasks. It supports various data types and execution modes, including Local Mode for development and MapReduce Mode for production. Key features include ease of programming, optimization opportunities, extensibility, and built-in operators for data manipulation.


What is Apache Pig

Apache Pig is a high-level data flow platform for executing MapReduce programs on Hadoop. The language used by Pig is Pig Latin.

Pig scripts are internally converted to MapReduce jobs and executed on data stored in HDFS. Apart from that, Pig can also execute its jobs on Apache Tez or Apache Spark.

Pig can handle any type of data, i.e., structured, semi-structured, or unstructured, and stores the corresponding results in the Hadoop Distributed File System (HDFS). Every task that can be achieved using Pig can also be achieved using Java in MapReduce.

Features of Apache Pig

Let's see the various features of Pig.

1) Ease of programming

Writing complex Java programs for MapReduce is quite tough for non-programmers. Pig makes this process easy: in Pig, the queries are converted to MapReduce jobs internally.

2) Optimization opportunities

The way tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency.

3) Extensibility

Users can write user-defined functions (UDFs) containing their own logic to execute over the data set.

4) Flexible

It can easily handle structured as well as unstructured data.

5) In-built operators

It contains various types of operators such as sort, filter, and join.

Differences between Apache MapReduce and PIG

o Apache MapReduce is a low-level data processing tool, whereas Apache Pig is a high-level data flow tool.

o In MapReduce, it is required to develop complex programs using Java or Python; in Pig, it is not required to develop complex programs.

o It is difficult to perform data operations in MapReduce, whereas Pig provides built-in operators to perform data operations like union, sorting, and ordering.

o MapReduce doesn't allow nested data types, whereas Pig provides nested data types like tuple, bag, and map.

Advantages of Apache Pig

o Less code - Pig requires fewer lines of code to perform any operation.

o Reusability - Pig code is flexible enough to be reused.

o Nested data types - Pig provides useful nested data types like tuple, bag, and map.

Apache Pig Run Modes

Apache Pig executes in two modes: Local Mode and MapReduce Mode.

Local Mode

o It executes in a single JVM and is used for development, experimenting, and prototyping.

o Here, files are installed and run from the local host.

o Local mode works on the local file system; the input and output data are stored in the local file system.

The command for the local mode grunt shell:

$ pig -x local

MapReduce Mode

o The MapReduce mode is also known as Hadoop Mode.

o It is the default mode.

o In this mode, Pig translates Pig Latin statements into MapReduce jobs and executes them on the cluster.

o It can be executed against a pseudo-distributed or fully distributed Hadoop installation.

o Here, the input and output data are present on HDFS.

The command for MapReduce mode:

$ pig

Or,

$ pig -x mapreduce

Ways to execute Pig Program

These are the ways of executing a Pig program in local and MapReduce mode:

o Interactive Mode - In this mode, Pig is executed in the Grunt shell. To invoke the Grunt shell, run the pig command. Once the Grunt shell starts, Pig Latin statements and commands can be entered interactively at the command line.

o Batch Mode - In this mode, we can run a script file having a .pig extension. These files contain Pig Latin commands.

o Embedded Mode - In this mode, we can define our own functions, called UDFs (User Defined Functions). Here, we use programming languages like Java and Python.
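As a hedged sketch of the embedded workflow (the jar name, class name, and relation names below are hypothetical, not from an actual project), a Java UDF is typically packaged in a jar, registered, and then invoked inside a Pig Latin statement:

```pig
-- Hypothetical: myudfs.jar contains a Java class myudfs.MyUpper that extends EvalFunc<String>
REGISTER myudfs.jar;
upper_names = FOREACH students GENERATE myudfs.MyUpper(name);
```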

Pig Latin

Pig Latin is a data flow language used by Apache Pig to analyze data in Hadoop. It is a textual language that abstracts the programming from the Java MapReduce idiom into a higher-level notation.

Pig Latin Statements


Pig Latin statements are used to process the data. Each statement is an operator that accepts a relation as input and generates another relation as output.

o A statement can span multiple lines.

o Each statement must end with a semicolon.

o Statements may include expressions and schemas.

o By default, statements are processed using multi-query execution.
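To illustrate the first two points (the file name and schema here are hypothetical), a single statement may span several lines but is terminated by exactly one semicolon:

```pig
-- Hypothetical example: one LOAD statement spanning two lines,
-- ended by a single semicolon
student = LOAD 'student_data.txt' USING PigStorage(',')
          AS (id:int, name:chararray, gpa:float);
DUMP student;
```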

Pig Latin Conventions

Convention	Description

()	The parentheses enclose one or more items. They can also be used to indicate the tuple data type.
Example - (10, xyz, (3,6,9))

[]	The straight brackets enclose one or more optional items. They can also be used to indicate the map data type.
Example - [INNER | OUTER]

{}	The curly brackets enclose two or more items, one of which is required. They can also be used to indicate the bag data type.
Example - { block | nested_block }

...	The horizontal ellipsis points indicate that you can repeat a portion of the code.
Example - cat path [path ...]

Pig Latin Data Types

Simple Data Types

Type	Description

int	Defines a signed 32-bit integer.
Example - 2

long	Defines a signed 64-bit integer.
Example - 2L or 2l

float	Defines a 32-bit floating point number.
Example - 2.5F or 2.5f or 2.5e2f or 2.5E2F

double	Defines a 64-bit floating point number.
Example - 2.5 or 2.5e2 or 2.5E2

chararray	Defines a character array in Unicode UTF-8 format.
Example - javatpoint

bytearray	Defines a byte array (blob).

boolean	Defines boolean values.
Example - true/false

datetime	Defines a date-time value.
Example - 1970-01-01T00:00:00.000+00:00

biginteger	Defines Java BigInteger values.
Example - 5000000000000

bigdecimal	Defines Java BigDecimal values.
Example - 52.232344535345

Complex Types

Type	Description

tuple	Defines an ordered set of fields.
Example - (15,12)

bag	Defines a collection of tuples.
Example - {(15,12), (12,15)}

map	Defines a set of key-value pairs.
Example - [open#apache]
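A hedged sketch of how these complex types can appear in a LOAD schema (the file name and field names are hypothetical):

```pig
-- Hypothetical schema declaring a tuple, a bag, and a map field
records = LOAD 'complex_data.txt'
          AS (t:tuple(a:int, b:int),           -- e.g. (15,12)
              bg:bag{row:tuple(a:int, b:int)}, -- e.g. {(15,12),(12,15)}
              m:map[chararray]);               -- e.g. [open#apache]
```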

Pig Example

Use case: Using Pig, find the most frequently occurring start letter.

Solution:

Step 1: Load the data into a bag named "lines". The entire line is assigned to the element line of type chararray.

grunt> lines = LOAD '/user/Desktop/data.txt' AS (line: chararray);

Step 2: The text in the bag lines needs to be tokenized; this produces one word per row.

grunt> tokens = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS token: chararray;

Step 3: To retain the first letter of each word, type the command below. It uses the SUBSTRING function to take the first character.

grunt> letters = FOREACH tokens GENERATE SUBSTRING(token, 0, 1) AS letter: chararray;

Step 4: Group by letter, so that each grouped bag contains every occurrence of that character.

grunt> lettergrp = GROUP letters BY letter;

Step 5: The number of occurrences is counted in each group.

grunt> countletter = FOREACH lettergrp GENERATE group, COUNT(letters);

Step 6: Arrange the output according to count in descending order using the command below.

grunt> OrderCnt = ORDER countletter BY $1 DESC;

Step 7: Limit the output to one row to give the result.

grunt> result = LIMIT OrderCnt 1;

Step 8: Store the result in HDFS. The result is saved in the output directory under the sonoo folder.

grunt> STORE result INTO 'home/sonoo/output';
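To make the dataflow above concrete, here is a plain-Python sketch of the same logic (the sample lines are invented for illustration; this is not how Pig executes it, only an equivalent computation):

```python
from collections import Counter

def most_common_start_letter(lines):
    # TOKENIZE + FLATTEN: split each line into one word per row
    tokens = [word for line in lines for word in line.split()]
    # SUBSTRING(token, 0, 1): keep the first letter of each word
    letters = [t[0] for t in tokens if t]
    # GROUP + COUNT + ORDER ... DESC + LIMIT 1, rolled into one call
    return Counter(letters).most_common(1)[0]

# Hypothetical sample data
print(most_common_start_letter(["pig processes petabytes",
                                "pig programs run on hadoop"]))
# → ('p', 5)
```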
