Apache Pig Handy Notes Lab
Apache Pig is a platform for analyzing large data sets by representing them as data
flows.
It is designed to provide an abstraction over MapReduce, reducing the complexity of
writing MapReduce programs.
We can perform data manipulation operations very easily in Hadoop using Apache
Pig.
Pig Latin is a data flow language. This means it allows users to describe how data from
one or more inputs should be read, processed, and then stored to one or more outputs in
parallel. These data flows can be simple linear flows, or complex workflows that include
points where multiple inputs are joined and where data is split into multiple streams to be
processed by different operators.
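As a minimal sketch of such a data flow (the file path and field names here are hypothetical), a linear Pig Latin script might read a log file, filter it, group it, and store an aggregate:
grunt> logs = LOAD '/data/access_log.txt' USING PigStorage('\t') AS (user:chararray, url:chararray, bytes:long);
grunt> big = FILTER logs BY bytes > 1024;    -- keep only the larger records
grunt> by_user = GROUP big BY user;
grunt> counts = FOREACH by_user GENERATE group AS user, COUNT(big) AS hits;
grunt> STORE counts INTO '/output/hits_per_user' USING PigStorage('\t');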
Parser
After passing through the Grunt shell or Pig Server, Pig scripts are handed to the Parser. The
Parser checks the syntax of the script and does type checking. Its output is a DAG (directed
acyclic graph) representing the Pig Latin statements and logical operators: the logical operators
are represented as nodes and the data flows as edges.
Optimizer
The DAG is then submitted to the optimizer. The optimizer performs optimization activities
such as splitting, merging, transforming, and reordering operators, which gives Apache Pig its
automatic optimization feature. The optimizer's basic aim is to reduce the amount of data in the
pipeline at any point in time while processing the extracted data, and for that it applies rules
such as:
PushUpFilter: If a filter contains multiple conditions and can be split, Pig splits the
conditions and pushes up each condition separately. Applying these conditions earlier
helps reduce the number of records remaining in the pipeline.
PushDownForEachFlatten: Flatten, which produces a cross product between a complex
type such as a tuple or a bag and the other fields in the record, is applied as late as
possible in the plan. This keeps the number of records in the pipeline low.
ColumnPruner: Omitting columns that are never used or no longer needed, reducing the
size of the record. This can be applied after each operator, so that fields can be pruned as
aggressively as possible.
MapKeyPruner: Omitting map keys that are never used, reducing the size of the record.
LimitOptimizer: If the limit operator is applied immediately after a load or sort operator,
Pig converts the load or sort operator into a limit-sensitive implementation, which does
not require processing the whole data set. Applying the limit earlier reduces the number
of records.
This is just a flavor of the optimization process. Beyond these rules, the optimizer also
optimizes Join, Order By, and Group By operations.
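As an illustration of what PushUpFilter achieves, consider this hypothetical script (the relation and field names are assumptions) in which the filter is written after the join; because the condition touches only the users input, the optimizer can push it above the join so that fewer records reach the join:
grunt> users = LOAD '/data/users.txt' USING PigStorage(',') AS (uid:int, city:chararray);
grunt> orders = LOAD '/data/orders.txt' USING PigStorage(',') AS (uid:int, amount:double);
grunt> joined = JOIN users BY uid, orders BY uid;
grunt> hyd = FILTER joined BY users::city == 'Hyderabad';
grunt> DUMP hyd;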
Compiler
After the optimization process, the compiler compiles the optimized code into a series of
MapReduce jobs. The compiler is responsible for converting Pig jobs into MapReduce jobs
automatically.
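You can inspect what the compiler produces for a given relation with the EXPLAIN operator, which prints the logical, physical, and MapReduce plans (using, for example, the counts relation from the sketch above):
grunt> EXPLAIN counts;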
Execution engine
Finally, these MapReduce jobs are submitted to the execution engine for execution. The
MapReduce jobs are then run and produce the required result. The result can be displayed on
the screen using the DUMP statement or stored in HDFS using the STORE statement.
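For example, continuing with the counts relation from the earlier sketch, the two statements look like this:
grunt> DUMP counts;                               -- print the result on the screen
grunt> STORE counts INTO '/output/hits_per_user'; -- write the result to HDFS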
The value of all these types can also be null. The semantics for null are similar to those used in
SQL. The concept of null in Pig means that the value is unknown. Nulls can show up in the data
in cases where values are unreadable or unrecognizable — for example, if you were to use a
wrong data type in the LOAD statement.
Null could be used as a placeholder until data is added or as a value for a field that is optional.
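As a small sketch of how such nulls appear (the file and field names are assumptions), a value that cannot be cast to the declared type is loaded as null and can then be filtered out; for instance, a row such as "ravi,absent" cannot be cast to int, so its score field becomes null:
grunt> marks = LOAD '/data/marks.txt' USING PigStorage(',') AS (name:chararray, score:int);
grunt> valid = FILTER marks BY score IS NOT NULL;
grunt> DUMP valid;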
Pig Latin has a simple syntax with powerful semantics you’ll use to carry out two primary
operations: access and transform data.
In a Hadoop context, accessing data means allowing developers to load, store, and stream
data, whereas transforming data means taking advantage of Pig’s ability to group, join,
combine, split, filter, and sort data. The table gives an overview of the operators associated
with each operation.
Operation       Operator    Description
Data Access     LOAD        Read and write data to the file system. The LOAD operator specifies the schema.
You can view the schema of a loaded relation (here a relation named A) using the DESCRIBE operator:
grunt> DESCRIBE A;
quit Command
You can quit from the Grunt shell using this command.
Usage
Quit from the Grunt shell as shown below.
grunt> quit
COGROUP
It joins two or more relations and then performs a GROUP operation on the joined result.
CROSS
This is used to compute the cross product (cartesian product) of two or more relations.
FOREACH
This will iterate through the tuples of a relation, generating a data transformation.
JOIN
This is used to join two or more relations.
LIMIT
This will limit the number of output tuples.
SPLIT
This will split the relation into two or more relations.
UNION
It will merge the contents of two relations.
ORDER
This is used to sort a relation based on one or more fields.
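A brief sketch combining several of these operators is shown below (the file paths and field names are assumptions, not part of the lab data):
grunt> emp = LOAD '/data/emp.txt' USING PigStorage(',') AS (eid:int, ename:chararray, did:int);
grunt> dept = LOAD '/data/dept.txt' USING PigStorage(',') AS (did:int, dname:chararray);
grunt> joined = JOIN emp BY did, dept BY did;
grunt> named = FOREACH joined GENERATE emp::ename AS ename, dept::dname AS dname;
grunt> sorted = ORDER named BY ename ASC;
grunt> top5 = LIMIT sorted 5;
grunt> DUMP top5;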
Storing data in PIG
You can store the loaded data in the file system using the store operator.
Syntax
STORE Relation_name INTO 'required_directory_path' [USING function];
Example
Assume we have a file student_data.txt in HDFS with the following content.
001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata
003,Rajesh,Khanna,9848022339,Delhi
004,Preethi,Agarwal,9848022330,Pune
005,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
006,Archana,Mishra,9848022335,Chennai
And we have read it into a relation student using the LOAD operator as shown below.
grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt' USING PigStorage(',')
       AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
Now, let us store the relation in the HDFS directory “/pig_Output/” as shown below.
grunt> STORE student INTO 'hdfs://localhost:9000/pig_Output' USING PigStorage(',');
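You can then verify the stored output from the Grunt shell itself using the fs commands; the exact part-file name depends on the job, and part-m-00000 is just a typical example:
grunt> fs -ls hdfs://localhost:9000/pig_Output/
grunt> fs -cat hdfs://localhost:9000/pig_Output/part-m-00000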