Big Data Analytics Unit 4
SURESH BABU M
ASST PROF
IT DEPT
UNIT-IV
Hadoop Ecosystem-I
Pig: Introduction to Pig, Execution Modes of Pig,
Comparison of Pig with Databases, Grunt, Pig Latin, User
Defined Functions, Data Processing operators.
Hive: Hive Shell, Hive Services, Hive Metastore,
Comparison with Traditional Databases, HiveQL, Tables,
Querying Data and User Defined Functions.
4.1 Introduction to Pig
Apache Pig raises the level of abstraction for processing large
datasets.
Compared with MapReduce, Pig's data structures are much richer,
typically being multivalued and nested, and the transformations you
can apply to the data are much more powerful.
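To make "multivalued and nested" concrete, here is a minimal Python sketch. It is an analogy only, not Pig code: it mimics Pig's tuple, bag and map types with a Python tuple, list and dict. The student record and its field names are hypothetical.

```python
# A Pig-style record: a tuple whose fields may themselves be collections.
# Analogy used here: Pig tuple -> Python tuple, Pig bag -> list of tuples,
# Pig map -> dict. All names and values are made up for illustration.
student = (
    "alice",                            # simple field
    [("math", 82), ("physics", 91)],    # bag of (course, mark) tuples
    {"dept": "IT", "year": "3"},        # map of key -> value
)

name, marks, info = student

# A transformation over the nested bag: pick the course with the best mark.
best_course, best_mark = max(marks, key=lambda t: t[1])

print(name, best_course, best_mark, info["dept"])
```

A single record carries a whole collection of marks, which is what "multivalued and nested" means in practice.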
Pig is made up of two pieces:
• The language used to express data flows, called Pig Latin.
• The execution environment to run Pig Latin programs. There
are currently two environments: local execution in a single
JVM and distributed execution on a Hadoop cluster.
Pig is a scripting language for exploring large
datasets.
Pig was designed to be extensible: most steps of the processing
path can be customized with user-defined functions (UDFs).
UDFs also tend to be more reusable than the libraries
developed for writing MapReduce programs.
Pig is an open-source, high-level data flow system.
It provides a simple language called Pig Latin for
queries and data manipulation. Pig Latin programs are
then compiled into MapReduce jobs that run on
Hadoop.
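The idea that a data flow becomes MapReduce work can be illustrated with a small Python sketch. It is conceptual only and does not reproduce how Pig's compiler actually plans jobs. It assumes a hypothetical flow that filters records, groups them by year and takes the maximum temperature per group. The records, the 9999 "missing" marker and the function names are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical (year, temperature, quality) records; 9999 marks a missing reading.
records = [
    (1950, 0, 1), (1950, 22, 1), (1950, -11, 1),
    (1949, 111, 1), (1949, 78, 1), (1949, 9999, 1),
]

def map_phase(rows):
    """Filter and project: drop bad readings, emit (key, value) pairs."""
    for year, temp, quality in rows:
        if temp != 9999 and quality in (0, 1, 4, 5, 9):
            yield year, temp

def shuffle(pairs):
    """Group: collect all values that share a key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Aggregate each group: here, the maximum temperature per year."""
    return {year: max(temps) for year, temps in groups.items()}

max_temp = reduce_phase(shuffle(map_phase(records)))
print(sorted(max_temp.items()))
```

In this sketch the filter step corresponds to the map side, and the grouping and aggregation to the shuffle and reduce side. A Pig Latin script expresses the same flow in a few statements, without hand-written map and reduce functions.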
4.2 Execution Types
Pig has two execution types or modes: local mode and MapReduce
mode.
1) Local mode
In local mode, Pig runs in a single JVM and accesses the local filesystem.
This mode is suitable only for small datasets and when trying out Pig.
% pig -x local
2) MapReduce mode
MapReduce mode is also known as Hadoop mode.
It is the default mode. In this mode, Pig translates Pig Latin into MapReduce jobs
and runs them on the cluster.
It can run against a pseudo-distributed or fully distributed Hadoop
installation. Here, the input and output data reside on HDFS.
$ pig
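The two invocations above differ only in the -x option. As a small sketch, the Python helper below builds the corresponding argument list. The helper name pig_command and the script name script.pig are hypothetical; only the pig command and its -x local option come from the text.

```python
def pig_command(script=None, local=False):
    """Build the argument list for launching Pig (hypothetical helper)."""
    cmd = ["pig"]
    if local:
        # -x local selects local mode; without it Pig uses MapReduce mode.
        cmd += ["-x", "local"]
    if script is not None:
        # Optional script file to run (placeholder name in the calls below).
        cmd.append(script)
    return cmd

print(" ".join(pig_command(local=True)))                # local mode
print(" ".join(pig_command()))                          # default MapReduce mode
print(" ".join(pig_command("script.pig", local=True)))  # run a script locally
```

The resulting list could be passed to subprocess.run on a machine where Pig is installed.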
4.3 Comparison of Pig with Databases
4.4 Grunt
Grunt is an interactive shell for running Pig commands. Grunt
is started when no file is specified for Pig to run and the -e
option is not used. It is also possible to run Pig scripts from
within Grunt using run and exec.
4.5 Pig Latin
4.5.1 Structures
4.5.2 Statements
4.5.3 Expressions
4.5.4 Types
4.5.5 Schemas
4.5.6 Functions
4.5.7 Macros
4.5.1 Structures