Unit 4 Hadoop Ecosystem - HIVE and PIG
• HIVE query language, loading data into tables, HIVE built-in functions, joins in HIVE, Partitioning
• HiveQL: querying data, sorting and aggregation
• PIG built-in functions, filtering, grouping, sorting data; installation of PIG and PIG Latin commands
• Self-Learning Topics: Cloudera IMPALA
Introduction to Hive
• Hive is a data warehouse infrastructure tool built on top of Hadoop for querying and analyzing large data sets that are principally stored in HDFS.
• It provides HQL (Hive Query Language), which gets internally converted to MapReduce jobs.
Hive vs Pig
• Hive is commonly used by data analysts; Pig is commonly used by programmers.
• Hive follows SQL-like queries (HQL); Pig follows the data-flow language Pig Latin.
• Hive can handle structured data; Pig can handle semi-structured data.
• Hive works on the server side of the HDFS cluster; Pig works on the client side of the HDFS cluster.
• Hive is slower than Pig; Pig is comparatively faster than Hive.
Hive Architecture
Q. What is HIVE? Explain HIVE architecture in detail.
Hive Architecture: Hive Client
• Hive allows writing applications in various languages, including Java, Python, and C++. It supports different types of clients, such as:
• Thrift Server - It is a cross-language service provider platform that serves requests from all programming languages that support Thrift.
– Apache Thrift is basically a protocol that defines how connections are made between clients and servers. Apache Hive uses Thrift to allow remote users to connect to HiveServer2 (the Thrift server) and submit queries. Thrift bindings are available in many different languages, such as C++, Java, and Python, so users can query the same source from different languages.
• JDBC Driver - It is used to establish a connection between Hive and Java applications.
• The compiler generates the execution plan in the form of a DAG of map-reduce tasks and HDFS tasks. In the end, the execution engine executes these tasks in the order of their dependencies.
Hive Services
2. The driver interacts with the compiler to get the plan. (Here, plan refers to the query execution process and its related metadata information gathering.)
5. The compiler communicates the proposed plan back to the driver to execute the query.
7. Execution Engine (EE) acts as a bridge between Hive and Hadoop to process the query. For DFS operations:
– EE first contacts the Name Node and then the Data Nodes to get the values stored in tables.
– EE fetches the desired records from the Data Nodes. The actual data of tables resides in the data nodes only, while from the Name Node it fetches only the metadata information for the query.
– It collects the actual data from the data nodes related to the mentioned query.
– Execution Engine (EE) communicates bi-directionally with the Metastore present in Hive to perform DDL (Data Definition Language) operations. DDL operations like CREATE, DROP and ALTER on tables and databases are done here. The Metastore stores information about database names, table names and column names only, and it fetches the metadata related to the mentioned query.
– Execution Engine (EE) in turn communicates with Hadoop daemons such as the Name Node, Data Nodes, and Job Tracker to execute the query on top of the Hadoop file system.
8. Fetching results from the driver.
9. Sending results to the Execution Engine. Once the results are fetched from the data nodes to the EE, it sends the results back to the driver and to the UI (front end).
• Hive stays in continuous contact with the Hadoop file system and its daemons via the Execution Engine. The dotted arrows in the job-flow diagram show the Execution Engine's communication with the Hadoop daemons.
Different modes of Hive
• Integer Types
Type       Size                    Range
TINYINT    1-byte signed integer   -128 to 127
SMALLINT   2-byte signed integer   -32,768 to 32,767
INT        4-byte signed integer   -2,147,483,648 to 2,147,483,647
BIGINT     8-byte signed integer   -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
HIVE Data Types
• Decimal Type
Date/Time Types:
• TIMESTAMP
– It supports traditional UNIX timestamp with optional nanosecond precision.
– As Integer numeric type, it is interpreted as UNIX timestamp in seconds.
– As Floating point numeric type, it is interpreted as UNIX timestamp in
seconds with decimal precision.
– As string, it follows java.sql.Timestamp format "YYYY-MM-DD
HH:MM:SS.fffffffff" (9 decimal place precision)
• Date: The Date value is used to specify a particular year, month and day, in the form YYYY-MM-DD.
HIVE Data Types
String Types
• STRING: The string is a sequence of characters. Its values can be enclosed within single quotes (') or double quotes (").
• VARCHAR: The varchar is a variable-length type whose length lies between 1 and 65535, which specifies the maximum number of characters allowed in the character string.
• CHAR: The char is a fixed-length type whose maximum length is fixed at 255.
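• A minimal sketch showing how several of the above types appear in a CREATE TABLE statement (the table and column names here are assumed for illustration, not from the tutorial):
hive> create table emp_types (id int, name string, dept char(10), email varchar(50), salary float, joined timestamp, dob date);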
HIVE Data Types
• Each database must have a unique name. If we create two databases with the same name, Hive gives an error message.
• If we want to suppress the error generated by Hive on creating a database with the same name, use the IF NOT EXISTS clause, as shown in the sketch after the next command.
• Hive also allows assigning properties to the database in the form of key-value pairs.
hive> create database demo
    > WITH DBPROPERTIES ('creator' = 'Gaurav Chawla', 'date' = '2019-06-03');
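• A minimal sketch of the IF NOT EXISTS form mentioned above for suppressing the "database already exists" error (reusing the demo database name):
hive> create database if not exists demo;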
– The database demo is not present in the list. Hence, the database is dropped
successfully.
– If we try to drop a database that doesn't exist, the following error is generated:
– It is not allowed to drop a database that contains tables directly. In such a case, we can drop the database either by dropping its tables first or by using the CASCADE keyword with the command.
– The CASCADE keyword automatically drops the tables present in the database first.
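• A minimal sketch of the CASCADE form (reusing the demo database from the earlier example):
hive> drop database if exists demo cascade;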
• Internal tables are also called managed tables, as the lifecycle of their data is controlled by Hive.
• By default, these tables are stored in a subdirectory under the directory defined by hive.metastore.warehouse.dir (i.e. /user/hive/warehouse).
• If we try to drop an internal table, Hive deletes both the table schema and the data.
• Command:
hive> create table demo.employee (Id int, Name string, Salary float)
    > row format delimited
    > fields terminated by ',';
• When we try to create a table that already exists, an exception occurs. If we want to ignore this type of exception, we can use the IF NOT EXISTS clause while creating the table.
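• A minimal sketch of the IF NOT EXISTS form (reusing the employee table definition above):
hive> create table if not exists demo.employee (Id int, Name string, Salary float)
    > row format delimited
    > fields terminated by ',';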
• While creating a table, we can add the comments to the columns and can also
define the table properties.
hive> create table demo.new_employee (Id int comment 'Employee Id', Name string comment 'Employee Name', Salary float comment 'Employee Salary')
    > comment 'Table Description'
    > TBLPROPERTIES ('creator'='Gaurav Chawla', 'created_at' = '2019-06-06 11:00:00');
Hive - Load Data
• Once the internal table has been created, the next step is to load the data into it.
• To load the data of a file into the table, use the following command:
• If we want to add more data into the current table, execute the same query again, just updating the new file name.
hive> load data local inpath '/home/codegyani/hive/emp_details1' into table demo.employee;
• If we try to load unmatched data (i.e., one or more column values don't match the data type of the specified table columns), it will not throw any exception. However, it stores NULL values in the positions of the unmatched columns.
Hive - Drop Table
• First, select the database from which we want to delete the table by using the following command:
hive> use demo;
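• The drop command itself is not shown above; a minimal sketch (using the new_employee table referenced below) would be:
hive> drop table if exists new_employee;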
– The table new_employee is no longer present in the list of tables. Hence, the table is dropped successfully.
Hive - Alter Table
• ALTER TABLE is used to perform modifications in an existing table, like changing the table name, column names, comments, and table properties.
• Rename a Table: to change the name of an existing table.
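• A minimal sketch (the old table name employee is assumed here; the renamed table employee_data is the one used in the following examples):
hive> alter table employee rename to employee_data;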
• hive> alter table employee_data add columns (age int);
• Since we didn't add any data to the new column, Hive considers NULL as its value.
Change Column
• To delete one or more columns, replace the existing columns with the new set of columns.
• To drop a column from the table:
hive> alter table employee_data replace columns (id string, first_name string, age int);
Partitioning in Hive
• Dividing the table into parts based on the values of a particular column like date, course, city or country.
• Advantage: as the data is stored in slices, the query response time becomes faster.
• We perform partitioning in Hive to divide the data among different datasets based on particular columns.
• Types:
– Static partitioning
– Dynamic partitioning
Static Partitioning
• Load the data into the table and pass the values of the partition columns with it by using the following command:
hive> load data local inpath '/home/codegyani/hive/student_details1' into table student
    > partition(course = "java");
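• The partitioned student table would typically be created beforehand; a minimal sketch, assuming it has the same columns as the stud_demo table shown later (minus the partition column):
hive> create table student (id int, name string, age int, institute string)
    > partitioned by (course string)
    > row format delimited
    > fields terminated by ',';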
Dynamic Partitioning
• In dynamic partitioning, the values of the partitioned columns exist within the table, so it is not required to pass the values of the partitioned columns manually.
• First, select the database in which we want to create the table.
– hive> use show;
• Enable dynamic partitioning by using the following commands:
– hive> set hive.exec.dynamic.partition=true;
– hive> set hive.exec.dynamic.partition.mode=nonstrict;
• Create a dummy table to store the data.
– hive> create table stud_demo(id int, name string, age int, institute string, course string)
• Let's retrieve the entire data of the table by using the following command:
– hive> select * from student_part;
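• The steps between creating the dummy table and querying student_part are not shown above; a minimal sketch of what they would typically look like (the file name and the student_part columns are assumed to match the earlier pattern):
hive> load data local inpath '/home/codegyani/hive/student_details' into table stud_demo;
hive> create table student_part (id int, name string, age int, institute string)
    > partitioned by (course string);
hive> insert into table student_part partition(course)
    > select id, name, age, institute, course from stud_demo;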
Static vs Dynamic Partitioning
1. Static partitions are preferred when loading big files into Hive tables. Dynamic partitioning is suitable when you have large data already stored in a table, or when you want to partition on columns whose values are not known in advance.
2. Static: we "statically" add a partition to the table and move the file into that partition. Dynamic: the values of the partitioned columns exist within the table, so it is not required to pass them manually.
3. Static: we can alter a partition. Dynamic: we can't perform alter on a dynamic partition.
4. Static partitioning works in strict mode; dynamic partitioning works in non-strict mode.
5. Static: you should use a where clause to limit the data. Dynamic: no where clause is required to limit it.
Bucketing in Hive
• According to the hash function (with 3 buckets):
  7 % 3 = 1
  4 % 3 = 1
  1 % 3 = 1
  So, these records are stored in bucket 1.
• Let's retrieve the data of bucket 1.
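• The bucketed table behind this example is not shown above; a minimal sketch (the table name emp_bucket and the bucketing column id are assumed for illustration):
hive> set hive.enforce.bucketing = true;
hive> create table emp_bucket (id int, name string, salary float)
    > clustered by (id) into 3 buckets
    > row format delimited
    > fields terminated by ',';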
iii. Here also, bucketed tables offer faster query responses than non-bucketed tables.
iv. This concept offers the flexibility to keep the records in each bucket sorted by one or more columns.
• The HQL GROUP BY clause is used to group the data from multiple records based on one or more columns.
• It is generally used in conjunction with aggregate functions (like SUM, COUNT, MIN, MAX and AVG) to perform an aggregation over each group.
• For example, the sum of employee salaries department-wise can be obtained with the command shown below.
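• A minimal sketch of that GROUP BY command (assuming the same emp table used in the HAVING example that follows):
hive> select department, sum(salary) from emp group by department;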
• The HQL HAVING clause is used with the GROUP BY clause. Its purpose is to apply constraints on the groups of data produced by GROUP BY, so it only returns groups where the condition is TRUE.
• The sum of employees' salaries by department, keeping only departments having sum >= 35000, is obtained by using the following command:
– hive> select department, sum(salary) from emp group by department having sum(salary) >= 35000;
HiveQL - ORDER BY
• An example to arrange the data in sorted order using the ORDER BY clause.
• Fetch the data in descending order of salary by using the following command:
– hive> select * from emp order by salary desc;
HiveQL - SORT BY Clause
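• SORT BY sorts the data within each reducer rather than producing a single totally ordered output like ORDER BY. A minimal sketch on the same emp table:
hive> select * from emp sort by salary desc;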
PIG
• The language used to analyze data in Hadoop using Pig is known as Pig Latin.
• It is a high-level data processing language which provides a rich set of data types and operators to perform various operations on the data.
• To perform a task, programmers need to write a Pig script using the Pig Latin language and execute it using any of the execution mechanisms (Grunt shell, script, or embedded mode).
• Internally, Apache Pig converts these scripts into a series of MapReduce jobs.
The architecture of Apache Pig is shown below.
Parser:
– Initially the Pig scripts are handled by the Parser.
– It checks the syntax of the script, does type checking, and other miscellaneous checks.
– The output of the parser is a DAG (directed acyclic graph), which represents the Pig Latin statements and logical operators.
– In the DAG, the logical operators of the script are represented as the nodes and the data flows are represented as edges.
Optimizer:
– The logical plan (DAG) is passed to the logical optimizer, which carries out logical optimizations such as projection and pushdown.
Compiler:
– The compiler compiles the optimized logical plan into a series of MapReduce jobs.
Execution engine:
– Finally, the MapReduce jobs are submitted to Hadoop in a sorted order and executed to produce the desired results.
• The data model of Pig Latin is fully nested and it allows complex non-atomic
datatypes such as map and tuple.
• Given below is the diagrammatical representation of Pig Latin’s data model.
XP
Atom:
• Any single value in Pig Latin, irrespective of its data type, is known as an Atom.
• We can use it as a string or a number; it is stored as a string.
• Atomic values of Pig are int, long, float, double, chararray, and bytearray.
• A field is a piece of data or a simple atomic value in Pig.
• For example − ‘Shubham’ or ‘30’
Tuple:
• A tuple is a record that is formed by an ordered set of fields.
• The fields can be of any type.
• A tuple is similar to a row in a table of an RDBMS.
For example − (Shubham, 30)
Bag:
• A bag is an unordered set of tuples. It is represented by ‘{}’.
Map:
• A map is a set of key-value pairs. It is represented by ‘[]’.
Relation:
• A relation is a bag of tuples.
• A bag is a collection of tuples.
• A tuple is an ordered set of fields.
• A field is a piece of data.
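• A small illustrative sketch of how these look written out in Pig Latin (the values are assumed for the example):
  field (atom): ‘Shubham’
  tuple: (Shubham, 30)
  bag: {(Shubham, 30), (Ankit, 32)}
  map: ['name'#'Shubham', 'age'#30]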
Statements in Pig Latin
• Chararray: represents a character array (string) in Unicode UTF-8 format, e.g. ‘Class MCA’
• Tuple: an ordered set of fields, for example (Ankit, 32)
Null Values:
– Values for all the above data types can be NULL.
Pig Latin Arithmetic Operators
• * Multiplication − multiplies the values on either side of the operator. E.g. if a = 40 and b = 30, a * b gives 1200.
• / Division − divides the left-hand operand by the right-hand operand. E.g. if a = 40, b = 20, a / b results in 2.
• % Modulus − divides the left-hand operand by the right-hand operand and gives the remainder as the result. E.g. if a = 40, b = 30, a % b results in 10.
• ?: Bincond − evaluates a Boolean expression and has three operands: variable x = (expression) ? value1 if true : value2 if false. E.g. b = (a == 1) ? 40 : 20; if a = 1 the value of b is 40, if a != 1 the value is 20.
• CASE WHEN THEN ELSE END − the Case operator is equivalent to nested bincond operators. E.g. CASE f2 % 2 WHEN 0 THEN 'even' WHEN 1 THEN 'odd' END
Pig Latin Comparison Operators XP
!= Not Equal − Checks the values of two operands are equal or not. If the If a=10, b=20, then
values are equal, then condition becomes false else true. (a != b) is true
Greater than − It checks whether the right operand value is greater If a=10, b=20,
> than that of the right operand. If yes, then the condition becomes true. then(a > b) is not
true.
Less than − This operator checks the value of the left operand is less (a < b) is true, if
< than the right operand. If condition fulfills, then it returns true. a=10, b=20.
Greater than or equal to − It checks the value of the left operand with If a=20, b=50,
>= right hand. It checks whether it is greater or equal to the right operand. true(a >= b) is not
If yes, then it returns true. true.
<= Less than or equal to − The value of the left operand is less than or If a=20, b=20, (a <=
equal to that of the right operand. Then the condition still returns true. b) is true.
matches Pattern matching − This checks the string in the left-hand matches with f1 matches ‘.*df.*’
the constant in the RHS.
Type Construction Operators: () constructs a tuple, {} constructs a bag, and [] constructs a map.
Pig Latin Relational Operations
Loading and Storing:
• LOAD − loads data from a file system into a relation.
• STORE − stores a relation to the file system (local/HDFS).
Filtering:
• FILTER − removes unwanted rows from a relation.
• DISTINCT − removes duplicate rows from a relation.
• FOREACH, GENERATE − transforms the data based on the columns of data.
• STREAM − transforms a relation using an external program.
Grouping and Joining:
• JOIN − joins two or more relations.
• COGROUP − groups the data in two or more relations.
• GROUP − groups the data in a single relation.
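• To see several of these operators together, here is a minimal sketch of a Pig Latin script (the file student_data.txt and its fields are assumed for illustration):
grunt> students = LOAD 'student_data.txt' USING PigStorage(',') AS (id:int, name:chararray, city:chararray);
grunt> grouped = GROUP students BY city;
grunt> counts = FOREACH grouped GENERATE group, COUNT(students);
grunt> DUMP counts;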
Speaking Pig Latin
• FILTER
– Select a subset of the tuples in a bag:
  newBag = FILTER bagName BY expression;
– The expression uses simple comparison operators (==, !=, <, >, …) and logical connectors (AND, NOT, OR):
  some_apples = FILTER apples BY colour != ‘red’;
– Can use UDFs:
  some_apples = FILTER apples BY NOT isRed(colour);
Pig Latin Relational Operations
Sorting:
• ORDER − arranges a relation in an order based on one or more fields.
• LIMIT − gets a particular number of tuples from a relation.
Combining and Splitting:
• UNION − combines two or more relations into one relation.
• SPLIT − splits a single relation into two or more relations.
Diagnostic Operators:
• DUMP − prints the content of a relation on the console.
• DESCRIBE − describes the schema of a relation.
• EXPLAIN − views the logical and physical execution plans to evaluate a relation.
Ways to run Pig
• The mode depends on where the Pig script is going to run and where the data resides. The data can be stored on a single machine, i.e. the local file system, or in a distributed environment like a typical Hadoop cluster.
• The first way is the non-interactive shell, also known as script mode. In this we have to create a file, load the code into the file and execute the script.
• The second is the Grunt shell, an interactive shell for running Apache Pig commands.
• The third is embedded mode; in this we can run Pig programs from Java, much like using JDBC to run SQL programs from Java.
Execution modes
• Local mode: In this mode, Pig runs in a single JVM and makes use of the local file system. This mode is suitable only for the analysis of small datasets.
• MapReduce mode: In this mode, we need a proper Hadoop cluster setup with Hadoop installed on it. By default, Pig runs in MapReduce mode. Pig translates the submitted queries into MapReduce jobs and runs them on top of the Hadoop cluster. We can describe this as MapReduce mode on a fully distributed cluster.
• Pig Latin statements like LOAD and STORE are used to read data from the HDFS file system and to generate output. These statements are used to process data.
• Storing and viewing results:
• By using STORE we can save the final results to the file system, and by using DUMP we can get the final results displayed on the output console.
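• A minimal sketch putting LOAD, STORE and DUMP together (the HDFS paths and file names are assumed for illustration):
grunt> emp = LOAD 'hdfs://localhost:9000/pig_data/emp.txt' USING PigStorage(',') AS (id:int, name:chararray, salary:float);
grunt> STORE emp INTO 'hdfs://localhost:9000/pig_output/emp_out' USING PigStorage(',');
grunt> DUMP emp;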
Eval Functions:
i. AVG(): to compute the average of the numerical values within a bag.
   Syntax: AVG(expression)
ii. BagToString(): It is used to concatenate the elements of a bag into a string. While concatenating, we can place a delimiter between these values.
iv. COUNT(): to count the number of elements (tuples) in a bag.
   Syntax: COUNT(expression)
v. COUNT_STAR(): similar to COUNT(), but it includes null values while counting.
   Syntax: COUNT_STAR(expression)
vii. IsEmpty(): to check whether a bag or map is empty.
   Syntax: IsEmpty(expression)
viii. MAX(): to calculate the highest value for a column (numeric values or chararrays) in a single-column bag.
   Syntax: MAX(expression)
ix. MIN(): to get the minimum (lowest) value (numeric or chararray) for a certain column in a single-column bag.
   Syntax: MIN(expression)
x. PluckTuple(): We can define a string prefix and filter the columns in a relation that begin with the given prefix.
xi. SIZE(): to compute the number of elements based on any Pig data type.
   Syntax: SIZE(expression)
Eval Functions:
xii. SUBTRACT(): to subtract two bags. It takes two bags as inputs and returns a bag which contains the tuples of the first bag that are not in the second bag.
   Syntax: SUBTRACT(bag1, bag2)
xiii. SUM(): to get the total of the numeric values of a column in a single-column bag.
   Syntax: SUM(expression)
xiv. TOKENIZE(): for splitting a string (which contains a group of words) in a single tuple; it returns a bag which contains the output of the split operation.
   Syntax: TOKENIZE(expression)
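• A minimal sketch of TOKENIZE in a word-count style script (the file lines.txt is assumed for illustration):
grunt> lines = LOAD 'lines.txt' AS (line:chararray);
grunt> words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grunt> grouped = GROUP words BY word;
grunt> counts = FOREACH grouped GENERATE group, COUNT(words);
grunt> DUMP counts;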
Load and Store Functions
iii. BinStorage(): for loading and storing data into Pig using a machine-readable format.
   Syntax: BinStorage()
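• The most commonly used load/store function is PigStorage, which reads and writes delimited text; a minimal sketch (file name and schema assumed):
grunt> student = LOAD 'student_data.txt' USING PigStorage(',') AS (id:int, name:chararray);
grunt> STORE student INTO 'student_out' USING PigStorage('|');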
String Functions
i. ENDSWITH(string, testAgainst): this Pig function verifies whether the first string ends with the second string.
LOWER(expression): converts all the characters in a string to lowercase.
UPPER(expression): converts all the characters in a string to uppercase.
xi. STRSPLIT(string, regex, limit): to split a string around matches of a given regular expression.
xii. STRSPLITTOBAG(string, regex, limit): it splits the string by the given delimiter and returns the result in a bag.
xiii. TRIM(expression): it is used to return a copy of a string with leading and trailing whitespace removed.
xiv. LTRIM(expression): returns a copy of a string with leading whitespace removed.
xv. RTRIM(expression): returns a copy of a string with trailing whitespace removed.
Date and Time Functions
i. ToDate(milliseconds): returns a date-time object built from the given parameters. There are more alternatives for this function, such as ToDate(isostring) and ToDate(userstring, format).
iii. GetDay(datetime): to get the day of a month as a return from the date-time object, we use it.
iv. GetHour(datetime): GetHour returns the hour of a day from the date-time object.
v. GetMilliSecond(datetime): returns the millisecond of a second from the date-time object.
vi. GetMinute(datetime): to get the minute of an hour in return from the date-time object, we use it.
Date and Time Functions
vii. GetMonth(datetime): GetMonth returns the month of a year from the date-time object.
viii. GetSecond(datetime): returns the second of a minute from the date-time object.
ix. GetWeek(datetime): to get the week of a year as a return from the date-time object, we use it.
x. GetWeekYear(datetime): returns the week year from the date-time object.
xi. GetYear(datetime): returns the year from the date-time object.
xii. AddDuration(datetime, duration): to get the result of a date-time object along with the duration object, we use it.
xiii. SubtractDuration(datetime, duration): SubtractDuration subtracts the duration object from the date-time object and returns the result.
Date and Time Functions
• DaysBetween(datetime1, datetime2): returns the number of days between the two date-time objects.
• MilliSecondsBetween(datetime1, datetime2): to get the number of milliseconds between two date-time objects, we use it.
• MonthsBetween(datetime1, datetime2): to get the number of months between two date-time objects, we use it.
• YearsBetween(datetime1, datetime2): to get the number of years between two date-time objects, we use it.
Math Functions
i. ABS: to get the absolute value of an expression. Syntax: ABS(expression)
ii. ACOS: to get the arc cosine of an expression. Syntax: ACOS(expression)
iii. ASIN: to get the arc sine of an expression. Syntax: ASIN(expression)
iv. ATAN: to get the arc tangent of an expression. Syntax: ATAN(expression)
v. CBRT: to get the cube root of an expression. Syntax: CBRT(expression)
vi. CEIL: to get the value of an expression rounded up to the nearest integer. Syntax: CEIL(expression)
vii. COS: to get the trigonometric cosine of an expression. Syntax: COS(expression)
viii. COSH: to get the hyperbolic cosine of an expression. Syntax: COSH(expression)
xiv. ROUND: gives the value of an expression rounded to an integer (if the result type is float) or rounded to a long (if the result type is double). Syntax: ROUND(expression)
xv. SIN: to get the sine of an expression. Syntax: SIN(expression)
xvi. SINH: to get the hyperbolic sine of an expression. Syntax: SINH(expression)
xvii. SQRT: to get the positive square root of an expression. Syntax: SQRT(expression)
xviii. TAN: to get the trigonometric tangent of an expression. Syntax: TAN(expression)
xix. TANH: to get the hyperbolic tangent of an expression. Syntax: TANH(expression)
Filtering data
• The FILTER operator is used to select the required tuples from a relation based on a condition.
• Syntax:
  grunt> Relation2_name = FILTER Relation1_name BY (condition);
• The GROUP operator is used to group the data in one or more relations. It collects the data having the same key.
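• A minimal sketch of GROUP (reusing the assumed students relation from the earlier example):
grunt> group_by_city = GROUP students BY city;
grunt> DUMP group_by_city;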
• sh Command: By using the sh command, we can invoke shell commands from the Grunt shell.
  e.g. grunt> sh ls
• fs Command: By using the fs command, we can invoke HDFS commands such as ls from the Grunt shell. Here, it lists the files in the HDFS root directory.
  e.g. grunt> fs -ls
• debug: By passing on/off to this key, we can turn the debugging feature in Pig on or off.
• job.name: By passing a string value to this key, we can set the job name for the required job.
• job.priority: By passing one of the following values to this key, we can set the job priority of a job − very_low, low, normal, high, very_high.
• stream.skippath: By passing the desired path in the form of a string to this key, we can set the path from where the data is not to be transferred, for streaming.
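• A minimal sketch of how these keys are set from the Grunt shell (the values shown are illustrative):
grunt> set debug on
grunt> set job.name 'unit4-demo-job'
grunt> set job.priority high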