Unit-VI HIVE: Hadoop & Big Data
Applying Structure to Hadoop Data with Hive: Saying Hello to Hive, Seeing How the Hive is Put Together,
Getting Started with Apache Hive, Examining the Hive Clients, Working with Hive Data Types, Creating and
Managing Databases and Tables, Seeing How the Hive Data Manipulation Language Works, Querying and
Analyzing Data.
HIVE Introduction
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It
resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
The term ‘Big Data’ is used for collections of large datasets characterized by huge volume, high velocity, and a wide variety of data that grows day by day. Such data is difficult to process using traditional data management systems. Therefore, the Apache Software Foundation introduced a framework called Hadoop to solve Big Data management and processing challenges.
Hadoop
Hadoop is an open-source framework for storing and processing Big Data in a distributed environment. It contains two core modules: MapReduce and the Hadoop Distributed File System (HDFS).
MapReduce: It is a parallel programming model for processing large amounts of structured, semi-
structured, and unstructured data on large clusters of commodity hardware.
HDFS: The Hadoop Distributed File System is the storage layer of the Hadoop framework, used to store the datasets. It provides a fault-tolerant file system that runs on commodity hardware.
The Hadoop ecosystem contains different sub-projects (tools) such as Sqoop, Pig, and Hive that are used to support the Hadoop modules.
Sqoop: It is used to import and export data between HDFS and RDBMS.
Pig: It is a procedural language platform used to develop scripts for MapReduce operations.
Hive: It is a platform used to develop SQL-type scripts to do MapReduce operations.
Initially Hive was developed by Facebook, later the Apache Software Foundation took it
up and developed it further as an open source under the name Apache Hive. It is used by
different companies. For example, Amazon uses it in Amazon Elastic MapReduce.
Hive is not
A relational database
A design for OnLine Transaction Processing (OLTP)
A language for real-time queries and row-level updates
Features of Hive
It stores schema in a database and processed data into HDFS.
It is designed for OLAP.
It provides an SQL-type language for querying, called HiveQL or HQL.
It is familiar, fast, scalable, and extensible.
Architecture of Hive
The main components of Hive are the user interfaces (such as the Hive Web UI and the command line), the metastore, the HiveQL process engine, the execution engine, and the underlying storage (HDFS or HBase). The following component diagram depicts the architecture of Hive:
Working of Hive
The following diagram depicts the workflow between Hive and Hadoop.
The following steps describe how Hive interacts with the Hadoop framework:

1. Execute Query: The Hive interface, such as the command line or Web UI, sends the query to the driver (any database driver such as JDBC or ODBC) for execution.
2. Get Plan: The driver takes the help of the query compiler, which parses the query to check the syntax and build the query plan.
3. Get Metadata: The compiler sends a metadata request to the metastore (any database).
4. Send Metadata: The metastore sends the metadata as a response to the compiler.
5. Send Plan: The compiler checks the requirement and resends the plan to the driver. Up to here, the parsing and compiling of the query is complete.
6. Execute Plan: The driver sends the execution plan to the execution engine.
7. Execute Job: Internally, the execution of the plan is a MapReduce job. The execution engine sends the job to the JobTracker, which resides in the Name node, and the JobTracker assigns the job to the TaskTracker, which resides in the Data node. Here, the query runs as a MapReduce job.
7.1 Metadata Ops: Meanwhile, during execution, the execution engine can perform metadata operations with the metastore.
8. Fetch Result: The execution engine receives the results from the Data nodes.
9. Send Results: The execution engine sends those result values to the driver.
10. Send Results: The driver sends the results to the Hive interfaces.
Hive - Data Types
All the data types in Hive are classified into four types, given as follows:
Column Types
Literals
Null Values
Complex Types
Column Types
Column types are used as the column data types of Hive. They are as follows:
Integral Types
Integer-type data can be specified using the integral data types; INT is the default. When the data range exceeds the range of INT, you need to use BIGINT, and if the data range is smaller than that of INT, you use SMALLINT. TINYINT is smaller than SMALLINT.

Type       Postfix   Example
TINYINT    Y         10Y
SMALLINT   S         10S
INT        -         10
BIGINT     L         10L
String Types
String-type data can be specified using single quotes (' ') or double quotes (" "). Hive contains two string data types: VARCHAR and CHAR. Hive follows C-style escape characters.

Data Type   Length
VARCHAR     1 to 65535
CHAR        255
Timestamp
It supports traditional UNIX timestamp with optional nanosecond precision. It supports
java.sql.Timestamp format “YYYY-MM-DD HH:MM:SS.fffffffff” and format “yyyy-mm-dd
hh:mm:ss.ffffffffff”.
Dates
DATE values are described in year/month/day format in the form YYYY-MM-DD.
Decimals
The DECIMAL type in Hive is the same as the BigDecimal format of Java. It is used for representing immutable arbitrary-precision values. The syntax and an example are as follows:
DECIMAL(precision, scale)
decimal(10,0)
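As a quick illustration of these column types in use, the following sketch creates a table; the table name and columns here are purely illustrative:
hive> CREATE TABLE sales_sketch (
        item_id    BIGINT,          -- integral type
        item_name  VARCHAR(100),    -- string type
        quantity   SMALLINT,
        sold_at    TIMESTAMP,
        sale_date  DATE,
        price      DECIMAL(10,2)    -- precision 10, scale 2
      );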
Union Types
A union is a collection of heterogeneous data types. You can create an instance using create_union. The syntax and an example are as follows:
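A minimal sketch of a UNIONTYPE column declaration (the table and column names are illustrative):
hive> CREATE TABLE union_test (
        foo UNIONTYPE<int, double, array<string>, struct<a:int, b:string>>
      );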
Literals
The following literals are used in Hive:
Decimal Type
Decimal-type data is nothing but a floating-point value with a higher range than the DOUBLE data type. The range of the decimal type is approximately -10^308 to 10^308.
Null Value
Missing values are represented by the special value NULL.
Complex Types
The Hive complex data types are as follows:
Arrays
Arrays in Hive are used the same way they are used in Java.
Syntax: ARRAY<data_type>
Maps
Maps in Hive are similar to Java maps.
Syntax: MAP<primitive_type, data_type>
Structs
Structs in Hive are similar to using complex data with comments.
Syntax: STRUCT<col_name : data_type [COMMENT col_comment], ...>
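The following sketch shows these complex types in a table definition; the table name employee_contacts and its columns are illustrative:
hive> CREATE TABLE employee_contacts (
        name          STRING,
        phone_numbers ARRAY<STRING>,                -- list of phone numbers
        dept_salary   MAP<STRING, FLOAT>,           -- department name -> salary
        address       STRUCT<city:STRING, zip:INT>  -- nested address record
      );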
Hive - Create Database
The syntax for this statement is as follows:
CREATE DATABASE|SCHEMA [IF NOT EXISTS] <database name>
Here, IF NOT EXISTS is an optional clause, which suppresses the error if a database with the same name already exists. We can use SCHEMA in place of DATABASE in this command. The following query is executed to create a database named userdb:
hive> CREATE DATABASE IF NOT EXISTS userdb;
or
hive> CREATE SCHEMA userdb;
Hive - Drop Database
The following query is used to drop a database. Let us assume that the database name is userdb.
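hive> DROP DATABASE IF EXISTS userdb;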
The following query drops the database using CASCADE, which means dropping the respective tables before dropping the database:
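hive> DROP DATABASE IF EXISTS userdb CASCADE;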
Hive - Create Table
Syntax
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]
Example
Let us assume you need to create a table named employee using the CREATE TABLE statement. The following table lists the fields and their data types in the employee table:
Sr.No   Field Name    Data Type
1       Eid           int
2       Name          String
3       Salary        Float
4       Designation   String
The following query creates a table named employee using the above data.
hive> CREATE TABLE IF NOT EXISTS employee (eid int, name String, salary Float, designation String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
If you add the option IF NOT EXISTS, Hive ignores the statement in case the table already exists. On successful creation of the table, you get to see the following response:
OK
Time taken: 5.905 seconds
hive>
Hive - Load Data
While inserting data into Hive, it is better to use LOAD DATA to store bulk records. There are two ways to load data: one is from the local file system and the other is from the Hadoop file system.
Syntax
The syntax for LOAD DATA is as follows:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
[PARTITION (partcol1=val1, partcol2=val2 ...)]
Here, LOCAL is an optional identifier to specify the local path, and OVERWRITE optionally overwrites the existing data in the table. The following query loads the given text into the table.
hive> LOAD DATA LOCAL INPATH
'/home/user/sample.txt' OVERWRITE INTO TABLE employee;
OK
Time taken: 15.905 seconds
hive>
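Loading from the Hadoop file system looks the same, just without the LOCAL keyword; the HDFS path below is illustrative:
hive> LOAD DATA INPATH '/user/hive/sample.txt' OVERWRITE INTO TABLE employee;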
Hive - Alter Table
Syntax
The statement takes any of the following syntaxes based on what attributes we wish to modify in a table.
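ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])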
Change Statement
The following table contains the fields of the employee table and shows the fields to be changed (in bold).
The following queries rename a column and change a column's data type using the above data:
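For instance, queries of the following form rename the name column to ename and change the type of the salary column to Double (these particular columns are illustrative, since the table of changed fields is not reproduced here):
hive> ALTER TABLE employee CHANGE name ename String;
hive> ALTER TABLE employee CHANGE salary salary Double;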
Replace Statement
The following query deletes all the columns from the employee table and replaces them with empid and name columns:
hive> ALTER TABLE employee REPLACE COLUMNS (empid Int, name String);
Hive - Drop Table
The following query drops a table named employee:
hive> DROP TABLE IF EXISTS employee;
OK
Time taken: 5.3 seconds
hive>
Operators in HIVE:
Relational Operators
Arithmetic Operators
Logical Operators
Complex Operators
Relational Operators: These operators are used to compare two operands. The following table describes the relational operators available in Hive:
Operator    Operand types   Description
A IS NULL   all types       TRUE if expression A evaluates to NULL, otherwise FALSE.
Example
Let us assume the employee table is composed of fields named Id, Name, Salary,
Designation, and Dept as shown below. Generate a query to retrieve the employee details
whose Id is 1205.
+------+--------------+--------+--------------------+--------+
| Id   | Name         | Salary | Designation        | Dept   |
+------+--------------+--------+--------------------+--------+
| 1201 | Gopal        | 45000  | Technical manager  | TP     |
| 1202 | Manisha      | 45000  | Proofreader        | PR     |
| 1203 | Masthanvali  | 40000  | Technical writer   | TP     |
| 1204 | Krian        | 40000  | Hr Admin           | HR     |
| 1205 | Kranthi      | 30000  | Op Admin           | Admin  |
+------+--------------+--------+--------------------+--------+
The following query is executed to retrieve the employee details using the above table:
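hive> SELECT * FROM employee WHERE Id=1205;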
+------+--------------+--------+--------------------+--------+
| ID   | Name         | Salary | Designation        | Dept   |
+------+--------------+--------+--------------------+--------+
| 1205 | Kranthi      | 30000  | Op Admin           | Admin  |
+------+--------------+--------+--------------------+--------+
The following query is executed to retrieve the employee details whose salary is
more than or equal to Rs 40000.
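hive> SELECT * FROM employee WHERE Salary>=40000;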
+------+--------------+--------+--------------------+--------+
| ID   | Name         | Salary | Designation        | Dept   |
+------+--------------+--------+--------------------+--------+
| 1201 | Gopal        | 45000  | Technical manager  | TP     |
| 1202 | Manisha      | 45000  | Proofreader        | PR     |
| 1203 | Masthanvali  | 40000  | Technical writer   | TP     |
| 1204 | Krian        | 40000  | Hr Admin           | HR     |
+------+--------------+--------+--------------------+--------+
Arithmetic Operators
These operators support various common arithmetic operations on the operands. All of them return number types. The following table describes the arithmetic operators available in Hive:

Operator   Operand            Description
A % B      all number types   Gives the remainder resulting from dividing A by B.
A & B      all number types   Gives the result of bitwise AND of A and B.
A ^ B      all number types   Gives the result of bitwise XOR of A and B.
Example
The following query adds two numbers, 20 and 30.
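For instance, on a reasonably recent Hive release (which allows SELECT without a FROM clause), the sum can be computed directly; the column alias sum_value is illustrative:
hive> SELECT 20 + 30 AS sum_value;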
Logical Operators
These operators are logical expressions. All of them return either TRUE or FALSE.

Operator   Operand   Description
A || B     boolean   Same as A OR B.
Example
The following query is used to retrieve employee details whose Department is TP and
Salary is more than Rs 40000.
hive> SELECT * FROM employee WHERE Salary>40000 && Dept='TP';
Complex Operators
These operators provide an expression to access the elements of complex types.

Operator   Operand                                Description
A[n]       A is an Array and n is an int          Returns the nth element in the array A. The first element has index 0.
M[key]     M is a Map<K, V> and key has type K    Returns the value corresponding to the key in the map.
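A short illustration, using the hypothetical employee_contacts table sketched earlier:
hive> SELECT name, phone_numbers[0], dept_salary['HR'], address.city
      FROM employee_contacts;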
HiveQL - Select-Where
The Hive Query Language (HiveQL) is a query language for Hive to process and analyze structured data in a Metastore. This chapter explains how to use the SELECT statement with the WHERE clause.
The SELECT statement is used to retrieve data from a table. The WHERE clause works like a condition: it filters the data using the condition and gives you a finite result. The built-in operators and functions generate an expression which fulfils the condition.
Syntax
Given below is the syntax of the SELECT query:
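SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[LIMIT number];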
Example
Let us take an example for SELECT…WHERE clause. Assume we have the employee table
as given below, with fields named Id, Name, Salary, Designation, and Dept. Generate
a query to retrieve the employee details who earn a salary of more than Rs 30000.
+------+--------------+--------+--------------------+--------+
| ID   | Name         | Salary | Designation        | Dept   |
+------+--------------+--------+--------------------+--------+
| 1201 | Gopal        | 45000  | Technical manager  | TP     |
| 1202 | Manisha      | 45000  | Proofreader        | PR     |
| 1203 | Masthanvali  | 40000  | Technical writer   | TP     |
| 1204 | Krian        | 40000  | Hr Admin           | HR     |
| 1205 | Kranthi      | 30000  | Op Admin           | Admin  |
+------+--------------+--------+--------------------+--------+
The following query retrieves the employee details using the above scenario:
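hive> SELECT * FROM employee WHERE Salary>30000;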
On successful execution of the query, you get to see the following response:
+------+--------------+-------------+-------------------+--------+
| ID | Name | Salary | Designation | Dept |
+------+--------------+-------------+-------------------+--------+
|1201 | Gopal | 45000 | Technical manager | TP |
|1202 | Manisha | 45000 | Proofreader | PR |
|1203 | Masthanvali | 40000 | Technical writer | TP |
|1204 | Krian | 40000 | Hr Admin | HR |
+------+--------------+-------------+-------------------+--------+
HiveQL - Select-Order By
This chapter explains how to use the ORDER BY clause in a SELECT statement.
The ORDER BY clause is used to retrieve the details based on one column and sort
the result set by ascending or descending order.
Syntax
Given below is the syntax of the ORDER BY clause:
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[LIMIT number];
Example
Let us take an example for the SELECT...ORDER BY clause. Assume the employee table as given below, with the fields named Id, Name, Salary, Designation, and Dept. Generate a query to retrieve the employee details ordered by department name.
+------+--------------+-------------+-------------------+--------+
| ID | Name | Salary | Designation | Dept |
+------+--------------+-------------+-------------------+--------+
|1201 | Gopal | 45000 | Technical manager | TP |
|1202 | Manisha | 45000 | Proofreader | PR |
|1203 | Masthanvali | 40000 | Technical writer | TP |
|1204 | Krian | 40000 | Hr Admin | HR |
|1205 | Kranthi | 30000 | Op Admin | Admin |
+------+--------------+-------------+-------------------+--------+
The following query retrieves the employee details using the above
scenario:
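hive> SELECT * FROM employee ORDER BY Dept;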
HiveQL - Select-Group By
This chapter explains the details of GROUP BY clause in a SELECT statement.
The GROUP BY clause is used to group all the records in a result set using a
particular collection column. It is used to query a group of records.
The syntax of the GROUP BY clause is as follows:
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[LIMIT number];
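For example, the following query counts the employees in each department of the employee table used above:
hive> SELECT Dept, count(*) FROM employee GROUP BY Dept;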
HiveQL - Select-Joins
JOIN is a clause that is used for combining specific fields from two tables by using values
common to each one. It is used to combine records from two or more tables in the
database. It is more or less similar to SQL JOIN.
Syntax
join_table:
    table_reference JOIN table_factor [join_condition]
  | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference join_condition
  | table_reference LEFT SEMI JOIN table_reference join_condition
  | table_reference CROSS JOIN table_reference [join_condition]
Example
We will use the following two tables in this chapter. Consider the following table named CUSTOMERS:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
And consider another table named ORDERS:
+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3           | 3000   |
| 100 | 2009-10-08 00:00:00 | 3           | 1500   |
| 101 | 2009-11-20 00:00:00 | 2           | 1560   |
| 103 | 2008-05-20 00:00:00 | 4           | 2060   |
+-----+---------------------+-------------+--------+
JOIN
The JOIN clause is used to combine and retrieve the records from multiple tables. JOIN is the same as INNER JOIN in SQL. A JOIN condition is raised using the primary keys and foreign keys of the tables.
The following query executes JOIN on the CUSTOMERS and ORDERS tables, and retrieves the records:
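A representative query (the table aliases c and o and the selected columns are illustrative):
hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
      FROM CUSTOMERS c JOIN ORDERS o
      ON (c.ID = o.CUSTOMER_ID);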
A LEFT OUTER JOIN returns all the rows from the left table, plus the matched values from the right table, or NULL in case of no matching join predicate. The following query demonstrates LEFT OUTER JOIN between the CUSTOMERS and ORDERS tables:
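Again, the aliases and selected columns here are illustrative:
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
      FROM CUSTOMERS c LEFT OUTER JOIN ORDERS o
      ON (c.ID = o.CUSTOMER_ID);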
A RIGHT JOIN returns all the values from the right table, plus the matched values from
the left table, or NULL in case of no matching join predicate.
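A representative RIGHT OUTER JOIN query (aliases and selected columns are illustrative):
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
      FROM CUSTOMERS c RIGHT OUTER JOIN ORDERS o
      ON (c.ID = o.CUSTOMER_ID);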
+------+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+------+----------+--------+---------------------+
| 3 | kaushik | 3000 | 2009-10-08 |
| 3 | kaushik | 1500 | 2009-10-08 |
| 2 | Khilan | 1560 | 2009-11-20 |
| 4 | Chaitali | 2060 | 2008-05-20 |
+------+----------+--------+---------------------+
+------+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+------+----------+--------+---------------------+
| 1 | Ramesh | NULL | NULL |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
| 5 | Hardik | NULL | NULL |
| 6 | Komal | NULL | NULL |
| 7 | Muffy | NULL | NULL |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+
IMP Questions