Module 3-1

Uploaded by

madhavan090603

SRI KRISHNA COLLEGE OF ENGINEERING AND TECHNOLOGY

Kuniamuthur, Coimbatore, Tamilnadu, India

An Autonomous Institution, Affiliated to Anna University,
Accredited by NAAC with “A” Grade & Accredited by NBA (CSE, ECE, IT, MECH, EEE, CIVIL & MCT)

Topics: Introduction to Hive

What is Hive?

Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It sits on top of Hadoop to summarize Big Data and makes querying and analysis easy.
Hive was initially developed by Facebook; the Apache Software Foundation later took it up and developed it further as open source under the name Apache Hive.
Hive
• An SQL-like interface to Hadoop.
• Data warehouse infrastructure built on top of Hadoop.
• Provides data summarization, query, and analysis.
• Queries are executed via MapReduce.
• The Hive interpreter converts each query into MapReduce jobs.
• Open source project.
• Developed by Facebook.
• Also used by Netflix, CNET, Digg, eHarmony, etc.
Hive
► HiveQL example:

SELECT customerId, max(total_cost) FROM hive_purchases
GROUP BY customerId HAVING count(*) > 3;
Hive Key Principles
Features of Hive

It stores the schema in a database and the processed data in HDFS (Hadoop Distributed File System).
It is designed for OLAP.
It provides an SQL-like query language called HiveQL or HQL.
It is familiar, fast, scalable, and extensible.

Architecture of Hive

User Interface - Hive is data warehouse infrastructure software that mediates between the user and HDFS. The user interfaces that Hive supports are Hive Web UI, Hive command line, and Hive HD.
Meta Store - Hive uses a database server to store the schema (metadata) of tables and databases, the columns in each table, their data types, and the HDFS mapping.
HiveQL Process Engine - HiveQL is an SQL-like language for querying against the schema information in the Metastore. It replaces the traditional approach of writing a MapReduce program: instead of writing a MapReduce program in Java, we can write a query for the MapReduce job and process it.
Execution Engine - The Hive execution engine connects the HiveQL process engine and MapReduce. It processes the query and generates the same results a MapReduce job would, using MapReduce underneath.
HDFS or HBase - The Hadoop Distributed File System or HBase is the storage layer that holds the data.
Working of Hive

Execute Query - A Hive interface such as the command line or Web UI sends the query to the driver to execute.
Get Plan - The driver asks the query compiler to parse the query, checking its syntax and working out the query plan (the requirements of the query).
Get Metadata - The compiler sends a metadata request to the Metastore.
Send Metadata - The Metastore sends the metadata back to the compiler.
Send Plan - The compiler checks the requirements and sends the plan back to the driver. At this point, parsing and compiling of the query are complete.
Execute Plan - The driver sends the execution plan to the execution engine.
Execute Job - Internally, the job is executed as a MapReduce job. The execution engine sends the job to the JobTracker, which runs on the Name node, and the JobTracker assigns it to TaskTrackers, which run on the Data nodes. Here the query runs as a MapReduce job.
Metadata Ops - During execution, the execution engine can also perform metadata operations with the Metastore.
Fetch Result - The execution engine receives the results from the Data nodes.
Send Results - The execution engine sends those results to the driver.
Send Results - The driver sends the results to the Hive interfaces.
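The compile steps above (Get Plan through Send Plan) can be inspected without running a job. A minimal sketch, assuming the employee table defined in the Create Table slide of these notes already exists: HiveQL's EXPLAIN statement prints the plan that the compiler hands back to the driver.

```sql
-- Print the stage plan for this query instead of executing it.
-- Assumes the employee table (with a destination column) exists.
EXPLAIN
SELECT destination, count(*)
FROM employee
GROUP BY destination;
```

The output lists the stages the execution engine would run; the exact layout depends on the Hive version and the configured execution engine.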
Hive - Data Types
• All the data types in Hive are classified into four types:
• Column Types
• Literals
• Null Values
• Complex Types
Column Types

Integral Types - Integer data can be specified using the integral data type INT. When the values exceed the range of INT, use BIGINT; when they fit in a smaller range than INT, use SMALLINT. TINYINT is smaller than SMALLINT.

String Types - String values can be written with single quotes (' ') or double quotes (" "). The string types include VARCHAR and CHAR. Hive follows C-style escape characters.
Timestamp - Supports the traditional UNIX timestamp with optional nanosecond precision. It supports the java.sql.Timestamp format “YYYY-MM-DD HH:MM:SS.fffffffff” and the format “yyyy-mm-dd hh:mm:ss.fffffffff”.
Dates - DATE values are described in year/month/day format, in the form YYYY-MM-DD.
Decimals - The DECIMAL type in Hive is the same as Java's BigDecimal format. It represents immutable arbitrary-precision decimal numbers.
Union Types - A union is a collection of heterogeneous data types. You can create an instance using create_union.
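As an illustration of the column types above, here is a hedged sketch of a table declaration. The table name type_demo and all column names are hypothetical, and VARCHAR, CHAR, DATE, and DECIMAL(precision, scale) assume a reasonably recent Hive release.

```sql
-- Hypothetical table using one column of each kind described above.
CREATE TABLE IF NOT EXISTS type_demo (
  tiny_col   TINYINT,        -- 1-byte integer
  small_col  SMALLINT,       -- 2-byte integer
  int_col    INT,            -- 4-byte integer
  big_col    BIGINT,         -- 8-byte integer
  name_col   VARCHAR(50),    -- variable-length string, up to 50 characters
  code_col   CHAR(2),        -- fixed-length string
  created_at TIMESTAMP,      -- e.g. '2013-06-01 10:15:30.123456789'
  birth_date DATE,           -- e.g. '2013-06-01'
  price      DECIMAL(10,2),  -- 10 digits in total, 2 after the decimal point
  misc       UNIONTYPE<INT, STRING>  -- holds either an INT or a STRING
);
```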
Literals
•Floating Point Types - Floating point types are numbers with decimal points. Generally, this type of data is represented by the DOUBLE data type.

•Decimal Type - Decimal type data is a floating point value with a higher range than the DOUBLE data type. The range of the decimal type is approximately -10^-308 to 10^308.
Complex Types

Arrays - Arrays in Hive are used the same way they are used in Java.
Syntax: ARRAY<data_type>
Maps - Maps in Hive are similar to Java Maps.
Syntax: MAP<primitive_type, data_type>
Structs - Structs in Hive group named fields of different types, each with an optional comment.
Syntax: STRUCT<col_name : data_type [COMMENT col_comment], ...>
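The three complex types can be sketched together as follows. The table employee_profile and its columns are hypothetical names chosen for illustration; the access syntax (index, key lookup, dot notation) is standard HiveQL.

```sql
-- Hypothetical table with one column of each complex type.
CREATE TABLE IF NOT EXISTS employee_profile (
  name    STRING,
  skills  ARRAY<STRING>,                 -- ordered list of values
  phones  MAP<STRING, STRING>,           -- key/value pairs, e.g. 'home' -> number
  address STRUCT<city:STRING, pin:INT>   -- named fields
);

-- Array elements are read by index, map values by key, struct fields by name.
SELECT name,
       skills[0]      AS first_skill,
       phones['home'] AS home_phone,
       address.city   AS city
FROM employee_profile;
```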
Create Database

hive> CREATE DATABASE [IF NOT EXISTS] userdb;
hive> CREATE SCHEMA userdb;
hive> SHOW DATABASES;
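In the syntax above, the square brackets mark an optional clause and are not typed. A minimal sketch of what would actually be entered, using the userdb name from the slide:

```sql
-- Create the database only if it is not already there, then switch to it.
CREATE DATABASE IF NOT EXISTS userdb;
USE userdb;

-- Confirm that userdb now appears in the list.
SHOW DATABASES;
```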
Drop Database

hive> DROP DATABASE [IF EXISTS] userdb;
hive> DROP DATABASE [IF EXISTS] userdb CASCADE;
hive> DROP SCHEMA userdb;
Create Table
hive> CREATE TABLE IF NOT EXISTS employee (eid int, name String, salary String, destination String)
    > COMMENT 'Employee details'
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > LINES TERMINATED BY '\n'
    > STORED AS TEXTFILE;
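Once the table exists, a tab-delimited text file matching the row format above can be loaded into it. This is only a sketch: the path /home/user/sample.txt is a hypothetical local file, not something given in these notes.

```sql
-- Load a local tab-delimited file into the employee table,
-- replacing any data already in it. The path is a placeholder.
LOAD DATA LOCAL INPATH '/home/user/sample.txt'
OVERWRITE INTO TABLE employee;

-- Look at a few rows to check that the columns were split as expected.
SELECT * FROM employee LIMIT 5;
```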
Partition
•Hive organizes tables into partitions. Partitioning divides a table into related parts based on the values of partition columns such as date, city, and department. With partitions, it is easy to query only a portion of the data.

•Adding partition - Syntax - hive> ALTER TABLE employee ADD PARTITION (year='2013') LOCATION '/2012/part2012';

•Dropping partition - Syntax - hive> ALTER TABLE employee DROP [IF EXISTS] PARTITION (year='2013');

Example employee data with a year column:
id, name, dept, year
1, Mark, TP, 2012
2, Bob, HR, 2012
3, Sam, SC, 2013
4, Adam, SC, 2013
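ALTER TABLE ... ADD PARTITION only works on a table that was declared with PARTITIONED BY. The sketch below assumes a hypothetical table name, employee_by_year, laid out like the sample rows above; note that the partition column is declared separately from the regular columns.

```sql
-- Hypothetical partitioned table; year is the partition column.
CREATE TABLE IF NOT EXISTS employee_by_year (
  id   INT,
  name STRING,
  dept STRING
)
PARTITIONED BY (year STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Register one partition per year.
ALTER TABLE employee_by_year ADD IF NOT EXISTS PARTITION (year = '2012');
ALTER TABLE employee_by_year ADD IF NOT EXISTS PARTITION (year = '2013');

-- Filtering on the partition column lets Hive read only that partition.
SELECT id, name, dept
FROM employee_by_year
WHERE year = '2013';
```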
HiveQL - Select Where
• The Hive Query Language (HiveQL) is the query language Hive uses to process and analyze structured data whose schema is held in the Metastore.

• hive> SELECT * FROM employee WHERE salary>30000;
HiveQL - Select Order By

• The ORDER BY clause retrieves the details based on one column and sorts the result set in ascending or descending order.

• hive> SELECT Id, Name, Dept FROM employee ORDER BY DEPT;
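The sort direction can be stated per column. A small sketch, assuming the same employee table with Id, Name, and Dept columns as in the query above:

```sql
-- Departments in descending order; within a department, names ascending.
SELECT Id, Name, Dept
FROM employee
ORDER BY Dept DESC, Name ASC;
```

When no direction is given, ORDER BY sorts in ascending order.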
HiveQL - Select-Group By

• The GROUP BY clause groups the records in a result set by a particular column, so that a query can return one row per group, for example a count.

• hive> SELECT Dept, count(*) FROM employee GROUP BY DEPT;
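Groups can be filtered after aggregation with HAVING, as in the HiveQL example at the start of these notes. A sketch, assuming the employee table holds the four sample rows listed in the Partition section:

```sql
-- Count employees per department and keep only departments
-- with more than one employee.
SELECT Dept, count(*) AS headcount
FROM employee
GROUP BY Dept
HAVING count(*) > 1;
```

With those four rows, TP and HR each have one employee and SC has two, so only the SC group passes the HAVING filter.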
HiveQL - Select-Joins
•JOIN is a clause that combines specific fields from two tables using values common to both. It is used to combine records from two or more tables in the database and is more or less similar to SQL JOIN.
•There are different types of joins:
•JOIN
•LEFT OUTER JOIN
•RIGHT OUTER JOIN
•FULL OUTER JOIN
JOIN
The JOIN clause is used to combine and retrieve records from multiple tables. A plain JOIN is the same as INNER JOIN in SQL: only rows that satisfy the join condition are returned. The JOIN condition is specified using the primary keys and foreign keys of the tables.
hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
Left Outer Join

The HiveQL LEFT OUTER JOIN returns all the rows from the left table, even if there are no matches in the right table. If the ON clause matches 0 (zero) records in the right table, the JOIN still returns a row in the result, but with NULL in each column from the right table.

hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c LEFT OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
Right Outer Join
•The HiveQL RIGHT OUTER JOIN returns all the rows from the right table, even if there are no matches in the left table. If the ON clause matches 0 (zero) records in the left table, the JOIN still returns a row in the result, but with NULL in each column from the left table.

•hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c RIGHT OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
Full Outer Join
The HiveQL FULL OUTER JOIN combines the results of the left and right outer joins. The joined table contains all the records from both tables, with NULL filled in for missing matches on either side.
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c FULL OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
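To compare the four join types side by side, the sketch below runs them against a small set of hypothetical rows. The sample data is invented for illustration and is not part of these notes; the OID column is likewise an assumption.

```sql
-- Hypothetical sample rows:
--   CUSTOMERS(ID, NAME):              (1,'Ramesh'), (2,'Khilan'), (3,'Kaushik')
--   ORDERS(OID, CUSTOMER_ID, AMOUNT): (101,2,1500), (102,3,3000), (103,5,2000)

-- JOIN (inner): only matching pairs, i.e. customers 2 and 3 (2 rows).
SELECT c.ID, c.NAME, o.AMOUNT
FROM CUSTOMERS c JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);

-- LEFT OUTER JOIN: the 2 matches plus customer 1 with NULL AMOUNT (3 rows).
SELECT c.ID, c.NAME, o.AMOUNT
FROM CUSTOMERS c LEFT OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);

-- RIGHT OUTER JOIN: the 2 matches plus order 103 with NULL ID and NAME (3 rows).
SELECT c.ID, c.NAME, o.AMOUNT
FROM CUSTOMERS c RIGHT OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);

-- FULL OUTER JOIN: the 2 matches plus both unmatched rows (4 rows).
SELECT c.ID, c.NAME, o.AMOUNT
FROM CUSTOMERS c FULL OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
```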
Thank You
