Hiveppt

Hive is a data warehouse infrastructure tool that allows users to query and analyze large datasets stored in Hadoop. It uses a SQL-like language called HiveQL to process structured data stored in HDFS. Hive stores metadata about the schema in a metastore database and uses MapReduce to process queries and return results. The architecture includes interfaces for users to query data, a metastore to store schema information, a compiler to parse queries into execution plans, and an execution engine that uses MapReduce to run queries and return results to the user.

Uploaded by

kavitha

HIVE

Abhinav Tyagi
What is Hive?

Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It sits on top of Hadoop to summarize Big Data, and makes querying and analysis easy.
Hive was initially developed by Facebook. The Apache Software Foundation later took it up and developed it further as open source under the name Apache Hive.
Features of Hive

It stores the schema in a database and the processed data in HDFS (Hadoop Distributed File System).
It is designed for OLAP.
It provides a SQL-like query language called HiveQL or HQL.
It is familiar, fast, scalable, and extensible.
Architecture of Hive

User Interface - Hive is data warehouse infrastructure software that mediates between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HDInsight.
Metastore - Hive uses a database server of your choice to store the schema, or metadata, of tables, databases, and the columns in a table, along with their data types and HDFS mapping.
HiveQL Process Engine - HiveQL is a SQL-like language for querying against the schema information in the Metastore. It replaces the traditional approach of writing a MapReduce program in Java: instead, we write a query, and Hive processes it as a MapReduce job.
Execution Engine - The Hive Execution Engine connects the HiveQL Process Engine to MapReduce. It processes the query and generates the same results a MapReduce job would, because it runs on MapReduce.
HDFS or HBase - The Hadoop Distributed File System or HBase is the storage layer that holds the data.
Working of Hive

Execute Query - A Hive interface such as the command line or Web UI sends the query to the Driver for execution.
Get Plan - The driver asks the query compiler to parse the query, checking its syntax and working out the query plan, that is, what the query requires.
Get Metadata - The compiler sends a metadata request to the Metastore.
Send Metadata - The Metastore sends the metadata back to the compiler as a response.
Send Plan - The compiler checks the requirements and sends the plan back to the driver. At this point, parsing and compiling of the query are complete.
Execute Plan - The driver sends the execution plan to the execution engine.
Execute Job - Internally, the job is executed as a MapReduce job. The execution engine sends the job to the JobTracker, which is on the Name node, and the JobTracker assigns it to a TaskTracker, which is on a Data node. There, the query runs as a MapReduce job.
Metadata Ops - During execution, the execution engine can also perform metadata operations with the Metastore.
Fetch Result - The execution engine receives the results from the Data nodes.
Send Results - The execution engine sends those results to the driver.
Send Results - The driver sends the results to the Hive interfaces.
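The flow above can be sketched as a toy model in Python. This is only an illustration of how the driver, compiler, Metastore, and execution engine hand work to each other. Every class and method name below is hypothetical and none of it is Hive's real API; the "parser" accepts only the fixed form SELECT <column> FROM <table>, and a plain dictionary stands in for HDFS.

```python
from dataclasses import dataclass, field


@dataclass
class Metastore:
    # Maps table name -> ordered column names (the schema the compiler asks for).
    schemas: dict

    def get_metadata(self, table):
        return self.schemas[table]


@dataclass
class Compiler:
    metastore: Metastore

    def get_plan(self, query):
        # Syntax check for the only supported form: SELECT <column> FROM <table>
        tokens = query.rstrip(";").split()
        if (len(tokens) != 4 or tokens[0].upper() != "SELECT"
                or tokens[2].upper() != "FROM"):
            raise ValueError("unsupported query: " + query)
        column, table = tokens[1], tokens[3]
        # Get Metadata / Send Metadata
        columns = self.metastore.get_metadata(table)
        if column not in columns:
            raise ValueError("unknown column: " + column)
        # Send Plan
        return {"table": table, "column_index": columns.index(column)}


@dataclass
class ExecutionEngine:
    storage: dict  # table name -> list of rows, standing in for HDFS

    def execute(self, plan):
        # Execute Job / Fetch Result: a real engine would submit a MapReduce job.
        rows = self.storage[plan["table"]]
        return [row[plan["column_index"]] for row in rows]


@dataclass
class Driver:
    compiler: Compiler
    engine: ExecutionEngine
    log: list = field(default_factory=list)

    def run(self, query):
        self.log.append("Execute Query")
        plan = self.compiler.get_plan(query)
        self.log.append("Get Plan")
        self.log.append("Execute Plan")
        results = self.engine.execute(plan)
        self.log.append("Send Results")
        return results


metastore = Metastore({"employee": ["id", "name", "dept"]})
engine = ExecutionEngine({"employee": [(1, "Mark", "TP"), (2, "Bob", "HR")]})
driver = Driver(Compiler(metastore), engine)
names = driver.run("SELECT name FROM employee;")
print(names)  # ['Mark', 'Bob']
```

The point of the sketch is the separation of roles: the compiler needs the Metastore only to resolve names, and the engine needs only the finished plan.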
Hive- Data Types
All data types in Hive are classified into four types:
Column Types
Literals
Null Values
Complex Types
Column Types

Integral Types - Integer data is specified using the integral data types, by default INT. When the data range exceeds the range of INT, use BIGINT; when the data range is smaller than that of INT, use SMALLINT. TINYINT is smaller than SMALLINT.
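As a rough illustration of choosing among the integral types, the Python sketch below picks the narrowest type that covers a range of values. The ranges are those of signed 1-, 2-, 4-, and 8-byte integers; the helper name smallest_integral_type is made up for this example and is not part of Hive.

```python
# Value ranges of the signed integral types (1, 2, 4, and 8 bytes).
INTEGRAL_RANGES = [
    ("TINYINT", -2**7, 2**7 - 1),
    ("SMALLINT", -2**15, 2**15 - 1),
    ("INT", -2**31, 2**31 - 1),
    ("BIGINT", -2**63, 2**63 - 1),
]


def smallest_integral_type(low, high):
    """Return the narrowest type whose range covers [low, high]."""
    for name, type_min, type_max in INTEGRAL_RANGES:
        if type_min <= low and high <= type_max:
            return name
    raise ValueError("range does not fit in BIGINT")


print(smallest_integral_type(0, 100))            # TINYINT
print(smallest_integral_type(0, 40000))          # INT
print(smallest_integral_type(0, 3_000_000_000))  # BIGINT
```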

String Types - String values can be written using single quotes (' ') or double quotes (" "). The string category contains two data types: VARCHAR and CHAR. Hive follows C-style escape characters.
Timestamp - Supports the traditional UNIX timestamp with optional nanosecond precision. It supports the java.sql.Timestamp format “YYYY-MM-DD HH:MM:SS.fffffffff” and the format “yyyy-mm-dd hh:mm:ss.ffffffffff”.
Dates - DATE values are described in year/month/day format, in the form YYYY-MM-DD.
Decimals - The DECIMAL type in Hive is the same as the BigDecimal format of Java. It is used for representing immutable arbitrary-precision decimal numbers.
Union Types - A union is a collection of heterogeneous data types. You can create an instance using create_union.
Literals

Floating Point Types - Floating point types are simply numbers with decimal points. Generally, this type of data is of the DOUBLE data type.
Decimal Type - Decimal type data is a floating point value with a higher range than the DOUBLE data type. The range of the decimal type is approximately -10^-308 to 10^308.
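To see why a decimal type matters alongside DOUBLE, the short Python sketch below compares binary floating point with Python's decimal.Decimal. Decimal is used here only as an analogue of an arbitrary-precision decimal such as Java's BigDecimal; it is not Hive's implementation.

```python
from decimal import Decimal

# Binary floating point (like DOUBLE) cannot represent 0.1 or 0.2 exactly.
double_sum = 0.1 + 0.2
print(double_sum == 0.3)              # False

# A decimal type stores the digits exactly, so the sum is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True
```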
Complex Types

Arrays - Arrays in Hive are used the same way they are used in Java.
Syntax: ARRAY<data_type>

Maps - Maps in Hive are similar to Java Maps.

Syntax: MAP<primitive_type, data_type>

Structs - A struct in Hive groups named fields of complex data, each with an optional comment.

Syntax: STRUCT<col_name : data_type [COMMENT col_comment], …>
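For readers who think in a general-purpose language, the three complex types map roughly onto a list, a dictionary, and a record. The Python sketch below shows that analogy only; the field names and values are invented for illustration and do not come from any table in these slides.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Address:
    # Analogue of STRUCT<city : STRING, zip : INT>
    city: str
    zip: int


@dataclass
class EmployeeRow:
    name: str
    skills: List[str]              # analogue of ARRAY<STRING>
    phone_numbers: Dict[str, str]  # analogue of MAP<STRING, STRING>
    address: Address               # analogue of a STRUCT column


row = EmployeeRow(
    name="Mark",
    skills=["sql", "java"],
    phone_numbers={"home": "555-0100", "work": "555-0101"},
    address=Address(city="Pune", zip=411001),
)

print(row.skills[0])              # array element by index -> sql
print(row.phone_numbers["home"])  # map value by key -> 555-0100
print(row.address.city)           # struct field by name -> Pune
```

HiveQL uses the same three access styles: an index for arrays, a key for maps, and dot notation for struct fields.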
Create Database

hive> CREATE DATABASE [IF NOT EXISTS] userdb;
hive> CREATE SCHEMA userdb;
hive> SHOW DATABASES;
Drop Database

hive> DROP DATABASE [IF EXISTS] userdb;
hive> DROP DATABASE [IF EXISTS] userdb CASCADE;
hive> DROP SCHEMA userdb;
Create Table
hive> CREATE TABLE IF NOT EXISTS employee (eid int, name String, salary String, destination String)
    > COMMENT 'Employee details'
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > LINES TERMINATED BY '\n'
    > STORED AS TEXTFILE;
Partition
Hive organizes tables into partitions. It is a way of dividing a
table into related parts based on the values of partitioned
columns such as date, city, and department. Using partition, it
is easy to query a portion of the data.
Adding a partition - Syntax:
hive> ALTER TABLE employee ADD PARTITION (year='2013') LOCATION '/2012/part2012';
Dropping a partition - Syntax:
hive> ALTER TABLE employee DROP [IF EXISTS] PARTITION (year='2013');
Sample data:
id, name, dept, year
1, Mark, TP, 2012
2, Bob, HR, 2012
3, Sam, SC, 2013
4, Adam, SC, 2013
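The idea behind partitioning can be sketched in Python with the sample rows above: rows are grouped by the value of the partition column, so a query for one year touches only that group. This is a simplified model under that assumption, not Hive's storage code, and the function names are hypothetical.

```python
from collections import defaultdict

# The sample rows: (id, name, dept, year)
rows = [
    (1, "Mark", "TP", 2012),
    (2, "Bob", "HR", 2012),
    (3, "Sam", "SC", 2013),
    (4, "Adam", "SC", 2013),
]


def partition_by_year(rows):
    """Group rows by the partition column, like one directory per year."""
    partitions = defaultdict(list)
    for emp_id, name, dept, year in rows:
        # The partition value lives in the key, not in the stored row.
        partitions[year].append((emp_id, name, dept))
    return dict(partitions)


def scan_partition(partitions, year):
    """Read only the rows of one partition instead of the whole table."""
    return partitions.get(year, [])


partitions = partition_by_year(rows)
print(sorted(partitions))                # [2012, 2013]
print(scan_partition(partitions, 2013))  # [(3, 'Sam', 'SC'), (4, 'Adam', 'SC')]
```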
HiveQL - Select Where

The Hive Query Language (HiveQL) is a query language for Hive to process and analyze structured data in a Metastore. The WHERE clause restricts the result to the rows that satisfy a condition.
hive> SELECT * FROM employee WHERE salary>30000;
HiveQL - Select Order By

The ORDER BY clause is used to retrieve the details based on one column and sort the result set in ascending or descending order.
hive> SELECT Id, Name, Dept FROM employee ORDER BY DEPT;
HiveQL - Select-Group By

The GROUP BY clause is used to group all the records in a result set by a particular column. It is used to query a group of records.
hive> SELECT Dept, count(*) FROM employee GROUP BY DEPT;
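What the ORDER BY and GROUP BY queries above compute can be mimicked in a few lines of Python over the sample employee rows from the partition slide. This is a sketch of the semantics only, assuming those four rows are the whole table; it says nothing about how Hive executes the queries.

```python
from collections import Counter

employees = [
    {"id": 1, "name": "Mark", "dept": "TP"},
    {"id": 2, "name": "Bob", "dept": "HR"},
    {"id": 3, "name": "Sam", "dept": "SC"},
    {"id": 4, "name": "Adam", "dept": "SC"},
]

# ORDER BY dept: sort the rows on one column (ascending).
ordered = sorted(employees, key=lambda e: e["dept"])
print([e["name"] for e in ordered])  # ['Bob', 'Sam', 'Adam', 'Mark']

# GROUP BY dept with count(*): one output row per distinct dept.
dept_counts = Counter(e["dept"] for e in employees)
for dept, count in sorted(dept_counts.items()):
    print(dept, count)
# HR 1
# SC 2
# TP 1
```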
HiveQL - Select-Joins
JOIN is a clause that is used for combining specific fields from two tables by using values common to each one. It is used to combine records from two or more tables in the database. It is more or less similar to SQL JOIN.
There are different types of joins, given as follows:
• JOIN
• LEFT OUTER JOIN
• RIGHT OUTER JOIN
• FULL OUTER JOIN
JOIN
The JOIN clause is used to combine and retrieve records from multiple tables. A plain JOIN is the same as INNER JOIN in SQL: only rows that match in both tables are returned. A JOIN condition is usually written using the primary keys and foreign keys of the tables.
hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
Left Outer Join

The HiveQL LEFT OUTER JOIN returns all the rows from the left table, even if there are no matches in the right table. This means that if the ON clause matches 0 (zero) records in the right table, the JOIN still returns a row in the result, but with NULL in each column from the right table.
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c LEFT OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
Right Outer Join
The HiveQL RIGHT OUTER JOIN returns all the rows from the right table, even if there are no matches in the left table. If the ON clause matches 0 (zero) records in the left table, the JOIN still returns a row in the result, but with NULL in each column from the left table.
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c RIGHT OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
Full Outer Join
The HiveQL FULL OUTER JOIN combines the records of both the left and the right tables. The joined table contains all the records from both tables, pairing those that fulfill the JOIN condition and filling in NULL values for missing matches on either side.
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c FULL OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
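The difference between the four join types is easiest to see on a tiny data set. The Python sketch below joins on c.ID = o.CUSTOMER_ID and uses None in place of NULL. The rows are invented for illustration (customer 2 has no orders, and one order refers to a customer 4 that does not exist), and the join function is a hypothetical helper, not how Hive implements joins.

```python
customers = [
    {"ID": 1, "NAME": "Ana"},
    {"ID": 2, "NAME": "Ben"},   # has no orders
]
orders = [
    {"CUSTOMER_ID": 1, "AMOUNT": 300},
    {"CUSTOMER_ID": 1, "AMOUNT": 150},
    {"CUSTOMER_ID": 4, "AMOUNT": 90},   # no matching customer
]


def join(customers, orders, how="inner"):
    """Return (ID, NAME, AMOUNT) rows; how is inner, left, right, or full."""
    result = []
    matched_orders = set()
    for c in customers:
        matches = [(i, o) for i, o in enumerate(orders)
                   if o["CUSTOMER_ID"] == c["ID"]]
        for i, o in matches:
            matched_orders.add(i)
            result.append((c["ID"], c["NAME"], o["AMOUNT"]))
        # Left and full joins keep customers that matched no order.
        if not matches and how in ("left", "full"):
            result.append((c["ID"], c["NAME"], None))
    # Right and full joins keep orders that matched no customer.
    if how in ("right", "full"):
        for i, o in enumerate(orders):
            if i not in matched_orders:
                result.append((None, None, o["AMOUNT"]))
    return result


for how in ("inner", "left", "right", "full"):
    print(how, join(customers, orders, how))
# inner [(1, 'Ana', 300), (1, 'Ana', 150)]
# left [(1, 'Ana', 300), (1, 'Ana', 150), (2, 'Ben', None)]
# right [(1, 'Ana', 300), (1, 'Ana', 150), (None, None, 90)]
# full [(1, 'Ana', 300), (1, 'Ana', 150), (2, 'Ben', None), (None, None, 90)]
```

The row order of a real query result is not guaranteed without ORDER BY; the order shown here is just what this loop produces.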
Thank You
