05b Hive
Why Another Data Warehousing System?
• Problem: Data, data and more data
  • Several TBs of data every day
• The Hadoop Experiment:
  • Uses Hadoop File System (HDFS)
  • Scalable/Available
• Problem: Lacked Expressiveness
  • Map-Reduce hard to program
• Solution: HIVE
Copyright Ellis Horowitz, 2011 - 2012 2
What is HIVE?
• A system for managing and querying unstructured data as if it were structured
• Uses Map-Reduce for execution and HDFS for storage
• Key Building Principles
  • SQL as a familiar data warehousing tool
  • Extensibility (pluggable map/reduce scripts in the language of your choice, rich and user-defined data types, user-defined functions)
  • Interoperability (extensible framework to support different file and data formats)
  • Performance
Example:
CREATE TABLE t1(ds string, ctry float, li list<map<string, struct<p1:int, p2:int>>>);
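To make the nested column type concrete, here is a minimal Python sketch of what one row of t1 might hold. The sample values are invented, and the mapping of list, map and struct onto a Python list, dict and NamedTuple is only an analogy, not how Hive stores data. The traverse helper mimics a path expression along the lines of li[0]['a'].p2.

```python
from typing import NamedTuple


class P(NamedTuple):
    """Stand-in for struct<p1:int, p2:int>."""
    p1: int
    p2: int


# Hypothetical row for t1(ds string, ctry float, li list<map<string, struct<...>>>).
row = {
    "ds": "2010-07-20",
    "ctry": 1.5,
    "li": [{"a": P(p1=1, p2=2)}, {"b": P(p1=3, p2=4)}],
}


def traverse(r, index, key):
    """Walk list -> map -> struct and return the p2 field."""
    return r["li"][index][key].p2


print(traverse(row, 0, "a"))
```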
Metastore
• The component that stores the system catalog and metadata about tables, columns, partitions, etc.
• Stored on a traditional RDBMS
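The idea of keeping the catalog in a relational database can be sketched with Python's built-in sqlite3 module. This is a toy under stated assumptions: the schema (tbls, cols, parts), the page_views table and its paths are all hypothetical and are not Hive's actual metastore layout.

```python
import sqlite3

# Toy catalog in an in-memory database; NOT Hive's real metastore schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbls  (tbl_id INTEGER PRIMARY KEY, name TEXT UNIQUE, location TEXT);
    CREATE TABLE cols  (tbl_id INTEGER, name TEXT, type TEXT, position INTEGER);
    CREATE TABLE parts (tbl_id INTEGER, spec TEXT, location TEXT);
""")
conn.execute("INSERT INTO tbls VALUES (1, 'page_views', '/warehouse/page_views')")
conn.executemany(
    "INSERT INTO cols VALUES (1, ?, ?, ?)",
    [("user_id", "bigint", 0), ("url", "string", 1)],
)
conn.executemany(
    "INSERT INTO parts VALUES (1, ?, ?)",
    [
        ("ds=2010-07-19", "/warehouse/page_views/ds=2010-07-19"),
        ("ds=2010-07-20", "/warehouse/page_views/ds=2010-07-20"),
    ],
)


def describe(table):
    """Return (column, type) pairs in declaration order."""
    return conn.execute(
        "SELECT c.name, c.type FROM cols c JOIN tbls t ON c.tbl_id = t.tbl_id "
        "WHERE t.name = ? ORDER BY c.position",
        (table,),
    ).fetchall()


def partitions(table):
    """Return the partition specs registered for a table."""
    rows = conn.execute(
        "SELECT p.spec FROM parts p JOIN tbls t ON p.tbl_id = t.tbl_id "
        "WHERE t.name = ? ORDER BY p.spec",
        (table,),
    ).fetchall()
    return [spec for (spec,) in rows]


print(describe("page_views"))
print(partitions("page_views"))
```

A compiler or optimizer would consult lookups like these to resolve column types and to find which partition directories exist.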
Architecture
and
Components
JDBC
ODBC
Web
Command
Line
Interface
Interface
ThriP
Server
Metastore
Driver
(Compiler,
OpOmizer,
Executor)
• Driver: The component that manages the lifecycle of a HiveQL statement as it moves through Hive. The driver also maintains a session handle and any session statistics.
• Query Compiler: The component that compiles HiveQL into a directed acyclic graph of map/reduce tasks.
• Optimizer: Consists of a chain of transformations such that the operator DAG resulting from one transformation is passed as input to the next transformation. Performs tasks like column pruning, partition pruning, and repartitioning of data.
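The chain-of-transformations idea can be sketched in a few lines of Python. Everything here is a simplification for illustration: the plan is a flat dictionary rather than a real operator DAG, and the table, column and partition names are hypothetical. Each transformation takes a plan and returns a new one, and the optimizer feeds each result into the next step.

```python
from functools import reduce

# Hypothetical, heavily simplified plan; Hive's real operator DAG is far richer.
plan = {
    "table": "page_views",
    "columns": ["user_id", "url", "referrer"],  # columns the scan would read
    "partitions": ["ds=2010-07-19", "ds=2010-07-20"],
    "select": ["url"],                           # columns the query outputs
    "where_cols": ["user_id"],                   # columns used in filters
    "partition_filter": "ds=2010-07-20",
}


def column_pruning(p):
    """Keep only the columns the query actually references."""
    needed = set(p["select"]) | set(p["where_cols"])
    return {**p, "columns": [c for c in p["columns"] if c in needed]}


def partition_pruning(p):
    """Keep only the partitions that match the partition filter."""
    keep = [x for x in p["partitions"] if x == p["partition_filter"]]
    return {**p, "partitions": keep}


def optimize(p, transformations):
    """Pass the output of each transformation as input to the next one."""
    return reduce(lambda acc, t: t(acc), transformations, p)


optimized = optimize(plan, [column_pruning, partition_pruning])
print(optimized["columns"], optimized["partitions"])
```

Because each step returns a new plan instead of mutating its input, transformations can be added, removed or reordered without affecting one another.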
• Execution Engine: The component that executes the tasks produced by the compiler in proper dependency order. The execution engine interacts with the underlying Hadoop instance.
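"Proper dependency order" amounts to a topological ordering of the task DAG. The sketch below uses Kahn's algorithm, a standard technique chosen here for illustration; the slides do not say how Hive schedules tasks internally, and the task names are hypothetical.

```python
from collections import deque


def dependency_order(deps):
    """Kahn's algorithm: order tasks so each one follows all of its dependencies.

    deps maps a task name to the set of tasks it depends on.
    """
    tasks = set(deps) | {d for ds in deps.values() for d in ds}
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    ready = deque(sorted(t for t, ds in remaining.items() if not ds))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for other in sorted(remaining):
            if task in remaining[other]:
                remaining[other].remove(task)
                if not remaining[other]:
                    ready.append(other)
    if len(order) != len(tasks):
        raise ValueError("cycle detected: not a DAG")
    return order


# Hypothetical task DAG for a join followed by a group-by.
dag = {
    "mr_join": {"mr_scan_a", "mr_scan_b"},
    "mr_group_by": {"mr_join"},
    "move_results": {"mr_group_by"},
}

executed = dependency_order(dag)
print(executed)
```

A real engine would submit each task to Hadoop at the point where this sketch appends it to the order, and could run independent tasks such as the two scans in parallel.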
• HiveServer: The component that provides a Thrift interface and a JDBC/ODBC server, and provides a way of integrating Hive with other applications.
• Client Components: Client components such as the Command Line Interface (CLI), the web UI, and the JDBC/ODBC driver.
Hive Query Language
• Basic SQL
  • From clause sub-query
  • ANSI JOIN (equi-join only)
  • Multi-table insert
  • Multi group-by
  • Sampling
  • Objects traversal
• Extensibility
  • Pluggable Map-reduce scripts using TRANSFORM
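A pluggable script for TRANSFORM is an ordinary program: by default Hive streams rows to it as tab-separated lines on standard input and reads tab-separated lines back from standard output. The Python sketch below is one possible script. The file name extract_host.py, the page_views table and its columns are hypothetical, and the demo runs on in-memory streams so it works without Hive.

```python
import io
import sys
from urllib.parse import urlsplit

# Hypothetical invocation from HiveQL (table, columns and file name are assumptions):
#   ADD FILE extract_host.py;
#   SELECT TRANSFORM(user_id, url) USING 'python extract_host.py'
#          AS (user_id, host) FROM page_views;


def transform_line(line):
    """Turn 'user_id<TAB>url' into 'user_id<TAB>host'."""
    user_id, url = line.rstrip("\n").split("\t")
    return f"{user_id}\t{urlsplit(url).netloc}"


def run(stdin, stdout):
    """Process every input row; as a real script, call run(sys.stdin, sys.stdout)."""
    for line in stdin:
        stdout.write(transform_line(line) + "\n")


# Demo on in-memory streams instead of the real stdin/stdout.
out = io.StringIO()
run(io.StringIO("42\thttp://example.com/a/b\n"), out)
sys.stdout.write(out.getvalue())
```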
INSERTION
INSERT OVERWRITE TABLE t1 SELECT * FROM t2;
http://www.slideshare.net/cloudera/hw09-hadoop-development-at-facebook-hive-and-hdfs
7/20/2010
Introduction to Hive
Conclusion
• Pros
  • Good explanation of Hive and HiveQL with proper examples
  • Architecture is well explained
  • Usage of Hive is properly demonstrated
• Cons
  • Accepts only a subset of SQL queries
  • Performance comparisons with other systems would have been appreciated