06 ImpalaHiveDataModeling
201509
Course Chapters
Distributed Data Processing with Spark
10 Spark Basics
11 Working with RDDs in Spark
12 Aggregating Data with Pair RDDs
13 Writing and Deploying Spark Applications
14 Parallel Processing in Spark
15 Spark RDD Persistence
16 Common Patterns in Spark Data Processing
17 Spark SQL and DataFrames
© Copyright 2010-2015 Cloudera. All rights reserved. Not to be reproduced or shared without prior written consent from Cloudera.
Modeling and Managing Data in Impala and Hive
Chapter Topics
How Hive and Impala Load and Store Data (1)
How Hive and Impala Load and Store Data (2)
§ Hive and Impala use the Metastore to determine data format and location
– The query itself operates on data stored in HDFS
[Diagram: Query; Impala or Hive Server; Metastore (metadata in RDBMS); Tables]
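As a rough illustration of the point above, the following statements ask Hive or Impala what the Metastore records about a table and then query it. The table name jobs is borrowed from later slides in this chapter, and the output is omitted because it depends on the cluster.

```sql
-- Show the metadata the Metastore holds for a table, including its
-- HDFS location and storage format (supported by both Hive and Impala).
DESCRIBE FORMATTED jobs;

-- The query itself then reads the files stored under that location.
SELECT * FROM jobs LIMIT 5;
```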
Data and Metadata
The Data Warehouse Directory
Defining Databases and Tables
§ Databases and tables are created and managed using the DDL (Data Definition Language) of HiveQL or Impala SQL
– Very similar to standard SQL DDL
– Some minor differences between Hive and Impala DDL will be noted
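A minimal sketch of what such DDL can look like, written to be accepted by both Hive and Impala. The database name loudacre is borrowed from the Sqoop example in this chapter, the columns follow the DESCRIBE jobs output shown in this chapter, and the comma delimiter is an assumption for illustration only.

```sql
-- Hypothetical database name, taken from the Sqoop example in this chapter.
CREATE DATABASE IF NOT EXISTS loudacre;
USE loudacre;

-- Columns mirror the DESCRIBE jobs output in this chapter;
-- the field delimiter is an assumption.
CREATE TABLE jobs (
  id INT,
  title STRING,
  salary INT,
  posted TIMESTAMP
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
```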
Creating a Database
Removing a Database
Data Types
Creating a Table (2)
Creating a Table (3)
Creating a Table (4)
Creating a Table (5)
Example Table Definition
Creating Tables Based On Existing Schema
§ Use LIKE to create a new table based on an existing table definition
CREATE TABLE jobs_archived LIKE jobs;
§ Column definitions and names are derived from the existing table
– New table will contain no data
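One way to check the result of the LIKE statement above is sketched below. It assumes the jobs_archived table was just created and that nothing has been loaded into it yet.

```sql
-- List the columns of the new table; they should match those of jobs.
DESCRIBE jobs_archived;

-- LIKE copies only the definition, so this count is expected to be 0.
SELECT COUNT(*) FROM jobs_archived;
```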
Creating Tables Based On Existing Data
Controlling Table Data Location
Externally Managed Tables
Exploring Tables (1)
§ The SHOW TABLES command lists all tables in the current database
SHOW TABLES;
+---------------+
| tab_name      |
+---------------+
| accounts      |
| employees     |
| jobs          |
| vendors       |
+---------------+
§ The DESCRIBE command lists the fields in the specified table
DESCRIBE jobs;
+--------+-----------+---------+
| name   | type      | comment |
+--------+-----------+---------+
| id     | int       |         |
| title  | string    |         |
| salary | int       |         |
| posted | timestamp |         |
+--------+-----------+---------+
Exploring Tables (2)
Exploring Tables (3)
§ SHOW CREATE TABLE displays the SQL command to create the table
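A short usage sketch, reusing the jobs table from the earlier slides. The output is not shown because it depends on how the table was actually defined.

```sql
-- Print the CREATE TABLE statement that would recreate the jobs table.
SHOW CREATE TABLE jobs;
```

The printed statement can be a convenient starting point when a similar table is needed elsewhere.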
Using the Hue Metastore Manager
Data Validation
Loading Data From HDFS Files
§ To load data, simply add files to the table’s directory in HDFS
– Can be done directly using the hdfs dfs commands
– This example loads data from HDFS into the sales table
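The example statement itself did not survive in this text. The sketch below shows one way the load could be written from within Hive or Impala. The source path /tmp/sales.txt is a hypothetical HDFS path, and the warehouse directory in the comment assumes the default location.

```sql
-- Move a file that is already in HDFS into the sales table's directory.
-- '/tmp/sales.txt' is a hypothetical HDFS path.
LOAD DATA INPATH '/tmp/sales.txt' INTO TABLE sales;

-- Roughly equivalent shell command, assuming the default warehouse
-- directory /user/hive/warehouse:
--   hdfs dfs -mv /tmp/sales.txt /user/hive/warehouse/sales/
```

If files are added with hdfs dfs rather than LOAD DATA, Impala may need a REFRESH sales; statement before the new rows appear in query results.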
Overwriting Data From Files
§ Add the OVERWRITE keyword to delete all records before import
– Removes all files within the table’s directory
– Then moves the new files into that directory
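A minimal sketch of the OVERWRITE form, reusing the sales table from the preceding slide. The source path is a hypothetical HDFS path chosen for illustration.

```sql
-- Replace the current contents of the sales table with the new file.
-- '/tmp/sales.txt' is a hypothetical HDFS path.
LOAD DATA INPATH '/tmp/sales.txt' OVERWRITE INTO TABLE sales;
```

Because the existing files are removed first, it is worth confirming that the table name is the intended one before running the statement.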
Appending Selected Records to a Table
Loading Data Using the Metastore Manager
§ The Metastore Manager provides two ways to load data into a table
Loading Data From a Relational Database
§ Sqoop has built-in support for importing data into Hive and Impala
§ Add the --hive-import option to your Sqoop command
– Creates the table in the Hive metastore
– Imports data from the RDBMS to the table’s directory in HDFS
$ sqoop import \
  --connect jdbc:mysql://localhost/loudacre \
  --username training \
  --password training \
  --fields-terminated-by '\t' \
  --table employees \
  --hive-import
– Note that --hive-import creates a table accessible in both Hive and Impala
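A possible follow-up in Impala once the Sqoop import has finished, sketched under the assumption that the employees table was created by the command above. Impala caches metadata, so a table created outside Impala may not be visible until the cache is updated.

```sql
-- Run from impala-shell after the Sqoop import completes.
-- Make Impala pick up the table that Sqoop created through Hive.
INVALIDATE METADATA employees;

-- Quick check that the imported rows are readable.
SELECT COUNT(*) FROM employees;
```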
Impala in the Cluster
§ Each slave node in the cluster runs an Impala daemon
– Co-located with the HDFS slave daemon (DataNode)
§ Two other daemons running on master nodes support query execution
[Diagram: Master Node; NameNode; State Store; Catalog Server; Impala Daemon; DataNode (HDFS)]
How Impala Executes a Query
– Streams results to client
[Diagram: HDFS Slave Nodes; Impala Daemon; Impala Daemon]
Metadata Caching (1)
[Diagram: metadata in RDBMS; Impala Daemon; Metadata cache]
Metadata Caching (2)
[Diagram: CREATE TABLE suppliers (…); Metastore (metadata in RDBMS); Impala Daemon; Metadata cache]
External Changes and Metadata Caching
Updating the Impala Metadata Cache
Essential Points
Bibliography
The following offer more information on topics discussed in this chapter
§ Impala Concepts and Architecture
– http://tiny.cloudera.com/adcc12a
§ Impala SQL Language Reference
– http://tiny.cloudera.com/impalasql
§ Impala-related Articles on Cloudera’s Blog
– http://tiny.cloudera.com/adcc12e
§ Apache Hive Web Site
– http://hive.apache.org/
§ HiveQL Language Manual
– http://tiny.cloudera.com/adcc10b
Homework: Create and Populate Tables in Impala