
Big Data Analytics

8CAI4-01
Unit 6 (Hive)
Agenda
• Hive Overview and Concepts
• Installation
• Table Creation and Deletion
• Loading Data into Hive
• Partitioning
• Bucketing
• Joins

Hive
• Data Warehousing Solution built on top of
Hadoop
• Provides SQL-like query language named
HiveQL
– Minimal learning curve for people with SQL expertise
– Data analysts are target audience
• Early Hive development work started at
Facebook in 2007
• Today Hive is a top-level Apache project
– http://hive.apache.org

Hive Provides
• Ability to bring structure to various data
formats
• Simple interface for ad hoc querying,
analyzing and summarizing large amounts
of data
• Access to files on various data stores such
as HDFS and HBase

Hive
• Hive does NOT provide low latency or real-
time queries
• Even querying small amounts of data may
take minutes
• Designed for scalability and ease-of-use
rather than low latency responses

Hive
• Translates HiveQL statements into a set of
MapReduce Jobs which are then executed on a
Hadoop Cluster

[Diagram: a client machine submits HiveQL to Hive, which translates it
into MapReduce jobs, executes them on the Hadoop cluster, and
monitors/reports progress]

CREATE TABLE posts (user STRING, post STRING, time BIGINT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH 'data/user-posts.txt'
OVERWRITE INTO TABLE posts;

Hive Metastore
• To support features like schema(s) and data
partitioning Hive keeps its metadata in a
Relational Database
– Packaged with Derby, a lightweight embedded SQL DB
• The default Derby-based metastore is good for
evaluation and testing
• The schema is not shared between users, as each
user has their own instance of embedded Derby
• Stored in the metastore_db directory, which resides
in the directory that Hive was started from
– Can easily switch to another SQL installation, such as
MySQL

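As a sketch of that switch (assuming a MySQL server and its JDBC
driver are already available; the host, database name, and credentials
below are hypothetical placeholders), the metastore connection is
configured in hive-site.xml through the javax.jdo.option.* properties:

<property>
<name>javax.jdo.option.ConnectionURL</name>
<!-- hypothetical host and database name -->
<value>jdbc:mysql://localhost:3306/hive_metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<!-- hypothetical credential -->
<value>hivepassword</value>
</property>

With a shared metastore like this, all users see the same schema
instead of each getting a private embedded Derby instance.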
Hive Architecture
[Diagram: Command Line, JDBC, and other clients connect to Hive; Hive
itself consists of a Query Parser, an Executor, and the Metastore, and
runs on top of Hadoop (HDFS and MapReduce)]

Hive Interface Options


• Command Line Interface (CLI)
– Will use exclusively in these slides
• Hive Web Interface
– https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface
• Java Database Connectivity (JDBC)
– https://cwiki.apache.org/confluence/display/Hive/HiveClient

Hive Concepts
• Re-used from Relational Databases
– Database: Set of Tables, used for name conflicts resolution
– Table: Set of Rows that have the same schema (same columns)
– Row: A single record; a set of columns
– Column: provides value and type for a single value

[Diagram: a Database contains Tables, a Table contains Rows, and a Row
is made up of Columns]

Installation Prerequisites
• Java 6
– Just Like Hadoop
• Hadoop 0.20.x+
– No surprise here

Hive Installation
• Set $HADOOP_HOME environment variable
– Was done as a part of HDFS installation
• Set $HIVE_HOME and add hive to the PATH
export HIVE_HOME=$CDH_HOME/hive-0.8.1-cdh4.0.0
export PATH=$PATH:$HIVE_HOME/bin

• Hive will store its tables on HDFS, and those
locations need to be bootstrapped
$ hdfs dfs -mkdir /tmp
$ hdfs dfs -mkdir /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse

Hive Installation
• Similar to other Hadoop projects, Hive's
configuration is in $HIVE_HOME/conf/hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<property>
<name>mapred.job.tracker</name>
<value>localhost:10040</value>
</property>

</configuration>

Specify the location of the ResourceManager so Hive knows where to
execute MapReduce jobs; by default Hive utilizes the LocalJobRunner.
Run Hive
• HDFS and YARN need to be up and running
$ hive
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201207312052_1402761030.txt
hive>

Hive’s Interactive Command Line Interface (CLI)

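The CLI can also run statements non-interactively: the -e option
executes a quoted statement and -f executes a script file. A minimal
sketch (the script name here is a hypothetical example):

$ hive -e 'show tables;'
$ hive -f create-posts.hql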

Simple Example
1. Create a Table
2. Load Data into a Table
3. Query Data
4. Drop the Table

1: Create a Table
• Let’s create a table to store data from
$PLAY_AREA/data/user-posts.txt
Launch the Hive Command Line Interface (CLI):
$ cd $PLAY_AREA
$ hive
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201208022144_2014345460.txt
hive>

The history file shown above is the location of the session's log
file. Local commands can be executed within the CLI by placing the
command between ! and ;

hive> !cat data/user-posts.txt;
user1,Funny Story,1343182026191
user2,Cool Deal,1343182133839
user4,Interesting Post,1343182154633
user5,Yet Another Blog,13431839394
hive>

Values are separated by ',' and each row represents a record; the
first value is the user name, the second is the post content, and the
third is a timestamp.

1: Create a Table
hive> CREATE TABLE posts (user STRING, post STRING, time BIGINT)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > STORED AS TEXTFILE;
OK
Time taken: 10.606 seconds

The first line creates a table with 3 columns; the second and third
lines specify how the underlying file should be parsed; the fourth
line specifies how to store the data. Statements must end with a
semicolon and can span multiple lines.

hive> show tables;
OK
posts
Time taken: 0.221 seconds

"show tables" displays all of the tables; the result is displayed
between "OK" and "Time taken...".

hive> describe posts;
OK
user    string
post    string
time    bigint
Time taken: 0.212 seconds

"describe posts" displays the schema for the posts table.
2: Load Data Into a Table
hive> LOAD DATA LOCAL INPATH 'data/user-posts.txt'
> OVERWRITE INTO TABLE posts;
Copying data from file:/home/hadoop/Training/play_area/data/user-posts.txt
Copying file: file:/home/hadoop/Training/play_area/data/user-posts.txt
Loading data to table default.posts
Deleted /user/hive/warehouse/posts
OK
Time taken: 5.818 seconds
hive>
Existing records in the table posts are deleted; the data in
user-posts.txt is loaded into Hive's posts table.

Under the covers, Hive stores its tables on HDFS:
$ hdfs dfs -cat /user/hive/warehouse/posts/user-posts.txt
user1,Funny Story,1343182026191
user2,Cool Deal,1343182133839
user4,Interesting Post,1343182154633
user5,Yet Another Blog,13431839394

3: Query Data
hive> select count(1) from posts;
Total MapReduce jobs = 1
Launching Job 1 out of 1
...
Starting Job = job_1343957512459_0004, Tracking URL =
http://localhost:8088/proxy/application_1343957512459_0004/
Kill Command = hadoop job -Dmapred.job.tracker=localhost:10040 -kill job_1343957512459_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2012-08-02 22:37:24,962 Stage-1 map = 0%, reduce = 0%
2012-08-02 22:37:30,497 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2012-08-02 22:37:31,577 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2012-08-02 22:37:32,664 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.64 sec
MapReduce Total cumulative CPU time: 2 seconds 640 msec
Ended Job = job_1343957512459_0004
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1  Accumulative CPU: 2.64 sec  HDFS Read: 0 HDFS Write: 0 SUCESS
OK
4
Time taken: 14.204 seconds

The query counts the number of records in the posts table; Hive
transformed the HiveQL into 1 MapReduce job. The result is 4 records.
3: Query Data
hive> select * from posts where user="user2";
...
OK
user2    Cool Deal    1343182133839
Time taken: 12.184 seconds

Select records for "user2".

hive> select * from posts where time<=1343182133839 limit 2;
...
OK
user1    Funny Story    1343182026191
user2    Cool Deal      1343182133839
Time taken: 12.003 seconds
hive>

Select records whose timestamp is less than or equal to the provided
value. Usually there are too many results to display, so one can
utilize the limit clause to bound the display.

4: Drop the Table

hive> DROP TABLE posts;
OK
Time taken: 2.182 seconds

hive> exit;

$ hdfs dfs -ls /user/hive/warehouse/
$

Remove the table; use with caution. If Hive was managing the
underlying file, then it will be removed.
Loading Data
• Several options to start using data in Hive
– Load data from an HDFS location
hive> LOAD DATA INPATH '/training/hive/user-posts.txt'
    > OVERWRITE INTO TABLE posts;
• File is moved from the provided location to
/user/hive/warehouse/ (or the configured location)
– Load data from a local file system
hive> LOAD DATA LOCAL INPATH 'data/user-posts.txt'
    > OVERWRITE INTO TABLE posts;
• File is copied from the provided location to
/user/hive/warehouse/ (or the configured location)
– Utilize an existing location on HDFS
• Just point to an existing location when creating a table

Re-Use Existing HDFS Location

hive> CREATE EXTERNAL TABLE posts
    > (user STRING, post STRING, time BIGINT)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > STORED AS TEXTFILE
    > LOCATION '/training/hive/';
OK
Time taken: 0.077 seconds
hive>

Hive will load all the files under the /training/hive directory into
the posts table.
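A useful consequence of EXTERNAL tables, sketched below assuming posts
was created as above: dropping the table removes only Hive's metadata,
and the files under /training/hive stay on HDFS.

hive> DROP TABLE posts;    -- drops metadata only for an EXTERNAL table
hive> exit;
$ hdfs dfs -ls /training/hive/    # data files are still present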
Schema Violations
• What would happen if we try to insert data that
does not comply with the pre-defined schema?

hive> !cat data/user-posts-inconsistentFormat.txt;
user1,Funny Story,1343182026191
user2,Cool Deal,2012-01-05
user4,Interesting Post,1343182154633
user5,Yet Another Blog,13431839394

hive> describe posts;
OK
user    string
post    string
time    bigint
Time taken: 0.289 seconds

The third column, time, is of type bigint; Hive will not be able to
convert the '2012-01-05' value.

Schema Violations
hive> LOAD DATA LOCAL INPATH
> 'data/user-posts-inconsistentFormat.txt'
> OVERWRITE INTO TABLE posts;
OK
Time taken: 0.612 seconds

hive> select * from posts;
OK
user1    Funny Story         1343182026191
user2    Cool Deal           NULL
user4    Interesting Post    1343182154633
user5    Yet Another Blog    13431839394
Time taken: 0.136 seconds
hive>

NULL is set for any value that violates the pre-defined schema.

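Such malformed rows can be kept out of downstream results with an
ordinary predicate; a minimal sketch:

hive> select * from posts where time IS NOT NULL;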
Partitions
• To increase performance Hive has the
capability to partition data
– The values of partitioned column divide a table into
segments
– Entire partitions can be ignored at query time
– Similar to relational databases’ indexes but not as
granular
• Partitions have to be properly created by
users
– When inserting data you must specify a partition
• At query time, whenever appropriate,
Hive will automatically filter out partitions

Creating Partitioned Table

hive> CREATE TABLE posts (user STRING, post STRING, time BIGINT)
    > PARTITIONED BY(country STRING)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > STORED AS TEXTFILE;
OK
Time taken: 0.116 seconds

The table is partitioned based on the value of the country column.

hive> describe posts;
OK
user       string
post       string
time       bigint
country    string
Time taken: 0.111 seconds

There is no difference in schema between "partition" columns and
"data" columns.

hive> show partitions posts;
OK
Time taken: 0.102 seconds
hive>
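Partitions can also be registered after the table is created: ALTER
TABLE ... ADD PARTITION attaches a new partition and can point it at
an existing HDFS directory. A sketch; the country value and path are
hypothetical examples:

hive> ALTER TABLE posts ADD PARTITION (country='UK')
    > LOCATION '/training/hive/uk-posts';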
Load Data Into Partitioned Table
hive> LOAD DATA LOCAL INPATH 'data/user-posts-US.txt'
> OVERWRITE INTO TABLE posts;
FAILED: Error in semantic analysis: Need to specify partition
columns because the destination table is partitioned

Since the posts table was defined to be partitioned, any insert
statement must specify the partition.

hive> LOAD DATA LOCAL INPATH 'data/user-posts-US.txt'
    > OVERWRITE INTO TABLE posts PARTITION(country='US');
OK
Time taken: 0.225 seconds

hive> LOAD DATA LOCAL INPATH 'data/user-posts-AUSTRALIA.txt'
    > OVERWRITE INTO TABLE posts PARTITION(country='AUSTRALIA');
OK
Time taken: 0.236 seconds
hive>

Each file is loaded into a separate partition; the data is separated
by country.
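Hive can also route rows into partitions dynamically, deriving the
partition from the last column of the SELECT; the sketch below assumes
a non-partitioned staging table raw_posts (a hypothetical name) that
has a country column:

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> INSERT OVERWRITE TABLE posts PARTITION(country)
    > SELECT user, post, time, country FROM raw_posts;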

Partitioned Table
• Partitions are physically stored under
separate directories

hive> show partitions posts;
OK
country=AUSTRALIA
country=US
Time taken: 0.095 seconds
hive> exit;

$ hdfs dfs -ls -R /user/hive/warehouse/posts
/user/hive/warehouse/posts/country=AUSTRALIA
/user/hive/warehouse/posts/country=AUSTRALIA/user-posts-AUSTRALIA.txt
/user/hive/warehouse/posts/country=US
/user/hive/warehouse/posts/country=US/user-posts-US.txt

There is a directory for each partition value.
Querying Partitioned Table
• There is no difference in syntax
• When partitioned column is specified in the
where clause entire directories/partitions could
be ignored
Only "COUNTRY=US" partition will be
queried, "COUNTRY=AUSTRALIA" partition
will be ignored

hive> select * from posts where


country='US'
user1 limit1343182026191
Funny Story 10; US
OK
user2 Cool Deal 1343182133839 US
user2 Great Interesting Note 13431821339485 US
user4 Interesting Post 1343182154633 US
user1 Humor is good 1343182039586 US
user2 Hi I am user #2 1343182133839 US
32
Time taken: 0.197 seconds
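Pruning can be checked without running the job: EXPLAIN prints the
query plan, and EXPLAIN EXTENDED includes input path details, which
should show that only the country=US directory is scanned. A minimal
sketch:

hive> EXPLAIN EXTENDED select * from posts where country='US';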

Bucketing
• Mechanism to query and examine random
samples of data
• Break data into a set of buckets based on a hash
function of a "bucket column"
– Capability to execute queries on a sub-set of random data
• Doesn't automatically enforce bucketing
– User is required to specify the number of buckets by
setting the number of reducers

hive> set mapred.reduce.tasks = 256;
OR
hive> set hive.enforce.bucketing = true;

Either manually set the number of reducers to be the number of
buckets, or use 'hive.enforce.bucketing', which will set it on your
behalf.
Create and Use Table with Buckets

hive> CREATE TABLE post_count (user STRING, count INT)
    > CLUSTERED BY (user) INTO 5 BUCKETS;
OK
Time taken: 0.076 seconds

Declare the table with 5 buckets on the user column.

hive> set hive.enforce.bucketing = true;
hive> insert overwrite table post_count
    > select user, count(post) from posts group by user;
Total MapReduce jobs = 2
Launching Job 1 out of 2
...
Launching Job 2 out of 2
...
OK
Time taken: 42.304 seconds
hive> exit;

The number of reducers is set to 5 automatically. The insert populates
the bucketed post_count table; the number of posts is counted up for
each user.

$ hdfs dfs -ls -R /user/hive/warehouse/post_count/
/user/hive/warehouse/post_count/000000_0
/user/hive/warehouse/post_count/000001_0
/user/hive/warehouse/post_count/000002_0
/user/hive/warehouse/post_count/000003_0
/user/hive/warehouse/post_count/000004_0

A file per bucket is created; now only a sub-set of buckets can be
sampled.

Random Sample of Bucketed Table

hive> select * from post_count TABLESAMPLE(BUCKET 1 OUT OF 2);
OK
user5    1
user1    2
Time taken: 11.758 seconds
hive>

Sample approximately one out of every two buckets.
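Tables that were not bucketed at creation time can still be sampled by
hashing on a column, or on rand(), at query time, at the cost of a
full scan; a sketch:

hive> select * from posts TABLESAMPLE(BUCKET 1 OUT OF 2 ON rand());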
Joins
• Joins in Hive are trivial
• Supports outer joins
– left, right and full joins
• Can join multiple tables
• Default Join is Inner Join
– Rows are joined where the keys match
– Rows that do not have matches are not included in the
result

[Diagram: set #1 join set #2]

Simple Inner Join

• Let's say we have 2 tables: posts and likes

hive> select * from posts limit 10;
OK
user1    Funny Story         1343182026191
user2    Cool Deal           1343182133839
user4    Interesting Post    1343182154633
user5    Yet Another Blog    1343183939434
Time taken: 0.108 seconds
hive> select * from likes limit 10;
OK
user1    12    1343182026191
user2    7     1343182139394
user3    0     1343182154633
user4    50    1343182147364
Time taken: 0.103 seconds

We want to join these 2 data-sets and produce a single table that
contains user, post, and count of likes.

hive> CREATE TABLE posts_likes (user STRING, post STRING, likes_count INT);
OK
Time taken: 0.06 seconds
Simple Inner Join
hive> INSERT OVERWRITE TABLE posts_likes
    > SELECT p.user, p.post, l.count
    > FROM posts p JOIN likes l ON (p.user = l.user);
OK
Time taken: 17.901 seconds

The two tables are joined based on the user column; 3 columns are
selected and stored in the posts_likes table.

hive> select * from posts_likes limit 10;
OK
user1    Funny Story         12
user2    Cool Deal           7
user4    Interesting Post    50
Time taken: 0.082 seconds
hive>
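When one side of the join is small enough to fit in memory, such as
the likes table here, Hive can perform the join map-side and skip the
shuffle. A sketch using the MAPJOIN hint; the benefit assumes the
hinted table really is small:

hive> SELECT /*+ MAPJOIN(l) */ p.user, p.post, l.count
    > FROM posts p JOIN likes l ON (p.user = l.user);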

Outer Join
• Rows which will not join with the ‘other’ table are still
included in the result

Left Outer
– Rows from the first table are included whether they
have a match or not. Columns from the unmatched
(second) table are set to null.

Right Outer
– The opposite of Left Outer Join: rows from the second
table are included no matter what. Columns from the
unmatched (first) table are set to null.

Full Outer
– Rows from both sides are included. For unmatched
rows the columns from the 'other' table are set to null.
Outer Join Examples

SELECT p.*, l.*
FROM posts p LEFT OUTER JOIN likes l ON (p.user = l.user)
limit 10;

SELECT p.*, l.*
FROM posts p RIGHT OUTER JOIN likes l ON (p.user = l.user)
limit 10;

SELECT p.*, l.*
FROM posts p FULL OUTER JOIN likes l ON (p.user = l.user)
limit 10;

Resources
• http://hive.apache.org/
• Hive Wiki
– https://cwiki.apache.org/confluence/display/Hive/Home

Programming Hive
Edward Capriolo, Dean Wampler, Jason Rutherglen
O'Reilly Media; 1st edition (October 3, 2012)

Chapter about Hive in:
Hadoop in Action
Chuck Lam
Manning Publications; 1st edition (December 2010)
Summary
• We learned about
– Hive Concepts
– Hive Installation
– Table Creation and Deletion
– Loading Data into Hive
– Partitioning
– Bucketing
– Joins

