Chapter 3 Hive - Distributed Data Warehouse

Foreword

The Apache Hive data warehouse software helps read, write, and manage large data sets that reside in distributed storage by using SQL. Structure can be projected onto data that is already in storage. A command-line tool and a JDBC driver are provided to connect users to Hive.

1 Huawei Confidential
Objectives

Upon completion of this course, you will understand:
- Hive application scenarios and basic principles
- Hive architecture and running process
- Hive SQL statements

Contents

1. Hive Overview

2. Hive Functions and Architecture

3. Basic Hive Operations

Introduction to Hive
- Hive is a data warehouse tool that runs on Hadoop and supports petabyte-scale distributed data query and management.
- Hive features:
  - Flexible extract, transform, and load (ETL)
  - Support for multiple computing engines, such as Tez and Spark
  - Direct access to HDFS files and HBase
  - Easy to use and easy to program

Application Scenarios of Hive

Data mining
- User behavior analysis
- Region of interest
- Regional display

Non-real-time data analysis
- Log analysis
- Text analysis

Data summarization
- Daily/weekly user clicks
- Traffic statistics

Data warehouse
- Data extraction
- Data loading
- Data transformation

Comparison Between Hive and Traditional Data Warehouses (1)

Storage
  Hive: HDFS is used to store data. In theory, storage can be expanded almost without limit.
  Conventional data warehouse: Clusters are used to store data, and their capacity has an upper limit. As capacity grows, computing speed decreases sharply. Therefore, conventional data warehouses are suitable only for commercial applications with small data volumes.

Execution engine
  Hive: Tez (default)
  Conventional data warehouse: You can select more efficient algorithms to perform queries, or take more optimization measures to speed up queries.

Usage method
  Hive: HQL (SQL-like)
  Conventional data warehouse: SQL

Flexibility
  Hive: Metadata storage is independent of data storage, decoupling metadata from data.
  Conventional data warehouse: Low flexibility. Data can be used only for limited purposes.

Analysis speed
  Hive: Computing speed depends on the cluster scale, and the cluster is easy to expand. With large data volumes, computing is much faster than in a conventional data warehouse.
  Conventional data warehouse: When the data volume is small, data processing is fast. When the data volume is large, the speed decreases sharply.

Comparison Between Hive and Traditional Data Warehouses (2)

Index
  Hive: Low efficiency
  Conventional data warehouse: High efficiency

Ease of use
  Hive: Self-developed application models are needed, which offer high flexibility but low usability.
  Conventional data warehouse: Mature reporting solutions are integrated to facilitate data analysis.

Reliability
  Hive: Data is stored in HDFS, which provides high data reliability and fault tolerance.
  Conventional data warehouse: Reliability is low. If a query fails, the task must be started again. Data fault tolerance depends on hardware RAID.

Environment dependency
  Hive: Low dependency on hardware; runs on commodity machines.
  Conventional data warehouse: Highly dependent on high-performance business servers.

Price
  Hive: Open-source product, free of charge
  Conventional data warehouse: Expensive for commercial use

Advantages of Hive

1. High reliability and fault tolerance
   - Cluster deployment of HiveServer
   - Dual MetaStores
   - Timeout retry mechanism
2. SQL-like
   - SQL-like syntax
   - Large number of built-in functions
3. Scalability
   - User-defined storage formats
   - User-defined functions
4. Multiple APIs
   - Beeline
   - JDBC
   - Thrift
   - ODBC

Contents

1. Hive Overview

2. Hive Functions and Architecture

3. Basic Hive Operations

Hive Architecture

[Figure: Hive architecture. Components inside Hive: JDBC, ODBC, Thrift Server, Web Interface, Driver (Compiler, Optimizer, Executor), and MetaStore. Underlying execution engines: Tez, MapReduce, Spark.]

Hive Running Process
1. The client submits the HQL command.
2. Tez executes the query.
3. YARN allocates resources to Hive applications in the cluster and enables authorization for Hive jobs in the YARN queue.
4. Hive updates data in HDFS or the Hive warehouse based on the table type.
5. Hive returns the query result through the JDBC connection.

[Figure: Hive running process: HQL statement, Hive, Tez (default), YARN, HDFS]

Data Storage Model of Hive

[Figure: Hive data storage model: Database > Table > Partition > Bucket, plus skewed data and normal data]

Partition and Bucket
- Partition: A table can be partitioned based on the values of a field.
  - Each partition is a directory.
  - The number of partitions is not fixed.
  - A partition can be further divided into sub-partitions or buckets.
- Bucket: Data can be distributed across different buckets.
  - Each bucket is a file.
  - The number of buckets is specified when the table is created, and the data in each bucket can be sorted.
  - Each row is hashed on the value of a field and stored in the corresponding bucket.
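The partition and bucket rules above can be sketched in Python as follows. This is a simplified model under stated assumptions, not Hive's implementation: the warehouse root `/user/hive/warehouse`, the `000000_0`-style bucket file names, and the hash function (an INT key hashes to its own value, as in older Hive bucketing; newer versions may hash differently) are all assumptions.

```python
from __future__ import annotations

# Simplified model of Hive partition directories and bucket files.
# Assumptions (not Hive's source): the warehouse root, the bucket file
# naming, and hashing an INT key to its own value.
WAREHOUSE_ROOT = "/user/hive/warehouse"  # assumed default warehouse location


def partition_dir(database: str, table: str, partition: dict[str, str]) -> str:
    """Directory of a table or partition, e.g. .../invites/ds=2008-08-15."""
    db_dir = "" if database == "default" else f"/{database}.db"
    table_dir = f"{WAREHOUSE_ROOT}{db_dir}/{table}"
    # Each partition column adds one "column=value" directory level.
    levels = [f"{col}={val}" for col, val in partition.items()]
    return "/".join([table_dir, *levels])


def bucket_index(key: int, num_buckets: int) -> int:
    """Bucket number: hash (here the int itself), clear the sign bit, then modulo."""
    return (key & 0x7FFFFFFF) % num_buckets


def bucket_file(partition_path: str, key: int, num_buckets: int) -> str:
    """Path of the bucket file that a row with this key is written to."""
    return f"{partition_path}/{bucket_index(key, num_buckets):06d}_0"


p = partition_dir("default", "invites", {"ds": "2008-08-15"})
print(p)                       # /user/hive/warehouse/invites/ds=2008-08-15
print(bucket_file(p, 238, 4))  # /user/hive/warehouse/invites/ds=2008-08-15/000002_0
```

With 4 buckets, a row whose key is 238 lands in bucket 2 (238 mod 4), so rows with equal keys always end up in the same bucket file.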

Managed Table and External Table
- Hive can create managed tables and external tables.
- By default, a managed table is created, and Hive moves the data into the warehouse directory.
- When an external table is created, Hive accesses the data in place, outside the warehouse directory.
- If all processing is performed by Hive, managed tables are recommended.
- If Hive and other tools process the same data set, external tables are recommended.

CREATE/LOAD
  Managed table: Data is moved to the warehouse directory.
  External table: The data is not moved.

DROP
  Managed table: The metadata and data are deleted together.
  External table: Only the metadata is deleted.
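The DROP behavior described above can be modelled with a toy in-memory metastore and file system. Everything here (`ToyHive`, the paths, the row data) is hypothetical and only illustrates which objects a DROP removes for each table type.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyHive:
    """Hypothetical stand-in for a metastore plus a file system."""
    metastore: dict[str, dict] = field(default_factory=dict)        # table -> metadata
    filesystem: dict[str, list[str]] = field(default_factory=dict)  # path -> rows

    def create_table(self, name: str, location: str, external: bool = False) -> None:
        self.metastore[name] = {"location": location, "external": external}
        self.filesystem.setdefault(location, [])

    def drop_table(self, name: str) -> None:
        meta = self.metastore.pop(name)  # metadata is removed for both table types
        if not meta["external"]:
            # Managed table: the data directory is removed as well.
            self.filesystem.pop(meta["location"], None)


hive = ToyHive()
hive.create_table("pokes", "/user/hive/warehouse/pokes")       # managed
hive.create_table("logs", "/data/shared/logs", external=True)  # external
hive.filesystem["/data/shared/logs"].append("row1")

hive.drop_table("pokes")
hive.drop_table("logs")
print(sorted(hive.filesystem))  # ['/data/shared/logs']
```

After both DROPs, the metastore is empty, but the external table's files are still there for other tools to use.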

Functions Supported by Hive
- Built-in Hive functions
  - Mathematical functions, such as round(), floor(), abs(), and rand()
  - Date functions, such as to_date(), month(), and day()
  - String functions, such as trim(), length(), and substr()
- User-defined functions (UDFs)
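Some built-in functions behave differently from their namesakes in general-purpose languages. The Python stand-ins below mimic round(), substr(), to_date(), and month() to show two such differences: Hive's round() rounds halves away from zero (Hive offers bround() for half-to-even rounding), and substr() uses 1-based positions (the Hive manual gives substr('foobar', 4, 1) = 'b'). These are illustrative sketches, not Hive's implementation; NULL handling and negative start positions are not modelled.

```python
from __future__ import annotations

from datetime import date
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional


def hive_round(x: float, d: int = 0) -> float:
    """round(): halves round away from zero (Python's built-in round() rounds half to even)."""
    step = Decimal(1).scaleb(-d)  # 1, 0.1, 0.01, ...
    return float(Decimal(str(x)).quantize(step, rounding=ROUND_HALF_UP))


def hive_substr(s: str, pos: int, length: Optional[int] = None) -> str:
    """substr(): 1-based start position; only positive positions are modelled."""
    if pos < 1:
        raise ValueError("this sketch only models positive start positions")
    start = pos - 1
    return s[start:] if length is None else s[start:start + max(length, 0)]


def hive_to_date(value: str) -> str:
    """to_date(): date part of a 'yyyy-MM-dd[ HH:mm:ss]' string."""
    return date.fromisoformat(value[:10]).isoformat()


def hive_month(value: str) -> int:
    """month(): month number of a date or timestamp string."""
    return date.fromisoformat(value[:10]).month


print(hive_round(2.5), round(2.5))  # 3.0 2
print(hive_substr("foobar", 4, 1))  # b
print(hive_to_date("1970-01-01 00:00:00"), hive_month("1970-11-01"))  # 1970-01-01 11
```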

Contents

1. Hive Overview

2. Hive Functions and Architecture

3. Basic Hive Operations

Hive SQL Overview
- DDL (Data Definition Language):
  - Creates, modifies, and deletes tables and partitions, and defines column data types.
- DML (Data Manipulation Language):
  - Imports and exports data.
- DQL (Data Query Language):
  - Performs simple queries.
  - Performs complex queries using GROUP BY, ORDER BY, and JOIN.

DDL Operations
-- Create a table:
hive> CREATE TABLE pokes (foo INT, bar STRING);

-- Create an external table partitioned by the ds column:
hive> CREATE EXTERNAL TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

-- List tables:
hive> SHOW TABLES;

-- Describe a table:
hive> DESCRIBE invites;

-- Modify a table (rename a table, add a column):
hive> ALTER TABLE events RENAME TO 3koobecaf;
hive> ALTER TABLE pokes ADD COLUMNS (new_col INT);

DML Operations
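-- Load data into a table (LOCAL: read the file from the local file system; OVERWRITE: replace existing data):
hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;

-- Load data into a specific partition:
hive> LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');

-- Export table data and metadata to an HDFS directory:
hive> EXPORT TABLE invites TO '/department';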
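The effect of the LOCAL and OVERWRITE keywords in the LOAD DATA statements above can be modelled with ordinary local directories. In this hypothetical sketch, `table_dir` stands in for a table or partition directory: with `local=True` the source file is copied (as LOAD DATA LOCAL does), otherwise it is moved (as a load from an HDFS path does), and `overwrite=True` empties the target directory first. HDFS itself is not involved.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def load_data(src: Path, table_dir: Path, local: bool, overwrite: bool) -> Path:
    """Toy model of LOAD DATA [LOCAL] INPATH ... [OVERWRITE] INTO TABLE.

    local=True  -> the source file is copied and stays where it was.
    local=False -> the source file is moved into the table directory.
    overwrite   -> files already in the table (or partition) directory are deleted first.
    """
    table_dir.mkdir(parents=True, exist_ok=True)
    if overwrite:
        for old in table_dir.iterdir():
            old.unlink()  # assumes the directory holds plain files only
    target = table_dir / src.name
    if local:
        shutil.copy2(src, target)
    else:
        shutil.move(str(src), str(target))
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    kv1 = root / "kv1.txt"
    kv1.write_text("238\tval_238\n")
    pokes_dir = root / "warehouse" / "pokes"
    pokes_dir.mkdir(parents=True)
    (pokes_dir / "old_data.txt").write_text("stale\n")

    load_data(kv1, pokes_dir, local=True, overwrite=True)
    source_kept = kv1.exists()
    table_files = sorted(f.name for f in pokes_dir.iterdir())

print(source_kept, table_files)  # True ['kv1.txt']
```

This is also why the source of a load does not have to be an HDFS path: with LOCAL, it is read from the client's file system.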

DQL Operations (1)
-- SELECT with a filter:
hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';

-- Write query results to an HDFS directory:
hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds='2008-08-15';

-- GROUP BY (two equivalent forms):
hive> FROM invites a INSERT OVERWRITE TABLE events SELECT a.bar, count(*) WHERE a.foo > 0 GROUP BY a.bar;
hive> INSERT OVERWRITE TABLE events SELECT a.bar, count(*) FROM invites a WHERE a.foo > 0 GROUP BY a.bar;
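The GROUP BY queries above first filter rows (WHERE a.foo > 0) and then count the remaining rows per distinct bar value. The Python sketch below reproduces that logic on a few hypothetical invites rows; it illustrates the query's semantics only, not how Hive distributes the work across the cluster.

```python
from collections import Counter

# Hypothetical rows of the invites table: (foo, bar, ds)
invites = [
    (1, "a", "2008-08-15"),
    (-3, "a", "2008-08-15"),
    (5, "b", "2008-08-15"),
    (7, "a", "2008-08-08"),
]


def count_by_bar(rows):
    """Equivalent of: SELECT a.bar, count(*) FROM invites a WHERE a.foo > 0 GROUP BY a.bar"""
    counts = Counter(bar for foo, bar, _ds in rows if foo > 0)  # WHERE, then group and count
    # Hive does not guarantee output order without ORDER BY; sort for a stable display.
    return sorted(counts.items())


print(count_by_bar(invites))  # [('a', 2), ('b', 1)]
```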

DQL Operations (2)
-- Multi-table insert (src is scanned once and feeds two tables):
FROM src
INSERT OVERWRITE TABLE dest1 SELECT src.* WHERE src.key < 100
INSERT OVERWRITE TABLE dest2 SELECT src.key, src.value WHERE src.key >= 100 AND src.key < 200;
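In the multi-table insert above, each INSERT clause carries its own WHERE condition, so a single pass over src can route a row to one table, several tables, or none. The sketch below shows that routing on hypothetical src rows; it is a model of the semantics, not Hive's execution plan.

```python
# Hypothetical rows of src: (key, value)
src = [(86, "val_86"), (165, "val_165"), (238, "val_238"), (27, "val_27")]


def multi_insert(rows):
    """One pass over rows, evaluating each INSERT clause's WHERE independently."""
    dest1, dest2 = [], []
    for key, value in rows:  # src is read only once
        if key < 100:  # INSERT OVERWRITE TABLE dest1 ... WHERE src.key < 100
            dest1.append((key, value))
        if 100 <= key < 200:  # INSERT OVERWRITE TABLE dest2 ... WHERE 100 <= key < 200
            dest2.append((key, value))
    return dest1, dest2


dest1, dest2 = multi_insert(src)
print(dest1)  # [(86, 'val_86'), (27, 'val_27')]
print(dest2)  # [(165, 'val_165')]
```

The row with key 238 matches neither condition and is written nowhere.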

--JOIN:
hive> FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo;
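The JOIN query above is an inner equi-join: only pairs of pokes and invites rows with equal bar values produce output. The sketch below shows those semantics with an in-memory hash join on hypothetical rows; it is an illustration only, since Hive may execute a join as a shuffle join or a map-side join depending on table sizes and settings.

```python
from collections import defaultdict

pokes = [(1, "x"), (2, "y"), (3, "z")]  # hypothetical (foo, bar) rows
invites = [(10, "x", "2008-08-15"), (20, "x", "2008-08-15"), (30, "w", "2008-08-15")]  # (foo, bar, ds)


def inner_join_on_bar(t1_rows, t2_rows):
    """Inner equi-join on bar, returning (t1.bar, t1.foo, t2.foo) as the query selects."""
    index = defaultdict(list)
    for foo, bar in t1_rows:  # build phase: index t1 by the join key
        index[bar].append(foo)
    result = []
    for foo2, bar, _ds in t2_rows:  # probe phase: look up each t2 row's key
        for foo1 in index.get(bar, []):
            result.append((bar, foo1, foo2))
    return result


print(inner_join_on_bar(pokes, invites))  # [('x', 1, 10), ('x', 1, 20)]
```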

-- Streaming (pipe rows through an external command with TRANSFORM):
hive> FROM invites a INSERT OVERWRITE TABLE events SELECT TRANSFORM(a.foo, a.bar) AS (oof, rab) USING '/bin/cat' WHERE a.ds > '2008-08-09';
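With TRANSFORM, Hive writes the selected columns to the command's standard input as tab-separated text lines (one line per row) and parses tab-separated lines from its standard output into the columns named after AS. `/bin/cat` simply echoes the rows back unchanged. The hypothetical Python script below could replace it; the file name `upper_bar.py` and the upper-casing logic are illustrative assumptions.

```python
#!/usr/bin/env python3
"""Hypothetical TRANSFORM script: reads tab-separated (foo, bar) lines from
stdin and writes tab-separated (oof, rab) lines to stdout."""
import sys
from typing import Iterable, Iterator


def transform(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        foo, bar = line.rstrip("\n").split("\t")
        # Example logic: pass foo through unchanged and upper-case bar.
        yield f"{foo}\t{bar.upper()}\n"


if __name__ == "__main__":
    sys.stdout.writelines(transform(sys.stdin))
```

Such a script would be shipped to the cluster with `ADD FILE upper_bar.py;` and referenced as `USING 'python3 upper_bar.py'`, assuming Python 3 is installed on the worker nodes.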

Summary

- This course introduces Hive application scenarios, basic principles, Hive architecture, running process, and common Hive SQL statements.

Quiz

1. Which of the following scenarios are applicable to Hive? ( )

A. Online real-time data analysis
B. Data mining (including user behavior analysis, region of interest, and regional display)
C. Data summary (daily/weekly user clicks and click ranking)
D. Non-real-time analysis (log analysis and statistical analysis)

2. Which of the following statements about basic Hive SQL operations is correct? ( )

A. You need to use the keyword "external" to create an external table and specify the keyword "internal" to create a normal table.
B. The location information must be specified when an external table is created.
C. When data is loaded to Hive, the source data must be a path in HDFS.
D. Column separators can be specified when a table is created.

Recommendations

- Huawei Cloud Official Web Link:
  - https://www.huaweicloud.com/intl/en-us/
- Huawei MRS Documentation:
  - https://www.huaweicloud.com/intl/en-us/product/mrs.html
- Huawei TALENT ONLINE:
  - https://e.huawei.com/en/talent/#/

Thank you. Bring digital to every person, home, and organization for a fully connected, intelligent world.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purposes only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.

Copyright© 2020 Huawei Technologies Co., Ltd.
