
HIVE

Hive is a data warehouse infrastructure built on top of Hadoop for querying and analyzing large datasets stored in the Hadoop Distributed File System (HDFS). It allows users to query data using an SQL-like language called HiveQL. Key features of Hive include its ability to handle large datasets across clusters using SQL-like queries, its integration with the Hadoop ecosystem, and its support for various data formats. However, Hive is not suitable for real-time data or online transaction processing. Hive uses a metastore to manage metadata and a query compiler that turns HiveQL queries into MapReduce jobs, which are then executed. The document provides details about Hive architecture, data flow, data modeling concepts, the different modes of operation, the installation process, and various Hive commands.
Copyright © All Rights Reserved

HIVE

Prepared by:

Abbes Feriel & Guesmi Nour


OUTLINE
History of Hive
What is Hive
Use cases of Hive
Features of Hive
Limitations of Hive
Architecture of Hive
Data flow in Hive
Hive data modeling
Different modes of Hive
Hive installation
History of Hive
Facebook used Hadoop as a solution to handle its growing big data.
As we know, Hadoop uses MapReduce for processing data. MapReduce required users to write long code (in Java).
Not all users were well versed in Java and other coding languages. This proved to be a disadvantage for them.
Users were comfortable with writing queries in SQL.
Hive was developed with a vision to incorporate the concepts of tables and columns, just like SQL.
What is Hive?

Hive is a data warehouse which is used for querying and analyzing large datasets stored in HDFS.
Hive uses a query language called HiveQL, which is similar to SQL.
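To illustrate how close HiveQL is to SQL, here is a minimal query sketch. The table demo.employee and its columns (id, name, salary) are hypothetical assumptions used only for illustration.

```sql
-- Hypothetical table: demo.employee(id INT, name STRING, salary FLOAT)
-- Same shape as a standard SQL query: projection, filter, ordering, limit.
SELECT name, salary
FROM demo.employee
WHERE salary > 30000
ORDER BY salary DESC
LIMIT 5;
```

Behind the scenes, Hive compiles a query like this into jobs that run on the Hadoop cluster rather than executing it inside a single database server.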
Hive use cases
Business data analysis
Personalized recommendations
Scientific data analysis
Social media analysis
Marketing data analysis
Financial data analysis
Features of Hive
The use of an SQL-like language called HiveQL makes Hive easier to use than writing long MapReduce code.
By using HiveQL, multiple users can query data simultaneously.
Hive seamlessly integrates with other Hadoop components like HDFS, HBase, and YARN, allowing for comprehensive data processing.
Hive supports a variety of data formats.
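As a sketch of the data format support, the same hypothetical columns can be stored as plain comma-separated text or as ORC (PARQUET, AVRO and SEQUENCEFILE are other STORED AS options). The table names employee_txt and employee_orc are assumptions chosen for illustration.

```sql
-- Hypothetical text-format table: one comma-separated record per line
CREATE TABLE IF NOT EXISTS demo.employee_txt (id INT, name STRING, salary FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Hypothetical columnar (ORC) table with the same columns
CREATE TABLE IF NOT EXISTS demo.employee_orc (id INT, name STRING, salary FLOAT)
STORED AS ORC;

-- Hive reads the text rows and writes them out in ORC format
INSERT OVERWRITE TABLE demo.employee_orc
SELECT id, name, salary FROM demo.employee_txt;
```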
LIMITATIONS OF HIVE

01 Hive is not capable of handling real-time data.
02 It is not designed for online transaction processing.
03 Hive queries have high latency.
ARCHITECTURE OF HIVE
DATA FLOW IN HIVE
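The architecture and data flow slides are diagrams. One way to observe two of the components named in the summary (the metastore and the query compiler) from the Hive shell is sketched below, assuming a hypothetical demo.employee table: DESCRIBE FORMATTED returns the table metadata kept in the metastore, and EXPLAIN prints the execution plan the compiler produces without running the query.

```sql
-- Hypothetical table demo.employee(id INT, name STRING, salary FLOAT)
-- Metadata served by the metastore (columns, HDFS location, table type, ...)
DESCRIBE FORMATTED demo.employee;

-- Plan generated by the compiler: the stages and their dependencies.
-- With the MapReduce engine, the stages appear as Map Reduce jobs.
EXPLAIN
SELECT COUNT(*) FROM demo.employee WHERE salary > 30000;
```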
Hive Data Modeling
Tables: Tables in Hive are created the same way as in an RDBMS.
Partitions: Tables are organized into partitions, which group similar types of data based on the partition key.
Buckets: Data present in partitions can be further divided into buckets for more efficient querying.
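A minimal sketch combining the three levels, assuming a hypothetical table partitioned by a dept column and bucketed on id into 4 buckets (the table name, columns and bucket count are illustrative choices):

```sql
-- Hypothetical table: names, columns and bucket count are illustrative
CREATE TABLE IF NOT EXISTS demo.employee_part (
  id     INT,
  name   STRING,
  salary FLOAT
)
PARTITIONED BY (dept STRING)       -- one HDFS sub-directory per value, e.g. .../dept=IT/
CLUSTERED BY (id) INTO 4 BUCKETS   -- inside each partition, rows are hashed on id into 4 files
STORED AS ORC;

-- Filtering on the partition key lets Hive read only the dept=IT directory
SELECT AVG(salary) FROM demo.employee_part WHERE dept = 'IT';
```

The partition column is declared in PARTITIONED BY rather than in the regular column list; Hive exposes it as a column in queries.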
Different modes of Hive

Local mode: Used when Hadoop has one data node and the amount of data is small. Processing is very fast on smaller datasets that are present on the local machine.

MapReduce mode: Used when the data in Hadoop is spread across multiple data nodes. Processing large datasets is more efficient in this mode.
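How the mode is chosen can be sketched with session settings. hive.exec.mode.local.auto and hive.exec.mode.local.auto.inputbytes.max are Hive properties and mapreduce.framework.name is a Hadoop property; the threshold value shown is only an example, not a recommendation.

```sql
-- Let Hive run a query in local mode automatically when its input is small enough
SET hive.exec.mode.local.auto=true;
-- Example threshold (128 MB): jobs whose input exceeds it are not run locally
SET hive.exec.mode.local.auto.inputbytes.max=134217728;

-- Alternatively, force local execution for every query in this session
SET mapreduce.framework.name=local;
```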
Hive installation process
Requirements
Hadoop: Since Hive is built on top of Apache Hadoop, Hive queries the large datasets stored and processed by Hadoop. Thus, the presence of Hadoop is essential.

Java: The entire Hadoop ecosystem is written in Java, so a Java runtime must be installed. Writing Java code, however, is only required if we wish to create custom inputs and outputs.
Hive commands
Create Database
hive> create database demo;

Let's check the existence of a newly created database.

hive> show databases;
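The statement can be made more robust with standard HiveQL clauses. A sketch; the comment text is a hypothetical example:

```sql
-- No error if demo already exists; the comment string is hypothetical
CREATE DATABASE IF NOT EXISTS demo
COMMENT 'Demo database';

-- List only databases whose names match the pattern
SHOW DATABASES LIKE 'demo*';

-- Make demo the current database for the following statements
USE demo;
```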


Drop Database
hive> drop database demo;

Let's check whether the database is dropped or not.

hive> show databases;

=> The database demo is not present in the list. Hence, the database was dropped successfully.
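By default, DROP DATABASE is restrictive: it fails if the database still contains tables, and it also fails if the database does not exist. A sketch of the two variants, shown as alternatives for the demo database:

```sql
-- No error if demo does not exist; still fails if demo contains tables (RESTRICT is the default)
DROP DATABASE IF EXISTS demo;

-- Alternative: CASCADE drops the tables inside demo first, then the database itself
DROP DATABASE IF EXISTS demo CASCADE;
```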
Create Table
Let's create an internal table by using the following command:

Here, the command also includes the information that the data is separated by ','.
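The command itself appeared only as an image on the original slide. A minimal sketch of an internal (managed) table matching this description, assuming the demo database exists and a hypothetical column list (id, name, salary):

```sql
-- Hypothetical column list; only demo.employee and the ',' delimiter come from the slides
CREATE TABLE demo.employee (
  id     INT,
  name   STRING,
  salary FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';   -- each line of the data file is split into columns on ','
```

Because no LOCATION is given, Hive stores the data under its warehouse directory and manages it, so dropping the table later removes the data as well.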

Let's see the metadata of the created table by using the following command:

hive> describe demo.employee;


Let's see the result when we try to create the existing table again.

In such a case, an exception occurs because the table already exists. To ignore this type of exception, we can use the IF NOT EXISTS clause when creating the table.
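A sketch of the same statement with the clause added, reusing the hypothetical column list (id, name, salary). If demo.employee already exists, Hive skips the statement instead of raising an error:

```sql
-- Hypothetical column list; IF NOT EXISTS makes the statement a no-op when the table exists
CREATE TABLE IF NOT EXISTS demo.employee (
  id     INT,
  name   STRING,
  salary FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
```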
External Table
Let's create a directory on HDFS by using the following command:
hdfs dfs -mkdir /HiveDirectory
Now, store the file on the created directory.
hdfs dfs -put hive/emp_details /HiveDirectory
Let's create an external table using the following command:
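The command was an image on the slide. A minimal sketch, assuming a hypothetical table name emplist and that emp_details holds comma-separated id, name, salary records:

```sql
-- Hypothetical table name and columns; the location is the HDFS directory created above
CREATE EXTERNAL TABLE IF NOT EXISTS demo.emplist (
  id     INT,
  name   STRING,
  salary FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/HiveDirectory';   -- Hive reads the files already stored in this directory
```

Unlike an internal table, dropping an external table removes only its metadata; the files in /HiveDirectory stay in HDFS.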
Retrieve data from a table
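A sketch of typical retrieval queries, assuming the hypothetical demo.employee columns (id, name, salary):

```sql
-- Hypothetical columns id, name, salary
-- All rows and columns
SELECT * FROM demo.employee;

-- Selected columns, filtered and limited
SELECT id, name FROM demo.employee WHERE salary > 30000 LIMIT 10;
```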
Drop Table
Select the database from which we want to delete the table by using the following command:

Let's check the list of existing tables in the corresponding database:
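These commands were images on the slides. A plausible sketch, assuming the table belongs to the demo database:

```sql
-- Assumes the table to be dropped lives in the demo database
USE demo;      -- select the database that contains the table
SHOW TABLES;   -- list the tables it currently holds
```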


Now, drop the table by using the following command:

Let's check whether the table is dropped or not.

The table new_employee is not present in the list. Hence, the table was dropped successfully.
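A sketch of the drop step for the table named on the slide. IF EXISTS is optional and only avoids an error when the table is already gone:

```sql
-- Internal table: removes metadata and data; external table: removes metadata only
DROP TABLE IF EXISTS new_employee;
SHOW TABLES;   -- new_employee should no longer appear
```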
Alter Table
In Hive, we can modify an existing table, for example by changing the table name, column names, comments, and table properties. Hive provides SQL-like commands to alter the table.

Rename a Table
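A sketch of the rename, assuming the hypothetical employee table in the demo database and a new name, employee_data, chosen for illustration:

```sql
-- Hypothetical names: employee is renamed to employee_data
USE demo;
ALTER TABLE employee RENAME TO employee_data;
```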

Add Column
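A sketch of adding a column, assuming the hypothetical employee table; the column age is illustrative. Existing rows return NULL for the new column because their data files do not contain it:

```sql
-- Hypothetical column age appended after the existing columns
ALTER TABLE employee ADD COLUMNS (age INT COMMENT 'employee age');
```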
Change Column
In Hive, we can rename a column and change its type and position. Here, we change the name of a column by using the following signature:
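The general form is ALTER TABLE table CHANGE [COLUMN] old_name new_name type [COMMENT '...'] [FIRST | AFTER column]. A sketch that renames the hypothetical name column of the employee table:

```sql
-- Hypothetical columns: rename name to emp_name.
-- The type must be repeated even when it does not change.
ALTER TABLE employee CHANGE name emp_name STRING;
```

Only the table metadata changes; Hive does not rewrite the existing data files.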

Delete Column
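HiveQL has no DROP COLUMN statement. A common approach, sketched here with the hypothetical columns (id, name, salary), is REPLACE COLUMNS, which redefines the whole column list so that any column left out is removed from the schema:

```sql
-- Hypothetical columns: keeps id and name; salary is dropped from the table definition
ALTER TABLE employee REPLACE COLUMNS (id INT, name STRING);
```

This works for tables that use Hive's native SerDes, such as the default delimited-text format, and it changes only the metadata, not the underlying files.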
Thank you
Does anyone have any questions?
