HIVE

Hive is a data warehouse infrastructure built on top of Hadoop for querying and analyzing large datasets stored in the Hadoop Distributed File System (HDFS). It allows users to query data using a SQL-like language called HiveQL. Key features of Hive include its ability to handle large datasets across clusters using SQL-like queries, its integration with the Hadoop ecosystem, and its support for various data formats. However, Hive is not suitable for real-time data or online transaction processing. Hive uses a metastore to manage metadata and a query compiler that turns HiveQL queries into MapReduce jobs, which are then executed. The document covers Hive's architecture, data flow, data modeling concepts, modes of operation, installation process, and various Hive commands.

Uploaded by Iskander Denguezli

HIVE

Prepared by:

Abbes Feriel & Guesmi Nour

OUTLINE
History of Hive
What is Hive?
Use cases of Hive
Features of Hive
Limitations of Hive
Architecture of Hive
Data flow in Hive
Hive data modeling
Different modes of Hive
Hive installation
Hive commands

History of Hive

Facebook used Hadoop as a solution to handle its growing big data.
As we know, Hadoop uses MapReduce for processing data. MapReduce required users to write long programs in Java.
Not all users were well versed in Java and other coding languages. This proved to be a disadvantage for them.
Users were comfortable writing queries in SQL.
Hive was developed with a vision to incorporate the concepts of tables and columns, just like SQL.

What is Hive?

Hive is a data warehouse which is used for querying and analyzing large datasets stored in HDFS.
Hive uses a query language called HiveQL, which is similar to SQL.

Hive Use Cases

Business data analysis
Personalized recommendations
Scientific data analysis
Social media analysis
Marketing data analysis
Financial data analysis
Brainstorming

Features of Hive

Using HiveQL, a SQL-like language, is easier than writing long MapReduce programs.
By using HiveQL, multiple users can query data simultaneously.
Hive integrates seamlessly with other Hadoop components such as HDFS, HBase, and YARN, allowing for comprehensive data processing.
Hive supports a variety of data formats, such as plain text, SequenceFile, RCFile, ORC, Parquet, and Avro.

LIMITATIONS OF HIVE

01 Hive is not capable of handling real-time data.
02 It is not designed for online transaction processing (OLTP).
03 Hive queries have high latency.

ARCHITECTURE OF HIVE
DATA FLOW IN HIVE
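To illustrate the idea that Hive compiles a HiveQL query into MapReduce work, here is a minimal Python simulation of how a query such as `SELECT dept, COUNT(*) FROM employee GROUP BY dept` splits into map, shuffle and reduce phases. It is a sketch, not Hive's actual planner: the table `employee`, the column `dept` and all function names are hypothetical. Real Hive plans contain more operators and can also run on engines other than MapReduce.

```python
from collections import defaultdict
from itertools import chain


def map_phase(rows, key_col):
    # Map task: read the rows of one input split and emit (group key, 1).
    for row in rows:
        yield row[key_col], 1


def shuffle(pairs):
    # Shuffle/sort (done by the framework): collect all values for each key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    # Reduce task: COUNT(*) per key is the sum of the emitted 1s.
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(splits, key_col):
    # On a cluster, each split would be processed by its own map task.
    pairs = chain.from_iterable(map_phase(split, key_col) for split in splits)
    return reduce_phase(shuffle(pairs))


splits = [[{"dept": "sales"}, {"dept": "hr"}], [{"dept": "sales"}]]
print(group_by_count(splits, "dept"))  # {'hr': 1, 'sales': 2}
```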
Hive Data Modeling

Tables: Tables in Hive are created the same way as in an RDBMS.
Partitions: Tables are organized into partitions, which group similar types of data based on the partition key.
Buckets: Data present in partitions can be further divided into buckets for more efficient querying.
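A minimal Python sketch of how partitions and buckets map onto storage, under stated assumptions. Each partition is modeled as a `key=value` subdirectory of the table directory, and each row goes to bucket `(hash & 0x7FFFFFFF) % num_buckets`, where the hash of an integer column is taken to be the integer itself (the classic Hive bucketing rule; newer Hive versions may use a different hash). The table path and the columns `id` and `dept` are hypothetical.

```python
from collections import defaultdict


def partition_dir(table_dir, partition_cols, row):
    # Each partition is a nested key=value subdirectory under the table directory.
    return "/".join([table_dir] + [f"{col}={row[col]}" for col in partition_cols])


def bucket_id(key, num_buckets):
    # Assumption: integer bucketing column whose hash is the value itself,
    # combined with the classic (hash & MAX_INT) % num_buckets rule.
    return (key & 0x7FFFFFFF) % num_buckets


def layout(rows, table_dir, partition_cols, bucket_col, num_buckets):
    # Map every row to the (partition directory, bucket number) it would land in.
    files = defaultdict(list)
    for row in rows:
        target = (partition_dir(table_dir, partition_cols, row),
                  bucket_id(row[bucket_col], num_buckets))
        files[target].append(row)
    return dict(files)


def prune(files, table_dir, col, value):
    # A filter such as WHERE dept = 'hr' only needs that one partition directory
    # (this simple version assumes a single partition column).
    wanted = f"{table_dir}/{col}={value}"
    return {target: rows for target, rows in files.items() if target[0] == wanted}


rows = [{"id": 1, "dept": "sales"}, {"id": 5, "dept": "sales"}, {"id": 2, "dept": "hr"}]
files = layout(rows, "/warehouse/employee", ["dept"], "id", 4)
print(sorted(files))
# [('/warehouse/employee/dept=hr', 2), ('/warehouse/employee/dept=sales', 1)]
```

The `prune` helper shows why partitioning speeds up queries: a filter on the partition key lets Hive skip every other subdirectory. Bucketing then splits each partition into a fixed number of files, which helps with sampling and joins on the bucketed column.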
Different Modes of Hive

Local mode: Used when Hadoop has one data node and the amount of data is small. Processing is very fast on smaller datasets that are present on the local machine.
MapReduce mode: Used when the data in Hadoop is spread across multiple data nodes. Processing large datasets is more efficient in this mode.
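Hive can also choose local mode automatically. When `hive.exec.mode.local.auto` is set to true (it is false by default), a job runs locally only if its input is smaller than `hive.exec.mode.local.auto.inputbytes.max` (128 MB by default), the number of input files (roughly, map tasks) is below `hive.exec.mode.local.auto.input.files.max` (4 by default), and it needs at most one reducer. The Python sketch below encodes those conditions. The `Job` class and `choose_mode` function are hypothetical, and the defaults should be checked against your Hive version.

```python
from dataclasses import dataclass

# Documented defaults of hive.exec.mode.local.auto.inputbytes.max and
# hive.exec.mode.local.auto.input.files.max (verify for your Hive version).
DEFAULT_MAX_INPUT_BYTES = 128 * 1024 * 1024
DEFAULT_MAX_INPUT_FILES = 4


@dataclass
class Job:
    # Hypothetical summary of one compiled job.
    input_bytes: int
    input_files: int
    reducers: int


def choose_mode(job, auto_local=False,
                max_bytes=DEFAULT_MAX_INPUT_BYTES,
                max_files=DEFAULT_MAX_INPUT_FILES):
    """Return "local" or "mapreduce" following Hive's auto-local-mode conditions."""
    small_input = job.input_bytes < max_bytes and job.input_files < max_files
    if auto_local and small_input and job.reducers <= 1:
        return "local"
    return "mapreduce"


print(choose_mode(Job(10_000_000, 2, 1), auto_local=True))  # local
print(choose_mode(Job(10_000_000, 2, 1)))                   # mapreduce
```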
Hive Installation Process
Requirements
Hadoop: Since Hive is built on top of Apache Hadoop, Hive queries the large datasets stored and processed by Hadoop. Thus, the presence of Hadoop is essential.

Java: The entire Hadoop ecosystem is written in Java, so a Java runtime must be installed to run Hadoop and Hive. Java programming, however, is only required if we wish to create custom input and output formats.

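Before installing Hive, it can help to confirm that both requirements above are met. Hive's Getting Started guide asks for Hadoop either on the `PATH` or via `HADOOP_HOME`, and Hadoop itself needs a Java runtime (commonly located through `JAVA_HOME`). The Python sketch below checks only those basics. The function name `missing_prerequisites` is hypothetical, and `env`/`which` are injectable so the check can be tested without a real installation.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of problems; an empty list means the basics look present."""
    env = dict(os.environ) if env is None else env
    problems = []
    # Java runtime: either JAVA_HOME is set or `java` is on the PATH.
    if not env.get("JAVA_HOME") and which("java") is None:
        problems.append("Java not found (set JAVA_HOME or add java to PATH)")
    # Hadoop: Hive needs HADOOP_HOME or the `hadoop` command on the PATH.
    if not env.get("HADOOP_HOME") and which("hadoop") is None:
        problems.append("Hadoop not found (set HADOOP_HOME or add hadoop to PATH)")
    return problems


if __name__ == "__main__":
    print(missing_prerequisites() or "Java and Hadoop found")
```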
Hive commands
Create Database
hive> create database demo;

Let's check that the newly created database exists:

hive> show databases;

Hive commands
Drop Database
hive> drop database demo;

Let's check whether the database has been dropped:

hive> show databases;

The database demo is no longer present in the list.
Hence, the database was dropped successfully.
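Dropping demo works here because the database is empty. In Hive, `DROP DATABASE` defaults to RESTRICT behavior and fails on a database that still contains tables; `DROP DATABASE demo CASCADE;` drops the tables as well. The Python sketch below models that rule with a toy, in-memory catalog. The `Catalog` class and its methods are hypothetical stand-ins for the metastore, not Hive's API.

```python
class DatabaseNotEmptyError(Exception):
    pass


class Catalog:
    """Toy stand-in for the metastore: database name -> set of table names."""

    def __init__(self):
        self.databases = {"default": set()}

    def create_database(self, name, if_not_exists=False):
        if name in self.databases:
            if if_not_exists:
                return  # CREATE DATABASE IF NOT EXISTS: nothing to do
            raise ValueError(f"Database {name} already exists")
        self.databases[name] = set()

    def drop_database(self, name, cascade=False):
        # RESTRICT (the default) refuses to drop a database that still has tables.
        if self.databases[name] and not cascade:
            raise DatabaseNotEmptyError(f"Database {name} is not empty")
        del self.databases[name]  # with CASCADE, its tables go too

    def show_databases(self):
        return sorted(self.databases)


catalog = Catalog()
catalog.create_database("demo")
print(catalog.show_databases())  # ['default', 'demo']
catalog.drop_database("demo")
print(catalog.show_databases())  # ['default']
```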
Hive commands
Create Table
Let's create an internal (managed) table by using the following command:

Here, the command also specifies that the fields in the data are separated by ','.
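In HiveQL the comma separator is declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY ','`, and Hive applies the table schema when the data is read (schema on read). The Python sketch below imitates, in simplified form, how Hive's default text handling maps one delimited line to columns: missing trailing fields and values that cannot be converted to the column type become NULL (`None` here), and extra fields are ignored. The schema (`id`, `name`, `salary`) and the function names are hypothetical, and only `int`, `float` and `string` are handled.

```python
def convert(raw, col_type):
    # Convert one text field; anything unparseable becomes NULL (None).
    if raw is None:
        return None
    try:
        if col_type == "int":
            return int(raw)
        if col_type == "float":
            return float(raw)
        return raw  # string columns are kept as-is
    except ValueError:
        return None


def parse_delimited(line, schema, delimiter=","):
    """Map one delimited text line onto the (name, type) columns of `schema`."""
    fields = line.rstrip("\n").split(delimiter)
    return {
        name: convert(fields[i] if i < len(fields) else None, col_type)
        for i, (name, col_type) in enumerate(schema)
    }


schema = [("id", "int"), ("name", "string"), ("salary", "float")]
print(parse_delimited("1,Alice,30000", schema))
# {'id': 1, 'name': 'Alice', 'salary': 30000.0}
```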

Let's see the metadata of the created table by using the following command:

hive> describe demo.employee;

Hive commands
Let's see the result when we try to create the existing table again.

In such a case, an exception occurs because the table already exists. If we want to ignore this type of exception, we can use the IF NOT EXISTS clause while creating the table.
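The difference between `CREATE TABLE` and `CREATE TABLE IF NOT EXISTS` can be sketched with a toy metastore held in a Python dict. With IF NOT EXISTS, Hive skips the statement silently and leaves the existing table definition unchanged, even if the new column list differs. The function `create_table` and the `AlreadyExistsError` class are hypothetical.

```python
class AlreadyExistsError(Exception):
    pass


def create_table(tables, name, columns, if_not_exists=False):
    """Register `name` in a toy metastore dict; return True if it was created."""
    if name in tables:
        if if_not_exists:
            # IF NOT EXISTS: skip silently and keep the existing definition.
            return False
        raise AlreadyExistsError(f"Table {name} already exists")
    tables[name] = list(columns)
    return True


tables = {}
create_table(tables, "demo.employee", [("id", "int"), ("name", "string")])
create_table(tables, "demo.employee", [("id", "string")], if_not_exists=True)
print(tables["demo.employee"])  # [('id', 'int'), ('name', 'string')]
```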
Hive commands
External Table
Let's create a directory on HDFS by using the following command:
hdfs dfs -mkdir /HiveDirectory
Now, store the file in the created directory.
hdfs dfs -put hive/emp_details /HiveDirectory
Let's create an external table by using the following command:
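For reference, a Hive table over comma-separated text generally takes the shape `CREATE [EXTERNAL] TABLE [IF NOT EXISTS] name (columns) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' [LOCATION 'dir']`. For an external table, `LOCATION` names the HDFS directory that holds the data, here `/HiveDirectory`. The hedged Python sketch below assembles such a statement as a string. The helper `create_table_sql`, the table names and the columns are hypothetical, and it assumes `emp_details` contains comma-separated id, name and salary fields.

```python
def create_table_sql(name, columns, delimiter=",", external=False,
                     location=None, if_not_exists=False):
    """Build a CREATE TABLE statement for delimited text data (illustrative only)."""
    cols = ", ".join(f"{col} {col_type}" for col, col_type in columns)
    parts = ["CREATE", "EXTERNAL TABLE" if external else "TABLE"]
    if if_not_exists:
        parts.append("IF NOT EXISTS")
    parts.append(f"{name} ({cols})")
    parts.append(f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'")
    if location:
        parts.append(f"LOCATION '{location}'")  # HDFS directory, not a single file
    return " ".join(parts) + ";"


columns = [("id", "int"), ("name", "string"), ("salary", "float")]
print(create_table_sql("demo.employee_ext", columns,
                       external=True, location="/HiveDirectory"))
```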
Hive commands
Retrieve Data from a Table
Hive commands
Drop Table
Select the database from which we want to delete the table by using the following command:

Let's check the list of existing tables in the corresponding database:

hive> show tables;

Hive commands
Drop Table
Now, drop the table by using the following command:

hive> drop table new_employee;

Let's check whether the table has been dropped:

hive> show tables;

The table new_employee is no longer present in the list.
Hence, the table was dropped successfully.
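What `DROP TABLE` removes depends on the table type. For a managed (internal) table, Hive deletes both the metadata and the table's data directory (possibly moving it to the trash); for an external table, only the metadata is removed, and the files in a directory such as `/HiveDirectory` stay in HDFS. The Python sketch below models that rule with in-memory dictionaries. The `Warehouse` and `Table` classes and the paths are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    location: str   # HDFS directory holding the table's files
    external: bool  # True for CREATE EXTERNAL TABLE


@dataclass
class Warehouse:
    tables: dict = field(default_factory=dict)  # metastore: name -> Table
    files: dict = field(default_factory=dict)   # HDFS: path -> contents

    def drop_table(self, name):
        table = self.tables.pop(name)  # the metadata is always removed
        if not table.external:
            # Managed table: Hive also deletes (or moves to trash) its data.
            prefix = table.location.rstrip("/") + "/"
            self.files = {path: data for path, data in self.files.items()
                          if not path.startswith(prefix)}


wh = Warehouse()
wh.tables["new_employee"] = Table("/user/hive/warehouse/new_employee", external=False)
wh.tables["ext_employee"] = Table("/HiveDirectory", external=True)
wh.files = {"/user/hive/warehouse/new_employee/000000_0": "1,Alice",
            "/HiveDirectory/emp_details": "2,Bob"}
wh.drop_table("new_employee")
wh.drop_table("ext_employee")
print(wh.files)  # {'/HiveDirectory/emp_details': '2,Bob'}
```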
Hive commands
Alter Table
In Hive, we can modify an existing table, for example by changing the table name, column names, comments, and table properties. Hive provides SQL-like commands to alter a table.

Rename a Table

Adding a Column
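The HiveQL statements for these two operations take the forms `ALTER TABLE name RENAME TO new_name;` and `ALTER TABLE name ADD COLUMNS (col type, ...);`. ADD COLUMNS changes only the metadata, so rows already stored in text files simply read the new column as NULL. The Python helpers below, whose names and table/column names are hypothetical, build those statements as strings.

```python
def rename_table_sql(old_name, new_name):
    # Renaming also updates the metastore entry (and, for managed tables, the directory).
    return f"ALTER TABLE {old_name} RENAME TO {new_name};"


def add_columns_sql(table, columns):
    # New columns are appended after the existing ones (before partition columns).
    cols = ", ".join(f"{col} {col_type}" for col, col_type in columns)
    return f"ALTER TABLE {table} ADD COLUMNS ({cols});"


print(rename_table_sql("employee", "employee_data"))
print(add_columns_sql("employee_data", [("age", "int")]))
```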
Hive commands
Change Column
In Hive, we can rename a column and change its type and position. Here, we change the name of the column by using the following signature:

Delete Column
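Changing a column uses `ALTER TABLE name CHANGE old_name new_name type [FIRST|AFTER col];`. Hive has no `DROP COLUMN` statement; a column is deleted with `ALTER TABLE name REPLACE COLUMNS (...)`, which redeclares the full list of columns to keep. Both statements change only the metadata and do not rewrite existing data files, so reordering or removing columns of a delimited text table can misalign the schema with the stored data. The Python helpers below are hypothetical and build these statements as strings.

```python
def change_column_sql(table, old_name, new_name, col_type, first=False, after=None):
    # CHANGE can rename a column, change its type, and move it (FIRST or AFTER col).
    sql = f"ALTER TABLE {table} CHANGE {old_name} {new_name} {col_type}"
    if first:
        sql += " FIRST"
    elif after:
        sql += f" AFTER {after}"
    return sql + ";"


def drop_column_sql(table, schema, column):
    # No DROP COLUMN in Hive: REPLACE COLUMNS redeclares every column to keep.
    keep = [(col, col_type) for col, col_type in schema if col != column]
    if len(keep) == len(schema):
        raise ValueError(f"column {column} is not in the schema")
    cols = ", ".join(f"{col} {col_type}" for col, col_type in keep)
    return f"ALTER TABLE {table} REPLACE COLUMNS ({cols});"


schema = [("id", "int"), ("name", "string"), ("salary", "float")]
print(change_column_sql("employee_data", "name", "first_name", "string"))
print(drop_column_sql("employee_data", schema, "salary"))
```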
Thank you
Does anyone have any questions?
