Hive Full Lecture
Lecture on “All About Hive”
Hive Motivation
Yahoo worked on Pig to facilitate application deployment on Hadoop.
In the same spirit, Facebook developed Hive so that analysts could query
data in Hadoop using a familiar SQL-like language.
Hive generally runs on your workstation and converts your SQL
query into a series of MapReduce jobs for execution on a Hadoop cluster.
Apache Hive organizes data into tables. This provides a means of
imposing structure on the data stored in HDFS.
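For example, a minimal HiveQL sketch of attaching a table definition to files that already sit in HDFS; the table name, columns, and path are hypothetical:

-- Define a table over data that already lives in HDFS.
-- EXTERNAL means Hive only attaches a schema; dropping the table leaves the files in place.
CREATE EXTERNAL TABLE page_views (
  user_id BIGINT,
  url     STRING,
  view_ts STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/page_views';   -- hypothetical HDFS directory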
Limitations of Hive
Hive is not
A relational database
Designed for OnLine Transaction Processing (OLTP)
A language for real-time queries and row-level updates.
Latency for Apache Hive queries is generally very high
Features of Hive
1) It stores the schema in a database and processed data in HDFS.
2) Document indexing
3) Predictive modeling
4) Customer-facing UI
Important characteristics of Hive
1) In Hive, tables and databases are created first and then data is loaded into these tables.
2) Hadoop programs work on flat files, so Hive can use directory structures to
"partition" data and improve performance on certain queries (see the sketches after this list).
3) An important component of Hive is the Metastore, which stores schema
information and typically resides in a relational database. We can interact
with Hive using methods like
Web GUI
Java Database Connectivity (JDBC) interface
4) Generally, HQL syntax is similar to the SQL syntax that most data analysts are
familiar with. The sample query at the end of the first sketch after this list displays
all the records present in a given table.
5) Hive supports four file formats: TEXTFILE, SEQUENCEFILE, ORC, and
RCFILE (Record Columnar File).
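To illustrate points 1 and 4 above, here is a minimal HiveQL sketch that creates a database and table, loads data, and then runs the sample query that returns all records; the database name, table name, columns, and file path are all hypothetical:

-- 1) Create the database and table first ...
CREATE DATABASE IF NOT EXISTS sales_db;
USE sales_db;
CREATE TABLE orders (
  order_id BIGINT,
  customer STRING,
  amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- ... then load data into the table.
LOAD DATA INPATH '/data/orders.csv' INTO TABLE orders;

-- 4) Sample query: display all the records present in the table.
SELECT * FROM orders;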
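Points 2 and 5 can be sketched together: a table partitioned by a column (each partition value becomes its own HDFS subdirectory) and stored in one of the supported file formats; again, all names here are hypothetical:

-- 2) Partitioning: each distinct value of order_date becomes its own
--    HDFS subdirectory, so queries filtering on it read less data.
-- 5) STORED AS ORC selects one of the supported file formats.
CREATE TABLE orders_by_date (
  order_id BIGINT,
  customer STRING,
  amount   DOUBLE
)
PARTITIONED BY (order_date STRING)
STORED AS ORC;

-- Queries that filter on the partition column scan only the matching directories.
SELECT customer, amount
FROM orders_by_date
WHERE order_date = '2024-01-01';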
Hive Architecture
Hive Clients: Hive supports different types of clients.
Thrift Server - A cross-language service provider platform that serves
requests from any programming language that supports Thrift.
Hive Web User Interface - The Hive Web UI is an alternative to the Hive
CLI. It provides a web-based GUI for executing Hive queries and
commands.
Hive Driver - It receives queries from different sources such as the Web UI, CLI,
Thrift, and the JDBC/ODBC driver, and transfers them to the compiler.
Hive Compiler - The purpose of the compiler is to parse the query and
perform semantic analysis on the different query blocks and expressions. It
converts HiveQL statements into MapReduce jobs.
Hive Execution Engine - The optimizer generates the execution plan as a
DAG of MapReduce tasks and HDFS tasks. The execution engine then runs
these tasks in the order of their dependencies (the EXPLAIN sketch after
this list shows such a plan).
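As a small illustration of the compiler and execution engine, prefixing a query with EXPLAIN makes Hive print the plan it would execute, broken into stages and their dependencies, without running it; the table and column names below are hypothetical:

-- Show the stage plan (e.g. map-reduce stages and their dependencies)
-- that the compiler produces for this query, without executing it.
EXPLAIN
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department;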
Thank You