Hive is a data warehousing tool built on top of Hadoop, used primarily for analytics on large datasets stored in a distributed environment. It provides a SQL-like interface (HiveQL) that enables users to query, analyze, and manage large datasets in the Hadoop Distributed File System (HDFS) without deep knowledge of complex MapReduce programming. This makes Hive a popular choice in data analytics for processing, summarizing, and querying structured data.
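To see what this buys in practice, the classic word-count job, often dozens of lines of MapReduce Java code, collapses into a single HiveQL query. The following is only a sketch: docs is a hypothetical table with a single STRING column named line.

    -- Word count over a hypothetical table docs(line STRING)
    SELECT word, COUNT(*) AS freq
    FROM (SELECT explode(split(line, '\\s+')) AS word FROM docs) words
    GROUP BY word
    ORDER BY freq DESC;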
Here’s a breakdown of the key Hive components and how they work together:
1. Hive Shell
The Hive Shell, also called the Hive CLI, is the command-line interface through which users interact with Hive. Users can execute HiveQL (similar to SQL) commands in the shell to manage data, create tables, and run queries. The shell serves as the starting point for most interactions with Hive, allowing analysts and data engineers to submit commands directly.
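A typical interactive session might look like the sketch below; sales_db and orders are hypothetical names used only for illustration.

    hive> SHOW DATABASES;                 -- list available databases
    hive> USE sales_db;                   -- switch to a database
    hive> SHOW TABLES;                    -- list its tables
    hive> SELECT COUNT(*) FROM orders;    -- run a query directly from the shell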
2. Hive Services
Hive Services are components that handle different parts of the data processing lifecycle in Hive. Key
services include:
o Hive Thrift Server: Allows external applications to connect to Hive using JDBC, ODBC, and other interfaces, supporting a wide range of programming languages.
o Hive Driver: Manages and processes HiveQL queries, communicating with the execution engine and returning results.
o CLI Service: Facilitates command-line interface communication, primarily through the Hive Shell.
o Compiler: Translates the high-level HiveQL into low-level MapReduce tasks for Hadoop to execute.
These services allow Hive to handle multiple types of requests and interact with various clients and
applications.
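The compiler's output can be inspected with HiveQL's EXPLAIN command, which prints the plan it produces (the stages and their dependencies) instead of executing the query. A minimal sketch, assuming a hypothetical orders table:

    -- Show the compiled plan rather than running the query
    EXPLAIN
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id;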
3. Hive Metastore
The Hive Metastore is a crucial component that acts as a central repository for Hive metadata. It stores
information about the structure and properties of Hive databases, tables, partitions, columns, and data
types. The Metastore enables Hive to efficiently manage schemas and query data, storing metadata in a
relational database like MySQL or Derby. When queries are run, Hive uses this metadata to locate data
files in HDFS and interpret the data structure.
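The commands below are answered from the Metastore rather than by scanning data files in HDFS; orders and sales_db are hypothetical names.

    DESCRIBE FORMATTED orders;   -- columns, types, HDFS location, SerDe, owner
    SHOW PARTITIONS orders;      -- partitions registered in the Metastore
    SHOW TABLES IN sales_db;     -- tables the Metastore knows for one database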
4. HiveQL
HiveQL is the query language used in Hive, based on SQL. It includes most of SQL’s features and adds
specific capabilities for working with Hadoop, such as partitioning and bucketing data. Through HiveQL,
users can execute queries to manipulate and retrieve data stored in Hadoop, without needing to write
complex MapReduce code. HiveQL is powerful for running analytical queries on massive datasets and
includes commands for:
o Data Definition Language (DDL): Used for creating and modifying tables, databases, etc.
o Data Manipulation Language (DML): Used for querying, inserting, updating, and deleting data.
o User-Defined Functions (UDFs): Custom functions that can be added to extend Hive’s capabilities.
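The sketch below touches all three families; every name in it (orders, staging_orders, the JAR path, and the to_quarter function) is hypothetical and stands in for whatever exists in a given deployment.

    -- DDL: a table partitioned by date and bucketed by customer
    CREATE TABLE orders (
      order_id    BIGINT,
      customer_id BIGINT,
      amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
    STORED AS ORC;

    -- DML: populate one partition, then query it
    INSERT INTO TABLE orders PARTITION (order_date = '2024-01-01')
    SELECT order_id, customer_id, amount FROM staging_orders;

    SELECT customer_id, SUM(amount) AS total
    FROM orders
    WHERE order_date = '2024-01-01'
    GROUP BY customer_id;

    -- UDF: register a custom Java function shipped in a JAR
    ADD JAR /tmp/my_udfs.jar;
    CREATE TEMPORARY FUNCTION to_quarter AS 'com.example.hive.ToQuarter';
    SELECT to_quarter(order_date) AS quarter, SUM(amount) AS total
    FROM orders
    GROUP BY to_quarter(order_date);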
5. Working of Hive
The workflow in Hive involves several steps:
1. Query Submission: A user submits a HiveQL query via the Hive Shell, Thrift Server, or other
interface.
2. Compilation: The query is parsed and translated into a directed acyclic graph (DAG) of
MapReduce tasks by the compiler.
3. Optimization: The compiler optimizes the query by removing unnecessary steps and reusing intermediate results, so the plan executes faster.
4. Execution: The Hive execution engine submits the optimized tasks to Hadoop's MapReduce or
Tez framework, where they are executed in parallel across Hadoop clusters.
5. Result Retrieval: Once execution completes, results are gathered and sent back to the Hive
interface from which the query was initiated.
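Which framework runs the tasks in step 4 is a session-level setting rather than a code change. A one-line sketch of switching engines (tez is available only when that engine is installed on the cluster):

    SET hive.execution.engine=mr;    -- classic MapReduce
    SET hive.execution.engine=tez;   -- Tez, typically faster for DAG-shaped plans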
Hive Architecture
The following architecture shows how a submitted query flows through Hive.
Hive Client
Hive allows writing applications in various languages, including Java, Python, and C++. It supports several types of clients:
o Thrift Server - A cross-language service platform that serves requests from any programming language that supports Thrift.
o JDBC Driver - Used to establish a connection between Hive and Java applications. The JDBC driver is provided by the class org.apache.hadoop.hive.jdbc.HiveDriver.
o ODBC Driver - Allows applications that support the ODBC protocol to connect to Hive.
Hive Services
The following are the services provided by Hive:
o Hive CLI - The Hive CLI (Command Line Interface) is a shell in which users can execute Hive queries and commands.
o Hive Web User Interface - The Hive Web UI is an alternative to the Hive CLI. It provides a web-based GUI for executing Hive queries and commands.
o Hive MetaStore - A central repository that stores the structure information of the tables and partitions in the warehouse, including metadata about columns and their types, the serializers and deserializers used to read and write data, and the corresponding HDFS files where the data is stored.
o Hive Server - Also referred to as the Apache Thrift Server, it accepts requests from different clients and forwards them to the Hive Driver.
o Hive Driver - Receives queries from sources such as the Web UI, CLI, Thrift Server, and JDBC/ODBC drivers, and passes them to the compiler.
o Hive Compiler - Parses the query and performs semantic analysis on the different query blocks and expressions, converting HiveQL statements into MapReduce jobs.
o Hive Execution Engine - The optimizer produces the logical plan as a DAG of MapReduce and HDFS tasks; the execution engine then runs these tasks in the order of their dependencies.