
Hive Architecture and Working

Apache Hive is a data warehouse built on top of Hadoop that allows querying of large datasets stored
in HDFS (Hadoop Distributed File System) using an SQL-like query language called HiveQL. It
abstracts the complexity of Hadoop's MapReduce framework, making it easier for users to analyze
large amounts of data without having to write complex code.
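For example, a HiveQL query reads much like standard SQL. The statement below (against a hypothetical sales table) would be compiled into MapReduce or Tez jobs behind the scenes:

```sql
-- Hypothetical table; illustrates HiveQL's SQL-like syntax.
-- Hive translates this into distributed jobs automatically.
SELECT region, SUM(amount) AS total_sales
FROM sales
WHERE year = 2023
GROUP BY region;
```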
Components of Hive Architecture
1. User Interface (UI)
   - The user interface enables interaction with Hive. Users submit queries, manage
     databases, and perform other operations through:
       - CLI (Command Line Interface): The most basic and widely used method to
         interact with Hive.
       - Web UI: A browser-based interface.
       - ODBC/JDBC Drivers: For connecting external tools and applications to Hive.
2. HiveQL Process Engine
   - HiveQL is the SQL-like query language used in Hive. The process engine receives
     HiveQL statements and converts them into MapReduce tasks or Tez jobs for
     execution on the Hadoop cluster.
3. Metastore
   - The Metastore is a critical component of Hive that stores metadata such as table
     definitions, database schemas, column types, and partition information. This
     metadata is kept in a relational database such as MySQL or Derby.
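As an illustration, the DDL below (table name hypothetical) declares exactly the kind of information the Metastore records: the schema, column types, partition keys, and storage format.

```sql
-- The schema, column types, and partition key declared here are
-- recorded in the Metastore's relational database, not in HDFS.
CREATE TABLE employee_logs (
  emp_id   INT,
  action   STRING,
  log_time TIMESTAMP
)
PARTITIONED BY (log_date STRING)
STORED AS ORC;
```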
4. Driver
   - The Driver manages the lifecycle of a query, coordinating between components:
       - Parser: Breaks down the query and checks for syntax errors.
       - Compiler: Translates the query into a logical plan.
       - Optimizer: Optimizes the execution plan for better performance.
       - Executor: Executes the plan by coordinating with the Execution Engine.
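The output of the parse, compile, and optimize stages can be inspected with Hive's EXPLAIN statement, which prints the plan without running it (the query itself is hypothetical):

```sql
-- Prints the stages and operators the Compiler and Optimizer
-- produced, without submitting any jobs to the cluster.
EXPLAIN
SELECT region, COUNT(*) FROM sales GROUP BY region;
```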
5. Execution Engine
   - The Execution Engine takes the execution plan from the Driver and translates it
     into Hadoop MapReduce jobs, which are submitted to the cluster for processing.
     Hive also supports other execution engines, such as Apache Tez and Apache
     Spark, for faster query execution.
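Which engine receives the plan can be switched per session via the hive.execution.engine property; for example (assuming Tez is installed on the cluster):

```sql
-- Valid values typically include mr (MapReduce), tez, and spark,
-- depending on what the cluster has installed.
SET hive.execution.engine=tez;
SELECT COUNT(*) FROM sales;
```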
6. Hadoop (HDFS and MapReduce)
   - Hive uses Hadoop's HDFS for storage and MapReduce (or Tez/Spark) for
     distributed processing. Hive queries are converted into MapReduce jobs that run
     on the Hadoop cluster.
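Hive's reliance on HDFS for storage is easy to see with an external table, which layers a schema over files already sitting in an HDFS directory (the path and columns below are hypothetical):

```sql
-- The data files stay where they are in HDFS; Hive only records
-- the schema and the directory location in the Metastore.
CREATE EXTERNAL TABLE web_logs (
  ip  STRING,
  url STRING,
  ts  TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/logs/web';
```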
Workflow of Hive
1. Query Submission: The user submits a HiveQL query via CLI, Web UI, or an external tool.
2. Parsing: The Driver parses the query and checks for syntax errors.
3. Plan Generation: The query is converted into a logical execution plan by the compiler.
4. Optimization: The logical plan is optimized for efficient execution (e.g., minimizing the
number of MapReduce tasks).
5. Execution: The optimized plan is converted into MapReduce tasks, which are submitted to
the Hadoop cluster.
6. Result Return: After execution, the results are fetched from HDFS and returned to the user.
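The six steps above can be traced through a short session sketch (table and file names hypothetical):

```sql
-- 1-2. Submitted via the CLI; the Driver parses each statement.
CREATE TABLE orders (id INT, amount DOUBLE) STORED AS TEXTFILE;
LOAD DATA INPATH '/data/orders.csv' INTO TABLE orders;
-- 3-5. Compiled, optimized, and run as MapReduce/Tez tasks.
SELECT COUNT(*), AVG(amount) FROM orders;
-- 6.   Results are fetched from HDFS and returned to the client.
```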
+-----------------------+
|    User Interface     |
|  (CLI, Web UI, JDBC)  |
+-----------+-----------+
            |
            v
+-----------------------+
| HiveQL Process Engine |
+-----------+-----------+
            |
            v
+-----------------------+
|        Driver         |
|  (Parser, Compiler,   |
|  Optimizer, Executor) |
+-----------+-----------+
            |
            v
+-----------------------+
|   Execution Engine    |
+-----------+-----------+
            |
            v
+-----------------------+
|    Hadoop (HDFS,      |
| MapReduce/Tez/Spark)  |
+-----------+-----------+
            |
            v
+-----------------------+
|     Data Storage      |
|  (HDFS, HBase, etc.)  |
+-----------------------+
