Hive Full Lecture
Lecture on “All About Hive”
Hive Motivation
Yahoo worked on Pig to facilitate application deployment on Hadoop.
In the same spirit, Facebook developed Hive so that analysts could query
data in Hadoop using a familiar SQL-like language.
Hive generally runs on your workstation and converts your SQL
query into a series of MapReduce jobs for execution on a Hadoop cluster.
Apache Hive organizes data into tables. This provides a means of
imposing structure on the data stored in HDFS.
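For example, a minimal HiveQL sketch of attaching a table definition to files that already sit in HDFS; the table name, columns, and path are hypothetical:

-- Define a table over data that already lives in HDFS.
-- EXTERNAL means Hive only attaches a schema; dropping the table leaves the files in place.
CREATE EXTERNAL TABLE page_views (
  user_id BIGINT,
  url     STRING,
  view_ts STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/page_views';   -- hypothetical HDFS directory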
Limitations of Hive
Hive is not
A relational database
Designed for OnLine Transaction Processing (OLTP)
A language for real-time queries and row-level updates.
Latency for Apache Hive queries is generally very high
Features of Hive
1) It stores the schema in a database and processed data in HDFS.
2) Document indexing
3) Predictive modeling
4) Customer-facing UI
Important characteristics of Hive
1) In Hive, tables and databases are created first and then data is loaded into these tables.
2) Hadoop programs work on flat files, so Hive can use directory structures to
"partition" data and improve performance on certain queries (see the sketches after this list).
3) An important component of Hive is the Metastore, which stores schema
information and typically resides in a relational database. We can interact
with Hive using methods like
Web GUI
Java Database Connectivity (JDBC) interface
4) Generally, HQL syntax is similar to the SQL syntax that most data analysts are
familiar with. The sample query at the end of the first sketch after this list displays
all the records present in a given table.
5) Hive supports four file formats: TEXTFILE, SEQUENCEFILE, ORC, and
RCFILE (Record Columnar File).
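To illustrate points 1 and 4 above, here is a minimal HiveQL sketch that creates a database and table, loads data, and then runs the sample query that returns all records; the database name, table name, columns, and file path are all hypothetical:

-- 1) Create the database and table first ...
CREATE DATABASE IF NOT EXISTS sales_db;
USE sales_db;
CREATE TABLE orders (
  order_id BIGINT,
  customer STRING,
  amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- ... then load data into the table.
LOAD DATA INPATH '/data/orders.csv' INTO TABLE orders;

-- 4) Sample query: display all the records present in the table.
SELECT * FROM orders;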
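Points 2 and 5 can be sketched together: a table partitioned by a column (each partition value becomes its own HDFS subdirectory) and stored in one of the supported file formats; again, all names here are hypothetical:

-- 2) Partitioning: each distinct value of order_date becomes its own
--    HDFS subdirectory, so queries filtering on it read less data.
-- 5) STORED AS ORC selects one of the supported file formats.
CREATE TABLE orders_by_date (
  order_id BIGINT,
  customer STRING,
  amount   DOUBLE
)
PARTITIONED BY (order_date STRING)
STORED AS ORC;

-- Queries that filter on the partition column scan only the matching directories.
SELECT customer, amount
FROM orders_by_date
WHERE order_date = '2024-01-01';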
Hive Architecture
Hive Clients: Hive supports different types of clients.
Thrift Server - A cross-language service provider platform that serves
requests from any programming language that supports Thrift.
Hive Web User Interface - The Hive Web UI is an alternative to the Hive
CLI. It provides a web-based GUI for executing Hive queries and
commands.
Hive Driver - It receives queries from different sources such as the Web UI, CLI,
Thrift, and the JDBC/ODBC driver, and transfers them to the compiler.
Hive Compiler - The purpose of the compiler is to parse the query and
perform semantic analysis on the different query blocks and expressions. It
converts HiveQL statements into MapReduce jobs.
Hive Execution Engine - The optimizer generates the execution plan as a
DAG of MapReduce tasks and HDFS tasks. The execution engine then runs
these tasks in the order of their dependencies (the EXPLAIN sketch after
this list shows such a plan).
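As a small illustration of the compiler and execution engine, prefixing a query with EXPLAIN makes Hive print the plan it would execute, broken into stages and their dependencies, without running it; the table and column names below are hypothetical:

-- Show the stage plan (e.g. map-reduce stages and their dependencies)
-- that the compiler produces for this query, without executing it.
EXPLAIN
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department;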
Thank You