HIVE Architecture
[Architecture diagram: Hive Clients (Thrift client, JDBC Driver, ODBC Driver) connect to Hive Services (Hive Web UI, Hive Server, CLI, Hive Driver, Metastore), which run on top of MapReduce and HDFS.]
Hive Client
Hive allows writing applications in various languages, including Java, Python, and C++. It supports different types of clients such as:
Thrift Server - It is a cross-language service provider platform that serves requests from all programming languages that support Thrift.
JDBC Driver - It is used to establish a connection between Hive and Java applications. The JDBC Driver is present in the class org.apache.hadoop.hive.jdbc.HiveDriver.
ODBC Driver - It allows the applications that
support the ODBC protocol to connect to Hive.
6/16/2023
Hive Services
The following are the services provided by Hive:
Hive CLI - The Hive CLI (Command Line Interface) is a shell where we can execute Hive queries and commands.
Hive Web User Interface - The Hive Web UI is just an alternative to the Hive CLI. It provides a web-based GUI for executing Hive queries and commands.
Hive MetaStore - It is a central repository that stores all the structure information of the various tables and partitions in the warehouse. It also includes metadata of each column and its type, the serializers and deserializers used to read and write data, and the corresponding HDFS files where the data is stored.
Hive Server - It is referred to as the Apache Thrift Server. It accepts requests from different clients and provides them to the Hive Driver.
Hive Driver - It receives queries from different sources like the web UI, CLI, Thrift, and JDBC/ODBC driver. It transfers the queries to the compiler.
Hive Compiler - The purpose of the compiler is to parse the query and perform semantic analysis on the different query blocks and expressions. It converts HiveQL statements into MapReduce jobs.
Hive Execution Engine - The optimizer generates the logical plan in the form of a DAG (directed acyclic graph) of map-reduce tasks and HDFS tasks. In the end, the execution engine executes the incoming tasks in the order of their dependencies.
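The compile-then-execute flow above can be inspected with the EXPLAIN command, which prints the plan the compiler and optimizer produce without running the query. A minimal sketch, assuming a table named employee already exists:

hive> explain select Name, Salary from employee where Salary > 30000;

The output lists the stages (map-reduce tasks and HDFS tasks) that the execution engine would run in dependency order.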
External Table
The external table allows us to create and access a table whose data is kept externally. The external keyword is used to specify an external table, whereas the location keyword is used to determine the location of the loaded data.
As the table is external, the data is not present in the Hive warehouse directory. Therefore, if we try to drop the table, the metadata of the table will be deleted, but the data still exists.
To create an external table, follow the below steps: -
Let's create a directory on HDFS by using the following command: -
hdfs dfs -mkdir /HiveDirectory
Now, store the file in the created directory.
hdfs dfs -put hive/emp_details /HiveDirectory
Let's create an external table using the following command: -
hive> create external table emplist (Id int, Name string, Salary float)
row format delimited
fields terminated by ','
location '/HiveDirectory';
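The claim above, that dropping an external table removes only its metadata, can be checked directly. A sketch, using the directory created in the steps above:

hive> drop table emplist;
hdfs dfs -ls /HiveDirectory

After the drop, the emp_details file is still listed in /HiveDirectory; only the Metastore entry for emplist is gone.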
Partitioning in Hive
The partitioning in Hive means dividing the table into some parts based on the values of a particular column like date, course, city or country. The advantage of partitioning is that since the data is stored in slices, the query response time becomes faster.
As we know that Hadoop is used to handle huge amounts of data, it is always required to use the best approach to deal with it. The partitioning in Hive is the best example of it.
Let's assume we have data of 10 million students studying in an institute. Now, we have to fetch the students of a particular course. If we use a traditional approach, we have to go through the entire data. This leads to performance degradation. In such a case, we can adopt a better approach, i.e., partitioning in Hive, and divide the data among different datasets based on particular columns.
The partitioning in Hive can be executed in two ways -
Static partitioning
Dynamic partitioning
Static Partitioning
In static or manual partitioning, it is required to pass the values of partitioned columns manually while loading the data into the table. Hence, the data file doesn't contain the partitioned columns.
Example of Static Partitioning
First, select the database in which we want to create a table.
hive> use test;
Create the table and provide the partitioned columns by using the following command: -
hive> create table student (id int, name string, age int, institute string)
partitioned by (course string)
row format delimited
fields terminated by ',';
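With static partitioning, each load statement must name the partition value explicitly, since the data file itself does not contain the course column. A sketch, assuming a local file of records for the java course (the file path is hypothetical):

hive> load data local inpath '/home/codegyani/hive/student_details' into table student
partition(course = 'java');

Repeating the load with a different file and partition value (e.g. course = 'hadoop') creates a separate slice under the table's HDFS directory.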
Dynamic Partitioning
In dynamic partitioning, the values of partitioned columns exist within the table. So, it is not required to pass the values of partitioned columns manually.
First, select the database in which we want to create a table.
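The remaining steps can be sketched as follows. Dynamic partitioning must first be enabled, then Hive derives the partition value from the data itself during an insert (the database and table names here are illustrative):

hive> use dynamic_demo;
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> create table stud_part (id int, name string) partitioned by (course string);
hive> insert into stud_part partition(course)
select id, name, course from stud_source;

Unlike the static case, no course value is written in the insert statement; Hive reads it from the stud_source rows and routes each record to the matching partition.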
Bucketing in Hive
The bucketing in Hive is a data organizing technique. It is similar to partitioning in Hive with an added functionality that it divides large datasets into more manageable parts known as buckets. So, we can use bucketing in Hive when the implementation of partitioning becomes difficult. However, we can also divide partitions further into buckets.
Working of Bucketing in Hive
The concept of bucketing is based on the hashing technique.
Here, the modulo of the hash of the current column value and the number of required buckets is calculated (let's say, hash(x) % 3).
Now, based on the resulting value, the data is stored in the corresponding bucket.
Example of Bucketing in Hive
First, select the database in which we want to create a table.
hive> use showbucket;
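The rest of the example can be sketched as follows: a table clustered into a fixed number of buckets on a chosen column, populated from an existing table (emp_demo is an assumed source table holding the raw records):

hive> create table emp_bucket (Id int, Name string, Salary float)
clustered by (Id) into 3 buckets
row format delimited
fields terminated by ',';
hive> set hive.enforce.bucketing = true;
hive> insert overwrite table emp_bucket select * from emp_demo;

Each row lands in the bucket numbered hash(Id) % 3, so the table's HDFS directory ends up with three files, one per bucket. (The hive.enforce.bucketing setting is needed on older Hive versions; Hive 2.x enforces bucketing automatically.)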
HiveQL -Operators
The HiveQL operators facilitate performing various arithmetic and relational operations. Here, we are going to execute such operations on the records of the below table:
Example of Operators in Hive
Let's create a table and load the data into it by using the following steps: -
Select the database in which we want to create a table.
hive> use hql;
Create a hive table using the following command: -
hive> create table employee (Id int, Name string, Salary float)
row format delimited
fields terminated by ',';
Now, load the data into the table.
hive> load data local inpath '/home/codegyani/hive/emp_data' into table employee;
Let's fetch the loaded data by using the following command: -
hive> select * from employee;
Arithmetic Operators in Hive
Relational Operators in Hive
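A couple of sketches of each operator family against the employee table loaded above (the literal values are illustrative):

Arithmetic - add a fixed bonus to every salary:
hive> select Id, Name, Salary + 5000 from employee;

Relational - filter rows by comparing column values:
hive> select * from employee where Salary >= 25000;

Arithmetic operators (+, -, *, /, %) operate column-wise on numeric types, while relational operators (=, !=, <, <=, >, >=) reduce each row to true or false in the where clause.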
HiveQL - Functions
Hive provides various in-built functions to perform mathematical and aggregate type operations. Here, we are going to execute such functions on the records of the below table:
Example of Functions in Hive
Let's create a table and load the data into it by using the following steps: -
Select the database in which we want to create a table.
hive> use hql;
Create a hive table using the following command: -
hive> create table employee_data (Id int, Name string, Salary float)
row format delimited
fields terminated by ',';
Now, load the data into the table.
hive> load data local inpath '/home/codegyani/hive/emp_details' into table employee_data;
Let's fetch the loaded data by using the following command: -
hive> select * from employee_data;
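A few sketches of built-in functions against the employee_data table above, one mathematical and two aggregate (the column names follow the table definition; results depend on the loaded file):

hive> select Id, sqrt(Salary) from employee_data;
hive> select max(Salary) from employee_data;
hive> select avg(Salary) from employee_data;

sqrt is evaluated per row, while max and avg collapse the whole table (or each group, when combined with group by) into a single value.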