Hive Part 2
HIVE
• Hive is a data warehouse system used to analyse structured data.
• Built on top of Hadoop.
• Originally developed by Facebook.
• Provides functionality for reading, writing, and managing large datasets
residing in distributed storage.
• Runs SQL-like queries written in HQL (Hive Query Language), which are
internally converted to MapReduce jobs.
HIVE
• Ability to bring structure to various data formats
• Simple interface for ad hoc querying, analyzing and summarizing
large amounts of data
• Access to files on various data stores such as HDFS and HBase
• Using Hive, we can skip writing complex MapReduce programs.
• Hive supports Data Definition Language (DDL), Data Manipulation
Language (DML), and User Defined Functions (UDF).
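The points above can be illustrated with a minimal HiveQL sketch. The table name page_views and its columns are hypothetical, and INSERT ... VALUES assumes Hive 0.14 or later.

```sql
-- DDL: define a table (page_views and its columns are hypothetical)
CREATE TABLE IF NOT EXISTS page_views (
  user_id INT,
  url     STRING
);

-- DML: add a row (INSERT ... VALUES needs Hive 0.14 or later)
INSERT INTO TABLE page_views VALUES (1, 'home');

-- Query with a built-in function and an aggregate.
-- On the MapReduce engine, Hive compiles this into MapReduce jobs,
-- so no hand-written mapper or reducer is required.
SELECT upper(url) AS page, COUNT(*) AS views
FROM page_views
GROUP BY upper(url);
```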
HIVE
• Hive is not a relational database
• It is not designed for OnLine Transaction Processing (OLTP)
• It is not a language for real-time queries and row-level updates
Features of HIVE
• It stores the schema in a database and the processed data in HDFS.
• It is designed for OLAP.
• It provides SQL type language for querying called HiveQL or HQL.
• It is familiar, fast, scalable, and extensible.
HIVE Architecture
Hive – Client
• Hive allows writing applications in various languages, including Java,
Python, and C++. It supports different types of clients:
• Thrift Server - It is a cross-language service provider platform that
serves requests from all programming languages that
support Thrift.
• JDBC Driver - It is used to establish a connection between Hive and
Java applications. The JDBC driver class is
org.apache.hadoop.hive.jdbc.HiveDriver (for HiveServer2, the class is
org.apache.hive.jdbc.HiveDriver).
• ODBC Driver - It allows the applications that support the ODBC
protocol to connect to Hive.
Hive – User Interface
• Hive is a data warehouse infrastructure software that can
create interaction between user and HDFS.
• The user interfaces that Hive supports are
• Hive Web UI: It provides a web-based GUI for executing Hive
queries and commands.
• Hive command line: It is a shell where we can execute Hive
queries and commands.
• Hive server: It is referred to as the Apache Thrift Server. It
accepts requests from different clients and passes them to the
Hive Driver.
Hive - Driver
Hive - MetaStore
• Hive MetaStore - It is a central repository that stores the
structural information (metadata) of the tables and partitions in the
warehouse.
• Buckets/clusters
• For sub-sampling within a partition
• Join optimization
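As a rough illustration of bucketing, the sketch below creates a partitioned, bucketed table and samples one bucket. The table emp_bucketed, its columns, and the bucket count of 4 are assumptions made for this example.

```sql
-- Hypothetical table: rows are partitioned by dept, and within each
-- partition they are hashed on id into 4 buckets.
CREATE TABLE emp_bucketed (
  id     INT,
  name   STRING,
  salary FLOAT
)
PARTITIONED BY (dept STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;

-- Sub-sampling within a partition: read only bucket 1 of 4.
SELECT *
FROM emp_bucketed TABLESAMPLE(BUCKET 1 OUT OF 4 ON id)
WHERE dept = 'sales';
```

When two tables are bucketed on the join key with compatible bucket counts, Hive can join matching buckets instead of scanning whole tables, which is the join optimization mentioned above.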
Hive
• Data Types
• DDL Commands
• DML Operations
• Data Retrieval Queries
Hive Data Types
• Basic datatypes
• Numbers
• Date / Time
• Strings
• Complex data types
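A short sketch of how basic and complex types can appear together in one table definition. The table student_profile, its columns, and the delimiters are assumptions chosen for illustration.

```sql
-- Hypothetical table mixing basic and complex types
CREATE TABLE student_profile (
  name    STRING,                          -- string type
  dob     DATE,                            -- date/time type
  marks   ARRAY<INT>,                      -- ordered list of values
  contact MAP<STRING, STRING>,             -- key/value pairs
  address STRUCT<city:STRING, pin:INT>     -- named fields
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':';

-- Accessing elements of each complex type
SELECT name,
       marks[0],          -- first array element
       contact['email'],  -- map lookup by key
       address.city       -- struct field
FROM student_profile;
```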
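HIVE DATA TYPES
Integer Types
Type      Size     Range
TINYINT   1 byte   -128 to 127
SMALLINT  2 bytes  -32,768 to 32,767
INT       4 bytes  -2,147,483,648 to 2,147,483,647
BIGINT    8 bytes  -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
Decimal Types
Type      Size     Precision
FLOAT     4 bytes  Single-precision floating-point number
DOUBLE    8 bytes  Double-precision floating-point number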
Hive DDL
Hive - Create Database
hive> SHOW DATABASES;
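The creation step itself is not shown here. A minimal sketch, assuming the database is named student as in the ALTER DATABASE examples, with an illustrative comment string:

```sql
-- Create the database only if it does not already exist
-- (the comment text is illustrative, not from the slides)
CREATE DATABASE IF NOT EXISTS student
COMMENT 'Student records';

-- Confirm that it was created and inspect its metadata
SHOW DATABASES;
DESCRIBE DATABASE EXTENDED student;
```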
hive> ALTER DATABASE student SET DBPROPERTIES ('batch' = 'IIITK-Batch2021', 'Date' = '2021-09-27');
Step 4: Let's change an existing property to see the effect. In our example, we
are changing the batch from 'IIITK-Batch2021' to 'IIITK-Batch2021-Set1'.
hive> ALTER DATABASE student SET DBPROPERTIES ('batch' = 'IIITK-Batch2021-Set1', 'Date' = '2024-09-27');
Hive - Alter Database
ALTER Database Command 2
• With the command below, we can change the database
directory on HDFS.
• LOCATION with ALTER is only available in Hive 2.2.1, 2.4.0, and later.
Keep in mind that changing the database location
does not move existing data to the newly specified location.
• It only changes the parent directory, so newly added data
is written to the new HDFS location.
Syntax:
Step 3: Describe the database student to see the location is overridden or not.
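The earlier steps are not shown here, so the following is only a sketch of what the full sequence might look like. The HDFS URI is a hypothetical placeholder, and SET LOCATION requires Hive 2.2.1, 2.4.0, or later.

```sql
-- Step 1 (assumed): check the current location of the database
DESCRIBE DATABASE student;

-- Step 2 (assumed): point the database at a new parent directory.
-- The URI below is a hypothetical placeholder; existing data is NOT moved.
ALTER DATABASE student
SET LOCATION 'hdfs://localhost:9000/user/hive/warehouse/student_new.db';

-- Step 3: describe the database again to confirm the new location
DESCRIBE DATABASE student;
```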
• The command below is used to set or change the owning user or
role of a database.
• SET OWNER transfers ownership from the current user to a new user or a
new role.
• By default, the user who creates the database is set as the owner of that
database.
Syntax:
Hive - Alter Database
ALTER Database Command 3
Step 1: Change the user name associated with the student database.
hive> DESCRIBE DATABASE EXTENDED student; -- shows the current owner
hive> ALTER DATABASE student SET OWNER USER Ram; -- changes the db owner from dikshant to Ram
Syntax:
Step 1: Create a database first so that we can create tables inside it.
hive> CREATE DATABASE database_name;
hive> SHOW DATABASES;
• The internal tables are not flexible enough to share with other tools like Pig.
• If we try to drop the internal table, Hive deletes both table schema and
data.
Hive DDL – Table - Create
Create an internal table
hive> create table demo.employee (Id int, Name string, Salary float)
row format delimited
fields terminated by ',';
• Two keywords
• external keyword - used to specify that the table is external
• location keyword - used to specify the location of the loaded data
• As the table is external, the data is not stored in the Hive warehouse directory.
• Therefore, if we drop the table, the metadata of the table is
deleted, but the data still exists.
• In the case of an internal table, dropping the table deletes both the
table schema and the data.
Hive DDL – Table - Create
External Table
To create an external table, follow the below steps: -
hive> create external table emplist (Id int, Name string , Salary float)
row format delimited
fields terminated by ','
location '/HiveDirectory';
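To see the external-table behaviour described above, one could run something like the following from the Hive command line. This is a sketch: the dfs command is a Hive CLI feature and may not be available in other clients.

```sql
-- The "Table Type" field in the output should read EXTERNAL_TABLE
DESCRIBE FORMATTED emplist;

-- Dropping an external table removes only its metadata
DROP TABLE emplist;

-- The files under the table's location should still be listed
dfs -ls /HiveDirectory;
```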
Hive - Load data into Table
• Once the internal table has been created, the next step is to load data into it.
• In Hive, we can load data from a file into a table with the LOAD DATA statement.
• Load the data of the file into the table
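A minimal sketch of the load step, assuming the internal table demo.employee created earlier. The file path is hypothetical.

```sql
-- LOCAL reads the file from the local file system and copies it into the table.
-- The path below is a hypothetical example.
LOAD DATA LOCAL INPATH '/home/codegyani/hive/emp_details'
INTO TABLE demo.employee;

-- Verify the loaded rows
SELECT * FROM demo.employee;
```

Without LOCAL, the path is resolved on HDFS and the file is moved, not copied, into the table's directory. Adding OVERWRITE before INTO TABLE replaces the table's existing data instead of appending to it.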
• To add more data to the same table, execute the same query
again with the new file name.
hive> LOAD DATA LOCAL INPATH '/home/codegyani/hive/emp_details1' INTO TABLE demo.employee;
Hive – Load data into Table
Load unmatched data
• If the data in one or more columns doesn't match the data type of the corresponding table
columns, Hive does not throw an exception.
• Instead, it stores NULL in place of each unmatched value.
• To see this, add one more file to the current table. This file contains the unmatched data.
• The third column of the file contains string data, but the table expects float data there, so this
is an unmatched data situation.
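The unmatched-data case can be sketched as follows. The file name emp_details2 and its sample row are hypothetical and stand in for the file the slides refer to.

```sql
-- Hypothetical contents of /home/codegyani/hive/emp_details2,
-- where the third field is text, not a number:
--   5,Lisa,twenty-thousand

-- The load succeeds; Hive does not validate the data against the schema here
LOAD DATA LOCAL INPATH '/home/codegyani/hive/emp_details2'
INTO TABLE demo.employee;

-- At query time the unparseable Salary values are read as NULL
SELECT Id, Name, Salary
FROM demo.employee
WHERE Salary IS NULL;
```

Hive applies the schema when the data is read, not when it is loaded, which is why the mismatch only becomes visible in the query result.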
Hive - Alter Table
• In Hive, we can perform modifications in the
existing table like changing the table name, column
name, comments, and table properties.
• It provides SQL like commands to alter the table.
❑ Rename a Table
❑ Adding column
❑ Change Column
❑ Delete or Replace Column
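The four operations listed above can be sketched as follows. The table and column names (emp_demo, emp_info, age, emp_age) are hypothetical and chosen only for illustration.

```sql
-- Rename a table
ALTER TABLE emp_demo RENAME TO emp_info;

-- Add one or more columns (appended after the existing columns)
ALTER TABLE emp_info ADD COLUMNS (age INT COMMENT 'age in years');

-- Change a column: CHANGE old_name new_name new_type
ALTER TABLE emp_info CHANGE age emp_age INT;

-- Replace the column list; columns left out of the list are dropped
-- from the schema (the underlying data files are not rewritten)
ALTER TABLE emp_info REPLACE COLUMNS (id INT, name STRING);
```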
Hive - Alter Table
❑ Rename a Table
change the name of an existing table
Hive - Alter Table
❑ Adding column
add one or more columns in an existing table
hive> ALTER TABLE employee_data ADD COLUMNS (age int);
Updated schema of the table
A high-level comparison of SQL and HiveQL