Hive
a. Creating a database, table
b. Dropping a database, table
c. Describe command, alter, insert, select
d. Group by with having, Order by
Hive Commands with Syntax and Examples
Create Database
This command will create a database.
hive> create database <database-name>;
For example: create database demo;
Drop Database
This command deletes a defined database.
hive> drop database demo;
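Create Table
Before describing and loading data, an employee table is assumed to exist in the demo database. The create table statement itself is not shown in these notes; the following is a minimal sketch of what it likely looks like, with illustrative column names and ',' as the field delimiter:
hive> create table demo.employee (Id int, Name string, Salary float)
row format delimited
fields terminated by ',';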
Here, the command also specifies that the fields in the data file are separated by ','.
Let’s see the metadata of the created table.
hive> describe demo.employee;
Hive also lets us create a new table that is a copy of an existing table.
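A sketch of the kind of statement this refers to, assuming the LIKE clause is used so that the new table copies only the schema of the existing one (the new table's name is illustrative):
hive> create table if not exists demo.employee_copy like demo.employee;
hive> describe demo.employee_copy;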
Hive — Load Data
Once the internal table has been created, the next step is to load the
data into it.
Let’s load the data of the file into the database by using the
following command: -
hive> load data local inpath '/home/<username>/hive/emp_details' into table demo.employee;
hive> select * from demo.employee;
Adding a column:
hive> alter table table_name add columns (column_name datatype);
Changing a column:
hive> alter table table_name change <old_column_name> <new_column_name> datatype;
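For instance, applied to the employee table sketched earlier (the column names here are illustrative):
hive> alter table demo.employee add columns (department string);
hive> alter table demo.employee change department dept string;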
In Hive, DML statements let us add data to a Hive table in two different ways:
Using the INSERT command
Using the LOAD DATA statement
Example:
To insert data into a table, let's first create a table named student (by default, Hive stores its tables in the default database).
Command:
CREATE TABLE IF NOT EXISTS student(
Student_Name STRING,
Student_Rollno INT,
Student_Marks FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
INSERT Query:
INSERT INTO TABLE student VALUES ('Dikshant', 1, 95), ('Akshat', 2, 96), ('Dhruv', 3, 90);
We can check the data of the student table with the help of the below
command.
SELECT * FROM student;
Note:
The LOCAL switch specifies that the data we are loading is available in our local file system. If the LOCAL switch is not used, Hive treats the location as an HDFS path.
The OVERWRITE switch allows us to overwrite the existing table data.
Let’s make a CSV (Comma Separated Values) file named data.csv, since we provided ',' as the field terminator while creating the table in Hive. We are creating this file in our local file system at ‘/home/dikshant/Documents’ for demonstration purposes.
Command:
cd /home/dikshant/Documents   # change to the directory containing data.csv
Now load the data into the student Hive table with the help of the below command.
LOAD DATA LOCAL INPATH '/home/dikshant/Documents/data.csv' INTO
TABLE student;
Let’s see the student table content to observe the effect with the help of
the below command.
SELECT * FROM student;
We can observe that we have successfully added the data to
the student table.
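If we run the load again with the OVERWRITE switch described in the note above, the existing rows of the student table are replaced rather than appended to, for example:
LOAD DATA LOCAL INPATH '/home/dikshant/Documents/data.csv' OVERWRITE INTO TABLE student;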
Hive — Partitioning
Partitioning in Hive can be done in two ways:
Static partitioning
Dynamic partitioning
Static Partitioning
In static or manual partitioning, it is required to pass the values of
partitioned columns manually while loading the data into the table.
Hence, the data file doesn’t contain the partitioned columns.
hive> use test;
hive> create table student (id int, name string, age int,
institute string)
> partitioned by (course string)
> row format delimited
> fields terminated by ',';
Load the data into the table and pass the values of partition
columns with it by using the following command: -
hive> load data local inpath '/home/<username>/hive/student_details1' into table student partition(course="python");
hive> load data local inpath '/home/<username>/hive/student_details2' into table student partition(course="Hadoop");
Dynamic Partitioning
In dynamic partitioning, the values of partitioned columns exist
within the table. So, it is not required to pass the values of
partitioned columns manually.
hive> use show;
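The statements that actually build a dynamically partitioned table are not listed in these notes; a minimal sketch, assuming a plain student table (id, name, age, institute, course) already loaded in this database and a new table student_part to hold the partitioned copy (both names are illustrative):
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> create table student_part (id int, name string, age int, institute string)
    > partitioned by (course string)
    > row format delimited
    > fields terminated by ',';
hive> insert into table student_part partition(course)
    > select id, name, age, institute, course from student;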
Now you can view the table data with the help of the select command.
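For example, on the hypothetical student_part table sketched above:
hive> select * from student_part;
hive> select * from student_part where course = 'python';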
HiveQL — Operators
The HiveQL operators facilitate performing various arithmetic and relational operations.
hive> use hql;
hive> create table employee (Id int, Name string , Salary float)
row format delimited
fields terminated by ',' ;
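A couple of illustrative queries on this table, the first using an arithmetic operator (+) and the second a relational operator (>=); the literal values are made up:
hive> select Id, Name, Salary + 5000 from employee;
hive> select * from employee where Salary >= 25000;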
Functions in Hive
hive> use hql;
hive> create table employee_data (Id int, Name string , Salary
float)
row format delimited
fields terminated by ',' ;
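A few of Hive's built-in functions applied to this table as an illustration, returning the upper-cased name, the square root of the salary, and the length of the name:
hive> select Id, upper(Name), sqrt(Salary) from employee_data;
hive> select Name, length(Name) from employee_data;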
Aggregate Functions
Let’s see an example to fetch the maximum/minimum salary of
an employee.
hive> select max(Salary) from employee_data;
hive> select min(Salary) from employee_data;
GROUP BY Clause
The HQL GROUP BY clause is used to group the data from multiple records based on one or more columns. It is generally used in conjunction with the aggregate functions (like SUM, COUNT, MIN, MAX and AVG) to perform an aggregation over each group.
hive> use hql;
hive> create table emp (Id int, Name string, Salary float, Department string)
row format delimited
fields terminated by ',' ;
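For example, grouping the rows by department and summing the salaries within each group (this mirrors the HAVING example below, without the filter):
hive> select department, sum(salary) from emp group by department;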
HAVING CLAUSE
The HQL HAVING clause is used with the GROUP BY clause. Its purpose is to apply constraints on the groups of data produced by the GROUP BY clause, so it only returns the groups where the condition is TRUE.
Let’s fetch the sum of employees’ salaries per department, keeping only the departments where the sum is at least 35000, by using the following command:
hive> select department, sum(salary) from emp group by department having sum(salary) >= 35000;
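The outline at the top also lists Order By, which these notes do not otherwise demonstrate; a minimal sketch on the same emp table, sorting employees by salary in descending order:
hive> select Name, Salary from emp order by Salary desc;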