0% found this document useful (0 votes)
14 views15 pages

Hive

hive commands

Uploaded by

midhun reddy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views15 pages

Hive

hive commands

Uploaded by

midhun reddy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 15

Hive commands to practice:

a. Creating a database,table
b. Dropping a database,table
c. Describe comand,alter, insert, select
d. Group by with having,Order by
Hive Commands with Syntax and Examples
Create Database
 This command will create a database.
hive> create database <database-name>;
For Ex - create database demo;

 This command will show all databases that are present.


hive> show databases;

 This command will only create a database if it is not present.


hive> create database if not exists <database-name>;
For ex - create database if not exists demo;

 Assigning properties with the database in the form of key-value


pair.
hive> create the database demo
> WITH DBPROPERTIES ('creator'='Varchasa Aggarwal',
'date'='18-04-2021');

 Let’s retrieve the information associated with the database.


hive> describe database extended demo;

Drop Database
Delete a defined database.
 This command will delete a database.
hive> drop database demo;

 To check that database is deleted or not.


hive> show databases;
 Drop database if and only if it exists.
hive> drop database if exists demo;

 In Hive, it is not allowed to drop the database that contains the


tables directly. In such a case, we can drop the database either by
dropping tables first or use Cascade keyword with the command.
 Let’s see the cascade command used to drop the database:-
hive> drop database if exists demo cascade.

This command automatically drops the table present in the database


first.
Hive — Create Table
In hive, we have two types of table —
 Internal table
 External table
Internal Table
 The internal tables are also called managed tables as the lifecycle
of their data is controlled by the Hive.
 By default, these tables are stored in a subdirectory under the
directory defined by hive.metastore.warehouse.dir (i.e.
/user/hive/warehouse).
hive> create table demo.employee(Id int, Name string, Salary
float)
> row format delimited
> fields terminated by ',';

Here, the command also includes the information that the data is
separated by ‘,’.
 Let’s see the metadata of the created table.
hive> describe demo.employee

 Let’s create a table if it not exists.


hive> create table if not exists demo.employee(Id int, Name
string, Salary float)
> row format delimited
> fields terminated by ','

 While creating a table, we can add the comments to the columns


and can also define the table properties.
hive> create table demo.new_employee(Id int comment 'Employee
Id' Name string comment 'Employee Name', Salary float comment
'Employee Salary') comment 'Table Description' TBLProperties
('creator'='Varchasa Aggarwal','created at'='18-04-2021');

 Let’s see the metadata of the created table.


hive> describe new_employee;

 Hive allows creating a new table by using the schema of an


existing table.
Schema is the skeleton structure that represents the logical view of
the entire database. It defines how the data is organized and how
the relations among them are associated.
hive> create table if not exists demo.copy_employee like
demo.employee;

Here, we can say that the new table is a copy of an existing table.
Hive — Load Data
Once the internal table has been created, the next step is to load the
data into it.
 Let’s load the data of the file into the database by using the
following command: -
load data local inpath '/home/<username>/hive/emp_details'into
table demo.employee;select * from.demo.employee;

Hive — Drop Table


Let’s delete a specific table from the database.
hive> show databases;
hive> use demo;
hive> show tables;
hive> drop table new_employee;
hive> show tables;

Hive — Alter table


In Hive, we can perform modifications in the existing table like
changing the table name, column name, comments, and table
properties.
 Rename a table
hive> Alter table <old_table_name> rename to <new_table_name>

Let’s check table name changed or not.


hive> show tables;

 Adding a column —
Alter table table_name add columns(columnName datatype);

 Change column —
hive> Alter table_name change <old_column_name>
<new_column_name> datatype;

 Delete or replace column —


alter table employee_data replace columns( id string, first_name
string, age int);

In hive with DML statements, we can add data to the Hive table in 2
different ways.
 Using INSERT Command
 Load Data Statement

1. Using INSERT Command


Syntax:
INSERT INTO TABLE <table_name> VALUES (<add values as per column
entity>);

Example:
To insert data into the table let’s create a table with the name student (By
default hive uses its default database to store hive tables).
Command:
CREATE TABLE IF NOT EXISTS student(

Student_Name STRING,

Student_Rollno INT,

Student_Marks FLOAT)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ',';

We have successfully created the student table in the


Hive default database with the attribute Student_Name, Student_Rollno,
and Student_Marks respectively.
Now, let’s insert data into this table with an INSERT query.

INSERT Query:
INSERT INTO TABLE student VALUES ('Dikshant',1,'95'),('Akshat', 2 ,
'96'),('Dhruv',3,'90');

We can check the data of the student table with the help of the below
command.
SELECT * FROM student;

2. Load Data Statement


Hive provides us the functionality to load pre-created table entities either
from our local file system or from HDFS. The LOAD DATA statement is
used to load data into the hive table.
Syntax:
LOAD DATA [LOCAL] INPATH '<The table data location>' [OVERWRITE]
INTO TABLE <table_name>;

Note:
 The LOCAL Switch specifies that the data we are loading is available
in our Local File System. If the LOCAL switch is not used, the hive will
consider the location as an HDFS path location.
 The OVERWRITE switch allows us to overwrite the table data.
Let’s make a CSV(Comma Separated Values) file with the
name data.csv since we have provided ‘,’ as a field terminator while
creating a table in the hive. We are creating this file in our local file system
at ‘/home/dikshant/Documents’ for demonstration purposes.
Command:
cd /home/dikshant/Documents // To change the directory

touch data.csv // use to create data.csv file

nano data.csv // nano is a linux command line


editor to edit files

cat data.csv // cat is used to see content of


file

LOAD DATA to the student hive table with the help of the below
command.
LOAD DATA LOCAL INPATH '/home/dikshant/Documents/data.csv' INTO
TABLE student;

Let’s see the student table content to observe the effect with the help of
the below command.
SELECT * FROM student;
We can observe that we have successfully added the data to
the student table.

Hive — Partitioning
The partitioning in hive can be done in two ways —
 Static partitioning
 Dynamic partitioning
Static Partitioning
In static or manual partitioning, it is required to pass the values of
partitioned columns manually while loading the data into the table.
Hence, the data file doesn’t contain the partitioned columns.
hive> use test;
hive> create table student (id int, name string, age int,
institute string)
> partitioned by (course string)
> row format delimited
> fields terminated by ',';

 Let’s retrieve the information.


hive> describe student;

 Load the data into the table and pass the values of partition
columns with it by using the following command: -
hive> load data local inpath
'/home/<username>/hive/student_details1' into table student
partition(course= "python");hive> load data local inpath
'/home/<username>/hive/student_details1' into table student
partition(course= "Hadoop");

 Now retrieve the data.


hive> select * from student;
hive> select * from student where course = 'Hadoop';

Dynamic Partitioning
In dynamic partitioning, the values of partitioned columns exist
within the table. So, it is not required to pass the values of
partitioned columns manually.
hive> use show;

 Enable the dynamic partitioning.


hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;

 Create the dummy table.


hive> create table stud_demo(id int, name string, age int,
institute string, course string)
row format delimited
fields terminated by ',';

 Now load the data.


hive> load data local inpath
'/home/<username>/hive/student_details' into table stud_demo;

 Create a partition table.


hive> create table student_part (id int, name string, age int,
institute string)
partitioned by (course string)
row format delimited
fields terminated by ',';

 Insert the data of dummy table in the partition table.


hive> insert into student_part
partition(course)
select id, name age, institute, course
from stud_demo;

 Now you can view the table data with the help
of select command.
HiveQL — Operators
he HiveQL operators facilitate to perform various arithmetic and
relational operations.
hive> use hql;
hive> create table employee (Id int, Name string , Salary float)
row format delimited
fields terminated by ',' ;

 Now load the data.


hive> load data local inpath '/home/<username>/hive/emp_data'
into table employee;

 Fetch the data.


select * from employee;

Arithmetic Operators in Hive

 Adding 50 to salary column.


hive> select id, name, salary + 50 from employee;

 Substracting 50 from the salary column.


hive> select id, name, salary -50 from employee;

 Find out the 10% salary of each employee.


hive> select id, name, salary *10 from employee;

Relational Operators in Hive

 Fetch the details of the employee having salary>=25000.


hive> select * from employee where salary>=25000;

 Fetch the details of the employee having salary<25000.


hive> select * from employee where salary < 25000;

Functions in Hive
hive> use hql;
hive> create table employee_data (Id int, Name string , Salary
float)
row format delimited
fields terminated by ',' ;

 Now load the data.


hive> load data local inpath
'/home/<username>/hive/employee_data' into table employee;
 Fetch the data.
select * from employee_data;

Mathematical Functions in Hive

 Let’s see an example to fetch the square root of each employee’s


salary.
hive> select Id, Name, sqrt(Salary) from employee_data ;

Aggregate Functions
 Let’s see an example to fetch the maximum/minimum salary of
an employee.
hive> select max(Salary) from employee_data;
hive> select min(Salary) from employee_data;

Other functions in Hive

 Let’s see an example to fetch the name of each employee in


uppercase.
hive> select Id, upper(Name) from employee_data;
 Let’s see an example to fetch the name of each employee in
lowercase.
hive> select Id, lower(Name) from employee_data;

GROUP BY Clause
The HQL Group By clause is used to group the data from the
multiple records based on one or more column. It is generally used
in conjunction with the aggregate functions (like SUM, COUNT,
MIN, MAX and AVG) to perform an aggregation over each group.
hive> use hql;
hive> create table employee_data (Id int, Name string , Salary
float)
row format delimited
fields terminated by ',' ;

 Now load the data.


hive> load data local inpath
'/home/<username>/hive/employee_data' into table employee;

 Fetch the data.


select department, sum(salary) from employee_data group by
department;

HAVING CLAUSE
The HQL HAVING clause is used with GROUP BY clause. Its
purpose is to apply constraints on the group of data produced by
GROUP BY clause. Thus, it always returns the data where the
condition is TRUE.
 Let’s fetch the sum of employee’s salary based on department
having sum >= 35000 by using the following command:
hive> select department, sum(salary) from emp group by
department having sum(salary)>=35000;

HiveQL — ORDER BY Clause


In HiveQL, ORDER BY clause performs a complete ordering of the
query result set. Hence, the complete data is passed through a single
reducer. This may take much time in the execution of large datasets.
However, we can use LIMIT to minimize the sorting time.
hive> use hql;
hive> create table employee_data (Id int, Name string , Salary
float)
row format delimited
fields terminated by ',' ;

 Now load the data.


hive> load data local inpath
'/home/<username>/hive/employee_data' into table employee;

 Fetch the data.


select * from emp order by salary desc;

HiveQL — SORT BY Clause


The HiveQL SORT BY clause is an alternative of ORDER BY clause.
It orders the data within each reducer. Hence, it performs the local
ordering, where each reducer’s output is sorted separately. It may
also give a partially ordered result.
 Let’s fetch the data in the descending order by using the
following command:
select * from emp sort by order by salary desc;

You might also like