Hive Commands Syn
Drop database
LOAD DATA
Inner joins
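A minimal sketch of an inner join, assuming the employee and employeedepartment tables used in the outer-join queries of this section. The source does not show the table definitions, so taking empName from employee and department from employeedepartment is an assumption:

```sql
-- Inner join: returns only the employees that have a matching row
-- in employeedepartment (unmatched rows on either side are dropped).
-- Column ownership (e.empName, ed.department) is assumed, not given by the source.
SELECT e.empId, e.empName, ed.department
FROM employee e
JOIN employeedepartment ed ON (e.empId = ed.empId);
```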
SELECT e.empId, empName, department FROM employee e RIGHT OUTER JOIN employeedepartment ed ON (e.empId = ed.empId);
SELECT e.empId, empName, department FROM employee e FULL OUTER JOIN employeedepartment ed ON (e.empId = ed.empId);
HiveQL - Operators
hive> create table employee (Id int, Name string , Salary float)
row format delimited
fields terminated by ',' ;
Let's see an example that calculates 10% of each employee's salary.
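A minimal sketch of such a query, assuming the employee table created above (columns Id, Name, Salary). The alias ten_percent is illustrative only:

```sql
-- Arithmetic operator *: multiply each salary by 0.10 to get 10% of it
SELECT Id, Name, Salary * 0.10 AS ten_percent
FROM employee;
```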
Operator    Description
A = B       Returns true if A equals B, otherwise false.
Let's see an example that fetches the details of the employees whose salary is at least 25000 (salary >= 25000).
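A minimal sketch of such a query, again assuming the employee table created above:

```sql
-- Relational operator >=: keep only rows whose Salary is 25000 or more
SELECT *
FROM employee
WHERE Salary >= 25000;
```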
Let's see an example that computes the square root of each employee's salary.
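A minimal sketch using Hive's built-in sqrt() function on the same employee table. The alias salary_sqrt is illustrative only:

```sql
-- sqrt() is a built-in mathematical function; it returns a double
SELECT Id, Name, sqrt(Salary) AS salary_sqrt
FROM employee;
```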
The Hive Query Language provides GROUP BY and HAVING clauses that work much as they do
in SQL. Here, we run these clauses on the records of the emp table created below.
GROUP BY Clause
The HQL GROUP BY clause groups records based on one or more columns. It is generally
used in conjunction with aggregate functions (such as SUM, COUNT, MIN, MAX and AVG)
to perform an aggregation over each group.
hive> create table emp (Id int, Name string , Salary float, Department string)
row format delimited
fields terminated by ',' ;
Now, fetch the sum of employee salaries for each department.
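A minimal sketch of the department-wise sum, assuming the emp table created above. The alias total_salary is illustrative only:

```sql
-- One output row per department, holding the total salary of that group
SELECT Department, sum(Salary) AS total_salary
FROM emp
GROUP BY Department;
```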
HAVING Clause
The HQL HAVING clause is used with the GROUP BY clause. Its purpose is to apply
constraints on the groups of data produced by GROUP BY. It returns only the groups
for which the condition is TRUE.
In this example, we fetch the sum of employees' salaries by department and apply a
constraint on that sum by using the HAVING clause. Let's fetch the departments whose
salary sum is >= 35000 by using the following command:
hive> select department, sum(salary) from emp group by department having sum(salary)>=35000;
Apache Hive View and Hive Index
An Apache Hive view is a searchable object in a database that is defined by a query.
A view does not store any data itself, which is why views are sometimes called
"virtual tables". We can query a view just as we query a table. By using joins, a view
can combine data from two or more tables, and it can also expose only a subset of the
information.
i. Apache Hive View Syntax
CREATE VIEW <VIEWNAME> AS SELECT
Suppose there is an employee table with the fields Id, Name, Salary, Designation, and
Dept. We write a query to retrieve the details of the employees who earn a salary of
more than Rs 35000, and we store the result in a view named emp_35000.
Table 1- Apache Hive View
For the above scenario, the following query creates the view:
hive> CREATE VIEW emp_35000 AS
SELECT * FROM employee
WHERE salary>35000;
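Once defined, the view can be queried like a table. A minimal sketch, assuming the emp_35000 view above and the Name and Salary fields of the employee table:

```sql
-- Read all rows exposed by the view
SELECT * FROM emp_35000;

-- A view accepts further projection and sorting, just like a table
SELECT Name, Salary
FROM emp_35000
ORDER BY Salary DESC;
```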
iv. Dropping a Hive View
To drop a Hive view, use the following syntax:
DROP VIEW view_name
The following query drops the view named emp_35000:
hive> DROP VIEW emp_35000;
b. What is Apache Hive Index?
An index in Hive is a pointer to a particular column of a table.
ii. Creating an Index in Hive
Creating an Apache Hive index means creating a pointer on a particular column of a
table. The syntax for creating an index in Hive is:
CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
[ ROW FORMAT ...] STORED AS ...
| STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
iii. Apache Hive Index Example
Take the same employee table used earlier, with the fields Id, Name, Salary,
Designation, and Dept. Here we create an index named index_salary on the salary
column of the employee table.
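A minimal sketch of that statement, following the syntax above. The source does not name a handler class, so the compact index handler is an assumption, as is the WITH DEFERRED REBUILD option. Note that index support was removed in Hive 3.0, so this applies to earlier releases only:

```sql
-- Create a compact index on the salary column (Hive releases before 3.0).
-- The handler class is an assumed choice; the source does not specify one.
CREATE INDEX index_salary
ON TABLE employee (salary)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

-- WITH DEFERRED REBUILD leaves the index empty until it is rebuilt explicitly.
ALTER INDEX index_salary ON employee REBUILD;
```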
The index is a pointer to the salary column. If the column is modified, the changes
are stored using an index value.