HIVE

BDa Sem

Uploaded by gaddamlokesh20

HIVE

Is Hadoop a Solution?
Hive Applications
What is Hive?
• Hive is a data warehouse system used to analyze structured
data. It is built on top of Hadoop.
• It was developed by Facebook.
• Hive provides the functionality of reading, writing, and managing
large datasets residing in distributed storage.
• It runs SQL-like queries called HQL (Hive Query Language), which are
internally converted to MapReduce jobs.
• Using Hive, we can skip writing complex MapReduce programs.
• Hive supports Data Definition Language (DDL), Data Manipulation
Language (DML), and User Defined Functions (UDF).
Data Definition Language (DDL) Commands:
• CREATE TABLE: Creates a new table in Hive.
• DROP TABLE: Deletes a table from Hive.
• ALTER TABLE: Modifies the structure of an existing table.
• CREATE DATABASE: Creates a new database in Hive.
• DROP DATABASE: Deletes a database from Hive.
• DESCRIBE: Shows the structure of a table.
• SHOW TABLES: Lists the tables in the current database.
• SHOW DATABASES: Lists the databases in Hive.
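A minimal session sketch of the DDL commands listed above (the database and table names are illustrative, not from the slides):

```sql
-- Create a database and a table, inspect them, then clean up.
CREATE DATABASE IF NOT EXISTS college;
USE college;

CREATE TABLE IF NOT EXISTS student (
  id   INT,
  name STRING
);

SHOW DATABASES;        -- lists all databases, including college
SHOW TABLES;           -- lists tables in the current database
DESCRIBE student;      -- shows the column names and types

ALTER TABLE student ADD COLUMNS (age INT);

DROP TABLE student;
DROP DATABASE college;
```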
Data Manipulation Language (DML) Commands:
• INSERT INTO: Inserts data into a table.
• SELECT: Retrieves data from one or more tables.
• UPDATE: Modifies existing data in a table (limited support in Hive,
often achieved using INSERT INTO or INSERT OVERWRITE).
• DELETE: Deletes data from a table.
• LOAD DATA INPATH: Loads data into a table from a specified path in HDFS.
• EXPORT TABLE: Exports data from a table to HDFS.
• IMPORT TABLE: Imports data from HDFS into a table.
• CREATE INDEX: Creates an index on a table.
• DROP INDEX: Deletes an index from a table.
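A short sketch of the core DML commands in use (the path, values, and column names are illustrative):

```sql
-- Load a file already sitting in HDFS into the employee table.
LOAD DATA INPATH '/data/employee.csv' INTO TABLE employee;

-- Append a single row.
INSERT INTO employee VALUES (101, 'Ravi', 45000.0);

-- Query the data.
SELECT * FROM employee WHERE Salary > 40000;

-- Hive's common substitute for UPDATE: rewrite the table contents.
INSERT OVERWRITE TABLE employee
SELECT Id, Name, Salary * 1.1 FROM employee;
```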
Features of Hive
• Hive is fast and scalable.
• It provides SQL-like queries (i.e., HQL)
that are implicitly transformed to
MapReduce.
• It is capable of analyzing large
datasets stored in HDFS.
• It allows different storage types such
as plain text, csv.
• It uses indexing to accelerate queries.
• It can operate on compressed data
stored in the Hadoop ecosystem.
• It supports user-defined functions
(UDFs), through which users can plug in
their own functionality.
Differences between Hive and Pig
Hive Pig
Hive is commonly used by Data Analysts. Pig is commonly used by programmers.

It follows SQL-like queries. It follows the data-flow language.

It can handle structured data. It can handle semi-structured data.

It works on server-side of HDFS cluster. It works on client-side of HDFS cluster.

Hive is slower than Pig. Pig is comparatively faster than Hive.


Hive Architecture
• Hive Client:
Hive allows writing applications in various languages, including Java,
Python, and C++.
• Thrift Server - It is a cross-language service provider platform.
• JDBC Driver - It is used to establish a connection between hive and
Java applications. The JDBC Driver is present in the class
org.apache.hadoop.hive.jdbc.HiveDriver.
• ODBC Driver - It allows the applications that support the ODBC
protocol to connect to Hive.
• Hive CLI - The Hive CLI (Command Line Interface) is a shell where we
can execute Hive queries and commands.
• Hive Web User Interface - The Hive Web UI is just an alternative of
Hive CLI.
• It provides a web-based GUI for executing Hive queries and
commands.
• Hive MetaStore - It is a central repository that stores all the structure
information of various tables and partitions in the warehouse.
• Hive Server - It is referred to as Apache Thrift Server. It accepts the
request from different clients and provides it to Hive Driver.
• Hive Driver - It receives queries from different sources like web UI,
CLI, Thrift, and JDBC/ODBC driver. It transfers the queries to the
compiler.
HIVE Data Types
Integer Types

Type Size Range


TINYINT   1-byte signed integer   -128 to 127
SMALLINT  2-byte signed integer   -32,768 to 32,767
INT       4-byte signed integer   -2,147,483,648 to 2,147,483,647
BIGINT    8-byte signed integer   -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
Decimal Type
Type     Size     Description

FLOAT    4-byte   Single-precision floating point number
DOUBLE   8-byte   Double-precision floating point number

Sample values:

float_column     | double_column
-----------------+---------------
1234.56774902344 | 1234.5678
12345.6787109375 | 12345.6789
123456.7890625   | 123456.789
1234567.890625   | 1234567.89
Date/Time Types

• It supports traditional UNIX timestamp with optional nanosecond


precision.
• As string, it follows java.sql.Timestamp format "YYYY-MM-DD
HH:MM:SS.fffffffff" (9 decimal place precision)
• The range of the Date type lies between 0000-01-01 and 9999-12-31.
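A small sketch using these types (the table and values are illustrative):

```sql
-- TIMESTAMP holds date and time; DATE holds only the day.
CREATE TABLE event_log (
  event_name STRING,
  event_ts   TIMESTAMP,
  event_day  DATE
);

INSERT INTO event_log VALUES
  ('login', '2023-01-15 10:30:00.123456789', '2023-01-15');

-- Casting a TIMESTAMP to DATE drops the time-of-day part.
SELECT CAST(event_ts AS DATE) FROM event_log;
```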
String Types
• STRING
• The string is a sequence of characters. Its values can be enclosed
within single quotes (') or double quotes (").
• Varchar
• The varchar is a variable-length type whose declared length lies
between 1 and 65535; the length specifies the maximum number of
characters allowed in the character string.
• CHAR
• The char is a fixed-length type whose maximum length is fixed at 255.
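A sketch contrasting the three string types (the table and values are illustrative):

```sql
-- VARCHAR(n) silently truncates values longer than n on insert;
-- CHAR(n) pads shorter values with trailing spaces up to n.
CREATE TABLE name_demo (
  full_name  STRING,
  short_name VARCHAR(10),
  code       CHAR(3)
);

INSERT INTO name_demo VALUES ('James Roy', 'James Roy', 'JR');

SELECT full_name, short_name, code FROM name_demo;
```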
Complex Type
Type    Description                                      Example

Struct  Similar to a C struct or an object; fields are   struct('James','Roy')
        accessed using the "dot" notation.
Map     Contains key-value tuples; the fields are        map('first','James','last','Roy')
        accessed using array notation.
Array   A collection of values of the same type,         array('James','Roy')
        indexable using zero-based integers.
Hive - Create Database
• In Hive, a database is considered a catalog of tables.
• A unique name is assigned to each table.
• Hive provides a default database named default.
• To start hive
• Type command hive in the terminal
• hive>
• To know the list of databases
• hive> show databases;
To create new db
• hive> create database demo;

hive> show databases;


Creating a database with a name that already exists raises an
error; add IF NOT EXISTS to avoid it:
• hive> create database if not exists demo;
Drop database
• hive> show databases;

hive> drop database demo;


Dropping a database that doesn't exist generates an error;
add IF EXISTS to suppress it:
hive> drop database if exists demo;

•In Hive, a database that contains tables cannot be dropped directly. In such a case, we
can drop the database either by dropping its tables first or by using the Cascade keyword with the command.

hive> drop database if exists demo cascade;

This command automatically drops the tables present in the database first.
Hive - Create Table

hive> create table demo.employee (Id int, Name string , Salary float)
row format delimited fields terminated by ',' ;
• hive> describe demo.employee
If the table already exists, CREATE TABLE raises an error;
use IF NOT EXISTS to avoid it:
hive> create table if not exists demo.employee (Id int, Name string , Sal
ary float) row format delimited fields terminated by ',' ;
creating a new table by using the schema of an existing table.
hive> create table if not exists demo.copy_employee
like demo.employee;
External table
To store the file on the created directory use the following command at
the terminal
hdfs dfs -put hive/emp_details /HiveDirectory

hive> create external table emplist (Id int, Name string , Salary float)
row format delimited fields terminated by ',' location '/HiveDirectory';
Hive - Load Data
• Once the internal table has been created, the next step is to load the
data into it.
• load the data of the file into the database by using the following
command:
• LOAD DATA LOCAL INPATH '/home/Desktop/hive/emp_details' INTO
TABLE demo.employee;
Hive - Drop Table
• hive> drop table new_employee;
• Hive - Alter Table
• we can perform modifications in the existing table like changing the
table name, column name, comments, and table properties
• Rename a Table
• Alter table old_table_name rename to new_table_name;
• Adding column
• we can add one or more columns in an existing table by using the
following signature
• Alter table table_name add columns(column_name datatype);
• Change Column
• we can rename a column, change its type
• Alter table table_name change old_column_name new_column_nam
e datatype;
• Alter table employee_data change name first_name string;
Delete or Replace Column
• alter table employee_data replace columns( id string, first_name strin
g, age int);
Built-in Operators
1. Relational Operators
2. Arithmetic Operators
3. Logical Operators
4. Complex Operators
Relational Operators
• These operators are used to compare two operands.
• hive> SELECT * FROM employee WHERE Id=1205;
• hive> SELECT * FROM employee WHERE Salary>=40000;
• Arithmetic Operators
• These operators support various common arithmetic operations on
the operands.
• hive> SELECT 20+30 ADD FROM temp;
Logical Operators
• The operators are logical expressions. All of them return either TRUE
or FALSE.
• hive> SELECT * FROM employee WHERE Salary>40000 AND Dept='TP';
Complex Operators
These operators provide an expression to access the elements of Complex Types.

Operator  Operand                          Description

A[n]      A is an Array and n is an int    Returns the nth element in the array A.
                                           The first element has index 0.
M[key]    M is a Map<K, V> and key         Returns the value corresponding to the
          has type K                       key in the map.
S.x       S is a struct                    Returns the x field of S.
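A sketch combining the three complex operators in one query (the table and column names are illustrative):

```sql
-- Assume: contacts(phones ARRAY<STRING>,
--                  attrs MAP<STRING,STRING>,
--                  addr STRUCT<city:STRING, zip:STRING>)
SELECT
  phones[0],        -- A[n]: first element of the array
  attrs['email'],   -- M[key]: value stored under the key 'email'
  addr.city         -- S.x: the city field of the struct
FROM contacts;
```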
Example
SELECT emp_id, emp_name,
CASE
WHEN salary > 50000 THEN 'High'
WHEN salary > 40000 THEN 'Medium'
ELSE 'Low'
END AS salary_category
FROM employee;
Arrays
CREATE TABLE array_example (
id INT,
names ARRAY<STRING>
);

INSERT INTO array_example VALUES


(1, array('John', 'Alice', 'Bob')),
(2, array('Sarah', 'Michael'));

SELECT id, names[0] AS first_name FROM array_example;


structure
CREATE TABLE struct_example (
id INT,
employee STRUCT<name: STRING, age: INT, department: STRING>
);

INSERT INTO struct_example VALUES


(1, STRUCT('John', 30, 'HR')),
(2, STRUCT('Alice', 25, 'IT'));

SELECT id, employee.name AS name, employee.age AS age FROM


struct_example;
Map
CREATE TABLE map_example (
id INT,
scores MAP<STRING, INT>
);
INSERT INTO map_example VALUES
(1, map('math', 90, 'science', 85)),
(2, map('math', 95, 'science', 88));

SELECT id, scores['math'] AS math_score, scores['science'] AS


science_score FROM map_example;
Builtin functions
• round() function
• hive> SELECT round(2.6) from temp;
• floor() function
• hive> SELECT floor(2.6) from temp;
• ceil() function
• hive> SELECT ceil(2.6) from temp;
• SELECT SUM(salary) AS total_salary FROM employee;
• SELECT AVG(salary) AS avg_salary FROM employee;
• SELECT COUNT(*) AS total_employees FROM employee;
• SELECT MIN(salary) AS min_salary FROM employee;
• SELECT MAX(salary) AS max_salary FROM employee;
Partitioning in Hive
• The partitioning in Hive means dividing the table into some parts
based on the values of a particular column like date, course, city or
country.
• The advantage of partitioning is that since the data is stored in slices,
the query response time becomes faster.
• we have a data of 10 million students studying in an institute.
• Now, we have to fetch the students of a particular course.
• Traditional approach, we have to go through the entire data. This
leads to performance degradation.
• Partitioning in Hive and divide the data among the different datasets
based on particular columns.
The partitioning in Hive can be executed in two
ways
• Static partitioning
• Dynamic partitioning

To see the list of table directories, type the command in the terminal:

hadoop fs -ls /user/hive/warehouse/
Static Partitioning
• In static or manual partitioning, it is required to pass the values of
partitioned columns manually while loading the data into the table.
• Hence, the data file doesn't contain the partitioned columns.
• Example of Static Partitioning
• First, select the database in which we want to create a table.
• hive> use test;
• Create the table and provide the partitioned columns by using the
following command: -
hive> create table student (id int, name string, age int, institute string)
partitioned by (course string)
row format delimited fields terminated by ',';
• hive> describe student;

hive> load data local inpath '/home/cloudera/Desktop/hive/student_details1' into table


student partition(course= "java");
• hive> load data local inpath '/home/Desktop/cloudera/student_details2'
into table student partition(course= "hadoop");
hive> select * from student;
hive> select * from student where course="java";
hive> select * from student where course= "hadoop";
• Insert into table student partition (city='Chennai') select id, name,
age, institute from student where city='Delhi';
To know partitioning on a table
hive> show partitions table_name;
Dynamic Partitioning
• In dynamic partitioning, the values of partitioned columns exist within
the table.
• So, it is not required to pass the values of partitioned columns
manually.

• Insert into table student partition (city) select id, name, age,
institute, city from student where city='Delhi';
First, select the database in which we want to
create a table.
• hive> use show;
• Enable the dynamic partition by using the following commands: -
• hive> set hive.exec.dynamic.partition=true;
• hive> set hive.exec.dynamic.partition.mode=nonstrict;
• Create a table to store the data.
• hive> create table stud_demo(id int, name string, age int, institute
string, course string) row format delimited fields terminated by ',';
load the data into the table.
• hive> load data local inpath '/home/cloudera/Desktop/hive/student_
details' into table stud_demo;
• Create a partition table by using the following command: -
• hive> create table student_part (id int, name string, age int, institute
string) partitioned by (course string) row format delimited
fields terminated by ',';
insert the data of table into the partition table.
hive> insert into student_part partition(course) select id, name, age, in
stitute, course from stud_demo;
hive> select * from student_part;
• hive> select * from student_part where course="java";
hive> select * from student_part where course="hadoop";
Bucketing in Hive
• The bucketing in Hive is a data organizing technique.
• It is similar to partitioning in Hive with an added functionality that it
divides large datasets into more manageable parts known as buckets.
• So, we can use bucketing in Hive when the implementation of
partitioning becomes difficult.
• However, we can also divide partitions further in buckets.
Working of Bucketing in Hive

•The concept of bucketing is based on the hashing technique.

•Here, the modulus of the hash of the column value and the number of required buckets is
calculated (say, hash(x) % 3).
•Based on the resulting value, the row is stored in the corresponding bucket.
Example of Bucketing in Hive

• hive> use showbucket;


• hive> create table emp_demo (Id int, Name string , Salary float)
row format delimited fields terminated by ',' ;
load the data into the table.
• hive> load data local inpath '/home/hive/emp_details' into table emp_
demo;
• Enable the bucketing by using the following command: -
• hive> set hive.enforce.bucketing = true;
• Create a bucketing table by using the following command: -
• hive> create table emp_bucket(Id int, Name string , Salary float)
clustered by (Id) into 3 buckets row format delimited fields terminated
by ',' ;
hive> insert overwrite table emp_bucket select *
from emp_demo;

• we can see that the data is divided into three buckets.


retrieve the data of bucket 0.
retrieve the data of bucket 1.
According to the hash function:
8 % 3 = 2
5 % 3 = 2
2 % 3 = 2
So, these rows are stored in bucket 2.
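To read a single bucket directly, Hive offers the TABLESAMPLE clause; a sketch against the emp_bucket table above (note that bucket numbering in TABLESAMPLE is 1-based, so 0-based bucket 2 is BUCKET 3):

```sql
-- Retrieve only the rows that hashed into the third bucket
-- (the bucket holding Ids 2, 5, 8 in the example above).
SELECT * FROM emp_bucket
TABLESAMPLE(BUCKET 3 OUT OF 3 ON Id);
```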
Views:
• The usage of a view in Hive is the same as that of a view in SQL.
• It is a standard RDBMS concept.
• We can run SELECT queries on a view just as on a table (Hive views
are logical and read-only, not materialized).
• Creating a View
• hive> CREATE VIEW emp_30000 AS SELECT * FROM employee WHERE salary>30000;

Dropping a View
hive> DROP VIEW emp_30000;
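Once created, a view is queried like any table; a sketch using the emp_30000 view above (the name column is assumed from the employee table):

```sql
-- The view's defining query runs each time the view is referenced.
SELECT name, salary
FROM emp_30000
WHERE salary < 50000;
```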
Creating an Index
• An Index is nothing but a pointer on a particular column of a table.
hive> CREATE INDEX index_salary ON TABLE employee(salary)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
• If the column is modified, the changes are stored using an index
value.
• Dropping an Index
hive> DROP INDEX index_salary ON employee;
Illustrate the PIG string functions and date and time
functions with student database
• student_id, name, dob, address
• 1, A Gopi, 1995-07-15, 123 Hyderabad
• 2, B Ravi, 1998-03-20, 456 Banglore
• 3, C Krishna, 1996-11-10, 789 Chennai
-- Load student data
• student_data = LOAD 'student_info' USING PigStorage(',') AS
(student_id:int, name:chararray, dob:chararray, address:chararray);
• -- Convert names to uppercase
student_names_upper = FOREACH student_data GENERATE
UPPER(name) AS name_upper;

DUMP student_names_upper;

-- Output: (A GOPI), (B RAVI), (C KRISHNA)


-- Extract initials from names
• student_initials = FOREACH student_data GENERATE name,
SUBSTRING(name, 0, 1) AS initial;

• DUMP student_initials;

• -- Output: (A Gopi,A), (B Ravi,B), (C Krishna,C)


-- Concatenate name and address

• student_name_address = FOREACH student_data GENERATE name,


address, CONCAT(name, ', ', address) AS name_address;
• DUMP student_name_address;

• -- Output: (A Gopi, 123 Hyderabad, A Gopi, 123 Hyderabad),
(B Ravi, 456 Banglore, B Ravi, 456 Banglore),
(C Krishna, 789 Chennai, C Krishna, 789 Chennai)
-- Calculate age from date of birth

• student_age = FOREACH student_data GENERATE name, dob,
(int)((ToUnixTime(CurrentTime()) - ToUnixTime(ToDate(dob,
'yyyy-MM-dd'))) / (365 * 24 * 60 * 60)) AS age;

• DUMP student_age;

• -- Output: (A Gopi, 1995-07-15, 28), (B Ravi, 1998-03-20, 24), (C


Krishna, 1996-11-10, 25)
-- Extract year from date of birth

• student_dob_year = FOREACH student_data GENERATE name, dob,


GetYear(ToDate(dob, 'yyyy-MM-dd')) AS dob_year;
• DUMP student_dob_year;

• -- Output:
• (A Gopi, 1995-07-15, 1995), (B Ravi, 1998-03-20, 1998), (C Krishna,
1996-11-10, 1996)
-- Extract date from date of birth
• student_date = FOREACH student_data GENERATE name, dob, GetDay(ToDate(dob, 'yyyy-MM-
dd')) AS dob_day;

• DUMP student_date;

• -- Extract month from date of birth


• student_month = FOREACH student_data GENERATE name, dob, GetMonth(ToDate(dob, 'yyyy-
MM-dd')) AS dob_month;

• -- Display the contents of the student_month relation

• DUMP student_month;
