Hive File Format

The document provides an overview of Hive file formats, including Text File, Sequence File, and RCFile, along with their pros and cons. It also covers Hive Query Language (HQL), including DDL and DML statements, internal and external tables, partitioning, bucketing, views, subqueries, joins, and aggregation functions. Additionally, it explains how to load and query data efficiently in Hive.

Uploaded by

mohammed.twaha08

HIVE FILE FORMAT

• The file formats in Hive specify how records are encoded in a file.
• The main file formats in Hive are:
1. Text File
2. Sequence File (SequenceFile)
3. RCFile (Record Columnar File)
TEXT FILE
• Text file is the default file format.
• In this format, each record is a line in the file.
• Control characters are used as delimiters.
• The default delimiters are ^A (octal 001, separates fields), ^B (octal 002, separates the elements of an array or struct and the entries of a map), ^C (octal 003, separates a map key from its value), and \n (separates records).
• Common text layouts are CSV and TSV. JSON or XML documents can also be stored as text files.
• Pros
Simple and easy to use
Human-readable
• Cons
Inefficient storage (no built-in compression or compact encoding)
Slow performance, because every row must be read and parsed (no indexing)
Example:
CREATE TABLE students (
  id INT,
  name STRING,
  marks FLOAT
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
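To see how the default delimiters nest, here is a minimal Python sketch that splits one default-delimited line into column values. It assumes a hypothetical schema (id INT, name STRING, subjects ARRAY<STRING>, marks MAP<STRING,STRING>). It ignores NULL markers and deeper nesting levels, so it illustrates the idea rather than reproducing Hive's SerDe.

```python
# Hive's default delimiters for the TEXTFILE format (control characters).
FIELD_DELIM = "\x01"       # ^A: separates columns
COLLECTION_DELIM = "\x02"  # ^B: separates array elements, struct fields, map entries
MAP_KEY_DELIM = "\x03"     # ^C: separates a map key from its value


def parse_row(line, schema):
    """Split one text-file line into Python values.

    schema: list of (column_name, kind) pairs, where kind is one of
    "int", "string", "array" (array<string>) or "map" (map<string,string>).
    """
    fields = line.rstrip("\n").split(FIELD_DELIM)
    row = {}
    for (name, kind), raw in zip(schema, fields):
        if kind == "int":
            row[name] = int(raw)
        elif kind == "array":
            row[name] = raw.split(COLLECTION_DELIM)
        elif kind == "map":
            entries = raw.split(COLLECTION_DELIM)
            row[name] = dict(entry.split(MAP_KEY_DELIM, 1) for entry in entries)
        else:
            row[name] = raw
    return row


# Hypothetical schema and record, for illustration only.
schema = [("id", "int"), ("name", "string"),
          ("subjects", "array"), ("marks", "map")]
line = "1\x01Asha\x01math\x02physics\x01math\x0390\x02physics\x0385\n"
print(parse_row(line, schema))
# {'id': 1, 'name': 'Asha', 'subjects': ['math', 'physics'], 'marks': {'math': '90', 'physics': '85'}}
```

The students example table overrides only the field delimiter (','). The collection and map-key delimiters keep their ^B/^C defaults.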
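SEQUENCE FILE
• Sequence files are flat files that store binary key-value pairs.
• They support record-level and block-level compression, which reduces storage and I/O requirements. Compressed sequence files remain splittable, so they can still be processed in parallel.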
• Pros:
Supports compression (Snappy, Gzip, etc.)
Faster read/write compared to text files.
• Cons:
Not human-readable
Example:
CREATE TABLE students_seq (
  id INT,
  name STRING,
  marks FLOAT
) STORED AS SEQUENCEFILE;
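The idea of "binary key-value pairs with block compression" can be sketched in Python as a toy model. Each record is written as length-prefixed key and value bytes, and the whole block is compressed with zlib. This is an assumption-laden simplification, not the real SequenceFile on-disk layout, which also has a file header, sync markers, and Writable-serialized keys and values.

```python
import struct
import zlib


def write_block(records):
    """Serialize (key, value) byte pairs, then compress the whole block.

    Each record is stored as: 4-byte key length, key bytes,
    4-byte value length, value bytes (big-endian lengths).
    """
    buf = bytearray()
    for key, value in records:
        buf += struct.pack(">I", len(key)) + key
        buf += struct.pack(">I", len(value)) + value
    return zlib.compress(bytes(buf))


def read_block(blob):
    """Decompress a block and decode it back into (key, value) pairs."""
    data = zlib.decompress(blob)
    records, pos = [], 0
    while pos < len(data):
        (key_len,) = struct.unpack_from(">I", data, pos)
        pos += 4
        key = data[pos:pos + key_len]
        pos += key_len
        (value_len,) = struct.unpack_from(">I", data, pos)
        pos += 4
        value = data[pos:pos + value_len]
        pos += value_len
        records.append((key, value))
    return records


# Hypothetical records: key = id, value = remaining columns.
rows = [(b"1", b"Asha,88.5"), (b"2", b"Ravi,91.0")]
blob = write_block(rows)
print(read_block(blob) == rows)  # True
```

Compressing a block of many records, rather than each record separately, usually gives a better compression ratio. That is the motivation behind block-level compression.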
RCFile(Record Columnar File)
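• RCFile stores data in a column-oriented manner. The table is first split horizontally into row groups, and within each row group the values are stored column by column. A query that reads only a few columns can therefore skip the others.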
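The row-group layout can be sketched in Python. Rows are cut into groups, each group is transposed into per-column lists, and a single column can then be read on its own. The group size and sample rows are hypothetical, and real RCFile also compresses each column and keeps per-group metadata.

```python
def to_row_groups(rows, group_size):
    """Split rows into row groups, storing each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # zip(*chunk) transposes the chunk: one tuple per column.
        groups.append([list(column) for column in zip(*chunk)])
    return groups


def read_column(groups, index):
    """Read one column from every row group, ignoring the other columns."""
    return [value for group in groups for value in group[index]]


rows = [(1, "Asha", 88.5), (2, "Ravi", 91.0), (3, "Meena", 79.0)]
groups = to_row_groups(rows, group_size=2)
# groups == [[[1, 2], ['Asha', 'Ravi'], [88.5, 91.0]], [[3], ['Meena'], [79.0]]]
print(read_column(groups, 2))  # [88.5, 91.0, 79.0]
```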
HIVE QUERY LANGUAGE (HQL)
• Hive Query Language provides basic SQL-like operations.
• Tasks it supports include:
1. Creating and managing tables and partitions.
2. Supporting various relational, arithmetic, and logical operators.
3. Evaluating functions.
4. Exporting the contents of a table to a local directory, or the results of queries to an HDFS directory.
DDL (Data Definition Language) Statements
• Used to build and modify tables and other objects in the database.
• Commands:
1. Create/Drop/Alter Database
2. Create/Drop/Truncate Table
3. Alter Table/Partition/Column
4. Create/Drop/Alter View
5. Create/Drop/Alter Index (indexes were removed in Hive 3.0)
6. Show
7. Describe
DML (Data Manipulation Language) Statements
• Used to retrieve, store, modify, delete, and update data in the database.
• Commands:
1. Loading files into table.
2. Inserting data into Hive Tables from queries.
• Starting Hive Shell:
Database
• A database is like a container for data. It has a collection of tables that
house the data.
Tables
Two kinds of table
1. Internal or Managed Table
2. External Table

Internal Tables (Managed Tables)

1. Hive stores managed tables under its warehouse directory (set by hive.metastore.warehouse.dir; /user/hive/warehouse by default).
2. Hive manages the entire lifecycle of these tables.
3. When an internal table is dropped, both data and metadata are removed.
To check if a table is managed or external:
DESCRIBE FORMATTED STUDENT;
It shows metadata, including the table type (MANAGED_TABLE or EXTERNAL_TABLE).
External (Self-Managed) Tables
1. Dropping an external table does NOT delete its data.
2. Use the EXTERNAL keyword to create one.
3. Specify where the data lives with the LOCATION clause.
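The difference in DROP behaviour can be modelled in Python with a toy "metastore" (table metadata) and a dict standing in for HDFS (data files). The class, table names, and paths are hypothetical. The point is only that dropping removes the metadata in both cases, and removes the data only for a managed table.

```python
class ToyMetastore:
    """Toy model of how DROP TABLE treats managed vs external tables."""

    def __init__(self):
        self.tables = {}  # table name -> (table type, data location)
        self.files = {}   # data location -> list of rows (stands in for HDFS)

    def create(self, name, location, external=False):
        table_type = "EXTERNAL_TABLE" if external else "MANAGED_TABLE"
        self.tables[name] = (table_type, location)
        self.files.setdefault(location, [])

    def drop(self, name):
        table_type, location = self.tables.pop(name)  # metadata is always removed
        if table_type == "MANAGED_TABLE":
            del self.files[location]                  # managed: data is removed too


ms = ToyMetastore()
ms.create("student", "/user/hive/warehouse/student")
ms.create("ext_student", "/data/students", external=True)
ms.files["/user/hive/warehouse/student"].append((1, "Asha"))
ms.files["/data/students"].append((2, "Ravi"))
ms.drop("student")
ms.drop("ext_student")
print(ms.files)  # {'/data/students': [(2, 'Ravi')]}
```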
Loading Data into a Table

Difference Between INTO TABLE and OVERWRITE INTO TABLE

INTO TABLE: Appends the new data to the existing table.
OVERWRITE INTO TABLE: Replaces the existing data with the new data.
Example Scenario:
EXT_STUDENT table has 100 records.
student.tsv contains 10 records.
LOAD DATA ... INTO TABLE EXT_STUDENT; → Table now has 110 records.
LOAD DATA ... OVERWRITE INTO TABLE EXT_STUDENT; → Table now has 10 records.
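The append-vs-replace arithmetic in the scenario can be sketched in Python. The function name and the placeholder rows are hypothetical. Real LOAD DATA moves files into the table directory rather than handling rows, but the visible result is the same.

```python
def load_data(table_rows, new_rows, overwrite=False):
    """Return the table contents after a LOAD DATA.

    overwrite=False mimics INTO TABLE (append);
    overwrite=True mimics OVERWRITE INTO TABLE (replace).
    """
    if overwrite:
        return list(new_rows)
    return table_rows + list(new_rows)


ext_student = [("existing", i) for i in range(100)]  # 100 existing records
student_tsv = [("new", i) for i in range(10)]         # 10 records in the file
print(len(load_data(ext_student, student_tsv)))                  # 110
print(len(load_data(ext_student, student_tsv, overwrite=True)))  # 10
```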
Collection Data Types
Partitions
Without partitioning, a query scans the entire table, even when it has a WHERE filter. Partitioning reduces I/O and improves query performance by letting Hive read only the relevant partitions.
Example Scenario:
• XYZ Enterprise has customer data across multiple US states. The IT team needs state-wide sales reports.
• Two options:
1. Without Partitioning
Run queries with WHERE StateName = "A", scanning all records every time. Inefficient for large
datasets.
2. With Partitioning
Create separate folders for each state. Querying a state scans only its folder.
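The two options can be compared with a small Python sketch that counts how many rows each approach reads. It assumes hypothetical sales rows and uses Hive's column=value naming for the partition folders. Unlike real Hive, the rows here still carry the partition column.

```python
from collections import defaultdict


def partition_by(rows, column):
    """Group rows into 'folders' named column=value, as partitioned tables do."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"{column}={row[column]}"].append(row)
    return dict(partitions)


def query_without_partitions(rows, state):
    # Option 1: every row is read and the WHERE filter is applied to each.
    matches = [row for row in rows if row["state"] == state]
    return matches, len(rows)  # (result, rows scanned)


def query_with_partitions(partitions, state):
    # Option 2: only the folder for the requested state is read.
    matches = partitions.get(f"state={state}", [])
    return matches, len(matches)  # (result, rows scanned)


sales = [{"state": s, "amount": a}
         for s, a in [("A", 10), ("B", 20), ("A", 5), ("C", 7), ("B", 3)]]
parts = partition_by(sales, "state")
print(sorted(parts))                             # ['state=A', 'state=B', 'state=C']
print(query_without_partitions(sales, "A")[1])   # 5 rows scanned
print(query_with_partitions(parts, "A")[1])      # 2 rows scanned
```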

Types of Partitioning
1. Static Partitioning
User manually assigns data to partitions.
Example: Placing State A’s data in a folder named State_A.
2. Dynamic Partitioning
Hive automatically creates partitions based on unique values in a column.
Static Partitioning

Adding Another Partition:

ALTER TABLE STATIC_PART_STUDENT ADD PARTITION (gpa = 3.5);
Dynamic Partitioning
Objective: Create a dynamic partition based on the gpa column.
Table Creation:
CREATE TABLE IF NOT EXISTS DYNAMIC_PART_STUDENT (
rollno INT,
name STRING
)
PARTITIONED BY (gpa FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
Enabling Dynamic Partitioning:
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
Loading Data into Dynamic Partition:
INSERT OVERWRITE TABLE DYNAMIC_PART_STUDENT
PARTITION (gpa)
SELECT rollno, name, gpa FROM EXT_STUDENT;
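The dynamic INSERT above can be sketched in Python. The last SELECT column (gpa) chooses the partition folder, and only the remaining columns go into the data rows. Hive requires dynamic partition columns to come last in the SELECT list. The sample rows are hypothetical.

```python
from collections import defaultdict


def dynamic_partition_insert(select_rows):
    """Mimic INSERT OVERWRITE ... PARTITION (gpa) SELECT rollno, name, gpa.

    The last SELECT column is the dynamic partition column: its value picks
    the partition folder and is not written into the data row itself.
    """
    partitions = defaultdict(list)
    for *data_columns, gpa in select_rows:
        partitions[f"gpa={gpa}"].append(tuple(data_columns))
    return dict(partitions)


ext_student = [(1, "Asha", 3.5), (2, "Ravi", 4.0), (3, "Meena", 3.5)]
print(dynamic_partition_insert(ext_student))
# {'gpa=3.5': [(1, 'Asha'), (3, 'Meena')], 'gpa=4.0': [(2, 'Ravi')]}
```

No partition values are listed by hand. They come from the distinct gpa values in the data, which is why nonstrict mode must be enabled first.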
Bucketing
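• Partitioning creates a directory for each unique value of the partition column.
• Bucketing instead splits data into a fixed number of buckets (files). Each row goes to bucket hash(column) mod N, which avoids creating thousands of partitions for a column with many distinct values.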
Objective: Implement Bucketing.
Table Creation:
CREATE TABLE IF NOT EXISTS STUDENT (
rollno INT,
name STRING,
grade FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
Loading Data:
LOAD DATA LOCAL INPATH '/root/hivedemos/student.tsv' INTO TABLE STUDENT;
Enabling Bucketing (needed in Hive 1.x; Hive 2.0 and later always enforce bucketing):
SET hive.enforce.bucketing = true;
Objective: Create a bucketed table with 3 buckets.
Table Creation:
CREATE TABLE IF NOT EXISTS STUDENT_BUCKET (
rollno INT,
name STRING,
grade FLOAT
)
CLUSTERED BY (grade) INTO 3 BUCKETS;
Loading Data:
FROM STUDENT
INSERT OVERWRITE TABLE STUDENT_BUCKET
SELECT rollno, name, grade;
Querying Buckets:
SELECT DISTINCT GRADE FROM STUDENT_BUCKET
TABLESAMPLE(BUCKET 1 OUT OF 3 ON GRADE);
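Bucket assignment and bucket sampling can be sketched in Python. Python's built-in hash() stands in for Hive's own hash function, so the bucket a given grade lands in will differ from real Hive. The property that matters still holds: equal grades always share a bucket. The sampling helper covers only the case where y equals the table's bucket count. The sample rows are hypothetical.

```python
def assign_buckets(rows, key_index, num_buckets):
    """Place each row in bucket hash(key) mod num_buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        # Python's % with a positive divisor is never negative.
        buckets[hash(row[key_index]) % num_buckets].append(row)
    return buckets


def tablesample(buckets, x):
    """TABLESAMPLE(BUCKET x OUT OF len(buckets)); bucket numbers start at 1."""
    return buckets[x - 1]


students = [(1, "Asha", 3.5), (2, "Ravi", 4.0), (3, "Meena", 3.5), (4, "Kiran", 2.0)]
buckets = assign_buckets(students, key_index=2, num_buckets=3)
sample = tablesample(buckets, 1)
print(sorted({row[2] for row in sample}))  # distinct grades found in bucket 1
```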
Views
Views are logical objects in Hive (available from version 0.6). Views do not store data; they act as saved queries.

Objective: Create a view named STUDENT_VIEW.

CREATE VIEW STUDENT_VIEW AS
SELECT rollno, name FROM EXT_STUDENT;
Querying a View:
SELECT * FROM STUDENT_VIEW LIMIT 4;
Dropping a View:
DROP VIEW STUDENT_VIEW;
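"Saved query, no stored data" can be illustrated in Python by treating the view as a function over the base table. Because the function is evaluated on every call, rows added to the base table show up in the view automatically. The function name and rows are hypothetical.

```python
ext_student = [(1, "Asha", 3.5), (2, "Ravi", 4.0)]  # hypothetical base table


def student_view():
    """Like STUDENT_VIEW: stores only the query, recomputed on every read."""
    return [(rollno, name) for rollno, name, _gpa in ext_student]


print(student_view()[:4])  # [(1, 'Asha'), (2, 'Ravi')]  (LIMIT 4)
ext_student.append((3, "Meena", 3.5))
print(student_view()[:4])  # [(1, 'Asha'), (2, 'Ravi'), (3, 'Meena')]
```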
Subqueries
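In Hive 0.12 and earlier, subqueries are supported only in the FROM clause (Hive 0.13 added IN and EXISTS subqueries in the WHERE clause). A subquery in the FROM clause must be given an alias, and its columns must have unique names.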
Objective: Count occurrences of words in a file.
CREATE TABLE docs (line STRING);

LOAD DATA LOCAL INPATH '/root/hivedemos/lines.txt'
OVERWRITE INTO TABLE docs;

CREATE TABLE word_count AS
SELECT word, COUNT(1) AS count FROM
(SELECT explode(split(line, ' ')) AS word FROM docs) w
GROUP BY word
ORDER BY word;

SELECT * FROM word_count;

split() breaks each line into an array of words, and explode() turns each array element into a separate row.
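The word-count pipeline maps step by step onto a short Python sketch: split plus explode becomes a flattening list comprehension, GROUP BY/COUNT becomes a Counter, and ORDER BY becomes sorting. The input lines are hypothetical. Like split(line, ' '), splitting on a single space yields empty strings when a line contains consecutive spaces.

```python
from collections import Counter


def word_count(lines):
    """Mimic: SELECT word, COUNT(1) FROM (SELECT explode(split(line, ' ')) ...) w
    GROUP BY word ORDER BY word."""
    words = [word for line in lines for word in line.split(" ")]  # split + explode
    counts = Counter(words)                                      # GROUP BY word, COUNT(1)
    return sorted(counts.items())                                # ORDER BY word


docs = ["hive is fast", "hive is sql"]
print(word_count(docs))
# [('fast', 1), ('hive', 2), ('is', 2), ('sql', 1)]
```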
Joins
Joins work similarly to SQL.
Objective: Join STUDENT and DEPARTMENT tables using rollno.
Table Creation:
CREATE TABLE IF NOT EXISTS STUDENT (
rollno INT,
name STRING,
gpa FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

CREATE TABLE IF NOT EXISTS DEPARTMENT (
rollno INT,
deptno INT,
name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
Loading Data:
LOAD DATA LOCAL INPATH '/root/hivedemos/student.tsv'
OVERWRITE INTO TABLE STUDENT;

LOAD DATA LOCAL INPATH '/root/hivedemos/department.tsv'
OVERWRITE INTO TABLE DEPARTMENT;
Performing a Join:
SELECT a.rollno, a.name, a.gpa, b.deptno
FROM STUDENT a
JOIN DEPARTMENT b
ON a.rollno = b.rollno;
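The semantics of this inner join can be sketched in Python as a simple hash join: index DEPARTMENT by rollno, then look up each STUDENT row. This shows only what the result contains, not how Hive physically executes the join. The sample rows are hypothetical. Students with no matching department row are dropped, as in any inner join.

```python
def inner_join(students, departments):
    """STUDENT a JOIN DEPARTMENT b ON a.rollno = b.rollno,
    returning (a.rollno, a.name, a.gpa, b.deptno)."""
    by_rollno = {}
    for rollno, deptno, _name in departments:  # build an index on the join key
        by_rollno.setdefault(rollno, []).append(deptno)
    result = []
    for rollno, name, gpa in students:         # probe the index for each student
        for deptno in by_rollno.get(rollno, []):
            result.append((rollno, name, gpa, deptno))
    return result


student = [(1, "Asha", 3.5), (2, "Ravi", 4.0), (3, "Meena", 3.1)]
department = [(1, 10, "CSE"), (2, 20, "ECE")]
print(inner_join(student, department))
# [(1, 'Asha', 3.5, 10), (2, 'Ravi', 4.0, 20)]
```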

Aggregation
Hive supports aggregate functions such as AVG, COUNT, SUM, MIN, and MAX.
Objective: Perform aggregation functions.
SELECT AVG(gpa) FROM STUDENT;
SELECT COUNT(*) FROM STUDENT;
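One detail worth knowing about these aggregates is how they treat NULLs. AVG(col) and COUNT(col) skip NULL values, while COUNT(*) counts every row. A Python sketch, using None for NULL and a hypothetical gpa column:

```python
def avg(values):
    """AVG(col): average of the non-NULL values; NULL if there are none."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) / len(non_null) if non_null else None


def count_star(values):
    """COUNT(*): counts every row, including rows where the column is NULL."""
    return len(values)


def count_column(values):
    """COUNT(col): counts only non-NULL values."""
    return sum(1 for v in values if v is not None)


gpas = [3.0, 4.0, None, 2.0]  # hypothetical gpa column with one NULL
print(avg(gpas))           # 3.0
print(count_star(gpas))    # 4
print(count_column(gpas))  # 3
```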
GROUP BY and HAVING

GROUP BY groups data based on column values. HAVING filters groups that meet a condition.
Objective: Group by rollno, name, and gpa, and filter gpa > 4.0.
SELECT rollno, name, gpa
FROM STUDENT
GROUP BY rollno, name, gpa
HAVING gpa > 4.0;
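Because every selected column is also a grouping key, this query behaves like SELECT DISTINCT followed by a filter. A Python sketch of the two steps: form one group per distinct key, then let HAVING drop groups. The rows are hypothetical. The result is sorted only to make it deterministic, since Hive does not guarantee row order without ORDER BY.

```python
def group_by_having(rows, threshold):
    """GROUP BY rollno, name, gpa HAVING gpa > threshold."""
    groups = sorted(set(rows))                            # one group per distinct key
    return [key for key in groups if key[2] > threshold]  # HAVING filters whole groups


student = [(1, "Asha", 4.2), (2, "Ravi", 3.9), (1, "Asha", 4.2), (3, "Meena", 4.5)]
print(group_by_having(student, 4.0))
# [(1, 'Asha', 4.2), (3, 'Meena', 4.5)]
```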