Partition Concepts

The document provides an overview of partitioning in Apache Hive, explaining its importance for efficient data querying in large datasets stored in HDFS. It details how to create partitions using the PARTITIONED BY clause, and distinguishes between static and dynamic partitioning methods. Additionally, it includes examples of creating, loading, altering, and dropping partitions in Hive tables.

Uploaded by

Akshay Rathore

Apex Institute of Technology

Department of Computer Science & Engineering


Bachelor of Engineering (Computer Science & Engineering)

INTRODUCTION TO BDA– (21CST-246)

Prepared By: Dr Md Nadeem Ahmed (E13733)


(Assistant Professor)

PARTITION CONCEPTS
Partitioning is a way of dividing a table into related parts based on the values of particular columns, such as
date, city, or department. With partitions, it is easy to query just a portion of the data.

Why is Partitioning Important?


Today, huge amounts of data, often in the range of petabytes, are stored in HDFS. At this scale it becomes
very difficult for Hadoop users to query the data.

Hive was introduced to ease this burden of data querying. Apache Hive converts SQL queries into MapReduce
jobs and then submits them to the Hadoop cluster. When we submit a SQL query against an unpartitioned table,
Hive reads the entire dataset.

Running MapReduce jobs over a whole large table is therefore inefficient. Creating partitions in the table
resolves this. Apache Hive makes partitions easy to implement: the partitioning scheme is declared when the
table is created, and Hive then manages the partitions automatically.

How to Create Partitions in Hive?


Using PARTITIONED BY Clause

Example:

CREATE TABLE table_name (column1 data_type, column2 data_type)
PARTITIONED BY (partition1 data_type, partition2 data_type, ….);
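
To make the template above concrete, here is a minimal sketch with two partition columns. The table name page_views and all of its columns are hypothetical and used only for illustration. Note that partition columns are declared only in the PARTITIONED BY clause and must not be repeated in the regular column list.

```sql
-- Hypothetical table: names and types are illustrative only.
-- view_date and country are partition columns, so they do not
-- appear in the regular column list.
CREATE TABLE IF NOT EXISTS page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (view_date STRING, country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```

Hive stores each partition as a subdirectory of the table directory named after the column and its value, for example view_date=2024-01-01/country=IN under the page_views directory.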

Hive Data Partitioning Example


Let us understand data partitioning in Hive with an example. Consider a table named Tab1 that contains client
details such as id, name, dept, and yoj (year of joining). Suppose we need to retrieve the details of all
the clients who joined in 2012.

Without partitioning, the query searches the whole table for the required information. If we partition the
client data by year and store each year in a separate file, the query only has to read the relevant year, which
reduces the processing time. The example below shows how to partition a file and its data.

Suppose a file named file1 contains the client data table:

tab1/clientdata/file1
id, name, dept, yoj
1, sunny, SC, 2009
2, animesh, HR, 2009
3, sumeer, SC, 2010
4, sarthak, TP, 2010

Now, let us partition the above data into two files by year:

tab1/clientdata/2009/file2
1, sunny, SC, 2009
2, animesh, HR, 2009

tab1/clientdata/2010/file3
3, sumeer, SC, 2010
4, sarthak, TP, 2010

Now, when we retrieve data from the table, only the data of the specified partition is queried.
A partitioned table is created as follows:

CREATE TABLE table_tab1 (id INT, name STRING, dept STRING, yoj INT)
PARTITIONED BY (year STRING);

The data files are then loaded into their partitions, for example:

LOAD DATA LOCAL INPATH 'filepath/file2' OVERWRITE INTO TABLE table_tab1
PARTITION (year='2009');
LOAD DATA LOCAL INPATH 'filepath/file3' OVERWRITE INTO TABLE table_tab1
PARTITION (year='2010');
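
Once the partitions are loaded, a query that filters on the partition column only needs to read the matching partition. The following is a minimal sketch against the table_tab1 table created above. It assumes the year=2009 partition has been loaded as shown.

```sql
-- The filter is on the partition column, so Hive reads only the
-- year=2009 partition directory instead of scanning the whole table.
SELECT id, name, dept
FROM table_tab1
WHERE year = '2009';
```

One caveat: the CREATE TABLE statement above does not declare a field delimiter. For comma-separated files such as file2, the table would typically also need ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' so that the columns are parsed correctly.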

Types of Hive Partitioning

• Static Partitioning
• Dynamic Partitioning

Static Partitioning

In static partitioning, we manually decide how many partitions a table will have and which value each of those
partitions holds.
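
As an illustration, a static partition can also be filled with an INSERT statement in which the partition value is written out explicitly. This is only a sketch: client_staging is a hypothetical unpartitioned staging table with the columns id, name, dept, and yoj, and is not part of the original example.

```sql
-- Assumption: client_staging(id, name, dept, yoj) is a hypothetical
-- unpartitioned table that already holds the raw client rows.
-- The partition value '2009' is fixed by hand in the statement.
INSERT OVERWRITE TABLE table_tab1 PARTITION (year = '2009')
SELECT id, name, dept, yoj
FROM client_staging
WHERE yoj = 2009;
```

A separate statement of this kind is needed for every partition value, which is why static partitioning suits cases where the partitions are few and known in advance.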

Dynamic Partitioning

Dynamic partitioning gives us flexibility: Hive creates partitions automatically, depending on the data that we
are inserting into the table.

By default, Hive does not enable dynamic partitioning. This protects us from accidentally creating a huge number
of partitions. With dynamic partitioning, we tell Hive which column to use as the partition column.

The following settings allow us to create dynamic partitions in a table without any static partition:

set hive.exec.dynamic.partition=true;

set hive.exec.dynamic.partition.mode=nonstrict;
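
With these settings in place, a single INSERT can populate several partitions at once. The sketch below again assumes a hypothetical staging table client_staging(id, name, dept, yoj). Hive takes the dynamic partition value from the last column of the SELECT list, matched by position.

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Assumption: client_staging is a hypothetical unpartitioned table.
-- No value is given for year in the PARTITION clause, so Hive creates
-- one partition per distinct value of the last selected column.
INSERT OVERWRITE TABLE table_tab1 PARTITION (year)
SELECT id, name, dept, yoj, CAST(yoj AS STRING) AS year
FROM client_staging;
```

For the sample data shown earlier, this would produce the partitions year=2009 and year=2010 in one statement.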

Show All Partitions on Hive Table:

SHOW PARTITIONS Table_name;
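
For a table with more than one partition column, the listing can be narrowed with a partial partition specification. A small sketch, assuming a table named sales that is partitioned by year and quarter (this layout is assumed here for illustration):

```sql
-- List every partition of the table.
SHOW PARTITIONS sales;

-- List only the partitions that fall under year=2020.
SHOW PARTITIONS sales PARTITION (year = 2020);
```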

Add New Partition to the Hive Table:

A new partition can be added to the table using the ALTER TABLE statement. You can also specify the location
where the partition data should be stored on HDFS.

ALTER TABLE Table_Name ADD PARTITION (partitionColumn = 'value1') location 'loc1';

Example: ALTER TABLE zipcodes ADD PARTITION (state='CA') LOCATION '/user/data/zipcodes_ca';
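
If the statement may be run more than once, adding IF NOT EXISTS avoids an error when the partition is already present. A minimal sketch using the same zipcodes example, followed by a check that the partition is registered:

```sql
-- IF NOT EXISTS makes the statement safe to re-run.
ALTER TABLE zipcodes ADD IF NOT EXISTS
  PARTITION (state = 'CA') LOCATION '/user/data/zipcodes_ca';

-- Verify that the new partition is now listed.
SHOW PARTITIONS zipcodes;
```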

Rename or Update Hive Partition:

Using ALTER TABLE, you can also rename or update a specific partition.

Example: ALTER TABLE zipcodes PARTITION (state='AL') RENAME TO PARTITION (state='NY');

Drop Hive Partition

A partition can also be dropped using ALTER TABLE tablename DROP PARTITION.

Example: ALTER TABLE sales DROP IF EXISTS PARTITION (year = 2020, quarter = 2);
