Partition Concepts
Dr Md Nadeem Ahmed
PARTITION CONCEPTS
Partitioning is a way of dividing a table into related parts based on the values of particular columns such as date,
city, and department. With partitions, it is easy to query only a portion of the data.
Hive was introduced to lower the burden of querying large data sets. Apache Hive converts SQL queries
into MapReduce jobs and then submits them to the Hadoop cluster. When we submit a SQL query against an
unpartitioned table, Hive reads the entire data set.
So, it becomes inefficient to run MapReduce jobs over a large table. This is resolved by creating partitions
in tables. Apache Hive makes implementing partitions easy through its automatic partitioning scheme, which is
set up at the time of table creation.
Example:-
Suppose a table holds the client data shown below and we need the clients for one particular year (yoj). Without
partitions, the query searches the whole table for the required information. But if we partition the client data by
year and store each year in a separate file, this will reduce the query processing time. The example below shows
how to partition a file and its data:
tab1/clientdata/file1
id, name, dept, yoj
1, sunny, SC, 2009
2, animesh, HR, 2009
3, sumeer, SC, 2010
4, sarthak, TP, 2010
Now, let us partition the above data into two files by year:
tab1/clientdata/2009/file2
1, sunny, SC, 2009
2, animesh, HR, 2009
tab1/clientdata/2010/file3
3, sumeer, SC, 2010
4, sarthak, TP, 2010
Now, when we retrieve data from the table for a given year, only the data of the specified partition is queried.
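Creating a partitioned table is as follows:
CREATE TABLE table_tab1 (id INT, name STRING, dept STRING, yoj INT) PARTITIONED BY (year STRING);
For example, data is loaded into each partition as follows:
LOAD DATA LOCAL INPATH 'filepath/file2' OVERWRITE INTO TABLE table_tab1 PARTITION (year='2009');
LOAD DATA LOCAL INPATH 'filepath/file3' OVERWRITE INTO TABLE table_tab1 PARTITION (year='2010');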
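To see how the partitions help at query time, here is a minimal sketch, assuming the table is named table_tab1 and is partitioned by a string column year as in the statements above. The filter is placed on the partition column (year), not on the ordinary column yoj, so Hive can skip the other partition directories.

```sql
-- Assumes table_tab1 is partitioned by (year STRING) and loaded as above.
-- The filter on the partition column lets Hive read only the
-- year=2009 directory instead of scanning the whole table.
SELECT id, name, dept, yoj
FROM table_tab1
WHERE year = '2009';
```

A filter on yoj alone would still return the right rows, but Hive would have to read every partition to find them.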
Hive supports two types of partitioning:
• Static Partitioning
• Dynamic Partitioning
Static Partitioning
In static partitioning, we have to decide manually how many partitions the table will have and also the value of
each partition.
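As an illustration of static partitioning, the sketch below names the partition value explicitly in the statement. The table clients_staging is a hypothetical unpartitioned table with columns (id INT, name STRING, dept STRING, yoj INT); it does not appear elsewhere in these notes. The target table table_tab1 is assumed to be partitioned by (year STRING).

```sql
-- Static partitioning: the partition value '2009' is written by hand.
-- clients_staging is a hypothetical source table (assumption).
INSERT OVERWRITE TABLE table_tab1 PARTITION (year = '2009')
SELECT id, name, dept, yoj
FROM clients_staging
WHERE yoj = 2009;
```

One such statement is needed per partition value, which is why static partitioning suits cases where the values are few and known in advance.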
Dynamic Partitioning
Dynamic partitioning provides flexibility: Hive creates partitions automatically depending on the data that we
are inserting into the table.
Hive restricts dynamic partitioning by default (older releases disable it, and strict mode requires at least one
static partition column). This protects us from accidentally creating a huge number of partitions. In dynamic
partitioning, we tell Hive which column to use for the dynamic partition.
The first setting below enables dynamic partitioning, and the second (nonstrict mode) allows us to create dynamic
partitions in the table without any static partition:-
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
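With these settings in place, a dynamic-partition insert might look like the sketch below. It assumes the target table table_tab1 is partitioned by (year STRING) and uses a hypothetical source table clients_staging with columns (id INT, name STRING, dept STRING, yoj INT). Hive takes the partition value from the last column of the SELECT list.

```sql
-- Dynamic partitioning: no partition value is given in PARTITION (...).
-- Hive creates one partition per distinct value of the last SELECT column.
-- clients_staging is a hypothetical source table (assumption).
INSERT OVERWRITE TABLE table_tab1 PARTITION (year)
SELECT id, name, dept, yoj, CAST(yoj AS STRING) AS year
FROM clients_staging;
```

The dynamic partition column must come last in the SELECT list, in the same order as it is declared in the PARTITION clause.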
Show All Partitions on Hive Table:- SHOW PARTITIONS tablename
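A short sketch of listing partitions, assuming a table named sales that is partitioned by year and quarter (the same table used in the DROP example in these notes). The optional PARTITION clause narrows the listing to partitions matching a partial specification.

```sql
-- List every partition of the table.
SHOW PARTITIONS sales;

-- List only the partitions for year 2020 (partial partition spec).
SHOW PARTITIONS sales PARTITION (year = 2020);
```

Each partition is printed as a path-like string such as year=2020/quarter=2.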
A new partition can be added to the table using the ALTER TABLE statement. You can also specify the location
where you want to store the partition data on HDFS.
Using ALTER TABLE, you can also rename or update a specific partition.
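The ALTER TABLE operations above can be sketched as follows, again assuming a sales table partitioned by year and quarter. The partition values and both directory paths are hypothetical and would need to match your own cluster.

```sql
-- Add a partition and store its data at a chosen HDFS location.
-- The path is a hypothetical example.
ALTER TABLE sales ADD IF NOT EXISTS PARTITION (year = 2021, quarter = 1)
LOCATION '/user/hive/warehouse/sales/year=2021/quarter=1';

-- Rename a partition by changing its partition values.
ALTER TABLE sales PARTITION (year = 2021, quarter = 1)
RENAME TO PARTITION (year = 2021, quarter = 2);

-- Update a partition so that it points at a different directory.
-- The URI is a hypothetical example.
ALTER TABLE sales PARTITION (year = 2021, quarter = 2)
SET LOCATION 'hdfs://namenode:8020/data/sales/2021/q2';
```

IF NOT EXISTS keeps the ADD statement from failing when the partition is already present.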
Dropping a partition can also be performed using:- ALTER TABLE tablename DROP PARTITION
EXAMPLE:- ALTER TABLE sales DROP IF EXISTS PARTITION(year = 2020, quarter = 2);