Partition Concepts

The document provides an overview of partitioning in Apache Hive, explaining its importance for efficient data querying in large datasets stored in HDFS. It details how to create partitions using the PARTITIONED BY clause, and distinguishes between static and dynamic partitioning methods. Additionally, it includes examples of creating, loading, altering, and dropping partitions in Hive tables.

Uploaded by

Akshay Rathore

Apex Institute of Technology

Department of Computer Science & Engineering

Bachelor of Engineering (Computer Science & Engineering)

INTRODUCTION TO BDA– (21CST-246)

Prepared By: Dr Md Nadeem Ahmed(E13733)

(Assistant Professor)

PARTITION CONCEPTS
Partitioning divides a table into related parts based on the values of particular columns, such as date, city,
or department. With partitions, a query can read only the portion of the data it needs.

Why is Partitioning Important?

Huge volumes of data, often in the range of petabytes, are now stored in HDFS, which makes it difficult for
Hadoop users to query that data.

Hive was introduced to ease this burden. Apache Hive converts SQL queries into MapReduce jobs and submits
them to the Hadoop cluster. When we submit a SQL query against an unpartitioned table, Hive reads the entire
data set.

Running MapReduce jobs over a large table is therefore inefficient. Partitioning resolves this: Hive lets us
declare the partition columns when the table is created and then stores each partition's data separately, so a
query can skip the partitions it does not need.

How to Create Partitions in Hive?

Using PARTITIONED BY Clause

Example:-

CREATE TABLE table_name (column1 data_type, column2 data_type)
PARTITIONED BY (partition1 data_type, partition2 data_type, ...);

Hive Data Partitioning Example

Now let's understand data partitioning in Hive with an example. Consider a table named Tab1 that contains
client details: id, name, dept, and yoj (year of joining). Suppose we need to retrieve the details of all the
clients who joined in 2012.

Without partitioning, the query searches the whole table for the required information. If we instead partition
the client data by year and store each year in a separate file, the query processing time is reduced. The
example below shows how to partition a file and its data.

The file file1 contains the client data table:

tab1/clientdata/file1
id, name, dept, yoj
1, sunny, SC, 2009
2, animesh, HR, 2009
3, sumeer, SC, 2010
4, sarthak, TP, 2010

Now, let us partition the above data into two files by year:

tab1/clientdata/2009/file2
1, sunny, SC, 2009
2, animesh, HR, 2009

tab1/clientdata/2010/file3
3, sumeer, SC, 2010
4, sarthak, TP, 2010

Now, when we retrieve data from the table, only the data of the specified partition is queried.
A partitioned table is created as follows:

CREATE TABLE table_tab1 (id INT, name STRING, dept STRING, yoj INT)
PARTITIONED BY (year STRING);

For example, each file is loaded into its own partition:

LOAD DATA LOCAL INPATH 'filepath/file2' OVERWRITE INTO TABLE table_tab1
PARTITION (year='2009');

LOAD DATA LOCAL INPATH 'filepath/file3' OVERWRITE INTO TABLE table_tab1
PARTITION (year='2010');
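
As a minimal sketch of how the partitioned table might then be queried, assuming the table_tab1 definition above with data loaded into the year='2009' and year='2010' partitions: a filter on the partition column year lets Hive read only the matching partition, whereas a filter on the ordinary column yoj does not by itself identify a partition.

```sql
-- Reads only the year='2009' partition, because year is the partition column.
SELECT id, name, dept, yoj
FROM table_tab1
WHERE year = '2009';

-- yoj is a regular data column, so this filter alone does not tell Hive
-- which partition to read; every partition may be scanned.
SELECT id, name, dept, yoj
FROM table_tab1
WHERE yoj = 2009;
```

This is why the partition column is usually chosen to match the column that queries filter on most often.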

Types of Hive Partitioning

• Static Partitioning
• Dynamic Partitioning

Static Partitioning

In static partitioning, we manually decide how many partitions the table will have and the value for each of
those partitions.
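
A static insert could look like the sketch below. It assumes the table_tab1 table from the earlier example; clients_staging is a hypothetical unpartitioned table with the same four data columns, introduced here only for illustration.

```sql
-- clients_staging is a hypothetical unpartitioned table (id, name, dept, yoj).
-- The partition value '2009' is written by hand in the PARTITION clause,
-- so the SELECT list contains only the four data columns.
INSERT OVERWRITE TABLE table_tab1 PARTITION (year = '2009')
SELECT id, name, dept, yoj
FROM clients_staging
WHERE yoj = 2009;
```

One such statement is needed per partition value, which is what makes this approach manual.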

Dynamic Partitioning

Dynamic partitioning gives us flexibility: Hive creates partitions automatically, depending on the data that we
insert into the table.

Hive restricts dynamic partitioning by default. Depending on the version, it is either disabled outright or runs
in strict mode, which requires at least one static partition column. This protects us from accidentally creating
a huge number of partitions. With dynamic partitioning, we tell Hive which column to use for the partition values.

The following settings allow us to create dynamic partitions in the table without any static partition: -

set hive.exec.dynamic.partition=true;

set hive.exec.dynamic.partition.mode=nonstrict;
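
With those settings in place, a dynamic-partition insert might be sketched as follows. The table table_tab1 is the one from the earlier example; clients_staging is a hypothetical unpartitioned source table, assumed here for illustration.

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- No value is given for year in the PARTITION clause. Hive takes it from the
-- last column of the SELECT list and creates one partition per distinct value.
-- clients_staging is a hypothetical table with columns (id, name, dept, yoj).
INSERT OVERWRITE TABLE table_tab1 PARTITION (year)
SELECT id, name, dept, yoj, CAST(yoj AS STRING) AS year
FROM clients_staging;
```

The dynamic partition column must come last in the SELECT list, because Hive matches it by position rather than by name.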

Show All Partitions on Hive Table: -

SHOW PARTITIONS Table_name;
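
For instance, assuming the table_tab1 table from the earlier example, the statement can list all partitions or only those matching a partition specification:

```sql
-- List every partition of the table.
SHOW PARTITIONS table_tab1;

-- Restrict the listing to partitions matching a partition specification.
SHOW PARTITIONS table_tab1 PARTITION (year = '2009');
```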

Add New Partition to the Hive Table: -

A new partition can be added to the table using the ALTER TABLE statement. You can also specify the location
where the partition data should be stored on HDFS:

ALTER TABLE Table_Name ADD PARTITION (partitionColumn = 'value1') LOCATION 'loc1';

Example: - ALTER TABLE zipcodes ADD PARTITION (state='CA') LOCATION '/user/data/zipcodes_ca';
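
A slightly more defensive variant of the example is sketched below, reusing the zipcodes table and path from above. IF NOT EXISTS avoids an error when the partition is already registered, and DESCRIBE FORMATTED can be used afterwards to check where the partition's data lives.

```sql
-- Add the partition only if it is not already present.
ALTER TABLE zipcodes ADD IF NOT EXISTS PARTITION (state = 'CA')
LOCATION '/user/data/zipcodes_ca';

-- Inspect the partition's metadata, including its storage location.
DESCRIBE FORMATTED zipcodes PARTITION (state = 'CA');
```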

Rename or Update Hive Partition:-

Using ALTER TABLE, you can also rename a specific partition, that is, change its partition value.

Example: - ALTER TABLE zipcodes PARTITION (state='AL') RENAME TO PARTITION (state='NY');

Drop Hive Partition

A partition can be dropped using ALTER TABLE tablename DROP PARTITION.

Example: - ALTER TABLE sales DROP IF EXISTS PARTITION (year = 2020, quarter = 2);
