Hive Partitions

Hive partitions and buckets are used to organize and distribute data in tables. Partitions divide tables into parts based on partition keys, allowing data to be stored and queried more efficiently. Buckets further divide partitioned data into multiple files or directories using a hashing algorithm on selected columns. The example shows creating a partitioned table on the state column of an e-commerce data table, resulting in 38 partitions, one for each Indian state. Buckets are then created to divide the partitioned data into 4 groups to improve querying performance.

Uploaded by

செல்வா

Hive Partitions & Buckets with Example

Tables, partitions, and buckets are the building blocks of Hive data modeling.

What are Partitions?

Hive partitioning is a way to organize a table into parts based on the values of one or more
partition keys.

Partitioning is helpful when a table has one or more partition keys. Partition keys are the basic
elements that determine how the data in the table is stored: rows with the same key value are stored
together.

For example:

A client has e-commerce data for its India operations, with the operations of all 38 states stored
together in a single table. If we take the state column as the partition key and partition the India
data on it, we get 38 partitions, one per state. The data for each state can then be viewed
separately in the partitioned table.

Sample Code Snippet for partitions

1. Creation of table allstates

create table allstates(state string, district string, enrolments string)
row format delimited
fields terminated by ',';

2. Loading data into the created table allstates

load data local inpath '/home/hduser/Desktop/AllStates.csv' into table allstates;

3. Creation of the partitioned table

create table state_part(district string, enrolments string) partitioned by (state string);

4. For dynamic partitioning we have to set this property

set hive.exec.dynamic.partition.mode=nonstrict;

5. Loading data into the partitioned table

INSERT OVERWRITE TABLE state_part PARTITION(state)
SELECT district, enrolments, state FROM allstates;

6. Actual processing and formation of partitions based on state as the partition key
7. There are going to be 38 partition outputs in HDFS storage, each in a directory named after its
state. We will check this in this step.

The screenshots show the execution of the code above.
From the above code, we do the following things:

1. Creation of table allstates with 3 columns: state, district, and enrolments
2. Loading data into table allstates
3. Creation of the partitioned table with state as the partition key
4. Setting the partition mode to nonstrict (this allows every partition column to be determined
dynamically from the data)
5. Loading data into the partitioned table state_part
6. Actual processing and formation of partitions based on state as the partition key
7. There are going to be 38 partition outputs in HDFS storage, each a directory named after its state
(for example, state=<state name>). In this step, we see the 38 partition outputs in HDFS.
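
Steps 6 and 7 can be checked from the Hive shell. The following is a minimal sketch, assuming the state_part table from above and the default warehouse location /user/hive/warehouse (an assumption; the actual path depends on hive.metastore.warehouse.dir). The state value in the last query is a hypothetical placeholder.

```sql
-- List the partitions Hive created; one line per state, in the form state=<value>.
SHOW PARTITIONS state_part;

-- List the partition directories in HDFS from inside the Hive shell.
-- Assumption: the table lives under the default warehouse directory.
dfs -ls /user/hive/warehouse/state_part;

-- A filter on the partition key lets Hive read only the matching directory.
-- 'Kerala' is a hypothetical value; use one that exists in your data.
SELECT district, enrolments
FROM state_part
WHERE state = 'Kerala';
```

Filtering on the partition key is where partitioning pays off: only the files of the matching partition need to be scanned, not the whole table.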

What are Buckets?
Buckets in Hive are used to segregate table data into multiple files. They are used for
efficient querying.

• The data present in the partitions can be divided further into buckets.
• The division is performed based on a hash of particular columns that we select in the table.
• Buckets use a hashing algorithm at the back end to read each record and place it
into a bucket.
• In Hive, we have to enable bucketing by using set hive.enforce.bucketing=true; (this applies to
Hive versions before 2.0; from Hive 2.0 onward bucketing is always enforced and the property no
longer exists).
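
To make the hashing idea concrete, the sketch below computes a bucket number for each row by hand. It assumes an employees table with a country column (as used in the steps that follow) and 4 buckets. It uses Hive's built-in hash() and pmod() functions for illustration only; the hash that Hive applies internally when writing buckets depends on the Hive version and column type, so these numbers may not match the actual bucket files.

```sql
-- Illustration only: map each country to one of 4 bucket numbers (0 to 3).
-- Rows with the same country always get the same bucket number.
SELECT country,
       pmod(hash(country), 4) AS bucket_no,
       count(*)               AS row_count
FROM employees
GROUP BY country, pmod(hash(country), 4);
```

The point of the sketch is that the bucket is a pure function of the column value, so all rows for a given value end up in the same bucket.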

Step 1) Creating the bucketed table

In this step:

• We are creating sample_bucket with the columns first_name, job_id, department,
salary and country.
• We are creating 4 buckets here.
• Once the data is loaded, Hive automatically places it into the 4 buckets.
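
A table definition matching the description above might look like the following sketch. The column names and the bucket count come from the text; the column types, the field delimiter, and the choice of country as the bucketing column are assumptions made for illustration.

```sql
-- Sketch of a bucketed table with 4 buckets.
-- Assumptions: the column types, the ',' delimiter, and country as the bucketing column.
CREATE TABLE sample_bucket (
  first_name STRING,
  job_id     INT,
  department STRING,
  salary     STRING,
  country    STRING
)
CLUSTERED BY (country) INTO 4 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
```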

Step 2) Loading data into table sample_bucket

We assume that an employees table has already been created in Hive. In this step, we load data
from the employees table into the table sample_bucket.

Before moving the employees data into buckets, make sure that it has the columns
first_name, job_id, department, salary and country.

Here we load data into sample_bucket from the employees table.
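
The load can be sketched as an INSERT ... SELECT, assuming the employees table has the five columns listed above. A plain LOAD DATA copies files as they are and does not distribute rows into buckets, which is why the data is inserted through a query.

```sql
-- Only for Hive versions before 2.0; omit on later versions,
-- where the property no longer exists and bucketing is always enforced.
SET hive.enforce.bucketing=true;

-- Copy the rows; Hive hashes the bucketing column and writes each row to its bucket file.
INSERT OVERWRITE TABLE sample_bucket
SELECT first_name, job_id, department, salary, country
FROM employees;
```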

Step 3) Displaying the 4 buckets created in Step 1

After the load, the data from the employees table has been distributed across the 4
buckets created in Step 1.
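
One way to look at the buckets is sketched below. It assumes the default warehouse location /user/hive/warehouse and that sample_bucket is bucketed on country; both are assumptions, not details confirmed by the text.

```sql
-- List the files of the bucketed table from inside the Hive shell.
-- Assumption: default warehouse directory; typically one file per bucket.
dfs -ls /user/hive/warehouse/sample_bucket;

-- Read a single bucket (the first of 4) instead of the whole table.
-- Assumption: the table is bucketed on country.
SELECT first_name, country
FROM sample_bucket TABLESAMPLE(BUCKET 1 OUT OF 4 ON country);
```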
