Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning Techniques
JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 39(2): 346−368 Mar. 2024. DOI: 10.1007/s11390-024-3538-1
Abstract Data partitioning techniques are pivotal for optimal data placement across storage devices, thereby enhancing resource utilization and overall system throughput. However, the design of effective partition schemes faces multiple challenges, including considerations of the cluster environment, storage device characteristics, optimization objectives, and the balance between partition quality and computational efficiency. Furthermore, dynamic environments necessitate robust partition detection mechanisms. This paper presents a comprehensive survey structured around partition deployment environments, outlining the distinguishing features and applicability of various partitioning strategies while delving into how these challenges are addressed. We discuss partitioning features pertaining to database schema, table data, workload, and runtime metrics. We then delve into the partition generation process, segmenting it into initialization and optimization stages. A comparative analysis of partition generation and update algorithms is provided, emphasizing their suitability for different scenarios and optimization objectives. Additionally, we illustrate the applications of partitioning in prevalent database products and suggest potential future research directions and solutions. This survey aims to foster the implementation, deployment, and updating of high-quality partitions for specific system scenarios.
Keywords data partitioning, survey, partitioning feature, partition generation, partition update
The work was supported by the National Key Research and Development Program of China under Grant No. 2023YFB4503603,
the National Natural Science Foundation of China under Grant Nos. 62072460, 62076245, and 62172424, and the Beijing Natural
Science Foundation under Grant No. 4212022.
*Corresponding Author
horizontal partitioning (HP), vertical partitioning (VP), and irregular partitioning (IP), as detailed in Table 1. HP operates on a row-wise basis, keeping complete tuples within each partition, whereas VP functions column-wise, allowing incomplete yet consistent column data. IP, on the other hand, focuses on the data itself, without imposing strict restrictions on how it is partitioned. Thus, in terms of partition shape, both HP and VP divide the table space into rectangular areas, whereas IP allows partitions of arbitrary shapes, including rectangles. IP designs partition shapes tailored to query access patterns to achieve optimal query efficiency, making it ideal for online analytical processing (OLAP) and hybrid transactional/analytical processing (HTAP) applications. HP and VP also take partial record integrity into account to facilitate online transaction processing (OLTP), thereby making them suitable for any load scenario.

Data partitioning can be designed based on the database schema, data and load distribution, or a combination of these features. Schema-driven approaches examine the join relationships among tables to centrally allocate tuples involved in join operations. Data-driven approaches commonly employ domain and hash values of column values to create partitions. Query-driven approaches concentrate on mining nested filtering rules from queries to ensure each tuple is assigned to the most appropriate partition.

Other physical designs also significantly impact query latency, disk space usage, and more. To elucidate the role of partitioning, we next briefly describe how it differs from other design strategies.

Partition vs Storage Structure. Partitioning specifies which data should be stored in the same block file, while the storage structure determines how the data is organized within a block. For example, Parquet[1], a widely adopted column-store file format in HDFS (Hadoop Distributed File System)①, provides efficient data compression and encoding schemes to enhance the performance of read-intensive queries.

Partition vs Index. An index is an auxiliary data structure designed for quickly locating and retrieving tuples, such as 1-dimensional indexes (B-tree[2]) and n-dimensional indexes (KD-tree[3], R-tree[4]). However, its performance tends to degrade when handling high-dimensional data or certain types of queries. In contrast, partitioning performs well in these scenarios.

Partition vs Materialized View. Materialized view techniques[5, 6] adopt a space-for-time strategy, creating views that separate queried data copies from raw data and routing relevant queries to the most suitable view for faster execution. However, copying the complete query results requires additional storage space.

We present a detailed partitioning workflow and review a wide spectrum of existing partitioning studies. Some studies[7, 8] share a similar topic to ours; however, their focus lies on data-driven horizontal partitioning for specific environments (e.g., Hadoop clusters②). Our survey, in contrast, considers a broader range of generalized scenarios. We explore various partition types and place greater emphasis on partitioning requirements, design details, and the implementation process. We further delve into feature extraction and cost model design before partitioning, along with addressing the data and load update issues after partitioning.

This paper is organized as follows: Section 2 provides an overview of data partitioning, including its four-stage workflow and core modules. Sections 3–5 explore the development trajectory of partitioning, incorporating classical approaches to horizontal, vertical, and irregular partitioning, respectively. Section 6 summarizes the support for partitioning in industry-leading database products. Section 7 gives open problems in this field and potential solutions. Finally, we conclude the survey in Section 8.

2 Data Partitioning Overview

The partitioning workflow typically comprises four stages, as depicted in Fig.1. Stage 1, feature extraction, addresses the issue of what to use for partitioning.
Table 1. Comparison of Three Common Partition Types
Type | Partition Strategy | Partition Shape | OLTP | OLAP | HTAP
HP | Row-wise | Rectangular | ✔ | ✔ | ✔
VP | Column-wise | Rectangular | ✔ | ✔ | ✔
IP | Data-wise | Arbitrary | ✘ | ✔ | ✔
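As a toy illustration of the row-wise/column-wise distinction in Table 1, one might write the following sketch (the table contents, column names, and split predicate are invented for illustration; IP is omitted since its partitions have arbitrary shape):

```python
# A toy table of (id, name, age) tuples.
table = [(1, "Ann", 23), (2, "Bob", 31), (3, "Eve", 27)]

# Horizontal partitioning: row-wise, each partition keeps complete tuples.
hp = {0: [t for t in table if t[2] < 30],
      1: [t for t in table if t[2] >= 30]}

# Vertical partitioning: column-wise, each partition keeps entire columns;
# the id is replicated so tuples remain consistently reconstructable.
vp = {"ids_names": [(t[0], t[1]) for t in table],
      "ids_ages":  [(t[0], t[2]) for t in table]}

print(hp[0])           # → [(1, 'Ann', 23), (3, 'Eve', 27)]
print(vp["ids_ages"])  # → [(1, 23), (2, 31), (3, 27)]
```

Joining the two vertical partitions on id reconstructs the original tuples, which is the "incomplete yet consistent column data" property noted above.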
Fig.1. Data partitioning workflow. (a) Feature extraction. (b) Partition generation. (c) Partition deployment. (d) Partition update.
This stage entails analyzing the database (DB) schema, parsing representative queries, conducting column data statistics, and selecting system optimization metrics. Stage 2, partition generation, includes two subtasks: partition initialization, which quickly establishes initial partitions using a low-complexity algorithm, and partition optimization, where the initial solution is iteratively refined based on predefined cost models. Stage 3, partition deployment, involves routing data to partition files via automated write transactions based on the created partition structures. Stage 4, automatic partition update, adjusts partitions in a timely manner to sustain stable system performance amid data, load, and hardware resource uncertainties, which includes deciding update timings and formulating detailed update plans accordingly.

Consider a teaching system comprising three tables: student (S), course (C), and student course (SC). Before partitioning a table (e.g., S), we first analyze its entity-relationship (E-R) graph and common column data distributions, gathering query information and system metrics as necessary. Assuming the age column has been selected as the partition key, initial partitioning rules are derived from its value domain, and skewed partitions are further split according to the column histogram statistics. With the partitions, eight given tuples (T0, ..., T7) are distributed across three machines (M1, M2, M3). Subsequently, a service is established to continuously monitor the environment. When detecting an overload on M2, the partition boundaries for M2 ( [21, 23] ⇒ [21, 22] ) and M3 ( [24, 25] ⇒ [23, 25] ) are promptly adjusted, and a data migration plan is devised to move tuple T7 from M2 to M3.

Fig.2 displays a framework comprising five key modules used in the partitioning workflow. This survey concentrates on the modules highlighted in green.

1) Deployment Scenario. Partitioning optimization objectives, such as performance, manageability, and device costs, are greatly affected by system environments, user requirements, and the storage devices used. For instance, in a distributed database, partitioning tasks are more complex, necessitating the consideration of factors like multi-node clusters, node replicas, and network latency to ensure uniform partition access and reduce cross-node operations. Tables 2 and 3 offer categorizations and symbolic representations of common optimization objectives and database environments, respectively.

2) Partition Type. Before designing partitions, it is necessary to choose the partition type to use based on the given scenario, as shown in Table 1.

3) Cost Model. After identifying the deployment scenario and deciding the partition type, a cost model is created to assess the given partition scheme and its associated update plan. There are three types of cost estimation: optimizer-based models, simplifying cost design at the expense of accuracy; network-based learning models, offering high precision but requiring sufficient metric samples and extensive training overhead.
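A minimal sketch of the Stage-4 update in this running example might look as follows (only the M2/M3 boundary changes come from the text; the tuple ages and the single-key migration logic are assumptions made for illustration):

```python
# Stage 4 sketch: range boundaries are adjusted and a migration plan is
# derived by re-locating each tuple under the old and new boundaries.

def build_migration_plan(tuples, old_bounds, new_bounds):
    """Return (tuple_id, src, dst) moves implied by a boundary change."""
    def locate(age, bounds):
        for node, (lo, hi) in bounds.items():
            if lo <= age <= hi:
                return node
        return None

    plan = []
    for tid, age in tuples.items():
        src, dst = locate(age, old_bounds), locate(age, new_bounds)
        if src != dst:
            plan.append((tid, src, dst))
    return plan

old_bounds = {"M2": (21, 23), "M3": (24, 25)}
new_bounds = {"M2": (21, 22), "M3": (23, 25)}  # overloaded M2 shrinks
tuples = {"T6": 22, "T7": 23}                  # ages assumed for illustration

print(build_migration_plan(tuples, old_bounds, new_bounds))
# → [('T7', 'M2', 'M3')]
```

Under the assumed age of 23 for T7, the boundary shift alone implies exactly the migration described above: T7 moves from M2 to M3, while T6 stays put.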
Peng-Ju Liu et al.: Enhancing Storage Efficiency and Performance: Survey of Data Partitioning Techniques 349
Fig.2. Key modules in the partitioning workflow: the partition type to determine (horizontal, vertical, or irregular partitioning); the cost model to call (optimizer-based, function-based, or network-based); and the partition update module, comprising a monitoring service (query window (QW)-based, threshold (TH)-based, or control theory (CT)-based) and a data migration plan (random (RM)-based, rule (RE)-based, heuristic (HC)-based, RL-based, or MP-based).
Definition 1 (Static Horizontal Partitioning). Static horizontal partitioning aims to find a classifier ϕ(·) for a table with m tuples D = (e1, e2, ..., em) and n collected queries Q = (q1, q2, ..., qn). When a new tuple arrives, the classifier ϕ assigns it to the specified partition P in time, i.e., P = ϕ(e), ∀e ∈ D. The classifier partitions all tuples into k distinct partitions, represented as P = (P1, P2, ..., Pk), to achieve optimal system objectives such as low query latency and high system throughput. The total cost of process-

Database Schema. Depending on the given database schema, we can 1) classify tables into large/small ones based on the number of tuples, and static/dynamic ones based on data changes; 2) analyze the characteristics of numerical columns, including data type, constraints, indexes, and triggers; and 3) learn the foreign key relationships and constraints between tables to help construct co-partitions[19, 28, 32, 33].
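The classifier ϕ(·) of Definition 1 can be made concrete with a minimal sketch, assuming the simplest possible form of ϕ, a single-attribute range classifier (the boundaries and tuples below are invented; real systems learn ϕ from the query set Q):

```python
# Sketch of Definition 1: φ maps each tuple e to one of k partitions.

def make_classifier(boundaries):
    """Build φ for k = len(boundaries) + 1 range partitions on one key."""
    def phi(e):
        key = e[0]                     # assume the key is the first attribute
        for i, b in enumerate(boundaries):
            if key < b:
                return i
        return len(boundaries)         # last partition P_k
    return phi

phi = make_classifier([10, 20])        # k = 3 partitions
partitions = {0: [], 1: [], 2: []}
for e in [(5, "a"), (15, "b"), (25, "c")]:
    partitions[phi(e)].append(e)       # P = φ(e) for every arriving tuple

print(partitions)
# → {0: [(5, 'a')], 1: [(15, 'b')], 2: [(25, 'c')]}
```

The "in time" requirement of the definition is why ϕ must be cheap to evaluate per tuple; here it is a short linear scan over the boundary list.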
Fig.3. Timeline of HP research development, including general empirical-based approaches (round-robin, range, and hash), as well as on-axis studies (KD-tree[3], SOP[9], AQWA[10], Kangaroo[11], Ameoba[12], AdaptDB[13], QdTree[14], MTO[15], and PAW[16]) focused on centralized environments and off-axis studies (Rao[17], Agrawal06[18], REF[19], DYFRAM[20], Schism[21], MESA[22], Horticulture[23], DynPart[24], SWORD[25], E-Store[26], SOAP[27], PREF[28], Cumulus[29], Clay[30], NashDB[31], GPT[32], BaW[33], Advisor[34], and SAHARA[35]) on distributed environments.
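The three classical empirical strategies at the root of this timeline can be sketched as one-line assignment rules (the node count and range boundaries below are illustrative assumptions):

```python
# Sketch of round-robin, hash, and range partitioning over N nodes.
N = 3

def round_robin(i):
    """The i-th data row goes to the (i mod N)-th node."""
    return i % N

def hash_part(key):
    """Hash partitioning: suitable for unordered data."""
    return hash(key) % N

def range_part(key, bounds=(100, 200)):
    """Range partitioning over pre-defined boundaries."""
    for node, b in enumerate(bounds):
        if key < b:
            return node
    return len(bounds)

print([round_robin(i) for i in range(5)])   # → [0, 1, 2, 0, 1]
print(range_part(150))                      # → 1
```

Round-robin guarantees equi-sized partitions by construction, whereas range partitioning's quality depends entirely on how well the boundaries match the data distribution, matching the trade-offs discussed in Subsection 3.3.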
Table Data. When analyzing numerical column data, distribution types (e.g., uniform, skewed/hotspot[20, 23, 24, 26, 35], discrete) and domain statistical metrics (e.g., median[12, 13, 16], maximum, and minimum values[15, 16]) are considered. These can also be depicted using histogram technologies[15, 20].

Workload. In HP, important query logical features (e.g., filter conditions, join keys, operator cost estimates, and SQL keywords) and physical features (e.g., read-to-write ratios, occurrence frequencies, submission/completion times, and inserted/updated rows) can be extracted from query plans. Some studies count partition or tuple access frequencies[21, 25, 30] to identify hot and cold data[26, 35] by tracking query-tuple accesses. Furthermore, the load can be classified as either heavy or light based on the average query arrival rate.

Database Runtime Metric. OS-level metrics related to HP are chosen to monitor the database state, including resource usage (e.g., memory, CPU, disk), performance (e.g., query latency, throughput), and machine hotspots.

3.3 Partitioning Process in Centralized Databases

In this subsection, we discuss studies designed for centralized systems or those that neglect factors such as multi-node clusters, replicas, and network costs.

Empirical-Based. Range partitioning typically splits data based on a pre-defined range of values derived from partition keys. This method is suitable for data with prior statistics but requires careful selection of partition boundaries, which is difficult for large-scale datasets. Hash partitioning maps tuples to specific partitions using a hash function and is ideal for unordered data. Round-robin partitioning is a special type of hash partitioning that assigns data to the available N machine nodes in a circular fashion, i.e., assigning the i-th data row to the (i mod N)-th node, to ensure equi-sized balanced partitions. These traditional methods are data-driven and do not require prior load knowledge.

ML-Based. SOP[9] (Skipping-Oriented Partitioning) adopts the Apriori algorithm[36] to extract m representative filter predicates from the load, and converts each tuple into an m-bit one-hot feature vector with each bit indicating tuple-predicate satisfaction. These vectors are clustered into different blocks via the Ward algorithm[37], with each block generating a union vector (also known as a partition map) by performing bitwise OR operations on its vectors. These maps act as a classifier, partitioning new data and guiding incoming queries to skip unnecessary blocks. Kangaroo[11] utilizes grid and tree structures for partitioning. In a 2D table space, the grids are represented by two bit strings, with positions marked as 1 acting as the partition boundaries. Kangaroo then applies a genetic algorithm (GA) for partition initialization and merging, deriving the optimal partition scheme. Its tree-based approach replaces the grid with a tree representation within the GA process.

Greedy-Based. To address SOP's limitations, such as the exponential growth in execution time with more predicates, Yang et al.[14] proposed a greedily built query data routing tree (QdTree). QdTree is a binary tree created by selecting the predicate with the maximum split benefit as the split condition at each tree expansion step until no further splits are possible. Each leaf node maintains metadata for routing, with the path from root to leaf serving as the search process for assigning tuples to partitions. Ding et al.[15] extended QdTree to multi-table datasets with a multi-table optimizer (MTO), leveraging sideways information passing through joins. MTO periodically computes a reward value to decide the best repartition timing and then uses dynamic programming (DP) to find the optimal reorganization set of non-overlapping subtrees. Li et al.[16] proposed PAW (Partitioning Aware of Workload Variance), focusing on creating partitions adaptable to future load variances by scaling historical queries and employing multi-step splits to replace multiple one-step predicate splits in QdTree when splitting smaller nodes.

However, in a new environment where query logs are unavailable, query-driven physical design techniques become ineffective, leading to the database's cold start issue. Moreover, collecting representative queries is sometimes difficult; for instance, a study[13] on IoT startups revealed that, even after analyzing the first 80% of historical queries, the remaining 20% still contained 57% previously unseen queries. To tackle this issue, Aly et al.[10] developed adaptive query-workload-aware partitioning (AQWA). AQWA utilizes the KD-tree[3] structure to create initial partitions with an equal distribution of spatial points. It dynamically maintains update plans for all visited nodes, considering split gain and data migration costs. To support KNN queries, AQWA uses the MinDist and MaxDist indicators[38] along with virtual grid technology to compute query boundaries. Amoeba[12] initializes a heterogeneous binary tree, similar to a KD-tree, and dynamically modifies it for
incoming queries using three node update operations: swap, pushup, and rotate. AdaptDB[13] adapts Amoeba for join operations by splitting each Amoeba tree based on joined columns. It employs a greedy search strategy to co-partition joined blocks, yielding a hyper-join operation superior to shuffle-join. AdaptDB manages repartitioning via a fixed-length query window, refreshing the tree for new queries and reallocating old nodes.

Table 7 summarizes the horizontal partitioning techniques discussed above for centralized environments. The "Cost" column indicates whether a cost model is used or not. The "Deployment" column indicates whether the partitions have been deployed in a real database environment. The "Method Content" column uses various symbols to represent different partitioning stages: partition initialization (∇), partition optimization (〇), and partition update (⟳). These representations are applied to all subsequent tables.

3.4 Partitioning Process in Distributed Databases

Data-driven approaches are universally applicable to various database environments and can always achieve data balancing. However, the performance of query-driven approaches, tailored for E-CH/S environments, might be limited by new factors in E-DH/S environments. Thus, in this subsection, we introduce the studies specifically designed for distributed environments.

3.4.1 Disk Storage Environment

Optimizing data placement on hard and solid-state drives has greater potential for boosting system throughput, due to their slower read/write speeds compared with memory. Early partitioning studies[39–41] in E-DH/S environments relate to physical design tools offering layout suggestions for data and load balancing. However, they do not design a cost function for accurate evaluation of alternative solutions. Rao et al.[17] combined a rank-based method with cost estimations derived from query optimizer statistics to quickly recommend partition keys. Agrawal et al.[18] refined this by treating the workload as a sequence with temporal features, eliminating redundant and inefficient designs. Other similar studies[42, 43] utilize optimizer and load information, adopting greedy and heuristic-based strategies for effective partitioning. However, while the strategies mentioned above excel at large-scale data scans, they easily incur distributed (i.e., cross-node) calls during small transactions touching only a few tuples.

ML-Based. Schism[21] addresses this issue by minimizing distributed transactions. Fig.4 illustrates its partitioning process. 1) Data preparation: inputting table data and transaction information (omitted). 2) Partitioning: a hypergraph is created, with nodes representing tuples or tuple replicas. Replication edges connect a tuple to its replicas, while transaction edges connect all tuples accessed by the same transaction. A Metis partitioner[44] then splits the hypergraph into multiple balanced partitions with minimal cross-partition transactions. In the illustrated example with five tuples, we get partitions 0 and 1 after graph splitting. 3) Explanation and validation: decision trees are constructed based on tuple features within each partition to find predicate-based explanations for adapting new data. In Fig.4, the decision tree is
Table 7. Major Horizontal Partitioning Strategies for Centralized Environments
Category | Work | Baseline | Objective | Automatic | Cost | Deployment | Method Content
Empirical | Range, hash, round-robin | N/A | O1, O2 | ✘ | ✘ | ✔ | Partitioning by columns or data insertion order∇
ML | SOP[9] | SimpleRange | O3 | ✘ | ✔ | ✔ | Frequent itemset + Ward clustering∇
ML | Kangaroo[11] | Random | O4, O8 | ✘ | ✔ | ✔ | GA-based grid/tree generation∇; partition scheme initialization using DP〇
Greedy | AQWA[10] | Uniform grids | O4, O8 | M-TH+D-RE | ✔ | ✔ | Spatial data-based recursive KD-tree∇; greedy tree node split selection⟳
Greedy | Ameoba[12], AdaptDB[13] | FullScan, SOP[9] | O3, O5 | M-QW+D-RE/RM | ✔ | ✔ | Heterogeneous tree∇; heuristic group〇; predicate-based tree update⟳
Greedy | QdTree[14] | SOP[9] | O3 | ✘ | ✘ | ✔ | Greedy-based binary predicate tree∇
Greedy | MTO[15] | QdTree[14] | O3, O8 | M-TH+D-MP | ✘ | ✔ | QdTree∇; join-induced predicates〇; tree update using DP⟳
Greedy | PAW[16] | QdTree[14] | O3 | ✘ | ✘ | ✔ | Query deviation prediction + multi-group split∇; data replication〇
constructed with the a1 column serving as the decision point, using criteria such as a1 = 1, 2 ⩽ a1 < 4, and a1 ⩾ 4 for the decision branches. The leaf nodes indicate that tuples meeting the specified criteria are allocated to their respective partitions.

Fig.4. Graph partitioning process introduced in [21]. (a) Input: table data. (b) Hypergraph creation and partitioning. (c) Decision tree construction.

Nehme et al.[22] developed the MEMO-based search algorithm (MESA) for long-running analytical transactions touching large-scale tuples, whereas Schism adapts to small short-lived transactions. The MEMO structure is a search space for parallel query optimization. MESA generates MEMOs for each query and then quickly simulates and explores tree-style partition candidate configurations using a branch-and-bound strategy.

To adapt Schism to load changes, SWORD[25] compresses the hypergraph into virtual nodes, periodically monitors load variations, and sets a threshold on the distributed transaction ratio to determine repartition timings, employing virtual node swaps for incremental graph updates to minimize data movement. Cumulus[29] filters out infrequent transactions and predicts future transaction frequencies with an exponential moving average. It dynamically re-partitions data in a user-driven live migration to avoid potential hotspots, balancing the increase in repartitioning overhead against the decrease in distributed transaction costs.

Greedy-Based. DYFRAM[20] addresses the cold start problem by initially creating simple range partitions from equi-width data distribution histograms, then periodically evaluating whether to replicate partitions based on partition size limitations and cross-partition overheads. DynPart[24] is designed for continuously growing databases (e.g., observation and log data). As data volume increases, DynPart models the affinity between data and partitions based on given queries, proposing heuristic rules for efficiently distributing incoming data. Unlike SWORD's approach of isolating updated data during repartitioning, SOAP[27] integrates repartition operations into normal transactions for smooth partition management. SOAP employs a cost-based method to prioritize repartition transactions and utilizes a feedback model for scheduling their executions. NashDB[31] supports user-defined query prioritization and efficient resource use, combining economic models, dynamic programming, and the Munkres algorithm[45] to optimize node usage and minimize data migration costs.

Table 8 summarizes common horizontal partitioning techniques for distributed disk storage environments.

3.4.2 Distributed Partition Key Recommendation in Disk Storage Environments

Non-co-located joins cause excessive data transfer overhead among machine nodes, adversely affecting join performance. Co-partitioning tables using shared join keys can significantly reduce data shuffling. We term this problem Distributed Partition Key Recommendation (DKR). For example, in Spark SQL, data can be organized into multiple buckets according to the hash or range values of selected partition keys. Costa et al.[46] verified that creating a consistent number of buckets for join keys across two large tables can significantly boost join performance over traditional sort-merge joins.

Empirical-Based. When facing joins with reference constraints, the query executor requires copying partition keys and strategies from parent to child tables, and subsequently repeating partition merging, splitting, or key updates across all parent-child tables. Eadon et al.[19] proposed reference partitioning (REF), which enables partition maintenance operations performed on parent tables to be extended to child tables, ensuring that the migration of child tuples is handled as a single atomic operation when the partition key in the parent table is modified.

Greedy-Based. PREF[28] (Predicate-Based Reference Partitioning) improves REF by supporting co-partitioning of tables for any join predicate, not just foreign keys, through tuple duplication. A join graph is defined with each node denoting a table and each edge indicating a join over two tables. PREF assigns each edge a weight equal to the size of the smaller connected table, and extracts candidate key configurations from
Table 8. Major Horizontal Partitioning Strategies for Distributed Disk Storage Environments
Category | Work | Baseline | Objective | Automatic | Cost | Deployment | Method Content
ML | Schism[21] | Manual partitions | O1, O2, O6 | ✘ | ✘ | ✔ | Metis∇; decision tree〇
ML | MESA[22] | Rao et al.[17], Schism[21] | O3, O8 | ✘ | ✘ | ✘ | Memo-based search∇; pruning branch and bound tree〇
ML | SWORD[25] | Schism[21], simple hash partitions | O1, O2, O4, O7 | M-TH+D-HC | ✔ | ✔ | Graph compression/partition∇; node swapping/replication⟳
ML | Cumulus[29] | Schism[21] | O2, O7 | M-SD+D-RE | ✔ | ✔ | Multi-objective cost model; on-demand repartition⟳
Greedy | DYFRAM[20] | Optimal solution | O3, O7 | M-TH+D-HC | ✔ | ✘ | Histogram + rule-based replication/partitioning∇
Greedy | DynPart[24] | Schism[21] | O3, O7 | M-TH+D-HC | ✔ | ✘ | Single partition∇; affinity-based heuristic strategy〇
Greedy | SOAP[27] | SWORD[25] | O2, O4 | M-CT | ✔ | ✔ | PID-controller⟳
Greedy | NashDB[31] | SWORD[25], optimal solution | O2, O4, O5 | ✘ | ✔ | ✔ | Economic model∇; greedy Munkres algorithm⟳
each query, greedily merging them to minimize the graph weights. GPT[32] reduces the data redundancy in PREF. It first selects the vertices and edges to be added from the join graph by considering both the storage overhead and the shuffle-free query benefits, and then adopts multi-column partitioning to hash partition key values for each edge. BAW[33] (Best of All Worlds) is an assumption-free framework that uses exact integer linear programming and heuristic variants to transform the DKR problem into a graph matching problem, unlike prior studies[19, 28] that rely on many assumptions not generally applicable.

RL-Based. Hilprecht et al.[34] introduced a partition advisor using Q-learning[47] to automatically assess and recommend partition keys under varying loads. The advisor refines a network-centric cost model with actual runtimes and designs a training environment consisting of three parts. 1) State: a one-hot encoding of table attributes indicating whether the attribute at each position is a partition key. 2) Action: a candidate set that includes actions to replicate or (de-)activate edges between partition keys. 3) Reward function: utilizing the cost model to calculate the performance gain of each action as the reward, disregarding data migration overheads.

Table 9 summarizes the partition key recommendation techniques for distributed disk environments.

3.4.3 Main Memory Storage Environment

In modern OLTP systems with small, repetitive, and short-lived transactions, applications can keep their entire dataset in memory through widely shared server clusters, making it more feasible to develop new storage system prototypes than to add indexes to traditional disk-oriented DBMSs. H-Store[48] is such a main memory database that supports user-defined layout designs. The studies[23, 26, 30, 35] discussed below are all designed on H-Store, where network latency
Table 9. Major Distributed Partition Key Recommendation Strategies for Optimizing Join Operations
Category | Work | Baseline | Objective | Automatic | Cost | Deployment | Method Content
Empirical | REF[19] | N/A | O4, O12 | ✘ | ✘ | ✔ | Reference partitioning∇
Greedy | PREF[28] | REF[19] | O4, O12 | ✘ | ✔ | ✔ | Schema/query driven design∇
Greedy | GPT[32] | PREF[28] | O4, O12 | ✘ | ✔ | ✔ | Join graph + hash-based multi-column partitioning∇
Greedy | BAW[33] | Greedy matching | O4, O7 | ✘ | ✘ | ✔ | Integer linear programming∇; graph matching∇
DL | Hilprecht et al.[34] | PREF[28] | O4 | M-RL | ✔ | ✔ | Network-centric cost model + Q-learning algorithm⟳
and resource utilization have become critical factors. Reducing hardware expenses alongside improving
Horticulture[23] estimates the coordination and performance is also an important research topic. SA-
skew costs between machine nodes to achieve load balancing and reduce distributed transactions. To handle complex database schemas and larger numbers of partitions, it uses a large neighborhood search algorithm that converges to near-optimal partitioning solutions within a reasonable time overhead. However, it does not provide any partition update strategy. E-Store[26] dynamically reallocates resources to accommodate demand spikes and new transactions. It periodically collects metrics at the tuple, partition, and OS levels, identifies hot keys for hot tuple assignment, and evenly distributes cold data in large chunks across the remaining space. If CPU utilization exceeds a given threshold, E-Store scales cluster nodes and uses a two-tiered bin packing algorithm to optimize tuple-to-partition assignments. Clay[30] enhances E-Store by addressing the issue of tuples that span multiple blocks and are not colocated on the same cluster node. It adopts a two-tier partitioning with fine-grained mapping (Metis[44] for hypergraphs) for hot tuples and coarse-grained mapping (simple range/hash strategies) for cold tuples. When some partitions become overloaded, Clay employs a threshold-based sub-graph migration algorithm to update them. SAHARA[35] minimizes resource overhead while satisfying all performance objectives by leveraging query access skew to move cold data to cheaper storage layers, retaining only hot data in main memory.

Table 10 summarizes major horizontal partitioning techniques for distributed memory environments.

3.5 Cost Estimation for Horizontal Partition Scheme

Table 11 compares representative HP cost models. Notably, function-based cost models are prevalent, focusing on a wide range of elements including block skipping, join overhead, and hardware resources.

3.5.1 Centralized Environment

Most studies[9, 12-16] evaluate partition quality by calculating the number of scanned tuples using a skipping-based cost function. In the SOP[9] model, the given query set Q is initially encoded into n distinct feature vectors F = (F1, F2, ..., Fn). The number of queries satisfying Fi is represented as zi. A function f(P, Fi) returns the number of accessed tuples when
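The SOP description above is cut off mid-sentence by the source; under the common reading that each feature's accessed-tuple count f(P, Fi) is weighted by its query count zi, a skipping-based cost function can be sketched as follows (the aggregation form, names, and toy data are illustrative assumptions, not SOP's exact definition):

```python
# Hypothetical sketch of a skipping-based cost function in the spirit of SOP.
# A partition is skipped for feature F_i when no tuple in it can satisfy F_i;
# otherwise every tuple in that partition is counted as accessed.

def accessed_tuples(partitions, predicate):
    """f(P, F_i): tuples read when only non-skippable partitions are scanned."""
    total = 0
    for part in partitions:
        if any(predicate(t) for t in part):   # partition cannot be skipped
            total += len(part)                # the whole block is scanned
    return total

def workload_cost(partitions, features):
    """Sum of z_i * f(P, F_i) over all feature vectors (assumed aggregation)."""
    return sum(z * accessed_tuples(partitions, pred) for pred, z in features)

# Toy example: tuples are integers, features are range predicates.
P = [[1, 2, 3], [10, 11, 12], [20, 21, 22]]
features = [(lambda t: t < 5, 4),        # 4 queries match feature 1
            (lambda t: t >= 20, 2)]      # 2 queries match feature 2
print(workload_cost(P, features))        # 4*3 + 2*3 = 18
```

A good partition layout minimizes this sum by clustering tuples so that most partitions can be skipped for most features.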
Table 10. Major Horizontal Partitioning Strategies for Distributed Main Memory Environments

| Category | Work | Baseline | Objective | Automatic | Cost | Deployment | Method Content |
|---|---|---|---|---|---|---|---|
| Greedy | Horticulture[23] | Schism[21], manual partitions | O2, O6 | ✘ | ✔ | ✔ | Skew-aware model + large-neighborhood search∇ |
| Greedy | E-Store[26] | Optimal solution | O5 | M-TH+D-MP | ✘ | ✔ | Two-tiered partitioning∇; greedy/first-fit ⟳ |
| Greedy | SAHARA[35] | Unpartitioned state, DB-expert | O9, O11 | ✘ | ✔ | ✔ | Hot/cold data division∇; MaxMinDiff range partitionsΟ |
| ML | Clay[30] | E-Store[26], Metis[44] | O2, O5 | M-TH+D-RE | ✘ | ✔ | Tuple grouping + graph split∇; heuristic data migration plan ⟳ |
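The two-tiered hot/cold placement summarized for E-Store in Table 10 can be illustrated with a small sketch: hot tuples are placed greedily one by one on the least-loaded partition, and cold data is packed in large chunks with a first-fit policy. The data structures, weights, and capacity below are invented for illustration and do not reproduce E-Store's actual algorithm:

```python
# Illustrative sketch of a two-tiered placement in the spirit of E-Store:
# tier 1 places individual hot tuples greedily; tier 2 packs cold chunks
# first-fit. All thresholds and structures are assumptions.

def two_tier_place(hot, cold_chunks, n_parts, capacity):
    load = [0.0] * n_parts
    assign = {}
    # Tier 1: greedy per-tuple placement of hot keys, heaviest first.
    for key, weight in sorted(hot.items(), key=lambda kv: -kv[1]):
        target = min(range(n_parts), key=load.__getitem__)
        assign[key] = target
        load[target] += weight
    # Tier 2: first-fit placement of large cold chunks.
    chunks = {}
    for cid, size in cold_chunks.items():
        for p in range(n_parts):
            if load[p] + size <= capacity:
                chunks[cid] = p
                load[p] += size
                break
    return assign, chunks, load

hot = {"k1": 9.0, "k2": 7.0, "k3": 5.0}
cold = {"c1": 4.0, "c2": 4.0}
assign, chunks, load = two_tier_place(hot, cold, n_parts=2, capacity=20.0)
print(assign, chunks, load)
```

Separating the two tiers keeps the expensive per-tuple decisions confined to the small hot set, which is the core idea behind two-tiered bin packing.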
Table 11. Comparative Analysis of Major Horizontal Partitioning Cost Models in Diverse Environments

| Category | Cost Model | Objective | Environment | Characteristic |
|---|---|---|---|---|
| Optimizer | Rao et al.[17], MESA[22] | O3, O8 | E-DH/S | Adjusting query plan node costs for different partitions based on table/index statistics |
| Function | SOP[9], AdaptDB[13], MTO[15] | O3, O5 | E-C(D)H/S | Skipping-based block scan cost and join cost |
| Function | Horticulture[23] | O2, O6 | E-DM | Quantifying the effects of load skew on the cluster |
| Function | DYFRAM[20], SOAP[27], SWord[25], E-Store[26], Clay[30] | O5, O4, O6, O10 | E-DH/S, E-DM | Costs for dynamic environments (replication/repartition operations, cold/hot data) |
| Function | PREF[28], GPT[32], BAW[33] | O3, O12 | E-DH/S | Fine-grained cost designs for the PKR problem |
| Function | NashDB[31] | O1, O3 | E-DH/S | A monetary value function for tuples; converting the HP problem into an economic problem |
| Function | SAHARA[35] | O11 | E-DM | A novel objective for reducing hardware cost |
| Learning | Hilprecht et al.[34] | O4, O8 | E-DH/S | A network-centric cost model |
356 J. Comput. Sci. & Technol., Mar. 2024, Vol.39, No.2
To get the skew factor Fskew, Horticulture first computes the skew factor for each t-th interval (SKt) by dividing the average partition skew value by the ideal skew value, i.e.,

SK_t(P, Q) = log( (Σ_{i=1}^{|P|} (N_par^i / N_par) / ρ̄_txn) / N̂_par ) / log(1 / ρ̄_txn),

where N_par^i represents the number of transactions accessing the i-th partition, and ρ̄_txn represents the ideal transaction distribution, estimated as 1/N̂_par. Next, Horticulture accumulates the interval skew factors to obtain the final skew factor Fskew.

Finally, we discuss the partition update costs, which typically consider the cost savings of new partitions and the data migration expenses. They directly determine whether the repartition scheme is executed. The cost savings arise from lower transaction execution and resource costs, whereas the data migration expenses cover the overheads tied to partition and replica modifications.

3.6 Summary

We summarize the key characteristics of HP as follows. 1) It is crucial to survey the storage and deployment environments before designing partitions. 2) When query features are scarce, query-driven methods often utilize data features as a supplement to build finer-grained or size-constrained partitions. 3) Each partitioning strategy has unique strengths and weaknesses. Mathematical programming requires feasibility verification due to partitioning's NP-hard nature. Learning-based algorithms exhibit high performance but adapt poorly to environmental changes. Conversely, greedy algorithms offer more flexibility for existing partitioning constraints, but may lack stable performance, which could be improved with additional optimization phases.

Peng-Ju Liu et al.: Enhancing Storage Efficiency and Performance: Survey of Data Partitioning Techniques 357

4 Vertical Partitioning

Subsection 4.1 and Subsection 4.2 provide the definition and feature extraction of the vertical partitioning (VP) problem, respectively. Mainstream VP construction strategies for centralized and distributed environments are presented in Subsections 4.3 and 4.4, respectively, and their cost models are introduced in Subsection 4.5. Fig.5 depicts the development trajectory of VP methods.

Fig.5. Timeline of VP research development, including on-axis studies (Hoffer[49], Navathe84[50], Navathe89[51], OBP[52], GA[53], HillClimb[54], AutoPart[55], Agrawal04[56], VF[57], Lisbeth[58], AutoStore[59], Smopd[60], Dyvep[61], Smopdc[62], GSOP[63], HYF[64], ActiveDB[65], GridFormation[66], AutoVP[67], and SCVP[68]) based on centralized environments and off-axis studies (HYRISE[69], Trojan[70], CHAC[71], Peloton[72], and Casper[73]) based on distributed environments.

4.1 Formalization

Definition 3 (Static Vertical Partitioning). Static VP is a two-phase partitioning technique for processing the collected queries Q. Table data D is initially divided vertically into disjoint column groups CGs, which are subsequently split horizontally into k distinct partitions P = (P1, P2, ..., Pk) through two candidate strategies: 1) all CGs are split into P as a whole containing aligned tuples; 2) each CG is independently split into partitions and then merged into P. The objective of VP is to generate the optimal combination of CGs and P that minimizes the final processing cost C of Q. VP first identifies the optimal column groups (CGs*) by introducing an additional cost function Ccg to evaluate only the division of each CG, and then finds the optimal classifier (ϕ*) to generate P:

CGs* = arg min_{CGs} Σ_{CG∈CGs} Ccg(CG, Q),
P_i = ϕ(e, CGs*), ∀e ∈ D,
ϕ* = arg min_ϕ C(P ⇐ ϕ(D, CGs*), Q).    (1)

Definition 4 (Dynamic Vertical Partitioning). This concept exhibits parallel characteristics to dynamic HP and will not be defined repeatedly here.

4.2 Feature Extraction

Database Schema. 1) VP is essentially an extension of table splitting, and high-frequency column groups can be directly extracted from independent business scenarios in advance. 2) Distinguishing between indexed and non-indexed columns[52, 64, 72] considerably affects column grouping. 3) Small/large tables are categorized based on the number of attributes to verify algorithms' execution efficiency.

Table Data. When constructing column groups, examining attribute types, such as primary/constrained keys, can help reduce join costs. When creating range-based horizontal splits, the attribute distribution characteristic[73] serves as a crucial reference factor in determining partition keys.

Workload. Query features[50, 53, 60, 65] such as accessed attributes (projection, filter, and join columns), affected rows, selectivity, SQL keywords[57, 69, 73], and submission time are commonly extracted in VP. Here, accessed attributes are used to calculate the co-occurrence frequency between attributes; selectivity[72, 73] reflects the proportion of scanned tuples in the total table tuples, with higher selectivity typically indicating a greater query weight.

Database Runtime Metric. Similar to HP, the VP layout primarily focuses on key metrics like system throughput, processor stalls, and resource utilization.
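The co-occurrence frequencies mentioned under Workload above are typically collected into the attribute affinity matrix (AAM) consumed by the methods in Subsection 4.3; a minimal sketch of building one follows (the workload representation, a list of accessed-attribute sets with frequencies, is an assumption for illustration):

```python
# Minimal sketch: build an attribute affinity matrix (AAM) from a workload.
# Each query is modeled as the set of attributes it accesses plus a frequency;
# aam[i][j] accumulates how often attributes i and j are co-accessed.

from itertools import combinations

def affinity_matrix(attrs, workload):
    idx = {a: i for i, a in enumerate(attrs)}
    n = len(attrs)
    aam = [[0] * n for _ in range(n)]
    for accessed, freq in workload:
        for a, b in combinations(sorted(accessed), 2):
            i, j = idx[a], idx[b]
            aam[i][j] += freq
            aam[j][i] += freq
    return aam

attrs = ["a1", "a2", "a3"]
workload = [({"a1", "a2"}, 5),       # 5 queries touch a1 and a2 together
            ({"a2", "a3"}, 2),
            ({"a1", "a2", "a3"}, 1)]
aam = affinity_matrix(attrs, workload)
print(aam)  # a1-a2 affinity 6, a2-a3 affinity 3, a1-a3 affinity 1
```

Clustering algorithms such as BEA or spectral clustering then operate on this matrix to form candidate column groups.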
4.3 Partitioning Process in Centralized Databases

Recent research[66, 67] has highlighted the benefits of applying reinforcement learning algorithms for dynamic partition updates. This marks a major evolution from earlier methods[49-51, 53, 55-58, 64], which depended on the partitioning feature of attribute affinity. These methods aimed to enhance performance, but often at the cost of increased execution time. As the field progressed, there was a shift towards more efficient, lightweight, end-to-end partitioning methods[59-63, 65, 68], reflecting a continuous effort to balance a system design's efficiency and effectiveness.

... constraints. The PAX[75] (Partition Attributes Across) layout decomposes relations at the page level to avoid the join expenses of prior VP studies[49-53] that break down a table into multiple subtables. HillClimb[54] extends PAX by defining a finer page layout. Starting with PAX's single-column partitions, it merges the two partitions offering the largest query cost reduction in iterative rounds until no further reduction is possible. Fig.6 illustrates this process. The CGs |a1|a2|a3|a4|a5|a6| are the initial page layout. The first round merges a1 and a2 for their greatest merging benefit. Then the merging benefits of the valid candidate mergers are updated, and a5 and a6 are merged next. The process continues until reaching the optimal state |a1a2a3|a4a5a6|, with no feasible mergers left.

[Fig.6: the initial partitions and the merging benefits of candidate mergers]

ML-Based. Hoffer et al.[49] calculated an attribute affinity matrix (AAM) from column accesses and applied the BEA[74] algorithm to cluster the AAM into column groups (CGs). Navathe et al.[50] refined this approach by introducing a two-phase partitioning with a cost model to select appropriate cost types for index scans for optimal selection. SCVP[68] improves HYF's cost model by incorporating tuple reconstruction costs. Leveraging the cost independence property between CGs, SCVP first designs an estimation function for rapid calculation of CG division gains, making it suitable for large tables and heavy loads. It then applies spectral clustering on the AAM to form initial CGs and adopts a greedy search strategy to split and merge CGs based on frequent patterns.

Constructing a self-adaptive VP layout is crucial. AutoStore[59] introduces O2P (One-dimensional Online Partitioning) to monitor query changes through a query window and updates the AAM online. O2P uses the BEA algorithm to recluster only the CGs referenced by new queries, and designs a transforming benefit model to decide whether the repartition decision should be executed. SMOPD[60] improves on AutoStore by determining appropriate checkpoint intervals for repartitioning based on historical data analysis, employing AutoClust[76] for partition updates. SMOPD-C[62] further adapts SMOPD to distributed settings by updating the monitoring procedures. To solve the cold-start issue of VP, DYVEP[61] designs a statistics collector to monitor changes in query patterns and database schema, creating new partitions or triggering repartitioning when query latency increases or table attributes are deleted.

Empirical-Based. ActiveDB[65] uses 21 active rules to monitor both internal and external system and user activities. The first 15 rules gather query-related statistical indicator changes. Two rules estimate the current performance change to determine the necessity of partition updates. The final four rules use statistical features to create new partitions and assess their performance improvement threshold.

DL-Based. GridFormation[66] is the first learning-based agent using Q-learning[47] for online VP layout design. The state is defined as a collection of sets, each indicating a partition containing a list of tuple IDs. GridFormation's partitioning process follows a Markov decision process (MDP), with rewards calculated based on the touched partitions and the tuple access ratio of each query. AutoVP[67] redesigns the GridFormation agent to accelerate training, offering three optional DQN variants[77] and using HillClimb[54] and HDD[78] to evaluate temporary partitions. It simplifies the state representation to a 2D array, with each row corresponding to a query and each column to a table attribute. Rewards are based on the cost difference between the current state and HillClimb's ideal state, enabling faster experience learning and MDP processing.

Table 12 summarizes vertical partitioning techniques for centralized environments.

4.4 Partitioning Process in Distributed Databases

In big data systems, VP layouts are commonly built on page-level stores like PAX[75]. Trojan[70] defines an interestingness score to reflect how effectively a CG accelerates most queries, then solves a 0-1 knapsack problem to select the optimal CG combinations. Trojan achieves layout-aware replication by designing unique CGs for each replica, better adapting to the given queries. CHAC[71] (Column-oriented Hadoop Attribute) extracts frequent closed item sets from a frequency-weighted AAM to generate overlapping and non-overlapping candidate clustering solutions, and designs a cost model to select the optimal solution.

VP-based hybrid storage is customized for HTAP databases. HYRISE[69] measures the cache misses resulting from data movement from RAM to cache, mitigating cache pollution in update operations using non-temporal writes. It creates CG layouts that adapt to cache lines to accelerate read operations. Peloton[72] clusters queries by their co-accessed attributes via the k-means algorithm, selecting representative queries for each cluster by optimizer estimates and submission time. It then prioritizes these queries, using a greedy policy to extract CGs, and maintains recent query statistics in a time-series graph to periodically replace old CGs. Casper[73], a column layout that works with VP algorithms like HYRISE and Peloton, optimizes HTAP load processing in in-memory DBMSs. It estimates block read/write I/O costs for various transaction operations, aligns block sizes with cache lines, and tracks each operation's accesses via block domain histograms. This helps establish ILP equations to allocate data while satisfying constraints related to read/update latencies.

Table 13 summarizes vertical partitioning techniques for distributed environments.

4.5 Cost Estimation for Vertical Partition Scheme

This subsection reviews the common function-based cost models (see Table 14) employed for VP evaluation, covering two-phase partitioning and partition updates. They consider query execution on VP layouts, partition updates, and the impact of indexes, joins, and map-reduce operations. For non-PAX VP techniques, the additional tuple reconstruction cost for cross-partition queries is another crucial factor.
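The greedy merge loop used by HillClimb in Subsection 4.3, and mirrored by the greedy phases of several later methods, consumes exactly such cost functions. A minimal sketch with a stand-in seek-count cost (the cost function, data, and representation are all illustrative assumptions, not HillClimb's exact procedure):

```python
# Illustrative HillClimb-style merging: start from single-column groups and
# repeatedly merge the pair of groups with the largest cost reduction until
# no merger reduces the cost. `cost` is a caller-supplied stand-in.

from itertools import combinations

def hill_climb(columns, cost):
    layout = [frozenset([c]) for c in columns]
    while True:
        best_gain, best_pair = 0, None
        base = cost(layout)
        for g1, g2 in combinations(layout, 2):
            merged = [g for g in layout if g not in (g1, g2)] + [g1 | g2]
            gain = base - cost(merged)
            if gain > best_gain:
                best_gain, best_pair = gain, (g1, g2)
        if best_pair is None:       # no merger reduces cost: local optimum
            return layout
        g1, g2 = best_pair
        layout = [g for g in layout if g not in (g1, g2)] + [g1 | g2]

# Toy cost: each query pays one "seek" per column group it touches.
queries = [{"a1", "a2"}, {"a1", "a2"}, {"a5", "a6"}]
def seek_cost(layout):
    return sum(sum(1 for g in layout if g & q) for q in queries)

final = hill_climb(["a1", "a2", "a3", "a4", "a5", "a6"], seek_cost)
print(sorted(sorted(g) for g in final))
```

With this toy workload the loop merges a1 with a2 and a5 with a6, then stops, since no remaining merger lowers the seek count.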
Table 14. Comparative Analysis of Function-Based Vertical Partitioning Cost Models in Diverse Environments

| Cost Model | Objective | Environment | Characteristic |
|---|---|---|---|
| VF[57], GSOP[63] | O3 | E-CH/S | Incorporating the cost of tuple reconstruction across partitions |
| ACO[78] | O3 | E-CH/S | Cost designs for bandwidth-based disk access operations |
| AutoPart[55], HYF[64] | O3 | E-CH/S | Approximating the costs of index scans and block joins |
| AutoStore[59] | O3, O5 | E-CH/S | Considering the repartitioning potential benefit |
| CHAC[71], Trojan[70] | O3, O5 | E-DH/S | Finer cost estimations for map-reduce phases |
| DataMorphing[54], HYRISE[69] | O10 | E-DM | Estimating cache misses for diverse data access operations |
| Casper[73] | O3 | E-DM | Modeling costs for five distinct data access operations |
4.5.1 Centralized Environment

To determine when to repartition, some studies[59] use a fixed query window, while [60-62] employ dynamic windows based on query performance thresholds. The choice of the monitoring approach does not impact the modeling of repartitioning benefits. However, AutoStore[59] differs from other approaches by considering the potential benefits of new partitions rather than solely evaluating them based on historical loads. It introduces a transformation benefit Btf, resulting from
updating the current partitions P to a new scheme P′ when executing the n queries Q collected in the window. Btf is calculated as Ccg(Q, P) − Ccg(Q, P′), with Ccg denoting the query processing cost over a given vertical layout. AutoStore assumes the presence of multiple future windows with workloads similar to the current window, and estimates their frequency using an exponentially decaying model with a shape parameter γ, i.e., freq = 1/(1 − γ^(−n)). The potential benefit of updating the current partitions is then calculated as Br = freq × Btf − Ccg, and new partitions are deployed only if Br > 0.

Various approaches[57, 59, 63, 64, 78] calculate Ccg by breaking down the total query cost into the scan and tuple reconstruction costs of the accessed CGs. The scan cost counts the number of both random and sequential I/O blocks, with random I/O accounting for unclustered and clustered index scan costs plus index lookup costs. The tuple reconstruction cost considers only the join cost (e.g., hash and sort-merge joins) if the tuples between different CGs are not aligned; otherwise, a minimal tuple addressing cost is considered.

4.5.2 Distributed Environment

The VP layout is prevalent in distributed Hadoop environments. For example, both Trojan[70] and CHAC[71] consider the impact of column groups during the map phase as the main cost factor. However, unlike Trojan, CHAC estimates costs roughly, focusing solely on the data access volume and omitting disk read/write characteristics and network cost considerations. We introduce the Trojan cost model next.

To avoid tuple reconstruction, Trojan is based on PAX and considers data reading and network costs. The known parameters include the block size Sb, the number of machines n, the number of map tasks m, and the split size Ssplit. Ssplit determines the number of data slices, each handled by a single mapper. When processing a query q, the number of blocks read is denoted as Nb, and the number of map phases is calculated as Nmap = Nb × Sb/(Ssplit × m × n).

The read cost for each map phase includes both random I/O, Crand(q) = Frand × (Ssplit × |C′cg|/(Sbuffer × |Ccg|)), and sequential I/O, Cseq(q) = Ssplit × |C′cg|/(BWdisk × |Ccg|). Frand denotes the average random seek time (0.005 s); Ssplit is set to 256 MB; Ccg and C′cg represent the complete and accessed column sets, respectively; Sbuffer is the buffer size (512 KB); and BWdisk is the average disk bandwidth (100 MB/s). When local data is not available, the network cost Ctr arises from transferring data from one machine to another, i.e., Ctr = (1 − ptr) × (Ssplit/BWnet), where BWnet denotes the network bandwidth (1 GB/s) and ptr (0.97) is the probability that the required data is locally available, so remote accesses occur with probability 1 − ptr. Assuming a map initialization time of 0.1 s (Cinit), the total latency of query q over the Trojan layout is computed as (Ctr(q) + Crand(q) + Cseq(q) + Cinit) × Nmap.

4.6 Summary

Differing from HP, VP involves a two-phase process of column grouping and horizontal division of tuples, with each phase being NP-hard. In the first phase, mathematical programming algorithms efficiently identify CGs in small tables, while greedy and ML-based algorithms are preferred for large tables. In the second phase, partitions within each CG are typically generated using hash or range values of keys. Additionally, cost models play a crucial role in the VP process, calculating scan costs for CGs and cross-CG reconstruction costs for selecting candidate partitions. Despite its advantages, deploying and evaluating VP in real-world databases is challenging due to the limited native support for VP creation.

5 Irregular Partitioning

Irregular partitioning (IP) is a cutting-edge technique for handling analytical and mixed loads. However, deploying it poses challenges such as maintaining the storage structure, updating partitions, and coordinating query executors. Furthermore, there is a scarcity of relevant studies according to [8, 81]. In this section, we define the IP problem in Subsection 5.1. The partitioning features required by IP, discussed in Section 3 and Section 4, will not be reintroduced. Subsequently, we describe several classic IP techniques in Subsection 5.2 and provide a summary in Subsection 5.3. Fig.7 depicts a simple development trajectory of IP methods.

Fig.7. Timeline of IP research development, including on-axis studies (Teradata[79], GridTable[80], and Jigsaw[81]) based on centralized environments and off-axis studies (Proteus[82]) based on distributed environments.
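Looking back at Subsection 4.5.2, the Trojan read-cost formulas and constants quoted there can be transcribed directly into a small calculator. The helper's signature, units (seconds and megabytes), and the example numbers are illustrative choices rather than part of the original model:

```python
# Direct transcription of the Trojan-style read-cost formulas quoted above.
# Constants follow the text: F_rand = 0.005 s, S_split = 256 MB, S_buffer =
# 512 KB, BW_disk = 100 MB/s, BW_net = 1 GB/s, p_tr = 0.97, C_init = 0.1 s.

F_RAND, S_SPLIT_MB, S_BUFFER_MB = 0.005, 256.0, 0.5
BW_DISK, BW_NET = 100.0, 1024.0          # MB/s
P_TR, C_INIT = 0.97, 0.1                 # data-locality probability, map init (s)

def trojan_query_latency(n_blocks, block_mb, machines, map_tasks, n_cols, n_accessed):
    frac = n_accessed / n_cols                               # |C'_cg| / |C_cg|
    n_map = n_blocks * block_mb / (S_SPLIT_MB * map_tasks * machines)
    c_rand = F_RAND * (S_SPLIT_MB * frac / S_BUFFER_MB)      # random seeks
    c_seq = S_SPLIT_MB * frac / BW_DISK                      # sequential read
    c_tr = (1 - P_TR) * (S_SPLIT_MB / BW_NET)                # network transfer
    return (c_tr + c_rand + c_seq + C_INIT) * n_map

# Example: 4096 blocks of 64 MB on 8 machines with 4 map tasks per machine,
# and a query touching 2 of 10 columns.
print(round(trojan_query_latency(4096, 64.0, 8, 4, 10, 2), 3))
```

Plugging in different column-group layouts (which change the accessed fraction) makes the layout comparison performed by Trojan's optimizer concrete.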
Table 15. Summary of Irregular Partitioning Strategies in Centralized and Distributed Environments

| Category | Work | Baseline | Objective | Automatic | Cost Composition | Content |
|---|---|---|---|---|---|---|
| Empirical | Teradata[79] | Simple range partitions | O3, O10 | ✘ | Optimizer-based I/O costs; CPU metrics | Rowid-based storage + multi-level range partitioning∇ |
| Empirical | GridTable[80] | N/A | O4 | ✘ | Access and transition costs between grids | Three level-specific data manipulation operations |
| Greedy | Jigsaw[81] | Schism, Schism+Peloton | O3 | ✘ | Read I/Os of layouts; memory for hash tables | Segment partitioning∇; greedy mergingΟ |
| Greedy | Proteus[82] | TiDB | O4 | M-TH+D-RE | Costs for layout-aware/-agnostic storages | Layout creation rules∇; hybrid predictors for queries ⟳ |
... associated with irregular partition replication, maintenance, and joins.

6 Data Partitioning in Industry

Table 16 compares popular database products and their partitioning support. Most DBMSs, e.g., Redshift③, Firebolt④, Databricks⑤, GaussDB⑥, TiDB⑦, OceanBase⑧, and SingleStore⑨, offer user-defined HP strategies such as range, hash, key, list, and round-robin, where partition keys necessitate manual selection and updates. These systems prioritize balanced resource utilization among nodes and cluster scalability/parallelism through partitioning, rather than focusing solely on maximizing system performance. Besides, their simplicity enables DBAs to effortlessly create and manage partitions. Organizing data based on data distribution can also make it easier to conduct data analysis, particularly for time-series data.

In contrast, certain products (e.g., Vertica⑩, Greenplum⑪, and VoltDB⑫) incorporate load analysis into their partitioning design. VoltDB is an in-memory DBMS for fast data processing tasks like online gaming and IoT sensors. By analyzing historical load and data distribution, it scales transaction processing capacity, creating optimal range partitions. This ensures load balancing and allows high-frequency transactions to be executed locally.

Some products, e.g., ClickHouse⑬, StarRocks⑭, Apache Hudi⑮, Oracle Autonomous Database⑯, and Snowflake⑰, provide automated partition key selection and updates.
Table 16. Partitioning Support Comparison of Popular Database Products for OLAP, OLTP, and HTAP Scenarios

| Scenario | Type | Partitioning Strategy | Strategy Type | Automatic | Representative Product |
|---|---|---|---|---|---|
| OLAP | HP | Key, hash, range, list, round-robin | Data-driven | ✘ | Redshift③, Firebolt④, Databricks⑤, GaussDB⑥ |
| OLAP | HP | Round-robin, list, hash, range | Data-driven | M-TH | ClickHouse⑬, StarRocks⑭, Apache Hudi⑮ |
| OLAP | HP | Automatic interval/list | Data-/query-driven | M-SD | Oracle Autonomous Database⑯ |
| OLAP | HP | Auto clustering | Data-driven | M-TH | Snowflake⑰ |
| OLAP | HP&VP | Range, table projections + hash | Data-/query-driven | ✘ | Vertica⑩, Greenplum⑪ |
| OLTP | HP | Key, hash, range, list | Data-driven | ✘ | PostgreSQL, MySQL, Oracle, SQLServer |
| OLTP | HP | Key, hash, range, list | Data-/query-driven | ✘ | VoltDB⑫ |
| OLTP | VP | Sharding + table views | Data-/query-driven | ✘ | PostgreSQL, MySQL, Oracle, SQLServer |
| HTAP | HP | Key, hash, range, list | Data-driven | ✘ | TiDB⑦, OceanBase⑧ |
| HTAP | HP | Hash | Data-driven | ✘ | SingleStoreDB⑨ |
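The user-defined HP strategies listed in Table 16 ultimately reduce to a routing function from a partition-key value to a partition. A minimal illustration of range, hash, and list routing (mirroring the general idea, not any specific product's implementation):

```python
# Minimal illustration of the three most common user-defined HP strategies
# from Table 16: range, hash, and list routing of a partition-key value.

import bisect
import zlib

def range_route(key, bounds):
    """bounds are the upper bounds of ranges, e.g. [100, 200] -> 3 partitions."""
    return bisect.bisect_right(bounds, key)

def hash_route(key, n_parts):
    """Stable hash routing (zlib.crc32, so results are reproducible)."""
    return zlib.crc32(str(key).encode()) % n_parts

def list_route(key, value_lists):
    for pid, values in enumerate(value_lists):
        if key in values:
            return pid
    raise KeyError(f"no list partition accepts {key!r}")

print(range_route(150, [100, 200]))           # falls in the middle range
print(hash_route(42, 4) in range(4))          # always a valid partition id
print(list_route("EU", [{"US"}, {"EU", "UK"}]))
```

The manual effort the text describes lies not in these functions but in choosing the key and maintaining the bounds or value lists as the workload drifts.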
In the Oracle Autonomous Database service, automatic partitioning is a resource-intensive and time-consuming operation, invoked on demand rather than running periodically. Analyzing workload information, it automatically identifies candidate partitioning tables and recommends partitions for optimal I/O reduction using three strategies: automatic interval, automatic list, and hash. Snowflake creates micro-partitions via ZoneMap indexes and column distribution histograms. Data is organized in a natural order (the unclustered state), then clustered by the selected keys to prevent cluster key value duplication across partitions (the clustered state). As new data arrives, the number of duplicate key values across different partitions increases; each partition's depth is quantified by the count of its overlapping partitions. To preserve the overall data order, Snowflake prioritizes selecting micro-partitions with higher depths and sorts and merges them independently.

Vertica and Greenplum are among the few products natively supporting VP creation for efficient partition pruning. This is achieved by creating local column group projections on disk for partitioned tables and evenly distributing the projected data to partitions via hashing. By storing related data together, Vertica can more efficiently utilize system resources. Fine-grained projection replicas make it easier to achieve high availability and data recovery. Conversely, other products simulate VP by combining sharding and views, an approach that complicates the table schema, subtable data consistency, and query planning, and may lead to performance issues like overloaded shards and increased partition maintenance costs.

7 Open Problems

In this section, we explore the remaining challenges and potential solutions in the current data partitioning community.

Partitioning for Non-Numeric Columns. Query access patterns pertaining to non-numeric columns are often ignored, which greatly limits the optimization space of partitioning. A feasible solution to this dilemma involves transforming non-numeric column data into numeric data via data encoding. Date columns can be transformed into numeric values through timestamp functions, while enumeration columns are dictionary-encoded based on their semantic or alphabetical order. For more complex column values, a trie-based index tree[83] can be built, with a depth-first traversal to derive the encoding keys.

Block Allocation Within VP. Current research adopts simple data-driven methods to allocate tuples into blocks after obtaining the column groups (CGs). Although [53, 56] have considered load information, they still encounter convergence or performance issues. This inefficiency prevents the VP algorithm from achieving its optimal potential, even when the CG division is aligned with the column access patterns. A promising solution is to incorporate proven, effective query-driven HP algorithms like QdTree[14] into VP.

Reliability of Partition Updating. Monitoring services frequently rely on recently collected query logs to design new partitions; however, this method neglects the similarity between future and historical loads, making it challenging to estimate the updated partitions' potential performance. While [34, 59, 82] have tried to model special scenarios to calculate the future benefits of new partitions, these assumptions often prove unrealistic. This issue presents significant optimization potential in two aspects: firstly, improving the prediction accuracy of the future load for generating better new partitions; and secondly, reducing the number of problem assumptions.

Deep Learning Models for Cost Estimation. To the best of our knowledge, no public, network-centric cost model exists for partitioning. However, the learning and generalization capabilities of deep neural networks render them particularly suitable for such tasks. The main challenge lies in collecting sufficient training samples due to the high partition deployment cost and the vast partition solution space. A viable solution entails compressing or trimming the solution space by identifying the factors influencing the query plan. This could be achieved by using a pruned branch bounding tree for candidate partitions and removing the deployment and metric measurements of cold data. Subsequently, query plans and execution metrics for various partitions are collected to train an RNN-stacked tree network.

8 Conclusions

In this paper, we modularized the partitioning technique, emphasizing the significance of cluster and storage environments in formulating an efficient partitioning path. Our approach enhances the tracking of partitioning progress and clarifies the considerations necessary at each partitioning stage, ensuring optimal designs. Before partitioning, it is crucial to align cost models and partition types with specific environmental characteristics. Furthermore, the intricate relationship between data migration plans during partition updates and cluster configuration underscores the importance of a holistic approach.
We also classified partition generation strategies based on algorithm types, distinguishing key features such as model convergence and partition quality to aid in strategy selection. For future research, we would like to explore feasible solutions for addressing existing key challenges, including non-numeric column-based partitioning and the reliability of partition updating. We hope our framework and findings can contribute to the advancement of partitioning systems and provide practical insights for DBAs in various environments.

Conflict of Interest The authors declare that they have no conflict of interest.

References

[1] Melnik S, Gubarev A, Long J J et al. Dremel: A decade of interactive SQL analysis at web scale. Proceedings of the VLDB Endowment, 2020, 13(12): 3461-3472. DOI: 10.14778/3415478.3415568.
[2] Bayer R, McCreight E. Organization and maintenance of large ordered indices. In Proc. the 1970 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control, Nov. 1970, pp.107-141. DOI: 10.1145/1734663.1734671.
[3] Bentley J L. Multidimensional binary search trees used for associative searching. Communications of the ACM, 1975, 18(9): 509-517. DOI: 10.1145/361002.361007.
[4] Guttman A. R-trees: A dynamic index structure for spatial searching. In Proc. the 1984 ACM SIGMOD International Conference on Management of Data, Jun. 1984, pp.47-57. DOI: 10.1145/602259.602266.
[5] Yuan H T, Li G L, Feng L, Sun J, Han Y. Automatic view generation with deep learning and reinforcement learning. In Proc. the 36th IEEE International Conference on Data Engineering, Apr. 2020, pp.1501-1512. DOI: 10.1109/ICDE48307.2020.00133.
[6] Han Y, Li G L, Yuan H T, Sun J. An autonomous materialized view management system with deep reinforcement learning. In Proc. the 37th IEEE International Conference on Data Engineering, Apr. 2021, pp.2159-2164. DOI: 10.1109/ICDE51399.2021.00217.
[7] Zhang H, Chen G, Ooi B C, Tan K L, Zhang M H. In-memory big data management and processing: A survey. IEEE Trans. Knowledge and Data Engineering, 2015, 27(7): 1920-1948. DOI: 10.1109/TKDE.2015.2427795.
[8] Mahmud M S, Huang J Z, Salloum S et al. A survey of data partitioning and sampling methods to support big data analysis. Big Data Mining and Analytics, 2020, 3(2): 85-101. DOI: 10.26599/BDMA.2019.9020015.
[9] Sun L W, Franklin M J, Krishnan S, Xin R S. Fine-grained partitioning for aggressive data skipping. In Proc. the 2014 ACM SIGMOD International Conference on Management of Data, Jun. 2014, pp.1115-1126. DOI: 10.1145/2588555.2610515.
[10] Aly A M, Mahmood A R, Hassan M S, Aref W G, Ouzzani M, Elmeleegy H, Qadah T. AQWA: Adaptive query workload aware partitioning of big spatial data. Proceedings of the VLDB Endowment, 2015, 8(13): 2062-2073. DOI: 10.14778/2831360.2831361.
[11] Aly A M, Elmeleegy H, Qi Y, Aref W. Kangaroo: Workload-aware processing of range data and range queries in Hadoop. In Proc. the 9th ACM International Conference on Web Search and Data Mining, Feb. 2016, pp.397-406. DOI: 10.1145/2835776.2835841.
[12] Shanbhag A, Jindal A, Madden S, Quiane J, Elmore A J. A robust partitioning scheme for ad-hoc query workloads. In Proc. the 2017 Symposium on Cloud Computing, Sept. 2017, pp.229-241. DOI: 10.1145/3127479.3131613.
[13] Lu Y, Shanbhag A, Jindal A, Madden S. AdaptDB: Adaptive partitioning for distributed joins. Proceedings of the VLDB Endowment, 2017, 10(5): 589-600. DOI: 10.14778/3055540.3055551.
[14] Yang Z H, Chandramouli B, Wang C et al. Qd-tree: Learning data layouts for big data analytics. In Proc. the 2020 ACM SIGMOD International Conference on Management of Data, Jun. 2020, pp.193-208. DOI: 10.1145/3318464.3389770.
[15] Ding J L, Minhas U F, Chandramouli B et al. Instance-optimized data layouts for cloud analytics workloads. In Proc. the 2021 International Conference on Management of Data, Jun. 2021, pp.418-431. DOI: 10.1145/3448016.3457270.
[16] Li Z, Yiu M L, Chan T N. PAW: Data partitioning meets workload variance. In Proc. the 38th IEEE International Conference on Data Engineering, May 2022, pp.123-135. DOI: 10.1109/icde53745.2022.00014.
[17] Rao J, Zhang C, Megiddo N, Lohman G. Automating physical database design in a parallel database. In Proc. the 2002 ACM SIGMOD International Conference on Management of Data, Jun. 2002, pp.558-569. DOI: 10.1145/564691.564757.
[18] Agrawal S, Chu E, Narasayya V. Automatic physical design tuning: Workload as a sequence. In Proc. the 2006 ACM SIGMOD International Conference on Management of Data, Jun. 2006, pp.683-694. DOI: 10.1145/1142473.1142549.
[19] Eadon G, Chong E I, Shankar S, Raghavan A, Srinivasan J, Das S. Supporting table partitioning by reference in Oracle. In Proc. the 2008 ACM SIGMOD International Conference on Management of Data, Jun. 2008, pp.1111-1122. DOI: 10.1145/1376616.1376727.
[20] Hauglid J O, Ryeng N H, Nørvåg K. DYFRAM: Dynamic fragmentation and replica management in distributed database systems. Distributed and Parallel Databases, 2010, 28(2): 157-185. DOI: 10.1007/s10619-010-7068-1.
[21] Curino C, Jones E, Zhang Y, Madden S. Schism: A workload-driven approach to database replication and partitioning. Proceedings of the VLDB Endowment, 2010, 3(1/2): 48-57. DOI: 10.14778/1920841.1920853.
[22] Nehme R, Bruno N. Automated partitioning design in parallel database systems. In Proc. the 2011 ACM SIGMOD International Conference on Management of Data, Jun. 2011, pp.1137-1148. DOI: 10.1145/1989323.1989444.
[23] Pavlo A, Curino C, Zdonik S. Skew-aware automatic
... Jun. 2020, pp.143-157. DOI: 10.1145/3318464.3389704.
[35] Brendle M, Weber N, Valiyev M, May N, Schulze R,
database partitioning in shared-nothing, parallel OLTP Böhm A, Moerkotte G, Grossniklaus M. SAHARA: Mem-
systems. In Proc. the 2012 ACM SIGMOD International ory footprint reduction of cloud databases with automat-
Conference on Management of Data, May 2012, pp.61–72. ed table partitioning. In Proc. the 25th International Con-
DOI: 10.1145/2213836.2213844. ference on Extending Database Technology, Mar. 29–Apr.
[24] Liroz-Gistau M, Akbarinia R, Pacitti E et al. Dynamic 1, 2022. DOI: 10.5441/002/edbt.2022.02.
workload-based partitioning algorithms for continuously [36] Agrawal R, Srikant R. Fast algorithms for mining associa-
growing databases. In Transactions on Large-Scale Data- tion rules in large databases. In Proc. the 20th Interna-
and Knowledge-Centered Systems XII, Hameurlain A, tional Conference on Very Large Data Bases, Sept. 1994,
Küng J, Wagner R (eds.), Springer, 2013, pp.105–128. pp.487–499.
DOI: 10.1007/978-3-642-45315-1_5. [37] Ward J H Jr. Hierarchical grouping to optimize an objec-
[25] Quamar A, Kumar K A, Deshpande A. 2013. SWORD: tive function. Journal of the American Statistical Associa-
Scalable workload-aware data placement for transaction- tion, 1963, 58(301): 236–244. DOI: 10.1080/01621459.1963.
al workloads. In Proc. the 16th International Conference 10500845.
on Extending Database Technology, Mar. 2013, pp.430– [38] Roussopoulos N, Kelley S, Vincent F. Nearest neighbor
441. DOI: 10.1145/2452376.2452427. queries. In Proc. the 1995 ACM SIGMOD International
[26] Taft R, Mansour E, Serafini M, Duggan J, Elmore A J, Conference on Management of Data, May 1995, pp.71–79.
Aboulnaga A, Pavlo A, Stonebraker M. E-store: Fine- DOI: 10.1145/223784.223794.
grained elastic partitioning for distributed transaction [39] Sacca D, Wiederhold G. Database partitioning in a clus-
processing systems. Proceedings of the VLDB Endow- ter of processors. ACM Trans. Database Systems, 1985,
ment, 2014, 8(3): 245–256. DOI: 10.14778/2735508.2735514. 10(1): 29–56. DOI: 10.1145/3148.3161.
[27] Chen K J, Zhou Y L, Cao Y. Online data partitioning in [40] Copeland G, Alexander W, Boughter E, Keller T. Data
distributed database systems. In Proc. the 18th Interna- placement in Bubba. In Proc. the 1988 ACM SIGMOD
tional Conference on Extending Database Technology, International Conference on Management of Data, Jun.
Mar. 2015, pp.1–12. DOI: 10.5441/002/edbt.2015.02. 1988, pp.99–108. DOI: 10.1145/50202.50213.
[28] Zamanian E, Binnig C, Salama A. Locality-aware parti- [41] Stöhr T, Märtens H, Rahm E. Multi-dimensional database
tioning in parallel database systems. In Proc. the 2015 allocation for parallel data warehouses. In Proc. the 26th
ACM SIGMOD International Conference on Manage- International Conference on Very Large Data Bases, Sept.
ment of Data, May 2015, pp.17–30. DOI: 10.1145/2723372. 2000, pp.273–284.
2723718. [42] Bruno N, Chaudhuri S. An online approach to physical
[29] Fetai I, Murezzan D, Schuldt H. Workload-driven adap- design tuning. In Proc. the 23rd IEEE International Con-
tive data partitioning and distribution—The Cumulus ap- ference on Data Engineering, Apr. 2007, pp.826–835. DOI:
proach. In Proc. the 2015 IEEE International Conference 10.1109/ICDE.2007.367928.
on Big Data, Oct. 29–Nov. 1, 2015, pp.1688–1697. DOI: [43] Garcia-Alvarado C, Raghavan V, Narayanan S, Waas F
10.1109/BigData.2015.7363940. M. Automatic data placement in MPP databases. In
[30] Serafini M, Taft R, Elmore A J et al. Clay: Fine-grained Proc. the IEEE 28th International Conference on Data
adaptive partitioning for general database schemas. Pro- Engineering Workshops, Apr. 2012, pp.322–327. DOI: 10.
ceedings of the VLDB Endowment, 2016, 10(4): 445–456. 1109/ICDEW.2012.45.
DOI: 10.14778/3025111.3025125. [44] Karypis G, Kumar V. METIS: A software package for
[31] Marcus R, Papaemmanouil O, Semenova S, Garber S. partitioning unstructured graphs, partitioning meshes,
NashDB: An end-to-end economic method for elastic and computing fill-reducing orderings of sparse matrices.
database fragmentation, replication, and provisioning. In Technical Report, TR 97-061, Univeristy of Minnesota,
Proc. the 2018 International Conference on Management 1997. https://hdl.handle.net/11299/215346, Mar. 2024.
of Data, May 2018, pp.1253–1267. DOI: 10.1145/3183713. [45] Kuhn H W. The Hungarian method for the assignment
3196935. problem. In 50 Years of Integer Programming 1958-2008,
[32] Nam Y M, Kim M S, Han D. A graph-based database Jünger M, Liebling T M, Naddef D, Nemhauser G L, Pul-
partitioning method for parallel OLAP query processing. leyblank W R, Reinelt G, Rinaldi G, Wolsey L A (eds.),
In Proc. the 34th IEEE International Conference on Data Springer, 2010, pp.29–47. DOI: 10.1007/978-3-540-68279-
Engineering, Apr. 2018, pp.1025–1036. DOI: 10.1109/ICDE. 0_2.
2018.00096. [46] Costa E, Costa C, Santos M Y. Evaluating partitioning
[33] Parchas P, Naamad Y, Van Bouwel P, Faloutsos C, and bucketing strategies for hive-based big data ware-
Petropoulos M. Fast and effective distribution-key recom- housing systems. Journal of Big Data, 2019, 6(1): 34.
mendation for amazon redshift. Proceedings of the VLDB DOI: 10.1186/s40537-019-0196-1.
Endowment, 2020, 13(12): 2411–2423. DOI: 10.14778/3407 [47] Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou
790.3407834. I, Wierstra D, Riedmiller M. Playing Atari with deep re-
[34] Hilprecht B, Binnig C, Röhm U. Learning a partitioning inforcement learning. arXiv: 1312.5602, 2013. https://arx-
advisor for cloud databases. In Proc. the 2020 ACM SIG- iv.org/abs/1312.5602, Mar. 2024.
Peng-Ju Liu et al.: Enhancing Storage Efficiency and Performance: Survey of Data Partitioning Techniques 367
[73] … Proceedings of the VLDB Endowment, 2019, 12(13): 2393–2407. DOI: 10.14778/3358701.3358707.
[74] McCormick W T, Schweitzer P J, White T W. Problem decomposition and data reorganization by a clustering technique. Operations Research, 1972, 20(5): 993–1009. DOI: 10.1287/opre.20.5.993.
[75] Ailamaki A, DeWitt D J, Hill M D, Skounakis M. Weaving relations for cache performance. In Proc. the 27th International Conference on Very Large Data Bases, Sept. 2001, pp.169–180.
[76] Li L Z, Gruenwald L. Autonomous database partitioning using data mining on single computers and cluster computers. In Proc. the 16th International Database Engineering & Applications Symposium, Aug. 2012, pp.32–41. DOI: 10.1145/2351476.2351481.
[77] van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In Proc. the 30th AAAI Conference on Artificial Intelligence, Feb. 2016, pp.2094–2100. DOI: 10.1609/aaai.v30i1.10295.
[78] Jindal A, Palatinus E, Pavlov V, Dittrich J. A comparison of knives for bread slicing. Proceedings of the VLDB Endowment, 2013, 6(6): 361–372. DOI: 10.14778/2536336.2536338.
[79] Al-Kateb M, Sinclair P, Au G, Ballinger C. Hybrid row-column partitioning in Teradata®. Proceedings of the VLDB Endowment, 2016, 9(13): 1353–1364. DOI: 10.14778/3007263.3007273.
[80] Pinnecke M, Durand G C, Broneske D, Zoun R, Saake G. GridTables: A One-Size-Fits-Most H2TAP data store. Datenbank-Spektrum, 2020, 20(1): 43–56. DOI: 10.1007/s13222-019-00330-x.
[81] Kang D H, Jiang R C, Blanas S. Jigsaw: A data storage and query processing engine for irregular table partitioning. In Proc. the 2021 International Conference on Management of Data, Jun. 2021, pp.898–911. DOI: 10.1145/3448016.3457547.
[82] Abebe M, Lazu H, Daudjee K. Proteus: Autonomous adaptive storage for mixed workloads. In Proc. the 2022 International Conference on Management of Data, Jun. 2022, pp.700–714. DOI: 10.1145/3514221.3517834.
[83] Wang J Y, Chai C L, Liu J B, Li G L. FACE: A normalizing flow based cardinality estimator. Proceedings of the VLDB Endowment, 2021, 15(1): 72–84. DOI: 10.14778/3485450.3485458.

Peng-Ju Liu received his B.S. degree in information management and information system from Dalian Maritime University, Dalian, in 2020. He is currently pursuing his Ph.D. degree at the School of Information, Renmin University of China, Beijing. His research interests include adaptable data partitioning, load forecasting, and learning-based query optimization.

Cui-Ping Li is currently a professor at Renmin University of China, Beijing. She received her Ph.D. degree from Chinese Academy of Sciences, Beijing, in 2003. Before that, she received her B.S. and M.S. degrees from Xi'an Jiaotong University, Xi'an, in 1994 and 1997, respectively. She received the Second Prize of the National Award for Science and Technology Progress in 2018. Her main research interests include social network analysis, social recommendation, and big data analysis.

Hong Chen is currently a professor at Renmin University of China, Beijing. She received her Ph.D. degree from Chinese Academy of Sciences, Beijing, in 2000. Before that, she received her B.S. and M.S. degrees from Renmin University of China, Beijing, in 1986 and 1989, respectively. She received the Second Prize of the National Award for Science and Technology Progress in 2018. Her research interests include database technology and high-performance computing.