Enhanced Schemes For Data Fragmentation, Allocation, and Replication in Distributed Database Systems
With the growth of information technology and computer networks, there is a vital need for optimal design of distributed databases with the aim of performance
improvement in terms of minimizing the round-trip response time and query transmission and processing costs. To address this issue, new fragmentation,
data allocation, and replication techniques are required. In this paper, we propose enhanced vertical fragmentation, allocation, and replication schemes
to improve the performance of distributed database systems. The proposed fragmentation scheme clusters highly-bonded attributes (i.e., normally accessed
together) into a single fragment in order to minimize the query processing cost. The allocation scheme is proposed to find an optimized allocation to
minimize the round-trip response time. The replication scheme partially replicates the fragments to increase the local execution of queries in a way that
minimizes the cost of transmitting replicas to the sites. Experimental results show that, on average, the proposed schemes reduce the round-trip response
time of queries by 23% and query processing cost by 15%, as compared to the related work.
a network site and the time when the response to the query is received at the query source. In other words, round-trip response time consists of transmitting a query from its source site to a server site, processing the query and generating a response at the server site, and transmitting the response back to the source site [14]. Also, most works have proposed heuristic algorithms for addressing DDB fragmentation and allocation, and only a few have proposed mathematical programming formulations.
In this paper, we address the three above-mentioned issues in a distributed database by proposing vertical fragmentation, allocation, and replication schemes to minimize the round-trip response time. The contributions are summarized as follows:

– We propose a vertical fragmentation scheme that partitions the data into fragments such that bonded attributes (i.e., accessed together by the queries) are located in a single fragment, thus reducing the access cost to those attributes by the queries. The proposed scheme utilizes a weighted graph (with attributes as the vertices and the bonds between attributes as edges) and partitions it into subgraphs (i.e., fragments) with maximum connectivity between the relevant vertices (i.e., attributes) of each partition. The graph partitioning is done in a way that prevents the creation of partitions that are too small or too big. The fragmentation scheme aims to minimize the query processing cost. More details can be found in Section 4.1.

– We propose a static allocation scheme that takes advantage of the simulated annealing metaheuristic technique to solve the NP-hard problem of optimized allocation to minimize the round-trip response time. Instead of considering a random allocation sequence for the initial allocation in the simulated annealing process, the allocation scheme creates a targeted initial allocation pattern by considering the fragment access ratios and allocating fragments with higher access ratios to the sites with higher processing speeds. This targeted initialization decreases the time required for the simulated annealing algorithm to find the optimal allocation pattern. Moreover, in order to avoid the bottleneck problem (allocating high-demand fragments to sites with low processing speed), the proposed allocation scheme considers the site capacity constraints.

– We propose a replication scheme that performs partial replication of fragments to increase the local execution of queries. The fragments are replicated in a way that minimizes the cost of transmitting replicas to the sites. In order to find an optimal replication solution, the proposed scheme utilizes the simulated annealing technique and considers two constraints:

  i) a fragment is replicated to a site only if there is a need for the fragment on that site and
  ii) the fragment is replicated to a site only if the site capacity constraint is preserved.

The remainder of the paper is organized as follows: Section 2 reviews the works concentrating on vertical fragmentation, allocation, and replication in distributed database design. Section 3 expresses the preliminaries and basic concepts referenced throughout the paper. The detailed description of the proposed approach is explained in Section 4. Section 5 presents the simulation results and the validation of the proposed approach as well as a comparison. Finally, Section 6 concludes the paper.

2. RELATED WORK

Fragmentation and allocation have been known as the main procedures for reliable performance and an efficient design for a distributed database and have been investigated in many articles. The work in [4] developed an integrated methodology for fragmentation and allocation, which incorporated concurrency control and communication network cost in distributed environments. Authors in [11] proposed a clustering-based technique for vertical fragmentation and allocation in distributed database systems. Their proposed scheme created query clusters to form fragments. They assume that each fragment is a set of attributes accessed together by a particular query. Similarly, authors in [12] proposed a heuristic approach to reduce transmission costs of distributed queries. They also proposed a site clustering algorithm to ensure the creation of highly-balanced clusters. They also suggested several advanced allocation scenarios with data replication consideration. The work in [17] proposed a new vertical fragmentation algorithm using a graphical technique and an Attribute Usage Matrix (AUM), which represents the essential queries; unlike iterative binary partitioning methods, its primary purpose is to create all fragments in one iteration. The work in [18] proposed an algorithm to measure the similarity between any pair of attributes. This method clusters attributes into sub-relations, which are called fragments. For this purpose, the relations are divided into sub-relations at the design cycle. The works in [19, 20] presented an objective function to evaluate the “goodness” of fragmentation algorithms. The work in [21] developed a vertical fragmentation approach where an attribute affinity table was used as input to the proposed approach. A dynamic table for fragmentation and allocation was proposed in [22], which monitors the access pattern of network sites to data tables and utilizes it to perform fragmentation, replication, and re-allocation to maximize the number of local accesses. The work in [23] presented a mathematical optimization model called DFAR that unifies the fragmentation, allocation and dynamical migration of data in distributed database systems considering the storage capacity of network sites. Their model utilizes the Threshold Accepting algorithm to solve the DFAR problem. The works in [14–16] presented a new mathematical model for the fragmentation and the allocation problem, called VFA-RT, which aims to minimize the round-trip response time of queries. The VFA-RT model is made of a non-linear objective function and a group of constraints. In order to solve the model, Threshold Accepting (TA) and Tabu Search (TS) metaheuristic algorithms were used. The work in [24] proposed the Adaptive Distributed Request Window Algorithm (ADRW) to achieve fragmentation and dynamic allocation of data. This approach adapts to changes in the access patterns of requests for attributes and makes decisions on the replications according to their “read/write” requests for data and total servicing part. The purpose of this algorithm is to adjust data allocation patterns to reduce the total servicing cost of the full read/write requests of data. The work in [25] presented a genetic algorithm approach
Reference: [20] [19] [18] [26] [14] [11] [23] [28] [10] [12] [4]
Problem addressed
  Fragmentation * * * * * *
  Allocation * *
  Fragmentation + Allocation * * * * *
  Re-allocation
  Replication * * *
Value to minimize
  Query Transmission Cost * * * * *
  Query Processing Cost * * *
  Round-trip Response Time * * *
to solve the combined problems of vertical fragmentation and access path selection. Choosing the access path is a kind of mechanism which is capable of conducting an effective search for the physical sites of data. The work in [26] presented a new heuristic approach for fragmentation. This approach reduces the transfer cost of fragments to different sites using a mathematical model. In this approach, fragmentation and allocation are done simultaneously. Authors in [7] propose a linear approach to distributed database optimization that gathers incremental online knowledge about data access patterns and database statistics for online re-allocation of the fragments in order to continually optimize the query response time. In [6], the authors proposed a method based on a particle swarm optimization algorithm to solve the data allocation problem that aims to minimize the query execution time and transaction cost. In [27, 28], the authors discussed the data allocation issue with the purpose of minimizing data transmission across network sites using an ant colony optimization algorithm. The proposed procedure in [8] was a vertical fragmentation model with a two-phase allocation process. Unlike most earlier studies, the tradeoffs between different allocation scenarios were discussed for finding an optimal way of attribute assignment over sites. However, the model presented in [9] was an extension of [8] and could considerably reduce communication costs and query response time.
The work in [5] considered the data allocation problem in distributed databases where the query execution strategy affects allocation decisions. Authors in [29] propose a vertical partitioning algorithm that uses graphical techniques and starts from the attribute affinity matrix by considering it as a complete graph. Then, forming a linearly connected spanning tree, it generates all meaningful fragments simultaneously by considering a cycle as a fragment.
Table 1 summarizes the most relevant and recent works discussed above. As can be observed, many works have only dealt with the fragmentation problem, and many works have addressed the integration of fragmentation and allocation problems. However, few works have considered addressing the fragmentation, allocation, and replication problems integrally. Replicating fragments has been shown to result in more reliability, accessibility, reduction of network traffic, increased scalability, and better performance compared to the lack of replications [1, 22, 29]. In this article, we propose solutions for all three of these issues.
Moreover, most of the works have concentrated on minimizing the query processing/transmission cost, and only a few (e.g., [14]) have dealt with round-trip time minimization. In our proposed allocation scheme, we also consider minimizing this parameter. It is worth mentioning that our work is different from the works in [14–16] since we address the replication problem as well as the fragmentation and allocation problems. Moreover, we have some innovations in the fragmentation and allocation schemes compared to other related works. More detail about the proposed schemes can be found in Section 4.

3. PRELIMINARIES AND ASSUMPTIONS

3.1 Basic Definitions:

Vertical fragmentation divides an original relation (DB table) into some sub-relations (fragments) in a way that the combination of the fragments generates the primary relation [1]. If R denotes a relation with a set of attributes (columns) A = {A_1, A_2, ..., A_L}, vertical fragmentation is partitioning R into some sub-relations F_i, such that the F_i are derived from Equation 1:

$$F_i = \pi_{PK, A_i}(R), \quad \forall A_i \in A; \qquad R = F_1 \bowtie F_2 \bowtie \dots \bowtie F_N \tag{1}$$

where π is the projection operator of relational algebra [1], and PK is the primary key that should be replicated in all fragments. Relation R should also be reconstructable by applying the join operator ⋈ on the resultant sub-relations (i.e., fragments), as illustrated in the above equation. So, vertical fragmentation on a relation R is defined as determining sub-relations F_1, F_2, ..., F_N, such that query execution cost is optimized concerning some criterion (here, minimizing the query processing cost).
Since vertical partitioning puts in one fragment those attributes usually accessed together, there is a need for some measure that would define more precisely the notion of “togetherness” [1]. Query execution frequency (f) and access frequency (the frequency of accessing an attribute by a query) are two crucial factors that define this notion. For each query Q_i (1 ≤ i ≤ K) and each attribute A_j (1 ≤ j ≤ L), we associate an attribute access value, which equals 1 if query Q_i references attribute A_j, and zero otherwise. The set of all access values can be
represented by a K × L matrix called AAM (Attribute Access Matrix), as expressed by Equation 2.

$$AAM(Q_i, A_j) = \begin{cases} 1, & \text{if attribute } A_j \text{ is accessed by } Q_i \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

Similarly, we define an attribute bond value that measures the strength of an imaginary bond between two attributes. The attribute bond value represents the number of times two attributes are accessed together by all queries at all sites. The set of all bond values can be represented by an L × L matrix called ABM (Attribute Bond Matrix), as expressed by Equation 3.

$$ABM(A_i, A_j) = \begin{cases} \sum_{q=1}^{K} AAM(Q_q, A_i) \cdot AAM(Q_q, A_j), & i \neq j \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

Attributes that are accessed by queries are called relevant attributes, and every fragment that contains most of the relevant attributes is defined as the local fragment [25]. W_ij denotes the number of attributes in local fragment F_i that query Q_j accesses. R_irj denotes the number of attributes that are not located in the local fragment F_i and must be accessed remotely by query Q_j in fragment F_r. Note that in a typical environment, there may be many queries being executed. However, typically, only important queries (for example, the 20% of all active queries that make 80% of data accesses) are taken into consideration [1]. Table 2 presents a detailed description of the notations used in the paper.

3.2 Assumptions:

In this paper, we assume a client-server architecture where the server is responsible for performing the proposed fragmentation and allocation schemes, and clients (i.e., sites) store the fragments that are defined and allocated by the server. We also assume a static environment in which the queries that are to be performed are read-only (i.e., do not modify the database) and are known beforehand (i.e., there exists information about what queries are going to be performed on what sites and what attributes are going to be accessed by these queries). We also assume that fragments are disjoint for all attributes except for the primary key PK, which should be repeated in all fragments of a relation (for reconstruction).

3.3 Cost Model

As mentioned previously, vertical fragmentation and allocation are to be done to minimize the query processing costs. The
cost of distributed query processing can be expressed by two factors: local query processing cost (the cost of accessing irrelevant local attributes) and remote query processing cost (the cost of accessing the remote relevant attributes). In this article, we consider a cost model in which the cost of executing operations such as select, project, and join is not considered. In other words, since CPU time is negligible in comparison with I/O time, we do not consider the processing cost (which includes the time of executing operations such as select and join) and only consider the cost of accessing the attributes by these operations.
In vertical fragmentation, a query does not usually require retrieving all the attributes of a fragment during query processing. Each attribute that is not required by a query but exists in the local fragment causes irrelevant local attribute access cost. Attributes that are not required to be accessed by a query (but are accessed because they reside within the retrieved fragment) are called irrelevant attributes. The existence of irrelevant attributes in the local fragment may lead to growth in the number of local accesses. This, in turn, may result in a rise in the number of disk accesses, and hence the local query processing cost increases. Equation 4 expresses the irrelevant local attribute access cost, as described in [17]:

$$Cost_{local} = \sum_{i=1}^{N} \sum_{j=1}^{K} f_j^2 \times |W_{ij}| \times \frac{|W_{ij}|}{n_i} \tag{4}$$

Similarly, there are attributes that are required by the queries but do not exist in the local fragment. These attributes are called relevant remote attributes. A greater number of relevant attributes in the remote fragments may also lead to an increase in the remote query processing cost [17]. Equation 5 expresses the relevant remote attribute access cost [19]:

$$Cost_{remote} = \sum_{j=1}^{K} \min_{i=1,\dots,N} \sum_{\substack{r=1 \\ r \neq i}}^{N} f_j^2 \times |R_{irj}| \times \frac{|R_{irj}|}{n_{irj}} \tag{5}$$

So, the total query processing cost, denoted by TCost, is expressed by Equation 6, as mentioned in [19]. This parameter will be further used in Section 5 in order to evaluate the performance of the proposed fragmentation scheme.

$$TCost = Cost_{local} + Cost_{remote} \tag{6}$$

4. PROPOSED VERTICAL FRAGMENTATION, ALLOCATION AND REPLICATION SCHEMES

In the following, we describe the proposed fragmentation, allocation, and replication schemes. The fragmentation scheme partitions attributes into fragments to minimize the query processing cost. The allocation component then optimally allocates the fragments to the sites to minimize the round-trip response time. Once the allocation is done, data that are commonly accessed by queries are replicated on the query’s local site to increase the locality of reference and reduce the communication cost. A detailed description of each component is presented below.

4.1 Vertical Fragmentation Scheme

The fragmentation scheme is responsible for dividing a database into fragments to minimize the query processing cost. As mentioned in Section 3.3, the query processing cost is affected by the cost of accessing irrelevant local attributes and the cost of accessing relevant remote attributes. So, in order to reduce these costs, relevant attributes that are accessed together by the queries should be located in one fragment. The intuition behind this idea is that placing relevant attributes together decreases the number of irrelevant attributes within that fragment, thus reducing the irrelevant access cost. Besides, keeping relevant attributes in the local fragment reduces the need for a query to access them remotely, thus reducing the relevant remote attribute cost. The fragmentation process, as proposed by this component, is as follows.
In order to identify the attributes that are to be located within a fragment, we make use of AAM. As mentioned previously in Section 3, AAM defines whether a query accesses an attribute or not. Next, ABM is constructed using Equation 3. Remember that ABM defines the number of times two attributes are accessed together by all queries running on a site. A more detailed description of AAM, ABM and bond values has been presented previously in Section 3.
In the next step, graph G is created based on ABM, in which vertices represent attributes and edges connect those two vertices (i.e., attributes) that are bonded together. The weight of an edge between two bonded attributes A_i and A_j is obtained from ABM[i, j]. Once the graph is created, it is partitioned into subgraphs. As mentioned above, putting highly-bonded attributes in one partition (i.e., fragment) results in the reduction of access cost. So, partitioning is done in a way that each subgraph has the maximum bond values between its vertices. These subgraphs are then considered as fragments. To do the partitioning, in the first step we find the edges with the lowest weights and remove them from the graph, provided that the removal does not disconnect the graph. This process can be expressed as finding a maximum spanning tree for the graph, which connects all the graph vertices and includes the edges with higher weights (i.e., bond values).
Once the maximum spanning tree is constructed, we begin partitioning it into subgraphs. A useful parameter in partitioning is the partition size, defined as the number of attributes that reside inside a partition. If the partition size is too large, it leads to an increase in the irrelevant local attribute access cost. Likewise, a partition size that is too small leads to an increase in the relevant remote attribute access cost. In order to create subgraphs, we start removing the edges with the lowest weights. If there are multiple edges with equal weights, we remove the edge that partitions the graph into sub-partitions (i.e., subgraphs) with the least difference in their partition size. This prevents the creation of partitions that are too small or too large. The partitioning is done until N − 1 edges are removed, resulting in the creation of N subgraphs. Each subgraph is considered as a fragment.
The output of the fragmentation component is the AFM (Attribute Fragment Matrix, expressed in Equation 7), which is an L × N matrix that shows whether an attribute belongs to a fragment or not.
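The fragmentation steps of Section 4.1 (building the bond matrix from the access matrix, extracting a maximum spanning tree, then cutting the weakest edges) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the function names, the list-of-lists matrix layout, the zero diagonal of the bond matrix, the use of Kruskal's algorithm, and the tie-breaking measure (largest minus smallest component size) are all choices made for this sketch.

```python
from itertools import combinations


def bond_matrix(aam):
    """ABM[i][j] = number of queries accessing both attributes i and j.
    Assumption of this sketch: the diagonal is left at zero."""
    n_attrs = len(aam[0])
    abm = [[0] * n_attrs for _ in range(n_attrs)]
    for i, j in combinations(range(n_attrs), 2):
        abm[i][j] = abm[j][i] = sum(row[i] * row[j] for row in aam)
    return abm


def maximum_spanning_tree(abm):
    """Kruskal's algorithm over edges sorted by descending bond value."""
    n = len(abm)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        ((abm[i][j], i, j) for i, j in combinations(range(n), 2) if abm[i][j] > 0),
        reverse=True,
    )
    tree = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # edge joins two different components
            parent[root_i] = root_j
            tree.append((weight, i, j))
    return tree


def components(n, edges):
    """Connected components (sorted attribute indices) of a forest."""
    adjacent = {v: set() for v in range(n)}
    for _, i, j in edges:
        adjacent[i].add(j)
        adjacent[j].add(i)
    seen, parts = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, part = [start], []
        while stack:
            u = stack.pop()
            part.append(u)
            for v in adjacent[u] - seen:
                seen.add(v)
                stack.append(v)
        parts.append(sorted(part))
    return parts


def fragment(aam, n_fragments):
    """Cut lowest-weight tree edges until n_fragments subgraphs remain."""
    abm = bond_matrix(aam)
    n_attrs = len(abm)
    tree = maximum_spanning_tree(abm)
    while tree and len(components(n_attrs, tree)) < n_fragments:
        lowest = min(weight for weight, _, _ in tree)
        candidates = [e for e in tree if e[0] == lowest]

        def imbalance(edge):
            # Hypothetical tie-break: spread of component sizes after the cut.
            rest = [e for e in tree if e != edge]
            sizes = [len(p) for p in components(n_attrs, rest)]
            return max(sizes) - min(sizes)

        tree.remove(min(candidates, key=imbalance))
    return components(n_attrs, tree)


# Illustrative access matrix: 5 queries (rows) over 4 attributes (columns).
AAM_EXAMPLE = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
]
print(fragment(AAM_EXAMPLE, 2))  # [[0, 1], [2, 3]]
```

In the example, attributes 0 and 1 share a bond of 2, as do attributes 2 and 3, while the bond between attributes 1 and 2 is only 1, so the single cut separates the two tightly bonded pairs.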
$$AFM(A_i, F_j) = \begin{cases} 1, & \text{if attribute } A_i \text{ belongs to fragment } F_j \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

4.2 Allocation Scheme

Once fragments are created, the next step is allocating the fragments to the sites. Such allocation is done in a way that minimizes the round-trip response time. In other words, we are looking for an optimal fragment allocation that minimizes the round-trip response time.
As described previously, round-trip response time is defined as the time elapsed between the arrival of a query at a site and the time the query response is received at the query source. In other words, the average round-trip response time is described by three terms: the average transmission delay of queries incurred by their transmission from query sources to the servers, the average processing delay of queries at the servers, and the average transmission delay of query responses back to their sources. Specifically, the objective function, as described in [14], is minimizing the round-trip response time (RRT) described by Equation 8.

$$RRT = \frac{1}{\sum_{j}\sum_{q}\sum_{i} f_{qi}\, y_{jq}} \left[ \sum_{ij} \frac{1}{\frac{\frac{1}{M_q} C_{ij}}{\sum_{q} f_{qi}\, y_{jq}} - 1} + \sum_{j} \frac{1}{\frac{C_j}{\sum_{q}\sum_{i} f_{qi}\, y_{jq}} - 1} + \sum_{ij} \frac{1}{\frac{\frac{1}{M_R} C_{ij}}{\sum_{q} f_{qi}\, y_{jq}} - 1} \right] \tag{8}$$

As mentioned before, the general problem of minimizing round-trip response time is NP-hard. Therefore, the proposed solutions are based on heuristics. In this paper, we have utilized

Let us assume that a_tj denotes whether attribute A_t is allocated on site S_j or not (equal to 1 if yes and zero if not), l_t is the length of attribute A_t in bytes, and CA is the cardinality of the relation R. Then the mean size of all fragments on site S_j (in bytes), denoted as µ_Sj, is calculated by Equation 9.

$$\mu_{S_j} = CA \sum_{t=1}^{L} l_t \cdot a_{tj} \tag{9}$$

The above-mentioned capacity constraint is then defined by Equation 10.

$$\mu_{S_j} \le CS_j, \quad \forall j;\; 1 \le j \le SN \tag{10}$$

This constraint should be considered both in the formation of the initial allocation pattern (as the initial solution) and during the execution of the SA algorithm to find the optimal allocation pattern.

4.2.2 Fragment Prioritization

Consider a situation in which a fragment is widely accessed by a large number of queries. If such a fragment is allocated to a site with low processing capability, it results in an increase in processing delay, which contradicts the allocation objective (i.e., minimizing the round-trip response time). So, there is a need to compute the access ratio of each fragment (by all queries) and give allocation priority to those with higher access ratios. In order to do so, the following steps should be done:

1. At first, we calculate the number of executions of query Q_q, denoted by MK_q, which is the sum of the execution frequency of query Q_q over all sites, as expressed by Equation 11.

$$MK_q = \sum_{j=1}^{SN} f_{qj} \tag{11}$$
5. Based on the access ratios obtained from the previous step, we are now able to create an access ratio vector of size N, denoted by AV, in which each element AV[i] is initialized by the access ratio of fragment F_i, i.e., AV[i] = AR_i. The access ratios can be regarded as the priority of fragments in the allocation process.

6. Finally, in order to create an initial allocation pattern, we first create an ordered list of sites based on their processing speed and then allocate the fragments to the sites based on their priorities: the fragment with the highest priority (i.e., highest access ratio) will be allocated to the site with the highest processing speed. This step is repeated until all fragments are allocated. Note that in order to prevent the bottleneck problem discussed above, the capacity constraints of sites, as described in Section 4.2.1, should be considered.

The result of the above steps is an initial allocation pattern of fragments to the sites, described as FSM (Fragment Site Matrix), as expressed by Equation 15.

$$FSM(F_i, S_j) = \begin{cases} 1, & \text{if fragment } F_i \text{ is allocated to site } S_j \\ 0, & \text{otherwise} \end{cases} \tag{15}$$

This initial allocation is then fed to the SA algorithm to find an optimal fragment allocation.

4.3 The Replication Scheme

As mentioned previously, fragment replication allows the retrieval queries to be processed locally and quickly, which results in the reduction of transmission time and, subsequently, of the round-trip response time of query executions.
In our proposed replication method, we replicate fragments to the sites so as to minimize the total transmission cost of replicas. The total transmission cost (Cost_tr) is expressed by Equation 16.

$$Cost_{tr} = \sum_{j} \sum_{k} \sum_{i} T_{ij} \cdot Size_k \cdot FSM_{ki} \cdot X_{kj}, \quad \forall i \neq j \tag{16}$$

where T_ij is the cost of transmitting a byte from site S_i to site S_j, Size_k is the size of fragment F_k in bytes, FSM is the Fragment Site Matrix, which is the initial allocation pattern of fragments to the sites, and X_kj is a decision variable, equal to 1 or 0, indicating whether fragment F_k should be replicated to site S_j or not.
In the replication method mentioned earlier, there is a set of constraints that should be considered. The first constraint, as expressed by Equation 17, denotes that a fragment is replicated on a site only if there is a need for the fragment on that site. In order to determine whether a fragment is needed on a site, we consider two parameters. The first parameter is ∅_qj, which equals 1 if the execution frequency of a query Q_q on a site S_j (i.e., f_qj) is greater than zero, and 0 otherwise. The second parameter is QFM_qk, which, as described by Equation 13, specifies whether a query Q_q needs a fragment F_k or not. The multiplication of these two parameters denotes whether a fragment F_k is needed by query Q_q on site S_j. So, if a fragment is not needed by any of the queries running on a site, X_kj (which indicates whether fragment F_k should be replicated to site S_j) should be zero. The second constraint is similar to the one expressed by Equation 10. Replicas are stored in a site as long as the storage capacity constraint of the site is not violated. In other words, the fragment will be replicated on a site if at least one access to the fragment has been made AND there is enough storage on that site to store the replicated fragment.

$$X_{kj} \le \sum_{q} QFM_{qk} \times \varnothing_{qj}, \quad \forall k, j \tag{17}$$

In order to find an optimal solution for the replication of fragments, the Simulated Annealing (SA) algorithm has been used. The algorithm begins with an initial answer of X_kj for the replication of fragments. At every iteration, the cost of the obtained replication solution is calculated (considering the constraints mentioned above) and compared with the previous one. The process is continued until an optimal solution that minimizes Cost_tr is obtained.

5. NUMERICAL EXPERIMENTS

In this section, we evaluate the performance of our proposed schemes. First, we explain the experimentation setup, the scenarios we consider for performance evaluation, and the datasets we used in experiments in Section 5.1. We then make an initial analysis of our proposed fragmentation scheme in Section 5.2, based on the cost model mentioned in Section 3.3. Then, we compare our proposed schemes with other methods in Section 5.3.

5.1 Experiment Setup

We implemented the proposed schemes in Matlab 8.1 and conducted a series of experiments to evaluate their performance. For these experiments, we randomly created 100 instances and grouped them into five different scenarios S1 to S5 (each having 20 instances) such that the instances in each scenario are similar to each other. We aimed to create scenarios with different loads (i.e., number of queries) and capacities (i.e., number of sites), from the lowest (S1) to the highest (S5). Table 3 shows the data used to generate the instances, which are variable coefficients for the expressions in Section 4. It is worth mentioning that the data values presented in this table are typical values that can be found in real cases.

5.2 Cost Analysis

Figure 2 demonstrates the behavior of the two components of the query processing cost, as mentioned in Equations 4 and 5 (i.e., local irrelevant attribute access cost and remote relevant attribute access cost), as a function of the number of fragments for two scenarios S1 and S2. As demonstrated in Figure 2.a, the increase in the number of fragments results in the reduction of irrelevant local attribute access cost. This is because when the fragments are few, they each contain a higher number of local
attributes and so, the local attribute access cost will be high. On the other hand, when the number of fragments increases, we have fewer attributes in each fragment. So, when a query accesses a fragment, it encounters fewer irrelevant attributes. As the number of fragments increases, the reduction of the number of irrelevant attributes continues until it reaches zero, as shown in Figure 2.a. In contrast, an increase in the number of fragments leads to an increase in the number of relevant remote attributes, thus increasing the relevant remote attribute access cost, as illustrated in Figure 2.b.

5.3 Evaluation Result

In the following, we compare the performance of our proposed model with other related models for the scenarios S1 to S5.

5.3.1 S1 Experiment Results

To evaluate the proposed fragmentation and allocation schemes, we consider the cost model described in Section 3.3. First, in order to obtain the optimal number of fragments, we evaluate the irrelevant local attribute access cost and relevant remote attribute access cost for different numbers of fragments, as illustrated in Figure 3.a. The optimal number of fragments is acquired where the remote and local attribute access cost curves meet, which is 2 for this scenario. Figure 3.b confirms that the lowest query processing cost is acquired when the number of fragments is 2.
Next, we compare the performance of the proposed model with the VFA-RT model [14–16] based on the round-trip response time, expressed in Equation 8. Remember from Section 4.2 that the aim of the allocation process is allocating fragments to sites in a way that minimizes the round-trip response time of queries. Figure 3.c illustrates the round-trip response time obtained from running the experiment on 20 instances of the S1 scenario. As this figure shows, our proposed model achieves a lower round-trip response time. On average, the proposed approach has resulted in a 26% reduction of round-trip response time, as compared to the VFA-RT method, for the S1 scenario.

5.3.2 S2 Experiment Results

Figure 4 illustrates the query processing cost and the round-trip response time for the second scenario S2, respectively. As Figure 4.a shows, the minimum amount of query processing
[Figure 3 S1 Experimental Results. Panels a and b: x-axis Number of Fragments (panel a y-axis: Cost); panel c: x-axis Instance Number.]
cost is obtained when the fragment number equals 2. Figure 4.b compares the performance of the proposed model and the VFA-RT model based on the round-trip response time for different instance numbers in S2. Again, the proposed model has achieved better outcomes, resulting in a 30% reduction of the average round-trip response time.

based on the round-trip response time for different instance numbers. As shown in this figure, the proposed allocation scheme outperforms the VFA-RT model, achieving a lower round-trip response time. On average, our proposed model has acquired a 27% reduction in round-trip response time, compared to the VFA-RT model.
[Figure 5 S3 Experimental Results. Panel a: Total Query Processing Cost (× 10^4) of the proposed scheme versus Number of Fragments; panel b: Round-trip Response Time (s) versus Instance Number, Proposed Scheme and VFA-RT.]

[Figure 6 S4 Experimental Results. Panel a: Total Cost of the proposed scheme versus Number of Fragments; panel b: Round-trip Response Time (s) (× 10^-3) versus Instance Number.]

[Figure 7 S5 Experimental Results. Panel a: x-axis Number of Fragments; panel b: x-axis Instance Number.]
Table 4 Query Processing Cost (× 10^4) for the Proposed Fragmentation Scheme and VFA-RT.

Scenario    VFA-RT    Proposed Fragmentation Scheme
S1          1.2144    0.9564
S2          0.3145    0.2993
S3          1.3297    0.8348
S4          0.9564    0.6846
S5          1.9821    1.4901
and the VFA-RT model for all S1 to S5 scenarios. As shown in this table, the proposed fragmentation scheme performs better than the VFA-RT model and achieves a lower query processing cost.

5.4 Impact of the Proposed Replication Scheme

As mentioned previously in Section 4.3, once fragments are allocated to the sites, the replication scheme replicates some fragments, based on some criteria, with the aim of local execution of queries and reducing both the communication cost between sites and the queries’ execution time. In order to observe the impact of such replication, we evaluate the round-trip response time for the scenarios S1 to S5 in two different situations: the situation where the replication scheme is applied and the one without replication. The results shown in Figure 8 demonstrate that replicating the fragments causes a substantial reduction in round-trip response time for all scenarios.
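The replication criteria referred to above (the need and capacity constraints of Section 4.3) and the transmission cost of Equation 16 can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function names, the nested-list layout of the matrices, the `used` and `capacity` vectors, and all numeric values are hypothetical, and the simulated annealing search itself is not shown.

```python
def replication_allowed(k, j, qfm, freq, size, used, capacity):
    """Check both replication constraints for fragment k on site j.

    qfm[q][k]  : 1 if query q needs fragment k (QFM in the text)
    freq[q][j] : execution frequency of query q on site j (f_qj)
    used[j]    : bytes already stored on site j (assumed bookkeeping)
    """
    # Need constraint (Equation 17): some query that runs on site j needs k.
    needed = any(qfm[q][k] and freq[q][j] > 0 for q in range(len(qfm)))
    # Capacity constraint: the replica must still fit on the site.
    fits = used[j] + size[k] <= capacity[j]
    return needed and fits


def transmission_cost(t, size, fsm, x):
    """Equation 16: sum of T_ij * Size_k * FSM_ki * X_kj over i != j."""
    n_fragments, n_sites = len(fsm), len(fsm[0])
    return sum(
        t[i][j] * size[k] * fsm[k][i] * x[k][j]
        for j in range(n_sites)
        for k in range(n_fragments)
        for i in range(n_sites)
        if i != j
    )


# Hypothetical instance: 2 fragments, 2 sites, 2 queries.
FSM = [[1, 0], [0, 1]]    # F0 stored on S0, F1 stored on S1
SIZE = [100, 200]         # fragment sizes in bytes
T = [[0, 2], [3, 0]]      # per-byte transmission cost from S_i to S_j
QFM = [[1, 0], [0, 1]]    # Q0 needs F0, Q1 needs F1
FREQ = [[0, 5], [4, 0]]   # Q0 runs on S1, Q1 runs on S0
USED = [100, 200]
CAPACITY = [150, 400]

X = [[0, 1], [0, 0]]      # candidate plan: replicate F0 to S1
print(replication_allowed(0, 1, QFM, FREQ, SIZE, USED, CAPACITY))  # True
print(replication_allowed(1, 0, QFM, FREQ, SIZE, USED, CAPACITY))  # False
print(transmission_cost(T, SIZE, FSM, X))                          # 200
```

Here F0 may be copied to S1 because Q0 runs there and the site has room, whereas F1 is needed on S0 but does not fit within its capacity.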
28. Goli, M. and S.M.T.R. Rankoohi, A new vertical fragmentation algorithm based on ant collective behavior in distributed database systems. Knowledge and Information Systems, 2012. 30(2): p. 435–455.
29. Navathe, S.B. and M. Ra, Vertical partitioning for database design: A graphical algorithm. Proceedings of the ACM SIGMOD International Conference on Management of Data, 1989: p. 440–450.
30. Khan, S.U. and I. Ahmad, Replicating data objects in large distributed database systems: an axiomatic game theoretic mechanism design approach. Distributed and Parallel Databases, 2010. 28(2–3): p. 187–218.