Received February 16, 2021, accepted March 1, 2021, date of publication March 17, 2021, date of current version April 19, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3065944
BVAGQ-AR for Fragmented Database Replication
Management
A. NORAZIAH 1,2 , AINUL AZILA CHE FAUZI 3 ,
SHARIFAH HAFIZAH SY AHMAD UBAIDILLAH 1 , BASEM ALKAZEMI
AND JULIUS BENEOLUCHI ODILI 5
4,
1 Faculty
of Computing, Universiti Malaysia Pahang, Pekan 26600, Malaysia
for Software Development and Integrated Computing, Universiti Malaysia Pahang, Pekan 26600, Malaysia
3 Faculty of Computer and Mathematical Sciences, Universiti Teknologi Mara Kelantan, Machang 18500, Malaysia
4 Department of Computer Science, College of Computer and Information Systems, Umm Al-Qura University, Makkah 21955, Saudi Arabia
5 Department of Mathematical Sciences, Anchor University Lagos, Lagos 234101, Nigeria
2 Centre
Corresponding author: A. Noraziah (
[email protected])
This work was supported in part by Ministry of Higher Education Malaysia through Fundamental Research Grant Scheme under Grant
RDU190185 represents FRGS grants number (FRGS/1/2018/ICT03/UMP/02/3); and in part by University Malaysia Pahang (UMP) Short
Term Grant RDU1903122 and UMP PGRS Grant RDU170329.
ABSTRACT Large amounts of data have been produced at a rapid rate since the invention of computers.
This condition is the key motivation for up-to-date and forthcoming research frontiers. Replication is one
of the mechanisms for managing data, since it improves data accessibility and reliability in the distributed
database environment. In recent years, the amount of various data grows rapidly with widely available lowcost technology. Although we have been packed with data, we still have lacked of knowledge. Nevertheless,
if the impractical data is used in database replication, this will cause waste of data storage and the time taken
for a replication process will be delayed. This paper proposes Binary Vote Assignment on Grid Quorum with
Association Rule (BVAGQ-AR) algorithm in order to handle fragmented database synchronous replication.
BVAGQ-AR algorithm is capable for partitioning the database into disjoint fragments. Fragmentation in
distributed database is very useful in terms of usage, reliability and efficiency. Managing fragmented
database replication becomes a concern for the administrator because the distributed database is disseminated
into split replica partitions. The result from the experiment shows that handling fragmented database
synchronous replication through proposed BVAGQ-AR algorithm able to preserve data consistency in
distributed environment.
INDEX TERMS Replication, algorithm, fragmentation, data mining, computational intelligence, distributed
databases, data grid.
I. INTRODUCTION
Large amounts of data have been produced at a rapid rate
since the invention of computers. This condition is the key
motivation for up-to-date and forthcoming research frontiers.
Nowadays, huge numbers of data are generated around the
world distributed across data grid. One of the biggest problems that data grids users have to overcome today is to
improve the management of data. Providing reliable services
along with high data availability and the performance are the
important requirements that need to be essentially met. The
concept of replication is used to ensure these requirements.
The main idea of replication is to manage large volumes of
data in a distributed manner, speeds up data access, reduces
The associate editor coordinating the review of this manuscript and
approving it for publication was Senthil Kumar
56168
.
access latency and increases data availability [1], [2]. In addition, fragmentation replication is designed to enhance the data
availability and the system performance of the distributed
database for data management [3].
Distributed database replication is a very challenging platform especially when dealing with a huge data. However,
in recent years, with widely available, low-cost technology, the amount of various data grows rapidly. The problem is although we are packed with data, but we still
lacked of knowledge. Nevertheless, if the impractical data
is used in database replication, this will cause waste of data
storage and the time taken for a replication process will
be delayed. In Distributed Indexing Dispatched Alignment
(DIDA), when there are too many requests and/or huge
targets, the arrangement process becomes computationally
challenging [4]. However, this research not focusing on query
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, 2021
A. Noraziah et al.: BVAGQ-AR for Fragmented Database Replication Management
updates processing. The BSCA strategies [5] applied association rules in its replication strategies. Association Rules is
used to find the correlations between the data. This method
will improve the average response time for the transactions.
However, data replication will only be done during the collecting components process. Hence, this method does not
apply synchronous replication method. In Prefetching-Based
Replication Algorithm (PRA), when a local site obtains a file
request but the file is not stored locally, it will search other
site to transfer the required file replica through the Replica
Directory Server [6]. The local site will select some adjacent
files to start the replication process. However, the sequence
databases need some storage space. This is because as the
time goes on, the size of the databases will become larger.
Hierarchical Replication Scheme (HRS) consists of a root
database server and one or more database servers organized
into a hierarchy topology [7]. Once the changes have been
made, all the data will be replicated into the entire replicas. In
order to maintain consistency among the updates by clients,
all blocks are propagated and locked during the transaction
process. This means only one client can modify the data at
a time. Branch Replication Scheme (BRS) is composed of
a different set of sub-replicas organized using a hierarchical
topology [7]. In order to maintain consistency among the
updates by clients, a mechanism is proposed. Clients only can
modify the data located in the terminal replica, or referred as
the leaf nodes of the replication tree. A problem may occur
in BRS when a client tries to write in a sub-replica which
is not terminal, because that sub-replica has been split into
other sub replica. For replication techniques namely ReadOne-Write-All (ROWA), they copy all data to all sites which
means all servers will have the same data [8], [9]. Data
reliability and availability is confirmed but the issues are the
data redundancy will be high, it will waste the storage space
and the processing time for a transaction also will be high
because it has to commit the transaction at all servers.
Although data availability is better because data are stored
at more than one site, most of existing replication strategies
neglects the correlation between the data files in a Distributed
Database Systems (DDS). The information about the data
correlation can be dig out from past data using techniques
from data mining field. Data mining technique is a part of data
clustering method [10]. It is a powerful tool for assisting the
extraction of meaningful data from large data sets [11], [12],
[14], [15]. The objective for mining grid data is analyzing
grid systems with data mining techniques in order to find new
meaningful knowledge. The information later can be used to
improve grid systems in numerous fields. However, only a
small number of works have applied data mining techniques
to discover file correlations in data grids [13]. Therefore,
the study on this basis is initiated.
In our previous work, the Binary Vote Assignment on Grid
(BVAG) has been proposed in order to increase write query
availability with low communication cost through the small
replication quorum [21]. However, the paper not considering
the data fragmentation design, which is more suitable for
VOLUME 9, 2021
distributed database environment. Thus, this paper proposes
Binary Vote Assignment on Grid Quorum with Association
Rule (BVAGQ-AR) algorithm in order to handle fragmented
database synchronous replication. BVAGQ-AR algorithm is
capable for partitioning the database into disjoint fragments.
This paper is organized as the following. The nature of data
mining in grid is explained in Section 2. Section 3 presents the
BVAGQ-AR technique for data management. Section 4 elaborate experimental results in distributed environment. Finally,
Section 5 and 6 discuss and conclude our research finding
from this article.
II. DATA MINING IN GRID
One of the data mining techniques is called Association
rules. The rules are created by analyzing data for frequent
if/then patterns and using the criteria support and confidence
to identify the most important relationships. Support is an
indication of how frequently the items appear in the database.
In addition, Association rules are also able to discover a set of
items that appear frequently together in a transaction by using
Apriori algorithm. This data set is called a frequent item set.
The basic concepts of data mining association rules are
called support and confidence. These concepts showed the
practicality and certainty in data discovery rules.
Rule 1: A ⇒ B set up in transaction D, it has support s,
where P is percent of A ∪ B in transaction D, it is the P(A ∪ B)
where A and B are item sets which A 6= B. So support is
defined as:
support (A ⇒ B) = P (A ∪ B)
(1)
Each discovery mode should be denoted by a certainty
measure of its efficiency or reliability, so rule 2 is:
Rule 2: A ⇒ B has confidence c, it is percent both A and
B in transaction D. It is conditional probability P(A|B), so the
certainty measure confidence is defined as:
confidence (A ⇒ B) = P (A|B)
(2)
If rule 1 and rule 2 meet the specified minimum support
and confidence, that the rules for strong association rules.
Rule 3: it is strong association rule, if support ≥
min support and confidence ≥ min confidence. The min
support is minimum support, and min confidence is minimum
confidence.
An algorithm namely Apriori is proposed for mining frequent item sets for Boolean association rules [16]. The name
of the algorithm is established on the fact that the algorithm
uses prior knowledge of frequent item set properties, which
will be explained later. Apriori is an iterative method known
as a level-wise search, where k − item sets are used to explore
(k + 1) − item sets.
First, the set of frequent 1-itemsets is discovered by scanning the database to determine the count for each item,
and assembling those items that satisfy the minimum support. The resulting set is represented as L1 . After that,
L1 is used to identify the set of frequent 2-itemsets, L2 ,
which later is used to identify L3 , and so on, until no
56169
A. Noraziah et al.: BVAGQ-AR for Fragmented Database Replication Management
FIGURE 1. BVAGQ-AR framework.
more frequent k-item sets can be discovered. The process
of discovering each of the Lk involves one full scan of the
database.
An important property called the Apriori property is used
to reduce the search space in order to improve the efficiency
of the level-wise generation of frequent item sets,
Apriori property: All nonempty subsets of a frequent item
set must also be frequent.
The Apriori property is based on the following observation.
By definition, if an item set, I does not satisfy the minimum
support threshold, min sup, then I is not frequent, that is,
P (1) < min sup. If an item A is added to the item set I ,
then the resulting item set (i.e., I ∪ A) cannot occur more
frequently than I . Therefore, I ∪ A is not frequent either, that
is, P(I ∪ A) < min sup.
III. BVAGQ-AR TECHNIQUE
The main idea of replication is to create multiple copies
of the same data or replicas in several storage resources.
However, while focusing in replication, there are some methods that neglect the correlation among different data files.
Actually, in many applications, data files may be correlated
in terms of accesses and have to be considered together in
order to reduce the access cost [17]. Indeed, the analysis
of data usage in several real data grids such as Dzero [18]
and Coadd [19] revealed the existence of strong correlations
between files, i.e., jobs tend to request a set of correlated
files. This paper proposes Binary Vote Assignment on Grid
Quorum with Association Rule (BVAGQ-AR) technique.
In BVAGQ-AR, all sites are logically organized in the form
of a two-dimensional grid structure. For example, if BVAGQAR consists of twenty-five sites, it will be logically organized
in the form of 5 x 5 grid. There are four phases involves in
BAVGQ-AR framework, which are:
1.
2.
3.
4.
Data mining – Apriori algorithm from Association Rules
Database fragmentation
Database allocation
Database replication
Figure 1 shows the BVAGQ-AR framework.
56170
1) Data mining – Apriori algorithm from Association
Rules: Data mining technique that has been deployed in this
experiment called association rules. It is used to discover the
correlation between data. Apriori algorithm is an algorithm
for frequent item set mining and association rules learning
over transactional databases. Learning association rules basically means finding the items that are appeared together more
frequently than the others.
2) Database Fragmentation: This method also has been
proposed to make sure data replication can be effectively done
while minimize storage. In general, applications work with
some relations rather than entire relations. Therefore, for data
distribution, it is better to work with subsets of relation as the
unit of distribution. Thus, not all data will be replicated to all
sites. The data is fragmented based on data mining analysis
results.
3) Database Allocation: All sites are logically organized
in the form of two-dimensional grid structure. For example,
if BVAGQ-AR consists of twenty-five sites, it will logically
organize in the form of 3 × 3 grids. Each site has database
relation files. The databases that are produced after database
fragmentation process are allocated at their assigned sites.
4) Database Replication: After database allocation process, each site has a database relation file. A site is either
operational or failed and the state (operational or failed) of
each site is statistically independent to the others. A copy at
a site is available when the site is operational; otherwise it is
unavailable [20], [21].
A. BVAGQ-AR ALGORITHM DEFINITION
In this section, BVAGQ-AR is proposed by considering the
distributed database fragmentation. The following notations
are defined:
i.
ii.
iii.
iv.
v.
vi.
vii.
viii.
ix.
x.
xi.
xii.
xiii.
xiv.
xv.
xvi.
xvii.
S is a relation in database.
S ′ is relation after mining
s is the instance in S or S ′
J1 is the frequent item sets
J2 is not the frequent item sets
S (B)1 is the four sites in the corners
S (B)2 is the other sites on the boundaries
S (B)3 is the middle sites
V is a transaction.
T is a tuple in J1 .
x is an instant in T which will be modified by
element of V .
y is an instant in T which will not be modified by
element of V .
S1 is a vertical fragmented relation with instant x
derived from J1 .
S2 is a vertical fragmented relation without instant x
derived from J1 .
Pk is a primary key.
Pk, x is a primary key with data x.
Pk, y is a primary with data y, where y 6= x
VOLUME 9, 2021
A. Noraziah et al.: BVAGQ-AR for Fragmented Database Replication Management
xviii. S1(Pk,x) and S1(Pk,y) are a horizontal fragmentation
relation derived from J1 .
xix. η and ψ are groups for the transaction V .
xx. λ = η or ψ where it represents different transaction
V (before and until get quorum).
xxi. Vη is a set of transactions that comes before Vψ ,
while Vψ is a set of transactions that comes after Vη .
xxii. D is a union of all data objects managed by all
transactions V of BVAGQ-AR.
xxiii. Target set = {1, 0} is a result of transaction V .
xxiv. BVAGQ-AR transaction element Vλ is an element
either in different set of transactions Vη or Vψ .
xxv. wV λ is write counter for the transaction.
xxvi. V̂λx is a transaction that is transformed from Vλx
xxvii. Vµx represents the transaction feedback from a
neighbour site. Vµx exists if either Vλx or V̂λx exists.
xxviii. Successful transaction at primary site Vλx = 0
where Vλx ǫD (i.e., the transaction locked an instant
x at primary). Meanwhile, successful transaction
at neighbour site V (µx ) = 0, where µx ǫD (i.e.,
the transaction locked a data x at neighbour).
xxix. ⌈ n2 ⌉ is the greatest integer function (i.e., n =
9, ⌈ 29 ⌉ = 5.
This model starts with inserting database S. Then, S is
mined into S ′ . From S ′ , the data is fragmented into J1 and
J2 . If J1 is less than or equivalent to three, then the data will
be allocated at S (B)1 because it has three replication servers.
If the J1 is equivalent to four, the data will be allocated at
S (B)2 because it has four replication servers. If J1 is more
than or equivalent to five, then the data will be allocated at
S (B)3 because it has five replication servers. After all data
are replicated to their specific servers, the replication process
can be executed.
The primary replica for a particular instant x is a replica
that accepts the client’s request. In BVAGQ-AR model, each
replica of S (B) can be a primary or a neighbour replica at
the same time. Any replica iǫS (B) can be chosen as the
primary replica, while other replicas jǫS (B) where i 6= j are
neighbours. When a transaction Vη request an instant x from
any replica of S (B) , that replica will be the primary, while
others will be the neighbour replica for processing Vη . At the
same time, if other sets of transactions invoke to update x
after Vη , these set of transactions are called Vψ . When Vψ
obtain lock from instant x from any site of S (B), which is a
different site of the primary replica for processing Vη , that
site becomes the primary processing for Vψ . Simultaneously,
the primary processing for Vψ also functions as neighbour
replica for processing Vη and vice versa. Other sites of S (B)
that is neither primary replica for processing Vη nor primary
replicas for processing Vψ will function as neighbour replicas
for processing Vλx , where λ = η, ψ.
S (B) is the set of replicas with replicated copies are stored
corresponding to the assignment B for particular instant x,
S (Bx ) =
VOLUME 9, 2021
m (i, j) , m (i − 1, j) , m (i, j − 1) ,
.
m (i, j + 1) , m (i + 1, j)
Two sets of transactions, Vη request instant x from
m (i, j) replica, while Vψ request instant x from m (i − 1, j)
respectively. The m (i, j) replica functions as the primary
replica for processing Vη , where m (i − 1, j) , m (i, j − 1) ,
m (i, j + 1) , m (i + 1, j) are neighbour replicas for processing Vγx ǫVη . Simultaneously, m (i − 1, j) replica functions as the primary replica for processing Vψ , while
m (i, j − 1) , m (i, j + 1) , m (i + 1, j) and m (i, j) are neighbour replicas for processing Vγx ǫVψ . Both m (i, j) and
m (i − 1, j) replicas execute two different processing task
concurrently. The m (i, j) replica is the primary replica processing Vη and neighbour replica processing for Vψ , whereas
the m (i − 1, j) replica is the primary replica for processing
Vψ and neighbour replica for processing Vψ . BVAGQ-AR
model considers different sets of transactions Vη and Vψ . Vη
is a set of transactions that comes before Vψ , while Vψ is a set
of transactions that comes after Vη . The effect of BVAGQ-AR
transaction is defined as the processing of one instance of the
transaction.
One site has a preliminary database, S, which will be
converted into binary format. Each row corresponds to a
transaction and each column corresponds to an item. An item
can be treated as a binary variable whose value is one if the
item is present in a transaction and zero otherwise.
For example, a database with binary variable is shown
in Table 1. W and Z represent the items in the database and n
is the total number of transactions.
Support, s, is the fraction of transactions that contain both
W and Z where
s = σ (a, b, c, d)/n = 7/20 = 0.35@35%
(3)
Confidence, c, measures how often items in Z appear in
transactions that contain W .
c = σ (a, b, c, d)/σ (a, b) = 7/10 = 0.7@70%
(4)
For simplicity, data from row 1 to 5 and column 1 to 6 in
Table 2 is used for this example case. Figure 2 shows an
illustration of the frequent item set generation in the Apriori
algorithm for the transactions. It is assumed that the support
threshold is 60%, which is equivalent to a minimum support
count equal to three because in this example, the items have
to appear more than half of the transactions to be taken as a
frequent item sets. In large databases, if the threshold is 40%
or below, all the data most likely will appear together.
Initially, every item is considered as a candidate 1-itemset.
After counting their supports, the candidate item sets {c} and
{f } are discarded because they appear in fewer than three
transactions. In the next iteration, candidate 2-itemsets are
generated using only the frequent 1-itemsets because the
Apriori algorithm ensures that all supersets of the infrequent
1-itemsets must be infrequent. Because there are only four
frequent 1-itemsets, the number of candidate 2-itemsets generated by the algorithm is (24) = 6.
Two of these six candidates, {b, e} and {d, e}, are subsequently found to be infrequent after computing their support values. The remaining four candidates are frequent, and
56171
A. Noraziah et al.: BVAGQ-AR for Fragmented Database Replication Management
TABLE 1. Database with binary variable.
thus will be used to generate candidate 3-itemsets. Without support-based pruning, there are (36) = 20 candidate
3-itemsets that can be formed using the six items given in
this example. With the Apriori algorithm, only candidate
3-itemsets whose subsets are frequent will be kept. The only
candidate that has this property is {a, b, d}.
The relation that is resulted from identifying the frequent
item sets, S ′ will be fragmented into relation with frequent
item sets, J1 and relation without frequent item sets, J2 using
vertical fragmentation. When S ′ is fragmented, it is divided
into a number of fragments S ′ 1, S ′ 2 , . . . .S ′ n .
S ′ = S ′ 1 ∪ S2′ ∪ . . . .∪S ′ n
(5)
The fragmentation should be done in such a way that
relation S can be reconstructed from the fragments:
S′ = S′ 1 ⊲⊳ S′2 ⊲⊳ . . . .⊲⊳ S′ n
56172
(6)
It is necessary to include the primary key or some
candidate key attribute in every vertical fragment so that
the full relation can be reconstructed from the fragments.
After fragmentation, J1 is allocated at its replica sites,
S (B)1 , S (B)2 or S (B)3 .
Each site now has a primary data file which is either
operational or failed, and the state (operational or failed) of
each site is statistically independent to the others. When a site
is operational, the copy at the site is available; otherwise it is
unavailable.
Recall the Binary Vote Assignment on Grid (BVAG) technique [13]. However, BVAG only covers the voting and a part
of the replication process.
Definition 1: A site X is a neighbour to site Y , if X is
logically located adjacent to Y .
A data will replicate to the neighboring sites from its
primary site. The number of data replication, d, can be calculated using Property 1, as described below.
VOLUME 9, 2021
A. Noraziah et al.: BVAGQ-AR for Fragmented Database Replication Management
FIGURE 2. Generating frequent item sets using the Apriori algorithm.
Property 1: The number of data replication from each site,
d ≤ 5.
Proof: Let n be a set of all sites that are logically organized in a two-dimensional grid√structure form.
√Then n sites
are labelled m (i, j) , 1 ≤ i ≤ n, 1 ≤ j ≤ n. Two way
links will connect sites m (i, j) with its four neighbours, sites
m (i ± 1, j) and m (i, j ± 1), as long as there are sites in the
grid. Note that, four sites on the corners of the grid have only
two adjacent sites, and other sites on the boundaries have only
three neighbours. Thus, the number of neighbours of each site
is less than or equal to 4. Since the data will be replicated to
neighbours, then the possible number of data replication from
each site, d, is:
d ≤ the number of neighbours + a data from it self
≤4+1=5
FIGURE 3. Five replication servers connected to each other.
IV. EXPERIMENTAL RESULTS
In this section, the experiments for managing transaction
and replication are described. To demonstrate BVAGQ-AR
transaction, 9 servers that logically organized in 3 × 3 are
considered based on BVAGQ-AR two-dimensional logical
design. 9 servers have been used because the number of
replicated data, d, can be 3, 4 or 5. Hence, 9 servers are
chosen in order to get maximum replicated data, d = 5 in
the experiment. The 5 replication servers have been deployed
as in Figure 3. Each server or node is connected to one
another through a fast Ethernet switch hub. Theoretically,
each of the neighbour replication servers and the primary
replication server should be connected each other logically as
shown in Figure 2. Each server has been assigned with vote
VOLUME 9, 2021
0 or 1. Vote 0 means the server is free locked and able to
proceed with a new transaction. In contrast, vote 1 means the
server is busy which means it is already locked. Hence, new
transaction cannot be initiated on that server.
The Binary Vote Grid Coordination depicted in Table 2.
Replica B with IP 172.21.202.163, replica D with
IP 172.21.202.162, replica E with IP 172.21.202.169,
replica F with IP 172.21.202.168 and replica H with IP
172.21.202.2167 locate instant e.
In this experiment, a transaction, Vη requests to update
instant e at site E. The aim of this experiment is to record
the job execution time for the replication process. The result
for this experiment is presented in Table 4.
56173
A. Noraziah et al.: BVAGQ-AR for Fragmented Database Replication Management
TABLE 2. BVAGQ-AR Grid coordination.
TABLE 3. Experimental result for one transaction at one site.
From the result from Table 4, at time equivalent to 1 (t1),
instant e at all servers are unlocked. At (t2), the transaction
begins. At (t3), there is a transaction, Vηe request to update
instant e at server E. The transaction initiates lock. Hence,
56174
write counter for server E now is equal to 1. At (t4), Vηe
propagate lock at its neighbour replica B at server B, Vηe lock
(e) from E. Thus at (t6), the transaction achieved in getting
locked from the B then write quorum is equal to 2. Next,
VOLUME 9, 2021
A. Noraziah et al.: BVAGQ-AR for Fragmented Database Replication Management
TABLE 4. Comparison of job execution time for the minimum number of replication servers.
Vηe propagates lock at server D at (t7) and at (t8), Vηe lock
(e) from E. Thus at (t9), the transaction achieved in getting
locked from the D then write quorum is equivalent to 3. After
that, Vηe propagate lock at server F at (t10) and at (t11), Vηe
lock (e) from F. Thus, at (t12), the transaction achieved in
getting locked from the F then write quorum is equivalent to
4. Then, Vηe propagate lock at server H at (t13) and at (t14),
Vηe lock ((e)) from H. Thus at (t15), the transaction achieved
in getting locked from the H then write quorum is equal to 5.
At (t16), Vηe obtain all quorums and then instant e is updated
at (t17.) At (t18), the relation S is fragmented into S1 and
S2 using vertical fragmentation. At (t19), the relation S1 is
fragmented again using horizontal fragmentation into S1(Pk,x)
and S 1(Pk,y) . Finally, at (t20), V̂λe ∈ Vη is commit and at (t21),
instant e at all replica servers will unlock and ready for the
next transaction to take place.
V. DISCUSSION
The proposed BVAGQ-AR has been compared with other
replication techniques in terms of the total job execution
time for a transaction. In this section, the total job execution
time to update data between five existing techniques namely
Dynamic Replication based on the Correlation of the File
Strategy in Multi-Tier Data Grid Algorithm (BSCA) [5], A
Prefetching-Based Replication Algorithm (PRA) [6], Hierarchical Replication Scheme (HRS) [7], Branch Replication
Scheme (BRS) [7] and Read-One-Write-All (ROWA) [8], [9]
have been compared with the proposed technique.
A. VALIDITY THREATS
Several validity threats can be associated with these experimental studies. Few threats have been identifies and their
effects on the results are elaborated.
First, the benchmark choice represents an essential threat.
The experimental benchmarks from other studies in literature
have been adopted. However, we cannot guarantee these
benchmarks represent the actual software and hardware configurations in real world. Nevertheless, the benchmarks are
derived from configurations of different software programs.
Second, a comparison with other techniques is another
threat. Other replication techniques with data mining such
as BSCA and PRA are tested using simulation tools. This
research focus on testing the replication technique in real time
DDS because simulation cannot capture the problems that
VOLUME 9, 2021
arise in real time environment. Nevertheless, the comparison
is valid because all the techniques that we compared we have
tested them using the same software and hardware in real time
environment.
B. REPLICATION JOB EXECUTION TIME COMPARISON
Two series of experiments has been done in order to compare
the job execution time for each technique. The first experiment is executed using the minimum number of replication
servers of each replication technique. Table 4 shows the time
comparison for the first experiment.
Table 4 shows the execution time comparison between
BSCA, PRA, ROWA, HRS, BRS and BVAGQ-AR in their
minimum replication servers. From Table 4, it is proved
that BVAGQ-AR requires the lowest time to complete a
transaction. It took only 66.548 milliseconds to complete a
transaction. The second lowest execution time is BCSA with
88.404 milliseconds followed by PRA with total time taken
is 96.711 milliseconds. PRA takes longer time due to user
prefetching data from other servers. Next is BRS which takes
137.157 milliseconds to complete the replication process.
ROWA and HRS takes the longest execution times which are
more than 250 milliseconds. As it shown in Table 4, there are
big differences of total job execution time between BSCA
and PRA with ROWA, BRS and HRS. This is because the
data in ROWA, BRS and HRS is not mined since the original
techniques do not consider the data correlation.
For the second experiment, it is executed using the
maximum number of replication servers for each method.
Table 5 shows the time comparison for the second experiment.
Table 5 shows the execution time comparison between
BSCA, PRA, ROWA, HRS, BRS and BVAGQ-AR for maximum replication servers. From the Table 6, again, it is proved
that BVAGQ-AR requires the lowest time to complete a transaction as the maximum replication servers in this technique is
only five. It took only 83.868 milliseconds for BVAGQ-AR to
complete a transaction. The second lowest execution time is
PRA with 191.608 milliseconds. This is followed by BSCA
with total time taken is 192.974 milliseconds. Next is BRS
which took 185.172 milliseconds to complete the replication
process. ROWA and HRS took the longest execution times
which are more than 250 milliseconds. Compare to other
56175
A. Noraziah et al.: BVAGQ-AR for Fragmented Database Replication Management
TABLE 5. Comparison of job execution time for the maximum number of replication servers.
TABLE 6. CBVAGQ - AR Improvement in terms of job execution time (%).
methods, BRS need less time to do a transaction because the
data in this technique are fragmented and allocated at several
different sites while other methods replicate all data to all
sites.
From Table 6, it is shown that, BVAGQ-AR has 31.19%
improvement from BCSA when experiment is executed in
minimum number of replication servers and 56.54% improvement in maximum number of replication servers. This is followed by PRA where BVAGQ-AR has 24.72% improvement
from it in minimum number of replication servers and 56.23%
improvement in maximum number of replication servers.
The improvement in BSCA and PRA has a big different
since in BVAGQ-AR, the minimum and maximum number of
replication servers are 3 and 5 while in BSCA and PRA are
3 and 9. BVAGQ-AR had improved 74.62% from ROWA and
74.20% from HRS in minimum number of servers, 68.67%
and 68.58% in maximum number of replication servers.
There are not much different in the results since ROWA and
HRS use 9 replication servers in both experiments. Last but
not least is BRS, where BVAGQ-AR has 51.48% improvement from it in minimum number of replication servers and
51.23% improvement in maximum number of replication
servers. The percentages are much higher in ROWA, HRS
and BRS compare to BSCA, PRA and BVAGQ-AR because
they do not take correlations between data into consideration.
Hence, the processing times for these techniques are longer.
In conclusion, BVAGQ-AR has the lowest job execution time
to complete a transaction compare to BSCA, PRA, ROWA,
HRS and BRS.
VI. CONCLUSION
In order to preserve data consistency and reliability of the
systems, managing transactions is very important. BVAGQAR resolves this by setting the lock with small quorum size
before update and commits transaction synchronously to the
sites that has the same fragmented data. Since this technique
using small size of quorum, less computational time is needed
to send and receive messages from its neighbours’ replicas.
56176
BVAGQ-AR only took only 66.548 milliseconds to complete
a transaction while the second lowest execution time is BCSA
with 88.404 milliseconds followed by PRA with total time
taken is 96.711 milliseconds. PRA takes longer time due to
user prefetching data from other servers. BRS takes 137.157
milliseconds to complete the replication process and ROWA
and HRS takes the longest execution times which are more
than 250 milliseconds. In addition, maintaining data consistency also easier compare to other techniques because it has
low communication cost. This is because less computational
time required for the locking of the small quorum size in
synchronization process. From the experiment result, we can
say that managing replication and transaction through proposed BVAGQ-AR able to preserve data consistency. It also
increases the degrees of parallelism because by using fragmentation, replication and transaction can be divided into several subqueries that operate on the fragments. BVAGQ-AR
can be improved in many different ways. As we know, server
failure can happen anytime. Currently, BVAGQ-AR does not
support handling fragmented database replication transaction
management by considering failure cases. In future, BVAGQAR will take this challenge to handle fragmented database
failure case and fault tolerance such as system crashes, statement failure, application software errors, network failure and
media failure in real time distributed database system in real
time environment.
REFERENCES
[1] B. A. Milani and N. J. Navimipour, ‘‘A comprehensive review of the data
replication techniques in the cloud environments: Major trends and future
directions,’’ J. Netw. Comput. Appl., vol. 64, pp. 229–238, Apr. 2016.
[2] J. Wang, H. Wu, and R. Wang, ‘‘A new reliability model in replicationbased big data storage systems,’’ J. Parallel Distrib. Comput., vol. 108,
pp. 14–27, Oct. 2017.
[3] S. H. S. A. Ubaidillah, N. Ahmad, and J. B. Odili, ‘‘Fragmentation techniques for ideal performance in distributed database—A survey,’’ Int. J.
Softw. Eng. Comput. Syst., vol. 6, no. 1, pp. 18–24, May 2020.
[4] H. Mohamadi, P. V. Benjamin, A. Raymond, S. D. Jackman, J. Chu,
C. P. Breshears, and I. Birol, ‘‘DIDA: Distributed indexing dispatched
alignment,’’ PLOS ONE, vol. 10, no. 4, 2015, Art. no. e0126409, doi:
10.1371/journal.pone.0126409.
VOLUME 9, 2021
A. Noraziah et al.: BVAGQ-AR for Fragmented Database Replication Management
[5] Z. Cui, D. Zuo, and Z. Zhang, ‘‘Based on support and confidence dynamic
replication algorithm in multi-tier data grid,’’ Int. J. Comput. Inf. Syst.,
vol. 9, no. 10, pp. 3909–3918, 2013.
[6] T. Tian, J. Luo, Z. Wu, and A. Song, ‘‘A prefetching-based replication
algorithm in data grid,’’ in Proc. 3rd Int. Conf. Pervas. Comput. Appl.,
vol. 1, 2008, pp. 526–531.
[7] J. M. Pérez, F. G. Carballeira, J. Carretero, A. Calderón, and J. Fernández,
‘‘Branch replication scheme: A new model for data replication in large
scale data grids,’’ Future Gener. Comput. Syst., vol. 26, pp. 12–20,
Jan. 2010.
[8] S. Budiarto and N. M. Tsukamoto, ‘‘Data management issues in mobile
and peer-to-peer environments,’’ Data Knowl. Eng., vol. 41, pp. 183–204,
Jun. 2002.
[9] A. Noraziah, A. N. Abdalla, and M. S. Roslina, ‘‘Data replication using
read-one-write-all monitoring synchronization transaction systems in distributed environment,’’ J. Comput. Sci., vol. 6, no. 10, pp. 1033–1036,
2010.
[10] H. A. Abdulwahab, A. Noraziah, A. A. Alsewari, and S. Q. Salih,
‘‘An enhanced version of black hole algorithm via Levy flight for
optimization and data clustering problems,’’ IEEE Access, vol. 7,
pp. 142085–142096, 2019.
[11] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques.
San Mateo, CA, USA: Morgan Kaufmann, 2010.
[12] M. J. Zaki, W. Meira, Jr., and W. Meira, Data Mining and
Analysis: Fundamental Concepts and Algorithms. Cambridge, U.K.:
Cambridge Univ. Press, 2014.
[13] A. Sánchez, J. Montes, W. Dubitzky, J. J. Valdés, M. S. Pérez, and
P. D. Miguel, ‘‘Data mining meets grid computing: Time to dance?’’ in
Data Mining Techniques in Grid Computing Environments. Hoboken, NJ,
USA: Wiley, 2008, pp. 1–16.
[14] T. Hamrouni, S. Slimani, and F. B. Charrada, ‘‘A critical survey of data grid
replication strategies based on data mining techniques,’’ Procedia Comput.
Sci., vol. 51, pp. 2779–2788, Jan. 2015.
[15] T. Hamrouni, S. Slimani, and F. B. Charrada, ‘‘A survey of dynamic
replication and replica selection strategies based on data mining techniques
in data grids,’’ Eng. Appl. Artif. Intell., vol. 48, pp. 140–158, Feb. 2015.
[16] R. Agrawal and J. C. Shafer, ‘‘Parallel mining of association rules,’’ IEEE
Trans. Knowl. Data Eng., vol. 8, no. 6, pp. 962–969, Dec. 1996.
[17] M. Tu, ‘‘A data management framework for secure and dependable data
grid,’’ Univ. Texas Dallas, Richardson, TX, USA, Tech. Rep., 2006.
[18] S. Doraimani, ‘‘Filecules: A new granularity for resource management in
grids,’’ M.S. thesis, Univ. South Florida, Tampa, FL, USA, 2007.
[19] S. Y. Ko, R. Morales, and I. Gupta, ‘‘New worker-centric scheduling strategies for data-intensive grid applications,’’ in Proc. Int. Conf. Middleware,
2007, pp. 121–142.
[20] N. Ahmad, A. A. C. Fauzi, R. M. Sidek, N. M. Zin, and A. H. Beg, ‘‘Lowest
data replication storage of binary vote assignment data grid,’’ in Proc. Int.
Conf. Netw. Digit. Technol., 2010, pp. 466–473.
[21] M. M. Deris, D. J. Evans, M. Y. Saman, and A. Noraziah, ‘‘Binary vote
assignment on a grid for efficient access of replicated data,’’ Int. J. Comput.
Math., vol. 80, no. 12, pp. 1489–1498, 2003.
A. NORAZIAH received the Ph.D. degree in
database from University Malaysia Terengganu,
Malaysia. She is currently an Associate Professor with the Faculty of Computing, and a
Research Fellow with the Centre for Software
Development and Integrated Computing, University Malaysia Pahang, Malaysia. She had published 280 scientific research articles. She also
supervised more than 20 postgraduate students,
and obtained many grants related to her research
expertise. Her research interests include distributed database, data grid, data
mining, big data, and computational intelligence. She has a professional
membership in IEEE Computer Society, IACSIT, MNCC, and IAENG. She
received several international and national awards. She served as an International Program Committees and a reviewers for many numerous international
journals and conferences. She is the Chief Editor of the International Journal
of Software Engineering and Computer Systems.
VOLUME 9, 2021
AINUL AZILA CHE FAUZI received the bachelor’s, master’s, and Ph.D. degrees in computer science from University Malaysia Pahang,
Malaysia. She has been working as a Lecturer
for a period of one year. She is currently a
Senior Lecturer with the Faculty of Computer and
Mathematical Sciences, University of Technology
Mara (Machang Campus), Kelantan, Malaysia.
Her main research interests include distributed
database, data grid, data mining, distributed systems, and cloud computing.
SHARIFAH HAFIZAH SY AHMAD UBAIDILLAH received the bachelor’s and master’s degrees
in computer science from the University of
Technology Malaysia, Malaysia. She is currently pursuing the Ph.D. degree with the Faculty of Computing, University Malaysia Pahang,
Malaysia. She has been working as a Research
Assistant with University Malaysia Pahang for
a period of six years. She is also working as a
Reviewer and had already reviewed many research
articles mostly in classification, feature selection, and forecasting researches.
Her current research interests include distributed database systems, fault
tolerance systems, artificial intelligence, and machine learning.
BASEM ALKAZEMI is currently a Professor with
the College of Computer and Information System, Umm Al-Qura University, Saudi Arabia. He
is also the Head of the Software Engineering
Research Group, Umm Al-Qura University. He is
also holding the position of vice dean for research
projects and grants in the deanship of scientific
research. His main research interests include software engineering, data mining, and machine learning. He is involved, as a PI, in a number of funded
research projects in the area of WSN, big data, and NLP. He served as a
reviewer in a number of international conferences and journals. He supervised several postgraduate students those conducted their theses in software
continuous delivery (CD), BPM, the IoT, big data for retails, and machine
translation.
JULIUS BENEOLUCHI ODILI received the Ph.D.
degree in computer science from University
Malaysia Pahang, Malaysia. He is currently working as a Senior Lecturer with the Department of
Mathematical Sciences, Anchor University Lagos,
Lagos, Nigeria. He is also the Head of the Mathematical Science Department, Anchor University
Lagos. He had already published 50 scientific
research articles in numerous international journals. His research interests include artificial intelligence, optimization, algorithm development and analysis, and Africa buffalo
optimization.
56177