Secure Data Objects Replication in Data Grid
ABSTRACT:
Secret sharing and erasure coding-based approaches have been used in distributed storage systems to ensure the confidentiality, integrity, and availability of critical information. To achieve performance goals in data accesses, these data fragmentation approaches can be combined with dynamic replication. In this paper, we consider data partitioning (both secret sharing and erasure coding) and dynamic replication in data grids, in which security and data access performance are critical issues. More specifically, we investigate the problem of optimal allocation of sensitive data objects that are partitioned by using a secret sharing scheme or an erasure coding scheme and/or replicated. The grid topology we consider consists of two layers. In the upper layer, multiple clusters form a network topology that can be represented by a general graph. The topology within each cluster is represented by a tree graph. We decompose the share replica allocation problem into two subproblems: the Optimal Intercluster Resident Set Problem (OIRSP) that determines which clusters need share replicas and the Optimal Intracluster Share Allocation Problem (OISAP) that determines the number of share replicas needed in a cluster and their placements. We develop two heuristic algorithms for the two subproblems. Experimental studies show that the heuristic algorithms achieve good
Index Terms:
Secure data, secret sharing, erasure coding, replication, data grids
OBJECTIVES:
Secret sharing and erasure coding-based approaches have been used in distributed storage systems to ensure the confidentiality, integrity, and availability of critical information. To achieve performance goals in data accesses, these data fragmentation approaches can be combined with dynamic replication.
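As an illustration of the secret sharing idea mentioned above, the following is a minimal sketch of a (k, n) threshold scheme in the style of Shamir, in which any k of the n shares recover the secret. It is not taken from the paper; the function names `split_secret` and `recover_secret` and the choice of prime modulus are assumptions made for this example only.

```python
import secrets

# Assumption for this sketch: a Mersenne prime that is larger than any secret we split.
PRIME = 2**127 - 1


def split_secret(secret, k, n):
    """Split `secret` into n shares so that any k of them reconstruct it."""
    if not (0 <= secret < PRIME) or not (1 <= k <= n):
        raise ValueError("need 0 <= secret < PRIME and 1 <= k <= n")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the integers modulo PRIME."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    shares = split_secret(123456789, k=3, n=5)
    print(recover_secret(shares[:3]))
```

In a replicated setting, each of the n shares could itself be copied to several nodes; the sketch above covers only the splitting and recovery step.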
EXISTING SYSTEM:
Security and data access performance are critical issues in the existing system. The most severe problem in the existing system is the optimal allocation of sensitive data objects. The existing system doesn't achieve data survivability, security, and access performance.
PROPOSED SYSTEM:
In this paper, we consider data partitioning (both secret sharing and erasure coding) and dynamic replication in data grids, in which security and data access performance are critical issues. We investigate the problem of optimal allocation of sensitive data objects that are partitioned by using a secret sharing scheme or an erasure coding scheme and/or replicated. We develop two heuristic algorithms for the two subproblems: the OIRSP determines which clusters need share replicas, and the OISAP determines the number of share replicas needed in a cluster and their placements. Experimental studies show that the heuristic algorithms achieve
SYSTEM SPECIFICATION
HARDWARE CONFIGURATION
Hard disk : 34 GB
RAM : 678 MB
Processor : Pentium IV
Monitor : 7:-- Color Monitor
SOFTWARE CONFIGURATION
Front End : Java
Operating System : Windows XP
Back End : MySQL
MODULES:
A Heuristic Algorithm.
Performance of the OIRSP Heuristic Algorithm.
The Efficiency of the OISAP SDP-Tree Algorithm.
Heuristics are intended to gain computational performance or conceptual simplicity, potentially at the cost of accuracy or precision.
In computer science, a heuristic is a technique designed to solve a problem that ignores whether the solution can be proven to be correct, but which usually produces a good solution or solves a simpler problem that contains or intersects with the solution of the more complex problem. Most real-time, and even some on-demand, anti-virus scanners use heuristic signatures to look for specific attributes and characteristics for detecting viruses and other forms of malware. Heuristic algorithms are often employed because they may be seen to work without having been mathematically proven to meet a given set of requirements.
One common pitfall in implementing a heuristic method to meet a requirement comes when the engineer or designer fails to realize that the current data set does not necessarily represent future system states. While the existing data can be pored over and an algorithm can be devised to successfully handle the current data, it is imperative to ensure that the heuristic method employed is capable of handling future data sets. This means that the engineer or designer must fully understand the rules that generate the data and develop the algorithm to meet those requirements and not just address the current data sets.
If one seeks to use a heuristic as a means of solving a search or knapsack problem, then one must be careful to make sure that the heuristic function which one is choosing to use is an admissible heuristic, that is, one that never overestimates the true optimal cost of reaching the goal. If a heuristic is not admissible, it might never find the goal, by ending up in a dead end of graph G or by skipping back and forth between two nodes vi and vj.
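The admissibility condition discussed above can be checked mechanically on a small graph. The sketch below computes the true optimal cost from every node to the goal with Dijkstra's algorithm and then verifies that the heuristic estimate never exceeds it. The graph, the node names, and the helper names `true_costs_to_goal` and `is_admissible` are hypothetical and serve only as an example.

```python
import heapq


def true_costs_to_goal(graph, goal):
    """Dijkstra on an undirected weighted graph given as {node: {neighbour: cost}}."""
    dist = {goal: 0}
    heap = [(0, goal)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def is_admissible(graph, goal, h):
    """True if h[v] never overestimates the optimal cost from v to the goal."""
    dist = true_costs_to_goal(graph, goal)
    return all(h[v] <= dist[v] for v in dist)


if __name__ == "__main__":
    # Hypothetical three-node graph used only for illustration.
    graph = {
        "A": {"B": 1, "C": 4},
        "B": {"A": 1, "C": 2},
        "C": {"A": 4, "B": 2},
    }
    print(is_admissible(graph, "C", {"A": 3, "B": 1, "C": 0}))
    print(is_admissible(graph, "C", {"A": 4, "B": 1, "C": 0}))
```

Such a brute-force check is only practical for small graphs, since it needs the exact optimal costs that a heuristic is normally meant to avoid computing.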
The OISAP determines the number of share replicas needed in a cluster and their placements. When we consider the allocation problem within a cluster Hx, we can isolate the cluster and consider the problem independently. All read requests from remote clusters can be viewed as read requests from the root node. Also, the updates in the entire system can be considered as updates done at the root node of the cluster. Thus, we can simplify the notation when discussing allocation within the cluster.
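To make the intracluster setting concrete, the following sketch evaluates the cost of one candidate placement in a tree-shaped cluster. The cost model is an assumption for illustration, not the paper's formulation: each read travels up the tree to the nearest node holding a share replica (the root is assumed to always be able to serve a read), and each update issued at the root is pushed along the smallest subtree that connects the root to all replica holders. The function name `cluster_cost` and the sample tree are hypothetical.

```python
def cluster_cost(parent, reads, replicas, updates):
    """Total hop cost of a replica placement in a tree cluster.

    parent   : {node: parent node, or None for the root}
    reads    : {node: number of read requests issued at that node}
    replicas : set of nodes that hold a share replica
    updates  : number of update requests, all treated as issued at the root
    """
    root = next(v for v, p in parent.items() if p is None)
    holders = set(replicas) | {root}  # assumption: the root can always serve reads

    # Read cost: hops from each reader up to its nearest holder on the path to the root.
    read_cost = 0
    for v, count in reads.items():
        hops, u = 0, v
        while u not in holders:
            u = parent[u]
            hops += 1
        read_cost += count * hops

    # Update cost: number of edges in the subtree connecting the root to all replicas.
    # An edge (u, parent[u]) is identified by its child endpoint u.
    edges = set()
    for v in replicas:
        u = v
        while parent[u] is not None and u not in edges:
            edges.add(u)
            u = parent[u]
    return read_cost + updates * len(edges)


if __name__ == "__main__":
    parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a"}
    reads = {"r": 4, "a": 0, "b": 2, "c": 5, "d": 3}
    for placement in (set(), {"a"}, {"c"}):
        print(sorted(placement), cluster_cost(parent, reads, placement, updates=4))
```

Under this assumed model, placing a replica at an internal node that many readers pass through trades a small update cost for a larger saving in read cost, which is the kind of trade-off a placement algorithm has to balance.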
The OIRSP determines which clusters need to maintain share replicas. We define the first problem, OIRSP, as the optimal resident set problem in a general graph (the intercluster-level graph) with a MasterSlaveCluster HMSC. Our goal is to determine the optimal RC that yields minimum access cost at the cluster level.
The goal of OIRSP is to determine the optimal resident set RC (Read cost) in GC. GC is a general graph. Each edge in GC is considered as one hop. The optimal resident set problem in a general graph is an instance of a problem that has been shown to be NP-complete. Thus, we develop a heuristic algorithm to find a near-optimal solution.
Our approach is to first build a minimal spanning tree in GC with RC being the root and then identify the cluster to be added to RC based on the tree structure. The clusters in GC access data hosted in RC along the shortest paths, and these paths and the clusters form a set of shortest path trees. Since all the nodes in RC are connected, we view them as one virtual node S. Then, S, all clusters that are not in RC, and all the shortest access paths form a tree rooted at S in the graph.
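The idea of growing a connected resident set can be sketched with a simple greedy loop. This is not the heuristic developed in the paper; it is an illustrative stand-in under stated assumptions: every edge counts as one hop, reads go to the nearest cluster in the resident set, each update costs one hop per additional resident cluster (the resident set is kept connected), and the graph is connected. The names `hops_to_set`, `access_cost`, `greedy_resident_set`, and the sample graph are hypothetical.

```python
from collections import deque


def hops_to_set(graph, sources):
    """Multi-source BFS: hop distance from each cluster to the nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def access_cost(graph, resident, reads, updates):
    """Assumed cost model: read hops to the resident set plus update propagation."""
    dist = hops_to_set(graph, resident)
    read_cost = sum(reads[c] * dist[c] for c in graph)
    update_cost = updates * (len(resident) - 1)
    return read_cost + update_cost


def greedy_resident_set(graph, start, reads, updates):
    """Grow the resident set from `start`, one neighbouring cluster at a time."""
    resident = {start}
    best = access_cost(graph, resident, reads, updates)
    while True:
        # Only clusters adjacent to the resident set keep it connected.
        frontier = {v for u in resident for v in graph[u]} - resident
        candidates = [
            (access_cost(graph, resident | {c}, reads, updates), c)
            for c in sorted(frontier)
        ]
        if not candidates:
            break
        cost, choice = min(candidates)
        if cost >= best:
            break  # no neighbouring cluster lowers the total cost
        resident.add(choice)
        best = cost
    return resident, best


if __name__ == "__main__":
    graph = {"M": ["A", "D"], "A": ["M", "B"], "B": ["A", "C"], "C": ["B"], "D": ["M"]}
    reads = {"M": 0, "A": 1, "B": 10, "C": 10, "D": 1}
    print(greedy_resident_set(graph, "M", reads, updates=5))
```

A greedy loop of this kind gives no optimality guarantee, which is consistent with the problem being treated heuristically rather than solved exactly.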
algorithm and the randomized M-replication algorithm. In the experiments, the trees are generated randomly by using the topology generator with varying N, D, and read/update ratio, where N is the total number of nodes in the cluster, D is the maximum node degree, and read/update is the ratio of the average number of read requests in the cluster to the total number of update requests in the system.
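The experimental setup above relies on randomly generated trees parameterized by N and D. The topology generator itself is not described here, so the following is only a sketch of one way such trees could be produced: nodes are attached one at a time to a randomly chosen node that has not yet reached the maximum degree. The function name `random_tree` and the `seed` parameter are assumptions for this example.

```python
import random


def random_tree(n, max_degree, seed=None):
    """Return {node: parent} for a random rooted tree with n nodes (node 0 is the root).

    No node ends up with more than max_degree neighbours (parent plus children).
    """
    if n < 1 or max_degree < 1 or (n > 2 and max_degree < 2):
        raise ValueError("no tree with these parameters")
    rng = random.Random(seed)
    parent = {0: None}
    degree = {0: 0}
    open_nodes = [0]  # nodes that can still accept another child
    for v in range(1, n):
        p = rng.choice(open_nodes)
        parent[v] = p
        degree[p] += 1
        degree[v] = 1  # the edge to its parent
        if degree[p] == max_degree:
            open_nodes.remove(p)
        if max_degree > 1:
            open_nodes.append(v)
    return parent


if __name__ == "__main__":
    tree = random_tree(n=10, max_degree=3, seed=1)
    print(tree)
```

Read and update request counts could then be assigned to the nodes of each generated tree so that their ratio matches the read/update value under test.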