Distributed Data Systems: Sesapzg554
SESAPZG554
BITS Pilani
Pilani|Dubai|Goa|Hyderabad
Parthasarathy
SESAPZG554 – CS#5
Distributed Database Design
Issues & Integration Part 2
Agenda for CS #5
1) Recap of Sessions
2) Vertical Fragmentation
Attribute Usage Matrix (AU Matrix)
Attribute Affinity Matrix (AA Matrix)
Clustered Affinity Matrix (CA Matrix) using the Bond Energy Algorithm
3) Hybrid Fragmentation
4) Allocation
5) Bottom-up Design Methodology
6) Schema Matching, Integration & Mapping
7) Data Cleaning
8) Portions for Mid-Semester Examination (EC2)
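The Bond Energy Algorithm (BEA) reorders the columns (and, correspondingly, the rows) of the AA matrix into the CA matrix so that the global affinity measure

$$AM = \sum_{i=1}^{n} \sum_{j=1}^{n} aff(A_i, A_j)\,[aff(A_i, A_{j-1}) + aff(A_i, A_{j+1})]$$

is maximized. Regrouping the sums gives

$$AM = \sum_{j=1}^{n} [bond(A_j, A_{j-1}) + bond(A_j, A_{j+1})]$$

where

$$bond(A_x, A_y) = \sum_{z=1}^{n} aff(A_z, A_x)\,aff(A_z, A_y)$$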
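The bond and AM formulas can be checked numerically. The Python sketch below assumes a small, illustrative 4-attribute AA matrix (the values are not taken from these notes) and treats a missing neighbour at either end of the ordering as a pseudo column with zero affinity. It shows that the ordering A1 A3 A2 A4, which places strongly related attributes next to each other, scores far higher than the original order.

```python
# Illustrative symmetric AA matrix; rows and columns are A1..A4 (indices 0..3).
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def bond(aa, x, y):
    """bond(A_x, A_y) = sum over z of aff(A_z, A_x) * aff(A_z, A_y).

    None stands for a pseudo column (A0 or A_{n+1}) whose affinities are all 0.
    """
    if x is None or y is None:
        return 0
    return sum(aa[z][x] * aa[z][y] for z in range(len(aa)))


def global_affinity(aa, order):
    """AM = sum over positions j of bond(A_j, A_{j-1}) + bond(A_j, A_{j+1})."""
    am = 0
    for pos, col in enumerate(order):
        left = order[pos - 1] if pos > 0 else None
        right = order[pos + 1] if pos + 1 < len(order) else None
        am += bond(aa, col, left) + bond(aa, col, right)
    return am


print(global_affinity(AA, [0, 1, 2, 3]))  # A1 A2 A3 A4 -> 3766
print(global_affinity(AA, [0, 2, 1, 3]))  # A1 A3 A2 A4 -> 34330
```

Because each adjacent pair is counted once from each side, AM here equals twice the sum of the bonds between neighbouring columns.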
Step 01:
The column values are taken from the AA matrix, and A0 is a pseudo column (all of its affinity values are 0).
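The column-placement step can be sketched in Python. This assumes the usual textbook formulation of BEA, which these notes do not spell out: start with the first two AA columns, then insert each remaining column at the position that maximizes the contribution cont(A_i, A_k, A_j) = 2 bond(A_i, A_k) + 2 bond(A_k, A_j) - 2 bond(A_i, A_j). The pseudo columns at either end (A0 on the left, A_{n+1} on the right) are modelled as None with zero bond. The AA values are the same illustrative ones as before and are assumptions.

```python
# Illustrative symmetric AA matrix; rows and columns are A1..A4 (indices 0..3).
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def bond(aa, x, y):
    """Dot product of AA columns x and y; a pseudo column (None) gives 0."""
    if x is None or y is None:
        return 0
    return sum(aa[z][x] * aa[z][y] for z in range(len(aa)))


def cont(aa, left, new, right):
    """Net change in AM from placing column `new` between `left` and `right`."""
    return 2 * bond(aa, left, new) + 2 * bond(aa, new, right) - 2 * bond(aa, left, right)


def bea_order(aa):
    """Return a column order for the CA matrix, built by greedy insertion."""
    n = len(aa)
    order = list(range(min(2, n)))  # initialization: first two AA columns
    for k in range(2, n):
        best_pos, best_val = 0, None
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None        # None = A0
            right = order[pos] if pos < len(order) else None  # None = A_{n+1}
            val = cont(aa, left, k, right)
            if best_val is None or val > best_val:
                best_pos, best_val = pos, val
        order.insert(best_pos, k)
    return order


def ca_matrix(aa, order):
    """Permute rows the same way as columns to obtain the CA matrix."""
    return [[aa[r][c] for c in order] for r in order]


order = bea_order(AA)
print(order)  # [0, 2, 1, 3], i.e. A1 A3 A2 A4
for row in ca_matrix(AA, order):
    print(row)
```

When A3 is placed, cont(A1, A3, A2) = 10150 beats both end positions, so A3 goes between A1 and A2. A4 then ends up at the right-hand end, next to the pseudo column.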
Minimal cost
The cost function consists of the cost of storing each Fi at a site Sj,
the cost of querying Fi at site Sj, the cost of updating Fi at all sites
where it is stored, and the cost of data communication.
The allocation problem, then, attempts to find an allocation
scheme that minimizes a combined cost function.
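A toy version of this combined cost function can be minimized by exhaustive search. The Python sketch below makes several simplifying assumptions that do not come from these notes. The sites, costs, sizes and access frequencies are all made up. Communication cost is folded into the query and update terms through a site-to-site cost matrix COMM. Each read is served by the cheapest replica, while each update must reach every replica. Fragments are treated as independent, and there are no capacity limits. The search tries every non-empty replica set, so it grows as 2^m per fragment for m sites and only works at toy sizes.

```python
from itertools import combinations

# Hypothetical instance: 3 sites (0, 1, 2) and 2 fragments. All numbers are made up.
COMM = [[0, 4, 6],  # COMM[s][t]: cost for site s to access data stored at site t
        [4, 0, 3],
        [6, 3, 0]]
STORE = [2, 1, 3]   # storage cost per unit of fragment size at each site
FRAGMENTS = {
    # name: (size, reads issued from each site, updates issued from each site)
    "F1": (10, [20, 0, 5], [1, 0, 0]),
    "F2": (4, [0, 10, 10], [0, 2, 2]),
}


def fragment_cost(size, reads, updates, replicas):
    """Storage + query + update cost of storing one fragment at `replicas`."""
    storage = size * sum(STORE[t] for t in replicas)
    # Each read is answered by the cheapest replica for the issuing site.
    query = sum(r * min(COMM[s][t] for t in replicas) for s, r in enumerate(reads))
    # Each update has to be propagated to every replica.
    update = sum(u * sum(COMM[s][t] for t in replicas) for s, u in enumerate(updates))
    return storage + query + update


def best_allocation():
    """Pick the cheapest non-empty replica set for each fragment independently."""
    sites = range(len(STORE))
    candidates = [c for k in range(1, len(STORE) + 1) for c in combinations(sites, k)]
    plan, total = {}, 0
    for name, (size, reads, updates) in FRAGMENTS.items():
        best = min(candidates, key=lambda rep: fragment_cost(size, reads, updates, rep))
        plan[name] = best
        total += fragment_cost(size, reads, updates, best)
    return plan, total


print(best_allocation())  # ({'F1': (0, 1), 'F2': (1, 2)}, 77)
```

Replication pays off here only when the saved remote reads outweigh the extra storage and update-propagation cost. For example, adding site 2 as a third replica of F2 raises its cost from 28 to 56.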
Why are such models not available in practice?
Performance
The allocation strategy is designed to maintain a performance metric.
Two well-known ones are to minimize the response time and to
maximize the system throughput at each site.