Lt17 Decomposition
Lecture 17 Decomposition
What is Decomposition?
Decomposition is the process of breaking something down into parts or elements. In a database, decomposition means breaking a table down into multiple tables. From a database perspective, it usually means moving to a higher normal form.
Decomposition
It is important that decompositions are good. Two characteristics of a good decomposition:
1) It is lossless
2) It preserves dependencies
Decomposition also has potential costs. Some queries become more expensive. Worse, given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation (information loss).
What is lossless?
Lossless means without loss: the decomposed tables retain everything that was in the original table. This is an essential property for a database decomposition.
Lossless Decomposition
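A decomposition is lossless if we can recover the original relation:
R(A,B,C)
→ Decompose →
R1(A,B), R2(A,C)
→ Recover →
the relation obtained by joining R1 and R2, which must be the same as R(A,B,C)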
Lossless Decomposition
Sometimes the same set of data is reproduced:

Original relation:
Name    Price  Category
Word    100    WP
Oracle  1000   DB
Access  100    DB

Decomposed into (Name, Price) and (Name, Category):
Name    Price
Word    100
Oracle  1000
Access  100

Name    Category
Word    WP
Oracle  DB
Access  DB

Joining on Name:
(Word, 100) + (Word, WP) = (Word, 100, WP)
(Oracle, 1000) + (Oracle, DB) = (Oracle, 1000, DB)
(Access, 100) + (Access, DB) = (Access, 100, DB)
Lossy Decomposition
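Sometimes it is not:

Original relation:
Name    Price  Category
Word    100    WP
Oracle  1000   DB
Access  100    DB

Decomposed into (Name, Category) and (Price, Category):
Name    Category
Word    WP
Oracle  DB
Access  DB

Price  Category
100    WP
1000   DB
100    DB

What's wrong?
Joining on Category:
(Word, WP) + (100, WP) = (Word, 100, WP)
(Oracle, DB) + (1000, DB) = (Oracle, 1000, DB)
(Oracle, DB) + (100, DB) = (Oracle, 100, DB)
(Access, DB) + (1000, DB) = (Access, 1000, DB)
(Access, DB) + (100, DB) = (Access, 100, DB)
The Category value DB appears in two rows of each table, so the join pairs every DB name with every DB price. The tuples (Oracle, 100, DB) and (Access, 1000, DB) were never in the original relation.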
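The two decompositions above can be replayed in a few lines of Python. This is only a sketch: the helper names `make_relation`, `project`, `natural_join`, and `rejoin` are hypothetical and not part of the lecture, and relations are modelled as sets of rows, each row being a frozenset of (attribute, value) pairs.

```python
def make_relation(attrs, tuples):
    """Build a relation: a set of rows, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(relation, attrs):
    """Keep only the given attributes; duplicate rows collapse because this is a set."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in relation}


def natural_join(left, right):
    """Combine every pair of rows that agree on all attributes they share."""
    joined = set()
    for l_row in left:
        for r_row in right:
            merged = dict(l_row)
            # A pair matches if each shared attribute has the same value.
            if all(merged.get(a, v) == v for a, v in r_row):
                merged.update(r_row)
                joined.add(frozenset(merged.items()))
    return joined


def rejoin(relation, attrs1, attrs2):
    """Decompose onto attrs1 and attrs2, then join the two projections back."""
    return natural_join(project(relation, attrs1), project(relation, attrs2))


products = make_relation(
    ("Name", "Price", "Category"),
    [("Word", 100, "WP"), ("Oracle", 1000, "DB"), ("Access", 100, "DB")],
)

lossless = rejoin(products, {"Name", "Price"}, {"Name", "Category"})
lossy = rejoin(products, {"Name", "Category"}, {"Price", "Category"})
spurious = lossy - products

print(lossless == products)  # True: the original relation is recovered
print(len(lossy))            # 5 tuples instead of 3
print(len(spurious))         # 2 tuples that were never in the original
```

Note that this checks one particular instance only. A decomposition is lossless when the rejoin equals the original for every legal instance, which is what the functional dependency test later in the lecture establishes.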
Lossy Decomposition

T
Employee  Project  Branch
Samu      Mars     Warangal
Soumini   Jupiter  Hyderabad
Soumini   Venus    Hyderabad
Sekhar    Saturn   Hyderabad
Sekhar    Venus    Hyderabad
Lossy Decomposition
Decomposition of the previous relation

T1
Employee  Branch
Samu      Warangal
Soumini   Hyderabad
Sekhar    Hyderabad

T2
Project  Branch
Mars     Warangal
Jupiter  Hyderabad
Saturn   Hyderabad
Venus    Hyderabad
Original Relation

Employee  Project  Branch
Samu      Mars     Warangal
Soumini   Jupiter  Hyderabad
Soumini   Venus    Hyderabad
Sekhar    Saturn   Hyderabad
Sekhar    Venus    Hyderabad
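The result of joining T1 and T2 on Branch is different from the original relation. It has seven tuples instead of five, including spurious ones such as (Soumini, Saturn, Hyderabad) and (Sekhar, Jupiter, Hyderabad), so the original information cannot be reconstructed.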
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) is decomposed into
R1(A1, ..., An, B1, ..., Bm) and R2(A1, ..., An, C1, ..., Cp)

If A1, ..., An → B1, ..., Bm
or A1, ..., An → C1, ..., Cp
then the decomposition is lossless.
Note: we don't need both.
In Simpler Terms
R1 ∩ R2 → R1
R1 ∩ R2 → R2
If R is split into R1 and R2, then for the decomposition to be lossless at least one of the two should hold true. Projecting on R1 and R2 and joining back then results in the relation you started with.
Why lossless?
The condition ensures that the attributes involved in the natural join (R1 ⋈ R2), that is, the attributes in R1 ∩ R2, are a key (at least a superkey) for at least one of the two relations. False tuples can then never be generated, because for any value of the join attributes there is a unique tuple in one of the relations.
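The test "R1 ∩ R2 determines R1 or R2" can be checked mechanically with an attribute closure. The sketch below assumes functional dependencies are given as (left side, right side) pairs of attribute sets. The function names `closure` and `is_lossless_binary` are hypothetical, and the dependency Name → Price, Category used in the example is an assumption about the product table, not something the slides state.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its left side is covered and it adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if the common attributes of r1 and r2 determine all of r1 or all of r2."""
    reachable = closure(set(r1) & set(r2), fds)
    return set(r1) <= reachable or set(r2) <= reachable


# Assumed FD for the product table: Name determines Price and Category.
fds = [({"Name"}, {"Price", "Category"})]

print(is_lossless_binary({"Name", "Price"}, {"Name", "Category"}, fds))      # True
print(is_lossless_binary({"Name", "Category"}, {"Price", "Category"}, fds))  # False
```

In the second call the only common attribute is Category, which determines nothing else under the assumed dependency, so the test fails. This matches the lossy decomposition that joins on Category.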
GIVEN: LENDINGSCHEME = (BRANCHNAME, ASSETS, BRANCHCITY, LOANNUMBER, CUSTOMERNAME, AMOUNT)
FD'S:
BRANCHNAME → ASSETS BRANCHCITY
LOANNUMBER → AMOUNT BRANCHNAME
DECOMPOSE LENDINGSCHEME INTO:
1. BRANCHSCHEME = (BRANCHNAME, ASSETS, BRANCHCITY)
2. BORROWSCHEME = (BRANCHNAME, LOANNUMBER, CUSTOMERNAME, AMOUNT)
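One way to see that this split is lossless is to apply the intersection test by hand. The derivation below is a sketch that reads the first dependency as BRANCHNAME → ASSETS BRANCHCITY and uses the augmentation rule, the same rule the next example applies.

```latex
\begin{align*}
\text{BRANCHSCHEME} \cap \text{BORROWSCHEME} &= \{\text{BRANCHNAME}\} \\
\text{BRANCHNAME} &\to \text{ASSETS}\ \text{BRANCHCITY}
  && \text{(given)} \\
\text{BRANCHNAME} &\to \text{BRANCHNAME}\ \text{ASSETS}\ \text{BRANCHCITY}
  && \text{(augmentation with BRANCHNAME)} \\
\text{BRANCHNAME} &\to \text{BRANCHSCHEME}
\end{align*}
```

The common attribute determines every attribute of BRANCHSCHEME, so joining BRANCHSCHEME and BORROWSCHEME on BRANCHNAME cannot create false tuples.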
Example 2
GIVEN: BORROWSCHEME = (BRANCHNAME, LOANNUMBER, CUSTOMERNAME, AMOUNT)
FD'S:
LOANNUMBER → AMOUNT BRANCHNAME
Example 2 (cont)
SHOW THAT THE DECOMPOSITION IS A LOSSLESS DECOMPOSITION
1. USE AUGMENTATION RULE ON FD TO OBTAIN: LOANNUMBER → LOANNUMBER AMOUNT BRANCHNAME
2. INTERSECTION OF LOAN-INFO-SCHEME AND CUSTOMERLOAN-SCHEME IS LOANNUMBER
3. LOANNUMBER → LOAN-INFO-SCHEME
4. SO, THE INITIAL DECOMPOSITION IS A LOSSLESS DECOMPOSITION
Example
R1 (A1, A2, A3, A5)
R2 (A1, A3, A4)
R3 (A4, A5)
FD1: A1 → A3 A5
FD2: A5 → A1 A4
FD3: A3 A4 → A2
Example (cont)
      A1      A2      A3      A4      A5
R1    a(1)    a(2)    a(3)    b(1,4)  a(5)
R2    a(1)    b(2,2)  a(3)    a(4)    b(2,5)
R3    b(3,1)  b(3,2)  b(3,3)  a(4)    a(5)
Example (cont)
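By FD1: A1 → A3 A5
Rows R1 and R2 agree on A1, so they must also agree on A3 and A5. A3 is already a(3) in both rows, and b(2,5) in row R2 is replaced by a(5).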
Example (cont)
By FD1: A1 → A3 A5 we have a new result table

      A1      A2      A3      A4      A5
R1    a(1)    a(2)    a(3)    b(1,4)  a(5)
R2    a(1)    b(2,2)  a(3)    a(4)    a(5)
R3    b(3,1)  b(3,2)  b(3,3)  a(4)    a(5)
Example (cont)
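By FD2: A5 → A1 A4
All three rows now agree on A5, so they must also agree on A1 and A4. b(3,1) in row R3 is replaced by a(1), and b(1,4) in row R1 is replaced by a(4).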
Example (cont)
By FD2: A5 → A1 A4 we have a new result table

      A1      A2      A3      A4      A5
R1    a(1)    a(2)    a(3)    a(4)    a(5)
R2    a(1)    b(2,2)  a(3)    a(4)    a(5)
R3    a(1)    b(3,2)  b(3,3)  a(4)    a(5)

Row R1 now contains only a symbols, so the decomposition into R1, R2, and R3 is lossless.
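The table method used in this example can be sketched in Python as follows. The function name `chase_is_lossless` and the tuple encoding of the symbols are assumptions made for illustration, and the dependencies are read as FD1: A1 → A3 A5, FD2: A5 → A1 A4, FD3: A3 A4 → A2. An a symbol in column X is encoded as ("a", X), and b(i, X) as ("b", i, X).

```python
def chase_is_lossless(attributes, schemas, fds):
    """Table test for a lossless join of any number of schemas.

    attributes: list of attribute names of the original relation
    schemas:    list of attribute sets, one per decomposed relation
    fds:        list of (lhs, rhs) pairs of attribute sets
    """
    # Row i gets an "a" symbol where schema i has the attribute, else a unique "b".
    tableau = [
        {x: ("a", x) if x in schema else ("b", i, x) for x in attributes}
        for i, schema in enumerate(schemas, start=1)
    ]

    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i, row1 in enumerate(tableau):
                for row2 in tableau[i + 1:]:
                    if not all(row1[x] == row2[x] for x in lhs):
                        continue
                    for x in rhs:
                        if row1[x] != row2[x]:
                            # Equate the two symbols, preferring the "a" symbol
                            # (it sorts before any "b" symbol).
                            keep = min(row1[x], row2[x])
                            drop = max(row1[x], row2[x])
                            for row in tableau:
                                if row[x] == drop:
                                    row[x] = keep
                            changed = True

    # Lossless if some row consists only of "a" symbols.
    return any(all(row[x] == ("a", x) for x in attributes) for row in tableau)


attributes = ["A1", "A2", "A3", "A4", "A5"]
schemas = [{"A1", "A2", "A3", "A5"}, {"A1", "A3", "A4"}, {"A4", "A5"}]
fds = [
    ({"A1"}, {"A3", "A5"}),   # FD1
    ({"A5"}, {"A1", "A4"}),   # FD2
    ({"A3", "A4"}, {"A2"}),   # FD3
]

print(chase_is_lossless(attributes, schemas, fds))  # True
```

Unlike the intersection test, this procedure handles decompositions into more than two relations, which is why the example needs it for R1, R2, and R3.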
Example 1
R(A B C D E)
FD1 = (A → B)
FD2 = (BC → E)
FD3 = (ED → A)
R1 = (AB); R2 = (ACDE)
Example 2
Is this decomposition lossless?
R(A B C D E)
FD1: AB → C
FD2: C → E
FD3: B → D
FD4: E → A
R1 = (BCD); R2 = (ACE)
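A possible worked answer, as a sketch: it reads the dependencies as AB → C, C → E, B → D, and E → A, and computes the closure of the attributes shared by R1 and R2.

```latex
\begin{align*}
R_1 \cap R_2 &= \{B, C, D\} \cap \{A, C, E\} = \{C\} \\
C^{+} &\supseteq \{C, E\}    && \text{(by } C \to E\text{)} \\
C^{+} &\supseteq \{A, C, E\} && \text{(by } E \to A\text{)} \\
C^{+} &= \{A, C, E\} = R_2
\end{align*}
```

Since R1 ∩ R2 → R2 holds, the intersection test is satisfied and the decomposition is lossless under this reading.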
Example 3
R(A B C D E)
FD1: A → BC
FD2: BD → CE
FD3: E → AD
FD4: CE → A
R1(ABC) =
R2(BCDE) =
Conclusion
Decomposing is the act of breaking tables down in order to achieve a higher normal form. Decompositions should always be lossless: this ensures that the information in the original relation can be accurately reconstructed from the decomposed relations. Remember that for a decomposition to be considered good, it must also preserve functional dependencies.