DBMS Unit4
1. Select Operation:
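1. Notation: σ p(r)
Where:
σ denotes the selection, r is the relation, and p is the selection predicate (a propositional logic formula).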
Input:
1. σ BRANCH_NAME="perryride" (LOAN)
Output:
2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the
result. The rest of the attributes are eliminated from the table.
o It is denoted by ∏.
Where
Input:
Output:
NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
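The select and project operations can be sketched in Python as follows. This is only an illustration: relations are modelled as lists of dictionaries, the relation name CUSTOMER and the helper names select and project are assumptions, and the STREET values are placeholders invented for the sketch.

```python
# Relations are modelled as lists of dicts (one dict per tuple).
# The STREET values are placeholders invented for this sketch.
CUSTOMER = [
    {"NAME": "Jones", "STREET": "Main", "CITY": "Harrison"},
    {"NAME": "Smith", "STREET": "North", "CITY": "Rye"},
    {"NAME": "Hays", "STREET": "Main", "CITY": "Harrison"},
    {"NAME": "Curry", "STREET": "North", "CITY": "Rye"},
    {"NAME": "Johnson", "STREET": "Alma", "CITY": "Brooklyn"},
    {"NAME": "Brooks", "STREET": "Senator", "CITY": "Brooklyn"},
]


def select(relation, predicate):
    """Sigma: keep the tuples for which predicate(tuple) is true."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Pi: keep only the named attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


rye_customers = select(CUSTOMER, lambda row: row["CITY"] == "Rye")
name_city = project(CUSTOMER, ["NAME", "CITY"])
```

Here name_city holds the six NAME, CITY pairs listed in the output above, while projecting on CITY alone would return only three tuples because duplicates are eliminated.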
3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the
tuples that are in R, in S, or in both.
o It eliminates the duplicate tuples. It is denoted by ∪.
1. Notation: R ∪ S
Example:
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Input:
1. ∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation
contains all tuples that are in both R and S.
o It is denoted by intersection ∩.
1. Notation: R ∩ S
Input:
1. ∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Smith
Jones
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation
contains all tuples that are in R but not in S.
o It is denoted by minus (-).
1. Notation: R - S
Input:
1. ∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
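The three set operations map directly onto Python sets. The sketch below uses the DEPOSITOR and BORROW rows from the example; the variable names are illustrative only.

```python
DEPOSITOR = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"),
    ("Turner", "A-176"), ("Johnson", "A-273"), ("Jones", "A-472"),
    ("Lindsay", "A-284"),
]
BORROW = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"),
]

# Projection on CUSTOMER_NAME; a set drops duplicate names automatically.
depositors = {name for name, _ in DEPOSITOR}
borrowers = {name for name, _ in BORROW}

either = borrowers | depositors          # union
both = borrowers & depositors            # set intersection
only_borrowers = borrowers - depositors  # set difference
```

Note that union and intersection are symmetric, but set difference is not: depositors - borrowers would give the customers who have an account but no loan.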
6. Cartesian product
o The Cartesian product is used to combine each row in one table with each
row in the other table. It is also known as a cross product.
o It is denoted by X.
1. Notation: E X D
Example:
EMPLOYEE
1 Smith A
2 Harry C
3 John B
DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Input:
1. EMPLOYEE X DEPARTMENT
Output:
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
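As a rough illustration, the same cross product can be produced with itertools.product from the Python standard library. Tuples stand in for rows here, and the variable names are illustrative.

```python
from itertools import product

EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# Pair every EMPLOYEE row with every DEPARTMENT row and concatenate them.
cross = [emp + dept for emp, dept in product(EMPLOYEE, DEPARTMENT)]
```

With 3 rows in each table, the result has 3 x 3 = 9 rows, in the same order as the output above.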
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
1. ρ(STUDENT1, STUDENT)
Join Operations:
Example:
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
Result:
1. Natural Join:
o A natural join is the set of all combinations of tuples from R and S that are equal
on their common attribute names.
o It is denoted by ⋈.
Example: Let's use the above EMPLOYEE table and SALARY table:
Input:
1. ∏ EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)
Output:
EMP_NAME SALARY
Stephan 50000
Jack 30000
Harry 25000
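A natural join can be sketched in a few lines of Python. The function name natural_join is an assumption for this sketch, rows are dictionaries, and the common attributes are found by comparing the attribute names of the two tables.

```python
EMPLOYEE = [
    {"EMP_CODE": 101, "EMP_NAME": "Stephan"},
    {"EMP_CODE": 102, "EMP_NAME": "Jack"},
    {"EMP_CODE": 103, "EMP_NAME": "Harry"},
]
SALARY = [
    {"EMP_CODE": 101, "SALARY": 50000},
    {"EMP_CODE": 102, "SALARY": 30000},
    {"EMP_CODE": 103, "SALARY": 25000},
]


def natural_join(r, s):
    """Pair tuples of r and s that agree on every attribute name they share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    joined = []
    for left in r:
        for right in s:
            if all(left[a] == right[a] for a in common):
                joined.append({**left, **right})
    return joined


joined = natural_join(EMPLOYEE, SALARY)
# Projection on EMP_NAME and SALARY, as in the example output.
name_salary = [(row["EMP_NAME"], row["SALARY"]) for row in joined]
```

Here the only common attribute is EMP_CODE, so each employee is paired with exactly one salary row.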
2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with
missing information.
Example:
EMPLOYEE
FACT_WORKERS
Input:
1. (EMPLOYEE ⋈ FACT_WORKERS)
Output:
o Left outer join contains the set of tuples of all combinations in R and S that are
equal on their common attribute names.
o In the left outer join, tuples in R that have no matching tuples in S are also
included, with the attributes of S set to null.
o It is denoted by ⟕.
Input:
1. EMPLOYEE ⟕ FACT_WORKERS
o Right outer join contains the set of tuples of all combinations in R and S that
are equal on their common attribute names.
o In the right outer join, tuples in S that have no matching tuples in R are also
included, with the attributes of R set to null.
o It is denoted by ⟖.
Input:
1. EMPLOYEE ⟖ FACT_WORKERS
Output:
o Full outer join is like a left or right join except that it contains all rows from both
tables.
o In the full outer join, tuples in R that have no matching tuples in S and tuples in S
that have no matching tuples in R are both included, padded with nulls.
o It is denoted by ⟗.
Input:
1. EMPLOYEE ⟗ FACT_WORKERS
Output:
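The three outer joins can be sketched with one helper function. This is a minimal illustration, not a definitive implementation: the rows below are hypothetical stand-ins for the EMPLOYEE and FACT_WORKERS tables, the function name outer_join and its flags are assumptions, and None plays the role of NULL.

```python
# Hypothetical rows; they stand in for the EMPLOYEE and FACT_WORKERS tables.
EMPLOYEE = [
    {"EMP_NAME": "Ram", "CITY": "Mumbai"},
    {"EMP_NAME": "Shyam", "CITY": "Delhi"},
]
FACT_WORKERS = [
    {"EMP_NAME": "Ram", "SALARY": 10000},
    {"EMP_NAME": "Hari", "SALARY": 50000},
]


def outer_join(r, s, key, keep_left=False, keep_right=False):
    """Join r and s on `key`, optionally keeping unmatched rows padded with None."""
    r_attrs, s_attrs = list(r[0]), list(s[0])
    result, matched_right = [], set()
    for left in r:
        found = False
        for i, right in enumerate(s):
            if left[key] == right[key]:
                result.append({**left, **right})
                matched_right.add(i)
                found = True
        if not found and keep_left:
            # Unmatched tuple of r: pad the attributes of s with None.
            result.append({**left, **{a: None for a in s_attrs if a != key}})
    if keep_right:
        for i, right in enumerate(s):
            if i not in matched_right:
                # Unmatched tuple of s: pad the attributes of r with None.
                result.append({**{a: None for a in r_attrs if a != key}, **right})
    return result


left_join = outer_join(EMPLOYEE, FACT_WORKERS, "EMP_NAME", keep_left=True)
right_join = outer_join(EMPLOYEE, FACT_WORKERS, "EMP_NAME", keep_right=True)
full_join = outer_join(EMPLOYEE, FACT_WORKERS, "EMP_NAME", keep_left=True, keep_right=True)
```

With both flags off, the function behaves like an ordinary join and returns only the matching row.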
3. Equi join:
It is the most common form of inner join. It matches tuples according to an
equality condition, so the equi join uses only the comparison operator (=).
Example:
CUSTOMER RELATION
CLASS_ID NAME
1 John
2 Harry
3 Jackson
PRODUCT
PRODUCT_ID CITY
1 Delhi
2 Mumbai
3 Noida
Input:
1. CUSTOMER ⋈ (CLASS_ID = PRODUCT_ID) PRODUCT
Output:
1 John 1 Delhi
2 Harry 2 Mumbai
3 Jackson 3 Noida
Functional Dependency
A functional dependency is a relationship between two sets of attributes. It
typically exists between the primary key and a non-key attribute within a table.
1. X → Y
The left side of an FD is known as the determinant; the right side is known as the
dependent.
For example:
1. Emp_Id → Emp_Name
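Whether a given table instance satisfies an FD can be checked mechanically: no two rows may agree on the left side and differ on the right side. The sketch below assumes a hypothetical EMPLOYEE table with made-up rows, and the function name holds is illustrative.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # The first row with this left side fixes the expected right side.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical rows used only for this illustration.
EMPLOYEE = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Dept": "Sales"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Dept": "Sales"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Dept": "Legal"},
]

id_determines_name = holds(EMPLOYEE, ["Emp_Id"], ["Emp_Name"])
name_determines_id = holds(EMPLOYEE, ["Emp_Name"], ["Emp_Id"])
```

In this data Emp_Id → Emp_Name holds, but Emp_Name → Emp_Id does not, because two different employees share the name Asha. A check like this can only refute an FD for a particular instance; whether the FD holds in general is a design decision about the schema.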
Example:
1. ID → Name,
2. Name → DOB
1. Reflexive Rule (IR1)
In the reflexive rule, if Y is a subset of X, then X determines Y.
If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
2. Augmentation Rule (IR2)
In augmentation, if X determines Y, then XZ determines YZ for any Z.
If X → Y then XZ → YZ
Example:
For R(ABCD), if A → B then AC → BC
3. Transitive Rule (IR3)
In the transitive rule, if X determines Y and Y determines Z, then X must also determine Z.
If X → Y and Y → Z then X → Z
4. Union Rule (IR4)
The union rule says that if X determines Y and X determines Z, then X must also determine Y and Z.
If X → Y and X → Z then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5)
The decomposition rule is also known as the project rule. It is the reverse of the union rule.
This rule says that if X determines Y and Z, then X determines Y and X determines Z
separately.
If X → YZ then X → Y and X → Z
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 and 2)
The same argument with YZ → Z gives X → Z.
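The inference rules can be applied mechanically with the standard attribute-closure algorithm, which is not part of these notes but follows directly from the rules: X → Y can be derived from a set of FDs exactly when Y is contained in the closure of X. The sketch below is a minimal version under the assumption that every attribute is a single character, so a string such as "YZ" stands for the set {Y, Z}; the function names closure and implies are illustrative.

```python
def closure(attributes, fds):
    """Set of attributes determined by `attributes` under the FDs in `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)


union_rule = implies([("X", "Y"), ("X", "Z")], "X", "YZ")
decomposition_rule = implies([("X", "YZ")], "X", "Y")
transitive_rule = implies([("X", "Y"), ("Y", "Z")], "X", "Z")
augmentation_rule = implies([("A", "B")], "AC", "BC")
```

Each of the four checks returns True, matching the rules above, while a dependency that cannot be derived, such as B → A from A → B alone, returns False.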
Normalization
A large database defined as a single relation may result in data duplication. This
repetition of data causes several problems.
To handle these problems, we should analyse and decompose the relations with
redundant data into smaller, simpler, and well-structured relations that satisfy
desirable properties. Normalization is a process of decomposing the relations into
relations with fewer attributes.
What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate undesirable characteristics like Insertion,
Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller tables and links them using
relationships.
o The normal form is used to reduce redundancy from the database table.
The main reason for normalizing the relations is removing these anomalies. Failure
to eliminate anomalies leads to data redundancy and can cause data integrity and
other problems as the database grows. Normalization consists of a series of
guidelines that help you create a good database structure.
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully
functionally dependent on the primary key.
4NF A relation will be in 4NF if it is in Boyce-Codd normal form and has no
multi-valued dependency.
5NF A relation is in 5NF if it is in 4NF, does not contain any join dependency, and
joining is lossless.
Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal
forms, i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher
degree.
o Careless decomposition may lead to a bad database design, leading to
serious problems.
When the sub-relations are joined again, the resulting relation must be the same as the
original relation before decomposition.
The union of the attributes of the sub-relations R1 and R2 must contain all the
attributes that are available in the original relation R before decomposition.
The intersection of the attributes of R1 and R2 cannot be empty. The sub-relations must
contain a common attribute, and the common attribute must contain unique data.
The common attribute must be a super key of at least one of the sub-relations R1 or R2.
Here,
R = (A, B, C)
R1 = (A, B)
R2 = (B, C)
The relation R has three attributes A, B, and C. The relation R is decomposed into two
relations R1 and R2, each with two attributes. The common attribute is B.
The values in column B must be unique. If B contains a duplicate value, then the
lossless-join decomposition is not possible.
R (A, B, C)
A B C
12 25 34
10 36 09
12 42 30
R1 (A, B)
A B
12 25
10 36
12 42
R2 (B, C)
B C
25 34
36 09
42 30
R1 ⋈ R2 = R
A B C
12 25 34
10 36 09
12 42 30
The relation is the same as the original relation R. Hence, the above decomposition is
Lossless-join decomposition.
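The check can be reproduced in Python with the rows of R from the example: project R onto the two sub-relations, join them on the common attribute, and compare with R. For contrast, the sketch also tries a second decomposition on A, which is not part of the example and is added only to show what a lossy join looks like. The value 09 is written as the integer 9.

```python
R = {(12, 25, 34), (10, 36, 9), (12, 42, 30)}  # tuples are (A, B, C)

# Decomposition from the example: R1(A, B) and R2(B, C), joined on B.
R1 = {(a, b) for a, b, c in R}
R2 = {(b, c) for a, b, c in R}
rejoined_on_b = {(a, b, c) for a, b in R1 for b2, c in R2 if b == b2}

# Contrasting decomposition (an assumption of this sketch): S1(A, B) and
# S2(A, C), joined on A, whose values are not unique.
S1 = {(a, b) for a, b, c in R}
S2 = {(a, c) for a, b, c in R}
rejoined_on_a = {(a, b, c) for a, b in S1 for a2, c in S2 if a == a2}

lossless_on_b = rejoined_on_b == R
lossless_on_a = rejoined_on_a == R
```

Joining on B gives back exactly R, so lossless_on_b is True. Joining on A produces spurious tuples such as (12, 25, 30), because the value 12 appears twice in column A, so lossless_on_a is False.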
Transaction Management
A transaction is a set of operations used to perform a logical unit of work.
It bundles all the instructions of one logical operation. A transaction
usually means that the data in the database has changed. One of the
major uses of a DBMS is to protect the user's data from system failures. It
does so by ensuring that all the data is restored to a consistent state when
the computer is restarted after a crash. A transaction is any one
execution of a user program in a DBMS. One of the important properties
of a transaction is that it contains a finite number of steps. Executing the
same program multiple times generates multiple transactions.
Example: Consider the following example of transaction operations to be
performed to withdraw cash from an ATM vestibule.
Steps for ATM Transaction
1. Transaction Start.
2. Insert your ATM card.
3. Select a language for your transaction.
4. Select the Savings Account option.
5. Enter the amount you want to withdraw.
6. Enter your secret pin.
7. Wait for some time for processing.
8. Collect your Cash.
9. Transaction Completed.
A transaction can include the following basic database access
operations:
Read/Access data (R): Reads a database item from disk
(where the database stores its data) into a memory variable.
Write/Change data (W): Writes the data item from the memory
variable to the disk.
Commit: Commit is a transaction control command that is used to
permanently save the changes made in a transaction.
Example: Transfer of 50₹ from Account A to Account B. Initially A= 500 ₹,
B= 800₹. This data is brought to RAM from Hard Disk.
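The transfer can be sketched as a sequence of read, write, and commit steps. This is a simplified illustration, not how a real DBMS is implemented: a dictionary plays the role of the disk, the function name transfer is an assumption, and the rule that a transfer aborts when the balance would become negative is added only to show an abort.

```python
def transfer(disk, source, target, amount):
    """Move `amount` from `source` to `target`; return True if committed."""
    # R(source), R(target): copy the committed values from disk into memory.
    memory = {source: disk[source], target: disk[target]}
    memory[source] -= amount
    memory[target] += amount
    if memory[source] < 0:
        # Abort: nothing was written, so the disk keeps its old consistent state.
        return False
    # W(source), W(target) followed by commit: make both changes permanent.
    disk.update(memory)
    return True


disk = {"A": 500, "B": 800}
committed = transfer(disk, "A", "B", 50)
```

After the call, A holds 450 and B holds 850. The total of 1300 is the same before and after the transaction, which is what a consistent state means in this example.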
Stages of Transaction
1. Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches a
point from which it cannot go any further. If a few transactions or processes are
affected, this is called a transaction failure.
Reasons for a transaction failure could be:
1. Logical errors: A logical error occurs if a transaction cannot
complete due to some code error or an internal error condition.
2. System error: It occurs when the DBMS itself terminates an
active transaction because the database system is not able to
execute it. For example, the system aborts an active transaction
in case of deadlock or resource unavailability.
2. System Crash
System failure can occur due to power failure or other hardware or
software failure. Example: Operating system error.
Fail-stop assumption: In the system crash, non-volatile storage
is assumed not to be corrupted.
3. Disk Failure
A disk failure occurs when a hard-disk drive or other storage drive fails.
It was a common problem in the early days of technology evolution.
Disk failure occurs due to the formation of bad sectors, a disk head
crash, unreachability of the disk, or any other failure that
destroys all or part of the disk storage.
1. Serial Schedule
The serial schedule is a type of schedule where one transaction is executed
completely before starting another transaction. In the serial schedule, when the first
transaction completes its cycle, then the next transaction is executed.
For example: Suppose there are two transactions T1 and T2 which have some
operations. If there is no interleaving of operations, then there are the following two
possible outcomes:
1. Execute all the operations of T1, followed by all the operations of T2.
2. Execute all the operations of T2, followed by all the operations of T1.
o In the given (a) figure, Schedule A shows the serial schedule where T1 is
followed by T2.
o In the given (b) figure, Schedule B shows the serial schedule where T2 is
followed by T1.
2. Non-serial Schedule
o If interleaving of operations is allowed, then the schedule is a non-serial schedule.
o It contains many possible orders in which the system can execute the
individual operations of the transactions.
o In the given figures (c) and (d), Schedule C and Schedule D are non-serial
schedules. They have interleaving of operations.
3. Serializable schedule
o The serializability of schedules is used to find non-serial schedules that allow
the transaction to execute concurrently without interfering with one another.
o It identifies which schedules are correct when executions of the transaction
have interleaving of their operations.
o A non-serial schedule will be serializable if its result is equal to the result of its
transactions executed serially.
Testing of Serializability
A serialization graph, also called a precedence graph, is used to test the serializability of a schedule.
o If a precedence graph contains a single edge Ti → Tj, then in any equivalent serial
schedule all the instructions of Ti are executed before the first instruction of Tj.
o If a precedence graph for schedule S contains a cycle, then S is
non-serializable. If the precedence graph has no cycle, then S is
serializable.
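The test can be sketched in Python. The sketch assumes the usual textbook definition of conflicting operations (different transactions, same data item, at least one write) and represents a schedule as a list of (transaction, operation, item) triples; the function names and the two sample schedules are illustrative, not taken from the figures.

```python
def precedence_graph(schedule):
    """Edges Ti -> Tj for each pair of conflicting operations, Ti's first."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the current path
        visiting.add(node)
        if any(visit(nxt) for nxt in graph[node]):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in graph)


interleaved = [
    ("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
    ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B"),
]
lost_update = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]

interleaved_edges = precedence_graph(interleaved)
lost_update_edges = precedence_graph(lost_update)
```

For the schedule interleaved, the only edge is T1 → T2, so there is no cycle and the schedule is serializable. For lost_update, the graph contains both T1 → T2 and T2 → T1, which form a cycle.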
For example:
Explanation:
The precedence graph for schedule S2 contains no cycle, so schedule S2 is
serializable.
Conflicting Operations
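Two operations are conflicting if all of the following conditions are satisfied:
1. They belong to different transactions.
2. They operate on the same data item.
3. At least one of them is a write operation.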
Example:
Swapping is possible only if S1 and S2 are logically equal.
Conflict Equivalent
Two schedules are conflict equivalent if one can be transformed into the other by
swapping non-conflicting operations. In the given example, S2 is conflict equivalent
to S1 (S1 can be converted to S2 by swapping non-conflicting operations).
Example:
Schedule S2 is a serial schedule because, in this, all operations of T1 are performed
before starting any operation of T2. Schedule S1 can be transformed into a serial
schedule by swapping non-conflicting operations of S1.
T1          T2
Read(A)
Write(A)
Read(B)
Write(B)
            Read(A)
            Write(A)
            Read(B)
            Write(B)
View Serializability
o A schedule is view serializable if it is view equivalent to a serial schedule.
o If a schedule is conflict serializable, then it is also view serializable.
o A view serializable schedule that is not conflict serializable contains blind writes.
View Equivalent
Two schedules S1 and S2 are said to be view equivalent if they satisfy the following
conditions:
1. Initial Read
The initial read of both schedules must be the same. Consider two schedules S1 and
S2. If in schedule S1 a transaction T1 reads the initial value of data item A, then in S2
transaction T1 should also read the initial value of A.
The above two schedules are view equivalent because the initial read operation in S1 is
done by T1 and in S2 it is also done by T1.
2. Updated Read
In schedule S1, if Ti is reading A which is updated by Tj then in S2 also, Ti should
read A which is updated by Tj.
The above two schedules are not view equivalent because, in S1, T3 is reading A updated by
T2 and in S2, T3 is reading A updated by T1.
3. Final Write
The final write must be the same in both schedules. If in schedule S1 a
transaction T1 performs the last update of A, then in S2 the final write operation
on A should also be done by T1.
The above two schedules are view equivalent because the final write operation in S1 is
done by T3 and in S2, the final write operation is also done by T3.
Example:
Schedule S
With 3 transactions, the total number of possible serial schedules = 3! = 6:
S1 = <T1 T2 T3>
S2 = <T1 T3 T2>
S3 = <T2 T3 T1>
S4 = <T2 T1 T3>
S5 = <T3 T1 T2>
S6 = <T3 T2 T1>
In both schedules S and S1, there is no read other than the initial read, so we
do not need to check the updated read condition.
The initial read operation in S is done by T1 and in S1, it is also done by T1.
The final write operation in S is done by T3 and in S1, it is also done by T3. So, S
and S1 are view equivalent.
The first schedule S1 satisfies all three conditions, so we do not need to check
any other schedule. The view equivalent serial schedule is:
1. T1 → T2 → T3
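The three conditions can be checked mechanically. The sketch below is an illustration under stated assumptions: the schedule S is a hypothetical one chosen to fit the description above (T1 performs the only read, T3 performs the final write, and T2 and T3 write A without reading it), and the function names are not from the notes. Each schedule is a list of (transaction, operation, item) triples.

```python
from itertools import permutations


def view_profile(schedule):
    """Record, per transaction, which write each read sees, plus final writers."""
    last_writer = {}  # item -> transaction of the most recent write
    reads_from = {}   # transaction -> [(item, writer or None for initial read)]
    for txn, op, item in schedule:
        if op == "R":
            reads_from.setdefault(txn, []).append((item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return reads_from, last_writer


def view_equivalent(s1, s2):
    """Same initial reads, same updated reads, and same final writes."""
    return view_profile(s1) == view_profile(s2)


def serialize(schedule, order):
    """Serial schedule that runs the transactions of `schedule` in `order`."""
    return [step for txn in order for step in schedule if step[0] == txn]


def view_serial_orders(schedule):
    """All serial orders that are view equivalent to `schedule`."""
    txns = sorted({txn for txn, _, _ in schedule})
    return [order for order in permutations(txns)
            if view_equivalent(schedule, serialize(schedule, order))]


# Hypothetical schedule S: R1(A), W2(A), W1(A), W3(A).
S = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
orders = view_serial_orders(S)
```

Of the six serial orders, only T1, T2, T3 is view equivalent to this S. The schedule is not conflict serializable, since R1(A) before W2(A) and W2(A) before W1(A) form a cycle, and it contains blind writes by T2 and T3.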