DBMS 3 1
There can be various reasons for anomalies to occur in a database. For example, if a lot of
redundant data is present in the database, then anomalies can occur. If a table is poorly
constructed, there is also a chance of a database anomaly. Due to database anomalies, the
integrity of the database suffers.
Another cause of database anomalies is storing all the data in a single table. To remove these
anomalies, normalization is performed: the table is split into smaller tables, which can later
be recombined using joins (of different types).
Example:
In the above table, we have four columns which describe the details about the workers: their
name, address, department, and ID. The above table is not normalized, so there is a real
chance of anomalies being present in it.
There can be three types of anomalies in the database:
Update Anomaly
Definition: Occurs when updating data in a table leads to inconsistency. For example, if a
person's address is stored in multiple rows and one row is updated while others are not, it
creates conflicting data.
Example: If Ramesh's address changes and only some rows are updated, the table may show
different addresses for him, leading to confusion.
Insertion Anomaly
Definition: Arises when inserting new data into a table leads to inconsistency or loss of
important information. This often happens when certain values are mandatory but not available.
Example: If you want to add a new worker without assigning them to a department, but the
table structure requires a department ID, you can't insert this worker's data, leading to an
anomaly.
Deletion Anomaly
Definition: Happens when deleting data inadvertently removes other valuable information.
This often occurs when related data is stored together inappropriately.
Example: If you delete a department and all its associated employees’ records are removed,
you lose valuable employee information that might be needed for other purposes.
To remove these types of anomalies, we normalize the table, i.e., split it into smaller tables
that can be joined back together. A table can be in various normal forms, such as 1NF, 2NF,
3NF, BCNF, etc. We apply the appropriate normalization scheme according to the current form
of the table.
Relational Decomposition:
When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
In a database, it breaks the table into multiple tables.
If the relation has no proper decomposition, then it may lead to problems like loss of
information.
Decomposition is used to eliminate some of the problems of bad design like
anomalies, inconsistencies, and redundancy.
Types of Decomposition:
1. Lossless Decomposition
If no information is lost from the relation that is decomposed, then the
decomposition is lossless.
A lossless decomposition guarantees that the join of the decomposed relations
results in the same relation that was decomposed.
A decomposition is said to be lossless if the natural join of all the
decomposed relations gives back the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table:
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the resultant
relation will look like:
Employee ⋈ Department
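The lossless property can be checked with a small script: a natural join of the EMPLOYEE and DEPARTMENT relations on their common column EMP_ID reproduces exactly one row per original tuple, with no spurious tuples. A minimal sketch in plain Python, using the data from the tables above (the `natural_join` helper is illustrative, not part of any library):

```python
# Lossless-join check: natural-join the two decomposed relations on EMP_ID
# and confirm that every original employee appears exactly once.

employee = [  # (EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY)
    (22, "Denim", 28, "Mumbai"),
    (33, "Alina", 25, "Delhi"),
    (46, "Stephan", 30, "Bangalore"),
    (52, "Katherine", 36, "Mumbai"),
    (60, "Jack", 40, "Noida"),
]
department = [  # (DEPT_ID, EMP_ID, DEPT_NAME)
    (827, 22, "Sales"),
    (438, 33, "Marketing"),
    (869, 46, "Finance"),
    (575, 52, "Production"),
    (678, 60, "Testing"),
]

def natural_join(emp, dept):
    """Join the two relations on the common column EMP_ID."""
    joined = []
    for e_id, name, age, city in emp:
        for d_id, d_emp_id, d_name in dept:
            if e_id == d_emp_id:
                joined.append((e_id, name, age, city, d_id, d_name))
    return joined

result = natural_join(employee, department)
assert len(result) == 5   # no tuples lost, no spurious tuples
print(result[0])          # (22, 'Denim', 28, 'Mumbai', 827, 'Sales')
```

Because EMP_ID is a key of both projections, each row joins with exactly one partner, which is what makes this particular decomposition lossless.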
2. Dependency Preserving
Dependency preservation is an important constraint on database decomposition.
In a dependency-preserving decomposition, every dependency of the original
relation must be enforced by at least one decomposed table.
If a relation R is decomposed into relations R1 and R2, then each dependency of R
must either be a part of R1 or R2, or must be derivable from the combination of the
functional dependencies of R1 and R2.
For example, suppose there is a relation R(A, B, C, D) with functional dependency set
(A→BC). The relation R is decomposed into R1(ABC) and R2(AD); this decomposition is
dependency preserving because the FD A→BC is a part of the relation R1(ABC).
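The simple case in the example above can be tested mechanically: an FD X→Y is directly preserved if all of its attributes fall inside a single decomposed relation. This is only a sufficient check, not the full test (the general definition also allows FDs derivable from the combined projections), but it covers the R(A, B, C, D) example. A sketch:

```python
# Dependency-preservation sketch for R(A, B, C, D) with FD A -> BC,
# decomposed into R1(A, B, C) and R2(A, D).
# Sufficient check: an FD X -> Y is directly preserved if X ∪ Y lies
# entirely within one decomposed relation.

fds = [({"A"}, {"B", "C"})]          # the single FD A -> BC
decomposition = [{"A", "B", "C"},    # R1(ABC)
                 {"A", "D"}]         # R2(AD)

def directly_preserved(fd, relations):
    lhs, rhs = fd
    return any(lhs | rhs <= r for r in relations)

print(all(directly_preserved(fd, decomposition) for fd in fds))  # True
```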
Functional dependency:
A functional dependency is a constraint between two sets of attributes: the value of one set
(the determinant) uniquely determines the value of the other.
Notation
X→Y
Example
Emp_Id → Emp_Name
This means that for each unique value of Emp_Id, there is exactly one corresponding
Emp_Name. Knowing the Emp_Id allows us to determine the Emp_Name associated with it.
A functional dependency A→B is considered trivial if B is a subset of A. This means that the
information in B can be derived from A alone. Common examples include:
A→A
Employee_Id,Employee_Name→Employee_Id
Trivial dependencies are often not useful for normalization, since they provide no new
information about the relationships between attributes.
A functional dependency A→B is non-trivial if B is not a subset of A. Examples include:
ID→Name
Name→DOB
Non-trivial dependencies play a crucial role in database design and normalization, helping
to eliminate redundancy and ensure data integrity.
Armstrong’s Axioms:
AXIOMS: Armstrong’s axioms are a set of inference rules that, when applied to a given set of
functional dependencies, generate the closure of that set.
Armstrong’s Axioms consist of the following rules:
1. Reflexivity Rule: If A is a set of attributes and B is a subset of A, then A→B holds.
If B⊆A then A→B.
This is a trivial dependency.
2. Augmentation Rule: If A→B holds and C is any attribute set, then AC→BC also
holds. That is, adding the same attributes to both sides of a dependency does not
change the dependency.
If A→B, then AC→BC for any C.
3. Transitivity Rule: As with the transitive rule in algebra, if A→B holds
and B→C holds, then A→C also holds. A→B is read as "A functionally
determines B".
If A→B and B→C, then A→C.
4. Union Rule: If A→B holds and A→C holds, then A→BC holds.
If A→B and A→C then A→BC.
5. Composition: If A→B and X→Y hold, then AX→BY holds.
6. Decomposition: If A→BC holds then A→B and A→C hold.
If A→BC then A→B and A→C.
7. Pseudo Transitivity: If A→B holds and BC→D holds, then AC→D holds.
If A→B and BC→D then AC→D.
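All of the rules above can be verified mechanically through the attribute closure X+, computed by repeatedly applying FDs whose left-hand side is already contained in the result. A minimal sketch (the `closure` helper is illustrative):

```python
# Attribute closure X+ under a set of FDs: repeatedly apply every FD whose
# left-hand side is already contained in the accumulated attribute set.

def closure(attrs, fds):
    """Return the closure of `attrs` under functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Transitivity: A -> B and B -> C give A -> C
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C'], so A -> C holds

# Union: A -> B and A -> C give A -> BC
fds = [({"A"}, {"B"}), ({"A"}, {"C"})]
print({"B", "C"} <= closure({"A"}, fds))  # True, so A -> BC holds
```

An FD X→Y holds under a set of dependencies exactly when Y ⊆ X+, which is why this one routine can check every derived rule in the list.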
Normalization:
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of relations.
It is also used to eliminate undesirable characteristics like Insertion, Update, and
Deletion Anomalies.
Normalization divides the larger table into smaller tables and links them using relationships.
The normal form is used to reduce redundancy from the database table.
1NF [First Normal Form]
A relation is in 1NF if every attribute holds only atomic (single) values. In the EMPLOYEE
table below, EMP_PHONE is a multi-valued attribute, so the relation is not in 1NF:
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
12 Sam 7390372389, 8589830302 Punjab
The decomposition of the EMPLOYEE table into 1NF has been shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
2NF [Second Normal Form]
A relation is in 2NF if it is in 1NF and no non-prime attribute depends on only a part of a
candidate key. In the TEACHER table below, the candidate key is {TEACHER_ID, SUBJECT},
but TEACHER_AGE depends on TEACHER_ID alone, which is a partial dependency:
TEACHER table:
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
To bring the table into 2NF, it is decomposed as follows:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
3NF [ Third Normal Form]
Normalization is the process of organizing data in a database to reduce redundancy and
improve data integrity. The Third Normal Form (3NF) specifically aims to eliminate
transitive dependencies.
Note: The table is in 2NF, meaning it already satisfies the requirements of the First Normal
Form (1NF) and the Second Normal Form (2NF).
Functional dependencies describe the relationship between attributes. Here are the
dependencies in our data:
1. std-id → course: The student ID determines the course.
2. std-id → fee: The student ID determines the fee.
3. course → fee: The course determines the fee.
The dependency course → fee is a transitive dependency: the fee depends on the course, not
directly on the primary key (std-id). To reach 3NF, this dependency must be removed by
decomposition.
The table is therefore decomposed into two relations:
Student–Course table: captures the relationship between students and their enrolled courses.
o Contains student IDs and the courses they are enrolled in. Each student is
identified by their unique std-id.
Primary Key: std-id (uniquely identifies each student)
Course–Fee table: captures the relationship between courses and their fees.
o Contains information about courses and their associated fees. Each course is
uniquely identified by the course attribute.
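The 3NF split can be sketched in a few lines of Python: project the original relation onto (std_id, course) and (course, fee) so the transitive dependency std_id → course → fee no longer stores the fee once per student. The sample rows below are hypothetical, chosen only to illustrate the projection:

```python
# 3NF decomposition sketch: remove the transitive dependency
# std_id -> course -> fee by splitting into (std_id, course) and
# (course, fee). Sample rows are hypothetical.

enrolment = [  # (std_id, course, fee) -- not in 3NF, since course -> fee
    (1, "DBMS", 5000),
    (2, "DBMS", 5000),
    (3, "Networks", 4000),
]

# Project onto the two 3NF relations; sets remove the duplicated fee rows.
student_course = sorted({(sid, course) for sid, course, _ in enrolment})
course_fee = sorted({(course, fee) for _, course, fee in enrolment})

print(student_course)  # [(1, 'DBMS'), (2, 'DBMS'), (3, 'Networks')]
print(course_fee)      # [('DBMS', 5000), ('Networks', 4000)]
```

Note that the fee for "DBMS" is now stored once instead of once per enrolled student, which is exactly the redundancy 3NF removes.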
Boyce-Codd normal form (BCNF)
Example: Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a superkey of the relation.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
EMP_DEPT table:
DEPT_TYPE EMP_DEPT_NO
D394 283
D394 300
D283 232
D283 549
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
1. Canonical Cover:
A canonical cover (or minimal cover) of a set of functional dependencies is an
equivalent set that contains no redundant dependencies and no extraneous attributes.
Example:
Consider the set of functional dependencies:
A→BC
A, B→D
A→C
B→D
Here A→C is redundant because it already follows from A→BC, and A, B→D is redundant
because A→B and B→D together give A→D. The canonical cover is therefore:
A→BC
B→D
This set of dependencies is minimal and equivalent to the original set.
2. Maxima Cover:
The maxima cover refers to the opposite of a canonical cover. It involves a set of
functional dependencies that might include redundant or unnecessary dependencies,
but it represents the maximum possible coverage of functional dependencies within a
database schema.
A maxima cover is not minimized, and it may include unnecessary dependencies.
Example:
Consider the same set of functional dependencies from above:
A→BC
A, B→D
A→C
B→D
In the maxima cover, we would keep all of these dependencies as they are, without
eliminating redundancies.
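The redundancy test behind a canonical cover can be automated with the attribute closure: an FD X→Y is redundant if Y is already contained in the closure of X under the remaining dependencies. A minimal sketch over the set above (extraneous-attribute elimination is omitted for brevity):

```python
# Redundancy check used when computing a canonical cover: X -> Y is
# redundant if Y is already in the closure of X under the other FDs.

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def redundant(fd, fds):
    rest = [f for f in fds if f is not fd]
    lhs, rhs = fd
    return rhs <= closure(lhs, rest)

# The original set: A -> BC, AB -> D, A -> C, B -> D
fds = [({"A"}, {"B", "C"}),
       ({"A", "B"}, {"D"}),
       ({"A"}, {"C"}),
       ({"B"}, {"D"})]

for fd in list(fds):          # drop each FD that the rest already imply
    if redundant(fd, fds):
        fds.remove(fd)

print(len(fds))  # 2 -- only A -> BC and B -> D remain (the canonical cover)
```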
Y's Account
Open_Account(Y)
Old_Balance = Y.balance
New_Balance = Old_Balance + 800
Y.balance = New_Balance
Close_Account(Y)
Operations of Transaction:
Following are the main operations of transaction:
Read(X): Read operation is used to read the value of X from the database and stores
it in a buffer in main memory.
Write(X): Write operation is used to write the value back to the database from the
buffer.
Let's take the example of a debit transaction on an account, which consists of the following
operations:
1. R(X);
2. X = X - 500;
3. W(X);
Let's assume the value of X before starting of the transaction is 4000.
The first operation reads X's value from database and stores it in a buffer.
The second operation will decrease the value of X by 500. So buffer will contain
3500.
The third operation will write the buffer's value to the database. So X's final value will
be 3500.
However, it is possible that, because of a hardware, software, or power failure, the
transaction fails before all of its operations have finished.
For example: if the above debit transaction fails after executing operation 2, then X's value
will remain 4000 in the database, which is not acceptable by the bank.
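The failure scenario above can be sketched directly: a crash between the in-memory update and W(X) leaves the database value untouched at 4000, because the new value only ever existed in the buffer. The `debit` helper and the `crash_before_write` flag are illustrative:

```python
# Sketch of the debit transaction R(X); X = X - 500; W(X), showing that a
# crash before W(X) leaves the database value unchanged at 4000.

database = {"X": 4000}

def debit(db, crash_before_write=False):
    buffer = db["X"]        # R(X): read into a main-memory buffer
    buffer -= 500           # operation 2: update the buffer only
    if crash_before_write:
        return              # failure: W(X) never executes
    db["X"] = buffer        # W(X): write the buffer back to the database

debit(database, crash_before_write=True)
print(database["X"])  # 4000 -- the in-buffer update was lost

debit(database)
print(database["X"])  # 3500 -- value after a successful run
```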
Single-User DBMS | Multi-User DBMS
Does not support multiprogramming; a single-CPU system can execute at most one process at a time. | Multiple users can access databases and computer systems simultaneously because of the concept of multiprogramming.
The data is neither integrated nor shared with any other user. | The data is integrated and shared among other users.
Designed to be used by only one user at a time. | Designed to be accessed by multiple users simultaneously.
Typically installed on a single computer. | Typically installed on a network server.
Can only be accessed by the user who installed it or who is currently logged in. | Can be accessed by any user who is logged in to the network.
Simpler and less expensive than multi-user systems. | More complex and requires more resources than single-user systems.
Not suitable for environments where multiple users need to access the same data at the same time. | Essential for organizations where multiple users need to access the same data at the same time.
A single central processing unit (CPU) can only execute at most one process at a time.
However, multi-programming operating systems execute some commands from one process,
then suspend that process and execute some commands from the next process, and so on. A
process is resumed at the point where it was suspended whenever it gets its turn to use the
CPU again. Hence, concurrent execution of processes is actually interleaved, as illustrated in
the figure below:
The above figure shows two processes, A and B, executing concurrently in an interleaved
fashion. Interleaving keeps the CPU busy when a process requires an input or output (I/O)
operation, such as reading a block from a disk. The CPU is switched to execute another
process rather than remaining idle during I/O time. Interleaving also prevents a long process
from delaying other processes. If the computer system has multiple hardware processors
(CPUs), parallel processing of multiple processes is possible, as illustrated by processes C
and D in the above figure.
Concurrency Control:
Concurrency Control is the management procedure that is required for controlling concurrent
execution of the operations that take place on a database.
But before knowing about concurrency control, we should know about concurrent execution.
Concurrent Execution in DBMS
In a multi-user system, multiple users can access and use the same database at one time,
which is known as the concurrent execution of the database. It means that the same database
is executed simultaneously on a multi-user system by different users.
While working with database transactions, multiple users may need to use the database to
perform different operations, and in that case concurrent execution of the database is
performed.
This simultaneous execution should be done in an interleaved manner, with no operation
affecting the other executing operations, thus maintaining the consistency of the database.
Concurrent execution of transaction operations therefore gives rise to several challenging
problems that need to be solved.
Problems with Concurrent Execution
In a database transaction, the two main operations are READ and WRITE. These operations
must be managed carefully during the concurrent execution of transactions, because if they
are interleaved in an uncontrolled manner, the data may become inconsistent. The following
problems can occur with the concurrent execution of operations:
Lost Update Problem (W-W Conflict)
This problem occurs when two concurrent transactions read the same data item and their
updates overwrite each other, so one update is lost. For example, consider two transactions
TX and TY performing read/write operations on account A, where the available balance in
account A is $300:
At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
At time t2, transaction TX deducts $50 from account A that becomes $250 (only
deducted and not updated/write).
Alternately, at time t3, transaction TY reads the value of account A, which will still be
$300, because TX hasn't updated the value yet.
At time t4, transaction TY adds $100 to account A, which becomes $400 (only added but
not updated/written).
At time t6, transaction TX writes its value of account A, which is updated to $250,
as TY hasn't written its value yet.
Similarly, at time t7, transaction TY writes the values of account A, so it will write as
done at time t4 that will be $400. It means the value written by TX is lost, i.e., $250 is
lost.
Hence, the data becomes incorrect and the database is left in an inconsistent state.
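The interleaving above can be traced step by step in code: both transactions buffer the original balance of $300, and TY's final write silently overwrites TX's, losing the $50 debit. A sketch following the timestamps t1–t7:

```python
# Lost-update sketch: TX and TY both buffer A = 300; TY's final write
# overwrites TX's, so the $50 debit performed by TX is lost.

account = {"A": 300}

tx_buffer = account["A"]      # t1: TX reads A (300)
tx_buffer -= 50               # t2: TX deducts 50 in its buffer (250)
ty_buffer = account["A"]      # t3: TY reads A -- still 300, TX hasn't written
ty_buffer += 100              # t4: TY adds 100 in its buffer (400)
account["A"] = tx_buffer      # t6: TX writes 250
account["A"] = ty_buffer      # t7: TY writes 400 -- TX's update is lost

print(account["A"])  # 400, not the correct serial result of 350
```

Run serially (TX fully before TY, or vice versa), the final balance would be 300 − 50 + 100 = 350; the interleaving yields 400, which is exactly the lost update.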
Dirty Read Problem (W-R Conflict)
The dirty read problem occurs when one transaction updates a database item and then fails,
and before the update is rolled back, the updated item is read by another transaction. This
creates a Write-Read conflict between the two transactions.
For example:
Consider two transactions TX and TY in the below diagram performing read/write operations
on account A where the available balance in account A is $300:
Failures in Transaction:
Failure in terms of a database can be defined as its inability to execute the specified transaction
or loss of data from the database. A DBMS is vulnerable to several kinds of failures and each
of these failures needs to be managed differently. There are many reasons that can cause
database failures such as network failure, system crash, natural disasters, carelessness, sabotage
(corrupting the data intentionally), software errors, etc.
Failure Classification in DBMS
A failure in DBMS can be classified as:
Transaction Failure:
If a transaction is unable to execute, or reaches a point from which it cannot proceed
further, this is termed a transaction failure.
Reason for a transaction failure in DBMS:
Logical error: A logical error occurs if a transaction is unable to execute because of
some mistakes in the code or due to the presence of some internal faults.
System error: Where the termination of an active transaction is done by the database
system itself due to some system issue or because the database management system is
unable to proceed with the transaction. For example– The system ends an operating
transaction if it reaches a deadlock condition or if there is an unavailability of resources.
System Crash:
A system crash usually occurs when there is some sort of hardware or software breakdown.
Some other problems which are external to the system and cause the system to abruptly stop
or eventually crash include failure of the transaction, operating system errors, power cuts, main
memory crash, etc.
These types of failures are often termed soft failures and are responsible for the data losses in
the volatile memory. It is assumed that a system crash does not have any effect on the data
stored in the non-volatile storage and this is known as the fail-stop assumption.
Data-transfer Failure:
When a disk failure occurs amid data-transfer operation resulting in loss of content from disk
storage then such failures are categorized as data-transfer failures. Some other reason for disk
failures includes disk head crash, disk unreachability, formation of bad sectors, read-write
errors on the disk, etc.
In order to quickly recover from a disk failure caused amid a data-transfer operation, the backup
copy of the data stored on other tapes or disks can be used. Thus it’s a good practice to back
up your data frequently.
Transaction States:
In Database Management Systems (DBMS), a transaction is indeed a group of operations that
are logically connected and executed as a single unit to ensure data integrity and consistency.
To maintain consistency, transactions follow the ACID properties (Atomicity, Consistency,
Isolation, Durability), which ensure that even in concurrent environments, database integrity is
preserved.
A transaction log is a file maintained by the recovery management component to record all
the activities of the transaction. Once the transaction commits, its records in the log file
are removed.
1. Active State – When the instructions of the transaction are running then the transaction is in
active state. If all the ‘read and write’ operations are performed without any error then it goes
to the “partially committed state”; if any instruction fails, it goes to the “failed state”.
2. Partially Committed – After completion of all the read and write operation the changes are
made in main memory or local buffer. If the changes are made permanent on the Database then
the state will change to “committed state” and in case of failure it will go to the “failed state”.
3. Failed State – When any instruction of the transaction fails, or a failure occurs while
making the changes permanent on the database, the transaction goes to the "failed state".
4. Aborted State – After any type of failure, the transaction moves from the "failed state"
to the "aborted state". Since in the previous states the changes were made only to the local
buffer or main memory, these changes are deleted or rolled back.
5. Committed State – This is the state in which the changes have been made permanent on the
database; the transaction is complete and subsequently moves to the "terminated state".
6. Terminated State – If there isn’t any roll-back or the transaction comes from the
“committed state”, then the system is consistent and ready for new transaction and the old
transaction is terminated.
Atomicity:
This property ensures that either all operations of the transaction are executed or none of
them are; the transaction is a single, indivisible unit. Consider a transaction T that
transfers $100 from account X to account Y, performed as two parts: T1 (deduct $100 from X
and write(X)) and T2 (add $100 to Y and write(Y)).
If the transaction fails after completion of T1 but before completion of T2. (Say, after write(X)
but before write(Y)), then the amount has been deducted from X but not added to Y. This
results in an inconsistent database state. Therefore, the transaction must be executed in its
entirety in order to ensure the correctness of the database state.
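The all-or-nothing behaviour described above can be sketched in code: a hypothetical `transfer` routine keeps a before-image of the database and restores it if a failure occurs between write(X) and write(Y), so the database never shows a deducted-but-not-credited state. The helper name and the `fail_midway` flag are illustrative assumptions:

```python
# Atomicity sketch: the transfer runs write(X) and write(Y) as one unit;
# on failure after write(X), the whole transaction is rolled back using a
# saved before-image, so totals are preserved.

db = {"X": 500, "Y": 200}

def transfer(database, amount, fail_midway=False):
    snapshot = dict(database)          # before-image kept for rollback
    try:
        database["X"] -= amount        # T1: deduct and write(X)
        if fail_midway:
            raise RuntimeError("crash after write(X), before write(Y)")
        database["Y"] += amount        # T2: add and write(Y)
    except RuntimeError:
        database.clear()
        database.update(snapshot)      # rollback: restore the before-image

transfer(db, 100, fail_midway=True)
print(db)  # {'X': 500, 'Y': 200} -- rolled back, nothing half-done

transfer(db, 100)
print(db)  # {'X': 400, 'Y': 300} -- committed, total still 700
```

In either outcome the total X + Y stays 700, which previews the consistency property discussed next.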
Consistency:
This means that integrity constraints must be maintained so that the database is consistent
before and after the transaction. It refers to the correctness of a database. Referring to the
example above,
The total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, the database is consistent. Inconsistency occurs in case T1 completes but T2 fails.
As a result, T is incomplete.
Isolation:
This property ensures that multiple transactions can occur concurrently without leading to the
inconsistency of the database state. Transactions occur independently without interference.
Changes occurring in a particular transaction will not be visible to any other transaction until
that particular change in that transaction is written to memory or has been committed. This
property ensures that the concurrent execution of transactions results in a state that is
equivalent to the state achieved if these transactions were executed serially in some order.
Let X = 50,000 and Y = 500.
Consider two transactions T and T''.
Suppose T has been executed till Read(Y), and then T'' starts. As a result, interleaving of
operations takes place, due to which T'' reads the correct value of X but the incorrect value
of Y, and the sum computed by
T'': (X + Y = 50,000 + 500 = 50,500)
is thus not consistent with the sum at the end of the transaction:
T: (X + Y = 50,000 + 450 = 50,450).
This results in database inconsistency, due to a loss of 50 units. Hence, transactions must
take place in isolation, and changes should be visible only after they have been written to
main memory.
Durability:
This property ensures that once the transaction has completed execution, the updates and
modifications to the database are stored in and written to disk and they persist even if a system
failure occurs. These updates now become permanent and are stored in non-volatile memory.
The effects of the transaction, thus, are never lost.