
Data Normalization

Anomalies in Relational Database Design:


An anomaly is a deviation from the normal or expected pattern. In a Database Management System (DBMS), an anomaly is an inconsistency that arises in a relational table when operations are performed on it.

There can be various reasons for anomalies to occur in a database. For example, if a lot of redundant data is present in the database, or if a table is constructed poorly, anomalies can occur, and the integrity of the database suffers.

Another common cause of anomalies is storing all the data in a single table. To remove the anomalies, the database is normalized: tables are split into smaller tables, which can later be recombined using different types of join.

The following example shows the anomalies that can be present in a table:

Example:

Worker_id Worker_name Worker_dept Worker_address

65 Ramesh ECT001 Jaipur

65 Ramesh ECT002 Jaipur

73 Amit ECT002 Delhi

76 Vikas ECT501 Pune

76 Vikas ECT502 Pune

79 Rajesh ECT669 Mumbai

The above table has four columns describing each worker: id, name, department, and address. The table is not normalized, so there is a clear chance of anomalies being present.
There can be three types of anomaly in a database:
 Update Anomaly
Definition: Occurs when updating data in a table leads to inconsistency. For example, if a
person's address is stored in multiple rows and one row is updated while others are not, it
creates conflicting data.
Example: If Ramesh's address changes and only some rows are updated, the table may show
different addresses for him, leading to confusion.

 Insertion Anomaly
Definition: Arises when inserting new data into a table leads to inconsistency or loss of
important information. This often happens when certain values are mandatory but not available.
Example: If you want to add a new worker without assigning them to a department, but the
table structure requires a department ID, you can't insert this worker's data, leading to an
anomaly.

 Deletion Anomaly
Definition: Happens when deleting data inadvertently removes other valuable information.
This often occurs when related data is stored together inappropriately.
Example: If you delete a department and all its associated employees’ records are removed,
you lose valuable employee information that might be needed for other purposes.

To remove these types of anomalies, we normalize the table by splitting it (and joining the pieces back when needed). A table can be in various normal forms such as 1NF, 2NF, 3NF, and BCNF; which normalization scheme to apply depends on the current form of the table.

Relational Decomposition:
 When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
 Decomposition breaks a table into multiple smaller tables.
 If the relation has no proper decomposition, then it may lead to problems like loss of
information.
 Decomposition is used to eliminate some of the problems of bad design like
anomalies, inconsistencies, and redundancy.

Types of Decomposition:
1. Lossless Decomposition
 If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
 The lossless decomposition guarantees that the join of relations will result
in the same relation as it was decomposed.
 A decomposition is lossless if the natural join of all the decomposed relations gives back the original relation.
Example:

EMPLOYEE_DEPARTMENT table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai

33 Alina 25 Delhi

46 Stephan 30 Bangalore

52 Katherine 36 Mumbai

60 Jack 40 Noida
DEPARTMENT table:
DEPT_ID EMP_ID DEPT_NAME

827 22 Sales

438 33 Marketing

869 46 Finance

575 52 Production

678 60 Testing

Now, when these two relations are joined on the common column "EMP_ID", then the resultant
relation will look like:

Employee ⋈ Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

Hence, the decomposition is Lossless join decomposition.
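As an illustrative check, here is a minimal Python sketch (using the pandas library; the data is copied from the example above, and the variable names are our own) that performs the decomposition and confirms that the natural join reproduces the original relation:

import pandas as pd

# Original EMPLOYEE_DEPARTMENT relation from the example above
emp_dept = pd.DataFrame({
    "EMP_ID": [22, 33, 46, 52, 60],
    "EMP_NAME": ["Denim", "Alina", "Stephan", "Katherine", "Jack"],
    "EMP_AGE": [28, 25, 30, 36, 40],
    "EMP_CITY": ["Mumbai", "Delhi", "Bangalore", "Mumbai", "Noida"],
    "DEPT_ID": [827, 438, 869, 575, 678],
    "DEPT_NAME": ["Sales", "Marketing", "Finance", "Production", "Testing"],
})

# Decompose into EMPLOYEE and DEPARTMENT, keeping the common column EMP_ID
employee = emp_dept[["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"]]
department = emp_dept[["DEPT_ID", "EMP_ID", "DEPT_NAME"]]

# Natural join on EMP_ID; a lossless decomposition reproduces the original
rejoined = employee.merge(department, on="EMP_ID")
print(rejoined.sort_index(axis=1).equals(emp_dept.sort_index(axis=1)))  # True

If the DEPARTMENT table had been built without the common EMP_ID column, the join could not reconstruct the original rows, and the decomposition would be lossy.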

2. Dependency Preserving
 It is an important constraint of the database.
 In dependency preservation, every dependency must be satisfied by at least one decomposed table.
 If a relation R is decomposed into relation R1 and R2, then the dependencies of R either
must be a part of R1 or R2 or must be derivable from the combination of functional
dependencies of R1 and R2.
 For example, suppose there is a relation R(A, B, C, D) with functional dependency set {A→BC}. The relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD A→BC is a part of relation R1(ABC).
Functional dependency:

 Definition: A functional dependency is a relationship between two attributes in a table, where one attribute (the determinant) uniquely determines another attribute (the dependent). It is crucial in understanding how attributes relate to each other.

Notation

 Notation: If X and Y are attributes in a table, the functional dependency can be represented as:

X→Y

Here, X is the determinant and Y is the dependent.

Example

Consider an Employee table with the following attributes:

 Emp_Id: Unique identifier for each employee


 Emp_Name: Name of the employee
 Emp_Address: Address of the employee

In this case, we can establish the following functional dependency:

Emp_Id → Emp_Name

This means that for each unique value of Emp_Id, there is a corresponding Emp_Name.
Knowing the Emp_Id allows us to determine the Emp_Name associated with it.
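To make this testable, here is a small Python sketch (the function and sample rows are illustrative, not from the original): a dependency X → Y holds in a table exactly when no two rows agree on X but differ on Y.

def fd_holds(rows, x, y):
    # Check whether the FD x -> y holds in rows (each row is a dict).
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        val = tuple(row[a] for a in y)
        if key in seen and seen[key] != val:
            return False  # two rows agree on x but differ on y
        seen[key] = val
    return True

# Illustrative Employee rows
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Jaipur"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))       # True: Emp_Id -> Emp_Name
print(fd_holds(employees, ["Emp_Name"], ["Emp_Address"]))  # False: same name, different addresses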

Importance of Functional Dependency


 Normalization: Functional dependencies are used to identify and eliminate
redundancy in database design through normalization.
 Data Integrity: They help maintain the integrity of the data by ensuring that
relationships between attributes are preserved.
 Schema Design: Understanding FDs aids in designing a well-structured schema that
accurately reflects the real-world relationships among entities.

Types of Functional dependency


1. Trivial Functional Dependency

A functional dependency A→B is considered trivial if B is a subset of A. This means that the information in B can be derived from A alone. Common examples include:

 A→A
 Employee_Id,Employee_Name→Employee_Id

Trivial Dependencies are often not useful for normalization processes since they don’t
provide new information about the relationships between attributes.

2. Non-Trivial Functional Dependency

A functional dependency A→B is non-trivial if B is not a subset of A. If A and B share no common attributes, then it is classified as a complete non-trivial dependency. Examples include:

 ID→Name
 Name→DOB

Non-Trivial Dependencies play a crucial role in database design and normalization, helping
to eliminate redundancy and ensure data integrity.

Armstrong’s Axioms:

AXIOMS: Armstrong's Axioms are a set of inference rules that, when applied to a given set of functional dependencies, generate its closure.
Armstrong's Axioms consist of the following rules:
1. Reflexivity Rule: If A is a set of attributes and B is a subset of A, then A→B holds.
If B⊆A then A→B.
This is a trivial property.
2. Augmentation Rule: If A→B holds and C is a set of attributes, then AC→BC also holds. That is, adding attributes to a dependency does not change the basic dependency.
If A→B, then AC→BC for any C.
3. Transitivity Rule: As with the transitive rule in algebra, if A→B holds and B→C holds, then A→C also holds. A→B means that A functionally determines B.
If A→B and B→C, then A→C.
4. Union Rule: If A→B holds and A→C holds, then A→BC holds.
If A→B and A→C then A→BC.
5. Composition: If A→B and X→Y hold, then AX→BY holds.
6. Decomposition: If A→BC holds then A→B and A→C hold.
If A→BC then A→B and A→C.
7. Pseudo Transitivity: If A→B holds and BC→D holds, then AC→D holds.
If A→B and BC→D then AC→D.
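These axioms justify the standard attribute-closure algorithm, sketched below in Python (names are illustrative): repeatedly apply the given FDs until no new attributes can be added, which corresponds to exhaustive use of reflexivity, augmentation, and transitivity.

def closure(attrs, fds):
    # Closure of a set of attributes under a list of FDs,
    # where each FD is a pair (lhs, rhs) of attribute sets.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs  # apply the FD (a transitivity step)
                changed = True
    return result

# With A->B and B->C, the closure of {A} must contain C (transitivity)
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(closure({"A"}, fds))  # {'A', 'B', 'C'}

A derived FD X→Y holds exactly when Y is contained in the closure of X; this is also how redundant dependencies are detected when computing a canonical cover (see below).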

Normalization:
 Normalization is the process of organizing the data in the database.
 Normalization is used to minimize the redundancy from a relation or set of relations.
It is also used to eliminate undesirable characteristics like Insertion, Update, and
Deletion Anomalies.
 Normalization divides a larger table into smaller tables and links them using relationships.
 The normal form is used to reduce redundancy from the database table.

Why do we need Normalization?


The main reason for normalizing relations is to remove these anomalies. Failure to eliminate anomalies leads to data redundancy and can cause data integrity and other problems as the database grows. Normalization consists of a series of guidelines that help you create a good database structure.

Data modification anomalies can be categorized into three types:


Insertion Anomaly: An insertion anomaly occurs when one cannot insert a new tuple into a relation due to lack of data.
Deletion Anomaly: A deletion anomaly is the situation where the deletion of data results in the unintended loss of some other important data.
Update Anomaly: An update anomaly occurs when an update of a single data value requires multiple rows of data to be updated.

Types of Normal Forms:

1. 1NF [First Normal Form] – Eliminate Repeating Groups
2. 2NF [Second Normal Form] – Eliminate Partial Functional Dependency
3. 3NF [Third Normal Form] – Eliminate Transitive Dependency
4. BCNF [Boyce-Codd Normal Form] – Every determinant must be a super key
 1NF [First Normal Form]
 A relation is in 1NF if it contains only atomic values.
 It states that an attribute of a table cannot hold multiple values; it must hold only single-valued attributes.
 First normal form disallows the multi-valued attribute, composite attribute, and their
combinations.
Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute
EMP_PHONE.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389, 8589830302 Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab

 2NF [Second Normal Form]
 For 2NF, the relation must be in 1NF.
 In the second normal form, all non-key attributes must be fully functionally dependent on the primary key.
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In
a school, a teacher can teach more than one subject.
TEACHER table:
TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38

In the given table, the non-prime attribute TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. That is why the table violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:
TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer
 3NF [Third Normal Form]
Normalization is the process of organizing data in a database to reduce redundancy and
improve data integrity. The Third Normal Form (3NF) specifically aims to eliminate
transitive dependencies.

We start with a 2NF table:

Note: The table is in 2NF, meaning it already satisfies the requirements of the First Normal
Form (1NF) and the Second Normal Form (2NF).
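Since the table itself is not reproduced in this text, a minimal illustrative table consistent with the dependencies listed below could be:

std-id course fee

S1 DBMS 5000

S2 Networks 4000

S3 DBMS 5000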

Functional dependencies describe the relationship between attributes. Here are the
dependencies in our data:

1. std-id → course, fee: The student ID uniquely determines both the course and the fee (equivalently, std-id → course and std-id → fee).
2. course → fee: The course determines the fee.

A transitive dependency occurs when a non-key attribute depends on another non-key attribute. In our case:

 course → fee: The fee depends on the course, not directly on the primary key (std-id).

Note: This is a key point to address when moving to 3NF.

To eliminate the transitive dependency, we will create two separate tables:

Table 1: Student Course

This table captures the relationship between students and their enrolled courses.
 Primary Key: std-id (uniquely identifies each student)

Table 2: Course Fee

This table captures the relationship between courses and their fees.

 Primary Key: course (uniquely identifies each course)

Table 1 (Student Course):

o Contains student IDs and the courses they are enrolled in. Each student is
identified by their unique std-id.

Table 2 (Course Fee):

o Contains information about courses and their associated fees. Each course is
uniquely identified by the course attribute.
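Using the illustrative data from above, the decomposition would look like:

Table 1 (Student Course):

std-id course

S1 DBMS

S2 Networks

S3 DBMS

Table 2 (Course Fee):

course fee

DBMS 5000

Networks 4000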
Boyce-Codd normal form (BCNF)

 BCNF is the advanced version of 3NF. It is stricter than 3NF.
 A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.
 For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.

Example: Let's assume there is a company where employees work in more than one
department.

EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table, the functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a super key.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

364 UK
EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT

264 Designing

264 Testing

364 Stores

364 Developing

Minima and Maxima cover:


The terms Minima (Canonical) Cover and Maxima Cover refer to concepts in normalization and functional dependency management, especially in the context of database schema optimization. They are used to reduce or simplify the set of functional dependencies to make the schema more efficient without losing any information.

1. Canonical (Minima) Cover:


 A canonical cover (also known as a minimal cover) is a set of functional dependencies
that is both minimal and equivalent to the original set of dependencies.
 This cover helps to eliminate redundant or unnecessary functional dependencies.
 A canonical cover should contain the least number of functional dependencies that
preserve the same information as the original set.
Steps to find the Canonical Cover:
1. Decompose multi-attribute right-hand side dependencies: If a functional
dependency has multiple attributes on the right-hand side, split them into multiple
dependencies.
2. Remove redundant dependencies: If a dependency can be derived from others,
remove it.
3. Remove redundant attributes: If an attribute on the left side of a functional
dependency is unnecessary, remove it.
Example:
Consider the following set of functional dependencies for a table R with attributes
{A, B, C, D}:
A→B, C
A, B→D
A→C
B→D
Steps to find the Canonical Cover:
1. Decompose multi-attribute dependencies:
A→B, C becomes two dependencies:
A→B
A→C
The set is now: A→B, A→C, A, B→D, B→D.
2. Remove redundant attributes: In A, B→D, the attribute B on the left-hand side is redundant, because A→B already holds (from A we can derive B, and then D via B→D). So A, B→D reduces to A→D.
3. Remove redundant dependencies: A→D is now redundant, because it can be derived from A→B and B→D by transitivity, so it is removed. A→C is not redundant, since C cannot be derived from the remaining dependencies.

Canonical Cover:
A→B
A→C
B→D
This set of dependencies is minimal and equivalent to the original set.

2. Maxima Cover:
 The maxima cover refers to the opposite of a canonical cover. It involves a set of
functional dependencies that might include redundant or unnecessary dependencies,
but it represents the maximum possible coverage of functional dependencies within a
database schema.
 A maxima cover is not minimized, and it may include unnecessary dependencies.
Example:
Consider the same set of functional dependencies from above:
A→B, C
A, B→D
A→C
B→D
In the maxima cover, we would keep all of these dependencies as they are, without
eliminating redundancies.

Maxima Cover (no simplification):


A→B, C
A, B→D
A→C
B→D
This set includes all the original dependencies and is not reduced, making it the "maximal"
form of the functional dependency set.
Query Processing and Transaction Management:
Introduction to Transaction Processing:
 A transaction is a set of logically related operations; it contains a group of tasks.
 A transaction is an action, or a series of actions, performed by a single user to access the contents of the database.
Example: Suppose an employee of a bank transfers Rs 800 from X's account to Y's account.
This small transaction contains several low-level tasks:
X's Account
Open_Account(X)
Old_Balance = X.balance
New_Balance = Old_Balance - 800
X.balance = New_Balance
Close_Account(X)

Y's Account
Open_Account(Y)
Old_Balance = Y.balance
New_Balance = Old_Balance + 800
Y.balance = New_Balance
Close_Account(Y)

Operations of Transaction:
Following are the main operations of transaction:
 Read(X): The read operation reads the value of X from the database and stores it in a buffer in main memory.
 Write(X): The write operation writes the value back to the database from the buffer.
Let's take as an example a debit transaction on an account, which consists of the following operations:

1. R(X);
2. X = X - 500;
3. W(X);
Let's assume the value of X before the start of the transaction is 4000.
 The first operation reads X's value from the database and stores it in a buffer.
 The second operation decreases the value of X by 500, so the buffer will contain 3500.
 The third operation writes the buffer's value to the database, so X's final value will be 3500.
But because of a hardware, software, or power failure, the transaction may fail before finishing all the operations in the set.
For example: if the above debit transaction fails after executing operation 2, then X's value will remain 4000 in the database, which is not acceptable to the bank.

To solve this problem, we have two important operations:


 Commit: It is used to save the work done permanently.
 Rollback: It is used to undo the work done.
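A minimal Python sketch of these two operations (the class and method names are illustrative, not a real DBMS API): reads go through a main-memory buffer, writes stay in the buffer until commit, and rollback simply discards the buffer.

class MiniTransaction:
    def __init__(self, database):
        self.db = database  # committed state (the "database")
        self.buffer = {}    # uncommitted writes held in main memory

    def read(self, key):
        # Read our own uncommitted write if present, else the committed value
        return self.buffer.get(key, self.db[key])

    def write(self, key, value):
        self.buffer[key] = value  # not yet visible in the database

    def commit(self):
        self.db.update(self.buffer)  # save the work done permanently
        self.buffer.clear()

    def rollback(self):
        self.buffer.clear()  # undo the uncommitted work

db = {"X": 4000}
t = MiniTransaction(db)
t.write("X", t.read("X") - 500)  # R(X); X = X - 500; W(X) into the buffer
t.rollback()                     # failure before commit: database unchanged
print(db["X"])                   # 4000; after commit() it would be 3500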

Single user and Multi user systems:


 Single-User Database Systems
A single-user database system is designed to be used by only one user at a time. It is typically
installed on a single computer and can only be accessed by the user who installed it or the
user who is currently logged in. Single-user systems are simpler and less expensive than
multi-user systems, but they are not suitable for environments where multiple users need to
access the same data at the same time.
Example: Personal Computers

Advantages of Single-User Database Systems


 Simplicity: Straightforward to purchase, use, and maintain, since only one user interfaces with the system.
 Lower Costs: Because they do not demand powerful hardware or complex software, single-user systems often prove to be cost-effective.
 Less Complexity: With only one user of the application, there are no issues related to conflicting users or simultaneous access to the data.
Disadvantages of Single-User Database Systems
 Limited Scalability: These systems are strictly for a single user, which makes them unsuitable for large organizations.
 Lower Efficiency for Collaboration: Since only one user can access the database at a time, these systems are unsuitable for environments where simultaneous use by several users is necessary.
 Not Ideal for Large Data Handling: These databases are mostly applied in small-scale systems and may not efficiently manage huge amounts of data.

 Multi-User Database Systems


A multi-user database system, on the other hand, can be accessed by multiple users
simultaneously. It is typically installed on a network server and can be accessed by users who
are logged in to the network. Multi-user systems are more complex and require more
resources than single-user systems, but they are essential for organizations where multiple
users need to access the same data at the same time.
Example: Databases of Banks, insurance agencies, stock exchanges, supermarkets, etc.

Advantages of Multi-User Database Systems


 Concurrency: Since exclusive rights to the database are not required, more than one user can work on it at the same time, boosting productivity.
 Scalability: Such systems can support a large user population as well as large amounts of information, suiting commercial and corporate uses.
 Data Consistency and Integrity: Transaction control guarantees the consistency of the data, so that it is not compromised even when modified by different users.

Disadvantages of Multi-User Database Systems


 Higher Complexity: Such systems are much more challenging to administer and support because of user conflicts, data-integrity requirements, and concurrent access.
 Cost: Multi-user systems may require costly, powerful hardware and software and reliable network connections.
 Performance Overhead: With many users accessing the system, response times can degrade if, for instance, the back-end database is not well optimized or system resources are inadequately allocated.
Difference between Single User and Multi-User Database Systems

Single User Database Systems | Multi-User Database Systems

A DBMS is single-user if at most one user at a time can use the system. | A DBMS is multi-user if many users can use the system and hence access the database concurrently.

Single-user DBMSs are mostly restricted to personal computer systems. | Most DBMSs are multi-user, like databases of airline reservation systems, banking databases, etc.

Single-user databases do not have multiprogramming; a single CPU can only execute at most one process at a time. | Multiple users can access databases and use computer systems simultaneously because of the concept of multiprogramming.

The data is neither integrated nor shared among any other user. | The data is integrated and shared among other users.

Designed to be used by only one user at a time. | Designed to be accessed by multiple users simultaneously.

Typically installed on a single computer. | Typically installed on a network server.

Can only be accessed by the user who installed it or the user who is currently logged in. | Can be accessed by users who are logged in to the network.

Simpler and less expensive than multi-user systems. | More complex and requires more resources than single-user systems.

Not suitable for environments where multiple users need to access the same data at the same time. | Essential for organizations where multiple users need to access the same data at the same time.

Example: Personal computers. | Example: Databases of banks, insurance agencies, stock exchanges, supermarkets, etc.

A single central processing unit (CPU) can only execute at most one process at a time.
However, multi-programming operating systems execute some commands from one process,
then suspend that process and execute some commands from the next process, and so on. A
process is resumed at the point where it was suspended whenever it gets its turn to use the
CPU again. Hence, concurrent execution of processes is actually interleaved, as illustrated in
the figure below:

[Figure: processes A and B executing concurrently in an interleaved fashion on one CPU; processes C and D executing in parallel on separate CPUs]

The above figure shows two processes, A and B, executing concurrently in an interleaved
fashion. Interleaving keeps the CPU busy when a process requires an input or output (I/O)
operation, such as reading a block from a disk. The CPU is switched to execute another
process rather than remaining idle during I/O time. Interleaving also prevents a long process
from delaying other processes. If the computer system has multiple hardware processors
(CPUs), parallel processing of multiple processes is possible, as illustrated by processes C
and D in the above figure.

Concurrency Control:
Concurrency Control is the management procedure that is required for controlling concurrent
execution of the operations that take place on a database.
But before knowing about concurrency control, we should know about concurrent execution.
Concurrent Execution in DBMS
In a multi-user system, multiple users can access and use the same database at one time, which is known as concurrent execution of the database. It means that the same database is used simultaneously on a multi-user system by different users.
While working with database transactions, multiple users may need to use the database to perform different operations, in which case concurrent execution of the database is performed.
The simultaneous execution should be done in an interleaved manner, and no operation should affect the other executing operations, thus maintaining the consistency of the database. In making the transaction operations execute concurrently, however, several challenging problems occur that need to be solved.
Problems with Concurrent Execution
In a database transaction, the two main operations are READ and WRITE. These two operations need to be managed in the concurrent execution of transactions, because if they are not interleaved properly, the data may become inconsistent. The following problems occur with concurrent execution of the operations:

Problem 1: Lost Update Problems (W - W Conflict)


The problem occurs when two different database transactions perform read/write operations on the same database items in an interleaved manner (i.e., concurrent execution), making the values of the items incorrect and hence the database inconsistent.
For example:
Consider the below diagram where two transactions TX and TY, are performed on the same
account A where the balance of account A is $300.
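The diagram is not reproduced in this text; reconstructed from the steps described below, the schedule is:

Time | TX | TY

t1 | Read(A), A = $300 |

t2 | A = A - $50 = $250 (in buffer) |

t3 | | Read(A), A = $300

t4 | | A = A + $100 = $400 (in buffer)

t6 | Write(A), A = $250 |

t7 | | Write(A), A = $400 (TX's update is lost)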

 At time t1, transaction TX reads the value of account A, i.e., $300 (read only).
 At time t2, transaction TX deducts $50 from account A, which becomes $250 (only deducted in the buffer, not yet written).
 Alternately, at time t3, transaction TY reads the value of account A as $300, because TX has not yet written its update.
 At time t4, transaction TY adds $100 to account A, which becomes $400 (only added in the buffer, not yet written).
 At time t6, transaction TX writes its value of account A, which is updated to $250, as TY has not yet written its value.
 Similarly, at time t7, transaction TY writes its value of account A, computed at time t4, i.e., $400. This means the value written by TX is lost: the $250 is overwritten.
Hence the data becomes incorrect, and the database is left inconsistent.
Dirty Read Problems (W-R Conflict)
The dirty read problem occurs when one transaction updates an item of the database and then fails for some reason, and before the data is rolled back, the updated item is accessed by another transaction. This creates a Read-Write Conflict between the two transactions.

For example:
Consider two transactions TX and TY in the below diagram performing read/write operations
on account A where the available balance in account A is $300:
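Again, the diagram is not reproduced in this text; the schedule it describes is:

Time | TX | TY

t1 | Read(A), A = $300 |

t2 | A = A + $50 = $350 |

t3 | Write(A), A = $350 |

t4 | | Read(A), A = $350 (uncommitted)

t5 | Rollback, A = $300 |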

 At time t1, transaction TX reads the value of account A, i.e., $300.
 At time t2, transaction TX adds $50 to account A, which becomes $350.
 At time t3, transaction TX writes the updated value to account A, i.e., $350.
 Then at time t4, transaction TY reads account A, which is read as $350.
 Then at time t5, transaction TX rolls back due to a server problem, and the value changes back to $300 (as initially).
 But the value of account A remains $350 for transaction TY, even though it was never committed. This is the dirty read, and the situation is therefore known as the Dirty Read Problem.

Failures in Transaction:
Failure in terms of a database can be defined as its inability to execute the specified transaction
or loss of data from the database. A DBMS is vulnerable to several kinds of failures and each
of these failures needs to be managed differently. There are many reasons that can cause
database failures such as network failure, system crash, natural disasters, carelessness, sabotage
(corrupting the data intentionally), software errors, etc.
Failure Classification in DBMS
A failure in DBMS can be classified as:

 Transaction Failure:
If a transaction is not able to execute, or reaches a point from which it cannot proceed further, this is termed a transaction failure.
Reasons for a transaction failure in DBMS:
 Logical error: A logical error occurs if a transaction is unable to execute because of
some mistakes in the code or due to the presence of some internal faults.
 System error: The database system itself terminates an active transaction due to some system issue, or because the database management system is unable to proceed with the transaction. For example, the system ends an executing transaction if it reaches a deadlock condition or if resources are unavailable.

 System Crash:
A system crash usually occurs when there is some sort of hardware or software breakdown.
Some other problems which are external to the system and cause the system to abruptly stop
or eventually crash include failure of the transaction, operating system errors, power cuts, main
memory crash, etc.
These types of failures are often termed soft failures and are responsible for the data losses in
the volatile memory. It is assumed that a system crash does not have any effect on the data
stored in the non-volatile storage and this is known as the fail-stop assumption.
 Data-transfer Failure:
When a disk failure occurs amid a data-transfer operation, resulting in loss of content from disk storage, such failures are categorized as data-transfer failures. Other reasons for disk failures include a disk head crash, disk unreachability, formation of bad sectors, read-write errors on the disk, etc.
In order to quickly recover from a disk failure caused amid a data-transfer operation, the backup
copy of the data stored on other tapes or disks can be used. Thus it’s a good practice to back
up your data frequently.

Transaction States:
In Database Management Systems (DBMS), a transaction is indeed a group of operations that
are logically connected and executed as a single unit to ensure data integrity and consistency.
To maintain consistency, transactions follow the ACID properties (Atomicity, Consistency,
Isolation, Durability), which ensure that even in concurrent environments, database integrity is
preserved.
A transaction log is a file maintained by the recovery management component to record all the activities of the transaction. After the commit is done, the corresponding transaction log records are removed.

These are different types of Transaction States:

1. Active State – While the instructions of the transaction are running, the transaction is in the active state. If all the ‘read and write’ operations are performed without any error, it goes to the “partially committed state”; if any instruction fails, it goes to the “failed state”.
2. Partially Committed – After completion of all the read and write operations, the changes are made in main memory or the local buffer. If the changes are made permanent on the database, the state changes to the “committed state”; in case of failure, it goes to the “failed state”.

3. Failed State – When any instruction of the transaction fails, or a failure occurs while making the changes permanent on the database, the transaction goes to the “failed state”.

4. Aborted State – After any type of failure, the transaction goes from the “failed state” to the “aborted state”. Since in the previous states the changes were only made to the local buffer or main memory, these changes are deleted or rolled back.

5. Committed State – The state when the changes are made permanent on the database; the transaction is complete and then moves to the “terminated state”.

6. Terminated State – When there is no rollback pending, or the transaction arrives from the “committed state”, the system is consistent and ready for a new transaction; the old transaction is terminated.

Desirable Properties (ACID Properties) of Transactions:


A transaction is a single logical unit of work that accesses and possibly modifies the contents
of a database. Transactions access data using read-and-write operations. To maintain
consistency in a database, before and after the transaction, certain properties are followed.
These are called ACID properties.
Atomicity:
By this, we mean that either the entire transaction takes place at once or doesn’t happen at all.
There is no midway i.e. transactions do not occur partially. Each transaction is considered as
one unit and either runs to completion or is not executed at all. It involves the following two
operations.
— Abort: If a transaction aborts, changes made to the database are not visible.
— Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Consider the following transaction T consisting of T1 and T2: Transfer of 100 from account X
to account Y.
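The individual operations of T are not shown in this text; written out in the usual read/write notation (with the amounts from this example), they would be:

T1: Read(X)
    X = X - 100
    Write(X)

T2: Read(Y)
    Y = Y + 100
    Write(Y)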

If the transaction fails after completion of T1 but before completion of T2 (say, after write(X) but before write(Y)), then the amount has been deducted from X but not added to Y. This
results in an inconsistent database state. Therefore, the transaction must be executed in its
entirety in order to ensure the correctness of the database state.
Consistency:
This means that integrity constraints must be maintained so that the database is consistent
before and after the transaction. It refers to the correctness of a database. Referring to the
example above,
The total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, the database is consistent. Inconsistency occurs in case T1 completes but T2 fails.
As a result, T is incomplete.
Isolation:
This property ensures that multiple transactions can occur concurrently without leading to the
inconsistency of the database state. Transactions occur independently without interference.
Changes occurring in a particular transaction will not be visible to any other transaction until
that particular change in that transaction is written to memory or has been committed. This
property ensures that executing transactions concurrently results in a state that is equivalent to the state achieved had they been executed serially in some order.
Let X = 500, Y = 500.
Consider two transactions T and T”.
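The operations of T and T’’ are not reproduced in this text; one pair of transactions consistent with the figures below would be:

T:   Read(X); X = X * 100; Write(X); Read(Y); Y = Y - 50; Write(Y)
T’’: Read(X); Read(Y); Z = X + Y; Write(Z)

Here X = 500 becomes 50,000 after T's Write(X), and Y = 500 becomes 450 only after T's Write(Y).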

Suppose T has been executed till Read(Y) and then T’’ starts. As a result, interleaving of operations takes place, due to which T’’ reads the correct value of X but the incorrect (not yet updated) value of Y, and the sum computed by
T’’: (X + Y = 50,000 + 500 = 50,500)
is thus not consistent with the sum at the end of the transaction:
T: (X + Y = 50,000 + 450 = 50,450).
This results in database inconsistency, due to a loss of 50 units. Hence, transactions must take place in isolation, and changes should be visible only after they have been made to the main memory.

Durability:
This property ensures that once the transaction has completed execution, the updates and modifications to the database are written to disk and persist even if a system failure occurs. These updates become permanent, stored in non-volatile memory.
The effects of the transaction, thus, are never lost.

You might also like