SQL Normalisation, Constraints, ERD and ACID Properties

The document discusses database normalization, which is the process of organizing data to minimize redundancy and eliminate anomalies such as insertion, deletion, and update issues. It outlines the different normal forms (1NF, 2NF, 3NF, BCNF) and their requirements, as well as the importance of primary keys, foreign keys, and composite keys in maintaining data integrity. Additionally, it explains the ACID properties (Atomicity, Consistency, Isolation, Durability) that ensure reliable transaction processing in databases.

Research on Database Normalization

Why do we need normalization?


Normalization is the process of organizing the data in a database.

Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate
undesirable characteristics such as insertion, update, and deletion anomalies.

Normalization divides a larger table into smaller tables and links them using relationships.

The normal forms describe how far this reduction of redundancy has been carried out for a table.

The main reason for normalizing relations is to remove these anomalies. Redundant data that is left in place makes
anomalies possible, and these can cause data integrity and other problems as the database grows.

Data modification anomalies can be categorized into three types:

Insertion Anomaly: An insertion anomaly occurs when a new tuple cannot be inserted into a relation because some
other, unrelated data is not yet available.

Deletion Anomaly: A deletion anomaly occurs when deleting data results in the unintended loss of some other
important data.

Update Anomaly: An update anomaly occurs when changing a single data value requires multiple rows to be updated;
if some of those rows are missed, the data becomes inconsistent.

Normal forms help to reduce data redundancy and increase data consistency. They often make updates cheaper as well,
although reading the data back may require joining the smaller tables.

Advantages of Normal Form:

• Reduced data redundancy: Normalization helps to eliminate duplicate data in tables, reducing the amount of
storage space needed and improving database efficiency.
• Improved data consistency: Normalization ensures that data is stored in a consistent and organized manner,
reducing the risk of data inconsistencies and errors.
• Simplified database design: Normalization provides guidelines for organizing tables and data relationships,
making it easier to design and maintain a database.

What are normalization forms? (1 NF, 2 NF, 3 NF, BCNF) - and their usage.
First Normal Form (1NF): This is the most basic level of normalization. In 1NF, each table cell should contain only a
single value, and each column should have a unique name. The first normal form removes repeating groups and
simplifies queries.

Second Normal Form (2NF): 2NF eliminates redundant data by requiring that each non-key attribute depend on the
whole primary key. This means that no column may depend on only a part of a composite key.

Third Normal Form (3NF): 3NF builds on 2NF by requiring that non-key attributes do not depend on other non-key
attributes. This means that each column should depend directly on the primary key, and not on it through any other
column in the same table.

Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures that each determinant in a table is a
candidate key. In other words, every attribute, key or non-key, is determined only by candidate keys.

First Normal Form (1NF):

A relation is in first normal form if every attribute in it is single-valued. A relation that contains a composite or
multi-valued attribute violates first normal form.

How to achieve 1st Normal Form?

A table should follow these basic rules to be in 1st Normal Form:

• Each column should contain atomic (single-valued) values. Entries like "X, Y" and "W, X" violate this rule.
• A column should contain values of the same datatype. Do not mix different types of values in any column.
• Each column should have a unique name. Repeated names lead to confusion at the time of data retrieval.

Example – Relation STUDENT in table 1 is not in 1NF because of multi-valued attribute STUD_PHONE. Its
decomposition into 1NF has been shown in table 2.
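
The decomposition can be sketched in code. The following is a minimal illustration using Python's built-in sqlite3 module; the STUD_NAME column and the sample rows are assumptions made for the example, since the original tables are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Not in 1NF: STUD_PHONE holds several phone numbers in one cell.
# (Sample data is hypothetical.)
unnormalized = [
    (1, "RAM", "9716271721, 9871717178"),
    (2, "SURESH", "9898297281"),
]

conn.execute(
    "CREATE TABLE STUDENT (STUD_NO INTEGER, STUD_NAME TEXT, STUD_PHONE TEXT)"
)

# 1NF: one row per (student, phone) pair, so every cell is atomic.
for stud_no, name, phones in unnormalized:
    for phone in phones.split(","):
        conn.execute(
            "INSERT INTO STUDENT VALUES (?, ?, ?)",
            (stud_no, name, phone.strip()),
        )

rows_1nf = conn.execute(
    "SELECT * FROM STUDENT ORDER BY STUD_NO, STUD_PHONE"
).fetchall()
for row in rows_1nf:
    print(row)
```

After the split, student 1 appears in two rows, one per phone number, and no cell contains a list.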

Second Normal Form (2NF):

For a relation to be in 2NF, it must first be in 1NF.

In the second normal form, all non-key attributes are fully functionally dependent on the primary key / candidate key;
none of them depends on only a part of it.

Example: Suppose a school stores data about teachers and the subjects they teach. In the school, a teacher
can teach more than one subject.

In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset of a
candidate key. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:
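
One possible shape for the two tables is sketched below with Python's sqlite3 module. The table names TEACHER_DETAIL and TEACHER_SUBJECT, the SUBJECT column, and the sample rows are assumptions for illustration; only TEACHER_ID and TEACHER_AGE come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- TEACHER_AGE depends on TEACHER_ID alone, so it moves to its own table.
    CREATE TABLE TEACHER_DETAIL (
        TEACHER_ID  INTEGER PRIMARY KEY,
        TEACHER_AGE INTEGER
    );
    -- The teaching assignments keep the composite key (TEACHER_ID, SUBJECT).
    CREATE TABLE TEACHER_SUBJECT (
        TEACHER_ID INTEGER REFERENCES TEACHER_DETAIL(TEACHER_ID),
        SUBJECT    TEXT,
        PRIMARY KEY (TEACHER_ID, SUBJECT)
    );
    INSERT INTO TEACHER_DETAIL VALUES (25, 30), (47, 35);
    INSERT INTO TEACHER_SUBJECT VALUES
        (25, 'Chemistry'), (25, 'Biology'), (47, 'English');
""")

# The age of teacher 25 is stored once, so the update touches a single row.
updated = conn.execute(
    "UPDATE TEACHER_DETAIL SET TEACHER_AGE = 31 WHERE TEACHER_ID = 25"
).rowcount

# A join rebuilds the original wide table whenever it is needed.
joined = conn.execute("""
    SELECT s.TEACHER_ID, s.SUBJECT, d.TEACHER_AGE
    FROM TEACHER_SUBJECT AS s
    JOIN TEACHER_DETAIL AS d ON d.TEACHER_ID = s.TEACHER_ID
    ORDER BY s.TEACHER_ID, s.SUBJECT
""").fetchall()
```

In the unnormalized table the same age change would have to be repeated in every row for that teacher, which is exactly the update anomaly that 2NF removes.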

Third Normal Form (3NF):

The first condition for the table to be in Third Normal Form is that the table should be in the Second Normal Form.

The second condition is that there should be no transitive dependency for non-prime attributes, which means that
non-prime attributes (those that are not part of any candidate key) should not depend on other non-prime attributes
in a table. A transitive dependency is a functional dependency in which A → C (A determines C) holds indirectly,
because of A → B and B → C (where it is not the case that B → A).

The Third Normal Form reduces data duplication. It is also used to achieve data integrity.

Example: Below is a student table that has student id, student name, subject id, subject name, and address of the
student as its columns.

In the above student table, stu_id determines subid, and subid determines sub. Therefore, stu_id determines sub via
subid. This implies that the table possesses a transitive functional dependency, and it does not fulfill the third normal
form criteria.

Now to change the table to the third normal form, you need to divide the table as shown below:

As you can see, in both tables all the non-key attributes are now fully functionally dependent only on the primary
key. In the first table, the columns name, subid, and address depend only on stu_id. In the second table, sub
depends only on subid.
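
The split into a student table and a subject table might look like the following sketch, written with Python's sqlite3 module. The column names follow the text (stu_id, name, subid, sub, address); the table names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- sub depends only on subid, so it lives in its own table.
    CREATE TABLE subject (
        subid INTEGER PRIMARY KEY,
        sub   TEXT
    );
    -- Every remaining column depends directly on stu_id.
    CREATE TABLE student (
        stu_id  INTEGER PRIMARY KEY,
        name    TEXT,
        subid   INTEGER REFERENCES subject(subid),
        address TEXT
    );
    INSERT INTO subject VALUES (11, 'SQL'), (12, 'Java');
    INSERT INTO student VALUES
        (1, 'Arun', 11, 'Delhi'),
        (2, 'Varun', 12, 'Bangalore'),
        (3, 'Harsh', 11, 'Delhi');
""")

# Renaming a subject is now a single-row change in one place.
conn.execute("UPDATE subject SET sub = 'SQL Basics' WHERE subid = 11")

subjects_by_student = conn.execute("""
    SELECT student.stu_id, subject.sub
    FROM student JOIN subject ON student.subid = subject.subid
    ORDER BY student.stu_id
""").fetchall()
```

Both students enrolled in subject 11 see the new name, because the name is no longer copied into each student row.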

Boyce-Codd Normal Form (BCNF):

It is an extension of the Third Normal Form, which is why it is also known as the 3.5 Normal Form. For a table to be in
BCNF, it should satisfy the following two conditions:

The table should be in the Third Normal Form.

For any non-trivial functional dependency A → B, A should be a super key.

In other words, the LHS of each functional dependency should be a candidate key or a superset of one.
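
The super key condition can be checked mechanically with the standard attribute-closure procedure. The sketch below is one way to write it in Python; the relation R(student, subject, teacher) and its dependencies are a hypothetical example, not taken from the text.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """List the non-trivial dependencies whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)              # skip trivial dependencies
        and closure(lhs, fds) != set(all_attrs)  # left side is not a super key
    ]


# Hypothetical relation: each teacher teaches exactly one subject,
# and a student has one teacher per subject.
attrs = {"student", "subject", "teacher"}
fds = [
    (("student", "subject"), ("teacher",)),
    (("teacher",), ("subject",)),
]

violations = bcnf_violations(attrs, fds)
print(violations)
```

Here teacher → subject is reported, because teacher alone does not determine student and so is not a super key. The relation is still in 3NF, since every attribute is part of some candidate key, which shows why BCNF is the stricter form.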

What is PK, FK and composite primary key?


Primary Key:

There can be more than one candidate key in a relation, out of which one is chosen as the primary key. For
example, STUD_NO, as well as STUD_PHONE, are candidate keys for relation STUDENT, but STUD_NO can be
chosen as the primary key (only one out of many candidate keys).

• It is a unique key.
• It identifies exactly one tuple (a record) at a time.
• It has no duplicate values; it has unique values.
• It cannot be NULL.
• A primary key does not have to be a single column; more than one column can together form the primary key for a
table.

Foreign Key:

If an attribute can only take values that are present as values of some other attribute, it is a foreign key to
the attribute to which it refers. The relation that is being referenced is called the referenced relation, and the
corresponding attribute is called the referenced attribute. The relation that refers to the referenced relation is called
the referencing relation, and the corresponding attribute is called the referencing attribute. The referenced attribute of
the referenced relation should be its primary key (or another unique key).

• It is a key that acts as a primary key in one table and as a secondary key in another table.
• It links two or more relations (tables) at a time.
• It acts as a cross-reference between the tables.
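
A small sketch of a foreign key at work, using Python's sqlite3 module. The ENROLMENT table and its columns are hypothetical; STUDENT and STUD_NO follow the earlier example. Note that SQLite enforces foreign keys only when the `foreign_keys` pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
    -- Referenced relation: STUD_NO is the referenced attribute.
    CREATE TABLE STUDENT (
        STUD_NO   INTEGER PRIMARY KEY,
        STUD_NAME TEXT
    );
    -- Referencing relation: ENROLMENT.STUD_NO is the foreign key.
    CREATE TABLE ENROLMENT (
        ENROL_ID INTEGER PRIMARY KEY,
        STUD_NO  INTEGER NOT NULL REFERENCES STUDENT(STUD_NO),
        COURSE   TEXT
    );
    INSERT INTO STUDENT VALUES (1, 'RAM');
    INSERT INTO ENROLMENT VALUES (100, 1, 'DBMS');
""")

# Student 99 does not exist, so this row has no valid parent.
try:
    conn.execute("INSERT INTO ENROLMENT VALUES (101, 99, 'DBMS')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

enrolments = conn.execute("SELECT COUNT(*) FROM ENROLMENT").fetchone()[0]
```

The second insert fails with an integrity error, so the referencing table can only ever contain student numbers that exist in STUDENT.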

Composite Key:

Sometimes, a table might not have a single column/attribute that uniquely identifies all the records of a table. To
uniquely identify rows of a table, a combination of two or more columns/attributes can be used. A poorly chosen
combination can still produce duplicate values, so we need to find the smallest set of attributes that uniquely
identifies every row in the table.

• It can act as the primary key when no single column qualifies as one.
• Two or more attributes are used together to make a composite key.

Different combinations of attributes differ in how reliably they identify rows uniquely.

Why do we need DBMS keys?

• To identify any row of data in a table uniquely.
• To enforce the identity of data and ensure that data integrity is maintained.
• To establish relationships between tables.

What's the difference between PK and unique column?
Some of the essential features of Primary Keys are discussed below.

• No two rows can have the same Primary Key value.
• Only a single primary key exists for a table.
• A Primary Key carries an implicit NOT NULL constraint.
• The Primary Key can be made from one or more table fields.

Some of the essential features of Unique Keys are discussed below.

• There can be more than one unique key for a table.
• Unique Keys are allowed to hold NULL values in the column.
• Unique Keys can be formed from one or more table fields.
• Foreign Keys can refer to Unique Keys for referencing.
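
The difference can be seen in a short sketch using Python's sqlite3 module. STUD_EMAIL is a hypothetical column added for the example. How NULL is treated in a unique column varies between database systems; the behavior shown is SQLite's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        STUD_NO    INTEGER PRIMARY KEY,  -- only one primary key per table
        STUD_PHONE TEXT UNIQUE,          -- a table may have several unique keys
        STUD_EMAIL TEXT UNIQUE
    )
""")
conn.execute("INSERT INTO STUDENT VALUES (1, '9716271721', NULL)")
conn.execute("INSERT INTO STUDENT VALUES (2, '9898297281', NULL)")  # second NULL is accepted


def rejected(sql):
    """Return True if the statement is refused by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_pk_rejected = rejected(
    "INSERT INTO STUDENT VALUES (1, '9000000000', NULL)"
)
duplicate_phone_rejected = rejected(
    "INSERT INTO STUDENT VALUES (3, '9716271721', NULL)"
)
null_emails = conn.execute(
    "SELECT COUNT(*) FROM STUDENT WHERE STUD_EMAIL IS NULL"
).fetchone()[0]
```

Both duplicate inserts are refused, while the unique STUD_EMAIL column happily stores NULL in more than one row.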

How do you identify the type of relationship between two tables (1 to Many, Many to Many, 1
to 1)?
In a database, the mapping cardinality or cardinality ratio denotes the number of entities to which another
entity can be linked through a certain relationship set. Mapping cardinality is most useful in describing binary
relationship sets, although it can also contribute to the description of relationship sets containing more than two
entity sets. Here, we focus only on binary relationship sets, that is, on a relationship set R between entity sets A
and B. R can have any one of the following cardinalities:

One-to-one: In this type of cardinality mapping, an entity in A is connected to at most one entity in B, and an entity
in B is connected to at most one entity in A.

One-to-many: In this type of cardinality mapping, an entity in A is associated with any number (zero or more) of
entities in B, while an entity in B can be connected to at most one entity in A.

Many-to-one: In this type of cardinality mapping, an entity in A is connected to at most one entity in B, while an
entity in B can be associated with any number (zero or more) of entities in A.

Many-to-many: In this type of cardinality mapping, an entity in A is associated with any number of entities in B, and
an entity in B is associated with any number of entities in A.

How does the relationship type impact the FK placement between two related tables?
The type of relationship between two tables significantly impacts the placement of foreign keys (FKs).

Here's how:

One-to-One Relationship:

In a one-to-one relationship, each record in the first table is associated with at most one record in the second table,
and vice versa. The FK is placed in the second table, referencing the primary key (PK) of the first table, and is itself
declared unique so that a parent cannot have two children. This ensures that a child record can only exist if it has a
valid parent in the first table.

One-to-Many Relationship:

In a one-to-many relationship, each record in the first table can be associated with many records in the second table.
The FK is placed in the second table, referencing the PK of the first table. This allows multiple child records to exist for
a single parent record.

Many-to-One Relationship:

In a many-to-one relationship, each record in the second table can be associated with at most one record in the first
table. The FK is placed in the second table, referencing the PK of the first table. This ensures that a child record can
only have one parent in the first table.

Many-to-Many Relationship:

In a many-to-many relationship, each record in the first table can be associated with many records in the second table,
and vice versa. This relationship usually requires a separate "junction table" to hold the relationships. The primary key
of the junction table is a composite key made up of the FKs from both the first and second tables.
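
A junction table of this kind can be sketched as follows with Python's sqlite3 module. The table name student_subject and the sample rows are assumptions; the student and subject columns reuse the names from the 3NF example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
    CREATE TABLE student (stu_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subject (subid INTEGER PRIMARY KEY, sub TEXT);
    -- Junction table: one row per (student, subject) pairing.
    CREATE TABLE student_subject (
        stu_id INTEGER NOT NULL REFERENCES student(stu_id),
        subid  INTEGER NOT NULL REFERENCES subject(subid),
        PRIMARY KEY (stu_id, subid)       -- composite key made of both FKs
    );
    INSERT INTO student VALUES (1, 'Arun'), (2, 'Varun');
    INSERT INTO subject VALUES (11, 'SQL'), (12, 'Java');
    INSERT INTO student_subject VALUES (1, 11), (1, 12), (2, 11);
""")

# One student, many subjects.
subjects_of_arun = [row[0] for row in conn.execute("""
    SELECT subject.sub
    FROM student_subject JOIN subject ON subject.subid = student_subject.subid
    WHERE student_subject.stu_id = 1
    ORDER BY subject.sub
""")]

# One subject, many students.
students_of_sql = [row[0] for row in conn.execute("""
    SELECT student.name
    FROM student_subject JOIN student ON student.stu_id = student_subject.stu_id
    WHERE student_subject.subid = 11
    ORDER BY student.name
""")]

# The composite primary key stops the same pairing being stored twice.
try:
    conn.execute("INSERT INTO student_subject VALUES (1, 11)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Neither the student table nor the subject table carries a foreign key here; the relationship lives entirely in the junction table.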

What are ACID Properties in database design?


A transaction is a single logical unit of work that accesses and possibly modifies the contents of a database.
Transactions access data using read and write operations.
In order to maintain consistency in a database, before and after the transaction, certain properties are followed. These
are called ACID properties. ACID Properties in SQL ensure Data Integrity during a transaction.

Atomicity:

By this, we mean that either the entire transaction takes place at once or it doesn't happen at all. There is no
midway, i.e. transactions do not occur partially. Each transaction is considered as one unit and either runs to
completion or is not executed at all. In other words, all the statements (insert, update, delete) inside a transaction
are either completed or rolled back.
It involves the following two operations:
• Abort: If a transaction aborts, changes made to the database are not visible.
• Commit: If a transaction commits, changes made are visible.

Atomicity is also known as the 'All or nothing rule'.

Consider the following transaction T consisting of T1 and T2: Transfer of 100 from account X to account Y.

If the transaction fails after completion of T1 but before completion of T2 (say, after write(X) but before write(Y)), then
the amount has been deducted from X but not added to Y. This results in an inconsistent database state. Therefore,
the transaction must be executed in its entirety in order to ensure the correctness of the database state.
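
The all-or-nothing behavior can be sketched with Python's sqlite3 module. The account table, the `transfer` helper, and the simulated failure are assumptions made for illustration; the starting balances of 500 and 200 match the totals used in the Consistency section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('X', 500), ('Y', 200)")
conn.commit()


def transfer(amount, fail_midway=False):
    """Move amount from X to Y; return True if the transaction committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'X'",
                (amount,),
            )
            if fail_midway:
                # Simulated failure after write(X) but before write(Y).
                raise RuntimeError("simulated crash")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'Y'",
                (amount,),
            )
        return True
    except RuntimeError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


failed_ok = transfer(100, fail_midway=True)
after_failure = balances()   # the deduction from X has been rolled back
success_ok = transfer(100)
after_success = balances()
```

After the failed attempt both balances are unchanged, and after the successful one the total is still 700, so the deduction and the credit only ever appear together.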

Consistency:

This means that integrity constraints must be maintained so that the database is consistent before and after the
transaction. It refers to the correctness of a database. Referring to the example above, the total amount before and
after the transaction must be the same.

• If the transaction is completed successfully, then it will apply all the changes to the database.
• If there is an error in a transaction, then all the changes that have already been made will be rolled back
automatically. It means the database is restored to the state it was in before the transaction started.
• If there is a system failure in the middle of the transaction, all the changes already made will automatically roll
back.

Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.

Therefore, the database is consistent. Inconsistency occurs if T1 completes but T2 fails, because T is then
incomplete.

Isolation:

This property ensures that multiple transactions can occur concurrently without leading to inconsistency of the
database state. Transactions occur independently without interference. Changes made in a particular transaction
will not be visible to any other transaction until that transaction has been committed. This property ensures that
executing transactions concurrently results in a state that is equivalent to a state achieved if they were executed
serially in some order.

Let X = 500, Y = 500.

Consider two transactions T and T''. (The figures below imply that T multiplies X by 100 and then subtracts 50 from
Y, while T'' reads X and Y and adds them.)

Suppose T has been executed till Read(Y) and then T'' starts. As a result, interleaving of operations takes place, due
to which T'' reads the correct value of X but the incorrect (not yet updated) value of Y, and the sum computed by
T'': (X + Y = 50,000 + 500 = 50,500)
is thus not consistent with the sum at the end of the transaction:
T: (X + Y = 50,000 + 450 = 50,450).
This results in database inconsistency, due to a discrepancy of 50 units. Hence, transactions must take place in
isolation, and changes should be visible only after they have been committed.
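
The arithmetic of this interleaving can be replayed in a few lines of Python. This is only a simulation with a dictionary standing in for the database; the two operations of T (multiply X by 100, subtract 50 from Y) are inferred from the totals quoted in the text and are an assumption.

```python
# Shared state that both transactions can see, with no isolation at all.
db = {"X": 500, "Y": 500}

# First half of T (assumed): X := X * 100, then write(X).
db["X"] = db["X"] * 100

# T'' runs here, after T has written X but before T has updated Y.
interleaved_sum = db["X"] + db["Y"]

# Second half of T (assumed): Y := Y - 50, then write(Y).
db["Y"] = db["Y"] - 50

# The sum once T has finished.
final_sum = db["X"] + db["Y"]

print(interleaved_sum, final_sum, interleaved_sum - final_sum)
```

T'' sees a total that never existed before or after T, which is the 50-unit discrepancy described above. With isolation, T'' would see either 1,000 (before T) or 50,450 (after T).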

Durability:

This property ensures that once the transaction has completed execution, the updates and modifications to the
database are written to disk and persist even if a system failure occurs. These updates become permanent and are
stored in non-volatile memory. The effects of the transaction, thus, are never lost. In other words, once the
transaction is committed, the changes it has made to the database are permanent, and this property safeguards the
committed data even if there is a system failure or abnormal termination.
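
A rough illustration with Python's sqlite3 module: data committed to a database file is still there when the file is opened again by a new connection. Closing the connection only stands in for a failure here and is not a real crash test; the file name and the account table are assumptions.

```python
import os
import sqlite3
import tempfile

# A file-backed database in a temporary directory (hypothetical file name).
db_path = os.path.join(tempfile.mkdtemp(), "bank.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('X', 400), ('Y', 300)")
conn.commit()  # from this point the changes are stored in the file
conn.close()   # stands in for the application going away

# A completely new connection still sees the committed rows.
reopened = sqlite3.connect(db_path)
persisted = reopened.execute(
    "SELECT name, balance FROM account ORDER BY name"
).fetchall()
reopened.close()
```

Had the connection been closed before `commit()`, the inserted rows would not have been found on reopening, since only committed work is guaranteed to persist.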
