DB Unit 4

Uploaded by Ojas Patil
Copyright © All Rights Reserved

Unit-IV

Relational-Database Design
Contents:
• Pitfalls in Relational-Database designs, Concept of normalization, Functional Dependencies,
Normal Forms- 1NF, 2NF, 3NF, BCNF
Pitfalls / Drawbacks in Relational Database Design
A well-designed database causes fewer problems during development and deployment, and it performs better.

1. Poor Design / Planning


• The database is a vital part of every custom software application, so taking the time to map out the goals of the
database design helps ensure the success of the project. The consequences of poor planning show up later and
affect the project's schedule.
• Improper planning of the database leaves no time to go back and fix errors, and design flaws left in place can
later be exploited by cyber attacks. Therefore, consider sitting down with pen and paper and drawing a data model
based on the business requirements.
• Developers can avoid poor planning/design by checking off the following points:
Main tables of your database model
Names for tables
Rules for naming tables
Time span required for the project
2. Ignoring Normalization
• Normalization groups directly related data in a single table and puts indirectly related data in separate
tables. These tables are connected by logical relationships between parent and child tables (typically through
foreign keys).
• Lack of normalization reduces data consistency and leads to duplication of data, because each fact is not
stored in exactly one place.
• Finding related data is difficult due to the lack of grouping, and searching for it takes more time. Hence,
consider implementing normalization rules during database design.
• Even when some normalization rules are followed, a database may still not function as required, often
because it has not been normalized up to third normal form (3NF).
3. Redundant Records
Redundancy in a database means having the same data stored in multiple places or fields within the software.
This can be a problem for developers because they need to make sure that all these copies of data stay up to
date.

When you have redundant data, it causes a few issues:


Database Size: Redundant data makes the database bigger, taking up more space than necessary.
Efficiency: A larger database can be slower to work with, reducing efficiency.
Data Corruption: Having the same data in many places can lead to data inconsistencies and errors.
So, it's usually best to avoid having redundant data unless you need it for backup purposes.
• There are two types of data redundancy:

• Wasteful Redundancy: This happens when you repeat data unnecessarily. It can occur
due to complicated data storage methods or inefficient coding. For example, if you have
a table with student information, and you store the name of the college and the course
for each student, even if they are the same for many students, it's wasteful redundancy.

• Excessive Redundancy: This occurs when there's too much repeated data, making the
database less efficient and harder to manage.

• To avoid redundant records, it's a good practice to regularly delete data that you no
longer need in your database. This helps keep your database efficient and free from
unnecessary data.
4. Poor Naming Standards
• Naming is partly a personal choice; however, it is an important aspect of documentation.
• Poor naming standards result in a messy, hard-to-follow database, so aim for consistency.
• The purpose of naming is to let all future developers and programmers easily understand the
components of the database and what they are used for.
• This saves developers time, because they do not need to search through documents to understand the
meaning of a name.
There isn’t a universal guide to naming conventions, but it’s best to avoid bad naming practices. Here are
examples of naming conventions to avoid:
• Underscore_for_Word_Separation
• Meaningless or Generic Names
• ALL UPPER CASE (harder to read, and rules out camel case)
5. Lack of Documentation
• According to one survey, poor technical documentation was the second most challenging problem faced by
developers.
• Lack of documentation leads to the loss of vital information or a tedious handover process to a new
programmer.
• Consider documenting everything you know from day one because any documentation is better than none.
Well-organized documentation throughout the project helps to wrap up everything smoothly and in turn,
helps build robust software.
• The goal of documentation is to give the support programmer the information needed to detect and fix bugs.
Documentation starts with naming the columns, objects, and tables in a database model. A well-documented data
model has meaningful names and definitions of its columns, tables, relationships, and check and default
constraints.
6. One Table to Hold All Domain Values
• This pitfall is creating a single table to hold the values of every domain. For example, you may have ranges of
values for different areas such as order status, account status, and payment status, each with its own values.
Table Structure:
• You have a single table that contains the following columns:
• Table or entity: This column specifies which entity the status value is associated with, like order, account, or
payment.
• Key: A unique identifier for the status value within the given entity or table.
• Value: The actual status value, such as "pending," "draft," "paid," etc.
Issues with this Approach:
• Lack of Referential Integrity
• Data Integrity and Validation
• Limited Flexibility
• Efficiency and Query Performance
• Better Approach:

• A more typical database design would involve creating separate tables for each type of status,
such as an "OrderStatus" table, an "AccountStatus" table, and a "PaymentStatus" table. Each of
these tables would contain status values specific to its associated entity. This approach provides
clearer data organization, enforces referential integrity, and makes it easier to manage and
expand status values for different entities.
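A minimal sketch of the separate-tables approach, using Python's built-in sqlite3 module and an in-memory database. The table and column names (OrderStatus, PaymentStatus, Orders, status_code) and the status values are hypothetical. Because each status column references its own lookup table, a foreign key stops an order from being given a value that only makes sense as a payment status:

```python
import sqlite3

# In-memory database; all table/column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on

conn.executescript("""
CREATE TABLE OrderStatus   (status_code TEXT PRIMARY KEY);
CREATE TABLE PaymentStatus (status_code TEXT PRIMARY KEY);
CREATE TABLE Orders (
    order_id       INTEGER PRIMARY KEY,
    order_status   TEXT NOT NULL REFERENCES OrderStatus(status_code),
    payment_status TEXT NOT NULL REFERENCES PaymentStatus(status_code)
);
INSERT INTO OrderStatus   VALUES ('pending'), ('shipped');
INSERT INTO PaymentStatus VALUES ('unpaid'), ('paid');
""")

conn.execute("INSERT INTO Orders VALUES (1, 'pending', 'unpaid')")  # valid row

def try_insert(order_id, order_status, payment_status):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Orders VALUES (?, ?, ?)",
                     (order_id, order_status, payment_status))
        return True
    except sqlite3.IntegrityError:
        return False

# 'paid' exists only in PaymentStatus, so it is rejected as an order status.
print(try_insert(2, "paid", "unpaid"))    # False
print(try_insert(3, "shipped", "paid"))   # True
```

With one generic lookup table instead, a plain foreign key on the value column could only check that the value exists somewhere in that table, not that it belongs to the right entity.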
7. Ignoring Frequency or Purpose of the Data
• By ignoring the fundamental purpose of data, a designer shifts away from the primary goal of storing
and retrieving data efficiently when needed.
• Purpose of Data: Before designing a database, you need to know why you're collecting data, how often
you'll collect it, and how you'll use it. For example, if you're manually recording data once a day, it's very
different from real-time data collection. The purpose of the data influences how you structure the
database.
• Data Volume Matters: The amount of data you're dealing with is crucial. Managing a few thousand
pieces of data each month is different from handling millions. The volume of data affects how you
organize and manage your database.
• Data Structure and Normalization: Data structure and how you organize information
depend on its purpose. Normalization, which reduces data redundancy, is important, but
it should be based on why you're collecting the data.

• Database Design and Formats: Knowing the purpose of your data helps you design the
database, choose how it's organized, and decide on formats. Without clarity on purpose,
even if the design seems perfect on paper, it might not work well in practice.

• In summary, understanding why you are collecting data and how you will use it is
fundamental to creating an efficient and effective database. Without a clear purpose,
even the best mathematical and structural designs can fail.
8. Insufficient Indexing
• Indexing is a data structure technique which allows you to quickly retrieve records from a database file.
• Insufficient indexing refers to a configuration whose performance suffers because of improper, excessive,
or missing indexes. If indexes aren’t created properly, the database server has to read more records to
retrieve the data requested by a query.
• A wrong index makes data modification more expensive without helping queries, and a poorly chosen index on
multiple columns can slow queries down instead of speeding them up. The lack of a clustered index in a table is
also a form of poor indexing: INSERT, SELECT, UPDATE, and DELETE operations on such a table can be slower than
on a table with a clustered index.
• Index efficiency depends on the column type. For instance, indexes on INT columns generally perform best,
while indexes on DATE, VARCHAR, or DECIMAL columns are less efficient. This may lead to redesigning tables for
the best possible efficiency.
• Overall, indexing is a complex decision: too much indexing is as bad as too little, and both affect the
final outcome.
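To see the effect of a missing index, here is a small sketch using SQLite through Python's sqlite3 module. The table, column, and index names are hypothetical, and SQLite's plan output differs from SQL Server's. EXPLAIN QUERY PLAN should report a full table SCAN before the index exists and an index SEARCH afterwards; the exact wording varies between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
             "customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 1000, i * 1.5) for i in range(10_000)])

QUERY = "SELECT total FROM orders WHERE customer_id = ?"

def plan(sql):
    """Return the 'detail' column (last column) of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (42,))]

before = plan(QUERY)   # no index on customer_id: every row must be scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = plan(QUERY)    # the index lets SQLite jump straight to matching rows

print(before)
print(after)
```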
9. Lack of Testing
• The lack of database testing fails to give information on whether the data values stored and received in the
database are valid or not.
• Testing helps ensure that transaction data is saved correctly, prevents data loss, and guards against
unauthorized access to information.
• The database is essential for every type of software application, therefore testers need to know SQL
to test it.
• For example, when testing a banking application, a few things to check are:
1. No loss of information during the process.
2. Application stores transaction data correctly in the database and displays it accurately.
3. No aborted or partial operation data is saved by the application.
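The third check above can be automated. Below is a minimal sketch using Python's sqlite3 module; the accounts table, the transfer() helper, and the amounts are hypothetical. Using the connection as a context manager commits both updates together or rolls both back, and the assertions confirm that a failed transfer leaves no partial data behind:

```python
import sqlite3

def make_bank():
    conn = sqlite3.connect(":memory:")
    # CHECK constraint: a balance may never go negative (hypothetical rule).
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, "
                 "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates are committed, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected the debit
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))

conn = make_bank()
assert transfer(conn, "A", "B", 30)
assert balances(conn) == {"A": 70, "B": 80}
assert not transfer(conn, "A", "B", 500)        # would overdraw account A
# The credit to B ran first, but it was rolled back: no partial data saved.
assert balances(conn) == {"A": 70, "B": 80}
```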

So, these were the nine common pitfalls to avoid during database design. For developers, creating a neat and
tight database structure is essential for a seamless project flow.
Concept of Normalization
• It is a technique to reduce or remove redundancy from a table.
• Normalization is the process of organizing data in a database. This includes creating tables and
establishing relationships between those tables according to rules designed both to protect the data
and to make the database more flexible by eliminating redundancy and inconsistent dependency.
Anomalies/ Irregularities in Relational Model
Different types of anomalies can occur in a relation that stores data about more than one entity, such as the
following table of players and the games they play:

Player_no  Name     City     Game_no  G_name
11         Smita    Pune     1        Basketball
11         Smita    Pune     2        Volleyball
12         Shirish  Pune     3        Cricket
13         Sanjiv   Mumbai   1        Basketball
13         Sanjiv   Mumbai   3        Cricket
14         Mandar   Nashik   4        Skating
15         Mahesh   Solapur  2        Volleyball
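Insertion of record -
When we insert a record for a player, we also have to add information about a game, and vice versa. A new player
who does not play any game yet, or a new game that no player has chosen yet, can only be stored with part of the
row left empty:

Player_no  Name   City    Game_no  G_name
17         Smita  Nashik
                          6        Badminton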
Deletion of record -
This anomaly happens when deleting a data record results in losing unrelated information that was stored as part
of the deleted record. For example, Mandar is the only player of Skating, so deleting his row also deletes the
only record of game 4 (Skating):

Player_no  Name    City    Game_no  G_name
14         Mandar  Nashik  4        Skating
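Modification of record -
If a modification is not made in every place where the data appears, the database is left in an inconsistent
state. For example, Smita's city is stored in two rows, so changing it requires updating both rows:

Player_no  Name   City  Game_no  G_name
11         Smita  Pune  1        Basketball
11         Smita  Pune  2        Volleyball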
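The deletion and modification anomalies can be reproduced on the player table in a few lines of Python. This is only an illustration: the rows are held in a plain list, and changing Smita's city to Mumbai is a hypothetical update:

```python
# Rows of the player table: (Player_no, Name, City, Game_no, G_name)
rows = [
    (11, "Smita", "Pune", 1, "Basketball"),
    (11, "Smita", "Pune", 2, "Volleyball"),
    (12, "Shirish", "Pune", 3, "Cricket"),
    (13, "Sanjiv", "Mumbai", 1, "Basketball"),
    (13, "Sanjiv", "Mumbai", 3, "Cricket"),
    (14, "Mandar", "Nashik", 4, "Skating"),
    (15, "Mahesh", "Solapur", 2, "Volleyball"),
]

def known_games(table):
    """All (Game_no, G_name) pairs that can still be read from the table."""
    return {(game_no, g_name) for (_, _, _, game_no, g_name) in table}

# Deletion anomaly: deleting player 14 also deletes the only record of Skating.
after_delete = [r for r in rows if r[0] != 14]
print((4, "Skating") in known_games(after_delete))   # False

# Modification anomaly: Smita's city is changed in only one of her two rows.
after_update = [
    (p, n, "Mumbai", g, gn) if (p, g) == (11, 1) else (p, n, c, g, gn)
    for (p, n, c, g, gn) in rows
]
print(sorted({c for (p, _, c, _, _) in after_update if p == 11}))  # ['Mumbai', 'Pune']
```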
Functional Dependency
• A functional dependency (FD) is a relationship that exists between two attributes (or sets of attributes) of a
table, where one depends on the other.
• It helps in avoiding data redundancy and in identifying bad designs.
• It typically exists between the primary key and the non-key attributes of a table.
X → Y
• The left side of an FD is known as the determinant (main attribute), and the right side is known as the
dependent (dependent attribute).
• The values of the X component of a tuple uniquely (or functionally) determine the values of the Y
component. We also say that there is a functional dependency from X to Y, or that Y is functionally
dependent on X.
• X determines Y. Or Y is determined by X.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address
Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee table, because if we know
the Emp_Id, we can tell the employee name associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
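An FD can be checked against the rows currently stored in a table. The Python sketch below (the function name fd_holds and the sample employees are hypothetical) looks for two rows that agree on X but differ on Y. Note that sample data can only show that an FD is violated; whether X → Y really holds is a rule about all valid data, decided from the meaning of the attributes:

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by rows (a list of dicts):
    any two rows that agree on every lhs attribute agree on every rhs attribute."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x appears, else returns the earlier y
        if seen.setdefault(x, y) != y:
            return False
    return True

employees = [  # hypothetical sample data
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Mumbai"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Nashik"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))       # True
print(fd_holds(employees, ["Emp_Name"], ["Emp_Address"]))  # False: two Ashas, two addresses
```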
Types of Functional dependency
1. Trivial functional dependency
• A → B is a trivial functional dependency if B is a subset of A.
• Dependencies such as A → A and B → B are also trivial.
• A trivial FD is always valid.
• L.H.S. ∩ R.H.S. = R.H.S. (the R.H.S. lies inside the L.H.S., so the intersection is never empty)
Example:
1. Consider a table with two columns, Employee_Id and Employee_Name.
2. {Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as Employee_Id is a subset of
{Employee_Id, Employee_Name}.
3. Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.
2. Non-trivial functional dependency
• A → B is a non-trivial functional dependency if B is not a subset of A.
• When A ∩ B is empty, A → B is called completely non-trivial.
• L.H.S. ∩ R.H.S. = ∅ (empty) for a completely non-trivial FD

Example:
1. ID → Name,
2. Name → DOB
3. Semi Non-Trivial Functional Dependency
X → Y is called semi non-trivial when X ∩ Y is not empty but Y is not a subset of X.
Examples:
AB → BC
AD → DC
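The three cases follow directly from set operations on the two sides of the FD. A small Python sketch (classify_fd is a hypothetical helper name) applied to examples from this section:

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y using the set rules above (lhs = X, rhs = Y)."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # Y is a subset of X
    if lhs & rhs:
        return "semi non-trivial"        # X and Y overlap, but Y is not inside X
    return "completely non-trivial"      # X and Y share no attribute

print(classify_fd({"Employee_Id", "Employee_Name"}, {"Employee_Id"}))  # trivial
print(classify_fd({"ID"}, {"Name"}))                                  # completely non-trivial
print(classify_fd({"A", "B"}, {"B", "C"}))                            # semi non-trivial
```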
Database Normal Forms
Here is a list of the normal forms used in relational database design:
• 1NF (First Normal Form)
• 2NF (Second Normal Form)
• 3NF (Third Normal Form)
• BCNF (Boyce-Codd Normal Form)
Normalization is the process of minimizing redundancy from a relation or set of relations.
Redundancy in relation may cause insertion, deletion, and update anomalies. So, it helps to
minimize the redundancy in relations. Normal forms are used to eliminate or reduce
redundancy in database tables.
1. First Normal Form –
• Each table cell should contain a single value.
• Each record needs to be unique.
• If a relation contains a composite or multi-valued attribute, it violates first normal form.
• A relation is in first normal form if it does not contain any composite or multi-valued attribute, i.e., if
every attribute in that relation is a single-valued attribute.
Example 1 – Relation STUDENT in table 1 is not in 1NF because of the multi-valued attribute STUD_PHONE.
Its decomposition into 1NF is shown in table 2.
[Figure: a student table with a multi-valued Course attribute (not in 1NF), and two ways of converting it to 1NF.]
• Option 1: store one course per row in a single table; {ID, Course} becomes the composite primary key.
• Option 2: split it into a primary (base, parent, referenced) table, in which {ID} is the primary key, and a
secondary (child, referencing) table, in which {ID} is the foreign key and {ID, Course} is the primary key.
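Converting a multi-valued attribute into 1NF means emitting one row per value. A minimal Python sketch, assuming hypothetical STUDENT rows in which STUD_PHONE holds a list (the names, phone numbers, and the helper to_1nf are made up for illustration):

```python
# Hypothetical STUDENT rows that break 1NF: STUD_PHONE holds several values.
students = [
    {"STUD_NO": 1, "STUD_NAME": "Ram",  "STUD_PHONE": ["9000000001", "9000000002"]},
    {"STUD_NO": 2, "STUD_NAME": "Ravi", "STUD_PHONE": ["9000000003"]},
]

def to_1nf(rows, multi_valued):
    """Return one row per value of the multi-valued attribute,
    copying the other columns unchanged."""
    return [
        {**row, multi_valued: value}
        for row in rows
        for value in row[multi_valued]
    ]

for row in to_1nf(students, "STUD_PHONE"):
    print(row)
```

In the flattened table, STUD_NO alone no longer identifies a row, so {STUD_NO, STUD_PHONE} becomes the key, just as {ID, Course} does in the course example.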
2. Second Normal Form –
• Rule 1- Be in 1NF
• Rule 2- No non-prime attribute is functionally dependent on a proper subset of any candidate key (this is always
satisfied when every candidate key is a single column)
• All the non-prime attributes should be fully functionally dependent on the candidate key
• Non-prime attribute - an attribute that does not participate in forming any candidate key
• To be in second normal form, a relation must be in first normal form and must not contain any
partial dependency.
• A relation is in 2NF if it has no partial dependency, i.e., no non-prime attribute (an attribute that is not part
of any candidate key) is dependent on any proper subset of any candidate key of the table.
Partial Dependency – If a proper subset of a candidate key determines a non-prime attribute, it is called a partial
dependency.
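Example: a table with attributes CustID, StoreID, and Location
Candidate Key: {CustID, StoreID}
Prime attributes: CustID, StoreID
Non-prime attribute: Location
FD: StoreID → Location, i.e., Location is determined by StoreID alone, a proper subset of the candidate key.
This is a partial dependency, so the table is not in 2NF. To convert it to 2NF, split it into a table
(CustID, StoreID), where {CustID, StoreID} is the composite primary key, and a table (StoreID, Location),
where {StoreID} is the primary key.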
Example: a relation (STUD_NO, COURSE_NO, COURSE_FEE) with the candidate key {STUD_NO, COURSE_NO}.
Note that many courses have the same course fee.
COURSE_FEE alone cannot decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;
Hence, COURSE_FEE is a non-prime attribute, as it does not belong to the only candidate key {STUD_NO, COURSE_NO}.
But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE depends on COURSE_NO, which is a proper subset of the candidate
key. A non-prime attribute depending on a proper subset of the candidate key is a partial dependency, so this
relation is not in 2NF.
To convert the above relation to 2NF,
we need to split the table into two tables as follows:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE

NOTE: 2NF reduces the redundant data stored in memory. For instance, if 100 students take course C1, we don’t
need to store its fee of 1000 in all 100 records; instead, we store it once in the second table: the course fee
for C1 is 1000.
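Checking for partial dependencies by hand gets tedious for larger tables. A minimal Python sketch, assuming the FDs are given as (LHS, RHS) pairs of attribute sets: it computes attribute closures, finds candidate keys by brute force (fine for a handful of attributes), and reports every non-prime attribute determined by a proper subset of a key. The function names are hypothetical; the data is the STUD_NO / COURSE_NO / COURSE_FEE example:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue                  # contains a smaller key: not minimal
            if closure(candidate, fds) >= set(attributes):
                keys.append(candidate)
    return keys

def partial_dependencies(attributes, fds):
    """(proper subset of a key, non-prime attribute it determines) pairs."""
    keys = candidate_keys(attributes, fds)
    non_prime = set(attributes) - set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):   # proper subsets only
            for subset in combinations(sorted(key), size):
                for attr in sorted(closure(subset, fds) & non_prime):
                    found.append((set(subset), attr))
    return found

attributes = {"STUD_NO", "COURSE_NO", "COURSE_FEE"}
fds = [({"COURSE_NO"}, {"COURSE_FEE"})]

print(candidate_keys(attributes, fds))        # one key: STUD_NO and COURSE_NO
print(partial_dependencies(attributes, fds))  # [({'COURSE_NO'}, 'COURSE_FEE')]
```

Running partial_dependencies on each of the two decomposed tables returns an empty list, which is the 2NF condition.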
3NF (Third Normal Form) Rules
• Rule 1- Be in 2NF
• Rule 2- Has no transitive functional dependencies in the table
• To move our 2NF table into 3NF, we need to divide our table again.
• For each non-trivial FD X → Y: X must be a candidate key or super key, or every attribute of Y must be a prime
attribute.
What are transitive functional dependencies?
A transitive functional dependency exists when a non-key column depends on another non-key column, so changing
one non-key column might cause another non-key column to change.
Consider the table below: changing the non-key column FullName may change Salutation.

FD: MembershipID -> FullName
{MembershipID} is the primary key (prime attribute).
Non-prime attributes = FullName, Address, Salutation
But FullName -> Salutation
Transitivity: MembershipID -> FullName -> Salutation
We have divided our tables again and created a new table which stores Salutations.
There are no transitive functional dependencies, and hence our table is in 3NF.
In Table 3, Salutation ID is the primary key, and in Table 1, Salutation ID is a foreign key referencing the
primary key of Table 3.
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime attributes
(EMP_STATE, EMP_CITY) are transitively dependent on the super key (EMP_ID), which violates third normal form.
That's why we need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as the primary
key.
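The 3NF rule can be tested mechanically for each FD: either the LHS is a super key (its closure covers the whole table) or everything it adds is prime. A minimal Python sketch on the EMPLOYEE example, keeping only the four attributes named in the text; the candidate keys are supplied by hand from the text, the FD list is assumed to be complete for each table, and the helper names are hypothetical:

```python
def closure(attrs, fds):
    """Attribute closure X+ under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def violations_3nf(attributes, fds, keys):
    """FDs X -> Y where X is not a super key and Y - X has a non-prime attribute."""
    prime = set().union(*keys)
    bad = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)              # ignore the trivial part
        if not extra:
            continue
        is_super_key = closure(lhs, fds) >= set(attributes)
        if not is_super_key and not extra <= prime:
            bad.append((lhs, rhs))
    return bad

# EMPLOYEE example: EMP_ID -> EMP_ZIP -> {EMP_STATE, EMP_CITY}
employee = {"EMP_ID", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
emp_fds = [({"EMP_ID"}, {"EMP_ZIP"}), ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"})]
print(len(violations_3nf(employee, emp_fds, [{"EMP_ID"}])))  # 1 (the EMP_ZIP FD)

# After decomposition, neither table has a 3NF violation.
print(violations_3nf({"EMP_ID", "EMP_ZIP"},
                     [({"EMP_ID"}, {"EMP_ZIP"})], [{"EMP_ID"}]))    # []
print(violations_3nf({"EMP_ZIP", "EMP_STATE", "EMP_CITY"},
                     [({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"})],
                     [{"EMP_ZIP"}]))                                # []
```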
Q. Convert given table into 3 NF.
Imagine we're building a restaurant management application. That application needs to store data about the
company's employees and it starts out by creating the following table of employees:

All the entries are atomic


EMPLOYEE_ID NAME JOB_CODE JOB STATE_CODE HOME_STATE
E001 Alice J01 Chef 26 Michigan and there is a composite
E001 Alice J02 Waiter 26 Michigan primary key
E002 Bob J02 Waiter 56 Wyoming (employee_id, job_code)
E002 Bob J03 Driver 56 Wyoming
so the table is in the first
E003 Alice J01 Chef 56 Wyoming
normal form (1NF).

But even if you only know someone's employee_id, you can determine their name, home_state, and state_code
(because they belong to the same person). This means name, home_state, and state_code depend on employee_id,
which is only part of the composite primary key. Likewise, job depends only on job_code. So, the table is not in
2NF, and we should separate these attributes into different tables to make it 2NF.
Second Normal Form (2NF)

employee_roles table
EMPLOYEE_ID  JOB_CODE
E001         J01
E001         J02
E002         J02
E002         J03
E003         J01

employees table
EMPLOYEE_ID  NAME   STATE_CODE  HOME_STATE
E001         Alice  26          Michigan
E002         Bob    56          Wyoming
E003         Alice  56          Wyoming

jobs table
JOB_CODE  JOB
J01       Chef
J02       Waiter
J03       Driver
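In the employees table, home_state now depends on state_code, which is not a key: if you know the state_code,
you can find the home_state value. Since employee_id → state_code → home_state, home_state is transitively
dependent on employee_id. Therefore, we convert to 3NF by moving home_state into a separate states table.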
Third Normal Form (3NF)

employee_roles table
EMPLOYEE_ID  JOB_CODE
E001         J01
E001         J02
E002         J02
E002         J03
E003         J01

employees table
EMPLOYEE_ID  NAME   STATE_CODE
E001         Alice  26
E002         Bob    56
E003         Alice  56

jobs table
JOB_CODE  JOB
J01       Chef
J02       Waiter
J03       Driver

states table
STATE_CODE  HOME_STATE
26          Michigan
56          Wyoming
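The decomposition can be reproduced mechanically. This Python sketch (the helper names project and natural_join are hypothetical) builds the four 3NF tables by projecting the original rows onto each table's columns, then joins them back to confirm that no information was lost or invented, i.e. that the decomposition is lossless:

```python
COLS = ("EMPLOYEE_ID", "NAME", "JOB_CODE", "JOB", "STATE_CODE", "HOME_STATE")
rows = {
    ("E001", "Alice", "J01", "Chef",   "26", "Michigan"),
    ("E001", "Alice", "J02", "Waiter", "26", "Michigan"),
    ("E002", "Bob",   "J02", "Waiter", "56", "Wyoming"),
    ("E002", "Bob",   "J03", "Driver", "56", "Wyoming"),
    ("E003", "Alice", "J01", "Chef",   "56", "Wyoming"),
}
original = [dict(zip(COLS, r)) for r in rows]

def project(rel, attrs):
    """Relational projection: keep only attrs and drop duplicate rows."""
    unique = {tuple(row[a] for a in attrs) for row in rel}
    return [dict(zip(attrs, t)) for t in sorted(unique)]

def natural_join(left, right):
    """Join two non-empty lists of dicts on all column names they share."""
    shared = set(left[0]) & set(right[0])
    return [{**a, **b} for a in left for b in right
            if all(a[c] == b[c] for c in shared)]

employee_roles = project(original, ("EMPLOYEE_ID", "JOB_CODE"))
employees      = project(original, ("EMPLOYEE_ID", "NAME", "STATE_CODE"))
jobs           = project(original, ("JOB_CODE", "JOB"))
states         = project(original, ("STATE_CODE", "HOME_STATE"))

print(len(employee_roles), len(employees), len(jobs), len(states))  # 5 3 3 2

# Lossless-join check: joining the 3NF tables gives back exactly the original rows.
rebuilt = natural_join(natural_join(natural_join(employee_roles, employees), jobs), states)
print({tuple(r[c] for c in COLS) for r in rebuilt} == rows)  # True
```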
Boyce Codd normal form (BCNF)
BCNF is an advanced version of 3NF. It is stricter than 3NF.
The LHS of each functional dependency should be a candidate key or super key.
A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
For BCNF, the table should be in 3NF, and for every FD, the LHS should be a super key.
Example: Let's assume there is a company where employees work in more than one
department.
In the above table Functional dependencies are as follows:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left side of both functional dependencies
is a key.
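Student Table
Candidate keys: {RollNo}, {VoterID}
FDs:
RollNo → Name
RollNo → VoterID
VoterID → Age
VoterID → RollNo
The LHS of every FD (RollNo or VoterID) is a candidate key, so the Student table is in BCNF.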
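The BCNF test is the same closure test applied to every non-trivial FD: the LHS must determine every attribute of the table. A minimal Python sketch (is_bcnf is a hypothetical helper name) checking the Student table and the employee/department example. It checks only the FDs listed for a table, which is enough for an original table; for decomposed tables it assumes you supply the FDs that apply to each one:

```python
def closure(attrs, fds):
    """Attribute closure X+ under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(attributes, fds):
    """True if the LHS of every non-trivial FD is a super key of the table."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue                                 # trivial FD: always allowed
        if not closure(lhs, fds) >= set(attributes):
            return False                             # LHS is not a super key
    return True

student = {"RollNo", "Name", "VoterID", "Age"}
student_fds = [({"RollNo"}, {"Name"}), ({"RollNo"}, {"VoterID"}),
               ({"VoterID"}, {"Age"}), ({"VoterID"}, {"RollNo"})]
print(is_bcnf(student, student_fds))   # True

emp = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
emp_fds = [({"EMP_ID"}, {"EMP_COUNTRY"}),
           ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"})]
print(is_bcnf(emp, emp_fds))           # False: neither EMP_ID nor EMP_DEPT is a super key
```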
Note:
BCNF decomposition does not always satisfy the dependency-preserving property.
After a BCNF decomposition, if a dependency is not preserved, we have to decide
whether to remain in BCNF or roll back to 3NF. This process of rolling back is called
Denormalization.
