DB Unit 4
Relational-Database Design
Contents:
• Pitfalls in Relational-Database Design, Concept of Normalization, Functional Dependencies,
Normal Forms: 1NF, 2NF, 3NF, BCNF
Pitfalls / Drawbacks in Relational Database Design
A well-designed database causes fewer problems during development and deployment, and it performs better.
• Wasteful Redundancy: This happens when you repeat data unnecessarily. It can occur
due to complicated data storage methods or inefficient coding. For example, if you have
a table with student information, and you store the name of the college and the course
for each student, even if they are the same for many students, it's wasteful redundancy.
• Excessive Redundancy: This occurs when there's too much repeated data, making the
database less efficient and harder to manage.
• To avoid redundant records, it's a good practice to regularly delete data that you no
longer need in your database. This helps keep your database efficient and free from
unnecessary data.
4. Poor Naming Standards
• Naming is partly a personal choice, but it is an important aspect of documentation.
• Poor naming standards result in messy, bloated data files, so adopt one convention and apply it consistently.
• The purpose of naming is to let future developers or programmers easily understand the
components of the database and what they are used for.
• This saves developers time, because they need not go through documents to understand the meaning of a
name.
There isn’t a universal guide to naming conventions, but it’s best to avoid bad naming practices. Here are
examples of unsuccessful naming conventions that one must avoid.
• Underscore_for_Word_Separation
• Meaningless or Generic Names
• ALL UPPER CASE (hurts readability and rules out camel case)
5. Lack of Documentation
• According to one survey, poor technical documentation was the second most challenging problem faced by
developers.
• Lack of documentation leads to the loss of vital information or a tedious handover process to a new
programmer.
• Consider documenting everything you know from day one because any documentation is better than none.
Well-organized documentation throughout the project helps to wrap up everything smoothly and in turn,
helps build robust software.
• The goal of documentation is to provide information to the support programmer to detect bugs and fix them.
Documentation starts with naming columns, objects, and tables in a database model. A well-documented data
model consists of solid names, definitions on columns, tables, relationships, and check and default
constraints.
6. One Table to Hold All Domain Values
• This pitfall is using a single table to hold every domain (lookup) value. For example, you have sets of values for
different areas such as order status, account status, and payment status, each with its own values.
Table Structure:
• You have a single table that contains the following columns:
• Table or entity: This column specifies which entity the status value is associated with, like order, account, or
payment.
• Key: A unique identifier for the status value within the given entity or table.
• Value: The actual status value, such as "pending," "draft," "paid," etc.
Issues with this Approach:
• Lack of Referential Integrity
• Data Integrity and Validation
• Limited Flexibility
• Efficiency and Query Performance
• Better Approach:
• A more typical database design would involve creating separate tables for each type of status,
such as an "OrderStatus" table, an "AccountStatus" table, and a "PaymentStatus" table. Each of
these tables would contain status values specific to its associated entity. This approach provides
clearer data organization, enforces referential integrity, and makes it easier to manage and
expand status values for different entities.
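The better approach above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a prescribed schema: the column names (status_id, name), the Orders table, and the sample status rows are assumptions made for the example.

```python
import sqlite3

def build_status_schema():
    """Create one lookup table per entity (hypothetical column names)."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE OrderStatus (
            status_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL UNIQUE
        );
        CREATE TABLE PaymentStatus (
            status_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL UNIQUE
        );
        CREATE TABLE Orders (
            order_id  INTEGER PRIMARY KEY,
            status_id INTEGER NOT NULL REFERENCES OrderStatus(status_id)
        );
        INSERT INTO OrderStatus (status_id, name) VALUES (1, 'pending'), (2, 'draft');
        INSERT INTO PaymentStatus (status_id, name) VALUES (1, 'paid');
    """)
    return conn

def try_insert_order(conn, order_id, status_id):
    """Return True if the order was stored, False if the status is unknown."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO Orders (order_id, status_id) VALUES (?, ?)",
                (order_id, status_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because Orders references only OrderStatus, the database itself rejects an order whose status does not exist there. With a single shared domain table, a plain foreign key cannot express "this must be an order status", which is the referential-integrity issue listed above.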
7. Ignoring Frequency or Purpose of the Data
• By ignoring the fundamental purpose of data, a designer shifts away from the primary goal of storing
and retrieving data efficiently when needed.
• Purpose of Data: Before designing a database, you need to know why you're collecting data, how often
you'll collect it, and how you'll use it. For example, if you're manually recording data once a day, it's very
different from real-time data collection. The purpose of the data influences how you structure the
database.
• Data Volume Matters: The amount of data you're dealing with is crucial. Managing a few thousand
pieces of data each month is different from handling millions. The volume of data affects how you
organize and manage your database.
• Data Structure and Normalization: Data structure and how you organize information
depend on its purpose. Normalization, which reduces data redundancy, is important, but
it should be based on why you're collecting the data.
• Database Design and Formats: Knowing the purpose of your data helps you design the
database, choose how it's organized, and decide on formats. Without clarity on purpose,
even if the design seems perfect on paper, it might not work well in practice.
• In summary, understanding why you are collecting data and how you will use it is
fundamental to creating an efficient and effective database. Without a clear purpose,
even the best mathematical and structural designs can fail.
8. Insufficient Indexing
• Indexing is a data structure technique that allows you to quickly retrieve records from a database file.
• Insufficient indexing means that query performance suffers because indexes are improper, excessive, or
missing. When indexes are not created properly, the database server has to read more records to retrieve
the data requested by the query.
• A poorly chosen index does not make data manipulation easier, and an index on multiple columns that does
not match the way queries filter can slow down writes without speeding up reads. The lack of a clustered index
on a table is also a form of poor indexing: SELECT, INSERT, UPDATE, and DELETE statements are generally
slower on such a table than on one with a clustered index.
• Index efficiency is connected to the column type. For instance, indexes on INT columns generally perform
best, while indexes on DATE, VARCHAR, or DECIMAL columns are usually less efficient. This can justify
redesigning tables for better efficiency.
• Overall, indexing is a complex decision, because too much indexing is as bad as too little: both affect the
final outcome.
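The effect of a missing index can be observed with SQLite's EXPLAIN QUERY PLAN, as in the following sketch. The orders table, its columns, and the index name idx_orders_customer are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the plan text SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)

def demo_index():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 100, i * 1.5) for i in range(1000)],
    )
    sql = "SELECT total FROM orders WHERE customer_id = ?"
    before = query_plan(conn, sql, (42,))   # no index on customer_id yet
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, sql, (42,))    # index is now available
    return before, after
```

Before the index exists, the plan typically reports a full scan of the table; afterwards it reports a search that uses idx_orders_customer. Every additional index also has to be maintained on each insert, update, and delete, which is why excessive indexing has a cost too.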
9. Lack of Testing
• Without database testing, there is no way to tell whether the data values stored in and retrieved from the
database are valid.
• Testing helps ensure that transaction data is saved, avoids data loss, and prevents unauthorized access to information.
• A database underlies almost every type of software application, so testers need to know SQL
during testing.
• Consider testing a banking application. A few things to check during the tests are:
1. No loss of information during the process.
2. Application stores transaction data correctly in the database and displays it accurately.
3. No aborted or partial operation data is saved by the application.
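Check 3 above can be sketched as a small test against an in-memory SQLite database. The accounts table, the account ids "A" and "B", and the transfer function are hypothetical stand-ins for a real banking application.

```python
import sqlite3

def make_bank():
    """Create two hypothetical accounts; balances may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "acc_id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move money in one transaction; return False if it had to be rolled back."""
    try:
        with conn:  # commits if both updates succeed, rolls back otherwise
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE acc_id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE acc_id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT acc_id, balance FROM accounts"))
```

The credit is deliberately applied before the debit, so that an overdraft fails halfway through the operation. A test then asserts that the balances after a failed transfer equal the balances before it, which shows that no partial operation was saved.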
So, these were the nine common pitfalls to avoid during database design. For developers, creating a neat and
tight database structure is essential for a seamless project flow.
Concept of Normalization
• Normalization is a technique to reduce or remove redundancy from a table.
• It is the process of organizing data in a database. This includes creating tables and
establishing relationships between those tables according to rules designed both to protect the data
and to make the database more flexible by eliminating redundancy and inconsistent dependency.
Anomalies/ Irregularities in Relational Model
There are different types of anomalies that can occur in the referencing and referenced relations.
Player_no  Name   city    Game_no  G_name
17         Smita  Nashik  6        Badminton
Deletion of record -
This anomaly happens when deleting a data record results in losing some unrelated
information that was stored as part of the deleted record.
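The deletion anomaly can be reproduced with the player row shown above. The sketch below uses Python's sqlite3 module; the table names player_game, player, and game are assumptions chosen for the illustration.

```python
import sqlite3

PLAYER_ROW = (17, "Smita", "Nashik", 6, "Badminton")

def games_after_delete_single_table():
    """Player and game facts share one table, so deleting the player loses the game."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE player_game ("
        "player_no INTEGER PRIMARY KEY, name TEXT, city TEXT, "
        "game_no INTEGER, g_name TEXT)"
    )
    conn.execute("INSERT INTO player_game VALUES (?, ?, ?, ?, ?)", PLAYER_ROW)
    conn.execute("DELETE FROM player_game WHERE player_no = 17")
    return conn.execute("SELECT DISTINCT game_no, g_name FROM player_game").fetchall()

def games_after_delete_split_tables():
    """Game facts live in their own table and survive the player's deletion."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE game (game_no INTEGER PRIMARY KEY, g_name TEXT);
        CREATE TABLE player (
            player_no INTEGER PRIMARY KEY, name TEXT, city TEXT,
            game_no INTEGER REFERENCES game(game_no)
        );
        INSERT INTO game VALUES (6, 'Badminton');
        INSERT INTO player VALUES (17, 'Smita', 'Nashik', 6);
    """)
    conn.execute("DELETE FROM player WHERE player_no = 17")
    return conn.execute("SELECT game_no, g_name FROM game").fetchall()
```

In the single-table version, removing player 17 also removes the only record that game 6 is Badminton. In the split version, the game row remains.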
Example:
1. ID → Name,
2. Name → DOB
3. Semi Non Trivial Functional Dependencies
X → Y is called semi non-trivial when X ∩ Y is not empty and Y is not a subset of X.
Examples:
AB → BC
AD → DC
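The following sketch checks whether a functional dependency holds in a set of rows and classifies a dependency by comparing its two sides. It assumes the usual definitions (trivial when Y is a subset of X, non-trivial when X and Y share no attribute); the function names are hypothetical, and the sample rows in the usage note are invented for illustration.

```python
def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The same left-hand values must always map to the same right-hand values.
        if seen.setdefault(key, val) != val:
            return False
    return True

def classify(lhs, rhs):
    """Classify X -> Y from its attribute lists alone."""
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"
    if x & y:
        return "semi non-trivial"
    return "non-trivial"
```

For example, classify(["A", "B"], ["B", "C"]) returns "semi non-trivial" because the two sides share B, matching AB → BC above. With rows in which two students share a name and a date of birth, holds(rows, ["ID"], ["Name"]) is True while holds(rows, ["Name"], ["ID"]) is False.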
Database Normal Forms
Here is a list of the normal forms used in relational database design:
• 1NF (First Normal Form)
• 2NF (Second Normal Form)
• 3NF (Third Normal Form)
• BCNF (Boyce-Codd Normal Form)
Normalization is the process of minimizing redundancy in a relation or set of relations.
Redundancy in a relation may cause insertion, deletion, and update anomalies. Normal forms
are used to eliminate or reduce this redundancy in database tables.
1. First Normal Form –
• Each table cell should contain a single value.
• Each record needs to be unique.
• If a relation contains a composite or multi-valued attribute, it violates first normal form.
• Equivalently, a relation is in first normal form if every attribute in that relation is a
single-valued attribute.
Example 1 – Relation STUDENT in table 1 is not in 1NF because of multi-valued attribute STUD_PHONE.
Its decomposition into 1NF has been shown in table 2.
[Table 1 (STUDENT, not in 1NF) and Table 2 (STUDENT decomposed into 1NF) are not reproduced here.]
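A decomposition of this kind can be sketched in a few lines of Python. Only the relation name STUDENT and the attribute STUD_PHONE come from the example above; the function name to_1nf, the other column names, the comma-separated storage of the phone numbers, and the sample values in the usage note are assumptions.

```python
def to_1nf(rows, multi_attr, sep=","):
    """Emit one row per value of a multi-valued attribute."""
    out = []
    for row in rows:
        for value in row[multi_attr].split(sep):
            new_row = dict(row)                    # copy the single-valued attributes
            new_row[multi_attr] = value.strip()    # keep exactly one value per cell
            out.append(new_row)
    return out
```

Calling to_1nf on a row such as {"STUD_NO": 1, "STUD_NAME": "Ram", "STUD_PHONE": "111, 222"} yields two rows, one per phone number, so every cell holds a single value. STUD_NO alone no longer identifies a row; the pair (STUD_NO, STUD_PHONE) does.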
But even if you only know someone's employee_id, you can determine their name, home_state,
and state_code (because every row with that id describes the same person). This means name, home_state, and
state_code depend on employee_id alone, which is only part of the composite primary key (employee_id, job_code).
This is a partial dependency, so the table is not in 2NF. To reach 2NF, we move these attributes to a separate table.
Second Normal Form (2NF)

employee_roles table
EMPLOYEE_ID  JOB_CODE
E001         J01
E001         J02
E002         J02
E002         J03
E003         J01

jobs table
JOB_CODE  JOB
J01       Chef
J02       Waiter
J03       Driver
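The two tables above can be loaded into SQLite to confirm that the decomposition loses nothing: a join recovers the job names for each employee. The rows are the ones listed above; the column types, constraints, and function names are assumptions made for this sketch.

```python
import sqlite3

def build_2nf():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE jobs (
            job_code TEXT PRIMARY KEY,
            job      TEXT NOT NULL
        );
        CREATE TABLE employee_roles (
            employee_id TEXT NOT NULL,
            job_code    TEXT NOT NULL REFERENCES jobs(job_code),
            PRIMARY KEY (employee_id, job_code)   -- composite key
        );
        INSERT INTO jobs VALUES ('J01', 'Chef'), ('J02', 'Waiter'), ('J03', 'Driver');
        INSERT INTO employee_roles VALUES
            ('E001', 'J01'), ('E001', 'J02'),
            ('E002', 'J02'), ('E002', 'J03'),
            ('E003', 'J01');
    """)
    return conn

def jobs_for(conn, employee_id):
    """Join the two tables to list an employee's job names, ordered by job code."""
    rows = conn.execute(
        "SELECT j.job FROM employee_roles r "
        "JOIN jobs j ON j.job_code = r.job_code "
        "WHERE r.employee_id = ? ORDER BY j.job_code",
        (employee_id,),
    )
    return [row[0] for row in rows]
```

Each job name is stored once in jobs, so renaming a job is a single-row update rather than a change to every role that uses it.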
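home_state now depends on state_code: if you know the state_code, you can find the home_state value.
Because state_code in turn depends on employee_id, home_state depends on the key only transitively.
To remove this transitive dependency, the table is converted to 3NF.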
Third Normal Form (3NF)

employee_roles table
EMPLOYEE_ID  JOB_CODE
E001         J01
E001         J02
E002         J02
E002         J03
E003         J01
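The step from 2NF to 3NF moves home_state into a table keyed by state_code. The sketch below shows the resulting shape; the table names employees and states, the function names, and every sample value (names, state codes, state names) are placeholders invented for the illustration.

```python
import sqlite3

def build_3nf():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE states (
            state_code TEXT PRIMARY KEY,
            home_state TEXT NOT NULL
        );
        CREATE TABLE employees (
            employee_id TEXT PRIMARY KEY,
            name        TEXT NOT NULL,
            state_code  TEXT NOT NULL REFERENCES states(state_code)
        );
        -- Placeholder rows, not taken from the original example.
        INSERT INTO states VALUES ('S1', 'State One'), ('S2', 'State Two');
        INSERT INTO employees VALUES
            ('E001', 'Employee A', 'S1'),
            ('E002', 'Employee B', 'S2'),
            ('E003', 'Employee C', 'S1');
    """)
    return conn

def home_state_of(conn, employee_id):
    """Look up an employee's home_state through state_code."""
    row = conn.execute(
        "SELECT s.home_state FROM employees e "
        "JOIN states s ON s.state_code = e.state_code "
        "WHERE e.employee_id = ?",
        (employee_id,),
    ).fetchone()
    return row[0] if row else None
```

Since home_state is stored once per state_code, a single update to the states table changes the value seen for every employee in that state, which avoids the update anomaly the transitive dependency would otherwise cause.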