
DBMS - Keys & Normalization

Data base management notes

Uploaded by

vinayak

Keys & Normalization
What are keys?

• We must have a way to specify how tuples within a given relation are distinguished.

• This is expressed in terms of their attributes.

• That is, the attribute values of a tuple must be such that they can uniquely identify the tuple.

• No two tuples in a relation are allowed to have exactly the same value for all attributes.

• A key is an attribute or set of attributes that helps you identify a row (tuple) in a relation (table).
What are keys?

• Keys help you uniquely identify a row in a table by a combination of one or more columns in that table.

• A key is a property of the entire relation, rather than of the individual tuples.

Employee Id   Name           City
E-101         Amit Barve     Pune
E-102         Vijay Shirke   Mumbai
E-103         Anita Rathi    Thane
E-104         Nyati Patil    Vashi
E-105         Vijay Shirke   Pune
Understanding the need for keys in a DBMS

• Keys help you identify any row of data in a table.

• Keys allow you to establish and identify relationships between tables.

• Keys help you enforce identity and integrity in those relationships.


Types of keys

Keys

• Super Key
• Candidate Key
• Primary Key
• Alternate Key
• Composite Key
• Foreign Key
Super Key

• A superkey is a set of one or more attributes that, taken collectively, allow us to identify uniquely a
tuple in the relation.

• A Super key may have additional attributes that are not needed for unique identification.
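The super key definition can be checked mechanically against a table's current rows. The sketch below is illustrative only: the function name `is_superkey` and the lowercase column names are assumptions, and it tests one instance of the relation, whereas a key is really a constraint on every legal instance.

```python
# Employee relation from the slides, one dict per tuple.
EMPLOYEES = [
    {"employee_id": "E-101", "name": "Amit Barve", "city": "Pune"},
    {"employee_id": "E-102", "name": "Vijay Shirke", "city": "Mumbai"},
    {"employee_id": "E-103", "name": "Anita Rathi", "city": "Thane"},
    {"employee_id": "E-104", "name": "Nyati Patil", "city": "Vashi"},
    {"employee_id": "E-105", "name": "Vijay Shirke", "city": "Pune"},
]


def is_superkey(rows, attrs):
    """Return True if no two rows agree on every attribute in attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


print(is_superkey(EMPLOYEES, ["employee_id"]))          # True
print(is_superkey(EMPLOYEES, ["employee_id", "name"]))  # True (name is an extra attribute)
print(is_superkey(EMPLOYEES, ["name"]))                 # False (two rows share "Vijay Shirke")
```

The second call shows the point of the last bullet: adding attributes to a super key gives another super key.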
Super Key

Employee Id   Name           City
E-101         Amit Barve     Pune
E-102         Vijay Shirke   Mumbai
E-103         Anita Rathi    Thane
E-104         Nyati Patil    Vashi
E-105         Vijay Shirke   Pune


Candidate Key

• A minimal super key is a candidate key.

• A candidate key is a super key with no redundant attributes: removing any attribute from it destroys uniqueness.

• Several distinct sets of attributes may each serve as a candidate key.

• Every table must have at least one candidate key, and a table can have several.
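"Minimal super key" can be made concrete by enumerating attribute sets from smallest to largest and keeping only the super keys that contain no smaller key. This is a brute-force sketch for small tables; the names `candidate_keys`, `COLUMNS`, and `ROWS` are assumptions, and the result describes only the sample rows shown, not the schema in general.

```python
from itertools import combinations

COLUMNS = ("employee_id", "name", "city")
ROWS = [
    ("E-101", "Amit Barve", "Pune"),
    ("E-102", "Vijay Shirke", "Mumbai"),
    ("E-103", "Anita Rathi", "Thane"),
    ("E-104", "Nyati Patil", "Vashi"),
    ("E-105", "Vijay Shirke", "Pune"),
]


def is_superkey(rows, idxs):
    """True if the columns at positions idxs distinguish every row."""
    seen = {tuple(row[i] for i in idxs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(columns, rows):
    """Return the minimal super keys of this particular set of rows."""
    keys = []
    for size in range(1, len(columns) + 1):
        for idxs in combinations(range(len(columns)), size):
            # A set that contains an already-found key is not minimal.
            if any(set(k) <= set(idxs) for k in keys):
                continue
            if is_superkey(rows, idxs):
                keys.append(idxs)
    return [tuple(columns[i] for i in k) for k in keys]


print(candidate_keys(COLUMNS, ROWS))
# [('employee_id',), ('name', 'city')]
```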
Candidate Key

Properties of candidate keys:

• It must contain unique values.
• It may have multiple attributes.
• It must not contain null values.
• It should contain the minimum fields needed to ensure uniqueness.
• It uniquely identifies each record in a table.

Employee Id   Name           City     Applicant Id
E-101         Amit Barve     Pune     A-15
E-102         Vijay Shirke   Mumbai   A-30
E-103         Anita Rathi    Thane    A-56
E-104         Nyati Patil    Vashi    A-67
E-105         Vijay Shirke   Pune     A-99

Candidate keys: {Employee Id}, {Name, City}, {Applicant Id}
Primary Key

• A primary key is a column or group of columns in a table that uniquely identifies every row in that table.

• Primary key values cannot be duplicated: the same value cannot appear more than once in the table.

• One of the candidate keys of a relation is chosen as its primary key.

• A table cannot have more than one primary key.


Primary Key

Properties of primary keys:

• A table can have only one primary key.
• Two rows can't have the same primary key value.
• Every row must have a primary key value.
• The primary key field cannot be null.
• The value in a primary key column should not be modified while a foreign key refers to it, unless the change is also applied to the referencing rows.

Employee Id   Name           City     Applicant Id
E-101         Amit Barve     Pune     A-15
E-102         Vijay Shirke   Mumbai   A-30
E-103         Anita Rathi    Thane    A-56
E-104         Nyati Patil    Vashi    A-67
E-105         Vijay Shirke   Pune     A-99
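A DBMS enforces these properties for you once a primary key is declared. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the table layout mirrors the slide, while the helper `try_insert` and the extra test rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, unlike most systems,
# allows NULL in a non-integer PRIMARY KEY column unless told otherwise.
conn.execute(
    "CREATE TABLE employee ("
    " employee_id TEXT PRIMARY KEY NOT NULL,"
    " name TEXT,"
    " city TEXT)"
)
conn.execute("INSERT INTO employee VALUES ('E-101', 'Amit Barve', 'Pune')")


def try_insert(values):
    """Insert one row and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", values)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert(("E-102", "Vijay Shirke", "Mumbai")))  # accepted
print(try_insert(("E-101", "Someone Else", "Thane")))   # rejected: duplicate key value
print(try_insert((None, "No Id", "Vashi")))             # rejected: null key value
```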
Alternate Key

• An alternate key is a column or group of columns that uniquely identifies every row in a table but was not chosen as the primary key.

• A table can have multiple choices for a primary key, but only one can be set as the primary key.

• All candidate keys other than the primary key are called alternate keys.
Alternate Key

Employee Id   Name           City     Applicant Id
E-101         Amit Barve     Pune     A-15
E-102         Vijay Shirke   Mumbai   A-30
E-103         Anita Rathi    Thane    A-56
E-104         Nyati Patil    Vashi    A-67
E-105         Vijay Shirke   Pune     A-99

Alternate keys: {Name, City}, {Applicant Id}


Composite key

• A composite key is a combination of two or more columns that uniquely identifies rows in a table.

• The combination of columns guarantees uniqueness, though no single column is guaranteed to be unique on its own.

• Hence, the columns are combined to uniquely identify records in a table.


Composite key

Employee Id   Project Id   No. of hrs. worked
E-102         P-1107       100
E-101         P-1107       125
E-102         P-1116       25
E-105         P-1170       200

Composite key: {Employee Id, Project Id}
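The same table can be declared with a two-column primary key to see that uniqueness applies to the pair, not to either column alone. This is a sketch with Python's `sqlite3`; the table name `works_on` and the column names are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE works_on ("
    " employee_id TEXT NOT NULL,"
    " project_id TEXT NOT NULL,"
    " hours_worked INTEGER,"
    " PRIMARY KEY (employee_id, project_id))"
)
rows = [
    ("E-102", "P-1107", 100),
    ("E-101", "P-1107", 125),  # project P-1107 repeats: allowed
    ("E-102", "P-1116", 25),   # employee E-102 repeats: allowed
    ("E-105", "P-1170", 200),
]
conn.executemany("INSERT INTO works_on VALUES (?, ?, ?)", rows)

# Repeating the full (employee, project) pair violates the composite key.
try:
    conn.execute("INSERT INTO works_on VALUES ('E-102', 'P-1107', 10)")
    duplicate_pair_accepted = True
except sqlite3.IntegrityError:
    duplicate_pair_accepted = False

row_count = conn.execute("SELECT COUNT(*) FROM works_on").fetchone()[0]
print(duplicate_pair_accepted, row_count)  # False 4
```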
Foreign Key

• A foreign key is a column (or group of columns) that creates a relationship between two tables.

• The purpose of foreign keys is to maintain data integrity and allow navigation between two different instances of an entity.

• It acts as a cross-reference between two tables because it references the primary key of another table.
Foreign Key

applicant

Applicant Id   Job Profile   Score
A-15           Sr. Manager   14
A-20           Team Lead     15
A-30           Manager       18
A-56           Engineer      16
A-67           Tester        12
A-99           Engineer      18

Primary key: Applicant Id

employee

Employee Id   Name           City     Applicant Id
E-101         Amit Barve     Pune     A-15
E-102         Vijay Shirke   Mumbai   A-30
E-103         Anita Rathi    Thane    A-56
E-104         Nyati Patil    Vashi    A-67
E-105         Vijay Shirke   Pune     A-99

Foreign key: Applicant Id (references the primary key of applicant)
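Referential integrity between the two tables can be tried out directly. The sketch below uses Python's `sqlite3`, where foreign keys are enforced only after `PRAGMA foreign_keys = ON`; the helper `insert_employee` and the ids E-900, E-901, and A-404 are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE applicant ("
    " applicant_id TEXT PRIMARY KEY NOT NULL,"
    " job_profile TEXT,"
    " score INTEGER)"
)
conn.execute(
    "CREATE TABLE employee ("
    " employee_id TEXT PRIMARY KEY NOT NULL,"
    " name TEXT,"
    " city TEXT,"
    " applicant_id TEXT REFERENCES applicant(applicant_id))"
)
conn.execute("INSERT INTO applicant VALUES ('A-15', 'Sr. Manager', 14)")
conn.execute("INSERT INTO employee VALUES ('E-101', 'Amit Barve', 'Pune', 'A-15')")


def insert_employee(values):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", values)
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_employee(("E-900", "Test Person", "Pune", "A-404")))  # False: no such applicant
print(insert_employee(("E-901", "Test Person", "Pune", None)))     # True: a null foreign key is allowed
```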
Quick Review
Keys
• A DBMS key is an attribute or set of attributes that helps you identify a row (tuple) in a relation (table).

• A super key is a group of one or more attributes that identifies rows in a table.

• A super key with no redundant attributes, that is, a minimal super key, is called a candidate key. A table can have multiple candidate keys.

• The primary key should be selected from the candidate keys.

• All candidate keys that are not chosen as the primary key are called alternate keys.

• A key that has multiple attributes to uniquely identify rows in a table is called a composite key.

• A primary key never accepts null values, while a foreign key may be null in multiple rows.

• Keys allow you to establish and identify relationships between tables.
Normalization
Database Tables and Normalization

• The table is the basic building block of database design.

• Ideally, the database design process explored in Entity Relationship (ER) Modeling yields good table structures.

• Yet, it is possible to create poor table structures even in a good database design.
    • How do you recognize a poor table structure, and
    • How do you produce a good table?

• The answer to both questions involves normalization.


Normalization

• Normalization is a process for evaluating and correcting table structures to minimize data
redundancies, thereby reducing the likelihood of data anomalies.

• Normalization works through a series of stages called normal forms.

• The first three stages are described as


• First normal form (1NF)

• Second normal form (2NF), and

• Third normal form (3NF).


The Need For Normalization

• Normalization is typically used in conjunction with entity relationship modeling.

• Consider the simplified database activities of a construction company that manages several building
projects.

• Each project has its own project number, name, assigned employees, and so on.

• Each employee has an employee number, name, and job classification, such as engineer or computer
technician.

• The company charges its clients by billing the hours spent on each contract.

• The hourly billing rate is dependent on the employee’s position.

• For example, one hour of computer technician time is billed at a different rate than one hour of
engineer time.
Employee Project Details: Sample Project Layout

Each project includes only a single occurrence of any one employee.

Project No  Project Name  Emp No  Emp Name          Job class             Charge per hr  Hours Billed  Total Charge
15          Evergreen     103     Amit Verma        Elect. Engineer       84.5           23.8          2011.1
                          101     Shubha Sinha      Database Designer     105            19.4          2037
                          105     Rupa Mahajan      Programmer            50             12.6          630
                          102     David             Database Designer     105            35.7          3748.5
                          106     Arav Patil        System Analyst        100            23.8          2380
18          AmberWave     114     Shaila Phatak     Application Designer  48.1           24.6          1183.26
                          118     Ameya Chavan      General Support       18.36          45.3          831.708
                          104     Reshma Singh      System Analyst        100            32.4          3240
                          112     Amrit Shet        DSS Analyst           45.95          44            2021.8
22          Rolling Tide  105     Rupa Mahajan      Programmer            50             47.5          2375
                          104     Reshma Singh      System Analyst        100            238.2         23820
                          113     Anna John         Application Designer  48.1           85.4          4107.74
                          111     Delbert           Clerical Support      26.87          34.3          921.641
                          106     Arav Patil        System Analyst        100            94.6          9460
25          Star Flight   107     Maria Jones       Programmer            50             24.6          1230
                          115     Travis Bawangi    System Analyst        100            45.8          4580
                          101     Shubha Sinha      Database Designer     105            56.3          5911.5
                          114     Shaila Phatak     Application Designer  48.1           33            1587.3
                          108     Ralph Washington  General Support       18.36          23.6          433.296
                          118     Ameya Chavan      General Support       18.36          30.5          559.98
                          112     Amrit Shet        DSS Analyst           45.95          41.2          1893.14
Impact of Data Redundancies
Employee Project Details - Sample Project Layout
• Update anomalies
    • Modifying the JOB_CLASS for employee number 105 requires many potential alterations, one for each EMP_NUM = 105.

• Insertion anomalies
    • Just to complete a row definition, a new employee must be assigned to a project.
    • If the employee is not yet assigned, a phantom project must be created to complete the employee data entry.

• Deletion anomalies
    • Suppose that only one employee is associated with a given project.
    • If that employee leaves the company and the employee data is deleted, the project information will also be deleted.
    • To prevent the loss of the project information, a fictitious employee must be created.
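The update anomaly is easy to reproduce in a few lines. In this sketch a handful of rows from the flat project layout are held in a Python list (the dictionary field names are assumptions), and an update reaches only one of the two rows for employee 105; the new job class "System Analyst" is a hypothetical change, not data from the slides.

```python
# Three rows of the flat project layout; employee 105 appears on two projects.
flat = [
    {"proj_num": 15, "emp_num": 105, "emp_name": "Rupa Mahajan", "job_class": "Programmer"},
    {"proj_num": 22, "emp_num": 105, "emp_name": "Rupa Mahajan", "job_class": "Programmer"},
    {"proj_num": 15, "emp_num": 101, "emp_name": "Shubha Sinha", "job_class": "Database Designer"},
]


def job_classes(rows, emp_num):
    """Collect every job class recorded for one employee."""
    return {row["job_class"] for row in rows if row["emp_num"] == emp_num}


before = job_classes(flat, 105)

# Hypothetical update that touches only the first of the two rows.
flat[0]["job_class"] = "System Analyst"
after = job_classes(flat, 105)

print(len(before), len(after))  # 1 2  (the employee now has two conflicting job classes)
```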
The Normalization Process

• The objective of normalization is to ensure that each table conforms to the concept of well-formed relations, in other words, tables that have the following characteristics:
    • Each table represents a single subject.
    • No data item will be unnecessarily stored in more than one table.
    • All nonprime attributes in a table are dependent on the primary key.
    • Each table is void of insertion, update, or deletion anomalies, which ensures the integrity and consistency of the data.
Normal Forms
Functional Dependence

• The attribute B is functionally dependent on the attribute A if each value of A determines one and only one value of B.

• Example: PROJ_NUM → PROJ_NAME


• PROJ_NUM functionally determines PROJ_NAME

• In this case, the attribute PROJ_NUM is known as the determinant attribute, and the attribute
PROJ_NAME is known as the dependent attribute.
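A functional dependency can be tested against sample rows by confirming that each determinant value is paired with exactly one dependent value. The helper below is a sketch; the name `holds` and the four sample rows (taken from the project layout, with lowercase attribute names) are assumptions. Passing on sample data only shows that the rows do not contradict the dependency; it does not prove the dependency for the schema.

```python
def holds(rows, lhs, rhs):
    """True if every lhs value maps to exactly one rhs value in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, val) != val:
            return False
    return True


ROWS = [
    {"proj_num": 15, "proj_name": "Evergreen", "emp_num": 103},
    {"proj_num": 15, "proj_name": "Evergreen", "emp_num": 101},
    {"proj_num": 18, "proj_name": "AmberWave", "emp_num": 114},
    {"proj_num": 25, "proj_name": "Star Flight", "emp_num": 101},
]

print(holds(ROWS, ["proj_num"], ["proj_name"]))  # True
print(holds(ROWS, ["proj_num"], ["emp_num"]))    # False: project 15 has two employees
print(holds(ROWS, ["emp_num"], ["proj_num"]))    # False: employee 101 is on projects 15 and 25
```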
Fully functional dependence (composite key)

• If attribute B is functionally dependent on a composite key A but not on any subset of that composite
key, the attribute B is fully functionally dependent on A.
Functional dependencies

Partial dependencies

• A partial dependency exists when there is a functional dependence in which the determinant is only part of the primary key (remember the assumption that there is only one candidate key).

• For example, if (A, B) is the primary key and B → C, then the functional dependence B → C is a partial dependency because only part of the primary key (B) is needed to determine the value of C.

Transitive dependencies

• A transitive dependency exists when there are functional dependencies such that X → Y, Y → Z, and X is the primary key.

• In that case, the dependency X → Z is a transitive dependency because X determines the value of Z via Y.
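Under the slides' assumption of a single candidate key, the kind of dependency can be read off from how the determinant relates to the primary key. The sketch below encodes that rule of thumb; the function `classify`, its labels, and the attribute names from the construction-company example are assumptions, and a determinant made only of nonkey attributes is treated as the signal of a transitive dependency.

```python
def classify(determinant, primary_key):
    """Label a dependency by comparing its determinant with the primary key."""
    det, key = set(determinant), set(primary_key)
    if det == key:
        return "full"        # the whole key is needed
    if det < key:
        return "partial"     # a proper subset of the key is enough
    if det.isdisjoint(key):
        return "transitive"  # a nonkey attribute determines another attribute
    return "other"


KEY = ("proj_num", "emp_num")

print(classify(["proj_num", "emp_num"], KEY))  # full
print(classify(["proj_num"], KEY))             # partial
print(classify(["emp_num"], KEY))              # partial
print(classify(["job_class"], KEY))            # transitive
```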
Conversion To First Normal Form

• Step 1: Eliminate the Repeating Groups

• Step 2: Identify the Primary Key

• Step 3: Identify All Dependencies

• 1NF (Project No, Emp No, Project Name, Emp Name, Job class, Charge per hr, Hours Billed, Total Charge)

• Full Dependency
    • (Project No, Emp No → Hours Billed)

• Partial Dependencies
    • (Project No → Project Name)
    • (Emp No → Emp Name, Job class, Charge per hr)

• Transitive Dependency
    • (Job class → Charge per hr)
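Step 1 above, eliminating the repeating groups, amounts to copying the project number and name down into every row that left them blank. A minimal sketch follows, assuming blanks are represented as `None` and that the first row always carries the group values; `fill_down` and the field names are illustrative.

```python
def fill_down(rows, group_cols):
    """Copy the last seen value of each group column into rows where it is None."""
    filled, last = [], {}
    for row in rows:
        new_row = dict(row)
        for col in group_cols:
            if new_row[col] is None:
                new_row[col] = last[col]  # assumes the first row is never blank
            else:
                last[col] = new_row[col]
        filled.append(new_row)
    return filled


# Four rows of the sample layout, with project cells blank on repeat rows.
REPORT = [
    {"proj_num": 15, "proj_name": "Evergreen", "emp_num": 103, "hours": 23.8},
    {"proj_num": None, "proj_name": None, "emp_num": 101, "hours": 19.4},
    {"proj_num": 18, "proj_name": "AmberWave", "emp_num": 114, "hours": 24.6},
    {"proj_num": None, "proj_name": None, "emp_num": 118, "hours": 45.3},
]

for row in fill_down(REPORT, ["proj_num", "proj_name"]):
    print(row["proj_num"], row["proj_name"], row["emp_num"], row["hours"])
```

After filling down, every row holds a value for (proj_num, emp_num), so that pair can serve as the primary key in Step 2.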
Employee Project Details
A table in first normal form
Project No  Project Name  Emp No  Emp Name          Job class             Charge per hr  Hours Billed  Total Charge
15          Evergreen     103     Amit Verma        Elect. Engineer       84.5           23.8          2011.1
15          Evergreen     101     Shubha Sinha      Database Designer     105            19.4          2037
15          Evergreen     105     Rupa Mahajan      Programmer            50             12.6          630
15          Evergreen     102     David             Database Designer     105            35.7          3748.5
15          Evergreen     106     Arav Patil        System Analyst        100            23.8          2380
18          AmberWave     114     Shaila Phatak     Application Designer  48.1           24.6          1183.26
18          AmberWave     118     Ameya Chavan      General Support       18.36          45.3          831.708
18          AmberWave     104     Reshma Singh      System Analyst        100            32.4          3240
18          AmberWave     112     Amrit Shet        DSS Analyst           45.95          44            2021.8
22          Rolling Tide  105     Rupa Mahajan      Programmer            50             47.5          2375
22          Rolling Tide  104     Reshma Singh      System Analyst        100            238.2         23820
22          Rolling Tide  113     Anna John         Application Designer  48.1           85.4          4107.74
22          Rolling Tide  111     Delbert           Clerical Support      26.87          34.3          921.641
22          Rolling Tide  106     Arav Patil        System Analyst        100            94.6          9460
25          Star Flight   107     Maria Jones       Programmer            50             24.6          1230
25          Star Flight   115     Travis Bawangi    System Analyst        100            45.8          4580
25          Star Flight   101     Shubha Sinha      Database Designer     105            56.3          5911.5
25          Star Flight   114     Shaila Phatak     Application Designer  48.1           33            1587.3
25          Star Flight   108     Ralph Washington  General Support       18.36          23.6          433.296
25          Star Flight   118     Ameya Chavan      General Support       18.36          30.5          559.98
25          Star Flight   112     Amrit Shet        DSS Analyst           45.95          41.2          1893.14
Conversion To Second Normal Form

• Conversion to 2NF occurs only when the 1NF table has a composite primary key.

• Step 1: Make New Tables to Eliminate Partial Dependencies

• Step 2: Reassign Corresponding Dependent Attributes

• A table is in second normal form (2NF) when:
    • It is in 1NF, and
    • It includes no partial dependencies; that is, no attribute is dependent on only a portion of the primary key.
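The two steps above can be sketched as relational projections: each new table keeps one determinant with its dependent attributes, and duplicate rows are dropped. The code below uses four rows of the 1NF table; the helper `project_onto`, the column names, and the variable names for the three resulting tables are assumptions made for this example.

```python
COLS = ("proj_num", "proj_name", "emp_num", "emp_name", "job_class", "chg_hour", "hours")

ROWS_1NF = [
    (15, "Evergreen", 103, "Amit Verma", "Elect. Engineer", 84.5, 23.8),
    (15, "Evergreen", 105, "Rupa Mahajan", "Programmer", 50, 12.6),
    (22, "Rolling Tide", 105, "Rupa Mahajan", "Programmer", 50, 47.5),
    (25, "Star Flight", 107, "Maria Jones", "Programmer", 50, 24.6),
]


def project_onto(rows, cols):
    """Keep only cols and drop duplicate rows, preserving first-seen order."""
    idx = [COLS.index(c) for c in cols]
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in rows))


# One table per determinant: proj_num, emp_num, and the full composite key.
projects = project_onto(ROWS_1NF, ["proj_num", "proj_name"])
employees = project_onto(ROWS_1NF, ["emp_num", "emp_name", "job_class", "chg_hour"])
assignments = project_onto(ROWS_1NF, ["proj_num", "emp_num", "hours"])

print(len(projects), len(employees), len(assignments))  # 3 3 4
```

Employee 105 now appears once in `employees` even though she works on two projects, which is what removes the update anomaly for her name and job class.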
Employee Project Details
Tables in second normal form

Project No  Project Name
15          Evergreen
18          AmberWave
22          Rolling Tide
25          Star Flight

Emp No  Emp Name          Job class             Charge per hr
103     Amit Verma        Elect. Engineer       84.5
101     Shubha Sinha      Database Designer     105
105     Rupa Mahajan      Programmer            50
102     David             Database Designer     105
106     Arav Patil        System Analyst        100
114     Shaila Phatak     Application Designer  48.1
118     Ameya Chavan      General Support       18.36
104     Reshma Singh      System Analyst        100
112     Amrit Shet        DSS Analyst           45.95
113     Anna John         Application Designer  48.1
111     Delbert           Clerical Support      26.87
107     Maria Jones       Programmer            50
115     Travis Bawangi    System Analyst        100
108     Ralph Washington  General Support       18.36

Project No  Emp No  Hours Billed
15          103     23.8
15          101     19.4
15          105     12.6
15          102     35.7
15          106     23.8
18          114     24.6
18          118     45.3
18          104     32.4
18          112     44
22          105     47.5
22          104     238.2
22          113     85.4
22          111     34.3
22          106     94.6
25          107     24.6
25          115     45.8
25          101     56.3
25          114     33
25          108     23.6
25          118     30.5
25          112     41.2
Conversion To Third Normal Form

• Step 1: Make New Tables to Eliminate Transitive Dependencies

• Step 2: Reassign Corresponding Dependent Attributes

• A table is in third normal form (3NF) when:
    • It is in 2NF, and
    • It contains no transitive dependencies.
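For the employee data, removing the transitive dependency Job class → Charge per hr means moving the rate into its own table keyed by job class. The sketch below does this for four employees; the variable names and the new rate of 55 are hypothetical and serve only to show that a rate change becomes a single update.

```python
# Employee table in 2NF: (emp_num, emp_name, job_class, chg_hour).
EMPLOYEE_2NF = [
    (103, "Amit Verma", "Elect. Engineer", 84.5),
    (105, "Rupa Mahajan", "Programmer", 50),
    (107, "Maria Jones", "Programmer", 50),
    (106, "Arav Patil", "System Analyst", 100),
]

# Step 1: a new table for the determinant job_class and its dependent chg_hour.
job = {job_class: rate for _, _, job_class, rate in EMPLOYEE_2NF}

# Step 2: the employee table keeps job_class only as a reference to the job table.
employee = [(num, name, job_class) for num, name, job_class, _ in EMPLOYEE_2NF]

# Hypothetical rate change: one update now covers every programmer.
job["Programmer"] = 55
rates = {num: job[job_class] for num, _, job_class in employee}

print(len(job), rates[105], rates[107])  # 3 55 55
```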


Employee Project Details
Tables in third normal form

Project No  Project Name
15          Evergreen
18          AmberWave
22          Rolling Tide
25          Star Flight

Emp No  Emp Name          Job class
103     Amit Verma        Elect. Engineer
101     Shubha Sinha      Database Designer
105     Rupa Mahajan      Programmer
102     David             Database Designer
106     Arav Patil        System Analyst
114     Shaila Phatak     Application Designer
118     Ameya Chavan      General Support
104     Reshma Singh      System Analyst
112     Amrit Shet        DSS Analyst
113     Anna John         Application Designer
111     Delbert           Clerical Support
107     Maria Jones       Programmer
115     Travis Bawangi    System Analyst
108     Ralph Washington  General Support

Job class             Charge per hr
Elect. Engineer       84.5
Database Designer     105
Programmer            50
System Analyst        100
Application Designer  48.1
General Support       18.36
DSS Analyst           45.95
Clerical Support      26.87

Project No  Emp No  Hours Billed
15          103     23.8
15          101     19.4
15          105     12.6
15          102     35.7
15          106     23.8
18          114     24.6
18          118     45.3
18          104     32.4
18          112     44
22          105     47.5
22          104     238.2
22          113     85.4
22          111     34.3
22          106     94.6
25          107     24.6
25          115     45.8
25          101     56.3
25          114     33
25          108     23.6
25          118     30.5
25          112     41.2
Higher-Level Normal Forms

The Boyce-Codd Normal Form

• A table is in Boyce-Codd normal form (BCNF) when every determinant in the table is a candidate key.

• Clearly, when a table contains only one candidate key, the 3NF and the BCNF are equivalent. In other words, BCNF can be violated only when the table contains more than one candidate key.

Fourth Normal Form (4NF)

• A table is in fourth normal form (4NF) when it is in 3NF and has no multivalued dependencies.
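The BCNF condition as worded on this slide, every determinant is a candidate key, translates into a one-line check once the determinants and candidate keys of a table have been listed by hand. The function `is_bcnf` below is a sketch of that wording only; it does not discover dependencies, and the attribute names are carried over from the project example as an assumption.

```python
def is_bcnf(determinants, candidate_keys):
    """True if every listed determinant is one of the candidate keys."""
    keys = {frozenset(k) for k in candidate_keys}
    return all(frozenset(d) in keys for d in determinants)


# Employee table before 3NF: job_class determines the hourly charge,
# but job_class is not a candidate key.
print(is_bcnf([{"emp_num"}, {"job_class"}], [{"emp_num"}]))  # False

# Employee table after 3NF: the only determinant left is the key itself.
print(is_bcnf([{"emp_num"}], [{"emp_num"}]))  # True
```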
Thank You
