Database Normalization
StudentID Course
12345 3100,3600,3900
54321 1300,2300,1200
In this table, the Course field is a multi-valued attribute: a single field holds
several values instead of exactly one.
The proper way to store this data follows, with one course per row. First Normal
Form is satisfied.
StudentID Course
12345 3100
12345 3600
12345 3900
54321 1300
54321 2300
54321 1200
In the first two designs, selecting the students enrolled in a certain course is
difficult. Say I want to do the following:
Tell me all of the students enrolled in course 3100. In the first design, you'll have
to pull all of the course data and parse it somehow. In the second design,
you'll have to check 3 different fields for course 3100. In the final design, a simple
query is all that is needed:
Select StudentID from StudentCourses where Course=3100
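The difference between the first and final designs can be tried out with a small
sketch. It uses Python's built-in sqlite3 module, which is an assumption of this
example and not something the notes prescribe. The table name StudentCoursesFlat
and the column types are hypothetical; the data rows are the ones shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# First design: every course for a student is packed into one text field.
# (StudentCoursesFlat is a hypothetical name used only for this sketch.)
conn.execute("CREATE TABLE StudentCoursesFlat (StudentID INTEGER, Course TEXT)")
conn.executemany(
    "INSERT INTO StudentCoursesFlat VALUES (?, ?)",
    [(12345, "3100,3600,3900"), (54321, "1300,2300,1200")],
)

# Final design: one row per (student, course) pair.
conn.execute("CREATE TABLE StudentCourses (StudentID INTEGER, Course INTEGER)")
conn.executemany(
    "INSERT INTO StudentCourses VALUES (?, ?)",
    [(12345, 3100), (12345, 3600), (12345, 3900),
     (54321, 1300), (54321, 2300), (54321, 1200)],
)

# First design: pull every row and parse the course list in application code.
flat_matches = [
    student_id
    for student_id, courses in conn.execute(
        "SELECT StudentID, Course FROM StudentCoursesFlat"
    )
    if "3100" in courses.split(",")
]

# Final design: the database does the filtering with the query from the notes.
normalized_matches = [
    row[0]
    for row in conn.execute(
        "SELECT StudentID FROM StudentCourses WHERE Course = 3100"
    )
]

print(flat_matches, normalized_matches)
```

Both approaches find the same student, but only the final design lets the
database do the work without application-side parsing.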
Second Normal Form requires that every non-key field be dependent upon the
entire key, not just part of it. For example, consider the StudentCourses table,
where StudentID and CourseID form a compound primary key.
The StudentName field does not depend at all on CourseID, but only on
StudentID. CourseLocation has no dependency on StudentID, but only on CourseID.
Students Table
StudentID Name
12345 April
Courses Table
CourseID CourseLocation
3100 Math Building
1300 Science Building
StudentCourses Table
In this example, Grade is the only field dependent on the combination of
StudentID and CourseID, so it is the only non-key field that stays in the
StudentCourses table.
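One way the three-table design might be declared is sketched below, again using
Python's sqlite3 module as an assumption. The column types, the NOT NULL and
REFERENCES constraints, and the grade value 'A' are illustrative guesses; the
table names, column names, and other data come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (
    StudentID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL          -- depends only on StudentID
);
CREATE TABLE Courses (
    CourseID       INTEGER PRIMARY KEY,
    CourseLocation TEXT NOT NULL     -- depends only on CourseID
);
CREATE TABLE StudentCourses (
    StudentID INTEGER NOT NULL REFERENCES Students(StudentID),
    CourseID  INTEGER NOT NULL REFERENCES Courses(CourseID),
    Grade     TEXT,                  -- depends on the whole compound key
    PRIMARY KEY (StudentID, CourseID)
);
INSERT INTO Students VALUES (12345, 'April');
INSERT INTO Courses VALUES (3100, 'Math Building'), (1300, 'Science Building');
-- The grade 'A' is made up for illustration.
INSERT INTO StudentCourses VALUES (12345, 3100, 'A');
""")

# A join reassembles the original wide row whenever it is needed.
report = conn.execute("""
    SELECT s.Name, c.CourseLocation, sc.Grade
    FROM StudentCourses AS sc
    JOIN Students AS s ON s.StudentID = sc.StudentID
    JOIN Courses  AS c ON c.CourseID  = sc.CourseID
""").fetchall()

print(report)
```

Each name and each location is stored exactly once, and the join recovers the
combined view on demand.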
Let's suppose that in the first table design, the first row of data was entered with
a StudentName of Aprok, a simple typo. Now, suppose SQL is run that deletes
April's rows by matching on StudentName.
The erroneous "Aprok" row will not be deleted. However, in the final design,
SQL that matches on StudentID
will delete every course that April was in.
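The exact SQL statements are not reproduced here, so the following is only a
sketch of the kind of delete being described, written with Python's sqlite3
module. The table name StudentCoursesOld, the column types, and both DELETE
statements are assumptions chosen to match the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original design: the student's name is repeated on every enrollment row.
# (StudentCoursesOld is a hypothetical name used only for this sketch.)
conn.execute(
    "CREATE TABLE StudentCoursesOld "
    "(StudentID INTEGER, CourseID INTEGER, StudentName TEXT)"
)
conn.executemany(
    "INSERT INTO StudentCoursesOld VALUES (?, ?, ?)",
    [(12345, 3100, "Aprok"),   # the typo from the notes
     (12345, 3600, "April"),
     (12345, 3900, "April")],
)

# Deleting by name silently misses the misspelled row.
conn.execute("DELETE FROM StudentCoursesOld WHERE StudentName = 'April'")
left_behind = conn.execute(
    "SELECT StudentID, CourseID, StudentName FROM StudentCoursesOld"
).fetchall()

# Final design: enrollments carry only the keys, so the delete uses the ID.
conn.execute(
    "CREATE TABLE StudentCourses "
    "(StudentID INTEGER, CourseID INTEGER, PRIMARY KEY (StudentID, CourseID))"
)
conn.executemany(
    "INSERT INTO StudentCourses VALUES (?, ?)",
    [(12345, 3100), (12345, 3600), (12345, 3900)],
)
conn.execute("DELETE FROM StudentCourses WHERE StudentID = 12345")
remaining = conn.execute("SELECT COUNT(*) FROM StudentCourses").fetchone()[0]

print(left_behind, remaining)
```

In the original design the "Aprok" row survives the delete; in the final design
no enrollment rows for student 12345 remain.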
The professor is uniquely identified by the CourseID and Section of the course.
However, ProfessorName depends on ProfessorID and has no relation to
CourseID or Section.
Professors Table
ProfessorID ProfessorName
6789 David
CourseSections Table
By splitting the data into two tables, the transitive dependency is removed.
Keeping the original design of the CourseSections table introduces the chance
that ProfessorName may be corrupted. Perhaps on the second row, the
ProfessorName is entered as Davif, a simple typo. Since there is no such
professor as Davif, there is a problem.
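A possible shape for the two-table design is sketched below with Python's
sqlite3 module. The column types, the foreign key constraint, the section
numbers 1 and 2, and the nonexistent ProfessorID 9999 are assumptions made for
illustration; the table and column names and the professor row come from the
notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Professors (
    ProfessorID   INTEGER PRIMARY KEY,
    ProfessorName TEXT NOT NULL       -- stored once per professor
);
CREATE TABLE CourseSections (
    CourseID    INTEGER NOT NULL,
    Section     INTEGER NOT NULL,
    ProfessorID INTEGER NOT NULL REFERENCES Professors(ProfessorID),
    PRIMARY KEY (CourseID, Section)
);
INSERT INTO Professors VALUES (6789, 'David');
-- Section numbers 1 and 2 are made up for illustration.
INSERT INTO CourseSections VALUES (3100, 1, 6789), (3100, 2, 6789);
""")

# Every section resolves to the single stored spelling of the name.
names = conn.execute("""
    SELECT DISTINCT p.ProfessorName
    FROM CourseSections AS cs
    JOIN Professors AS p ON p.ProfessorID = cs.ProfessorID
""").fetchall()

# A section that points at a professor who does not exist is rejected.
try:
    conn.execute("INSERT INTO CourseSections VALUES (3600, 1, 9999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(names, rejected)
```

Because CourseSections no longer carries ProfessorName, a typo such as Davif
cannot appear on one row and disagree with another; the name lives in exactly
one place.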