Relational Database Model - Normalisation Part 1
FERRY KEMPERMAN
NANJING FOREIGN LANGUAGE SCHOOL
OCTOBER 2019
DATA ANALYSIS : A TWO STEP PROCESS
• In our previous lesson we saw how to analyze ‘raw data’ presented on a form using data
classification.
• Any form can be part of the information need that needs to be analyzed.
• Forms can be orders, receipts, bills, product lists, teacher schedules, student grades, report
cards, etcetera.
• Data analysis consists of two parts:
• Data classification (1) = determine the nature of data and decide if it needs to be stored or not.
• Normalization (2) = design a storage model based on the relationship between the data
• To put it simply: Step 1 is at data-level, Step 2 is at information-level (context)
PHYSICAL STORAGE OF DATA: HOW?
• Any data has to be stored on secondary memory (physical storage) in order to be accessible to
software that should be able to…..
• Retrieve the data (read the data from disk)
• Modify the data (read the data from disk, data is modified by user or software, write the modified
data to disk)
• Insert new data (write new data to disk)
• These basic functions must meet the following conditions:
• Data retrieval, modification and insertion need to be fast (performance).
• All data stored and retrieved needs to be correct (data integrity), consistent, up-to-date and complete at all times.
• Data needs to be available at all times for end-users.
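The three basic operations above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name, column names and the sample value are assumptions, and the in-memory database stands in for a file on disk.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; a real system would use a
# database file on secondary storage instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT)")

# Insert new data (write new data).
conn.execute(
    "INSERT INTO customer (customer_no, name) VALUES (?, ?)",
    (1, "Example Customer"),  # made-up sample value
)

# Retrieve the data (read).
before = conn.execute(
    "SELECT name FROM customer WHERE customer_no = ?", (1,)
).fetchone()[0]

# Modify the data (read, change in software, write back).
conn.execute(
    "UPDATE customer SET name = ? WHERE customer_no = ?",
    (before.upper(), 1),
)
conn.commit()

after = conn.execute(
    "SELECT name FROM customer WHERE customer_no = ?", (1,)
).fetchone()[0]
```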
FILE BASED STORAGE AND ITS PROBLEMS
From now on we will call all these different kinds of data attributes.
STEP 1: 0NF, ZERO NORMAL FORM
(UNOFFICIAL)
A. Create one group with all elementary data.
(OrderNo,CustomerNo,CustomerName,OrderDate,SalesRepNo,SalesRepGivenName,SalesRepSurName,
ProductNo,ProductDescription,ProductOrderQuantity,ProductUnitPrice,CustomerStreet,CustomerCity)
B. Create a repeating group (RG) within this group with the repeating data.
(OrderNo,CustomerNo,CustomerName,OrderDate,SalesRepNo,SalesRepGivenName,SalesRepSurName,
CustomerStreet,CustomerCity,RG(ProductNo,ProductDescription,ProductOrderQuantity,ProductUnitPrice))
STEP 1: 0NF, CONTINUED
C. Determine which attribute is the primary key for this form. A primary key is an
attribute or group of attributes that uniquely identifies this specific order form.
Ask yourself…..
“If you give me this attribute, I am able to give you this exact form from a pile of similar forms”.
Let’s try… If you give me a customer number, can I give you this form? No, because this customer can have more orders,
so a customer number will show up on more than one order form.
Let’s try again… If you give me a product number, can I give you this form? No, because a product can be ordered by more
customers, so it will show up on more than one order form.
Let’s try again… If you give me an order number, can I give you this form? Yes, because only this
order form has this order number: a unique number!
OrderNo is the primary key for this form. Add it to the group, put it in the front and underline it.
0NF(OrderNo,CustomerNo,CustomerName,OrderDate,SalesRepNo,SalesRepGivenName,
SalesRepSurName,CustomerStreet,CustomerCity,RG(ProductNo,ProductDescription,
ProductOrderQuantity,ProductUnitPrice))
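The "pile of forms" test can be sketched in Python. The forms below are hypothetical (all values are made up), and the helper names are assumptions chosen for this illustration. A sample pile can only show that an attribute fails as a key; whether an attribute is truly unique follows from the rules of the organization.

```python
from collections import Counter

# Hypothetical pile of order forms in 0NF; "RG" holds the repeating group.
forms = [
    {"OrderNo": 1001, "CustomerNo": 7, "RG": [{"ProductNo": 501}, {"ProductNo": 502}]},
    {"OrderNo": 1002, "CustomerNo": 7, "RG": [{"ProductNo": 501}]},
    {"OrderNo": 1003, "CustomerNo": 9, "RG": [{"ProductNo": 503}]},
]

def values_on_form(form, attribute):
    """Return the distinct values of `attribute` that appear on one form."""
    if attribute in form:
        return {form[attribute]}
    return {line[attribute] for line in form["RG"]}

def identifies_one_form(forms, attribute):
    """True if every value of `attribute` shows up on exactly one form."""
    forms_per_value = Counter()
    for form in forms:
        forms_per_value.update(values_on_form(form, attribute))
    return all(count == 1 for count in forms_per_value.values())

results = {
    name: identifies_one_form(forms, name)
    for name in ("CustomerNo", "ProductNo", "OrderNo")
}
```

In this sample pile only OrderNo passes the test: customer 7 and product 501 each appear on two forms.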
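STEP 2: 1NF, FIRST NORMAL FORM
• Start with 0NF and remove the repeating group: place it in a separate group whose key is the
primary key of the original group combined with the attribute that identifies one repetition (here ProductNo).
STEP 3: 2NF, SECOND NORMAL FORM
• Start with 1NF and apply the following rule to create 2NF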
• Look at groups that have a composite key (two or more
attributes underlined).
• Determine whether each non-key attribute in the group depends on
the entire key or on only one attribute of the key.
• If it depends on only one attribute, create a separate group with
this attribute as primary key and remove the dependent
attribute from the original group.
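The dependency check in this rule can be sketched as a small Python helper. The rows are hypothetical sample data for a group with the composite key (OrderNo, ProductNo), and the function names are assumptions. Sample data can only disprove a dependency; the real decision comes from knowing the business rules.

```python
# Hypothetical rows of a group with composite key (OrderNo, ProductNo).
rows = [
    {"OrderNo": 1001, "ProductNo": 501, "ProductDescription": "Item A",
     "ProductOrderQuantity": 2, "ProductUnitPrice": 10.0},
    {"OrderNo": 1001, "ProductNo": 502, "ProductDescription": "Item B",
     "ProductOrderQuantity": 1, "ProductUnitPrice": 25.0},
    {"OrderNo": 1002, "ProductNo": 501, "ProductDescription": "Item A",
     "ProductOrderQuantity": 5, "ProductUnitPrice": 10.0},
]

def depends_on(rows, determinant, attribute):
    """True if each value of `determinant` goes with exactly one value of `attribute`."""
    seen = {}
    for row in rows:
        key = tuple(row[name] for name in determinant)
        if seen.setdefault(key, row[attribute]) != row[attribute]:
            return False
    return True

def partial_dependencies(rows, key, non_key):
    """For each non-key attribute, list the single key attributes it depends on."""
    return {
        attribute: [part for part in key if depends_on(rows, (part,), attribute)]
        for attribute in non_key
    }

partial = partial_dependencies(
    rows,
    key=("OrderNo", "ProductNo"),
    non_key=("ProductDescription", "ProductOrderQuantity", "ProductUnitPrice"),
)
```

An empty list means the attribute needs the entire key and stays in the group; a non-empty list names the key attribute that should become the primary key of a new group.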
2NF: RESULT
• Let’s analyze…..
• 1NF
• ORDER(OrderNo,CustomerNo,CustomerName,OrderDate,SalesRepNo,SalesRepGivenName,SalesRepSurName,
CustomerStreet,CustomerCity)
• ORDERDETAILS(OrderNo,ProductNo,ProductDescription, ProductOrderQuantity,ProductUnitPrice)
• For 2NF we only need to look at groups that have a composite key, which is ORDERDETAILS
• The first non-key attribute is ProductDescription. We need to determine if ProductDescription is dependent on:
• The entire key (OrderNo,ProductNo), on OrderNo only, or on ProductNo only.
• Let’s try: If you give me OrderNo, can I give you the ProductDescription? No, since an order can have multiple
products.
• If you give me ProductNo, can I give you the ProductDescription? Yes, because a product has a unique number!
• So we create a new group for ProductDescription with ProductNo as primary key.
• PRODUCT(ProductNo,ProductDescription)
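Splitting off the new group amounts to projecting the original rows onto fewer attributes and dropping duplicates. A minimal sketch in Python, using hypothetical ORDERDETAILS rows (all values are made up) and an assumed helper named project:

```python
# Hypothetical ORDERDETAILS rows in 1NF.
order_details_1nf = [
    {"OrderNo": 1001, "ProductNo": 501, "ProductDescription": "Item A",
     "ProductOrderQuantity": 2, "ProductUnitPrice": 10.0},
    {"OrderNo": 1001, "ProductNo": 502, "ProductDescription": "Item B",
     "ProductOrderQuantity": 1, "ProductUnitPrice": 25.0},
    {"OrderNo": 1002, "ProductNo": 501, "ProductDescription": "Item A",
     "ProductOrderQuantity": 5, "ProductUnitPrice": 10.0},
]

def project(rows, attributes):
    """Keep only `attributes` of each row and drop duplicate rows, preserving order."""
    result = []
    for row in rows:
        projected = {name: row[name] for name in attributes}
        if projected not in result:
            result.append(projected)
    return result

# New group: ProductNo is the primary key, ProductDescription depends on it.
product = project(order_details_1nf, ["ProductNo", "ProductDescription"])

# Original group without the attribute that was taken out.
order_details = project(
    order_details_1nf,
    ["OrderNo", "ProductNo", "ProductOrderQuantity", "ProductUnitPrice"],
)
```

The description "Item A" was stored twice in the 1NF rows and is stored once in the new group, which is the redundancy that this step removes.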
2NF: CONTINUED
• ORDER(OrderNo,CustomerNo,OrderDate,SalesRepNo)
• ORDERDETAILS(OrderNo,ProductNo,ProductOrderQuantity)
• PRODUCT(ProductNo,ProductDescription,ProductUnitPrice)
• CUSTOMER(CustomerNo,CustomerName,CustomerStreet,CustomerCity)
• SALESREP(SalesRepNo, SalesRepGivenName,SalesRepSurName)
• This relational database model is in 3NF, the final stage here. There are a few more normal forms, but they are beyond the
scope of this course.
• The groups represent database tables; the attributes in a group are the columns of the table.
• As we can clearly see, our design contains 5 tables. The next step is to determine the nature of the relationships
between these tables by designing a so-called ERD.
• Based on the 3NF we can now create a conceptual database design called an ERD, an entity relationship diagram.
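As an illustration of how the five groups could become tables, here is a sketch using SQLite through Python's sqlite3 module. The column types, the NOT NULL choices and the foreign key references are assumptions that the groups themselves do not specify, and the sketch assumes CustomerName is stored only in CUSTOMER.

```python
import sqlite3

SCHEMA = """
CREATE TABLE CUSTOMER (
    CustomerNo     INTEGER PRIMARY KEY,
    CustomerName   TEXT NOT NULL,
    CustomerStreet TEXT,
    CustomerCity   TEXT
);
CREATE TABLE SALESREP (
    SalesRepNo        INTEGER PRIMARY KEY,
    SalesRepGivenName TEXT,
    SalesRepSurName   TEXT
);
CREATE TABLE PRODUCT (
    ProductNo          INTEGER PRIMARY KEY,
    ProductDescription TEXT,
    ProductUnitPrice   REAL
);
-- ORDER is a reserved word in SQL, so the table name is quoted.
CREATE TABLE "ORDER" (
    OrderNo    INTEGER PRIMARY KEY,
    CustomerNo INTEGER NOT NULL REFERENCES CUSTOMER(CustomerNo),
    OrderDate  TEXT,
    SalesRepNo INTEGER REFERENCES SALESREP(SalesRepNo)
);
-- Composite primary key: one row per product per order.
CREATE TABLE ORDERDETAILS (
    OrderNo              INTEGER NOT NULL REFERENCES "ORDER"(OrderNo),
    ProductNo            INTEGER NOT NULL REFERENCES PRODUCT(ProductNo),
    ProductOrderQuantity INTEGER,
    PRIMARY KEY (OrderNo, ProductNo)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```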
ENTITY RELATIONSHIP DIAGRAM (ERD)
• An Entity Relationship Diagram is a visual representation of your database design, usually
made after normalization.
• Sometimes an ERD is made before normalization, based on a situation sketch of an
organization.
• An ERD consists of three parts:
• Entities (rectangles in ERD) that act in the reality that you are modelling, e.g. customer,
booking, order etcetera. Entities correspond with tables! Entities are always nouns. They can
act or be acted upon!
• Relationships between entities (shown as lines between rectangles in ERD) that represent
any real relationship that can be described by a sentence with a verb between two entities in
this form: <entity1> <verb> <entity2>.
• The cardinality of the relationship (shown as fixed symbols on both sides of the line): how
many times can one entity act on another one. It describes the nature of the relationship
between entities. Possible cardinalities are 1:1, 1:M, M:1 and M:M; the latter is not permitted.
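The three parts of an ERD can be written down as plain data. The sketch below is one possible way to do that in Python; the class name, the verbs and the cardinalities assigned to each relationship are assumptions made for illustration. It rejects M:M, and it shows how an order with many products is expressed as two 1:M relationships that meet in ORDERDETAILS.

```python
from dataclasses import dataclass

ALLOWED_CARDINALITIES = ("1:1", "1:M", "M:1")

@dataclass(frozen=True)
class Relationship:
    entity1: str
    verb: str
    entity2: str
    cardinality: str

    def __post_init__(self):
        # M:M is not permitted, so it is rejected together with any typo.
        if self.cardinality not in ALLOWED_CARDINALITIES:
            raise ValueError(f"cardinality {self.cardinality!r} is not permitted")

    def sentence(self):
        """Render the relationship as <entity1> <verb> <entity2>."""
        return f"{self.entity1} {self.verb} {self.entity2} ({self.cardinality})"

# Assumed verbs and cardinalities for the five tables of the order example.
relationships = [
    Relationship("CUSTOMER", "places", "ORDER", "1:M"),
    Relationship("SALESREP", "handles", "ORDER", "1:M"),
    Relationship("ORDER", "contains", "ORDERDETAILS", "1:M"),
    Relationship("PRODUCT", "appears in", "ORDERDETAILS", "1:M"),
]

sentences = [relationship.sentence() for relationship in relationships]
```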
ERD BASED ON OUR 3NF
• ORDER(OrderNo,CustomerNo,OrderDate,SalesRepNo)
• ORDERDETAILS(OrderNo,ProductNo,ProductOrderQuantity)
• PRODUCT(ProductNo,ProductDescription,ProductUnitPrice)
• CUSTOMER(CustomerNo,CustomerName,CustomerStreet,CustomerCity)
• SALESREP(SalesRepNo, SalesRepGivenName,SalesRepSurName)
• A table represents an entity, an object from the real world. In this case: customers.
• Attributes (properties of the entity) are represented in the columns.
• Every row (called a record or a tuple in database jargon, not a row!) represents ONE customer!
• The first column (or columns in case of a composite key) is the primary key.
• The primary key values are unique per record and mandatory (cannot be null).
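The two primary key rules can be observed in a small experiment with Python's sqlite3 module. The table layout and the sample values are assumptions. NOT NULL is written out explicitly because SQLite, unlike most DBMSs, does not automatically forbid NULL in a primary key column that is not an INTEGER PRIMARY KEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER (CustomerNo TEXT NOT NULL PRIMARY KEY, CustomerName TEXT)"
)
conn.execute("INSERT INTO CUSTOMER VALUES (?, ?)", ("C001", "First Customer"))

def try_insert(customer_no, name):
    """Attempt to insert one record and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO CUSTOMER VALUES (?, ?)", (customer_no, name))
        return "accepted"
    except sqlite3.IntegrityError as error:
        return f"rejected: {error}"

duplicate_result = try_insert("C001", "Second Customer")  # key value already exists
null_result = try_insert(None, "Customer Without Key")    # key value is missing
new_result = try_insert("C002", "Second Customer")        # new, unique key value

record_count = conn.execute("SELECT COUNT(*) FROM CUSTOMER").fetchone()[0]
```

Only the records with a new, non-null key value end up in the table.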
Let’s have a look at this simple relational database design that is populated (meaning: it has data, records). It consists
of a customers table and an orders table.
- How many orders did Customer Brown place?
- Did William order Jeans?
- Did Ronchi order a Computer and a Mug?
- How many orders did Ronchi place?
- Who has ordered the Shorts?
CONCEPTUAL / LOGICAL / PHYSICAL DATABASE DESIGN
• Database design is always based on a real world scenario and set in a specific domain.
• Information Analysis leads to a conceptual design of your database: it identifies the major entities
(order, customer, suppliers, products, etc) and describes the relationships between these entities. You
are modelling reality without worrying about the technical database implementation. An ERD is a
conceptual design.
• A conceptual design leads to a logical design. In a logical design (also represented as an ERD) you add
more information on the attributes: whether they are mandatory or not (e.g. CustomerPhone), the
datatypes of the attributes (these act as constraints!) and the cardinality of the relationships. The logical model
implements the business rules of your domain, e.g. a Productcode consists of 5 alphanumerical chars.
• A physical design is the actual implementation of the ERD in a DBMS like SQL Server.
DIFFERENCES ARE NOT FIXED!
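The Productcode business rule mentioned above can be turned into an executable check. The sketch below reads "alphanumerical" as the letters A to Z in either case plus the digits 0 to 9, which is an assumption; the function name and the sample codes are also made up. In a physical design the same rule would typically be enforced by the DBMS itself.

```python
import re

# Exactly 5 characters, each a letter (A-Z, a-z) or a digit (0-9).
PRODUCT_CODE_PATTERN = re.compile(r"[A-Za-z0-9]{5}")

def is_valid_product_code(code):
    """True if `code` is a string that satisfies the 5-alphanumerical-chars rule."""
    return isinstance(code, str) and PRODUCT_CODE_PATTERN.fullmatch(code) is not None

checks = {
    code: is_valid_product_code(code)
    for code in ("AB123", "AB12", "AB-12", "ABC1234")
}
```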