Relational Database Model - Normalisation Part 1

This document discusses the process of data analysis and physical storage of data in a database. It describes how raw data can be classified and normalized in order to design a relational database for storage. Specifically, it provides an example of classifying data from a sample order form and performing normalization steps to transform the data into first normal form. The goal of normalization is to structure the data in a way that eliminates redundancy and dependencies while ensuring data integrity and efficient retrieval. The example shows how the raw data can be organized into attributes and relationships to design tables for a relational database that meets these goals.

Uploaded by

Ferry Kemperman

RELATIONAL DATABASE

MODEL
FERRY KEMPERMAN
NANJING FOREIGN LANGUAGE SCHOOL
OCTOBER 2019
DATA ANALYSIS : A TWO STEP PROCESS

• In our previous lesson we saw how to analyze ‘raw data’ presented on a form using data classification.
• Any form can be part of the information need that has to be analyzed.
• Forms can be orders, receipts, bills, product lists, teacher schedules, student grades, report cards, etcetera.
• Data analysis consists of two parts:
• Data classification (1) = determine the nature of the data and decide whether it needs to be stored.
• Normalization (2) = design a storage model based on the relationships between the data.
• To put it simply: Step 1 is at data level, Step 2 is at information level (context).
PHYSICAL STORAGE OF DATA: HOW?

• Any data has to be stored in secondary memory (physical storage) so that it is accessible to software, which should be able to:
• Retrieve the data (read the data from disk)
• Modify the data (read the data from disk, let the user or software modify it, write the modified data back to disk)
• Insert new data (write new data to disk)
• These basic functions have to meet the following conditions:
• Data retrieval, modification and insertion need to be fast (performance).
• All data stored and retrieved needs to be correct, consistent, up-to-date and complete at all times (data integrity).
• Data needs to be available to end-users at all times.
FILE BASED STORAGE AND ITS PROBLEMS

• The simplest solution would be storing data in a flat file.
• A flat file is a text file (.txt) that technically just stores ASCII characters one after another (sequential storage).
• All data that you want to store has to be structured per line. Every line contains the same kind of data.
• Per line in the text file you define a fixed layout: the first 5 characters represent a teacher name, the next 10 characters represent the subject name, etcetera.
• Let’s see what this looks like.
• Consider the following data collection as a result of data analysis in a school: SubjectCode, Curriculum, TeacherSurName, StudentFirstName, StudentSurName, Classroom.
• We want to store all this data in a flat file. What does that look like?
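As a minimal sketch of such a flat file in Python: the field widths, the sample record and the helper names `format_record` and `parse_record` are assumptions chosen for illustration. Only the idea of fixed character positions per field comes from the text.

```python
# Hypothetical fixed-width layout: (field name, width in characters).
# The widths are illustrative assumptions, not taken from the slides.
LAYOUT = [
    ("SubjectCode", 7),
    ("Curriculum", 6),
    ("TeacherSurName", 10),
    ("StudentFirstName", 10),
    ("StudentSurName", 10),
    ("Classroom", 4),
]


def format_record(record):
    """Pad (or truncate) every field to its fixed width and join them into one line."""
    return "".join(record[name][:width].ljust(width) for name, width in LAYOUT)


def parse_record(line):
    """Cut a line back into fields purely by character position."""
    record, start = {}, 0
    for name, width in LAYOUT:
        record[name] = line[start:start + width].rstrip()
        start += width
    return record


# Made-up sample record.
sample = {
    "SubjectCode": "CS101",
    "Curriculum": "IB",
    "TeacherSurName": "Kemperman",
    "StudentFirstName": "Anna",
    "StudentSurName": "Smith",
    "Classroom": "B203",
}
line = format_record(sample)
print(repr(line))
print(parse_record(line))
```

Note that `parse_record` only works as long as `LAYOUT` matches the file exactly. Changing a width means changing the code that reads the file.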
THE PROBLEMS WITH A FLAT FILE
• A flat file has a fixed number of characters and a fixed position for each piece of data. If software reads data from this file, it needs to know that SubjectCode is stored in positions 1 to 7. This creates a multitude of problems:
• Data storage should ALWAYS be independent of software. Changing the structure of the file should never lead to rewriting part of the software using the file.
• Changing the length of SubjectCode to 9 cannot be done without updating the software. Why not?
• This data set is not correct, complete and consistent.
• Data is missing (teacher name, student name), data is wrong (Joey’s surname is not Zhang), data is incomplete (student’s surname P), data is inconsistent: Tonia is the name of a person, not a subject code, and ---- and E407 are not valid classrooms. No input validation is done. Validation should be done in both software and storage, but flat files cannot do this.
• Any list printed based on this dataset is useless.
• A lot of data is stored more than once, e.g. the same name of the teacher is stored every time you enter a new student!
• Duplicate or redundant data is unwanted, because:
• If you update data, you need to update all duplicates.
• There is a risk of inconsistency if you do not update all duplicates.
• It is an unnecessary use of storage, since the teacher name is the same for all CS students, so “Kemperman” should only be stored once! But this is impossible in a flat file. Why?
PROBLEMS TO BE SOLVED
• No data dependency: data storage should be independent of the software accessing it, for two major reasons:
• - Changes made to the way data is stored should not lead to code updates in the software that accesses the data.
• - Data is used by a variety of software, not just one program.
• No data redundancy: the same data should be stored only once, to avoid data integrity problems and wasted resources (storage and processing).
• Data integrity needs to be ensured: all data that is stored needs to be a reflection of reality and therefore needs to be complete, consistent, up-to-date and correct. This is mainly done by input validation in client software, validation by the database at storage level, and constraints set at database level. You are simply not allowed to store ‘wrong’ data!
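The redundancy problem can be illustrated with a small Python sketch. All rows and names below are made up for illustration. The first half mimics a flat file with the teacher name repeated per student, the second half stores the teacher once per subject.

```python
# Made-up flat-file rows: the teacher name is repeated for every student.
flat_rows = [
    {"SubjectCode": "CS", "TeacherSurName": "Kemperman", "StudentFirstName": "Anna"},
    {"SubjectCode": "CS", "TeacherSurName": "Kemperman", "StudentFirstName": "Ben"},
    {"SubjectCode": "CS", "TeacherSurName": "Kemperman", "StudentFirstName": "Chen"},
]

# An update that misses one duplicate leaves the data inconsistent.
for row in flat_rows[:2]:
    row["TeacherSurName"] = "Smith"
teachers_in_flat_file = {row["TeacherSurName"] for row in flat_rows}
print(teachers_in_flat_file)  # two different teachers for the same subject

# Storing the teacher once per subject removes the duplicates:
# each student row only refers to the subject code.
subjects = {"CS": {"TeacherSurName": "Kemperman"}}
students = [
    {"StudentFirstName": "Anna", "SubjectCode": "CS"},
    {"StudentFirstName": "Ben", "SubjectCode": "CS"},
    {"StudentFirstName": "Chen", "SubjectCode": "CS"},
]
subjects["CS"]["TeacherSurName"] = "Smith"  # one update, seen by every student
teachers_after_update = {subjects[s["SubjectCode"]]["TeacherSurName"] for s in students}
print(teachers_after_update)
```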
RELATIONAL DATABASE MODEL

• Edgar Codd (1923-2003), a famous computer scientist who worked at IBM, recognized all these problems and invented a theoretical design model for storing data in a structured way that would solve these issues.
• This model is known as the relational database model.
• A well-designed relational database avoids redundant data, validates data strongly by using constraints (rules that technically prevent storing wrong data) and supports fast retrieval and modification of data (performance).
• A relational database design consists of tables that store the actual data and relations that describe the logical connection between the data in the tables, hence the name relational database design.
RELATIONAL DATABASE DESIGN: TWO STAGES
• Be careful here!
• Relational database design is a theoretical process done on paper, using data classification and normalization, that focuses on how the data is used and related in reality (within the organization, by end-users). The information analyst interviews people to make this design! This is called the database design at conceptual level.
• After this design is made and verified by the organization, it is implemented. This means that database designers and programmers actually build the design in specialized software called a DBMS (DataBase Management System). This is the database design at internal level.
• SQL Server (Microsoft) and Oracle are two widely used DBMSs in the industry.
• DBMS software allows DBAs (DataBase Administrators, technical database experts) to create relational databases according to a fixed design, make backups of the databases, set up connections between software and databases, tune databases (improve their performance), etc.
LET’S DESIGN A RELATIONAL DATABASE!
• The information need was the following form.
• Step 1: Classify all data present on this form.
• Constant data: all labels, everything in bold font
• Composite data: SalesRepName (SalesRepGivenName, SalesRepSurName), CustomerAddress (CustomerStreet, CustomerCity)
• Process data: TotalPriceProductOrder, TotalPriceOrder
• Elementary data: OrderNo, CustomerNo, CustomerName, OrderDate, SalesRepNo, SalesRepGivenName, SalesRepSurName, ProductNo, ProductDescription, ProductOrderQuantity, ProductUnitPrice, CustomerStreet, CustomerCity
• Repetitive data: ProductNo, ProductDescription, ProductOrderQuantity, ProductUnitPrice
STEP 2: NORMALISATION

• After data classification we need to determine which data is related to each other. This will result in a relational database design that meets the requirements (no redundancy, no dependency, ensured data integrity).
• We do this by applying Edgar Codd’s normalization method.
• Normalization is a structured way to design your relational database by applying four (in reality more) basic steps that result in so-called normal forms.
WE START FROM OUR DATA CLASSIFICATION
• Constant data: all labels, everything in bold font
• Composite data: SalesRepName (SalesRepGivenName, SalesRepSurName), CustomerAddress (CustomerStreet, CustomerCity)
• Process data: TotalPriceProductOrder, TotalPriceOrder
• Elementary data: OrderNo, CustomerNo, CustomerName, OrderDate, SalesRepNo, SalesRepGivenName, SalesRepSurName, ProductNo, ProductDescription, ProductOrderQuantity, ProductUnitPrice, CustomerStreet, CustomerCity
• Repetitive data: ProductNo, ProductDescription, ProductOrderQuantity, ProductUnitPrice

• From now on we will call all these different kinds of data attributes.
STEP 1: 0NF, ZERO NORMAL FORM (UNOFFICIAL)
A. Create one group with all elementary data.
(OrderNo, CustomerNo, CustomerName, OrderDate, SalesRepNo, SalesRepGivenName, SalesRepSurName, ProductNo, ProductDescription, ProductOrderQuantity, ProductUnitPrice, CustomerStreet, CustomerCity)
B. Create a repeating group (RG) within this group with the repetitive data.
(OrderNo, CustomerNo, CustomerName, OrderDate, SalesRepNo, SalesRepGivenName, SalesRepSurName, CustomerStreet, CustomerCity, RG(ProductNo, ProductDescription, ProductOrderQuantity, ProductUnitPrice))
STEP 1: 0NF, CONTINUED
C. Determine which attribute is the primary key for this form. A primary key is an attribute or group of attributes that uniquely identifies this specific order form.
Ask yourself:

“If you give me this attribute, I am able to give you this exact form from a pile of similar forms”.

Let’s try: if you give me a customer number, can I give you this form? No, because this customer can have more orders, so a customer number will show up on more than 1 order form.
Let’s try again: if you give me a product number, can I give you this form? No, because a product can be ordered by more customers, so it will show up on more than 1 order form.
Let’s try again: if you give me an order number, can I give you this form? Yes, because only this order form has this order number: a unique number!
OrderNo is the primary key for this form. Put it at the front of the group and underline it.
0NF(OrderNo, CustomerNo, CustomerName, OrderDate, SalesRepNo, SalesRepGivenName, SalesRepSurName, CustomerStreet, CustomerCity, RG(ProductNo, ProductDescription, ProductOrderQuantity, ProductUnitPrice))
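The primary-key question can be tried out on a small pile of forms. The sketch below is illustrative only: the two forms are made up, only a few attributes are included, and `identifies_one_form` is a hypothetical helper. The repeating group is modelled as a nested list.

```python
# Made-up order forms in 0NF: one record per form, with the repeating
# group (RG) of ordered products nested inside it.
forms = [
    {"OrderNo": 1001, "CustomerNo": 7, "OrderDate": "2019-10-01",
     "RG": [{"ProductNo": "P1", "ProductOrderQuantity": 2},
            {"ProductNo": "P2", "ProductOrderQuantity": 1}]},
    {"OrderNo": 1002, "CustomerNo": 7, "OrderDate": "2019-10-03",
     "RG": [{"ProductNo": "P1", "ProductOrderQuantity": 5}]},
]


def identifies_one_form(forms, attribute, value):
    """Return True if exactly one form in the pile matches attribute == value."""
    def matches(form):
        if attribute in form:
            return form[attribute] == value
        # Attributes of the repeating group are looked up inside the RG.
        return any(line.get(attribute) == value for line in form["RG"])
    return sum(1 for form in forms if matches(form)) == 1


print(identifies_one_form(forms, "CustomerNo", 7))    # customer 7 has two orders
print(identifies_one_form(forms, "ProductNo", "P1"))  # product P1 is on two forms
print(identifies_one_form(forms, "OrderNo", 1001))    # only one form has this number
```

A check on sample data can only show that an attribute fails as a key. Whether OrderNo is always unique is a rule of the organization, not something the data can prove.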
STEP 2: 1NF, FIRST NORMAL FORM

• Start from the 0NF.
• Take the repeating group out.
• Define a primary key for this new group. What uniquely identifies the data in this group?
• Add the original primary key of 0NF to the new group. A group becomes a table!
• Write down all groups. Assign logical names to the groups. Your design is now in 1NF.
• 0NF(OrderNo, CustomerNo, CustomerName, OrderDate, SalesRepNo, SalesRepGivenName, SalesRepSurName, CustomerStreet, CustomerCity, RG(ProductNo, ProductDescription, ProductOrderQuantity, ProductUnitPrice))
• 1NF
• ORDER(OrderNo, CustomerNo, CustomerName, OrderDate, SalesRepNo, SalesRepGivenName, SalesRepSurName, CustomerStreet, CustomerCity)
• ORDERDETAILS(OrderNo, ProductNo, ProductDescription, ProductOrderQuantity, ProductUnitPrice)
• Primary keys: OrderNo for ORDER; the combination (OrderNo, ProductNo) for ORDERDETAILS.
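The step from 0NF to 1NF can be sketched as a small Python function. This is an illustration under assumptions: the sample form is made up and uses a reduced set of attributes, and `to_first_normal_form` is a hypothetical helper name.

```python
def to_first_normal_form(forms):
    """Split 0NF forms into ORDER rows and ORDERDETAILS rows."""
    orders, order_details = [], []
    for form in forms:
        # ORDER keeps every attribute except the repeating group.
        orders.append({k: v for k, v in form.items() if k != "RG"})
        # Each RG entry becomes one ORDERDETAILS row that carries the
        # original primary key (OrderNo) along with it.
        for line in form["RG"]:
            order_details.append({"OrderNo": form["OrderNo"], **line})
    return orders, order_details


# Made-up sample form with a reduced set of attributes.
forms = [
    {"OrderNo": 1001, "CustomerNo": 7, "OrderDate": "2019-10-01",
     "RG": [
         {"ProductNo": "P1", "ProductDescription": "Mug",
          "ProductOrderQuantity": 2, "ProductUnitPrice": 4.5},
         {"ProductNo": "P2", "ProductDescription": "Jeans",
          "ProductOrderQuantity": 1, "ProductUnitPrice": 30.0},
     ]},
]
orders, order_details = to_first_normal_form(forms)
print(orders)
print(order_details)
```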
2NF: SECOND NORMAL FORM
1NF
ORDER(OrderNo, CustomerNo, CustomerName, OrderDate, SalesRepNo, SalesRepGivenName, SalesRepSurName, CustomerStreet, CustomerCity)
ORDERDETAILS(OrderNo, ProductNo, ProductDescription, ProductOrderQuantity, ProductUnitPrice)

• Start with 1NF and apply the following rule to create 2NF.
• Look at groups that have a composite key (two or more attributes underlined).
• Determine whether each non-key attribute in such a group depends on the entire key or on only one attribute of the key.
• If it depends on only one attribute, create a separate group with this attribute as primary key and take the dependent attribute out of the original group.
2NF: RESULT

• Let’s analyze.
• 1NF
• ORDER(OrderNo, CustomerNo, CustomerName, OrderDate, SalesRepNo, SalesRepGivenName, SalesRepSurName, CustomerStreet, CustomerCity)
• ORDERDETAILS(OrderNo, ProductNo, ProductDescription, ProductOrderQuantity, ProductUnitPrice)
• For 2NF we only need to look at groups that have a composite key, which is ORDERDETAILS.
• The first non-key attribute is ProductDescription. We need to determine if ProductDescription is dependent on:
• The entire key (OrderNo, ProductNo), on OrderNo only, or on ProductNo only.
• Let’s try: if you give me OrderNo, can I give you the ProductDescription? No, since an order can have multiple products.
• If you give me ProductNo, can I give you the ProductDescription? Yes, because a product has a unique number!
• So we create a new group for ProductDescription with ProductNo as primary key.
• PRODUCT(ProductNo, ProductDescription)
2NF: CONTINUED

• What about the next non-key attribute, ProductOrderQuantity?
• If you give me ProductNo, can I give you ProductOrderQuantity? No, because multiple people can order the same product in different quantities.
• If you give me OrderNo, can I give you ProductOrderQuantity? No, because an order has multiple products and therefore multiple quantities.
• Conclusion: you need both! If you give me ProductNo and OrderNo, I can give you ProductOrderQuantity. So ProductOrderQuantity is dependent on both OrderNo and ProductNo and will stay in the original group.
• ProductUnitPrice is only dependent on ProductNo, so it moves to the PRODUCT group.
• Our 2NF will be:
• 2NF
• ORDER(OrderNo, CustomerNo, CustomerName, OrderDate, SalesRepNo, SalesRepGivenName, SalesRepSurName, CustomerStreet, CustomerCity)
• ORDERDETAILS(OrderNo, ProductNo, ProductOrderQuantity)
• PRODUCT(ProductNo, ProductDescription, ProductUnitPrice)
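The "if you give me X, can I give you Y" questions used for 2NF can be tried on sample rows. The sketch below is an illustration with made-up data, and `determines` is a hypothetical helper. Sample data can only disprove a dependency; the real dependencies come from how the organization works.

```python
def determines(rows, key_attrs, target):
    """Return True if, in these rows, equal key values always go with the same target value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in key_attrs)
        if seen.setdefault(key, row[target]) != row[target]:
            return False
    return True


# Made-up ORDERDETAILS rows in 1NF (unit price omitted for brevity).
order_details = [
    {"OrderNo": 1001, "ProductNo": "P1", "ProductDescription": "Mug", "ProductOrderQuantity": 2},
    {"OrderNo": 1001, "ProductNo": "P2", "ProductDescription": "Jeans", "ProductOrderQuantity": 1},
    {"OrderNo": 1002, "ProductNo": "P1", "ProductDescription": "Mug", "ProductOrderQuantity": 5},
]

# ProductDescription depends on ProductNo alone: it moves to PRODUCT.
print(determines(order_details, ["ProductNo"], "ProductDescription"))
print(determines(order_details, ["OrderNo"], "ProductDescription"))
# ProductOrderQuantity needs the whole key: it stays in ORDERDETAILS.
print(determines(order_details, ["ProductNo"], "ProductOrderQuantity"))
print(determines(order_details, ["OrderNo", "ProductNo"], "ProductOrderQuantity"))
```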
3NF, AS FAR AS WE WILL GO…..
• 2NF
• ORDER(OrderNo, CustomerNo, CustomerName, OrderDate, SalesRepNo, SalesRepGivenName, SalesRepSurName, CustomerStreet, CustomerCity)
• ORDERDETAILS(OrderNo, ProductNo, ProductOrderQuantity)
• PRODUCT(ProductNo, ProductDescription, ProductUnitPrice)
• In 3NF you ask yourself:
• Is there any non-key attribute dependent on another non-key attribute in a group?
• If so, take it out and create a new group with the attribute that has dependent attributes as the new key.
• ORDER group: CustomerNo is the first non-key attribute. Does it depend on any other non-key attribute? No, but the other way around, yes: give me your CustomerNo and I will give you CustomerName, CustomerStreet and CustomerCity.
• CustomerName, CustomerStreet and CustomerCity are all dependent on CustomerNo.
• The same goes for SalesRepGivenName and SalesRepSurName, which are dependent on SalesRepNo.
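The 3NF step for the ORDER group can be sketched as a projection. The rows below are made up and leave out the street and sales rep names for brevity, and `project` is a hypothetical helper. The attributes that depend on CustomerNo move to their own group, while ORDER keeps CustomerNo to refer to the customer.

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicate results (order preserved)."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# Made-up ORDER rows in 2NF.
orders_2nf = [
    {"OrderNo": 1001, "CustomerNo": 7, "CustomerName": "Brown", "CustomerCity": "Nanjing",
     "OrderDate": "2019-10-01", "SalesRepNo": 3},
    {"OrderNo": 1002, "CustomerNo": 7, "CustomerName": "Brown", "CustomerCity": "Nanjing",
     "OrderDate": "2019-10-03", "SalesRepNo": 3},
    {"OrderNo": 1003, "CustomerNo": 8, "CustomerName": "Ronchi", "CustomerCity": "Shanghai",
     "OrderDate": "2019-10-04", "SalesRepNo": 4},
]

# Attributes that depend on CustomerNo move to CUSTOMER; ORDER keeps only the key.
customer = project(orders_2nf, ["CustomerNo", "CustomerName", "CustomerCity"])
order = project(orders_2nf, ["OrderNo", "CustomerNo", "OrderDate", "SalesRepNo"])
print(customer)
print(order)
```

In this sample, customer Brown is stored once in `customer` even though two orders refer to CustomerNo 7.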
3NF: FINAL STAGE OF OUR DESIGN

• ORDER(OrderNo, CustomerNo, OrderDate, SalesRepNo)
• ORDERDETAILS(OrderNo, ProductNo, ProductOrderQuantity)
• PRODUCT(ProductNo, ProductDescription, ProductUnitPrice)
• CUSTOMER(CustomerNo, CustomerName, CustomerStreet, CustomerCity)
• SALESREP(SalesRepNo, SalesRepGivenName, SalesRepSurName)

• This relational database model is in 3NF, the final stage here. There are a few more normal forms, but they are beyond the scope of this course.
• CustomerName has moved to CUSTOMER; only CustomerNo and SalesRepNo remain in ORDER to refer to the customer and the sales rep.
• The groups represent database tables; the attributes in a group are the columns of the table.
• As we can clearly see, our design contains 5 tables. The next step is to determine the nature of the relationships between these tables by designing a so-called ERD (entity relationship diagram), a conceptual database design based on the 3NF.
ENTITY RELATIONSHIP DIAGRAM (ERD)
• An Entity Relationship Diagram is a visual representation of your database design, usually made after normalization.
• Sometimes an ERD is made before normalization, based on a situation sketch of an organization.
• An ERD consists of three parts:
• Entities (rectangles in the ERD) that act in the reality that you are modelling, e.g. customer, booking, order, etcetera. Entities correspond with tables! Entities are always nouns. They can act or be acted upon!
• Relationships between entities (shown as lines between rectangles in the ERD) that represent any real relationship that can be described by a sentence with a verb between two entities in this form: <entity1> <verb> <entity2>.
• The cardinality of the relationship (shown as fixed symbols on both sides of the line): how many times can one entity act on another one. It describes the nature of the relationship between entities. Possible cardinalities are 1:1, 1:M, M:1 and M:M. The last one is not permitted in the final design and has to be resolved with an extra entity.
ERD BASED ON OUR 3NF
• ORDER(OrderNo, CustomerNo, OrderDate, SalesRepNo)
• ORDERDETAILS(OrderNo, ProductNo, ProductOrderQuantity)
• PRODUCT(ProductNo, ProductDescription, ProductUnitPrice)
• CUSTOMER(CustomerNo, CustomerName, CustomerStreet, CustomerCity)
• SALESREP(SalesRepNo, SalesRepGivenName, SalesRepSurName)

• This design shows 5 entities (corresponding to the tables):
• Order, Orderdetails, Product, Customer and SalesRep. They are all ‘real things’ in the outside world. Entities!
• These are 5 actors in this domain.
• We need to describe the relationships between these entities. We do this by creating sentences in the form <entity1><verb><entity2>.
• Does Order have something to do with Customer?
• Yes, of course! A Customer can place an Order! Watch this: <Customer><places><Order> → <entity1><verb><entity2>
• And in reverse: an Order is placed by a Customer! <Order><placed by><Customer> → <entity2><verb><entity1>
• What about a SalesRep and a Customer? No!
• No direct relationship, because: SalesRep … verb? … Customer. What verb?
DESCRIBE ALL RELATIONSHIPS BETWEEN ENTITIES IN SENTENCES <ENTITY1><VERB><ENTITY2>
• We have the following entities: Order, Orderdetails, Product, Customer, SalesRep.
• Describe relationship → cardinality of the relationship
• Customer places Order → Customer can place 1 or M (many) Orders.
• Order placed by Customer → Order is placed by exactly 1 Customer.
• Order contains Orderdetails → Order contains 1 or M Orderdetails.
• Orderdetail belongs to Order → Orderdetail belongs to exactly 1 Order.
• Product ordered on Orderdetail → Product can be ordered 0, 1 or M times on Orderdetail.
• Orderdetail shows ordered Product → Orderdetail shows exactly 1 ordered Product.
• SalesRep enters Order → SalesRep can enter 0, 1 or M Orders.
• Order is entered by SalesRep → Order is entered by exactly 1 SalesRep.
ENTITY RELATIONSHIP DIAGRAM MADE IN DRAW.IO
Cardinalities:
Customer – Order 1:M
Order – Orderdetail 1:M
SalesRep – Order 1:M
Product – OrderDetail 1:M
In a relational database model, ALL relationships need to be 1:M or M:1.
If M:M relationships occur during design, then you have to introduce an entity in between these two entities! For example, Order – Product is M:M, so we use OrderDetail in between Order and Product!
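How the in-between entity carries the M:M relationship can be shown with a few made-up rows. In this sketch every ORDERDETAILS row links exactly one order to exactly one product, yet an order can still reach many products and a product many orders.

```python
from collections import defaultdict

# Made-up ORDERDETAILS rows: each row links exactly one order to one product.
order_details = [
    {"OrderNo": 1001, "ProductNo": "P1"},
    {"OrderNo": 1001, "ProductNo": "P2"},
    {"OrderNo": 1002, "ProductNo": "P1"},
]

products_per_order = defaultdict(set)
orders_per_product = defaultdict(set)
for row in order_details:
    products_per_order[row["OrderNo"]].add(row["ProductNo"])
    orders_per_product[row["ProductNo"]].add(row["OrderNo"])

# Order -> OrderDetail is 1:M and Product -> OrderDetail is 1:M;
# together they express the M:M relationship between Order and Product.
print(dict(products_per_order))
print(dict(orders_per_product))
```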
RELATIONSHIPS BETWEEN ENTITIES (TABLES): PRIMARY AND FOREIGN KEYS
• If you design correctly using normalization, you will find a primary key for every table (entity) you create.
• The relationships with other entities as described in the ERD are implemented by using a so-called foreign key. This means that the primary key of one table is also present in another table to ‘link’ these two tables according to your design.
• Let’s take a look at our 3NF database model:
• ORDER(OrderNo, CustomerNo, OrderDate, SalesRepNo)
• ORDERDETAILS(OrderNo, ProductNo, ProductOrderQuantity)
• PRODUCT(ProductNo, ProductDescription, ProductUnitPrice)
• CUSTOMER(CustomerNo, CustomerName, CustomerStreet, CustomerCity)
• SALESREP(SalesRepNo, SalesRepGivenName, SalesRepSurName)
• You can clearly see that Orders and Customers can be linked by matching the CustomerNo.
• CustomerNo is the primary key (PK) in the Customer table, but is also present in the Order table as a foreign key (FK).
• ALL relationships in a relational database are made by matching the PK in one table with the corresponding FK in the other table!
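The PK/FK link between Customer and Order can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite stands in for the DBMS, the column types and sample rows are made up, and only two of the five tables are created. In the ORDER table only CustomerNo refers to the customer; the customer's name lives in CUSTOMER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE CUSTOMER (
    CustomerNo   INTEGER PRIMARY KEY,
    CustomerName TEXT NOT NULL
);
-- ORDER is a reserved word in SQL, so the table name is quoted.
CREATE TABLE "ORDER" (
    OrderNo    INTEGER PRIMARY KEY,
    CustomerNo INTEGER NOT NULL REFERENCES CUSTOMER(CustomerNo),
    OrderDate  TEXT
);
INSERT INTO CUSTOMER VALUES (7, 'Brown'), (8, 'Ronchi');
INSERT INTO "ORDER" VALUES
    (1001, 7, '2019-10-01'), (1002, 7, '2019-10-03'), (1003, 8, '2019-10-04');
""")

# Link the two tables by matching the FK in ORDER with the PK in CUSTOMER.
orders_per_customer = conn.execute("""
    SELECT c.CustomerName, COUNT(o.OrderNo)
    FROM CUSTOMER AS c
    JOIN "ORDER" AS o ON o.CustomerNo = c.CustomerNo
    GROUP BY c.CustomerNo
    ORDER BY c.CustomerNo
""").fetchall()
print(orders_per_customer)

# An order for a customer that does not exist is rejected by the database.
try:
    conn.execute('INSERT INTO "ORDER" VALUES (?, ?, ?)', (1004, 99, "2019-10-05"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```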
DATABASE DESIGN: TERMINOLOGY OVERVIEW

• A table represents an entity, an object from the real world. In this case Customers.
• Attributes (properties of the entity) are represented in the columns.
• Every row (in database jargon called a record or a tuple, not a row!) represents ONE customer!
• The first column (or columns in case of a composite key) is the primary key.
• The primary key values are unique per record and mandatory (cannot be null).
Let’s have a look at this simple relational database design that is populated (meaning: it has data, records) and consists of a customers table and an orders table.
- How many orders did Customer Brown place?
- Did William order Jeans?
- Did Ronchi order a Computer and a Mug?
- How many orders did Ronchi place?
- Who has ordered the Shorts?
CONCEPTUAL / LOGICAL / PHYSICAL DATABASE DESIGN
• Database design is always based on a real-world scenario and set in a specific domain.
• Information analysis leads to a conceptual design of your database: it identifies the major entities (order, customer, suppliers, products, etc.) and describes the relationships between these entities. You are modelling reality without worrying about the technical database implementation. An ERD is a conceptual design.
• A conceptual design leads to a logical design. In a logical design (also represented as an ERD) you add more information on the attributes: whether they are mandatory or not (e.g. CustomerPhone), the datatypes of the attributes (which act as constraints!) and the cardinality of the relationships. The logical model implements the business rules of your domain, e.g. a Productcode consists of 5 alphanumerical characters.
• A physical design is the actual implementation of the ERD in a DBMS like SQL Server.
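As an illustration of how a business rule from the logical design can end up in a physical design, the sketch below enforces "a Productcode consists of 5 alphanumerical characters" with a CHECK constraint. It uses SQLite through Python's `sqlite3` module instead of SQL Server; the column name `ProductCode`, the table layout and the helper `try_insert` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PRODUCT (
        ProductCode        TEXT PRIMARY KEY
            CHECK (length(ProductCode) = 5
                   AND ProductCode NOT GLOB '*[^A-Za-z0-9]*'),
        ProductDescription TEXT NOT NULL
    )
""")


def try_insert(code, description):
    """Return True if the database accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO PRODUCT VALUES (?, ?)", (code, description))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("AB123", "Mug"))      # valid: 5 alphanumeric characters
print(try_insert("AB-12", "Jeans"))    # rejected: contains '-'
print(try_insert("AB1234", "Shorts"))  # rejected: 6 characters
print(try_insert("CD456", None))       # rejected: description is mandatory
```

The rule is checked by the database itself, so every program that writes to the table is held to it, not only the client software that validates input.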
DIFFERENCES ARE NOT FIXED!
