Information Systems - Data Modeling
Module Code | Module Name | Lecture Title | Lecturer SLIIT - Faculty of Computing
Contact Person
• Lecturer In Charge:
Mrs. Manori Gamage
• Lecturers:
Prof. Mahesha Kapurubandara
Mrs. Muditha Thissera
Mrs. Sanjeewi Chandrasiri
Mrs. Pradeepa Bandara
Mrs. Lokesha Weerasinghe
• Tutorials
• 1 × 1 hour tutorial per week
• Lab Sessions
• 2 × 1-hour lab sessions per week
Learning Outcome
• LO1: Explain the importance and impact of
information systems in business organizations.
• Contains Subsystems
Human
Body
•Better information
•Improved service
•Increased productivity
•Competitive Advantage
• Globalization opportunities
• Internet has drastically reduced costs of operating
on global scale
• Presents both challenges and opportunities
Growing interdependence
between ability to use
information technology and
ability to implement
corporate strategies and
achieve corporate goals.
Questions..?
IT1090| ISDM | Lecture 1 |Mrs. Muditha Thissera
Reference
• K. C. Laudon and J.P. Laudon, “Management
Information Systems: Managing the digital Firm”,
Chapter 1, INFORMATION SYSTEMS IN BUSINESS
TODAY, 13th Ed, 2014
Lecture – 02
SLIIT - Faculty of Computing
IT1090 - Information Systems and Data Modeling
Lecture 02 - Recap
Learning Outcomes
Session Outcome
After completing this session you will be able to;
Because…
These questions require a good, documented and
communicated understanding of business processes!
Process mapping?
• When is it happening?
• Where is it happening?
Flowchart symbols:
• Decision
• Document
• Database
• Multidocument
• Delay
• Phase Separator
Example:
Note:
• Use the terminator (start/end) symbol to show the start and end of the process
[Figure: example process map spanning Page 1 and Page 2, with a Starting activity, processes P1, P2 and P3, a sub process, a delay, an "If" decision with Yes/No branches, documents, a File-away step, a phase separator and a Process Ending]
[Figure: process map for travelling by bus. Walk towards the bus halt → Waiting for a bus → Has a bus reached the bus halt? (No: keep waiting; Yes: get into the bus) → Wait until the bus reaches the destination halt → Get down from the bus → Walk to the destination → Reach the destination]
Answer:
[Figure: process map for student registration. Steps: Receiving a Registration Request → Issue an RR form with deadlines → Waiting for the form submission → Checks RR form → Record Registration. Also shown: Student, RR form, Registration Book]
• Does the process always work this way, or are there exceptions?
• What happens when things get really busy? Do people do things differently?
• What are all the documents that can be used in this process?
• What reports does this process produce and how are they used?
Class Activity :
Analyze this
process map and
prepare a list of
questions to ask in
order to improve
the process.
Benefits
• Gives everyone a clear understanding
of the process
Next Lecture
• Modern Information Systems in Business
End of Lecture 02
Lecture – 03
Lecture 03
• What is a business process?
• How business processes work
• What are its components
Learning Outcomes
• LO2: Evaluate the information systems strategies to
achieve organizational goals
Session Objectives
At the end of this session, you will be able to;
Harvard Mark I
ASCC: IBM Automatic Sequence Controlled Calculator
55 feet long, 8 feet high, 5 tons
In the 1980s
• Widespread adoption of the microprocessor (first introduced in the early 1970s)
• Development of PCs: a major landmark
• People, customers and end users of business
organizations start using computers
Key Idea:
Get ready to adapt to the modern world
with modern Information Systems!!!!
• Decision Support Systems
• Enterprise Systems
• E-Commerce Systems
• Cloud-based Systems
• IoT (Internet of Things) Systems
• BI (Business Intelligence) Systems
Expert systems
Based on Artificial Intelligence technologies.
Computer system that emulates the decision-making ability of
a human expert.
Represent the knowledge and decision-making skills of
specialists.
Enterprise Systems
Support the business processes of an organisation across any
functional boundaries that exist within that organisation.
Customer
Supplier
Business Organization
Business partners
Watch video on
SAP BUSINESS SUITE 7
RETAIL DEMO
These systems refer to all the activities involved with obtaining items from a supplier,
including procurement, transportation and warehousing.
Aim to streamline and make more effective the processes between an enterprise and its
suppliers
E-Commerce Systems
What is E-Commerce?
E-commerce is usually associated with buying and selling
over the Internet
or
Conducting any transaction involving the transfer of
ownership or rights to use goods or services through a
computer-mediated network
Thomas L. Mesenbourg, Measuring Electronic Business:
Definitions, Underlying Concepts, and Measurement Plans
Salesforce.com
Google Cloud Platform (GCP)
Oracle Cloud
Amazon Cloud Drive
Microsoft Azure
Ex:
• Oracle NetSuite - a complete cloud-based ERP
solution on Oracle Cloud.
IoT Concept:
The Internet of Things (IoT) is a network of objects (things), each embedded with
sensors, which are connected to the Internet.
The concept is to connect any device with an on/off switch to the
Internet. This includes everything from mobiles, air conditioners,
headphones and wearable devices to almost anything else.
With IoT…
IoT systems have applications across industries through their unique flexibility and ability to
be suitable in any environment.
Ex:
Transportation and logistics , Healthcare, Agriculture,
Smart environment (home, office, plant), Personal and social,
Energy and power
http://www.oracle.com/technetwork/middleware/bi-enterprise-edition/overview/index.html?ssSourceSiteId=opn
https://www.yellowfinbi.com/
https://www.sisense.com/
http://www.sap.com/pc/analytics/business-intelligence.html
Reference
• K. C. Laudon and J.P. Laudon, “Management
Information Systems: Managing the digital Firm”,
INFORMATION SYSTEMS IN BUSINESS TODAY, 13th
Ed, 2014
Learning Outcomes
• LO2: Evaluate the information systems strategies to
achieve organizational goals
Next Lecture
• Data Modeling
• (Introduction to Database and DBMS)
End of Lecture 03
Introduction to Database
Modelling
Lecture - 04
Introduction to Database
To better understand what drives the design of databases,
we first need to understand the difference between data and
information.
➢ What is Data?
➢ What is Information?
➢ What is Database (DB)?
➢ What is Database Management System (DBMS)?
[Figure: file-based approach. The Accounts Payable Program, Sales Order Processing Program and Payroll Program each maintain their own files: Vendor file, Invoice file, Inventory file, Customer file and Employee file]
Database Approach
Limitations of Conventional File-based Approach:
Database Approach
➢ Arose because:
Definition of data was embedded in application
programs, rather than being stored separately and
independently
No control over access and manipulation of data
beyond that imposed by application programs
➢ Result:
The Database and Database Management System
(DBMS).
Database Approach
Order Dept. Accounting Payroll
Dept. Dept.
A B C
Database Approach
History in a Nutshell
➢ First DBMS: Bachman at General Electric, early 60’s
(Network Data Model). Standardized by CODASYL.
➢ Late 60’s: IBM’s IMS (Information Management System) (Hierarchical Data
Model).
➢ 1970: Edgar Codd (at IBM) proposed the Relational Data
Model. Strong theoretical basis.
➢ 1980’s -90’s: Relational model consolidated. Research on
query languages and data models => logic-based
languages, OO DBMSs => Object-relational data model
(extend DBMSs with new data types)
Directed Reading Section 1.4, 1.5 and 1.6 in Elmasri and Navathe.
Data Independence
Data Independence is the capacity to change the
schema at one level of a database system without
having to change the schema at the next higher
level
➢Logical Data Independence: Change conceptual
schema without having to change external schemas
and their application programs.
➢Physical Data Independence: Change internal
schema without having to change conceptual
schema.
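A minimal sketch of both kinds of data independence, using Python's built-in sqlite3 module. The table `student`, the view `student_names` and the sample rows are illustrative assumptions, not part of the lecture material. The view plays the role of an external schema: the "application" only queries the view, so adding an index (a physical change) or a new column (a conceptual change) leaves its result untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid TEXT PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("S01", "Mike", 3.1), ("S02", "Elisa", 3.6)],  # hypothetical rows
)

# External schema: the application only ever reads this view.
conn.execute("CREATE VIEW student_names AS SELECT sid, name FROM student")


def app_query(connection):
    """The 'application program': it knows the view, not the base table."""
    return connection.execute(
        "SELECT sid, name FROM student_names ORDER BY sid"
    ).fetchall()


before = app_query(conn)

# Physical-level change: add an index (only the access path changes).
conn.execute("CREATE INDEX idx_student_name ON student (name)")
after_physical = app_query(conn)

# Conceptual-level change: add a column to the base table.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_logical = app_query(conn)

print(before == after_physical == after_logical)
```

The application query is never edited between the three calls, which is the point of the two definitions above.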
Data Independence
Database Design
Why is Database Design important?
4.Schema Refinement
Fine tune the result
6.Security Design
Implement Controls to ensure security and integrity
Data Modeling
What is a Data Model?
• A data model focuses on what data should be stored in
the DB and how it should be organized
• Rather than representing the data as a database would see
it, a data model represents the data as the user sees it
in the ‘real world’
• A data model can be considered similar to an
architect's building plan
The goal of the data model is to make sure that all data objects
required by the database are completely and accurately represented
Hierarchical Model
Network Model
Relational Model
6.Security Design
Implement Controls to ensure security and integrity
Eg. ER Model
End of Lecture - 04
Questions ?
Learning Outcome
• LO3: Model data requirements using data models
Conceptual Modeling
ER Diagrams
• Many versions of ER-diagrams; differ both in their
appearance and in their meaning
• We will use the version appearing in the book
(Fundamentals of Database Systems by Elmasri and
Navathe)
• Have a formal semantics (meaning) that must be
thoroughly understood, in order to create correct
diagrams
ER Diagrams
E-R Notation
SYMBOL MEANING
ENTITY
WEAK ENTITY
RELATIONSHIP
IDENTIFYING
RELATIONSHIP
ATTRIBUTE
KEY ATTRIBUTE
MULTIVALUED ATTRIBUTE
COMPOSITE ATTRIBUTE
DERIVED ATTRIBUTE
E1 - R = E2 (double line on the E2 side): TOTAL PARTICIPATION OF E2 IN R
E1 -1- R -N- E2: CARDINALITY RATIO 1:N FOR E1:E2 IN R
• Graphically,
ENTITY STUDENT
• First letter of each word in the entity name is uppercase
• E.g., Student
Age
BOC Network
Entity Set
Example: name, id, age & salary are attributes in Employee
entity
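[Figure: EMPLOYEE entity drawn with attributes Name, Id, Age and Salary, beside the Employee entity set e1, e2, e3, e4, e5, e6, …, where each entity has values for salary, name, age and id]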
Example
Name Manf
JUICE
Domain of an Attribute
Composite Attributes
Graphically,
Name EMPLOYEE
Surname Middle_name
First_name
Your Turn!!
E-mail
Surname First_name
Your Turn!!
Derived Attributes
age
name
Derived attribute
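A small sketch of the idea behind a derived attribute: age is not stored but computed from a stored date of birth. The class name `Student`, the field names and the dates are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Student:
    stu_name: str
    stu_dob: date  # stored attribute

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from stu_dob, never stored."""
        had_birthday = (today.month, today.day) >= (self.stu_dob.month, self.stu_dob.day)
        return today.year - self.stu_dob.year - (0 if had_birthday else 1)


s = Student("Mike", date(2000, 6, 15))   # hypothetical student
print(s.age_on(date(2024, 6, 14)))       # the day before the birthday
print(s.age_on(date(2024, 6, 15)))       # on the birthday
```

Because the value is recomputed on demand, it cannot go stale the way a stored age would.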
An exercise
stuNum stuName stuMajor stuDob stuHrs stuYr stuGpa stuAge
stuName
stuMajor
stuDob
stuHrs
stuYr
stuGpa
stuAge
Key Attributes
• Key attribute - minimal set of attributes which
uniquely identifies an entity in the entity set. Shown underlined in the diagram.
salary name
e1
age id
e2
e3
e4
e5
e6
…
Employee Entity set
Primary Key
Candidate key
[Figure: Employee entity with attributes eid and nic, each a candidate key]
Composite Key
• Sometimes, a group of attributes make up the key. This
is called a composite key.
• Example :
Composite key = (ST ID + Unit ID); Marks is not part of the key
ST ID Unit ID Marks
IT1601 IT103 85
IT1601 IT104 78
IT1602 IT103 72
IT1603 IT104 82
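The table above can be checked with a short sketch. The helper `is_unique` is a hypothetical name; it only tests whether a set of attributes has duplicate values in this one instance. Note the limit of such a test: the four Marks values happen to be distinct here, yet Marks is not a key, because a key must be unique in every legal instance, not just the current one.

```python
rows = [
    {"st_id": "IT1601", "unit_id": "IT103", "marks": 85},
    {"st_id": "IT1601", "unit_id": "IT104", "marks": 78},
    {"st_id": "IT1602", "unit_id": "IT103", "marks": 72},
    {"st_id": "IT1603", "unit_id": "IT104", "marks": 82},
]


def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


print(is_unique(rows, ["st_id"]))             # ST ID alone repeats (IT1601)
print(is_unique(rows, ["unit_id"]))           # Unit ID alone repeats (IT103, IT104)
print(is_unique(rows, ["st_id", "unit_id"]))  # the pair identifies each row
```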
Super Key
[Figure: ER diagram with entity student (studentID, name, address, phone, gender, dob) and entity course (courseNo, coursename, fee, duration, faculty) joined by a relationship]
Example: Relationships
Drinkers frequent
some bars.
name Drinker addr
Degree of a Relationship
N 1
Student registers Course
relationship
Degree of Relationship
• Degree = number of participating entities
• Relationships can be classified based on their degree
into
– Binary – relationship with two participants
Degree/No. of Entities = 2
– Ternary – relationship with three participants
Degree/No. of Entities = 3
– Quaternary – relationship with four participants
Degree/No. of Entities = 4
Ternary Relationships
Ternary Relationship
name addr name manf
Preferences
Drinker
name addr
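Ternary relationship
[Figure: ternary relationship among an entity with attributes ssn, name and lot, an entity with attributes did, dname and budget, and Location; the instance diagram links 12-233, 12-354, 12-243 and 12-299 with D10, D12 and D13 and with Rome, London and Paris]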
• one-to-one (1: 1)
• one-to-many (1 : N)
• many-to-many (N : M)
One-to-One relationship
1 1
Employee Manages Department
Your Turn !!
ONE-TO-MANY RELATIONSHIP
N 1
Employee Works in Department
Drinker N Favorite
1 Juice
Your Turn !!
Many-to-many relationship
Your Turn !!
• Think about the Library database and give an example
of an N:M relationship
Exercise
Identify the Cardinality
• Doctor patient
• Principal School
• Mother child
• Husband wife
• Teacher Student
Descriptive Attributes
N 1
Employee Works in Department
Descriptive Attributes
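[Figure: Work_in relationship instances linking Employees e011, e022, e033 and e044 to Departments HR, Sales and Marketing; each link carries a descriptive date attribute (1/2/99, 1/2/09, 1/5/01, 1/2/07, 1/3/05)]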
Attribute on Relationship
price
ER Diagram
Your Turn !
• Determine
– Cardinality
Restrictions- constraints
What are the criteria to become a student at SLIIT?
Register for a degree
Mandatory /compulsory
• N 1
Student Register Course
Participation Constraints
Student?
Weak Entity
• Parents employed?
• Does the company cover the children's
medical insurance?
• How do you claim your medical bills?
– Can you get them reimbursed yourself or through your
parents?
• Is the same coverage given to children after
resignation?
Weak Entity
1 M
Employee covers Dependent
Double lined box
Partial key
Weak Entity
• Some entities can’t exist on their own.
• A weak entity is existence-dependent on another entity, i.e.,
it cannot exist without the entity with which it
has a relationship.
address
atmID since transac# amount
1 type
N
ATM facilitates Transactions
Recursive Relationship
Student 1 Leads
N
XStudent
leads
Student
Recursive Relationship
Recursive Relationships
• In most companies, each employee (except the CEO) is supervised
by one manager. Of course, not all employees are managers.
X
Manager supervise Employee
salary salary
Job title Job title
eno name
1 Manager
Employee supervise
salary N Subordinates
Job title
Your Turn !
(1,1)
(0,1)
1 1
(0,N)
(1,1)
N 1
An employee must work for a department.
A department may or may not have any employees.
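The two rules above can be read as (min, max) bounds on how often each entity takes part in the relationship. The sketch below assumes Employee is (1,1) and Department is (0,N); the entity names, the sample pairs and the helper `violations` are all illustrative.

```python
from collections import Counter

employees = ["e1", "e2", "e3"]
departments = ["d1", "d2"]
works_for = [("e1", "d1"), ("e2", "d1"), ("e3", "d1")]  # (employee, department)


def violations(entities, pairs, position, minimum, maximum=None):
    """Return the entities whose participation count falls outside (min, max)."""
    counts = Counter(pair[position] for pair in pairs)
    bad = []
    for entity in entities:
        n = counts[entity]
        if n < minimum or (maximum is not None and n > maximum):
            bad.append(entity)
    return bad


# Employee side (1,1): every employee appears exactly once.
print(violations(employees, works_for, 0, 1, 1))
# Department side (0,N): d2 has no employees, which is allowed.
print(violations(departments, works_for, 1, 0))
# Dropping e3's pair breaks the (1,1) bound for e3.
print(violations(employees, works_for[:2], 0, 1, 1))
```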
Enhanced ER Modelling
Staff Specialization/generalization
indicator
Hours worked
Contract_ duration
Hourly rate
End of Lecture 05
Questions ?
Lecture - 06
Learning Outcome
• LO3: Model data requirements using data models
Relational Model
• Data model on which most DBMS implementations
are based (Codd, 1970)
• Relation consists of
– Relation schema
– Relation instance (table)
Relation
• RELATION:
– Schema
– Instance
Relation Schema
• Describes the column heads (attributes) of the
relation
Relation Schema
• Students (sid: string, name: string, login: string, age: integer, gpa: real)
– Schema name (relation name): Students
– Each field has a field name (e.g., login) and a domain (e.g., string)
• Domain of gpa: real (0-4)
Relation Instance
• Set of tuples or records or rows :
• Each tuple has the same number of fields as the
relation schema
Example : Relation Instance
Degree of a relation
• The degree of R is the number of attributes in R
• (ID,Name,Address,Phone)=4
Example
Make a list of students in the class, keeping their ID,
name and phone number.
ID Name Phone
S01 Mike 111
S02 Elisa 222
Formalizing : Relations
• Definition: A relation is a named table of data
– Table is made up of rows (records or tuples), and columns
(attributes or fields)
• Not all tables qualify as relations. Requirements:
1. Every relation has a unique name
2. Every attribute value is atomic (not multivalued, not
composite)
3. Every row is unique (can’t have two rows with exactly the same
values for all their fields)
4. Attributes (columns) in tables have unique names
5. The order of the columns is irrelevant
6. The order of the rows is irrelevant
Foreign Key
• A constraint involving two relations
– referencing relation
– referenced relation.
• Tuples in the referencing relation have attributes FK
(called foreign key attributes) that reference the
primary key attributes PK of the referenced relation
– Example: Enrolled (cid, grade, stuid) is the referencing relation; stuid is the FK
Integrity Constraints IC
• DBMS must prevent entry of incorrect
information
• To prevent : Constraints / conditions are specified
on a relational schema = ICs
• Database which satisfies all constraints specified
on a database schema is a legal instance.
• DBMS enforces constraints - permits only legal
instances to be stored
• When the application runs, the DBMS checks for
violations and disallows changes to the
data that violate the specified ICs
Integrity Constraints
• Specified and enforced at different times.
Integrity Constraints
Many kinds of ICs:
– Domain constraints
– Key constraints
– Entity integrity constraints
– Referential integrity constraints
Domain Constraints
• Domain constraints: value in the Column must
be drawn from the domain associated with that
column
• Restricts the :
• Type
• Values that can appear in the field
E.g.
• Name: Char (25)
• GPA: real (>= 0, <= 4)
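One way a DBMS can enforce these two domains is with CHECK constraints. The sketch below uses Python's sqlite3 module; the table layout and the sample rows are assumptions. SQLite does not enforce the length in a CHAR(25) declaration, so the length limit is written as an explicit CHECK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Students (
        sid   TEXT PRIMARY KEY,
        name  TEXT CHECK (length(name) <= 25),
        gpa   REAL CHECK (gpa >= 0 AND gpa <= 4)
    )
    """
)
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 3.4)")

try:
    # 4.7 lies outside the domain of gpa, so the DBMS should refuse it.
    conn.execute("INSERT INTO Students VALUES ('53688', 'Smith', 4.7)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(rejected, row_count)
```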
Key constraints
• Is a statement that;
– A certain minimal subset of the fields of a
relation is a unique identifier for a tuple.
• Which Means
– Two tuples in a legal instance cannot have
identical values in all the fields of a key.
Constraints…
• Entity Integrity Constraints: states that primary key
values cannot be null
• This is because primary key values are used to identify
the individual tuples.
• Referential Integrity Constraints
• Sometimes information stored in one relation is
linked to information stored in another relation.
• If one is modified the other must be modified to keep
the data consistent.
• An IC involving both relations must be specified
• IC involving 2 relations is a foreign key constraint.
• Foreign keys enforce referential integrity constraints
Referential Integrity
• The value in the foreign key column can be either:
– a value of an existing primary key in the referenced
relation, or
– null
Enrolled (FK: stuid)
cid grade stuid
Carnatic101 C 53666
Reggae203 B 53666
Topology112 A 53650
History105 B 53666
Students (PK: sid)
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@eecs 18 3.2
53650 Smith smith@math 19 3.8
Insert operation
The insert operation can violate the following
constraints:
– Domain constraints (invalid value)
– Key constraints (duplicate key values)
– Entity integrity constraints (null primary key
value)
– Referential integrity constraint (non-existing
primary key value)
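The four violations listed above can be tried out with sqlite3. This is a sketch under assumptions: the column types, the helper `try_insert` and the rejected rows are made up for illustration, and the Students/Enrolled layout follows the example tables in this lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.execute(
    """
    CREATE TABLE Students (
        sid   TEXT PRIMARY KEY NOT NULL,  -- NOT NULL is spelled out for SQLite
        name  TEXT,
        login TEXT,
        age   INTEGER,
        gpa   REAL CHECK (gpa >= 0 AND gpa <= 4)
    )
    """
)
conn.execute(
    "CREATE TABLE Enrolled (cid TEXT, grade TEXT, stuid TEXT REFERENCES Students(sid))"
)
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")
conn.commit()


def try_insert(sql):
    """Run one INSERT and report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = {
    "domain": try_insert(
        "INSERT INTO Students VALUES ('53688', 'Smith', 'smith@eecs', 18, 4.7)"),
    "key": try_insert(
        "INSERT INTO Students VALUES ('53666', 'Smith', 'smith@math', 19, 3.8)"),
    "entity integrity": try_insert(
        "INSERT INTO Students VALUES (NULL, 'Smith', 'smith@math', 19, 3.8)"),
    "referential": try_insert(
        "INSERT INTO Enrolled VALUES ('History105', 'B', '99999')"),
    "valid": try_insert(
        "INSERT INTO Enrolled VALUES ('Carnatic101', 'C', '53666')"),
}
for kind, outcome in results.items():
    print(kind, "->", outcome)
```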
Examples
Students
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@eecs 18 3.2
53650 Smith smith@math 19 3.8
Delete operation
• Delete operation can violate referential integrity.
Enrolled
cid grade stuid
Carnatic101 C 53666
Reggae203 B 53666
Topology112 A 53650
History105 B 53666
Students
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@eecs 18 3.2
53650 Smith smith@math 19 3.8
• Two options:
• Reject the deletion
• Cascade the delete
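The two options can be compared side by side in sqlite3. In this sketch the helper names `build` and `enrolled_count` are hypothetical, and SQL's ON DELETE RESTRICT and ON DELETE CASCADE stand in for "reject" and "cascade"; the rows come from the Enrolled and Students example.

```python
import sqlite3


def build(on_delete):
    """Create a fresh in-memory database whose FK uses the given ON DELETE action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY NOT NULL, name TEXT)")
    conn.execute(
        "CREATE TABLE Enrolled (cid TEXT, grade TEXT, "
        f"stuid TEXT REFERENCES Students(sid) ON DELETE {on_delete})"
    )
    conn.executemany(
        "INSERT INTO Students VALUES (?, ?)",
        [("53666", "Jones"), ("53650", "Smith")],
    )
    conn.executemany(
        "INSERT INTO Enrolled VALUES (?, ?, ?)",
        [
            ("Carnatic101", "C", "53666"),
            ("Reggae203", "B", "53666"),
            ("Topology112", "A", "53650"),
            ("History105", "B", "53666"),
        ],
    )
    conn.commit()
    return conn


def enrolled_count(conn):
    return conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]


# Option 1: reject the deletion while Enrolled rows still reference the student.
strict = build("RESTRICT")
try:
    strict.execute("DELETE FROM Students WHERE sid = '53666'")
    outcome = "deleted"
except sqlite3.IntegrityError:
    outcome = "rejected"

# Option 2: cascade the delete to the referencing Enrolled rows.
cascading = build("CASCADE")
cascading.execute("DELETE FROM Students WHERE sid = '53666'")

print(outcome, enrolled_count(strict), enrolled_count(cascading))
```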
Update operation
• An update operation can be considered as deleting
a tuple and re-inserting the tuple with new values
• All constraints discussed in Insert & Delete need to
be considered
– Domain constraints (invalid value)
– Key constraints (duplicate key values)
– Entity integrity constraints (null primary key
value)
– Referential integrity constraint (non-existing
primary key value)
ER to Relational Mapping…
• In the Database Design process, we first derive a
conceptual model (ER Diagram)
• This model needs to be mapped to the relational
model in order to be implemented using a relational
DBMS (RDBMS). Moving from Conceptual (ER) to
lower level Logical Model (Relational)
• ER is independent of the details of the
implementation (relational, network or OO)
• This section discusses the rules that can be used for
this process
– For example, the ARTIST entity with attributes ID and name becomes the relation ARTIST (ID, name), where ID is the primary key
fullname firstname
surname
[Figure: the Employee entity with attributes id, name and multivalued phone maps to the relations Employee (id, name) and Employee_contact (id, phone)]
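The Employee and Employee_contact mapping can be sketched in a few lines of Python. The sample employees and phone numbers are made up for illustration: each phone value of the multivalued attribute becomes its own row, paired with the owner's id, which acts as a foreign key back to Employee.

```python
# Entity instances as they appear in the ER model: phone is multivalued.
employees = [
    {"id": "e011", "name": "Mike", "phone": ["111", "112"]},
    {"id": "e022", "name": "Elisa", "phone": ["222"]},
]

# Relation for the entity itself: single-valued attributes only.
employee = [(e["id"], e["name"]) for e in employees]

# Separate relation for the multivalued attribute: one row per (id, phone).
employee_contact = [(e["id"], p) for e in employees for p in e["phone"]]

print(employee)
print(employee_contact)
```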
Employee Dept
manages Dept
dno mgreno
eid
dno
AB
A has B
Is registered FK
N S-DEGREE
Student
S-DOB
S-ID S-NAME
FK
FK
A B
R pkA pkB pkC pkD
r
C D
C D pkC pkD
Policy id
id
N
Dependents DEPENDENTS
id name age
name age
Lecture Outline
• ER-to-Relational Mapping Algorithm
– Step 1: Mapping of Regular Entity Types
• Multivalued attributes.
• Composite attributes
End of Lecture 06
Questions ?
Information systems and Data Modeling (IT 1090)
Lecture 07 – Schema Refinement
IT1090 | Information Systems and Data Modeling | Schema Refinement | Pradeepa Bandara
Learning Outcome
• LO3: Apply formal methods to refine the data
model
Database Design process
1.Requirements Analysis
What does the user want?
2.Conceptual Database Design
Defining the entities and attributes, and the relationships between these
--> The ER model
3.Logical Database Design (Map ER to Relational Schema)
4.Schema Refinement (fine tune )
5.Physical Database Design
Implementation of the design using a Database Management System
6.Security Design
Implement Controls to ensure security and integrity
Normalization
• Conceptual Modeling is a subjective process
• Therefore, the schema after the logical database design phase may not be very
good (contain redundant data)
Normalization
• Relational database schema = set of relations
• Relation = set of attributes
• How we group the attributes to relations is
very important
• Normalization or Schema Refinement helps
determine “GOOD” relations
Purpose of Normalization
• To avoid redundancy by storing each ‘fact’ within
the database only once.
• To put data into a form that conforms to relational
principles (e.g., single valued attributes, each
relation represents one entity) - no repeating
groups.
• To put the data into a form that is more able to
accurately accommodate change.
• To avoid certain updating ‘anomalies’.
• To facilitate the enforcement of integrity
constraints.
Redundancy and Data Anomalies
Redundant data is where we have stored the same
‘information’ more than once. i.e., the redundant data
could be removed without the loss of information.
Example: We have the following relation that contains staff and department details:
Insert Anomaly: We can’t add a new dept without inserting a member of staff
that works in that department.
Update Anomaly: To change the name of the Accounts dept to Finance dept, we have
to change every record that stores the dept name; missing one leaves the data inconsistent.
Deletion Anomaly: Employee SL10 resigns. We remove the record. With that we
lose all information pertaining to the Sales dept.
Repeating Groups
Is an attribute (or set of attributes) that can have more than one
value
staffNo job dept dname city contact number
SL10 Salesman 10 Sales Stratford 018111777, 018111888, 079311122
Schema Refinement
Problems related to decompositions
What problems (if any) does a given
decomposition cause?
To help:
Two properties of decompositions:
• Loss-less join property
• Dependency preserving property
Normal forms have been proposed to preserve
above properties.
Loss-less join property
Loss-less join property: we might lose
information if we decompose relations…
S R1 R2
S P D S P P D
S1 P1 D1 S1 P1 P1 D1
S2 P2 D2 S2 P2 P2 D2
S3 P1 D3 S3 P1 P1 D3
Loss-less join property (contd.)
Joining them together, we get spurious tuples…
R1 ⋈ R2
S P D
S1 P1 D1
S1 P1 D3
S2 P2 D2
S3 P1 D1
S3 P1 D3
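The spurious tuples can be reproduced with plain Python sets. The sketch takes the relation S from the slide, projects it onto (S, P) and (P, D), and joins the projections back on P; the variable names are the only additions.

```python
S = {("S1", "P1", "D1"), ("S2", "P2", "D2"), ("S3", "P1", "D3")}

# Decompose by projection.
R1 = {(s, p) for s, p, d in S}   # R1(S, P)
R2 = {(p, d) for s, p, d in S}   # R2(P, D)

# Natural join of R1 and R2 on the shared attribute P.
joined = {(s, p, d) for s, p in R1 for p2, d in R2 if p == p2}

# Tuples that the join produced but that were never in S.
spurious = joined - S

print(len(joined))
print(sorted(spurious))
```

The join returns more tuples than S held, so the original relation cannot be recovered from R1 and R2: the decomposition is lossy.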
Dependency-preserving property
Informal Guidelines
Guideline 1 : The relation’s semantics should be clear
and easy to explain.
Attributes of different entities (EMPLOYEEs,
DEPARTMENTs, PROJECTs) should not be mixed in
the same relation
Only foreign keys should be used to refer to other
entities
Informal Guidelines
Guideline 2 : Minimize the storage space used by the
base relations and design a schema that does not
suffer from the insertion, deletion and update
anomalies.
Relations
suffer from
anomalies
Informal Guidelines (contd.)
Guideline 3 : Relations should be designed such that
their tuples will have as few NULL values as possible
Attributes that are NULL frequently could be placed in
separate relations (with the primary key)
• Reasons for nulls:
attribute not applicable or invalid
attribute value unknown (may exist)
value known to exist, but unavailable
Guideline 4 : Design schemas so that they can be joined
with equality conditions on attributes that are primary
key / foreign key pairs (this avoids generating spurious
tuples).
Formal Process
Formal process for good relational schema:
• To avoid the above mentioned issues in the
relational schema, we can apply a formal process
called Normalization
• Normalization is based on functional
dependencies
Functional Dependencies
• FDs are used to specify formal measures of the
"goodness" of relational designs
• FDs and keys are used to define normal forms for
relations
Functional dependency
(Ssn, Pnumber) -> Hours (Ssn and Pnumber determine the hours an employee works on a
project)
Ssn -> Ename (Ssn determines the employee name)
Pnumber -> (Pname, Plocation) (Pnumber determines the project name and location)
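A functional dependency X -> Y says that two tuples which agree on X must also agree on Y. The sketch below tests that condition against one instance; the helper `fd_holds` and the sample rows are hypothetical. An instance can only disprove an FD: passing the test on one instance does not prove the FD holds for the schema.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


# Made-up instance of a relation holding employee and project data.
emp_proj = [
    {"Ssn": "111", "Pnumber": 1, "Hours": 32.5, "Ename": "Smith",
     "Pname": "ProductX", "Plocation": "Bellaire"},
    {"Ssn": "111", "Pnumber": 2, "Hours": 7.5, "Ename": "Smith",
     "Pname": "ProductY", "Plocation": "Sugarland"},
    {"Ssn": "222", "Pnumber": 1, "Hours": 20.0, "Ename": "Wong",
     "Pname": "ProductX", "Plocation": "Bellaire"},
]

print(fd_holds(emp_proj, ["Ssn", "Pnumber"], ["Hours"]))
print(fd_holds(emp_proj, ["Ssn"], ["Ename"]))
print(fd_holds(emp_proj, ["Pnumber"], ["Pname", "Plocation"]))
print(fd_holds(emp_proj, ["Ssn"], ["Hours"]))  # Ssn alone does not fix Hours
```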
Functional dependency
Given a set of FDs F, we can infer additional FDs that
hold whenever the FDs in F hold
Armstrong's inference rules:
IR1. (Reflexive) If Y ⊆ X, then X → Y
IR2. (Augmentation) If X → Y, then XZ → YZ
(Notation: XZ stands for X ∪ Z)
IR3. (Transitive) If X → Y and Y → Z, then X → Z
Functional dependency
Closure of a set F of FDs is the set F+ of all FDs that can be
inferred from F
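Listing all of F+ is rarely practical. A common shortcut is the attribute closure X+, the set of attributes determined by X under F: X → Y is in F+ exactly when Y ⊆ X+. The sketch below is the usual fixed-point loop; the function name `closure` is an assumption, and the FDs are the three from the Ssn and Pnumber example earlier in this lecture.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have all of lhs, we may add rhs.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


FDS = [
    (("Ssn", "Pnumber"), ("Hours",)),
    (("Ssn",), ("Ename",)),
    (("Pnumber",), ("Pname", "Plocation")),
]

print(sorted(closure({"Ssn"}, FDS)))
print(sorted(closure({"Ssn", "Pnumber"}, FDS)))
```

Since the closure of {Ssn, Pnumber} contains every attribute, that pair is a superkey of the relation; neither attribute alone is.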
Normalization
• Key points:
– Redundancy is based on functional dependencies
– Therefore, normalization is based on functional
dependencies
– Therefore, relational database schema need to be
refined
• Schema Refinement Steps:
– Determine Functional dependencies for relation
– Find all keys in relation
– Normalize the relation
Database basics - review
Review of some terms
• Key: A key is a superkey with the additional
property that removing any attribute from the
key leaves a set that is no longer a superkey (minimal set
of attributes)
•eg : Student-No
• Superkey: Set of attributes S in relation R that
can be used to identify each tuple uniquely.
•eg : ( Student-No, name)
Database basics - review
• Candidate Key: Each key of a relation is called a
candidate key
• Primary Key: A candidate key is chosen to be the
primary key
• Prime Attribute: an attribute which is a member
of a candidate key
• Nonprime Attribute: An attribute which is not
prime
Stages of Normalization
• There are many Normal Forms proposed to reduce redundancies
Unnormalised form (UNF)
  ↓ Remove repeating groups
First normal form (1NF)
  ↓ Remove partial dependencies
Second normal form (2NF)
  ↓ Remove transitive dependencies
Third normal form (3NF)
  ↓ Remove remaining functional dependency anomalies
Boyce-Codd normal form (BCNF)
Un-normalized Form (UNF)
A relation is un-normalized when it has not had any
normalization rules applied to it, and it suffers from various
anomalies
Normalization - 1st Normal Form
• A relation R is in first normal form (1NF) if domains of all attributes in the
relation are atomic (simple & indivisible).
• Avoid multivalued & composite attributes
• Remove repeating groups into a new relation
Normalization - 1st Normal Form
For example:
DEPARTMENT (Dname, Dnumber, DMGRSSN, (DLocation))
• DLocation is a repeating group, so DEPARTMENT is in UNF
• The DEPARTMENT relation is not in 1NF
• How do we bring it into 1NF?
Solution
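One way to bring DEPARTMENT into 1NF, following the rule "remove repeating groups into a new relation", can be sketched in Python. The sample rows, the function name and the name of the new relation (`dept_locations`) are hypothetical, and Dnumber is assumed to be the key of DEPARTMENT.

```python
# Hypothetical sample data; DLocation holds a repeating group (a list).
department_unf = [
    {"Dname": "Research", "Dnumber": 5, "DMGRSSN": "111",
     "DLocation": ["Malabe", "Kandy"]},
    {"Dname": "Administration", "Dnumber": 4, "DMGRSSN": "222",
     "DLocation": ["Metro"]},
]


def remove_repeating_group(rows, key, repeating):
    """Split rows into a base relation and a new relation for the repeating group."""
    # Base relation: every attribute except the repeating group.
    base = [{a: v for a, v in row.items() if a != repeating} for row in rows]
    # New relation: one row per (key, single value) pair.
    new = [
        {key: row[key], repeating: value}
        for row in rows
        for value in row[repeating]
    ]
    return base, new


department, dept_locations = remove_repeating_group(
    department_unf, "Dnumber", "DLocation"
)

for row in department:
    print(row)
for row in dept_locations:
    print(row)
```

After the split every attribute value is atomic: DEPARTMENT keeps (Dname, Dnumber, DMGRSSN) and the new relation holds (Dnumber, DLocation) with one location per row.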
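Full Functional Dependency
• X → Y is a full functional dependency if removing any
attribute from X means the dependency no longer holds
Partial dependency
• X → Y is a partial dependency if some attribute can be
removed from X and the dependency still holds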
Normalization – 2nd Normal Form
Steps from 1NF to 2NF:
Remove the attributes that are only partially functionally dependent
on the composite key, and place them in a new relation.
Normalization – 2nd Normal Form
Example: after normalizing into 2NF
Relation 1: (TEACHER, CAMPUS, COURSE)
Relation 2: (CAMPUS, ADDRESS)
CAMPUS   ADDRESS
Metro    BoC Merchant Tower
Malabe   Malabe Campus
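EMP_PROJ (SSN, PNUM, HOURS, ENAME, PNAME, PLOC)
[Figure: dependency diagram of EMP_PROJ showing FD1, FD2 and FD3, followed by separate relations (SSN, ENAME) under FD2 and (PNUM, PNAME, PLOC) under FD3]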
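The step from 1NF to 2NF can be sketched in Python on EMP_PROJ. The figure labels the dependencies without spelling them out, so the sketch assumes FD1: {SSN, PNUM} → HOURS, FD2: SSN → ENAME and FD3: PNUM → {PNAME, PLOC}, with {SSN, PNUM} as the only key. The function names are hypothetical, and the sketch also assumes the right-hand sides of the listed FDs are nonprime attributes.

```python
def partial_dependencies(key, fds):
    """Return the FDs whose left side is a proper subset of the key."""
    return [(lhs, rhs) for lhs, rhs in fds if set(lhs) < set(key)]


def to_2nf(attributes, key, fds):
    """Move partially dependent attributes into new relations."""
    remaining = set(attributes)
    new_relations = []
    for lhs, rhs in partial_dependencies(key, fds):
        # New relation: the part of the key plus what it determines.
        new_relations.append(set(lhs) | set(rhs))
        # The dependent attributes leave the original relation.
        remaining -= set(rhs)
    return [remaining] + new_relations


EMP_PROJ = {"SSN", "PNUM", "HOURS", "ENAME", "PNAME", "PLOC"}
KEY = {"SSN", "PNUM"}
FDS = [
    ({"SSN", "PNUM"}, {"HOURS"}),   # FD1 (assumed)
    ({"SSN"}, {"ENAME"}),           # FD2 (assumed)
    ({"PNUM"}, {"PNAME", "PLOC"}),  # FD3 (assumed)
]

for relation in to_2nf(EMP_PROJ, KEY, FDS):
    print(sorted(relation))
# ['HOURS', 'PNUM', 'SSN']
# ['ENAME', 'SSN']
# ['PLOC', 'PNAME', 'PNUM']
```

HOURS stays with the full key because it depends on both SSN and PNUM, while ENAME and the project details move out with the part of the key that determines them.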
Normalization – 3rd Normal Form
• A relation R is in 3rd normal form (3NF) if
– R is in 2NF, and
– No nonprime attribute is transitively dependent on any key
• Steps from 2NF to 3NF: remove transitive dependencies into a new relation
Normalization – 3rd Normal Form
[Figure: dependency diagrams for two relations]
FD1: (ENAME, SSN, BDATE, ADD, DNUM)
FD2: (DNUM, DNAME, DMGR)
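Removing a transitive dependency can be sketched in Python. The sketch assumes an original relation holding all seven attributes above with key SSN, and assumes FD1 is SSN → {ENAME, BDATE, ADD, DNUM} and FD2 is DNUM → {DNAME, DMGR}; the figure does not spell these out. The relation name `EMP_DEPT` and the function names are hypothetical, and the test for a transitive dependency is simplified to a single key.

```python
def transitive_dependencies(key, fds):
    """Return FDs whose left side neither contains the key nor is part of it."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(key) <= set(lhs) and not set(lhs) < set(key)
    ]


def to_3nf(attributes, key, fds):
    """Move transitively dependent attributes into new relations."""
    remaining = set(attributes)
    new_relations = []
    for lhs, rhs in transitive_dependencies(key, fds):
        new_relations.append(set(lhs) | set(rhs))
        # The determinant (lhs) stays behind to link the two relations.
        remaining -= set(rhs)
    return [remaining] + new_relations


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADD", "DNUM", "DNAME", "DMGR"}
KEY = {"SSN"}
FDS = [
    ({"SSN"}, {"ENAME", "BDATE", "ADD", "DNUM"}),  # FD1 (assumed)
    ({"DNUM"}, {"DNAME", "DMGR"}),                 # FD2 (assumed)
]

for relation in to_3nf(EMP_DEPT, KEY, FDS):
    print(sorted(relation))
# ['ADD', 'BDATE', 'DNUM', 'ENAME', 'SSN']
# ['DMGR', 'DNAME', 'DNUM']
```

DNAME and DMGR depend on SSN only through DNUM, so they move to a relation keyed by DNUM, and DNUM remains in the first relation to link the two.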
Boyce-Codd Normal Form
A relation schema R is in Boyce-Codd Normal Form (BCNF) if
– Whenever a nontrivial functional dependency X → A
holds in R, X is a superkey of R
– A relation is in BCNF if and only if every
determinant is a candidate key
– Every relation in BCNF is also in 3NF
Normalization – BCNF
Attributes: PROPERTY_ID, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE
[Figure: dependency diagram showing FD1, FD2, FD3 and FD4]
Keys: PROPERTY_ID, (COUNTY_NAME, LOT#)
Normalization – BCNF
(a) LOTS1A (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA)
[Figure: dependency diagram of LOTS1A showing FD1, FD2 and FD5]
BCNF Normalization
LOTS1AX (PROPERTY_ID#, AREA, LOT#)
LOTS1AY (AREA, COUNTY_NAME)
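The BCNF condition (the left side of every nontrivial FD is a superkey) can be checked mechanically, as in this Python sketch. The FD contents are assumptions, since the figure only labels them: FD1 is taken as PROPERTY_ID# → {COUNTY_NAME, LOT#, AREA}, FD2 as {COUNTY_NAME, LOT#} → {PROPERTY_ID#, AREA}, and FD5 as AREA → COUNTY_NAME, which is what the LOTS1AY relation suggests. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return nontrivial FDs whose left side is not a superkey."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)            # nontrivial
        and closure(lhs, fds) != all_attrs     # left side is not a superkey
    ]


LOTS1A = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"}
FDS_LOTS1A = [
    ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA"}),   # FD1 (assumed)
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA"}),   # FD2 (assumed)
    ({"AREA"}, {"COUNTY_NAME"}),                           # FD5 (assumed)
]
print(bcnf_violations(LOTS1A, FDS_LOTS1A))  # only FD5: AREA is not a superkey

# After decomposition, each relation is checked against its own FDs.
LOTS1AX = {"PROPERTY_ID#", "AREA", "LOT#"}
LOTS1AY = {"AREA", "COUNTY_NAME"}
print(bcnf_violations(LOTS1AX, [({"PROPERTY_ID#"}, {"AREA", "LOT#"})]))  # []
print(bcnf_violations(LOTS1AY, [({"AREA"}, {"COUNTY_NAME"})]))           # []
```

Note that under these assumed FDs, FD2 no longer fits inside either decomposed relation, which is the usual price of this BCNF decomposition.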
Normalization
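• The decompositions used to reach 1NF, 2NF, 3NF & BCNF should preserve
the lossless join property: joining the decomposed relations gives
back exactly the original relation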
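The lossless join property can be illustrated on sample data in Python: project a relation onto two attribute sets, join the projections, and compare the result with the original rows. The rows below are hypothetical values for a relation with the LOTS1A attributes, and the helper names are assumptions for illustration. A check on one sample can show that a decomposition loses information, but it does not prove losslessness for every possible instance.

```python
def project(rows, attrs):
    """Projection onto attrs, with duplicate rows removed."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Join two non-empty row lists on their common attributes."""
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


def same_rows(a, b):
    """Compare two row lists as sets, ignoring order."""
    canon = lambda rows: {tuple(sorted(r.items())) for r in rows}
    return canon(a) == canon(b)


# Hypothetical rows; each AREA value occurs in only one county.
lots = [
    {"PROPERTY_ID#": 1, "COUNTY_NAME": "Colombo", "LOT#": 1, "AREA": 0.5},
    {"PROPERTY_ID#": 2, "COUNTY_NAME": "Colombo", "LOT#": 2, "AREA": 0.6},
    {"PROPERTY_ID#": 3, "COUNTY_NAME": "Kandy", "LOT#": 1, "AREA": 1.0},
    {"PROPERTY_ID#": 4, "COUNTY_NAME": "Kandy", "LOT#": 2, "AREA": 1.1},
]

# Decomposition joined on AREA: the original rows come back.
x = project(lots, ["PROPERTY_ID#", "AREA", "LOT#"])
y = project(lots, ["AREA", "COUNTY_NAME"])
print(same_rows(natural_join(x, y), lots))  # True

# A poor decomposition joined on LOT#: spurious rows appear.
p = project(lots, ["PROPERTY_ID#", "LOT#"])
q = project(lots, ["LOT#", "COUNTY_NAME", "AREA"])
print(len(natural_join(p, q)))              # 8 rows instead of 4
print(same_rows(natural_join(p, q), lots))  # False
```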
De-normalization
Sometimes, for performance reasons, a database
designer may leave a relation in a lower
normal form. This process is known as
de-normalization.
Exercise 1
Relation attributes: Patient#, Surgeon#, Surgery_Date, Patient_Name, Patient_Addr, Surgeon_Name, Surgery, Drug_Admin, Side_Effects
[Figure: dependency diagram showing FD1, FD2, FD3 and FD4]
Questions ?