RDBMS
Evolution of database systems
Data important to us must be stored so that we can retrieve it later when we need it.
I need to save information about my 100s of patients so that I know their history when they come next time. I need a mechanism to retrieve the information easily. I also don't want anybody to view my data. Can you also ensure that I don't enter the same data again? And
Ways to store data
There are two ways to store data in a computer system:
1. Using file system
2. Using database
Note that here we are talking about how data is organized when it is stored.
Why?
Disadvantages of the file system
Large, special-purpose programs need to be written for data entry and retrieval.
Programs must make sure that there is no redundant data.
Programs must ensure that data stays consistent when different users manipulate the same data concurrently or when the system crashes.
Retrieving particular data from a large volume requires complex coding.
More
Providing security mechanisms for different subsets of data again requires complex design.
Different users on different systems may store the data in different files. This leads to data isolation, or islands of data. Writing programs to merge or retrieve the data becomes extremely difficult.
There is no way to check the integrity of the data, that is, no way to check whether inappropriate values (like a negative salary) are entered in the file.
As an application programmer, my focus should be more on the business aspects of my customer. Writing C programs to store, retrieve and maintain consistency for every application I write takes most of my time and attention!
Storing, retrieving and maintaining data is common to all applications. I wish I had a prewritten system that takes care of all of this and gives me a simple interface to interact with, maybe a couple of simple commands to work with my data!
Database systems to the rescue!
A database is a collection of meaningful and related data.
The software that manages this data is called a database management system (DBMS), or database system.
For a given database, there is a structural description of the type of facts held in the database, which is called the schema. The schema describes the items or objects that represent the database. Based on the schema, there are different database models, but before looking at them, consider the advantages of a database system.
Advantages of database system
Reduced application development time
Easy storage and retrieval
Data integrity
Controlled Redundancy
Data Consistency
Security
Transaction and Concurrency
Crash Recovery
Administration
Features of a database system
Data Models
Hierarchical Model
Network Model
Relational Model (our focus)
Object Model
Hierarchical Model
Data is organized into a tree-like structure.
Used by older mainframe database systems such as IBM's IMS.
Parent-child relationship between the data: 1:N mapping.
Figure: a tree in which each node is a record (a set of data) linked to its children by pointers: Doctor 1 at the root, Patient A and Patient B below it, and Treatment 1 and Treatment 2 at the next level.
Network Model
The network model is very much like the hierarchical model, except that it allows a record to have more than one parent: N:M mapping (many-to-many relationship).
Figure: Customer and Bank records, each linked to Account records.
A customer has accounts in many banks.
A bank has many customers who have accounts.
Disadvantages of both of these models
Complexity
No structural independence
Limited ad hoc querying capability
Relational Model
Most database systems use this model.
In the relational model, the schema is a table (relation), which has a name, a set of attributes (also called columns or fields) and a type for each field.
Relationships are maintained through common attributes in the tables and can be one-to-one, one-to-many or many-to-many.
Examples of relation schemas, Student and Marks, related through the common attribute studid:
Student(studid: integer, name: string)
Marks(studid: integer, semester: integer, marks: double)
Table
Student
StudId  Name
1       Raghu R
2       Mary Molle
3       Henry F.Korth
4       Jennifer Wisdom
5       Uma R
Marks
StudId  Sem  Marks
1       1    78
2       2    76.6
3       3    89
4       4    90
5       5    89.3
The column headings are the attributes (fields); each data line is a tuple (record, row).
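As a minimal sketch (not part of the original material), the two relations above can be created and joined with Python's built-in sqlite3 module, with SQLite standing in for a full RDBMS. The composite primary key on Marks (studid, semester) is an assumption; the text gives only the attribute names and types.

```python
import sqlite3

# In-memory SQLite database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (
    studid INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE Marks (
    studid   INTEGER REFERENCES Student(studid),
    semester INTEGER,
    marks    REAL,
    PRIMARY KEY (studid, semester)   -- assumed key, not given in the text
);
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [
    (1, "Raghu R"), (2, "Mary Molle"), (3, "Henry F.Korth"),
    (4, "Jennifer Wisdom"), (5, "Uma R"),
])
conn.executemany("INSERT INTO Marks VALUES (?, ?, ?)", [
    (1, 1, 78), (2, 2, 76.6), (3, 3, 89), (4, 4, 90), (5, 5, 89.3),
])

# The common attribute studid relates the two tables; no pointers are needed.
rows = conn.execute("""
    SELECT s.name, m.semester, m.marks
    FROM Student s JOIN Marks m ON s.studid = m.studid
    ORDER BY s.studid
""").fetchall()
for row in rows:
    print(row)
```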
Advantages
Structural independence
Easy database design, implementation and management.
Ad hoc query capability
RDBMS Products
Oracle
DB2
MySQL
Sybase
MS-SQL Server
Object-Oriented database model
The new wave of database technologies uses the object-oriented data model (OODM).
OODBMS = object-oriented features + conventional DBMS features
Object schema representation
Class: EMPLOYEE, with attributes NAME, ADDRESS and DOB.
Instance: Will Smith; Smith house, Ford Road; 14-Sept-1980.
ORDBMS
An object-relational database is a relational database that allows developers to integrate their own custom data types.
Object model + relational model = object-relational model
Schema
Levels of abstraction in DBMS: several external schemas (External Schema 1, External Schema 2) sit on top of one conceptual/logical schema, which maps to one physical schema, which is stored on disk.
Conceptual/logical schema: how should the data be designed so that only correct and useful data is stored? TABLES or RELATIONS in an RDBMS.
External schema: how should the data appear to users? VIEWS in an RDBMS.
Physical schema: how should the data be stored? Files, record data structures, INDEXES.
Views and Index
A view is a relation computed from one or more tables. It consists of a subset of the columns (and rows) of one or more tables.
Views are not stored as data: only the view definition is stored, and its rows are computed when the view is queried. While tables are designed so that the stored data is correct, consistent and useful, views are created solely to meet the users' requirements.
An index is created on one or more columns of a table so that retrieval of data using the indexed columns is faster.
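A minimal sketch with Python's sqlite3 module shows both ideas. The Employee table, its columns and the names EmployeeDirectory and idx_employee_dept are hypothetical. EXPLAIN QUERY PLAN is SQLite-specific; other RDBMS products have their own plan commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    empid  INTEGER PRIMARY KEY,
    name   TEXT,
    dept   TEXT,
    salary REAL
);
-- The view stores only its definition; salary is hidden from its users.
CREATE VIEW EmployeeDirectory AS
    SELECT empid, name, dept FROM Employee;
-- The index lets lookups by dept avoid scanning the whole table.
CREATE INDEX idx_employee_dept ON Employee(dept);
""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)",
                 [(1, "Asha", "Sales", 50000), (2, "Ravi", "HR", 42000)])

# Rows of the view are computed from Employee at query time.
directory = conn.execute(
    "SELECT * FROM EmployeeDirectory ORDER BY empid").fetchall()
print(directory)  # [(1, 'Asha', 'Sales'), (2, 'Ravi', 'HR')]

# EXPLAIN QUERY PLAN reports whether SQLite uses the index for this lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM Employee WHERE dept = 'Sales'").fetchall()
print(plan)
```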
Data Independence
Application programs must be insulated from changes in the way the data is structured or stored: this is data independence.
Changes in the logical schema should not affect application programs: logical (structural) data independence.
Changes in the physical schema should not affect application programs (the conceptual schema shields them from physical storage changes): physical data independence.
RDBMS Features
Queries for easy storage and retrieval
Data independence
Integrity constraints and Data Consistency
Controlled Redundancy
Security
Transaction management and concurrency
Crash Recovery
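Roles
Figure: roles in a database system environment. The programmer writes the application programs; the database designer designs the data; the database administrator writes and enforces the procedures and standards; end users use the application programs; the application programs make calls to the DBMS and utilities, which access the data on the hardware (H/W); administrators manage the environment.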
Types of database system
Type of user:
Desktop Database
Multi-user Database
Type of location
Centralized database
Distributed database
Transactional or production DBMS
Data warehousing and decision support system
Database and RDBMS Concepts
Courseware designed by
Training Group
What is a Database Management System?
Introduction to Database Management Systems
The most traditional and common way of managing data is to store it in files.
File processing system
Files can be:
Sequential
Indexed
Relative
In this scenario, the files are managed directly by operating system services.
The primary focus here is to manage the files effectively rather than their contents.
Though this mechanism works, it has many drawbacks.
Types of files
Data files
Others: source files, object files, library files, executable files, temporary files and list files
Some major drawbacks of a file processing system
A file processing system manages files rather than the data in them.
A file processing system does not provide effective security:
Either the entire file is secured or it is not.
Part of a file cannot be secured on its own; record-level or column-level security is not available.
Any program that works with files has to know both the physical and the logical structure of the file.
Any change in either the physical or the logical structure of the file makes it necessary to rewrite the program:
Physical data dependence
Logical data dependence
Introduction to DBMS
Database Management System
A DBMS is a special set of software used to manage data (information).
The following are some of the important functions of a DBMS:
Ensuring that data is accurate (accuracy)
Providing data within a timeframe (timeliness)
Providing only the required data (relevancy)
Why a Database?
A database system provides central control of its data.
This is very different from traditional file processing systems, where each application or department has its own set of data (data redundancy).
Advantages of Database
A database is a collection of data.
A file is a collection of bits and pieces stored together as a single entity.
A database system internally relies on the file processing system to manage its data.
Externally, users feel that they are storing data rather than files.
Advantages of Database
Redundancy can be reduced
Inconsistency can be avoided
Sharing of Data
Standards can be enforced
Security restrictions can be applied
Integrity of data can be maintained
Conflicting requirements can be balanced
Characteristics of DBMS
Data independence
Speedy handling of spontaneous information requests
Non-redundancy
Versatility in representing relationships between data items
Security protection
Real-time accessibility
Comparison of different systems
For better understanding, the different types of data processing systems are divided into the following four categories:
Rudimentary I/O software: programs written in COBOL or C that work with flat files
File storage with access method: programs written in COBOL or C that work with file managers such as VSAM, ISAM or Btrieve
Database management systems: programs written in COBOL, C or Java that work with an RDBMS such as DB2, Oracle or Sybase, or programs written in COBOL or RPG that work with hierarchical and network DBMSs; generating reports takes a lot of effort
Advanced management tools: rapid application development tools such as report generators or forms generators, used to create large applications
Comparison of Different Systems
Rudimentary I/O software: no independence. The application programmer designs the physical data layout and embeds it in the program; the programmer's view of the file is the same as its physical structure; the programmer uses simple files.
File storage with access method: storage independence. Storage units can be changed without changing the program; the software converts the programmer's view of the file into the physical file; the programmer is more concerned with structures and addressing.
Database management systems: physical data independence. The physical data structure can be changed without changing programs; physical database structures are completely separated from the logical structures; the programmer is more concerned with structures and addressing.
Advanced management: logical data independence. The global logical data can be changed without changing the program; schema, subschema and physical data are completely separate; application programs are completely insulated from the data structures.
Comparison of Different Systems
Rudimentary I/O software: entirely sequential files; data accessed with the primary key; no attention to data security; no teleprocessing.
File storage with access method: sequential files with some direct access; data accessed with primary as well as secondary keys; security procedures applied to individual data files; simple remote inquiry.
Database management systems: direct-access files for some applications; generalized searching facility; security procedures applied to individual data files; remote data entry and inquiry.
Advanced management: direct access to all live data; generalized searching facility; corporate, worldwide security; complete interactive remote processing, computer networks and distributed databases.
Different types of DBMS
Hierarchical Model
Each part record is a parent; the supplier records listed under it are its children.
P1 Nut Red 12 London
    S1 Jones 10 Paris 300
    S2 Smith 20 London 300
P2 Bolt Green 17 Paris
    S1 Jones 10 Paris 300
    S2 Smith 20 London 400
    S3 Blake 10 Paris 200
P3 Screw Blue 12 London
    S1 Jones 10 Paris 300
P4 Screw Green 17 Paris
Hierarchical Model
In a hierarchical model, a tree-like structure is maintained.
Each parent record can have many child records.
The record at the top is called the root.
The root may have any number of dependants, each of which may have lower-level dependants.
Drawbacks of Hierarchical model
Insert
It is not possible to insert a supplier who is not currently supplying a part.
Delete
Deleting a part also deletes the information about all the suppliers below it.
Update
To update any supplier information, you need to search the entire database for that supplier and update every record for it.
Network Model
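Figure: supplier records S1 Smith 20 London, S2 Jones 10 Paris and S3 Blake 10 Paris, and part records P1 Nut Red 12 London, P2 Bolt Green 17 Paris, P3 Screw Red 12 Rome and P4 Screw Green 10 Paris, linked through connector records that hold quantities (300, 200, 500, 100, 300).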
Network Model
In the network model, data is represented by records and links.
Many-to-many relationships can be expressed easily in a network model.
This is not possible in the hierarchical model.
Records are linked to each other using connectors.
Advantages of Network model
Insert
You can simply insert a new part or supplier without any need to
establish any link
Delete
Deleting a supplier or a part does not delete other records recursively.
Similarly, the link between a supplier and a part can be broken just by deleting the connector.
Update
Since each supplier and each part occurs only once, an update changes a single record rather than requiring a search for and update of many records.
Relational Model
S
S#  SNAME  STATUS  CITY
S1  Smith  20      London
S2  Jones  10      Paris
S3  Blake  30      Paris
P
P#  Pname  Color   Weight  City
P1  Nut    Red     12      London
P2  Screw  Red     17      Paris
P3  Bolt   Blue    17      Rome
P4  Screw  Yellow  14      London
SP
S#  P#  Qty
S1  P1  200
S1  P2  100
S1  P3  300
S1  P4  230
S2  P2  100
Relational Model
In the relational model, data is simply represented in the form of tables.
These three tables closely resemble sequential files.
Compared to the other two models, the relational model is simple to understand.
There are no links or pointers connecting different tables; rows are related through common values (here, S# and P#).
The model is called relational because each table is a relation in the mathematical sense; relational algebra provides the operators used to represent and manage the information.
More On Relational Model
Each row is called a tuple.
Each column is called an attribute.
Domain
It is the set of permissible values that can be stored in an attribute.
This feature is not available in the other models.
The relational model provides a set of operators to the user. Using these operators, the user can perform operations on the tables.
Advantages of Relational Model
Insert
Adding a new supplier or part is not a problem; they are independent entities.
To represent a relationship between a supplier and a part, insert a tuple into the SP table.
Delete
Delete operations are independent of other tables.
Update
Updating supplier or part information is very simple, since each supplier and each part is stored only once.
Relational Database Management Systems (RDBMS)
The basic functionality of an RDBMS was conceptualized by Dr. E. F. Codd while he was working for IBM.
He laid down certain principles that govern the functioning of any RDBMS.
In 1974, the SQL language (originally called SEQUEL) was developed at IBM; the first SQL standard (ANSI) followed in 1986.
C. J. Date, also from IBM, contributed towards the standardization of RDBMS.
RDBMS Terminology
Relation
It is the equivalent of a table.
Tuple
It is the equivalent of a single row in a relation.
Attribute
It is the equivalent of a column in a relation.
Primary Key
It is an identifier that identifies each tuple uniquely.
The Relational Data Structure
Each relation is made up of two parts
Intension
It is the fixed part of the relation which contains
the column names
Extension
It is the data part of a relation
The Relational Data Structure
The smallest unit of data in the relational
model is the individual value
Each value is atomic: as far as the relational model is concerned, it has no internal structure.
A domain is the set of all possible values that an attribute can take.
Domains are conceptual in nature.
They can be stored in the database as a set of values.
Once stored in the database, they can be used in any table definition.
Degree and Cardinality of a relation
The number of attributes in a relation is
called the degree of the relation
The number of tuples in a relation is
called the cardinality of the relation
The cardinality of a relation changes as tuples are added or removed, but the degree does not.
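As an illustration only, a relation can be modelled in plain Python as a heading (a tuple of attribute names) plus a set of tuples; degree and cardinality are then simple counts. The data is the S relation from the relational model example; the S4 tuple is hypothetical.

```python
# Heading (intension) and body (extension) of the S relation.
heading = ("S#", "SNAME", "STATUS", "CITY")
body = {
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
}

def degree(heading):
    """Number of attributes in the relation."""
    return len(heading)

def cardinality(body):
    """Number of tuples in the relation."""
    return len(body)

print(degree(heading), cardinality(body))  # 4 3
body.add(("S4", "Clark", 20, "London"))    # hypothetical new supplier
print(degree(heading), cardinality(body))  # 4 4: only the cardinality changed
```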
The Relational Data Integrity
Every relation has a candidate key. A candidate key is a set of attributes that uniquely identifies each tuple in a relation.
A candidate key should possess the following characteristics:
Uniqueness: no two tuples have the same value for the key.
Minimality: no attribute can be removed from the key without losing uniqueness.
Every relation has at least one candidate key; one of them is designated as the primary key.
If there is more than one candidate key, the most appropriate one is designated as the primary key and the rest are called alternate keys.
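The two characteristics can be tested on a sample of data. The plain-Python sketch below (helper names are hypothetical) searches for minimal attribute sets that are unique in the SP table from the relational model example. A data sample can only rule keys out; it cannot prove that a set is a key, since keys come from the meaning of the data.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """Uniqueness: no two rows agree on every attribute in attrs (this sample only)."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    """Minimal attribute sets that are unique in this sample."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Minimality: skip supersets of keys already found.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys

sp = [
    {"S#": "S1", "P#": "P1", "Qty": 200},
    {"S#": "S1", "P#": "P2", "Qty": 100},
    {"S#": "S1", "P#": "P3", "Qty": 300},
    {"S#": "S1", "P#": "P4", "Qty": 230},
    {"S#": "S2", "P#": "P2", "Qty": 100},
]
print(candidate_keys(sp, ["S#", "P#", "Qty"]))  # [('S#', 'P#'), ('S#', 'Qty')]
```

(S#, Qty) happens to be unique in these five rows, but nothing in the meaning of the data stops a supplier from shipping the same quantity of two parts, so only (S#, P#) is a genuine candidate key.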
The Relational Data Integrity
Similarly, there is the concept of a foreign key. A foreign key is a set of attributes in one table whose values must match values of the primary key of another table.
In our example, the values of S# in the SP relation depend on the values in the S relation.
Two integrity rules in a relational database
Entity Integrity Rule
No attribute participating in the primary key of a base relation is allowed to contain NULL values.
Referential Integrity Rule
The value of a foreign key must be one of the values of the primary key (or a unique alternate key) of the table on which it depends, or it may be NULL.
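A minimal sketch of both rules using Python's sqlite3 module. The Dept and Emp tables are hypothetical. Two SQLite specifics matter here: foreign keys are enforced only after PRAGMA foreign_keys = ON, and a non-integer PRIMARY KEY column accepts NULL unless NOT NULL is also declared, so the sketch declares it explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Dept (
    deptno TEXT PRIMARY KEY NOT NULL,   -- entity integrity: no NULL key
    dname  TEXT
);
CREATE TABLE Emp (
    empno  TEXT PRIMARY KEY NOT NULL,
    deptno TEXT REFERENCES Dept(deptno) -- referential integrity
);
""")
conn.execute("INSERT INTO Dept VALUES ('D1', 'Sales')")

def try_insert(sql):
    """Run one INSERT and report whether the constraints accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"

results = {
    "NULL primary key": try_insert("INSERT INTO Dept VALUES (NULL, 'HR')"),
    "FK matches a key": try_insert("INSERT INTO Emp VALUES ('E1', 'D1')"),
    "FK has no match":  try_insert("INSERT INTO Emp VALUES ('E2', 'D9')"),
    "FK is NULL":       try_insert("INSERT INTO Emp VALUES ('E3', NULL)"),
}
for case, outcome in results.items():
    print(f"{case}: {outcome}")
```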
CODD Rule
Information Rule
All information in a relational database, including table names and column names, is represented by values in tables.
Guaranteed Access Rule
Every value must be accessible by specifying the table name, the primary key value and the column name.
Representation of NULL values Rule
NULL values must be supported and are distinct from spaces and zeroes.
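A small sketch with Python's sqlite3 module (the table T and its rows are hypothetical) shows that NULL, a space and zero are three different things to the database: only the genuinely missing value matches IS NULL, a comparison with = NULL matches nothing, and COUNT(column) skips NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (label TEXT, val)")  # val has no declared type
conn.executemany("INSERT INTO T VALUES (?, ?)",
                 [("null", None), ("space", " "), ("zero", 0)])

# Only the genuinely missing value satisfies IS NULL.
is_null = conn.execute("SELECT label FROM T WHERE val IS NULL").fetchall()
# '= NULL' evaluates to unknown for every row, so nothing matches.
eq_null = conn.execute("SELECT label FROM T WHERE val = NULL").fetchall()
# COUNT(*) counts rows; COUNT(val) skips NULLs.
counts = conn.execute("SELECT COUNT(*), COUNT(val) FROM T").fetchone()
print(is_null, eq_null, counts)  # [('null',)] [] (3, 2)
```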
CODD Rule
Catalog facility
All information about the database should be represented in an online, read-only catalog that can be queried like ordinary data.
Language support rule
There must be at least one simple language to define and manipulate data (for example, SQL).
View Updatability Rule
Any view that is theoretically updatable must be updatable by the RDBMS.
Insert, Delete and Update
The RDBMS must support insertion, deletion and update at the table (set) level. This improves performance, since a command acts on a set of records rather than a single record at a time.
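SQLite keeps its catalog in a table named sqlite_master that is read with ordinary SQL, and a single UPDATE acts on every qualifying row. The sketch below uses Python's sqlite3 module with the S data from the relational model example; the column names sno, sname and so on are substitutes, because S# is not a plain SQL identifier. Other products expose their catalogs differently, for example through INFORMATION_SCHEMA views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE S (sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT)"
)
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
])

# Catalog: the schema itself is stored as rows and read with a normal SELECT.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
print(catalog)

# Set-level update: one statement changes every supplier in Paris.
cur = conn.execute("UPDATE S SET status = status + 5 WHERE city = 'Paris'")
print(cur.rowcount)  # 2
```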
CODD Rule
Physical data independence
Any changes made to the physical structure of the database should not
require changes to be made to the application programs
Logical data independence
Any changes to the tables like addition or removal of columns should
not require the application program to be modified if the program does
not refer to those columns
Integrity Independence
Integrity rules must be definable when the relations are created, and once defined and activated they must always be enforced by the RDBMS.
CODD Rule
Distribution Independence
The RDBMS should allow data to be distributed across different locations without requiring any change to application programs.
Non-Subversion rule
If the RDBMS provides a language that works one row at a time, that language must not be able to bypass or violate the integrity constraints.
Data Modeling
Keys
A key uniquely identifies a row in a table
There are two types of keys:
Intelligent
An intelligent key is based on values that are themselves meaningful.
Non-intelligent
A non-intelligent key is simply an arbitrary value.
An intelligent key must be changed whenever the data it is based on changes.
An updatable key presents its own problems.
A primary key is a key that uniquely identifies the rows of a table.
A foreign key is a key whose values are based on the primary key of another table.
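A minimal sqlite3 sketch of the non-intelligent (surrogate) approach, with hypothetical table and column names. In SQLite, a column declared INTEGER PRIMARY KEY is filled automatically when no value is given; the meaningful value (here an e-mail address) is kept as a UNIQUE alternate key, so it can change without touching the key or any foreign keys that refer to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Customer (
    cust_id INTEGER PRIMARY KEY,   -- non-intelligent key, assigned by SQLite
    email   TEXT UNIQUE NOT NULL,  -- meaningful value, kept as an alternate key
    name    TEXT
)""")
conn.execute("INSERT INTO Customer (email, name) VALUES ('asha@example.com', 'Asha')")
conn.execute("INSERT INTO Customer (email, name) VALUES ('ravi@example.com', 'Ravi')")

# The meaningful value changes; the key (and anything referring to it) does not.
conn.execute("UPDATE Customer SET email = 'asha@example.org' WHERE cust_id = 1")
rows = conn.execute("SELECT cust_id, email FROM Customer ORDER BY cust_id").fetchall()
print(rows)  # [(1, 'asha@example.org'), (2, 'ravi@example.com')]
```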
Data Modeling
Data modeling is the process of building database structures to store data.
This involves analysis to find out what data to store and the best way to store it.
There are several techniques for data modeling.
We will cover two of the most important methods:
ER Modeling
Normalization
ER Modeling
Entity: It is an object which exists and is
distinguishable from other objects
Employee, Customer, Part, Supplier
Entity Set: A collection of similar objects
grouped together
Employees, Customers, Suppliers
Attributes: Attributes describe the entity.
Primary Key: It is the minimal set of attributes that uniquely identifies an entity within an entity set
Domain: It is the set of permissible values that can be entered into an attribute
Simple ER Diagram
Figure: entities SUPPLIER and PARTS related through SP. Attributes shown: S#, SNAME, CITY and ADDRESS (supplier); P#, PNAME, UNIT and COLOR (part); S#, P# and QTY (SP).
Relationships
One to One
One to many
Many to Many
Principles of Normalization
It is a process of developing data
structures in a way that reduces data
redundancy and promotes data integrity
There are several stages in normalization.
We will study the following normal forms:
1NF
2NF
3NF
Data With repeating Group
Example
Employees working on different projects
Supplier supplying many parts
Fixed information
First Normal Form
Identify the primary key
Establish the relationship of the non-key values with the primary key
Emp# Empname Desig Project# EndDate Hours Client Location
A1 VIVEK MANAGER P1 10/02 300 ABB WORLI
A1 VIVEK MANAGER P2 10/12 100 GE WORLI
A1 VIVEK MANAGER P3 15/03 20 ABB WORLI
A2 PRASAD PROGRAMMER P1 10/02 300 ABB BANGALORE
A2 PRASAD PROGRAMMER P2 10/12 130 GE BANGALORE
A3 VEENA ANALYST P1 10/02 100 ABB WHITEFIELD
A3 VEENA ANALYST P2 10/12 250 GE WHITEFIELD
A4 VIDHYUT PROGRAMMER P4 10/04 100 ABB THANE
Employee_Project
Problems in 1NF
Insertion is a problem
You cannot insert a new employee who is not assigned to a project.
Deletion is a problem
If you delete the only employee working on a project, all information about that project is lost forever.
Update is a problem
Changing an employee's name or a project's end date means updating every row in which it appears.
All the problems in 1NF arise from functional dependencies (or the lack of them).
Identify the dependency of every column on the primary key.
Second Normal Form
Data redundancy results in 1NF because
of partial functional dependency
We should now try to eliminate partial
functional dependency
This can be done by splitting the table in
1NF into many tables as shown in the
next screen
Second Normal Form
Employee
Emp#  Empname  Desig       Location
A1    VIVEK    MANAGER     WORLI
A2    PRASAD   PROGRAMMER  BANGALORE
A3    VEENA    ANALYST     WHITEFIELD
A4    VIDHYUT  PROGRAMMER  THANE
Project
Project#  EndDate  Client
P1        10/02    ABB
P2        10/12    GE
P3        15/03    DSP
P4        10/04    ABB
Emp_Project
Emp#  Project#  Hours
A1    P1        300
A1    P2        100
A1    P3        20
A2    P1        300
A2    P2        130
A3    P1        100
A3    P2        250
A4    P4        100
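As a sketch (Python's sqlite3 module; column names such as empno and projno substitute for Emp# and Project#), the three 2NF tables can be loaded with the data above and joined back on their shared keys. The join rebuilds a wide table with the same eight employee-project combinations as the 1NF table, while each employee's name, designation and location is now stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (empno TEXT PRIMARY KEY, empname TEXT, desig TEXT, location TEXT);
CREATE TABLE Project  (projno TEXT PRIMARY KEY, enddate TEXT, client TEXT);
CREATE TABLE Emp_Project (
    empno  TEXT REFERENCES Employee(empno),
    projno TEXT REFERENCES Project(projno),
    hours  INTEGER,
    PRIMARY KEY (empno, projno)
);
""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    ("A1", "VIVEK", "MANAGER", "WORLI"), ("A2", "PRASAD", "PROGRAMMER", "BANGALORE"),
    ("A3", "VEENA", "ANALYST", "WHITEFIELD"), ("A4", "VIDHYUT", "PROGRAMMER", "THANE"),
])
conn.executemany("INSERT INTO Project VALUES (?, ?, ?)", [
    ("P1", "10/02", "ABB"), ("P2", "10/12", "GE"),
    ("P3", "15/03", "DSP"), ("P4", "10/04", "ABB"),
])
conn.executemany("INSERT INTO Emp_Project VALUES (?, ?, ?)", [
    ("A1", "P1", 300), ("A1", "P2", 100), ("A1", "P3", 20), ("A2", "P1", 300),
    ("A2", "P2", 130), ("A3", "P1", 100), ("A3", "P2", 250), ("A4", "P4", 100),
])

# Joining on the shared keys rebuilds the single wide table.
wide = conn.execute("""
    SELECT e.empno, e.empname, e.desig, p.projno, p.enddate, ep.hours, p.client, e.location
    FROM Emp_Project ep
    JOIN Employee e ON e.empno = ep.empno
    JOIN Project  p ON p.projno = ep.projno
    ORDER BY e.empno, p.projno
""").fetchall()
# VIVEK works on three projects but his name is stored in one row.
names_stored = conn.execute(
    "SELECT COUNT(*) FROM Employee WHERE empname = 'VIVEK'").fetchone()[0]
print(len(wide), names_stored)  # 8 1
```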
Problems and Advantages of 2NF
All the problems that existed in 1NF have been eliminated.
Problem in 2NF
Insert: You cannot add new designation or location information to the Employee table if no employee has that designation or location. Similarly, a new client cannot be added if there is no project for that client.
Deletion: If you delete the only employee with a given designation or location, the information about that designation or location is lost.
Update: Updating a designation or a location requires a lot of searching.
All these problems exist in the tables because of transitive dependency.
Third Normal Form
Third normal form involves removal of
transitive dependency
Columns involved in a transitive dependency are ideal candidates for a foreign key relationship.
Third normal Form
Employee
Emp#  Empname  DesigID  LocID
A1    VIVEK    D1       L1
A2    PRASAD   D2       L2
A3    VEENA    D3       L3
A4    VIDHYUT  D2       L4
Designations
DesigID  Desig       Min Experience
D1       MANAGER
D2       PROGRAMMER
D3       ANALYST
D4       FRESHER
Locations
LocId  Location    EmpCount
L1     WORLI
L2     BANGALORE
L3     WHITEFIELD
L4     THANE
Project
Project#  EndDate  ClientID
P1        10/02    C1
P2        10/12    C2
P3        15/03    C3
P4        10/04    C2
Client
ClientId  Client Details  Client Address
C1        ABB
C2        GE
C3        DSP
C4        HSBC
Emp_Project
Emp#  Project#  Hours
A1    P1        300
A1    P2        100
A1    P3        20
A2    P1        300
A2    P2        130
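A minimal sqlite3 sketch of the Employee, Designations and Locations tables above, with foreign keys declared and column names adapted to plain SQL identifiers. It shows two gains over 2NF: the designation FRESHER can be stored although no employee holds it, and renaming a designation is a one-row update. The new name DEVELOPER is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Designations (desigid TEXT PRIMARY KEY, desig TEXT NOT NULL);
CREATE TABLE Locations    (locid   TEXT PRIMARY KEY, location TEXT NOT NULL);
CREATE TABLE Employee (
    empno   TEXT PRIMARY KEY,
    empname TEXT,
    desigid TEXT REFERENCES Designations(desigid),
    locid   TEXT REFERENCES Locations(locid)
);
""")
conn.executemany("INSERT INTO Designations VALUES (?, ?)", [
    ("D1", "MANAGER"), ("D2", "PROGRAMMER"), ("D3", "ANALYST"),
    ("D4", "FRESHER"),  # stored even though no employee holds it yet
])
conn.executemany("INSERT INTO Locations VALUES (?, ?)", [
    ("L1", "WORLI"), ("L2", "BANGALORE"), ("L3", "WHITEFIELD"), ("L4", "THANE"),
])
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    ("A1", "VIVEK", "D1", "L1"), ("A2", "PRASAD", "D2", "L2"),
    ("A3", "VEENA", "D3", "L3"), ("A4", "VIDHYUT", "D2", "L4"),
])

# One-row update, however many employees hold designation D2.
conn.execute("UPDATE Designations SET desig = 'DEVELOPER' WHERE desigid = 'D2'")
rows = conn.execute("""
    SELECT e.empno, d.desig, l.location
    FROM Employee e
    JOIN Designations d ON d.desigid = e.desigid
    JOIN Locations    l ON l.locid   = e.locid
    ORDER BY e.empno
""").fetchall()
print(rows)
```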
Relational Database Table Design
Problems with a bad database design
Redundant Storage
Update Anomalies
Insertion Anomalies
Deletion Anomalies
A bad database design: a simple flat table
Definition
Normalization was first proposed by Dr. Codd.
Normalization is the process of breaking down a relation schema into smaller relations in order to achieve a good database design.
To achieve good design, step-by-step decomposition rules are laid out in the form of NORMAL FORMS.
There are six numbered normal forms (1NF to 6NF), plus BCNF and DKNF.
For most relation schemas, good design is attained at 3NF itself.
Take care
While normalizing we also pay attention to two important aspects:
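Lossless Join: There should be no loss of information when tables are split; joining the smaller tables must give back exactly the original table, with no missing or spurious rows.
Dependency Preservation: The dependencies between the attributes must be preserved.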
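The lossless-join property can be checked mechanically on sample data: project the table onto the two attribute sets, join the projections back and compare with the original. A plain-Python sketch (the helper names are hypothetical), using rows from the Employee_Project and 2NF Employee examples:

```python
def project(rows, attrs):
    """Projection: keep only attrs and drop duplicate tuples."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join of two projections on their shared attribute names."""
    result = set()
    for left_row in left:
        ld = dict(left_row)
        for right_row in right:
            rd = dict(right_row)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                result.add(frozenset({**ld, **rd}.items()))
    return result

def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {frozenset(row.items()) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

emp_project = [
    {"Emp#": "A1", "Project#": "P1", "Hours": 300},
    {"Emp#": "A2", "Project#": "P1", "Hours": 300},
    {"Emp#": "A3", "Project#": "P1", "Hours": 100},
]
employee = [
    {"Emp#": "A1", "Empname": "VIVEK", "Desig": "MANAGER"},
    {"Emp#": "A2", "Empname": "PRASAD", "Desig": "PROGRAMMER"},
    {"Emp#": "A4", "Empname": "VIDHYUT", "Desig": "PROGRAMMER"},
]

print(is_lossless(emp_project, ["Emp#", "Project#"], ["Project#", "Hours"]))  # False
print(is_lossless(employee, ["Emp#", "Empname"], ["Emp#", "Desig"]))  # True
```

The first split is lossy because Hours depends on Emp# and Project# together: joining on Project# alone pairs every employee on P1 with every hours value recorded for P1, producing spurious rows. The second split is lossless because Emp# determines both other columns.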
1 NF
First Normal Form
There are no duplicated rows in the table.
Each cell is single-valued (i.e., there are no
repeating groups or arrays).
Entries in a column (attribute, field) are of the
same kind
2 NF
Second Normal Form:
A table is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on every candidate key, that is, on the whole key rather than on only some of the key (prime) attributes.
A key attribute, or prime attribute, is an attribute that is part of any candidate key.
Functional Dependency: Let A and B be two attributes or sets of attributes. B is functionally dependent on A (written A → B) if B's value is determined by A's value, i.e. any two tuples that agree on A also agree on B.
A is the determinant.
Full Functional Dependency
Full Functional Dependency (A → B): B is fully functionally dependent on A if, on removing any one attribute x from A, B can no longer be determined.
If B can still be determined after removing some attribute x from A, then B is partially dependent on A.
The relation that resulted from 1NF:
Student (Stud ID, Student Name, CourseNo, Course Name, Staff ID, Staff Name, PaperNo, Subject Name, Grade)
Step 1: Determine the primary key of the above relation.
Stud ID and Staff ID together uniquely determine the row.
Step 2: Determine the partial dependencies.
Stud ID → Student Name, CourseNo, Course Name
Staff ID → Staff Name, PaperNo, Subject Name
PaperNo → Staff ID
Stud ID, PaperNo → Grade
PaperNo → CourseNo
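The dependencies in Step 2 can be checked with the standard attribute-closure algorithm: the closure of a set X is every attribute determined by X under the given dependencies. A plain-Python sketch, with attribute names written without spaces (for example StudID for Stud ID); closure, determines and is_fully_dependent are hypothetical helper names.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def determines(lhs, rhs, fds):
    return set(rhs) <= closure(lhs, fds)

def is_fully_dependent(lhs, rhs, fds):
    """rhs depends on lhs, but not on lhs with any single attribute removed."""
    if not determines(lhs, rhs, fds):
        return False
    return all(not determines(set(lhs) - {x}, rhs, fds) for x in lhs)

FDS = [
    ({"StudID"}, {"StudentName", "CourseNo", "CourseName"}),
    ({"StaffID"}, {"StaffName", "PaperNo", "SubjectName"}),
    ({"PaperNo"}, {"StaffID"}),
    ({"StudID", "PaperNo"}, {"Grade"}),
    ({"PaperNo"}, {"CourseNo"}),
]
ALL = {"StudID", "StudentName", "CourseNo", "CourseName", "StaffID",
       "StaffName", "PaperNo", "SubjectName", "Grade"}

print(closure({"StudID", "StaffID"}, FDS) == ALL)  # True: the pair determines the row
print(sorted(closure({"StudID"}, FDS)))  # ['CourseName', 'CourseNo', 'StudID', 'StudentName']
print(is_fully_dependent({"StudID", "PaperNo"}, {"Grade"}, FDS))       # True
print(is_fully_dependent({"StudID", "StaffID"}, {"StudentName"}, FDS))  # False: partial
```

Dropping one attribute at a time is enough for the full-dependency test: if B were determined by some smaller subset S of A, it would also be determined by A with any attribute outside S removed.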
3 NF
Third Normal Form
A table is in 3NF if it is in 2NF and if it has no transitive
dependencies of a non-key attribute on the primary key.
Transitive dependencies (A → B, B → C):
Let A, B and C be three attributes or sets of attributes. If B is functionally dependent on A and C is functionally dependent on B (in other words, C is indirectly dependent on A), then C is transitively dependent on A.
Let us look at the Student relation that resulted from 2NF:
Student (StudID, Student Name, Course No, Course Name)
Transitive dependency:
StudID → Course No
Course No → Course Name
Both Course No and Course Name are non-key attributes, so Course Name is transitively dependent on StudID.
Boyce-Codd Normal Form
A table is in BCNF if every determinant in the table is a candidate
key.
A 3NF table fails to be in BCNF when a non-key attribute is a determinant of a key attribute.
If the table has only one candidate key, then BCNF and 3NF are the same. Hence BCNF is sometimes referred to as a special (stricter) case of 3NF.
However, a table that is in 3NF is not necessarily in BCNF.
BCNF comes into the picture only when a table has two or more candidate keys, typically composite keys that overlap.
4 NF
Fourth Normal Form
A table is in 4NF if it is in BCNF and if it has no multi-valued
dependencies.
A multi-valued dependency occurs when
a. a table has at least three attributes,
b. two of the attributes are multi-valued, and
c. the values of the multi-valued attributes depend on only one of the
remaining attributes.
5 NF
Fifth Normal Form
A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and if every join dependency in the table is a consequence of the candidate keys of the table.
Join Dependency: A relation satisfies a join dependency if it can be decomposed into smaller relations whose join gives back the original relation exactly (a lossless-join decomposition).
5NF requires that there are no non-trivial join dependencies that do not follow from key constraints.
This normal form handles three-way or greater relationships.
Unlike all the other normal forms, there is no simple set of rules for this normal form.
DKNF
Domain/Key Normal Form
Another way of ensuring 5 NF is making sure that the table is in
DKNF.
DKNF requires that the database contains no constraints other than
domain constraints and key constraints.
6th Normal Form
A table is in sixth normal form (6NF) if and only if it satisfies no non-
trivial join dependencies at all.
This obviously means that the fifth normal form is also satisfied.
The sixth normal form was only defined when extending the
relational model to take into account the temporal dimension.
ER Model
Database design: steps
Requirement Analysis
Conceptual Database Design (ER diagram)
Logical Database Design (converting the ER diagram into RDBMS tables, etc.)
Additional design components such as views, stored procedures, triggers, etc.
ER Model
The ER data model allows us to conceptually describe the data of the real-world enterprise that we want to store, in terms of entities and their relationships.
This model is realized by diagrammatically representing the collection of data and their relationships. There are two established notations for entity relationship diagrams (ERDs): Chen's notation and crow's foot notation.
As stated earlier, this is done in the conceptual database design phase.
Defining Entity
Entity: It is an object that is distinct and unique. Examples: a
student, a keyboard, a college etc.
Entity set: A collection of similar entities is entity set. Examples :
Students, Colleges
Attributes: Properties that describe an entity are its attributes. All entities in an entity set have the same attributes (but the values of the attributes may differ). For example, a student has registration number, name and address as attributes.
Primary Key
A minimal set of attributes that uniquely identifies an entity is called a
candidate key.
An entity can have many candidate keys. For example, for a student
entity, if we assume that students have unique names, both registration
no. and name are candidate keys.
One among the candidate keys is chosen to be a primary key. A primary
key should not be null.
A composite key is a primary key with more than one attribute.
A super key is a set of attributes that contains a candidate key (for example, the primary key together with any other attributes).
Representing an entity in Chen's notation
Figure: entity sets Students (attributes regNo, name, address) and Courses (attributes streamId, title, noOfSems), each drawn with its attributes and with its primary key attribute marked.
Relationship
Relationship is the association between entities or entity sets. For example, between the entities student and course, the relationship is that a course enrolls many students.
Relationships may also have attributes. For example, if we were to maintain the enrollment date, then the relationship enrolls would have an attribute enrollment-date.
A relationship set is a collection of similar relationships.
Binary Relationship Example
Figure: Teacher (teacherId, name, address) and Subject (subId, title) linked by the relationship set Is Taught, with mapping (connectivity) 1:1, one to one.
Binary Relationship Example
Figure: Courses (streamId, title, noOfSems) and Students (regNo, name, address) linked by the relationship set Has, with mapping (connectivity) 1:N, one to many.
Binary Relationship Example
Figure: Customer (custId, name, address) and Bank Account (acctNo, acctType, balance) linked by the relationship set owns, with mapping (connectivity) M:N, many to many.
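When the ER diagram is converted into tables, a many-to-many relationship such as owns becomes a table of its own whose key combines the keys of both entities. A minimal sqlite3 sketch; the customer and account rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customer (custId INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE Account  (acctNo INTEGER PRIMARY KEY, acctType TEXT, balance REAL);
-- The M:N relationship 'owns' becomes a table keyed by both sides.
CREATE TABLE Owns (
    custId INTEGER REFERENCES Customer(custId),
    acctNo INTEGER REFERENCES Account(acctNo),
    PRIMARY KEY (custId, acctNo)
);
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)",
                 [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")])
conn.executemany("INSERT INTO Account VALUES (?, ?, ?)",
                 [(101, "savings", 5000.0), (102, "current", 1200.0)])
# Account 101 is held jointly; Asha also owns account 102.
conn.executemany("INSERT INTO Owns VALUES (?, ?)", [(1, 101), (2, 101), (1, 102)])

owners_of_101 = conn.execute("""
    SELECT c.name FROM Owns o JOIN Customer c ON c.custId = o.custId
    WHERE o.acctNo = 101 ORDER BY c.name
""").fetchall()
accounts_of_asha = conn.execute(
    "SELECT acctNo FROM Owns WHERE custId = 1 ORDER BY acctNo").fetchall()
print(owners_of_101, accounts_of_asha)  # [('Asha',), ('Ravi',)] [(101,), (102,)]
```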
Relationship with attributes
Figure: the binary relationship set Has between Courses (streamId, title, noOfSems) and Students (regNo, name, address), 1:N, with enrollment-date as an attribute of the relationship.
Cardinality
Figure: Courses (streamId, title, noOfSems) and Students (regNo, name, address) linked by Has (attribute enrollment-date), annotated with the cardinalities (1,35) and (1,6).
Cardinality specifies the number of occurrences of one entity that can be associated with one occurrence of the related entity.
A course can have a minimum of 1 and a maximum of 35 students. A student can take a minimum of 1 and a maximum of 6 courses.
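Minimum and maximum cardinalities such as (1,35) and (1,6) cannot be expressed by a simple key or foreign-key constraint, so one option is to check them in application code. A plain-Python sketch; the function name, the course and student identifiers and the sample enrollments are hypothetical.

```python
from collections import Counter

def check_cardinality(has, courses, students,
                      per_course=(1, 35), per_student=(1, 6)):
    """Return messages for entities whose participation in Has is out of range."""
    course_counts = Counter(course for course, _ in has)
    student_counts = Counter(student for _, student in has)
    problems = []
    for c in courses:
        n = course_counts[c]  # Counter returns 0 for a missing key
        if not per_course[0] <= n <= per_course[1]:
            problems.append(f"course {c} has {n} students")
    for s in students:
        n = student_counts[s]
        if not per_student[0] <= n <= per_student[1]:
            problems.append(f"student {s} takes {n} courses")
    return problems

# (course, student) pairs in the Has relationship set
has = [("CS", 1), ("CS", 2), ("EC", 1)]
print(check_cardinality(has, courses=["CS", "EC"], students=[1, 2, 3]))
# ['student 3 takes 0 courses']
```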
Example of a Ternary relationship
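Figure: Employees, Departments and Location linked by the single relationship set Works-In, which has the attribute dateOfJoining; the cardinalities are N, M and P. Attributes shown: empId, deptId, locId, name, address and capacity.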
Example of a Unary relationship
Figure: Customer (custId, name, address) related to itself through the relationship referred by, with mapping N:1.
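In tables, a unary relationship becomes a foreign key that refers back to the same table. A minimal sqlite3 sketch of referred by; the column name referredBy and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
CREATE TABLE Customer (
    custId     INTEGER PRIMARY KEY,
    name       TEXT,
    address    TEXT,
    referredBy INTEGER REFERENCES Customer(custId)  -- NULL if nobody referred them
)""")
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?)", [
    (1, "Asha", "Pune", None),
    (2, "Ravi", "Delhi", 1),
    (3, "Meera", "Mumbai", 1),
])

# A self-join pairs each customer with the customer who referred them.
pairs = conn.execute("""
    SELECT c.name, r.name
    FROM Customer c JOIN Customer r ON c.referredBy = r.custId
    ORDER BY c.custId
""").fetchall()
print(pairs)  # [('Ravi', 'Asha'), ('Meera', 'Asha')]
```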
Key Constraints on relationship
Consider a restriction where a student can join only one course. This is an
example of a key constraint.
In the below diagram the directional arrow ensures that there is only one
instance of student in the relationship set Has. In other words, given a
student, has relationship can be uniquely determined.
[Figure: Courses (streamId, title, noOfSems) 1 --- Has (enrollment-date) --- N Students (regNo, name, address), with a directional arrow from Students to Has.]
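[Figure: a second key constraint. Customer (custId, name, address) 1 --- buy (date-of-purchase) --- N ArtPiece (itemId, title, price, lastprice), with a directional arrow from ArtPiece to buy, so each art piece is bought by at most one customer.]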
Participation constraints
Total participation: if every entity in an entity set must take part in the
relationship set, the entity set is in total participation with respect to
that relationship.
Example: every employee works in at least one department, and each
department has at least one employee. Therefore the participation of both
Employees and Departments in the Works-In relationship is total.
Partial participation: a participation that is not total is partial.
Example of a total participation
[Figure: Employees (empId, name), Departments (deptId, name) and Location (locId, address, capacity) related through Works-In (dateOfJoining); total participation is indicated with one of two alternative line notations.]
Modeling Attributes
Single-Valued Attributes
An attribute that has only one value for a given entity.
All the attributes we have seen so far are single-valued attributes.
[Figure: Employees with the single-valued attribute name.]
Composite Attribute
A composite attribute is an attribute that is made up of more than
one attribute.
For example, address can be thought of as being made up of several
attributes.
[Figure: Employees with the composite attribute address, made up of house number, street number, colony or area, city or town, district, state and country.]
Multi-Valued Attribute
A multi-valued attribute is an attribute that can have multiple
values for a single entity.
[Figure: Employees with the multi-valued attribute phone_numbers.]
Derived Attribute
A derived attribute is an attribute whose value is determined or
computed from the values of other attributes (the attributes it is
computed from are called stored attributes).
Whereas the value of a composite attribute for a row is the
concatenation of the values of a set of attributes of the same row, a
derived attribute can also be calculated from attribute values in
multiple rows (for example, the sum of gross_salary over all entities in
the Employee entity set).
Example
[Figure: Employee with the stored attributes gross_salary and deductions and the derived attribute net_salary.]
Weak Entities
A weak entity is an entity that cannot be uniquely identified by its
own attributes alone, and therefore must use as its primary key
both its own attributes and the primary key of an entity it is related
to.
A weak entity also cannot exist without the entity it is related to
(the owner entity).
The relationship between the owner entity and the weak entity is
one-to-many.
Weak Entity example
Assume a shopping application in which a customer orders many items.
We use 2 entities to maintain a customer order:
Orders: records order details such as the order number, order date and
customer id.
Order-Items: records the items belonging to the order.
Order-Items is a weak entity: its item number identifies an item only
within a particular order.
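A weak entity set diagram
[Figure: Orders (orderNo, orderDate, custId) 1 --- contains --- n Order-Items (itemNo, noOfPieces). itemNo is the partial key and contains is the identifying relationship; alternative notations for the weak entity set and the identifying relationship are shown. The weak entity is invariably in total participation.]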
Hierarchy
Entities can be classified into a general type and more specific types,
which are connected by an IS-A relationship.
For example, an Employee can be a programmer or a marketing employee.
[Figure: IS-A relationship. Employee (empId, name) is connected through IS-A to Programmer (Current-Project, Main-Role) and Marketing (area, num-of-contracts). Going down the hierarchy is specialization; going up is generalization.]
Redundant relationship problem and Aggregation
Consider a relationship in which an employee works on a project and
may use machinery for that work.
[Figure: Employee (empId, name, joiningDate), Project and Machinery (machineID, purchaseDate) connected by two separate relationships, works and uses (with the attribute hours), marked as redundant relationships.]
Aggregation
The solution is to use aggregation. Aggregation is a relationship between
a collection of entities and relationships: it treats a relationship set,
together with its participating entity sets, as a single entity.
Aggregation allows us to indicate that a relationship set participates
in another relationship set.
Use aggregation when we need a relationship among relationships.
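[Figure: the works relationship between Employee (empId, name, joiningDate) and Project is enclosed in an aggregation, and the aggregation participates in uses (hours) with Machinery (machineID, purchaseDate).]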
Entering Into Relational Model
More terms
The main construct of the relational model is the relation (table).
A relational database is a collection of relations.
The degree (or arity) of a relation is the number of fields (columns) in it.
The cardinality of a relation is the number of tuples (rows) in it.
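As a small illustration of degree and cardinality, the following Python sketch uses the standard sqlite3 module on an in-memory copy of the Courses relation (streamId, title, noOfSems); the helper degree_and_cardinality is hypothetical. Degree is read from SQLite's PRAGMA table_info, which returns one row per column, and cardinality from COUNT(*).

```python
import sqlite3

def degree_and_cardinality(conn, table):
    """Return (degree, cardinality): number of columns and number of rows.

    The table name is interpolated directly, so it must come from trusted code.
    """
    degree = len(conn.execute(f"PRAGMA table_info({table})").fetchall())
    cardinality = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return degree, cardinality

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Courses (streamId INTEGER PRIMARY KEY, title TEXT, noOfSems INTEGER)")
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                 [(123, "B.Tech", 8), (124, "B.E.", 8)])
print(degree_and_cardinality(conn, "Courses"))  # (3, 2)
```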
Integrity Constraints
An integrity constraint is a condition or rule specified on a table
that restricts the kind of data that can be stored in it.
These rules help ensure that data entered in the database is
valid and consistent.
The DBMS ensures that the constraints defined in the schema are strictly
adhered to whenever and however data enters the database.
Types of Integrity Constraints
Entity integrity allows no two rows with the same identity in a table.
Primary key and unique constraints are used to enforce entity
integrity.
Referential integrity allows only consistent values for certain fields
across related tables. A foreign key is used to enforce referential
integrity.
Domain integrity allows only predefined values for an attribute.
User-defined integrity allows only what you define. Example: a NOT NULL
constraint.
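The following sketch shows a DBMS enforcing these kinds of constraints. It uses Python's sqlite3 module with SQLite syntax; the table layout, the salary column with its CHECK rule, and the helper rejected are assumptions made for illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE Department (
    deptId TEXT PRIMARY KEY,                    -- entity integrity
    name   TEXT NOT NULL                        -- NOT NULL constraint
);
CREATE TABLE Employee (
    empId  TEXT PRIMARY KEY,                    -- entity integrity
    name   TEXT NOT NULL,
    salary INTEGER CHECK (salary >= 0),         -- domain rule: no negative salary
    deptId TEXT REFERENCES Department(deptId)   -- referential integrity
);
INSERT INTO Department VALUES ('MKT', 'Marketing');
INSERT INTO Employee VALUES ('EMP101', 'Mona Lisa', 30000, 'MKT');
""")

def rejected(sql):
    """Return True if the DBMS rejects the statement with an integrity error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

print(rejected("INSERT INTO Employee VALUES ('EMP101', 'Rahim', 100, 'MKT')"))  # True: duplicate key
print(rejected("INSERT INTO Employee VALUES ('EMP102', 'Raghu', 100, 'XYZ')"))  # True: no such department
print(rejected("INSERT INTO Employee VALUES ('EMP103', 'Rama', -5, 'MKT')"))    # True: negative salary
print(rejected("INSERT INTO Employee VALUES ('EMP104', NULL, 100, 'MKT')"))     # True: null name
```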
Foreign Key
A foreign key is a column (or set of columns) that refers to the primary
key of another table. Referential integrity therefore means that a foreign
key may hold only values that match primary key values in the referenced
table.
Enforcing referential integrity
Department
DeptName   Manager (foreign key)
Marketing  EMP100
Training   EMP200

Employee
ID (primary key)  Name
EMP100            Rama
EMP101            Rahim
EMP200            Raghu
Other advantages of foreign key
Specifying a foreign key also lets us specify what action should be
taken when the referenced primary key value is updated or deleted.
The action can be one of:
NO ACTION: do not allow the update or deletion of a primary key value that
is referenced by a foreign key.
CASCADE: when the primary key value is updated or its row is deleted,
update the matching foreign key values or delete the referencing rows.
SET NULL: set the matching foreign key values to null when the
corresponding primary key value is updated or deleted.
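A minimal sketch of the three actions, using Python's sqlite3 module and the Department/Employee example above. To compare the actions side by side it creates three hypothetical copies of Department (DeptCascade, DeptSetNull, DeptNoAction) that differ only in the action on their Manager foreign key; the extra 'Sales' department is also made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce the actions
conn.executescript("""
CREATE TABLE Employee (ID TEXT PRIMARY KEY, Name TEXT);
-- Three copies of Department that differ only in the referential action.
CREATE TABLE DeptCascade (DeptName TEXT PRIMARY KEY,
    Manager TEXT REFERENCES Employee(ID) ON DELETE CASCADE ON UPDATE CASCADE);
CREATE TABLE DeptSetNull (DeptName TEXT PRIMARY KEY,
    Manager TEXT REFERENCES Employee(ID) ON DELETE SET NULL ON UPDATE SET NULL);
CREATE TABLE DeptNoAction (DeptName TEXT PRIMARY KEY,
    Manager TEXT REFERENCES Employee(ID) ON DELETE NO ACTION ON UPDATE NO ACTION);
INSERT INTO Employee VALUES ('EMP100', 'Rama'), ('EMP101', 'Rahim'), ('EMP200', 'Raghu');
INSERT INTO DeptCascade VALUES ('Marketing', 'EMP100'), ('Training', 'EMP200');
INSERT INTO DeptSetNull VALUES ('Marketing', 'EMP100'), ('Training', 'EMP200');
INSERT INTO DeptNoAction VALUES ('Sales', 'EMP101');
""")

conn.execute("UPDATE Employee SET ID = 'EMP300' WHERE ID = 'EMP200'")  # primary key changed
conn.execute("DELETE FROM Employee WHERE ID = 'EMP100'")               # primary key row deleted

cascade_rows = conn.execute("SELECT * FROM DeptCascade").fetchall()
set_null_rows = conn.execute("SELECT * FROM DeptSetNull ORDER BY DeptName").fetchall()
print(cascade_rows)   # [('Training', 'EMP300')]
print(set_null_rows)  # [('Marketing', None), ('Training', None)]

try:
    conn.execute("DELETE FROM Employee WHERE ID = 'EMP101'")
    no_action_blocked = False
except sqlite3.IntegrityError:
    no_action_blocked = True  # EMP101 is still referenced by DeptNoAction
print(no_action_blocked)  # True
```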
Null values and Keys
A column that has no value in a row holds a null value.
For attributes that must not be null, a NOT NULL
constraint can be specified.
A foreign key can be null (unless it is declared NOT NULL).
A primary key cannot be null.
A unique key can be null.
Translating E-R Diagram into Relational Model
Modeling Entity Set
Model each entity set as a table with a primary key constraint.
[Figure: entity set Courses (streamId, title, noOfSems).]
Courses
streamId (primary key)  title   noOfSems
123                     B.Tech  8
124                     B.E.    8
Modeling Relationship-set
Modeling One-to-One relationship
Modeling One-to-Many relationship
Modeling Many-to-Many relationship
Modeling relationship with attributes
One-to-One into Relational Model
The primary key of either entity set can become a foreign key of the
other entity set.
[Figure: Teacher (teacherId, name, address) 1 --- Is Taught --- 1 Subject (subId, title).]
One-to-One into Relational Model
Subject
subId  title         teacherId (foreign key)
S101   Java          EMP103
S102   C             EMP102
S103   OOPS and UML  EMP101

Teacher
teacherId  name          address
EMP101     Mona Lisa     XXX YY
EMP102     Lora Nillie   AAA BB
EMP103     Peter Patter  MMM NN
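A sketch of the Subject/Teacher mapping in SQLite through Python's sqlite3 module, using the rows above. One point goes beyond the slide: a foreign key by itself would let the same teacherId appear in several Subject rows (one-to-many), so this sketch adds a UNIQUE constraint on the foreign key to keep the mapping one-to-one. The extra subject S104 is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Teacher (teacherId TEXT PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE Subject (
    subId     TEXT PRIMARY KEY,
    title     TEXT,
    -- Foreign key to Teacher; UNIQUE stops one teacher from teaching two subjects.
    teacherId TEXT UNIQUE REFERENCES Teacher(teacherId)
);
INSERT INTO Teacher VALUES ('EMP101', 'Mona Lisa', 'XXX YY'),
                           ('EMP102', 'Lora Nillie', 'AAA BB'),
                           ('EMP103', 'Peter Patter', 'MMM NN');
INSERT INTO Subject VALUES ('S101', 'Java', 'EMP103'),
                           ('S102', 'C', 'EMP102'),
                           ('S103', 'OOPS and UML', 'EMP101');
""")

# Follow the foreign key to find who teaches each subject.
rows = conn.execute("""
    SELECT s.title, t.name
    FROM Subject s JOIN Teacher t ON s.teacherId = t.teacherId
    ORDER BY s.subId
""").fetchall()
print(rows)  # [('Java', 'Peter Patter'), ('C', 'Lora Nillie'), ('OOPS and UML', 'Mona Lisa')]

try:
    conn.execute("INSERT INTO Subject VALUES ('S104', 'DBMS', 'EMP101')")  # hypothetical row
    second_subject_rejected = False
except sqlite3.IntegrityError:
    second_subject_rejected = True  # EMP101 already teaches S103
print(second_subject_rejected)  # True
```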
One-to-Many into Relational Model
The primary key of the entity set on the one side becomes a foreign
key of the entity set on the many side.
[Figure: Courses (streamId, title, noOfSems) 1 --- Has --- N Students (regNo, name, address).]
One-to-Many into Relational Model
Courses
streamId  title   noOfSems
123       B.Tech  8
124       B.E.    8

Students
regNo  name        address  streamId (foreign key)
ST100  Bill Smith  Xxx yyy  123
ST200  Rani Raja   Mmm nnn  123
ST202  Neeta Roy   Aaa bbb  124
Many-to-Many into Relational Model
A new relation is created for the relationship between the two entity sets.
This relation contains the primary keys of both entity sets as
attributes, and the combination of these two attributes becomes the
primary key of the relation.
[Figure: Customer (custId, name, address) M --- owns --- N Bank Account (acctNo, balance).]
Customer
custId  name        address
C100    Bill Smith  Xxx yyy
C200    Rani Raja   Mmm nnn
C202    Neeta Roy   Aaa bbb

BankAccount
acctNo   balance
4343434  10000
4343435  20000
4343436  25000

Owns
custId  acctNo
C100    4343434
C200    4343435
C202    4343436
C202    4343434
(custId and acctNo together form the primary key of Owns.)
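The Owns table can be tried out with the sample rows above. This is a sketch in SQLite through Python's sqlite3 module; the queries are illustrative. The composite primary key lets C202 own two accounts and account 4343434 have two owners, while rejecting a duplicate (custId, acctNo) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customer (custId TEXT PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE BankAccount (acctNo INTEGER PRIMARY KEY, balance INTEGER);
-- Relationship table: both keys together form the primary key,
-- and each one is a foreign key to its entity table.
CREATE TABLE Owns (
    custId TEXT    REFERENCES Customer(custId),
    acctNo INTEGER REFERENCES BankAccount(acctNo),
    PRIMARY KEY (custId, acctNo)
);
INSERT INTO Customer VALUES ('C100', 'Bill Smith', 'Xxx yyy'),
                            ('C200', 'Rani Raja', 'Mmm nnn'),
                            ('C202', 'Neeta Roy', 'Aaa bbb');
INSERT INTO BankAccount VALUES (4343434, 10000), (4343435, 20000), (4343436, 25000);
INSERT INTO Owns VALUES ('C100', 4343434), ('C200', 4343435),
                        ('C202', 4343436), ('C202', 4343434);
""")

# Account 4343434 is jointly owned ...
owners = conn.execute("""
    SELECT c.name FROM Owns o JOIN Customer c ON o.custId = c.custId
    WHERE o.acctNo = 4343434 ORDER BY c.name
""").fetchall()
print(owners)  # [('Bill Smith',), ('Neeta Roy',)]

# ... and customer C202 owns two accounts.
per_customer = conn.execute(
    "SELECT custId, COUNT(*) FROM Owns GROUP BY custId ORDER BY custId").fetchall()
print(per_customer)  # [('C100', 1), ('C200', 1), ('C202', 2)]

try:
    conn.execute("INSERT INTO Owns VALUES ('C100', 4343434)")  # same pair again
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```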
Relationships with attributes
Modeling a relationship set with attributes:
Like an entity set, the relationship set is also translated into a table.
The table contains the primary key attributes of each participating entity
set as foreign key fields, along with the relationship's own attributes.
[Figure: relationship with attributes. Employees (empId, name), Departments (deptId, name) and Location (locId, address, capacity) are related through Works-In, which has the attribute dateOfJoining.]
Works-In
empId   deptId  locId  dateOfJoining
EMP101  MKT     BLR1   7-APR-2006
EMP102  TRA     BLR2   21-JUN-2006
(empId, deptId and locId together form the primary key; each is also a foreign key.)

Employees
empId   name
EMP101  Mona Lisa
EMP102  Lora Nillie
EMP103  Peter Patter

Departments
deptId  name
MKT     Marketing
TRA     Training
DEV     Development

Location
locId  address       capacity
BLR1   Banashankari  100
BLR2   MG Road       50
Modeling Key Constraints
A key constraint is modeled as a UNIQUE constraint.
[Figure: Courses (streamId, title, noOfSems) 1 --- Has (enrollment-date) --- N Students (regNo, name, address), with a directional arrow from Students to Has.]
Courses
streamId  title   noOfSems
123       B.Tech  8
124       B.E.    8

Students
regNo  name        address
ST100  Bill Smith  Xxx yyy
ST200  Rani Raja   Mmm nnn
ST202  Neeta Roy   Aaa bbb

Has  UNIQUE(regNo)
regNo  streamId  enrollment-date
ST100  123       8
ST200  124       8
A second row for ST100 in Has is rejected by UNIQUE(regNo).
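A sketch of the Has table with UNIQUE(regNo), in SQLite through Python's sqlite3 module. The column enrollment_date is written with an underscore because a bare hyphen is not valid in an SQL identifier, and the dates are left empty because the slide does not give usable values; the remaining rows come from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Courses  (streamId INTEGER PRIMARY KEY, title TEXT, noOfSems INTEGER);
CREATE TABLE Students (regNo TEXT PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE Has (
    regNo           TEXT    NOT NULL REFERENCES Students(regNo),
    streamId        INTEGER NOT NULL REFERENCES Courses(streamId),
    enrollment_date TEXT,
    UNIQUE (regNo)  -- key constraint: a student appears at most once in Has
);
INSERT INTO Courses  VALUES (123, 'B.Tech', 8), (124, 'B.E.', 8);
INSERT INTO Students VALUES ('ST100', 'Bill Smith', 'Xxx yyy'),
                            ('ST200', 'Rani Raja', 'Mmm nnn'),
                            ('ST202', 'Neeta Roy', 'Aaa bbb');
INSERT INTO Has (regNo, streamId) VALUES ('ST100', 123), ('ST200', 124);
""")

try:
    conn.execute("INSERT INTO Has (regNo, streamId) VALUES ('ST100', 124)")
    second_course_allowed = True
except sqlite3.IntegrityError:
    second_course_allowed = False  # UNIQUE(regNo) rejects a second course for ST100
print(second_course_allowed)  # False
```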
Modeling Participation Constraint
A total participation constraint is modeled as a foreign key with
CASCADE or NO ACTION specified on deletion or update.
A partial participation constraint is modeled as a foreign key with SET
NULL on deletion or update.
[Figure: the Works-In relationship among Employees (empId, name), Departments (deptId, name) and Location (locId, address, capacity), with the attribute dateOfJoining.]
Works-In
empId   deptId  locId  dateOfJoining
EMP101  MKT     BLR1   7-APR-2006
EMP102  TRA     BLR2   21-JUN-2006

Employees
empId   name
EMP101  Mona Lisa
EMP102  Lora Nillie
EMP103  Peter Patter

Departments
deptId  name
MKT     Marketing
TRA     Corporate Training
DEV     Development

Location
locId  address       capacity
BLR1   Banashankari  100
BLR2   MG Road       50

The foreign keys in Works-In are declared with CASCADE on delete or update: if deptId TRA is changed to COR in Departments, the deptId of EMP102 in Works-In changes to COR as well.
Modeling Weak Entity
A weak entity is modeled as a table with a foreign key to its owner entity,
declared with CASCADE on deletion or update, and with a NOT NULL
constraint if the foreign key attribute is not part of the key.
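A minimal SQLite sketch (via Python's sqlite3 module) of the Orders/Order-Items example. The table name OrderItems and the sample orders, dates and quantities are hypothetical. orderNo is part of the primary key of OrderItems and is also declared NOT NULL explicitly, because SQLite, unlike standard SQL, does not always imply NOT NULL for primary key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Orders (orderNo INTEGER PRIMARY KEY, orderDate TEXT, custId TEXT);
-- Weak entity: itemNo is only a partial key, so the primary key borrows
-- orderNo from the owner; the foreign key cascades deletes and updates.
CREATE TABLE OrderItems (
    orderNo    INTEGER NOT NULL
               REFERENCES Orders(orderNo) ON DELETE CASCADE ON UPDATE CASCADE,
    itemNo     INTEGER NOT NULL,
    noOfPieces INTEGER,
    PRIMARY KEY (orderNo, itemNo)
);
INSERT INTO Orders VALUES (1, '2006-04-07', 'C100'), (2, '2006-06-21', 'C200');
-- itemNo 1 occurs in both orders: it is unique only within its order.
INSERT INTO OrderItems VALUES (1, 1, 3), (1, 2, 1), (2, 1, 5);
""")

conn.execute("DELETE FROM Orders WHERE orderNo = 1")  # owner removed ...
remaining = conn.execute("SELECT * FROM OrderItems").fetchall()
print(remaining)  # [(2, 1, 5)]  ... and its weak entities go with it
```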
Modeling Hierarchy
Each entity set in the hierarchy is modeled as a distinct relation.
The primary key of the child entity set is the same as the primary key of
the parent entity set.
The primary key of the child entity set is also a foreign key referencing
its parent entity set.
The foreign key must be declared with cascading delete and update.
[Figure: Employee (empId, name) IS-A hierarchy with the subclasses Programmer (Current-Project, Main-Role) and Marketing (area, contracts).]
Employee
EmpId  Name
EMP1   K.Narayanan
EMP2   M.Joshna

Programmer
EmpId (foreign key)  Current-Project  Main-Role
EMP1                 ABC              Designing
EMP2                 XYZ              Coding

Marketing
EmpId (foreign key)  contracts  area
EMP1                 XXX        Training
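A sketch of the hierarchy tables in SQLite via Python's sqlite3 module, using the rows above. The hyphenated columns Current-Project and Main-Role are written as CurrentProject and MainRole; that renaming is an assumption of this sketch. Changing EMP1's key in Employee shows the cascading update reaching both child tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (EmpId TEXT PRIMARY KEY, Name TEXT);
-- Each subclass reuses the parent's key, which is also a cascading foreign key.
CREATE TABLE Programmer (
    EmpId TEXT PRIMARY KEY
          REFERENCES Employee(EmpId) ON DELETE CASCADE ON UPDATE CASCADE,
    CurrentProject TEXT,
    MainRole       TEXT
);
CREATE TABLE Marketing (
    EmpId TEXT PRIMARY KEY
          REFERENCES Employee(EmpId) ON DELETE CASCADE ON UPDATE CASCADE,
    contracts TEXT,
    area      TEXT
);
INSERT INTO Employee   VALUES ('EMP1', 'K.Narayanan'), ('EMP2', 'M.Joshna');
INSERT INTO Programmer VALUES ('EMP1', 'ABC', 'Designing'), ('EMP2', 'XYZ', 'Coding');
INSERT INTO Marketing  VALUES ('EMP1', 'XXX', 'Training');
""")

# Reassemble the parent and child attributes by joining on the shared key.
joined = conn.execute("""
    SELECT e.EmpId, e.Name, p.CurrentProject, p.MainRole
    FROM Employee e JOIN Programmer p ON e.EmpId = p.EmpId
    ORDER BY e.EmpId
""").fetchall()
print(joined)
# [('EMP1', 'K.Narayanan', 'ABC', 'Designing'), ('EMP2', 'M.Joshna', 'XYZ', 'Coding')]

conn.execute("UPDATE Employee SET EmpId = 'EMP9' WHERE EmpId = 'EMP1'")
print(conn.execute("SELECT EmpId FROM Marketing").fetchall())  # [('EMP9',)]
```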
Modeling Aggregation
The table for the relationship that involves the aggregation has as its
primary key the combination of all the keys of the entity sets and
relationships it is linked with.
These key attributes are also foreign keys.
[Figure: the aggregation of Employee (empId, name, joiningDate) works Project (pid, startDate) participates in uses with Machinery (machineID, purchaseDate).]
Uses(empId, pid, machineID, hours)
Primary key: (empId, pid, machineID); these key attributes are also foreign keys.
Question for you
How will you model a unary relationship in the relational model?
[Figure: Customer (custId, name, address) related to itself through referred by.]
Modeling composite attribute
Each component attribute of the composite attribute is modeled as a column
of the table.
The composite attribute itself does not appear as a column in the table. Its
value is obtained by concatenating the values of its component attributes,
and it appears as a column in a view.
[Figure: Employees (empId) with the composite attribute address, made up of house number, street number, colony or area, city or town, district, state and country.]
Employee
EmpId  house no  street number  area or town  dist  state      country
EMP1   12        8th main       MG rd.        BLR   Karnataka  India
EMP2   50        2nd main       MG rd.        BLR   Karnataka  India

EmployeeAddressView
EmpId  Address
EMP1   12, 8th main, MG rd., BLR, Karnataka, India
EMP2   50, 2nd main, MG rd., BLR, Karnataka, India
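A sketch of the Employee table and EmployeeAddressView in SQLite via Python's sqlite3 module. The column names are adapted to valid identifiers (house_no, street_number, area_or_town), which is an assumption of this example; SQLite's || operator performs the concatenation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    EmpId TEXT PRIMARY KEY, house_no TEXT, street_number TEXT,
    area_or_town TEXT, dist TEXT, state TEXT, country TEXT
);
INSERT INTO Employee VALUES
    ('EMP1', '12', '8th main', 'MG rd.', 'BLR', 'Karnataka', 'India'),
    ('EMP2', '50', '2nd main', 'MG rd.', 'BLR', 'Karnataka', 'India');
-- The composite attribute exists only in the view, as a concatenation.
CREATE VIEW EmployeeAddressView AS
SELECT EmpId,
       house_no || ', ' || street_number || ', ' || area_or_town || ', ' ||
       dist || ', ' || state || ', ' || country AS Address
FROM Employee;
""")
rows = conn.execute("SELECT * FROM EmployeeAddressView ORDER BY EmpId").fetchall()
print(rows[0])  # ('EMP1', '12, 8th main, MG rd., BLR, Karnataka, India')
```

One design caveat: in SQLite, as in standard SQL, || returns NULL if any operand is NULL, so a missing component would blank the whole address unless it is wrapped in COALESCE.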
Modeling Multi-Valued Attribute
A separate table is created for a multi-valued attribute. It contains the
primary key of the entity, the multi-valued attribute and any attributes
that belong with it.
[Figure: Employees (empId, name) with the multi-valued attribute phone_numbers.]
Employee
EmpId  Name
EMP1   Nelson Mac
EMP2   Kate Dell

PhoneNumbers (table for the multi-valued attribute)
EmpId  Phone_Numbers
EMP1   9831610763
EMP1   54783990
EMP2   9831688789
EMP2   5575390
Modeling Derived Attribute
Like a composite attribute, a derived attribute is also implemented
as a view.
[Figure: Employee (empId, gross_salary, deductions) with the derived attribute net_salary.]
Employee
EmpId  Gross_Salary  Deduction
EMP1   20000         2000
EMP2   15000         1000

EmployeeNetSalaryView
EmpId  Net_Salary
EMP1   18000
EMP2   14000
(Net_Salary is a derived attribute: Gross_Salary - Deduction.)
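A sketch of EmployeeNetSalaryView in SQLite via Python's sqlite3 module, using the two rows above. It also shows the multi-row kind of derived value described in the attribute section, the total gross salary, computed with SUM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EmpId TEXT PRIMARY KEY, Gross_Salary INTEGER, Deduction INTEGER);
INSERT INTO Employee VALUES ('EMP1', 20000, 2000), ('EMP2', 15000, 1000);
-- Row-level derived attribute: computed from other columns of the same row.
CREATE VIEW EmployeeNetSalaryView AS
SELECT EmpId, Gross_Salary - Deduction AS Net_Salary FROM Employee;
""")
net = conn.execute("SELECT * FROM EmployeeNetSalaryView ORDER BY EmpId").fetchall()
print(net)  # [('EMP1', 18000), ('EMP2', 14000)]

# A derived value can also span rows, e.g. the total gross salary of all employees.
total_gross = conn.execute("SELECT SUM(Gross_Salary) FROM Employee").fetchone()[0]
print(total_gross)  # 35000
```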
Introduction to Database Design
Entity-Relationship Model
A logical design method which emphasizes
simplicity and readability.
Basic objects of the model are:
Entities
Relationships
Attributes
Entities
Data objects detailed by the information in the
database.
Denoted by rectangles in the model.
[Figure: the entities Employee and Department drawn as rectangles.]
Attributes
Characteristics of entities or relationships.
Denoted by ellipses in the model.
[Figure: Employee with the attributes Name and SSN, and Department with the attributes Name and Budget, drawn as ellipses.]
Relationships
Represent associations between entities.
Denoted by diamonds in the model.
[Figure: Employee (Name, SSN) works in Department (Name, Budget); the relationship works in has the attribute Start date.]
Relationship Connectivity
Constraints on the mapping of the associated
entities in the relationship.
Denoted by variables between the related entities.
Generally, values for connectivity are expressed as one or
many.
[Figure: Employee (Name, SSN) and Department (Name, Budget) related through work (Start date), with connectivity 1 and N.]
Connectivity
one-to-one: Department 1 --- has --- 1 Manager
one-to-many: Department and Project related through has, with connectivity N and 1
many-to-many: Employee N --- works on --- M Project
ER example
A volleyball coach needs to collect information
about his team.
The coach requires information on:
Players
Player statistics
Games
Sales
Team Entities & Attributes
Players - statistics, name, start date, end date
Games - date, opponent, result
Sales - date, tickets, merchandise
[Figure: the entities Players (Name, Statistics, Start date, End date), Games (opponent, date, result) and Sales (tickets, merchandise).]
Team Relationships
Identify the relationships.
The player statistics are recorded at each game,
so the player and game entities are related.
Each game has multiple players, so the relationship is
one-to-many.
[Figure: Players N --- play --- 1 Games.]
Team Relationships
Identify the relationships.
Sales are generated at each game, so the
sales and games entities are related.
There is only one set of sales numbers for each game, so the relationship is one-to-one.
[Figure: Games 1 --- generates --- 1 Sales.]
Team ER Diagram
[Figure: Players (Name, Start date, End date, Statistics) N --- play --- 1 Games (opponent, date, result) 1 --- generates --- 1 Sales (tickets, merchandise).]
Logical Design to Physical Design
Creating relational SQL schemas from entity-relationship models:
Transform each entity into a table with its key and its
attributes.
Transform each relationship into either a relationship
table (many-to-many) or a foreign key (one-to-one
and one-to-many).
Entity tables
Transform each entity into a table with a key and
its attributes.
[Figure: Employee with the attributes Name and SSN.]
create table employee
(emp_no number,
name varchar2(256),
ssn number,
primary key (emp_no));
Foreign Keys
Transform each one-to-one or one-to-many relationship
into a foreign key.
A foreign key is a reference in the child (many) table to the primary
key of the parent (one) table.
[Figure: Department 1 --- has --- N Employee.]
create table department
(dept_no number,
name varchar2(50),
primary key (dept_no));
-- department is created first, because employee's foreign key refers to it
create table employee
(emp_no number,
dept_no number,
name varchar2(256),
ssn number,
primary key (emp_no),
foreign key (dept_no) references department);
Foreign Key
Department
dept_no  Name
1        Accounting
2        Human Resources
3        IT

Employee
emp_no  dept_no (foreign key)  Name
1       2                      Nora Edwards
2       3                      Ajay Patel
3       2                      Ben Smith
4       1                      Brian Burnett
5       3                      John O'Leary
6       3                      Julia Lenin

Accounting has 1 employee:
Brian Burnett
Human Resources has 2 employees:
Nora Edwards
Ben Smith
IT has 3 employees:
Ajay Patel
John O'Leary
Julia Lenin
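The per-department lists above can be produced by a query that follows the foreign key. This sketch loads the same two tables into SQLite through Python's sqlite3 module (SQLite types instead of Oracle's number and varchar2); the LEFT JOIN would keep a department with no employees, reporting a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (dept_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (
    emp_no  INTEGER PRIMARY KEY,
    dept_no INTEGER REFERENCES department(dept_no),
    name    TEXT
);
INSERT INTO department VALUES (1, 'Accounting'), (2, 'Human Resources'), (3, 'IT');
INSERT INTO employee VALUES (1, 2, 'Nora Edwards'), (2, 3, 'Ajay Patel'),
    (3, 2, 'Ben Smith'), (4, 1, 'Brian Burnett'),
    (5, 3, 'John O''Leary'), (6, 3, 'Julia Lenin');
""")

# Count employees per department by following employee.dept_no to department.
counts = conn.execute("""
    SELECT d.name, COUNT(e.emp_no)
    FROM department d LEFT JOIN employee e ON e.dept_no = d.dept_no
    GROUP BY d.dept_no, d.name
    ORDER BY d.dept_no
""").fetchall()
print(counts)  # [('Accounting', 1), ('Human Resources', 2), ('IT', 3)]

it_staff = [name for (name,) in conn.execute(
    "SELECT name FROM employee WHERE dept_no = 3 ORDER BY emp_no")]
print(it_staff)  # ['Ajay Patel', "John O'Leary", 'Julia Lenin']
```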
Many-to-Many tables
Transform each many-to-many relationship into a table.
The relationship table will contain the foreign keys to the related
entities as well as any relationship attributes.
create table proj_has_emp
(proj_no number,
emp_no number,
start_date date,
primary key (proj_no, emp_no),
foreign key (proj_no) references project,
foreign key (emp_no) references employee);
[Figure: Employee N --- has --- M Project, with the relationship attribute Start date.]
Many-to-Many tables
Employee
emp_no  dept_no  Name
1       2        Nora Edwards
2       3        Ajay Patel
3       2        Ben Smith
4       1        Brian Burnett
5       3        John O'Leary
6       3        Julia Lenin

Project
proj_no  Name
1        Employee Audit
2        Budget
3        Intranet

proj_has_emp
proj_no  emp_no  start_date
1        4       4/7/03
3        6       8/12/02
3        5       3/4/01
2        6       11/11/02
3        2       12/2/03
2        1       7/21/04

Employee Audit has 1 employee:
Brian Burnett
Budget has 2 employees:
Julia Lenin
Nora Edwards
Intranet has 3 employees:
Julia Lenin
John O'Leary
Ajay Patel
Normalization
A logical design method which minimizes data
redundancy and reduces design flaws.
Consists of applying various normal forms to
the database design.
The normal forms break down large tables into
smaller subsets.
First Normal Form (1NF)
Each attribute must be atomic
No repeating columns within a row.
No multi-valued columns.
1NF simplifies attributes
Queries become easier.
1NF
Employee (unnormalized)
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C, Perl, Java
2 Barbara Jones 224 IT Linux, Mac
3 Jake Rivera 201 R&D DB2, Oracle, Java
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C
1 Kevin Jacobs 201 R&D Perl
1 Kevin Jacobs 201 R&D Java
2 Barbara Jones 224 IT Linux
2 Barbara Jones 224 IT Mac
3 Jake Rivera 201 R&D DB2
3 Jake Rivera 201 R&D Oracle
3 Jake Rivera 201 R&D Java
Employee (1NF)
Second Normal Form (2NF)
Each non-key attribute must be fully functionally dependent on
the primary key (not on just part of it).
Functional dependence - the property of one or more
attributes that uniquely determines the value of other
attributes.
Any non-dependent attributes are moved into a
smaller (subset) table.
2NF improves data integrity.
Reduces update, insert, and delete anomalies.
Functional Dependence
Name, dept_no, and dept_name are functionally dependent on
emp_no. (emp_no -> name, dept_no, dept_name)
Skills is not functionally dependent on emp_no since it is not unique
to each emp_no.
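The definition of functional dependence can be tested against sample rows. The hypothetical helper holds_fd below, a minimal Python sketch, reports whether lhs -> rhs holds in a list of rows, that is, whether no two rows agree on the lhs columns while differing on the rhs columns. It is applied to the Employee (1NF) rows shown here.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows: list of dicts; lhs, rhs: tuples of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # Same lhs values mapped to different rhs values: the FD is violated.
        if seen.setdefault(key, value) != value:
            return False
    return True

cols = ("emp_no", "name", "dept_no", "dept_name", "skills")
data = [
    (1, "Kevin Jacobs", 201, "R&D", "C"),
    (1, "Kevin Jacobs", 201, "R&D", "Perl"),
    (1, "Kevin Jacobs", 201, "R&D", "Java"),
    (2, "Barbara Jones", 224, "IT", "Linux"),
    (2, "Barbara Jones", 224, "IT", "Mac"),
    (3, "Jake Rivera", 201, "R&D", "DB2"),
    (3, "Jake Rivera", 201, "R&D", "Oracle"),
    (3, "Jake Rivera", 201, "R&D", "Java"),
]
employee_1nf = [dict(zip(cols, r)) for r in data]

print(holds_fd(employee_1nf, ("emp_no",), ("name", "dept_no", "dept_name")))  # True
print(holds_fd(employee_1nf, ("emp_no",), ("skills",)))                        # False
```

A check like this can only refute a dependency: a functional dependency is a rule about all valid data, so passing on a sample is evidence, not proof.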
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C
1 Kevin Jacobs 201 R&D Perl
1 Kevin Jacobs 201 R&D Java
2 Barbara Jones 224 IT Linux
2 Barbara Jones 224 IT Mac
3 Jake Rivera 201 R&D DB2
3 Jake Rivera 201 R&D Oracle
3 Jake Rivera 201 R&D Java
Employee (1NF)
2NF
emp_no name dept_no dept_name
1 Kevin Jacobs 201 R&D
2 Barbara Jones 224 IT
3 Jake Rivera 201 R&D
Employee (2NF)
emp_no skills
1 C
1 Perl
1 Java
2 Linux
2 Mac
3 DB2
3 Oracle
3 Java
Skills (2NF)
Data Integrity
Insert anomaly - adding null values. E.g., a new department with no employees
cannot be recorded without a null emp_no, even though emp_no identifies the row.
Update anomaly - a single name change requires updating multiple rows, which
degrades performance and risks inconsistency if a row is missed. E.g., changing the IT dept_name to IS.
Delete anomaly - deleting wanted information. E.g., deleting the IT department
removes employee Barbara Jones from the database.
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C
1 Kevin Jacobs 201 R&D Perl
1 Kevin Jacobs 201 R&D Java
2 Barbara Jones 224 IT Linux
2 Barbara Jones 224 IT Mac
3 Jake Rivera 201 R&D DB2
3 Jake Rivera 201 R&D Oracle
3 Jake Rivera 201 R&D Java
Employee (1NF)
Third Normal Form (3NF)
Remove transitive dependencies.
Transitive dependence - a non-key attribute depends on another
non-key attribute rather than directly on the key, which usually
means two separate entities exist within one table.
Any transitive dependencies are moved into a smaller
(subset) table.
3NF further improves data integrity.
Reduces update, insert, and delete anomalies.
Transitive Dependence
dept_no and dept_name are functionally dependent on
emp_no; however, dept_name depends on emp_no only through dept_no
(emp_no -> dept_no -> dept_name), so the department can be considered a
separate entity.
emp_no name dept_no dept_name
1 Kevin Jacobs 201 R&D
2 Barbara Jones 224 IT
3 Jake Rivera 201 R&D
Employee (2NF)
3NF
emp_no name dept_no
1 Kevin Jacobs 201
2 Barbara Jones 224
3 Jake Rivera 201
Employee (3NF)
dept_no dept_name
201 R&D
224 IT
Department (3NF)
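Moving the transitive dependency into its own table should lose no information. This Python sketch, using the Employee (2NF) rows above, projects them onto Employee (3NF) and Department (3NF) and then joins them back on dept_no; the helper project and the variable names are made up for illustration.

```python
employee_2nf = [
    {"emp_no": 1, "name": "Kevin Jacobs", "dept_no": 201, "dept_name": "R&D"},
    {"emp_no": 2, "name": "Barbara Jones", "dept_no": 224, "dept_name": "IT"},
    {"emp_no": 3, "name": "Jake Rivera", "dept_no": 201, "dept_name": "R&D"},
]

def project(rows, cols):
    """Projection: keep only the given columns and drop duplicate rows."""
    unique = {tuple(row[c] for c in cols) for row in rows}
    return [dict(zip(cols, t)) for t in sorted(unique)]

employee_3nf = project(employee_2nf, ("emp_no", "name", "dept_no"))
department_3nf = project(employee_2nf, ("dept_no", "dept_name"))
print(department_3nf)  # R&D is now stored once instead of twice
# [{'dept_no': 201, 'dept_name': 'R&D'}, {'dept_no': 224, 'dept_name': 'IT'}]

# Joining on dept_no (the key of Department) rebuilds the 2NF table exactly.
dept_by_no = {d["dept_no"]: d["dept_name"] for d in department_3nf}
rejoined = [dict(e, dept_name=dept_by_no[e["dept_no"]]) for e in employee_3nf]
print(rejoined == employee_2nf)  # True
```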
Other Normal Forms
Boyce-Codd Normal Form (BCNF)
Strengthens 3NF by requiring the determinant (left-hand side) of every
non-trivial functional dependency to be a superkey (a column or
columns that uniquely identify a row).
Fourth Normal Form (4NF)
Eliminate non-trivial multivalued dependencies that are not implied by a superkey.
Fifth Normal Form (5NF)
Eliminate join dependencies not implied by the candidate keys.
Normalizing our team (1NF)
games
game_id  date     opponent  result
34       6/3/05   Chicago   W
35       6/8/05   Seattle   W
40       6/15/05  Phoenix   L
42       6/20/05  LA        W

sales
sales_id  game_id  merch  tickets
120       34       5000   25000
122       35       4500   30000
125       40       2500   15000
126       42       6500   40000

players
player_id  game_id  name          start_date  end_date  aces  blocks  spikes  digs
45         34       Mike Speedy   1/1/00      -         12    3       20      5
45         35       Mike Speedy   1/1/00      -         10    2       15      4
45         40       Mike Speedy   1/1/00      -         7     2       10      3
78         42       Frank Newmon  5/1/05      -         -     -       -       -
102        34       Joe Powers    1/1/02      7/1/05    8     6       18      10
102        35       Joe Powers    1/1/02      7/1/05    10    8       24      12
103        42       Tony Tough    1/1/05      -         15    10      20      14
(- marks an empty value.)
Normalizing our team (2NF & 3NF)
players
player_id  name          start_date  end_date
45         Mike Speedy   1/1/00      -
78         Frank Newmon  5/1/05      -
102        Joe Powers    1/1/02      7/1/05
103        Tony Tough    1/1/05      -
(- marks an empty value.)

games
game_id  date     opponent  result
34       6/3/05   Chicago   W
35       6/8/05   Seattle   W
40       6/15/05  Phoenix   L
42       6/20/05  LA        W

sales
sales_id  game_id  merch  tickets
120       34       5000   25000
122       35       4500   30000
125       40       2500   15000
126       42       6500   40000

player_stats
player_id  game_id  aces  blocks  spikes  digs
45         34       12    3       20      5
45         35       10    2       15      4
45         40       7     2       10      3
102        34       8     6       18      10
102        35       10    8       24      12
103        42       15    10      20      14
Revisit team ER diagram
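[Figure: games (opponent, date, result) 1 --- generates --- 1 sales (tickets, merchandise); player_stats (aces, blocks, spikes, digs) is linked to players (Name, Start date, End date) and to games through the relationships tracked and Recorded by, each with connectivity 1 and N.]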
Star Schemas
Designed for data retrieval.
Best for use in decision support tasks such as Data
Warehouses and Data Marts.
Denormalized - allows faster querying because fewer
joins are needed.
Slow performance for insert, delete, and update
transactions.
Composed of two types of tables: facts and dimensions.
Fact Table
The main table in a star schema is the Fact table.
Contains groupings of measures of an event to be
analyzed.
Measure - numeric data
Invoice Facts: units sold, unit amount, total sale price
Dimension Table
Dimension tables are groupings of descriptors
and measures of the fact.
descriptor - non-numeric data
Customer Dimension: cust_dim_key, name, address, phone
Time Dimension: time_dim_key, invoice date, due date, delivered date
Location Dimension: loc_dim_key, store number, store address, store phone
Product Dimension: prod_dim_key, product, price, cost
Star Schema
The fact table forms a one-to-many relationship with each
dimension table: one dimension row relates to many fact rows.
[Figure: Invoice Facts (cust_dim_key, loc_dim_key, time_dim_key, prod_dim_key, units sold, unit amount, total sale price) at the center, linked N : 1 to Customer Dimension, Location Dimension, Time Dimension and Product Dimension.]
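A typical star-schema query joins the fact table to one or more dimensions and aggregates a measure. This sketch uses SQLite via Python's sqlite3 module with only two of the four dimensions; all rows (customers, products, sales figures) are invented for illustration, and column names use underscores instead of spaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (cust_dim_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_dim  (prod_dim_key INTEGER PRIMARY KEY, product TEXT);
-- Fact table: keys pointing to the dimensions plus numeric measures.
-- (Location and time dimensions are left out to keep the sketch short.)
CREATE TABLE invoice_facts (
    cust_dim_key     INTEGER REFERENCES customer_dim(cust_dim_key),
    prod_dim_key     INTEGER REFERENCES product_dim(prod_dim_key),
    units_sold       INTEGER,
    unit_amount      REAL,
    total_sale_price REAL
);
INSERT INTO customer_dim VALUES (1, 'Customer A'), (2, 'Customer B');
INSERT INTO product_dim  VALUES (10, 'Widget'), (20, 'Gadget');
INSERT INTO invoice_facts VALUES
    (1, 10, 3, 2.0, 6.0), (1, 20, 1, 5.0, 5.0), (2, 10, 4, 2.0, 8.0);
""")

# Join the fact table to the product dimension and aggregate the measures.
sales_by_product = conn.execute("""
    SELECT p.product, SUM(f.units_sold), SUM(f.total_sale_price)
    FROM invoice_facts f JOIN product_dim p ON f.prod_dim_key = p.prod_dim_key
    GROUP BY p.product
    ORDER BY p.product
""").fetchall()
print(sales_by_product)  # [('Gadget', 1, 5.0), ('Widget', 7, 14.0)]
```

Only one join per dimension is needed, which is the "fewer joins" advantage described above.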
Analyzing the team
The coach needs to analyze how the team
generates income.
From this, we will use the sales table to create our fact
table.
Team Facts: date, merchandise, tickets
Team Dimension
We have 2 dimensions for the schema:
player and games.
Player Dimension: player_dim_key, name, start_date, end_date, aces, blocks, spikes, digs
Game Dimension: game_dim_key, opponent, result
Team Star Schema
[Figure: Team Facts (player_dim_key, game_dim_key, date, merchandise, tickets) linked N : 1 to Player Dimension (player_dim_key, name, start_date, end_date, aces, blocks, spikes, digs) and N : 1 to Game Dimension (game_dim_key, opponent, result).]