Info Sys 222 Notes
Example (Amazon.com)
- Tries to predict other items a customer may want to purchase, based on
what's in their shopping cart and on the historical purchasing behaviour of
other customers, in order to influence buyer behaviour
Infosys 222
Why Data Matters?
- https://www.youtube.com/watch?v=f2Kji24833Y
Essential features of Information
- Timely
- Accurate
- Complete
Exercise
- Example of using Data/information for decision making in a specific Industry
Metadata
Most general-purpose DBMSs are based on the relational data model. This
means that all data is stored in a number of tables (with named columns).
For historical, mathematical reasons such tables are referred to as
relations.
The tables show data together with relationships between the data.
This enables users to view data logically as a two-dimensional structure
composed of rows and columns.
This course is solely about relational databases and relational DBMSs.
Relational Data Model
- A precise, conceptual way of describing the data stored in a relational
database
o Structure of the data
o Operations on the data
o Constraints on the data
Relation (Table)
- Stores data on individual things, which are considered important
o People (Employee, Student, Staff Member)
o Objects (Book, Product, Lecture Room)
o Concepts / Actions (Transaction in an ATM, Borrowing a book from the
library)
Structure of the data: Relation (Table)
- A relation consists of rows and columns
- The column header describes the data
- The number of columns is fixed (a definite number)
- The number of rows is not fixed (an indefinite number)
- Each intersection between a row and column (cell) contains a single item
of data
- Each row will describe a single instance of the data
Example Relation
- Book
o Each row should describe a single book using the column headers
o This makes each row a record providing information about a single book
- Student
o (possible column headers for storing a student's identity information)
Exercise
Tuple (Row)
- A relation consists of tuples (Rows)
- A tuple is an ordered list of values
- Tuples are usually written in parentheses, with commas separating the
values (or components)
o Example: Employee Relation (7369, SMITH, M, Technician)
- Order is significant
o Example: the tuple (7369, Technician, SMITH, M) is different from the
tuple above
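The point about order can be tried out with Python's built-in tuple type, which is also an ordered list of values. This is only a small illustration using the two Employee tuples from the notes; the variable names are made up for the example.

```python
# The two Employee tuples from the notes, written as Python tuples.
employee = (7369, "SMITH", "M", "Technician")
reordered = (7369, "Technician", "SMITH", "M")

# Both tuples contain exactly the same components...
same_values = sorted(map(str, employee)) == sorted(map(str, reordered))
print(same_values)            # True

# ...but because order is significant, they are different tuples.
print(employee == reordered)  # False
```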
Attribute (Column/Field)
- In order to be able to refer to the different components in a tuple, we will
assign them names (called attributes)
o Example: For the tuple (7369, SMITH, Male, Technician), we might
choose the attributes ID, Name, Gender, and JobDescription
Data Type
- The value of an attribute belongs to a domain; also known as a data type
of an attribute
- All attributes must have a data type, but the data types available depend on
the particular DBMS
- Commonly available data type among different implementations
o TEXT for text strings
o INTEGER for integers
o REAL for real numbers
o DATE for dates
Schema
- In the relational data model, a relation is often described using a schema
which consists of
o The name of the relation
o The set of its attributes (sometimes with data types)
- Example: The relation Employee can be described by the schema
o Employee (ID, Name, Gender, JobDescription)
o Employee(ID INTEGER, Name TEXT, Gender TEXT, JobDescription TEXT)
- The schemas of all relations in a database form a database schema
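As a rough sketch of how such a schema becomes a real table, the Employee schema above can be declared in SQLite through Python's standard sqlite3 module. SQLite is just a convenient choice for the example (the notes do not name a particular DBMS), and the in-memory database is throwaway.

```python
import sqlite3

# Throwaway in-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")

# The Employee schema from the notes, with data types.
conn.execute(
    "CREATE TABLE Employee (ID INTEGER, Name TEXT, Gender TEXT, JobDescription TEXT)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Employee)")]
for name, data_type in schema:
    print(name, data_type)
```

Reading the schema back from the DBMS shows the relation name, its attributes, and their data types, which is exactly the information a schema carries.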
Relation Instance
- A relation is not static; it changes over time
o Inserting new tuples
o Updating components of existing tuples
o Deleting tuples
- The set of tuples in a relation at a given moment is an instance of that relation
- A DBMS maintains the current instance
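The idea of an instance changing over time can be sketched with SQLite via Python's sqlite3 module. The helper name instance and the second employee (ALLEN) are made up for the example; only the SMITH tuple comes from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (ID INTEGER, Name TEXT, Gender TEXT, JobDescription TEXT)"
)

def instance(connection):
    """Return the current instance: all tuples in Employee right now."""
    return connection.execute("SELECT * FROM Employee ORDER BY ID").fetchall()

snapshots = [instance(conn)]  # empty relation

# Inserting new tuples
conn.execute("INSERT INTO Employee VALUES (7369, 'SMITH', 'M', 'Technician')")
conn.execute("INSERT INTO Employee VALUES (7499, 'ALLEN', 'F', 'Analyst')")
snapshots.append(instance(conn))

# Updating a component of an existing tuple
conn.execute("UPDATE Employee SET JobDescription = 'Manager' WHERE ID = 7369")
snapshots.append(instance(conn))

# Deleting a tuple
conn.execute("DELETE FROM Employee WHERE ID = 7499")
snapshots.append(instance(conn))

for snapshot in snapshots:
    print(snapshot)
```

Each entry in snapshots is one instance of the same relation; the schema never changes, only the set of tuples does.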
Key
- An attribute or a set of attributes used to uniquely identify a tuple
o Two employees will not have the same ID
- This unique attribute (or set of attributes) is called the primary key
- You can introduce an artificial key, if no suitable attribute/attributes exist
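A minimal sketch of both points, assuming SQLite through Python's sqlite3 module: a declared primary key rejects a second employee with the same ID, and when ID is left out SQLite generates a value itself, which is one way of getting an artificial key. The employee JONES is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("INSERT INTO Employee VALUES (7369, 'SMITH')")

# A second tuple with the same ID violates the primary key.
try:
    conn.execute("INSERT INTO Employee VALUES (7369, 'JONES')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True

# Omitting ID lets SQLite assign an unused integer (an artificial key).
cursor = conn.execute("INSERT INTO Employee (Name) VALUES ('JONES')")
artificial_id = cursor.lastrowid
print(artificial_id)
```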
- The entity-relationship (ER) concept was originally defined by Chen (1976)
and has since been adopted and refined by practitioners as the leading
method for database design
- An ER model is a systematic way of describing and defining a business
process. The process is modelled as entity sets that are linked with each
other by relationships that express the dependencies and requirements
between them
- An ER diagram (ERD) is used as a tool for ER modelling, which also provides
a representation of the ER model
ER Modelling and the Relational Data Model
- Entity set -> relation
- Attributes -> attributes
- Relationships -> The connections between the relations
Entity/Relationship Modelling
- ER Modelling is used for conceptual design
o Entity Set: objects or items of interest
o Attributes: facts about, or properties of, an entity. They describe an entity
o Relationships: links between entities
Entity Set / Entities
- An entity set represents objects or things of interest
- A general type
o Physical things like students, lecturers, employees, products
o More abstract things like transactions, orders, courses, projects
- Instances of that particular type are entities
- An entity set should be named with a singular noun
o Related to business characteristics, meaningful and self-documenting
o Unique and concise, readable
- One or more attributes define the key
Attributes
- Characteristics of entities
- The domain is the set of possible values (defined by the data type)
- Represented as columns in a database
- Design note
o Name descriptively and meaningfully
o Naming convention: camel case (the first letter of the first word is
lowercase, but the first letter of each subsequent word is uppercase)
Types of Attributes
1. Primary Key
2. Simple (Single-valued)
o Cannot be subdivided
Gender, marital status
3. Composite
o Is composed of several component parts
Address: streetNumber, suburb, city, zipCode
Name: firstName, lastName
o To model: an operational decision (reduce redundancy and
inconsistencies, ease of retrieval, usage)
Create additional attributes for an entity: a viable option
Create an entirely new entity: needs relationships (TBC)
4. Multi-valued
o Multiple values possible
o Customer entity with a phone attribute
homePhone
officePhone
o facultyMember with a qualification attribute
BSc
MSc
PhD
o To model: an operational decision (reduce redundancy and
inconsistencies, ease of retrieval, usage)
Create additional attributes for an entity: not the best option
Create an entirely new entity: needs relationships (TBC)
5. Derived
- Values that are calculated from other attributes
o Age: calculated from dateOfBirth
o orderTotal: calculated from unitPrice x quantity
- To model
o Normally not stored
o Operational decision: resource use, usage of data
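Because derived values are normally computed rather than stored, they can be thought of as small functions over the stored attributes. The sketch below shows the two examples from the notes; the function names and sample values are made up for illustration.

```python
from datetime import date

def age_on(date_of_birth, today):
    """Derive age in whole years from the stored dateOfBirth."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

def order_total(unit_price, quantity):
    """Derive orderTotal from the stored unitPrice and quantity."""
    return unit_price * quantity

print(age_on(date(2000, 6, 15), date(2024, 6, 14)))  # 23 (birthday is tomorrow)
print(age_on(date(2000, 6, 15), date(2024, 6, 15)))  # 24
print(order_total(250, 3))                           # 750
```

Storing age directly would go stale as time passes, which is one reason the stored attribute is dateOfBirth and age is derived on demand.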
Decisions: How to model attributes
- How would the data be used
- Take future growth into consideration
- Operational efficiency
o Eliminate inconsistencies
o Reduce redundancy
Primary Key
- The primary key is an attribute or a set of attributes that uniquely identifies a
specific instance of an entity
- Every entity in the data model must have a primary key whose values
uniquely identify instances of the entity
- To qualify as a primary key for an entity
o It must have a non-null value for each instance of the entity
o The value must be unique for each instance of an entity
o The values must not change or become null during the life of each entity
instance
Candidate Key
- In some instances, an entity will have more than one attribute that can
serve as a primary key
- Any attribute or minimal set of attributes that could serve as a primary key is
called a candidate key
- Once candidate keys are identified, choose one, and only one, primary key
for each entity
- Candidate keys which are not chosen as the primary key are known as
alternate keys
- If none of the candidate keys is suitable, introduce an artificial key
- Example
o Publisher entity (from the case description)
Publisher Name
Publisher Phone number
May change over time
o Author entity (from the case description)
Author Name
Relationships
- A relationship is an association between two or more entities
o Case description
o Books can be written by one or more authors
o Authors can also write more than one book
o Publishers publish many books
o One book is published by one publisher
- Relationships have
o A name (a verb)
o A set of entities that participate in them
o Operate in both directions
o A cardinality ratio
o A degree: the number of entity sets that participate (most have degree
2)
Cardinality Ratios
- Each entity in a relationship can participate in zero, one, or more than one
instances of that relationship
- This leads to three types of relationship
- Multiplicity
o One to many (1:M)
o Many to many (M:M)
o One to one (1:1)
- Optionality
o Optional or mandatory
One To Many Relationship
Foreign Key
Many To Many Relationship
- An entity of either set can be connected to many entities of the other set
o B1 -> A1, A3
o A1 -> B1, B2, B3
In the initial model
Primary key?
Sometimes more than one attribute is required to uniquely identify an entity
A primary key that is made up of more than one attribute is known as a
composite key
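A composite key can be sketched in SQLite through Python's sqlite3 module. The OrderLine table here is a hypothetical example (it is not taken from the slide this section refers to): neither orderID nor productID is unique on its own, but the pair is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE OrderLine (
        orderID         INTEGER NOT NULL,
        productID       INTEGER NOT NULL,
        orderedQuantity INTEGER,
        PRIMARY KEY (orderID, productID)  -- composite key
    )
    """
)

# Repeating orderID or productID alone is fine...
conn.execute("INSERT INTO OrderLine VALUES (1, 10, 2)")
conn.execute("INSERT INTO OrderLine VALUES (1, 11, 1)")
conn.execute("INSERT INTO OrderLine VALUES (2, 10, 5)")

# ...but repeating the same (orderID, productID) pair is rejected.
try:
    conn.execute("INSERT INTO OrderLine VALUES (1, 10, 9)")
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True
print(pair_rejected)  # True
```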
One To One Relationship
- Each entity of either entity set is related to at most one entity of the other
set
o E1 -> A1
o E2 -> A3
o E3 -> A4
- An author has one and only one address
- Address -> attribute of employee
Attribute vs Entity
- If we have several addresses per Author, (Home Address and Studio
Address)
o Address must be an entity, because attributes cannot be multi-valued
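One way this could look once it reaches tables, sketched with SQLite via Python's sqlite3 module: Address becomes its own table, and each address row points back to its author through a foreign key. The table layout, column names, and sample data are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.execute("CREATE TABLE Author (authorID INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE Address (
        addressID INTEGER PRIMARY KEY,
        authorID  INTEGER NOT NULL REFERENCES Author(authorID),  -- foreign key
        kind      TEXT NOT NULL,
        street    TEXT NOT NULL
    )
    """
)

conn.execute("INSERT INTO Author VALUES (1, 'Example Author')")
conn.executemany(
    "INSERT INTO Address (authorID, kind, street) VALUES (?, ?, ?)",
    [(1, "Home", "1 Example Road"), (1, "Studio", "2 Sample Lane")],
)

# One author, several addresses: a one-to-many relationship.
kinds = [
    row[0]
    for row in conn.execute("SELECT kind FROM Address WHERE authorID = 1 ORDER BY kind")
]
print(kinds)  # ['Home', 'Studio']

# An address for a non-existent author is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Address (authorID, kind, street) VALUES (99, 'Home', 'x')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```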
Reading
- ER modelling with crow's foot notation
Summary: you should
- Be able to arrive at a logical ER model based on a case description
- Know about some key ERD concepts: entities, attributes, keys and
relationships
Degree of a relationship
- Is the number of entity sets that participate in a relationship
- The three common relationship degrees
1. Unary (degree 1)
2. Binary (degree 2)
3. Ternary (degree 3)
- Higher degree relationships are possible but rarely encountered in practice
Binary Relationships
- Between the instances of two entity sets
- The most common type of relationship encountered in data modelling
Unary Relationship
- Between the instances of a single entity set (recursive relationships)
- Cardinality could be 1:1, 1:M or M:N
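A common illustration of a 1:M unary relationship is "an employee manages other employees". The sketch below assumes that example (it is not taken from the slides) and uses SQLite via Python's sqlite3 module: the foreign key managerID points back into the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    """
    CREATE TABLE Employee (
        ID        INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        managerID INTEGER REFERENCES Employee(ID)  -- refers to the same entity set
    )
    """
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "KING", None), (2, "SMITH", 1), (3, "ALLEN", 1)],
)

# Join the table to itself to pair each employee with their manager.
pairs = conn.execute(
    """
    SELECT e.Name, m.Name
    FROM Employee AS e
    JOIN Employee AS m ON e.managerID = m.ID
    ORDER BY e.Name
    """
).fetchall()
print(pairs)  # [('ALLEN', 'KING'), ('SMITH', 'KING')]
```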
Ternary Relationships
- Simultaneous relationship among the instances of 3 entity sets
- E.g. Employees with many required skills can be assigned to many projects
o One employee has many skills and is assigned to many projects
o One project includes many employees with many required skills
o One skill can be possessed by many employees working on many projects
o Three M:N relationships
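One possible table design for this ternary relationship, sketched with SQLite via Python's sqlite3 module, is a single Assignment table whose rows each link one employee, one project, and one skill. The table name Assignment, the column names, and the sample data are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Employee (employeeID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Project  (projectID  INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Skill    (skillID    INTEGER PRIMARY KEY, label TEXT);

    -- One row per (employee, project, skill) combination.
    CREATE TABLE Assignment (
        employeeID INTEGER NOT NULL REFERENCES Employee(employeeID),
        projectID  INTEGER NOT NULL REFERENCES Project(projectID),
        skillID    INTEGER NOT NULL REFERENCES Skill(skillID),
        PRIMARY KEY (employeeID, projectID, skillID)
    );

    INSERT INTO Employee VALUES (1, 'SMITH'), (2, 'ALLEN');
    INSERT INTO Project  VALUES (10, 'Website'), (20, 'Payroll');
    INSERT INTO Skill    VALUES (100, 'SQL'), (200, 'Testing');
    INSERT INTO Assignment VALUES (1, 10, 100), (1, 10, 200), (1, 20, 100), (2, 10, 200);
    """
)

# Which skills does employee 1 use on project 10?
skills_used = [
    row[0]
    for row in conn.execute(
        """
        SELECT s.label
        FROM Assignment AS a
        JOIN Skill AS s ON s.skillID = a.skillID
        WHERE a.employeeID = 1 AND a.projectID = 10
        ORDER BY s.label
        """
    )
]
print(skills_used)  # ['SQL', 'Testing']
```

The question "which skills on which project" needs all three entity sets at once, which is what makes the relationship ternary rather than three independent binary ones.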
Surrogate Keys
- Single-value surrogate keys can be substituted for large composite keys
Weak Entity
- An entity is considered weak if the existence (of an instance) of that entity
depends on the existence (of an instance) of another entity
- A weak entity can be identified uniquely only by considering the primary key
of another (owner) entity
- The owner entity set and the weak entity set must participate in a one-to-many
relationship set (one owner, many weak entities)
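A weak entity can be sketched in SQLite via Python's sqlite3 module. The Building/Room pair is a hypothetical example chosen for illustration: a room number is only unique within its building, so Room borrows buildingID as part of its key, and its rows disappear when the owning building is deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Building (buildingID INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Weak entity: identified only together with its owner's key.
    CREATE TABLE Room (
        buildingID INTEGER NOT NULL
            REFERENCES Building(buildingID) ON DELETE CASCADE,
        roomNumber INTEGER NOT NULL,
        capacity   INTEGER,
        PRIMARY KEY (buildingID, roomNumber)
    );

    INSERT INTO Building VALUES (1, 'Science'), (2, 'Arts');
    INSERT INTO Room VALUES (1, 101, 40), (2, 101, 25);  -- same roomNumber, different owner
    """
)

rooms_before = conn.execute("SELECT COUNT(*) FROM Room").fetchone()[0]

# Deleting the owner removes the weak entities that depended on it.
conn.execute("DELETE FROM Building WHERE buildingID = 1")
rooms_after = conn.execute("SELECT buildingID, roomNumber FROM Room").fetchall()

print(rooms_before)  # 2
print(rooms_after)   # [(2, 101)]
```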
Non-Identifying
o A non-identifying relationship means that a child entity is related to a
parent entity but can be identified independently of the parent entity
o The child item should be kept even if the parent is deleted
Generalization
- Process of defining general entity types from a set of specialised entity
types by identifying their common characteristics
- Superset: a generic entity set that has a 1:1 relationship with one or more
subsets
Exclusive (Disjoint)
- States that if an instance of a superset is a member of any subset, then it
cannot be a member of more than one subset
o A student is either a Graduate or PostGraduate, not both
Discriminators
- A discriminator is an (optional) attribute that determines which subtype is
appropriate
- Example: The attribute isGradStudent, which appears in STUDENT on the
prior slide is a discriminator
o Will have a domain of Yes and No
Superset and Subset Identifiers and Inheritance
- The identifier of the supertype and all of its subtypes must be identical
- The identifier of the supertype becomes the identifier of the related
subtype(s)
- Rename if required
- Inheritance means that the entities in the subtypes inherit the attributes
of the supertype entity class
- Example: Graduate inherits the attributes of Person and Student
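One way the Person, Student, Graduate example could be laid out as tables is sketched below with SQLite via Python's sqlite3 module. Each subtype table reuses the supertype's identifier as its own primary key, and a join recovers the inherited attributes. The column names name and thesisTitle and the sample row are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Person (
        personID INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE Student (
        personID      INTEGER PRIMARY KEY REFERENCES Person(personID),
        isGradStudent TEXT NOT NULL CHECK (isGradStudent IN ('Yes', 'No'))  -- discriminator
    );
    CREATE TABLE Graduate (
        personID    INTEGER PRIMARY KEY REFERENCES Student(personID),
        thesisTitle TEXT
    );

    INSERT INTO Person   VALUES (1, 'Example Person');
    INSERT INTO Student  VALUES (1, 'Yes');
    INSERT INTO Graduate VALUES (1, 'Example Thesis');
    """
)

# A Graduate "inherits" the Person and Student attributes through the shared identifier.
graduate = conn.execute(
    """
    SELECT p.personID, p.name, s.isGradStudent, g.thesisTitle
    FROM Graduate AS g
    JOIN Student AS s ON s.personID = g.personID
    JOIN Person  AS p ON p.personID = s.personID
    """
).fetchone()
print(graduate)  # (1, 'Example Person', 'Yes', 'Example Thesis')
```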
Reading: http://www.inf.unibz.it/~franconi/teaching/2000/ct481/er-modelling/
You should be able to
- Arrive at a logical ER model based on a case description
- Apply the ERD concepts to a database design task
Database Journey
- Conceptual Model
o Entity, Attributes, Relationship (1 to M, M to M and 1 to 1)
- Logical Model
o PK, FK, Associative Entity, Weak and Strong Entity,
Generalization/Specialization (Exclusive and Inclusive), Unary
Relationship
- Physical Model
o Table, Column, Data
Normalisation
- A process of organizing the fields and tables of a relational database to
minimize redundancy and dependency
- It is a theoretical technique to refine and improve (or even to begin) the
logical data modelling
- The idea is that an entity set (table) should be about a specific topic and
that only those attributes (columns) which support that topic are included
Data Duplication
- Increases storage and decreases performance
2. Update Anomaly
3. Delete Anomaly
In summary
- Having one entity that serves many purposes introduces many challenges
o Data Duplication
o Data Update Issues
- Need Normalisation
o To minimize duplicate data
o To minimize or avoid data modification issues
Steps of Normalization
- First Normal Form (1NF)
o To remove all multivalued attributes and to define a primary key for a
given data structure
- Second Normal Form (2NF)
o To remove all partial functional dependencies that exist between a non-key
attribute and part of a primary key, for a given data structure with a
composite key
- Third Normal Form (3NF)
o To remove all transitive functional dependencies that exist between a
non-key attribute and another non-key attribute, for a given data structure
2. Find a good PK
o Find the repeating group
3. Make the obvious identifier of the set and the identifier of the repeating
group a composite primary key
o Obvious identifier: Order ID
o Identifier of the repeating group: Product ID
o Primary key: Order ID, Product ID
1NF
Order(orderID, orderDate, customerID, customerName,
customerAddress, productID, productDesc, productFinish, unitPrice,
orderedQuantity)
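The step from a repeating group to 1NF can be sketched in plain Python. The nested dictionary below is a made-up order (only some of the Order attributes are shown, and the values are invented); the helper to_1nf is likewise just for illustration.

```python
# An order before 1NF: the products form a repeating group inside one record.
order = {
    "orderID": 1006,
    "orderDate": "2024-01-15",
    "customerID": 2,
    "products": [
        {"productID": 7, "orderedQuantity": 2},
        {"productID": 5, "orderedQuantity": 1},
    ],
}

def to_1nf(nested_order):
    """Flatten the repeating group: one row per (orderID, productID)."""
    header = {k: v for k, v in nested_order.items() if k != "products"}
    return [{**header, **product} for product in nested_order["products"]]

rows = to_1nf(order)
for row in rows:
    print(row)

# The composite primary key (orderID, productID) identifies each flat row.
keys = [(row["orderID"], row["productID"]) for row in rows]
print(keys)  # [(1006, 7), (1006, 5)]
```

Every cell in the flattened rows now holds a single value, at the cost of repeating the order-level attributes on each row. That repetition is what 2NF and 3NF then address.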
Second Normal Form (2NF)
- Remove all partial functional dependencies
Functional Dependencies
- We say an attribute, B, has a functional dependency on another attribute, A,
if any two records that have the same value for A must also have the same
value for B
- We write this as: A -> B
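The definition can be turned into a small check over sample records. This is only a sketch: the function name holds and the sample rows are made up, and a check on sample data can show that a dependency is violated but can never prove that it holds for all future data.

```python
def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by any pair of rows."""
    seen = {}
    for row in rows:
        a_value = tuple(row[attribute] for attribute in lhs)
        b_value = tuple(row[attribute] for attribute in rhs)
        # Same A value seen before with a different B value: dependency violated.
        if seen.setdefault(a_value, b_value) != b_value:
            return False
    return True

rows = [
    {"orderID": 1006, "productID": 7, "productDesc": "Table", "orderedQuantity": 2},
    {"orderID": 1006, "productID": 5, "productDesc": "Chair", "orderedQuantity": 1},
    {"orderID": 1007, "productID": 7, "productDesc": "Table", "orderedQuantity": 3},
]

print(holds(rows, ["productID"], ["productDesc"]))                 # True
print(holds(rows, ["productID"], ["orderedQuantity"]))             # False
print(holds(rows, ["orderID", "productID"], ["orderedQuantity"]))  # True
```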
Partial Dependency
- A partial dependency exists when an attribute B is functionally dependent on
an attribute A, and A is only one component of a multipart candidate key
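Given a composite key and a list of functional dependencies, the partial ones can be picked out mechanically. The sketch below applies this to the 1NF Order relation; the dependencies listed are assumptions about what the case intends (the notes do not spell them out), and the function name is made up.

```python
def partial_dependencies(key, fds):
    """Return dependencies whose left side is a proper subset of the key
    and whose right side contains at least one non-key attribute."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        non_key = set(rhs) - key
        if lhs < key and non_key:  # '<' on sets means proper subset
            found.append((sorted(lhs), sorted(non_key)))
    return found

# Composite primary key of the 1NF Order relation.
key = {"orderID", "productID"}

# Assumed functional dependencies (not stated explicitly in the notes).
fds = [
    ({"orderID"}, {"orderDate", "customerID", "customerName", "customerAddress"}),
    ({"productID"}, {"productDesc", "productFinish", "unitPrice"}),
    ({"orderID", "productID"}, {"orderedQuantity"}),
]

partial = partial_dependencies(key, fds)
for lhs, rhs in partial:
    print(lhs, "->", rhs)
```

Under these assumptions only orderedQuantity depends on the whole key; everything else depends on just one part of it and would move to its own relation in 2NF.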
Notes on Normalization
- If the given data structure in 1NF has a single-attribute primary key, then
there are no partial functional dependencies and hence it is already in 2NF
- If the given data structure in 2NF has no transitive functional dependencies
then it is in 3NF
- Derived attributes are not included
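As a rough end-to-end sketch, the 1NF Order relation could end up as four tables in 3NF, shown here in SQLite via Python's sqlite3 module. The split assumes that customer details depend on customerID and product details on productID, which the notes do not state explicitly. The order table is named CustomerOrder because ORDER is a reserved word in SQL, and the sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (
        customerID INTEGER PRIMARY KEY, customerName TEXT, customerAddress TEXT
    );
    CREATE TABLE Product (
        productID INTEGER PRIMARY KEY, productDesc TEXT, productFinish TEXT, unitPrice REAL
    );
    CREATE TABLE CustomerOrder (
        orderID INTEGER PRIMARY KEY, orderDate TEXT,
        customerID INTEGER REFERENCES Customer(customerID)
    );
    CREATE TABLE OrderLine (
        orderID   INTEGER REFERENCES CustomerOrder(orderID),
        productID INTEGER REFERENCES Product(productID),
        orderedQuantity INTEGER,
        PRIMARY KEY (orderID, productID)
    );

    INSERT INTO Customer VALUES (2, 'Example Customer', '1 Example Road');
    INSERT INTO Product VALUES (5, 'Chair', 'Oak', 325.0), (7, 'Table', 'Oak', 800.0);
    INSERT INTO CustomerOrder VALUES (1006, '2024-01-15', 2);
    INSERT INTO OrderLine VALUES (1006, 7, 2), (1006, 5, 1);
    """
)

# Joining the 3NF tables rebuilds the original wide 1NF rows on demand.
rebuilt = conn.execute(
    """
    SELECT o.orderID, o.orderDate, c.customerID, c.customerName, c.customerAddress,
           p.productID, p.productDesc, p.productFinish, p.unitPrice, l.orderedQuantity
    FROM OrderLine AS l
    JOIN CustomerOrder AS o ON o.orderID = l.orderID
    JOIN Customer AS c ON c.customerID = o.customerID
    JOIN Product AS p ON p.productID = l.productID
    ORDER BY o.orderID, p.productID
    """
).fetchall()
for row in rebuilt:
    print(row)
```

The customer's name and address are now stored once, so changing them means updating a single row rather than every order line.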
The Database Oath
- Every non-key attribute must provide a fact about the key, the whole key,
and nothing but the key, so help me Codd
o The key refers to 1NF
o The whole key refers to 2NF
o Nothing but the key refers to 3NF
Common Considerations
- Derived attributes
o To improve the performance of certain queries, it can be argued that
selected derived attributes should be stored, to avoid ad-hoc
computation over large volumes of data
- 1:1 relationship to decompose entity set
o If there are reasons beyond data modelling to physically separate some
attributes from the same entity set into multiple ones (e.g. security), the
physical data model should reflect that
- Denormalisation
o It is not uncommon to reverse the process of normalisation, introducing
redundancy and reducing the number of entity sets, for the sake of
performance and maintenance of the database
Note: none of these decisions should be taken lightly; each needs reasoning
and a weighing of benefits against costs
Summary:
You should be able to apply the normalisation concepts to a database design
task