IDBMS1

The document outlines a course on Database Systems, covering topics such as database management concepts, data modeling, and the relational model. It discusses the advantages and disadvantages of database approaches compared to conventional file handling, and introduces key elements like SQL, normalization, and data integrity. The course aims to provide a comprehensive understanding of database systems, their applications, and the development processes involved.
Introduction to Database Systems

Instructor: Dr. Sohail Khan

1
General Course Outline
• Database management system concepts:
• Introduction and history
• Conventional file handling versus databases
• Conceptual, community, and user views of data, and the interfaces
between these views
• Data modeling: hierarchical, network, and relational models
(the relational model will be studied in detail); entities, attributes,
and relations; one-to-one, one-to-N, and M-to-N relationships;
Bachman diagrams. (We will use the more commonly used ERDs
instead to explain the above terms.)

2
Course Outline

• The network model and CODASYL DBAWG terminology.
Construction and manipulation of such a model (an old
model… no need to go into detail).
• The relational model in detail. An existing relational
database as an example. Construction and manipulation of a
relational model, High level operators, relational algebra,
relational calculus
• Query by example approach to using relational database.
• Normalization, the need to normalize and the concept of
normal forms up to BCNF.

3
Course Outline

• SQL, the query language

If we are not short of time, we will also study the following
database operational requirements:
• Integrity of data: integrity rules and triggered procedures.
• Security of data: passwords, profiles, the statistical database
problem. Recovery from failure: transaction failures and system
failures, two-phase commit, restart facilities.
• Concurrency: locking techniques and time-stamping techniques.
Protocols to ease the problem.
• State of the art: distributed databases, database machines.

4
TEXT AND REFERENCE BOOKS

Main textbook

Modern Database Management, by Jeffrey A. Hoffer,
Mary B. Prescott, and Fred R. McFadden (the 6th edition is available
in the library, but the newer the better)
Some material, such as relational algebra, will be taken from other
sources (material will be provided)

5
Week 1:
Database management concepts

6
Objectives
• Definition of terms
• Explain growth and importance of databases
• Name limitations of conventional file processing
• Identify five categories of databases
• Explain advantages of databases
• Identify costs and risks of databases
• List components of database environment
• Describe evolution of database systems

7
•What is a database?

8
Database
• Database: organized collection of logically related data

• Can you think of some real-life examples?

9
Application of database

• Banking: all transactions


• Airlines: reservations, schedules
• Universities: registration, grades
• Sales: customers, products, purchases
• Online retailers: order tracking, customized
recommendations
• Manufacturing: production, inventory, orders, supply
chain
• Human resources: employee records, salaries, tax
deductions
• Databases touch all aspects of our lives

10
•Coming back to the definition of a
database, “organized collection
of logically related data”,
the question arises: what is data?

11
Data and Information
• Data: stored representations of meaningful objects
and events
• Broadly two types:
• Structured: numbers, text, dates
• Unstructured: images, video, documents
• Information: data processed to increase knowledge in
the person using the data
• Metadata: data that describes the properties and
context of user data

12
Data

13
Figure 1-1a Data in context

Context helps users understand data and
gives information
14
Figure 1-1b Summarized data

Graphical displays turn data into useful
information that managers can use for
decision making and interpretation

15
Descriptions of the properties or characteristics of the
data, including data types, field sizes, allowable
values, and data context
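
The distinction between user data and metadata can be sketched in a few lines of Python using the standard-library `sqlite3` module. This is only an illustration: the `student` table, its columns, and the sample row are invented for this example and do not come from the slides.

```python
import sqlite3

# In-memory database with one hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " gpa REAL CHECK (gpa BETWEEN 0.0 AND 4.0))"
)
conn.execute("INSERT INTO student VALUES (1, 'Sample Student', 3.5)")

# User data: the stored facts themselves.
user_data = conn.execute("SELECT student_id, name, gpa FROM student").fetchall()

# Metadata: data describing the user data. PRAGMA table_info yields one row
# per column: (cid, name, declared type, notnull flag, default value, pk).
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(user_data)
print(metadata)
```

Here the DBMS, not the application program, stores the column names, types, and allowable values.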

16
Flat File databases

• Collection of data in rows and columns in a single file.
• For example, a CSV file containing names,
addresses, and telephone numbers, or an Excel file

17
Advantages of flat file DBs

• Easy to set up
• Doesn’t require expert computer knowledge
• Easy to understand
• Simple design; simple to get information from them

18
Disadvantages of File Processing

• Program-Data Dependence
• All programs maintain metadata for each file they use
• Duplication of Data
• Different systems/programs have separate copies of the same data
• Limited Data Sharing
• No centralized control of data
• Lengthy Development Times
• Programmers must design their own file formats
• Excessive Program Maintenance
• 80% of information systems budget

19
Problems with Data Dependency

• Each application programmer must maintain
his/her own data
• Each application program needs to include code
for the metadata of each file
• Each application program must have its own
processing routines for reading, inserting,
updating, and deleting data
• Lack of coordination and central control
• Non-standard file formats

20
Figure 1-3 Old file processing systems at Pine Valley
Furniture Company
Duplicate Data

21
Problems with Data Redundancy

•Waste of space to have duplicate data
•Causes more maintenance headaches
•The biggest problem:
•Data changes in one file could cause
inconsistencies
•Compromises in data integrity
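
A small Python sketch of how duplicated data drifts out of sync. The two "files" are modeled as dictionaries, and the customer record and application names are hypothetical, chosen only to illustrate the problem.

```python
# Two hypothetical applications, each keeping its own copy of customer data.
orders_file = {"C001": {"name": "Customer A", "address": "1 Old Road"}}
invoicing_file = {"C001": {"name": "Customer A", "address": "1 Old Road"}}

# The customer moves; only the order-filing application is told about it.
orders_file["C001"]["address"] = "22 New Street"


def inconsistent_customers(file_a, file_b):
    """Return the IDs of customers whose records differ between the two copies."""
    shared_ids = file_a.keys() & file_b.keys()
    return sorted(cid for cid in shared_ids if file_a[cid] != file_b[cid])


print(inconsistent_customers(orders_file, invoicing_file))
```

With a single shared copy of the customer record there would be nothing to reconcile.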

22
SOLUTION:
The DATABASE Approach

•Central repository of shared data


•Data is managed by a controlling agent
•Stored in a standardized, convenient form

Requires a Database Management System (DBMS)

23
SOLUTION:
The DATABASE Approach

24
Database Management System
• A software system that is used to create, maintain, and provide
controlled access to user databases

[Figure: the Order Filing System, Invoicing System, and Payroll System all access a central database through the DBMS. The central database contains employee, order, inventory, pricing, and customer data.]

DBMS manages data resources like an operating system manages hardware resources

25
Advantages of the Database Approach
• Program-data independence
• Planned data redundancy
• Improved data consistency
• Improved data sharing
• Increased application development productivity
• Enforcement of standards
• Improved data quality
• Improved data accessibility and responsiveness
• Reduced program maintenance
• Improved decision support

26
Costs and Risks of the Database Approach
• New, specialized personnel
• Installation and management cost and complexity
• Conversion costs
• Need for explicit backup and recovery
• Organizational conflict

27
Elements of the Database Approach
• Data models
• Graphical system capturing nature and relationship of data
• Enterprise Data Model–high-level entities and relationships for the
organization
• Project Data Model–more detailed view, matching data structure in
database or data warehouse
• Relational Databases
• Database technology involving tables (relations) representing entities
and primary/foreign keys representing relationships
• Use of Internet Technology
• Networks and telecommunications, distributed databases, client-server,
and 3-tier architectures
• Database Applications
• Application programs used to perform database activities (create, read,
update, and delete) for database users
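
The four database activities named above (create, read, update, delete) can be sketched with Python's built-in `sqlite3` module. The `customer` table and its rows are made up for this sketch; a real application would work against the organization's own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city TEXT)"
)

# Create: insert new rows (parameters are passed separately from the SQL text).
conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (1, "Customer A", "Plano"))
conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (2, "Customer B", "Albany"))

# Read: query the stored rows.
before = conn.execute("SELECT name, city FROM customer ORDER BY customer_id").fetchall()

# Update: change an attribute value of an existing row.
conn.execute("UPDATE customer SET city = ? WHERE customer_id = ?", ("Dallas", 1))

# Delete: remove a row.
conn.execute("DELETE FROM customer WHERE customer_id = ?", (2,))

after = conn.execute("SELECT name, city FROM customer ORDER BY customer_id").fetchall()
print(before)
print(after)
```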

28
Segment of an Enterprise Data Model

Segment of a Project-Level Data Model

29
One customer may place many orders, but each
order is placed by a single customer
→ One-to-many relationship

30
One order has many order lines;
each order line is associated with a single order
→ One-to-many relationship

31
One product can be in many order lines;
each order line refers to a single product
→ One-to-many relationship

32
Therefore, one order involves many products,
and one product is involved in many orders
→ Many-to-many relationship
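
The customer, order, order line, and product relationships described on these slides could be represented in relational tables roughly as follows. This is a sketch using Python's `sqlite3` module; the table names, column names, and sample rows are assumptions made for illustration, not the textbook's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, description TEXT NOT NULL);
-- One customer places many orders: the foreign key sits on the "many" side.
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
-- Each order line belongs to one order and refers to one product, so the
-- order_line table resolves the many-to-many link between orders and products.
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES customer_order(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO customer VALUES (1, 'Customer A');
INSERT INTO product VALUES (10, 'Desk'), (20, 'Chair');
INSERT INTO customer_order VALUES (100, 1), (101, 1);
INSERT INTO order_line VALUES (100, 10, 1), (100, 20, 4), (101, 20, 2);
""")


def products_in_order(order_id):
    """One order involves many products."""
    rows = conn.execute(
        "SELECT p.description FROM order_line ol"
        " JOIN product p ON p.product_id = ol.product_id"
        " WHERE ol.order_id = ? ORDER BY p.description",
        (order_id,),
    )
    return [row[0] for row in rows]


def orders_for_product(product_id):
    """One product is involved in many orders."""
    rows = conn.execute(
        "SELECT order_id FROM order_line WHERE product_id = ? ORDER BY order_id",
        (product_id,),
    )
    return [row[0] for row in rows]


print(products_in_order(100))
print(orders_for_product(20))
```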

33
Figure 1-4 Enterprise data model for Figure 1-3 segments

34
Figure 1-5 Components of the Database Environment

35
Components of the
Database Environment

• CASE Tools–computer-aided software engineering


• Repository–centralized storehouse of metadata
• Database Management System (DBMS) –software for
managing the database
• Database–storehouse of the data
• Application Programs–software using the data
• User Interface–text and graphical displays to users
• Data/Database Administrators–personnel responsible for
maintaining the database
• System Developers–personnel responsible for designing
databases and software
• End Users–people who use the applications and databases
36
The Range of Database Applications

• Personal databases
• Workgroup databases
• Departmental/divisional databases
• Enterprise database

37
38
Figure 1-6 Typical data from a personal database

39
Figure 1-7 Workgroup database with wireless
local area network

40
Enterprise Database Applications

• Enterprise Resource Planning (ERP)


• Integrate all enterprise functions (manufacturing, finance,
sales, marketing, inventory, accounting, human resources)
• Data Warehouse
• Integrated decision support system derived from various
operational databases

41
Figure 1-8 An enterprise data warehouse

42
Evolution of DB Systems

43
INTRODUCTION TO DATABASE
SYSTEMS

The Database Development Process

1
Previously

• Introduction to Database Systems


• Terminologies
• Flat file systems
• DBMS
• Pros & Cons of Database approach
• Evolution of Database Systems

2
CONTENTS

• System development life cycle


• Prototyping approach
• Roles of individuals
• Three-schema approach
• Two and three-tiered architectures

3
Target CLOs

• CLO 1 Taxonomy Level: C2

• CLO 2 Taxonomy Level: C3

4
You have already studied these in detail in
the software engineering-1 course… Just an
overview here

5
Enterprise Data Model

• First step in database development


• Specifies scope and general content
• Overall picture of organizational data at high level of
abstraction
• Entity-relationship diagram
• Descriptions of entity types
• Relationships between entities
• Business rules

6
Figure 2-1 Segment from enterprise data model

Enterprise data model describes the high-level
entities in an organization and the
relationship between these entities

7
Information Systems Architecture
(ISA)

• Conceptual blueprint for organization’s desired
information systems structure
• Consists of:
• Data (e.g. Enterprise Data Model–simplified ER Diagram)
• Processes–data flow diagrams, process decomposition, etc.
• Data Network–topology diagram (like Fig 1-9)
• People–people management using project management tools
(Gantt charts, etc.)
• Events and points in time (when processes are performed)
• Reasons for events and rules (e.g., decision tables)

8
Information Engineering
• A data-oriented methodology to create and
maintain information systems
• Top-down planning–a generic IS planning
methodology for obtaining a broad understanding
of the IS needed by the entire organization
• Four steps to Top-Down planning:
• Planning
• Analysis
• Design
• Implementation

9
Information Systems Planning
(Table 2-1)
• Purpose–align information technology with
organization’s business strategies
• Three steps:
1. Identify strategic planning factors
2. Identify corporate planning objects
3. Develop enterprise model

10
Identify Strategic Planning Factors (Table 2-2)

• Organization goals–what we hope to accomplish


• Critical success factors–what MUST work in order for us to survive
• Problem areas–weaknesses we now have

11
Identify Corporate Planning Objects (Table 2-3)

• Organizational units–departments
• Organizational locations
• Business functions–groups of business processes
• Entity types–the things we are trying to model for the database
• Information systems–application programs

12
Develop Enterprise Model
• Functional decomposition
• Iterative process breaking system description into finer and finer
detail
• Enterprise data model

• Planning matrixes
• Describe interrelationships
between planning objects

13
Figure 2-2 Example of process decomposition of an
order fulfillment function (Pine Valley Furniture)

Decomposition = breaking
large tasks into smaller tasks
in a hierarchical structure
chart

14
Planning Matrixes
• Describe relationships between planning objects in the
organization
• Types of matrixes:
• Function-to-data entity
• Location-to-function
• Unit-to-function
• IS-to-data entity
• Supporting function-to-data entity
• IS-to-business objective

15
Example business function-to-data
entity matrix (Fig. 2-3)

16
Two Approaches to Database and IS
Development
• SDLC
• System Development Life Cycle
• Detailed, well-planned development process
• Time-consuming, but comprehensive
• Long development cycle

• Prototyping
• Rapid application development (RAD)
• Cursory attempt at conceptual data modeling
• Define database during development of initial prototype
• Repeat implementation and maintenance activities with new
prototype versions

17
Systems Development Life Cycle

Planning

Analysis

Logical Design

Physical Design

Implementation

Maintenance

18
Systems Development Life Cycle
(cont.)
Phase: Planning
Purpose–preliminary understanding
Deliverable–request for study
Database activity–enterprise modeling and early conceptual data modeling

19
Systems Development Life Cycle
How to proceed with Planning?
With any SE project, Planning is the key to success

• Define a business strategy.
• What are the business needs and goals of the
organisation? (So study and understand the current
information system.)
• Develop an enterprise data model, also called a
corporate or domain model.

20
Systems Development Life Cycle
How to proceed with Planning?
With any SE project, Planning is the key to success

• Define a business strategy.


• Feasibility study
• Define scope

21
Planning (Cont)
Scope
• Define the scope of the problem (i.e., what are the problems and
what problems are you going to solve)
• Scope may also include the current and future users of the system
• Avoid scope creep.

22
Systems Development Life Cycle
(cont.)
Phase: Analysis
Purpose–thorough requirements analysis
Deliverable–functional system specifications
Database activity–thorough and integrated conceptual data modeling

23
Analysis (Cont)
Requirement Document
• The planning phase leads to the requirements
document.
• An important document, as it forms the basis of the
design and ultimately the whole system.
• Requirements engineering is a major SE area.
• If you have captured the requirements correctly, then
your system will be sound.
• Disadvantages of the waterfall method.

24
Analysis (Cont)
In Database
• In database development, this step leads to the conceptual design
in the form of a high-level, refined ERD (planning
and analysis usually run in parallel)
• A conceptual model may include a few significant
attributes to augment the definition and
visualization of entities. No effort need be made to
inventory the full attribute population of such a
model

25
Planning (Cont.)
Enterprise Data Model
• First step in database development
• Overall picture of organizational data at high level of
abstraction
• Preliminary Entity-relationship diagram
• Descriptions of entity types
• Relationships between entities
• Business rules

26
Segment from enterprise data model

Enterprise data model describes the high-level
entities in an organization and the
relationship between these entities

27
Systems Development Life Cycle
(cont.)
Phase: Logical Design
Purpose–information requirements elicitation and structure
Deliverable–detailed design specifications
Database activity–logical database design (transactions, forms,
displays, views, data integrity and security)
28
Systems Development Life Cycle
(cont.)
Phase: Physical Design
Purpose–develop technology and organizational specifications
Deliverable–program/data structures, technology purchases,
organization redesigns
Database activity–physical database design (define database to DBMS,
physical data organization, database processing programs)
29
Systems Development Life Cycle
(cont.)
Phase: Implementation
Purpose–programming, testing, training, installation, documenting
Deliverable–operational programs, documentation, training materials
Database activity–database implementation, including coded programs,
documentation, installation and conversion

30
Systems Development Life Cycle
(cont.)
Phase: Maintenance
Purpose–monitor, repair, enhance
Deliverable–periodic audits
Database activity–database maintenance, performance analysis
and tuning, error corrections

31
A little about testing

32
Prototyping Database Methodology

33
Prototyping Database Methodology
(cont.)

34
Prototyping Database Methodology
(cont.)

35
Prototyping Database Methodology
(cont.)

36
Prototyping Database Methodology
(cont.)

37
CASE
• Computer-Aided Software Engineering (CASE)–software tools providing
automated support for systems development
• Three database features:
• Data modeling–drawing entity-relationship diagrams
• Code generation–SQL code for table creation
• Repositories–knowledge base of enterprise information

38
Managing DB Projects: People Involved
• Systems analysts
• Database analysts and data modelers
• Users
• Programmers
• Database and Data administrators
• Other technical experts

39
Database Schema

• Physical Schema
• Physical structures–covered in Chapters 5 and 6
• Conceptual Schema
• E-R models–covered in Chapters 3 and 4
• External Schema
• User Views
• Subsets of Conceptual Schema
• Can be determined from business-function/data entity
matrices
• DBA determines schema for different users
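
One common way to realize external schemas in a relational DBMS is with views defined over the base tables. The sketch below uses Python's `sqlite3` module; the `employee` table, the two views, and the sample rows are hypothetical and only illustrate the idea that different users see different subsets of the same conceptual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base table: the single, shared definition of employee data.
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    department  TEXT NOT NULL,
    salary      REAL NOT NULL
);
INSERT INTO employee VALUES
    (1, 'Employee A', 'Sales', 50000),
    (2, 'Employee B', 'Finance', 65000);

-- User view for a staff directory: salary is not visible.
CREATE VIEW employee_directory AS
    SELECT employee_id, name, department FROM employee;

-- User view for payroll reporting: only departmental totals.
CREATE VIEW payroll_summary AS
    SELECT department, SUM(salary) AS total_salary
    FROM employee GROUP BY department;
""")

cursor = conn.execute("SELECT * FROM employee_directory")
directory_columns = [column[0] for column in cursor.description]
payroll = conn.execute(
    "SELECT department, total_salary FROM payroll_summary ORDER BY department"
).fetchall()

print(directory_columns)
print(payroll)
```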

40
Three-schema architecture

Different people
have different
views of the
database…these
are the external
schema

The internal
schema is the
underlying
design and
implementation

41
Developing the three-schema architecture

42
Figure 2-9 Three-tiered client/server database architecture

43
Pine Valley Furniture

Segment of project data model

44
Figure 2-12 Four relations (Pine Valley Furniture)

45
Figure 2-12 Four relations (Pine Valley Furniture) (cont.)

46
Conceptual Design (ERD)

1
Objectives

• Importance of data modeling


• Write good names and definitions for entities, relationships,
and attributes
• Distinguish unary, binary, and ternary relationships
• Model different types of attributes, entities, relationships, and
cardinalities
• Draw E-R diagrams for common business situations
• Convert many-to-many relationships to associative entities
• Model time-dependent data using time stamps

2
SDLC Revisited – Data Modeling is an Analysis
Activity

Phase: Analysis (follows Project Initiation and Planning)
Purpose – thorough analysis
Deliverable – functional system specifications
Database activity – conceptual data modeling

3
Business Rules

• Statements that define or constrain some
aspect of the business
• Assert business structure
• Control/ influence business behavior
• Expressed in terms familiar to end user
• Automated through DBMS software

4
A Good Business Rule is:

• Declarative: what, not how


• Precise: clear, agreed upon meaning
• Atomic: one statement
• Consistent: internally and externally
• Expressible: structured, natural language
• Distinct: Non-redundant
• Business-Oriented: understood by business people

5
A Good Data Name is:
• Related to business, not technical, characteristics
• Meaningful and self-documenting
• Unique
• Readable
• Composed of words from an approved list
• Repeatable

6
Data Definitions
• Explanation of a term or fact
• Term – word or phrase with specific meaning
• Fact – association between two or more terms
• Guidelines for good data definition
• Gathered in conjunction with systems requirements
• Accompanied by diagrams
• Iteratively created and refined
• Achieved by consensus

7
E-R Model Constructs
• Entity instance - person, place, object, event, concept
(often corresponds to a row in a table)
• Entity Type – collection of entities (often corresponds to
a table)
• Attribute - property or characteristic of an entity type
(often corresponds to a field in a table)
• Relationship instance – link between entities
(corresponds to primary key-foreign key equivalencies in
related tables)
• Relationship type – category of relationship…link
between entity types

8
Sample E-R Diagram

9
Sample E-R Diagram
(Alternate notation)

10
Basic E-R notation
• Entity symbols
• Attribute symbols
• Relationship symbols
• A special entity that is also a relationship (associative entity)
• Relationship degrees specify the number of entity types involved
• Relationship cardinalities specify how many of each entity type are allowed
11
Basic E-R notation (Alternate
notation in new editions)

• Entity symbols
• Attribute symbols
• Relationship symbols
• A special entity that is also a relationship (associative entity)
• Relationship degrees specify the number of entity types involved
• Relationship cardinalities specify how many of each entity type are allowed
12
Strong and Weak Entity

13
What Should an Entity Be?
• SHOULD BE:
• An object that will have many instances in the database
• An object that will be composed of multiple attributes
• An object that we are trying to model
• SHOULD NOT BE:
• A user of the database system
• An output of the database system (e.g. a report)

14
Inappropriate entities

System user System output

Appropriate entities

15
Attributes

• Attribute - property or characteristic of an entity type


• Classifications of attributes:
• Required versus Optional Attributes
• Simple versus Composite Attribute
• Single-Valued versus Multivalued Attribute
• Stored versus Derived Attributes
• Identifier Attributes

16
Identifiers (Keys)

• Identifier (Key) - An attribute (or combination of attributes) that
uniquely identifies individual instances of an entity type
• Simple Key versus Composite Key
• Candidate Key – an attribute that could be a key…satisfies the
requirements for being a key
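
The idea of simple, composite, and candidate keys can be explored with a small Python sketch that searches sample data for minimal attribute combinations with unique, non-null values. The flight records and attribute names are invented for illustration. Note the limitation: uniqueness in a sample only suggests a candidate key; whether an attribute really is an identifier is decided by the business rules, not by the data that happens to be stored.

```python
from itertools import combinations


def is_identifier(rows, attributes):
    """True if the attribute values are never null and never repeat across rows."""
    seen = set()
    for row in rows:
        key = tuple(row[attribute] for attribute in attributes)
        if any(value is None for value in key) or key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, attributes):
    """Minimal attribute combinations that uniquely identify every sample row."""
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Skip combinations that contain an already-found key: not minimal.
            if any(set(key) <= set(combo) for key in found):
                continue
            if is_identifier(rows, combo):
                found.append(combo)
    return found


# Hypothetical sample rows for a FLIGHT entity type.
flights = [
    {"flight_number": "PK301", "flight_date": "2024-01-01", "gate": "A1"},
    {"flight_number": "PK301", "flight_date": "2024-01-02", "gate": "B2"},
    {"flight_number": "PK305", "flight_date": "2024-01-01", "gate": "B2"},
    {"flight_number": "PK305", "flight_date": "2024-01-02", "gate": "B2"},
]

print(candidate_keys(flights, ["flight_number", "flight_date", "gate"]))
```

In this sample no single attribute is unique, so the only candidate is the composite key of flight number and date.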

17
Characteristics of Identifiers

• Will not change in value


• Will not be null
• No intelligent identifiers (e.g. containing locations or people that might
change)
• Substitute new, simple keys for long, composite keys

18
A composite attribute

An attribute
broken into
component parts

19
Simple key attribute

The key is underlined

20
Composite key attribute

The key is composed of two subparts

21
Entity with a multivalued attribute (Skill) and derived attribute
(Years_Employed)

What’s wrong with this?

Multivalued: an employee can have more than one skill
Derived: from date employed and current date

22
A composite attribute: an attribute broken into component parts

Entity with multivalued attribute (Skill)
and derived attribute (Years_Employed)

Multivalued: an employee can have more than one skill
Derived: from date employed and current date
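
A minimal Python sketch of the multivalued and derived attributes on this slide. The class layout is an assumption made for illustration: `skills` is held as a list because it is multivalued, and `years_employed` is computed from the stored date rather than stored itself. The "current date" is passed in as a parameter so the result is reproducible.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Employee:
    employee_id: int
    name: str
    date_employed: date                        # stored attribute
    skills: list = field(default_factory=list)  # multivalued attribute

    def years_employed(self, today: date) -> int:
        """Derived attribute: whole years between date_employed and today."""
        years = today.year - self.date_employed.year
        # Subtract one if this year's anniversary has not been reached yet.
        if (today.month, today.day) < (self.date_employed.month, self.date_employed.day):
            years -= 1
        return years


employee = Employee(1, "Employee A", date(2015, 6, 15), ["Welding", "CAD"])
print(employee.skills)
print(employee.years_employed(date(2020, 6, 14)))
print(employee.years_employed(date(2020, 6, 15)))
```

Storing only the date and deriving the years avoids a value that would become stale every year.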
23
An attribute that is both multivalued and composite

This is an
example of
time-stamping

24
More on Relationships
• Relationship Types vs. Relationship Instances
• The relationship type is modeled as the diamond and
lines (or only line in alternate notation) between entity
types…the instance is between specific entity instances
• Relationships can have attributes
• These describe features pertaining to the association between the
entities in the relationship
• Two entities can have more than one type of
relationship between them (multiple relationships)
• Associative Entity – combination of relationship
and entity

25
26
Relationship types and instances

a) Relationship type

b) Relationship
instances

27
Previously

• Entities
• Entity type
• Entity Instance
• Relationships
• Type
• Instance
• Types of entities
• Degree of relationships
• Cardinalities etc.
• Visio notations and Crow’s foot representation of ERD

28
Degree of Relationships

•Degree of a relationship is the
number of entity types that
participate in it
•Unary Relationship
•Binary Relationship
•Ternary Relationship

29
Degree of relationships

• Unary: one entity related to another of the same entity type
• Binary: entities of two different types related to each other
• Ternary: entities of three different types related to each other
30
Cardinality of Relationships

• One-to-One
• Each entity in the relationship will have exactly one
related entity
• One-to-Many
• An entity on one side of the relationship can have many
related entities, but an entity on the other side will have a
maximum of one related entity
• Many-to-Many
• Entities on both sides of the relationship can have many
related entities on the other side

31
Cardinality Constraints

• Cardinality Constraints - the number of instances of one entity that can or
must be associated with each instance of another entity
• Minimum Cardinality
• If zero, then optional
• If one or more, then mandatory
• Maximum Cardinality
• The maximum number
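
In a relational implementation, minimum cardinalities of zero and one often map to nullable and `NOT NULL` foreign keys. The sketch below uses Python's `sqlite3` module; the tables (`customer`, `sales_rep`, `customer_order`) and the helper `try_insert` are hypothetical names introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are not enforced by default
conn.executescript("""
CREATE TABLE customer  (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sales_rep (rep_id      INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    -- Mandatory one (minimum 1, maximum 1): every order needs a customer.
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    -- Optional one (minimum 0, maximum 1): an order may have no sales rep.
    rep_id      INTEGER REFERENCES sales_rep(rep_id)
);
INSERT INTO customer VALUES (1, 'Customer A');
""")


def try_insert(order_id, customer_id, rep_id):
    """Return True if the DBMS accepts the order, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer_order VALUES (?, ?, ?)",
                (order_id, customer_id, rep_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(100, 1, None))     # optional sales rep omitted
print(try_insert(101, None, None))  # mandatory customer missing
print(try_insert(102, 99, None))    # customer 99 does not exist
```

A specific maximum other than one or many (for example, at most 4) is not expressed by a foreign key alone and would need an additional rule such as a trigger or application check.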

32
33
34
Note: a relationship can have attributes of its own
35
Basic relationship with only maximum cardinalities showing

Mandatory minimum cardinalities

36
Optional cardinalities with unary degree, one-to-one relationship

37
A binary relationship with an attribute

Here, the date completed attribute pertains specifically to the
employee’s completion of a course…it is an attribute of the
relationship

38
Figure 3-12c -- A ternary relationship with attributes

39
Figure 3-13a – A unary relationship with an attribute.
This has a many-to-many relationship

Representing a bill-of -materials structure

40
Entities can be related to one another in more than one way

41
Here, the maximum
cardinality
constraint is 4

42
Multivalued attributes can be represented as relationships

43
Strong vs. Weak Entities, and
Identifying Relationships
• Strong entity
• exists independently of other types of entities
• has its own unique identifier
• represented with single-line rectangle
• Weak entity
• dependent on a strong entity…cannot exist on its own
• does not have a unique identifier
• represented with double-line rectangle
• Identifying relationship
• links strong entities to weak entities
• represented with double line diamond
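
One way a weak entity is commonly carried into tables is to build its primary key from the owning strong entity's key plus a partial identifier. The sketch below uses Python's `sqlite3` module; the EMPLOYEE/DEPENDENT pairing, the column names, and the `ON DELETE CASCADE` choice are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Strong entity: has its own unique identifier.
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Weak entity: dependent_name alone is not unique, so the identifier
-- borrows employee_id from the owning strong entity.
CREATE TABLE dependent (
    employee_id    INTEGER NOT NULL
                   REFERENCES employee(employee_id) ON DELETE CASCADE,
    dependent_name TEXT NOT NULL,
    PRIMARY KEY (employee_id, dependent_name)
);
INSERT INTO employee VALUES (1, 'Employee A'), (2, 'Employee B');
INSERT INTO dependent VALUES (1, 'Sam'), (2, 'Sam'), (2, 'Noor');
""")


def dependent_count():
    return conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]


count_before = dependent_count()
with conn:
    # Removing the strong entity removes the weak entities that depend on it.
    conn.execute("DELETE FROM employee WHERE employee_id = 2")
count_after = dependent_count()

print(count_before, count_after)
```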

44
Strong entity Identifying relationship Weak entity

45
Identifying relationship (Implicit)

Strong entity Weak entity

46
Associative Entities
• It’s an entity – it has attributes

• AND it’s a relationship – it links entities together


• When should a relationship with attributes instead be an
associative entity?
• All relationships for the associative entity should be many
• The associative entity could have meaning independent of the other
entities
• The associative entity preferably has a unique identifier, and should also
have other attributes
• The associative entity may participate in relationships with entities other than
those of the associated relationship
• Ternary relationships should be converted to associative entities
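
An associative entity typically becomes its own table with an identifier, its own attributes, and foreign keys to the entities it links. The sketch below models a CERTIFICATE between employees and courses using Python's `sqlite3` module; the column names, the `UNIQUE` rule, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE course   (course_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Associative entity: has its own identifier and attribute, and links
-- the two entity types on the "many" side of both relationships.
CREATE TABLE certificate (
    certificate_number INTEGER PRIMARY KEY,
    employee_id        INTEGER NOT NULL REFERENCES employee(employee_id),
    course_id          INTEGER NOT NULL REFERENCES course(course_id),
    date_completed     TEXT NOT NULL,
    UNIQUE (employee_id, course_id)  -- assumed rule: one certificate per pair
);
INSERT INTO employee VALUES (1, 'Employee A'), (2, 'Employee B');
INSERT INTO course VALUES (10, 'SQL Basics'), (20, 'Data Modeling');
INSERT INTO certificate VALUES
    (5001, 1, 10, '2024-03-01'),
    (5002, 1, 20, '2024-06-15'),
    (5003, 2, 10, '2024-03-01');
""")


def courses_completed_by(employee_id):
    rows = conn.execute(
        "SELECT c.title, cert.date_completed FROM certificate cert"
        " JOIN course c ON c.course_id = cert.course_id"
        " WHERE cert.employee_id = ? ORDER BY cert.date_completed",
        (employee_id,),
    )
    return rows.fetchall()


print(courses_completed_by(1))
```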

47
An associative entity (CERTIFICATE)

Associative entity involves a rectangle with a diamond inside.
Note that the many-to-many cardinality symbols face toward
the associative entity and not toward the other entities

48
49
Unary Many to many relation (BOM): An associative entity

50
An associative entity

This could just be a relationship with
attributes…it’s a judgment call

51
Ternary relationship as an associative entity

52
Ternary relationship as an associative entity

53
Modelling Time Dependent Data

• Database contents vary over time.
• The need to include a time series of data has become essential:
• Due to interest in traceability and reconstruction of a historical picture
• Due to regulatory requirements, such as HIPAA and Sarbanes-Oxley
• For example, the unit price for each product may change (cost of material, labor, etc.)

• Time stamp: a time value that is associated with a data value, often indicating when
some event occurred that affected the data value

• Example: Product prices over time

54
Modelling time dependent data

• The history of prices will be maintained as a separate entity, normally an
associative entity.
• History cannot be maintained in each and every relationship.
• Not every relationship can be an M:N relationship.

• If a history or a time series of values might ever be desired or required by
regulation, you should consider using an M:N relationship
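
Keeping a price history as its own time-stamped table might look like the following sketch, written with Python's `sqlite3` module. The table layout, the `price_as_of` helper, and the sample prices are all hypothetical. Dates are stored as ISO 8601 text so that text ordering matches date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT NOT NULL);

-- One row per price change; effective_date is the time stamp.
CREATE TABLE price_history (
    product_id     INTEGER NOT NULL REFERENCES product(product_id),
    effective_date TEXT NOT NULL,
    unit_price     REAL NOT NULL,
    PRIMARY KEY (product_id, effective_date)
);
INSERT INTO product VALUES (1, 'Desk');
INSERT INTO price_history VALUES
    (1, '2023-01-01', 300.0),
    (1, '2023-07-01', 325.0),
    (1, '2024-01-01', 350.0);
""")


def price_as_of(product_id, on_date):
    """Return the unit price in effect on on_date, or None if none was set yet."""
    row = conn.execute(
        "SELECT unit_price FROM price_history"
        " WHERE product_id = ? AND effective_date <= ?"
        " ORDER BY effective_date DESC LIMIT 1",
        (product_id, on_date),
    ).fetchone()
    return row[0] if row else None


print(price_as_of(1, "2023-08-15"))
print(price_as_of(1, "2022-12-31"))
```

Because old prices are never overwritten, an order placed in the past can still be matched to the price that applied at the time.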

55
E-R diagram for Pine
Valley Furniture

56
Microsoft Visio
Notation for Pine
Valley Furniture

Different modeling
software tools may have
different notation for the
same constructs

57
Conceptual Design
The Enhanced ER Model

1
Supertypes and Subtypes

◼ Subtype: A subgrouping of the entities in an entity type
which has attributes that are distinct from those in other
subgroupings
◼ Supertype: A generic entity type that has a relationship
with one or more subtypes
◼ Attribute Inheritance:
• Subtype entities inherit values of all attributes of the
supertype
• An instance of a subtype is also an instance of the supertype

2
3
MS Visio notations

Different modeling tools may have different notation for the same
modeling constructs

4
Supertype and subtype representation in Oracle

5
Employee supertype with three subtypes

All employee subtypes will have emp nbr, name,
address, and date-hired

Each employee subtype will also have its own
attributes
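
Attribute inheritance between a supertype and its subtypes resembles class inheritance in a programming language, which gives a convenient way to sketch it. The supertype attributes below follow the slide (emp nbr, name, address, date hired); the three subtype names and their extra attributes are assumptions chosen for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Employee:                # supertype: attributes shared by all employees
    emp_nbr: int
    name: str
    address: str
    date_hired: str


@dataclass
class HourlyEmployee(Employee):    # hypothetical subtype
    hourly_rate: float


@dataclass
class SalariedEmployee(Employee):  # hypothetical subtype
    annual_salary: float


@dataclass
class Consultant(Employee):        # hypothetical subtype
    billing_rate: float


def attribute_names(entity_type):
    """List inherited attributes first, then the subtype's own attributes."""
    return [f.name for f in fields(entity_type)]


consultant = Consultant(3, "Employee C", "5 Main Street", "2024-01-01", 90.0)
print(attribute_names(HourlyEmployee))
print(isinstance(consultant, Employee))  # a subtype instance is also a supertype instance
```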

6
*Advantages of Enhanced ERD
◼ It avoids the need to describe similar
concepts more than once, thus saving time
for the data modeller.
◼ It results in more readable and better-looking
E-R diagrams
◼ It adds more information to the design in a
concise form.

*Database Systems: Concepts, Design and Applications by S. K. Singh

7


Relationships and Subtypes
◼ Relationships at the supertype level
indicate that all subtypes will participate in
the relationship
◼ The instances of a subtype may
participate in a relationship unique to that
subtype. In this situation, the relationship
is shown at the subtype level

8
Supertype/subtype relationships in a hospital
Both outpatients and
resident patients are
cared for by a
responsible physician

Only resident patients are
assigned to a bed

9
When to use Supertypes/subtypes
Relationships.
◼ There are attributes that apply to some but
not all of the instances of an entity type.
E.g. Employee entity
◼ The instances of a subtype participate in a
relationship unique to that subtype.

10
Generalization and
Specialization
◼ Generalization: The process of defining
a more general entity type from a set of
more specialized entity types. BOTTOM-UP
◼ Specialization: The process of defining
one or more subtypes of the supertype,
and forming supertype/subtype
relationships. TOP-DOWN
11
Example of generalization
Three entity types: CAR, TRUCK, and MOTORCYCLE

All these types of vehicles
have common attributes

12
Generalization to VEHICLE supertype

So we put
the shared
attributes in
a supertype

Note: no subtype for motorcycle, since it has no unique attributes


13
Example of specialization
Entity type PART
Applies only to purchased parts

Only applies to
manufactured
parts

14
Specialization to MANUFACTURED PART and PURCHASED PART

Created 2 subtypes

Note: multivalued attribute was replaced by a relationship to another entity

15
Specialization to MANUFACTURED PART and PURCHASED PART

Note: Associative entity from supplies relationship to accommodate the unit price
attribute
16
Constraints in Supertype/Subtype Relationships:
Completeness Constraint

◼ Completeness Constraints: Whether an instance
of a supertype must also be a member of at least
one subtype
 Total Specialization Rule: Yes (double line)
 Partial Specialization Rule: No (single line)

17
– Examples of completeness constraints
Total specialization rule

A patient must be either


an outpatient or a
resident patient

18
– Partial specialization rule

A vehicle could be a car,


a truck, or neither

19
Constraints in Supertype/Subtype Relationships:
Disjointness Constraint

◼ Disjointness Constraints: Whether an
instance of a supertype may simultaneously be a
member of two (or more) subtypes
 Disjoint Rule: An instance of the supertype can be
only ONE of the subtypes
 Overlap Rule: An instance of the supertype could be
more than one of the subtypes

20
Examples of disjointness constraints
Disjoint rule

A patient can either be outpatient


or resident, but not both

21
Overlap rule

A part may be both


purchased and
manufactured

22
Constraints in Supertype/Subtype Relationships:
Subtype Discriminators
◼ Subtype Discriminator: An attribute of the
supertype whose values determine the target
subtype(s)
 Disjoint – a simple attribute with alternative values to
indicate the possible subtypes
 Overlapping – a composite attribute whose subparts
pertain to different subtypes. Each subpart contains a
boolean value to indicate whether or not the instance
belongs to the associated subtype

23
Introducing a subtype discriminator (disjoint rule)

A simple attribute with


different possible values
indicating the subtype

24
Subtype discriminator (overlap rule)

A composite attribute
with sub-attributes
indicating “yes” or “no”
to determine whether it
is of each subtype

25
26
EER based on Question 1

27
Defining supertype/subtype
hierarchies

◼ It is possible for the subtype to have other


subtypes defined on it.

◼ A supertype and its subtypes and their


subtypes, and so on, is called a type
hierarchy.

28
29
Entity Clusters
◼ EER diagrams are difficult to read when
there are too many entities and
relationships
◼ Solution: group entities and relationships
into entity clusters
◼ Entity cluster: set of one or more entity
types and associated relationships
grouped into a single abstract entity type
30
Related
groups of
entities could
become
clusters

31
EER diagram of PVF entity clusters

More readable,
isn’t it?

32
Exercise

33
End of Lecture

34
Logical Database Design
and Relational Data model

1
What is a relational data model?
 Represents data in the form of tables.
 First introduced by E.F. Codd of IBM in
1970.
 This model consists of three
components
i. Data structure (table)
ii. Data Manipulation (SQL)
iii. Data Integrity (Constraints)

2
Relation
 Definition: A relation is a named, two-dimensional table of
data
 Table consists of rows (records) and columns (attribute or
field)
 Requirements for a table to qualify as a relation:
 It must have a unique name
 Every attribute value must be atomic (not multivalued, not
composite)
 Every row must be unique (can’t have two rows with exactly the
same values for all their fields)
 Attributes (columns) in tables must have unique names
 The order of the columns must be irrelevant
 The order of the rows must be irrelevant
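The requirements above can be checked mechanically on a small in-memory table. The following is a minimal sketch, not part of the original slides: the function name is_relation and the sample data are hypothetical, and "atomic" is approximated as "not a Python collection". Row and column order need no check, because order carries no meaning in a relation.

```python
def is_relation(columns, rows):
    """Check a table (column names plus rows) against the relation requirements.

    Illustrative helper only: 'atomic' is approximated as 'not a Python
    collection', which is an assumption of this sketch.
    """
    # Attribute (column) names must be unique.
    if len(set(columns)) != len(columns):
        return False
    for row in rows:
        # Every row must supply exactly one value per column.
        if len(row) != len(columns):
            return False
        # Every value must be atomic (no multivalued or composite entries).
        if any(isinstance(value, (list, tuple, set, dict)) for value in row):
            return False
    # Every row must be unique.
    return len({tuple(row) for row in rows}) == len(rows)


# Hypothetical sample data
columns = ["Emp_ID", "Name", "Course_Title"]
good = [(100, "Employee A", "SPSS"), (100, "Employee A", "Surveys")]
multivalued = [(100, "Employee A", ["SPSS", "Surveys"])]
duplicated = [(140, "Employee B", "Tax Acc"), (140, "Employee B", "Tax Acc")]

print(is_relation(columns, good))
print(is_relation(columns, multivalued))
print(is_relation(columns, duplicated))
```

Only the first table qualifies: the second stores a multivalued attribute and the third repeats a row.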

3
Is the following a relation?

4
Alternate Representation

5
Is the following a relation?

6
Correspondence with E-R
Model
 Relations (tables) correspond with entity types
and with many-to-many relationship types
 Rows correspond with entity instances and with
many-to-many relationship instances
 Columns correspond with attributes

 NOTE: The word relation (in relational


database) is NOT the same as the word
relationship (in E-R model)
7
Some terms of a relation.
 Degree: The total number of columns or
attributes
 Cardinality: The total number of rows present in
a table at any one time is known as the
cardinality of the table.
 Snapshot or instance: The content of table at
any particular point in time
 Schema: Used to describe the structure of the
table

8
Schema description
 Can be described by:
 Short text statements:
NAME_OF_RELATION(List of Attributes)

 Graphical Representation

9
Key Fields
 Keys are special fields that serve two main purposes:
 Primary keys are unique identifiers of the relation in question.
Examples include employee numbers, social security numbers,
etc. This is how we can guarantee that all rows are unique
 Foreign keys are identifiers that enable a dependent relation
(on the many side of a relationship) to refer to its parent relation
(on the one side of the relationship)
 Keys can be simple (a single field) or composite (more
than one field)
 Keys usually are used as indexes to speed up the
response to user queries

10
Schema for four relations (Pine Valley Furniture Company)

Primary Key
Foreign Key
(implements 1:N relationship
between customer and order)

Combined, these are a composite


primary key (uniquely identifies the
order line)…individually they are
foreign keys (implement M:N
relationship between order and product)

11
Instance of relation schema (Pine Valley Furniture Company)

12
Integrity Constraints
 Domain Constraints
 For each column of a table there is a set of
possible values called its domain. (Similar to
a data type in a programming language).
 According to Domain constraint: All the
values that appear in a column of a relation
must be taken from the same domain.

13
Domain definitions enforce domain integrity constraints

14
Integrity Constraints
 Null is a value that may be assigned to an
attribute when no other value applies or
the applicable value is unknown.
 Entity Integrity
 Is designed to assure that every relation has
a primary key and
 No primary key attribute may be null. All
primary key fields MUST have data

15
Integrity Constraints
 Referential Integrity–rule states that any foreign key value (on
the relation of the many side) MUST match a primary key value
in the relation of the one side. (Or the foreign key can be null)
 Enables consistency between the rows of two tables.
 For example: Delete Rules
 Restrict–don’t allow delete of “parent” side if related rows exist in
“dependent” side
 Cascade–automatically delete “dependent” side rows that correspond
with the “parent” side row to be deleted
 Set-to-Null–set the foreign key in the dependent side to null if deleting
from the parent side (not allowed for weak entities)
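The three delete rules can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the table names, column names, and sample rows are hypothetical, and SQLite is used only because it ships with Python. Other DBMSs use the same ON DELETE clauses but may differ in detail.

```python
import sqlite3


def dependents_after_parent_delete(rule):
    """Delete a parent row under the given ON DELETE rule; return the order rows left."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    con.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    con.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " customer_id INTEGER REFERENCES customer (customer_id) ON DELETE " + rule + ")"
    )
    con.execute("INSERT INTO customer VALUES (1, 'Customer A')")
    con.execute("INSERT INTO orders VALUES (1001, 1)")
    try:
        con.execute("DELETE FROM customer WHERE customer_id = 1")
    except sqlite3.IntegrityError:
        print(rule, "-> delete of the parent row was rejected")
    rows = con.execute("SELECT order_id, customer_id FROM orders").fetchall()
    con.close()
    return rows


for rule in ("RESTRICT", "CASCADE", "SET NULL"):
    print(rule, dependents_after_parent_delete(rule))
```

Under RESTRICT the order row survives because the delete is refused; under CASCADE it disappears with its parent; under SET NULL it stays but its foreign key becomes null.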

16
Figure 5-5
Referential integrity constraints (Pine Valley Furniture)

Referential
integrity
constraints are
drawn via arrows
from dependent to
parent table

17
Figure 5-6 SQL table definitions

Referential
integrity
constraints are
implemented with
foreign key to
primary key
references

18
Logical Database Design
 What do we have at the end of the
database analysis phase?

19
Logical Design
 At the end of the analysis phase, the system and
database analysts have a clear
understanding of the data storage and
access requirements.
 However, the conceptual model from analysis
avoids any specific database technology.
 During logical design, the conceptual model is
transformed into a specific database
model.
20
Transforming EER Diagrams
into Relations
Mapping Regular Entities to Relations
1. Simple attributes: E-R attributes map directly
onto the relation
2. Composite attributes: Use only their simple,
component attributes
3. Multivalued Attribute–Becomes a separate
relation with a foreign key taken from the
superior entity

21
Figure 5-8 Mapping a regular entity

(a) CUSTOMER
entity type with
simple
attributes

(b) CUSTOMER relation

22
Figure 5-9 Mapping a composite attribute

(a) CUSTOMER
entity type with
composite
attribute

(b) CUSTOMER relation with address detail

23
Figure 5-10 Mapping an entity with a multivalued attribute
(a)

Multivalued attribute becomes a separate relation with foreign key


(b)

One–to–many relationship between original entity and new relation


24
Transforming EER Diagrams
into Relations (cont.)
Mapping Weak Entities
Becomes a separate relation with a
foreign key taken from the superior
entity
Primary key composed of:
 Partial identifier of weak entity
 Primary key of identifying relation (strong
entity)
25
Figure 5-11 Example of mapping a weak entity

a) Weak entity DEPENDENT

26
Figure 5-11 Example of mapping a weak entity (cont.)

b) Relations resulting from weak entity

NOTE: the domain constraint


for the foreign key should
NOT allow null value if
DEPENDENT is a weak
entity

Foreign key

Composite primary key

27
Transforming EER Diagrams
into Relations (cont.)
Mapping Binary Relationships
 One-to-Many–Primary key on the one side
becomes a foreign key on the many side
 Many-to-Many–Create a new relation with
the primary keys of the two entities as its
primary key
 One-to-One–Primary key on the mandatory
side becomes a foreign key on the optional
side
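The many-to-many rule can be sketched with sqlite3. The table and column names below are hypothetical, loosely following an employee/course "completes" relationship. The new relation's composite primary key blocks duplicate pairs, and each half of the key is a foreign key.

```python
import sqlite3


def build_db():
    """Create two entity relations and the relation that maps their M:N relationship."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys
    con.executescript("""
        CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, employee_name TEXT);
        CREATE TABLE course (course_id INTEGER PRIMARY KEY, course_title TEXT);
        CREATE TABLE completes (
            employee_id    INTEGER NOT NULL REFERENCES employee (employee_id),
            course_id      INTEGER NOT NULL REFERENCES course (course_id),
            date_completed TEXT,
            PRIMARY KEY (employee_id, course_id)  -- composite primary key
        );
        INSERT INTO employee VALUES (1, 'Employee A'), (2, 'Employee B');
        INSERT INTO course VALUES (10, 'Course X');
    """)
    return con


def try_insert(con, employee_id, course_id):
    """Return True if the (employee, course) pair is accepted, False if rejected."""
    try:
        con.execute(
            "INSERT INTO completes VALUES (?, ?, '2024-01-15')",
            (employee_id, course_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


demo = build_db()
print(try_insert(demo, 1, 10))  # new pair
print(try_insert(demo, 1, 10))  # same pair again: violates the composite primary key
print(try_insert(demo, 1, 99))  # course 99 does not exist: violates the foreign key
```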
28
Figure 5-12 Example of mapping a 1:M relationship
a) Relationship between customers and orders

Note the mandatory one

b) Mapping the relationship

Again, no null value in the


foreign key…this is because
of the mandatory minimum
cardinality
Foreign key

29
Figure 5-13 Example of mapping an M:N relationship
a) Completes relationship (M:N)

The Completes relationship will need to become a separate relation

30
Figure 5-13 Example of mapping an M:N relationship (cont.)
b) Three resulting relations

Composite primary key

Foreign key New


Foreign key
intersection
relation

31
Figure 5-14 Example of mapping a binary 1:1 relationship
a) In_charge relationship (1:1)

Often in 1:1 relationships, one direction is optional.

32
Figure 5-14 Example of mapping a binary 1:1 relationship (cont.)
b) Resulting relations

Foreign key goes in the relation on the optional side,


Matching the primary key on the mandatory side

33
Transforming EER Diagrams
into Relations (cont.)
Mapping Associative Entities
Identifier Not Assigned
 Default primary key for the association
relation is composed of the primary keys of
the two entities (as in M:N relationship)
Identifier Assigned
 Used when the identifier is natural and familiar to end-users
 Used when the default identifier may not be unique

34
Figure 5-15 Example of mapping an associative entity
a) An associative entity

35
Figure 5-15 Example of mapping an associative entity (cont.)
b) Three resulting relations

Composite primary key formed from the two foreign keys

36
Figure 5-16 Example of mapping an associative entity with
an identifier
a) SHIPMENT associative entity

37
Figure 5-16 Example of mapping an associative entity with
an identifier (cont.)
b) Three resulting relations

Primary key differs from foreign keys

38
Transforming EER Diagrams
into Relations (cont.)
Mapping Unary Relationships
 One-to-Many–Recursive foreign key in the
same relation
 Many-to-Many–Two relations:
 One for the entity type

 One for an associative relation in which the


primary key has two attributes, both taken
from the primary key of the entity

39
Figure 5-17 Mapping a unary 1:N relationship

(a) EMPLOYEE entity with


unary relationship

(b) EMPLOYEE
relation with
recursive foreign
key

40
Transforming EER Diagrams
into Relations (cont.)

Mapping Unary Relationships

Many-to-Many–Two relations:
• One for the entity type
• One for an associative relation in which
the primary key has two attributes, both
taken from the primary key of the entity

41
Mapping a unary M:N relationship

(b) ITEM and


COMPONENT
relations

42
Transforming EER Diagrams
into Relations (cont.)
Mapping Ternary (and n-ary)
Relationships
One relation for each entity and
one for the associative entity
Associative entity has foreign keys
to each entity in the relationship

43
Mapping a ternary relationship

a) PATIENT TREATMENT Ternary relationship with


associative entity

44
Figure 5-19 Mapping a ternary relationship (cont.)

b) Mapping the ternary relationship PATIENT TREATMENT

Remember that the primary key MUST be unique.
This is why treatment date and time are included in the composite primary key.
But this makes a very cumbersome key…
It would be better to create a surrogate key like Treatment#.
45
Transforming EER Diagrams
into Relations (cont.)
Mapping Supertype/Subtype Relationships
 One relation for supertype and for each subtype

 Supertype attributes (including identifier and


subtype discriminator) go into supertype relation

 Subtype attributes go into each subtype;


primary key of supertype relation also becomes
primary key of subtype relation
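A possible rendering of this mapping in sqlite3 is sketched below. All names are hypothetical, including the one-letter discriminator values 'H' and 'S'. The supertype relation holds the shared attributes and the discriminator; each subtype relation reuses the supertype's primary key as its own primary key and as a foreign key.

```python
import sqlite3


def build_employee_db():
    """Supertype EMPLOYEE with two subtype relations sharing its primary key."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript("""
        CREATE TABLE employee (
            employee_number INTEGER PRIMARY KEY,
            employee_name   TEXT NOT NULL,
            employee_type   TEXT NOT NULL CHECK (employee_type IN ('H', 'S'))
        );
        CREATE TABLE hourly_employee (
            employee_number INTEGER PRIMARY KEY REFERENCES employee (employee_number),
            hourly_rate     REAL
        );
        CREATE TABLE salaried_employee (
            employee_number INTEGER PRIMARY KEY REFERENCES employee (employee_number),
            annual_salary   REAL
        );
        INSERT INTO employee VALUES (1, 'Employee A', 'H'), (2, 'Employee B', 'S');
        INSERT INTO hourly_employee VALUES (1, 25.0);
        INSERT INTO salaried_employee VALUES (2, 60000.0);
    """)
    return con


def hourly_employees(con):
    """Join supertype and subtype rows to see inherited plus subtype attributes."""
    return con.execute(
        "SELECT e.employee_number, e.employee_name, h.hourly_rate "
        "FROM employee AS e "
        "JOIN hourly_employee AS h ON e.employee_number = h.employee_number "
        "ORDER BY e.employee_number"
    ).fetchall()


print(hourly_employees(build_employee_db()))
```

The join recovers attribute inheritance: an hourly employee's name comes from the supertype relation, the rate from the subtype relation.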

46
Figure 5-20 Supertype/subtype relationships

47
Figure 5-21
Mapping Supertype/subtype relationships to relations

48
Well-Structured Relations
 A relation that contains minimal data redundancy and
allows users to insert, delete, and update rows
without causing data inconsistencies
 Goal is to avoid anomalies
 Insertion Anomaly–adding new rows forces user to create
extra data
 Deletion Anomaly–deleting rows may cause a loss of data
that would be needed for other future rows
 Modification Anomaly–changing data in a row forces
changes to other rows because of duplication

49
Example–Figure 5-2b

Question–Is this a relation?

Question–What’s the primary key?

50
Anomalies in this Table

 Insertion–can’t enter a new employee without


having the employee take a class
 Deletion–if we remove employee 140, we lose
information about the existence of a Tax Acc class
 Modification–giving a salary increase to employee
100 forces us to update multiple records

51
52
The End

53
Logical Database
Design

Functional Dependencies and


Normalisation

1
Contents
 Normalization
 1st Normal Form
 Functional Dependencies
 Identifying Functional Dependencies
 Transitive Dependencies
 2nd Normal Form
 3rd Normal Form
 Merging Relations
 Enterprise Keys

2
CLO(s) & Taxonomy Level
 The contents target CLO 3
 Taxonomy Level is 3

3
Data Normalization
 Primarily a tool to validate and improve
a logical design so that it satisfies
certain constraints that avoid
unnecessary duplication of data
 The process of decomposing relations
with anomalies to produce smaller,
well-structured relations

4
Well-Structured Relations
 A relation that contains minimal data redundancy and
allows users to insert, delete, and update rows
without causing data inconsistencies
 Goal is to avoid anomalies
 Insertion Anomaly–adding new rows forces user to create
duplicate data
 Deletion Anomaly–deleting rows may cause a loss of data
that would be needed for other future rows
 Modification Anomaly–changing data in a row forces
changes to other rows because of duplication

General rule of thumb: A table should not pertain to


more than one entity type
5
Example–Figure 5-2b

Question–Is this a relation? Answer–Yes: Unique rows and no


multivalued attributes

Question–What’s the primary key? Answer–Composite: Emp_ID, Course_Title

6
Anomalies in this Table
 Insertion–can’t enter a new employee without
having the employee take a class
 Deletion–if we remove employee 140, we lose
information about the existence of a Tax Acc class
 Modification–giving a salary increase to employee
100 forces us to update multiple records

Why do these anomalies exist?


Because there are two themes (entity types) in this
one relation. This results in data duplication and an
unnecessary dependency between the entities
7
Steps in normalization

8
First Normal Form
 No multivalued attributes

 Every attribute value is atomic

9
Table with multivalued attributes, not in 1st normal form

Is this a relation?

10
Table with no multivalued attributes and unique rows, in 1st
normal form

Note: this is a relation, but not a well-structured one

11
Anomalies in this Table
 Insertion–if new product is ordered for order 1007
of existing customer, customer data must be re-
entered, causing duplication
 Deletion–if we delete the Dining Table from Order
1006, we lose information concerning this item's
finish and price
 Update–changing the price of product ID 4 requires
update in several records

Why do these anomalies exist?


Because there are multiple themes (entity types) in
one relation. This results in duplication and an
unnecessary dependency between the entities
12
Second Normal Form
 1NF PLUS every non-key attribute is
fully functionally dependent on the
ENTIRE primary key
 Every non-key attribute must be defined by
the entire key, not by only part of the key
 No partial functional dependencies

13
Functional Dependencies and Keys
 Functional Dependency: The value of
one attribute (the determinant)
determines the value of another
attribute

14
Functional Dependencies (FD) -
Definition
 Let R be a relation scheme and A, B be sets of attributes in R.
 A functional dependency from A to B exists if and only if:
 For every instance |R| of R, if two tuples in |R| agree on the
values of the attributes in A, then they agree on the values of the
attributes in B
 We write A → B and say that A determines B
 A is called the ‘determinant’ and B the ‘dependent’
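The definition translates directly into a check on one instance. The sketch below is illustrative: the function name fd_holds and the three sample tuples are hypothetical. A single instance can only refute a functional dependency; it cannot prove that the dependency holds for every instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this instance (a list of dicts)."""
    seen = {}
    for row in rows:
        a = tuple(row[attr] for attr in lhs)
        b = tuple(row[attr] for attr in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(a, b) != b:
            return False
    return True


# Hypothetical instance of a Staff relation
staff = [
    {"staffNo": "S1", "sName": "Ann", "branchNo": "B1"},
    {"staffNo": "S2", "sName": "Bob", "branchNo": "B1"},
    {"staffNo": "S3", "sName": "Ann", "branchNo": "B2"},
]

print(fd_holds(staff, ["staffNo"], ["sName"]))
print(fd_holds(staff, ["sName"], ["staffNo"]))
print(fd_holds(staff, ["staffNo"], ["sName", "branchNo"]))
```

Here staffNo determines sName, but sName does not determine staffNo, because two staff members share the name Ann.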

15
Example

16
Example (cont…)

17
Example Functional Dependency
that holds for all Time
 Consider the values shown in staffNo and
sName attributes of the Staff relation.

 Based on sample data, the following


functional dependencies appear to hold.

staffNo → sName
sName → staffNo

© Pearson Education Limited 1995, 2005 18


Example Functional Dependency
that holds for all Time
 However, the only functional dependency that
remains true for all possible values for the staffNo
and sName attributes of the Staff relation is:
staffNo → sName
 One approach to identifying the set of all possible
values of attributes in a relation is to more clearly
understand the purpose of each attribute. E.g., the
purpose of ‘staffNo’ attribute is to uniquely
identify each member of staff.



Characteristics of Functional
Dependencies
 Determinants should have the
minimal number of attributes
necessary to maintain the
functional dependency with the
attribute(s) on the right hand-side.

 This requirement is called full


functional dependency.
Characteristics of Functional
Dependencies
 Full functional dependency
indicates that if A and B are
attributes of a relation, B is fully
functionally dependent on A, if B is
functionally dependent on A, but
not on any proper subset of A.



Example Full Functional
Dependency
 Exists in the Staff relation

staffNo, sName → branchNo

 True - each value of (staffNo, sName) is associated


with a single value of branchNo.

 However, branchNo is also functionally dependent on


a subset of (staffNo, sName), namely staffNo. Example
above is a partial dependency.



Characteristics of Functional
Dependencies
 Main characteristics of functional dependencies
used in normalization:
 There is a one-to-one relationship between the
attribute(s) on the left-hand side (determinant)
and those on the right-hand side of a functional
dependency.
 Holds for all time.
 The determinant has the minimal number of
attributes necessary to maintain the dependency
with the attribute(s) on the right hand-side.



Transitive Dependencies

 Important to recognize a transitive


dependency because its existence in a
relation can potentially cause update
anomalies.

 Transitive dependency describes a condition


where A, B, and C are attributes of a relation
such that if A → B and B → C, then C is
transitively dependent on A via B (provided
that A is not functionally dependent on B or
C).
Example Transitive Dependency

 Consider functional dependencies in the


StaffBranch relation.

staffNo → sName, position, salary,


branchNo, bAddress
branchNo → bAddress

 Transitive dependency, branchNo →


bAddress exists on staffNo via branchNo.



Trivial FDs
 A functional dependency X → Y is trivial if Y is a subset of X
 {name, supervisor_id} → {name}
 If two records have the same values on both the name and
supervisor_id attributes, then they obviously have the
same name.
 Trivial dependencies hold for all relation instances

 A functional dependency X → Y is (completely) non-trivial if Y ∩ X = ∅
 {supervisor_id} → {specialization}
 Non-trivial FDs are given implicitly in the form of
constraints when designing a database.
 For instance, the specialization of a student must be the
same as that of the supervisor.
 They constrain the set of legal relation instances. For
instance, if I try to insert two students under the same
supervisor with different specializations, the insertion will
be rejected by the DBMS
26


Identifying Functional
Dependencies
 Identifying all functional dependencies
between a set of attributes is relatively
simple if the meaning of each attribute and
the relationships between the attributes are
well understood.

 This information should be provided by the


enterprise in the form of discussions with
users and/or documentation such as the
users’ requirements specification.
Identifying Functional
Dependencies
 However, if the users are unavailable
for consultation and/or the
documentation is incomplete then
depending on the database application
it may be necessary for the database
designer to use their common sense
and/or experience to provide the
missing information.
Example - Identifying a set of
functional dependencies for
the StaffBranch relation
 Examine semantics of attributes in
StaffBranch relation. Assume that
position held and branch determine a
member of staff’s salary.



Example - Identifying a set of
functional dependencies for the
StaffBranch relation
 With sufficient information available, identify the
functional dependencies for the StaffBranch relation
as:
staffNo → sName, position, salary, branchNo,
bAddress
branchNo → bAddress
bAddress → branchNo
branchNo, position → salary
bAddress, position → salary

32
Identifying the Primary Key for a
Relation using Functional
Dependencies
 Main purpose of identifying a set of
functional dependencies for a relation is to
specify the set of integrity constraints that
must hold on a relation.

 An important integrity constraint to consider


first is the identification of candidate keys,
one of which is selected to be the primary
key for the relation.
Example - Identify Primary
Key for StaffBranch Relation
 StaffBranch relation has five functional
dependencies.
staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress
bAddress → branchNo
branchNo, position → salary
bAddress, position → salary
 The determinants are staffNo, branchNo, bAddress,
(branchNo, position), and (bAddress, position).
 Identify all candidate key(s), identify the attribute (or
group of attributes) that uniquely identifies each
tuple in this relation.
Example - Identifying Primary
Key for StaffBranch Relation
 All attributes that are not part of a candidate
key should be functionally dependent on the
key.

 The only candidate key and therefore primary


key for StaffBranch relation, is staffNo, as all
other attributes of the relation are
functionally dependent on staffNo.



Reasoning about Functional
Dependencies
 It is sometimes possible to infer new functional
dependencies from a set of given functional
dependencies
 independently from any particular instance of the
relation scheme or of any additional knowledge
 Example:
 From
 {sid} → {first_name} and
 {sid} → {last_name}
 We can infer
 {sid} → {first_name, last_name}
36
Armstrong’s Axioms
 Let X, Y, Z be subsets of the relation scheme of a relation
R
 Reflexivity:
If Y ⊆ X, then X → Y (trivial FDs)
 {staffNo, branchNo} → {branchNo}
 Augmentation:
If X → Y, then XZ → YZ
 if {staffNo} → {sName},
 then {staffNo, branchNo} → {sName, branchNo}
 Transitivity:
If X → Y and Y → Z, then X → Z
 if {staffNo} → {branchNo} and {branchNo} →
{branchAddress}, then {staffNo} → {branchAddress}
37
Additional Rules based on Armstrong’s
axioms
 Armstrong’s axioms can be used to produce additional rules that are
useful:

 Weak Augmentation rule: Let X, Y, Z be subsets of the relation scheme of R
If X → Y, then XZ → Y
Other useful rules:
 If X → Y and X → Z, then X → YZ (Addition or union)
e.g. staffNo → branchNo and staffNo → branchAddress
then staffNo → branchNo, branchAddress
 If X → YZ, then X → Y and X → Z (decomposition)
e.g. staffNo → branchNo, branchAddress then
staffNo → branchNo and staffNo → branchAddress
 If X → Y and ZY → W, then ZX → W (pseudotransitivity)
e.g. staffNo → branchNo and branchNo, position → salary
then position, staffNo → salary
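In practice these rules are usually applied through the attribute closure: every attribute that a given set determines under a set of FDs. The sketch below is a standard closure computation, not something taken from the slides; the function names are hypothetical, and the FDs are the five listed earlier for StaffBranch. The candidate-key search is brute force and suits only small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the determinant, we also get its dependents.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


staff_branch = {"staffNo", "sName", "position", "salary", "branchNo", "bAddress"}
fds = [
    ({"staffNo"}, {"sName", "position", "salary", "branchNo", "bAddress"}),
    ({"branchNo"}, {"bAddress"}),
    ({"bAddress"}, {"branchNo"}),
    ({"branchNo", "position"}, {"salary"}),
    ({"bAddress", "position"}, {"salary"}),
]

print(sorted(closure({"branchNo", "position"}, fds)))
print(implies(fds, {"staffNo", "position"}, {"salary"}))
print(candidate_keys(staff_branch, fds))
```

The search returns staffNo as the only candidate key, which agrees with the primary key identified for StaffBranch.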

38
Example 1

39
Example 2

40
Back to Normalisation
Second Normal Form
 1NF PLUS every non-key attribute is
fully functionally dependent on the
ENTIRE primary key
 Every non-key attribute must be defined by
the entire key, not by only part of the key
 No partial functional dependencies

41
Table with no multivalued attributes and unique rows, in 1st
normal form

Note: this is a relation, but not a well-structured one

42
Figure 5-27 Functional dependency diagram for INVOICE

Order_ID → Order_Date, Customer_ID, Customer_Name, Customer_Address
Customer_ID → Customer_Name, Customer_Address
Product_ID → Product_Description, Product_Finish, Unit_Price
Order_ID, Product_ID → Order_Quantity

The primary key is (Order_ID, Product_ID), but the order and product attributes depend on only part of it.
Therefore, NOT in 2nd Normal Form


43
Figure 5-28 Removing partial dependencies

Getting it into
Second Normal
Form

Partial dependencies are removed, but there


are still transitive dependencies

44
Third Normal Form
 2NF PLUS no transitive dependencies
(functional dependencies on non-primary-key
attributes)
 Note: This is called transitive, because the
primary key is a determinant for another
attribute, which in turn is a determinant for a
third
 Solution: Non-key determinant with transitive
dependencies go into a new table; non-key
determinant becomes primary key in the new
table and stays as foreign key in the old table
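The 2NF and 3NF tests can be sketched as a classification of determinants against the primary key. This is a simplified illustration: the function name is hypothetical, it assumes a single known primary key, and it treats any determinant made up only of non-key attributes as a sign of a transitive dependency. The FDs are those listed for INVOICE.

```python
def classify_dependencies(key, fds):
    """Split FDs into partial (2NF violations) and transitive (3NF violations).

    Simplified sketch: assumes one known primary key.
    """
    key = frozenset(key)
    partial, transitive = [], []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        if not (set(rhs) - key):
            continue  # only key attributes on the right-hand side
        if lhs < key:
            partial.append(sorted(lhs))      # determinant is part of the key
        elif not (lhs & key):
            transitive.append(sorted(lhs))   # determinant is a non-key attribute set
    return partial, transitive


invoice_key = {"Order_ID", "Product_ID"}
invoice_fds = [
    ({"Order_ID"}, {"Order_Date", "Customer_ID", "Customer_Name", "Customer_Address"}),
    ({"Customer_ID"}, {"Customer_Name", "Customer_Address"}),
    ({"Product_ID"}, {"Product_Description", "Product_Finish", "Unit_Price"}),
    ({"Order_ID", "Product_ID"}, {"Order_Quantity"}),
]

partial, transitive = classify_dependencies(invoice_key, invoice_fds)
print("partial:", partial)
print("transitive:", transitive)
```

Each determinant reported here becomes the primary key of its own relation during decomposition: Order_ID and Product_ID when moving to 2NF, then Customer_ID when moving to 3NF.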

45
Figure 5-29 Removing transitive dependencies

Getting it into
Third Normal
Form

Transitive dependencies are removed

46
Merging Relations
 View Integration–Combining entities from
multiple ER models into common relations
 Issues to watch out for when merging entities
from different ER models:
 Synonyms–two or more attributes with different
names but same meaning (e.g., Student_ID and
Matriculation_No)
 Homonyms–attributes with same name but different
meanings (E.g., Account_No)
 Transitive dependencies–even if relations are in 3NF
prior to merging, they may not be after merging (e.g.,
Student(ID, Major) and Student(ID, Advisor))
 Supertype/subtype relationships–may be hidden prior
to merging, e.g., Pat1(ID, Name, Address,
Date_treated) and Pat2(ID, Name, Address, Room_No).
Enterprise Keys

 Primary keys that are unique in the


whole database, not just within a
single relation
 Corresponds with the concept of an
object ID in object-oriented systems

48
Figure 5-31 Enterprise keys

a) Relations with
enterprise key

b) Sample data with


enterprise key

49
Example

50
Introduction to
SQL

1
Previously
▪Logical database design
▪ Anomalies

▪Normalization
▪ Normalization forms
▪ First Normal Form
▪ Functional Dependencies
▪ Second Normal Form
▪ Third Normal Form

2
Contents
▪ History and role of SQL in database development.
▪ Database definition using the SQL data definition language.
▪ Writing single-table queries using SQL commands.
▪ Establishing referential integrity using SQL.
▪ Discussion on SQL:1999 and SQL:200n standards.

3
Target CLO & Taxonomy Level
This content targets CLO 4 at Taxonomy Level 3

4
SQL Overview
Structured Query Language

The standard for relational database management


systems (RDBMS)

RDBMS: A database management system that


manages data as a collection of tables in which all
relationships are represented by common values in
related tables

5
History of SQL
1970–E. Codd develops relational database concept
1974-1979–System R with Sequel (later SQL) created at
IBM Research Lab
1979–Oracle markets first relational DB with SQL
1986–ANSI SQL standard released
1989, 1992, 1999, 2003–Major ANSI standard updates
(2006 and 2008)
Current–SQL is supported by most major database
vendors

For further details, see the section on the origins of SQL in Chapter 6 of the textbook.

6
Purpose of SQL Standard
Specify syntax/semantics for data definition and
manipulation
Define data structures
Enable portability
Allow for later growth/enhancement to standard
Many ‘extensions’ to standards by vendors.

7
Benefits of a Standardized
Relational Language
Reduced training costs
Productivity
Application portability
Application longevity
Reduced dependence on a single vendor
Cross-system communication

8
SQL Environment
Catalog
◦ A set of schemas that constitute the description of a database
◦ Two catalogs are maintained
◦ Dev_C (development catalog)
◦ Prod_C (Production catalog)

Schema
◦ The structure that contains descriptions of objects created by a user
(base tables, views, constraints) (More like database)

9
Figure 7-1
A simplified schematic of a typical SQL environment, as
described by the SQL-2003 standard

10
Figure 7-1
A simplified schematic of a typical SQL environment, as
described by the SQL-2008 standard

11
Types of SQL Statements
Can be divided into three types
◦ Data Definition Language (DDL)
◦ create, modify, or delete database objects such as tables, views, schemas,
domains, triggers, and stored procedures.
◦ The SQL keywords most often associated with DDL statements are CREATE,
ALTER, and DROP
◦ Data Manipulation Language (DML)
◦ Commands that maintain and query a database, i.e., statements used to
retrieve, add, modify, or delete data stored in your database objects. The
primary keywords associated with DML statements are SELECT, INSERT, UPDATE,
and DELETE.
◦ Data Control Language (DCL)
◦ Commands that control a database, including administering privileges
and committing data. GRANT and REVOKE are the primary DCL
commands.

12
Some SQL Data types

13
SQL Database Definition
Data Definition Language (DDL)
Major CREATE statements:
◦ CREATE SCHEMA–defines the schema structure.
◦ CREATE TABLE–defines a table and its columns
◦ CREATE VIEW–defines a logical table from one or more
tables or views
Other CREATE statements: CHARACTER SET(for
globalization), COLLATION, TRANSLATION,
ASSERTION, DOMAIN

14
Table Creation
Figure 7-5 General syntax for CREATE TABLE
Steps in table creation:
1. Identify data types for attributes
2. Identify columns that can and cannot be null
3. Identify columns that must be unique (candidate keys)
4. Identify primary key–foreign key mates
5. Determine default values
6. Identify constraints on columns (domain specifications)
7. Create the table and associated indexes
15
The following slides create tables for this
enterprise data model

16
Figure 7-6 SQL database definition commands for Pine Valley Furniture

Overall table
definitions

17
Defining attributes and their data types

18
Non-nullable specification

Identifying primary key

Primary keys can never have NULL values

19
Non-nullable specifications

Primary key

Some primary keys are composite–composed of multiple attributes

20
Controlling the values in attributes

Default value

Domain constraint

21
Identifying foreign keys and establishing relationships

Primary key of
parent table

Foreign key of
dependent table

22
Data Integrity Controls
Referential integrity–constraint that ensures that foreign key values of a
table must match primary key values of a related table in 1:M
relationships
Restricting:
◦ Deletes of primary records
◦ Updates of primary records
◦ Inserts of dependent records
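
The restrictions above can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than the MySQL environment of the lectures, and the two cut-down tables and their rows are assumptions made up for illustration. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default

# Parent table (primary key) and dependent table (foreign key); simplified columns
conn.execute("CREATE TABLE CUSTOMER_T (CUSTOMER_ID INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT NOT NULL)")
conn.execute("""CREATE TABLE ORDER_T (
    ORDER_ID INTEGER PRIMARY KEY,
    CUSTOMER_ID INTEGER NOT NULL REFERENCES CUSTOMER_T (CUSTOMER_ID))""")
conn.execute("INSERT INTO CUSTOMER_T VALUES (1, 'Contemporary Casuals')")
conn.execute("INSERT INTO ORDER_T VALUES (1001, 1)")


def is_rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Insert of a dependent record with no matching parent row
orphan_insert_rejected = is_rejected("INSERT INTO ORDER_T VALUES (1002, 99)")
# Delete of a primary record that still has dependent rows
parent_delete_rejected = is_rejected("DELETE FROM CUSTOMER_T WHERE CUSTOMER_ID = 1")
print(orphan_insert_rejected, parent_delete_rejected)
```

Both statements are refused here because the default foreign-key action in SQLite is to reject the change; other DBMSs offer the same behaviour under names such as RESTRICT or NO ACTION.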

23
Figure 7-7 Ensuring data integrity through updates

Relational integrity is enforced via the
primary-key to foreign-key match

24
Changing and Removing
Tables
ALTER TABLE statement allows you to rename tables and
change column specifications:
Syntax: ALTER TABLE table_name
RENAME TO new_table_name;
Syntax: ALTER TABLE table_name
ADD column_name column_definition;
◦ ALTER TABLE CUSTOMER_T ADD (TYPE VARCHAR(2))

Syntax: ALTER TABLE table_name
ADD (column_1 column_definition,
column_2 column_definition, ... column_n column_definition);

25
Changing and Removing
Tables
Modify Columns
Syntax: ALTER TABLE table_name
MODIFY column_name column_type;
Dropping Columns in a table.
ALTER TABLE table_name
DROP COLUMN column_name;
DROP TABLE statement allows you to remove tables
from your schema:
◦ DROP TABLE CUSTOMER_T

26
Schema Definition
Control processing/storage efficiency:
◦ Choice of indexes
◦ File organizations for base tables
◦ File organizations for indexes
◦ Data clustering
◦ Statistics maintenance
Creating indexes
◦ Speed up random/sequential access to base table data
◦ Example
◦ CREATE INDEX NAME_IDX ON CUSTOMER_T(CUSTOMER_NAME)
◦ This makes an index for the CUSTOMER_NAME field of the CUSTOMER_T
table

27
Insert Statement
Adds data to a table
Inserting into a table
◦ INSERT INTO CUSTOMER_T VALUES (001, 'Contemporary Casuals',
'1355 S. Himes Blvd.', 'Gainesville', 'FL', 32601);
Inserting a record that has some null attributes requires
identifying the fields that actually get data
◦ INSERT INTO PRODUCT_T (PRODUCT_ID, PRODUCT_DESCRIPTION, PRODUCT_FINISH,
STANDARD_PRICE, PRODUCT_ON_HAND) VALUES (1, 'End Table', 'Cherry', 175, 8);

Inserting from another table
◦ INSERT INTO CA_CUSTOMER_T SELECT * FROM CUSTOMER_T WHERE STATE = 'CA';

28
Delete Statement
Removes rows from a table
Delete certain rows
◦ DELETE FROM CUSTOMER_T WHERE STATE = 'HI';

Delete all rows
◦ DELETE FROM CUSTOMER_T;

29
Update Statement
Modifies data in existing rows

UPDATE PRODUCT_T SET STANDARD_PRICE = 775 WHERE PRODUCT_ID = 7;

30
SELECT Statement
Used for queries on single or multiple tables
Clauses of the SELECT statement:
◦ SELECT
◦ List the columns (and expressions) that should be returned from the query
◦ FROM
◦ Indicate the table(s) or view(s) from which data will be obtained
◦ WHERE
◦ Indicate the conditions under which a row will be included in the result
◦ GROUP BY
◦ Indicate categorization of results
◦ HAVING
◦ Indicate the conditions under which a category (group) will be included
◦ ORDER BY
◦ Sorts the result according to specified criteria

31
SELECT Example
Find products with standard price less than $275

SELECT PRODUCT_NAME, STANDARD_PRICE
FROM PRODUCT_V
WHERE STANDARD_PRICE < 275;

Table 7-3: Comparison Operators in SQL

32
SELECT Example Using Alias
Alias is an alternative column or table name

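SELECT CUST.CUSTOMER_NAME AS NAME,
CUST.CUSTOMER_ADDRESS
FROM CUSTOMER_V CUST
WHERE CUST.CUSTOMER_NAME = 'Home Furnishings';

Note: a column alias generally cannot be referenced in the WHERE clause,
so the condition uses the underlying column name.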

33
SELECT Example
Using a Function
Functions implemented by most DBMSs
◦ COUNT Returns the number of rows
◦ SUM Returns the total of the values in a column from a group of
rows
◦ AVG Returns the average of the values in a column from a group of
rows
◦ MIN Returns the minimum value in a column from among a group
of rows
◦ MAX Returns the maximum value in a column from among a group
of rows

34
Using the aggregate functions.
SELECT COUNT(*) FROM ORDER_LINE_V
WHERE ORDER_ID = 1004;
SELECT SUM(population) from city
where city.district = 'Zuid-Holland';
SELECT AVG(population) from city
where city.district = 'Zuid-Holland';
SELECT MIN(population) from city;
SELECT MAX(population) from city;

35
SELECT Example–Boolean Operators
AND, OR, and NOT Operators for customizing conditions in
WHERE clause

SELECT PRODUCT_DESCRIPTION, PRODUCT_FINISH,
STANDARD_PRICE
FROM PRODUCT_V
WHERE (PRODUCT_DESCRIPTION LIKE '%Desk'
OR PRODUCT_DESCRIPTION LIKE '%Table')
AND STANDARD_PRICE > 300;

Note: the LIKE operator allows you to compare strings using wildcards.
For example, the % wildcard in '%Desk' matches any string that ends in
"Desk", with any number of characters (including none) before it.

36
SELECT Example –
sorting results with the ORDER BY clause
and selecting a list of values with IN clause.

Sort the results first by STATE, and within a state by CUSTOMER_NAME

SELECT CUSTOMER_NAME, CITY, STATE
FROM CUSTOMER_V
WHERE STATE IN ('FL', 'TX', 'CA', 'HI')
ORDER BY STATE, CUSTOMER_NAME;

Note: the IN operator in this example allows you to include rows whose
STATE value is either FL, TX, CA, or HI. It is more concise than separate
OR conditions, and many DBMSs process it at least as efficiently.

37
Result

38
SELECT Example–
Categorizing Results Using the GROUP BY Clause
For use with aggregate functions
◦ Scalar aggregate: single value returned from SQL query with aggregate
function
◦ Vector aggregate: multiple values returned from SQL query with aggregate
function (via GROUP BY)

SELECT COUNT(CUSTOMER_STATE), CUSTOMER_STATE
FROM CUSTOMER_V
GROUP BY CUSTOMER_STATE;

39
Result

40
SELECT Example–
Qualifying Results by Categories
Using the HAVING Clause
For use with GROUP BY

SELECT CUSTOMER_STATE, COUNT(CUSTOMER_STATE)
FROM CUSTOMER_V
GROUP BY CUSTOMER_STATE
HAVING COUNT(CUSTOMER_STATE) > 1;

Like a WHERE clause, but it operates on groups (categories), not on individual rows.
Here, only those groups with a count greater than 1 will be included in the final
result.
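
A minimal sketch of how HAVING filters groups, run through Python's sqlite3 module instead of MySQL. The six sample rows are invented for illustration, and the query reads a cut-down CUSTOMER_T base table directly rather than the CUSTOMER_V view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER_T (CUSTOMER_ID INTEGER PRIMARY KEY, CUSTOMER_STATE TEXT)")
# Hypothetical sample rows: three customers in FL, two in CA, one in TX
conn.executemany(
    "INSERT INTO CUSTOMER_T VALUES (?, ?)",
    [(1, "FL"), (2, "FL"), (3, "CA"), (4, "TX"), (5, "CA"), (6, "FL")],
)

# Vector aggregate: one count per state
all_groups = conn.execute(
    """SELECT CUSTOMER_STATE, COUNT(CUSTOMER_STATE)
       FROM CUSTOMER_T
       GROUP BY CUSTOMER_STATE
       ORDER BY CUSTOMER_STATE"""
).fetchall()

# Same grouping, but HAVING keeps only groups with more than one customer
big_groups = conn.execute(
    """SELECT CUSTOMER_STATE, COUNT(CUSTOMER_STATE)
       FROM CUSTOMER_T
       GROUP BY CUSTOMER_STATE
       HAVING COUNT(CUSTOMER_STATE) > 1
       ORDER BY CUSTOMER_STATE"""
).fetchall()

print(all_groups)  # every state with its count
print(big_groups)  # TX drops out because its group has a single row
```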

41
Result

42
Using and Defining Views
Views provide users controlled access to tables
Base Table–table containing the raw data
Dynamic View
◦ A “virtual table” created dynamically upon request by a user
◦ No data actually stored; instead data from base table made available to user
◦ Based on SQL SELECT statement on base tables or other views
Materialized View
◦ Copy or replication of data
◦ Data actually stored
◦ Must be refreshed periodically to match the corresponding base tables

43
Sample CREATE VIEW
CREATE VIEW EXPENSIVE_STUFF_V AS
SELECT PRODUCT_ID, PRODUCT_NAME, UNIT_PRICE
FROM PRODUCT_T
WHERE UNIT_PRICE >300;

▪ View has a name
▪ View is based on a SELECT statement
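
To see that a dynamic view stores no data of its own, the sketch below creates the same view in SQLite through Python's sqlite3 module and changes a base-table price. The two product rows are assumptions for illustration; the column names follow the CREATE VIEW statement on this slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT_T (PRODUCT_ID INTEGER PRIMARY KEY, PRODUCT_NAME TEXT, UNIT_PRICE REAL)")
# Hypothetical rows: one product below the 300 threshold, one above it
conn.executemany(
    "INSERT INTO PRODUCT_T VALUES (?, ?, ?)",
    [(1, "End Table", 175), (2, "Computer Desk", 375)],
)
conn.execute(
    """CREATE VIEW EXPENSIVE_STUFF_V AS
       SELECT PRODUCT_ID, PRODUCT_NAME, UNIT_PRICE
       FROM PRODUCT_T
       WHERE UNIT_PRICE > 300"""
)


def view_names():
    """Product names currently visible through the view."""
    rows = conn.execute("SELECT PRODUCT_NAME FROM EXPENSIVE_STUFF_V ORDER BY PRODUCT_ID")
    return [name for (name,) in rows]


before = view_names()
# Change the base table only; the view definition is untouched
conn.execute("UPDATE PRODUCT_T SET UNIT_PRICE = 325 WHERE PRODUCT_ID = 1")
after = view_names()
print(before, after)
```

The view is re-evaluated on each reference, which is why it always shows current base-table data and also why it costs processing time every time it is used.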

44
Advantages of Views
Simplify query commands
Assist with data security (but don't rely on views for security, there are more
important security measures)
Enhance programming productivity
Contain most current base table data
Use little storage space
Provide customized view for user

45
Disadvantages of Views
Use processing time each time view is referenced
May or may not be directly updateable

46
END OF LECTURE

47
Introduction to Database
Systems

WEEK 13
Advanced SQL

1
Previously

• Introduction to SQL
• MySQL environment
• Basic SQL statements
• Create
• Alter
• Drop
• Single Table DML Statements
• Insert
• Delete
• Update
• Select and various clauses

2
Contents
• Write multiple table SQL queries
• Define and use three types of joins
• Write subqueries
• Understand triggers and stored procedures

3
Target CLOs & Taxonomy Levels

• CLO 4, Taxonomy Level 3

4
Processing Multiple Tables–Joins
• Join–a relational operation that causes two or more tables with a common
domain to be combined into a single table or view
• Equi-join–a join in which the joining condition is based on equality
between values in the common columns; common columns appear
redundantly in the result table
• Natural join–an equi-join in which one of the duplicate columns is
eliminated in the result table
• Outer join–a join in which rows that do not have matching values in
common columns are nonetheless included in the result table (as opposed
to inner join, in which rows must have matching values in order to appear in
the result table)
• Union join–includes all columns from each table in the join, and an
instance for each row of each table

The common columns in joined tables are usually the primary key of the
dominant table and the foreign key of the dependent table in 1:M relationships
5
MySql Supported Join statements

6
The following slides create tables for this
enterprise data model

7
Figure 8-1 Pine Valley Furniture Company Customer and Order
tables with pointers from customers to their orders

These tables are used in queries that follow

8
Equi-Join or Inner Join

• Equi-join–a join in which the joining condition is based on
equality between values in the common columns; common
columns appear redundantly in the result table

9
Natural Join Example

• For each customer who placed an order, what is the
customer's name and order number?

Join involves multiple tables in FROM clause

SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID
FROM CUSTOMER_T INNER JOIN ORDER_T ON
CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID;

ON clause performs the equality check for common columns of the
two tables. A NATURAL JOIN takes no ON clause (FROM CUSTOMER_T
NATURAL JOIN ORDER_T); it matches on the same-named columns
automatically and returns each common column once.

Note: from Figure 8-1, you see that only 10 customers have links
with orders. ➔ Only 10 rows will be returned from this INNER join.
10
Outer Join Example
• List the customer name, ID number, and order number for all
customers. Include customer information even for
customers that don’t have an order

SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID
FROM CUSTOMER_T LEFT OUTER JOIN ORDER_T
ON CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID;

LEFT OUTER JOIN syntax with ON causes customer data to
appear even if there is no corresponding order data.

Unlike INNER join, this will include customer rows with
no matching order rows
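
The difference between the inner and the left outer join can be checked on a tiny data set. This is a sketch using Python's sqlite3 module, not MySQL; the three customers and three orders are invented, with one customer deliberately left without an order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER_T (CUSTOMER_ID INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT)")
conn.execute("CREATE TABLE ORDER_T (ORDER_ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER)")
# Hypothetical data: customer 2 has placed no order
conn.executemany(
    "INSERT INTO CUSTOMER_T VALUES (?, ?)",
    [(1, "Contemporary Casuals"), (2, "Value Furniture"), (3, "Home Furnishings")],
)
conn.executemany("INSERT INTO ORDER_T VALUES (?, ?)", [(1001, 1), (1002, 1), (1003, 3)])

JOIN_QUERY = """SELECT C.CUSTOMER_ID, C.CUSTOMER_NAME, O.ORDER_ID
                FROM CUSTOMER_T C {join_type} JOIN ORDER_T O
                  ON C.CUSTOMER_ID = O.CUSTOMER_ID
                ORDER BY C.CUSTOMER_ID, O.ORDER_ID"""

# Inner join: only customers with a matching order row
inner_rows = conn.execute(JOIN_QUERY.format(join_type="INNER")).fetchall()
# Left outer join: every customer, with NULL (None in Python) where no order matches
outer_rows = conn.execute(JOIN_QUERY.format(join_type="LEFT OUTER")).fetchall()

print(len(inner_rows), len(outer_rows))
print([row for row in outer_rows if row[2] is None])
```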

11
Results

Unlike INNER join, this will include customer rows with
no matching order rows

12
Multiple Table Join Example
• Assemble all information necessary to create an invoice for order
number 1006

Four tables involved in this join

SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, CUSTOMER_ADDRESS, CITY, STATE,
POSTAL_CODE, ORDER_T.ORDER_ID, ORDER_DATE, QUANTITY, PRODUCT_DESCRIPTION,
STANDARD_PRICE, (QUANTITY * STANDARD_PRICE)
FROM CUSTOMER_T, ORDER_T, ORDER_LINE_T, PRODUCT_T
WHERE CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID
AND ORDER_T.ORDER_ID = ORDER_LINE_T.ORDER_ID
AND ORDER_LINE_T.PRODUCT_ID = PRODUCT_T.PRODUCT_ID
AND ORDER_T.ORDER_ID = 1006;

Each pair of tables requires an equality-check condition in the WHERE clause,
matching primary keys against foreign keys

13
Figure 8-2 Results from a four-table join

From CUSTOMER_T table

From ORDER_T table From PRODUCT_T table

14
Union Join

• Not implemented by all DBMS.

• Results in a table that includes all of the data from
each table that is joined, i.e. all of the columns from
each table and an instance for each row of data
from each table.
• Not the same as the UNION command, which combines the results of
multiple SELECT statements.

15
Union Queries
• Combine the output (union of multiple queries)
together into a single result table

First query

Combine

Second query

16
End of Lecture

17
Introduction to Database
Systems

WEEK 14
Advanced SQL
Part II

18
Previously

• Write multiple table SQL queries


• Getting data from two or more tables
• Joining multiple tables using the where clause
• Joining multiple tables using explicit join statements
• Equi-Join / inner join
• Natural Join
• Outer Join
• Outer Left Join
• Outer right Join
• Union Join

19
Contents
• Write subqueries
• Correlated vs Non-correlated subqueries
• Combining queries (union)
• Transaction integrity (commit/roll back)
• SQL triggers
• Stored procedures
• Embedded and Dynamic SQL

20
Target CLOs & Taxonomy Levels

• CLO 4, Taxonomy Level 3

21
Processing Multiple Tables
Using Subqueries
• Subquery–placing an inner query (SELECT
statement) inside an outer query

• Subqueries can be:


• Noncorrelated–executed once for the entire outer query
• Correlated–executed once for each row returned by the
outer query

22
Using Join

23
Same Query Using subquery

24
Subquery Example

• Show all customers who have placed an order

SELECT CUSTOMER_NAME FROM CUSTOMER_T
WHERE CUSTOMER_ID IN
(SELECT DISTINCT CUSTOMER_ID FROM ORDER_T);

The IN operator will test to see if the CUSTOMER_ID value of a row is
included in the list returned from the subquery.

Subquery is embedded in parentheses. In this case it returns a list that
will be used in the WHERE clause of the outer query.

25
Correlated vs. Noncorrelated Subqueries

• Noncorrelated subqueries:
• Do not depend on data from the outer query
• Execute once for the entire outer query
• Correlated subqueries:
• Make use of data from the outer query
• Execute once for each row of the outer query
• Can use the EXISTS operator
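
The two kinds of subquery can be compared side by side. The sketch below, run with Python's sqlite3 module on invented rows, answers "which customers have placed an order" once with a noncorrelated IN subquery and once with a correlated EXISTS subquery. How often the inner query is physically executed is up to the DBMS optimizer; the "once per outer row" description is the conceptual model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER_T (CUSTOMER_ID INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT)")
conn.execute("CREATE TABLE ORDER_T (ORDER_ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER)")
# Hypothetical data: customer 2 has no orders
conn.executemany(
    "INSERT INTO CUSTOMER_T VALUES (?, ?)",
    [(1, "Contemporary Casuals"), (2, "Value Furniture"), (3, "Home Furnishings")],
)
conn.executemany("INSERT INTO ORDER_T VALUES (?, ?)", [(1001, 1), (1002, 1), (1003, 3)])

# Noncorrelated: the inner SELECT makes no reference to the outer query
noncorrelated = conn.execute(
    """SELECT CUSTOMER_NAME FROM CUSTOMER_T
       WHERE CUSTOMER_ID IN (SELECT DISTINCT CUSTOMER_ID FROM ORDER_T)
       ORDER BY CUSTOMER_NAME"""
).fetchall()

# Correlated: the inner SELECT uses CUSTOMER_T.CUSTOMER_ID from the outer row
correlated = conn.execute(
    """SELECT CUSTOMER_NAME FROM CUSTOMER_T
       WHERE EXISTS (SELECT * FROM ORDER_T
                     WHERE ORDER_T.CUSTOMER_ID = CUSTOMER_T.CUSTOMER_ID)
       ORDER BY CUSTOMER_NAME"""
).fetchall()

print(noncorrelated)
print(correlated)
```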

26
Processing a noncorrelated subquery

No reference to data in outer query, so subquery executes once only

1. The subquery executes and returns the customer IDs from
the ORDER_T table
2. The outer query executes on the results of the subquery

These are the only customers that have IDs in the ORDER_T table

27
Correlated Subquery Example

• Show all orders that include furniture finished in natural ash

SELECT DISTINCT ORDER_ID FROM ORDER_LINE_T
WHERE EXISTS
(SELECT * FROM PRODUCT_T
WHERE PRODUCT_ID = ORDER_LINE_T.PRODUCT_ID
AND PRODUCT_FINISH = 'Natural ash');

The EXISTS operator will return a TRUE value if the subquery resulted
in a non-empty set, otherwise it returns a FALSE.

The subquery is testing for a value that comes from the outer query

28
Figure: Processing a correlated subquery

Subquery refers to outer-query data, so executes once
for each row of outer query

Note: only the orders that involve products with Natural Ash
will be included in the final results

29
Correlated or Non-correlated??

30
Combining Queries (Union Join)

• Not implemented by all DBMS.

• Results in a table that includes all of the data from
each table that is joined, i.e. all of the columns from
each table and an instance for each row of data
from each table.
• Not the same as the UNION command, which combines the results of
multiple SELECT statements.

31
Union Queries
• Combine the output (union of multiple queries)
together into a single result table

First query

Combine

Second query

We can also use the relative position of a field in ORDER BY; for example,
ORDER BY 3 sorts by the third column of the result, here ordered_quantity.

32
Combined output of UNION

• Just like UNION, we can also use INTERSECT and MINUS operations
on two queries for combined output.
• INTERSECT returns the records common to both sets
• MINUS returns all records of set A that are not in set B (the SQL
standard calls this operation EXCEPT)
• MySQL versions before 8.0.31 support neither operation; MySQL 8.0.31
and later provide INTERSECT and EXCEPT
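
The three set operations can be sketched with Python's sqlite3 module. SQLite spells the difference operation EXCEPT, which is the standard name for what Oracle calls MINUS. The tables SET_A and SET_B and their values are hypothetical. The queries also use a positional ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SET_A (CUSTOMER_ID INTEGER)")
conn.execute("CREATE TABLE SET_B (CUSTOMER_ID INTEGER)")
# Hypothetical contents: A = {1, 2, 3}, B = {2, 3, 4}
conn.executemany("INSERT INTO SET_A VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO SET_B VALUES (?)", [(2,), (3,), (4,)])


def combine(operator):
    """Run A <operator> B and return the IDs, sorted by the first result column."""
    sql = f"SELECT CUSTOMER_ID FROM SET_A {operator} SELECT CUSTOMER_ID FROM SET_B ORDER BY 1"
    return [value for (value,) in conn.execute(sql)]


union_ids = combine("UNION")          # rows in either set, duplicates removed
intersect_ids = combine("INTERSECT")  # rows common to both sets
except_ids = combine("EXCEPT")        # rows of A that are not in B

print(union_ids, intersect_ids, except_ids)
```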

33
Conditional Expressions

• Establishing IF-THEN-ELSE logical processing within an SQL statement
can now be accomplished by using the CASE keyword in a statement.

SELECT orderline_t.OrderID, orderline_t.OrderedQuantity AS Quantity,
CASE
WHEN OrderedQuantity > 5 THEN 'The quantity is greater than 5'
WHEN OrderedQuantity = 5 THEN 'The quantity is 5'
ELSE 'The quantity is under 5'
END AS Stock
FROM orderline_t;
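
The CASE expression can be run as-is on a small table. The sketch below uses Python's sqlite3 module, and the three order lines are assumptions chosen so that each branch fires once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orderline_t (OrderID INTEGER, OrderedQuantity INTEGER)")
# Hypothetical rows: one below 5, one equal to 5, one above 5
conn.executemany("INSERT INTO orderline_t VALUES (?, ?)", [(1001, 2), (1002, 5), (1003, 8)])

labelled = conn.execute(
    """SELECT OrderID, OrderedQuantity AS Quantity,
              CASE
                  WHEN OrderedQuantity > 5 THEN 'The quantity is greater than 5'
                  WHEN OrderedQuantity = 5 THEN 'The quantity is 5'
                  ELSE 'The quantity is under 5'
              END AS Stock
       FROM orderline_t
       ORDER BY OrderID"""
).fetchall()

for order_id, quantity, stock in labelled:
    print(order_id, quantity, stock)
```

The WHEN branches are tested in order and the first match wins, so the ELSE branch only applies to rows that satisfy none of them.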

34
Ensuring Transaction Integrity

• Transaction = A discrete unit of work that must be
completely processed or not processed at all
• May involve multiple updates
• If any update fails, then all other updates must be
cancelled
• SQL commands for transactions
• BEGIN TRANSACTION/END TRANSACTION
• Marks boundaries of a transaction
• COMMIT
• Makes all updates permanent
• ROLLBACK
• Cancels updates since the last COMMIT
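
The all-or-nothing behaviour can be sketched as an order entry that writes one ORDER_T row plus several ORDER_LINE_T rows. The code uses Python's sqlite3 module, which opens a transaction implicitly before the first INSERT; the CHECK constraint, the `place_order` helper, and the sample orders are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ORDER_T (ORDER_ID INTEGER PRIMARY KEY)")
conn.execute(
    """CREATE TABLE ORDER_LINE_T (
           ORDER_ID INTEGER,
           PRODUCT_ID INTEGER,
           QUANTITY INTEGER CHECK (QUANTITY > 0))"""
)
conn.commit()


def place_order(order_id, lines):
    """Insert an order and all its lines as one transaction; return True on success."""
    try:
        conn.execute("INSERT INTO ORDER_T VALUES (?)", (order_id,))
        conn.executemany(
            "INSERT INTO ORDER_LINE_T VALUES (?, ?, ?)",
            [(order_id, product_id, quantity) for product_id, quantity in lines],
        )
        conn.commit()  # make all updates permanent
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # cancel every update since the last COMMIT
        return False


first_ok = place_order(1006, [(4, 1), (5, 2)])
# The second line has quantity 0, which violates the CHECK constraint
second_ok = place_order(1007, [(4, 1), (7, 0)])

orders = conn.execute("SELECT ORDER_ID FROM ORDER_T ORDER BY ORDER_ID").fetchall()
line_count = conn.execute("SELECT COUNT(*) FROM ORDER_LINE_T").fetchone()[0]
print(first_ok, second_ok, orders, line_count)
```

Order 1007 leaves no trace: its ORDER_T row and its first, valid line are undone together with the failing line.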

35
Figure 8-5 An SQL Transaction sequence (in pseudocode)

36
Data Dictionary Facilities
• System tables that store metadata
• Users usually can view some of these tables
• Users are restricted from updating them
• Some examples in Oracle 10g
• DBA_TABLES–descriptions of tables
• DBA_CONSTRAINTS–description of constraints
• DBA_USERS–information about the users of the system
• Examples in Microsoft SQL Server 2000
• SYSCOLUMNS–table and column definitions
• SYSDEPENDS–object dependencies based on foreign keys
• SYSPERMISSIONS–access permissions granted to users

37
Routines and Triggers
• Routines
• Program modules that execute on demand
• Functions–routines that return values and take input parameters
• Procedures–routines that do not return values and can take input or output
parameters
• Triggers
• Routines that execute in response to a database event (INSERT, UPDATE, or
DELETE)

38
Figure 8-6 Triggers contrasted with stored procedures
Procedures are called explicitly

Triggers are event-driven


Source: adapted from Mullins, 1995.

39
Figure 8-7 Simplified trigger syntax, SQL:2003

Figure 8-8 Create routine syntax, SQL:2003

40
Trigger Example

• SQL Server (Transact-SQL) syntax:

CREATE TRIGGER tr_Orders_INSERT
ON Orders AFTER INSERT AS
IF (SELECT COUNT(*) FROM inserted
WHERE Ord_Priority = 'High') = 1
BEGIN
PRINT 'Email Code Goes Here'
END

41
Stored Procedure Example

• SQL Server (Transact-SQL) syntax:

CREATE PROCEDURE ps_Orders_INSERT @Ord_Priority varchar(10) AS
BEGIN TRANSACTION
INSERT INTO Orders (Ord_Priority) VALUES (@Ord_Priority)
IF @Ord_Priority = 'High'
PRINT 'Email Code Goes Here'
COMMIT TRANSACTION

42
Embedded and Dynamic SQL

• Embedded SQL
• Including hard-coded SQL statements in a program written in another
language such as C or Java
• Dynamic SQL
• Ability for an application program to generate SQL code on the fly, as the
application is running
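
The contrast can be approximated in a host language. In the sketch below (Python with sqlite3), the first query is fixed text written into the program, which is the spirit of embedded SQL even though classic embedded SQL goes through a precompiler. The `find_customers` helper assembles its WHERE clause at run time, which is dynamic SQL. The table contents, the helper, and the column whitelist are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER_T (CUSTOMER_ID INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT, CITY TEXT, STATE TEXT)")
# Hypothetical sample rows
conn.executemany(
    "INSERT INTO CUSTOMER_T VALUES (?, ?, ?, ?)",
    [
        (1, "Contemporary Casuals", "Gainesville", "FL"),
        (2, "Value Furniture", "Plano", "TX"),
        (3, "Home Furnishings", "Albany", "NY"),
        (4, "Eastern Furniture", "Gainesville", "FL"),
    ],
)

# Hard-coded statement: the SQL text is known when the program is written
static_rows = conn.execute(
    "SELECT CUSTOMER_NAME FROM CUSTOMER_T WHERE STATE = 'FL' ORDER BY CUSTOMER_NAME"
).fetchall()

ALLOWED_COLUMNS = {"CITY", "STATE"}  # only these column names may appear in generated SQL


def find_customers(filters):
    """Generate the WHERE clause at run time from a dict of column -> value."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported filter column: {column}")
        clauses.append(f"{column} = ?")  # values are bound, never pasted into the text
        params.append(value)
    sql = "SELECT CUSTOMER_NAME FROM CUSTOMER_T"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY CUSTOMER_NAME"
    return conn.execute(sql, params).fetchall()


dynamic_rows = find_customers({"STATE": "FL", "CITY": "Gainesville"})
everyone = find_customers({})
print(static_rows, dynamic_rows, len(everyone))
```

Binding values through `?` placeholders and checking column names against a whitelist keeps generated SQL from becoming an injection risk.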

43
END OF LECTURE

44
Introduction to Database
Systems

WEEK 15
Advanced SQL
Stored Procedures & Triggers

1
Previously

• Write subqueries
• Correlated vs Non-correlated subqueries
• Combining queries (union)
• Transaction integrity (commit/roll back)
• SQL triggers
• Stored procedures

2
Contents
• Stored procedures
• SQL triggers
• Embedded and Dynamic SQL

• MySQL Implementation
• Write subqueries
• Correlated vs Non-correlated subqueries
• Combining queries (union)
• Transaction integrity (commit/roll back)

• Stored procedures
• SQL triggers

3
Target CLOs & Taxonomy Levels

• CLO 4, Taxonomy Level 3

4
Routines and Triggers
• Routines
• Program modules that execute on demand
• Functions–routines that return values and take input parameters
• Procedures–routines that do not return values and can take input or output
parameters
• Triggers
• Routines that execute in response to a database event (INSERT, UPDATE, or
DELETE)

5
Figure 8-6 Triggers contrasted with stored procedures
Procedures are called explicitly

Triggers are event-driven


Source: adapted from Mullins, 1995.

6
Stored Procedure Example

• SQL Server (Transact-SQL) syntax:

CREATE PROCEDURE ps_Orders_INSERT @Ord_Priority varchar(10) AS
BEGIN TRANSACTION
INSERT INTO Orders (Ord_Priority) VALUES (@Ord_Priority)
IF @Ord_Priority = 'High'
PRINT 'Email Code Goes Here'
COMMIT TRANSACTION

7
MySQL Syntax

• Creating Procedure

DELIMITER //

CREATE PROCEDURE GetAllProducts()
BEGIN
SELECT * FROM products;
END //

DELIMITER ;

• Calling procedure

CALL GetAllProducts();

8
MySQL stored procedure parameters

• In MySQL, a parameter has one of three modes: IN, OUT, or INOUT

DELIMITER //

CREATE PROCEDURE GetOfficeByCountry(
IN countryName VARCHAR(255)
)
BEGIN
SELECT *
FROM offices
WHERE country = countryName;
END //

DELIMITER ;

9
MySQL stored procedure parameters

DELIMITER $$

CREATE PROCEDURE GetOrderCountByStatus (
IN orderStatus VARCHAR(25),
OUT total INT
)
BEGIN
SELECT COUNT(orderNumber)
INTO total
FROM orders
WHERE status = orderStatus;
END$$

DELIMITER ;

10
Calling procedure with parameters

• To find the number of orders that have already shipped, a
session variable (@total) is passed to receive the OUT value.

CALL GetOrderCountByStatus('Shipped', @total);

SELECT @total;

11
Stored Functions vs Procedures

Functions | Procedures
A function has a return type and returns a value. | A procedure does not have a return type, but it can return values through OUT parameters.
You cannot use a function with data manipulation queries; only SELECT queries are allowed in functions. | You can use DML queries such as INSERT, UPDATE, and SELECT with procedures.
A function does not allow output parameters. | A procedure allows both input and output parameters.
You cannot manage transactions inside a function. | You can manage transactions inside a procedure.
You cannot call stored procedures from a function. | You can call a function from a stored procedure.
You can call a function using a SELECT statement. | You cannot call a procedure using a SELECT statement.

Note: some of these restrictions vary by DBMS.

12
Stored Functions

DELIMITER $$

CREATE FUNCTION function_name(param1, param2, …)
RETURNS datatype
[NOT] DETERMINISTIC
BEGIN
-- statements
RETURN var_name;
END $$

DELIMITER ;

13
Stored Functions

DELIMITER $
CREATE FUNCTION getCustomers(state VARCHAR(2))
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE x int;
Select count(*) into x from customer_t where CustomerState=state;
RETURN x;
END $
DELIMITER ;

SELECT getCustomers('FL') as counter;


14
Triggers

• Triggers
• Routines that execute in response to a database event (INSERT, UPDATE, or
DELETE)

• Comparable to an event handler in event-driven programs (e.g. a click event)

• NEW keyword: refers to the row being inserted; available in INSERT triggers
• OLD keyword: refers to the row being deleted; available in DELETE triggers
• Both OLD and NEW can be used in UPDATE triggers (the row before and after the change).

• Triggers are useful for tasks such as enforcing business rules, validating input
data, and keeping an audit trail.

15
Figure 8-7 Simplified trigger syntax, SQL:2003

Figure 8-8 Create routine syntax, SQL:2003

16
MySQL Trigger

Delimiter $$
CREATE TRIGGER trigger_name
{BEFORE | AFTER} {INSERT | UPDATE| DELETE }
ON table_name FOR EACH ROW
BEGIN
trigger_body;

END $$
Delimiter ;

• To execute multiple statements, wrap them in a BEGIN ... END
compound statement; for a single statement it is not needed.

17
Insert Trigger Example
Drop trigger if exists safety_trg;
Delimiter $$
Create trigger safety_trg
After insert on order_t for each row
Begin
Insert into order1_t (OrderID, CustomerID, OrderDate)
values (new.OrderID, new.CustomerID, new.OrderDate);
End $$
Delimiter ;

In this case order1_t is a replica table that stores a copy of
each new row inserted into order_t.

18
Update trigger example

Drop trigger if exists price_update;
Delimiter $$
Create trigger price_update
After update on product_t for each row
Begin
Insert into pricehistory_t (productid, updatedprice, date)
values (old.productid, new.productstandardprice, NOW());
End $$
Delimiter ;

We first have to create the pricehistory_t table with productid, updatedprice,
and date columns.
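
The price-history trigger can be exercised end to end in SQLite through Python's sqlite3 module. This is a sketch under assumptions: SQLite needs no DELIMITER command and uses CURRENT_TIMESTAMP where MySQL uses NOW(), and the product row is invented. The table and column names follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_t (productid INTEGER PRIMARY KEY, productstandardprice REAL)")
conn.execute("CREATE TABLE pricehistory_t (productid INTEGER, updatedprice REAL, date TEXT)")

# AFTER UPDATE trigger: OLD is the row before the change, NEW the row after it
conn.execute(
    """CREATE TRIGGER price_update
       AFTER UPDATE ON product_t FOR EACH ROW
       BEGIN
           INSERT INTO pricehistory_t (productid, updatedprice, date)
           VALUES (OLD.productid, NEW.productstandardprice, CURRENT_TIMESTAMP);
       END"""
)

conn.execute("INSERT INTO product_t VALUES (7, 800)")  # hypothetical product
history_before = conn.execute("SELECT COUNT(*) FROM pricehistory_t").fetchone()[0]

# The UPDATE fires the trigger once for the single affected row
conn.execute("UPDATE product_t SET productstandardprice = 775 WHERE productid = 7")
history = conn.execute("SELECT productid, updatedprice FROM pricehistory_t").fetchall()

print(history_before, history)
```

The INSERT into product_t does not fire the trigger, because it is defined for UPDATE events only.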

19
Embedded and Dynamic SQL

• Embedded SQL
• Including hard-coded SQL statements in a program written in another
language such as C or Java
• Dynamic SQL
• Ability for an application program to generate SQL code on the fly, as the
application is running

20
END OF LECTURE

21
INTRODUCTION TO DATABASE SYSTEMS

DATA WAREHOUSING

Dr. Sohail

Chapter 9 Copyright © 2016 Pearson Education, Inc. 9-1


CONTENTS

• Introduction to data warehouse
◦ Definitions
◦ Why Data Warehouse
◦ Issues with company-wide view (normal data)
• Data Characteristics
◦ Status vs Event data
◦ Transient vs Periodic data
• Star Schema
◦ Issues regarding Star schema
◦ Variations of Star Schema
◦ Normalization issues in dimension tables

TARGET CLO & TAXONOMY LEVEL

• CLO-1, Taxonomy Level 2


INTRODUCTION

• In September 2004, Hurricane Frances was heading for Florida's Atlantic
coast (weeks earlier, Hurricane Charley had hit the Florida Gulf
coast). What did Walmart do?
• Modern organizations are drowning in data but starving for
information
• Why?
◦ Fragmented development of information systems and their supporting
databases.
◦ Systems are developed to support operational processing, with
little or no thought given to the information or analytical tools
needed for decision making.
◦ Effect of the Internet.

DEFINITIONS
• Data Warehouse
◦ A subject-oriented, integrated, time-variant, non-updatable
collection of data used in support of
management decision-making processes
◦ Subject-oriented: data organized by customers,
patients, students, products
◦ Integrated: consistent naming conventions, formats;
from multiple data sources
◦ Time-variant: can study trends and changes
◦ Non-updatable: read-only, periodically refreshed
• Data Mart
◦ A data warehouse that is limited in scope


NEED FOR DATA WAREHOUSING

• Integrated, company-wide view of high-quality
information (from disparate databases)
• Separation of operational and informational
systems and data (for improved performance)
◦ Operational system – real-time transaction
processing systems
◦ Informational system – decision support, read only.


ISSUES WITH COMPANY-WIDE VIEW

 Inconsistent key structures


 Synonyms
 Free-form vs. structured fields
 Inconsistent data values
 Missing data

See figure 9-1 for example



Figure 9-1 Examples of heterogeneous data



Figure 9-2 Independent data mart data warehousing architecture
Data marts: mini-warehouses, limited in scope
Separate ETL for each independent data mart
Data access complexity due to multiple data marts
Figure 9-3 Dependent data mart with operational data store: a three-level architecture
ODS provides option for obtaining current data
Single ETL for enterprise data warehouse (EDW)
Dependent data marts loaded from EDW
Simpler data access
Figure 9-4 Logical data mart and real time warehouse architecture
ODS and data warehouse are one and the same
Near real-time ETL for Data Warehouse
Data marts are NOT separate databases, but logical views of the data warehouse
➔ Easier to create new data marts
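
The idea that a logical data mart is only a view over the warehouse can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original slides: the table warehouse_sales, the view north_sales_mart, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical warehouse table; all names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("North", "Widget", 100.0),
        ("South", "Widget", 50.0),
        ("North", "Gadget", 25.0),
    ],
)

# A logical data mart: a view defined on the warehouse, not a separate
# database. Creating another mart is just another CREATE VIEW statement.
conn.execute(
    "CREATE VIEW north_sales_mart AS "
    "SELECT product, amount FROM warehouse_sales WHERE region = 'North'"
)

north_rows = conn.execute(
    "SELECT product, amount FROM north_sales_mart ORDER BY product"
).fetchall()
print(north_rows)
```

Because the view stores no data of its own, no separate ETL step is needed for the mart; it always reflects what is currently in the warehouse table.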
DATA CHARACTERISTICS
STATUS VS. EVENT DATA
Figure 9-6 Example of DBMS log entry
Labels in the figure: Status, Event, Status
Event = a database action (create/update/delete) that results from a transaction
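
The distinction between status and event data can be sketched in a few lines of Python. This is an assumed, simplified model of a log entry (status before, event, status after); the LogEntry class, the apply_update helper, and the account record are hypothetical and do not reproduce the figure.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    before: dict  # status of the record before the event
    event: str    # the database action (create / update / delete)
    after: dict   # status of the record after the event


def apply_update(record: dict, changes: dict) -> LogEntry:
    """Apply an update event and capture the status on both sides of it."""
    before = dict(record)      # copy, so later changes do not alter it
    record.update(changes)     # the event itself
    return LogEntry(before=before, event="update", after=dict(record))


account = {"id": 1, "balance": 750}
entry = apply_update(account, {"balance": 700})
print(entry)
```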



DATA CHARACTERISTICS
TRANSIENT VS. PERIODIC DATA
Figure 9-7 Transient operational data

With transient data, changes to existing records are written over
previous records, thus destroying the previous data content.



DATA CHARACTERISTICS
TRANSIENT VS. PERIODIC DATA
Figure 9-8 Periodic warehouse data

Periodic data are never physically altered or deleted once they have
been added to the store.
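
The contrast between transient and periodic data can be sketched as follows. This is a minimal illustration under assumed names: the dictionary transient, the list periodic, the customer key, and the dates are all hypothetical.

```python
from datetime import date

# Transient store: one slot per key, so an update overwrites the old value.
transient = {}


def transient_write(key, value):
    transient[key] = value


# Periodic store: append-only rows of (key, value, as-of date, action).
# Existing rows are never altered or deleted.
periodic = []


def periodic_write(key, value, as_of, action):
    periodic.append((key, value, as_of, action))


# The same two changes applied to both stores.
transient_write("cust-1", "City A")
transient_write("cust-1", "City B")  # the previous value is lost

periodic_write("cust-1", "City A", date(2024, 1, 5), "create")
periodic_write("cust-1", "City B", date(2024, 3, 9), "update")

history = [row for row in periodic if row[0] == "cust-1"]
print(transient)   # only the latest value survives
print(history)     # both versions are kept, with their dates
```

The periodic store is what makes a warehouse time-variant: trends can only be studied if earlier versions of a record are still available.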



STAR SCHEMA
(DIMENSIONAL MODEL)

 Fact and dimension tables.
 Each dimension table has a one-to-many
relationship to the central fact table.
 The primary key of a dimension table is a
foreign key in the fact table. The primary
key of the fact table is a composite key
consisting of all foreign keys.



Figure 9-9 Components of a star schema
Fact tables contain factual or quantitative data
1:N relationship between dimension tables and fact tables
Dimension tables are denormalized to maximize performance
Dimension tables contain descriptions about the subjects of the business
Excellent for ad-hoc queries, but bad for online transaction processing
Figure 9-10 Star schema example

Fact table provides statistics for sales broken down by product,
period and store dimensions
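
A star schema of this shape can be sketched with Python's sqlite3 module. The sketch below is an assumption-laden illustration rather than the schema in the figure: the column names (description, month, city, units_sold, dollars_sold) and all sample rows are hypothetical. It does follow the rules stated earlier: each dimension's primary key is a foreign key in the fact table, and the fact table's primary key is the composite of those foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical columns)
CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);

-- Fact table: composite primary key made of the dimension foreign keys
CREATE TABLE sales (
    product_key  INTEGER REFERENCES product(product_key),
    period_key   INTEGER REFERENCES period(period_key),
    store_key    INTEGER REFERENCES store(store_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (product_key, period_key, store_key)
);

INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO period  VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO store   VALUES (1, 'Store A'), (2, 'Store B');
INSERT INTO sales VALUES
    (1, 1, 1, 10, 100.0),
    (1, 2, 1,  5,  50.0),
    (2, 1, 2,  2,  80.0);
""")

# A typical ad-hoc query: total sales broken down by one dimension.
totals = conn.execute("""
    SELECT p.description, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN product AS p ON p.product_key = s.product_key
    GROUP BY p.description
    ORDER BY p.description
""").fetchall()
print(totals)
```

Swapping the join to period or store gives the same totals broken down by a different dimension, which is why the layout suits ad-hoc analysis.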



Figure 9-11 Star schema with sample data



ISSUES REGARDING STAR SCHEMA
 Dimension table keys should be surrogate (non-
intelligent and non-business related), because:
 Keys may change over time
 Length/format consistency
 Granularity of fact table: level of detail (in time,
location, product grouping, etc.) for each record in the
fact table
 Transactional grain – finest level
 Aggregated grain – more summarized
 Finer grain ➔ more dimension tables, more rows in
fact table
 Duration of the database – how much history should
be kept?
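
Two of the points above, surrogate keys and fact-table grain, can be sketched together in Python. Everything here is hypothetical: the business keys such as "PRD-0042", the helper surrogate_key_for, and the sample transactions are made up for illustration.

```python
from collections import defaultdict

# Surrogate keys: meaningless integers assigned by the warehouse, so a
# change in a business key's value or format never touches the fact table.
surrogate_keys = {}


def surrogate_key_for(business_key):
    """Return the existing surrogate key, or assign the next integer."""
    if business_key not in surrogate_keys:
        surrogate_keys[business_key] = len(surrogate_keys) + 1
    return surrogate_keys[business_key]


# Transactional grain: one row per individual sale (day, product, units).
transactions = [
    ("2024-01-05", "PRD-0042", 3),
    ("2024-01-05", "PRD-0042", 2),
    ("2024-01-05", "PRD-0007", 1),
    ("2024-01-06", "PRD-0042", 4),
]

# Aggregated grain: one row per (day, product), keyed by surrogate key.
daily = defaultdict(int)
for day, business_key, units in transactions:
    daily[(day, surrogate_key_for(business_key))] += units

daily_rows = sorted((day, key, units) for (day, key), units in daily.items())
print(surrogate_keys)
print(daily_rows)
```

The aggregated table has fewer rows than the transactional one, but the individual sales can no longer be recovered from it, which is the trade-off behind the choice of grain.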
VARIATIONS OF THE STAR SCHEMA
 Multiple Fact Tables
 Can improve performance
 Often used to store facts for different combinations
of dimensions
 Conformed dimensions

 Factless Fact Tables
 No nonkey data, but foreign keys for associated
dimensions
 Used for:
 Tracking events
 Inventory coverage



Figure 9-13 Conformed dimensions

Two fact tables ➔ two (connected) star schemas.

Conformed dimension: associated with multiple fact tables



Figure 9-14a Factless fact table showing occurrence of an event
No data in fact table, just keys associating dimension records
Fact table forms an n-ary relationship between dimensions
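
A factless fact table can be sketched with sqlite3 as below. The event chosen here (a student attending a course on a date) and every table, column, and row are assumptions made for illustration; the slide does not say which event the figure records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (course_key  INTEGER PRIMARY KEY, title TEXT);

-- Factless fact table: only foreign keys, no measures.
-- Each row records that an event occurred.
CREATE TABLE attendance (
    student_key INTEGER REFERENCES student(student_key),
    course_key  INTEGER REFERENCES course(course_key),
    date_key    TEXT,
    PRIMARY KEY (student_key, course_key, date_key)
);

INSERT INTO student VALUES (1, 'Student A'), (2, 'Student B');
INSERT INTO course  VALUES (1, 'Databases'), (2, 'Networks');
INSERT INTO attendance VALUES
    (1, 1, '2024-01-05'),
    (2, 1, '2024-01-05'),
    (1, 2, '2024-01-06');
""")

# With no numeric facts to sum, analysis is done by counting rows.
counts = conn.execute("""
    SELECT c.title, COUNT(*)
    FROM attendance AS a
    JOIN course AS c ON c.course_key = a.course_key
    GROUP BY c.title
    ORDER BY c.title
""").fetchall()
print(counts)
```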



NORMALIZATION ISSUES IN
DIMENSION TABLES
 Include all information for a
dimension table in a single
denormalized table
 Normalize the dimension into a
snowflake schema
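
The two options can be compared side by side with sqlite3. This is a minimal sketch under assumed names: the product dimension, its category attribute, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 1: one denormalized dimension table (category name repeated)
CREATE TABLE product_flat (
    product_key   INTEGER PRIMARY KEY,
    name          TEXT,
    category_name TEXT
);
INSERT INTO product_flat VALUES
    (1, 'Widget', 'Hardware'),
    (2, 'Gadget', 'Hardware');

-- Option 2: snowflake, with the category split into its own table
CREATE TABLE category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE product_snow (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES category(category_key)
);
INSERT INTO category VALUES (10, 'Hardware');
INSERT INTO product_snow VALUES (1, 'Widget', 10), (2, 'Gadget', 10);
""")

# Denormalized: the description is read without a join.
flat = conn.execute(
    "SELECT name, category_name FROM product_flat ORDER BY product_key"
).fetchall()

# Snowflake: the same answer needs one extra join.
snow = conn.execute("""
    SELECT p.name, c.category_name
    FROM product_snow AS p
    JOIN category AS c ON c.category_key = p.category_key
    ORDER BY p.product_key
""").fetchall()
print(flat == snow)
```

Both layouts return the same rows; the snowflake version stores each category name once at the price of an additional join in every query that needs it.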



STATE OF THE ART

 Previously, organizations had to build a lot of
infrastructure for data warehousing.
 Today, cloud computing has greatly reduced the
cost and effort of building a data warehouse
for a business.
 Data warehouses and their tools are moving
from physical data centers to cloud-based data
warehouses.



CLOUD BASED SOLUTIONS
 Amazon Redshift
 Cloud-based RDBMS that works with SQL clients and Business
Intelligence (BI) tools via ODBC and JDBC connections.
 Microsoft Azure
 Azure cloud platform provides more than 200 products and
cloud services such as Data Analytics, Virtual Computing,
Storage, Virtual Network, Internet Traffic Manager, Web
Sites, Media Services, Mobile Services, Integration, etc.
 Google BigQuery
 BigQuery is a serverless data warehouse that allows
scalable analysis over petabytes of data with ANSI SQL and
ML capabilities.
BI TOOLS

 Sisense

 Microsoft Power BI

 Clear Analytics



