
DBMS Merged

The document provides an introduction to databases, including: - A database is a collection of related data managed by a database management system (DBMS). - Traditional file-based systems had limitations like data duplication and separation that a database approach aimed to address. - The ANSI-SPARC architecture defines three levels of abstraction: external, conceptual, and internal to provide independence between levels.

Uploaded by

FIZA SAIF

INTRODUCTION TO DATABASES

A database is a collection of related data, and a
database management system (DBMS) is the
software that manages and controls access to the
database. A database application is simply a
program that interacts with the database at some
point in its execution.
EXAMPLES OF DATABASE SYSTEMS

➢ Purchases from the supermarket


➢ Purchases using your credit card
➢ Booking a vacation with a travel agent
➢ Using the local library
➢ Renting a video
➢ Using the Internet
➢ Studying at University
TRADITIONAL FILE BASED SYSTEMS

File-based Approach
 A collection of application programs that perform services for the end-users such as the
production of reports. Each program defines and manages its own data.

File
 A file is simply a collection of records, which contains logically related data.
EXAMPLE OF FILE BASED APPROACH
LIMITATIONS OF THE FILE-BASED APPROACH

 Separation and Isolation of data


 Duplication of data
 Data dependence
 Incompatible File Formats
 Fixed Queries/ Proliferation of application programs
DATABASE APPROACH

Arose because:
 Definition of data was embedded in application programs, rather than being stored
separately and independently.
 No control over access and manipulation of data beyond that imposed by
application programs.
Result:
 the database and Database Management System (DBMS).
Database
A shared collection of logically related data (and a
description of this data), designed to meet the information
needs of an organization.

 Shared collection – can be used simultaneously by many departments and users.


 Logically related data- comprises entities, attributes, and relationships of an organization’s
information.
 Description of the data – System catalog (metadata) provides description of data to enable
program–data independence.
DATABASE MANAGEMENT SYSTEM (DBMS)
 A software system that enables users to define, create, maintain, and control access to the
database.
 A DBMS provides the following facilities:
 Data definition language (DDL)
 Permits specification of data types, structures and any data constraints.
 All specifications are stored in the database.
 Data manipulation language (DML)
 General enquiry facility (query language) of the data.

 Controlled access to database may include:


 a security system
 an integrity system
 a concurrency control system
 a recovery control system
 a user-accessible catalog.
(DATABASE) APPLICATION PROGRAM
 (Database) application program: a computer program that interacts with database by issuing an
appropriate request (SQL statement) to the DBMS.
VIEWS
 A view mechanism allows each user to have his or her own view of the database.
 A view is essentially some subset of the database.

Benefits of Views:
 Reduce complexity
 Provide a level of security
 Provide a mechanism to customize the appearance of the database
 Present a consistent, unchanging picture of the structure of the database, even if the underlying
database is changed
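The benefits above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The Staff table, its columns, and the StaffDirectory view are hypothetical names chosen for illustration; the view hides the salary column and keeps presenting the same picture after the underlying table gains a new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, name TEXT, branchNo TEXT, salary REAL);
    INSERT INTO Staff VALUES ('S1', 'Ann', 'B1', 30000), ('S2', 'Bob', 'B2', 24000);
    -- The view exposes a subset of the table and hides the salary column.
    CREATE VIEW StaffDirectory AS SELECT staffNo, name, branchNo FROM Staff;
""")

# What a user of the view sees before the underlying table changes.
before = conn.execute("SELECT * FROM StaffDirectory ORDER BY staffNo").fetchall()

# Change the underlying table by adding a column.
conn.execute("ALTER TABLE Staff ADD COLUMN email TEXT")

# The view still presents the same structure and the same rows.
after = conn.execute("SELECT * FROM StaffDirectory ORDER BY staffNo").fetchall()
print(before == after)
```

Users who query only StaffDirectory never see the salary data, which is the security benefit, and they are not affected by the added column.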
Components of DBMS Environment

 Hardware
It can range from a PC to a network of computers.
 Software
DBMS, operating system, network software (if necessary) and also the application programs.
 Data
Used by the organization and a description of this data called the schema.
 Procedures
Instructions and rules that should be applied to the design and use of the database and DBMS.
 People
Includes database designers, DBAs, application programmers, and end-users.
Roles in the Database Environment
 Data Administrator (DA)
 Database Administrator (DBA)
 Database Designers (Logical and Physical)
 Application Programmers
 End Users (naive and sophisticated)
ADVANTAGES OF DBMS
 Control of data redundancy
 Data consistency
 More information from the same amount of data
 Sharing of data
 Improved data integrity
 Improved security
 Enforcement of standards
 Economy of scale
 Balance conflicting requirements
 Improved data accessibility and responsiveness
 Increased productivity
 Improved maintenance through data independence
 Increased concurrency
 Improved backup and recovery services
THE THREE-LEVEL ANSI-SPARC ARCHITECTURE
The levels form a three-level architecture comprising an external, a
conceptual, and an internal level, as depicted in Figure
OBJECTIVES OF THREE LEVEL ARCHITECTURE
 Each user should be able to access the same data, but have a different customized view of the
data. Each user should be able to change the way he or she views the data, and this change
should not affect other users.
 Users should not have to deal directly with physical database storage details, such as indexing or
hashing. In other words, a user’s interaction with the database should be independent of storage
considerations.
 The Database Administrator (DBA) should be able to change the database storage structures
without affecting the users’ views.
 The internal structure of the database should be unaffected by changes to the physical aspects
of storage, such as the changeover to a new storage device.
 The DBA should be able to change the conceptual structure of the database without affecting all
users.
EXTERNAL LEVEL
The users’ view of the database. This level describes that part of the database that is relevant
to each user.

CONCEPTUAL LEVEL
The community view of the database. This level describes what data is stored in the database
and the relationships among the data.
The conceptual level represents:
 all entities, their attributes, and their relationships
 the constraints on the data
 semantic information about the data
 security and integrity information.
INTERNAL LEVEL
The physical representation of the database on the computer. This level describes how the data is
stored in the database. The internal level is concerned with such things as:
 storage space allocation for data and indexes
 record descriptions for storage (with stored sizes for data items)
 record placement
 data compression and data encryption techniques
Difference between Three Levels of ANSI-SPARC Architecture
DATA INDEPENDENCE
LOGICAL DATA INDEPENDENCE
Logical data independence refers to the immunity of the external
schemas to changes in the conceptual schema.

PHYSICAL DATA INDEPENDENCE


Physical data independence refers to the immunity of the conceptual
schema to changes in the internal schema.
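Physical data independence can be sketched with Python's built-in sqlite3 module. The example assumes a hypothetical Staff table and index name; adding an index changes how the data is stored and accessed (an internal-level change), yet the same query returns the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, name TEXT, branchNo TEXT)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [("S1", "Ann", "B1"), ("S2", "Bob", "B2"), ("S3", "Cy", "B1")],
)

query = "SELECT name FROM Staff WHERE branchNo = 'B1' ORDER BY name"

# Result with the original storage structures.
without_index = conn.execute(query).fetchall()

# Internal-level change: add an index on branchNo.
conn.execute("CREATE INDEX idx_staff_branch ON Staff (branchNo)")

# The query text and its result are unchanged.
with_index = conn.execute(query).fetchall()
print(without_index == with_index)
```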
DATABASE LANGUAGES
The Data Definition Language (DDL)
A language that allows the DBA or user to describe and name the entities,
attributes, and relationships required for the application, together with any
associated integrity and security constraints.

The Data Manipulation Language (DML)


A language that provides a set of operations to support the basic data
manipulation operations on the data held in the database. Data manipulation
operations usually include the following:
 insertion of new data into the database
 modification of data stored in the database
 retrieval of data contained in the database
 deletion of data from the database
DMLs are distinguished by their underlying retrieval constructs. We can
distinguish between two types of DML: procedural and non-procedural.
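As a rough illustration of the DDL and of the four basic DML operations listed above, the following sketch uses Python's built-in sqlite3 module. The Branch table and its values are hypothetical. The PRIMARY KEY and NOT NULL clauses stand in for the integrity constraints that a DDL can declare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: name the entity and its attributes, and declare integrity constraints.
conn.execute("""
    CREATE TABLE Branch (
        branchNo TEXT PRIMARY KEY,
        city     TEXT NOT NULL
    )
""")

# DML: insertion, modification, deletion, and retrieval.
conn.execute("INSERT INTO Branch VALUES ('B1', 'London')")
conn.execute("INSERT INTO Branch VALUES ('B2', 'Glasgow')")
conn.execute("UPDATE Branch SET city = 'Bristol' WHERE branchNo = 'B1'")
conn.execute("DELETE FROM Branch WHERE branchNo = 'B2'")
rows = conn.execute("SELECT branchNo, city FROM Branch").fetchall()

# The constraint declared in the DDL is enforced by the DBMS, not by the program.
try:
    conn.execute("INSERT INTO Branch VALUES ('B1', 'Leeds')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(rows, duplicate_rejected)
```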

PROCEDURAL DMLS
A language that allows the user to tell the system what data is needed and
exactly how to retrieve the data.

NON-PROCEDURAL DMLS
A language that allows the user to state what data is needed rather than how it is
to be retrieved.
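The contrast between the two kinds of DML can be sketched as follows. The staff records are hypothetical. The first half walks through the records one at a time and spells out how to find the answer; the second half states only what is wanted and leaves the retrieval strategy to the DBMS (here SQLite, through Python's sqlite3 module).

```python
import sqlite3

staff = [("Ann", "B1"), ("Bob", "B2"), ("Cy", "B1")]

# Procedural style: the program says how to retrieve the data, record by record.
procedural_result = []
for name, branch_no in staff:
    if branch_no == "B1":
        procedural_result.append(name)

# Non-procedural style: the query says what data is needed, not how to get it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (name TEXT, branchNo TEXT)")
conn.executemany("INSERT INTO Staff VALUES (?, ?)", staff)
declarative_result = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM Staff WHERE branchNo = 'B1' ORDER BY name"
    )
]

print(procedural_result == declarative_result)
```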
Fourth-Generation Languages (4GLs)
Fourth generation languages encompass:
 presentation languages, such as query languages and report generators;
 speciality languages, such as spreadsheets and database languages;
 application generators that define, insert, update, and retrieve data from the
database to build applications;
 very high-level languages that are used to generate application code.
 SQL and QBE, mentioned above, are examples of 4GLs.
 We now briefly discuss some of the other types of 4GL.
▪ Forms generators
▪ Report generators
▪ Graphics generators
▪ Application generators
Data Models and Conceptual Modeling
DATA MODEL:
An integrated collection of concepts for describing and manipulating data,
relationships between data, and constraints on the data in an organization.
Data model comprises:
(1) a structural part, consisting of a set of rules according to which
databases can be constructed
(2) a manipulative part, defining the types of operation that are allowed on
the data (this includes the operations that are used for updating or
retrieving data from the database and for changing the structure of the
database)
(3) possibly a set of integrity constraints, which ensures that the data is
accurate.
DATA MODEL
 Purpose
 To represent data in an understandable way.
 To reflect the ANSI-SPARC architecture discussed in lecture 3, we can identify
three related data models:
 an external data model, to represent each user’s view of the organization,
sometimes called the Universe of Discourse (UoD);
 a conceptual data model, to represent the logical (or community) view that is
DBMS independent;
 an internal data model, to represent the conceptual schema in such a way that it
can be understood by the DBMS.
 Categories of data model include:
 object-based
 record-based
 physical
Object-Based Data Models
 Object-based data models use concepts such as entities, attributes, and
relationships.
 An entity is a distinct object (a person, place, thing, concept, event) in the
organization that is to be represented in the database.
 An attribute is a property that describes some aspect of the object that we wish to
record.
 A relationship is an association between entities.
 Some of the more common types of object-based data model are:
 Entity–Relationship(ER)
 Semantic
 Functional
 Object-Oriented
Record-Based Data Models
 In a record-based model, the database consists of a
number of fixed-format records possibly of differing
types. Each record type defines a fixed number of fields,
each typically of a fixed length.
 There are three principal types of record-based logical
data model:
 the relational data model,
 the network data model, and
 the hierarchical data model.
Relational data model
 The relational data model is based on the concept of mathematical relations.
 In the relational model, data and relationships are represented as tables,
each of which has a number of columns with a unique name. Figure 4.1 is a
sample instance of a relational schema for part of the DreamHome case study,
showing branch and staff details.

Figure 4.1: A sample instance of a relational schema
Network data model
 In the network model, data is represented as collections of records, and
relationships are represented by sets.
 Compared with the relational model, relationships are explicitly modeled by
the sets, which become pointers in the implementation. The records are
organized as generalized graph structures with records appearing as nodes
(also called segments) and sets as edges in the graph.
 Figure 4.2 illustrates an instance of a network schema for the same data set
presented in Figure 4.1. The most popular network DBMS is Computer
Associates’ IDMS/R.

Figure 4.2: A sample instance of a network schema.


Hierarchical data model
 The hierarchical model is a restricted type of network
model.
 Again, data is represented as collections of records and
relationships are represented by sets. However, the
hierarchical model allows a node to have only one parent.
A hierarchical model can be represented as a tree graph,
with records appearing as nodes (also called segments)
and sets as edges.
 Figure 4.3 illustrates an instance of a hierarchical schema
for the same data set presented in Figure 4.1. The main
hierarchical DBMS is IBM’s IMS, although IMS also provides
non-hierarchical features.
Figure 4.3: A sample instance of a hierarchical schema.
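The structural difference between the network and hierarchical models (a node may have only one parent in a hierarchy) can be sketched in a few lines of Python. The record names and the edge-list representation are assumptions made for illustration, and the check looks only at the one-parent rule; it does not verify that the structure is free of cycles.

```python
from collections import defaultdict

def parents_of(edges):
    """Map each child record to the set of its parent (owner) records."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return parents

def is_hierarchical(edges):
    """True if no record has more than one parent."""
    return all(len(owners) <= 1 for owners in parents_of(edges).values())

# Each staff record is owned by exactly one branch: a tree.
tree_edges = [("Branch B1", "Staff S1"), ("Branch B1", "Staff S2")]

# Staff S1 now also belongs to a project set: a general graph.
network_edges = tree_edges + [("Project P1", "Staff S1")]

print(is_hierarchical(tree_edges), is_hierarchical(network_edges))
```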
Physical Data Models
 Physical data models describe how data is stored in the
computer, representing information such as record
structures, record orderings, and access paths.
 There are not as many physical data models as logical
data models, the most common ones being the unifying
model and the frame memory.
Conceptual Modeling
 The conceptual schema is the core of a system supporting all
user views.
 It should be a complete and accurate representation of an
organization’s data requirements.
 Conceptual modeling is the process of developing a model of
information use that is independent of implementation
details.
 The result is a conceptual data model.
Functions of DBMS
(1) Data storage, retrieval, and update
 A DBMS must furnish users with the ability to store, retrieve, and update data in the
database.
(2) A user-accessible catalog
 A DBMS must furnish a catalog in which descriptions of data items are stored and which is
accessible to users.
 A key feature of the ANSI-SPARC architecture is the recognition of an integrated system catalog to
hold data about the schemas, users, applications, and so on. The catalog is expected to be
accessible to users as well as to the DBMS. A system catalog, or data dictionary, is a repository of
information describing the data in the database: that is, the ‘data about the data’ or metadata.
The amount of information and the way the information is used vary with the DBMS. Typically,
the system catalog stores:
 names, types, and sizes of data items;
 names of relationships;
 integrity constraints on the data;
 names of authorized users who have access to the data;
 the data items that each user can access and the types of access allowed; for example, insert, update,
delete, or read access;
 external, conceptual, and internal schemas and the mappings between the schemas;
 usage statistics, such as the frequencies of transactions and counts on the number of accesses made to
objects in the database.
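As one concrete example of a user-accessible catalog, SQLite exposes its metadata through the sqlite_master table and the table_info pragma. The sketch below uses Python's built-in sqlite3 module and a hypothetical Staff table. Other DBMSs expose their catalogs differently, and SQLite's catalog holds only part of the information listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, name TEXT NOT NULL, salary REAL)"
)

# Names of the tables recorded in the catalog.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Names and declared types of the data items in one table.
# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Staff)")]

print(tables, columns)
```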
 Some benefits of a system catalog are:
 Information about data can be collected and stored centrally. This helps to
maintain control over the data as a resource.
 The meaning of data can be defined, which will help other users understand the
purpose of the data.
 Communication is simplified, since exact meanings are stored. The system catalog
may also identify the user or users who own or access the data.
 Redundancy and inconsistencies can be identified more easily since the data is
centralized.
 Changes to the database can be recorded.
 The impact of a change can be determined before it is implemented, since the
system catalog records each data item, all its relationships, and all its users.
 Security can be enforced.
 Integrity can be ensured.
 Audit information can be provided.
(3) Transaction support
A DBMS must furnish a mechanism which will ensure either that all the updates
corresponding to a given transaction are made or that none of them is made.
(4) Concurrency control services
A DBMS must furnish a mechanism to ensure that the database is updated
correctly when multiple users are updating the database concurrently.

Figure 4.4: The lost update problem
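The lost update problem can be reproduced with a small simulation. The starting balance and the two transactions (T1 withdraws 10, T2 deposits 100) are hypothetical values. Both transactions read the balance before either writes, so the second write silently overwrites the first.

```python
# Uncontrolled interleaving: both transactions read before either writes.
balance = 100
t1_read = balance          # T1 reads 100
t2_read = balance          # T2 reads 100
balance = t1_read - 10     # T1 writes 90
balance = t2_read + 100    # T2 writes 200, overwriting T1's update
lost_update_balance = balance

# Serial execution: T2 starts only after T1 has finished.
balance = 100
balance = balance - 10     # T1 runs to completion
balance = balance + 100    # T2 sees T1's result
serial_balance = balance

print(lost_update_balance, serial_balance)
```

A concurrency control mechanism has to make any permitted interleaving produce the same result as some serial execution.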


(5) Recovery services
A DBMS must furnish a mechanism for recovering the database in the event
that the database is damaged in any way.
(6) Authorization services
A DBMS must furnish a mechanism to ensure that only authorized users can access
the database.
(7) Support for data communication
A DBMS must be capable of integrating with communication software.
(8) Integrity services
A DBMS must furnish a means to ensure that both the data in the database and
changes to the data follow certain rules.
(9) Services to promote data independence
A DBMS must include facilities to support the independence of programs from the
actual structure of the database.
(10) Utility services
A DBMS should provide a set of utility services.
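Transaction support (function 3) and integrity services (function 8) can be seen working together in a minimal sketch with Python's built-in sqlite3 module. The Account table, the CHECK rule, and the transfer function are hypothetical. Used as a context manager, the connection commits if the block succeeds and rolls back if it raises, so either both updates are made or neither is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO Account VALUES ('A', 50), ('B', 0)")
conn.commit()

def transfer(amount):
    """Move amount from account A to account B as one transaction."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "UPDATE Account SET balance = balance + ? WHERE id = 'B'", (amount,)
            )
            conn.execute(
                "UPDATE Account SET balance = balance - ? WHERE id = 'A'", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK rule rejected a negative balance; the credit to B is undone too.
        return False

ok = transfer(30)       # succeeds
failed = transfer(100)  # would overdraw A, so nothing is changed
balances = dict(conn.execute("SELECT id, balance FROM Account"))
print(ok, failed, balances)
```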
MULTI USER DBMS ARCHITECTURES
 Common architectures that are used to
implement multi-user database management
systems are as follows:
 Teleprocessing
 File-server
 Client–server
Teleprocessing
 Traditional architecture.
 Single mainframe with a number of terminals attached.
 The trend is now towards downsizing.

Figure 5.1: Teleprocessing Topology


File-Server Architecture
 File-server is connected to several workstations across a network.
 Database resides on file-server.
 DBMS and applications run on each workstation.

Figure 5.2: File-server architecture


 Example: Consider a user request that requires the names of staff who work
in the branch at 163 Main St. We can express this request in SQL:

 As the file-server has no knowledge of SQL, the DBMS has to request the files
corresponding to the Branch and Staff relations from the file-server, rather
than just the staff names that satisfy the query.
 The file-server architecture, therefore, has three main disadvantages:
 There is a large amount of network traffic.
 A full copy of the DBMS is required on each workstation.
 Concurrency, recovery, and integrity control are more complex because there can
be multiple DBMSs accessing the same files.
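The request for the names of staff at 163 Main St, and the traffic difference it implies, can be sketched with Python's built-in sqlite3 module. The column names (branchNo, street, fName, lName) and the sample rows are assumptions, not taken from the slides. The sketch compares the number of rows a file-server would have to ship (every row of both relations) with the number of rows in the query answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, street TEXT);
    CREATE TABLE Staff (
        staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT,
        branchNo TEXT REFERENCES Branch(branchNo)
    );
    INSERT INTO Branch VALUES ('B1', '163 Main St'), ('B2', '22 Deer Rd');
    INSERT INTO Staff VALUES
        ('S1', 'Ann', 'Beech', 'B1'),
        ('S2', 'John', 'White', 'B2'),
        ('S3', 'Susan', 'Brand', 'B1');
""")

# One possible SQL formulation of the request.
query = """
    SELECT s.fName, s.lName
    FROM Branch b JOIN Staff s ON b.branchNo = s.branchNo
    WHERE b.street = '163 Main St'
    ORDER BY s.staffNo
"""
names = conn.execute(query).fetchall()

# File-server: whole Branch and Staff files cross the network.
rows_shipped_file_server = (
    conn.execute("SELECT COUNT(*) FROM Branch").fetchone()[0]
    + conn.execute("SELECT COUNT(*) FROM Staff").fetchone()[0]
)
# Server-side query processing: only the answer crosses the network.
rows_shipped_query_only = len(names)

print(names, rows_shipped_file_server, rows_shipped_query_only)
```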
Traditional Two-Tier Client–Server Architecture
 Client (tier 1) manages user interface and runs applications.
 Server (tier 2) holds database and DBMS.

Figure 5.3: Client-server architecture


Figure 5.4: Alternative client–server topologies: (a) single client, single server; (b) multiple clients, single server; (c) multiple clients, multiple servers.
 Data-intensive business applications consist of four major components: the
database, the transaction logic, the business and data application logic, and
the user interface.
 The traditional two-tier client–server architecture provides a very basic
separation of these components.
 The client (tier 1) is primarily responsible for the presentation of data to the
user, and the server (tier 2) is primarily responsible for supplying data
services to the client, as illustrated in Figure 5.5.

Figure 5.5: The traditional two-tier client–server architecture


 There are many advantages to this type of architecture.
For example:
 Wider access to existing databases
 Increased performance
 Possible reduction in hardware cost
 Reduction in communication cost
 Increased consistency
 It maps onto open systems architecture quite
naturally.
Three-Tier Client–Server Architecture
 Client side presented two problems that prevented true scalability:
 A ‘fat’ client, requiring considerable resources on the client’s computer to run
effectively. This includes disk space, RAM, and CPU power.
 A significant client-side administration overhead.
 By 1995, three layers were proposed, each potentially running on a different
platform.
 The user interface layer, which runs on the end-user’s computer (the client).
 The business logic and data processing layer. This middle tier runs on a server and
is often called the application server.
 A DBMS, which stores the data required by the middle tier. This tier may run on a
separate server called the database server.
 As illustrated in Figure 5.6 the client is now responsible only for the application’s
user interface and perhaps performing some simple logic processing, such as
input validation, thereby providing a ‘thin’ client. The core business logic of the
application now resides in its own layer, physically connected to the client and
database server over a local area network (LAN) or wide area network (WAN).
One application server is designed to serve multiple clients.

Figure 5.6: The three-tier architecture
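The division of responsibilities among the three tiers can be sketched in a single Python file. In a real deployment each tier would run as a separate process, possibly on a separate machine; here they are plain functions, and the Staff table and the salary-raise rule are hypothetical.

```python
import sqlite3

# Tier 3: database server, which stores the data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Staff (name TEXT, salary REAL)")
db.executemany("INSERT INTO Staff VALUES (?, ?)", [("Ann", 30000.0), ("Bob", 24000.0)])

# Tier 2: application server, which holds the business logic.
def staff_with_raise(percent):
    """Return (name, new salary) pairs after applying a percentage raise."""
    rows = db.execute("SELECT name, salary FROM Staff ORDER BY name").fetchall()
    return [(name, salary * (1 + percent / 100)) for name, salary in rows]

# Tier 1: thin client, which does input validation and presentation only.
def render(percent_text):
    if not percent_text.isdigit():
        return "Please enter a whole number."
    lines = [f"{name}: {salary:.2f}" for name, salary in staff_with_raise(int(percent_text))]
    return "\n".join(lines)

output = render("10")
print(output)
```

Because the client never touches SQL and the database never sees presentation code, either can be replaced without changing the other.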
 Advantages:
 ‘Thin’ client requiring less expensive hardware
 Application maintenance centralized
 Easier to modify or replace one tier without affecting others
 Separating business logic from database functions makes it easier
to implement load balancing.
 Maps quite naturally to web environment.
n-Tier Client-Server (e.g. 4-Tier)
 The three-tier architecture can be expanded to n tiers, with
additional tiers providing more flexibility and scalability.
 Application servers host APIs to expose business logic and business
processes for use by other applications.

Figure 5.7: Four-tier architecture with the middle tier split into a Web server and application server
Middleware
 Middleware is a generic term used to describe software that
mediates with other software and allows for communication
between disparate applications in a heterogeneous system.
 The need for middleware arises when distributed systems become
too complex to manage efficiently without a common interface.
 Six main types of middleware are as follows:
 Asynchronous Remote Procedure Call (RPC)
 Synchronous RPC
 Publish/subscribe
 Message-oriented middleware (MOM)
 Object-request broker (ORB)
 SQL-oriented data access
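Of the types listed, publish/subscribe is perhaps the easiest to sketch. The in-process Broker class below is a hypothetical, minimal illustration of the idea: publishers and subscribers know only a topic name, not each other. Real middleware adds networking, queuing, and delivery guarantees, none of which is shown here.

```python
from collections import defaultdict

class Broker:
    """A toy in-process publish/subscribe broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to be invoked for every message on topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to all subscribers of topic; return how many received it."""
        callbacks = self._subscribers[topic]
        for callback in callbacks:
            callback(message)
        return len(callbacks)

broker = Broker()
received = []
broker.subscribe("orders", received.append)

delivered = broker.publish("orders", {"orderNo": 1})
ignored = broker.publish("invoices", {"invoiceNo": 7})  # no subscribers
print(delivered, ignored, received)
```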
Transaction Processing Monitors
 TP monitor is a program that controls data transfer between
clients and servers in order to provide a consistent
environment, particularly for online transaction processing
(OLTP).
Figure 5.8: Transaction Processing Monitor as middle tier of 3-tier client-server
TP Monitors provide significant advantages, including:

 Transaction routing
 Managing distributed transactions
 Load balancing
 Funneling
 Increased reliability
Web Services and Service-Oriented Architectures

 A Web service is a software system designed to support interoperable
machine-to-machine interaction over a network.
 Web services share business logic, data, and processes through
a programmatic interface across a network.
 Developers can add the Web service to a Web page (or an
executable program) to offer specific functionality to users.
 Examples of Web services include:
 Microsoft Bing Maps and Google Maps
 Amazon Simple Storage Service
 Geonames
 DOTS
 Xignite

➢ The Web services approach uses accepted technologies
and standards, such as:
➢ XML (eXtensible Markup Language).
➢ SOAP (Simple Object Access Protocol) is a communication protocol for
exchanging structured information over the Internet and uses a
message format based on XML. It is both platform- and
language-independent.
➢ WSDL (Web Services Description Language) protocol, again based on
XML, is used to describe and locate a Web service.
➢ UDDI (Universal Description, Discovery, and Integration) protocol is a
platform-independent, XML-based registry for businesses to list
themselves on the Internet.
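To make the XML-based message format concrete, the sketch below builds and re-parses a SOAP 1.1 envelope with Python's standard xml.etree.ElementTree module. The operation name GetStaffByBranch and its branchNo parameter are hypothetical; a real service would define them in its WSDL, and the message would be sent over a transport such as HTTP, which is not shown.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_NS)

def build_request(operation, params):
    """Wrap an operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

request_xml = build_request("GetStaffByBranch", {"branchNo": "B1"})

# The receiver parses the same text back into a tree and reads the parameter.
parsed = ET.fromstring(request_xml)
branch_no = parsed.find(
    "soap:Body/GetStaffByBranch/branchNo", {"soap": SOAP_NS}
).text
print(request_xml)
print(branch_no)
```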

Figure 6.1: Relationship between WSDL, UDDI, and SOAP



Service-Oriented Architectures (SOA)


 A business-centric software architecture for building applications that
implement business processes as sets of services published at a granularity
relevant to the service consumer. Services can be invoked, published, and
discovered, and are abstracted away from the implementation using a single
standards-based form of interface.
 The following are a set of common SOA principles that provide a unique design
approach for building Web services for SOA:
 Loose coupling
 Reusability
 Contract
 Abstraction
 Compatibility
 Autonomy
 Stateless
 Discoverability

Distributed DBMSs
➢ A distributed database is a logically interrelated collection of
shared data (and a description of this data), physically
distributed over a computer network.
➢ A distributed DBMS is the software system that permits the
management of the distributed database and makes the
distribution transparent to users.
➢ A DDBMS consists of a single logical database split into a number of
fragments.
➢ Each fragment is stored on one or more computers (replicas) under
the control of a separate DBMS, with the computers connected by a
network.
➢ Each site is capable of independently processing user requests that
require access to local data (that is, each site has some degree of local
autonomy) and is also capable of processing data stored on other
computers in the network.
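The idea of fragments stored at different sites, with local and global requests, can be sketched as follows. The site names, the staff rows, and the choice to fragment by branch are hypothetical, and plain dictionaries stand in for the separate DBMSs and the network between them.

```python
# One logical Staff relation, split into fragments held at two sites.
fragments = {
    "London": [
        {"staffNo": "S1", "branchNo": "B1"},
        {"staffNo": "S3", "branchNo": "B1"},
    ],
    "Glasgow": [
        {"staffNo": "S2", "branchNo": "B2"},
    ],
}

def local_query(site, branch_no):
    """A request answered entirely from the data stored at one site."""
    return [row["staffNo"] for row in fragments[site] if row["branchNo"] == branch_no]

def global_query(branch_no):
    """A request answered by combining results from every site."""
    result = []
    for site in fragments:
        result.extend(local_query(site, branch_no))
    return sorted(result)

local_staff = local_query("London", "B1")
all_b2_staff = global_query("B2")
print(local_staff, all_b2_staff)
```

A user calling global_query does not need to know which site holds which rows, which is the transparency a DDBMS aims to provide.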
Figure 6.2: Distributed database management system
Distributed Processing
 A centralized database that can be accessed over a computer network.

Figure 6.3: Distributed Processing
Data Warehousing
➢ A data warehouse was deemed the solution to meet the requirements
of a system capable of supporting decision making, receiving data
from multiple operational data sources.
➢ The data held in the data warehouse is described as being
subject-oriented, integrated, time-variant and nonvolatile.

Figure 6.5: The typical architecture of a data warehouse
Cloud Computing
 The National Institute of Standards and
Technology (NIST) provided a definition.
 Defined as “A model for enabling
ubiquitous, convenient, on-demand
network access to a shared pool of
configurable computing resources (e.g.
networks, servers, storage, applications,
and services) that can be rapidly
provisioned and released with minimal
management effort or service provider
interaction”.
Cloud Computing – Key Characteristics
 On-demand self-service
 Consumers can obtain, configure and deploy
cloud services without help from provider.

 Broad network access


 Accessible from anywhere, from any standardized
platform (e.g. desktop computers, laptops, mobile
devices).
 Resource pooling
 Provider’s computing resources are pooled to serve
multiple consumers, with different physical and
virtual resources dynamically assigned and
reassigned according to consumer demand.
Examples of resources include storage, processing,
memory, and network bandwidth.
Cloud Computing – Key Characteristics (contd..)
 Rapid elasticity
 Provider’s capacity caters for customer’s spikes in
demand and reduces risk of outages and service
interruptions. Capacity can be automated to scale
rapidly based on demand.
 Measured service
 Provider uses a metering capability to measure
usage of service (e.g. storage, processing,
bandwidth, and active user accounts).

Cloud Computing – Service Models
 Software as a Service (SaaS):
 Software and data hosted on cloud. Accessed
through a thin client interface (e.g. web
browser). Consumer may be offered limited
user-specific application configuration settings.
 Examples include Salesforce.com sales management
applications, NetSuite’s integrated business
management software, Google’s Gmail and
Cornerstone OnDemand.

 Platform as a Service (PaaS)
 Allows creation of web applications without
buying/maintaining the software and underlying
infrastructure. Provider manages the infrastructure
including network, servers, OS and storage, while
customer controls deployment of applications and
possibly configuration.
 Examples include Salesforce.com’s Force.com,
Google’s App Engine, and Microsoft’s Azure.

 Infrastructure as a Service (IaaS)
 Providers offer servers, storage, network and
operating systems – typically a platform
virtualization environment – to consumers as an
on-demand service, in a single bundle and billed
according to usage.
 A popular use of IaaS is in hosting websites.
Examples include Amazon’s Elastic Compute Cloud (EC2),
Rackspace and GoGrid.
Cloud Computing – Comparison of Service Models
Four main deployment models
for the cloud
 Public cloud
 Private cloud
 Community cloud
 Hybrid cloud
Benefits of Cloud Computing
 Cost-Reduction: Avoid up-front capital expenditure.
 Scalability/Agility: Organisations set up resources on an
as-needs basis.
 Improved Security: Providers can devote expertise &
resources to security; not affordable by customer.
 Improved Reliability: Providers can devote expertise &
resources on reliability of systems; not affordable by
customer.
 Access to new technologies: Through use of provider’s
systems, customers may access latest technology.
 Faster development: Provider’s platforms can provide
many of the core services to accelerate development
cycle.

 Large scale prototyping/load testing: Providers
have the resources to enable this.
 More flexible working practices: Staff can access
files using mobile devices.
 Increased competitiveness: Allows organizations to
focus on their core competencies rather than their IT
infrastructures.

Risks of Cloud Computing
 Network Dependency: Power outages, bandwidth
issues and service interruptions.
 System Dependency: Customer’s dependency on
availability and reliability of provider’s systems.
 Cloud Provider Dependency: Provider could become
insolvent or be acquired by a competitor, resulting in
the service suddenly terminating.
 Lack of control: Customers unable to deploy
technical or organisational measures to safeguard
the data. May result in reduced availability,
integrity, confidentiality, intervenability and
isolation.
 Lack of information on processing transparency:
Insufficient information about a cloud service’s
processing operations poses a risk to data controllers
as well as to data subjects because they might not be
aware of potential threats and risks and thus cannot
take measures they deem appropriate.
Cloud-based database solutions

 As a type of Software as a Service (SaaS),
cloud-based database solutions fall into
two basic categories:
 Data as a Service (DaaS) and Database
as a Service (DBaaS).
 The key difference between the two options is
mainly how the data is managed.

➢ DBaaS
Offers full database functionality to
application developers.
Provides a management layer that offers
continuous monitoring and configuring of the
database to achieve optimized scaling, high
availability, multi-tenancy (that is, serving
multiple client organizations), and effective
resource allocation in the cloud, thereby
sparing the developer from ongoing database
administration tasks.
➢ DaaS
The service enables data definition in the
cloud and subsequent querying.
Does not implement typical DBMS
interfaces (e.g. SQL); instead, data is
accessed via common APIs.
Enables an organization with valuable data to
offer access to others. Examples include Urban
Mapping (geography data service), Xignite
(financial data service), and Hoovers
(business data service).
Components of a DBMS
 A DBMS is partitioned into several software
components (or modules), each of which is
assigned a specific operation. As stated
previously, some of the functions of the DBMS
are supported by the underlying operating
system.
 The DBMS interfaces with other software
components, such as user queries and access
methods (file management techniques for
storing and retrieving data records).
 Query processor is a major DBMS component
that transforms queries into a series of low-
level instructions directed to the database
manager.
 Database manager (DM) interfaces with
user-submitted application programs and
queries. The DM examines the external and
conceptual schemas to determine what
conceptual records are required to satisfy
the request. The DM then places a call to the
file manager to perform the request.
Components of a DBMS (Contd..)
➢ File manager manipulates the underlying storage files and manages the allocation of storage space on disk. It establishes and maintains the list of structures and indexes defined in the internal schema.
➢ DML preprocessor converts DML statements embedded in an application program into standard function calls in the host language. The DML preprocessor must interact with the query processor to generate the appropriate code.
Components of a DBMS (Contd..)
➢ DDL compiler converts DDL statements into a set of tables containing metadata. These tables are then stored in the system catalog while control information is stored in data file headers.
➢ Catalog manager manages access to and maintains the system catalog. The system catalog is accessed by most DBMS components.
Components of Database Manager (DM)
Components of the Database Manager
➢ Authorization control confirms whether the user has the necessary permission to carry out the required operation.
➢ Command processor receives control once the user's authority has been confirmed.
➢ Integrity checker ensures, for an operation that changes the database, that the requested operation satisfies all necessary integrity constraints (e.g. key constraints).
Components of the Database Manager (Contd..)
➢ Query optimizer determines an optimal strategy for the query execution.
➢ Transaction manager performs the required processing of operations that it receives from transactions.
➢ Scheduler ensures that concurrent operations on the database proceed without conflicting with one another. It controls the relative order in which transaction operations are executed.
Components of the Database Manager (Contd..)
➢ Recovery manager ensures that the database remains in a consistent state in the presence of failures. It is responsible for transaction commit and abort.
➢ Buffer manager is responsible for the transfer of data between main memory and secondary storage, such as disk and tape.
➢ The recovery manager and the buffer manager are together also known as the data manager. The buffer manager is also called the cache manager.
The Relational Model
Relational Model Terminology
◆ A relation is a table with columns and rows.
– Only applies to logical structure of the database, not the physical structure.
◆ Attribute is a named column of a relation.
◆ Domain is the set of allowable values for one or more attributes.
Relational Model Terminology
◆ Tuple is a row of a relation.
◆ Degree is the number of attributes in a relation.
◆ Cardinality is the number of tuples in a relation.
◆ Relational Database is a collection of normalized relations with distinct relation names.
Instances of Branch and Staff Relations
Examples of Attribute Domains
Alternative Terminology for Relational Model
Mathematical Definition of Relation
◆ Consider two sets, D1 & D2, where D1 = {2, 4} and D2 = {1, 3, 5}.
◆ Cartesian product, D1 × D2, is set of all ordered pairs, where first element is member of D1 and second element is member of D2.
D1 × D2 = {(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)}
◆ Alternative way is to find all combinations of elements with first from D1 and second from D2.
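The Cartesian product above can be reproduced mechanically. The following is a minimal sketch in Python, assuming the two sets are small enough to enumerate; the variable name `cartesian` is illustrative only.

```python
from itertools import product

D1 = {2, 4}
D2 = {1, 3, 5}

# Cartesian product: every ordered pair with first element from D1
# and second element from D2.
cartesian = set(product(D1, D2))

print(sorted(cartesian))
# [(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)]
```

The number of pairs is always the product of the two set sizes, here 2 × 3 = 6.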
Mathematical Definition of Relation
◆ Any subset of Cartesian product is a relation; e.g.
R = {(2, 1), (4, 1)}
◆ May specify which pairs are in relation using some condition for selection; e.g.
– second element is 1:
R = {(x, y) | x ∈ D1, y ∈ D2, and y = 1}
– first element is always twice the second:
S = {(x, y) | x ∈ D1, y ∈ D2, and x = 2y}
Mathematical Definition of Relation
◆ Consider three sets D1, D2, D3 with Cartesian Product D1 × D2 × D3; e.g.
D1 = {1, 3} D2 = {2, 4} D3 = {5, 6}
D1 × D2 × D3 = {(1,2,5), (1,2,6), (1,4,5), (1,4,6), (3,2,5), (3,2,6), (3,4,5), (3,4,6)}
◆ Any subset of these ordered triples is a relation.
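A relation chosen by a condition is simply a filtered Cartesian product. The sketch below, in Python, rebuilds the relations R and S defined over D1 = {2, 4} and D2 = {1, 3, 5}, then picks a subset of a three-set product. The names E1, E2, E3 and the even-sum predicate are hypothetical, chosen only for illustration.

```python
from itertools import product

D1 = {2, 4}
D2 = {1, 3, 5}

# R: pairs whose second element is 1.
R = {(x, y) for x, y in product(D1, D2) if y == 1}
# S: pairs whose first element is twice the second.
S = {(x, y) for x, y in product(D1, D2) if x == 2 * y}

# Three sets: any subset of the ordered triples is also a relation.
E1, E2, E3 = {1, 3}, {2, 4}, {5, 6}
triples = set(product(E1, E2, E3))
# Hypothetical predicate: keep the triples whose elements sum to an even number.
T = {t for t in triples if sum(t) % 2 == 0}
```

With these sets, R contains (2, 1) and (4, 1), while S contains only (2, 1), because 4 is not twice any member of D2.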
Mathematical Definition of Relation
◆ Cartesian product of n sets (D1, D2, . . ., Dn) is:
D1 × D2 × . . . × Dn = {(d1, d2, . . . , dn) | d1 ∈ D1, d2 ∈ D2, . . . , dn ∈ Dn}
usually written as:
×_{i=1}^{n} Di
◆ Any set of n-tuples from this Cartesian product is a relation on the n sets.
Database Relations
◆ Relation schema
– Named relation defined by a set of attribute
and domain name pairs.

◆ Relational database schema


– Set of relation schemas, each with a distinct
name.
Properties of Relations
◆ Relation name is distinct from all other relation names in relational schema.
◆ Each cell of relation contains exactly one atomic (single) value.
◆ Each attribute has a distinct name.
◆ Values of an attribute are all from the same domain.
Properties of Relations
◆Each tuple is distinct; there are no duplicate
tuples.

◆ Order of attributes has no significance.

◆ Order of tuples has no significance, theoretically.


Relational Keys
◆ Superkey
– An attribute, or set of attributes, that
uniquely identifies a tuple within a
relation.

◆ Candidate Key
– Superkey (K) such that no proper subset is a
superkey within the relation.
– In each tuple of R, values of K uniquely identify
that tuple (uniqueness).
– No proper subset of K has the uniqueness
property (irreducibility).
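The uniqueness and irreducibility properties can be checked against sample data. The sketch below is in Python; the sample tuples and the helper names `is_superkey` and `is_candidate_key` are assumptions made for illustration. Note that a key is a property of the relation schema, so a single instance can only refute a key, never prove one.

```python
from itertools import combinations

# Hypothetical sample tuples; attribute names follow the Staff examples.
staff = [
    {"staffNo": "SL21", "fName": "John", "branchNo": "B005"},
    {"staffNo": "SG37", "fName": "Ann", "branchNo": "B003"},
    {"staffNo": "SG14", "fName": "David", "branchNo": "B003"},
]

def is_superkey(rows, attrs):
    """Uniqueness: no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def is_candidate_key(rows, attrs):
    """Irreducibility: a superkey with no proper non-empty subset that is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )
```

Here (staffNo, fName) is a superkey but not a candidate key, because staffNo alone already identifies each tuple. In this small sample fName also happens to be unique, which shows why sample data alone cannot establish a key.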
Relational Keys
◆ Primary Key
– Candidate key selected to identify tuples uniquely
within relation.

◆ Alternate Keys
– Candidate keys that are not selected to be primary
key.

◆ Foreign Key
– Attribute, or set of attributes, within one relation
that matches candidate key of some (possibly same)
relation.
Integrity Constraints
◆ Null
– Represents value for an attribute that is
currently unknown or not applicable for
tuple.
– Deals with incomplete or exceptional data.
– Represents the absence of a value and is
not the same as zero or spaces, which are
values.
Integrity Constraints
◆ Entity Integrity
– In a base relation, no attribute of a
primary key can be null.

◆ Referential Integrity
– If foreign key exists in a relation, either
foreign key value must match a
candidate key value of some tuple in its
home relation or foreign key value must
be wholly null.
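Referential integrity can be observed in a running DBMS. The following is a minimal sketch using Python's built-in sqlite3 module; the two-column Branch and Staff tables are simplified assumptions, not the full schema. SQLite only enforces foreign keys when the `foreign_keys` pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT)")
conn.execute(
    "CREATE TABLE Staff ("
    " staffNo TEXT PRIMARY KEY,"
    " branchNo TEXT REFERENCES Branch(branchNo))"
)
conn.execute("INSERT INTO Branch VALUES ('B003', 'Glasgow')")

conn.execute("INSERT INTO Staff VALUES ('SG37', 'B003')")  # matches a Branch tuple
conn.execute("INSERT INTO Staff VALUES ('SG99', NULL)")    # wholly null is allowed

try:
    conn.execute("INSERT INTO Staff VALUES ('SX01', 'B999')")  # no such branch
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

staff_count = conn.execute("SELECT COUNT(*) FROM Staff").fetchone()[0]
```

The first two inserts succeed and the third is rejected, so only two Staff rows remain.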
Integrity Constraints
◆ General Constraints
– Additional rules specified by users or
database administrators that define or
constrain some aspect of the enterprise.
Views
◆ Base Relation
– Named relation corresponding to an entity
in conceptual schema, whose tuples are
physically stored in database.

◆ View
– Dynamic result of one or more relational
operations operating on base relations to
produce another relation.
Views
◆ A virtual relation that does not necessarily actually exist in the database but is produced upon request, at time of request.
◆ Contents of a view are defined as a query on one or more base relations.
◆ Views are dynamic, meaning that changes made to base relations that affect view attributes are immediately reflected in the view.
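The dynamic nature of a view can be seen with a small experiment. This sketch uses Python's sqlite3 module; the view name `Staff3` and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, lName TEXT, salary REAL, branchNo TEXT)")
conn.execute("INSERT INTO Staff VALUES ('SG37', 'Beech', 12000, 'B003')")

# The view stores only its defining query, not a copy of the rows.
conn.execute(
    "CREATE VIEW Staff3 AS "
    "SELECT staffNo, lName FROM Staff WHERE branchNo = 'B003'"
)
before = conn.execute("SELECT COUNT(*) FROM Staff3").fetchone()[0]

# A change to the base relation is visible through the view immediately.
conn.execute("INSERT INTO Staff VALUES ('SG14', 'Ford', 18000, 'B003')")
after = conn.execute("SELECT COUNT(*) FROM Staff3").fetchone()[0]

view_columns = [d[0] for d in conn.execute("SELECT * FROM Staff3").description]
```

The view also leaves out the salary column, which is the hiding behaviour that makes views useful as a security mechanism.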
Purpose of Views
◆ Provides powerful and flexible security mechanism by hiding parts of database from certain users.
◆ Permits users to access data in a customized way, so that same data can be seen by different users in different ways, at same time.
◆ Can simplify complex operations on base relations.
Updating Views
◆ All updates to a base relation should be immediately reflected in all views that reference that base relation.
◆ If view is updated, underlying base relation should reflect change.
Updating Views
◆There are restrictions on types of modifications
that can be made through views:
– Updates are allowed if query involves a single
base relation and contains a candidate key of
base relation.
– Updates are not allowed involving multiple base
relations.
– Updates are not allowed involving aggregation
or grouping operations.
Updating Views
◆ Classes of views are defined as:
– theoretically not updateable;
– theoretically updateable;
– partially updateable.
Lecture #10
Relational Algebra and Relational Calculus
Introduction
◆ Relational algebra and relational calculus are formal languages associated with the relational model.
◆ Informally, relational algebra is a (high-level) procedural language and relational calculus a non-procedural language.
◆ However, formally both are equivalent to one another.
◆ A language that can produce any relation derivable using relational calculus is relationally complete.

www.bzupages.com
Relational Algebra
◆ Relational algebra operations work on one or more relations to define another relation without changing the original relations.
◆ Both operands and results are relations, so output from one operation can become input to another operation.
◆ Allows expressions to be nested, just as in arithmetic. This property is called closure.
Relational Algebra
◆ Five basic operations in relational algebra: Selection, Projection, Cartesian product, Union, and Set Difference.
◆ These perform most of the data retrieval operations needed.
◆ Also have Join, Intersection, and Division operations, which can be expressed in terms of 5 basic operations.
Relational Algebra Operations

Selection (or Restriction)
◆ σ_{predicate}(R)
◆ Works on a single relation R and defines a relation that contains only those tuples (rows) of R that satisfy the specified condition (predicate).
Example - Selection (or Restriction)
◆ List all staff with a salary greater than £10,000.
σ_{salary > 10000}(Staff)
Projection
◆ Π_{col1, . . . , coln}(R)
◆ Works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates.
Example - Projection
◆ Produce a list of salaries for all staff, showing only staffNo, fName, lName, and salary details.
Π_{staffNo, fName, lName, salary}(Staff)
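Selection and Projection can be sketched as two small functions over a list of tuples. The following Python sketch is illustrative only: the sample rows and the helper names `select` and `project` are assumptions, and a real DBMS evaluates these operations very differently.

```python
# Hypothetical sample tuples for the Staff relation.
staff = [
    {"staffNo": "SL21", "fName": "John", "lName": "White", "salary": 30000},
    {"staffNo": "SG37", "fName": "Ann", "lName": "Beech", "salary": 12000},
    {"staffNo": "SA9", "fName": "Mary", "lName": "Howe", "salary": 9000},
]

def select(relation, predicate):
    """Selection: keep only the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attrs):
    """Projection: keep only the named attributes, eliminating duplicates."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attrs}
        if reduced not in result:
            result.append(reduced)
    return result

# Closure: the output of one operation is a valid input to the next.
well_paid = select(staff, lambda row: row["salary"] > 10000)
names = project(well_paid, ["staffNo", "lName"])
```

Because both functions take and return a relation, they nest freely, which is the closure property in practice.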
Union
◆ R ∪ S
◆ Union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated.
◆ R and S must be union-compatible.
◆ If R and S have I and J tuples, respectively, union is obtained by concatenating them into one relation with a maximum of (I + J) tuples.
Example - Union
◆ List all cities where there is either a branch office or a property for rent.
Π_{city}(Branch) ∪ Π_{city}(PropertyForRent)
Set Difference
◆ R − S
◆ Defines a relation consisting of the tuples that are in relation R, but not in S.
◆ R and S must be union-compatible.
Example - Set Difference
◆ List all cities where there is a branch office but no properties for rent.
Π_{city}(Branch) − Π_{city}(PropertyForRent)
Intersection
◆ R ∩ S
◆ Defines a relation consisting of the set of all tuples that are in both R and S.
◆ R and S must be union-compatible.
◆ Expressed using basic operations:
R ∩ S = R − (R − S)
Example - Intersection
◆ List all cities where there is both a branch office and at least one property for rent.
Π_{city}(Branch) ∩ Π_{city}(PropertyForRent)
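Union, Set Difference, and Intersection map directly onto Python set operators. The sketch below uses hypothetical city values standing in for the two projections, and checks the identity R ∩ S = R − (R − S).

```python
# Hypothetical results of projecting city from Branch and PropertyForRent.
branch_cities = {"London", "Aberdeen", "Glasgow", "Bristol"}
property_cities = {"Aberdeen", "London", "Glasgow"}

union = branch_cities | property_cities         # duplicates are eliminated
difference = branch_cities - property_cities    # branch but no property
intersection = branch_cities & property_cities  # both

# Intersection rebuilt from Set Difference alone: R ∩ S = R − (R − S).
derived = branch_cities - (branch_cities - property_cities)
```

The union holds four cities rather than seven, which illustrates why (I + J) is only an upper bound on its size.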
Cartesian product
◆ R × S
◆ Defines a relation that is the concatenation of every tuple of relation R with every tuple of relation S.
Example - Cartesian product
◆ List the names and comments of all clients who have viewed a property for rent.
(Π_{clientNo, fName, lName}(Client)) × (Π_{clientNo, propertyNo, comment}(Viewing))
Example - Cartesian product and Selection
◆ Use selection operation to extract those tuples where Client.clientNo = Viewing.clientNo.
σ_{Client.clientNo = Viewing.clientNo}((Π_{clientNo, fName, lName}(Client)) × (Π_{clientNo, propertyNo, comment}(Viewing)))
◆ Cartesian product and Selection can be reduced to a single operation called a Join.
Join Operations
◆ Join is a derivative of Cartesian product.
◆ Equivalent to performing a Selection, using join predicate as selection formula, over Cartesian product of the two operand relations.
◆ One of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have intrinsic performance problems.
Join Operations
◆ Various forms of join operation
– Theta join
– Equijoin (a particular type of Theta join)
– Natural join
– Outer join
– Semijoin
Theta join (θ-join)
◆ R ⋈_F S
◆ Defines a relation that contains tuples satisfying the predicate F from the Cartesian product of R and S.
◆ The predicate F is of the form R.ai θ S.bi where θ may be one of the comparison operators (<, ≤, >, ≥, =, ≠).
Theta join (θ-join)
◆ Can rewrite Theta join using basic Selection and Cartesian product operations.
R ⋈_F S = σ_F(R × S)
◆ Degree of a Theta join is sum of degrees of the operand relations R and S. If predicate F contains only equality (=), the term Equijoin is used.
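The rewrite of a Theta join as a Selection over a Cartesian product translates almost word for word into code. This Python sketch uses hypothetical Client and Viewing tuples, and the helper name `theta_join` is an assumption. It enumerates the full product, which is the definition rather than an efficient join algorithm.

```python
from itertools import product

# Hypothetical sample tuples: (clientNo, fName) and (clientNo, propertyNo).
client = [("CR76", "John"), ("CR56", "Aline"), ("CR74", "Mike")]
viewing = [("CR56", "PA14"), ("CR76", "PG4"), ("CR56", "PG4")]

def theta_join(r, s, predicate):
    """Concatenate each pair of tuples from r and s that satisfies predicate."""
    return [a + b for a, b in product(r, s) if predicate(a, b)]

# Equijoin on clientNo, which is at position 0 in both relations.
joined = theta_join(client, viewing, lambda c, v: c[0] == v[0])
```

Each result tuple has degree 4, the sum of the degrees of the two operands, and the client with no viewings (CR74) does not appear.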
Example - Equijoin
◆ List the names and comments of all clients who have viewed a property for rent.
(Π_{clientNo, fName, lName}(Client)) ⋈_{Client.clientNo = Viewing.clientNo} (Π_{clientNo, propertyNo, comment}(Viewing))
Natural join
◆ R ⋈ S
◆ An Equijoin of the two relations R and S over all common attributes x. One occurrence of each common attribute is eliminated from the result.
Example - Natural join
◆ List the names and comments of all clients who have viewed a property for rent.
(Π_{clientNo, fName, lName}(Client)) ⋈ (Π_{clientNo, propertyNo, comment}(Viewing))
Outer join
◆ To display rows in the result that do not have matching values in the join column, use Outer join.
◆ R ⟕ S
◆ (Left) outer join is join in which tuples from R that do not have matching values in common columns of S are also included in result relation.
Example - Left Outer join
◆ Produce a status report on property viewings.
Π_{propertyNo, street, city}(PropertyForRent) ⟕ Viewing
Semijoin
◆ R ⋉_F S
◆ Defines a relation that contains the tuples of R that participate in the join of R with S.
◆ Can rewrite Semijoin using Projection and Join, where A is the set of all attributes of R:
R ⋉_F S = Π_A(R ⋈_F S)
Example - Semijoin
◆ List complete details of all staff who work at the branch in Glasgow.
Staff ⋉_{Staff.branchNo = Branch.branchNo} (σ_{city = 'Glasgow'}(Branch))
Lecture #11
Relational algebra and Relational Calculus (contd…)
Division
◆ R ÷ S
◆ Assume R is defined over attribute set A and S over attribute set B, with B ⊆ A, and let C = A − B (the attributes of R that are not attributes of S).
◆ Defines a relation over the attributes C that consists of the set of tuples from R that match the combination of every tuple in S.
◆ Expressed using basic operations:
T1 ← Π_C(R)
T2 ← Π_C((S × T1) − R)
T ← T1 − T2
Example - Division
◆ Identify all clients who have viewed all properties with three rooms.
(Π_{clientNo, propertyNo}(Viewing)) ÷ (Π_{propertyNo}(σ_{rooms = 3}(PropertyForRent)))
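The three-step definition of Division can be traced on a small data set. In this Python sketch the sample pairs and the function name `divide` are assumptions, and the code assumes that C is a single attribute stored first in each tuple of R. The product is formed as T1 × S so that attribute order matches R.

```python
# Hypothetical sample data: (clientNo, propertyNo) pairs and three-room properties.
viewing = {("CR56", "PA14"), ("CR76", "PG4"), ("CR56", "PG4"),
           ("CR62", "PA14"), ("CR56", "PG36")}
three_rooms = {("PG4",), ("PG36",)}

def divide(r, s):
    """R ÷ S for R(C, B) and S(B), following the T1, T2, T steps."""
    t1 = {row[:1] for row in r}                   # T1 ← Π_C(R)
    candidates = {c + b for c in t1 for b in s}   # every pairing that would have to exist
    t2 = {row[:1] for row in candidates - r}      # T2: clients missing at least one pairing
    return t1 - t2                                # T ← T1 − T2

clients = divide(viewing, three_rooms)
```

Only CR56 has viewed both PG4 and PG36, so it is the only client left after the clients with a missing pairing are subtracted.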
Aggregate Operations
◆ ℑ_{AL}(R)
◆ Applies aggregate function list, AL, to R to define a relation over the aggregate list.
◆ AL contains one or more (<aggregate_function>, <attribute>) pairs.
◆ Main aggregate functions are: COUNT, SUM, AVG, MIN, and MAX.
Example – Aggregate Operations
◆ How many properties cost more than £350 per month to rent?
ρ_R(myCount) ℑ_{COUNT propertyNo}(σ_{rent > 350}(PropertyForRent))
Grouping Operation
◆ _{GA}ℑ_{AL}(R)
◆ Groups tuples of R by grouping attributes, GA, and then applies aggregate function list, AL, to define a new relation.
◆ AL contains one or more (<aggregate_function>, <attribute>) pairs.
◆ Resulting relation contains the grouping attributes, GA, along with results of each of the aggregate functions.
Example – Grouping Operation
◆ Find the number of staff working in each branch and the sum of their salaries.
ρ_R(branchNo, myCount, mySum) _{branchNo}ℑ_{COUNT staffNo, SUM salary}(Staff)
Relational Calculus
◆ Relational calculus query specifies what is to be retrieved rather than how to retrieve it.
– No description of how to evaluate a query.
◆ In first-order logic (or predicate calculus), predicate is a truth-valued function with arguments.
◆ When we substitute values for the arguments, function yields an expression, called a proposition, which can be either true or false.
Relational Calculus
◆ If predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x.
◆ When we substitute some values of this range for x, proposition may be true; for other values, it may be false.
◆ When applied to databases, relational calculus takes two forms: tuple and domain.
Tuple Relational Calculus
◆ Interested in finding tuples for which a predicate is true. Based on use of tuple variables.
◆ Tuple variable is a variable that 'ranges over' a named relation: i.e., variable whose only permitted values are tuples of the relation.
◆ Specify range of a tuple variable S as the Staff relation as:
Staff(S)
◆ To find set of all tuples S such that P(S) is true:
{S | P(S)}
Tuple Relational Calculus - Example
◆ To find details of all staff earning more than £10,000:
{S | Staff(S) ∧ S.salary > 10000}
◆ To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
Tuple Relational Calculus
◆ Can use two quantifiers to tell how many instances the predicate applies to:
– Existential quantifier ∃ ('there exists')
– Universal quantifier ∀ ('for all')
◆ Tuple variables qualified by ∀ or ∃ are called bound variables, otherwise called free variables.
Tuple Relational Calculus
◆ Existential quantifier used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = 'London')
◆ Means 'There exists a Branch tuple with same branchNo as the branchNo of the current Staff tuple, S, and is located in London'.
Tuple Relational Calculus
◆ Universal quantifier is used in statements about every instance, such as:
(∀B) (B.city ≠ 'Paris')
◆ Means 'For all Branch tuples, the address is not in Paris'.
◆ Can also use ~(∃B) (B.city = 'Paris') which means 'There are no branches with an address in Paris'.
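Tuple relational calculus expressions read much like comprehensions, with `any` playing the role of ∃ and `all` the role of ∀. The Python sketch below uses hypothetical Staff and Branch tuples and is meant only to show the correspondence, not how a DBMS evaluates such queries.

```python
# Hypothetical sample tuples.
staff = [
    {"staffNo": "SL21", "salary": 30000, "branchNo": "B005"},
    {"staffNo": "SG37", "salary": 12000, "branchNo": "B003"},
    {"staffNo": "SA9", "salary": 9000, "branchNo": "B007"},
]
branch = [
    {"branchNo": "B005", "city": "London"},
    {"branchNo": "B003", "city": "Glasgow"},
    {"branchNo": "B007", "city": "Aberdeen"},
]

# {S | Staff(S) ∧ S.salary > 10000}
well_paid = [s for s in staff if s["salary"] > 10000]

# {S | Staff(S) ∧ (∃B)(Branch(B) ∧ B.branchNo = S.branchNo ∧ B.city = 'London')}
in_london = [
    s for s in staff
    if any(b["branchNo"] == s["branchNo"] and b["city"] == "London" for b in branch)
]

# (∀B)(B.city ≠ 'Paris') and the equivalent ~(∃B)(B.city = 'Paris')
none_in_paris = all(b["city"] != "Paris" for b in branch)
no_paris_branch = not any(b["city"] == "Paris" for b in branch)
```

The last two expressions always agree, which is the equivalence between the universal form and the negated existential form.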
Domain Relational Calculus
◆ Uses variables that take values from domains instead of tuples of relations.
◆ If F(d1, d2, . . . , dn) stands for a formula composed of atoms and d1, d2, . . . , dn represent domain variables, then:
{d1, d2, . . . , dn | F(d1, d2, . . . , dn)}
is a general domain relational calculus expression.
Example - Domain Relational Calculus
◆ Find the names of all managers who earn more than £25,000.
{fN, lN | (∃sN, posn, sex, DOB, sal, bN)
(Staff(sN, fN, lN, posn, sex, DOB, sal, bN) ∧
posn = 'Manager' ∧ sal > 25000)}
Example - Domain Relational Calculus
◆ List the staff who manage properties for rent in Glasgow.
{sN, fN, lN, posn, sex, DOB, sal, bN |
(∃sN1, cty)(Staff(sN, fN, lN, posn, sex, DOB, sal, bN) ∧
PropertyForRent(pN, st, cty, pc, typ, rms, rnt, oN, sN1, bN1) ∧
(sN = sN1) ∧ cty = 'Glasgow')}
Example - Domain Relational Calculus
◆ List the names of staff who currently do not manage any properties for rent.
{fN, lN | (∃sN)
(Staff(sN, fN, lN, posn, sex, DOB, sal, bN) ∧
(~(∃sN1)(PropertyForRent(pN, st, cty, pc, typ, rms, rnt, oN, sN1, bN1) ∧ (sN = sN1))))}
Example - Domain Relational Calculus
◆ List the names of clients who have viewed a property for rent in Glasgow.
{fN, lN | (∃cN, cN1, pN, pN1, cty)
(Client(cN, fN, lN, tel, pT, mR) ∧
Viewing(cN1, pN1, dt, cmt) ∧
PropertyForRent(pN, st, cty, pc, typ, rms, rnt, oN, sN, bN) ∧
(cN = cN1) ∧ (pN = pN1) ∧ cty = 'Glasgow')}
Domain Relational Calculus
◆ When restricted to safe expressions, domain relational calculus is equivalent to tuple relational calculus restricted to safe expressions, which is equivalent to relational algebra.
◆ Means every relational algebra expression has an equivalent relational calculus expression, and vice versa.
Other Languages
◆ Transform-oriented languages are non-procedural languages that use relations to transform input data into required outputs (e.g. SQL).
◆ Graphical languages provide user with picture of the structure of the relation. User fills in example of what is wanted and system returns required data in that format (e.g. QBE).
Other Languages
◆ 4GLs can create complete customized application using limited set of commands in a user-friendly, often menu-driven environment.
◆ Some systems accept a form of natural language, sometimes called a 5GL, although this development is still at an early stage.
Lecture#12
SQL: Data Manipulation
OBJECTIVES
◆ Purpose and importance of SQL.
◆ How to retrieve data from database using SELECT and:
– Use compound WHERE conditions.
– Sort query results using ORDER BY.
– Use aggregate functions.
– Group data using GROUP BY and HAVING.
– Use subqueries.
– Join tables together.
– Perform set operations (UNION, INTERSECT, EXCEPT).
◆ How to update database using INSERT, UPDATE, and DELETE.
Objectives of SQL
◆ Ideally, database language should allow user to:
– create the database and relation structures;
– perform insertion, modification, deletion of data from relations;
– perform simple and complex queries.
◆ Must perform these tasks with minimal user effort and command structure/syntax must be easy to learn.
◆ It must be portable.
Objectives of SQL
◆ SQL is a transform-oriented language with 2 major components:
– A DDL for defining database structure.
– A DML for retrieving and updating data.
◆ Until SQL:1999, SQL did not contain flow of control commands. These had to be implemented using a programming or job-control language, or interactively by the decisions of user.
Objectives of SQL
◆ SQL is relatively easy to learn:
– it is non-procedural - you specify what information you require, rather than how to get it;
– it is essentially free-format.
◆ Consists of standard English words:
1) CREATE TABLE Staff (StaffNo VARCHAR(5),
                       LName VARCHAR(15),
                       Salary DECIMAL(7,2));
2) INSERT INTO Staff VALUES ('SG16', 'Brown', 8300);
3) SELECT StaffNo, LName, salary
   FROM Staff
   WHERE salary > 10000;
Objectives of SQL
◆ Can be used by range of users including DBAs, management, application developers, and other types of end users.
◆ An ISO standard now exists for SQL, making it both the formal and de facto standard language for relational databases.
Importance of SQL
◆ SQL has become part of application architectures such as IBM's Systems Application Architecture.
◆ SQL is a Federal Information Processing Standard (FIPS), to which conformance is required for all sales of databases to the American Government.
◆ SQL is used in other standards and even influences development of other standards as a definitional tool. Examples include:
– ISO's Information Resource Directory System (IRDS) Standard
– Remote Data Access (RDA) Standard.
Writing SQL Commands
◆ SQL statement consists of reserved words and user-defined words.
– Reserved words are a fixed part of SQL and must be spelt exactly as required and cannot be split across lines.
– User-defined words are made up by user and represent names of various database objects such as relations, columns, views.
Writing SQL Commands
◆ Most components of an SQL statement are case insensitive, except for literal character data.
◆ More readable with indentation and lineation:
– Each clause should begin on a new line.
– Start of a clause should line up with start of other clauses.
– If clause has several parts, should each appear on a separate line and be indented under start of clause.
Writing SQL Commands
◆ Use extended form of BNF notation:
- Upper-case letters represent reserved words.
- Lower-case letters represent user-defined words.
- | indicates a choice among alternatives.
- Curly braces indicate a required element.
- Square brackets indicate an optional element.
- … indicates optional repetition (0 or more).
Literals
◆ Literals are constants used in SQL statements.
◆ All non-numeric literals must be enclosed in single quotes (e.g. 'London').
◆ Numeric literals must not be enclosed in quotes (e.g. 650.00).
SELECT Statement
SELECT [DISTINCT | ALL]
    {* | [columnExpression [AS newName]] [,...] }
FROM TableName [alias] [, ...]
[WHERE condition]
[GROUP BY columnList] [HAVING condition]
[ORDER BY columnList]
FROM       Specifies table(s) to be used.
WHERE      Filters rows.
GROUP BY   Forms groups of rows with same column value.
HAVING     Filters groups subject to some condition.
SELECT     Specifies which columns are to appear in output.
ORDER BY   Specifies the order of the output.
◆ Order of the clauses cannot be changed.
◆ Only SELECT and FROM are mandatory.
Example 12.1 All Columns, All Rows
List full details of all staff.
SELECT StaffNo, FName, LName, Address,
       position, sex, DOB, salary, branchNo
FROM Staff;
◆ Can use * as an abbreviation for 'all columns':
SELECT *
FROM Staff;
Example 12.2 Specific Columns, All Rows
Produce a list of salaries for all staff, showing only staff number, first and last names, and salary.
SELECT StaffNo, FName, LName, salary
FROM Staff;
Example 12.3 Use of DISTINCT
List the property numbers of all properties that have been viewed.
SELECT PropertyNo
FROM Viewing;
Example 12.3 Use of DISTINCT
◆ Use DISTINCT to eliminate duplicates:
SELECT DISTINCT propertyNo
FROM Viewing;
Example 12.4 Calculated Fields
Produce list of monthly salaries for all staff, showing staff number, first/last name, and salary.
SELECT StaffNo, FName, LName, salary/12
FROM Staff;
Example 12.4 Calculated Fields
◆ To name column, use AS clause:
SELECT staffNo, fName, lName, salary/12 AS monthlySalary
FROM Staff;
Example 12.5 Comparison Search Condition
List all staff with a salary greater than 10,000.
SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary > 10000;
In SQL, the following simple comparison operators are available:
=    equals
<>   is not equal to (ISO standard)
!=   is not equal to (allowed in some dialects)
<    is less than
<=   is less than or equal to
>    is greater than
>=   is greater than or equal to
More complex predicates can be generated using the logical operators AND, OR, and NOT, with parentheses (if needed or desired) to show the order of evaluation. The rules for evaluating a conditional expression are:
▪ an expression is evaluated left to right;
▪ subexpressions in brackets are evaluated first;
▪ NOTs are evaluated before ANDs and ORs;
▪ ANDs are evaluated before ORs.
The use of parentheses is always recommended in order to remove any possible ambiguities.
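The effect of the evaluation rules is easiest to see by running the same condition with and without parentheses. This sketch uses Python's sqlite3 module; the sample rows and the helper name `staff_numbers` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, salary REAL, branchNo TEXT)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [
        ("SL21", 30000, "B005"),
        ("SL41", 9000, "B005"),
        ("SG37", 12000, "B003"),
        ("SG5", 24000, "B003"),
    ],
)

def staff_numbers(where):
    """Return the staffNo values matching a WHERE condition, sorted."""
    rows = conn.execute(f"SELECT staffNo FROM Staff WHERE {where} ORDER BY staffNo")
    return [r[0] for r in rows]

# AND is evaluated before OR, so this reads as: B005 OR (B003 AND salary > 20000).
implicit = staff_numbers("branchNo = 'B005' OR branchNo = 'B003' AND salary > 20000")
# Parentheses change the grouping: (B005 OR B003) AND salary > 20000.
explicit = staff_numbers("(branchNo = 'B005' OR branchNo = 'B003') AND salary > 20000")
```

The low-paid B005 member (SL41) appears only in the first result, which is the kind of ambiguity the parentheses remove.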
Example 12.6 Compound Comparison Search Condition
List addresses of all branch offices in London or Glasgow.
SELECT *
FROM Branch
WHERE city = 'London' OR city = 'Glasgow';
Example 12.7 Range Search Condition
List all staff with a salary between 20,000 and 30,000.
SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary BETWEEN 20000 AND 30000;
◆ BETWEEN test includes the endpoints of range.
Example 12.7 Range Search Condition
◆ Also a negated version NOT BETWEEN.
◆ BETWEEN does not add much to SQL's expressive power. Could also write:
SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary >= 20000 AND salary <= 30000;
◆ Useful, though, for a range of values.
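That BETWEEN includes its endpoints, and that it matches the pair of comparisons, can be checked directly. This is a minimal sketch using Python's sqlite3 module with hypothetical salaries, two of which sit exactly on the endpoints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [("SL21", 30000), ("SG5", 24000), ("SG14", 20000), ("SG37", 12000)],
)

def staff_numbers(where):
    """Return the staffNo values matching a WHERE condition, sorted."""
    rows = conn.execute(f"SELECT staffNo FROM Staff WHERE {where} ORDER BY staffNo")
    return [r[0] for r in rows]

between = staff_numbers("salary BETWEEN 20000 AND 30000")
comparisons = staff_numbers("salary >= 20000 AND salary <= 30000")
outside = staff_numbers("salary NOT BETWEEN 20000 AND 30000")
```

Both endpoint rows (SG14 at 20000 and SL21 at 30000) are returned by the BETWEEN form.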
Example 12.8 Set Membership
List all managers and supervisors.
SELECT staffNo, fName, lName, position
FROM Staff
WHERE position IN ('Manager', 'Supervisor');
Example 12.8 Set Membership
◆ There is a negated version (NOT IN).
◆ IN does not add much to SQL's expressive power. Could have expressed this as:
SELECT staffNo, fName, lName, position
FROM Staff
WHERE position = 'Manager' OR
      position = 'Supervisor';
◆ IN is a more efficient way of expressing the condition when set contains many values.
Example 12.9 Pattern Matching
Find all owners with the string 'Glasgow' in their address.
SELECT ownerNo, fName, lName, address, telNo
FROM PrivateOwner
WHERE address LIKE '%Glasgow%';
Example 12.9 Pattern Matching
◆ SQL has two special pattern matching symbols:
– %: sequence of zero or more characters;
– _ (underscore): any single character.
◆ LIKE '%Glasgow%' means a sequence of characters of any length containing 'Glasgow'.
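The two pattern symbols behave differently, and a few small patterns make the difference visible. This sketch uses Python's sqlite3 module; the addresses and the helper name `owners` are hypothetical. Note that SQLite's LIKE ignores case for ASCII letters by default, which other DBMSs may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PrivateOwner (ownerNo TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO PrivateOwner VALUES (?, ?)",
    [
        ("CO46", "2 Fergus Dr, Aberdeen AB2 7SX"),
        ("CO87", "6 Achray St, Glasgow G32 9DX"),
        ("CO40", "63 Well St, Glasgow G42"),
    ],
)

def owners(pattern):
    """Return the ownerNo values whose address matches the LIKE pattern, sorted."""
    rows = conn.execute(
        "SELECT ownerNo FROM PrivateOwner WHERE address LIKE ? ORDER BY ownerNo",
        (pattern,),
    )
    return [r[0] for r in rows]

contains = owners("%Glasgow%")       # 'Glasgow' anywhere in the address
one_char_number = owners("_ %")      # exactly one character, then a space
two_char_number = owners("__ Well%") # exactly two characters, a space, then 'Well'
```

Each underscore consumes exactly one character, so "_ %" matches the single-digit house numbers but not "63 Well St".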
Example 12.10 NULL Search Condition
List details of all viewings on property PG4 where a comment has not been supplied.
◆ There are 2 viewings for property PG4, one with and one without a comment.
◆ Have to test for null explicitly using special keyword IS NULL:
SELECT clientNo, viewDate
FROM Viewing
WHERE propertyNo = 'PG4' AND
      comment IS NULL;
Example 12.10 NULL Search Condition
◆ Negated version (IS NOT NULL) can test for non-null values.
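The need for an explicit IS NULL test shows up as soon as an ordinary comparison is tried instead. This sketch uses Python's sqlite3 module with two hypothetical viewings of PG4, one with a comment and one without; the helper name `count` is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Viewing (clientNo TEXT, propertyNo TEXT, comment TEXT)")
conn.executemany(
    "INSERT INTO Viewing VALUES (?, ?, ?)",
    [("CR76", "PG4", "too remote"), ("CR56", "PG4", None)],
)

def count(where):
    """Count the Viewing rows that satisfy a WHERE condition."""
    return conn.execute(f"SELECT COUNT(*) FROM Viewing WHERE {where}").fetchone()[0]

with_equals = count("propertyNo = 'PG4' AND comment = NULL")    # comparison is never true
with_is_null = count("propertyNo = 'PG4' AND comment IS NULL")
with_not_null = count("propertyNo = 'PG4' AND comment IS NOT NULL")
```

Comparing with `= NULL` finds nothing, because a comparison with null evaluates to unknown rather than true, while IS NULL finds the viewing without a comment.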
Lecture#13
SQL: Data Manipulation (contd…)
Example 13.1 Single Column Ordering
List salaries for all staff, arranged in descending order of salary.
SELECT staffNo, fName, lName, salary
FROM Staff
ORDER BY salary DESC;
Example 13.2 Multiple Column Ordering
Produce abbreviated list of properties in order of property type.
SELECT propertyNo, type, rooms, rent
FROM PropertyForRent
ORDER BY type;
Example 13.2 Multiple Column Ordering
◆ Four flats in this list - as no minor sort key specified, system arranges these rows in any order it chooses.
◆ To arrange in order of rent, specify minor order:
SELECT propertyNo, type, rooms, rent
FROM PropertyForRent
ORDER BY type, rent DESC;
SELECT Statement - Aggregates
◆ ISO standard defines five aggregate functions:
COUNT   returns number of values in specified column.
SUM     returns sum of values in specified column.
AVG     returns average of values in specified column.
MIN     returns smallest value in specified column.
MAX     returns largest value in specified column.
SELECT Statement - Aggregates
◆ Each operates on a single column of a table and returns a single value.
◆ COUNT, MIN, and MAX apply to numeric and non-numeric fields, but SUM and AVG may be used on numeric fields only.
◆ Apart from COUNT(*), each function eliminates nulls first and operates only on remaining non-null values.
◆ COUNT(*) counts all rows of a table, regardless of whether nulls or duplicate values occur.
◆ Can use DISTINCT before column name to eliminate duplicates.
◆ DISTINCT has no effect with MIN/MAX, but may have with SUM/AVG.
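The treatment of nulls and duplicates by the aggregate functions can be checked on a four-row table. This sketch uses Python's sqlite3 module; the rows are hypothetical and include one null salary and one duplicated salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [("SL21", 30000), ("SG37", 12000), ("SG14", 12000), ("SA9", None)],
)

row = conn.execute(
    "SELECT COUNT(*), COUNT(salary), COUNT(DISTINCT salary),"
    " SUM(salary), SUM(DISTINCT salary), AVG(salary), MAX(salary)"
    " FROM Staff"
).fetchone()
count_all, count_salary, count_distinct, total, total_distinct, average, highest = row
```

COUNT(*) sees all four rows, COUNT(salary) skips the null, and AVG divides by three rather than four. DISTINCT changes SUM here because 12000 occurs twice.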
SELECT Statement - Aggregates
◆ Aggregate functions can be used only in SELECT list and in HAVING clause.
◆ If SELECT list includes an aggregate function and there is no GROUP BY clause, SELECT list cannot reference a column outside an aggregate function. For example, the following is illegal:
SELECT staffNo, COUNT(salary)
FROM Staff;
Example 13.3 Use of COUNT(*)

How many properties cost more than £350 per month to rent?

SELECT COUNT(*) AS myCount


FROM PropertyForRent
WHERE rent > 350;
Example 13.4 Use of COUNT(DISTINCT)

How many different properties viewed in May ‘13?

SELECT COUNT(DISTINCT propertyNo) AS myCount


FROM Viewing
WHERE viewDate BETWEEN ‘1-May-13’
AND ‘31-May-13’;
Example 13.5 Use of COUNT and SUM

Find number of Managers and sum of their salaries.

SELECT COUNT(staffNo) AS myCount,
SUM(salary) AS mySum
FROM Staff
WHERE position = 'Manager';
Example 13.6 Use of MIN, MAX, AVG

Find minimum, maximum, and average staff salary.

SELECT MIN(salary) AS myMin,
MAX(salary) AS myMax,
AVG(salary) AS myAvg
FROM Staff;
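
The null-handling rules for aggregates can be checked with a small runnable sketch. It uses Python's built-in sqlite3 module rather than an ISO-conformant DBMS, and the four Staff rows are hypothetical sample data, not the slides' table.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, position TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [
        ("SL21", "Manager", 30000),
        ("SG5", "Manager", 24000),
        ("SG37", "Assistant", 12000),
        ("SA9", "Assistant", None),  # NULL salary
    ],
)

aggregate_row = conn.execute(
    """
    SELECT COUNT(*),                  -- counts every row, nulls included
           COUNT(salary),             -- the NULL salary is skipped
           COUNT(DISTINCT position),  -- duplicates eliminated before counting
           MIN(salary), MAX(salary), AVG(salary)  -- computed over non-null values
    FROM Staff
    """
).fetchone()
print(aggregate_row)
```

COUNT(*) sees four rows while COUNT(salary) sees three, and AVG divides by three rather than four.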
SELECT Statement - Grouping

 Use GROUP BY clause to get sub-totals.
 SELECT and GROUP BY closely integrated: each item in SELECT list must
be single-valued per group, and SELECT clause may only contain:

 column names
 aggregate functions
 constants
 expression involving combinations of the above.


 All column names in SELECT list must appear in GROUP BY clause unless
name is used only in an aggregate function.
 If WHERE is used with GROUP BY, WHERE is applied first, then groups are
formed from remaining rows satisfying predicate.
 ISO considers two nulls to be equal for purposes of GROUP BY.

Example 13.7 Use of GROUP BY
Find number of staff in each branch and their total salaries.

SELECT branchNo,
COUNT(staffNo) AS myCount,
SUM(salary) AS mySum
FROM Staff
GROUP BY branchNo
ORDER BY branchNo;

Restricted Groupings – HAVING clause

 HAVING clause is designed for use with GROUP BY to restrict groups that
appear in final result table.
 Similar to WHERE, but WHERE filters individual rows whereas HAVING
filters groups.
 Column names in HAVING clause must also appear in the GROUP BY list or
be contained within an aggregate function.

Example 13.8 Use of HAVING

For each branch with more than 1 member of staff, find number of
staff in each branch and sum of their salaries.

SELECT branchNo,
COUNT(staffNo) AS myCount,
SUM(salary) AS mySum
FROM Staff
GROUP BY branchNo
HAVING COUNT(staffNo) > 1
ORDER BY branchNo;
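
A minimal sketch of how WHERE, GROUP BY, and HAVING interact, run through Python's sqlite3 module. The six Staff rows are assumed sample data; the `{where}` placeholder is only a convenience for running the query with and without a row filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, salary REAL, branchNo TEXT)"
)
# Assumed sample data for illustration.
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [
        ("SL21", 30000, "B005"), ("SL41", 9000, "B005"),
        ("SG37", 12000, "B003"), ("SG14", 18000, "B003"), ("SG5", 24000, "B003"),
        ("SA9", 9000, "B007"),
    ],
)

GROUPED = """
    SELECT branchNo, COUNT(staffNo) AS myCount, SUM(salary) AS mySum
    FROM Staff
    {where}
    GROUP BY branchNo
    HAVING COUNT(staffNo) > 1
    ORDER BY branchNo
"""

# HAVING drops B007, which has only one member of staff.
all_staff = conn.execute(GROUPED.format(where="")).fetchall()

# WHERE filters rows first, so B005 shrinks to one row and HAVING then drops it.
well_paid = conn.execute(GROUPED.format(where="WHERE salary > 10000")).fetchall()

print(all_staff)
print(well_paid)
```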

Subqueries

 Some SQL statements can have a SELECT embedded
within them.
 A subselect can be used in WHERE and HAVING clauses
of an outer SELECT, where it is called a subquery or
nested query.
 Subselects may also appear in INSERT, UPDATE, and
DELETE statements.

Example 13.9 Subquery with Equality

List staff who work in branch at ‘163 Main St’.

SELECT staffNo, fName, lName, position
FROM Staff
WHERE branchNo =
(SELECT branchNo
FROM Branch
WHERE street = '163 Main St');

 Inner SELECT finds branch number for branch at ‘163 Main St’ (‘B003’).
 Outer SELECT then retrieves details of all staff who work at this branch.
 Outer SELECT then becomes:

SELECT staffNo, fName, lName, position
FROM Staff
WHERE branchNo = 'B003';

Example 13.10 Subquery with Aggregate

List all staff whose salary is greater than the average salary, and show by
how much.

SELECT staffNo, fName, lName, position,
salary - (SELECT AVG(salary) FROM Staff) AS SalDiff
FROM Staff
WHERE salary >
(SELECT AVG(salary)
FROM Staff);


 Cannot write ‘WHERE salary > AVG(salary)’, as aggregate functions cannot
be used in WHERE clause.
 Instead, use subquery to find average salary (17000), and then use outer
SELECT to find those staff with salary greater than this:

SELECT staffNo, fName, lName, position,
salary - 17000 AS SalDiff
FROM Staff
WHERE salary > 17000;
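
The following sketch, using Python's sqlite3 module, shows both halves of this example: the aggregate in WHERE being rejected, and the subquery form working. The six salaries are assumed sample values chosen so that the average is 17000, as on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, salary REAL)")
# Assumed sample salaries; their average is 17000.
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [
        ("SL21", 30000), ("SG37", 12000), ("SG14", 18000),
        ("SA9", 9000), ("SG5", 24000), ("SL41", 9000),
    ],
)

# An aggregate function directly in WHERE is rejected by the engine.
try:
    conn.execute("SELECT staffNo FROM Staff WHERE salary > AVG(salary)")
    aggregate_in_where_rejected = False
except sqlite3.OperationalError:
    aggregate_in_where_rejected = True

# The subquery form computes the average first, then compares each row to it.
above_average = conn.execute(
    """
    SELECT staffNo, salary - (SELECT AVG(salary) FROM Staff) AS SalDiff
    FROM Staff
    WHERE salary > (SELECT AVG(salary) FROM Staff)
    ORDER BY staffNo
    """
).fetchall()
print(aggregate_in_where_rejected, above_average)
```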

Subquery Rules

 ORDER BY clause may not be used in a subquery (although it may be
used in outermost SELECT).
 Subquery SELECT list must consist of a single column name or
expression, except for subqueries that use EXISTS.
 By default, column names refer to table name in FROM clause of
subquery. Can refer to a table in FROM of an outer query by qualifying
column name with table name or alias.
 When subquery is an operand in a comparison, subquery must appear on
right-hand side.
 A subquery may not be used as an operand in an expression.

Example 13.11 Nested subquery: use of IN

List properties handled by staff at ‘163 Main St’.

SELECT propertyNo, street, city, postcode, type, rooms, rent
FROM PropertyForRent
WHERE staffNo IN
(SELECT staffNo
FROM Staff
WHERE branchNo =
(SELECT branchNo
FROM Branch
WHERE street = '163 Main St'));
ANY and ALL

 ANY and ALL may be used with subqueries that produce a single column
of numbers.
 With ALL, condition will only be true if it is satisfied by all values
produced by subquery.
 With ANY, condition will be true if it is satisfied by any values produced
by subquery.
 If subquery is empty, ALL returns true, ANY returns false.
 SOME may be used in place of ANY.
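
The truth rules above, including the empty-subquery case, can be mimicked with Python's built-in `any` and `all`. This is only an analogy for the comparison `>`: the helper names are hypothetical, and the sketch ignores SQL's three-valued logic for nulls.

```python
def greater_than_any(value, subquery_values):
    """Mimics `value > ANY (subquery)` (SOME is a synonym); nulls are not modelled."""
    return any(value > v for v in subquery_values)


def greater_than_all(value, subquery_values):
    """Mimics `value > ALL (subquery)`; nulls are not modelled."""
    return all(value > v for v in subquery_values)


b003_salaries = [12000, 18000, 24000]  # assumed subquery result

print(greater_than_any(13000, b003_salaries))  # beats 12000, so ANY holds
print(greater_than_all(13000, b003_salaries))  # does not beat 18000 or 24000
print(greater_than_all(30000, b003_salaries))  # beats every value

# Empty subquery result: ANY is false, ALL is true.
print(greater_than_any(13000, []), greater_than_all(13000, []))
```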

Example 13.12 Use of ANY/SOME

Find staff whose salary is larger than salary of at least one member of
staff at branch B003.

SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary > SOME
(SELECT salary
FROM Staff
WHERE branchNo = 'B003');

 Inner query produces set {12000, 18000, 24000} and outer query selects
those staff whose salaries are greater than at least one of the values in
this set (that is, greater than the minimum value, 12000).

Example 13.13 Use of ALL

Find staff whose salary is larger than salary of every member of staff at
branch B003.

SELECT staffNo, fName, lName, position, salary
FROM Staff
WHERE salary > ALL
(SELECT salary
FROM Staff
WHERE branchNo = 'B003');

Lecture#14
SQL: Data Manipulation (contd…)
Multi-Table Queries

 Can use subqueries provided result columns come from same table.

 If result columns come from more than one table must use a join.

 To perform join, include more than one table in FROM clause.

 Use comma as separator and typically include WHERE clause to specify
join column(s).
 Also possible to use an alias for a table named in FROM clause.

 Alias is separated from table name with a space.

 Alias can be used to qualify column names when there is ambiguity.

Example 14.1 Simple Join

List names of all clients who have viewed a property along with any
comment supplied.

SELECT c.clientNo, fName, lName, propertyNo, comment
FROM Client c, Viewing v
WHERE c.clientNo = v.clientNo;

 Only those rows from both tables that have identical values in the
clientNo columns (c.clientNo = v.clientNo) are included in result.

 Equivalent to equi-join in relational algebra.

Alternative JOIN Constructs

 SQL provides alternative ways to specify joins:

FROM Client c JOIN Viewing v ON c.clientNo = v.clientNo
FROM Client JOIN Viewing USING (clientNo)
FROM Client NATURAL JOIN Viewing

 In each case, FROM replaces original FROM and WHERE. However, first
form produces table with two identical clientNo columns; the other two
produce a single clientNo column.

Example 14.2 Sorting a join

For each branch, list numbers and names of staff who manage
properties, and properties they manage.

SELECT s.branchNo, s.staffNo, fName, lName, propertyNo
FROM Staff s, PropertyForRent p
WHERE s.staffNo = p.staffNo
ORDER BY s.branchNo, s.staffNo, propertyNo;

Example 14.3 Three Table Join

For each branch, list staff who manage properties, including city in which branch
is located and properties they manage.

SELECT b.branchNo, b.city, s.staffNo, fName, lName, propertyNo
FROM Branch b, Staff s, PropertyForRent p
WHERE b.branchNo = s.branchNo AND s.staffNo = p.staffNo
ORDER BY b.branchNo, s.staffNo, propertyNo;


 Alternative formulation for FROM and WHERE:

FROM (Branch b JOIN Staff s USING (branchNo)) AS bs
JOIN PropertyForRent p USING (staffNo)
Example 14.4 Multiple Grouping Columns

Find number of properties handled by each staff member.

SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount
FROM Staff s, PropertyForRent p
WHERE s.staffNo = p.staffNo
GROUP BY s.branchNo, s.staffNo
ORDER BY s.branchNo, s.staffNo;

Computing a Join
Procedure for generating results of a join is:

1. Form Cartesian product of the tables named in FROM clause.
2. If there is a WHERE clause, apply the search condition to each row of
the product table, retaining those rows that satisfy the condition.
3. For each remaining row, determine value of each item in SELECT list to
produce a single row in result table.
4. If DISTINCT has been specified, eliminate any duplicate rows from the
result table.
5. If there is an ORDER BY clause, sort result table as required.
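
The five steps above can be sketched directly in Python. This is a conceptual model only (real DBMSs use far more efficient join algorithms); the function `compute_join`, its parameters, and the sample rows are hypothetical names introduced for illustration.

```python
from itertools import product

# Assumed sample rows for Client and Viewing.
client = [
    {"clientNo": "CR56", "fName": "Aline"},
    {"clientNo": "CR76", "fName": "John"},
    {"clientNo": "CR74", "fName": "Mike"},
]
viewing = [
    {"clientNo": "CR56", "propertyNo": "PA14"},
    {"clientNo": "CR56", "propertyNo": "PG4"},
    {"clientNo": "CR76", "propertyNo": "PG4"},
]


def compute_join(left, right, condition, select_list, distinct=False, order_by=None):
    # 1. Form the Cartesian product of the two tables.
    rows = list(product(left, right))
    # 2. Apply the WHERE search condition to each row of the product.
    rows = [(l, r) for l, r in rows if condition(l, r)]
    # 3. Evaluate the SELECT list for each remaining row.
    result = [select_list(l, r) for l, r in rows]
    # 4. Eliminate duplicates if DISTINCT was specified (first occurrence kept).
    if distinct:
        result = list(dict.fromkeys(result))
    # 5. Sort if an ORDER BY was specified.
    if order_by is not None:
        result.sort(key=order_by)
    return result


joined = compute_join(
    client,
    viewing,
    condition=lambda c, v: c["clientNo"] == v["clientNo"],
    select_list=lambda c, v: (c["clientNo"], c["fName"], v["propertyNo"]),
    order_by=lambda row: (row[0], row[2]),
)
print(joined)
```

Client CR74 has no viewing, so none of its product rows survive step 2.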

 SQL provides special format of SELECT for Cartesian product:

SELECT [DISTINCT | ALL]{* | columnList}
FROM Table1 CROSS JOIN Table2
Outer Joins
 If one row of a joined table is
unmatched, row is omitted from result
table.
 Outer join operations retain rows that do
not satisfy the join condition.
 Consider following tables:

 The (inner) join of these two tables:

SELECT b.*, p.*
FROM Branch1 b, PropertyForRent1 p
WHERE b.bCity = p.pCity;


 Result table has two rows where cities are
same.
 There are no rows corresponding to branches
in Bristol and Aberdeen.
 To include unmatched rows in result table, use
an Outer join.

Example 14.6 Left Outer Join

List branches and properties that are in same city along with any
unmatched branches.

SELECT b.*, p.*
FROM Branch1 b LEFT JOIN
PropertyForRent1 p ON b.bCity = p.pCity;


 Includes those rows of first (left) table unmatched with rows from second
(right) table.
 Columns from second table are filled with NULLs.
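
A small sketch comparing the inner join with the left outer join, run through Python's sqlite3 module. The rows in Branch1 and PropertyForRent1 are assumed sample data, chosen to match the description above (a Bristol branch with no property, an Aberdeen property with no branch).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Branch1 (branchNo TEXT, bCity TEXT);
    CREATE TABLE PropertyForRent1 (propertyNo TEXT, pCity TEXT);
    -- Assumed sample rows for illustration.
    INSERT INTO Branch1 VALUES ('B003', 'Glasgow'), ('B004', 'Bristol'), ('B002', 'London');
    INSERT INTO PropertyForRent1 VALUES ('PA14', 'Aberdeen'), ('PL94', 'London'), ('PG4', 'Glasgow');
    """
)

inner_rows = conn.execute(
    """
    SELECT b.*, p.*
    FROM Branch1 b, PropertyForRent1 p
    WHERE b.bCity = p.pCity
    ORDER BY b.branchNo
    """
).fetchall()

left_rows = conn.execute(
    """
    SELECT b.*, p.*
    FROM Branch1 b LEFT JOIN PropertyForRent1 p ON b.bCity = p.pCity
    ORDER BY b.branchNo
    """
).fetchall()

print(inner_rows)
print(left_rows)  # the Bristol branch is kept, padded with NULLs (None in Python)
```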

Example 14.7 Right Outer Join

List branches and properties in same city and any unmatched properties.

SELECT b.*, p.*
FROM Branch1 b RIGHT JOIN
PropertyForRent1 p ON b.bCity = p.pCity;

 Right Outer join includes those rows of second (right) table that are
unmatched with rows from first (left) table.
 Columns from first table are filled with NULLs.

Example 14.8 Full Outer Join

List branches and properties in same city and any unmatched branches
or properties.

SELECT b.*, p.*
FROM Branch1 b FULL JOIN
PropertyForRent1 p ON b.bCity = p.pCity;


 Includes unmatched rows from both tables.
 Columns from the table with no matching row are filled with NULLs.

EXISTS and NOT EXISTS
 EXISTS and NOT EXISTS are for use only with subqueries.

 Produce a simple true/false result.

 True if and only if there exists at least one row in result table returned
by subquery.

 False if subquery returns an empty result table.

 NOT EXISTS is the opposite of EXISTS.


 As (NOT) EXISTS check only for existence or non-existence of rows in
subquery result table, subquery can contain any number of columns.

 Common for subqueries following (NOT) EXISTS to be of form:

(SELECT * ...)

Example 14.9 Query using EXISTS

Find all staff who work in a London branch.

SELECT staffNo, fName, lName, position
FROM Staff s
WHERE EXISTS (SELECT *
FROM Branch b
WHERE s.branchNo = b.branchNo AND city = 'London');


 Note, search condition s.branchNo = b.branchNo is necessary to
consider correct branch record for each member of staff.
 If omitted, would get all staff records listed out because subquery:

SELECT * FROM Branch WHERE city = 'London'

 would always be true and query would be:

SELECT staffNo, fName, lName, position FROM Staff
WHERE true;
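
The effect of dropping the correlation condition can be seen in a short sqlite3 sketch. The three branches and three staff rows are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, branchNo TEXT);
    -- Assumed sample rows for illustration.
    INSERT INTO Branch VALUES ('B005', 'London'), ('B003', 'Glasgow'), ('B007', 'Aberdeen');
    INSERT INTO Staff VALUES ('SL21', 'B005'), ('SG37', 'B003'), ('SA9', 'B007');
    """
)

# Correlated subquery: evaluated against each staff member's own branch.
correlated = [
    row[0]
    for row in conn.execute(
        """
        SELECT staffNo FROM Staff s
        WHERE EXISTS (SELECT * FROM Branch b
                      WHERE s.branchNo = b.branchNo AND city = 'London')
        ORDER BY staffNo
        """
    )
]

# Correlation omitted: the subquery is non-empty for every outer row.
uncorrelated = [
    row[0]
    for row in conn.execute(
        """
        SELECT staffNo FROM Staff
        WHERE EXISTS (SELECT * FROM Branch WHERE city = 'London')
        ORDER BY staffNo
        """
    )
]

print(correlated, uncorrelated)
```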


 Could also write this query using join construct:

SELECT staffNo, fName, lName, position
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND city = 'London';

Union, Intersect, and Difference (Except)

 Can use normal set operations of Union, Intersection, and
Difference to combine results of two or more queries into a
single result table.
 Union of two tables, A and B, is table containing all rows in
either A or B or both.
 Intersection is table containing all rows common to both A
and B.
 Difference is table containing all rows in A but not in B.
 Two tables must be union compatible.


 Format of set operator clause in each case is:

op [ALL] [CORRESPONDING [BY {column1 [, ...]}]]


 If CORRESPONDING BY specified, set operation performed on the named
column(s).
 If CORRESPONDING specified but not BY clause, operation performed on
common columns.
 If ALL specified, result can include duplicate rows.

Example 14.10 Use of UNION

List all cities where there is either a branch office or a property.

(SELECT city
FROM Branch
WHERE city IS NOT NULL) UNION
(SELECT city
FROM PropertyForRent
WHERE city IS NOT NULL);


 Or

(SELECT *
FROM Branch
WHERE city IS NOT NULL)
UNION CORRESPONDING BY city
(SELECT *
FROM PropertyForRent
WHERE city IS NOT NULL);


 Produces result tables from both queries and merges both tables
together.

Example 14.11 Use of INTERSECT

List all cities where there is both a branch office and a property.

(SELECT city FROM Branch)
INTERSECT
(SELECT city FROM PropertyForRent);


 Or

(SELECT * FROM Branch)
INTERSECT CORRESPONDING BY city
(SELECT * FROM PropertyForRent);


 Could rewrite this query without INTERSECT operator:

SELECT DISTINCT b.city
FROM Branch b, PropertyForRent p
WHERE b.city = p.city;
 Or:
SELECT DISTINCT city FROM Branch b
WHERE EXISTS
(SELECT * FROM PropertyForRent p
WHERE p.city = b.city);

Example 14.12 Use of EXCEPT

List of all cities where there is a branch office but no properties.

(SELECT city FROM Branch)
EXCEPT
(SELECT city FROM PropertyForRent);
 Or
(SELECT * FROM Branch)
EXCEPT CORRESPONDING BY city
(SELECT * FROM PropertyForRent);

Example 14.13 Use of EXCEPT

 Could rewrite this query without EXCEPT:

SELECT DISTINCT city FROM Branch
WHERE city NOT IN
(SELECT city FROM PropertyForRent);
 Or

SELECT DISTINCT city FROM Branch b
WHERE NOT EXISTS
(SELECT * FROM PropertyForRent p
WHERE p.city = b.city);
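
The three set operators, and the NOT EXISTS rewrite of EXCEPT, can be compared side by side in a sqlite3 sketch. The city values are assumed sample data. SQLite does not accept parentheses around the operand queries or the CORRESPONDING clause, so the queries are written without them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Branch (branchNo TEXT, city TEXT);
    CREATE TABLE PropertyForRent (propertyNo TEXT, city TEXT);
    -- Assumed sample rows for illustration.
    INSERT INTO Branch VALUES ('B005', 'London'), ('B007', 'Aberdeen'),
        ('B003', 'Glasgow'), ('B004', 'Bristol'), ('B002', 'London');
    INSERT INTO PropertyForRent VALUES ('PA14', 'Aberdeen'), ('PL94', 'London'),
        ('PG4', 'Glasgow'), ('PG36', 'Glasgow');
    """
)


def cities(sql):
    """Run a one-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]


union_cities = cities(
    "SELECT city FROM Branch UNION SELECT city FROM PropertyForRent ORDER BY city"
)
intersect_cities = cities(
    "SELECT city FROM Branch INTERSECT SELECT city FROM PropertyForRent ORDER BY city"
)
except_cities = cities(
    "SELECT city FROM Branch EXCEPT SELECT city FROM PropertyForRent ORDER BY city"
)
not_exists_cities = cities(
    """
    SELECT DISTINCT city FROM Branch b
    WHERE NOT EXISTS (SELECT * FROM PropertyForRent p WHERE p.city = b.city)
    ORDER BY city
    """
)

print(union_cities, intersect_cities, except_cities, not_exists_cities)
```

Without ALL, each operator removes duplicates, which is why London and Glasgow appear only once.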

INSERT

INSERT INTO TableName [ (columnList) ]
VALUES (dataValueList)

 columnList is optional; if omitted, SQL assumes a list of all columns in
their original CREATE TABLE order.
 Any columns omitted must have been declared as NULL when table was
created, unless DEFAULT was specified when creating column.


 dataValueList must match columnList as follows:


 number of items in each list must be same;
 must be direct correspondence in position of items in two lists;
 data type of each item in dataValueList must be compatible with data type
of corresponding column.

Example 14.14 INSERT … VALUES

Insert a new row into Staff table supplying data for all columns.

INSERT INTO Staff
VALUES ('SG16', 'Alan', 'Brown', 'Assistant', 'M',
DATE '1957-05-25', 8300, 'B003');

Example 14.15 INSERT using Defaults

Insert a new row into Staff table supplying data for all mandatory
columns.

INSERT INTO Staff (staffNo, fName, lName,
position, salary, branchNo)
VALUES ('SG44', 'Anne', 'Jones',
'Assistant', 8100, 'B003');
 Or
INSERT INTO Staff
VALUES ('SG44', 'Anne', 'Jones', 'Assistant', NULL,
NULL, 8100, 'B003');

INSERT … SELECT

 Second form of INSERT allows multiple rows to be copied from one or
more tables to another:

INSERT INTO TableName [ (columnList) ]
SELECT ...

Example 14.16 INSERT … SELECT

Assume there is a table StaffPropCount that contains names of staff and
number of properties they manage:

StaffPropCount(staffNo, fName, lName, propCnt)

Populate StaffPropCount using Staff and PropertyForRent tables.


INSERT INTO StaffPropCount
(SELECT s.staffNo, fName, lName, COUNT(*)
FROM Staff s, PropertyForRent p
WHERE s.staffNo = p.staffNo
GROUP BY s.staffNo, fName, lName)
UNION
(SELECT staffNo, fName, lName, 0
FROM Staff
WHERE staffNo NOT IN
(SELECT DISTINCT staffNo
FROM PropertyForRent));

 If second part of UNION is omitted, excludes those staff who currently do
not manage any properties.
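
A runnable sketch of this INSERT … SELECT using Python's sqlite3 module. The staff and property rows are assumed sample data, and the parentheses around the two SELECTs are dropped because SQLite does not accept them in a compound query. The sample deliberately contains no null staffNo in PropertyForRent: a null returned by the NOT IN subquery would make the NOT IN test unknown for every row, so the second part of the UNION would add nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT);
    CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, staffNo TEXT);
    CREATE TABLE StaffPropCount (staffNo TEXT, fName TEXT, lName TEXT, propCnt INTEGER);
    -- Assumed sample rows; SL21 manages no properties.
    INSERT INTO Staff VALUES ('SG14', 'David', 'Ford'), ('SG37', 'Ann', 'Beech'),
        ('SA9', 'Mary', 'Howe'), ('SL21', 'John', 'White');
    INSERT INTO PropertyForRent VALUES ('PG16', 'SG14'), ('PG21', 'SG37'),
        ('PG36', 'SG37'), ('PA14', 'SA9');
    """
)

conn.execute(
    """
    INSERT INTO StaffPropCount
    SELECT s.staffNo, fName, lName, COUNT(*)
    FROM Staff s, PropertyForRent p
    WHERE s.staffNo = p.staffNo
    GROUP BY s.staffNo, fName, lName
    UNION
    SELECT staffNo, fName, lName, 0
    FROM Staff
    WHERE staffNo NOT IN (SELECT DISTINCT staffNo FROM PropertyForRent)
    """
)

counts = conn.execute(
    "SELECT staffNo, propCnt FROM StaffPropCount ORDER BY staffNo"
).fetchall()
print(counts)
```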

UPDATE

UPDATE TableName
SET columnName1 = dataValue1
[, columnName2 = dataValue2...]
[WHERE searchCondition]

 TableName can be name of a base table or an updatable view.


 SET clause specifies names of one or more columns that are to be
updated.


 WHERE clause is optional:


 if omitted, named columns are updated for all rows in table;
 if specified, only those rows that satisfy searchCondition are updated.
 New dataValue(s) must be compatible with data type for corresponding
column.

Example 14.17/18 UPDATE All Rows

Give all staff a 3% pay increase.

UPDATE Staff
SET salary = salary*1.03;

Give all Managers a 5% pay increase.

UPDATE Staff
SET salary = salary*1.05
WHERE position = 'Manager';

Example 14.19 UPDATE Multiple Columns

Promote David Ford (staffNo=‘SG14’) to Manager and change his salary
to £18,000.

UPDATE Staff
SET position = 'Manager', salary = 18000
WHERE staffNo = 'SG14';

DELETE

DELETE FROM TableName
[WHERE searchCondition]

 TableName can be name of a base table or an updatable view.
 searchCondition is optional; if omitted, all rows are deleted from table.
This does not delete table. If searchCondition is specified, only those
rows that satisfy condition are deleted.

Example 14.20/21 DELETE Specific Rows

Delete all viewings that relate to property PG4.

DELETE FROM Viewing
WHERE propertyNo = 'PG4';

Delete all records from the Viewing table.

DELETE FROM Viewing;
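
A short sketch of how the optional WHERE clause scopes UPDATE and DELETE, using Python's sqlite3 module. The rows are assumed sample data; `rowcount` is the sqlite3 cursor attribute reporting how many rows a statement modified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, position TEXT, salary REAL);
    CREATE TABLE Viewing (clientNo TEXT, propertyNo TEXT);
    -- Assumed sample rows for illustration.
    INSERT INTO Staff VALUES ('SG14', 'Supervisor', 18000),
        ('SG5', 'Manager', 24000), ('SL21', 'Manager', 30000);
    INSERT INTO Viewing VALUES ('CR56', 'PA14'), ('CR76', 'PG4'),
        ('CR56', 'PG4'), ('CR62', 'PA14');
    """
)

# WHERE restricts the update to the two Managers.
raised = conn.execute(
    "UPDATE Staff SET salary = salary * 1.05 WHERE position = 'Manager'"
).rowcount

# Several columns can be set in one statement.
promoted = conn.execute(
    "UPDATE Staff SET position = 'Manager', salary = 18000 WHERE staffNo = 'SG14'"
).rowcount

# DELETE with WHERE removes only the matching rows.
removed = conn.execute("DELETE FROM Viewing WHERE propertyNo = 'PG4'").rowcount

# DELETE without WHERE empties the table but leaves the table itself in place.
conn.execute("DELETE FROM Viewing")
remaining = conn.execute("SELECT COUNT(*) FROM Viewing").fetchone()[0]

print(raised, promoted, removed, remaining)
```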

Lecture #15
SQL: Data Definition
ISO SQL Data Types
Integrity Enhancement Feature

 Integrity constraints:

 required data
 domain constraints
 entity integrity
 referential integrity
 general constraints.

Required Data
position VARCHAR(10) NOT NULL

Domain Constraints
(a) CHECK
sex CHAR NOT NULL
CHECK (sex IN ('M', 'F'))

(b) CREATE DOMAIN

CREATE DOMAIN DomainName [AS] dataType
[DEFAULT defaultOption]
[CHECK (searchCondition)]
For example:

CREATE DOMAIN SexType AS CHAR
CHECK (VALUE IN ('M', 'F'));
sex SexType NOT NULL
 searchCondition can involve a table lookup:

CREATE DOMAIN BranchNo AS CHAR(4)
CHECK (VALUE IN (SELECT branchNo
FROM Branch));

 Domains can be removed using DROP DOMAIN:

DROP DOMAIN DomainName
[RESTRICT | CASCADE]
IEF - Entity Integrity

 Primary key of table must contain unique, non-null
value for each row
 ISO standard supports PRIMARY KEY clause in
CREATE and ALTER TABLE statements:
PRIMARY KEY(staffNo)
PRIMARY KEY(clientNo, propertyNo)

 Can only have one PRIMARY KEY clause per table
 Can still ensure uniqueness for alternate keys using
UNIQUE:
UNIQUE(telNo)
IEF - Referential Integrity

 FK is column or set of columns that links each row in child table
containing FK to row of parent table containing matching PK
 Referential integrity means that, if FK contains value, that value must
refer to existing row in parent table
 ISO standard supports definition of FKs with FOREIGN KEY clause in
CREATE and ALTER TABLE:

FOREIGN KEY(branchNo) REFERENCES Branch



 Any INSERT/UPDATE attempting to create FK value in child
table without matching CK value in parent is rejected
 Action taken attempting to update/delete CK value in
parent table with matching rows in child is dependent on
referential action specified using ON UPDATE and ON
DELETE subclauses:

 CASCADE
 SET NULL
 SET DEFAULT
 NO ACTION

CASCADE: Delete row from parent and delete matching
rows in child, in cascading manner
SET NULL: Delete row from parent and set FK column(s)
in child to NULL
Only valid if FK columns are not declared NOT NULL
SET DEFAULT: Delete row from parent and set each
component of FK in child to specified default
Only valid if DEFAULT specified for FK columns
NO ACTION: Reject delete from parent - Default

FOREIGN KEY (staffNo) REFERENCES Staff ON DELETE SET NULL

FOREIGN KEY (ownerNo) REFERENCES Owner ON UPDATE CASCADE
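
The referential actions can be observed in a small sqlite3 sketch. The two-table schema and its rows are assumed for illustration; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, which is specific to SQLite and not part of the ISO standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.executescript(
    """
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY);
    CREATE TABLE PropertyForRent (
        propertyNo TEXT PRIMARY KEY,
        staffNo TEXT,  -- nullable, so ON DELETE SET NULL is valid
        FOREIGN KEY (staffNo) REFERENCES Staff (staffNo)
            ON DELETE SET NULL ON UPDATE CASCADE
    );
    -- Assumed sample rows for illustration.
    INSERT INTO Staff VALUES ('SG37');
    INSERT INTO PropertyForRent VALUES ('PG21', 'SG37');
    """
)

# An FK value with no matching parent row is rejected.
try:
    conn.execute("INSERT INTO PropertyForRent VALUES ('PG99', 'XX00')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True


def staff_of_pg21():
    return conn.execute(
        "SELECT staffNo FROM PropertyForRent WHERE propertyNo = 'PG21'"
    ).fetchone()[0]


# ON UPDATE CASCADE: the child row follows the parent's new key value.
conn.execute("UPDATE Staff SET staffNo = 'SG99' WHERE staffNo = 'SG37'")
after_update = staff_of_pg21()

# ON DELETE SET NULL: the child row survives with a null FK.
conn.execute("DELETE FROM Staff WHERE staffNo = 'SG99'")
after_delete = staff_of_pg21()

print(orphan_rejected, after_update, after_delete)
```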


IEF - General Constraints

 Could use CHECK/UNIQUE in CREATE and ALTER TABLE
 Can also use CREATE ASSERTION, which is similar to CHECK clause:

CREATE ASSERTION AssertionName
CHECK (searchCondition)

CREATE ASSERTION StaffNotHandlingTooMuch
CHECK (NOT EXISTS(SELECT staffNo
FROM PropertyForRent
GROUP BY staffNo
HAVING COUNT(*) > 100))
Data Definition

 SQL DDL allows database objects such as schemas, domains, tables, views, and
indexes to be created and destroyed
 Main SQL DDL statements:

CREATE SCHEMA            DROP SCHEMA
CREATE/ALTER DOMAIN      DROP DOMAIN
CREATE/ALTER TABLE       DROP TABLE
CREATE VIEW              DROP VIEW

 Many DBMSs also provide:

CREATE INDEX             DROP INDEX


 Relations and other database objects exist in an environment
 Each environment contains one or more catalogs, and each
catalog consists of set of schemas
 Schema is named collection of related database objects
 Objects in schema can be tables, views, domains, assertions
 All have same owner
CREATE SCHEMA

CREATE SCHEMA [Name | AUTHORIZATION CreatorId ]
DROP SCHEMA Name [RESTRICT | CASCADE ]

 With RESTRICT (default)
 Schema must be empty or operation fails
 With CASCADE
 Operation cascades to drop all objects
associated with schema in order defined above
 If any operations fail → DROP SCHEMA fails
CREATE TABLE

CREATE TABLE TableName
{(colName dataType [NOT NULL] [UNIQUE]
[DEFAULT defaultOption]
[CHECK (searchCondition)] [,...]}
[PRIMARY KEY (listOfColumns),]
{[UNIQUE (listOfColumns),] […,]}
{[FOREIGN KEY (listOfFKColumns)
REFERENCES ParentTableName [(listOfCKColumns)],
[ON UPDATE referentialAction]
[ON DELETE referentialAction ]] [,…]}
{[CHECK (searchCondition)] [,…] })

 Creates table with one or more columns of
specified dataType
 With NOT NULL
 System rejects any attempt to insert null in
column
 Can specify DEFAULT value for column
 Primary keys should always be specified as NOT
NULL
 FOREIGN KEY clause specifies FK along with
referential action
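
A minimal sqlite3 sketch of NOT NULL, DEFAULT, and CHECK in a column definition. SQLite has no CREATE DOMAIN, so the range checks are written inline on the columns; the cut-down table, the helper `rejected`, and the inserted values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE PropertyForRent (
        propertyNo TEXT NOT NULL PRIMARY KEY,
        rooms INTEGER NOT NULL DEFAULT 4 CHECK (rooms BETWEEN 1 AND 15),
        rent REAL NOT NULL DEFAULT 600 CHECK (rent BETWEEN 0 AND 9999.99)
    )
    """
)


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Omitted columns pick up their DEFAULT values.
conn.execute("INSERT INTO PropertyForRent (propertyNo) VALUES ('PG4')")
defaults = conn.execute(
    "SELECT rooms, rent FROM PropertyForRent WHERE propertyNo = 'PG4'"
).fetchone()

# NOT NULL rejects an explicit null; CHECK rejects an out-of-range value.
null_rejected = rejected("INSERT INTO PropertyForRent VALUES ('PG16', NULL, 450)")
range_rejected = rejected("INSERT INTO PropertyForRent VALUES ('PG21', 20, 450)")

print(defaults, null_rejected, range_rejected)
```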
Example 15.1 - CREATE TABLE
CREATE DOMAIN OwnerNumber AS VARCHAR(5)
CHECK (VALUE IN (SELECT ownerNo FROM
PrivateOwner));
CREATE DOMAIN StaffNumber AS VARCHAR(5)
CHECK (VALUE IN (SELECT staffNo FROM
Staff));
CREATE DOMAIN PNumber AS VARCHAR(5);
CREATE DOMAIN PRooms AS SMALLINT
CHECK(VALUE BETWEEN 1 AND 15);
CREATE DOMAIN PRent AS DECIMAL(6,2)
CHECK(VALUE BETWEEN 0 AND 9999.99);

CREATE TABLE PropertyForRent (
propertyNo PNumber NOT NULL, ….
rooms PRooms NOT NULL DEFAULT 4,
rent PRent NOT NULL DEFAULT 600,
ownerNo OwnerNumber NOT NULL,
staffNo StaffNumber
Constraint StaffNotHandlingTooMuch ….
branchNo BranchNumber NOT NULL,
PRIMARY KEY (propertyNo),
FOREIGN KEY (staffNo) REFERENCES Staff
ON DELETE SET NULL ON UPDATE CASCADE ….);
ALTER TABLE

 Add new column


 Drop column
 Add new table constraint
 Drop table constraint
 Set default for column
 Drop default for column
Example 15.2(a) - ALTER TABLE

Change Staff table by removing default of ‘Assistant’ for position
column and setting default for sex column to female (‘F’).

ALTER TABLE Staff
ALTER position DROP DEFAULT;
ALTER TABLE Staff
ALTER sex SET DEFAULT 'F';
Example 15.2(b) - ALTER TABLE

Remove constraint from PropertyForRent that staff are not allowed to
handle more than 100 properties at a time. Add new column to Client
table.

ALTER TABLE PropertyForRent
DROP CONSTRAINT StaffNotHandlingTooMuch;
ALTER TABLE Client
ADD prefNoRooms PRooms;
DROP TABLE

DROP TABLE TableName [RESTRICT | CASCADE]

e.g. DROP TABLE PropertyForRent;

 Removes named table and all rows


 With RESTRICT
 If any other objects depend for their existence
on continued existence of this table → SQL
does not allow request
 With CASCADE
 SQL drops all dependent objects (and objects
dependent on these objects)
Lecture #16
SQL: Data Definition (contd..)
Views
View
 Dynamic result of one or more relational operations operating on base
relations to produce another relation

 Virtual relation that does not necessarily actually exist in database but is
produced upon request, at time of request
 Contents of a view are defined as query on one or more base relations
 View resolution
 Any operations on view automatically translated into operations on
relations from which derived
 View materialization
 View stored as temporary table
 Maintained as underlying base tables are updated
SQL - CREATE VIEW

CREATE VIEW ViewName [ (newColumnName [,...]) ]
AS subselect
[WITH [CASCADED | LOCAL] CHECK OPTION]

 Can assign name to each column in view
 If list of column names specified
Must have same number of items as number of columns
produced by subselect
 If omitted
Each column takes name of corresponding column in
subselect

 List must be specified if any ambiguity in column
name
 Subselect known as defining query
 WITH CHECK OPTION
 Ensures that if row fails to satisfy WHERE clause of
defining query, it is not added to underlying base table
 Need SELECT privilege on all tables referenced in
subselect
 Need USAGE privilege on any domains used in
referenced columns
Example 16.1 - Create Horizontal View

Create view so that manager at branch B003 can only see details for staff who work in
his or her office.

CREATE VIEW Manager3Staff
AS SELECT *
FROM Staff
WHERE branchNo = 'B003';
Example 16.2 - Create Vertical View
Create view of staff details at branch B003 excluding salaries.

CREATE VIEW Staff3
AS SELECT staffNo, fName, lName, position, sex
FROM Staff
WHERE branchNo = 'B003';
Example 16.3 - Grouped and Joined Views
Create view of staff who manage properties for rent, including branch
number they work at, staff number, and number of properties they
manage.

CREATE VIEW StaffPropCnt (branchNo, staffNo, cnt)
AS SELECT s.branchNo, s.staffNo, COUNT(*)
FROM Staff s, PropertyForRent p
WHERE s.staffNo = p.staffNo
GROUP BY s.branchNo, s.staffNo;
SQL - DROP VIEW

DROP VIEW ViewName [RESTRICT | CASCADE]

 Causes definition of view to be deleted from database
 For example:

DROP VIEW Manager3Staff;



 With CASCADE
 All related dependent objects deleted; i.e. any views defined on
view being dropped.
 With RESTRICT (default)
 If any other objects depend for existence on continued existence
of view being dropped → command rejected
View Resolution

Count number of properties managed by each member at branch B003.

SELECT staffNo, cnt
FROM StaffPropCnt
WHERE branchNo = 'B003'
ORDER BY staffNo;

(a) View column names in SELECT list are translated into corresponding
column names in defining query:

SELECT s.staffNo AS staffNo, COUNT(*) AS cnt

(b) View names in FROM replaced with corresponding FROM lists of
defining query:

FROM Staff s, PropertyForRent p



(c) WHERE from user query combined with WHERE of defining query using
AND:

WHERE s.staffNo = p.staffNo AND branchNo = 'B003'
(d) GROUP BY and HAVING clauses copied from defining query:

GROUP BY s.branchNo, s.staffNo


(e) ORDER BY copied from query with view column name translated into
defining query column name

ORDER BY s.staffNo

(f) Final merged query executed to produce result:

SELECT s.staffNo AS staffNo, COUNT(*) AS cnt
FROM Staff s, PropertyForRent p
WHERE s.staffNo = p.staffNo AND
branchNo = 'B003'
GROUP BY s.branchNo, s.staffNo
ORDER BY s.staffNo;
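
As a quick check that the merged query in step (f) is equivalent to querying the view, the sketch below runs both through Python's sqlite3 module and compares the results. The rows are assumed sample data; how a given DBMS actually resolves views internally is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, branchNo TEXT);
    CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, staffNo TEXT);
    -- Assumed sample rows for illustration.
    INSERT INTO Staff VALUES ('SG14', 'B003'), ('SG37', 'B003'), ('SL41', 'B005');
    INSERT INTO PropertyForRent VALUES ('PG16', 'SG14'), ('PG21', 'SG37'),
        ('PG36', 'SG37'), ('PL94', 'SL41');

    CREATE VIEW StaffPropCnt (branchNo, staffNo, cnt)
    AS SELECT s.branchNo, s.staffNo, COUNT(*)
       FROM Staff s, PropertyForRent p
       WHERE s.staffNo = p.staffNo
       GROUP BY s.branchNo, s.staffNo;
    """
)

# The user's query against the view.
view_rows = conn.execute(
    """
    SELECT staffNo, cnt
    FROM StaffPropCnt
    WHERE branchNo = 'B003'
    ORDER BY staffNo
    """
).fetchall()

# The merged query against the base tables, as produced by steps (a) to (e).
merged_rows = conn.execute(
    """
    SELECT s.staffNo AS staffNo, COUNT(*) AS cnt
    FROM Staff s, PropertyForRent p
    WHERE s.staffNo = p.staffNo AND branchNo = 'B003'
    GROUP BY s.branchNo, s.staffNo
    ORDER BY s.staffNo
    """
).fetchall()

print(view_rows, merged_rows)
```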
Restrictions on Views

SQL imposes several restrictions on creation and use of
views.

(a) If column in view based on aggregate function:
 Column may appear only in SELECT and ORDER BY clauses of
queries that access view
 Column may not be used in WHERE nor be an argument to
aggregate function in any query based on view

 For example, following queries would fail:

SELECT COUNT(cnt)
FROM StaffPropCnt;

SELECT *
FROM StaffPropCnt
WHERE cnt > 2;
(b) Grouped view may never be joined with base
table or view
 For example
 StaffPropCnt view is grouped view, any attempt
to join this view with another table or view fails
View Updatability

 All updates to base table reflected in all views that
encompass base table
 May expect that if view updated then base table(s) will
reflect change

 Consider again view StaffPropCnt
 If we tried to insert record showing that at branch B003, SG5
manages 2 properties:

INSERT INTO StaffPropCnt
VALUES ('B003', 'SG5', 2);

 Have to insert 2 records into PropertyForRent showing which
properties SG5 manages. However, do not know which
properties they are; i.e. do not know primary keys!

 If change definition of view and replace count with actual
property numbers:

CREATE VIEW StaffPropList (branchNo, staffNo, propertyNo)
AS SELECT s.branchNo, s.staffNo, p.propertyNo
FROM Staff s, PropertyForRent p
WHERE s.staffNo = p.staffNo;
 Now try to insert the record:

INSERT INTO StaffPropList
VALUES ('B003', 'SG5', 'PG19');

 Still problem - in PropertyForRent all columns except postcode/staffNo
are not allowed nulls
 No way of giving remaining non-null columns values
 ISO specifies that view is updatable if and only if:
- DISTINCT is not specified
- Every element in SELECT list of defining query is column
name and no column appears more than once
- FROM clause specifies only one table
 If source table a view – same conditions apply, excludes any views
based on join, union, intersection or difference
- No nested SELECT referencing outer table
- No GROUP BY or HAVING clause
- Every row added through view must not violate integrity
constraints of base table
Updatable View
For view to be updatable, DBMS must be able to trace any row or
column back to its row or column in source table
WITH CHECK OPTION
 Rows exist in view because they satisfy WHERE condition of defining query
 If row changes and no longer satisfies condition - disappears from view
 New rows appear within view when insert/update on view cause them to
satisfy WHERE condition
 Rows that enter or leave view called migrating rows
 WITH CHECK OPTION generally prohibits row migrating out of view

 LOCAL/CASCADED apply to view hierarchies


 With LOCAL
 Any row insert/update on view and any view directly or
indirectly defined on this view must not cause row to
disappear from view unless row also disappears from
derived view/table
 With CASCADED (default)
 Any row insert/ update on view and on any view
directly or indirectly defined on this view must not
cause row to disappear from the view
Example 16.4 - WITH CHECK OPTION

CREATE VIEW Manager3Staff
AS SELECT *
FROM Staff
WHERE branchNo = 'B003'
WITH CHECK OPTION;
 Cannot update branch number of row B003 to B005 - would
cause row to migrate from view
 Cannot insert row into view with branch number that does not
equal B003
 For example, following update would be rejected, as it would cause row
to migrate from view:
UPDATE Manager3Staff
SET branchNo = 'B005'
WHERE staffNo = 'SG37';

 Consider the following:

CREATE VIEW LowSalary
AS SELECT * FROM Staff WHERE salary > 9000;
CREATE VIEW HighSalary
AS SELECT * FROM LowSalary
WHERE salary > 10000
WITH LOCAL CHECK OPTION;
CREATE VIEW Manager3Staff
AS SELECT * FROM HighSalary
WHERE branchNo = 'B003';
UPDATE Manager3Staff
SET salary = 9500
WHERE staffNo = 'SG37';

 This update would fail: although update would cause row to disappear from
HighSalary, row would not disappear from LowSalary
 If update tried to set salary to 8000, update would succeed as row would no
longer be part of LowSalary

 If HighSalary had specified WITH CASCADED CHECK OPTION, setting
salary to 9500 or 8000 would be rejected because row would
disappear from HighSalary
 To prevent anomalies like this
 Each view should be created using WITH CASCADED CHECK OPTION
Advantages of Views

 Data independence
 Currency
 Improved security
 Reduced complexity
 Convenience
 Customization
 Data integrity
Disadvantages of Views

 Update restriction
 Structure restriction
 Performance
View Materialization

 View resolution mechanism may be slow, if view accessed frequently
 View materialization stores view as temporary table when view first
queried
 Queries based on materialized view can be faster than recomputing
view each time
 Difficulty in maintaining currency of view while base table(s) updated
View Maintenance
 View maintenance aims to apply only those changes necessary to keep
view current.
 Consider following view:
CREATE VIEW StaffPropRent(staffNo)
AS SELECT DISTINCT staffNo
FROM PropertyForRent
WHERE branchNo = 'B003' AND
rent > 400;
View Materialization
 If insert row into PropertyForRent with rent ≤ 400
then view would be unchanged
 If insert row for property PG24 at branch B003 with
staffNo = SG19 and rent = 550, then row would
appear in materialized view
 If insert row for property PG54 at branch B003 with
staffNo = SG37 and rent = 450, then no new row
would need to be added to materialized view
 If delete property PG24, row should be deleted from
materialized view
 If delete property PG54, then row for SG37 should
not be deleted (because of existing property PG21)
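One common way to get this behaviour is to keep, for each staffNo in the view, a count of the base rows that support it. The Python sketch below illustrates that idea only; it is not the mechanism of any particular DBMS. The class name `StaffPropRentView`, its methods, and the rent assumed for PG21 are made up for the example.

```python
from collections import Counter


def qualifies(row: dict) -> bool:
    """The view's WHERE condition: branchNo = 'B003' AND rent > 400."""
    return row["branchNo"] == "B003" and row["rent"] > 400


class StaffPropRentView:
    """Materialized SELECT DISTINCT staffNo, maintained incrementally."""

    def __init__(self, base_rows=()):
        self._support = Counter()   # staffNo -> number of qualifying properties
        for row in base_rows:
            self.on_insert(row)

    def on_insert(self, row: dict) -> None:
        if qualifies(row):
            self._support[row["staffNo"]] += 1

    def on_delete(self, row: dict) -> None:
        if qualifies(row):
            self._support[row["staffNo"]] -= 1
            if self._support[row["staffNo"]] == 0:
                del self._support[row["staffNo"]]   # last supporting row gone

    def rows(self) -> list:
        return sorted(self._support)


# PG21 is the existing property managed by SG37 (its rent is an assumed value).
pg21 = {"propertyNo": "PG21", "branchNo": "B003", "staffNo": "SG37", "rent": 600}
pg24 = {"propertyNo": "PG24", "branchNo": "B003", "staffNo": "SG19", "rent": 550}
pg54 = {"propertyNo": "PG54", "branchNo": "B003", "staffNo": "SG37", "rent": 450}

view = StaffPropRentView([pg21])
view.on_insert(pg24)   # SG19 appears in the view
view.on_insert(pg54)   # SG37 is already present, so no new row
view.on_delete(pg24)   # SG19 disappears
view.on_delete(pg54)   # SG37 stays because PG21 still supports it
print(view.rows())
```

Only the rows touched by an insert or delete are examined, which is the point of view maintenance: the view is never recomputed from scratch.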
Granting Privileges to Other Users(GRANT)
 SQL GRANT is a command used to provide access or privileges on the database
objects to other users.
 The format of the GRANT statement is:
GRANT {PrivilegeList | ALL PRIVILEGES}
ON ObjectName
TO {AuthorizationIdList | PUBLIC}
[WITH GRANT OPTION]
 PrivilegeList consists of one or more of the following privileges separated by
commas:
 SELECT
 DELETE
 INSERT [(columnName [, . . . ])]
 UPDATE [(columnName [, . . . ])]
 REFERENCES [(columnName [, . . . ])]
 USAGE
 For convenience, the GRANT statement allows the keyword ALL
PRIVILEGES to be used to grant all privileges to a user instead of
having to specify the six privileges individually.
 The WITH GRANT OPTION clause allows the user(s) in
AuthorizationIdList to pass the privileges they have been given for
the named object on to other users. If these users pass a privilege
on specifying WITH GRANT OPTION, the users receiving the
privilege may in turn grant it to still other users.
Example 16.5 GRANT all privileges
 Give the user with authorization identifier Manager full privileges
to the Staff table.
GRANT ALL PRIVILEGES
ON Staff
TO Manager WITH GRANT OPTION;
Example 16.6 GRANT specific privileges
 Give users Personnel and Director the privileges SELECT and
UPDATE on column salary of the Staff table.
GRANT SELECT, UPDATE (salary)
ON Staff
TO Personnel, Director;
Example 16.7 GRANT specific privileges to
PUBLIC
 Give all users the privilege SELECT on the Branch table.
GRANT SELECT
ON Branch
TO PUBLIC;
Revoking Privileges From Users(REVOKE)
 The REVOKE statement is used to take away privileges that were
granted with the GRANT statement. A REVOKE statement can take
away all or some of the privileges that were previously granted to a
user.
 The format of the statement is:
REVOKE [GRANT OPTION FOR] {PrivilegeList | ALL PRIVILEGES}
ON ObjectName
FROM {AuthorizationIdList | PUBLIC} [RESTRICT | CASCADE]
 The keyword ALL PRIVILEGES refers to all the privileges granted to
a user by the user revoking the privileges. The optional GRANT
OPTION FOR clause allows privileges passed on via the WITH GRANT
OPTION of the GRANT statement to be revoked separately from the
privileges themselves.
 The RESTRICT and CASCADE qualifiers operate exactly as in the
DROP TABLE statement.
 Since privileges are required to create certain objects, revoking a
privilege can remove the authority that allowed the object to be
created (such an object is said to be abandoned).
 The REVOKE statement fails if it results in an abandoned object,
such as a view, unless the CASCADE keyword has been specified.
 If CASCADE is specified, an appropriate DROP statement is issued
for any abandoned views, domains, constraints, or assertions
Example 16.8 REVOKE specific privileges from
PUBLIC
 Revoke the privilege SELECT on the Branch table from all users.
REVOKE SELECT
ON Branch
FROM PUBLIC;
Example 16.9 REVOKE specific privileges from
named user
 Revoke all privileges you have given to Director on the Staff table.
REVOKE ALL PRIVILEGES
ON Staff
FROM Director;
 This is equivalent to REVOKE SELECT, UPDATE (salary) . . . , as these were the
only privileges that have been given to Director (Example 16.6).
Lecture #17
Entity-Relationship modelling

www.bzupages.com
Objectives

 How to use Entity–Relationship (ER) modeling in database design.
 Basic concepts associated with ER model.
 Diagrammatic technique for displaying ER model using Unified Modeling
Language (UML).
 How to identify and resolve problems with ER models called connection
traps.
 How to build an ER model from a requirements specification.
ER diagram of Branch user views of
DreamHome

Concepts of the ER Model

 Entity types

 Relationship types

 Attributes

Entity Type

 Entity type
 Group of objects with same properties, identified by enterprise as
having an independent existence.

 Entity occurrence
 Uniquely identifiable object of an entity type.

Examples of Entity Types

ER diagram of Staff and Branch
entity types

Relationship Types

 Relationship type
 Set of meaningful associations among entity types.

 Relationship occurrence
 Uniquely identifiable association, which includes one occurrence from each
participating entity type.

Semantic net of Has relationship type

ER diagram of Branch Has Staff
relationship

Relationship Types

 Degree of a Relationship
 Number of participating entities in relationship.

 Relationship of degree:
 two is binary
 three is ternary
 four is quaternary.

Binary relationship called POwns

Ternary relationship called Registers

Quaternary relationship called Arranges

Relationship Types

 Recursive Relationship
 Relationship type where same entity type participates more than once in
different roles.

 Relationships may be given role names to indicate purpose that each
participating entity type plays in a relationship.
Recursive relationship called
Supervises with role names

Entities associated through two distinct
relationships with role names

Attributes

 Attribute
 Property of an entity or a relationship type.

 Attribute Domain
 Set of allowable values for one or more attributes.

Attributes

 Simple Attribute
 Attribute composed of a single component with an
independent existence.

 Composite Attribute
 Attribute composed of multiple components, each with an
independent existence.

Attributes

 Single-valued Attribute
 Attribute that holds a single value for each occurrence of
an entity type.

 Multi-valued Attribute
 Attribute that holds multiple values for each occurrence of
an entity type.

Attributes

 Derived Attribute
 Attribute that represents a value that is derivable from value of a related
attribute, or set of attributes, not necessarily in the same entity type.

Keys

 Candidate Key
 Minimal set of attributes that uniquely identifies each
occurrence of an entity type.

 Primary Key
 Candidate key selected to uniquely identify each occurrence
of an entity type.

 Composite Key
 A candidate key that consists of two or more attributes.

ER diagram of Staff and Branch entities
and their attributes

Entity Type

 Strong Entity Type
 Entity type that is not existence-dependent on some other
entity type.

 Weak Entity Type
 Entity type that is existence-dependent on some other entity
type.
Strong entity type called Client and weak
entity type called Preference

Relationship called Advertises with attributes

Lecture #18
Entity-Relationship modelling (contd..)
Structural Constraints

 Main type of constraint on relationships is called multiplicity.
 Multiplicity - number (or range) of possible occurrences of an
entity type that may relate to a single occurrence of an associated
entity type through a particular relationship.
 Represents policies (called business rules) established by user or
company.
Structural Constraints

 The most common degree for relationships is binary.
 Binary relationships are generally referred to as being:
 one-to-one (1:1)
 one-to-many (1:*)
 many-to-many (*:*)
Semantic net of Staff Manages Branch
relationship type

Multiplicity of Staff Manages Branch
(1:1) relationship

Semantic net of Staff Oversees
PropertyForRent relationship type

Multiplicity of Staff Oversees PropertyForRent
(1:*) relationship type

Semantic net of Newspaper Advertises
PropertyForRent relationship type

Multiplicity of Newspaper Advertises
PropertyForRent (*:*) relationship

Structural Constraints

 Multiplicity for Complex Relationships
 Number (or range) of possible occurrences of an
entity type in an n-ary relationship when other (n-1)
values are fixed.
Semantic net of ternary Registers relationship
with values for Staff and Branch entities fixed

Multiplicity of ternary Registers relationship

Summary of multiplicity constraints

Structural Constraints

 Multiplicity is made up of two types of restrictions on relationships:
cardinality and participation.

 Cardinality
 Describes maximum number of possible relationship occurrences for an
entity participating in a given relationship type.

 Participation
 Determines whether all or only some entity occurrences participate in a
relationship.
Multiplicity as cardinality and participation
constraints

Problems with ER Models

 Problems called connection traps may arise when designing a
conceptual data model.
 Often due to a misinterpretation of the meaning of certain
relationships.
 Two main types of connection traps are called fan traps and chasm
traps.
Problems with ER Models

 Fan Trap
 Where a model represents a relationship between entity types, but
pathway between certain entity occurrences is ambiguous.

 Chasm Trap
 Where a model suggests the existence of a relationship between
entity types, but pathway does not exist between certain entity
occurrences.

An Example of a Fan Trap

Semantic Net of ER Model with Fan Trap

 At which branch office does staff number SG37 work?
Restructuring ER model to remove Fan Trap

Semantic Net of Restructured ER Model
with Fan Trap Removed

 SG37 works at branch B003.

An Example of a Chasm Trap

Semantic Net of ER Model with Chasm Trap

 At which branch office is property PA14 available?

ER Model restructured to remove Chasm Trap

Semantic Net of Restructured ER Model
with Chasm Trap Removed

Lecture#19
Enhanced Entity-Relationship Modeling
Objectives
 Limitations of basic concepts of the ER model and requirements to
represent more complex applications using additional data modeling
concepts.

 Most useful additional data modeling concepts of Enhanced ER (EER)
model called:
 specialization/generalization;
 aggregation;
 composition.

 A diagrammatic technique for displaying specialization/generalization,
aggregation, and composition in an EER diagram using UML.
Enhanced Entity-Relationship
Model
 Since the 1980s there has been an increase in the emergence of new
database applications with more demanding requirements.
 Basic concepts of ER modeling are not sufficient to represent
requirements of newer, more complex applications.
 Response is development of additional ‘semantic’ modeling
concepts.
The Enhanced Entity-
Relationship Model
 Semantic concepts are incorporated into the original ER model and called
the Enhanced Entity-Relationship (EER) model.

 Examples of additional concepts of EER model are:
 specialization / generalization;
 aggregation;
 composition.
Specialization / Generalization

 Superclass
 An entity type that includes one or more distinct subgroupings of its
occurrences.

 Subclass
 A distinct subgrouping of occurrences of an entity type.
Specialization / Generalization

 Superclass/subclass relationship is one-to-one (1:1).
 Superclass may contain overlapping or distinct subclasses.
 Not all members of a superclass need be a member of a subclass.
AllStaff Relation Holding Details of all
Staff

Specialization / Generalization
 Attribute Inheritance
 An entity in a subclass represents same ‘real world’
object as in superclass, and may possess subclass-
specific attributes, as well as those associated with the
superclass.
Specialization / Generalization
 Specialization
 Process of maximizing differences between members of an
entity by identifying their distinguishing characteristics.

 Generalization
 Process of minimizing differences between entities by
identifying their common characteristics.
Specialization/Generalization of Staff Entity
into Subclasses Representing Job Roles
Specialization/Generalization of Staff Entity into
Job Roles and Contracts of Employment
EER Diagram with Shared Subclass and
Subclass with its own Subclass
Constraints on Specialization / Generalization

 Two constraints that may apply to a
specialization/generalization:
 participation constraints,
 disjoint constraints.

 Participation constraint
 Determines whether every member in superclass
must participate as a member of a subclass.
 May be mandatory or optional.
Constraints on Specialization /
Generalization

 Disjoint constraint
 Describes relationship between members of the
subclasses and indicates whether member of a
superclass can be a member of one, or more
than one, subclass.
 May be disjoint or nondisjoint.
Constraints on Specialization /
Generalization
 There are four categories of constraints of specialization and
generalization:
 mandatory and disjoint;
 optional and disjoint;
 mandatory and nondisjoint;
 optional and nondisjoint.
DreamHome Worked Example - Staff Superclass with
Supervisor and Manager Subclasses
DreamHome Worked Example - Owner Superclass
with PrivateOwner and BusinessOwner Subclasses
DreamHome Worked Example - Person Superclass with
Staff, PrivateOwner, and Client Subclasses
EER Diagram of Branch View of DreamHome with
Specialization/Generalization
Aggregation
 Represents a ‘has-a’ or ‘is-part-of’ relationship
between entity types, where one represents the
‘whole’ and the other ‘the part’.
Examples of Aggregation
Composition

 Specific form of aggregation that represents an
association between entities, where there is a
strong ownership and coincidental lifetime
between the ‘whole’ and the ‘part’.
Example of Composition
Lecture#20
Introduction to Normalization
Objectives
 The purpose of normalization.
 How normalization can be used when designing a relational database.
 The potential problems associated with redundant data in base
relations.
 The concept of functional dependency, which describes the
relationship between attributes.
 The characteristics of functional dependencies used in normalization.
 How to identify functional dependencies for a given relation.
Purpose of Normalization
 Normalization is a technique for producing a set of suitable
relations that support the data requirements of an enterprise.
 Characteristics of a suitable set of relations include:
 the minimal number of attributes necessary to support the data
requirements of the enterprise;
 attributes with a close logical relationship are found in the same
relation;
 minimal redundancy with each attribute represented only once
with the important exception of attributes that form all or part
of foreign keys.
Purpose of Normalization
 The benefits of using a database that has a suitable set of relations are
that the database will:
 be easier for the user to access and maintain;
 take up minimal storage space on the computer.
How Normalization Supports Database
Design
Data Redundancy and Update
Anomalies
 Major aim of relational database design is to group attributes into relations
to minimize data redundancy.
Data Redundancy and Update
Anomalies
 Potential benefits for implemented database include:
 Updates to the data stored in the database are
achieved with a minimal number of operations thus
reducing the opportunities for data inconsistencies.
 Reduction in the file storage space required by the
base relations thus minimizing costs.
Data Redundancy and Update Anomalies

 Problems associated with data redundancy are illustrated by
comparing the Staff and Branch relations with the StaffBranch
relation.
Data Redundancy and Update Anomalies

Data Redundancy and Update Anomalies
 StaffBranch relation has redundant data; the details of a branch are
repeated for every member of staff.

 In contrast, the branch information appears only once for each branch in
the Branch relation and only the branch number (branchNo) is repeated in
the Staff relation, to represent where each member of staff is located.
Data Redundancy and Update Anomalies

 Relations that contain redundant information may potentially suffer
from update anomalies.
 Types of update anomalies include
 Insertion
 Deletion
 Modification
Update Anomalies

12
Lossless-join and Dependency
Preservation Properties
 Two important properties of decomposition.
 Lossless-join property enables us to find any instance of the
original relation from corresponding instances in the smaller
relations.
 Dependency preservation property enables us to enforce a
constraint on the original relation by enforcing some constraint
on each of the smaller relations.
Functional Dependencies
 Important concept associated with normalization.

 Functional dependency describes relationship between attributes.

 For example, if A and B are attributes of relation R, B is functionally
dependent on A (denoted A → B), if each value of A in R is associated
with exactly one value of B in R.
Characteristics of Functional
Dependencies
 Property of the meaning or semantics of the attributes in a
relation.

 Diagrammatic representation.

 The determinant of a functional dependency refers to the
attribute or group of attributes on the left-hand side of the arrow.
An Example Functional Dependency
Example Functional Dependency
that holds for all Time

 Consider the values shown in staffNo and sName attributes of the Staff
relation.

 Based on sample data, the following functional dependencies appear to
hold.

staffNo → sName
sName → staffNo
Example Functional Dependency that
holds for all Time
 However, the only functional dependency that remains true for all
possible values for the staffNo and sName attributes of the Staff
relation is:

staffNo → sName
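The difference between a dependency that merely appears to hold in sample data and one that holds for all time can be seen by testing both dependencies against a growing set of rows. The Python sketch below is an illustration only: `fd_holds` is a hypothetical helper, the first three rows are sample values, and the fourth member of staff (SX99) is invented to show how a shared name breaks sName → staffNo.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every combination of lhs values maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs seen for this lhs and returns it afterwards
        if seen.setdefault(left, right) != right:
            return False
    return True


staff = [
    {"staffNo": "SL21", "sName": "John White"},
    {"staffNo": "SG37", "sName": "Ann Beech"},
    {"staffNo": "SG14", "sName": "David Ford"},
]
# Hypothetical new member of staff who shares a name with SL21.
staff_later = staff + [{"staffNo": "SX99", "sName": "John White"}]

for label, rows in (("sample", staff), ("later", staff_later)):
    print(label,
          fd_holds(rows, ["staffNo"], ["sName"]),
          fd_holds(rows, ["sName"], ["staffNo"]))
```

A check like this can only refute a dependency. Whether a dependency holds for all time follows from the meaning of the attributes, not from any finite sample.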
Characteristics of Functional Dependencies

 Determinants should have the minimal number of attributes necessary
to maintain the functional dependency with the attribute(s) on the
right-hand side.
 This requirement is called full functional dependency.


Characteristics of Functional Dependencies

 Full functional dependency indicates that if
A and B are attributes of a relation, B is fully
functionally dependent on A, if B is
functionally dependent on A, but not on any
proper subset of A.
Example Full Functional Dependency

 Exists in the Staff relation (see Slide 12).

staffNo, sName → branchNo

 True - each value of (staffNo, sName) is associated with a single
value of branchNo.
 However, branchNo is also functionally dependent on a subset of
(staffNo, sName), namely staffNo. Example above is a partial
dependency.
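The test for a partial dependency can be written down directly from the definition: compute what each proper subset of the determinant determines and see whether the right-hand side is already covered. The Python sketch below assumes a single known dependency for Staff (staffNo determines the other attributes listed). The helpers `closure` and `is_full_dependency` are names introduced here, not part of any library.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_dependency(lhs, rhs, fds):
    """True if lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    lhs = tuple(lhs)
    if not set(rhs) <= closure(lhs, fds):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if set(rhs) <= closure(subset, fds):
                return False   # a smaller determinant suffices: partial dependency
    return True


# Assumed dependency for the Staff relation.
staff_fds = [({"staffNo"}, {"sName", "position", "salary", "branchNo"})]

print(is_full_dependency(("staffNo", "sName"), ("branchNo",), staff_fds))
print(is_full_dependency(("staffNo",), ("branchNo",), staff_fds))
```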
Characteristics of Functional
Dependencies
 Main characteristics of functional dependencies used in
normalization:
 There is a one-to-one relationship between the attribute(s) on
the left-hand side (determinant) and those on the right-hand
side of a functional dependency.
 Holds for all time.
 The determinant has the minimal number of attributes
necessary to maintain the dependency with the attribute(s)
on the right-hand side.
Transitive Dependencies
 Important to recognize a transitive dependency because its
existence in a relation can potentially cause update anomalies.

 Transitive dependency describes a condition where A, B, and C
are attributes of a relation such that if A → B and B → C, then C
is transitively dependent on A via B (provided that A is not
functionally dependent on B or C).
Example Transitive Dependency
 Consider functional dependencies in the StaffBranch relation
(see Slide 12).

staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress

 Transitive dependency, branchNo → bAddress exists on staffNo
via branchNo.
The Process of Normalization
 Formal technique for analyzing a relation based on its primary
key and the functional dependencies between the attributes
of that relation.

 Often executed as a series of steps. Each step corresponds to
a specific normal form, which has known properties.
Identifying Functional Dependencies

 Identifying all functional dependencies between a set of attributes
is relatively simple if the meaning of each attribute and the
relationships between the attributes are well understood.
 This information should be provided by the enterprise in the form
of discussions with users and/or documentation such as the users’
requirements specification.
Identifying Functional Dependencies

 However, if the users are unavailable for consultation and/or the
documentation is incomplete then depending on the database
application it may be necessary for the database designer to use
their common sense and/or experience to provide the missing
information.
Example - Identifying a set of functional
dependencies for the StaffBranch relation

 Examine semantics of attributes in StaffBranch relation. Assume that
position held and branch determine a member of staff’s salary.
Example - Identifying a set of functional dependencies
for the StaffBranch relation

 With sufficient information available, identify the functional
dependencies for the StaffBranch relation as:

staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress
bAddress → branchNo
branchNo, position → salary
bAddress, position → salary
Lecture#21
Normalization (contd…)
Objectives
 How to identify functional dependencies for a given relation.
 How functional dependencies identify the primary key for a relation.
 How to undertake the process of normalization.
 How normalization uses functional dependencies to group attributes into relations
that are in a known normal form.
 How to identify the most commonly used normal forms, namely First Normal Form
(1NF), Second Normal Form (2NF), and Third Normal Form (3NF) and Boyce–Codd
normal form (BCNF).
 The problems associated with relations that break the rules of 1NF, 2NF, 3NF, or
BCNF.
 How to represent attributes shown on a form as BCNF relations using normalization.
Example - Using sample data to
identify functional dependencies.
 Consider the data for attributes denoted A, B, C, D, and E in the
Sample relation (see next slide).

 Important to establish that sample data values shown in relation are
representative of all possible values that can be held by attributes A,
B, C, D, and E. Assume true despite the relatively small amount of data
shown in this relation.
Example - Using sample data to
identify functional dependencies.
Example - Using sample data to
identify functional dependencies.
 Functional dependencies between attributes A to E in the Sample relation.

A→C (fd1)
C→A (fd2)
B →D (fd3)
A, B → E (fd4)
B, C → E (fd5)
Identifying the Primary Key for a
Relation using Functional Dependencies
 Main purpose of identifying a set of functional dependencies for a
relation is to specify the set of integrity constraints that must hold
on a relation.

 An important integrity constraint to consider first is the
identification of candidate keys, one of which is selected to be the
primary key for the relation.
Example - Identify Primary Key for
Sample Relation
 Sample relation has five functional dependencies, which are as follows:

A→C (fd1)
C→A (fd2)
B →D (fd3)
A, B → E (fd4)
B, C → E (fd5)

 The determinants are A,B,C,(A,B) and (B,C)

 To identify all candidate key(s), identify the attribute (or group of attributes)
that uniquely identifies each tuple in this relation.
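Given fd1 to fd5, the candidate keys can be found mechanically: an attribute set is a key if its closure under the dependencies covers every attribute and no smaller key is contained in it. The Python sketch below is a brute-force illustration suited to a five-attribute example. The function names `closure` and `candidate_keys` are introduced here for the purpose.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(set(key) <= set(combo) for key in keys):
                continue   # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys


sample_fds = [
    ({"A"}, {"C"}),        # fd1
    ({"C"}, {"A"}),        # fd2
    ({"B"}, {"D"}),        # fd3
    ({"A", "B"}, {"E"}),   # fd4
    ({"B", "C"}, {"E"}),   # fd5
]

print(candidate_keys({"A", "B", "C", "D", "E"}, sample_fds))
```

For the Sample relation this reports (A, B) and (B, C): of the five determinants, these are the only ones that determine every other attribute.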
The Process of Normalization
 As normalization proceeds, the relations become progressively more
restricted (stronger) in format and also less vulnerable to update
anomalies.
The Process of Normalization
Unnormalized Form (UNF)

 A table that contains one or more repeating groups.

 To create an unnormalized table
 Transform the data from the information source (e.g. form) into table format
with columns and rows.
First Normal Form (1NF)

 A relation in which the intersection of each row and column contains one
and only one value.

UNF to 1NF
 Nominate an attribute or group of attributes to act as the key for the
unnormalized table.

 Identify the repeating group(s) in the unnormalized table which repeats for the
key attribute(s).
UNF to 1NF
 Remove the repeating group by
 Entering appropriate data into the empty columns of rows containing the repeating
data (‘flattening’ the table).
 Or by
 Placing the repeating data along with a copy of the original key attribute(s) into a
separate relation.
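Both ways of removing a repeating group can be sketched on a small nested structure. In the Python example below the record layout, the field name `inspections`, and all data values are made up for illustration. `flatten` follows the first approach (one row per repetition, with the key data copied in), and `split_out` follows the second (the repeating data moves to its own relation together with a copy of the key).

```python
def flatten(records, group_field):
    """Approach 1: repeat the non-repeating data on every row of the group."""
    flat = []
    for record in records:
        outer = {k: v for k, v in record.items() if k != group_field}
        for item in record[group_field]:
            flat.append({**outer, **item})
    return flat


def split_out(records, key_fields, group_field):
    """Approach 2: move the repeating group to a new relation with a copy of the key."""
    parent = [{k: v for k, v in r.items() if k != group_field} for r in records]
    child = [{**{k: r[k] for k in key_fields}, **item}
             for r in records for item in r[group_field]]
    return parent, child


# Hypothetical unnormalized data: each property carries a list of inspections.
unnormalized = [
    {"propertyNo": "PG4", "pAddress": "6 Lawrence St", "inspections": [
        {"iDate": "2023-10-18", "staffNo": "SG37"},
        {"iDate": "2024-04-22", "staffNo": "SG14"},
    ]},
    {"propertyNo": "PG16", "pAddress": "5 Novar Dr", "inspections": [
        {"iDate": "2024-04-22", "staffNo": "SG14"},
    ]},
]

flat_rows = flatten(unnormalized, "inspections")
properties, inspections = split_out(unnormalized, ["propertyNo"], "inspections")
print(len(flat_rows), len(properties), len(inspections))
```

Flattening keeps a single relation but repeats pAddress on every row, which is the redundancy that the later normal forms remove. Splitting avoids that repetition from the start.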
Second Normal Form (2NF)

 Based on the concept of full functional dependency.

 Full functional dependency indicates that if
 A and B are attributes of a relation,
 B is fully dependent on A if B is functionally dependent on A but not on any proper
subset of A.
Second Normal Form (2NF)

 A relation that is in 1NF and every non-primary-key attribute is fully
functionally dependent on the primary key.
1NF to 2NF

 Identify the primary key for the 1NF relation.

 Identify the functional dependencies in the relation.

 If partial dependencies exist on the primary key remove them by placing
them in a new relation along with a copy of their determinant.
Third Normal Form (3NF)
 Based on the concept of transitive dependency.

 Transitive Dependency is a condition where
 A, B and C are attributes of a relation such that if A → B and B → C,
 then C is transitively dependent on A through B. (Provided that A is not functionally
dependent on B or C).
Third Normal Form (3NF)

 A relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key.
2NF to 3NF

 Identify the primary key in the 2NF relation.

 Identify functional dependencies in the relation.

 If transitive dependencies exist on the primary key remove them by
placing them in a new relation along with a copy of their determinant.
General Definitions of 2NF and 3NF
 Second normal form (2NF)
 A relation that is in first normal form and every non-candidate-key attribute is
fully functionally dependent on any candidate key.

 Third normal form (3NF)
 A relation that is in first and second normal form and in which no
non-candidate-key attribute is transitively dependent on any candidate key.
Boyce–Codd Normal Form (BCNF)
 Based on functional dependencies that take into account all candidate keys in a relation;
however, BCNF also has additional constraints compared with general definition of 3NF.

 BCNF - A relation is in BCNF if and only if every determinant is a candidate key.


Boyce–Codd normal form (BCNF)
 Difference between 3NF and BCNF is that for a functional dependency A → B, 3NF
allows this dependency in a relation if B is a primary-key attribute and A is not a
candidate key.

 Whereas, BCNF insists that for this dependency to remain in a relation, A must be a
candidate key.

 Every relation in BCNF is also in 3NF. However, relation in 3NF may not be in BCNF.
Boyce–Codd normal form (BCNF)
 Violation of BCNF is quite rare.

 Potential to violate BCNF may occur in a relation that:
 contains two (or more) composite candidate keys;
 the candidate keys overlap (i.e. have at least one attribute in common).
Review of Normalization (UNF to BCNF)

DreamHome
Lecture#22
Normalization (contd…)
Review of Normalization (UNF to BCNF)

DreamHome
Review of Normalization (UNF to BCNF)

1)Repeating Group =
(iDate,iTime,comments,staffNo,sName,carReg)
3)UNF
First Normal Form (1NF)

•(propertyNo, iDate)
• (carReg,iDate,iTime)
• (staffNo, iDate, iTime)
First Normal Form (1NF)

1NF
StaffPropertyInspection
( propertyNo,iDate,iTime,pAddress,comments,staffNo,sName,carReg )
Second Normal Form (2NF)
Second Normal Form (2NF)
 The functional dependencies (fd1 to fd6) of the StaffPropertyInspection relation are as follows:

fd1 propertyNo, iDate → iTime, comments, staffNo, sName, carReg (Primary key)
fd2 propertyNo → pAddress (Partial dependency)
fd3 staffNo → sName (Transitive dependency)
fd4 staffNo, iDate → carReg
fd5 carReg, iDate, iTime → propertyNo, pAddress, comments, staffNo, sName (Candidate key)
fd6 staffNo, iDate, iTime → propertyNo, pAddress, comments (Candidate key)
Review of Normalization (UNF to BCNF)
1NF
StaffPropertyInspection
( propertyNo,iDate,iTime,pAddress,comments,staffNo,sName,carReg )

2NF
Property
( propertyNo,pAddress )

PropertyInspection
( propertyNo,iDate,iTime,comments,staffNo,sName,carReg )
THIRD Normal Form (3NF)

The functional dependencies within the Property and PropertyInspection
relations are as follows:
Property Relation
 fd2 propertyNo → pAddress
PropertyInspection Relation
 fd1 propertyNo, iDate → iTime, comments, staffNo, sName, carReg
 fd3 staffNo → sName
 fd4 staffNo, iDate → carReg
 fd5′ carReg, iDate, iTime → propertyNo, comments, staffNo, sName
 fd6′ staffNo, iDate, iTime → propertyNo, comments
THIRD Normal Form (3NF)
2NF
Property
( propertyNo,pAddress )

PropertyInspection
( propertyNo,iDate,iTime,comments,staffNo,sName,carReg )

3NF
Property ( propertyNo,pAddress )

Staff ( staffNo, sName )

PropertyInspect
( propertyNo,iDate,iTime,comments,staffNo,carReg )
Boyce-Codd Normal Form (BCNF)
The functional dependencies for the Property, Staff, and PropertyInspect
relations are as follows:
Property Relation
 fd2 propertyNo → pAddress
Staff Relation
 fd3 staffNo → sName
PropertyInspect Relation
 fd1′ propertyNo, iDate → iTime, comments, staffNo, carReg
 fd4 staffNo, iDate → carReg
 fd5′ carReg, iDate, iTime → propertyNo, comments, staffNo
 fd6′ staffNo, iDate, iTime → propertyNo, comments
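The BCNF test (every determinant must be a candidate key) can be run mechanically over the four PropertyInspect dependencies by checking whether each determinant's closure covers the whole relation. The Python sketch below is an illustration. The helper names `closure` and `bcnf_violations` and the dictionary layout are introduced here.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, named_fds):
    """Names of non-trivial dependencies whose determinant is not a superkey."""
    all_fds = list(named_fds.values())
    return [name for name, (lhs, rhs) in named_fds.items()
            if not rhs <= lhs and closure(lhs, all_fds) != attributes]


property_inspect = {"propertyNo", "iDate", "iTime", "comments", "staffNo", "carReg"}
property_inspect_fds = {
    "fd1'": ({"propertyNo", "iDate"}, {"iTime", "comments", "staffNo", "carReg"}),
    "fd4": ({"staffNo", "iDate"}, {"carReg"}),
    "fd5'": ({"carReg", "iDate", "iTime"}, {"propertyNo", "comments", "staffNo"}),
    "fd6'": ({"staffNo", "iDate", "iTime"}, {"propertyNo", "comments"}),
}

print(bcnf_violations(property_inspect, property_inspect_fds))
```

Only fd4 is reported, because (staffNo, iDate) determines carReg but nothing else. That is why the decomposition moves staffNo, iDate, and carReg into a relation of their own.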
Boyce-Codd Normal Form (BCNF)

BCNF
Property ( propertyNo,pAddress )

Staff ( staffNo,sName )

StaffCar ( staffNo, iDate, carReg)

Inspection
( propertyNo,iDate,iTime,comments,staffNo )
Review of Normalization (UNF to BCNF)
Lecture#23
Database Security
Objectives
 The scope of database security.

 Why database security is a serious concern for an organization.

 The type of threats that can affect a database system.

 How to protect a computer system using computer-based
controls.
Database Security
 Data is a valuable resource that must be strictly controlled
and managed, as with any corporate resource.

 Part or all of the corporate data may have strategic
importance and therefore needs to be kept secure and
confidential.
 Mechanisms that protect the database against intentional or
accidental threats.

 Security considerations do not only apply to the data held in
a database. Breaches of security may affect other parts of
the system, which may in turn affect the database.
Database Security
 Involves measures to avoid:
 Theft and fraud
 Loss of confidentiality (secrecy)
 Loss of privacy
 Loss of integrity
 Loss of availability
Database Security
 Threat
 Any situation or event, whether intentional or unintentional,
that will adversely affect a system and consequently an
organization.
Examples of Threats

Summary of Threats to Computer
Systems

Typical Multi-user Computer
Environment
Countermeasures – Computer-Based
Controls
 Concerned with physical controls to administrative
procedures and includes:
 Authorization
 Access controls
 Views
 Backup and recovery
 Integrity
 Encryption
 RAID technology
Countermeasures – Computer-Based
Controls

 Authorization
 The granting of a right or privilege, which enables a subject to
legitimately have access to a system or a system’s object.
 Authentication is a mechanism that determines whether a user
is who he or she claims to be.
Countermeasures – Computer-Based
Controls
 Access control
 Based on the granting and revoking of privileges.
 A privilege allows a user to create or access (that is, read, write,
or modify) some database object (such as a relation, view, or
index) or to run certain DBMS utilities.
 Privileges are granted to users to accomplish the tasks required
for their jobs.
Countermeasures – Computer-Based
Controls
 Most DBMSs provide an approach called Discretionary Access
Control (DAC).

 The SQL standard supports DAC through the GRANT and REVOKE
commands.

 The GRANT command gives privileges to users, and the REVOKE
command takes away privileges.
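As a rough illustration of what GRANT and REVOKE do, the toy model below keeps a table of (user, object) privileges. It is a sketch, not how any DBMS implements DAC; the class name `PrivilegeTable`, the user `alice`, and the privilege names are assumptions.

```python
from collections import defaultdict

class PrivilegeTable:
    """Toy discretionary access control: (user, object) -> set of privileges."""

    def __init__(self):
        self._grants = defaultdict(set)

    def grant(self, privilege, obj, user):
        self._grants[(user, obj)].add(privilege)

    def revoke(self, privilege, obj, user):
        self._grants[(user, obj)].discard(privilege)

    def allowed(self, privilege, obj, user):
        return privilege in self._grants.get((user, obj), set())

acl = PrivilegeTable()
acl.grant("SELECT", "Staff", "alice")    # like: GRANT SELECT ON Staff TO alice
can_before = acl.allowed("SELECT", "Staff", "alice")
acl.revoke("SELECT", "Staff", "alice")   # like: REVOKE SELECT ON Staff FROM alice
can_after = acl.allowed("SELECT", "Staff", "alice")
```

The privilege exists only between the grant and the revoke, and a privilege that was never granted (for example UPDATE) is denied by default.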
Countermeasures – Computer-Based
Controls
 DAC, while effective, has certain weaknesses. In particular, an
unauthorized user can trick an authorized user into disclosing
sensitive data.

 An additional approach is required, called Mandatory Access
Control (MAC).
Countermeasures – Computer-Based
Controls
 MAC is based on system-wide policies that cannot be changed by
individual users.
 Each database object is assigned a security class and each user is
assigned a clearance for a security class, and rules are imposed on
reading and writing of database objects by users.
 MAC determines whether a user can read or write an object based
on rules that involve the security level of the object and the
clearance of the user. These rules ensure that sensitive data can
never be ‘passed on’ to another user without the necessary
clearance.

 The SQL standard does not include support for MAC.


Popular Model for MAC: Bell-LaPadula
 A popular model for MAC is the Bell-LaPadula model.
 It is described in terms of:
 objects (such as relations, views, tuples, and attributes),
 subjects (such as users and programs),
 security classes, and clearances.

 Classification has four values {U, C, S, TS}


 U = unclassified
  C = confidential
 S = secret
 TS = top secret
 Classifications are ordered: TS > S > C > U
 A > B means that class A data has a higher security level than class B data.
Bell-LaPadula Model
 The Bell–LaPadula model imposes two restrictions on all
reads and writes of database objects:
 Simple Security Property: Subject S is allowed to read
object O only if class(S) >= class(O). For example, a user
with TS clearance can read a relation with C classification, but
a user with C clearance cannot read a relation with TS
classification.
 *-Property: Subject S is allowed to write object O only if
class(S) <= class(O). For example, a user with S clearance
can only write objects with S or TS classification.
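The two restrictions reduce to comparisons on the ordered classes U < C < S < TS. A minimal sketch follows; the function names `can_read` and `can_write` and the numeric encoding in `LEVELS` are assumptions made for illustration.

```python
# Numeric ranks encode the ordering TS > S > C > U.
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def can_read(subject_class, object_class):
    """Simple Security Property: a subject may not read above its clearance."""
    return LEVELS[subject_class] >= LEVELS[object_class]

def can_write(subject_class, object_class):
    """*-Property: a subject may not write below its clearance."""
    return LEVELS[subject_class] <= LEVELS[object_class]
```

For example, `can_read("TS", "C")` holds while `can_read("C", "TS")` does not, and a subject with S clearance can write S or TS objects but not C objects.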
Multilevel Relations and
Polyinstantiation
Countermeasures – Computer-Based
Controls

 View
 Is the dynamic result of one or more relational operations operating
on the base relations to produce another relation.
 A view is a virtual relation that does not actually exist in the
database, but is produced upon request by a particular user, at the
time of request.
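A view used as a security mechanism can be sketched with Python's built-in `sqlite3` module. The `salary` column, the view name `StaffPublic`, and the sample rows are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, sName TEXT, salary REAL)")
conn.execute("INSERT INTO Staff VALUES ('SG37', 'Ann Beech', 12000)")

# The view exposes names only; salary stays hidden from users of the view.
conn.execute("CREATE VIEW StaffPublic AS SELECT staffNo, sName FROM Staff")
rows_before = conn.execute("SELECT * FROM StaffPublic").fetchall()

# The view is evaluated at query time, so later changes to Staff show through.
conn.execute("INSERT INTO Staff VALUES ('SG14', 'David Ford', 18000)")
rows_after = conn.execute("SELECT * FROM StaffPublic ORDER BY staffNo").fetchall()
```

The second query returns both staff members although the view was defined before the second row existed, which reflects the "produced upon request" behaviour described above.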
Countermeasures – Computer-Based
Controls
 Backup
 Process of periodically taking a copy of the database and log file (and
possibly programs) to offline storage media.

 Journaling
 Process of keeping and maintaining a log file (or journal) of all changes
made to database to enable effective recovery in event of failure.
Lecture#24
Database Security (contd.)
Objectives

 Countermeasures: Integrity, encryption and RAID technology.


 Approaches for securing a DBMS on the Web.

Countermeasures – Computer-Based
Controls
 Integrity
 Prevents data from becoming invalid, and hence giving misleading or
incorrect results.

 Encryption
 The encoding of the data by a special algorithm that renders the data
unreadable by any program without the decryption key.
 To transmit data securely over insecure networks requires the use of a
cryptosystem, which includes:
 An encryption key to encrypt the data (plaintext);
 An encryption algorithm that, with the encryption key, transforms the
plaintext into ciphertext;
 A decryption key to decrypt the ciphertext;
 A decryption algorithm that, with the decryption key, transforms the
ciphertext back into plaintext.
One technique, called symmetric encryption, uses the same key for both
encryption and decryption and relies on safe communication lines for
exchanging the key.

However, most users do not have access to a secure communication line and, to
be really secure, the keys need to be as long as the message.

In practice, most working systems are based on user keys shorter than the
message.

One scheme used for encryption is the Data Encryption Standard (DES), which is
a standard encryption algorithm developed by IBM.
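The idea of symmetric encryption with a key as long as the message can be illustrated with a one-time pad built on XOR. This is not DES and is not meant for production use; the helper name `xor_bytes` and the sample message are assumptions.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (key must match data length)."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must match the message length")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"balx = 100"
key = secrets.token_bytes(len(plaintext))  # shared secret, as long as the message
ciphertext = xor_bytes(plaintext, key)     # encryption
recovered = xor_bytes(ciphertext, key)     # decryption uses the same key
```

Because the same key both encrypts and decrypts, sender and receiver must exchange it over a safe channel, which is the practical weakness noted above.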
RAID (Redundant Array of Independent
Disks) Technology
 Hardware that the DBMS is running on must be fault-tolerant, meaning that the
DBMS should continue to operate even if one of the hardware components fails.

 This suggests having redundant components that can be seamlessly integrated into
the working system whenever one or more components fail.
RAID (Redundant Array of Independent
Disks) Technology
 The main hardware components that should be fault-tolerant include disk
drives, disk controllers, CPU, power supplies, and cooling fans.

 Disk drives are the most vulnerable components with the shortest times
between failure of any of the hardware components.
RAID (Redundant Array of Independent
Disks) Technology
 One solution is to provide a large disk array comprising an arrangement of
several independent disks that are organized to improve reliability and at
the same time increase performance.
RAID (Redundant Array of Independent
Disks) Technology
 Performance is increased through data striping: the data is segmented
into equal-size partitions (the striping unit), which are transparently
distributed across multiple disks.

 Reliability is improved through storing redundant information across the
disks using a parity scheme or an error-correcting scheme.
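Striping and parity can be sketched in a few lines. The example below is an illustration under assumed parameters (three data disks, a two-byte striping unit, XOR parity); the function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe(data: bytes, n_disks: int, unit: int):
    """Split data into striping units and deal them across n_disks, stripe by stripe."""
    units = [data[i:i + unit].ljust(unit, b"\0") for i in range(0, len(data), unit)]
    while len(units) % n_disks:
        units.append(bytes(unit))  # pad the last stripe with zero units
    return [units[i:i + n_disks] for i in range(0, len(units), n_disks)]

stripes = stripe(b"ABCDEFGHIJKL", n_disks=3, unit=2)
parity = [xor_blocks(s) for s in stripes]  # one parity unit per stripe

# Simulate losing disk 1: rebuild its units from the surviving units plus parity.
lost = 1
rebuilt = [xor_blocks([u for i, u in enumerate(s) if i != lost] + [p])
           for s, p in zip(stripes, parity)]
```

XOR-ing the surviving units with the parity unit restores the lost unit in each stripe, which is how a parity scheme tolerates a single disk failure.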
RAID (Redundant Array of Independent
Disks) Technology
 There are a number of different disk configurations called RAID levels.
 RAID 0 Nonredundant
 RAID 1 Mirrored
 RAID 0+1 Nonredundant and Mirrored
 RAID 2 Memory-Style Error-Correcting Codes
 RAID 3 Bit-Interleaved Parity
 RAID 4 Block-Interleaved Parity
 RAID 5 Block-Interleaved Distributed Parity
 RAID 6 P+Q Redundancy
RAID 0 and RAID 1
RAID 2 and RAID 3
RAID 4 and RAID 5
DBMSs and Web Security
 Internet communication relies on TCP/IP as the underlying protocol.
However, TCP/IP and HTTP were not designed with security in mind.
Without special software, all Internet traffic travels ‘in the clear’ and
anyone who monitors traffic can read it.
DBMSs and Web Security
 Must ensure while transmitting information over the Internet that:
  information is inaccessible to anyone but sender and receiver (privacy);
  information is not changed during transmission (integrity);
  receiver can be sure it came from sender (authenticity);
  sender can be sure receiver is genuine (non-fabrication);
  sender cannot deny he or she sent it (non-repudiation).
DBMSs and Web Security
 Measures include:
 Proxy servers
 Firewalls
 Message digest algorithms and digital signatures
 Digital certificates
 Kerberos
 Secure sockets layer (SSL) and Secure HTTP (S-HTTP)
 Secure Electronic Transactions (SET) and Secure
Transaction Technology (SST)
 Java security
 ActiveX security
Lecture 25
Transaction Management

Objectives

 Function and importance of transactions.


 Properties of transactions.
 Concurrency Control
 Meaning of serializability.
 How locking can ensure serializability.
 Deadlock and how it can be resolved.
 How timestamping can ensure serializability.
 Optimistic concurrency control.
 Granularity of locking.

Chapter 19 - Objectives

 Recovery Control
 Some causes of database failure.
 Purpose of transaction log file.
 Purpose of checkpointing.
 How to recover following database failure.
 Alternative models for long duration transactions.

Transaction Support

Transaction
Action, or series of actions, carried out by user
or application, which accesses or changes
contents of database.
 Logical unit of work on the database.
 Application program is series of transactions with
non-database processing in between.
 Transforms database from one consistent state to
another, although consistency may be violated
during transaction.

Example Transaction

Transaction Support

 Can have one of two outcomes:

 Success - transaction commits and database
reaches a new consistent state.
 Failure - transaction aborts, and database must
be restored to consistent state before it started.
 Such a transaction is rolled back or undone.
 Committed transaction cannot be aborted.
 Aborted transaction that is rolled back can be restarted later.
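The two outcomes can be demonstrated with Python's built-in `sqlite3` module. The `Account` table, its CHECK constraint, and the `transfer` helper are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (id TEXT PRIMARY KEY, bal INTEGER CHECK (bal >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("x", 100), ("y", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE Account SET bal = bal + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE Account SET bal = bal - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, bal FROM Account"))

ok = transfer(conn, "x", "y", 30)       # both updates succeed: transaction commits
failed = transfer(conn, "x", "y", 500)  # debit violates the CHECK: rolled back
final = balances(conn)
```

In the failed transfer the credit to account y had already been applied when the debit was rejected; the rollback undoes it, so the database returns to the consistent state it had before the transaction started.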

State Transition Diagram for
Transaction

Properties of Transactions

Four basic (ACID) properties of a transaction are:

Atomicity ‘All or nothing’ property.


Consistency Must transform database from one
consistent state to another.
Isolation Partial effects of incomplete transactions
should not be visible to other transactions.
Durability Effects of a committed transaction are
permanent and must not be lost because of later
failure.

DBMS Transaction Subsystem

Concurrency Control

Process of managing simultaneous operations on the database
without having them interfere with one another.

 Prevents interference when two or more users are accessing
database simultaneously and at least one is updating data.
 Although two transactions may be correct in themselves, interleaving
of operations may produce an incorrect result.

Need for Concurrency Control

 Three examples of potential problems caused by concurrency:


 Lost update problem.
 Uncommitted dependency problem.
 Inconsistent analysis problem.

Lecture 26
Transaction Management (contd.)

Lost Update Problem

 Successfully completed update is overridden by another user.


 T1 withdrawing £10 from an account with balx, initially £100.
 T2 depositing £100 into same account.

 Serially, final balance would be £190.
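The lost update can be reproduced by replaying read and write steps against a shared balance. The interleaving used below is an assumption (the original figure is not available); the function name `run` is hypothetical.

```python
def run(schedule, balx=100):
    """Execute (transaction, action) steps against one shared balance."""
    local = {}                       # each transaction's private copy of balx
    delta = {"T1": -10, "T2": +100}  # T1 withdraws £10, T2 deposits £100
    for txn, action in schedule:
        if action == "read":
            local[txn] = balx
        elif action == "write":
            balx = local[txn] + delta[txn]
    return balx

serial = run([("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")])
interleaved = run([("T2", "read"), ("T1", "read"), ("T2", "write"), ("T1", "write")])
```

The serial run ends at £190, while the interleaved run ends at £90 because T1 overwrites T2's deposit using a value it read before T2 wrote.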

Lost Update Problem

 Loss of T2’s update avoided by preventing T1 from reading balx
until after update.

Uncommitted Dependency Problem

 Occurs when one transaction can see intermediate results of
another transaction before it has committed.
 T4 updates balx to £200 but it aborts, so balx should be back at
original value of £100.
 T3 has read new value of balx (£200) and uses value as basis of £10
reduction, giving a new balance of £190, instead of £90.
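The arithmetic of this example can be traced step by step. This is a plain sequential replay of the assumed interleaving, not a concurrent program; the variable names are illustrative.

```python
balx = 100
before_image = balx           # value to restore if T4 aborts

balx = 200                    # T4 writes £200 but has not committed
t3_seen = balx                # T3 reads T4's uncommitted value
balx = before_image           # T4 aborts: its write is undone
balx = t3_seen - 10           # T3 writes a result based on the value it saw

expected = before_image - 10  # what T3 should have produced
```

The replay leaves balx at £190 where £90 was expected, because T3 based its update on a value that was later rolled back.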

Uncommitted Dependency Problem

 Problem avoided by preventing T3 from reading balx until after T4
commits or aborts.

Inconsistent Analysis Problem

 Occurs when transaction reads several values but second transaction
updates some of them during execution of first.
 Sometimes referred to as dirty read or unrepeatable read.
 T6 is totaling balances of account x (£100), account y (£50), and
account z (£25).
 Meantime, T5 has transferred £10 from balx to balz, so T6 now has
wrong result (£10 too high).
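The figures in this example can be replayed directly. The order of steps below is an assumed interleaving chosen to match the stated outcome; the variable names are illustrative.

```python
bal = {"x": 100, "y": 50, "z": 25}

total = 0
total += bal["x"]            # T6 reads balx = 100
bal["x"] -= 10               # T5 debits account x ...
bal["z"] += 10               # ... and credits account z
total += bal["y"]            # T6 reads baly = 50
total += bal["z"]            # T6 reads balz = 35, after T5's update

correct = sum(bal.values())  # true total, unchanged by the transfer
```

T6 counts the £10 twice: once in the old balx and once in the new balz, so its total is £185 instead of £175.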

Inconsistent Analysis Problem

 Problem avoided by preventing T6 from reading
balx and balz until after T5 completed updates.
Serializability

 Objective of a concurrency control protocol is to schedule
transactions in such a way as to avoid any interference.
 Could run transactions serially, but this limits degree of concurrency
or parallelism in system.
 Serializability identifies those executions of transactions guaranteed
to ensure consistency.

Serializability
Schedule
Sequence of reads/writes by set of concurrent transactions.

Serial Schedule
Schedule where operations of each transaction are executed
consecutively without any interleaved operations from other transactions.

 No guarantee that results of all serial executions of a given set of
transactions will be identical.

Nonserial Schedule

 Schedule where operations from set of concurrent transactions are
interleaved.

 Objective of serializability is to find nonserial schedules that allow
transactions to execute concurrently without interfering with one
another.

 In other words, want to find nonserial schedules that are equivalent
to some serial schedule. Such a schedule is called serializable.

Serializability

 In serializability, ordering of read/writes is important:


(a) If two transactions only read a data item, they do not conflict and
order is not important.
(b) If two transactions either read or write completely separate data
items, they do not conflict and order is not important.
(c) If one transaction writes a data item and another reads or writes
same data item, order of execution is important.

Example of Conflict Serializability

Serializability

 Conflict serializable schedule orders any conflicting operations in
same way as some serial execution.
 Under constrained write rule (transaction updates data item based
on its old value, which is first read), use precedence graph to test for
serializability.

Precedence Graph

 Create:
 node for each transaction;
 a directed edge Ti → Tj, if Tj reads the value of an item written by Ti;
 a directed edge Ti → Tj, if Tj writes a value into an item after it has been
read by Ti.
 If precedence graph contains a cycle, schedule is not conflict
serializable.

Example - Non-conflict serializable schedule

 T9 is transferring £100 from one account with balance balx to another
account with balance baly.
 T10 is increasing balance of these two accounts by 10%.
 Precedence graph has a cycle and so is not serializable.
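The precedence graph test can be sketched as follows. The function names are assumptions, and the operation order for T9 and T10 is assumed to match schedule S in the incorrect locking example, since the original figure is missing. Only the two edge rules listed above are applied, which is sufficient under the constrained write rule; as a simplification, any later read of an item counts as reading the earlier write.

```python
def precedence_graph(schedule):
    """Build edges Ti -> Tj from a schedule of (txn, op, item) steps."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti == tj or item_i != item_j:
                continue
            # Tj reads an item written by Ti, or Tj writes an item read by Ti.
            if (op_i, op_j) in {("write", "read"), ("read", "write")}:
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(n) for n in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(n) for n in list(graph))

nonserial = [("T9", "read", "balx"), ("T9", "write", "balx"),
             ("T10", "read", "balx"), ("T10", "write", "balx"),
             ("T10", "read", "baly"), ("T10", "write", "baly"),
             ("T9", "read", "baly"), ("T9", "write", "baly")]
edges = precedence_graph(nonserial)
```

The graph contains both T9 → T10 (on balx) and T10 → T9 (on baly), so `has_cycle(edges)` is true and the schedule is not conflict serializable.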

Example - Non-conflict serializable schedule

View Serializability
 Offers less stringent definition of schedule equivalence than conflict
serializability.
 Two schedules S1 and S2 are view equivalent if:
 For each data item x, if Ti reads initial value
of x in S1, Ti must also read initial value of x
in S2.
 For each read on x by Ti in S1, if value read
by Ti is written by Tj, Ti must also read value
of x produced by Tj in S2.
 For each data item x, if last write on x
performed by Ti in S1, same transaction must
perform final write on x in S2.

View Serializability

 Schedule is view serializable if it is view equivalent to a serial
schedule.
 Every conflict serializable schedule is view serializable, although
converse is not true.
 It can be shown that any view serializable schedule that is not
conflict serializable contains one or more blind writes.
 In general, testing whether schedule is view serializable is NP-complete.

Example - View Serializable schedule

Lecture 27
Transaction Management (contd.)

Recoverability

 Serializability identifies schedules that maintain database
consistency, assuming no transaction fails.
 Could also examine recoverability of transactions within schedule.
 If transaction fails, atomicity requires effects of transaction to be
undone.
 Durability states that once transaction commits, its changes cannot
be undone (without running another, compensating, transaction).

Recoverable Schedule

A schedule where, for each pair of transactions Ti and Tj, if Tj reads
a data item previously written by Ti, then the commit operation of Ti
precedes the commit operation of Tj.
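The definition translates into a single pass over the schedule. The sketch below assumes a schedule is a list of (transaction, operation, item) steps and does not model aborts; the function name `is_recoverable` is hypothetical.

```python
def is_recoverable(schedule):
    """Check that every transaction commits after the transactions it read from."""
    last_writer = {}    # item -> transaction that most recently wrote it
    reads_from = set()  # (reader, writer) pairs
    committed = set()
    for txn, op, item in schedule:
        if op == "write":
            last_writer[item] = txn
        elif op == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.add((txn, writer))
        elif op == "commit":
            # Every transaction this one read from must already have committed.
            if any(r == txn and w not in committed for r, w in reads_from):
                return False
            committed.add(txn)
    return True

bad = [("T9", "write", "balx"), ("T10", "read", "balx"),
       ("T10", "commit", None), ("T9", "commit", None)]
good = [("T9", "write", "balx"), ("T10", "read", "balx"),
        ("T9", "commit", None), ("T10", "commit", None)]
```

In `bad`, T10 commits before T9 although it read T9's write; if T9 then failed, T10's committed work could not be undone.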

Concurrency Control Techniques

 Two basic concurrency control techniques:


 Locking,
 Timestamping.
 Both are conservative approaches: delay transactions in case they
conflict with other transactions.
 Optimistic methods assume conflict is rare and only check for
conflicts at commit.

Locking

Transaction uses locks to deny access to other transactions and so
prevent incorrect updates.

 Most widely used approach to ensure serializability.


 Generally, a transaction must claim a shared (read) or exclusive
(write) lock on a data item before read or write.
 Lock prevents another transaction from modifying item or even
reading it, in the case of a write lock.

Locking - Basic Rules

 If transaction has shared lock on item, can read but not update item.
 If transaction has exclusive lock on item, can both read and update
item.
 Reads cannot conflict, so more than one transaction can hold shared
locks simultaneously on same item.
 Exclusive lock gives transaction exclusive access to that item.

Locking - Basic Rules

 Some systems allow transaction to upgrade read lock to an exclusive
lock, or downgrade exclusive lock to a shared lock.

Example - Incorrect Locking Schedule

 For two transactions above, a valid schedule using these rules is:

S = {write_lock(T9, balx), read(T9, balx), write(T9, balx), unlock(T9, balx),
write_lock(T10, balx), read(T10, balx), write(T10, balx), unlock(T10, balx),
write_lock(T10, baly), read(T10, baly), write(T10, baly), unlock(T10, baly),
commit(T10),
write_lock(T9, baly), read(T9, baly), write(T9, baly), unlock(T9, baly),
commit(T9)}

Example - Incorrect Locking Schedule

 If at start, balx = 100, baly = 400, result should be:

 balx = 220, baly = 330, if T9 executes before T10, or


 balx = 210, baly = 340, if T10 executes before T9.

 However, result gives balx = 220 and baly = 340.

 S is not a serializable schedule.
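The three results can be reproduced with simple arithmetic. The update directions for T9 (balx + 100, baly - 100) are inferred from the stated results and are an assumption; the function names are illustrative.

```python
def t9_x(bal):
    """T9's update to balx (assumed: adds 100)."""
    return bal + 100

def t9_y(bal):
    """T9's update to baly (assumed: subtracts 100)."""
    return bal - 100

def t10(bal):
    """T10 increases a balance by 10%, in whole pounds."""
    return bal * 110 // 100

balx, baly = 100, 400
t9_then_t10 = (t10(t9_x(balx)), t10(t9_y(baly)))
t10_then_t9 = (t9_x(t10(balx)), t9_y(t10(baly)))
# Schedule S: T9 updates balx first, but T10 updates baly first.
schedule_s = (t10(t9_x(balx)), t9_y(t10(baly)))
```

Schedule S mixes the two orders (T9 before T10 on balx, T10 before T9 on baly), so its result matches neither serial execution.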

Example - Incorrect Locking Schedule

 Problem is that transactions release locks too soon, resulting in loss
of total isolation and atomicity.
 To guarantee serializability, need an additional protocol concerning
the positioning of lock and unlock operations in every transaction.

Two-Phase Locking (2PL)

Transaction follows 2PL protocol if all locking operations precede
first unlock operation in the transaction.

 Two phases for transaction:


 Growing phase - acquires all locks but cannot release any locks.
 Shrinking phase - releases locks but cannot acquire any new locks.
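Whether a single transaction obeys 2PL can be checked by scanning its operations in order. A minimal sketch, assuming operations are given as plain strings; the function name `follows_2pl` is hypothetical.

```python
def follows_2pl(ops):
    """Return True if no lock is acquired after the first unlock."""
    unlocked = False
    for op in ops:
        if op == "unlock":
            unlocked = True  # shrinking phase has begun
        elif op in ("read_lock", "write_lock") and unlocked:
            return False     # acquired a lock in the shrinking phase
    return True

# T9 as it behaves in schedule S of the incorrect locking example.
t9_in_s = ["write_lock", "read", "write", "unlock",
           "write_lock", "read", "write", "unlock", "commit"]
# The same work with both locks acquired before the first unlock.
t9_2pl = ["write_lock", "read", "write", "write_lock", "unlock",
          "read", "write", "unlock", "commit"]
```

The first sequence violates 2PL because T9 unlocks balx and then locks baly; the second acquires both locks during the growing phase and passes the check.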

Preventing Lost Update Problem using 2PL

Preventing Uncommitted Dependency Problem
using 2PL

Preventing Inconsistent Analysis Problem using
2PL

Cascading Rollback

 If every transaction in a schedule follows 2PL, schedule is
serializable.
 However, problems can occur with interpretation of when locks can
be released.

Cascading Rollback

Cascading Rollback

 Transactions conform to 2PL.


 T14 aborts.
 Since T15 is dependent on T14, T15 must also be rolled back. Since T16
is dependent on T15, it too must be rolled back.
 This is called cascading rollback.
 To prevent this with 2PL, leave release of all locks until end of
transaction.
