Fundamentals of Database Systems 4e - Elmasri


Copyright © 2004 Pearson Education, Inc.

Chapter 1

Introduction and
Conceptual Modeling

Types of Databases and
Database Applications
• Numeric and Textual Databases
• Multimedia Databases
• Geographic Information Systems (GIS)
• Data Warehouses
• Real-time and Active Databases
A number of these databases and applications are
described later in the book (see Chapters 24, 28, 29).

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Basic Definitions
• Database: A collection of related data.
• Data: Known facts that can be recorded and have an
implicit meaning.
• Mini-world: Some part of the real world about which
data is stored in a database. For example, student
grades and transcripts at a university.
• Database Management System (DBMS): A software
package/system to facilitate the creation and
maintenance of a computerized database.
• Database System: The DBMS software together with
the data itself. Sometimes, the applications are also
included.
Typical DBMS Functionality
• Define a database: in terms of data types,
structures and constraints
• Construct or load the database on a
secondary storage medium
• Manipulate the database: querying,
generating reports, insertions, deletions and
modifications to its content
• Concurrent processing and sharing by a set of
users and programs, while keeping all data
valid and consistent
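The define, construct, and manipulate steps listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the `student` table, its columns, and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

# In-memory database; table, column names, and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# Define: data types, structure, and constraints.
conn.execute(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        class_year INTEGER CHECK (class_year BETWEEN 1 AND 5)
    )
    """
)

# Construct (load): populate the database with initial data.
conn.executemany(
    "INSERT INTO student (student_id, name, class_year) VALUES (?, ?, ?)",
    [(17, "Smith", 1), (8, "Brown", 2)],
)

# Manipulate: modify the content, then query it.
conn.execute("UPDATE student SET class_year = 2 WHERE student_id = 17")
second_years = conn.execute(
    "SELECT name FROM student WHERE class_year = 2 ORDER BY name"
).fetchall()
print(second_years)  # [('Brown',), ('Smith',)]
```

The CHECK clause stands in for the "constraints" part of the definition: an insert with a class year outside 1 to 5 is rejected by the DBMS rather than by application code.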
Typical DBMS Functionality
Other features:
– Protection or Security measures to
prevent unauthorized access
– “Active” processing to take internal
actions on data
– Presentation and Visualization of data

Example of a Database
(with a Conceptual Data Model)
• Mini-world for the example: Part of a UNIVERSITY
environment.
• Some mini-world entities:
– STUDENTs
– COURSEs
– SECTIONs (of COURSEs)
– (academic) DEPARTMENTs
– INSTRUCTORs
Note: The above could be expressed in the ENTITY-RELATIONSHIP
data model.
Example of a Database
(with a Conceptual Data Model)
• Some mini-world relationships:
– SECTIONs are of specific COURSEs
– STUDENTs take SECTIONs
– COURSEs have prerequisite COURSEs
– INSTRUCTORs teach SECTIONs
– COURSEs are offered by DEPARTMENTs
– STUDENTs major in DEPARTMENTs

Note: The above could be expressed in the ENTITY-RELATIONSHIP
data model.
Main Characteristics of the
Database Approach
• Self-describing nature of a database system: A
DBMS catalog stores the description of the
database (the description is called meta-data).
This allows the DBMS software to work with
different databases.
• Insulation between programs and data: Called
program-data independence. Allows changing
data storage structures and operations without
having to change the DBMS access programs.
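As a small illustration of the self-describing property, the sketch below reads a catalog back from the DBMS itself. It assumes SQLite (through Python's sqlite3 module), whose catalog is the `sqlite_master` table; other DBMSs expose their catalogs differently. The `course` and `section` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables, created only so that the catalog has something to describe.
conn.execute("CREATE TABLE course (course_number TEXT PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE section (section_id INTEGER PRIMARY KEY, course_number TEXT)"
)

# The catalog is itself queryable: sqlite_master lists the schema objects.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = {
    table: [col[1] for col in conn.execute(f"PRAGMA table_info({table})")]
    for table in tables
}
print(columns)
# {'course': ['course_number', 'title'], 'section': ['section_id', 'course_number']}
```

Because the description is stored as data, the same generic code works against any database opened this way, without knowing its structure in advance.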

Main Characteristics of the
Database Approach
• Data Abstraction: A data model is used to
hide storage details and present the users with
a conceptual view of the database.
• Support of multiple views of the data: Each
user may see a different view of the
database, which describes only the data of
interest to that user.
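Multiple views can be illustrated with SQL views over one stored table. The sketch below uses Python's sqlite3 module; the `student` table, the two view names, and the rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT, gpa REAL)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(17, "Smith", "CS", 3.2), (8, "Brown", "MATH", 3.8)],
)

# View for a department office: CS students only, without grades.
conn.execute(
    "CREATE VIEW cs_roster AS SELECT student_id, name FROM student WHERE major = 'CS'"
)
# View for an advisor: names and GPA, without majors.
conn.execute("CREATE VIEW advising AS SELECT name, gpa FROM student")

cs_roster = conn.execute("SELECT * FROM cs_roster").fetchall()
advising = conn.execute("SELECT * FROM advising ORDER BY name").fetchall()
print(cs_roster)  # [(17, 'Smith')]
print(advising)   # [('Brown', 3.8), ('Smith', 3.2)]
```

Each view exposes only the data of interest to one class of user, while the data itself is stored once.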

Main Characteristics of the
Database Approach
• Sharing of data and multiuser transaction
processing: allowing a set of concurrent users to
retrieve and to update the database. Concurrency
control within the DBMS guarantees that each
transaction is correctly executed or completely
aborted. OLTP (Online Transaction Processing) is
a major part of database applications.
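The "correctly executed or completely aborted" guarantee can be sketched with a funds transfer. This example uses a single sqlite3 connection, so it shows only the all-or-nothing behavior, not concurrent users; the `account` table and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates persist or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # debit would go negative, so the credit is undone too
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)  # True False [(1, 70), (2, 80)]
```

In the failed transfer the credit to account 2 has already been applied when the debit violates the CHECK constraint; the rollback removes it, so no partial transaction is left behind.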

Database Users
Users may be divided into those who actually
use and control the content (called “Actors
on the Scene”) and those who enable the
database to be developed and the DBMS
software to be designed and implemented
(called “Workers Behind the Scene”).

Database Users
Actors on the scene
– Database administrators: responsible for authorizing
access to the database, for coordinating and
monitoring its use, for acquiring software and hardware
resources, and for controlling its use and monitoring the
efficiency of operations.
– Database designers: responsible for defining the content,
the structure, the constraints, and functions or
transactions against the database. They must
communicate with the end-users and understand their
needs.
– End-users: they use the data for queries and reports, and
some of them also update the database content.

Categories of End-users
• Casual: access the database occasionally when
needed
• Naïve or Parametric: they make up a large
section of the end-user population. They use
previously well-defined functions in the form
of “canned transactions” against the
database. Examples are bank-tellers or
reservation clerks who do this activity for an
entire shift of operations.
Categories of End-users
• Sophisticated: these include business analysts,
scientists, engineers, and others thoroughly familiar
with the system capabilities. Many use tools in
the form of software packages that work
closely with the stored database.
• Stand-alone: mostly maintain personal
databases using ready-to-use packaged
applications. An example is a tax program user
who creates his or her own internal database.
Advantages of Using the
Database Approach
• Controlling redundancy in data storage and in
development and maintenance efforts.
• Sharing of data among multiple users.
• Restricting unauthorized access to data.
• Providing persistent storage for program
objects (in object-oriented DBMSs; see Chs.
20-22).
• Providing storage structures for efficient query
processing.
Advantages of Using the
Database Approach
• Providing backup and recovery services.
• Providing multiple interfaces to different
classes of users.
• Representing complex relationships among
data.
• Enforcing integrity constraints on the
database.
• Drawing inferences and actions using rules.
Additional Implications of
Using the Database Approach
• Potential for enforcing standards: this is
crucial for the success of database
applications in large organizations. Standards
refer to data item names, display formats,
screens, report structures, meta-data
(description of data), etc.
• Reduced application development time:
the incremental time to add each new application
is reduced.
Additional Implications of
Using the Database Approach
• Flexibility to change data structures: database
structure may evolve as new requirements are
defined.
• Availability of up-to-date information: very
important for on-line transaction systems such as
airline, hotel, and car reservations.
• Economies of scale: by consolidating data and
applications across departments, wasteful overlap
of resources and personnel can be avoided.

Historical Development of
Database Technology
• Early Database Applications: The
Hierarchical and Network Models were
introduced in the mid-1960s and dominated during
the seventies. A large part of the worldwide database
processing still occurs using these models.
• Relational Model based Systems: The model
that was originally introduced in 1970 was
heavily researched and experimented with at
IBM and the universities. Relational DBMS
products emerged in the 1980s.

Historical Development of
Database Technology
• Object-oriented applications: OODBMSs were
introduced in the late 1980s and early 1990s to cater
to the need for complex data processing in CAD and
other applications. Their use has not taken off
much.
• Data on the Web and E-commerce Applications:
The Web contains data in HTML (Hypertext Markup
Language) with links among pages. This has given
rise to a new set of applications, and E-commerce is
using new standards like XML (eXtensible Markup
Language).

Extending Database
Capabilities
• New functionality is being added to DBMSs
in the following areas:
– Scientific Applications
– Image Storage and Management
– Audio and Video data management
– Data Mining
– Spatial data management
– Time Series and Historical Data Management
The above gives rise to new research and development in
incorporating new data types, complex data structures, new
operations and storage and indexing schemes in database systems.

When not to use a DBMS
• Main inhibitors (costs) of using a DBMS:
– High initial investment and possible need for additional
hardware.
– Overhead for providing generality, security,
concurrency control, recovery, and integrity functions.
• When a DBMS may be unnecessary:
– If the database and applications are simple, well
defined, and not expected to change.
– If there are stringent real-time requirements that may
not be met because of DBMS overhead.
– If access to data by multiple users is not required.

When not to use a DBMS
• When no DBMS may suffice:
– If the database system is not able to handle the
complexity of data because of modeling
limitations
– If the database users need special operations not
supported by the DBMS.

Chapter 2
Database System
Concepts and
Architecture


Data Models
• Data Model: A set of concepts to describe the
structure of a database, and certain constraints
that the database should obey.
• Data Model Operations: Operations for
specifying database retrievals and updates by
referring to the concepts of the data model.
Operations on the data model may include basic
operations and user-defined operations.

Categories of data models
• Conceptual (high-level, semantic) data models:
Provide concepts that are close to the way many users
perceive data. (Also called entity-based or object-
based data models.)
• Physical (low-level, internal) data models: Provide
concepts that describe details of how data is stored in
the computer.
• Implementation (representational) data models:
Provide concepts that fall between the above two,
balancing user views with some computer storage
details.
History of Data Models
• Relational Model: proposed in 1970 by E.F. Codd (IBM),
first commercial system in 1981-82. Now in several
commercial products (DB2, ORACLE, SQL Server,
SYBASE, INFORMIX).
• Network Model: the first one to be implemented, by
Honeywell in 1964-65 (IDS System). Adopted heavily due to
the support by CODASYL (CODASYL - DBTG report of
1971). Later implemented in a large variety of systems -
IDMS (Cullinet - now CA), DMS 1100 (Unisys), IMAGE
(H.P.), VAX-DBMS (Digital Equipment Corp.).
• Hierarchical Data Model: implemented in a joint effort by
IBM and North American Rockwell around 1965. Resulted in
the IMS family of systems. The most popular model. Another
system based on this model: System 2k (SAS Inc.)

History of Data Models
• Object-oriented Data Model(s): several models have been
proposed for implementing in a database system. One set
comprises models of persistent O-O Programming
Languages such as C++ (e.g., in OBJECTSTORE or
VERSANT), and Smalltalk (e.g., in GEMSTONE).
Additionally, there are systems like O2, ORION (at MCC - then
ITASCA), and IRIS (at H.P. - used in Open OODB).
• Object-Relational Models: the most recent trend. Started
with Informix Universal Server. Exemplified in the latest
versions of Oracle-10i, DB2, SQL Server, and other systems.

Hierarchical Model
• ADVANTAGES:
• Hierarchical Model is simple to construct and operate on
• Corresponds to a number of natural hierarchically organized
domains - e.g., assemblies in manufacturing, personnel
organization in companies
• Language is simple; uses constructs like GET, GET UNIQUE,
GET NEXT, GET NEXT WITHIN PARENT etc.
• DISADVANTAGES:
• Navigational and procedural nature of processing
• Database is visualized as a linear arrangement of records
• Little scope for "query optimization"

Network Model
• ADVANTAGES:
• Network Model is able to model complex relationships and represents
semantics of add/delete on the relationships.
• Can handle most situations for modeling using record types and
relationship types.
• Language is navigational; uses constructs like FIND, FIND member,
FIND owner, FIND NEXT within set, GET etc. Programmers can do
optimal navigation through the database.
• DISADVANTAGES:
• Navigational and procedural nature of processing
• Database contains a complex array of pointers that thread through a
set of records.
• Little scope for automated "query optimization"

Schemas versus Instances
• Database Schema: The description of a database.
Includes descriptions of the database structure and
the constraints that should hold on the database.
• Schema Diagram: A diagrammatic display of (some
aspects of) a database schema.
• Schema Construct: A component of the schema or
an object within the schema, e.g., STUDENT,
COURSE.
• Database Instance: The actual data stored in a
database at a particular moment in time. Also called
database state (or occurrence).
Database Schema Vs.
Database State
• Database State: Refers to the content of a database at a
moment in time.
• Initial Database State: Refers to the database when it is
loaded
• Valid State: A state that satisfies the structure and
constraints of the database.
• Distinction
• The database schema changes very infrequently. The database
state changes every time the database is updated.
• Schema is also called intension, whereas state is called
extension.
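The distinction between schema (intension) and state (extension) can be sketched in plain Python. The `Schema` and `Database` classes below are hypothetical helpers written for this illustration, and the STUDENT attribute names and the sample record are assumed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Schema:
    """Intension: the description of the data (here, just attribute names)."""
    name: str
    attributes: tuple


@dataclass
class Database:
    schema: Schema
    state: list = field(default_factory=list)  # extension: the current records

    def is_valid_state(self):
        """A valid state has exactly the schema's attributes in every record."""
        expected = set(self.schema.attributes)
        return all(set(record) == expected for record in self.state)


student = Schema("STUDENT", ("Name", "StudentNumber", "Class", "Major"))
db = Database(student)
snapshot_0 = list(db.state)  # empty state, before any data is loaded

db.state.append({"Name": "Smith", "StudentNumber": 17, "Class": 1, "Major": "CS"})
snapshot_1 = list(db.state)  # the state changed; the schema did not

print(len(snapshot_0), len(snapshot_1), db.is_valid_state())  # 0 1 True
```

Every update produces a new state, while `student` stays the same object throughout; a record missing an attribute would make `is_valid_state()` return False.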

Three-Schema Architecture
• Proposed to support DBMS characteristics
of:
• Program-data independence.
• Support of multiple views of the data.

Three-Schema Architecture
• Defines DBMS schemas at three levels:
• Internal schema at the internal level to describe
physical storage structures and access paths. Typically
uses a physical data model.
• Conceptual schema at the conceptual level to describe
the structure and constraints for the whole database for
a community of users. Uses a conceptual or an
implementation data model.
• External schemas at the external level to describe the
various user views. Usually uses the same data model
as the conceptual level.

Three-Schema Architecture
Mappings among schema levels are needed
to transform requests and data. Programs
refer to an external schema, and are mapped
by the DBMS to the internal schema for
execution.

Data Independence
• Logical Data Independence: The capacity
to change the conceptual schema without
having to change the external schemas and
their application programs.
• Physical Data Independence: The capacity
to change the internal schema without
having to change the conceptual schema.
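Physical data independence can be illustrated by changing an access path while the table definition and the query stay fixed. The sketch below uses Python's sqlite3 module; the `student` table, the index name, and the rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(17, "Smith", "CS"), (8, "Brown", "CS"), (5, "Lee", "MATH")],
)

# The application's query, written once against the table definition.
QUERY = "SELECT name FROM student WHERE major = ? ORDER BY name"

before = conn.execute(QUERY, ("CS",)).fetchall()

# Internal-level change: add an access path. The table definition is untouched.
conn.execute("CREATE INDEX idx_student_major ON student (major)")

after = conn.execute(QUERY, ("CS",)).fetchall()
print(before == after, after)  # True [('Brown',), ('Smith',)]
```

The index may change how the DBMS evaluates the query, but neither the query text nor its result changes.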

Data Independence
When a schema at a lower level is changed,
only the mappings between this schema
and higher-level schemas need to be
changed in a DBMS that fully supports data
independence. The higher-level schemas
themselves are unchanged. Hence, the
application programs need not be changed
since they refer to the external schemas.
DBMS Languages
• Data Definition Language (DDL): Used by the
DBA and database designers to specify the
conceptual schema of a database. In many
DBMSs, the DDL is also used to define internal
and external schemas (views). In some DBMSs,
separate storage definition language (SDL) and
view definition language (VDL) are used to
define internal and external schemas.

DBMS Languages
• Data Manipulation Language (DML):
Used to specify database retrievals and
updates.
• DML commands (data sublanguage) can be
embedded in a general-purpose programming
language (host language), such as COBOL, C
or an Assembly Language.
• Alternatively, stand-alone DML commands can
be applied directly (query language).

DBMS Languages
• High Level or Non-procedural
Languages: e.g., SQL, are set-oriented and
specify what data to retrieve rather than how to
retrieve it. Also called declarative languages.
• Low Level or Procedural Languages:
record-at-a-time; they specify how to
retrieve data and include constructs such as
looping.
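The contrast between the two styles can be sketched as the same request written both ways. The example uses Python's sqlite3 module; the `employee` table and its rows are assumptions, and the explicit loop only imitates record-at-a-time processing since SQLite itself is a declarative system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Smith", "Research", 30000),
        ("Wong", "Research", 40000),
        ("Wallace", "Administration", 43000),
    ],
)

# Declarative, set-oriented: state WHAT is wanted; the DBMS decides how.
declarative = conn.execute(
    "SELECT name FROM employee WHERE dept = 'Research' AND salary > 35000"
).fetchall()

# Procedural, record-at-a-time: the program spells out HOW, with an explicit loop.
procedural = []
cursor = conn.execute("SELECT name, dept, salary FROM employee")
while True:
    record = cursor.fetchone()  # fetch the next record, one at a time
    if record is None:
        break
    name, dept, salary = record
    if dept == "Research" and salary > 35000:
        procedural.append((name,))

print(declarative, procedural)  # [('Wong',)] [('Wong',)]
```

Both versions return the same rows, but only the declarative one leaves the DBMS free to choose an evaluation strategy.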
DBMS Interfaces
• Stand-alone query language interfaces.
• Programmer interfaces for embedding DML in
programming languages:
• Pre-compiler Approach
• Procedure (Subroutine) Call Approach
• User-friendly interfaces:
• Menu-based, popular for browsing on the web
• Forms-based, designed for naïve users
• Graphics-based (Point and Click, Drag and Drop etc.)
• Natural language: requests in written English
• Combinations of the above

Other DBMS Interfaces
• Speech as Input (?) and Output
• Web Browser as an interface
• Parametric interfaces (e.g., bank tellers) using
function keys.
• Interfaces for the DBA:
• Creating accounts, granting authorizations
• Setting system parameters
• Changing schemas or access paths

Database System Utilities
• To perform certain functions such as:
• Loading data stored in files into a database. Includes
data conversion tools.
• Backing up the database periodically on tape.
• Reorganizing database file structures.
• Report generation utilities.
• Performance monitoring utilities.
• Other functions, such as sorting, user monitoring, data
compression, etc.

Other Tools
• Data dictionary / repository:
• Used to store schema descriptions and other information such
as design decisions, application program descriptions, user
information, usage standards, etc.
• Active data dictionary is accessed by DBMS software and
users/DBA.
• Passive data dictionary is accessed by users/DBA only.
• Application Development Environments and CASE
(computer-aided software engineering) tools:
• Examples – PowerBuilder (Sybase), JBuilder (Borland)

Centralized and Client-Server
Architectures
• Centralized DBMS: combines everything
into a single system, including DBMS
software, hardware, application programs
and user interface processing software.

Basic Client-Server
Architectures
• Specialized Servers with Specialized
functions
• Clients
• DBMS Server

Specialized Servers with
Specialized functions:
• File Servers
• Printer Servers
• Web Servers
• E-mail Servers

Clients:
• Provide appropriate interfaces and a client-version
of the system to access and utilize the server
resources.
• Clients may be diskless machines, or PCs or
workstations with disks with only the client
software installed.
• Connected to the servers via some form of a
network
(LAN: local area network, wireless network,
etc.).
DBMS Server
• Provides database query and transaction
services to the clients
• Sometimes called query and transaction
servers

Two Tier Client-Server
Architecture
• User Interface Programs and Application
Programs run on the client side
• An interface called ODBC (Open Database
Connectivity – see Ch 9) provides an
application program interface (API) that allows
client-side programs to call the DBMS.
Most DBMS vendors provide ODBC
drivers.
Two Tier Client-Server
Architecture
• A client program may connect to several DBMSs.
• Other variations of clients are possible: e.g., in
some DBMSs, more functionality is transferred to
clients including data dictionary functions,
optimization and recovery across multiple servers,
etc. In such situations the server may be called the
Data Server.

Three Tier Client-Server
Architecture
• Common for Web applications
• Intermediate Layer called Application Server or Web
Server:
• stores the web connectivity software and the rules and business
logic (constraints) part of the application used to access the
right amount of data from the database server
• acts like a conduit for sending partially processed data between
the database server and the client.
• Additional Features- Security:
• encrypt the data at the server before transmission
• decrypt data at the client

Classification of DBMSs
• Based on the data model used:
• Traditional: Relational, Network, Hierarchical.
• Emerging: Object-oriented, Object-relational.
• Other classifications:
• Single-user (typically used with micro-
computers) vs. multi-user (most DBMSs).
• Centralized (uses a single computer with one
database) vs. distributed (uses multiple
computers, multiple databases)
Classification of DBMSs
Distributed Database Systems have now
come to be known as client server based
database systems because they do not
support a totally distributed environment,
but rather a set of database servers
supporting a set of clients.

Variations of Distributed
Environments:
• Homogeneous DDBMS
• Heterogeneous DDBMS
• Federated or Multidatabase Systems

Chapter 3
Data Modeling Using the
Entity-Relationship (ER) Model

Chapter Outline
• Example Database Application (COMPANY)
• ER Model Concepts
– Entities and Attributes
– Entity Types, Value Sets, and Key Attributes
– Relationships and Relationship Types
– Weak Entity Types
– Roles and Attributes in Relationship Types
• ER Diagrams - Notation
• ER Diagram for COMPANY Schema
• Alternative Notations – UML class diagrams, others

Example COMPANY
Database

• Requirements of the Company (oversimplified for
illustrative purposes)
– The company is organized into DEPARTMENTs.
Each department has a name, number and an
employee who manages the department. We keep
track of the start date of the department manager.
– Each department controls a number of PROJECTs.
Each project has a name, number and is located at a
single location.

Example COMPANY Database
(Cont.)

– We store each EMPLOYEE’s social security number,
address, salary, sex, and birthdate. Each employee
works for one department but may work on several
projects. We keep track of the number of hours per
week that an employee currently works on each
project. We also keep track of the direct supervisor of
each employee.
– Each employee may have a number of DEPENDENTs.
For each dependent, we keep track of their name, sex,
birthdate, and relationship to employee.

ER Model Concepts
• Entities and Attributes
– Entities are specific objects or things in the mini-world that are
represented in the database. For example, the EMPLOYEE John
Smith, the Research DEPARTMENT, the ProductX PROJECT.
– Attributes are properties used to describe an entity. For example, an
EMPLOYEE entity may have a Name, SSN, Address, Sex,
BirthDate.
– A specific entity will have a value for each of its attributes. For
example, a specific employee entity may have Name='John Smith',
SSN='123456789', Address='731, Fondren, Houston, TX',
Sex='M', BirthDate='09-JAN-55'.
– Each attribute has a value set (or data type) associated with it – e.g.
integer, string, subrange, enumerated type, …

Types of Attributes (1)
• Simple
– Each entity has a single atomic value for the attribute. For example,
SSN or Sex.
• Composite
– The attribute may be composed of several components. For example,
Address (Apt#, House#, Street, City, State, ZipCode, Country) or
Name (FirstName, MiddleName, LastName). Composition may form
a hierarchy where some components are themselves composite.
• Multi-valued
– An entity may have multiple values for that attribute. For example,
Color of a CAR or PreviousDegrees of a STUDENT. Denoted as
{Color} or {PreviousDegrees}.

Types of Attributes (2)
• In general, composite and multi-valued attributes may be
nested arbitrarily to any number of levels although this is
rare. For example, PreviousDegrees of a STUDENT is a
composite multi-valued attribute denoted by
{PreviousDegrees (College, Year, Degree, Field)}.
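One way to picture simple, composite, and multi-valued attributes is as nested Python types. This is a sketch only: the class and field names are assumptions chosen to mirror the slide's examples, and the degree values are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Name:
    """Composite attribute: Name (FirstName, MiddleName, LastName)."""
    first: str
    middle: str
    last: str


@dataclass(frozen=True)
class PreviousDegree:
    """One value of the composite multi-valued attribute {PreviousDegrees (...)}."""
    college: str
    year: int
    degree: str
    field_of_study: str


@dataclass
class Student:
    ssn: str    # simple attribute: a single atomic value
    name: Name  # composite attribute
    # multi-valued attribute whose values are themselves composite
    previous_degrees: list = field(default_factory=list)


s = Student("123456789", Name("John", "B", "Smith"))
s.previous_degrees.append(PreviousDegree("Rice", 1977, "BS", "Computer Science"))
print(s.name.last, len(s.previous_degrees))  # Smith 1
```

Nesting a list of composite values inside an entity corresponds to the braces-around-parentheses notation on the slide.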

Entity Types and Key Attributes
• Entities with the same basic attributes are grouped or typed into an
entity type. For example, the EMPLOYEE entity type or the
PROJECT entity type.
• An attribute of an entity type for which each entity must have a
unique value is called a key attribute of the entity type. For example,
SSN of EMPLOYEE.
• A key attribute may be composite. For example, VehicleTagNumber
is a key of the CAR entity type with components (Number, State).
• An entity type may have more than one key. For example, the CAR
entity type may have two keys:
– VehicleIdentificationNumber (popularly called VIN) and
– VehicleTagNumber (Number, State), also known as license plate number.
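The uniqueness requirement on key attributes can be checked against a set of entities with a few lines of Python. The `is_key` helper and the dictionary field names are hypothetical; the values are taken from the CAR entity set in these slides. A key is a constraint on every possible state, so a check like this can refute a candidate key but cannot prove one.

```python
def is_key(entities, attributes):
    """Return True if no two entities share the same values for `attributes`."""
    seen = set()
    for entity in entities:
        value = tuple(entity[a] for a in attributes)
        if value in seen:
            return False
        seen.add(value)
    return True


cars = [
    {"vin": "TK629", "tag_number": "ABC 123", "state": "TEXAS", "year": 1999},
    {"vin": "WP9872", "tag_number": "ABC 123", "state": "NEW YORK", "year": 2002},
    {"vin": "TD729", "tag_number": "VSY 720", "state": "TEXAS", "year": 2003},
]

print(is_key(cars, ["vin"]))                  # True
print(is_key(cars, ["tag_number", "state"]))  # True: composite key
print(is_key(cars, ["tag_number"]))           # False: 'ABC 123' appears twice
```

The tag number alone repeats across states, which is why the slide treats (Number, State) together as the composite key.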

ENTITY SET corresponding to the
ENTITY TYPE CAR
CAR
Registration(RegistrationNumber, State), VehicleID, Make, Model, Year, (Color)

car1
((ABC 123, TEXAS), TK629, Ford Mustang, convertible, 1999, (red, black))
car2
((ABC 123, NEW YORK), WP9872, Nissan 300ZX, 2-door, 2002, (blue))
car3
((VSY 720, TEXAS), TD729, Buick LeSabre, 4-door, 2003, (white, blue))
.
.
.

SUMMARY OF ER-DIAGRAM
NOTATION FOR ER SCHEMAS
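Symbol | Meaning
Rectangle | ENTITY TYPE
Double rectangle | WEAK ENTITY TYPE
Diamond | RELATIONSHIP TYPE
Double diamond | IDENTIFYING RELATIONSHIP TYPE
Oval | ATTRIBUTE
Oval with underlined name | KEY ATTRIBUTE
Double oval | MULTIVALUED ATTRIBUTE
Oval with component ovals attached | COMPOSITE ATTRIBUTE
Dashed oval | DERIVED ATTRIBUTE
E1, R, E2 with a double line between R and E2 | TOTAL PARTICIPATION OF E2 IN R
E1, R, E2 with 1 on the E1 link and N on the E2 link | CARDINALITY RATIO 1:N FOR E1:E2 IN R
R, E with (min,max) on the link | STRUCTURAL CONSTRAINT (min, max) ON PARTICIPATION OF E IN R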
ER DIAGRAM – Entity Types are:
EMPLOYEE, DEPARTMENT, PROJECT, DEPENDENT

Relationships and Relationship
Types (1)
 A relationship relates two or more distinct entities with a
specific meaning. For example, EMPLOYEE John Smith
works on the ProductX PROJECT or EMPLOYEE Franklin
Wong manages the Research DEPARTMENT.
 Relationships of the same type are grouped or typed into a
relationship type. For example, the WORKS_ON relationship
type in which EMPLOYEEs and PROJECTs participate, or the
MANAGES relationship type in which EMPLOYEEs and
DEPARTMENTs participate.
 The degree of a relationship type is the number of participating
entity types. Both MANAGES and WORKS_ON are binary
relationships.

Example relationship instances of the WORKS_FOR
relationship between EMPLOYEE and DEPARTMENT
[Figure: EMPLOYEE entities e1 to e7, WORKS_FOR relationship instances r1 to r7, and DEPARTMENT entities d1 to d3.]

Example relationship instances of the WORKS_ON
relationship between EMPLOYEE and PROJECT

[Figure: EMPLOYEE entities e1 to e7, WORKS_ON relationship instances r1 to r9, and PROJECT entities p1 to p3.]

Relationships and Relationship
Types (2)
 More than one relationship type can exist with the same
participating entity types. For example, MANAGES and
WORKS_FOR are distinct relationships between
EMPLOYEE and DEPARTMENT, but with different
meanings and different relationship instances.

ER DIAGRAM – Relationship Types are:
WORKS_FOR, MANAGES, WORKS_ON, CONTROLS,
SUPERVISION, DEPENDENTS_OF

Weak Entity Types
 An entity type that does not have a key attribute of its own
 A weak entity must participate in an identifying relationship type with
an owner or identifying entity type
 Entities are identified by the combination of:
– A partial key of the weak entity type
– The particular entity they are related to in the identifying entity
type
Example:
Suppose that a DEPENDENT entity is identified by the dependent’s first
name and birthdate, and the specific EMPLOYEE that the dependent is
related to. DEPENDENT is a weak entity type with EMPLOYEE as its
identifying entity type via the identifying relationship type
DEPENDENTS_OF.
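The identification rule for the DEPENDENT example can be sketched as a composite lookup key: the owner's key plus the weak entity's partial key. This is only an illustration; the function name, the SSN values, and the sample dependents are hypothetical.

```python
from typing import Dict, Tuple

# Owner entities keyed by their own key attribute (SSN); sample values are made up.
employees: Dict[str, str] = {"123456789": "John Smith", "333445555": "Franklin Wong"}

# Weak entities keyed by (owner key, partial key). The partial key here is
# (first name, birth date), following the DEPENDENT example.
DependentKey = Tuple[str, Tuple[str, str]]
dependents: Dict[DependentKey, dict] = {}


def add_dependent(owner_ssn: str, first_name: str, birth_date: str, relationship: str) -> None:
    """Store a dependent, identified by its owner plus its partial key."""
    if owner_ssn not in employees:
        raise ValueError("a weak entity needs an existing owner entity")
    key = (owner_ssn, (first_name, birth_date))
    if key in dependents:
        raise ValueError("duplicate partial key for this owner")
    dependents[key] = {"relationship": relationship}


add_dependent("123456789", "Alice", "1988-12-30", "daughter")
# Same partial key, different owner: still a distinct DEPENDENT entity.
add_dependent("333445555", "Alice", "1988-12-30", "daughter")
```

The partial key alone does not distinguish the two dependents; only the combination with the owner does.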

Weak Entity Type is: DEPENDENT
Identifying Relationship is: DEPENDENTS_OF

Constraints on Relationships

 Constraints on Relationship Types (also known as ratio constraints)
– Maximum Cardinality
 One-to-one (1:1)
 One-to-many (1:N) or Many-to-one (N:1)
 Many-to-many (M:N)
– Minimum Cardinality (also called participation constraint or existence dependency constraint)
 zero (optional participation, not existence-dependent)
 one or more (mandatory, existence-dependent)

Many-to-one (N:1) RELATIONSHIP
[Figure: EMPLOYEE entities e1 to e7, WORKS_FOR relationship instances r1 to r7, and DEPARTMENT entities d1 to d3.]

Many-to-many (M:N) RELATIONSHIP

[Figure: entities e1 to e7, relationship instances r1 to r9, and entities p1 to p3.]

Relationships and Relationship
Types (3)
 We can also have a recursive relationship type.
 Both participations are by the same entity type, in different roles.
 For example, SUPERVISION relationships between
EMPLOYEE (in the role of supervisor or boss) and (another)
EMPLOYEE (in the role of subordinate or worker).
 In the following figure, the first role participation is labeled with 1
and the second role participation is labeled with 2.
 In an ER diagram, role names must be displayed to distinguish
the participations.

A RECURSIVE RELATIONSHIP
SUPERVISION
[Figure: EMPLOYEE entities e1 to e7 and SUPERVISION relationship instances r1 to r6; each instance connects to two employees, one through a link labeled 1 and one through a link labeled 2.]
© The Benjamin/Cummings Publishing Company, Inc. 1994, Elmasri/Navathe, Fundamentals of Database Systems, Second Edition

Recursive Relationship Type is: SUPERVISION
(participation role names are shown)

Attributes of Relationship types

 A relationship type can have attributes; for
example, HoursPerWeek of WORKS_ON; its
value for each relationship instance describes
the number of hours per week that an
EMPLOYEE works on a PROJECT.

Attribute of a Relationship Type is:
Hours of WORKS_ON

Structural Constraints –
one way to express semantics
of relationships
Structural constraints on relationships:
 Cardinality ratio (of a binary relationship): 1:1, 1:N, N:1,
or M:N
SHOWN BY PLACING APPROPRIATE NUMBER ON
THE LINK.
 Participation constraint (on each participating entity
type): total (called existence dependency) or partial.
TOTAL PARTICIPATION IS SHOWN BY DOUBLE LINING THE LINK.
NOTE: These are easy to specify for Binary Relationship
Types.

Alternative (min, max) notation for relationship
structural constraints:
 Specified on each participation of an entity type E in a relationship type R
 Specifies that each entity e in E participates in at least min and at most max relationship instances in R
 Default (no constraint): min=0, max=n
 Must have min ≤ max, min ≥ 0, max ≥ 1
 Derived from the knowledge of mini-world constraints

Examples:
 A department has exactly one manager and an employee can manage at most one department.
– Specify (0,1) for participation of EMPLOYEE in MANAGES
– Specify (1,1) for participation of DEPARTMENT in MANAGES
 An employee can work for exactly one department but a department can have any number of employees.
– Specify (1,1) for participation of EMPLOYEE in WORKS_FOR
– Specify (0,n) for participation of DEPARTMENT in WORKS_FOR
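A (min, max) constraint can be checked by counting how many relationship instances each entity takes part in. The sketch below is one way to do that; the function name and the sample MANAGES instances are hypothetical, and `None` stands in for the unbounded max "n".

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def violates_min_max(entities: Iterable[str],
                     participations: Iterable[str],
                     min_count: int,
                     max_count: Optional[int]) -> List[str]:
    """Return the entities whose participation count falls outside (min, max).

    max_count=None stands for the unbounded 'n'.
    """
    counts = Counter(participations)
    bad = []
    for entity in entities:
        count = counts[entity]  # Counter returns 0 for entities that never participate
        if count < min_count or (max_count is not None and count > max_count):
            bad.append(entity)
    return bad


# Hypothetical MANAGES instances as (employee, department) pairs.
manages: List[Tuple[str, str]] = [("e1", "d1"), ("e3", "d2")]
employees = ["e1", "e2", "e3"]
departments = ["d1", "d2", "d3"]

# EMPLOYEE participates in MANAGES with (0,1); DEPARTMENT with (1,1).
emp_violations = violates_min_max(employees, [e for e, _ in manages], 0, 1)
dept_violations = violates_min_max(departments, [d for _, d in manages], 1, 1)
```

In this sample state no employee breaks (0,1), while d3 has no manager and so breaks (1,1).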


The (min,max) notation
relationship constraints

[Figure: two relationship types labeled with (min,max) pairs; one with (0,1) and (1,1), the other with (1,1) and (1,N).]

COMPANY ER Schema Diagram
using (min, max) notation

Relationships of Higher Degree

 Relationship types of degree 2 are called binary
 Relationship types of degree 3 are called ternary and of
degree n are called n-ary
 In general, an n-ary relationship is not equivalent to n
binary relationships
 Higher-order relationships are discussed further in Chapter 4

Data Modeling Tools

A number of popular tools cover conceptual
modeling and mapping into relational schema
design. Examples: ERWin, S-Designer
(Enterprise Application Suite), ER-Studio, etc.
POSITIVES: they serve as documentation of
application requirements and offer an easy user
interface, mostly with graphics editor support.

Problems with Current
Modeling Tools
 DIAGRAMMING
– Poor conceptual, meaningful notation.
– To avoid the problem of layout algorithms and aesthetics
of diagrams, the tools prefer boxes and lines and do nothing
more than represent (primary-foreign key) relationships
among the resulting tables (with a few exceptions).
 METHODOLOGY
– Lack of built-in methodology support.
– Poor tradeoff analysis or user-driven design preferences.
– Poor design verification and suggestions for improvement.

Some of the Currently Available Automated Database
Design Tools
COMPANY | TOOL | FUNCTIONALITY
Embarcadero Technologies | ER Studio | Database modeling in ER and IDEF1X
Embarcadero Technologies | DB Artisan | Database administration and space and security management
Oracle | Developer 2000 and Designer 2000 | Database modeling, application development
Popkin Software | System Architect 2001 | Data modeling, object modeling, process modeling, structured analysis/design
Platinum Technology | Platinum Enterprise Modeling Suite: ERwin, BPWin, Paradigm Plus | Data, process, and business component modeling
Persistence Inc. | PowerTier | Mapping from O-O to relational model
Rational | Rational Rose | Modeling in UML and application generation in C++ and Java
Rogue Wave | RW Metro | Mapping from O-O to relational model
Resolution Ltd. | Xcase | Conceptual modeling up to code maintenance
Sybase | Enterprise Application Suite | Data modeling, business logic modeling
Visio | Visio Enterprise | Data modeling, design and reengineering Visual Basic and Visual C++
ER DIAGRAM FOR A BANK
DATABASE

© The Benjamin/Cummings Publishing Company, Inc. 1994, Elmasri/Navathe, Fundamentals of Database Systems, Second Edition

PROBLEM with ER notation

THE ENTITY RELATIONSHIP MODEL IN
ITS ORIGINAL FORM DID NOT
SUPPORT THE SPECIALIZATION/GENERALIZATION
ABSTRACTIONS

Extended Entity-Relationship
(EER) Model
 Incorporates Set-subset relationships
 Incorporates Specialization/Generalization Hierarchies

THE NEXT CHAPTER ILLUSTRATES HOW THE ER
MODEL CAN BE EXTENDED WITH
set-subset relationships and
specialization/generalization hierarchies, and how to
display them in EER diagrams.

Chapter 5
The Relational Data Model and
Relational Database Constraints

Chapter Outline
 Relational Model Concepts
 Relational Model Constraints and Relational Database
Schemas
 Update Operations and Dealing with Constraint
Violations

Relational Model Concepts
 The relational model of data is based on the concept
of a relation.
 A relation is a mathematical concept based on the
ideas of sets.
 The strength of the relational approach to data
management comes from the formal foundation
provided by the theory of relations.
 We review the essentials of the relational approach in
this chapter.
Relational Model Concepts
 The model was first proposed by Dr. E.F. Codd of
IBM in 1970 in the following paper:
"A Relational Model for Large Shared Data
Banks," Communications of the ACM, June 1970.

The above paper caused a major revolution in the field of
database management and earned Ted Codd the coveted
ACM Turing Award.

INFORMAL DEFINITIONS
 RELATION: A table of values
– A relation may be thought of as a set of rows.
– A relation may alternately be thought of as a set of columns.
– Each row represents a fact that corresponds to a real-world entity or
relationship.
– Each row has a value of an item or set of items that uniquely
identifies that row in the table.
– Sometimes row-ids or sequential numbers are assigned to identify the
rows in the table.
– Each column typically is called by its column name or column header
or attribute name.

FORMAL DEFINITIONS
 A Relation may be defined in multiple ways.
 The Schema of a Relation: R(A1, A2, ..., An)
Relation schema R is defined over attributes A1, A2, ..., An
For example:
CUSTOMER (Cust-id, Cust-name, Address, Phone#)

Here, CUSTOMER is a relation defined over the four
attributes Cust-id, Cust-name, Address, Phone#, each of
which has a domain or a set of valid values. For example,
the domain of Cust-id is 6-digit numbers.

FORMAL DEFINITIONS
 A tuple is an ordered set of values
 Each value is derived from an appropriate domain.
 Each row in the CUSTOMER table may be referred to as a
tuple in the table and would consist of four values.
 <632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000">

is a tuple belonging to the CUSTOMER relation.


 A relation may be regarded as a set of tuples (rows).
 Columns in a table are also called attributes of the relation.

FORMAL DEFINITIONS
 A domain has a logical definition: e.g.,
“USA_phone_numbers” is the set of 10-digit phone
numbers valid in the U.S.
 A domain may have a data type or a format defined for it.
The USA_phone_numbers may have the format (ddd)-ddd-dddd,
where each d is a decimal digit. Dates, for example, have
various formats such as month name, date, year; or yyyy-mm-dd;
or dd mm,yyyy.
 An attribute designates the role played by the domain. E.g.,
the domain Date may be used to define attributes “Invoice-date”
and “Payment-date”.
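A domain's format can be enforced with a simple membership test. The sketch below encodes the (ddd)-ddd-dddd format stated on the slide as a regular expression; the function and constant names are hypothetical, and the check covers the format only, not whether a number is actually valid in the U.S.

```python
import re

# Format from the slide: (ddd)-ddd-dddd, where each d is a decimal digit.
USA_PHONE_FORMAT = re.compile(r"\(\d{3}\)-\d{3}-\d{4}")


def in_usa_phone_domain(value: str) -> bool:
    """Check the format only; this does not verify that the number is in service."""
    return USA_PHONE_FORMAT.fullmatch(value) is not None
```

A value such as "(404)-894-2000" passes, while "404-894-2000" does not, because it lacks the parentheses the format requires.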

FORMAL DEFINITIONS
 The relation is formed over the cartesian product of the sets;
each set has values from a domain; that domain is used in a
specific role which is conveyed by the attribute name.
 For example, attribute Cust-name is defined over the domain
of strings of 25 characters. The role these strings play in the
CUSTOMER relation is that of the name of customers.
 Formally,
Given R(A1, A2, ..., An)
r(R) ⊆ dom(A1) × dom(A2) × ... × dom(An)
 R: schema of the relation
 r of R: a specific "value" or population of R.
 R is also called the intension of a relation
 r is also called the extension of a relation
FORMAL DEFINITIONS
 Let S1 = {0,1}
 Let S2 = {a,b,c}
 Let R ⊆ S1 × S2
 Then for example: r(R) = {<0,a>, <0,b>, <1,c>}
is one possible “state” or “population” or
“extension” r of the relation R, defined over domains
S1 and S2. It has three tuples.
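The same example can be checked mechanically. This short sketch uses Python sets and `itertools.product` to build S1 × S2 and confirm that the state r(R) from the slide is a subset of it; the variable names are illustrative.

```python
from itertools import product

S1 = {0, 1}
S2 = {"a", "b", "c"}

# All possible tuples over the two domains: S1 x S2.
cartesian = set(product(S1, S2))

# One possible state r(R), as given on the slide.
r = {(0, "a"), (0, "b"), (1, "c")}

# A relation state is valid when every tuple comes from the Cartesian product.
is_valid_state = r <= cartesian
```

The Cartesian product holds 2 × 3 = 6 tuples, and any subset of it, including the empty set, is a possible state of R.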

DEFINITION SUMMARY
Informal Terms | Formal Terms
Table | Relation
Column | Attribute/Domain
Row | Tuple
Values in a column | Domain
Table Definition | Schema of a Relation
Populated Table | Extension
Example - Figure 5.1

CHARACTERISTICS OF RELATIONS
 Ordering of tuples in a relation r(R): The tuples are not
considered to be ordered, even though they appear to be in
the tabular form.
 Ordering of attributes in a relation schema R (and of
values within each tuple): We will consider the attributes
in R(A1, A2, ..., An) and the values in t=<v1, v2, ..., vn> to be
ordered.
(However, a more general alternative definition of
relation does not require this ordering.)
 Values in a tuple: All values are considered atomic
(indivisible). A special null value is used to represent
values that are unknown or inapplicable to certain tuples.

CHARACTERISTICS OF RELATIONS

 Notation:
- We refer to component values of a tuple t
by t[Ai] = vi (the value of attribute Ai for
tuple t).
Similarly, t[Au, Av, ..., Aw] refers to the
subtuple of t containing the values of
attributes Au, Av, ..., Aw, respectively.

CHARACTERISTICS OF RELATIONS - Figure 5.2

Relational Integrity Constraints

 Constraints are conditions that must hold
on all valid relation instances. There are
three main types of constraints:
1. Key constraints
2. Entity integrity constraints
3. Referential integrity constraints

Key Constraints
 Superkey of R: A set of attributes SK of R such that no two
tuples in any valid relation instance r(R) will have the same
value for SK. That is, for any distinct tuples t1 and t2 in
r(R), t1[SK] ≠ t2[SK].
 Key of R: A "minimal" superkey; that is, a superkey K such
that removal of any attribute from K results in a set of
attributes that is not a superkey.
Example: The CAR relation schema:
CAR(State, Reg#, SerialNo, Make, Model, Year)
has two keys Key1 = {State, Reg#}, Key2 = {SerialNo}, which are also
superkeys. {SerialNo, Make} is a superkey but not a key.
 If a relation has several candidate keys, one is chosen
arbitrarily to be the primary key. The primary key attributes
are underlined.
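The superkey and key definitions can be tested against a single relation instance, as sketched below. Note the limitation: a key is a property of the schema (it must hold in every valid instance), so a check on one instance can refute a candidate but cannot prove it. The function names are hypothetical, and the sample rows adapt the CAR values shown earlier.

```python
from itertools import combinations
from typing import Dict, Iterable, List

Row = Dict[str, object]


def is_superkey(rows: List[Row], attrs: Iterable[str]) -> bool:
    """True if no two rows of this instance agree on all of attrs."""
    attrs = tuple(attrs)
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows: List[Row], attrs: Iterable[str]) -> bool:
    """A superkey that stops being one when any single attribute is removed."""
    attrs = tuple(attrs)
    if not is_superkey(rows, attrs):
        return False
    smaller = combinations(attrs, len(attrs) - 1)
    return all(not is_superkey(rows, subset) for subset in smaller)


# Sample CAR instance (illustrative values).
cars = [
    {"State": "TEXAS", "Reg#": "ABC 123", "SerialNo": "TK629", "Make": "Ford"},
    {"State": "NEW YORK", "Reg#": "ABC 123", "SerialNo": "WP9872", "Make": "Nissan"},
    {"State": "TEXAS", "Reg#": "VSY 720", "SerialNo": "TD729", "Make": "Buick"},
]
```

On this instance, {State, Reg#} and {SerialNo} pass `is_key`, while {SerialNo, Make} passes `is_superkey` but fails `is_key` because dropping Make still leaves a superkey.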

Key Constraints
Figure 5.4

Entity Integrity
 Relational Database Schema: A set S of relation schemas
that belong to the same database. S is the name of the
database.
S = {R1, R2, ..., Rn}
 Entity Integrity: The primary key attributes PK of each
relation schema R in S cannot have null values in any tuple
of r(R). This is because primary key values are used to
identify the individual tuples.
t[PK] ≠ null for any tuple t in r(R)
 Note: Other attributes of R may be similarly constrained
to disallow null values, even though they are not members
of the primary key.

Referential Integrity
 A constraint involving two relations (the previous
constraints involve a single relation).
 Used to specify a relationship among tuples in two
relations: the referencing relation and the referenced
relation.
 Tuples in the referencing relation R1 have attributes FK
(called foreign key attributes) that reference the primary
key attributes PK of the referenced relation R2. A tuple t1
in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].
 A referential integrity constraint can be displayed in a
relational database schema as a directed arc from R1.FK to
R2.

Referential Integrity
Constraint
Statement of the constraint
The value in the foreign key column (or columns)
FK of the referencing relation R1 can be either:
(1) a value of the corresponding primary key PK that
exists in some tuple of the referenced relation R2, or
(2) a null.
In case (2), the FK in R1 should not be a part of its own
primary key.
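The two allowed cases translate directly into a check over the referencing relation. The following sketch assumes relations stored as lists of dictionaries; the function name, the attribute name DNUMBER, and the sample rows are illustrative.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def referential_violations(referencing: List[Row], fk: str,
                           referenced: List[Row], pk: str) -> List[Row]:
    """Return referencing rows whose FK is neither null nor an existing PK value."""
    pk_values = {row[pk] for row in referenced}
    return [row for row in referencing
            if row[fk] is not None and row[fk] not in pk_values]


department = [{"DNUMBER": "5"}, {"DNUMBER": "4"}]
employee = [
    {"SSN": "111", "DNO": "5"},    # case (1): matches an existing primary key value
    {"SSN": "222", "DNO": None},   # case (2): null is allowed
    {"SSN": "333", "DNO": "9"},    # neither case: no department 9 exists
]

violations = referential_violations(employee, "DNO", department, "DNUMBER")
```

Only the employee with DNO = 9 is reported.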

Other Types of Constraints
Semantic Integrity Constraints:
- based on application semantics and cannot be
expressed by the model per se
- E.g., “the max. no. of hours per employee for all
projects he or she works on is 56 hrs per week”
- A constraint specification language may have to
be used to express these
- SQL-99 provides triggers and ASSERTIONS to
express some of these

Figure 5.5

Figure 5.6

Figure 5.7

Update Operations on Relations
 INSERT a tuple.
 DELETE a tuple.
 MODIFY a tuple.
 Integrity constraints should not be violated by the update
operations.
 Several update operations may have to be grouped
together.
 Updates may propagate to cause other updates
automatically. This may be necessary to maintain integrity
constraints.
Update Operations on Relations
 In case of integrity violation, several actions can
be taken:
– Cancel the operation that causes the violation (REJECT
option)
– Perform the operation but inform the user of the
violation
– Trigger additional updates so the violation is corrected
(CASCADE option, SET NULL option)
– Execute a user-specified error-correction routine
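Three of the options above (REJECT, CASCADE, SET NULL) can be illustrated on a DELETE that would leave dangling foreign keys. This is a sketch under simplifying assumptions: relations are in-memory lists of dictionaries, and the function name and the DNUMBER/DNO attribute pairing are illustrative.

```python
from typing import Dict, List, Optional


def delete_department(dnumber: str,
                      departments: List[Dict[str, str]],
                      employees: List[Dict[str, Optional[str]]],
                      on_violation: str = "REJECT") -> None:
    """Delete a DEPARTMENT row, handling EMPLOYEE rows that still reference it."""
    referencing = [e for e in employees if e["DNO"] == dnumber]
    if referencing:
        if on_violation == "REJECT":
            # Cancel the operation that would cause the violation.
            raise ValueError("delete rejected: department is still referenced")
        elif on_violation == "CASCADE":
            # Propagate the delete to the referencing tuples.
            employees[:] = [e for e in employees if e["DNO"] != dnumber]
        elif on_violation == "SET NULL":
            # Keep the referencing tuples but null out their foreign key.
            for e in referencing:
                e["DNO"] = None
        else:
            raise ValueError(f"unknown option: {on_violation}")
    departments[:] = [d for d in departments if d["DNUMBER"] != dnumber]
```

SET NULL is only appropriate when the foreign key is not part of the referencing relation's primary key, since entity integrity forbids nulls there.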

In-Class Exercise
(Taken from Exercise 5.15)
Consider the following relations for a database that keeps
track of student enrollment in courses and the books adopted
for each course:
STUDENT(SSN, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(SSN, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
TEXT(Book_ISBN, Book_Title, Publisher, Author)
Draw a relational schema diagram specifying the foreign
keys for this schema.

Chapter 6
The Relational Algebra and Calculus

Copyright © 2004 Pearson Education, Inc.


Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Chapter Outline
 Example Database Application (COMPANY)
 Relational Algebra
– Unary Relational Operations
– Relational Algebra Operations From Set Theory
– Binary Relational Operations
– Additional Relational Operations
– Examples of Queries in Relational Algebra
 Relational Calculus
– Tuple Relational Calculus
– Domain Relational Calculus
 Overview of the QBE language (appendix D)
Database State for COMPANY
All examples discussed below refer to the COMPANY database shown here.

Relational Algebra
 The basic set of operations for the relational model is known
as the relational algebra. These operations enable a user to
specify basic retrieval requests.
 The result of a retrieval is a new relation, which may have
been formed from one or more relations. The algebra
operations thus produce new relations, which can be further
manipulated using operations of the same algebra.
 A sequence of relational algebra operations forms a
relational algebra expression, whose result will also be a
relation that represents the result of a database query (or
retrieval request).
Unary Relational Operations
 SELECT Operation

SELECT operation is used to select a subset of the tuples from a relation that
satisfy a selection condition. It is a filter that keeps only those tuples that
satisfy a qualifying condition – those satisfying the condition are selected
while others are discarded.
Example: To select the EMPLOYEE tuples whose department number is
four or those whose salary is greater than $30,000 the following notation is
used:
σ DNO = 4 (EMPLOYEE)
σ SALARY > 30,000 (EMPLOYEE)
In general, the select operation is denoted by σ <selection condition>(R) where the
symbol σ (sigma) is used to denote the select operator, and the selection
condition is a Boolean expression specified on the attributes of relation R.
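A minimal sketch of SELECT as a row filter follows. It assumes a relation stored as a list of dictionaries with no duplicate rows; the function name `select` and the sample EMPLOYEE rows are illustrative, not taken from the COMPANY database figure.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def select(condition: Callable[[Row], bool], relation: List[Row]) -> List[Row]:
    """sigma<condition>(relation): keep rows satisfying the condition; schema unchanged."""
    return [row for row in relation if condition(row)]


# Hypothetical EMPLOYEE tuples.
EMPLOYEE = [
    {"FNAME": "John", "LNAME": "Smith", "DNO": 5, "SALARY": 30000},
    {"FNAME": "Franklin", "LNAME": "Wong", "DNO": 5, "SALARY": 40000},
    {"FNAME": "Alicia", "LNAME": "Zelaya", "DNO": 4, "SALARY": 25000},
]

dept4 = select(lambda t: t["DNO"] == 4, EMPLOYEE)           # sigma DNO = 4
high_paid = select(lambda t: t["SALARY"] > 30000, EMPLOYEE)  # sigma SALARY > 30,000
```

Because the result is again a list of rows with the same attributes, it can be fed into another `select` call, which mirrors how algebra operations compose.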

Unary Relational Operations
SELECT Operation Properties
– The SELECT operation σ <selection condition>(R) produces a relation S that
has the same schema as R
– The SELECT operation σ is commutative; i.e.,
σ <condition1>(σ <condition2>(R)) = σ <condition2>(σ <condition1>(R))
– A cascaded SELECT operation may be applied in any order; i.e.,
σ <condition1>(σ <condition2>(σ <condition3>(R)))
= σ <condition2>(σ <condition3>(σ <condition1>(R)))
– A cascaded SELECT operation may be replaced by a single selection
with a conjunction of all the conditions; i.e.,
σ <condition1>(σ <condition2>(σ <condition3>(R)))
= σ <condition1> AND <condition2> AND <condition3>(R)

Unary Relational Operations (cont.)

Unary Relational Operations (cont.)
 PROJECT Operation
This operation selects certain columns from the table and discards the other
columns. PROJECT creates a vertical partitioning: one partition with the needed
columns (attributes), containing the results of the operation, and another
containing the discarded columns.
Example: To list each employee’s first and last name and salary, the
following is used:
π LNAME, FNAME, SALARY (EMPLOYEE)

The general form of the project operation is π <attribute list>(R) where π
(pi) is the symbol used to represent the project operation and <attribute list>
is the desired list of attributes from the attributes of relation R.
The project operation removes any duplicate tuples, so the result of the
project operation is a set of tuples and hence a valid relation.
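The duplicate-elimination step is the part of PROJECT that is easy to overlook, so the sketch below makes it explicit. It assumes a relation stored as a list of dictionaries; the function name `project` and the sample rows are illustrative.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def project(attrs: Sequence[str], relation: List[Row]) -> List[Row]:
    """pi<attrs>(relation): keep only attrs and drop duplicate result tuples."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:          # duplicate elimination
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# Hypothetical EMPLOYEE tuples.
EMPLOYEE = [
    {"SSN": "111", "LNAME": "Smith", "DNO": 5, "SALARY": 30000},
    {"SSN": "222", "LNAME": "Wong", "DNO": 5, "SALARY": 40000},
    {"SSN": "333", "LNAME": "Zelaya", "DNO": 4, "SALARY": 25000},
]

dept_numbers = project(["DNO"], EMPLOYEE)            # two tuples: DNO 5 and DNO 4
with_key = project(["SSN", "SALARY"], EMPLOYEE)      # includes a key, so nothing collapses
```

Projecting on DNO alone shrinks three tuples to two, while any attribute list that includes the key SSN keeps all three.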

Unary Relational Operations (cont.)
PROJECT Operation Properties

– The number of tuples in the result of projection π <list>(R) is always
less than or equal to the number of tuples in R.
– If the list of attributes includes a key of R, then the number of tuples is
equal to the number of tuples in R.
– π <list1>(π <list2>(R)) = π <list1>(R) as long as <list2> contains
the attributes in <list1>

Unary Relational Operations (cont.)

Unary Relational Operations (cont.)
 Rename Operation
We may want to apply several relational algebra operations one after the
other. Either we can write the operations as a single relational algebra
expression by nesting the operations, or we can apply one operation at a time
and create intermediate result relations. In the latter case, we must give
names to the relations that hold the intermediate results.
Example: To retrieve the first name, last name, and salary of all employees
who work in department number 5, we must apply a select and a project
operation. We can write a single relational algebra expression as follows:

π FNAME, LNAME, SALARY (σ DNO=5 (EMPLOYEE))

OR we can explicitly show the sequence of operations, giving a name to each
intermediate relation:
DEP5_EMPS ← σ DNO=5 (EMPLOYEE)
RESULT ← π FNAME, LNAME, SALARY (DEP5_EMPS)

Unary Relational Operations (cont.)
 Rename Operation (cont.)
The rename operator is ρ (rho).

The general Rename operation can be expressed by any of the
following forms:
 ρ S(B1, B2, …, Bn)(R) is a renamed relation S based on R with column names B1, B2, …, Bn
 ρ S(R) is a renamed relation S based on R (which does not specify column names).
 ρ (B1, B2, …, Bn)(R) is a renamed relation with column names B1, B2, …, Bn which does
not specify a new relation name.
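The three forms differ only in which of the two names (relation name, attribute names) they replace, which one function with optional arguments can capture. This is a sketch; the `Relation` type and the `rename` function are hypothetical helpers, not part of any library.

```python
from typing import FrozenSet, NamedTuple, Optional, Sequence, Tuple


class Relation(NamedTuple):
    name: str
    attrs: Tuple[str, ...]
    tuples: FrozenSet[tuple]


def rename(r: Relation,
           new_name: Optional[str] = None,
           new_attrs: Optional[Sequence[str]] = None) -> Relation:
    """rho: change the relation name, the attribute names, or both; tuples are untouched."""
    if new_attrs is not None and len(new_attrs) != len(r.attrs):
        raise ValueError("need exactly one new name per attribute")
    return Relation(
        new_name if new_name is not None else r.name,
        tuple(new_attrs) if new_attrs is not None else r.attrs,
        r.tuples,
    )


R = Relation("R", ("A1", "A2"), frozenset({(1, "x"), (2, "y")}))

both = rename(R, "S", ["B1", "B2"])        # rho S(B1, B2)(R)
name_only = rename(R, "S")                 # rho S(R)
attrs_only = rename(R, new_attrs=["B1", "B2"])  # rho (B1, B2)(R)
```

Attribute renaming is positional here, matching the ordered-attribute convention used in this chapter.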

Unary Relational Operations (cont.)

Relational Algebra Operations From
Set Theory
 UNION Operation
The result of this operation, denoted by R  S, is a relation that includes all
tuples that are either in R or in S or in both R and S. Duplicate tuples are
eliminated.
Example: To retrieve the social security numbers of all employees who either
work in department 5 or directly supervise an employee who works in
department 5, we can use the union operation as follows:
DEP5_EMPS ← σ_{DNO=5}(EMPLOYEE)
RESULT1 ← π_{SSN}(DEP5_EMPS)
RESULT2(SSN) ← π_{SUPERSSN}(DEP5_EMPS)
RESULT ← RESULT1 ∪ RESULT2
The union operation produces the tuples that are in either RESULT1 or
RESULT2 or both. The two operands must be “type compatible”.
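
As a rough sketch of the union example, Python sets behave like relations here because they drop duplicates automatically. The sample rows are invented for illustration, and each employee is reduced to the three attributes the query needs.

```python
# Hypothetical sample rows: (SSN, SUPERSSN, DNO).
EMPLOYEE = [
    ("123456789", "333445555", 5),
    ("333445555", "888665555", 5),
    ("999887777", "987654321", 4),
    ("888665555", None, 1),
]

DEP5_EMPS = [t for t in EMPLOYEE if t[2] == 5]            # select DNO = 5
RESULT1 = {(ssn,) for ssn, _, _ in DEP5_EMPS}             # project on SSN
RESULT2 = {(superssn,) for _, superssn, _ in DEP5_EMPS}   # project on SUPERSSN, renamed SSN
RESULT = RESULT1 | RESULT2                                # union removes the duplicate
```

Both operands are sets of one-attribute tuples over the same domain, which is what type compatibility requires.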

Relational Algebra Operations From
Set Theory
 Type Compatibility
– The operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn)
must have the same number of attributes, and the domains of
corresponding attributes must be compatible; that is,
dom(Ai)=dom(Bi) for i=1, 2, ..., n.

– The resulting relation for R1 ∪ R2, R1 ∩ R2, or R1 − R2 has the
same attribute names as the first operand relation R1 (by
convention).

Relational Algebra Operations From
Set Theory
 UNION Example

STUDENT ∪ INSTRUCTOR

Relational Algebra Operations From Set
Theory (cont.) – use Fig. 6.4

Relational Algebra Operations From Set
Theory (cont.)
 INTERSECTION OPERATION

The result of this operation, denoted by R ∩ S, is a relation that includes all
tuples that are in both R and S. The two operands must be "type compatible".

Example: The result of the intersection operation (figure below) includes
only those who are both students and instructors.

STUDENT ∩ INSTRUCTOR

Relational Algebra Operations From Set
Theory (cont.)
 Set Difference (or MINUS) Operation
The result of this operation, denoted by R - S, is a relation that includes all
tuples that are in R but not in S. The two operands must be "type compatible”.

Example: The figure shows the names of students who are not instructors,
and the names of instructors who are not students.

STUDENT − INSTRUCTOR

INSTRUCTOR − STUDENT

Relational Algebra Operations From Set
Theory (cont.)
 Notice that both union and intersection are commutative
operations; that is
R ∪ S = S ∪ R, and R ∩ S = S ∩ R

• Both union and intersection can be treated as n-ary operations
applicable to any number of relations as both are associative
operations; that is
R ∪ (S ∪ T) = (R ∪ S) ∪ T, and (R ∩ S) ∩ T = R ∩ (S ∩ T)

• The minus operation is not commutative; that is, in general
R − S ≠ S − R
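
These properties can be checked on small examples. The sketch below uses Python's built-in set operators on made-up name tuples; it illustrates the laws for one choice of R, S, and T and is not a proof.

```python
# Hypothetical relations with (first name, last name) tuples.
R = {("Susan", "Yao"), ("Ramesh", "Shah")}
S = {("Susan", "Yao"), ("John", "Smith")}
T = {("Ricardo", "Browne"), ("John", "Smith")}

union_commutes = (R | S) == (S | R)
intersection_commutes = (R & S) == (S & R)
union_associates = (R | (S | T)) == ((R | S) | T)
intersection_associates = ((R & S) & T) == (R & (S & T))

# Set difference depends on operand order.
difference_commutes = (R - S) == (S - R)
```

Here R − S contains only Ramesh Shah while S − R contains only John Smith, so the last comparison is false.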

Relational Algebra Operations From Set
Theory (cont.)
 CARTESIAN (or cross product) Operation
– This operation is used to combine tuples from two relations in a
combinatorial fashion. In general, the result of R(A1, A2, . . ., An) x S(B1,
B2, . . ., Bm) is a relation Q with degree n + m attributes Q(A1, A2, . . ., An,
B1, B2, . . ., Bm), in that order. The resulting relation Q has one tuple for
each combination of tuples—one from R and one from S.
– Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS tuples, then
| R x S | will have nR * nS tuples.
– The two operands do NOT have to be "type compatible”

Example:
FEMALE_EMPS ← σ_{SEX='F'}(EMPLOYEE)
EMPNAMES ← π_{FNAME, LNAME, SSN}(FEMALE_EMPS)

EMP_DEPENDENTS ← EMPNAMES x DEPENDENT
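
A small sketch of the Cartesian product, using `itertools.product` from the Python standard library. The rows are invented, and DEPENDENT is cut down to two attributes (ESSN, DEPENDENT_NAME) to keep the output short.

```python
from itertools import product

# Hypothetical rows: EMPNAMES(FNAME, LNAME, SSN), DEPENDENT(ESSN, DEPENDENT_NAME).
EMPNAMES = [("Alicia", "Zelaya", "999887777"), ("Jennifer", "Wallace", "987654321")]
DEPENDENT = [("333445555", "Alice"), ("987654321", "Abner"), ("123456789", "Michael")]

# Every employee tuple is paired with every dependent tuple.
EMP_DEPENDENTS = [emp + dep for emp, dep in product(EMPNAMES, DEPENDENT)]

# Most combinations are meaningless; a later select keeps only SSN = ESSN.
ACTUAL_DEPENDENTS = [row for row in EMP_DEPENDENTS if row[2] == row[3]]
```

The product has 2 * 3 = 6 tuples of degree 3 + 2 = 5, and only one of them pairs an employee with her own dependent.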


Binary Relational Operations
 JOIN Operation
– The sequence of Cartesian product followed by select is used
quite commonly to identify and select related tuples from two
relations, so a special operation, called JOIN, was created for it. It is denoted by ⋈.
– This operation is very important for any relational database
with more than a single relation, because it allows us to process
relationships among relations.
– The general form of a join operation on two relations R(A1, A2,
. . ., An) and S(B1, B2, . . ., Bm) is:
R ⋈_{<join condition>} S
where R and S can be any relations that result from general
relational algebra expressions.
Binary Relational Operations (cont.)
Example: Suppose that we want to retrieve the name of
the manager of each department. To get the manager’s
name, we need to combine each DEPARTMENT tuple
with the EMPLOYEE tuple whose SSN value matches
the MGRSSN value in the department tuple. We do this
by using the join operation.
DEPT_MGR ← DEPARTMENT ⋈_{MGRSSN=SSN} EMPLOYEE
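
The join can be sketched as a nested loop that keeps only the combinations satisfying the join condition. The helper name `theta_join` and the sample rows are assumptions of this sketch; it also assumes the two relations have no attribute names in common.

```python
# Hypothetical sample rows.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "MGRSSN": "333445555"},
    {"DNAME": "Headquarters", "DNUMBER": 1, "MGRSSN": "888665555"},
]
EMPLOYEE = [
    {"FNAME": "Franklin", "LNAME": "Wong", "SSN": "333445555"},
    {"FNAME": "James", "LNAME": "Borg", "SSN": "888665555"},
    {"FNAME": "John", "LNAME": "Smith", "SSN": "123456789"},
]

def theta_join(r, s, condition):
    """Combine each pair of tuples (one from r, one from s) that satisfies condition."""
    return [{**tr, **ts} for tr in r for ts in s if condition(tr, ts)]

DEPT_MGR = theta_join(DEPARTMENT, EMPLOYEE, lambda d, e: d["MGRSSN"] == e["SSN"])
managers = {(row["DNAME"], row["LNAME"]) for row in DEPT_MGR}
```

Each result tuple carries both MGRSSN and SSN, which always hold the same value in this kind of join.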

Binary Relational Operations (cont.)
 EQUIJOIN Operation
The most common use of join involves join conditions with equality comparisons only.
Such a join, where the only comparison operator used is =, is called an EQUIJOIN. In
the result of an EQUIJOIN we always have one or more pairs of attributes (whose
names need not be identical) that have identical values in every tuple.
The JOIN seen in the previous example was an EQUIJOIN.

• NATURAL JOIN Operation
Because one of each pair of attributes with identical values is superfluous, a new
operation called natural join—denoted by *—was created to get rid of the second
(superfluous) attribute in an EQUIJOIN condition.
The standard definition of natural join requires that the two join attributes, or each pair
of corresponding join attributes, have the same name in both relations. If this is not the
case, a renaming operation is applied first.

Binary Relational Operations (cont.)
Example: To apply a natural join on the DNUMBER attributes of
DEPARTMENT and DEPT_LOCATIONS, it is sufficient to write:
DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS
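
A minimal sketch of natural join: tuples are matched on every attribute name the two relations share, and the shared attribute appears only once in the result. The helper name `natural_join` and the sample rows are assumptions of this sketch.

```python
# Hypothetical sample rows; both relations share the attribute DNUMBER.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5},
    {"DNAME": "Administration", "DNUMBER": 4},
]
DEPT_LOCATIONS = [
    {"DNUMBER": 5, "DLOCATION": "Bellaire"},
    {"DNUMBER": 5, "DLOCATION": "Houston"},
    {"DNUMBER": 4, "DLOCATION": "Stafford"},
]

def natural_join(r, s):
    """Join on all attribute names common to r and s; keep one copy of each."""
    common = [a for a in r[0] if a in s[0]] if r and s else []
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]

DEPT_LOCS = natural_join(DEPARTMENT, DEPT_LOCATIONS)
```

Merging the two dicts keeps a single DNUMBER entry, which mirrors how natural join drops the superfluous attribute.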

Complete Set of Relational Operations

• The set of operations including select σ, project π, union ∪,
set difference −, and Cartesian product × is called a complete set
because any other relational algebra operation
can be expressed as a combination of these five
operations.
• For example:
R ∩ S = (R ∪ S) − ((R − S) ∪ (S − R))
R ⋈_{<join condition>} S = σ_{<join condition>}(R × S)
Binary Relational Operations (cont.)

 DIVISION Operation
– The division operation is applied to two relations
R(Z) ÷ S(X), where X ⊆ Z. Let Y = Z − X (and hence Z
= X ∪ Y); that is, let Y be the set of attributes of R that are
not attributes of S.
– The result of DIVISION is a relation T(Y) that includes a
tuple t if tuples tR appear in R with tR[Y] = t, and with
tR[X] = tS for every tuple tS in S.
– For a tuple t to appear in the result T of the DIVISION, the
values in t must appear in R in combination with every tuple
in S.
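
The definition can be sketched as follows: group the tuples of R by their Y values and keep a Y value only if its group covers every tuple of S. The helper name `divide` and the sample rows are assumptions of this sketch.

```python
# Hypothetical rows: R(ESSN, PNO) records who works on which project,
# and S(PNO) lists the projects of interest.
R = [
    {"ESSN": "123456789", "PNO": 1},
    {"ESSN": "123456789", "PNO": 2},
    {"ESSN": "453453453", "PNO": 1},
    {"ESSN": "333445555", "PNO": 2},
    {"ESSN": "333445555", "PNO": 1},
    {"ESSN": "333445555", "PNO": 3},
]
S = [{"PNO": 1}, {"PNO": 2}]

def divide(r, s, x_attrs):
    """R(Z) / S(X): return the Y values that occur in r with every tuple of s."""
    y_attrs = [a for a in r[0] if a not in x_attrs] if r else []
    s_values = {tuple(ts[a] for a in x_attrs) for ts in s}
    x_values_by_y = {}
    for tr in r:
        y = tuple(tr[a] for a in y_attrs)
        x_values_by_y.setdefault(y, set()).add(tuple(tr[a] for a in x_attrs))
    return [dict(zip(y_attrs, y))
            for y, xs in x_values_by_y.items() if s_values <= xs]

T = divide(R, S, ["PNO"])
```

Employee 453453453 is left out because that employee appears with project 1 only, not with every tuple of S.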
Recap of Relational Algebra Operations

Additional Relational Operations

 Aggregate Functions and Grouping


– A type of request that cannot be expressed in the basic relational algebra
is to specify mathematical aggregate functions on collections of values
from the database.
– Examples of such functions include retrieving the average or total salary
of all employees or the total number of employee tuples. These functions
are used in simple statistical queries that summarize information from
the database tuples.
– Common functions applied to collections of numeric values include
SUM, AVERAGE, MAXIMUM, and MINIMUM. The COUNT
function is used for counting tuples or values.

Additional Relational Operations (cont.)

Use of the Functional operator ℱ

ℱ_{MAX Salary}(Employee) retrieves the maximum salary value from
the Employee relation.

ℱ_{MIN Salary}(Employee) retrieves the minimum Salary value from
the Employee relation.

ℱ_{SUM Salary}(Employee) retrieves the sum of the Salary values from the
Employee relation.

_{DNO} ℱ_{COUNT SSN, AVERAGE Salary}(Employee) groups employees by DNO
(department number) and computes the count of employees
and average salary per department. [Note: count just counts the
number of rows, without removing duplicates.]
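
As an illustration, the same aggregates can be computed in Python over a small made-up Employee relation. The grouping step uses a dictionary keyed by DNO; all data values are assumptions of this sketch.

```python
from collections import defaultdict

# Hypothetical sample rows.
EMPLOYEE = [
    {"SSN": "123456789", "SALARY": 30000, "DNO": 5},
    {"SSN": "333445555", "SALARY": 40000, "DNO": 5},
    {"SSN": "999887777", "SALARY": 25000, "DNO": 4},
    {"SSN": "888665555", "SALARY": 55000, "DNO": 1},
]

# Aggregates over the whole relation.
max_salary = max(t["SALARY"] for t in EMPLOYEE)
min_salary = min(t["SALARY"] for t in EMPLOYEE)
sum_salary = sum(t["SALARY"] for t in EMPLOYEE)

# Grouping by DNO, then COUNT SSN and AVERAGE Salary per group.
groups = defaultdict(list)
for t in EMPLOYEE:
    groups[t["DNO"]].append(t)

per_department = {
    dno: (len(rows), sum(r["SALARY"] for r in rows) / len(rows))
    for dno, rows in groups.items()
}
```

`len(rows)` counts rows without removing duplicates, matching the note on COUNT above.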

Additional Relational Operations (cont.)
 Recursive Closure Operations
– Another type of operation that, in general, cannot be specified in the
basic original relational algebra is recursive closure. This operation is
applied to a recursive relationship.
– An example of a recursive operation is to retrieve all SUPERVISEES of
an EMPLOYEE e at all levels—that is, all EMPLOYEE e’ directly
supervised by e; all employees e’’ directly supervised by each employee
e’; all employees e’’’ directly supervised by each employee e’’; and so
on.
– Although it is possible to retrieve employees at each level and then take
their union, we cannot, in general, specify a query such as “retrieve the
supervisees of ‘James Borg’ at all levels” without utilizing a looping
mechanism.
– The SQL3 standard includes syntax for recursive closure.
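
The looping mechanism mentioned above can be sketched in Python: retrieve one level of supervisees at a time and stop when a level adds nobody new. The SUPERVISION pairs and the function name are invented for this sketch.

```python
# Hypothetical (supervisee SSN, supervisor SSN) pairs.
SUPERVISION = {
    ("123456789", "333445555"),
    ("453453453", "333445555"),
    ("333445555", "888665555"),
    ("987654321", "888665555"),
    ("999887777", "987654321"),
}

def supervisees_at_all_levels(boss):
    """Collect direct and indirect supervisees of boss, one level per iteration."""
    found = set()
    frontier = {boss}
    while frontier:
        # One join-and-project step: employees supervised by the current level.
        next_level = {emp for emp, sup in SUPERVISION if sup in frontier} - found
        found |= next_level
        frontier = next_level
    return found

all_levels = supervisees_at_all_levels("888665555")
```

The number of iterations depends on the data (the depth of the supervision hierarchy), which is why a fixed algebra expression cannot express the query in general.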

Additional Relational Operations (cont.)
 The OUTER JOIN Operation
– In NATURAL JOIN tuples without a matching (or related) tuple are eliminated
from the join result. Tuples with null in the join attributes are also eliminated.
This amounts to loss of information.
– A set of operations, called outer joins, can be used when we want to keep all the
tuples in R, or all those in S, or all those in both relations in the result of the
join, regardless of whether or not they have matching tuples in the other
relation.
– The left outer join operation keeps every tuple in the first or left relation R in
R ⟕ S; if no matching tuple is found in S, then the attributes of S in the join
result are filled or “padded” with null values.
– A similar operation, right outer join, keeps every tuple in the second or right
relation S in the result of R ⟖ S.
– A third operation, full outer join, denoted by R ⟗ S, keeps all tuples in both
the left and the right relations when no matching tuples are found, padding them
with null values as needed.
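
A minimal sketch of the left outer join, with Python's `None` standing in for null. The helper name `left_outer_join`, its `s_attrs` parameter, and the sample rows are assumptions of this sketch.

```python
# Hypothetical sample rows.
EMPLOYEE = [
    {"SSN": "123456789", "LNAME": "Smith"},
    {"SSN": "333445555", "LNAME": "Wong"},
    {"SSN": "888665555", "LNAME": "Borg"},
]
DEPARTMENT = [
    {"DNAME": "Research", "MGRSSN": "333445555"},
    {"DNAME": "Headquarters", "MGRSSN": "888665555"},
]

def left_outer_join(r, s, condition, s_attrs):
    """Keep every tuple of r; pad the attributes of s with None when nothing matches."""
    result = []
    for tr in r:
        matches = [ts for ts in s if condition(tr, ts)]
        if matches:
            result.extend({**tr, **ts} for ts in matches)
        else:
            result.append({**tr, **{a: None for a in s_attrs}})
    return result

joined = left_outer_join(EMPLOYEE, DEPARTMENT,
                         lambda e, d: e["SSN"] == d["MGRSSN"],
                         ["DNAME", "MGRSSN"])
```

Smith manages no department, so an ordinary join would drop that tuple; here it survives with null department attributes.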
Additional Relational Operations (cont.)
 OUTER UNION Operations
– The outer union operation was developed to take the union of tuples from two
relations if the relations are not union compatible.
– This operation will take the union of tuples in two relations R(X, Y) and S(X, Z)
that are partially compatible, meaning that only some of their attributes, say X, are
union compatible.
– The attributes that are union compatible are represented only once in the result, and
those attributes that are not union compatible from either relation are also kept in
the result relation T(X, Y, Z).
– Example: An outer union can be applied to two relations whose schemas are
STUDENT(Name, SSN, Department, Advisor) and INSTRUCTOR(Name, SSN,
Department, Rank). Tuples from the two relations are matched based on having the
same combination of values of the shared attributes—Name, SSN, Department. If a
student is also an instructor, both Advisor and Rank will have a value; otherwise,
one of these two attributes will be null.
The result relation STUDENT_OR_INSTRUCTOR will have the following
attributes:
STUDENT_OR_INSTRUCTOR (Name, SSN, Department, Advisor, Rank)
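
The STUDENT and INSTRUCTOR example can be sketched as below. The helper name `outer_union` and the sample rows are invented, and the sketch assumes each combination of shared attribute values occurs at most once per relation.

```python
# Hypothetical sample rows.
STUDENT = [
    {"Name": "Susan Yao", "SSN": "111", "Department": "CS", "Advisor": "Smith"},
    {"Name": "Amy Ford", "SSN": "222", "Department": "Math", "Advisor": "Wong"},
]
INSTRUCTOR = [
    {"Name": "Susan Yao", "SSN": "111", "Department": "CS", "Rank": "Lecturer"},
    {"Name": "John Smith", "SSN": "333", "Department": "CS", "Rank": "Professor"},
]

def outer_union(r, s, shared, r_only, s_only):
    """Match tuples on the shared attributes; pad missing attributes with None."""
    merged = {}
    for source, own in ((r, r_only), (s, s_only)):
        for row in source:
            key = tuple(row[a] for a in shared)
            # Start from a tuple padded with nulls, then fill in what this row knows.
            out = merged.setdefault(
                key, {**dict(zip(shared, key)), **dict.fromkeys(r_only + s_only)}
            )
            out.update({a: row[a] for a in own})
    return list(merged.values())

STUDENT_OR_INSTRUCTOR = outer_union(
    STUDENT, INSTRUCTOR, ["Name", "SSN", "Department"], ["Advisor"], ["Rank"]
)
```

Susan Yao appears in both relations, so her result tuple has values for both Advisor and Rank; the other two tuples have one null each.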
Examples of Queries in Relational Algebra
 Q1: Retrieve the name and address of all employees who
work for the ‘Research’ department.
RESEARCH_DEPT ← σ_{DNAME='Research'}(DEPARTMENT)
RESEARCH_EMPS ← (RESEARCH_DEPT ⋈_{DNUMBER=DNO} EMPLOYEE)
RESULT ← π_{FNAME, LNAME, ADDRESS}(RESEARCH_EMPS)

• Q6: Retrieve the names of employees who have no dependents.
ALL_EMPS ← π_{SSN}(EMPLOYEE)
EMPS_WITH_DEPS(SSN) ← π_{ESSN}(DEPENDENT)
EMPS_WITHOUT_DEPS ← (ALL_EMPS − EMPS_WITH_DEPS)
RESULT ← π_{LNAME, FNAME}(EMPS_WITHOUT_DEPS * EMPLOYEE)

Relational Calculus
 A relational calculus expression creates a new relation, which is
specified in terms of variables that range over rows of the stored
database relations (in tuple calculus) or over columns of the
stored relations (in domain calculus).
 In a calculus expression, there is no order of operations to
specify how to retrieve the query result—a calculus expression
specifies only what information the result should contain. This is
the main distinguishing feature between relational algebra and
relational calculus.
 Relational calculus is considered to be a nonprocedural
language. This differs from relational algebra, where we must
write a sequence of operations to specify a retrieval request;
hence relational algebra can be considered as a procedural way
of stating a query.
Tuple Relational Calculus
 The tuple relational calculus is based on specifying a number of tuple variables. Each
tuple variable usually ranges over a particular database relation, meaning that the
variable may take as its value any individual tuple from that relation.
 A simple tuple relational calculus query is of the form
{t | COND(t)}
where t is a tuple variable and COND (t) is a conditional expression involving t. The
result of such a query is the set of all tuples t that satisfy COND (t).

Example: To find the first and last names of all employees whose salary is above
$50,000, we can write the following tuple calculus expression:

{t.FNAME, t.LNAME | EMPLOYEE(t) AND t.SALARY>50000}


The condition EMPLOYEE(t) specifies that the range relation of tuple variable t is
EMPLOYEE. The first and last name (PROJECTION π_{FNAME, LNAME}) of each
EMPLOYEE tuple t that satisfies the condition t.SALARY>50000 (SELECTION
σ_{SALARY>50000}) will be retrieved.
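
A Python set comprehension has nearly the same shape as this tuple calculus expression, which makes for a convenient illustration. The sample rows are invented for this sketch.

```python
# Hypothetical sample rows.
EMPLOYEE = [
    {"FNAME": "John", "LNAME": "Smith", "SALARY": 30000},
    {"FNAME": "James", "LNAME": "Borg", "SALARY": 55000},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SALARY": 43000},
]

# {t.FNAME, t.LNAME | EMPLOYEE(t) AND t.SALARY > 50000}
result = {(t["FNAME"], t["LNAME"]) for t in EMPLOYEE if t["SALARY"] > 50000}
```

`for t in EMPLOYEE` plays the role of the range condition EMPLOYEE(t); the comprehension states what the result contains rather than a sequence of operations.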

The Existential and Universal Quantifiers

• Two special symbols called quantifiers can appear in formulas; these are the
universal quantifier (∀) and the existential quantifier (∃).
• Informally, a tuple variable t is bound if it is quantified, meaning that it
appears in an (∃ t) or (∀ t) clause; otherwise, it is free.

• If F is a formula, then so is (∃ t)(F), where t is a tuple variable. The formula
(∃ t)(F) is true if the formula F evaluates to true for some (at least one) tuple
assigned to free occurrences of t in F; otherwise (∃ t)(F) is false.

• If F is a formula, then so is (∀ t)(F), where t is a tuple variable. The formula
(∀ t)(F) is true if the formula F evaluates to true for every tuple (in the
universe) assigned to free occurrences of t in F; otherwise (∀ t)(F) is false.
It is called the universal or “for all” quantifier because every tuple in “the
universe of” tuples must make F true to make the quantified formula true.

Example Query Using Existential Quantifier
 Retrieve the name and address of all employees who work for the ‘Research’
department.
Query:
{t.FNAME, t.LNAME, t.ADDRESS | EMPLOYEE(t) and (∃ d)
(DEPARTMENT(d) and d.DNAME=‘Research’ and d.DNUMBER=t.DNO)}

• The only free tuple variables in a relational calculus expression should be
those that appear to the left of the bar ( | ). In the above query, t is the only free
variable; it is then bound successively to each tuple. If a tuple satisfies the
conditions specified in the query, the attributes FNAME, LNAME, and
ADDRESS are retrieved for each such tuple.
 The conditions EMPLOYEE (t) and DEPARTMENT(d) specify the range
relations for t and d. The condition d.DNAME = ‘Research’ is a selection
condition and corresponds to a SELECT operation in the relational algebra,
whereas the condition d.DNUMBER = t.DNO is a JOIN condition.

Example Query Using Universal Quantifier
 Find the names of employees who work on all the projects controlled by
department number 5.

Query:
{e.LNAME, e.FNAME | EMPLOYEE(e) and ((∀ x)(not(PROJECT(x)) or
not(x.DNUM=5)
or ((∃ w)(WORKS_ON(w) and w.ESSN=e.SSN and x.PNUMBER=w.PNO))))}
• Exclude from the universal quantification all tuples that we are not interested in
by making the condition true for all such tuples. The first tuples to exclude (by
making them evaluate automatically to true) are those that are not in the relation
R of interest.
• In the query above, the expression not(PROJECT(x)) inside the universally
quantified formula evaluates to true for all tuples x that are not in the PROJECT
relation. Then we exclude the tuples we are not interested in from R itself. The
expression not(x.DNUM=5) evaluates to true for all tuples x that are in the PROJECT
relation but are not controlled by department 5.
• Finally, we specify a condition that must hold on all the remaining tuples in R:
((∃ w)(WORKS_ON(w) and w.ESSN=e.SSN and x.PNUMBER=w.PNO))
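
The quantified query maps naturally onto Python's built-in `all` (universal) and `any` (existential). In this sketch the sample rows are invented, and letting x iterate over PROJECT directly makes the not(PROJECT(x)) test unnecessary.

```python
# Hypothetical sample rows.
EMPLOYEE = [
    {"SSN": "123456789", "LNAME": "Smith", "FNAME": "John"},
    {"SSN": "453453453", "LNAME": "English", "FNAME": "Joyce"},
]
PROJECT = [
    {"PNUMBER": 1, "DNUM": 5},
    {"PNUMBER": 2, "DNUM": 5},
    {"PNUMBER": 10, "DNUM": 4},
]
WORKS_ON = [
    {"ESSN": "123456789", "PNO": 1},
    {"ESSN": "123456789", "PNO": 2},
    {"ESSN": "453453453", "PNO": 1},
    {"ESSN": "453453453", "PNO": 10},
]

result = {
    (e["LNAME"], e["FNAME"])
    for e in EMPLOYEE
    if all(
        # Projects outside department 5 satisfy the formula automatically.
        x["DNUM"] != 5
        # For the rest, some WORKS_ON tuple must link e to x.
        or any(w["ESSN"] == e["SSN"] and w["PNO"] == x["PNUMBER"] for w in WORKS_ON)
        for x in PROJECT
    )
}
```

English works on project 1 but not on project 2, so only Smith satisfies the condition for every project controlled by department 5.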
Languages Based on Tuple Relational
Calculus
 The language SQL is based on tuple calculus. It uses the basic
SELECT <list of attributes>
FROM <list of relations>
WHERE <conditions>
block structure to express the queries in tuple calculus where the SELECT clause
mentions the attributes being projected, the FROM clause mentions the relations
needed in the query, and the WHERE clause mentions the selection as well as the
join conditions.
SQL syntax is expanded further to accommodate other operations (see Chapter 8).

• Another language based on tuple calculus is QUEL, which actually
uses the range variables as in tuple calculus.
Its syntax includes:
RANGE OF <variable name> IS <relation name>
Then it uses
RETRIEVE <list of attributes from range variables>
WHERE <conditions>
This language was proposed in the relational DBMS INGRES.
The Domain Relational Calculus
 Another variation of relational calculus called the domain relational calculus, or
simply, domain calculus is equivalent to tuple calculus and to relational algebra.
 The language called QBE (Query-By-Example) that is related to domain calculus was
developed almost concurrently to SQL at IBM Research, Yorktown Heights, New
York. Domain calculus was thought of as a way to explain what QBE does.
 Domain calculus differs from tuple calculus in the type of variables used in formulas:
rather than having variables range over tuples, the variables range over single values
from domains of attributes. To form a relation of degree n for a query result, we must
have n of these domain variables—one for each attribute.
 An expression of the domain calculus is of the form
{x1, x2, . . ., xn | COND(x1, x2, . . ., xn, xn+1, xn+2, . . ., xn+m)}
where x1, x2, . . ., xn, xn+1, xn+2, . . ., xn+m are domain variables that range over
domains (of attributes) and COND is a condition or formula of the domain relational
calculus.

Example Query Using Domain Calculus
 Retrieve the birthdate and address of the employee whose name is ‘John B.
Smith’.

Query:
{uv | (∃ q) (∃ r) (∃ s) (∃ t) (∃ w) (∃ x) (∃ y) (∃ z)
(EMPLOYEE(qrstuvwxyz) and q=‘John’ and r=‘B’ and s=‘Smith’)}

 Ten variables for the employee relation are needed, one to range over the
domain of each attribute in order. Of the ten variables q, r, s, . . ., z, only u and
v are free.
 Specify the requested attributes, BDATE and ADDRESS, by the free domain
variables u for BDATE and v for ADDRESS.
 Specify the condition for selecting a tuple following the bar ( | )—namely, that
the sequence of values assigned to the variables qrstuvwxyz be a tuple of the
employee relation and that the values for q (FNAME), r (MINIT), and s
(LNAME) be ‘John’, ‘B’, and ‘Smith’, respectively.
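
As a sketch, the domain calculus query can be mimicked by unpacking each employee tuple into ten variables, one per attribute. The rows are invented, and the attribute order (FNAME, MINIT, LNAME, SSN, BDATE, ADDRESS, SEX, SALARY, SUPERSSN, DNO) is an assumption of this sketch.

```python
# Hypothetical sample rows, one value per attribute in the assumed order.
EMPLOYEE = [
    ("John", "B", "Smith", "123456789", "1965-01-09",
     "731 Fondren, Houston, TX", "M", 30000, "333445555", 5),
    ("Franklin", "T", "Wong", "333445555", "1955-12-08",
     "638 Voss, Houston, TX", "M", 40000, "888665555", 5),
]

# {uv | (exists q, r, s, t, w, x, y, z)(EMPLOYEE(qrstuvwxyz)
#       and q='John' and r='B' and s='Smith')}
result = {
    (u, v)
    for (q, r, s, t, u, v, w, x, y, z) in EMPLOYEE
    if q == "John" and r == "B" and s == "Smith"
}
```

Only u and v reach the result, matching their role as the free variables; the other eight names are bound inside the comprehension.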

QBE: A Query Language Based on Domain
Calculus (Appendix D)
 This language is based on the idea of giving an example of a query using
example elements.
 An example element stands for a domain variable and is specified as an
example value preceded by the underscore character.
 P. (called P dot) operator (for “print”) is placed in those columns which are
requested for the result of the query.
 A user may initially start giving actual values as examples, but later can get
used to providing a minimum number of variables as example elements.
• The language is very user-friendly, because it uses minimal syntax.
• QBE was later developed further with facilities for grouping, aggregation,
updating, etc., and has been shown to be equivalent to SQL.
• The language is available under QMF (Query Management Facility) of IBM's DB2
and has been used in various ways by other products such as Microsoft ACCESS
and PARADOX.
 For details, see Appendix D in the text.

QBE Examples
 QBE initially presents a relational schema as a “blank schema”
in which the user fills in the query as an example:

QBE Examples
 The following domain calculus query can be successively
minimized by the user as shown:
Query:
{uv | (∃ q) (∃ r) (∃ s) (∃ t) (∃ w) (∃ x) (∃ y) (∃ z)
(EMPLOYEE(qrstuvwxyz) and q=‘John’ and r=‘B’ and s=‘Smith’)}

QBE Examples
Specifying complex conditions in QBE:
• A technique called the “condition box” is used in QBE to state
more involved Boolean expressions as conditions.
• The query in D.4(a) gives employees who work on either project 1 or 2,
whereas the query in D.4(b) gives those who work on both
projects.

QBE Examples
• Illustrating join in QBE. The join is simply accomplished by
using the same example element in the columns being joined.
Note that the Result is set up as an independent table.

Chapter 7
Relational Database Design by
ER- and EER-to-Relational
Mapping

Chapter Outline

• ER-to-Relational Mapping Algorithm
Step 1: Mapping of Regular Entity Types
Step 2: Mapping of Weak Entity Types
Step 3: Mapping of Binary 1:1 Relationship Types
Step 4: Mapping of Binary 1:N Relationship Types
Step 5: Mapping of Binary M:N Relationship Types
Step 6: Mapping of Multivalued Attributes
Step 7: Mapping of N-ary Relationship Types

• Mapping EER Model Constructs to Relations
Step 8: Options for Mapping Specialization or Generalization
Step 9: Mapping of Union Types (Categories)

ER-to-Relational Mapping
Algorithm
 Step 1: Mapping of Regular Entity Types.

– For each regular (strong) entity type E in the ER schema, create a
relation R that includes all the simple attributes of E.
– Choose one of the key attributes of E as the primary key for R. If the
chosen key of E is composite, the set of simple attributes that form it
will together form the primary key of R.

Example: We create the relations EMPLOYEE, DEPARTMENT, and
PROJECT in the relational schema corresponding to the regular entities
in the ER diagram. SSN, DNUMBER, and PNUMBER are the primary
keys for the relations EMPLOYEE, DEPARTMENT, and PROJECT as
shown.

FIGURE 7.1
The ER conceptual schema diagram for the COMPANY database.

FIGURE 7.2
Result of mapping the COMPANY ER schema into a relational schema.

ER-to-Relational Mapping
Algorithm (cont)
 Step 2: Mapping of Weak Entity Types

– For each weak entity type W in the ER schema with owner entity type E, create
a relation R and include all simple attributes (or simple components of
composite attributes) of W as attributes of R.
– In addition, include as foreign key attributes of R the primary key attribute(s)
of the relation(s) that correspond to the owner entity type(s).
– The primary key of R is the combination of the primary key(s) of the owner(s)
and the partial key of the weak entity type W, if any.

Example: Create the relation DEPENDENT in this step to correspond to the
weak entity type DEPENDENT. Include the primary key SSN of the
EMPLOYEE relation as a foreign key attribute of DEPENDENT (renamed to
ESSN).
The primary key of the DEPENDENT relation is the combination {ESSN,
DEPENDENT_NAME} because DEPENDENT_NAME is the partial key of
DEPENDENT.

ER-to-Relational Mapping
Algorithm (cont)
• Step 3: Mapping of Binary 1:1 Relationship Types

For each binary 1:1 relationship type R in the ER schema, identify the relations
S and T that correspond to the entity types participating in R. There are three
possible approaches:
(1) Foreign key approach: Choose one of the relations, say S, and include as a foreign key in S the
primary key of T. It is better to choose an entity type with total participation in R in the role of S.
Example: The 1:1 relationship type MANAGES is mapped by choosing the participating entity type
DEPARTMENT to serve in the role of S, because its participation in the MANAGES relationship
type is total.

(2) Merged relation option: An alternative mapping of a 1:1 relationship type is possible by merging
the two entity types and the relationship into a single relation. This may be appropriate when both
participations are total.

(3) Cross-reference or relationship relation option: The third alternative is to set up a third relation R
for the purpose of cross-referencing the primary keys of the two relations S and T representing the
entity types.

ER-to-Relational Mapping
Algorithm (cont)
 Step 4: Mapping of Binary 1:N Relationship Types.

– For each regular binary 1:N relationship type R, identify the relation S
that represents the participating entity type at the N-side of the
relationship type.
– Include as foreign key in S the primary key of the relation T that
represents the other entity type participating in R.
– Include any simple attributes of the 1:N relationship type as attributes of S.

Example: 1:N relationship types WORKS_FOR, CONTROLS, and
SUPERVISION in the figure. For WORKS_FOR we include the
primary key DNUMBER of the DEPARTMENT relation as foreign
key in the EMPLOYEE relation and call it DNO.

ER-to-Relational Mapping
Algorithm (cont)
 Step 5: Mapping of Binary M:N Relationship Types.

– For each regular binary M:N relationship type R, create a new relation S
to represent R.
– Include as foreign key attributes in S the primary keys of the relations that
represent the participating entity types; their combination will form the
primary key of S.
– Also include any simple attributes of the M:N relationship type (or simple
components of composite attributes) as attributes of S.

Example: The M:N relationship type WORKS_ON from the ER diagram
is mapped by creating a relation WORKS_ON in the relational database
schema. The primary keys of the PROJECT and EMPLOYEE relations are
included as foreign keys in WORKS_ON and renamed PNO and ESSN,
respectively.
Attribute HOURS in WORKS_ON represents the HOURS attribute of the
relationship type. The primary key of the WORKS_ON relation is the
combination of the foreign key attributes {ESSN, PNO}.
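
One way to see the result of this step is to declare the three relations in an in-memory SQLite database from Python. This is only a sketch: the column types, the sample rows, and the helper `try_insert` are assumptions, and the EMPLOYEE and PROJECT tables are cut down to a key plus one attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, LNAME TEXT);
CREATE TABLE PROJECT (PNUMBER INTEGER PRIMARY KEY, PNAME TEXT);
-- The M:N relationship type becomes its own relation.
CREATE TABLE WORKS_ON (
    ESSN  TEXT    NOT NULL REFERENCES EMPLOYEE(SSN),
    PNO   INTEGER NOT NULL REFERENCES PROJECT(PNUMBER),
    HOURS REAL,
    PRIMARY KEY (ESSN, PNO)
);
INSERT INTO EMPLOYEE VALUES ('123456789', 'Smith');
INSERT INTO PROJECT VALUES (1, 'ProductX');
INSERT INTO PROJECT VALUES (2, 'ProductY');
INSERT INTO WORKS_ON VALUES ('123456789', 1, 32.5);
""")

def try_insert(essn, pno, hours):
    """Return True if the WORKS_ON row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO WORKS_ON VALUES (?, ?, ?)", (essn, pno, hours))
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, a second row for the same (ESSN, PNO) pair is rejected by the composite primary key, and a row naming an unknown employee is rejected by the foreign key.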
ER-to-Relational Mapping
Algorithm (cont)
 Step 6: Mapping of Multivalued attributes.

– For each multivalued attribute A, create a new relation R. This relation R
will include an attribute corresponding to A, plus the primary key attribute
K (as a foreign key in R) of the relation that represents the entity type or
relationship type that has A as an attribute.
– The primary key of R is the combination of A and K. If the multivalued
attribute is composite, we include its simple components.

Example: The relation DEPT_LOCATIONS is created. The attribute
DLOCATION represents the multivalued attribute LOCATIONS of
DEPARTMENT, while DNUMBER (as foreign key) represents the
primary key of the DEPARTMENT relation. The primary key of R is the
combination of {DNUMBER, DLOCATION}.

ER-to-Relational Mapping
Algorithm (cont)
• Step 7: Mapping of N-ary Relationship Types.
– For each n-ary relationship type R, where n>2, create a new
relation S to represent R.
– Include as foreign key attributes in S the primary keys of the
relations that represent the participating entity types.
– Also include any simple attributes of the n-ary relationship
type (or simple components of composite attributes) as
attributes of S.
Example: The relationship type SUPPLY in the ER diagram of Figure 4.11(a)
can be mapped to the relation SUPPLY shown in Figure 7.3, whose
primary key is the combination of the three foreign keys {SNAME,
PARTNO, PROJNAME}.

FIGURE 4.11
Ternary relationship types. (a) The SUPPLY relationship.

FIGURE 7.3
Mapping the n-ary relationship type SUPPLY from
Figure 4.11a.

Summary of Mapping constructs
and constraints

Table 7.1 Correspondence between ER and Relational Models

ER Model                        Relational Model
------------------------------  --------------------------------------------
Entity type                     “Entity” relation
1:1 or 1:N relationship type    Foreign key (or “relationship” relation)
M:N relationship type           “Relationship” relation and two foreign keys
n-ary relationship type         “Relationship” relation and n foreign keys
Simple attribute                Attribute
Composite attribute             Set of simple component attributes
Multivalued attribute           Relation and foreign key
Value set                       Domain
Key attribute                   Primary (or secondary) key

Mapping EER Model Constructs to
Relations
• Step 8: Options for Mapping Specialization or Generalization.
Convert each specialization with m subclasses {S1, S2, …, Sm} and generalized
superclass C, where the attributes of C are {k, a1, …, an} and k is the (primary)
key, into relational schemas using one of the four following options:

Option 8A: Multiple relations (superclass and subclasses).
Create a relation L for C with attributes Attrs(L) = {k, a1, …, an} and PK(L) = k. Create a
relation Li for each subclass Si, 1 ≤ i ≤ m, with the attributes Attrs(Li) = {k} ∪
{attributes of Si} and PK(Li) = k. This option works for any specialization (total or
partial, disjoint or overlapping).

Option 8B: Multiple relations (subclass relations only).
Create a relation Li for each subclass Si, 1 ≤ i ≤ m, with the attributes Attrs(Li) =
{attributes of Si} ∪ {k, a1, …, an} and PK(Li) = k. This option only works for a
specialization whose subclasses are total (every entity in the superclass must belong to
at least one of the subclasses).

FIGURE 4.4
EER diagram notation for an attribute-defined specialization on JobType.

FIGURE 7.4
Options for mapping specialization or generalization.
(a) Mapping the EER schema in Figure 4.4 using option
8A.

FIGURE 4.3
Generalization. (b) Generalizing CAR and TRUCK into the
superclass VEHICLE.

FIGURE 7.4
Options for mapping specialization or generalization.
(b) Mapping the EER schema in Figure 4.3b using
option 8B.

Mapping EER Model Constructs to
Relations (cont)

Option 8C: Single relation with one type attribute.
Create a single relation L with attributes Attrs(L) = {k, a1, …, an} ∪ {attributes
of S1} ∪ … ∪ {attributes of Sm} ∪ {t} and PK(L) = k. The attribute t is called
a type (or discriminating) attribute that indicates the subclass to which each
tuple belongs.

Option 8D: Single relation with multiple type attributes.
Create a single relation schema L with attributes Attrs(L) = {k, a1, …, an} ∪
{attributes of S1} ∪ … ∪ {attributes of Sm} ∪ {t1, t2, …, tm} and PK(L) = k.
Each ti, 1 ≤ i ≤ m, is a Boolean type attribute indicating whether a tuple
belongs to the subclass Si.
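Option 8C can be illustrated with a rough sketch, assuming SQLite via Python's sqlite3 module. The column names (JOBTYPE, TYPINGSPEED, TGRADE, ENGTYPE) and the two sample employees are assumptions chosen to resemble a JobType specialization; they are not taken from the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 8C: superclass and subclass attributes live in a single relation.
CREATE TABLE EMPLOYEE (
    SSN         CHAR(9) PRIMARY KEY,  -- k
    NAME        VARCHAR(30),          -- superclass attribute
    JOBTYPE     VARCHAR(10),          -- type attribute t
    TYPINGSPEED INTEGER,              -- applies to secretaries only
    TGRADE      VARCHAR(10),          -- applies to technicians only
    ENGTYPE     VARCHAR(15)           -- applies to engineers only
);
INSERT INTO EMPLOYEE VALUES ('111111111', 'Ann', 'Secretary', 70, NULL, NULL);
INSERT INTO EMPLOYEE VALUES ('222222222', 'Bob', 'Engineer', NULL, NULL, 'Civil');
""")

# The type attribute selects the members of one subclass.
engineers = conn.execute(
    "SELECT NAME, ENGTYPE FROM EMPLOYEE WHERE JOBTYPE = 'Engineer'").fetchall()
# Attributes of the other subclasses are NULL for this tuple.
unused = conn.execute(
    "SELECT TYPINGSPEED, TGRADE FROM EMPLOYEE WHERE SSN = '222222222'").fetchone()
print(engineers, unused)
```

The NULLs in the inapplicable columns are the price of the single-relation options. Option 8D would replace JOBTYPE with one Boolean flag per subclass so that a tuple can belong to several subclasses at once.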

FIGURE 4.4
EER diagram notation for an attribute-defined specialization on JobType.

FIGURE 7.4
Options for mapping specialization or generalization.
(c) Mapping the EER schema in Figure 4.4 using option
8C.

FIGURE 4.5
EER diagram notation for an overlapping (nondisjoint)
specialization.

FIGURE 7.4
Options for mapping specialization or generalization.
(d) Mapping Figure 4.5 using option 8D with Boolean
type fields Mflag and Pflag.

Mapping EER Model Constructs to
Relations (cont)
• Mapping of Shared Subclasses (Multiple Inheritance)
A shared subclass, such as STUDENT_ASSISTANT, is a subclass of several
classes, indicating multiple inheritance. These classes must all have the same
key attribute; otherwise, the shared subclass would be modeled as a category.

We can apply any of the options discussed in Step 8 to a shared subclass,
subject to the restrictions discussed in Step 8 of the mapping algorithm. Below,
both 8C and 8D are used for the shared subclass STUDENT_ASSISTANT.

FIGURE 4.7
A specialization lattice with multiple inheritance for a UNIVERSITY database.

FIGURE 7.5
Mapping the EER specialization lattice in Figure 4.7
using multiple options.

Mapping EER Model Constructs to
Relations (cont)
• Step 9: Mapping of Union Types (Categories).

– For mapping a category whose defining superclasses have different keys, it
is customary to specify a new key attribute, called a surrogate key,
when creating a relation to correspond to the category.
– In the example below we can create a relation OWNER to correspond to
the OWNER category and include any attributes of the category in this
relation. The primary key of the OWNER relation is the surrogate key,
which we called OwnerId.

FIGURE 4.8
Two categories (union types): OWNER and REGISTERED_VEHICLE.

FIGURE 7.6
Mapping the EER categories (union types) in Figure 4.8 to relations.

Mapping Exercise
Exercise 7.4.

FIGURE 7.7
An ER schema for a SHIP_TRACKING database.

Chapter 8
SQL-99: Schema
Definition, Basic
Constraints, and Queries



Data Definition, Constraints,
and Schema Changes
• Used to CREATE, DROP, and ALTER the
descriptions of the tables (relations) of a
database

CREATE TABLE
• Specifies a new base relation by giving it a name, and
specifying each of its attributes and their data types
(INTEGER, FLOAT, DECIMAL(i,j), CHAR(n),
VARCHAR(n))
• A constraint NOT NULL may be specified on an
attribute

CREATE TABLE DEPARTMENT
    ( DNAME         VARCHAR(10)  NOT NULL,
      DNUMBER       INTEGER      NOT NULL,
      MGRSSN        CHAR(9),
      MGRSTARTDATE  CHAR(9) );

CREATE TABLE
• In SQL2, the CREATE TABLE command can also specify the
primary key attributes, secondary keys, and referential integrity
constraints (foreign keys).
• Key attributes can be specified via the PRIMARY KEY and UNIQUE
phrases

CREATE TABLE DEPT
    ( DNAME         VARCHAR(10)  NOT NULL,
      DNUMBER       INTEGER      NOT NULL,
      MGRSSN        CHAR(9),
      MGRSTARTDATE  CHAR(9),
      PRIMARY KEY (DNUMBER),
      UNIQUE (DNAME),
      FOREIGN KEY (MGRSSN) REFERENCES EMP );

DROP TABLE
• Used to remove a relation (base table) and its
definition
• The relation can no longer be used in queries,
updates, or any other commands since its
description no longer exists
• Example:

DROP TABLE DEPENDENT;

ALTER TABLE
• Used to add an attribute to one of the base relations
• The new attribute will have NULLs in all the tuples of the
relation right after the command is executed; hence, the NOT
NULL constraint is not allowed for such an attribute
• Example:

ALTER TABLE EMPLOYEE ADD JOB VARCHAR(12);

• The database users must still enter a value for the new attribute
JOB for each EMPLOYEE tuple. This can be done using the
UPDATE command.

Features Added in SQL2 and
SQL-99
• CREATE SCHEMA
• REFERENTIAL INTEGRITY OPTIONS

CREATE SCHEMA
• Specifies a new database schema by giving
it a name

REFERENTIAL INTEGRITY
OPTIONS
• We can specify RESTRICT, CASCADE, SET NULL or SET
DEFAULT on referential integrity constraints (foreign keys)

CREATE TABLE DEPT
    ( DNAME         VARCHAR(10)  NOT NULL,
      DNUMBER       INTEGER      NOT NULL,
      MGRSSN        CHAR(9),
      MGRSTARTDATE  CHAR(9),
      PRIMARY KEY (DNUMBER),
      UNIQUE (DNAME),
      FOREIGN KEY (MGRSSN) REFERENCES EMP
          ON DELETE SET DEFAULT ON UPDATE CASCADE );

REFERENTIAL INTEGRITY
OPTIONS (continued)
CREATE TABLE EMP
( ENAME VARCHAR(30) NOT NULL,
ESSN CHAR(9),
BDATE DATE,
DNO INTEGER DEFAULT 1,
SUPERSSN CHAR(9),
PRIMARY KEY (ESSN),
FOREIGN KEY (DNO) REFERENCES DEPT
ON DELETE SET DEFAULT ON UPDATE CASCADE,
FOREIGN KEY (SUPERSSN) REFERENCES EMP
ON DELETE SET NULL ON UPDATE CASCADE );
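The effect of these options can be observed with a minimal sketch, assuming SQLite via Python's sqlite3 module. The DEPT table is reduced to two columns, the rows are sample data, and the PRAGMA is SQLite-specific (other systems enforce foreign keys by default).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: turn enforcement on
conn.executescript("""
CREATE TABLE DEPT (DNUMBER INTEGER PRIMARY KEY, DNAME VARCHAR(10));
CREATE TABLE EMP (
    ENAME    VARCHAR(30) NOT NULL,
    ESSN     CHAR(9),
    DNO      INTEGER DEFAULT 1,
    SUPERSSN CHAR(9),
    PRIMARY KEY (ESSN),
    FOREIGN KEY (DNO) REFERENCES DEPT(DNUMBER)
        ON DELETE SET DEFAULT ON UPDATE CASCADE,
    FOREIGN KEY (SUPERSSN) REFERENCES EMP(ESSN)
        ON DELETE SET NULL ON UPDATE CASCADE
);
INSERT INTO DEPT VALUES (1, 'HQ'), (5, 'Research');
INSERT INTO EMP VALUES ('Wong',  '333445555', 5, NULL);
INSERT INTO EMP VALUES ('Smith', '123456789', 5, '333445555');

-- Deleting the supervisor sets SUPERSSN to NULL in the supervisee's tuple.
DELETE FROM EMP WHERE ESSN = '333445555';
-- Deleting department 5 resets DNO to its default value, 1.
DELETE FROM DEPT WHERE DNUMBER = 5;
""")

remaining = conn.execute("SELECT ENAME, DNO, SUPERSSN FROM EMP").fetchall()
print(remaining)
```

Note that SET DEFAULT only succeeds here because department 1 still exists; the default value must itself satisfy the foreign key.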

Additional Data Types in
SQL2 and SQL-99
• SQL2 has DATE, TIME, and TIMESTAMP data types
• DATE:
– Made up of year-month-day in the format yyyy-mm-dd
• TIME:
– Made up of hour:minute:second in the format hh:mm:ss
• TIME(i):
– Made up of hour:minute:second plus i additional digits
specifying fractions of a second
– Format is hh:mm:ss:ii...i
• TIMESTAMP:
– Has both DATE and TIME components

Additional Data Types in
SQL2 and SQL-99 (cont.)
• INTERVAL:
– Specifies a relative value rather than an absolute value
– Can be DAY/TIME intervals or YEAR/MONTH
intervals
– Can be positive or negative; when an interval is added to or
subtracted from an absolute value, the result is an
absolute value

Retrieval Queries in SQL
• SQL has one basic statement for retrieving information from a
database: the SELECT statement
• This is not the same as the SELECT operation of the relational
algebra
• Important distinction between SQL and the formal relational model:
SQL allows a table (relation) to have two or more tuples that are
identical in all their attribute values
• Hence, an SQL relation (table) is a multi-set (sometimes called a bag)
of tuples; it is not a set of tuples
• SQL relations can be constrained to be sets by specifying PRIMARY
KEY or UNIQUE attributes, or by using the DISTINCT option in a
query

Retrieval Queries in SQL
(cont.)
• Basic form of the SQL SELECT statement is called a
mapping or a SELECT-FROM-WHERE block

SELECT <attribute list>
FROM <table list>
WHERE <condition>

– <attribute list> is a list of attribute names whose values are to be
retrieved by the query
– <table list> is a list of the relation names required to process the
query
– <condition> is a conditional (Boolean) expression that identifies
the tuples to be retrieved by the query

Relational Database Schema--Figure 5.5

Populated
Database--Fig.5.6

Simple SQL Queries
• Basic SQL queries correspond to using the SELECT, PROJECT, and JOIN
operations of the relational algebra
• All subsequent examples use the COMPANY database
• Example of a simple query on one relation
• Query 0: Retrieve the birthdate and address of the employee whose name is
'John B. Smith'.

Q0: SELECT BDATE, ADDRESS
    FROM EMPLOYEE
    WHERE FNAME='John' AND MINIT='B'
        AND LNAME='Smith'

– Similar to a SELECT-PROJECT pair of relational algebra operations; the
SELECT-clause specifies the projection attributes and the WHERE-clause
specifies the selection condition
– However, the result of the query may contain duplicate tuples

Simple SQL Queries (cont.)
• Query 1: Retrieve the name and address of all employees who work for
the 'Research' department.

Q1: SELECT FNAME, LNAME, ADDRESS
    FROM EMPLOYEE, DEPARTMENT
    WHERE DNAME='Research' AND DNUMBER=DNO

– Similar to a SELECT-PROJECT-JOIN sequence of relational
algebra operations
– (DNAME='Research') is a selection condition (corresponds to a
SELECT operation in relational algebra)
– (DNUMBER=DNO) is a join condition (corresponds to a JOIN
operation in relational algebra)

Simple SQL Queries (cont.)
• Query 2: For every project located in 'Stafford', list the project number, the
controlling department number, and the department manager's last name,
address, and birthdate.

Q2: SELECT PNUMBER, DNUM, LNAME, BDATE, ADDRESS
    FROM PROJECT, DEPARTMENT, EMPLOYEE
    WHERE DNUM=DNUMBER AND MGRSSN=SSN
        AND PLOCATION='Stafford'

– In Q2, there are two join conditions
– The join condition DNUM=DNUMBER relates a project to its controlling
department
– The join condition MGRSSN=SSN relates the controlling department to
the employee who manages that department

Aliases, * and DISTINCT,
Empty WHERE-clause
• In SQL, we can use the same name for two (or more)
attributes as long as the attributes are in different relations
• A query that refers to two or more attributes with the same
name must qualify the attribute name with the relation
name by prefixing the relation name to the attribute name
• Example: EMPLOYEE.LNAME, DEPARTMENT.DNAME

ALIASES
• Some queries need to refer to the same relation twice
• In this case, aliases are given to the relation name
• Query 8: For each employee, retrieve the employee's name, and the name
of his or her immediate supervisor.

Q8: SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
    FROM EMPLOYEE E, EMPLOYEE S
    WHERE E.SUPERSSN=S.SSN

– In Q8, the alternate relation names E and S are called aliases or
tuple variables for the EMPLOYEE relation
– We can think of E and S as two different copies of EMPLOYEE; E
represents employees in the role of supervisees and S represents
employees in the role of supervisors

ALIASES (cont.)
– Aliasing can also be used in any SQL query for convenience
– Can also use the AS keyword to specify aliases

Q8: SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
    FROM EMPLOYEE AS E, EMPLOYEE AS S
    WHERE E.SUPERSSN=S.SSN

UNSPECIFIED
WHERE-clause
• A missing WHERE-clause indicates no condition; hence, all
tuples of the relations in the FROM-clause are selected
• This is equivalent to the condition WHERE TRUE
• Query 9: Retrieve the SSN values for all employees.

Q9: SELECT SSN
    FROM EMPLOYEE

• If more than one relation is specified in the FROM-clause and
there is no join condition, then the CARTESIAN PRODUCT of
tuples is selected

UNSPECIFIED
WHERE-clause (cont.)
• Example:

Q10: SELECT SSN, DNAME
     FROM EMPLOYEE, DEPARTMENT

– It is extremely important not to overlook specifying any selection and
join conditions in the WHERE-clause; otherwise, incorrect and very
large relations may result
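The size difference between Q10 and a properly joined query can be seen in a small sketch, assuming SQLite via Python's sqlite3 module and a made-up instance with three employees and two departments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, DNAME VARCHAR(15));
CREATE TABLE EMPLOYEE   (SSN CHAR(9) PRIMARY KEY, DNO INTEGER);
INSERT INTO DEPARTMENT VALUES (4, 'Administration'), (5, 'Research');
INSERT INTO EMPLOYEE   VALUES ('111', 5), ('222', 5), ('333', 4);
""")

# Q10: no WHERE-clause, so every employee is paired with every department.
q10 = conn.execute("SELECT SSN, DNAME FROM EMPLOYEE, DEPARTMENT").fetchall()
# With the join condition, each employee is paired only with its own department.
joined = conn.execute(
    "SELECT SSN, DNAME FROM EMPLOYEE, DEPARTMENT WHERE DNO = DNUMBER").fetchall()
print(len(q10), len(joined))
```

With 3 employees and 2 departments the unrestricted query returns 3 × 2 = 6 rows, only 3 of which describe a real employee-department pairing.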

USE OF *
• To retrieve all the attribute values of the selected tuples, a * is
used, which stands for all the attributes
• Examples:

Q1C: SELECT *
     FROM EMPLOYEE
     WHERE DNO=5

Q1D: SELECT *
     FROM EMPLOYEE, DEPARTMENT
     WHERE DNAME='Research' AND
         DNO=DNUMBER
USE OF DISTINCT
• SQL does not treat a relation as a set; duplicate tuples can
appear
• To eliminate duplicate tuples in a query result, the keyword
DISTINCT is used
• For example, the result of Q11 may have duplicate SALARY
values whereas Q11A does not have any duplicate values

Q11:  SELECT SALARY
      FROM EMPLOYEE

Q11A: SELECT DISTINCT SALARY
      FROM EMPLOYEE
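A short sketch of the multiset behaviour behind Q11 and Q11A, assuming SQLite via Python's sqlite3 module. The five salary values are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN CHAR(9) PRIMARY KEY, SALARY INTEGER);
INSERT INTO EMPLOYEE VALUES
    ('1', 30000), ('2', 40000), ('3', 25000), ('4', 25000), ('5', 40000);
""")

# Q11: the result is a multiset, one row per employee.
q11 = conn.execute("SELECT SALARY FROM EMPLOYEE").fetchall()
# Q11A: DISTINCT turns the result into a set of salary values.
q11a = conn.execute("SELECT DISTINCT SALARY FROM EMPLOYEE").fetchall()
print(len(q11), sorted(q11a))
```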

SET OPERATIONS
• SQL has directly incorporated some set operations
• There is a union operation (UNION), and in some
versions of SQL there are set difference (EXCEPT, called MINUS
in some systems) and intersection (INTERSECT) operations
• The resulting relations of these set operations are sets of
tuples; duplicate tuples are eliminated from the result
• The set operations apply only to union compatible
relations; the two relations must have the same
attributes and the attributes must appear in the same
order

SET OPERATIONS (cont.)
• Query 4: Make a list of all project numbers for projects that involve an
employee whose last name is 'Smith' as a worker or as a manager of
the department that controls the project.

Q4: (SELECT PNUMBER
     FROM PROJECT, DEPARTMENT, EMPLOYEE
     WHERE DNUM=DNUMBER AND MGRSSN=SSN
         AND LNAME='Smith')
    UNION
    (SELECT PNUMBER
     FROM PROJECT, WORKS_ON, EMPLOYEE
     WHERE PNUMBER=PNO AND ESSN=SSN
         AND LNAME='Smith')

NESTING OF QUERIES
• A complete SELECT query, called a nested query, can be specified
within the WHERE-clause of another query, called the outer query
• Many of the previous queries can be specified in an alternative form
using nesting
• Query 1: Retrieve the name and address of all employees who work for
the 'Research' department.

Q1: SELECT FNAME, LNAME, ADDRESS
    FROM EMPLOYEE
    WHERE DNO IN (SELECT DNUMBER
                  FROM DEPARTMENT
                  WHERE DNAME='Research')

NESTING OF QUERIES
(cont.)
• The nested query selects the number of the 'Research' department
• The outer query selects an EMPLOYEE tuple if its DNO value is in the
result of the nested query
• The comparison operator IN compares a value v with a set (or multi-set)
of values V, and evaluates to TRUE if v is one of the elements in V
• In general, we can have several levels of nested queries
• A reference to an unqualified attribute refers to the relation declared in
the innermost nested query
• In this example, the nested query is not correlated with the outer query

CORRELATED NESTED
QUERIES
• If a condition in the WHERE-clause of a nested query references an attribute
of a relation declared in the outer query, the two queries are said to be
correlated
• The result of a correlated nested query is different for each tuple (or
combination of tuples) of the relation(s) in the outer query
• Query 12: Retrieve the name of each employee who has a dependent with the
same first name as the employee.

Q12: SELECT E.FNAME, E.LNAME
     FROM EMPLOYEE AS E
     WHERE E.SSN IN (SELECT ESSN
                     FROM DEPENDENT
                     WHERE ESSN=E.SSN AND
                         E.FNAME=DEPENDENT_NAME)

CORRELATED NESTED
QUERIES (cont.)
– In Q12, the nested query has a different result for each tuple in the outer
query
– A query written with nested SELECT... FROM... WHERE... blocks and
using the = or IN comparison operators can always be expressed as a
single block query. For example, Q12 may be written as in Q12A

Q12A: SELECT E.FNAME, E.LNAME
      FROM EMPLOYEE E, DEPENDENT D
      WHERE E.SSN=D.ESSN AND
          E.FNAME=D.DEPENDENT_NAME

– The original SQL as specified for SYSTEM R also had a CONTAINS
comparison operator, which is used in conjunction with nested correlated
queries
– This operator was dropped from the language, possibly because of the
difficulty in implementing it efficiently

CORRELATED NESTED
QUERIES (cont.)
– Most implementations of SQL do not have the CONTAINS operator
– The CONTAINS operator compares two sets of values, and returns
TRUE if one set contains all values in the other set
(reminiscent of the division operation of the relational algebra).
• Query 3: Retrieve the name of each employee who works on all the projects
controlled by department number 5.

Q3: SELECT FNAME, LNAME
    FROM EMPLOYEE
    WHERE ( (SELECT PNO
             FROM WORKS_ON
             WHERE SSN=ESSN)
            CONTAINS
            (SELECT PNUMBER
             FROM PROJECT
             WHERE DNUM=5) )

CORRELATED NESTED
QUERIES (cont.)
– In Q3, the second nested query, which is not correlated
with the outer query, retrieves the project numbers of all
projects controlled by department 5
– The first nested query, which is correlated, retrieves the
project numbers on which the employee works, which is
different for each employee tuple because of the
correlation
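Because CONTAINS is not available in most systems, Query 3 is usually rephrased with two nested NOT EXISTS conditions: select an employee if there is no project of department 5 that the employee does not work on. The sketch below assumes SQLite via Python's sqlite3 module; the reduced tables and the three employees are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN CHAR(9) PRIMARY KEY, FNAME VARCHAR(15), LNAME VARCHAR(15));
CREATE TABLE PROJECT  (PNUMBER INTEGER PRIMARY KEY, DNUM INTEGER);
CREATE TABLE WORKS_ON (ESSN CHAR(9), PNO INTEGER, PRIMARY KEY (ESSN, PNO));
INSERT INTO EMPLOYEE VALUES
    ('1', 'John', 'Smith'), ('2', 'Joyce', 'English'), ('3', 'Ahmad', 'Jabbar');
-- Department 5 controls projects 1 and 2; department 4 controls project 30.
INSERT INTO PROJECT  VALUES (1, 5), (2, 5), (30, 4);
INSERT INTO WORKS_ON VALUES ('1', 1), ('1', 2), ('2', 1), ('2', 30), ('3', 30);
""")

# Division without CONTAINS: no department-5 project is missing from the
# set of projects the employee works on.
q3 = conn.execute("""
    SELECT FNAME, LNAME
    FROM EMPLOYEE
    WHERE NOT EXISTS (SELECT *
                      FROM PROJECT
                      WHERE DNUM = 5
                        AND NOT EXISTS (SELECT *
                                        FROM WORKS_ON
                                        WHERE ESSN = SSN AND PNO = PNUMBER))
""").fetchall()
print(q3)
```

One consequence of this formulation: if department 5 controlled no projects at all, every employee would qualify, since the universal condition holds vacuously.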

THE EXISTS FUNCTION
• EXISTS is used to check whether the result
of a correlated nested query is empty
(contains no tuples) or not
• We can formulate Query 12 in an
alternative form that uses EXISTS as Q12B
below

THE EXISTS FUNCTION (cont.)
• Query 12: Retrieve the name of each employee who
has a dependent with the same first name as the
employee.

Q12B: SELECT FNAME, LNAME
      FROM EMPLOYEE
      WHERE EXISTS (SELECT *
                    FROM DEPENDENT
                    WHERE SSN=ESSN AND
                        FNAME=DEPENDENT_NAME)

THE EXISTS FUNCTION (cont.)
• Query 6: Retrieve the names of employees who have no
dependents.

Q6: SELECT FNAME, LNAME
    FROM EMPLOYEE
    WHERE NOT EXISTS (SELECT *
                      FROM DEPENDENT
                      WHERE SSN=ESSN)

– In Q6, the correlated nested query retrieves all DEPENDENT tuples
related to an EMPLOYEE tuple. If none exist, the EMPLOYEE tuple
is selected
– EXISTS is necessary for the expressive power of SQL

EXPLICIT SETS
• It is also possible to use an explicit (enumerated) set of
values in the WHERE-clause rather than a nested query
• Query 13: Retrieve the social security numbers of all
employees who work on project number 1, 2, or 3.

Q13: SELECT DISTINCT ESSN
     FROM WORKS_ON
     WHERE PNO IN (1, 2, 3)

NULLS IN SQL QUERIES
• SQL allows queries that check if a value is NULL (missing or
undefined or not applicable)
• SQL uses IS or IS NOT to compare NULLs because it considers
each NULL value distinct from other NULL values, so equality
comparison is not appropriate.
• Query 14: Retrieve the names of all employees who do not have
supervisors.

Q14: SELECT FNAME, LNAME
     FROM EMPLOYEE
     WHERE SUPERSSN IS NULL

Note: If a join condition is specified, tuples with NULL values for
the join attributes are not included in the result
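Both points, the need for IS NULL and the loss of NULL tuples in a join, can be checked with a minimal sketch. It assumes SQLite via Python's sqlite3 module and a two-row EMPLOYEE table invented for the purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN CHAR(9) PRIMARY KEY, FNAME VARCHAR(15), SUPERSSN CHAR(9));
INSERT INTO EMPLOYEE VALUES ('888665555', 'James', NULL);
INSERT INTO EMPLOYEE VALUES ('333445555', 'Franklin', '888665555');
""")

# An equality test against NULL is never TRUE, so no tuple qualifies.
with_equals = conn.execute(
    "SELECT FNAME FROM EMPLOYEE WHERE SUPERSSN = NULL").fetchall()
# Q14 style: IS NULL finds the employee without a supervisor.
with_is_null = conn.execute(
    "SELECT FNAME FROM EMPLOYEE WHERE SUPERSSN IS NULL").fetchall()
# Join condition on SUPERSSN: the tuple with a NULL SUPERSSN drops out.
joined = conn.execute("""
    SELECT E.FNAME, S.FNAME
    FROM EMPLOYEE E, EMPLOYEE S
    WHERE E.SUPERSSN = S.SSN
""").fetchall()
print(with_equals, with_is_null, joined)
```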

Joined Relations Feature
in SQL2
• Can specify a "joined relation" in the FROM-clause
• Looks like any other relation but is the result of a join
• Allows the user to specify different types of joins (regular
"theta" JOIN, NATURAL JOIN, LEFT OUTER JOIN,
RIGHT OUTER JOIN, CROSS JOIN, etc.)

Joined Relations Feature
in SQL2 (cont.)
• Examples:

Q8: SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
    FROM EMPLOYEE E, EMPLOYEE S
    WHERE E.SUPERSSN=S.SSN

can be written with an outer join, which also keeps employees who have
no supervisor:

Q8: SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
    FROM (EMPLOYEE E LEFT OUTER JOIN EMPLOYEE S
          ON E.SUPERSSN=S.SSN)
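The two forms of Q8 do not return exactly the same tuples, which a small sketch makes visible. It assumes SQLite via Python's sqlite3 module; the two-row EMPLOYEE table is sample data, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN CHAR(9) PRIMARY KEY, FNAME VARCHAR(15), SUPERSSN CHAR(9));
INSERT INTO EMPLOYEE VALUES ('888665555', 'James', NULL);
INSERT INTO EMPLOYEE VALUES ('333445555', 'Franklin', '888665555');
""")

# WHERE-clause form: only employees that actually have a supervisor.
inner = conn.execute("""
    SELECT E.FNAME, S.FNAME
    FROM EMPLOYEE E, EMPLOYEE S
    WHERE E.SUPERSSN = S.SSN
    ORDER BY E.FNAME
""").fetchall()

# LEFT OUTER JOIN form: every employee; supervisor columns are NULL if absent.
outer = conn.execute("""
    SELECT E.FNAME, S.FNAME
    FROM EMPLOYEE E LEFT OUTER JOIN EMPLOYEE S ON E.SUPERSSN = S.SSN
    ORDER BY E.FNAME
""").fetchall()
print(inner, outer)
```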

Q1: SELECT FNAME, LNAME, ADDRESS
    FROM EMPLOYEE, DEPARTMENT
    WHERE DNAME='Research' AND DNUMBER=DNO

Joined Relations Feature
in SQL2 (cont.)
• could be written as:

Q1: SELECT FNAME, LNAME, ADDRESS
    FROM (EMPLOYEE JOIN DEPARTMENT
          ON DNUMBER=DNO)
    WHERE DNAME='Research'

or as:

Q1: SELECT FNAME, LNAME, ADDRESS
    FROM (EMPLOYEE NATURAL JOIN
          (DEPARTMENT AS DEPT (DNAME, DNO, MSSN, MSDATE)))
    WHERE DNAME='Research'

Joined Relations Feature
in SQL2 (cont.)
• Another example:
– Q2 could be written as follows; this illustrates multiple
joins in the joined tables

Q2: SELECT PNUMBER, DNUM, LNAME, BDATE, ADDRESS
    FROM ((PROJECT JOIN DEPARTMENT
           ON DNUM=DNUMBER) JOIN
          EMPLOYEE ON MGRSSN=SSN)
    WHERE PLOCATION='Stafford'

AGGREGATE FUNCTIONS
• Include COUNT, SUM, MAX, MIN, and AVG
• Query 15: Find the maximum salary, the minimum salary, and
the average salary among all employees.

Q15: SELECT MAX(SALARY), MIN(SALARY), AVG(SALARY)
     FROM EMPLOYEE

– Some SQL implementations may not allow more than one
function in the SELECT-clause

AGGREGATE FUNCTIONS
(cont.)
• Query 16: Find the maximum salary, the minimum salary,
and the average salary among employees who work for the
'Research' department.

Q16: SELECT MAX(SALARY), MIN(SALARY), AVG(SALARY)
     FROM EMPLOYEE, DEPARTMENT
     WHERE DNO=DNUMBER AND
         DNAME='Research'

AGGREGATE FUNCTIONS
(cont.)
• Queries 17 and 18: Retrieve the total number of employees
in the company (Q17), and the number of employees in the
'Research' department (Q18).

Q17: SELECT COUNT (*)
     FROM EMPLOYEE

Q18: SELECT COUNT (*)
     FROM EMPLOYEE, DEPARTMENT
     WHERE DNO=DNUMBER AND
         DNAME='Research'

GROUPING
• In many cases, we want to apply the aggregate
functions to subgroups of tuples in a relation
• Each subgroup of tuples consists of the set of tuples
that have the same value for the grouping
attribute(s)
• The function is applied to each subgroup
independently
• SQL has a GROUP BY-clause for specifying the
grouping attributes, which must also appear in the
SELECT-clause
GROUPING (cont.)
• Query 20: For each department, retrieve the department number, the
number of employees in the department, and their average salary.

Q20: SELECT DNO, COUNT (*), AVG (SALARY)
     FROM EMPLOYEE
     GROUP BY DNO

– In Q20, the EMPLOYEE tuples are divided into groups, each
group having the same value for the grouping attribute DNO
– The COUNT and AVG functions are applied to each such group
of tuples separately
– The SELECT-clause includes only the grouping attribute and the
functions to be applied on each group of tuples
– A join condition can be used in conjunction with grouping

GROUPING (cont.)
• Query 21: For each project, retrieve the project number, project
name, and the number of employees who work on that project.

Q21: SELECT PNUMBER, PNAME, COUNT (*)
     FROM PROJECT, WORKS_ON
     WHERE PNUMBER=PNO
     GROUP BY PNUMBER, PNAME

– In this case, the grouping and functions are applied after the joining of
the two relations

THE HAVING-CLAUSE
 Sometimes we want to retrieve the values of
these functions for only those groups that
satisfy certain conditions
 The HAVING-clause is used for specifying
a selection condition on groups (rather than
on individual tuples)

THE HAVING-CLAUSE (cont.)
 Query 22: For each project on which more than two
employees work, retrieve the project number, project
name, and the number of employees who work on that
project.

Q22: SELECT PNUMBER, PNAME, COUNT (*)
     FROM PROJECT, WORKS_ON
     WHERE PNUMBER=PNO
     GROUP BY PNUMBER, PNAME
     HAVING COUNT (*) > 2
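
The difference between Q21 and Q22 can be seen side by side in the following sketch. It assumes Python's sqlite3 module and invented PROJECT and WORKS_ON rows; HAVING discards whole groups after the join and grouping have been done.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PROJECT (PNAME TEXT, PNUMBER INTEGER PRIMARY KEY);
    CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER);
    INSERT INTO PROJECT VALUES ('ProductX', 1), ('ProductY', 2);
    INSERT INTO WORKS_ON VALUES
        ('1', 1), ('2', 1), ('3', 1), ('1', 2), ('2', 2);
""")

grouped = (
    "SELECT PNUMBER, PNAME, COUNT(*) FROM PROJECT, WORKS_ON "
    "WHERE PNUMBER = PNO GROUP BY PNUMBER, PNAME"
)

# Q21: every project with its head count.
q21 = conn.execute(grouped + " ORDER BY PNUMBER").fetchall()

# Q22: the same groups, keeping only those with more than two employees.
q22 = conn.execute(
    grouped + " HAVING COUNT(*) > 2 ORDER BY PNUMBER"
).fetchall()

print(q21)  # [(1, 'ProductX', 3), (2, 'ProductY', 2)]
print(q22)  # [(1, 'ProductX', 3)]
conn.close()
```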

SUBSTRING COMPARISON
 The LIKE comparison operator is used to
compare partial strings
 Two reserved characters are used: '%' (or '*'
in some implementations) replaces an
arbitrary number of characters, and '_'
replaces a single arbitrary character

SUBSTRING COMPARISON
(cont.)
 Query 25: Retrieve all employees whose address is in
Houston, Texas. Here, the value of the ADDRESS
attribute must contain the substring 'Houston,TX'.

Q25: SELECT FNAME, LNAME
     FROM EMPLOYEE
     WHERE ADDRESS LIKE '%Houston,TX%'

SUBSTRING COMPARISON
(cont.)
 Query 26: Retrieve all employees who were born during the
1950s. Here, '5' must be the 8th character of the string
(according to our format for date), so the BDATE value is
'_______5_', with each underscore as a place holder for a
single arbitrary character.

Q26: SELECT FNAME, LNAME
     FROM EMPLOYEE
     WHERE BDATE LIKE '_______5_'

 The LIKE operator allows us to get around the assumption that each
value is atomic and indivisible; hence, in SQL,
character string attribute values are, in effect, not atomic
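
Both patterns can be checked with a short sketch, assuming Python's sqlite3 module, dates stored as 9-character strings such as '09-JAN-55', and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (LNAME TEXT, BDATE TEXT, ADDRESS TEXT);
    INSERT INTO EMPLOYEE VALUES
        ('Smith',  '09-JAN-55', '731 Fondren, Houston,TX'),
        ('Wong',   '08-DEC-45', '638 Voss, Houston,TX'),
        ('Zelaya', '19-JUL-58', '3321 Castle, Spring,TX');
""")

# Q25 pattern: '%' matches any number of characters on either side.
houston = [r[0] for r in conn.execute(
    "SELECT LNAME FROM EMPLOYEE "
    "WHERE ADDRESS LIKE '%Houston,TX%' ORDER BY LNAME")]

# Q26 pattern: seven '_' placeholders, then '5' as the 8th character.
fifties = [r[0] for r in conn.execute(
    "SELECT LNAME FROM EMPLOYEE "
    "WHERE BDATE LIKE '_______5_' ORDER BY LNAME")]

print(houston)  # ['Smith', 'Wong']
print(fifties)  # ['Smith', 'Zelaya']
conn.close()
```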
ARITHMETIC OPERATIONS
 The standard arithmetic operators '+', '-', '*', and '/' (for addition,
subtraction, multiplication, and division, respectively) can be
applied to numeric values in an SQL query result
 Query 27: Show the effect of giving all employees who work
on the 'ProductX' project a 10% raise.

Q27: SELECT FNAME, LNAME, 1.1*SALARY
     FROM EMPLOYEE, WORKS_ON, PROJECT
     WHERE SSN=ESSN AND PNO=PNUMBER AND
           PNAME='ProductX'
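
A sketch of Q27, assuming Python's sqlite3 module and invented rows. It also shows that arithmetic in the SELECT-clause only affects the query result; the stored SALARY is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (LNAME TEXT, SSN TEXT, SALARY INTEGER);
    CREATE TABLE PROJECT (PNAME TEXT, PNUMBER INTEGER);
    CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER);
    INSERT INTO EMPLOYEE VALUES ('Smith', '1', 30000), ('Wong', '2', 40000);
    INSERT INTO PROJECT VALUES ('ProductX', 1), ('ProductY', 2);
    INSERT INTO WORKS_ON VALUES ('1', 1), ('2', 2);
""")

# Q27: show what a 10% raise would look like for 'ProductX' workers.
raised = conn.execute(
    "SELECT LNAME, 1.1 * SALARY FROM EMPLOYEE, WORKS_ON, PROJECT "
    "WHERE SSN = ESSN AND PNO = PNUMBER AND PNAME = 'ProductX'"
).fetchall()

# The base table still holds the original salary.
stored = conn.execute(
    "SELECT SALARY FROM EMPLOYEE WHERE LNAME = 'Smith'"
).fetchone()[0]

print(raised, stored)
conn.close()
```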

ORDER BY
 The ORDER BY clause is used to sort the tuples in a
query result based on the values of some attribute(s)
 Query 28: Retrieve a list of employees and the
projects each works on, ordered by the employee's
department, and within each department ordered
alphabetically by employee last name.
Q28: SELECT DNAME, LNAME, FNAME, PNAME
FROM DEPARTMENT, EMPLOYEE,
WORKS_ON, PROJECT
WHERE DNUMBER=DNO AND SSN=ESSN
AND PNO=PNUMBER
ORDER BY DNAME, LNAME

ORDER BY (cont.)
 The default order is in ascending order of values
 We can specify the keyword DESC if we want a
descending order; the keyword ASC can be used to
explicitly specify ascending order, even though it is
the default
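
The effect of ASC and DESC can be sketched as follows, assuming Python's sqlite3 module. To keep the example short, a single hypothetical pre-joined table EMP_DEPT(DNAME, LNAME) with made-up rows replaces the four-way join of Q28.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP_DEPT (DNAME TEXT, LNAME TEXT);
    INSERT INTO EMP_DEPT VALUES
        ('Research', 'Wong'), ('Administration', 'Zelaya'),
        ('Research', 'Smith');
""")

# Default (ascending) order on both sort keys.
asc = conn.execute(
    "SELECT DNAME, LNAME FROM EMP_DEPT ORDER BY DNAME, LNAME"
).fetchall()

# Descending department name, ascending last name within a department.
desc = conn.execute(
    "SELECT DNAME, LNAME FROM EMP_DEPT ORDER BY DNAME DESC, LNAME ASC"
).fetchall()

print(asc)
print(desc)
conn.close()
```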

Summary of SQL Queries
 A query in SQL can consist of up to six clauses, but only
the first two, SELECT and FROM, are mandatory. The
clauses are specified in the following order:

SELECT <attribute list>
FROM <table list>
[WHERE <condition>]
[GROUP BY <grouping attribute(s)>]
[HAVING <group condition>]
[ORDER BY <attribute list>]

Summary of SQL Queries
(cont.)
 The SELECT-clause lists the attributes or functions to be
retrieved
 The FROM-clause specifies all relations (or aliases) needed in
the query but not those needed in nested queries
 The WHERE-clause specifies the conditions for selection and
join of tuples from the relations specified in the FROM-clause
 GROUP BY specifies grouping attributes
 HAVING specifies a condition for selection of groups
 ORDER BY specifies an order for displaying the result of a
query
 A query is conceptually evaluated by first applying the FROM-clause
and WHERE-clause, then GROUP BY and HAVING, then the SELECT-clause,
and finally ORDER BY

Specifying Updates in SQL
 There are three SQL commands to modify
the database: INSERT, DELETE, and
UPDATE

INSERT
 In its simplest form, it is used to add one or
more tuples to a relation
 Attribute values should be listed in the same
order as the attributes were specified in the
CREATE TABLE command

INSERT (cont.)
 Example:

U1: INSERT INTO EMPLOYEE
    VALUES ('Richard','K','Marini', '653298653', '30-DEC-52',
    '98 Oak Forest,Katy,TX', 'M', 37000,'987654321', 4 )

 An alternate form of INSERT specifies explicitly the attribute names
that correspond to the values in the new tuple
 Attributes with NULL values can be left out
 Example: Insert a tuple for a new EMPLOYEE for whom we only
know the FNAME, LNAME, and SSN attributes.

U1A: INSERT INTO EMPLOYEE (FNAME, LNAME, SSN)
     VALUES ('Richard', 'Marini', '653298653')
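
The two INSERT forms can be compared with a small sketch. It assumes Python's sqlite3 module and a shortened EMPLOYEE table; the second employee is an invented row, used so that both inserts can coexist under the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (FNAME TEXT, MINIT TEXT, LNAME TEXT, "
    "SSN TEXT PRIMARY KEY, SALARY INTEGER)"
)

# U1 style: values listed in the column order of CREATE TABLE.
conn.execute(
    "INSERT INTO EMPLOYEE VALUES ('Richard', 'K', 'Marini', '653298653', 37000)"
)

# U1A style: explicit attribute names; omitted attributes become NULL.
conn.execute(
    "INSERT INTO EMPLOYEE (FNAME, LNAME, SSN) "
    "VALUES ('Robert', 'Hatcher', '980760540')"
)

rows = conn.execute(
    "SELECT FNAME, MINIT, SALARY FROM EMPLOYEE ORDER BY FNAME"
).fetchall()
print(rows)  # [('Richard', 'K', 37000), ('Robert', None, None)]
conn.close()
```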

INSERT (cont.)
 Important Note: Only the constraints specified in
the DDL commands are automatically enforced by
the DBMS when updates are applied to the
database
 Another variation of INSERT allows insertion of
multiple tuples resulting from a query into a
relation

INSERT (cont.)
– Example: Suppose we want to create a temporary table that has the name,
number of employees, and total salaries for each department. A table
DEPTS_INFO is created by U3A, and is loaded with the summary
information retrieved from the database by the query in U3B.

U3A: CREATE TABLE DEPTS_INFO
     (DEPT_NAME VARCHAR(10),
      NO_OF_EMPS INTEGER,
      TOTAL_SAL INTEGER);

U3B: INSERT INTO DEPTS_INFO (DEPT_NAME,
                             NO_OF_EMPS, TOTAL_SAL)
     SELECT DNAME, COUNT (*), SUM (SALARY)
     FROM DEPARTMENT, EMPLOYEE
     WHERE DNUMBER=DNO
     GROUP BY DNAME ;

INSERT (cont.)
 Note: The DEPTS_INFO table may not be up-to-date if we
change the tuples in either the DEPARTMENT or the
EMPLOYEE relations after issuing U3B. We have to
create a view (see later) to keep such a table up to date.
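
The staleness problem can be made concrete with a sketch, assuming Python's sqlite3 module and made-up rows. DEPTS_INFO_V is a hypothetical view name introduced here for comparison with the table loaded by U3B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER);
    CREATE TABLE EMPLOYEE (SSN TEXT, SALARY INTEGER, DNO INTEGER);
    INSERT INTO DEPARTMENT VALUES ('Research', 5);
    INSERT INTO EMPLOYEE VALUES ('1', 30000, 5), ('2', 40000, 5);

    -- U3A and U3B: a stored snapshot of the summary data.
    CREATE TABLE DEPTS_INFO
        (DEPT_NAME VARCHAR(10), NO_OF_EMPS INTEGER, TOTAL_SAL INTEGER);
    INSERT INTO DEPTS_INFO (DEPT_NAME, NO_OF_EMPS, TOTAL_SAL)
        SELECT DNAME, COUNT(*), SUM(SALARY)
        FROM DEPARTMENT, EMPLOYEE WHERE DNUMBER = DNO GROUP BY DNAME;

    -- Hypothetical view over the same query.
    CREATE VIEW DEPTS_INFO_V AS
        SELECT DNAME AS DEPT_NAME, COUNT(*) AS NO_OF_EMPS,
               SUM(SALARY) AS TOTAL_SAL
        FROM DEPARTMENT, EMPLOYEE WHERE DNUMBER = DNO GROUP BY DNAME;
""")

# A later change to EMPLOYEE is not reflected in the snapshot table.
conn.execute("INSERT INTO EMPLOYEE VALUES ('3', 25000, 5)")
table_row = conn.execute("SELECT * FROM DEPTS_INFO").fetchone()
view_row = conn.execute("SELECT * FROM DEPTS_INFO_V").fetchone()

print(table_row)  # ('Research', 2, 70000)
print(view_row)   # ('Research', 3, 95000)
conn.close()
```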

DELETE
 Removes tuples from a relation
 Includes a WHERE-clause to select the tuples to be deleted
 Tuples are deleted from only one table at a time (unless
CASCADE is specified on a referential integrity constraint)
 A missing WHERE-clause specifies that all tuples in the
relation are to be deleted; the table then becomes an empty
table
 The number of tuples deleted depends on the number of
tuples in the relation that satisfy the WHERE-clause
 Referential integrity should be enforced

DELETE (cont.)
 Examples:
U4A: DELETE FROM EMPLOYEE
     WHERE LNAME='Brown'

U4B: DELETE FROM EMPLOYEE
     WHERE SSN='123456789'

U4C: DELETE FROM EMPLOYEE
     WHERE DNO IN (SELECT DNUMBER
                   FROM DEPARTMENT
                   WHERE DNAME='Research')

U4D: DELETE FROM EMPLOYEE
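
The four DELETE statements can be run in sequence in the sketch below, which assumes Python's sqlite3 module and six invented EMPLOYEE rows. After each statement it records how many tuples were removed and how many remain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER);
    CREATE TABLE EMPLOYEE (LNAME TEXT, SSN TEXT, DNO INTEGER);
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
    INSERT INTO EMPLOYEE VALUES
        ('Brown', '1', 4), ('Smith', '123456789', 5), ('Wong', '3', 5),
        ('Narayan', '4', 5), ('Zelaya', '5', 4), ('Jabbar', '6', 4);
""")

def remaining():
    return conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]

statements = [
    "DELETE FROM EMPLOYEE WHERE LNAME = 'Brown'",        # U4A
    "DELETE FROM EMPLOYEE WHERE SSN = '123456789'",      # U4B
    "DELETE FROM EMPLOYEE WHERE DNO IN "
    "(SELECT DNUMBER FROM DEPARTMENT WHERE DNAME = 'Research')",  # U4C
    "DELETE FROM EMPLOYEE",                               # U4D: no WHERE
]

counts = []  # (tuples deleted, tuples remaining) after each statement
for stmt in statements:
    before = remaining()
    conn.execute(stmt)
    after = remaining()
    counts.append((before - after, after))

print(counts)  # [(1, 5), (1, 4), (2, 2), (2, 0)]
conn.close()
```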

UPDATE
 Used to modify attribute values of one or more
selected tuples
 A WHERE-clause selects the tuples to be modified
 An additional SET-clause specifies the attributes to
be modified and their new values
 Each command modifies tuples in the same relation
 Referential integrity should be enforced

UPDATE (cont.)
 Example: Change the location and controlling department
number of project number 10 to 'Bellaire' and 5,
respectively.

U5: UPDATE PROJECT
    SET PLOCATION = 'Bellaire', DNUM = 5
    WHERE PNUMBER=10

UPDATE (cont.)
 Example: Give all employees in the 'Research' department a 10% raise
in salary.

U6: UPDATE EMPLOYEE
    SET SALARY = SALARY *1.1
    WHERE DNO IN (SELECT DNUMBER
                  FROM DEPARTMENT
                  WHERE DNAME='Research')

 In this request, the modified SALARY value depends on the original
SALARY value in each tuple
 The reference to the SALARY attribute on the right of = refers to the
old SALARY value before modification
 The reference to the SALARY attribute on the left of = refers to the new
SALARY value after modification
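
A sketch of U6, assuming Python's sqlite3 module and made-up rows. Only the 'Research' employee is changed, and the new value is computed from the old one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER);
    CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY REAL, DNO INTEGER);
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
    INSERT INTO EMPLOYEE VALUES ('Smith', 30000, 5), ('Zelaya', 25000, 4);
""")

# U6: SALARY on the right of '=' is the old value, on the left the new one.
conn.execute(
    "UPDATE EMPLOYEE SET SALARY = SALARY * 1.1 "
    "WHERE DNO IN (SELECT DNUMBER FROM DEPARTMENT WHERE DNAME = 'Research')"
)

salaries = dict(conn.execute("SELECT LNAME, SALARY FROM EMPLOYEE"))
print(salaries)
conn.close()
```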

Chapter 9

MORE SQL:
Assertions,
Views, and
Programming
Techniques

Chapter Outline
9.1 General Constraints as Assertions
9.2 Views in SQL
9.3 Database Programming
9.4 Embedded SQL
9.5 Function Calls, SQL/CLI
9.6 Stored Procedures, SQL/PSM
9.7 Summary

Chapter Objectives
 Specification of more general constraints
via assertions
 SQL facilities for defining views (virtual
tables)
 Various techniques for accessing and
manipulating a database via programs in
general-purpose languages (e.g., Java)

Constraints as Assertions
 General constraints: constraints that do
not fit in the basic SQL categories
(presented in chapter 8)
 Mechanism: CREATE ASSERTION
– components include: a constraint name,
followed by CHECK, followed by a condition

Assertions: An Example
 “The salary of an employee must not be
greater than the salary of the manager of the
department that the employee works for”
CREATE ASSERTION SALARY_CONSTRAINT
CHECK (NOT EXISTS (SELECT *
FROM EMPLOYEE E, EMPLOYEE M, DEPARTMENT D
WHERE E.SALARY > M.SALARY AND
E.DNO=D.DNUMBER AND D.MGRSSN=M.SSN));

Using General Assertions
 Specify a query that selects any tuples that violate the
condition; include it inside a NOT EXISTS clause
 Query result must be empty
– if the query result is not empty, the assertion
has been violated
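
The NOT EXISTS idea can be sketched by running the violation query by hand. This assumes Python's sqlite3 module, which does not support CREATE ASSERTION, so the check is evaluated explicitly rather than enforced by the DBMS; the helper assertion_holds and the sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DNUMBER INTEGER, MGRSSN TEXT);
    CREATE TABLE EMPLOYEE (SSN TEXT, SALARY INTEGER, DNO INTEGER);
    INSERT INTO DEPARTMENT VALUES (5, '2');
    INSERT INTO EMPLOYEE VALUES ('1', 30000, 5), ('2', 40000, 5);
""")

# Query that selects employees earning more than their department manager.
VIOLATIONS = """
    SELECT E.SSN FROM EMPLOYEE E, EMPLOYEE M, DEPARTMENT D
    WHERE E.SALARY > M.SALARY AND E.DNO = D.DNUMBER AND D.MGRSSN = M.SSN
"""

def assertion_holds():
    # The constraint holds exactly when the violation query is empty.
    row = conn.execute(f"SELECT NOT EXISTS ({VIOLATIONS})").fetchone()
    return row[0] == 1

ok_before = assertion_holds()
conn.execute("UPDATE EMPLOYEE SET SALARY = 50000 WHERE SSN = '1'")
ok_after = assertion_holds()

print(ok_before, ok_after)  # True False
conn.close()
```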

SQL Triggers
 Objective: to monitor a database and take
action when a condition occurs
 Triggers are expressed in a syntax similar to
assertions and include the following:
– event (e.g., an update operation)
– condition
– action (to be taken when the condition is
satisfied)

SQL Triggers: An Example
 A trigger to compare an employee’s salary to that of his/her
supervisor during insert or update operations:

CREATE TRIGGER INFORM_SUPERVISOR
BEFORE INSERT OR UPDATE OF
SALARY, SUPERVISOR_SSN ON EMPLOYEE
FOR EACH ROW
WHEN
(NEW.SALARY> (SELECT SALARY FROM EMPLOYEE
WHERE SSN=NEW.SUPERVISOR_SSN))
INFORM_SUPERVISOR (NEW.SUPERVISOR_SSN,NEW.SSN);
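
The event/condition/action structure can be tried in SQLite, with some differences from the slide that should be read as assumptions: SQLite triggers name a single event (here AFTER UPDATE OF SALARY), the action is a BEGIN ... END block, and instead of calling a procedure the sketch writes to a hypothetical SUPERVISOR_NOTICE table. The trigger name SALARY_CHECK and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (SSN TEXT, SALARY INTEGER, SUPERVISOR_SSN TEXT);
    CREATE TABLE SUPERVISOR_NOTICE (SUPERVISOR_SSN TEXT, EMP_SSN TEXT);
    INSERT INTO EMPLOYEE VALUES ('1', 55000, NULL), ('2', 40000, '1');

    CREATE TRIGGER SALARY_CHECK
    AFTER UPDATE OF SALARY ON EMPLOYEE          -- event
    FOR EACH ROW
    WHEN NEW.SALARY > (SELECT SALARY FROM EMPLOYEE
                       WHERE SSN = NEW.SUPERVISOR_SSN)  -- condition
    BEGIN                                        -- action
        INSERT INTO SUPERVISOR_NOTICE VALUES (NEW.SUPERVISOR_SSN, NEW.SSN);
    END;
""")

# Condition is false: 45000 is not above the supervisor's 55000.
conn.execute("UPDATE EMPLOYEE SET SALARY = 45000 WHERE SSN = '2'")
# Condition is true: 60000 exceeds 55000, so the action fires once.
conn.execute("UPDATE EMPLOYEE SET SALARY = 60000 WHERE SSN = '2'")

notices = conn.execute("SELECT * FROM SUPERVISOR_NOTICE").fetchall()
print(notices)  # [('1', '2')]
conn.close()
```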

Views in SQL
 A view is a “virtual” table that is derived
from other tables
 Allows for limited update operations (since
the table may not physically be stored)
 Allows full query operations
 A convenience for expressing certain
operations

Specification of Views
 SQL command: CREATE VIEW
– a table (view) name
– a possible list of attribute names (for example,
when arithmetic operations are specified or
when we want the names to be different from
the attributes in the base relations)
– a query to specify the table contents

SQL Views: An Example
 Specify a different WORKS_ON table

CREATE VIEW WORKS_ON_NEW AS
SELECT FNAME, LNAME, PNAME, HOURS
FROM EMPLOYEE, PROJECT, WORKS_ON
WHERE SSN=ESSN AND PNO=PNUMBER;

Using a Virtual Table
 We can specify SQL queries on a newly
created table (view):
SELECT FNAME, LNAME FROM WORKS_ON_NEW
WHERE PNAME='Seena';
 When no longer needed, a view can be
dropped:
DROP VIEW WORKS_ON_NEW;
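
A sketch of defining, querying, and dropping a view, assuming Python's sqlite3 module and made-up rows. Because the view is virtual, a tuple added to the base tables later shows up in the view without redefining it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT);
    CREATE TABLE PROJECT (PNAME TEXT, PNUMBER INTEGER);
    CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER, HOURS REAL);
    INSERT INTO EMPLOYEE VALUES ('John', 'Smith', '1');
    INSERT INTO PROJECT VALUES ('Seena', 1);
    INSERT INTO WORKS_ON VALUES ('1', 1, 32.5);

    CREATE VIEW WORKS_ON_NEW AS
        SELECT FNAME, LNAME, PNAME, HOURS
        FROM EMPLOYEE, PROJECT, WORKS_ON
        WHERE SSN = ESSN AND PNO = PNUMBER;
""")

query = ("SELECT FNAME, LNAME FROM WORKS_ON_NEW "
         "WHERE PNAME = 'Seena' ORDER BY LNAME")
before = conn.execute(query).fetchall()

# Base-table changes are visible through the view immediately.
conn.execute("INSERT INTO EMPLOYEE VALUES ('Joyce', 'English', '2')")
conn.execute("INSERT INTO WORKS_ON VALUES ('2', 1, 20.0)")
after = conn.execute(query).fetchall()

conn.execute("DROP VIEW WORKS_ON_NEW")
print(before, after)
conn.close()
```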

Efficient View Implementation
 Query modification: present the view query
in terms of a query on the underlying base
tables
– disadvantage: inefficient for views defined via
complex queries (especially if additional
queries are to be applied to the view within a
short time period)

Efficient View Implementation
 View materialization: involves physically
creating and keeping a temporary table
– assumption: other queries on the view will
follow
– concerns: maintaining correspondence between
the base table and the view when the base table
is updated
– strategy: incremental update

View Update
 Update on a single view without aggregate
operations: update may map to an update on
the underlying base table
 Views involving joins: an update may map
to an update on the underlying base
relations
– not always possible

Un-updatable Views
 Views defined using groups and aggregate
functions are not updateable
 Views defined on multiple tables using joins
are generally not updateable
 WITH CHECK OPTION: must be added to the
definition of a view if the view is to be
updated
– to allow check for updatability and to plan for an
execution strategy
Database Programming
 Objective: to access a database from an
application program (as opposed to
interactive interfaces)
 Why? An interactive interface is convenient
but not sufficient; a majority of database
operations are made through application
programs (nowadays through web applications)

Database Programming Approaches

 Embedded commands: database commands
are embedded in a general-purpose
programming language
 Library of database functions: available to
the host language for database calls; known
as an API
 A brand new, full-fledged language
(minimizes impedance mismatch)
Impedance Mismatch
 Incompatibilities between a host
programming language and the database
model, e.g.,
– type mismatch and incompatibilities; requires a
new binding for each language
– set vs. record-at-a-time processing
need special iterators to loop over query results and
manipulate individual values

Steps in Database Programming

1. Client program opens a connection to the
database server
2. Client program submits queries to and/or
updates the database
3. When database access is no longer
needed, client program terminates the
connection
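
The three steps can be sketched as follows, assuming Python's sqlite3 module, where an in-memory database stands in for a database server and the sample row is made up.

```python
import sqlite3

# Step 1: open a connection.
conn = sqlite3.connect(":memory:")

# Step 2: submit updates and queries over that connection.
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SSN TEXT)")
conn.execute("INSERT INTO EMPLOYEE VALUES ('Smith', '123456789')")
conn.commit()
names = [row[0] for row in conn.execute("SELECT LNAME FROM EMPLOYEE")]

# Step 3: terminate the connection when access is no longer needed.
conn.close()

# Any further use of the closed connection is rejected.
closed = False
try:
    conn.execute("SELECT 1")
except sqlite3.ProgrammingError:
    closed = True

print(names, closed)  # ['Smith'] True
```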

Embedded SQL
 Most SQL statements can be embedded in a
general-purpose host programming language
such as COBOL, C, Java
 An embedded SQL statement is
distinguished from the host language
statements by EXEC SQL and a matching
END-EXEC (or semicolon)
– shared variables (used in both languages)
usually prefixed with a colon (:) in SQL
Example: Variable Declaration
in Language C
 Variables inside DECLARE are shared and can appear (while
prefixed by a colon) in SQL statements
 SQLCODE is used to communicate errors/exceptions between
the database and the program
int loop;
EXEC SQL BEGIN DECLARE SECTION;
varchar dname[16], fname[16], …;
char ssn[10], bdate[11], …;
int dno, dnumber, SQLCODE, …;
EXEC SQL END DECLARE SECTION;

SQL Commands for
Connecting to a Database
 Connection (multiple connections are
possible but only one is active)
CONNECT TO server-name AS connection-name
AUTHORIZATION user-account-info;
 Change from an active connection to
another one
SET CONNECTION connection-name;

 Disconnection
DISCONNECT connection-name;

Embedded SQL in C
Programming Examples
loop = 1;
while (loop) {
  prompt("Enter SSN: ", ssn);
  /* the embedded statement runs from EXEC SQL to the semicolon */
  EXEC SQL
    select FNAME, LNAME, ADDRESS, SALARY
    into :fname, :lname, :address, :salary
    from EMPLOYEE where SSN = :ssn;
  /* SQLCODE 0 means the tuple was found */
  if (SQLCODE == 0) printf(fname, …);
  else printf("SSN does not exist: %s\n", ssn);
  prompt("More SSN? (1=yes, 0=no): ", loop);
}

Embedded SQL in C
Programming Examples
 A cursor (iterator) is needed to process
multiple tuples
 FETCH commands move the cursor to the
next tuple
 CLOSE CURSOR indicates that the
processing of query results has been
completed
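
The open / fetch / close cycle of a cursor can be sketched with Python's sqlite3 module, used here as an assumed analogue of embedded SQL cursors rather than the embedded SQL syntax itself. The rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY INTEGER);
    INSERT INTO EMPLOYEE VALUES ('Smith', 30000), ('Wong', 40000);
""")

cur = conn.cursor()
# Roughly OPEN: the query is executed and the cursor sits before row 1.
cur.execute("SELECT LNAME, SALARY FROM EMPLOYEE ORDER BY LNAME")

seen = []
while True:
    row = cur.fetchone()   # roughly FETCH: advance to the next tuple
    if row is None:        # no more tuples in the query result
        break
    seen.append(row)

cur.close()                # roughly CLOSE: processing is finished
print(seen)  # [('Smith', 30000), ('Wong', 40000)]
conn.close()
```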

Dynamic SQL
 Objective: executing new (not previously compiled)
SQL statements at run-time
– a program accepts SQL statements from the keyboard at
run-time
– a point-and-click operation translates to certain SQL query
 Dynamic update is relatively simple; dynamic query
can be complex
– because the type and number of retrieved attributes are
unknown at compile time

Dynamic SQL: An Example

EXEC SQL BEGIN DECLARE SECTION;
varchar sqlupdatestring[256];
EXEC SQL END DECLARE SECTION;

prompt ("Enter update command:",
sqlupdatestring);
EXEC SQL PREPARE sqlcommand FROM
:sqlupdatestring;
EXEC SQL EXECUTE sqlcommand;
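
The same idea, an SQL statement that exists only as a string at run time, can be sketched with Python's sqlite3 module. The string is hard-coded here as an assumption in place of the prompt in the slide; running statements typed in by users without checks is unsafe in real applications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY INTEGER)")
conn.execute("INSERT INTO EMPLOYEE VALUES ('Smith', 30000)")

# Stand-in for the text read by prompt(); not known at compile time.
sqlupdatestring = (
    "UPDATE EMPLOYEE SET SALARY = SALARY + 1000 WHERE LNAME = 'Smith'"
)

# The statement is parsed, prepared, and executed at run time.
conn.execute(sqlupdatestring)

new_salary = conn.execute(
    "SELECT SALARY FROM EMPLOYEE WHERE LNAME = 'Smith'"
).fetchone()[0]
print(new_salary)  # 31000
conn.close()
```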

Embedded SQL in Java
 SQLJ: a standard for embedding SQL in
Java
 An SQLJ translator converts SQL
statements into Java (to be executed through the
JDBC interface)
 Certain packages, e.g., java.sql, have to be
imported

Java Database Connectivity
 JDBC: SQL connection function calls for
Java programming
 A Java program with JDBC functions can
access any relational DBMS that has a
JDBC driver
 JDBC allows a program to connect to
several databases (known as data sources)

Steps in JDBC Database Access

1. Import JDBC library (java.sql.*)
2. Load JDBC driver:
Class.forName("oracle.jdbc.driver.OracleDriver")
3. Define appropriate variables
4. Create a connection object (via getConnection)
5. Create a statement object from the
Statement class:
1. PreparedStatement
2. CallableStatement

Steps in JDBC Database Access
(continued)
6. Identify statement parameters (to be
designated by question marks)
7. Bind parameters to program variables
8. Execute SQL statement (referenced by an
object) via JDBC’s executeQuery
9. Process query results (returned in an object
of type ResultSet)
– ResultSet is a two-dimensional table
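
Steps 6 to 9 follow a parameterize / bind / execute / iterate pattern that is not specific to JDBC. The sketch below shows the same pattern with Python's sqlite3 module as an assumed analogue; it is not JDBC code, and the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY INTEGER, DNO INTEGER);
    INSERT INTO EMPLOYEE VALUES
        ('Smith', 30000, 5), ('Wong', 40000, 5),
        ('Narayan', 20000, 5), ('Zelaya', 25000, 4);
""")

# Step 6: parameters are marked with question marks.
query = ("SELECT LNAME, SALARY FROM EMPLOYEE "
         "WHERE DNO = ? AND SALARY > ? ORDER BY LNAME")

# Step 7: program values are bound to the parameters, in order.
params = (5, 25000)

# Step 8: the statement is executed with the bound values.
cur = conn.execute(query, params)

# Step 9: the result is processed row by row, column by column.
result = [(lname, salary) for lname, salary in cur]
print(result)  # [('Smith', 30000), ('Wong', 40000)]
conn.close()
```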

Embedded SQL in Java:
An Example
ssn = readEntry("Enter a SSN: ");
try {
#sql{select FNAME, LNAME, ADDRESS, SALARY
into :fname, :lname, :address, :salary
from EMPLOYEE where SSN = :ssn};
}
catch (SQLException se) {
System.out.println("SSN does not exist: " + ssn);
return;
}
System.out.println(fname+" "+lname+… );

Multiple Tuples in SQLJ
 SQLJ supports two types of iterators:
– named iterator: associated with a query result
– positional iterator: lists only attribute types in a
query result
 A FETCH operation retrieves the next tuple in
a query result:
fetch iterator-variable into program-variable

Database Programming with
Function Calls
 Embedded SQL provides static database
programming
 API: dynamic database programming with a
library of functions
– advantage: no preprocessor needed (thus more
flexible)
– drawback: SQL syntax checks to be done at
run-time
SQL Call Level Interface
 A part of the SQL standard
 Provides easy access to several databases
within the same program
 Certain libraries (e.g., sqlcli.h for C)
have to be installed and available
 SQL statements are dynamically created
and passed as string parameters in the calls

Components of SQL/CLI
 Environment record: keeps track of
database connections
 Connection record: keeps track of info
needed for a particular connection
 Statement record: keeps track of info
needed for one SQL statement
 Description record: keeps track of tuples

Steps in C and SQL/CLI
Programming
1. Load SQL/CLI libraries
2. Declare record handle variables for the above
components (called: SQLHSTMT, SQLHDBC,
SQLHENV, SQLHDESC)
3. Set up an environment record using
SQLAllocHandle
4. Set up a connection record using SQLAllocHandle
5. Set up a statement record using SQLAllocHandle

Steps in C and SQL/CLI
Programming (continued)
6. Prepare a statement using SQL/CLI
function SQLPrepare
7. Bind parameters to program variables
8. Execute SQL statement via SQLExecute
9. Bind columns in a query to C variables
via SQLBindCol
10. Use SQLFetch to retrieve column values
into C variables
Database Stored Procedures
 Persistent procedures/functions (modules) are
stored locally and executed by the database server
(as opposed to execution by clients)
 Advantages:
– if the procedure is needed by many applications, it can
be invoked by any of them (thus reducing duplication)
– execution by the server reduces communication costs
– stored procedures enhance the modeling power of views

Stored Procedure Constructs
 A stored procedure
CREATE PROCEDURE procedure-name (params)
local-declarations
procedure-body;

 A stored function
CREATE FUNCTION fun-name (params) RETURNS return-type
local-declarations
function-body;

 Calling a procedure or function
CALL procedure-name/fun-name (arguments);

SQL Persistent Stored Modules

 SQL/PSM: part of the SQL standard for
writing persistent stored modules
 SQL + stored procedures/functions +
additional programming constructs
– e.g., branching and looping statements
– enhance the power of SQL

SQL/PSM: An Example
CREATE FUNCTION DEPT_SIZE (IN deptno INTEGER)
RETURNS VARCHAR(7)
DECLARE TOT_EMPS INTEGER;

SELECT COUNT (*) INTO TOT_EMPS
FROM EMPLOYEE WHERE DNO = deptno;
IF TOT_EMPS > 100 THEN RETURN 'HUGE';
ELSEIF TOT_EMPS > 50 THEN RETURN 'LARGE';
ELSEIF TOT_EMPS > 30 THEN RETURN 'MEDIUM';
ELSE RETURN 'SMALL';
END IF;
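
The branching logic of DEPT_SIZE can be exercised with a sketch in Python's sqlite3 module. SQLite has no SQL/PSM stored functions, so as an assumption the thresholds are placed in a client-side function registered under the hypothetical name SIZE_LABEL; the helper dept_size and the generated rows are also illustrative only.

```python
import sqlite3

def size_label(tot_emps):
    # Same thresholds as the DEPT_SIZE function on the slide.
    if tot_emps > 100:
        return "HUGE"
    elif tot_emps > 50:
        return "LARGE"
    elif tot_emps > 30:
        return "MEDIUM"
    return "SMALL"

conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL (one argument).
conn.create_function("SIZE_LABEL", 1, size_label)

conn.execute("CREATE TABLE EMPLOYEE (SSN INTEGER, DNO INTEGER)")
# 60 employees in department 5 and 3 in department 4.
rows = [(i, 5) for i in range(60)] + [(100 + i, 4) for i in range(3)]
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", rows)

def dept_size(deptno):
    # Count the department's employees, then map the count to a label.
    return conn.execute(
        "SELECT SIZE_LABEL(COUNT(*)) FROM EMPLOYEE WHERE DNO = ?",
        (deptno,),
    ).fetchone()[0]

sizes = (dept_size(5), dept_size(4))
print(sizes)  # ('LARGE', 'SMALL')
conn.close()
```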

Summary
 Assertions provide a means to specify
additional constraints
 Triggers are a special kind of assertions;
they define actions to be taken when
certain conditions occur
 Views are a convenient means for creating
temporary (virtual) tables

Summary (continued)
 A database may be accessed via an interactive
interface
 Most often, however, data in a database is
manipulated via application programs
 Several methods of database programming:
– embedded SQL
– dynamic SQL
– stored procedures and functions

Chapter 10
Functional Dependencies and
Normalization for Relational
Databases

Chapter Outline
1 Informal Design Guidelines for Relational Databases
1.1 Semantics of the Relation Attributes
1.2 Redundant Information in Tuples and Update Anomalies
1.3 Null Values in Tuples
1.4 Spurious Tuples
2 Functional Dependencies (FDs)
2.1 Definition of FD
2.2 Inference Rules for FDs
2.3 Equivalence of Sets of FDs
2.4 Minimal Sets of FDs

Chapter Outline (contd.)
3 Normal Forms Based on Primary Keys
3.1 Normalization of Relations
3.2 Practical Use of Normal Forms
3.3 Definitions of Keys and Attributes Participating in Keys
3.4 First Normal Form
3.5 Second Normal Form
3.6 Third Normal Form
4 General Normal Form Definitions (For Multiple
Keys)
5 BCNF (Boyce-Codd Normal Form)
1 Informal Design Guidelines for
Relational Databases (1)

 What is relational database design?
The grouping of attributes to form "good" relation schemas
 Two levels of relation schemas
– The logical "user view" level
– The storage "base relation" level
 Design is concerned mainly with base relations
 What are the criteria for "good" base relations?

Informal Design Guidelines for
Relational Databases (2)
 We first discuss informal guidelines for good
relational design
 Then we discuss formal concepts of functional
dependencies and normal forms
- 1NF (First Normal Form)
- 2NF (Second Normal Form)
- 3NF (Third Normal Form)
- BCNF (Boyce-Codd Normal Form)
 Additional types of dependencies, further normal
forms, relational design algorithms by synthesis are
discussed in Chapter 11
1.1 Semantics of the Relation
Attributes
GUIDELINE 1: Informally, each tuple in a relation
should represent one entity or relationship instance.
(Applies to individual relations and their attributes).
 Attributes of different entities (EMPLOYEEs, DEPARTMENTs,
PROJECTs) should not be mixed in the same relation
 Only foreign keys should be used to refer to other entities
 Entity and relationship attributes should be kept apart as much as
possible.
Bottom Line: Design a schema that can be explained
easily relation by relation. The semantics of
attributes should be easy to interpret.

Figure 10.1 A simplified COMPANY
relational database schema

Note: The above figure is now called Figure 10.1 in Edition 4


1.2 Redundant Information in Tuples
and Update Anomalies
 Mixing attributes of multiple entities may cause
problems
 Information is stored redundantly wasting storage
 Problems with update anomalies
– Insertion anomalies
– Deletion anomalies
– Modification anomalies

EXAMPLE OF AN UPDATE
ANOMALY (1)
Consider the relation:
EMP_PROJ (Emp#, Proj#, Ename, Pname, No_hours)
 Update Anomaly: Changing the name of project
number P1 from “Billing” to “Customer-Accounting”
requires this update to be made in the tuples of
all 100 employees working on project P1.

EXAMPLE OF AN UPDATE
ANOMALY (2)
 Insert Anomaly: Cannot insert a project unless
an employee is assigned to it.
Conversely, cannot insert an employee unless
he/she is assigned to a project.
 Delete Anomaly: When a project is deleted, it
will result in deleting all the employees who work
on that project. Alternatively, if an employee is the
sole employee on a project, deleting that employee
would result in deleting the corresponding project.
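The modification and deletion anomalies described above can be reproduced on a tiny in-memory copy of EMP_PROJ. This is only an illustrative sketch: the tuples, the employee names, and the helper functions are hypothetical and do not come from the slides.

```python
# Hypothetical EMP_PROJ tuples: (Emp#, Proj#, Ename, Pname, No_hours).
EMP_PROJ = [
    ("E1", "P1", "Smith", "Billing", 10),
    ("E2", "P1", "Wong", "Billing", 20),
    ("E3", "P2", "Zelaya", "Payroll", 15),
]


def rename_project(rows, proj, new_name):
    """Modification anomaly: every tuple that mentions proj must be rewritten."""
    updated = [
        (emp, p, ename, new_name if p == proj else pname, hours)
        for emp, p, ename, pname, hours in rows
    ]
    touched = sum(1 for row in rows if row[1] == proj)
    return updated, touched


def delete_employee(rows, emp):
    """Deletion anomaly: removing a project's only employee also loses the project."""
    return [row for row in rows if row[0] != emp]


def known_projects(rows):
    """Project numbers and names that the relation still records."""
    return {row[1]: row[3] for row in rows}


renamed, touched = rename_project(EMP_PROJ, "P1", "Customer-Accounting")
print(touched)  # number of tuples rewritten for a single rename
print(known_projects(delete_employee(EMP_PROJ, "E3")))  # P2 no longer appears
```

With separate EMPLOYEE, PROJECT, and WORKS_ON relations, the rename would touch one PROJECT tuple and deleting E3 would leave P2 in place.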
Figure 10.3 Two relation schemas
suffering from update anomalies

Note: The above figure is now called Figure 10.3 in Edition 4


Figure 10.4 Example States for EMP_DEPT
and EMP_PROJ

Note: The above figure is now called Figure 10.4 in Edition 4


Guideline for Redundant Information in
Tuples and Update Anomalies
 GUIDELINE 2: Design a schema that does not
suffer from insertion, deletion, and update
anomalies. If any are present, note them
so that applications can be made to take them into
account.

1.3 Null Values in Tuples
GUIDELINE 3: Relations should be designed such
that their tuples will have as few NULL values as
possible.
 Attributes that are NULL frequently could be
placed in separate relations (with the primary key).
 Reasons for nulls:
– attribute not applicable or invalid
– attribute value unknown (may exist)
– value known to exist, but unavailable

1.4 Spurious Tuples
 Bad designs for a relational database may result in
erroneous results for certain JOIN operations.
 The "lossless join" property is used to guarantee
meaningful results for join operations.
GUIDELINE 4: The relations should be designed to
satisfy the lossless join condition. No spurious
tuples should be generated by doing a natural join
of any relations.

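A small sketch may help show where spurious tuples come from. The relation state, attribute names, and the `project` and `natural_join` helpers below are hypothetical; they project a relation onto two attribute lists and join the projections back together.

```python
def project(rows, attrs):
    """Projection onto attrs, with duplicate tuples removed."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]


def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    result = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in set(l) & set(r)):
                result.append({**l, **r})
    return result


# Hypothetical state: two different projects share the location Bellaire.
EMP_PROJ = [
    {"ename": "Smith", "pnumber": 1, "plocation": "Bellaire"},
    {"ename": "Wong", "pnumber": 2, "plocation": "Bellaire"},
    {"ename": "Zelaya", "pnumber": 3, "plocation": "Houston"},
]

# Bad design: the only shared attribute, plocation, determines neither side.
bad = natural_join(project(EMP_PROJ, ["ename", "plocation"]),
                   project(EMP_PROJ, ["pnumber", "plocation"]))
spurious = [t for t in bad if t not in EMP_PROJ]

# Better design for this state: join on pnumber, which determines plocation.
good = natural_join(project(EMP_PROJ, ["ename", "pnumber"]),
                    project(EMP_PROJ, ["pnumber", "plocation"]))
print(len(bad), len(spurious), len(good))
```

The bad join returns every original tuple plus extra combinations that were never stored, which is why the property is called non-additive rather than lossless in the literal sense.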
Spurious Tuples (2)
 There are two important properties of decompositions:
(a) non-additivity (losslessness) of the corresponding
join
(b) preservation of the functional dependencies.
Note that property (a) is extremely important and
cannot be sacrificed. Property (b) is less stringent
and may be sacrificed. (See Chapter 11).

2.1 Functional Dependencies (1)
 Functional dependencies (FDs) are used to specify
formal measures of the "goodness" of relational
designs
 FDs and keys are used to define normal forms for
relations
 FDs are constraints that are derived from the
meaning and interrelationships of the data attributes
 A set of attributes X functionally determines a set of
attributes Y if the value of X determines a unique
value for Y

Functional Dependencies (2)
 X -> Y holds if whenever two tuples have the same value
for X, they must have the same value for Y
 For any two tuples t1 and t2 in any relation instance r(R): If
t1[X]=t2[X], then t1[Y]=t2[Y]
 X -> Y in R specifies a constraint on all relation instances
r(R)
 Written as X -> Y; can be displayed graphically on a
relation schema as in the figures (denoted by an arrow).
 FDs are derived from the real-world constraints on the
attributes

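The tuple-pair condition t1[X]=t2[X] implies t1[Y]=t2[Y] can be checked mechanically on one relation instance. The sketch below uses a hypothetical `fd_holds` helper and made-up rows. Note that a single instance can only refute an FD; it cannot prove that the FD holds on every legal instance r(R).

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no pair of tuples in rows violates lhs -> rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first tuple with this X value fixes the Y value; later ones must agree.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical instance with employee and project attributes.
WORKS = [
    {"SSN": "111", "ENAME": "Smith", "PNUMBER": 1, "HOURS": 10},
    {"SSN": "111", "ENAME": "Smith", "PNUMBER": 2, "HOURS": 5},
    {"SSN": "222", "ENAME": "Wong", "PNUMBER": 1, "HOURS": 20},
]

print(fd_holds(WORKS, ["SSN"], ["ENAME"]))             # not violated here
print(fd_holds(WORKS, ["SSN"], ["HOURS"]))             # violated by employee 111
print(fd_holds(WORKS, ["SSN", "PNUMBER"], ["HOURS"]))  # not violated here
```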
Examples of FD constraints (1)
 Social security number determines employee name
SSN -> ENAME
 Project number determines project name and
location
PNUMBER -> {PNAME, PLOCATION}
 Employee SSN and project number determine the
hours per week that the employee works on the
project
{SSN, PNUMBER} -> HOURS

Examples of FD constraints (2)
 An FD is a property of the attributes in the schema R
 The constraint must hold on every relation
instance r(R)
 If K is a key of R, then K functionally determines
all attributes in R (since we never have two
distinct tuples with t1[K]=t2[K])

2.2 Inference Rules for FDs (1)
 Given a set of FDs F, we can infer additional FDs
that hold whenever the FDs in F hold
 Armstrong's inference rules:
IR1. (Reflexive) If Y subset-of X, then X -> Y
IR2. (Augmentation) If X -> Y, then XZ -> YZ
(Notation: XZ stands for X U Z)
IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z
 IR1, IR2, IR3 form a sound and complete set of
inference rules
Inference Rules for FDs (2)
Some additional inference rules that are useful:
(Decomposition) If X -> YZ, then X -> Y and X -> Z
(Union) If X -> Y and X -> Z, then X -> YZ
(Pseudotransitivity) If X -> Y and WY -> Z, then WX -> Z
 The last three inference rules, as well as any other
inference rules, can be deduced from IR1, IR2,
and IR3 (completeness property)
Inference Rules for FDs (3)
 Closure of a set F of FDs is the set F+ of all FDs
that can be inferred from F
 Closure of a set of attributes X with respect to F is
the set X+ of all attributes that are functionally
determined by X
 X+ can be calculated by repeatedly applying IR1,
IR2, IR3 using the FDs in F

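One common way to compute the attribute closure X+ is a fixed-point loop: keep adding the right-hand side of any FD whose left-hand side is already contained in the result. The sketch below assumes FDs are represented as (lhs, rhs) pairs of Python sets; that representation and the function name `closure` are choices made for this example. The FDs are the ones listed on the earlier example slide.

```python
def closure(attrs, fds):
    """Return X+, the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined, so is the right.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

print(sorted(closure({"SSN"}, F)))
print(sorted(closure({"SSN", "PNUMBER"}, F)))
```

Since {SSN, PNUMBER}+ contains every attribute, {SSN, PNUMBER} is a superkey of a relation with these six attributes.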
2.3 Equivalence of Sets of FDs
 Two sets of FDs F and G are equivalent if:
- every FD in F can be inferred from G, and
- every FD in G can be inferred from F
 Hence, F and G are equivalent if F+ = G+
Definition: F covers G if every FD in G can be
inferred from F (i.e., if G+ subset-of F+)
 F and G are equivalent if F covers G and G covers F
 There is an algorithm for checking equivalence of
sets of FDs

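The equivalence check can be sketched without enumerating F+ or G+: F covers G when, for every X -> Y in G, Y is contained in the closure of X computed under F. The function names and the sample sets F and G below are illustrative assumptions, not taken from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """F covers G if every FD in G can be inferred from F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """F and G are equivalent if each covers the other."""
    return covers(f, g) and covers(g, f)


# Illustrative FD sets over attributes A, C, D, E, H.
F = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"E"}, {"A", "D"}), ({"E"}, {"H"})]
G = [({"A"}, {"C", "D"}), ({"E"}, {"A", "H"})]

print(equivalent(F, G))
print(covers(F, [({"A"}, {"C"})]), covers([({"A"}, {"C"})], F))
```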
2.4 Minimal Sets of FDs (1)
 A set of FDs F is minimal if it satisfies the
following conditions:
(1) Every dependency in F has a single attribute for its RHS.
(2) We cannot remove any dependency from F and have a
set of dependencies that is equivalent to F.
(3) We cannot replace any dependency X -> A in F with a
dependency Y -> A, where Y proper-subset-of X,
and still have a set of dependencies that
is equivalent to F.

Minimal Sets of FDs (2)
 Every set of FDs has an equivalent minimal set
 There can be several equivalent minimal sets
 A minimal set equivalent to a set F of FDs can be
computed (Algorithm 10.2), but the result depends on
the order in which the FDs are examined
 To synthesize a set of relations, we assume that we
start with a set of dependencies that is a minimal
set (e.g., see algorithms 11.2 and 11.4)

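The three minimality conditions suggest a direct procedure: split right-hand sides into single attributes, drop extraneous left-hand-side attributes, then drop redundant FDs. The sketch below is one standard way to do this and is not necessarily identical to the book's Algorithm 10.2. The function names and the example set F are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    """Return one minimal cover as a list of (frozenset lhs, single attribute)."""
    def as_sets(g):
        return [(lhs, {a}) for lhs, a in g]

    # Condition (1): one attribute on every right-hand side.
    g = list(dict.fromkeys((frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)))
    # Condition (3): remove attributes from a left-hand side when the
    # reduced left-hand side still determines the right-hand side.
    for i in range(len(g)):
        lhs, a = g[i]
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, as_sets(g)):
                lhs = lhs - {b}
                g[i] = (lhs, a)
    g = list(dict.fromkeys(g))
    # Condition (2): remove any FD that the remaining FDs already imply.
    for fd in list(g):
        rest = [x for x in g if x != fd]
        if fd[1] in closure(fd[0], as_sets(rest)):
            g = rest
    return g


# Illustrative input: B -> A, D -> A, AB -> D.
F = [({"B"}, {"A"}), ({"D"}, {"A"}), ({"A", "B"}, {"D"})]
print(minimal_cover(F))
```

Processing the FDs or attributes in a different order can yield a different, equally valid minimal set, which is consistent with the remark that several equivalent minimal sets may exist.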
3 Normal Forms Based on Primary
Keys
3.1 Normalization of Relations
3.2 Practical Use of Normal Forms
3.3 Definitions of Keys and Attributes
Participating in Keys
3.4 First Normal Form
3.5 Second Normal Form
3.6 Third Normal Form

3.1 Normalization of Relations (1)
 Normalization: The process of decomposing
unsatisfactory "bad" relations by breaking up their
attributes into smaller relations
 Normal form: Condition using keys and FDs of a
relation to certify whether a relation schema is in a
particular normal form

Normalization of Relations (2)
 2NF, 3NF, BCNF based on keys and FDs of a
relation schema
 4NF based on keys and multivalued dependencies
(MVDs); 5NF based on keys and join dependencies
(JDs) (Chapter 11)
 Additional properties may be needed to ensure a
good relational design (lossless join, dependency
preservation; Chapter 11)

3.2 Practical Use of Normal Forms
 Normalization is carried out in practice so that the
resulting designs are of high quality and meet the desirable
properties
 The practical utility of these normal forms becomes
questionable when the constraints on which they are based
are hard to understand or to detect
 The database designers need not normalize to the highest
possible normal form (usually up to 3NF, BCNF or 4NF)
 Denormalization: the process of storing the join of higher
normal form relations as a base relation, which is in a
lower normal form

3.3 Definitions of Keys and Attributes
Participating in Keys (1)
 A superkey of a relation schema R = {A1, A2, ...,
An} is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
 A key K is a superkey with the additional
property that removal of any attribute from K will
cause K not to be a superkey any more.

Definitions of Keys and Attributes
Participating in Keys (2)
 If a relation schema has more than one key, each
is called a candidate key. One of the candidate
keys is arbitrarily designated to be the primary
key, and the others are called secondary keys.
 A Prime attribute must be a member of some
candidate key
 A Nonprime attribute is not a prime attribute—
that is, it is not a member of any candidate key.

3.4 First Normal Form
 Disallows composite attributes, multivalued
attributes, and nested relations; that is, attributes
whose values for an individual tuple are
non-atomic
 Considered to be part of the definition of
relation

Figure 10.8 Normalization into 1NF

Note: The above figure is now called Figure 10.8 in Edition 4


Figure 10.9 Normalizing nested
relations into 1NF

Note: The above figure is now called Figure 10.9 in Edition 4


3.5 Second Normal Form (1)
 Uses the concepts of FDs, primary key
Definitions:
 Prime attribute - attribute that is a member of the
primary key K
 Full functional dependency - an FD Y -> Z
where removal of any attribute from Y means the
FD does not hold any more
Examples: - {SSN, PNUMBER} -> HOURS is a full FD
since neither SSN -> HOURS nor PNUMBER -> HOURS holds
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a
partial dependency) since SSN -> ENAME also holds

Second Normal Form (2)
 A relation schema R is in second normal
form (2NF) if every non-prime attribute A
in R is fully functionally dependent on the
primary key
 R can be decomposed into 2NF relations via
the process of 2NF normalization

Figure 10.10 Normalizing into 2NF and
3NF

Note: The above figure is now called Figure 10.10 in Edition 4


Figure 10.11 Normalization into 2NF and
3NF

Note: The above figure is now called Figure 10.11 in Edition 4


3.6 Third Normal Form (1)

Definition:
 Transitive functional dependency - a FD X -> Z
that can be derived from two FDs X -> Y and Y -> Z
Examples:
- SSN -> DMGRSSN is a transitive FD since
SSN -> DNUMBER and DNUMBER -> DMGRSSN
hold
- SSN -> ENAME is non-transitive since there is no set
of attributes X where SSN -> X and X -> ENAME

Third Normal Form (2)
 A relation schema R is in third normal form
(3NF) if it is in 2NF and no non-prime attribute A
in R is transitively dependent on the primary key
 R can be decomposed into 3NF relations via the
process of 3NF normalization
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a
problem only if Y is not a candidate key. When Y is a candidate key,
there is no problem with the transitive dependency.
E.g., Consider EMP (SSN, Emp#, Salary).
Here, SSN -> Emp# -> Salary and Emp# is a candidate key.

4 General Normal Form Definitions
(For Multiple Keys) (1)
 The above definitions consider the primary key
only
 The following more general definitions take into
account relations with multiple candidate keys
 A relation schema R is in second normal form
(2NF) if every non-prime attribute A in R is fully
functionally dependent on every key of R

General Normal Form Definitions (2)

Definition:
 Superkey of relation schema R - a set of attributes
S of R that contains a key of R
 A relation schema R is in third normal form (3NF)
if whenever a FD X -> A holds in R, then either:
(a) X is a superkey of R, or
(b) A is a prime attribute of R
NOTE: Boyce-Codd normal form disallows condition (b)
above

5 BCNF (Boyce-Codd Normal Form)
 A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever a nontrivial FD X -> A holds
in R, then X is a superkey of R
 Each normal form is strictly stronger than the previous one
– Every 2NF relation is in 1NF
– Every 3NF relation is in 2NF
– Every BCNF relation is in 3NF
 There exist relations that are in 3NF but not in BCNF
 The goal is to have each relation in BCNF (or 3NF)

Figure 10.12 Boyce-Codd normal form

Note: The above figure is now called Figure 10.12 in Edition 4


Figure 10.13 a relation TEACH that is in
3NF but not in BCNF

Note: The above figure is now called Figure 10.13 in Edition 4


Achieving the BCNF by Decomposition
(1)
 Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
 {student, course} is a candidate key for this relation, and
the dependencies shown follow the pattern in Figure 10.12
(b). So this relation is in 3NF but not in BCNF
 A relation NOT in BCNF should be decomposed so as to
meet this property, while possibly forgoing the preservation
of all functional dependencies in the decomposed relations.
(See Algorithm 11.3)

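The claim that TEACH is in 3NF but not in BCNF can be checked with the general definitions (X is a superkey, or A is a prime attribute). The sketch below is a brute-force illustration under assumed names (`candidate_keys`, `violations`). It only inspects the FDs that are given, which is enough when testing the original relation against its own FD set.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) == attrs:
                keys.append(s)
    return keys


def violations(attrs, fds, form="BCNF"):
    """List (X, A) pairs from the given FDs that violate BCNF or 3NF."""
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(set(rhs) - set(lhs)):  # ignore trivial parts
            if closure(lhs, fds) == set(attrs):
                continue  # X is a superkey
            if form == "3NF" and a in prime:
                continue  # A is a prime attribute
            bad.append((set(lhs), a))
    return bad


TEACH = {"student", "course", "instructor"}
FDS = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]

print(candidate_keys(TEACH, FDS))
print(violations(TEACH, FDS, "3NF"), violations(TEACH, FDS, "BCNF"))
```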
Achieving the BCNF by Decomposition
(2)
 Three possible decompositions for relation TEACH
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
 All three decompositions will lose fd1. We have to settle for sacrificing the
functional dependency preservation. But we cannot sacrifice the non-additivity
property after decomposition.
 Out of the above three, only the 3rd decomposition will not generate spurious
tuples after join (and hence has the non-additivity property).
 A test to determine whether a binary decomposition (decomposition into two
relations) is nonadditive (lossless) is discussed in section 11.1.4 under Property
LJ1. Verify that the third decomposition above meets the property.

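The verification suggested above can be sketched with the binary test: a decomposition {R1, R2} is nonadditive if the common attributes R1 ∩ R2 functionally determine either R1 - R2 or R2 - R1. The function names below are assumptions made for this example; the FDs and the three decompositions are those given for TEACH.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Binary lossless join test: (R1 n R2) -> (R1 - R2) or (R1 n R2) -> (R2 - R1)."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined


FDS = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
DECOMPOSITIONS = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]

for number, (r1, r2) in enumerate(DECOMPOSITIONS, start=1):
    print(number, lossless_binary(r1, r2, FDS))
```

Only the third decomposition passes, because instructor is the shared attribute and instructor -> course holds.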
Chapter 11
Relational Database Design
Algorithms and Further
Dependencies



Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Chapter Outline
0. Designing a Set of Relations
1. Properties of Relational Decompositions
2. Algorithms for Relational Database Schema Design
3. Multivalued Dependencies and Fourth Normal Form
4. Join Dependencies and Fifth Normal Form
5. Inclusion Dependencies
6. Other Dependencies and Normal Forms

DESIGNING A SET OF RELATIONS (1)
The Approach of Relational Synthesis (Bottom-up
Design):
 Assumes that all possible functional dependencies
are known.
 First constructs a minimal set of FDs
 Then applies algorithms that construct a target set of
3NF or BCNF relations.
 Additional criteria may be needed to ensure that the
set of relations in a relational database is
satisfactory (see Algorithms 11.2 and 11.4).
DESIGNING A SET OF RELATIONS
(2)
Goals:
 Lossless join property (a must) – Algorithm 11.1
tests for general losslessness.
 Dependency preservation property – Algorithm 11.3
decomposes a relation into BCNF components by
sacrificing the dependency preservation.
 Additional normal forms
– 4NF (based on multi-valued dependencies)
– 5NF (based on join dependencies)

1. Properties of Relational Decompositions (1)
Relation Decomposition and Insufficiency of Normal
Forms:
 Universal Relation Schema: a relation schema R={A1, A2, …,
An} that includes all the attributes of the database.
 Universal relation assumption: every attribute name is
unique.
 Decomposition: The process of decomposing the universal
relation schema R into a set of relation schemas D = {R1,R2,
…, Rm} that will become the relational database schema by
using the functional dependencies.

Properties of Relational Decompositions (2)
Relation Decomposition and Insufficiency of Normal
Forms (cont.):
 Attribute preservation condition: Each attribute in
R will appear in at least one relation schema Ri in the
decomposition so that no attributes are “lost”.
 Another goal of decomposition is to have each
individual relation Ri in the decomposition D be in
BCNF or 3NF.
 Additional properties of decomposition are needed to
prevent the generation of spurious tuples.
Properties of Relational Decompositions (3)
Dependency Preservation Property of a
Decomposition:
Definition:
 Given a set of dependencies F on R, the projection of
F on Ri, denoted by πRi(F) where Ri is a subset of R, is
the set of dependencies X -> Y in F+ such that the
attributes in X ∪ Y are all contained in Ri. Hence, the
projection of F on each relation schema Ri in the
decomposition D is the set of functional dependencies
in F+, the closure of F, such that all their left- and
right-hand-side attributes are in Ri.
Properties of Relational Decompositions (4)
Dependency Preservation Property of a
Decomposition (cont.):
 Dependency Preservation Property: a decomposition
D = {R1, R2, ..., Rm} of R is dependency-preserving
with respect to F if the union of the projections of F on
each Ri in D is equivalent to F; that is, (πR1(F) ∪ . . .
∪ πRm(F))+ = F+
(See examples in Fig 10.12a and Fig 10.11)
Claim 1: It is always possible to find a dependency-
preserving decomposition D with respect to F such that
each relation Ri in D is in 3NF.
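Computing the projections of F explicitly can be expensive, because F+ may be very large. The sketch below therefore uses a well-known shortcut that is not described on these slides: for each X -> Y in F, grow a set Z from X using only closures restricted to one Ri at a time, and check that Y ends up in Z. The function names are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(fds, decomposition):
    """True if every FD in fds follows from the projections onto the Ri."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # What Z's part inside Ri determines, restricted to Ri.
                gained = closure(z & set(ri), fds) & set(ri)
                if not gained <= z:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False
    return True


TEACH_FDS = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(preserves_dependencies(TEACH_FDS, [{"instructor", "course"}, {"instructor", "student"}]))

CHAIN = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(preserves_dependencies(CHAIN, [{"A", "B"}, {"B", "C"}]))
print(preserves_dependencies(CHAIN, [{"A", "B"}, {"A", "C"}]))
```

The TEACH decomposition into {instructor, course} and {instructor, student} is reported as not dependency-preserving, since {student, course} -> instructor cannot be checked within either relation.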
Properties of Relational Decompositions (5)
Lossless (Non-additive) Join Property of a Decomposition:
Definition:
 Lossless join property: a decomposition D = {R1, R2, ..., Rm} of
R has the lossless (nonadditive) join property with respect to
the set of dependencies F on R if, for every relation state r of R
that satisfies F, the following holds, where * is the natural join
of all the relations in D:
* (πR1(r), ..., πRm(r)) = r
Note: The word loss in lossless refers to loss of information, not
to loss of tuples. In fact, for “loss of information” a better term
is “addition of spurious information”

Properties of Relational Decompositions (6)
Lossless (Non-additive) Join Property of a Decomposition (cont.):
Algorithm 11.1: Testing for Lossless Join Property
Input: A universal relation R, a decomposition D = {R1, R2, ..., Rm}
of R, and a set F of functional dependencies.
1. Create an initial matrix S with one row i for each relation Ri in
D, and one column j for each attribute Aj in R.
2. Set S(i,j):=bij for all matrix entries. (* each bij is a distinct
symbol associated with indices (i,j) *).
3. For each row i representing relation schema Ri
{for each column j representing attribute Aj
{if (relation Ri includes attribute Aj) then set S(i,j):= aj;};};
(* each aj is a distinct symbol associated with index (j) *)
Properties of Relational Decompositions (7)
Lossless (Non-additive) Join Property of a Decomposition (cont.):
Algorithm 11.1: Testing for Lossless Join Property (cont.)
4. Repeat the following loop until a complete loop execution results
in no changes to S
{for each functional dependency X -> Y in F
{for all rows in S which have the same symbols in the columns corresponding to
attributes in X
{make the symbols in each column that correspond to an attribute in Y be the
same in all these rows as follows: if any of the rows has an “a” symbol for the
column, set the other rows to that same “a” symbol in the column. If no “a”
symbol exists for the attribute in any of the rows, choose one of the “b”
symbols that appear in one of the rows for the attribute and set the other rows to
that same “b” symbol in the column ;};};};
5. If a row is made up entirely of “a” symbols, then the
decomposition has the lossless join property; otherwise it does not.

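The matrix test in Algorithm 11.1 can be sketched as follows. Two points are assumptions of this sketch rather than statements from the slides: when symbols are equated, the replaced symbol is updated throughout its column (the usual chase formulation), and the attribute lists for EMP_LOCS, EMP_PROJ1, EMP, PROJECT, and WORKS_ON are taken to be the usual COMPANY attributes.

```python
def lossless_join(attributes, decomposition, fds):
    """Matrix test for the lossless (nonadditive) join property."""
    col = {attr: j for j, attr in enumerate(attributes)}
    # Steps 1-3: S(i, j) is a_j if Ri contains Aj, otherwise the distinct b_ij.
    S = [
        [("a", j) if attr in ri else ("b", i, j) for j, attr in enumerate(attributes)]
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:  # Step 4: repeat until a full pass changes nothing
        changed = False
        for lhs, rhs in fds:
            groups = {}
            for row in S:
                key = tuple(row[col[a]] for a in sorted(lhs))
                groups.setdefault(key, []).append(row)
            for rows in groups.values():
                for attr in rhs:
                    j = col[attr]
                    symbols = {row[j] for row in rows}
                    if len(symbols) < 2:
                        continue
                    target = min(symbols)  # an "a" symbol sorts before any "b"
                    for row in S:  # equate the symbols throughout column j
                        if row[j] in symbols and row[j] != target:
                            row[j] = target
                            changed = True
    # Step 5: lossless if some row consists entirely of "a" symbols.
    return any(all(sym[0] == "a" for sym in row) for row in S)


R = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
CASE_1 = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
CASE_2 = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]

print(lossless_join(R, CASE_1, F), lossless_join(R, CASE_2, F))
```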
Properties of Relational Decompositions (8)
Lossless (nonadditive) join test for n-ary decompositions.
(a) Case 1: Decomposition of EMP_PROJ into EMP_PROJ1 and EMP_LOCS fails test. (b) A

decomposition of EMP_PROJ that has the lossless join property.

Properties of Relational Decompositions (8)
Lossless (nonadditive) join test for n-ary decompositions.
(c) Case 2: Decomposition of EMP_PROJ into EMP, PROJECT, and
WORKS_ON satisfies test.

Properties of Relational Decompositions (9)
Testing Binary Decompositions for Lossless Join
Property:
 Binary Decomposition: decomposition of a relation R
into two relations.
 PROPERTY LJ1 (lossless join test for binary
decompositions): A decomposition D = {R1, R2} of R
has the lossless join property with respect to a set of
functional dependencies F on R if and only if either
– The f.d. ((R1 ∩ R2) -> (R1 - R2)) is in F+, or
– The f.d. ((R1 ∩ R2) -> (R2 - R1)) is in F+.
Properties of Relational Decompositions (10)
Successive Lossless Join Decomposition:
 Claim 2 (Preservation of non-additivity in
successive decompositions):
If a decomposition D = {R1, R2, ..., Rm} of R has the lossless
(non-additive) join property with respect to a set of functional
dependencies F on R, and if a decomposition Di = {Q1, Q2, ...,
Qk} of Ri has the lossless (non-additive) join property with
respect to the projection of F on Ri, then the decomposition D2 =
{R1, R2, ..., Ri-1, Q1, Q2, ..., Qk, Ri+1, ..., Rm} of R has the non-
additive join property with respect to F.

2. Algorithms for Relational Database Schema
Design (1)
Algorithm 11.2: Relational Synthesis into 3NF with Dependency
Preservation (Relational Synthesis Algorithm)
Input: A universal relation R and a set of functional dependencies F on
the attributes of R.
1. Find a minimal cover G for F (use Algorithm 10.2);
2. For each left-hand-side X of a functional dependency that appears in
G, create a relation schema in D with attributes {X ∪ {A1} ∪ {A2} ...
∪ {Ak}}, where X -> A1, X -> A2, ..., X -> Ak are the only dependencies in
G with X as left-hand-side (X is the key of this relation);
3. Place any remaining attributes (that have not been placed in any
relation) in a single relation schema to ensure the attribute
preservation property.
Claim 3: Every relation schema created by Algorithm 11.2 is in 3NF.

Algorithms for Relational Database Schema
Design (2)
Algorithm 11.3: Relational Decomposition into BCNF with
Lossless (non-additive) join property
Input: A universal relation R and a set of functional dependencies F
on the attributes of R.
1. Set D := {R};
2. While there is a relation schema Q in D that is not in BCNF
do {
choose a relation schema Q in D that is not in BCNF;
find a functional dependency X -> Y in Q that violates BCNF;
replace Q in D by two relation schemas (Q - Y) and (X ∪ Y);
};
Assumption: No null values are allowed for the join attributes.
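The decomposition loop of Algorithm 11.3 can be sketched as below. The slide does not say how to find a violating FD inside a schema Q, so this sketch makes an assumption: it searches all proper subsets X of Q and uses the part of X+ that lies inside Q, which is exponential and only suitable for small examples. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(q, fds):
    """Return (X, Y) with X -> Y nontrivial in Q and X not a superkey of Q, or None."""
    for size in range(1, len(q)):
        for combo in combinations(sorted(q), size):
            x = set(combo)
            determined = closure(x, fds) & q  # what X determines inside Q
            y = determined - x
            if y and determined != q:
                return x, y
    return None


def bcnf_decompose(r, fds):
    """Split schemas until none of them violates BCNF."""
    pending = [frozenset(r)]
    done = []
    while pending:
        q = pending.pop()
        violation = find_violation(set(q), fds)
        if violation is None:
            done.append(q)
        else:
            x, y = violation
            pending.append(frozenset(q - y))  # (Q - Y)
            pending.append(frozenset(x | y))  # (X u Y)
    return set(done)


TEACH = {"student", "course", "instructor"}
FDS = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(bcnf_decompose(TEACH, FDS))
```

For TEACH the result is {instructor, course} and {instructor, student}, the third decomposition discussed at the end of Chapter 10.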
Algorithms for Relational Database Schema
Design (3)
Algorithm 11.4 Relational Synthesis into 3NF with Dependency
Preservation and Lossless (Non-Additive) Join Property
Input: A universal relation R and a set of functional dependencies F
on the attributes of R.
1. Find a minimal cover G for F (Use Algorithm 10.2).
2. For each left-hand-side X of a functional dependency that
appears in G, create a relation schema in D with attributes {X ∪
{A1} ∪ {A2} ... ∪ {Ak}}, where X -> A1, X -> A2, ..., X -> Ak are the
only dependencies in G with X as left-hand-side (X is the key of
this relation).
3. If none of the relation schemas in D contains a key of R, then
create one more relation schema in D that contains attributes
that form a key of R. (Use Algorithm 11.4a to find the key of R)

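Steps 2 and 3 of Algorithm 11.4 can be sketched as below. The sketch assumes that its input is already a minimal cover G (step 1 is not repeated here), and the function names and example data are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_key(r, fds):
    """Shrink R to a key by dropping attributes that the rest still determines."""
    key = set(r)
    for a in sorted(r):
        if closure(key - {a}, fds) >= set(r):
            key -= {a}
    return key


def synthesize_3nf(r, minimal_cover):
    """Steps 2 and 3: one schema per left-hand side, plus a key schema if needed."""
    grouped = {}
    for lhs, rhs in minimal_cover:
        grouped.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    schemas = [frozenset(s) for s in grouped.values()]
    # Step 3: a schema contains a key of R exactly when it is a superkey of R.
    if not any(closure(s, minimal_cover) >= set(r) for s in schemas):
        schemas.append(frozenset(find_key(r, minimal_cover)))
    return schemas


R = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
G = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME"}),
    ({"PNUMBER"}, {"PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
print(synthesize_3nf(R, G))
print(synthesize_3nf({"A", "B", "C"}, [({"A"}, {"B"})]))
```

In the second call no generated schema contains a key of R = {A, B, C}, so a key schema {A, C} is added.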
Algorithms for Relational Database Schema
Design (4)
Algorithm 11.4a Finding a Key K for R Given a set F of
Functional Dependencies
Input: A universal relation R and a set of functional dependencies F
on the attributes of R.
1. Set K := R.
2. For each attribute A in K {
compute (K - {A})+ with respect to F;
if (K - {A})+ contains all the attributes in R,
then set K := K - {A}; }

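Algorithm 11.4a translates almost line for line into code. In the sketch below, the function names and the choice to visit attributes in alphabetical order are assumptions. The algorithm returns a single key, and a different visiting order may return a different one when R has several keys.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_key(r, fds):
    """Start from K = R and drop each attribute the remaining ones determine."""
    key = set(r)                       # 1. Set K := R
    for a in sorted(r):                # 2. For each attribute A in K
        if closure(key - {a}, fds) >= set(r):
            key -= {a}                 #    K := K - {A}
    return key


R = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
print(sorted(find_key(R, F)))
```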
Algorithms for Relational Database Schema
Design (5)
Issues with null-value joins. (a) Some EMPLOYEE tuples have null for the join attribute

DNUM.

Algorithms for Relational Database Schema
Design (5)
Issues with null-value joins. (b) Result of applying NATURAL JOIN to the EMPLOYEE and
DEPARTMENT relations. (c) Result of applying LEFT OUTER JOIN to EMPLOYEE and
DEPARTMENT.

Algorithms for Relational Database Schema
Design (6)
The “dangling tuple” problem. (a) The relation EMPLOYEE_1 (includes all attributes of
EMPLOYEE from Figure 11.2a except DNUM).

Algorithms for Relational Database Schema
Design (6)
The “dangling tuple” problem. (b) The relation EMPLOYEE_2 (includes DNUM attribute with
null values). (c) The relation EMPLOYEE_3 (includes DNUM attribute but does not include
tuples for which DNUM has null values).

Algorithms for Relational Database Schema
Design (7)
Discussion of Normalization Algorithms:
Problems:
 The database designer must first specify all the relevant
functional dependencies among the database attributes.
 These algorithms are not deterministic in general.
 It is not always possible to find a decomposition into relation
schemas that preserves dependencies and allows each relation
schema in the decomposition to be in BCNF (instead of 3NF as
in Algorithm 11.4).

Algorithms for Relational Database Schema
Design (8)
Table 11.1 Summary of some of the algorithms discussed above
Algorithm | Input | Output | Properties/Purpose | Remarks
11.1 | A decomposition D of R and a set F of functional dependencies | Boolean result: yes or no for lossless join property | Testing for non-additive join decomposition | See a simpler test in Section 11.1.4 for binary decompositions
11.2 | Set of functional dependencies F | A set of relations in 3NF | Dependency preservation | No guarantee of satisfying lossless join property
11.3 | Set of functional dependencies F | A set of relations in BCNF | Lossless join decomposition | No guarantee of dependency preservation
11.4 | Set of functional dependencies F | A set of relations in 3NF | Lossless join and dependency preserving decomposition | May not achieve BCNF
11.4a | Relation schema R with a set of functional dependencies F | Key K of R | To find a key K (which is a subset of R) | The entire relation R is always a default superkey
3. Multivalued Dependencies and Fourth
Normal Form (1)
(a) The EMP relation with two MVDs: ENAME —>> PNAME and ENAME —>> DNAME. (b)
Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and
EMP_DEPENDENTS.

3. Multivalued Dependencies and Fourth
Normal Form (1)
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.

Multivalued Dependencies and Fourth Normal
Form (2)
Definition:
 A multivalued dependency (MVD) X —>> Y specified on relation
schema R, where X and Y are both subsets of R, specifies the
following constraint on any relation state r of R: If two tuples t1 and
t2 exist in r such that t1[X] = t2[X], then two tuples t3 and t4 should
also exist in r with the following properties, where we use Z to
denote (R – (X ∪ Y)):
· t3[X] = t4[X] = t1[X] = t2[X].
· t3[Y] = t1[Y] and t4[Y] = t2[Y].
· t3[Z] = t2[Z] and t4[Z] = t1[Z].
 An MVD X —>> Y in R is called a trivial MVD if (a) Y is a subset
of X, or (b) X ∪ Y = R.
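The MVD definition can be checked mechanically on a small relation state. The following is a minimal sketch, not part of the original slides: the function name satisfies_mvd and the sample EMP rows are illustrative assumptions. For every pair t1, t2 that agrees on X, it looks for the required tuple t3; t4 is covered because the pair is also visited in the opposite order.

```python
from itertools import product

def satisfies_mvd(rows, attrs, x, y):
    """Check whether a relation state satisfies the MVD X ->> Y.

    rows: list of dicts (the tuples of state r); attrs: all attributes of R.
    """
    x, y = list(x), list(y)
    z = [a for a in attrs if a not in x and a not in y]  # Z = R - (X u Y)
    state = {tuple(row[a] for a in attrs) for row in rows}

    for t1, t2 in product(rows, repeat=2):
        if any(t1[a] != t2[a] for a in x):
            continue  # the constraint only applies when t1[X] = t2[X]
        # t3 must agree with t1 on X and Y, and with t2 on Z.
        t3 = {a: (t2[a] if a in z else t1[a]) for a in attrs}
        if tuple(t3[a] for a in attrs) not in state:
            return False
    # t4 is covered because the loop also visits the pair (t2, t1).
    return True

# Hypothetical sample state in the style of the EMP relation.
ATTRS = ["ENAME", "PNAME", "DNAME"]
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(satisfies_mvd(emp, ATTRS, ["ENAME"], ["PNAME"]))      # all combinations present
print(satisfies_mvd(emp[:3], ATTRS, ["ENAME"], ["PNAME"]))  # (Smith, Y, John) is missing
```

Note that a check on one state can only refute an MVD; whether the MVD holds for the schema is a design decision about all legal states.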
Multivalued Dependencies and Fourth Normal
Form (3)
Inference Rules for Functional and Multivalued Dependencies:
IR1 (reflexive rule for FDs): If X ⊇ Y, then X –> Y.
IR2 (augmentation rule for FDs): {X –> Y} ⊨ XZ –> YZ.
IR3 (transitive rule for FDs): {X –> Y, Y –> Z} ⊨ X –> Z.
IR4 (complementation rule for MVDs): {X —>> Y} ⊨ {X —>> (R – (X ∪ Y))}.
IR5 (augmentation rule for MVDs): If X —>> Y and W ⊇ Z, then WX —>> YZ.
IR6 (transitive rule for MVDs): {X —>> Y, Y —>> Z} ⊨ X —>> (Z – Y).
IR7 (replication rule for FD to MVD): {X –> Y} ⊨ X —>> Y.
IR8 (coalescence rule for FDs and MVDs): If X —>> Y and there exists W with
the properties that (a) W ∩ Y is empty, (b) W –> Z, and (c) Y ⊇ Z, then X –> Z.

Multivalued Dependencies and Fourth Normal
Form (4)
Definition:
 A relation schema R is in 4NF with respect to a set of
dependencies F (that includes functional dependencies
and multivalued dependencies) if, for every nontrivial
multivalued dependency X —>> Y in F+, X is a superkey
for R.

Note: F+ is the (complete) set of all dependencies
(functional or multivalued) that will hold in every
relation state r of R that satisfies F. It is also called the
closure of F.

Multivalued Dependencies and Fourth Normal
Form (5)
Decomposing a relation state of EMP that is not in 4NF. (a) EMP relation with additional
tuples. (b) Two corresponding 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.

Multivalued Dependencies and Fourth Normal
Form (6)
Lossless (Non-additive) Join Decomposition into 4NF
Relations:
 PROPERTY LJ1’
The relation schemas R1 and R2 form a lossless (non-additive)
join decomposition of R with respect to a set F of functional and
multivalued dependencies if and only if
(R1 ∩ R2) —>> (R1 - R2)
or by symmetry, if and only if
(R1 ∩ R2) —>> (R2 - R1).

Multivalued Dependencies and Fourth Normal
Form (7)
Algorithm 11.5: Relational decomposition into 4NF
relations with non-additive join property
Input: A universal relation R and a set of functional and multivalued
dependencies F.
1. Set D := { R };
2. While there is a relation schema Q in D that is not in 4NF do
{ choose a relation schema Q in D that is not in 4NF;
find a nontrivial MVD X —>> Y in Q that violates 4NF;
replace Q in D by two relation schemas (Q - Y) and (X ∪ Y);
};
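The loop of Algorithm 11.5 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the textbook's implementation: it only examines the dependencies listed explicitly (projected onto each schema Q) rather than the full closure F+, and it tests the superkey condition with the attribute closure under the FDs alone. The helper names closure, find_violation, and decompose_4nf are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of FDs given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(q, fds, mvds):
    """Return (X, Y) for a listed dependency that violates 4NF in Q, or None."""
    for lhs, rhs in list(mvds) + list(fds):  # IR7: every FD is also an MVD
        x = frozenset(lhs)
        if not x <= q:
            continue
        y = (frozenset(rhs) & q) - x         # the MVD projected onto Q
        if not y or x | y == q:              # trivial within Q
            continue
        if q <= closure(x, fds):             # X is a superkey of Q
            continue
        return x, y
    return None

def decompose_4nf(r, fds, mvds):
    """Step 1: D := {R}; step 2: split any Q that violates 4NF."""
    d = [frozenset(r)]
    while True:
        for q in d:
            violation = find_violation(q, fds, mvds)
            if violation:
                x, y = violation
                d.remove(q)
                d += [q - y, x | y]          # replace Q by (Q - Y) and (X u Y)
                break
        else:
            return set(d)

# Hypothetical input in the style of the EMP example.
result = decompose_4nf(
    {"ENAME", "PNAME", "DNAME"},
    fds=[],
    mvds=[({"ENAME"}, {"PNAME"})],
)
print(sorted(sorted(schema) for schema in result))
```

Each split yields two strictly smaller schemas, so the loop terminates; which decomposition results can depend on the order in which violations are found.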

4. Join Dependencies and Fifth Normal Form
(1)
Definition:
 A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified
on relation schema R, specifies a constraint on the states r of R.
The constraint states that every legal state r of R should have a
non-additive join decomposition into R1, R2, ..., Rn; that is, for
every such r we have
* (πR1(r), πR2(r), ..., πRn(r)) = r
Note: an MVD is a special case of a JD where n = 2.

 A join dependency JD(R1, R2, ..., Rn), specified on relation
schema R, is a trivial JD if one of the relation schemas Ri in
JD(R1, R2, ..., Rn) is equal to R.
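The JD condition (joining the projections gives back exactly r) can be tested on a sample state. The sketch below is illustrative only: the function names and the SUPPLY-style rows are assumptions made for the example, and rows are modeled as Python dicts.

```python
def project(rows, names):
    """Projection onto `names`, as a set of frozensets of (attribute, value) pairs."""
    return {frozenset((a, row[a]) for a in names) for row in rows}

def natural_join(left, right):
    """Natural join of two sets of tuples represented as frozensets of pairs."""
    joined = set()
    for l in left:
        for r in right:
            merged = dict(l)
            # Tuples join only if they agree on every shared attribute.
            if all(merged.get(a, v) == v for a, v in r):
                merged.update(r)
                joined.add(frozenset(merged.items()))
    return joined

def satisfies_jd(rows, schemas):
    """True if *(projections of rows onto each schema) equals rows."""
    result = project(rows, schemas[0])
    for schema in schemas[1:]:
        result = natural_join(result, project(rows, schema))
    return result == {frozenset(row.items()) for row in rows}

# Hypothetical SUPPLY-style state.
def row(s, p, j):
    return {"SNAME": s, "PARTNAME": p, "PROJNAME": j}

supply = [
    row("Smith", "Bolt", "ProjX"), row("Smith", "Nut", "ProjY"),
    row("Adamsky", "Bolt", "ProjY"), row("Walton", "Nut", "ProjZ"),
    row("Adamsky", "Nail", "ProjX"), row("Adamsky", "Bolt", "ProjX"),
    row("Smith", "Bolt", "ProjY"),
]
schemas = [["SNAME", "PARTNAME"], ["SNAME", "PROJNAME"], ["PARTNAME", "PROJNAME"]]
print(satisfies_jd(supply, schemas))      # the seven rows equal the three-way join
print(satisfies_jd(supply[:5], schemas))  # the join produces two spurious rows
```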
Join Dependencies and Fifth Normal Form (2)
Definition:
 A relation schema R is in fifth normal form (5NF) (or
Project-Join Normal Form (PJNF)) with respect to a
set F of functional, multivalued, and join dependencies
if, for every nontrivial join dependency JD(R1, R2, ...,
Rn) in F+ (that is, implied by F), every Ri is a superkey
of R.

Relation SUPPLY with Join Dependency and
conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.

5. Inclusion Dependencies (1)
Definition:
 An inclusion dependency R.X < S.Y between two sets of
attributes, X of relation schema R and Y of relation schema S,
specifies the constraint that, at any specific time when r is a
relation state of R and s a relation state of S, we must have
πX(r(R)) ⊆ πY(s(S))

Note: The ⊆ (subset) relationship does not necessarily have to be
a proper subset. The sets of attributes on which the inclusion
dependency is specified (X of R and Y of S) must have the
same number of attributes. In addition, the domains for each
pair of corresponding attributes should be compatible.
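An inclusion dependency amounts to a subset test between two projections. The following minimal sketch uses hypothetical EMPLOYEE and DEPARTMENT rows and an assumed function name; it does not model null values or domain compatibility.

```python
def satisfies_inclusion(r_rows, x, s_rows, y):
    """Check R.X < S.Y: the projection of r on X is a subset of the projection of s on Y."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same number of attributes")
    proj_r = {tuple(row[a] for a in x) for row in r_rows}
    proj_s = {tuple(row[b] for b in y) for row in s_rows}
    return proj_r <= proj_s  # subset, not necessarily proper

# Hypothetical states: every EMPLOYEE.DNUM must appear as a DEPARTMENT.DNUMBER.
employee = [{"SSN": "111", "DNUM": 5}, {"SSN": "222", "DNUM": 4}]
department = [{"DNUMBER": 4}, {"DNUMBER": 5}, {"DNUMBER": 1}]
print(satisfies_inclusion(employee, ["DNUM"], department, ["DNUMBER"]))

# A dangling reference (department 9 does not exist) violates the dependency.
employee_bad = employee + [{"SSN": "333", "DNUM": 9}]
print(satisfies_inclusion(employee_bad, ["DNUM"], department, ["DNUMBER"]))
```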
Inclusion Dependencies (2)
Objective of Inclusion Dependencies:
To formalize two types of interrelational constraints which cannot
be expressed using FDs or MVDs:
– Referential integrity constraints
– Class/subclass relationships
 Inclusion dependency inference rules
IDIR1 (reflexivity): R.X < R.X.
IDIR2 (attribute correspondence): If R.X < S.Y
where X = {A1, A2 ,..., An} and Y = {B1,
B2, ..., Bn} and Ai Corresponds-to Bi, then R.Ai < S.Bi
for 1 ≤ i ≤ n.
IDIR3 (transitivity): If R.X < S.Y and S.Y < T.Z, then R.X < T.Z.

6. Other Dependencies and Normal Forms (1)
Template Dependencies:
 Template dependencies provide a technique for representing
constraints in relations that typically have no easy and formal
definitions.
 The idea is to specify a template—or example—that defines
each constraint or dependency.
 There are two types of templates: tuple-generating templates
and constraint-generating templates.
 A template consists of a number of hypothesis tuples that are
meant to show an example of the tuples that may appear in one
or more relations. The other part of the template is the template
conclusion.
Other Dependencies and Normal Forms (2)
Templates for some common types of dependencies. (a) Template for functional
dependency X –> Y. (b) Template for the multivalued dependency X —>> Y.
(c) Template for the inclusion dependency R.X < S.Y.

Other Dependencies and Normal Forms (3)
Templates for the constraint that an employee’s salary must be less than the supervisor’s
salary.

Other Dependencies and Normal Forms (4)
Domain-Key Normal Form (DKNF):
 Definition: A relation schema is said to be in DKNF if all
constraints and dependencies that should hold on the valid
relation states can be enforced simply by enforcing the domain
constraints and key constraints on the relation.
 The idea is to specify (theoretically, at least) the “ultimate
normal form” that takes into account all possible types of
dependencies and constraints.
 For a relation in DKNF, it becomes very straightforward to
enforce all database constraints by simply checking that each
attribute value in a tuple is of the appropriate domain and that
every key constraint is enforced.
 The practical utility of DKNF is limited.
Chapter 12
Practical Database
Design Methodology and
Use of UML Diagrams



The Role of Information
Systems in Organizations
 The Organizational Context for Using
Database Systems
 The Information System Life Cycle
 The Database Application System Life
Cycle

The Database Design and
Implementation Process
 Phase 1: Requirements Collection and Analysis
 Phase 2: Conceptual Database Design
 Phase 3: Choice of DBMS
 Phase 4: Data Model Mapping (Logical Database
Design)
 Phase 5: Physical Database Design
 Phase 6: Database System Implementation and
Tuning

Use of UML Diagrams as an
Aid to Database Design
Specification
 UML As a Design Specification Standard
 UML for Database Application Design
 Different Diagrams in UML
 A Modeling and Design Example:
University Database

Rational Rose, A UML
Based Design Tool
 Rational Rose for Database Design
 Rational Rose Data Modeler
 Data Modeling Using Rational Rose Data
Modeler

Automated Database Design
Tools

Summary

Chapter 13
Disk Storage, Basic File Structures, and
Hashing.

Chapter Outline
 Disk Storage Devices
 Files of Records
 Operations on Files
 Unordered Files
 Ordered Files
 Hashed Files
– Dynamic and Extendible Hashing Techniques
 RAID Technology
Disk Storage Devices (cont.)
 Preferred secondary storage device for high
storage capacity and low cost.
 Data stored as magnetized areas on magnetic
disk surfaces.
 A disk pack contains several magnetic disks
connected to a rotating spindle.
 Disks are divided into concentric circular
tracks on each disk surface. Track capacities
vary typically from 4 to 50 Kbytes.
Disk Storage Devices (cont.)
Because a track usually contains a large amount of
information, it is divided into smaller blocks or
sectors.
 The division of a track into sectors is hard-coded
on the disk surface and cannot be changed. One
type of sector organization defines a sector as the portion
of a track that subtends a fixed angle at the center.
 A track is divided into blocks. The block size B is
fixed for each system. Typical block sizes range
from B=512 bytes to B=4096 bytes. Whole blocks
are transferred between disk and main memory for
processing.
Disk Storage Devices (cont.)

Disk Storage Devices (cont.)
 A read-write head moves to the track that contains the block to
be transferred. Disk rotation moves the block under the read-
write head for reading or writing.
 A physical disk block (hardware) address consists of a cylinder
number (imaginary collection of tracks of the same radius from all
recorded surfaces), the track number or surface number (within
the cylinder), and block number (within track).
 Reading or writing a disk block is time consuming because of the
seek time s and rotational delay (latency) rd.
 Double buffering can be used to speed up the transfer of
contiguous disk blocks.

Disk Storage Devices (cont.)

Typical Disk Parameters

Records
 Fixed and variable length records
 Records contain fields which have values of a
particular type (e.g., amount, date, time, age)
 Fields themselves may be fixed length or
variable length
 Variable length fields can be mixed into one
record: separator characters or length fields are
needed so that the record can be “parsed”.
Blocking
 Blocking: refers to storing a number of records in
one block on the disk.
 Blocking factor (bfr) refers to the number of records
per block.
 There may be empty space in a block if an integral
number of records do not fit in one block.
 Spanned Records: refer to records that exceed the
size of one or more blocks and hence span a number
of blocks.
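The effect of the blocking factor and of spanning on file size can be worked through numerically. This is a rough sketch under simplifying assumptions: fixed-length records, bfr taken as the integer quotient of block size and record size, and no per-block pointer overhead for the spanned case. The function names and sample sizes are illustrative.

```python
def blocking_factor(block_size, record_size):
    """Records per block for fixed-length, unspanned records."""
    return block_size // record_size

def unused_bytes_per_block(block_size, record_size):
    """Space left empty when an integral number of records does not fill the block."""
    return block_size - blocking_factor(block_size, record_size) * record_size

def unspanned_blocks(num_records, record_size, block_size):
    bfr = blocking_factor(block_size, record_size)
    return (num_records + bfr - 1) // bfr            # ceiling of r / bfr

def spanned_blocks(num_records, record_size, block_size):
    total_bytes = num_records * record_size
    return (total_bytes + block_size - 1) // block_size  # records may cross blocks

B, R, r = 512, 150, 30000   # hypothetical block size, record size, record count
print(blocking_factor(B, R))            # 3 records per block
print(unused_bytes_per_block(B, R))     # 62 bytes wasted per block
print(unspanned_blocks(r, R, B))        # 10000 blocks
print(spanned_blocks(r, R, B))          # 8790 blocks
```

Spanning saves space here because 62 of every 512 bytes would otherwise go unused, at the cost of some records needing two block reads.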
Files of Records
 A file is a sequence of records, where each record is a
collection of data values (or data items).
 A file descriptor (or file header ) includes information
that describes the file, such as the field names and their
data types, and the addresses of the file blocks on disk.
 Records are stored on disk blocks. The blocking factor
bfr for a file is the (average) number of file records
stored in a disk block.
 A file can have fixed-length records or variable-length
records.
Files of Records (cont.)
 File records can be unspanned (no record can span two
blocks) or spanned (a record can be stored in more than
one block).
 The physical disk blocks that are allocated to hold the
records of a file can be contiguous, linked, or indexed.
 In a file of fixed-length records, all records have the
same format. Usually, unspanned blocking is used with
such files.
 Files of variable-length records require additional
information to be stored in each record, such as
separator characters and field types. Usually spanned
blocking is used with such files.
Operation on Files
Typical file operations include:
 OPEN: Readies the file for access, and associates a
pointer that will refer to a current file record at each
point in time.
 FIND: Searches for the first file record that satisfies a
certain condition, and makes it the current file record.
 FINDNEXT: Searches for the next file record (from the
current record) that satisfies a certain condition, and
makes it the current file record.
 READ: Reads the current file record into a program
variable.
 INSERT: Inserts a new record into the file, and makes it
the current file record.
Operation on Files (cont.)
 DELETE: Removes the current file record from the
file, usually by marking the record to indicate that it
is no longer valid.
 MODIFY: Changes the values of some fields of the
current file record.
 CLOSE: Terminates access to the file.
 REORGANIZE: Reorganizes the file records. For
example, the records marked deleted are physically
removed from the file or a new organization of the
file records is created.
 READ_ORDERED: Read the file blocks in order of
a specific field of the file.
Unordered Files
 Also called a heap or a pile file.
 New records are inserted at the end of the file.
 To search for a record, a linear search through the
file records is necessary. This requires reading and
searching half the file blocks on the average, and is
hence quite expensive.
 Record insertion is quite efficient.
 Reading the records in order of a particular field
requires sorting the file records.
Ordered Files
 Also called a sequential file.
 File records are kept sorted by the values of an ordering
field.
 Insertion is expensive: records must be inserted in the
correct order. It is common to keep a separate unordered
overflow (or transaction ) file for new records to improve
insertion efficiency; this is periodically merged with the
main ordered file.
 A binary search can be used to search for a record on its
ordering field value. This requires reading and searching
about log2(b) of the b file blocks on the average, an improvement
over linear search.
 Reading the records in order of the ordering field is quite
efficient.
Ordered Files
(cont.)

Average Access Times
The following table shows the average access time
to access a specific record for a given type of file

Hashed Files
 Hashing for disk files is called External Hashing
 The file blocks are divided into M equal-sized buckets, numbered
bucket0, bucket1, ..., bucket M-1. Typically, a bucket corresponds to
one (or a fixed number of) disk block.
 One of the file fields is designated to be the hash key of the file.
 The record with hash key value K is stored in bucket i, where
i=h(K), and h is the hashing function.
 Search is very efficient on the hash key.
 Collisions occur when a new record hashes to a bucket that is
already full. An overflow file is kept for storing such records.
Overflow records that hash to each bucket can be linked together.
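The bucket scheme described above can be modeled in memory. The sketch below is a toy illustration, not an actual disk file: the class name HashedFile, the mod-M hash function, and the per-bucket overflow lists are assumptions chosen for brevity.

```python
class HashedFile:
    """Toy model of static external hashing with M fixed-capacity buckets."""

    def __init__(self, num_buckets, bucket_capacity):
        self.m = num_buckets
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]
        # One overflow chain per bucket, standing in for linked overflow records.
        self.overflow = [[] for _ in range(num_buckets)]

    def _h(self, key):
        return key % self.m  # assumed hash function h(K) = K mod M

    def insert(self, key, record):
        i = self._h(key)
        if len(self.buckets[i]) < self.capacity:
            self.buckets[i].append((key, record))
        else:
            self.overflow[i].append((key, record))  # collision: bucket is full

    def search(self, key):
        i = self._h(key)
        # Only bucket i and its overflow chain need to be examined.
        for k, record in self.buckets[i] + self.overflow[i]:
            if k == key:
                return record
        return None

f = HashedFile(num_buckets=4, bucket_capacity=2)
for key in (1, 5, 9, 13):           # all four keys hash to bucket 1
    f.insert(key, f"rec{key}")
print(len(f.buckets[1]), len(f.overflow[1]))
print(f.search(13))
```

A search touches one bucket plus its overflow chain, which is why long chains from a skewed hash function increase search time.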

Hashed Files (cont.)
There are numerous methods for collision resolution, including the
following:
 Open addressing: Proceeding from the occupied position specified
by the hash address, the program checks the subsequent positions in
order until an unused (empty) position is found.
 Chaining: For this method, various overflow locations are kept,
usually by extending the array with a number of overflow positions.
In addition, a pointer field is added to each record location. A
collision is resolved by placing the new record in an unused
overflow location and setting the pointer of the occupied hash
address location to the address of that overflow location.
 Multiple hashing: The program applies a second hash function if
the first results in a collision. If another collision results, the
program uses open addressing or applies a third hash function and
then uses open addressing if necessary.
Hashed Files (cont.)

Hashed Files (cont.)
 To reduce overflow records, a hash file is typically
kept 70-80% full.
 The hash function h should distribute the records
uniformly among the buckets; otherwise, search
time will be increased because many overflow
records will exist.
 Main disadvantages of static external hashing:
- Fixed number of buckets M is a problem if the
number of records in the file grows or shrinks.
- Ordered access on the hash key is quite inefficient
(requires sorting the records).
Hashed Files - Overflow handling

Dynamic And Extendible Hashed
Files
Dynamic and Extendible Hashing Techniques
 Hashing techniques are adapted to allow the dynamic
growth and shrinking of the number of file records.
 These techniques include the following: dynamic
hashing , extendible hashing , and linear hashing .
 Both dynamic and extendible hashing use the binary
representation of the hash value h(K) in order to
access a directory. In dynamic hashing the directory
is a binary tree. In extendible hashing the directory is
an array of size 2^d where d is called the global depth.
Dynamic And Extendible Hashing
(cont.)
 The directories can be stored on disk, and they expand or shrink
dynamically. Directory entries point to the disk blocks that
contain the stored records.
 An insertion in a disk block that is full causes the block to split
into two blocks and the records are redistributed among the two
blocks. The directory is updated appropriately.
 Dynamic and extendible hashing do not require an overflow
area.
 Linear hashing does require an overflow area but does not use a
directory. Blocks are split in linear order as the file expands.

Extendible
Hashing

Parallelizing Disk Access using
RAID Technology.
 Secondary storage technology must take steps to keep up in
performance and reliability with processor technology.
 A major advance in secondary storage technology is
represented by the development of RAID, which originally
stood for Redundant Arrays of Inexpensive Disks.
 The main goal of RAID is to even out the widely different rates
of performance improvement of disks against those in memory
and microprocessors.

RAID Technology (cont.)
 A natural solution is a large array of small
independent disks acting as a single higher-
performance logical disk. A concept called
data striping is used, which utilizes
parallelism to improve disk performance.
 Data striping distributes data transparently over
multiple disks to make them appear as a single
large, fast disk.

RAID Technology (cont.)
Different RAID organizations were defined based on different
combinations of the two factors of granularity of data interleaving
(striping) and pattern used to compute redundant information.
 RAID level 0 has no redundant data and hence has the best write performance.
 RAID level 1 uses mirrored disks.
 RAID level 2 uses memory-style redundancy by using Hamming codes, which
contain parity bits for distinct overlapping subsets of components. Level 2
includes both error detection and correction.
 RAID level 3 uses a single parity disk relying on the disk controller to figure out
which disk has failed.
 RAID levels 4 and 5 use block-level data striping, with level 5 distributing data
and parity information across all disks.
 RAID level 6 applies the so-called P + Q redundancy scheme using Reed-Solomon
codes to protect against up to two disk failures by using just two redundant disks.
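The single-parity idea behind the parity-based levels can be illustrated with bytewise XOR. This is a minimal sketch with made-up four-byte blocks; it assumes the failed disk is already known (as the slides note, the controller identifies it) and does not model striping layout or any real controller.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical blocks, one per data disk.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)            # stored on the parity disk

# Simulate losing disk 1: XOR of the surviving blocks and the parity restores it.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])
```

Because XOR is its own inverse, one redundant block protects against the loss of any single known disk; protecting against two failures needs a second, independent code such as the P + Q scheme.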

Use of RAID Technology (cont.)
Different RAID organizations are used in different
situations:
 RAID level 1 (mirrored disks) is the easiest for rebuilding a disk from other
disks
– It is used for critical applications like logs
 RAID level 2 uses memory-style redundancy by using Hamming codes,
which contain parity bits for distinct overlapping subsets of components.
Level 2 includes both error detection and correction.
 RAID level 3 (single parity disk relying on the disk controller to figure out
which disk has failed) and level 5 (block-level data striping) are preferred for
large-volume storage, with level 3 giving higher transfer rates.
 The most popular uses of RAID technology currently are: level 0 (with
striping), level 1 (with mirroring), and level 5 with an extra drive for parity.
 Design decisions for RAID include the level of RAID, number of disks,
choice of parity schemes, and grouping of disks for block-level striping.

Use of RAID
Technology
(cont.)

Trends in Disk Technology

Storage Area Networks
 The demand for higher storage has risen considerably in
recent times.
 Organizations have a need to move from a static fixed
data center oriented operation to a more flexible and
dynamic infrastructure for information processing.
 Thus they are moving to a concept of Storage Area
Networks (SANs). In a SAN, online storage peripherals
are configured as nodes on a high-speed network and
can be attached and detached from servers in a very
flexible manner.
 This allows storage systems to be placed at longer
distances from the servers and provide different
performance and connectivity options.
Storage Area Networks (contd.)
Advantages of SANs are:
– Flexible many-to-many connectivity among servers and
storage devices using fiber channel hubs and switches.
– Up to 10km separation between a server and a storage
system using appropriate fiber optic cables.
– Better isolation capabilities allowing nondisruptive
addition of new peripherals and servers.
 SANs face the problem of combining storage
options from multiple vendors and dealing with
evolving standards of storage management
software and hardware.
Chapter 14

Indexing Structures for
Files



Copyright © 2004 Ramez Elmasri and Shamkant Navathe
Chapter Outline
 Types of Single-level Ordered Indexes
– Primary Indexes
– Clustering Indexes
– Secondary Indexes
 Multilevel Indexes
 Dynamic Multilevel Indexes Using B-Trees
and B+-Trees
 Indexes on Multiple Keys
Indexes as Access Paths

– A single-level index is an auxiliary file that makes
it more efficient to search for a record in the data
file.
– The index is usually specified on one field of the
file (although it could be specified on several fields)
– One form of an index is a file of entries <field
value, pointer to record>, which is ordered by
field value
– The index is called an access path on the field.

Indexes as Access Paths (contd.)
– The index file usually occupies considerably fewer
disk blocks than the data file because its entries are
much smaller
– A binary search on the index yields a pointer to the
file record
– Indexes can also be characterized as dense or
sparse.
A dense index has an index entry for every search key
value (and hence every record) in the data file.
A sparse (or nondense) index, on the other hand, has
index entries for only some of the search values
Indexes as Access Paths (contd.)
Example: Given the following data file:
EMPLOYEE(NAME, SSN, ADDRESS, JOB, SAL, ... )
Suppose that:
record size R = 150 bytes
block size B = 512 bytes
r = 30000 records

Then, we get:
blocking factor Bfr = B div R = 512 div 150 = 3 records/block
number of file blocks b = ⌈r/Bfr⌉ = ⌈30000/3⌉ = 10000 blocks

For an index on the SSN field, assume the field size V_SSN = 9 bytes and
the record pointer size P_R = 7 bytes. Then:
index entry size R_I = (V_SSN + P_R) = (9 + 7) = 16 bytes
index blocking factor Bfr_I = B div R_I = 512 div 16 = 32 entries/block
number of index blocks b_I = ⌈r/Bfr_I⌉ = ⌈30000/32⌉ = 938 blocks
binary search needs ⌈log2 b_I⌉ = ⌈log2 938⌉ = 10 block accesses

This is compared to an average linear search cost of:
(b/2) = 10000/2 = 5000 block accesses
If the file records are ordered, the binary search cost would be:
⌈log2 b⌉ = ⌈log2 10000⌉ = 14 block accesses
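The arithmetic in this example can be rechecked with a short script. This is only a sketch: the variable names are illustrative, and it assumes a dense index with one entry per record. Note that the linear and ordered-file search costs are taken over the b = 10000 data blocks, which gives 5000 and 14 block accesses respectively.

```python
import math

# Parameters taken from the example.
R, B, r = 150, 512, 30000          # record size, block size, record count
V_SSN, P_R = 9, 7                  # key field size, record pointer size

bfr = B // R                       # data blocking factor (records per block)
b = math.ceil(r / bfr)             # number of data file blocks
R_i = V_SSN + P_R                  # index entry size in bytes
bfr_i = B // R_i                   # index blocking factor (entries per block)
b_i = math.ceil(r / bfr_i)         # index blocks, one entry per record

index_search = math.ceil(math.log2(b_i))    # binary search on the index
linear_search = b // 2                      # average linear scan of the data file
ordered_search = math.ceil(math.log2(b))    # binary search on an ordered data file

print(bfr, b, bfr_i, b_i, index_search, linear_search, ordered_search)
```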
Types of Single-Level Indexes

 Primary Index
– Defined on an ordered data file
– The data file is ordered on a key field
– Includes one index entry for each block in the data file; the
index entry has the key field value for the first record in the
block, which is called the block anchor
– A similar scheme can use the last record in a block.
– A primary index is a nondense (sparse) index, since it
includes an entry for each disk block of the data file and the
keys of its anchor record rather than for every search value.
FIGURE 14.1
Primary index on the ordering key field of the file shown in Figure 13.7.

Types of Single-Level Indexes

 Clustering Index

– Defined on an ordered data file

– The data file is ordered on a non-key field, unlike a primary
index, which requires that the ordering field of the data file
have a distinct value for each record.
– Includes one index entry for each distinct value of the field;
the index entry points to the first data block that contains
records with that field value.
– It is another example of a nondense index. Insertion and
deletion are relatively straightforward when a separate block cluster
is reserved for each value of the clustering field (see Figure 14.3).
FIGURE 14.2
A clustering index on the
DEPTNUMBER ordering nonkey
field of an EMPLOYEE file.

FIGURE 14.3
Clustering index with a
separate block cluster
for each group of
records that share the
same value for the
clustering field.

Types of Single-Level Indexes
 Secondary Index
– A secondary index provides a secondary means of accessing a file
for which some primary access already exists.
– The secondary index may be on a field which is a candidate key and
has a unique value in every record, or a nonkey with duplicate
values.
– The index is an ordered file with two fields.
 The first field is of the same data type as some nonordering
field of the data file that is an indexing field.
 The second field is either a block pointer or a record pointer.
There can be many secondary indexes (and hence, indexing
fields) for the same file.
– Includes one entry for each record in the data file; hence, it is a
dense index
FIGURE 14.4
A dense secondary index (with block pointers) on a nonordering key field of a file.

FIGURE 14.5
A secondary index (with record pointers) on a nonkey field implemented using
one level of indirection so that index entries are of fixed length and have unique
field values.

Multi-Level Indexes
 Because a single-level index is an ordered file, we can
create a primary index to the index itself ; in this case,
the original index file is called the first-level index and
the index to the index is called the second-level index.
 We can repeat the process, creating a third, fourth, ...,
top level until all entries of the top level fit in one disk
block
 A multi-level index can be created for any type of first-
level index (primary, secondary, clustering) as long as
the first-level index consists of more than one disk
block
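The repeated "index on the index" construction can be sketched as a loop that keeps dividing by the index blocking factor (the fan-out) until one block remains. The function name and the choice of 30000 first-level entries with a fan-out of 32 are assumptions, borrowed from the earlier single-level example.

```python
import math

def index_levels(first_level_entries: int, fan_out: int) -> list[int]:
    """Return the number of blocks at each index level, bottom to top."""
    blocks = math.ceil(first_level_entries / fan_out)
    levels = [blocks]
    while blocks > 1:
        # Each block of the level below needs one entry in the level above.
        blocks = math.ceil(blocks / fan_out)
        levels.append(blocks)
    return levels

print(index_levels(30000, 32))   # blocks per level, first level first
```

Under these assumptions, a search reads one block per level plus one data block, instead of performing a binary search over the first-level blocks.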

FIGURE 14.6
A two-level primary index resembling ISAM (Indexed Sequential Access Method) organization.

Multi-Level Indexes

 Such a multi-level index is a form of search tree;
however, insertion and deletion of index entries is a
severe problem because every level of the index is an
ordered file.

FIGURE 14.8
A node in a search tree with pointers to subtrees below
it.

FIGURE 14.9
A search tree of order p = 3.

Dynamic Multilevel Indexes Using B-Trees
and B+-Trees
 Because of the insertion and deletion problem, most
multi-level indexes use B-tree or B+-tree data
structures, which leave space in each tree node (disk
block) to allow for new index entries
 These data structures are variations of search trees that
allow efficient insertion and deletion of new search
values.
 In B-Tree and B+-Tree data structures, each node
corresponds to a disk block
 Each node is kept between half-full and completely full

Dynamic Multilevel Indexes Using B-Trees
and B+-Trees (contd.)
 An insertion into a node that is not full is quite
efficient; if a node is full the insertion causes a split
into two nodes
 Splitting may propagate to other tree levels
 A deletion is quite efficient if a node does not become
less than half full
 If a deletion causes a node to become less than half
full, it must be merged with neighboring nodes
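A minimal sketch of the split step for a single B+-tree leaf is shown below. It is not a full tree implementation: the function name, the list-based leaf, and the `capacity` parameter are assumptions made for illustration, and propagation of the separator into the parent is left out.

```python
import bisect

def insert_into_leaf(leaf: list[int], key: int, capacity: int):
    """Insert key into a sorted leaf; split the leaf if it overflows.

    Returns (left, right, separator). right and separator are None
    when the leaf had room and no split was needed.
    """
    bisect.insort(leaf, key)
    if len(leaf) <= capacity:
        return leaf, None, None
    mid = (len(leaf) + 1) // 2          # the left half keeps the extra entry
    left, right = leaf[:mid], leaf[mid:]
    # The largest key of the left leaf is copied up to the parent node.
    return left, right, left[-1]

print(insert_into_leaf([1, 5], 8, capacity=2))
```

If the parent is itself full when the separator arrives, the same split is applied one level up, which is how splitting propagates.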

Difference between B-tree and B+-tree

 In a B-tree, pointers to data records exist at all levels
of the tree
 In a B+-tree, all pointers to data records exist at the
leaf-level nodes
 A B+-tree can have fewer levels (or a higher capacity of
search values) than the corresponding B-tree

FIGURE 14.10
B-tree structures. (a) A node in a B-tree with q – 1 search
values. (b) A B-tree of order p = 3. The values were inserted
in the order 8, 5, 1, 7, 3, 12, 9, 6.

FIGURE 14.11
The nodes of a B+-tree. (a) Internal node of a B+-tree with q –1 search values.
(b) Leaf node of a B+-tree with q – 1 search values and q – 1 data pointers.

FIGURE 14.12
An example of insertion in a B+-tree with q = 3 and p_leaf = 2.

FIGURE 14.13
An example of
deletion from a
B+-tree.

Chapter 15
Algorithms for Query Processing and
Optimization

Chapter Outline (1)

0. Introduction to Query Processing
1. Translating SQL Queries into Relational Algebra
2. Algorithms for External Sorting
3. Algorithms for SELECT and JOIN Operations
4. Algorithms for PROJECT and SET Operations
5. Implementing Aggregate Operations and Outer Joins
6. Combining Operations using Pipelining
7. Using Heuristics in Query Optimization

Chapter Outline (2)

8. Using Selectivity and Cost Estimates in Query Optimization
9. Overview of Query Optimization in Oracle
10. Semantic Query Optimization

0. Introduction to Query Processing (1)

 Query optimization: the process of choosing a
suitable execution strategy for processing a query.
 Two internal representations of a query
– Query Tree
– Query Graph

Introduction to Query Processing (2)

Note: The above figure is now called Figure 15.1 in Edition 4


1. Translating SQL Queries into Relational
Algebra (1)
 Query block: the basic unit that can be translated
into the algebraic operators and optimized.
 A query block contains a single SELECT-FROM-
WHERE expression, as well as GROUP BY and
HAVING clause if these are part of the block.
 Nested queries within a query are identified as
separate query blocks.
 Aggregate operators in SQL must be included in
the extended algebra.

Translating SQL Queries into Relational
Algebra (2)
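SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ( SELECT MAX (SALARY)
                 FROM EMPLOYEE
                 WHERE DNO = 5);

The query is decomposed into two blocks.

Inner block:
SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5
Relational algebra: ℱ_MAX SALARY (σ_DNO=5 (EMPLOYEE))

Outer block (C stands for the result returned by the inner block):
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > C
Relational algebra: π_LNAME, FNAME (σ_SALARY>C (EMPLOYEE))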


2. Algorithms for External Sorting (1)
 External sorting: refers to sorting algorithms that are
suitable for large files of records stored on disk that do not
fit entirely in main memory, such as most database files.
 Sort-Merge strategy: starts by sorting small subfiles
(runs) of the main file and then merges the sorted runs,
creating larger sorted subfiles that are merged in turn.
– Sorting phase: nR = ⌈b/nB⌉
– Merging phase: dM = min(nB − 1, nR); nP = ⌈log_dM(nR)⌉
nR: number of initial runs; b: number of file blocks;
nB: available buffer space; dM: degree of merging;
nP: number of passes.
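The formulas above can be evaluated with a small helper. This is a sketch only: the function name is an assumption, and the sample values (a 1024-block file with 5 buffers) are chosen for illustration. The number of passes is computed with an integer loop rather than a floating-point logarithm to avoid rounding errors.

```python
import math

def sort_merge_passes(b: int, n_b: int) -> tuple[int, int, int]:
    """Return (nR, dM, nP) for a file of b blocks and n_b buffer blocks."""
    if n_b < 3:
        raise ValueError("merging needs at least 3 buffers (2 inputs, 1 output)")
    n_r = math.ceil(b / n_b)            # initial runs produced by the sorting phase
    d_m = min(n_b - 1, n_r)             # runs merged together in each merge step
    n_p, runs = 0, n_r
    while runs > 1:                     # integer form of ceil(log_dM(nR))
        runs = math.ceil(runs / d_m)
        n_p += 1
    return n_r, d_m, n_p

print(sort_merge_passes(1024, 5))
```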
Algorithms for External Sorting (2)

Note: The above figure is now called Figure 15.2 in Edition 4


3. Algorithms for SELECT and JOIN
Operations (1)
Implementing the SELECT Operation:
 Examples:
(OP1): σ_SSN='123456789' (EMPLOYEE)
(OP2): σ_DNUMBER>5 (DEPARTMENT)
(OP3): σ_DNO=5 (EMPLOYEE)
(OP4): σ_DNO=5 AND SALARY>30000 AND SEX='F' (EMPLOYEE)
(OP5): σ_ESSN='123456789' AND PNO=10 (WORKS_ON)

Algorithms for SELECT and JOIN Operations (2)
Implementing the SELECT Operation (cont.):
Search Methods for Simple Selection:
 S1. Linear search (brute force): Retrieve every record in the file,
and test whether its attribute values satisfy the selection condition.
 S2. Binary search: If the selection condition involves an equality
comparison on a key attribute on which the file is ordered, binary
search (which is more efficient than linear search) can be used.
(See OP1).
 S3. Using a primary index or hash key to retrieve
a single record: If the selection condition involves an
equality comparison on a key attribute with a
primary index (or a hash key), use the primary index
(or the hash key) to retrieve the record.

Algorithms for SELECT and JOIN Operations (3)
Implementing the SELECT Operation (cont.):
Search Methods for Simple Selection:
 S4. Using a primary index to retrieve multiple
records: If the comparison condition is >, ≥, <, or
≤ on a key field with a primary index, use the
index to find the record satisfying the
corresponding equality condition, then retrieve all
subsequent records in the (ordered) file.
 S5. Using a clustering index to retrieve multiple records:
If the selection condition involves an equality comparison
on a non-key attribute with a clustering index, use the
clustering index to retrieve all the records satisfying the
selection condition.

Algorithms for SELECT and JOIN Operations (4)
Implementing the SELECT Operation (cont.):
Search Methods for Simple Selection:
 S6. Using a secondary (B+-tree) index: On an equality
comparison, this search method can be used to retrieve a
single record if the indexing field has unique values (is a key)
or to retrieve multiple records if the indexing field is not a
key. In addition, it can be used to retrieve records on
conditions involving >, ≥, <, or ≤ (range queries).
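The difference between the simple search methods can be sketched on an in-memory stand-in for an ordered file. The sample records and function names are hypothetical; a real DBMS works on disk blocks rather than Python lists.

```python
import bisect

# Hypothetical file: records ordered on the key field SSN.
records = [("111", "Ann"), ("222", "Bob"), ("333", "Cid"), ("444", "Dee")]
keys = [ssn for ssn, _ in records]

def linear_search(ssn):
    """S1: test every record against the condition."""
    return [rec for rec in records if rec[0] == ssn]

def binary_search(ssn):
    """S2: equality comparison on the ordering key."""
    i = bisect.bisect_left(keys, ssn)
    return [records[i]] if i < len(keys) and keys[i] == ssn else []

def range_search(low):
    """S4-style range condition (key > low): find the boundary,
    then read all subsequent records in the ordered file."""
    return records[bisect.bisect_right(keys, low):]

print(binary_search("333"), range_search("222"))
```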

Algorithms for SELECT and JOIN Operations (5)
Implementing the SELECT Operation (cont.):
Search Methods for Complex Selection:
 S7. Conjunctive selection: If an attribute involved in any
single simple condition in the conjunctive condition has an
access path that permits the use of one of the methods S2 to
S6, use that condition to retrieve the records and then check
whether each retrieved record satisfies the remaining simple
conditions in the conjunctive condition.
 S8. Conjunctive selection using a composite index: If two
or more attributes are involved in equality conditions in the
conjunctive condition and a composite index (or hash
structure) exists on the combined field, we can use the index
directly.

Algorithms for SELECT and JOIN Operations (6)
Implementing the SELECT Operation (cont.):
Search Methods for Complex Selection:
 S9. Conjunctive selection by intersection of record
pointers: This method is possible if secondary indexes are
available on all (or some of) the fields involved in equality
comparison conditions in the conjunctive condition and if the
indexes include record pointers (rather than block pointers).
Each index can be used to retrieve the record pointers that
satisfy the individual condition. The intersection of these sets
of record pointers gives the record pointers that satisfy the
conjunctive condition, which are then used to retrieve those
records directly. If only some of the conditions have secondary
indexes, each retrieved record is further tested to determine
whether it satisfies the remaining conditions.
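Method S9 can be illustrated with sets of record pointers. The sketch below assumes secondary indexes on DNO and SEX only, so the SALARY condition of OP4 is checked on each retrieved record. The index contents and the data file are made up for the example.

```python
# Hypothetical secondary indexes: field value -> set of record pointers.
dno_index = {5: {0, 2, 3}, 4: {1}}
sex_index = {"F": {1, 2}, "M": {0, 3}}

# Hypothetical data file addressed by record pointer: (DNO, SALARY, SEX).
employee = {0: (5, 40000, "M"), 1: (4, 25000, "F"),
            2: (5, 38000, "F"), 3: (5, 28000, "M")}

def select_op4(dno, min_salary, sex):
    """Intersect pointer sets, then test the remaining condition."""
    pointers = dno_index.get(dno, set()) & sex_index.get(sex, set())
    # SALARY has no index here, so it is tested on each retrieved record.
    return sorted(p for p in pointers if employee[p][1] > min_salary)

print(select_op4(5, 30000, "F"))
```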

Algorithms for SELECT and JOIN Operations (7)
Implementing the SELECT Operation (cont.):
 Whenever a single condition specifies the selection, we can
only check whether an access path exists on the attribute
involved in that condition. If an access path exists, the method
corresponding to that access path is used; otherwise, the “brute
force” linear search approach of method S1 is used. (See
OP1, OP2 and OP3)
 For conjunctive selection conditions,
whenever more than one of the attributes
involved in the conditions has an access path,
query optimization should be done to choose the
access path that retrieves the fewest records in
the most efficient way.
 Disjunctive selection conditions
Algorithms for SELECT and JOIN Operations (8)
Implementing the JOIN Operation:
 Join (EQUIJOIN, NATURAL JOIN)
– two-way join: a join on two files
e.g. R ⋈_A=B S
– multi-way joins: joins involving more than two files.
e.g. R ⋈_A=B S ⋈_C=D T
 Examples
(OP6): EMPLOYEE ⋈_DNO=DNUMBER DEPARTMENT
(OP7): DEPARTMENT ⋈_MGRSSN=SSN EMPLOYEE

Algorithms for SELECT and JOIN Operations (9)
Implementing the JOIN Operation (cont.):
Methods for implementing joins:
 J1. Nested-loop join (brute force): For each record t in R
(outer loop), retrieve every record s from S (inner loop) and
test whether the two records satisfy the join condition t[A] =
s[B].
 J2. Single-loop join (Using an access structure to retrieve
the matching records): If an index (or hash key) exists for one
of the two join attributes — say, B of S — retrieve each
record t in R, one at a time, and then use the access structure
to retrieve directly all matching records s from S that satisfy
s[B] = t[A].
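Methods J1 and J2 can be compared side by side in a small sketch. The relations are hypothetical lists of tuples whose first field is the join attribute, and a dictionary stands in for the index on S.B, which in J2 is assumed to exist already.

```python
from collections import defaultdict

# Hypothetical relations R(A, ...) and S(B, ...).
R = [(1, "r1"), (2, "r2"), (3, "r3")]
S = [(2, "s1"), (3, "s2"), (3, "s3")]

def nested_loop_join(R, S):
    """J1: compare every pair (t, s) on t[A] = s[B]."""
    return [t + s for t in R for s in S if t[0] == s[0]]

def single_loop_join(R, S):
    """J2: for each t in R, probe an access structure on S.B."""
    index_on_b = defaultdict(list)      # stands in for an existing index on B
    for s in S:
        index_on_b[s[0]].append(s)
    return [t + s for t in R for s in index_on_b.get(t[0], [])]

print(single_loop_join(R, S))
```

Both functions return the same tuples. The difference is that J1 performs |R| × |S| comparisons, while J2 performs one index lookup per record of R.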
Algorithms for SELECT and JOIN Operations
(10)
Implementing the JOIN Operation (cont.):
Methods for implementing joins:
 J3. Sort-merge join: If the records of R and S are
physically sorted (ordered) by value of the join attributes A
and B, respectively, we can implement the join in the most
efficient way possible. Both files are scanned in order of the
join attributes, matching the records that have the same
values for A and B. In this method, the records of each file
are scanned only once each for matching with the other file—
unless both A and B are non-key attributes, in which case the
method needs to be modified slightly.
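A sketch of the merge step follows, including one possible form of the modification needed when both join attributes are non-key: the matching group in S is rescanned for each record of R with the same value. The function name and tuple layout (join attribute in position 0) are assumptions.

```python
def sort_merge_join(R, S):
    """Join R and S on their first field; both inputs must be sorted on it."""
    result, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][0] < S[j][0]:
            i += 1
        elif R[i][0] > S[j][0]:
            j += 1
        else:
            key, start = R[i][0], j
            # Pair each R record holding this key with every S record holding it.
            while i < len(R) and R[i][0] == key:
                j = start               # rescan the matching group in S
                while j < len(S) and S[j][0] == key:
                    result.append(R[i] + S[j])
                    j += 1
                i += 1
    return result

R = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
S = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(sort_merge_join(R, S))
```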

Algorithms for SELECT and JOIN Operations
(11)
Implementing the JOIN Operation (cont.):
Methods for implementing joins:
 J4. Hash-join: The records of files R and S are both hashed
to the same hash file, using the same hashing function on the
join attributes A of R and B of S as hash keys. A single pass
through the file with fewer records (say, R) hashes its records
to the hash file buckets. A single pass through the other file
(S) then hashes each of its records to the appropriate bucket,
where the record is combined with all matching records from
R.
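The two passes of the hash join can be sketched as a build step over R followed by a probe step over S. This assumes the whole hash file for R fits in memory; the function name and tuple layout are illustrative.

```python
from collections import defaultdict

def hash_join(R, S):
    """J4 sketch: build on the smaller file R, probe with S (join on field 0)."""
    buckets = defaultdict(list)
    for t in R:                         # single pass over R fills the buckets
        buckets[hash(t[0])].append(t)
    result = []
    for s in S:                         # single pass over S probes the buckets
        for t in buckets.get(hash(s[0]), []):
            if t[0] == s[0]:            # guard against hash collisions
                result.append(t + s)
    return result

R = [(1, "r1"), (2, "r2"), (3, "r3")]
S = [(2, "s1"), (3, "s2"), (3, "s3")]
print(hash_join(R, S))
```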

Algorithms for SELECT and JOIN Operations
(12)

Note: The above figure is now called Figure 15.3 in Edition 4


Algorithms for SELECT and JOIN Operations
(13)

Note: The above figure is now called Figure 15.3 (continued) in Edition 4
Algorithms for SELECT and JOIN Operations
(14)
Implementing the JOIN Operation (cont.):
 Factors affecting JOIN performance
– Available buffer space

– Join selection factor

– Choice of inner VS outer relation

Algorithms for SELECT and JOIN Operations
(15)
Implementing the JOIN Operation (cont.):
Other types of JOIN algorithms
 Partition hash join
– Partitioning phase: Each file (R and S) is first partitioned into
M partitions using a partitioning hash function on the join
attributes: 
R1 , R2 , R3 , ...... Rm and S1 , S2 , S3 , ...... Sm
Minimum number of in-memory buffers needed for the
partitioning phase: M+1.
A disk sub-file is created per partition to store the tuples for
that partition.  
– Joining or probing phase: Involves M iterations, one per
partitioned file. Iteration i involves joining partitions Ri
and Si.
Algorithms for SELECT and JOIN Operations
(16)
Implementing the JOIN Operation (cont.):
Partitioned Hash Join Procedure:
Assume Ri is smaller than Si.
1. Copy records from Ri into memory buffers.
2. Read all blocks from Si, one at a time; each record from Si
is used to probe for matching record(s) in partition Ri.
3. Write each matching record from Ri, joined with the record
from Si, into the result file.
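The partitioning and probing phases can be sketched together as follows. Partitions are kept as in-memory lists here, whereas the method described above writes each partition to a disk sub-file. The function names and the default of four partitions are assumptions.

```python
from collections import defaultdict

def partition(records, m):
    """Partitioning phase: split records into m partitions on the join field."""
    parts = [[] for _ in range(m)]
    for rec in records:
        parts[hash(rec[0]) % m].append(rec)
    return parts

def partition_hash_join(R, S, m=4):
    result = []
    # The same partitioning function is used for both files, so matching
    # records can only be found in partitions with the same index i.
    for r_i, s_i in zip(partition(R, m), partition(S, m)):
        table = defaultdict(list)       # step 1: load R_i into memory
        for t in r_i:
            table[t[0]].append(t)
        for s in s_i:                   # step 2: probe with each record of S_i
            for t in table.get(s[0], []):
                result.append(t + s)    # step 3: write the joined record
    return result

R = [(1, "r1"), (2, "r2"), (3, "r3")]
S = [(2, "s1"), (3, "s2"), (3, "s3")]
print(sorted(partition_hash_join(R, S)))
```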

Algorithms for SELECT and JOIN Operations
(17)
Implementing the JOIN Operation (cont.):
Cost analysis of partition hash join:
1. Reading and writing each record from R and S during the
partitioning phase: (bR + bS) to read plus (bR + bS) to write
2. Reading each record during the joining phase: (bR + bS)
3. Writing the result of join: bRES

Total Cost: 3 * (bR + bS) + bRES
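As a quick check of the formula, the three cost components can be added up in code. The block counts used in the call are made-up values, not figures from the text.

```python
def partition_hash_join_cost(b_r: int, b_s: int, b_res: int) -> int:
    """Block accesses for partition hash join, following the three steps above."""
    partition_cost = 2 * (b_r + b_s)    # read each block once, write it once
    probe_cost = b_r + b_s              # read each partition once when joining
    return partition_cost + probe_cost + b_res

# Hypothetical sizes: R has 10 blocks, S has 2000, the result has 500.
print(partition_hash_join_cost(10, 2000, 500))
```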

Algorithms for SELECT and JOIN Operations
(18)
Implementing the JOIN Operation (cont.):
 Hybrid hash join: Same as partitioned hash join except:
Joining phase of one of the partitions is included during
the partitioning phase.
– Partitioning phase: Allocate buffers for the smaller relation:
one block for each of the M-1 partitions, with the remaining blocks
given to partition 1. Repeat for the larger relation in the pass
through S.
– Joining phase: M-1 iterations are needed for the partitions
R2 , R3 , R4 , ......Rm and S2 , S3 , S4 , ......Sm. R1 and
S1 are joined during the partitioning of S1 , and results of
joining R1 and S1 are already written to the disk by the
end of partitioning phase .

4. Algorithms for PROJECT and SET
Operations (1)
Algorithm for PROJECT operations (Figure 15.3b)
π_<attribute list>(R)
1. If <attribute list> includes a key of relation R, extract all tuples
from R with only the values for the attributes in <attribute
list>.
2. If <attribute list> does NOT include a key of relation R,
duplicate tuples must be removed from the results.

 Methods to remove duplicate tuples


1. Sorting
2. Hashing
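Both duplicate-removal methods can be sketched briefly. Attributes are identified by tuple position here, which is an assumption of this example rather than anything the algorithm requires.

```python
def project_by_hashing(relation, positions):
    """Project onto the given positions, dropping duplicates with a hash set."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[p] for p in positions)
        if row not in seen:             # a duplicate hashes to an existing entry
            seen.add(row)
            result.append(row)
    return result

def project_by_sorting(relation, positions):
    """Project, sort, then drop rows equal to their predecessor."""
    rows = sorted(tuple(t[p] for p in positions) for t in relation)
    return [row for i, row in enumerate(rows) if i == 0 or row != rows[i - 1]]

employee = [("Ann", 5), ("Bob", 4), ("Cid", 5)]
print(project_by_hashing(employee, [1]), project_by_sorting(employee, [1]))
```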

Algorithms for PROJECT and SET Operations
(2)
Algorithm for SET operations
 Set operations: UNION, INTERSECTION, SET
DIFFERENCE and CARTESIAN PRODUCT.

 The CARTESIAN PRODUCT of relations R and S includes all
possible combinations of records from R and S. The
attributes of the result include all attributes of R and S.
 Cost analysis of CARTESIAN PRODUCT
If R has n records and j attributes and S has m records and k
attributes, the result relation will have n*m records and j+k
attributes.
 CARTESIAN PRODUCT operation is very expensive and
should be avoided if possible.
Algorithms for PROJECT and SET Operations
(3)
Algorithm for SET operations (Cont.)
 UNION (See Figure 15.3c)
1. Sort the two relations on the same attributes.
2. Scan and merge both sorted files concurrently; whenever the
same tuple exists in both relations, only one is kept in the
merged results.
 INTERSECTION (See Figure 15.3d)
1. Sort the two relations on the same attributes.
2. Scan and merge both sorted files concurrently; keep in the
merged results only those tuples that appear in both relations.
 SET DIFFERENCE R-S (See Figure 15.3e)
Sort and merge as above, but keep in the merged results only
those tuples that appear in relation R but not in relation S.
5. Implementing Aggregate Operations
and Outer Joins (1)
Implementing Aggregate Operations:
 Aggregate operators: MIN, MAX, SUM, COUNT and AVG
 Options to implement aggregate operators:
– Table Scan
– Index
 Example
SELECT MAX (SALARY)
FROM EMPLOYEE;

If an (ascending) index on SALARY exists for the EMPLOYEE
relation, then the optimizer could decide on traversing the
index for the largest value, which would entail following the
rightmost pointer in each index node from the root to a
leaf.
Implementing Aggregate Operations and
Outer Joins (2)
Implementing Aggregate Operations (cont.):
 SUM, COUNT and AVG
1. For a dense index (each record has one index
entry): apply the associated computation to the
values in the index.
2. For a non-dense index: actual number of records
associated with each index entry must be
accounted for
 With GROUP BY: the aggregate operator must be
applied separately to each group of tuples.
1. Use sorting or hashing on the group attributes to partition the
file into the appropriate groups;
2. Compute the aggregate function for the tuples in each
group.
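The two GROUP BY steps (partition by hashing, then aggregate per group) can be sketched as below. The rows are hypothetical (DNO, SALARY) pairs, and AVG is used as the example aggregate.

```python
from collections import defaultdict

def group_by_avg(rows, group_pos, value_pos):
    """Hash rows on the grouping attribute, then average each group."""
    groups = defaultdict(list)
    for row in rows:                    # step 1: partition into groups
        groups[row[group_pos]].append(row[value_pos])
    # Step 2: apply the aggregate function separately to each group.
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

rows = [(5, 30000), (4, 25000), (5, 40000)]
print(group_by_avg(rows, group_pos=0, value_pos=1))
```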
Implementing Aggregate Operations and
Outer Joins (3)
Implementing Outer Join:
 Outer Join Operators: LEFT OUTER JOIN, RIGHT OUTER
JOIN and FULL OUTER JOIN.
 The full outer join produces a result which is
equivalent to the union of the results of the left
and right outer joins.
 Example
SELECT FNAME, DNAME
FROM (EMPLOYEE LEFT OUTER JOIN
DEPARTMENT
ON DNO = DNUMBER);
Note: The result of this query is a table of employee names
and their associated departments. It is similar to a regular
join result, with the exception that if an employee does not
have an associated department, the employee's name will
still appear in the resulting table, although the department
name will be null.
Implementing Aggregate Operations and
Outer Joins (4)
Implementing Outer Join (cont.):
 Modifying Join Algorithms: Nested Loop or
Sort-Merge joins can be modified to implement
outer join.
e.g., for left outer join, use the left relation as
outer relation and construct result from every
tuple in the left relation. If there is a match, the
concatenated tuple is saved in the result.
However, if an outer tuple does not match, then
the tuple is still included in the result but is
padded with a null value(s).
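The nested-loop modification for a left outer join can be sketched as follows. `None` stands in for the null value, and the relation contents and parameter names are assumptions made for the example.

```python
def left_outer_join(left, right, left_pos, right_pos, right_width):
    """Nested-loop left outer join; unmatched left tuples are padded with None."""
    result = []
    for lt in left:                     # the left relation drives the outer loop
        matched = False
        for rt in right:
            if lt[left_pos] == rt[right_pos]:
                result.append(lt + rt)
                matched = True
        if not matched:                 # keep the tuple, padded with nulls
            result.append(lt + (None,) * right_width)
    return result

employee = [("John", 5), ("Ann", None)]
department = [(5, "Research")]
print(left_outer_join(employee, department, 1, 0, right_width=2))
```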

Implementing Aggregate Operations and
Outer Joins (5)
Implementing Outer Join (cont.):
 Executing a combination of relational
algebra operators.
Implement the previous left outer join example

1. {Compute the JOIN of the EMPLOYEE and DEPARTMENT tables}
TEMP1 ← π_FNAME,DNAME (EMPLOYEE ⋈_DNO=DNUMBER DEPARTMENT)
2. {Find the EMPLOYEEs that do not appear in the JOIN}
TEMP2 ← π_FNAME (EMPLOYEE) − π_FNAME (TEMP1)
3. {Pad each tuple in TEMP2 with a null DNAME field}
TEMP2 ← TEMP2 × 'null'
4. {UNION the temporary tables to produce the LEFT OUTER JOIN result}
RESULT ← TEMP1 ∪ TEMP2
6. Combining Operations using Pipelining (1)
 Motivation
– A query is mapped into a sequence of operations.
– Each execution of an operation produces a temporary
result.
– Generating and saving temporary files on disk is time
consuming and expensive.
 Alternative:
– Avoid constructing temporary results as much as
possible.
– Pipeline the data through multiple operations - pass the
result of a previous operator to the next without waiting
to complete the previous operation.

Combining Operations using Pipelining (2)
 Example: For a 2-way join, combine the 2
selections on the input and one projection on the
output with the Join.
 Dynamic generation of code to allow for multiple
operations to be pipelined.
 Results of a select operation are fed in a
"Pipeline" to the join algorithm.
 Also known as stream-based processing.
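Python generators give a compact way to illustrate pipelining: each operator yields a tuple as soon as it is produced, so no temporary result is stored between the selections, the join, and the projection. The relations and predicates below are made up, and the inner join input is materialized for simplicity.

```python
def select(rows, predicate):
    for row in rows:
        if predicate(row):
            yield row                   # passed on as soon as it qualifies

def join(left, right, left_pos, right_pos):
    right = list(right)                 # the inner input is materialized here
    for lt in left:
        for rt in right:
            if lt[left_pos] == rt[right_pos]:
                yield lt + rt

def project(rows, positions):
    for row in rows:
        yield tuple(row[p] for p in positions)

employee = [("Smith", 5, 30000), ("Wong", 5, 40000), ("Zelaya", 4, 25000)]
department = [(5, "Research"), (4, "Administration")]

# Two selections feed the join, whose output feeds one projection.
pipeline = project(
    join(select(employee, lambda e: e[2] > 28000),
         select(department, lambda d: d[0] == 5), 1, 0),
    [0, 4])
result = list(pipeline)
print(result)
```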

7. Using Heuristics in Query Optimization(1)

 Process for heuristics optimization


1. The parser of a high-level query generates an initial internal
representation;
2. Heuristic rules are applied to optimize the internal representation;
3. A query execution plan is generated to execute groups of
operations based on the access paths available on the files
involved in the query.

 The main heuristic is to apply first the operations that reduce
the size of intermediate results.
E.g., Apply SELECT and PROJECT operations before
applying the JOIN or other binary operations.

Using Heuristics in Query Optimization (2)

 Query tree: a tree data structure that corresponds to a
relational algebra expression. It represents the input relations
of the query as leaf nodes of the tree, and represents the
relational algebra operations as internal nodes.
 An execution of the query tree consists of executing an
internal node operation whenever its operands are available
and then replacing that internal node by the relation that
results from executing the operation.
 Query graph: a graph data structure that corresponds to a
relational calculus expression. It does not indicate the order
in which operations are performed. There is only a single
graph corresponding to each query.
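A query tree and its bottom-up execution can be sketched with a small node class. The class, the operation names, and the sample data are assumptions for illustration, and only SELECT, PROJECT, and an equijoin are supported.

```python
class Node:
    def __init__(self, op, children=(), arg=None):
        self.op, self.children, self.arg = op, list(children), arg

def execute(node, db):
    """Evaluate a query tree: operands first, then the node's own operation."""
    if node.op == "relation":           # leaf node: an input relation
        return db[node.arg]
    inputs = [execute(child, db) for child in node.children]
    if node.op == "select":
        return [t for t in inputs[0] if node.arg(t)]
    if node.op == "project":
        return [tuple(t[p] for p in node.arg) for t in inputs[0]]
    if node.op == "join":
        lp, rp = node.arg
        return [lt + rt for lt in inputs[0] for rt in inputs[1] if lt[lp] == rt[rp]]
    raise ValueError(f"unknown operation: {node.op}")

# Hypothetical data: PROJECT(PNUMBER, PLOCATION, DNUM), DEPARTMENT(DNUMBER, DNAME).
db = {"PROJECT": [(1, "Stafford", 4), (2, "Houston", 5)],
      "DEPARTMENT": [(4, "Administration"), (5, "Research")]}

tree = Node("project", arg=(0, 4), children=[
    Node("join", arg=(2, 0), children=[
        Node("select", arg=lambda t: t[1] == "Stafford",
             children=[Node("relation", arg="PROJECT")]),
        Node("relation", arg="DEPARTMENT")])])
print(execute(tree, db))
```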
Using Heuristics in Query Optimization (3)

 Example:
For every project located in ‘Stafford’, retrieve the project
number, the controlling department number and the department
manager’s last name, address and birthdate.

Relational algebra:
π_PNUMBER, DNUM, LNAME, ADDRESS, BDATE (((σ_PLOCATION='STAFFORD' (PROJECT))
⋈_DNUM=DNUMBER (DEPARTMENT)) ⋈_MGRSSN=SSN (EMPLOYEE))

SQL query:
Q2: SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND
P.PLOCATION='STAFFORD';
Using Heuristics in Query Optimization (4)

Note: The above figure is now called Figure 15.4 in Edition 4


Using Heuristics in Query Optimization (5)

Note: The above figure is now called Figure 15.4 (continued) in Edition 4
Using Heuristics in Query Optimization (6)
Heuristic Optimization of Query Trees:
 The same query could correspond to many different relational
algebra expressions — and hence many different query trees.

 The task of heuristic optimization of query trees is to find a
final query tree that is efficient to execute.

 Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'AQUARIUS' AND PNUMBER=PNO
AND ESSN=SSN AND BDATE > '1957-12-31';
Using Heuristics in Query Optimization (7)

Note: The above figure is now called Figure 15.5 in Edition 4


Using Heuristics in Query Optimization (8)

Note: The above figure is now called Figure 15.5 (continued c, d) in Edition 4
Using Heuristics in Query Optimization (9)

Note: The above figure is now called Figure 15.5(continued e) in Edition 4


Using Heuristics in Query Optimization (10)
General Transformation Rules for Relational Algebra Operations:
1. Cascade of σ: A conjunctive selection condition can
be broken up into a cascade (sequence) of individual
σ operations:
σ_{c1 AND c2 AND ... AND cn}(R) ≡ σ_c1 (σ_c2 (...(σ_cn(R))...))
2. Commutativity of σ: The σ operation is commutative:
σ_c1 (σ_c2(R)) ≡ σ_c2 (σ_c1(R))
3. Cascade of π: In a cascade (sequence) of π
operations, all but the last one can be ignored:
π_List1 (π_List2 (...(π_Listn(R))...)) ≡ π_List1(R)
4. Commuting σ with π: If the selection condition c
involves only the attributes A1, ..., An in the
projection list, the two operations can be commuted:
π_{A1, A2, ..., An} (σ_c (R)) ≡ σ_c (π_{A1, A2, ..., An} (R))
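Rules 1 to 4 can be checked on a small relation. The sketch below is an illustration only: relations are modeled as lists of dictionaries, and the helpers select, project and same, as well as the relation R and the conditions c1 and c2, are names invented for this example.

```python
def select(rel, pred):
    """SELECT: keep the tuples that satisfy pred."""
    return [row for row in rel if pred(row)]

def project(rel, attrs):
    """PROJECT: keep only attrs and eliminate duplicate tuples."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def same(r1, r2):
    """Compare two relations as sets of tuples (order does not matter)."""
    canon = lambda rel: {tuple(sorted(row.items())) for row in rel}
    return canon(r1) == canon(r2)

R = [
    {"A": 1, "B": 10, "C": "x"},
    {"A": 2, "B": 20, "C": "y"},
    {"A": 3, "B": 20, "C": "x"},
    {"A": 4, "B": 30, "C": "y"},
]
c1 = lambda r: r["B"] >= 20          # involves only attribute B
c2 = lambda r: r["C"] == "x"         # involves only attribute C

# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = same(select(R, lambda r: c1(r) and c2(r)), select(select(R, c2), c1))
# Rule 2: selections commute.
rule2 = same(select(select(R, c2), c1), select(select(R, c1), c2))
# Rule 3: only the outermost projection matters (its list is a subset of the inner one).
rule3 = same(project(project(R, ["A", "B"]), ["B"]), project(R, ["B"]))
# Rule 4: c1 uses only B, which is in the projection list, so the two commute.
rule4 = same(project(select(R, c1), ["A", "B"]), select(project(R, ["A", "B"]), c1))

print(rule1, rule2, rule3, rule4)
```

A check on one relation is of course not a proof; it only makes the shape of each equivalence concrete.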
Using Heuristics in Query Optimization (11)
General Transformation Rules for Relational Algebra Operations
(cont.):
5. Commutativity of ⋈ (and ×): The ⋈ operation is
commutative, as is the × operation:
R ⋈_c S ≡ S ⋈_c R; R × S ≡ S × R
6. Commuting σ with ⋈ (or ×): If all the attributes in the
selection condition c involve only the attributes of one
of the relations being joined (say, R), the two
operations can be commuted as follows:
σ_c (R ⋈ S) ≡ (σ_c (R)) ⋈ S
Alternatively, if the selection condition c can be written
as (c1 AND c2), where condition c1 involves only the
attributes of R and condition c2 involves only the
attributes of S, the operations commute as follows:
σ_c (R ⋈ S) ≡ (σ_c1 (R)) ⋈ (σ_c2 (S))
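The practical value of rule 6 is that the selection can run before the join, so the join sees fewer tuples. A minimal sketch, assuming made-up EMPLOYEE and DEPARTMENT rows and hypothetical helper names (select, join):

```python
def select(rel, pred):
    return [row for row in rel if pred(row)]

def join(r, s, cond):
    """Join of r and s on cond, computed as a filtered cross product."""
    return [{**a, **b} for a in r for b in s if cond(a, b)]

# Made-up data: six employees spread over three departments.
EMPLOYEE = [{"SSN": i, "DNO": i % 3, "SALARY": 1000 * i} for i in range(1, 7)]
DEPARTMENT = [{"DNUMBER": d, "DNAME": f"D{d}"} for d in range(3)]

on_dno = lambda e, d: e["DNO"] == d["DNUMBER"]
high = lambda row: row["SALARY"] > 4000      # uses only EMPLOYEE attributes

# Selection applied after the join: the join produces all 6 combined tuples first.
late = select(join(EMPLOYEE, DEPARTMENT, on_dno), high)
# Selection pushed below the join: only 2 EMPLOYEE tuples reach the join.
early = join(select(EMPLOYEE, high), DEPARTMENT, on_dno)

print(len(join(EMPLOYEE, DEPARTMENT, on_dno)), len(select(EMPLOYEE, high)))
print(late == early)
```

Both plans return the same tuples; they differ only in the size of the intermediate result fed into the join.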
Using Heuristics in Query Optimization (12)
General Transformation Rules for Relational Algebra
Operations (cont.):
7. Commuting π with ⋈ (or ×): Suppose that the
projection list is L = {A1, ..., An, B1, ..., Bm},
where A1, ..., An are attributes of R and B1, ...,
Bm are attributes of S. If the join condition c
involves only attributes in L, the two operations
can be commuted as follows:
π_L (R ⋈_c S) ≡ (π_{A1, ..., An} (R)) ⋈_c (π_{B1, ..., Bm} (S))

If the join condition c contains additional
attributes not in L, these must be added to the
projection list, and a final π operation is needed.

Using Heuristics in Query Optimization (13)
General Transformation Rules for Relational Algebra
Operations (cont.):
8. Commutativity of set operations: The set
operations ∪ and ∩ are commutative but − is not.
9. Associativity of ⋈, ×, ∪, and ∩: These four
operations are individually associative; that is, if θ
stands for any one of these four operations
(throughout the expression), we have
(R θ S) θ T ≡ R θ (S θ T)
10. Commuting σ with set operations: The σ operation
commutes with ∪, ∩, and −. If θ stands for any one
of these three operations, we have
σ_c (R θ S) ≡ (σ_c (R)) θ (σ_c (S))
Using Heuristics in Query Optimization (14)
General Transformation Rules for Relational Algebra
Operations (cont.):
11. The π operation commutes with ∪.
π_L (R ∪ S) ≡ (π_L (R)) ∪ (π_L (S))

12. Converting a (σ, ×) sequence into ⋈: If the condition c of a
σ that follows a × corresponds to a join condition, convert the
(σ, ×) sequence into a ⋈ as follows:
(σ_c (R × S)) ≡ (R ⋈_c S)

13. Other transformations

Using Heuristics in Query Optimization (15)
Outline of a Heuristic Algebraic Optimization Algorithm:
1. Using rule 1, break up any select operations with
conjunctive conditions into a cascade of select operations.

2. Using rules 2, 4, 6, and 10 concerning the commutativity
of select with other operations, move each select
operation as far down the query tree as is permitted by the
attributes involved in the select condition.
3. Using rule 9 concerning associativity of binary operations,
rearrange the leaf nodes of the tree so that the leaf node
relations with the most restrictive select operations are
executed first in the query tree representation.
4. Using Rule 12, combine a cartesian product operation with
a subsequent select operation in the tree into a join
operation.
Using Heuristics in Query Optimization (16)
Outline of a Heuristic Algebraic Optimization
Algorithm (cont.)
5. Using rules 3, 4, 7, and 11 concerning the
cascading of project and the commuting of project
with other operations, break down and move lists
of projection attributes down the tree as far as
possible by creating new project operations as
needed.

6. Identify subtrees that represent groups of
operations that can be executed by a single
algorithm.

Using Heuristics in Query Optimization (17)
Summary of Heuristics for Algebraic Optimization:
1. The main heuristic is to apply first the operations
that reduce the size of intermediate results.
2. Perform select operations as early as possible to
reduce the number of tuples and perform project
operations as early as possible to reduce the number
of attributes. (This is done by moving select and
project operations as far down the tree as possible.)
3. The select and join operations that are most
restrictive should be executed before other similar
operations. (This is done by reordering the leaf
nodes of the tree among themselves and adjusting
the rest of the tree appropriately.)

Using Heuristics in Query Optimization (18)
Query Execution Plans
 An execution plan for a relational algebra query
consists of a combination of the relational algebra
query tree and information about the access
methods to be used for each relation as well as
the methods to be used in computing the
relational operators stored in the tree.
 Materialized evaluation: the result of an operation is
stored as a temporary relation.
 Pipelined evaluation: as the result of an
operator is produced, it is forwarded to the next
operator in sequence.
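The difference between the two evaluation styles can be sketched with Python lists and generators. This is an analogy under stated assumptions, not a description of any particular DBMS: the function names and the sample EMPLOYEE rows are made up for the illustration.

```python
def materialized(rows, pred, attrs):
    """Materialized evaluation: the SELECT result is stored in full as a
    temporary relation before PROJECT starts."""
    temp = [row for row in rows if pred(row)]
    return [{a: row[a] for a in attrs} for row in temp]

def select_pipelined(rows, pred):
    """Pipelined SELECT: forward each qualifying tuple as soon as it is found."""
    for row in rows:
        if pred(row):
            yield row

def project_pipelined(rows, attrs):
    """Pipelined PROJECT: consume tuples one at a time from the operator below."""
    for row in rows:
        yield {a: row[a] for a in attrs}

def pipelined(rows, pred, attrs):
    return project_pipelined(select_pipelined(iter(rows), pred), attrs)

# Made-up sample rows.
EMPLOYEE = [
    {"LNAME": "Smith", "SALARY": 30000},
    {"LNAME": "Wong", "SALARY": 40000},
    {"LNAME": "Borg", "SALARY": 55000},
]
well_paid = lambda r: r["SALARY"] > 35000

pipe = pipelined(EMPLOYEE, well_paid, ["LNAME"])
first = next(pipe)      # the first result tuple is available before the scan finishes
rest = list(pipe)
stored = materialized(EMPLOYEE, well_paid, ["LNAME"])
print(first, rest, stored)
```

Both strategies produce the same tuples; the pipelined version simply never builds the temporary relation.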

8. Using Selectivity and Cost Estimates in
Query Optimization (1)
 Cost-based query optimization: Estimate and compare the
costs of executing a query using different execution strategies
and choose the strategy with the lowest cost estimate.
(Compare to heuristic query optimization)

 Issues
– Cost function
– Number of execution strategies to be considered

Using Selectivity and Cost Estimates in Query
Optimization (2)
 Cost Components for Query Execution
1. Access cost to secondary storage
2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost

Note: Different database systems may focus on different cost
components.

Using Selectivity and Cost Estimates in Query
Optimization (3)
 Catalog Information Used in Cost Functions
– Information about the size of a file
 number of records (tuples) (r),
 record size (R),
 number of blocks (b)
 blocking factor (bfr)
– Information about indexes and indexing attributes of a file
 Number of levels (x) of each multilevel index
 Number of first-level index blocks (bI1)
 Number of distinct values (d) of an attribute
 Selectivity (sl) of an attribute
 Selection cardinality (s) of an attribute. (s = sl * r)
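As a small worked example, the catalog quantities above can be derived from one another. The sketch assumes a uniform distribution of attribute values (so that sl = 1/d for an equality condition) and uses hypothetical figures for the file; the function names and the 512-byte block size are assumptions of this example.

```python
import math

def blocking_factor(block_size, record_size):
    """bfr: number of whole records that fit in one block."""
    return block_size // record_size

def number_of_blocks(r, bfr):
    """b: blocks needed to hold r records."""
    return math.ceil(r / bfr)

def equality_selectivity(d):
    """sl for an equality condition, ASSUMING values are uniformly distributed."""
    return 1 / d

def selection_cardinality(sl, r):
    """s = sl * r: average number of records that satisfy the condition."""
    return sl * r

# Hypothetical file: 10,000 records of 100 bytes, 512-byte blocks,
# and an attribute with 125 distinct values.
r, R, d = 10_000, 100, 125
bfr = blocking_factor(512, R)
b = number_of_blocks(r, bfr)
sl = equality_selectivity(d)
s = selection_cardinality(sl, r)
print(bfr, b, sl, round(s))
```

With these numbers an equality condition on the attribute is expected to select about 80 records out of 10,000.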

Using Selectivity and Cost Estimates in Query
Optimization (4)
Examples of Cost Functions for SELECT
 S1. Linear search (brute force) approach
CS1a = b;
For an equality condition on a key, CS1b = (b/2) if the record is found;
otherwise CS1b = b.
 S2. Binary search:
CS2 = log2(b) + ⌈s/bfr⌉ − 1
For an equality condition on a unique (key) attribute,
CS2 = log2(b)
 S3. Using a primary index (S3a) or hash key (S3b) to retrieve a single
record
CS3a = x + 1; CS3b = 1 for static or linear hashing;
CS3b = 2 for extendible hashing;
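The search-based cost functions translate directly into code. The following is a sketch with hypothetical function names and a made-up file size; costs are in block accesses, as in the formulas above.

```python
import math

def cost_linear_search(b, key_equality=False, found=True):
    """S1: scan all b blocks; on average half of them for a key that is found."""
    if key_equality and found:
        return b / 2
    return b

def cost_binary_search(b, s=1, bfr=1):
    """S2: log2(b) accesses to locate the first record, plus the extra blocks
    holding the s matching records. With s = 1 this reduces to log2(b)."""
    return math.log2(b) + math.ceil(s / bfr) - 1

def cost_primary_index(x):
    """S3a: x index levels plus one data block."""
    return x + 1

b = 2048                                   # hypothetical number of file blocks
print(cost_linear_search(b))               # full scan
print(cost_linear_search(b, key_equality=True))
print(cost_binary_search(b))               # equality on a key
print(cost_binary_search(b, s=80, bfr=5))  # 80 matching records, 5 per block
print(cost_primary_index(x=3))
```

Comparing such estimates for the access paths that actually exist on a file is what lets the optimizer choose among them.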
Using Selectivity and Cost Estimates in Query
Optimization (5)
Examples of Cost Functions for SELECT (cont.)
 S4. Using an ordering index to retrieve multiple records:
For the comparison condition on a key field with an ordering
index, CS4 = x + (b/2)
 S5. Using a clustering index to retrieve multiple records:
CS5 = x + ⌈s/bfr⌉
 S6. Using a secondary (B+-tree) index:
For an equality comparison, CS6a = x + s;
For a comparison condition such as >, <, >=, or <=,
CS6b = x + (bI1/2) + (r/2)
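The index-based cost functions can be compared in the same way. This is a sketch: the function names and all the numbers (index levels, block counts, record counts) are hypothetical values chosen for illustration.

```python
import math

def cost_ordering_index_range(x, b):
    """S4: x index levels, then roughly half of the b file blocks."""
    return x + b / 2

def cost_clustering_index(x, s, bfr):
    """S5: x index levels, then the blocks holding the s matching records."""
    return x + math.ceil(s / bfr)

def cost_secondary_equality(x, s):
    """S6, equality: each of the s matching records may be in a different block."""
    return x + s

def cost_secondary_range(x, b_i1, r):
    """S6, range: half of the first-level index blocks and half of the r records."""
    return x + b_i1 / 2 + r / 2

# Hypothetical equality selection with s = 80 matching records.
candidates = {
    "clustering index": cost_clustering_index(x=2, s=80, bfr=5),
    "secondary index": cost_secondary_equality(x=2, s=80),
}
cheapest = min(candidates, key=candidates.get)
print(candidates, cheapest)
print(cost_ordering_index_range(x=2, b=2000))
print(cost_secondary_range(x=2, b_i1=50, r=10_000))
```

For the same 80 matching records the clustering index is far cheaper, because the records sit together in consecutive blocks.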

Using Selectivity and Cost Estimates in Query
Optimization (6)
Examples of Cost Functions for SELECT (cont.)
 S7. Conjunctive selection:
Use either S1 or one of the methods S2 to S6 to solve.
For the latter case, use one condition to retrieve the records
and then check in the memory buffer whether each retrieved
record satisfies the remaining conditions in the conjunction.
 S8. Conjunctive selection using a composite index:
Same as S3a, S5 or S6a, depending on the type of index.

 Examples of using the cost functions.

Using Selectivity and Cost Estimates in Query
Optimization (7)
Examples of Cost Functions for JOIN
 Join selectivity (js)
js = | (R ⋈_C S) | / | R × S | = | (R ⋈_C S) | / (|R| * |S|)
If condition C does not exist, js = 1;
If no tuples from the relations satisfy condition C, js = 0;
Usually, 0 <= js <= 1;
 Size of the result file after join operation
| (R ⋈_C S) | = js * |R| * |S|
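Join selectivity can be measured directly on a small example. In the sketch below the relations and helper names are made up; each EMPLOYEE tuple matches exactly one DEPARTMENT tuple, so js should come out as 1/|DEPARTMENT|.

```python
def join(r, s, cond):
    """Join computed as a filtered cross product."""
    return [{**a, **b} for a in r for b in s if cond(a, b)]

def join_selectivity(r, s, cond):
    """js = |R join S| / (|R| * |S|)."""
    return len(join(r, s, cond)) / (len(r) * len(s))

# Made-up data: DNUMBER is a key of DEPARTMENT; every employee has a valid DNO.
DEPARTMENT = [{"DNUMBER": d} for d in (1, 4, 5)]
EMPLOYEE = [{"SSN": i, "DNO": (1, 4, 5)[i % 3]} for i in range(6)]

js = join_selectivity(EMPLOYEE, DEPARTMENT, lambda e, d: e["DNO"] == d["DNUMBER"])
estimated_size = js * len(EMPLOYEE) * len(DEPARTMENT)
no_condition = join_selectivity(EMPLOYEE, DEPARTMENT, lambda e, d: True)
print(js, estimated_size, no_condition)
```

In practice the optimizer does not run the join to find js; it estimates js from catalog statistics and then uses the size formula.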

Using Selectivity and Cost Estimates in Query
Optimization (8)
Examples of Cost Functions for JOIN (cont.)
 J1. Nested-loop join:
CJ1 = bR + (bR*bS) + ((js* |R|* |S|)/bfrRS)
(Use R for outer loop)
 J2. Single-loop join (using an access structure to retrieve the
matching record(s))
If an index exists for the join attribute B of S with index levels
xB, we can retrieve each record s in R and then use the index to
retrieve all the matching records t from S that satisfy t[B] =
s[A].
The cost depends on the type of index.
Using Selectivity and Cost Estimates in Query
Optimization (9)
Examples of Cost Functions for JOIN (cont.)
 J2. Single-loop join (cont.)
For a secondary index,
CJ2a = bR + (|R| * (xB + sB)) + ((js* |R|* |S|)/bfrRS);
For a clustering index,
CJ2b = bR + (|R| * (xB + (sB/bfrB))) + ((js* |R|* |S|)/bfrRS);
For a primary index,
CJ2c = bR + (|R| * (xB + 1)) + ((js* |R|* |S|)/bfrRS);
If a hash key exists for one of the two join attributes (say, B of S),
CJ2d = bR + (|R| * h) + ((js* |R|* |S|)/bfrRS); (h >= 1: average number
of block accesses to retrieve a record given its hash key value)
 J3. Sort-merge join:
CJ3a = CS + bR + bS + ((js* |R|* |S|)/bfrRS); (CS: Cost for sorting files)
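To see how these formulas drive the choice of a join method, they can be evaluated for a concrete case. The sketch below uses hypothetical catalog figures (10,000 EMPLOYEE records in 2,000 blocks, 125 DEPARTMENT records in 13 blocks, js = 1/125, bfrRS = 4); the function names are assumptions of this example.

```python
def result_blocks(js, r_card, s_card, bfr_rs):
    """Blocks needed to write the join result: (js * |R| * |S|) / bfrRS."""
    return (js * r_card * s_card) / bfr_rs

def cost_nested_loop(b_outer, b_inner, out_blocks):
    """J1, with the first argument as the outer-loop file."""
    return b_outer + b_outer * b_inner + out_blocks

def cost_single_loop_secondary(b_r, r_card, x_b, s_b, out_blocks):
    """J2 with a secondary index on the join attribute of S."""
    return b_r + r_card * (x_b + s_b) + out_blocks

def cost_single_loop_primary(b_r, r_card, x_b, out_blocks):
    """J2 with a primary index on the join attribute of S."""
    return b_r + r_card * (x_b + 1) + out_blocks

def cost_sort_merge(b_r, b_s, sort_cost, out_blocks):
    """J3: sort_cost is 0 when both files are already sorted."""
    return sort_cost + b_r + b_s + out_blocks

EMP = {"card": 10_000, "blocks": 2_000}     # hypothetical statistics
DEPT = {"card": 125, "blocks": 13}
out = result_blocks(js=1 / 125, r_card=EMP["card"], s_card=DEPT["card"], bfr_rs=4)

costs = {
    "nested loop, EMPLOYEE outer": cost_nested_loop(EMP["blocks"], DEPT["blocks"], out),
    "nested loop, DEPARTMENT outer": cost_nested_loop(DEPT["blocks"], EMP["blocks"], out),
    "single loop, primary index on DEPARTMENT (x = 1)":
        cost_single_loop_primary(EMP["blocks"], EMP["card"], 1, out),
}
best = min(costs, key=costs.get)
for name, cost in costs.items():
    print(f"{name}: {round(cost)}")
print("cheapest:", best)
```

Note that the result-writing term is the same in every strategy, so only the access terms decide the ranking.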
Using Selectivity and Cost Estimates in Query
Optimization (10)
Multiple Relation Queries and Join Ordering
 A query joining n relations will have n-1 join operations, and
hence can have a large number of different join orders when
we apply the algebraic transformation rules.

 Current query optimizers typically limit the structure of a
(join) query tree to that of left-deep (or right-deep) trees.

 Left-deep tree: a binary tree where the right child of each
nonleaf node is always a base relation.
– Amenable to pipelining
– Could utilize any access paths on the base relation (the right child)
when executing the join.
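A quick count shows why restricting the search to left-deep trees helps. The sketch below relies on standard counting facts rather than anything stated on the slide: n relations can be ordered into n! left-deep trees, while the number of binary join trees of any shape with ordered children is (2(n-1))!/(n-1)!. The function names are made up for this example.

```python
import math
from itertools import permutations

def left_deep_orders(n):
    """Number of left-deep join trees over n relations: one per permutation."""
    return math.factorial(n)

def all_binary_join_trees(n):
    """n! leaf orderings times Catalan(n-1) tree shapes = (2(n-1))! / (n-1)!."""
    return math.factorial(2 * (n - 1)) // math.factorial(n - 1)

def left_deep_trees(relations):
    """Enumerate left-deep trees as nested pairs (left subtree, base relation)."""
    for order in permutations(relations):
        tree = order[0]
        for rel in order[1:]:
            tree = (tree, rel)          # the right child is always a base relation
        yield tree

trees = list(left_deep_trees(["EMPLOYEE", "WORKS_ON", "PROJECT"]))
print(trees[0])
for n in (3, 4, 5, 8):
    print(n, left_deep_orders(n), all_binary_join_trees(n))
```

Even the left-deep space grows factorially, which is why optimizers combine this restriction with cost-based pruning.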
9. Overview of Query Optimization in Oracle
Oracle DBMS V8
 Rule-based query optimization: the optimizer chooses
execution plans based on heuristically ranked operations.
(Currently it is being phased out)
 Cost-based query optimization: the optimizer examines
alternative access paths and operator algorithms and chooses
the execution plan with the lowest estimated cost. The query cost is
calculated based on the estimated usage of resources such as
I/O, CPU and memory needed.
 Application developers could specify hints to the ORACLE
query optimizer. The idea is that an application developer
might know more information about the data.
10. Semantic Query Optimization
 Semantic Query Optimization: Uses constraints
specified on the database schema in order to modify one
query into another query that is more efficient to execute.

 Consider the following SQL query,


SELECT E.LNAME, M.LNAME
FROM EMPLOYEE E, EMPLOYEE M
WHERE E.SUPERSSN=M.SSN AND E.SALARY>M.SALARY

Explanation: Suppose that we had a constraint on the database
schema that stated that no employee can earn more than his or her
direct supervisor. If the semantic query optimizer checks for the
existence of this constraint, it need not execute the query at all
because it knows that the result of the query will be empty.
Techniques known as theorem proving can be used for this
purpose.
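The idea can be imitated in a toy form. The sketch below is not a theorem prover and not how any real optimizer works: it assumes a hypothetical catalog in which the constraint is recorded as a simple triple, and it only recognizes a query predicate that is the exact negation of that triple. The sample rows are made up and satisfy the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, LNAME TEXT, "
    "SALARY INTEGER, SUPERSSN TEXT)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [("1", "Borg", 55000, None), ("2", "Wong", 40000, "1"), ("3", "Smith", 30000, "2")],
)

QUERY = """
SELECT E.LNAME, M.LNAME
FROM EMPLOYEE E, EMPLOYEE M
WHERE E.SUPERSSN = M.SSN AND E.SALARY > M.SALARY
"""

# Hypothetical catalog entry: for every employee E with supervisor M,
# E.SALARY <= M.SALARY.
CONSTRAINTS = {("E.SALARY", "<=", "M.SALARY")}
NEGATION = {">": "<=", "<": ">=", ">=": "<", "<=": ">"}

def provably_empty(predicate, constraints):
    """True if the predicate is the exact negation of a declared constraint."""
    left, op, right = predicate
    return (left, NEGATION[op], right) in constraints

def run(query, predicate):
    if provably_empty(predicate, CONSTRAINTS):
        return []                       # answer without touching the data
    return conn.execute(query).fetchall()

skipped = run(QUERY, ("E.SALARY", ">", "M.SALARY"))
executed = conn.execute(QUERY).fetchall()
print(skipped, executed)
```

Both paths return an empty result; the semantic check simply avoids the self-join altogether.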

Chapter 16

Practical Database Design and Tuning

Chapter Outline

1. Physical Database Design in Relational Databases
2. An Overview of Database Tuning in Relational Systems

1. Physical Database Design in Relational
Databases(1)
Factors that Influence Physical Database Design
A. Analyzing the database queries and transactions
For each query, the following information is needed.
– The files that will be accessed by the query;
– The attributes on which any selection conditions for the
query are specified;
– The attributes on which any join conditions or conditions
to link multiple tables or objects for the query are
specified;
– The attributes whose values will be retrieved by the query.
Note: the attributes listed in items 2 and 3 above are candidates for
definition of access structures.
Physical Database Design in Relational
Databases(2)
Factors that Influence Physical Database Design (cont.)
A. Analyzing the database queries and transactions (cont.)
For each update transaction or operation, the following
information is needed.
– The files that will be updated;
– The type of operation on each file (insert, update or delete);
– The attributes on which selection conditions for a delete or
update operation are specified;
– The attributes whose values will be changed by an update
operation.
Note: the attributes listed in item 3 above are candidates for definition of
access structures. However, the attributes listed in item 4 are candidates
for avoiding an access structure.
Physical Database Design in Relational
Databases(3)
Factors that Influence Physical Database Design (cont.)
B. Analyzing the expected frequency of invocation of queries
and transactions
– The expected frequency information, along with the attribute
information collected on each query and transaction, is used to
compile a cumulative list of expected frequency of use for all
the queries and transactions.
– It is expressed as the expected frequency of using each
attribute in each file as a selection attribute or join attribute,
over all the queries and transactions.
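– 80-20 rule: approximately 80 percent of the processing is accounted
for by only 20 percent of the queries and transactions.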
Physical Database Design in Relational
Databases(4)
Factors that Influence Physical Database Design (cont.)
C. Analyzing the time constraints of queries and transactions
– Performance constraints place further priorities on the attributes that are
candidates for access paths.
– The selection attributes used by queries and transactions with time
constraints become higher-priority candidates for primary access structure.
D. Analyzing the expected frequencies of update operations
A minimum number of access paths should be specified for a file that is
updated frequently.
E. Analyzing the uniqueness constraints on attributes.
Access paths should be specified on all candidate key attributes — or set of
attributes — that are either the primary key or constrained to be unique.
Physical Database Design in Relational
Databases(5)
Physical Database Design Decisions
 Design decisions about indexing
1. Whether to index an attribute?
2. What attribute or attributes to index on?
3. Whether to set up a clustered index?
4. Whether to use a hash index over a tree index?
5. Whether to use dynamic hashing for the file?

Physical Database Design in Relational
Databases(6)
Physical Database Design Decisions (cont.)
 Denormalization as a design decision for speeding up
queries
– The goal of normalization is to separate the logically related
attributes into tables to minimize redundancy and thereby
avoid the update anomalies that cause extra processing
overhead to maintain consistency of the database.
– The goal of denormalization is to improve the performance
of frequently occurring queries and transactions. (Typically
the designer adds to a table attributes that are needed for
answering queries or producing reports so that a join with
another table is avoided.)
– Trade off between update and query performance
2. An Overview of Database Tuning in
Relational Systems (1)
 Tuning: the process of continuing to revise/adjust the
physical database design by monitoring resource utilization as
well as internal DBMS processing to reveal bottlenecks such
as contention for the same data or devices.

 Goal:
– To make applications run faster
– To lower the response time of queries/transactions
– To improve the overall throughput of transactions

An Overview of Database Tuning in
Relational Systems (2)
Statistics internally collected in DBMSs:
 Size of individual tables
 Number of distinct values in a column
 The number of times a particular query or transaction is
submitted/executed in an interval of time
 The times required for different phases of query and transaction
processing

Statistics obtained from monitoring:
 Storage statistics
 I/O and device performance statistics
 Query/transaction processing statistics
 Locking/logging related statistics
 Index statistics

An Overview of Database Tuning in
Relational Systems (3)
Problems to be considered in tuning:
 How to avoid excessive lock contention?
 How to minimize overhead of logging and
unnecessary dumping of data?
 How to optimize buffer size and scheduling of
processes?
 How to allocate resources such as disks, RAM and
processes for most efficient utilization?

An Overview of Database Tuning in
Relational Systems (4)
Tuning Indexes
 Reasons for tuning indexes
– Certain queries may take too long to run for lack of an index;
– Certain indexes may not get utilized at all;
– Certain indexes may be causing excessive overhead because the index
is on an attribute that undergoes frequent changes

 Options for tuning indexes
– Drop and/or build new indexes
– Change a non-clustered index to a clustered index (and vice versa)
– Rebuild the index

An Overview of Database Tuning in
Relational Systems (5)
Tuning the Database Design
 Dynamically changed processing requirements
need to be addressed by making changes to the
conceptual schema if necessary and reflecting those
changes in the logical schema and physical design.

An Overview of Database Tuning in
Relational Systems (6)
Tuning the Database Design (cont.)
 Possible changes to the database design
– Existing tables may be joined (denormalized) because certain attributes
from two or more tables are frequently needed together.
– For the given set of tables, there may be alternative design choices, all
of which achieve 3NF or BCNF. One may be replaced by the other.
– A relation of the form R(K, A, B, C, D, …) that is in BCNF can be
stored into multiple tables that are also in BCNF by replicating the key
K in each table.
– Attribute(s) from one table may be repeated in another even though
this creates redundancy and potential anomalies.
– Apply horizontal partitioning as well as vertical partitioning if
necessary.
An Overview of Database Tuning in
Relational Systems (7)
Tuning Queries
 Indications for tuning queries
– A query issues too many disk accesses
– The query plan shows that relevant indexes are not being
used.
 Typical instances for query tuning
1. Many query optimizers do not use indexes in the presence
of arithmetic expressions, numerical comparisons of
attributes of different sizes and precision, NULL
comparisons, and sub-string comparisons.
2. Indexes are often not used for nested queries using IN;
An Overview of Database Tuning in
Relational Systems (8)
Tuning Queries (cont.)
 Typical instances for query tuning (cont.)
3. Some DISTINCTs may be redundant and can be avoided
without changing the result.
4. Unnecessary use of temporary result tables can be avoided
by collapsing multiple queries into a single query unless the
temporary relation is needed for some intermediate
processing.
5. In some situations involving the use of correlated queries,
temporaries are useful.
6. If multiple options for join condition are possible, choose
one that uses a clustering index and avoid those that contain
string comparisons.
An Overview of Database Tuning in
Relational Systems (9)
Tuning Queries (cont.)
 Typical instances for query tuning (cont.)
7. The order of tables in the FROM clause may affect the join
processing.
8. Some query optimizers perform worse on nested queries
compared to their equivalent un-nested counterparts.
9. Many applications are based on views that define the data
of interest to those applications. Sometimes these views
become overkill.

An Overview of Database Tuning in
Relational Systems (10)
Additional Query Tuning Guidelines
1. A query with multiple selection conditions that are connected
via OR may not be prompting the query optimizer to use any
index. Such a query may be split up and expressed as a union
of queries, each with a condition on an attribute that causes an
index to be used.
2. Apply the following transformations
– NOT condition may be transformed into a positive expression.
– Embedded SELECT blocks may be replaced by joins.
– If an equality join is set up between two tables, the range predicate on
the joining attribute set up in one table may be repeated for the other
table
3. WHERE conditions may be rewritten to utilize the indexes on
multiple columns.
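Guideline 1 can be tried out with SQLite from Python. The sketch below only shows that the OR form and the UNION form return the same rows; whether either form actually uses an index depends on the DBMS and its optimizer, and nothing here measures that. The table contents and index names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (SSN INTEGER PRIMARY KEY, DNO INTEGER, AGE INTEGER)")
conn.execute("CREATE INDEX idx_employee_dno ON EMPLOYEE (DNO)")
conn.execute("CREATE INDEX idx_employee_age ON EMPLOYEE (AGE)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(1, 5, 30), (2, 4, 50), (3, 5, 47), (4, 1, 25)],
)

# Original form: two selection conditions connected by OR.
with_or = "SELECT SSN FROM EMPLOYEE WHERE AGE > 45 OR DNO = 5"

# Rewritten form: one query per condition, each on a single indexed attribute.
# UNION removes duplicates, so a row matching both conditions appears once.
as_union = (
    "SELECT SSN FROM EMPLOYEE WHERE AGE > 45 "
    "UNION "
    "SELECT SSN FROM EMPLOYEE WHERE DNO = 5"
)

rows_or = sorted(conn.execute(with_or).fetchall())
rows_union = sorted(conn.execute(as_union).fetchall())
print(rows_or, rows_union)
```

The rewrite is safe here because SSN is a key, so duplicate elimination by UNION cannot drop a legitimate row; with non-key select lists the two forms can differ.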
Chapter 17
Introduction to Transaction Processing
Concepts and Theory

Chapter Outline

1 Introduction to Transaction Processing
2 Transaction and System Concepts
3 Desirable Properties of Transactions
4 Characterizing Schedules based on Recoverability
5 Characterizing Schedules based on Serializability
6 Transaction Support in SQL

1 Introduction to Transaction Processing (1)

 Single-User System: At most one user at a time can
use the system.
 Multiuser System: Many users can access the system
concurrently.
 Concurrency
– Interleaved processing: concurrent execution of
processes is interleaved in a single CPU
– Parallel processing: processes are concurrently
executed in multiple CPUs.

Introduction to Transaction Processing (2)
 A Transaction: logical unit of database processing that
includes one or more access operations (read: retrieval;
write: insert or update, delete).
 A transaction (set of operations) may be stand-alone,
specified in a high-level language like SQL and
submitted interactively, or may be embedded within a
program.
 Transaction boundaries: Begin and End transaction.
 An application program may contain several
transactions separated by the Begin and End
transaction boundaries.
Introduction to Transaction Processing (3)
SIMPLE MODEL OF A DATABASE (for
purposes of discussing transactions):
 A database - collection of named data items
 Granularity of data - a field, a record , or a whole
disk block (Concepts are independent of granularity)
 Basic operations are read and write
– read_item(X): Reads a database item named X into a
program variable. To simplify our notation, we assume
that the program variable is also named X.
– write_item(X): Writes the value of program variable X
into the database item named X.
Introduction to Transaction Processing (4)

READ AND WRITE OPERATIONS:


 Basic unit of data transfer from the disk to the
computer main memory is one block. In general, a
data item (what is read or written) will be the field of
some record in the database, although it may be a
larger unit such as a record or even a whole block.
 read_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk
block is not already in some main memory buffer).
3. Copy item X from the buffer to the program variable named X.

Introduction to Transaction Processing (5)

READ AND WRITE OPERATIONS (cont.):


 write_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if
that disk block is not already in some main memory
buffer).
3. Copy item X from the program variable named X into
its correct location in the buffer.
4. Store the updated block from the buffer back to disk
(either immediately or at some later point in time).
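The steps of read_item and write_item can be mirrored in a toy model. The class below is only a sketch: the name SimpleDatabase, the dictionary-based "disk", the item directory and the flush parameter are all assumptions made for illustration, with one block holding two items.

```python
class SimpleDatabase:
    """Toy model: disk blocks, a main-memory buffer, and an item directory."""

    def __init__(self, blocks, item_to_block):
        self.disk = blocks                  # block address -> {item name: value}
        self.directory = item_to_block      # item name -> block address
        self.buffer = {}                    # blocks currently in main memory
        self.disk_reads = 0

    def _fetch(self, item):
        addr = self.directory[item]                     # step 1: find the block address
        if addr not in self.buffer:                     # step 2: copy the block into a
            self.buffer[addr] = dict(self.disk[addr])   # buffer unless already there
            self.disk_reads += 1
        return addr

    def read_item(self, item):
        addr = self._fetch(item)
        return self.buffer[addr][item]                  # step 3: buffer -> program variable

    def write_item(self, item, value, flush=True):
        addr = self._fetch(item)
        self.buffer[addr][item] = value                 # step 3: variable -> buffer
        if flush:                                       # step 4: now or at a later time
            self.disk[addr] = dict(self.buffer[addr])

db = SimpleDatabase({0: {"X": 80, "Y": 20}}, {"X": 0, "Y": 0})
X = db.read_item("X")
X = X - 5
db.write_item("X", X)
Y = db.read_item("Y")          # same block, already buffered: no extra disk read
print(db.disk[0], db.disk_reads, Y)
```

Passing flush=False models the "at some later point in time" case, where the updated block stays in the buffer and the disk copy is temporarily out of date.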

FIGURE 17.2
Two sample transactions. (a) Transaction T1.
(b) Transaction T2.

Introduction to Transaction Processing (7)

Why Concurrency Control is needed:


 The Lost Update Problem.
This occurs when two transactions that access the same
database items have their operations interleaved in a way
that makes the value of some database item incorrect.
 The Temporary Update (or Dirty Read) Problem.
This occurs when one transaction updates a database item
and then the transaction fails for some reason (see Section
17.1.4). The updated item is accessed by another
transaction before it is changed back to its original value.
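Both problems can be reproduced by writing out one bad interleaving step by step. The sketch below assumes two transactions, T1 subtracting N from X and T2 adding M to X; the function names and the values X = 80, N = 5, M = 4 are chosen for illustration.

```python
def serial(x, n, m):
    """T1 then T2 with no interleaving: the correct result."""
    x = x - n          # T1: X := X - N
    x = x + m          # T2: X := X + M
    return x

def lost_update(x, n, m):
    db = {"X": x}
    t1_x = db["X"]             # T1: read_item(X)
    t1_x = t1_x - n            # T1: X := X - N
    t2_x = db["X"]             # T2: read_item(X), still the old value
    t2_x = t2_x + m            # T2: X := X + M
    db["X"] = t1_x             # T1: write_item(X)
    db["X"] = t2_x             # T2: write_item(X) overwrites T1's update
    return db["X"]

def dirty_read(x, n, m):
    db = {"X": x}
    original = db["X"]
    db["X"] = db["X"] - n      # T1: updates X but has not finished
    t2_x = db["X"]             # T2: reads the temporary (dirty) value
    db["X"] = original         # T1 fails; X is changed back to its original value
    db["X"] = t2_x + m         # T2: writes a value based on the dirty read
    return db["X"]

print(serial(80, 5, 4))        # both updates applied
print(lost_update(80, 5, 4))   # T1's subtraction is lost
print(dirty_read(80, 5, 4))    # T1 never committed, yet its effect survives
```

In the dirty-read case the correct final value would be 84, since T1 failed and only T2's update should count.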

Introduction to Transaction Processing (8)

Why Concurrency Control is needed (cont.):


 The Incorrect Summary Problem .
If one transaction is calculating an aggregate
summary function on a number of records while
other transactions are updating some of these
records, the aggregate function may calculate some
values before they are updated and others after they
are updated.

FIGURE 17.3
Some problems that occur when concurrent execution
is uncontrolled. (a) The lost update problem.

FIGURE 17.3 (continued)
Some problems that occur when concurrent execution
is uncontrolled. (b) The temporary update problem.

FIGURE 17.3 (continued)
Some problems that occur when concurrent execution is
uncontrolled. (c) The incorrect summary problem.

Introduction to Transaction Processing (11)
Why recovery is needed:
(What causes a Transaction to fail)
1. A computer failure (system crash): A hardware or
software error occurs in the computer system during
transaction execution. If the hardware crashes, the
contents of the computer’s internal memory may be lost.
2. A transaction or system error: Some operation in the
transaction may cause it to fail, such as integer overflow
or division by zero. Transaction failure may also occur
because of erroneous parameter values or because of a
logical programming error. In addition, the user may
interrupt the transaction during its execution.

Introduction to Transaction Processing (12)
Why recovery is needed (cont.):
3. Local errors or exception conditions detected by the
transaction:
- certain conditions necessitate cancellation of the
transaction. For example, data for the transaction may not be
found. A condition, such as insufficient account balance in a
banking database, may cause a transaction, such as a fund
withdrawal from that account, to be canceled.
- a programmed abort in the transaction causes it to fail.
4. Concurrency control enforcement: The concurrency control
method may decide to abort the transaction, to be restarted
later, because it violates serializability or because several
transactions are in a state of deadlock (see Chapter 18).

Introduction to Transaction Processing (13)

Why recovery is needed (cont.):


5. Disk failure: Some disk blocks may lose their data
because of a read or write malfunction or because of a
disk read/write head crash. This may happen during
a read or a write operation of the transaction.
6. Physical problems and catastrophes: This refers to an
endless list of problems that includes power or air-
conditioning failure, fire, theft, sabotage, overwriting
disks or tapes by mistake, and mounting of a wrong
tape by the operator.

2 Transaction and System Concepts (1)

A transaction is an atomic unit of work that is either
completed in its entirety or not done at all. For
recovery purposes, the system needs to keep track of
when the transaction starts, terminates, and commits
or aborts.
Transaction states:
 Active state
 Partially committed state
 Committed state
 Failed state
 Terminated State

Transaction and System Concepts (2)
Recovery manager keeps track of the following operations:
 begin_transaction: This marks the beginning of transaction
execution.
 read or write: These specify read or write operations on the
database items that are executed as part of a transaction.
 end_transaction: This specifies that read and write
transaction operations have ended and marks the end limit
of transaction execution. At this point it may be necessary
to check whether the changes introduced by the transaction
can be permanently applied to the database or whether the
transaction has to be aborted because it violates
concurrency control or for some other reason.

Transaction and System Concepts (3)

Recovery manager keeps track of the following operations (cont.):
 commit_transaction: This signals a successful end of the
transaction so that any changes (updates) executed by
the transaction can be safely committed to the database
and will not be undone.
 rollback (or abort): This signals that the transaction has
ended unsuccessfully, so that any changes or effects that
the transaction may have applied to the database must
be undone.

Transaction and System Concepts (4)

Recovery techniques use the following operators:


 undo: Similar to rollback except that it applies
to a single operation rather than to a whole
transaction.
 redo: This specifies that certain transaction
operations must be redone to ensure that all the
operations of a committed transaction have been
applied successfully to the database.

FIGURE 17.4
State transition diagram illustrating the states for
transaction execution.
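The state transitions can be written down as a small table. The sketch below assumes the usual reading of the diagram (read, write, end_transaction, commit, and abort as events); the event name "terminate" and the function run_transaction are labels invented for this example.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("active", "read"): "active",
    ("active", "write"): "active",
    ("active", "end_transaction"): "partially committed",
    ("active", "abort"): "failed",
    ("partially committed", "commit"): "committed",
    ("partially committed", "abort"): "failed",
    ("committed", "terminate"): "terminated",   # "terminate" is a hypothetical label
    ("failed", "terminate"): "terminated",
}


def run_transaction(events, state="active"):
    """Follow the events from the given state; reject any illegal transition."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {event!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state
```

For example, a commit event is rejected while the transaction is still active, because commit is only legal from the partially committed state.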

Transaction and System Concepts (6)

The System Log


 Log or Journal : The log keeps track of all transaction
operations that affect the values of database items. This
information may be needed to permit recovery from
transaction failures. The log is kept on disk, so it is not
affected by any type of failure except for disk or
catastrophic failure. In addition, the log is periodically
backed up to archival storage (tape) to guard against such
catastrophic failures.
 T in the following discussion refers to a unique
transaction-id that is generated automatically by the
system and is used to identify each transaction:

Transaction and System Concepts (7)
The System Log (cont):
Types of log record:
1. [start_transaction,T]: Records that transaction T has started
execution.
2. [write_item,T,X,old_value,new_value]: Records that
transaction T has changed the value of database item X from
old_value to new_value.
3. [read_item,T,X]: Records that transaction T has read the value
of database item X.
4. [commit,T]: Records that transaction T has completed
successfully, and affirms that its effect can be committed
(recorded permanently) to the database.
5. [abort,T]: Records that transaction T has been aborted.

Transaction and System Concepts (8)
The System Log (cont):
 protocols for recovery that avoid cascading
rollbacks do not require that read operations
be written to the system log, whereas other
protocols require these entries for recovery.
 strict protocols require simpler write entries
that do not include new_value (see Section
17.4).

Transaction and System Concepts (9)
Recovery using log records:
If the system crashes, we can recover to a consistent database
state by examining the log and using one of the techniques
described in Chapter 19.
1. Because the log contains a record of every write operation
that changes the value of some database item, it is possible
to undo the effect of these write operations of a transaction
T by tracing backward through the log and resetting all
items changed by a write operation of T to their old_values.
2. We can also redo the effect of the write operations of a
transaction T by tracing forward through the log and
setting all items changed by a write operation of T (that did
not get done permanently) to their new_values.
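The two log scans above can be sketched as follows. This is a minimal illustration, assuming the log is a Python list of tuples shaped like the log records shown earlier and the database is a dictionary; the function names undo and redo are chosen for the example, and real recovery techniques (Chapter 19) handle many transactions and checkpoints together.

```python
def undo(log, db, t):
    """Trace backward through the log and reset items written by t to their old values."""
    for record in reversed(log):
        if record[0] == "write_item" and record[1] == t:
            _, _, item, old_value, _new_value = record
            db[item] = old_value


def redo(log, db, t):
    """Trace forward through the log and set items written by t to their new values."""
    for record in log:
        if record[0] == "write_item" and record[1] == t:
            _, _, item, _old_value, new_value = record
            db[item] = new_value
```

Scanning backward in undo matters when a transaction writes the same item more than once: the last reset applied is the oldest old_value, which is the value the item had before the transaction started.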

Transaction and System Concepts (10)
Commit Point of a Transaction:
 Definition: A transaction T reaches its commit point
when all its operations that access the database have
been executed successfully and the effect of all the
transaction operations on the database has been
recorded in the log. Beyond the commit point, the
transaction is said to be committed, and its effect is
assumed to be permanently recorded in the database. The
transaction then writes an entry [commit,T] into the log.
 Roll Back of transactions: Needed for transactions that
have a [start_transaction,T] entry in the log but no
commit entry [commit,T] in the log.

Transaction and System Concepts (11)
Commit Point of a Transaction (cont):
 Redoing transactions: Transactions that have written their
commit entry in the log must also have recorded all their
write operations in the log; otherwise they would not be
committed, so their effect on the database can be redone from
the log entries. (Notice that the log file must be kept on disk.
At the time of a system crash, only the log entries that have
been written back to disk are considered in the recovery process
because the contents of main memory may be lost.)
 Force writing a log: before a transaction reaches its commit
point, any portion of the log that has not been written to the
disk yet must now be written to the disk. This process is called
force-writing the log file before committing a transaction.

3 Desirable Properties of Transactions (1)

ACID properties:
 Atomicity: A transaction is an atomic unit of
processing; it is either performed in its entirety
or not performed at all.

 Consistency preservation: A correct execution


of the transaction must take the database from
one consistent state to another.

Desirable Properties of Transactions (2)

ACID properties (cont.):


 Isolation: A transaction should not make its updates
visible to other transactions until it is committed; this
property, when enforced strictly, solves the temporary
update problem and makes cascading rollbacks of
transactions unnecessary (see Chapter 21).
 Durability or permanency: Once a transaction changes
the database and the changes are committed, these
changes must never be lost because of subsequent
failure.

4 Characterizing Schedules
based on Recoverability (1)
 Transaction schedule or history: When transactions are
executing concurrently in an interleaved fashion, the order of
execution of operations from the various transactions forms what
is known as a transaction schedule (or history).

 A schedule (or history) S of n transactions T1, T2, ..., Tn :


It is an ordering of the operations of the transactions subject to
the constraint that, for each transaction Ti that participates in S,
the operations of Ti in S must appear in the same order in which
they occur in Ti. Note, however, that operations from other
transactions Tj can be interleaved with the operations of Ti in S.

Characterizing Schedules
based on Recoverability (2)
Schedules classified on recoverability:
• Recoverable schedule: One where no committed transaction needs
to be rolled back.
A schedule S is recoverable if no transaction T in S commits
until all transactions T’ that have written an item that T reads
have committed.
 Cascadeless schedule: One where every transaction reads
only the items that are written by committed transactions.
Schedules requiring cascaded rollback: A schedule in
which uncommitted transactions that read an item from a
failed transaction must be rolled back.

Characterizing Schedules
based on Recoverability (3)
Schedules classified on recoverability (cont.):
 Strict Schedules: A schedule in which a transaction
can neither read nor write an item X until the last
transaction that wrote X has committed.
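The three recoverability classes can be checked mechanically. The sketch below assumes a schedule is a list of (op, transaction, item) tuples with op one of "r", "w", or "c" (commit, with item None), and that the schedule contains no aborts; the representation and all function names are assumptions made for this example.

```python
def commit_positions(schedule):
    """Map each committed transaction to the position of its commit."""
    return {t: pos for pos, (op, t, _) in enumerate(schedule) if op == "c"}


def reads_from(schedule):
    """List (reader, writer, position) for reads of an item last written by another transaction."""
    last_writer, result = {}, []
    for pos, (op, t, x) in enumerate(schedule):
        if op == "w":
            last_writer[x] = t
        elif op == "r" and last_writer.get(x, t) != t:
            result.append((t, last_writer[x], pos))
    return result


def is_recoverable(schedule):
    """No reader commits before every transaction it read from has committed."""
    cpos = commit_positions(schedule)
    return all(
        reader not in cpos or (writer in cpos and cpos[writer] < cpos[reader])
        for reader, writer, _ in reads_from(schedule)
    )


def is_cascadeless(schedule):
    """Every read of another transaction's write happens after that writer committed."""
    cpos = commit_positions(schedule)
    return all(writer in cpos and cpos[writer] < pos
               for _, writer, pos in reads_from(schedule))


def is_strict(schedule):
    """No read or write of X until the last other transaction that wrote X has committed."""
    cpos, last_writer = commit_positions(schedule), {}
    for pos, (op, t, x) in enumerate(schedule):
        if op in ("r", "w"):
            w = last_writer.get(x)
            if w is not None and w != t and not (w in cpos and cpos[w] < pos):
                return False
            if op == "w":
                last_writer[x] = t
    return True
```

Each class is contained in the previous one: a strict schedule is cascadeless, and a cascadeless schedule is recoverable.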

5 Characterizing Schedules
based on Serializability (1)
 Serial schedule: A schedule S is serial if, for every
transaction T participating in the schedule, all the
operations of T are executed consecutively in the
schedule. Otherwise, the schedule is called nonserial
schedule.
 Serializable schedule: A schedule S is serializable
if it is equivalent to some serial schedule of the same
n transactions.

Characterizing Schedules
based on Serializability (2)
 Result equivalent: Two schedules are called result
equivalent if they produce the same final state of the
database.
 Conflict equivalent: Two schedules are said to be
conflict equivalent if the order of any two conflicting
operations is the same in both schedules.
 Conflict serializable: A schedule S is said to be
conflict serializable if it is conflict equivalent to
some serial schedule S’.
Characterizing Schedules
based on Serializability (3)
• Being serializable is not the same as being serial.
 Being serializable implies that the schedule is a
correct schedule.
– It will leave the database in a consistent state.
– The interleaving is appropriate and will result in a
state as if the transactions were serially executed, yet
will achieve efficiency due to concurrent execution.

Characterizing Schedules
based on Serializability (4)
 Serializability is hard to check.
– Interleaving of operations occurs in an operating
system through some scheduler
– Difficult to determine beforehand how the
operations in a schedule will be interleaved.

Characterizing Schedules
based on Serializability (5)
Practical approach:
 Come up with methods (protocols) to ensure
serializability.
 It’s not possible to determine when a schedule begins
and when it ends. Hence, we reduce the problem of
checking the whole schedule to checking only a
committed projection of the schedule (i.e., operations from
only the committed transactions).
 Current approach used in most DBMSs:
– Use of locks with two phase locking

Characterizing Schedules
based on Serializability (6)
 View equivalence: A less restrictive definition of
equivalence of schedules

 View serializability: definition of serializability


based on view equivalence. A schedule is view
serializable if it is view equivalent to a serial
schedule.

Characterizing Schedules
based on Serializability (7)
Two schedules are said to be view equivalent if the following three
conditions hold:
1. The same set of transactions participates in S and S’, and S and
S’ include the same operations of those transactions.
2. For any operation Ri(X) of Ti in S, if the value of X read by the
operation has been written by an operation Wj(X) of Tj (or if it
is the original value of X before the schedule started), the same
condition must hold for the value of X read by operation Ri(X)
of Ti in S’.
3. If the operation Wk(Y) of Tk is the last operation to write item
Y in S, then Wk(Y) of Tk must also be the last operation to
write item Y in S’.

Characterizing Schedules
based on Serializability (8)
The premise behind view equivalence:
 As long as each read operation of a transaction reads
the result of the same write operation in both
schedules, the write operations of each transaction
must produce the same results.
 “The view”: the read operations are said to see the
same view in both schedules.

Characterizing Schedules
based on Serializability (9)
Relationship between view and conflict equivalence:
• The two are the same under the constrained write assumption,
which assumes that if T writes X, it is constrained by the
value of X it read; i.e., new X = f(old X)
 Conflict serializability is stricter than view serializability.
With unconstrained write (or blind write), a schedule that
is view serializable is not necessarily conflict serializable.
 Any conflict serializable schedule is also view
serializable, but not vice versa.

Characterizing Schedules
based on Serializability (10)
Relationship between view and conflict equivalence (cont):
Consider the following schedule of three transactions
T1: r1(X), w1(X); T2: w2(X); and T3: w3(X):
Schedule Sa: r1(X); w2(X); w1(X); w3(X); c1; c2; c3;

In Sa, the operations w2(X) and w3(X) are blind writes, since T2 and T3
do not read the value of X.

Sa is view serializable, since it is view equivalent to the serial schedule
T1, T2, T3. However, Sa is not conflict serializable, since it is not
conflict equivalent to any serial schedule.
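The claim about Sa can be checked by brute force over all serial orders. The sketch below assumes a schedule is a list of (op, transaction, item) tuples containing only reads and writes, and that no transaction writes the same item twice; view_profile and is_view_serializable are names invented for this example.

```python
from itertools import permutations


def view_profile(schedule):
    """Return (reads-from facts, final writer of each item) for a schedule."""
    last_writer, read_count, reads = {}, {}, set()
    for op, t, x in schedule:
        if op == "r":
            n = read_count.get((t, x), 0) + 1
            read_count[(t, x)] = n
            # None as the writer means the read sees the original value of x.
            reads.add((t, x, n, last_writer.get(x)))
        elif op == "w":
            last_writer[x] = t
    return reads, last_writer


def is_view_serializable(schedule):
    """Return a view-equivalent serial order of transactions, or None if there is none."""
    transactions = sorted({t for _, t, _ in schedule})
    target = view_profile(schedule)
    for order in permutations(transactions):
        serial = [step for t in order for step in schedule if step[1] == t]
        if view_profile(serial) == target:
            return order
    return None


# Schedule Sa from the slide, without the commit operations.
Sa = [("r", "T1", "X"), ("w", "T2", "X"), ("w", "T1", "X"), ("w", "T3", "X")]
```

Calling is_view_serializable(Sa) returns the order T1, T2, T3. Trying every permutation is exponential in the number of transactions, so this is only practical for small teaching examples.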

Characterizing Schedules
based on Serializability (11)
Testing for conflict serializability
Algorithm 17.1:
1. Looks only at read_item(X) and write_item(X) operations.
2. Constructs a precedence graph (serialization graph), a graph
with directed edges.
3. An edge is created from Ti to Tj if one of the operations in Ti
appears before a conflicting operation in Tj.
4. The schedule is serializable if and only if the precedence graph
has no cycles.
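The precedence graph test can be sketched in a few lines of Python. The example assumes a schedule is a list of (op, transaction, item) tuples, and treats two operations as conflicting when they belong to different transactions, access the same item, and at least one of them is a write; the function names are chosen for illustration.

```python
def precedence_graph(schedule):
    """Return the set of edges (Ti, Tj) where an operation of Ti precedes a conflicting one of Tj."""
    edges = set()
    for i, (op1, t1, x1) in enumerate(schedule):
        for op2, t2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a directed cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def dfs(node):
        visiting.add(node)
        for nxt in graph[node]:
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(node not in done and dfs(node) for node in graph)


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))
```

Applied to the schedule r1(X); w2(X); w1(X); w3(X), the graph contains both T1 → T2 and T2 → T1, so the test reports that the schedule is not conflict serializable.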

FIGURE 17.7
Constructing the precedence graphs for schedules A to D from Figure
17.5 to test for conflict serializability. (a) Precedence graph for serial
schedule A. (b) Precedence graph for serial schedule B. (c) Precedence
graph for schedule C (not serializable). (d) Precedence graph for schedule
D (serializable, equivalent to schedule A).

FIGURE 17.8
Another example of serializability testing. (a) The READ
and WRITE operations of three transactions T1, T2, and
T3.

FIGURE 17.8 (continued)
Another example of serializability testing. (b) Schedule E.

FIGURE 17.8 (continued)
Another example of serializability testing. (c) Schedule F.

Characterizing Schedules
based on Serializability (14)
Other Types of Equivalence of Schedules
 Under special semantic constraints, schedules that
are otherwise not conflict serializable may work
correctly. Using commutative operations of addition
and subtraction (which can be done in any order)
certain non-serializable transactions may work
correctly.

Characterizing Schedules
based on Serializability (15)
Other Types of Equivalence of Schedules (cont.)
Example: bank credit / debit transactions on a given item are
separable and commutative.
Consider the following schedule Sh for the two transactions:
Sh : r1(X); w1(X); r2(Y); w2(Y); r1(Y); w1(Y); r2(X); w2(X);
Using conflict serializability, it is not serializable.
However, if it came from a (read, update, write) sequence as follows:
r1(X); X := X – 10; w1(X); r2(Y); Y := Y – 20; w2(Y); r1(Y);
Y := Y + 10; w1(Y); r2(X); X := X + 20; w2(X);
Sequence explanation: debit, debit, credit, credit.
It is a correct schedule for the given semantics.
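A quick numerical check shows why the debit/credit semantics make this interleaving harmless. The sketch assumes starting balances of 100 for both X and Y (values chosen only for illustration) and treats each read, update, write triple as one step, which is valid here because each triple is contiguous in the schedule.

```python
STEPS = {
    "T1 debit X": lambda x, y: (x - 10, y),
    "T2 debit Y": lambda x, y: (x, y - 20),
    "T1 credit Y": lambda x, y: (x, y + 10),
    "T2 credit X": lambda x, y: (x + 20, y),
}


def apply_steps(order, x, y):
    """Apply the named debit/credit steps in the given order and return (X, Y)."""
    for name in order:
        x, y = STEPS[name](x, y)
    return x, y


interleaved = ["T1 debit X", "T2 debit Y", "T1 credit Y", "T2 credit X"]   # schedule Sh
serial_t1_t2 = ["T1 debit X", "T1 credit Y", "T2 debit Y", "T2 credit X"]
serial_t2_t1 = ["T2 debit Y", "T2 credit X", "T1 debit X", "T1 credit Y"]
```

All three orders produce the same final balances, because additions and subtractions on the same item commute.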

6 Transaction Support in SQL2 (1)

 A single SQL statement is always considered to be


atomic. Either the statement completes execution
without error or it fails and leaves the database
unchanged.
 With SQL, there is no explicit Begin Transaction
statement. Transaction initiation is done implicitly
when particular SQL statements are encountered.
 Every transaction must have an explicit end
statement, which is either a COMMIT or
ROLLBACK.
Transaction Support in SQL2 (2)
Characteristics specified by a SET
TRANSACTION statement in SQL2:
 Access mode: READ ONLY or READ WRITE. The
default is READ WRITE unless the isolation level of
READ UNCOMMITTED is specified, in which case
READ ONLY is assumed.
 Diagnostic size n, specifies an integer value n,
indicating the number of conditions that can be held
simultaneously in the diagnostic area. (Supply user
feedback information)

Transaction Support in SQL2 (3)
Characteristics specified by a SET
TRANSACTION statement in SQL2 (cont.):
 Isolation level <isolation>, where <isolation> can be
READ UNCOMMITTED, READ COMMITTED,
REPEATABLE READ or SERIALIZABLE. The
default is SERIALIZABLE.
With SERIALIZABLE: the interleaved execution of
transactions will adhere to our notion of
serializability. However, if any transaction executes
at a lower level, then serializability may be violated.

Transaction Support in SQL2 (4)
Potential problem with lower isolation levels:
 Dirty Read: Reading a value that was written by a
transaction which failed.
 Nonrepeatable Read: Allowing another transaction to write
a new value between multiple reads of one transaction.
A transaction T1 may read a given value from a table.
If another transaction T2 later updates that value and T1
reads that value again, T1 will see a different value.
Consider that T1 reads the employee salary for Smith.
Next, T2 updates the salary for Smith. If T1 reads Smith's
salary again, then it will see a different value for Smith's
salary.

Transaction Support in SQL2 (5)
Potential problem with lower isolation levels
(cont.):
 Phantoms: New rows being read using the same read
with a condition.
A transaction T1 may read a set of rows from a
table, perhaps based on some condition specified in
the SQL WHERE clause. Now suppose that a
transaction T2 inserts a new row that also satisfies
the WHERE clause condition of T1, into the table
used by T1. If T1 is repeated, then T1 will see a row
that previously did not exist, called a phantom.
Transaction Support in SQL2 (6)
Sample SQL transaction:
EXEC SQL WHENEVER SQLERROR GO TO UNDO;
EXEC SQL SET TRANSACTION
    READ WRITE
    DIAGNOSTICS SIZE 5
    ISOLATION LEVEL SERIALIZABLE;
EXEC SQL INSERT
    INTO EMPLOYEE (FNAME, LNAME, SSN, DNO, SALARY)
    VALUES ('Robert', 'Smith', '991004321', 2, 35000);
EXEC SQL UPDATE EMPLOYEE
    SET SALARY = SALARY * 1.1
    WHERE DNO = 2;
EXEC SQL COMMIT;
GOTO THE_END;
UNDO: EXEC SQL ROLLBACK;
THE_END: ...
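The same commit-or-rollback pattern can be tried out with Python's standard sqlite3 module. This is a rough analogue, not embedded SQL2: SQLite has no SET TRANSACTION statement, and the table definition and the function names make_db and hire_and_raise are assumptions made for the example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a minimal EMPLOYEE table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY, "
        "DNO INTEGER, SALARY REAL)"
    )
    conn.commit()
    return conn


def hire_and_raise(conn):
    """Insert one employee and raise department 2 salaries as a single transaction."""
    try:
        # sqlite3 opens a transaction implicitly before the first INSERT or UPDATE.
        conn.execute(
            "INSERT INTO EMPLOYEE (FNAME, LNAME, SSN, DNO, SALARY) VALUES (?, ?, ?, ?, ?)",
            ("Robert", "Smith", "991004321", 2, 35000),
        )
        conn.execute("UPDATE EMPLOYEE SET SALARY = SALARY * 1.1 WHERE DNO = 2")
        conn.commit()        # corresponds to EXEC SQL COMMIT
        return True
    except sqlite3.Error:
        conn.rollback()      # corresponds to the UNDO label: EXEC SQL ROLLBACK
        return False
```

Running hire_and_raise twice on the same connection makes the second call fail on the duplicate SSN, so that call is rolled back and the table is left as the first call committed it.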

Transaction Support in SQL2 (7)
Possible violation of serializability:

                        Type of violation
Isolation level     Dirty read   Nonrepeatable read   Phantom
-----------------   ----------   ------------------   -------
READ UNCOMMITTED    yes          yes                  yes
READ COMMITTED      no           yes                  yes
REPEATABLE READ     no           no                   yes
SERIALIZABLE        no           no                   no

Chapter 18
Concurrency Control
Techniques

Chapter 18 Outline
Database Concurrency Control
1 Purpose of Concurrency Control
2 Two-Phase locking
5 Limitations of CCMs
6 Index Locking
7 Lock Compatibility Matrix
8 Lock Granularity

Database Concurrency Control
1 Purpose of Concurrency Control

• To enforce isolation (through mutual exclusion) among
conflicting transactions.
• To preserve database consistency through consistency-preserving
execution of transactions.
• To resolve read-write and write-write conflicts.

Example: In a concurrent execution environment, if T1
conflicts with T2 over a data item A, then the
concurrency control mechanism decides whether T1 or T2 gets A
and whether the other transaction is rolled back or waits.

Database Concurrency Control

Two-Phase Locking Techniques


Locking is an operation that secures (a) permission to read
or (b) permission to write a data item for a transaction.
Example: Lock (X). Data item X is locked on behalf of the
requesting transaction.

Unlocking is an operation that removes these permissions
from the data item. Example: Unlock (X). Data item X is
made available to all other transactions.
Lock and Unlock are atomic operations.

Database Concurrency Control
Two-Phase Locking Techniques: Essential components
Two lock modes: (a) shared (read) and (b) exclusive (write).
Shared mode: shared lock (X). More than one transaction can apply
a shared lock on X for reading its value, but no write lock can be applied
on X by any other transaction.
Exclusive mode: write lock (X). Only one write lock on X can exist
at any time, and no shared lock can be applied by any other
transaction on X.
Conflict matrix (Y = compatible, N = conflict):
          Read   Write
Read      Y      N
Write     N      N

Database Concurrency Control
Two-Phase Locking Techniques: Essential components
Lock Manager: Manages locks on data items.
Lock table: The lock manager uses it to store the identity of the
transaction locking a data item, the data item, the lock
mode, and a pointer to the next data item locked. One
simple way to implement a lock table is through a linked
list.

Transaction ID   Data item ID   Lock mode   Ptr to next data item
T1               X1             Read        Next

Database Concurrency Control
Two-Phase Locking Techniques: Essential components
The database requires that all transactions be well-formed.
A transaction is well-formed if:

• It locks a data item before it reads or writes it.
• It does not lock an already locked data item, and it
does not try to unlock a free data item.

Database Concurrency Control
Two-Phase Locking Techniques: Essential components
The following code performs the lock operation:

B: if LOCK (X) = 0 (*item is unlocked*)
      then LOCK (X) ← 1 (*lock the item*)
   else begin
      wait (until LOCK (X) = 0 and
         the lock manager wakes up the transaction);
      goto B
   end;
Database Concurrency Control
Two-Phase Locking Techniques: Essential components
The following code performs the unlock operation:

LOCK (X) ← 0; (*unlock the item*)
if any transactions are waiting then
   wake up one of the waiting transactions;
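The binary lock and unlock operations map naturally onto a condition variable. The following is a minimal sketch using Python's threading module; the class name BinaryLock and its method names are assumptions for the example, and waiting threads stand in for waiting transactions.

```python
import threading


class BinaryLock:
    """Binary locks on named items: LOCK(X) is 1 when locked and 0 when unlocked."""

    def __init__(self):
        self._lock_value = {}                  # item -> 0 or 1
        self._cond = threading.Condition()     # makes lock/unlock atomic and supports waiting

    def lock_item(self, x):
        with self._cond:
            # Same loop as "B: if LOCK(X) = 0 ... else wait ... goto B".
            while self._lock_value.get(x, 0) == 1:
                self._cond.wait()
            self._lock_value[x] = 1

    def unlock_item(self, x):
        with self._cond:
            self._lock_value[x] = 0
            # Wake all waiters; each rechecks its own item, since one condition
            # variable is shared by every item in this sketch.
            self._cond.notify_all()
```

The while loop plays the role of the "goto B" retry: a woken transaction must test the lock value again before taking the lock.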

Database Concurrency Control
Two-Phase Locking Techniques: Essential components
The following code performs the read lock operation:
B: if LOCK (X) = “unlocked” then
      begin LOCK (X) ← “read-locked”;
         no_of_reads (X) ← 1;
      end
   else if LOCK (X) = “read-locked” then
      no_of_reads (X) ← no_of_reads (X) + 1
   else begin wait (until LOCK (X) = “unlocked” and
         the lock manager wakes up the transaction);
      go to B
   end;

Database Concurrency Control
Two-Phase Locking Techniques: Essential components
The following code performs the write lock operation:
B: if LOCK (X) = “unlocked” then
      LOCK (X) ← “write-locked”
   else begin wait (until LOCK (X) = “unlocked” and
         the lock manager wakes up the transaction);
      go to B
   end;

Database Concurrency Control
Two-Phase Locking Techniques: Essential components
The following code performs the unlock operation:
if LOCK (X) = “write-locked” then
  begin LOCK (X) ← “unlocked”;
    wake up one of the waiting transactions, if any
  end
else if LOCK (X) = “read-locked” then
  begin
    no_of_reads (X) ← no_of_reads (X) - 1;
    if no_of_reads (X) = 0 then
      begin
        LOCK (X) ← “unlocked”;
        wake up one of the waiting transactions, if any
      end
  end;
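
The read lock, write lock, and unlock operations can be sketched together as a small lock table. This is a minimal, non-blocking model under stated assumptions: the class name LockTable and its methods are hypothetical, a request returns False where the pseudocode would make the transaction wait, and no_of_reads (X) is represented by the size of a set of reader ids.

```python
class LockTable:
    """Non-blocking model of shared/exclusive locks (illustrative only).
    A request returns True when granted, False when the caller would wait."""

    def __init__(self):
        self.state = {}    # item -> "read-locked" | "write-locked"; absent = unlocked
        self.readers = {}  # item -> set of transaction ids holding read locks
        self.writer = {}   # item -> transaction id holding the write lock

    def read_lock(self, txn, item):
        if self.state.get(item, "unlocked") == "write-locked":
            return False                       # must wait for the writer
        self.state[item] = "read-locked"
        self.readers.setdefault(item, set()).add(txn)
        return True

    def write_lock(self, txn, item):
        if self.state.get(item, "unlocked") != "unlocked":
            return False                       # must wait until X is unlocked
        self.state[item] = "write-locked"
        self.writer[item] = txn
        return True

    def unlock(self, txn, item):
        mode = self.state.get(item, "unlocked")
        if mode == "write-locked" and self.writer.get(item) == txn:
            del self.writer[item]
            del self.state[item]
        elif mode == "read-locked":
            self.readers[item].discard(txn)
            if not self.readers[item]:         # no_of_reads(X) reached 0
                del self.readers[item]
                del self.state[item]
```

A real lock manager would queue the refused requests and wake one of them on unlock; that part is omitted here.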

Database Concurrency Control
Two-Phase Locking Techniques: Essential components
Lock conversion
Lock upgrade: existing read lock to write lock
if Ti has a read-lock (X) and no other transaction Tj (i ≠ j) has a read-lock (X) then
  convert read-lock (X) to write-lock (X)
else
  force Ti to wait until every such Tj unlocks X

Lock downgrade: existing write lock to read lock
if Ti has a write-lock (X) (*no other transaction can have any lock on X*) then
  convert write-lock (X) to read-lock (X)

Database Concurrency Control
Two-Phase Locking Techniques: The algorithm

Two Phases: (a) Locking (Growing) (b) Unlocking (Shrinking).


Locking (Growing) Phase: A transaction applies locks (read or write) on
desired data items one at a time.
Unlocking (Shrinking) Phase: A transaction unlocks its locked data items one
at a time.
Requirement: For a transaction these two phases must be mutually
exclusive, that is, the unlocking phase must not start during the locking
phase, and once the unlocking phase has started no new lock may be acquired.
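
The requirement can be checked mechanically for a single transaction. The following is a minimal sketch; the function name follows_two_phase and the (action, item) tuple layout are assumptions made for illustration.

```python
def follows_two_phase(ops):
    """ops: list of (action, item) pairs for ONE transaction, where action is
    'read_lock', 'write_lock', 'unlock', or any other operation (ignored).
    Returns True if no lock is requested after the first unlock."""
    shrinking = False
    for action, _item in ops:
        if action == "unlock":
            shrinking = True                  # shrinking phase has begun
        elif action in ("read_lock", "write_lock") and shrinking:
            return False                      # lock requested after an unlock
    return True
```

For example, a transaction that unlocks Y and then write-locks X fails the check, while one that acquires both locks before its first unlock passes.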

Database Concurrency Control
Two-Phase Locking Techniques: The algorithm

T1                  T2
read_lock (Y);      read_lock (X);
read_item (Y);      read_item (X);
unlock (Y);         unlock (X);
write_lock (X);     write_lock (Y);
read_item (X);      read_item (Y);
X:=X+Y;             Y:=X+Y;
write_item (X);     write_item (Y);
unlock (X);         unlock (Y);

Result
Initial values: X=20; Y=30
Result of serial execution T1 followed by T2: X=50, Y=80
Result of serial execution T2 followed by T1: X=70, Y=50

Database Concurrency Control
Two-Phase Locking Techniques: The algorithm
Time runs downward.

T1                  T2
read_lock (Y);
read_item (Y);
unlock (Y);
                    read_lock (X);
                    read_item (X);
                    unlock (X);
                    write_lock (Y);
                    read_item (Y);
                    Y:=X+Y;
                    write_item (Y);
                    unlock (Y);
write_lock (X);
read_item (X);
X:=X+Y;
write_item (X);
unlock (X);

Result: X=50; Y=50
Nonserializable because it violated the two-phase policy.
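
The arithmetic behind these results can be replayed directly. The sketch below is illustrative only: it models the database as a dictionary and each transaction's local copies as plain variables, and all function names are hypothetical.

```python
def t1(db):
    db["X"] = db["X"] + db["Y"]            # T1: X := X + Y

def t2(db):
    db["Y"] = db["X"] + db["Y"]            # T2: Y := X + Y

def serial(order):
    db = {"X": 20, "Y": 30}                # initial values from the slide
    for txn in order:
        txn(db)
    return db

def interleaved():
    db = {"X": 20, "Y": 30}
    y_seen_by_t1 = db["Y"]                 # T1: read_item(Y), then unlock(Y)
    x_seen_by_t2 = db["X"]                 # T2: read_item(X), then unlock(X)
    db["Y"] = x_seen_by_t2 + db["Y"]       # T2: read_item(Y); Y := X + Y; write_item(Y)
    db["X"] = db["X"] + y_seen_by_t1       # T1: read_item(X); X := X + Y; write_item(X)
    return db

print(serial([t1, t2]))    # {'X': 50, 'Y': 80}
print(serial([t2, t1]))    # {'X': 70, 'Y': 50}
print(interleaved())       # {'X': 50, 'Y': 50}
```

The interleaved result matches neither serial order because T1 uses the value of Y it read before T2 updated it.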
Database Concurrency Control
Two-Phase Locking Techniques: The algorithm

T’1                 T’2
read_lock (Y);      read_lock (X);
read_item (Y);      read_item (X);
write_lock (X);     write_lock (Y);
unlock (Y);         unlock (X);
read_item (X);      read_item (Y);
X:=X+Y;             Y:=X+Y;
write_item (X);     write_item (Y);
unlock (X);         unlock (Y);

T’1 and T’2 follow the two-phase policy, but they are subject to
deadlock, which must be dealt with.

Database Concurrency Control
Two-Phase Locking Techniques: The algorithm

The two-phase policy gives rise to three locking algorithms: (a) Basic,
(b) Conservative, and (c) Strict.
Conservative: Prevents deadlock by locking all desired data items before
the transaction begins execution.
Basic: The transaction locks data items incrementally. This may cause deadlock,
which must then be dealt with.
Strict: A stricter version of the Basic algorithm in which unlocking is
performed only after the transaction terminates (commits, or aborts and is rolled back).
This is the most commonly used two-phase locking algorithm.

Database Concurrency Control
Dealing with Deadlock and Starvation

Deadlock
T’1                 T’2
read_lock (Y);
read_item (Y);
                    read_lock (X);
                    read_item (X);
write_lock (X);
(waits for X)
                    write_lock (Y);
                    (waits for Y)

T’1 and T’2 follow the two-phase policy, but they are deadlocked:
T’1 waits for X, which T’2 holds, and T’2 waits for Y, which T’1 holds.

Database Concurrency Control
Dealing with Deadlock and Starvation

Deadlock prevention
A transaction locks all data items it refers to before it begins execution.
This way of locking prevents deadlock since a transaction never waits
for a data item while it is holding locks on other items. Conservative
two-phase locking uses this approach.

Database Concurrency Control
Dealing with Deadlock and Starvation

Deadlock detection and resolution


In this approach, deadlocks are allowed to happen. The scheduler
maintains a wait-for graph for detecting cycles. If a cycle exists, then
one transaction involved in the cycle is selected (the victim) and rolled
back.
A wait-for graph is created using the lock table. As soon as a
transaction is blocked, it is added to the graph. When a chain such as
"Ti waits for Tj, Tj waits for Tk, Tk waits for Ti" occurs, it forms a
cycle. One of the transactions in the cycle is selected and rolled back.
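
Cycle detection on a wait-for graph can be sketched with a depth-first search. The representation below (a dictionary mapping each blocked transaction to the set of transactions it waits for) and the function name find_cycle are assumptions for illustration; the slides do not prescribe a data structure or a victim-selection rule.

```python
def find_cycle(wait_for):
    """wait_for: dict mapping a blocked transaction to the set of transactions
    it waits for. Returns a list of transactions forming a cycle, or None."""
    WHITE, GREY, BLACK = 0, 1, 2           # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(t):
        colour[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            if colour.get(u, WHITE) == GREY:
                return path[path.index(u):]    # back edge closes a cycle
            if colour.get(u, WHITE) == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        colour[t] = BLACK
        path.pop()
        return None

    for t in list(wait_for):
        if colour.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None
```

With the chain from the slide, `find_cycle({"Ti": {"Tj"}, "Tj": {"Tk"}, "Tk": {"Ti"}})` returns the three transactions of the cycle, from which a victim can then be chosen.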

Database Concurrency Control
Dealing with Deadlock and Starvation

Deadlock avoidance
There are many variations of the two-phase locking algorithm. Some avoid
deadlock by not letting a cycle complete: as soon as the
algorithm discovers that blocking a transaction is likely to create a
cycle, it rolls back a transaction. The Wound-Wait and Wait-Die
algorithms use timestamps to avoid deadlocks by rolling back a
victim.
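
The two timestamp rules named above can be written as decision functions. This sketch uses the rules as they are usually stated (a smaller timestamp means an older transaction); the function names and the returned strings are hypothetical.

```python
def wait_die(ts_requester, ts_holder):
    """Wait-Die: an older requester waits; a younger requester dies (is aborted
    and later restarted with its original timestamp)."""
    return "wait" if ts_requester < ts_holder else "abort requester"


def wound_wait(ts_requester, ts_holder):
    """Wound-Wait: an older requester wounds (aborts) the younger holder;
    a younger requester waits."""
    return "abort holder" if ts_requester < ts_holder else "wait"
```

In both schemes the aborted transaction is always the younger one, so a cycle of waiting transactions cannot form.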

Database Concurrency Control
Dealing with Deadlock and Starvation

Starvation
Starvation occurs when a particular transaction is repeatedly made to wait or
is restarted and never gets a chance to proceed further. In deadlock
resolution it is possible that the same transaction is consistently
selected as the victim and rolled back. This limitation is inherent in all
priority-based scheduling mechanisms. In the Wound-Wait scheme a
younger transaction may always be wounded (aborted) by a long-running
older transaction, which may create starvation.

Database Concurrency Control
Timestamp based concurrency control algorithm

Timestamp
A monotonically increasing variable (integer) indicating the age of an
operation or a transaction. A larger timestamp value indicates a more
recent event or operation.
A timestamp-based algorithm uses timestamps to serialize the execution of
concurrent transactions.

Database Concurrency Control
Timestamp based concurrency control algorithm
Basic Timestamp Ordering
1. Transaction T issues a write_item(X) operation:
a. If read_TS(X) > TS(T) or if write_TS(X) > TS(T), then a younger
transaction has already read or written the data item, so abort and
roll back T and reject the operation.
b. If the condition in part (a) does not hold, then execute write_item(X)
of T and set write_TS(X) to TS(T).
2. Transaction T issues a read_item(X) operation:
a. If write_TS(X) > TS(T), then a younger transaction has already
written to the data item, so abort and roll back T and reject the
operation.
b. If write_TS(X) ≤ TS(T), then execute read_item(X) of T and set
read_TS(X) to the larger of TS(T) and the current read_TS(X).
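
The two rules translate almost line for line into code. The following is a minimal sketch, assuming integer timestamps and a hypothetical Item class that carries read_TS and write_TS; raising an exception stands in for "abort and roll back T".

```python
class AbortTransaction(Exception):
    """Raised where the rules say: abort and roll back T."""


class Item:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0       # read_TS(X)
        self.write_ts = 0      # write_TS(X)


def read_item(item, ts):
    if item.write_ts > ts:                     # rule 2(a)
        raise AbortTransaction("read rejected")
    item.read_ts = max(item.read_ts, ts)       # rule 2(b)
    return item.value


def write_item(item, ts, value):
    if item.read_ts > ts or item.write_ts > ts:   # rule 1(a)
        raise AbortTransaction("write rejected")
    item.value = value                            # rule 1(b)
    item.write_ts = ts
```

A caller would catch AbortTransaction, undo the transaction's effects, and restart it with a new timestamp; that machinery is outside the sketch.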

Database Concurrency Control
Timestamp based concurrency control algorithm

Strict Timestamp Ordering


1. Transaction T issues a write_item(X) operation:
a. If TS(T) > write_TS(X), then delay T until the transaction T’ that
wrote X has terminated (committed or aborted).
2. Transaction T issues a read_item(X) operation:
a. If TS(T) > write_TS(X), then delay T until the transaction T’ that
wrote X has terminated (committed or aborted).

Database Concurrency Control
Timestamp based concurrency control algorithm

Thomas’s Write Rule


1. If read_TS(X) > TS(T), then abort and roll back T and reject the
operation.
2. If write_TS(X) > TS(T), then just ignore the write operation and
continue execution. A younger transaction has already written X, so
T's write is obsolete: only the most recent write counts.
3. If the conditions given in 1 and 2 above do not occur, then
execute write_item(X) of T and set write_TS(X) to TS(T).
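
As a short illustration of the three cases, the sketch below applies Thomas's write rule to one data item. The dictionary layout of the item and the returned strings are assumptions made for the example.

```python
def thomas_write(item, ts, value):
    """item: dict with keys 'value', 'read_ts', 'write_ts' (assumed layout).
    Returns 'abort', 'ignored', or 'written'."""
    if item["read_ts"] > ts:
        return "abort"         # case 1: a younger transaction already read X
    if item["write_ts"] > ts:
        return "ignored"       # case 2: obsolete write, skip it and continue
    item["value"] = value      # case 3: perform the write
    item["write_ts"] = ts
    return "written"
```

Only case 1 forces a rollback; case 2 is where this rule differs from basic timestamp ordering.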

Database Concurrency Control
Multiversion concurrency control techniques

Concept

This approach maintains a number of versions of a data item
and allocates the right version to a read operation of a
transaction. Thus, unlike other mechanisms, a read operation in
this mechanism is never rejected.
Side effect: Significantly more storage (RAM and disk) is
required to maintain multiple versions. To keep the number of
versions from growing without bound, garbage collection is run when some
criterion is satisfied.

Database Concurrency Control
Multiversion technique based on timestamp ordering


Assume X1, X2, …, Xn are the versions of a data item X created
by the write operations of transactions. With each Xi a read_TS
(read timestamp) and a write_TS (write timestamp) are
associated.
read_TS(Xi): The read timestamp of Xi is the largest of all the
timestamps of transactions that have successfully read version Xi.
write_TS(Xi): The write timestamp of Xi is the timestamp of the
transaction that wrote the value of version Xi.
A new version of X is created only by a write operation.

Database Concurrency Control
Multiversion technique based on timestamp ordering

To ensure serializability, the following two rules are used.
1. If transaction T issues write_item (X) and version i of X has
the highest write_TS(Xi) of all versions of X that is also less
than or equal to TS(T), and read_TS(Xi) > TS(T), then abort
and roll back T; otherwise create a new version Xj of X with
read_TS(Xj) = write_TS(Xj) = TS(T).

2. If transaction T issues read_item (X), find the version i of X
that has the highest write_TS(Xi) of all versions of X that is
also less than or equal to TS(T), then return the value of Xi
to T, and set the value of read_TS(Xi) to the larger of
TS(T) and the current read_TS(Xi).

Rule 2 guarantees that a read will never be rejected.
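
The two rules can be sketched as follows. This is a simplified illustration, not a full implementation: the class names are hypothetical, timestamps are assumed to be non-negative integers, an initial version with write_TS 0 is assumed to exist, each transaction is assumed to write a given item at most once, and garbage collection of old versions is left out.

```python
class Version:
    def __init__(self, value, write_ts):
        self.value = value
        self.write_ts = write_ts
        self.read_ts = write_ts    # a new version starts with read_TS = write_TS


class MultiversionItem:
    def __init__(self, initial_value):
        self.versions = [Version(initial_value, 0)]   # assumed initial version

    def _visible(self, ts):
        # The version with the highest write_TS that is <= TS(T).
        candidates = [v for v in self.versions if v.write_ts <= ts]
        return max(candidates, key=lambda v: v.write_ts)

    def read(self, ts):
        v = self._visible(ts)                 # rule 2: a read is never rejected
        v.read_ts = max(v.read_ts, ts)
        return v.value

    def write(self, ts, value):
        v = self._visible(ts)
        if v.read_ts > ts:                    # rule 1: a younger reader saw Xi
            return False                      # caller must abort and roll back T
        self.versions.append(Version(value, ts))
        return True
```

A read with timestamp 6 still sees the version written at timestamp 0 even after a transaction with timestamp 7 has written a newer version.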

Database Concurrency Control
Multiversion Two-Phase Locking Using Certify Locks
Concept
Allow a transaction T’ to read a data item X while it is write
locked by a conflicting transaction T.
This is accomplished by maintaining two versions of each
data item X where one version must always have been
written by some committed transaction. This means a write
operation always creates a new version of X.

Database Concurrency Control
Multiversion Two-Phase Locking Using Certify Locks
Steps
1. X is the committed version of a data item.
2. T creates a second version X’ after obtaining a write lock on X.
3. Other transactions continue to read X.
4. T is ready to commit so it obtains a certify lock on X’.
5. The committed version X becomes X’.
6. T releases its certify lock on X’, which is X now.

Compatibility table for the read/write locking scheme

         Read  Write
Read     yes   no
Write    no    no

Compatibility table for the read/write/certify locking scheme

         Read  Write  Certify
Read     yes   yes    no
Write    yes   no     no
Certify  no    no     no

Database Concurrency Control
Multiversion Two-Phase Locking Using Certify Locks

Note
In multiversion 2PL, read and write operations from conflicting
transactions can be processed concurrently. This improves
concurrency, but it may delay transaction commit because the transaction
must obtain certify locks on all the items it wrote. It avoids cascading
aborts but, as in the strict two-phase locking scheme, conflicting
transactions may get deadlocked.

Database Concurrency Control
Validation (Optimistic) Concurrency Control Schemes

In this technique, serializability is checked only at commit time,
and transactions are aborted in case of non-serializable schedules.
Three phases:
Read phase: A transaction can read values of committed data items.
However, updates are applied only to local copies (versions) of the
data items (in database cache).

Database Concurrency Control
Validation (Optimistic) Concurrency Control Schemes
Validation phase: Serializability is checked before transactions write
their updates to the database.
This phase for Ti checks that, for each transaction Tj that is
either committed or is in its validation phase, one of the
following conditions holds:
1. Tj completes its write phase before Ti starts its read phase.
2. Ti starts its write phase after Tj completes its write phase,
and the read_set of Ti has no items in common with the
write_set of Tj.
3. Both the read_set and write_set of Ti have no items in
common with the write_set of Tj, and Tj completes its read
phase before Ti completes its read phase.

Database Concurrency Control
Validation (Optimistic) Concurrency Control Schemes

When validating Ti, the first condition is checked first for each
transaction Tj, since (1) is the simplest condition to check. If (1) is
false then (2) is checked, and if (2) is false then (3) is checked. If
none of these conditions holds, the validation fails and Ti is aborted.

Write phase: On successful validation, the transaction's updates are
applied to the database; otherwise, the transaction is restarted.
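
The validation test can be sketched as a function over read and write sets. Everything named here is an assumption for illustration: the Txn record, its logical-clock fields, and the parameter now (the time at which Ti would start its write phase). A write_end of infinity models a Tj that has not finished writing.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)
    read_start: int = 0
    read_end: int = 0                   # end of the read phase
    write_end: float = float("inf")     # end of the write phase, if reached


def validate(ti, others, now):
    """Return True if Ti passes validation against every Tj in others."""
    for tj in others:
        if tj.write_end < ti.read_start:                      # condition 1
            continue
        no_rw = not (ti.read_set & tj.write_set)
        if tj.write_end < now and no_rw:                      # condition 2
            continue
        no_ww = not (ti.write_set & tj.write_set)
        if tj.read_end < ti.read_end and no_rw and no_ww:     # condition 3
            continue
        return False                    # none of the conditions holds: abort Ti
    return True
```

The conditions are tried in the order the slide gives, cheapest first, and a single failing Tj is enough to reject Ti.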

Database Concurrency Control
Granularity of data items and Multiple Granularity Locking

A lockable unit of data defines its granularity. Granularity can be
coarse (entire database) or it can be fine (a tuple or an attribute of a
relation). Data item granularity significantly affects concurrency
control performance: the degree of concurrency is low for
coarse granularity and high for fine granularity. Examples of data
item granularity:
1. A field of a database record (an attribute of a tuple).
2. A database record (a tuple of a relation).
3. A disk block.
4. An entire file.
5. The entire database.
Database Concurrency Control
Granularity of data items and Multiple Granularity Locking

The following diagram illustrates a hierarchy of granularity from
coarse (database) to fine (record).

DB
  f1
    p11: r111 ... r11j
    p12: r121 ... r12j
    ...
    p1n: r1n1 ... r1nj
  f2
    p21: r211 ... r21k
    p22: r221 ... r22k
    ...
    p2m: r2m1 ... r2mk

Database Concurrency Control
Granularity of data items and Multiple Granularity Locking

To manage such a hierarchy, in addition to read and write,
three additional locking modes, called intention lock
modes, are defined:
Intention-shared (IS): indicates that shared lock(s) will be
requested on some descendant node(s).
Intention-exclusive (IX): indicates that exclusive lock(s)
will be requested on some descendant node(s).
Shared-intention-exclusive (SIX): indicates that the current
node is locked in shared mode but exclusive lock(s)
will be requested on some descendant node(s).

Database Concurrency Control
Granularity of data items and Multiple Granularity Locking

These locks are applied using the following compatibility matrix:

       IS    IX    S     SIX   X
IS     yes   yes   yes   yes   no
IX     yes   yes   no    no    no
S      yes   no    yes   no    no
SIX    yes   no    no    no    no
X      no    no    no    no    no
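
The matrix can be encoded as a lookup table. This is a small sketch; the names COMPATIBLE and can_grant are hypothetical, and the table simply transcribes the "yes" entries above.

```python
# For each held mode, the set of requested modes that are compatible with it.
COMPATIBLE = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}


def can_grant(requested, held_modes):
    """True if `requested` is compatible with every mode in `held_modes`,
    the modes currently held on the node by OTHER transactions."""
    return all(requested in COMPATIBLE[held] for held in held_modes)
```

For instance, `can_grant("IX", ["IS", "IX"])` holds, while `can_grant("S", ["IX"])` does not.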

Database Concurrency Control
Granularity of data items and Multiple Granularity Locking
The rules that must be followed to produce a
serializable schedule are:
1. The lock compatibility matrix must be adhered to.
2. The root of the tree must be locked first, in any mode.
3. A node N can be locked by a transaction T in S or IS mode
only if the parent node is already locked by T in either IS or
IX mode.
4. A node N can be locked by T in X, IX, or SIX mode only if
the parent of N is already locked by T in either IX or SIX
mode.
5. T can lock a node only if it has not unlocked any node (to
enforce the 2PL policy).
6. T can unlock a node, N, only if none of the children of N are
currently locked by T.

Database Concurrency Control
Granularity of data items and Multiple Granularity Locking
An example of a serializable execution:
T1              T2              T3
IX(db)
IX(f1)
                IX(db)
                                IS(db)
                                IS(f1)
                                IS(p11)
IX(p11)
X(r111)
                IX(f1)
                X(p12)
                                S(r11j)
IX(f2)
IX(p21)
X(r211)
unlock(r211)
unlock(p21)
unlock(f2)
                                S(f2)
Database Concurrency Control
Granularity of data items and Multiple Granularity Locking

An example of a serializable execution (continued):
T1              T2              T3
                unlock(p12)
                unlock(f1)
                unlock(db)
unlock(r111)
unlock(p11)
unlock(f1)
unlock(db)
                                unlock(r11j)
                                unlock(p11)
                                unlock(f1)
                                unlock(f2)
                                unlock(db)

Chapter 19
Database Recovery
Techniques

Chapter 19 Outline
Database Recovery
1 Purpose of Database Recovery
2 Types of Failure
3 Transaction Log
4 Data Updates
5 Data Caching
6 Transaction Roll-back (Undo) and Roll-Forward (Redo)
7 Checkpointing
8 Recovery schemes
9 ARIES Recovery Scheme
10 Recovery in Multidatabase System

Database Recovery
1 Purpose of Database Recovery
• To bring the database into the last consistent state,
which existed prior to the failure.
• To preserve transaction properties (Atomicity,
Consistency, Isolation and Durability).

Example: If the system crashes before a fund transfer
transaction completes its execution, then one or both
accounts may hold an incorrect value. Thus, the database
must be restored to the state before the transaction modified
any of the accounts.

Database Recovery

2 Types of Failure
The database may become unavailable for use due to
• Transaction failure: Transactions may fail because of
incorrect input, deadlock, incorrect synchronization.
• System failure: System may fail because of addressing
error, application error, operating system fault, RAM
failure, etc.
• Media failure: Disk head crash, power disruption, etc.

Database Recovery
3 Transaction Log
For recovery from any type of failure, data values prior to
modification (BFIM - BeFore Image) and the new values after
modification (AFIM - AFter Image) are required. These values
and other information are stored in a sequential file called the
transaction log. A sample log is given below. Back P and Next P
point to the previous and next log records of the same transaction.

T ID  Back P  Next P  Operation  Data item  BFIM     AFIM
T1    0       1       Begin
T1    1       4       Write      X          X = 100  X = 200
T2    0       8       Begin
T1    2       5       Write      Y          Y = 50   Y = 100
T1    4       7       Read       M          M = 200  M = 200
T3    0       9       Read       N          N = 400  N = 400
T1    5       nil     End

Database Recovery
4 Data Update
• Immediate Update: As soon as a data item is
modified in cache, the disk copy is updated.
• Deferred Update: All modified data items in the
cache are written to disk either after a transaction ends its
execution or after a fixed number of transactions
have completed their execution.
• Shadow update: The modified version of a data
item does not overwrite its disk copy but is written at
a separate disk location.
• In-place update: The disk version of the data item is
overwritten by the cache version.

Database Recovery
5 Data Caching
Data items to be modified are first stored in the database
cache by the Cache Manager (CM), and after
modification they are flushed (written) to the disk. The
flushing is controlled by the Modified and Pin-Unpin bits.
Pin-Unpin: When set (pinned), instructs the operating system not to
flush the data item.
Modified: Indicates that the cached copy of the data item has been
modified, that is, it holds an AFIM.

Database Recovery
6 Transaction Roll-back (Undo) and Roll-Forward (Redo)
To maintain atomicity, a transaction’s operations are redone
or undone.
Undo: Restore all BFIMs on to disk (Remove all AFIMs).
Redo: Restore all AFIMs on to disk.
Database recovery is achieved either by performing only
Undos or only Redos or by a combination of the two. These
operations are recorded in the log as they happen.

Database Recovery
Roll-back

We show the process of roll-back with the help of the following three transactions
T1, T2, and T3.

T1                  T2                  T3
read_item (A)       read_item (B)       read_item (C)
read_item (D)       write_item (B)      write_item (B)
write_item (D)      read_item (D)       read_item (A)
                    write_item (D)      write_item (A)

Database Recovery
Roll-back: One execution of T1, T2 and T3 as recorded in the log.
                                  A    B    C    D
Initial values                    30   15   40   20

    [start_transaction, T3]
    [read_item, T3, C]
*   [write_item, T3, B, 15, 12]        12
    [start_transaction, T2]
    [read_item, T2, B]
**  [write_item, T2, B, 12, 18]        18
    [start_transaction, T1]
    [read_item, T1, A]
    [read_item, T1, D]
    [write_item, T1, D, 20, 25]                  25
    [read_item, T2, D]
**  [write_item, T2, D, 25, 26]                  26
    [read_item, T3, A]
---- system crash ----
* T3 is rolled back because it did not reach its commit point.
** T2 is rolled back because it reads the value of item B written by T3.
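
Rolling back T3 and T2 means restoring the BFIMs of their writes in reverse log order. The sketch below replays this on the log above; the tuple encoding of log records and the function name undo are assumptions for illustration.

```python
# Write records from the log above: (op, transaction, item, BFIM, AFIM).
log = [
    ("write", "T3", "B", 15, 12),
    ("write", "T2", "B", 12, 18),
    ("write", "T1", "D", 20, 25),
    ("write", "T2", "D", 25, 26),
]


def undo(db, log, victims):
    """Restore the BFIM of every write made by a transaction in `victims`,
    scanning the log backwards."""
    for op, txn, item, bfim, _afim in reversed(log):
        if op == "write" and txn in victims:
            db[item] = bfim
    return db


# Database state at the time of the crash, as traced in the log.
db_at_crash = {"A": 30, "B": 18, "C": 40, "D": 26}
print(undo(db_at_crash, log, {"T2", "T3"}))
# {'A': 30, 'B': 15, 'C': 40, 'D': 25}
```

Scanning backwards matters: B passes through 12 on its way back to 15, the value it had before T3 wrote it.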
Database Recovery
Roll-back: One execution of T1, T2 and T3 as recorded in the log.

[Figure: timelines of T3, T2, and T1 up to the system crash; time runs left to right.]
T3: BEGIN, READ(C), WRITE(B), READ(A)
T2: BEGIN, READ(B), WRITE(B), READ(D), WRITE(D)
T1: BEGIN, READ(A), READ(D), WRITE(D)
system crash

Illustrating cascading roll-back

Database Recovery
Write-Ahead Logging
When in-place update (immediate or deferred) is used, a
log is necessary for recovery and it must be available to the
recovery manager. This is achieved by the Write-Ahead
Logging (WAL) protocol. WAL states that:
For Undo: Before a data item’s AFIM is flushed to the
database disk (overwriting the BFIM), its BFIM must be
written to the log and the log must be saved on a stable store
(log disk).
For Redo: Before a transaction executes its commit operation,
all its AFIMs must be written to the log and the log must be
saved on a stable store.

Database Recovery
7 Checkpointing
From time to time (randomly or under some criteria) the database
flushes its buffer to the database disk to minimize the task of
recovery. The following steps define a checkpoint operation:
1. Suspend execution of transactions temporarily.
2. Force-write modified buffer data to disk.
3. Write a [checkpoint] record to the log, save the log to disk.
4. Resume normal transaction execution.
During recovery, redo or undo is required only for transactions
appearing after the [checkpoint] record.

Database Recovery
Steal/No-Steal and Force/No-Force
Possible ways for flushing database cache to database disk:
Steal: An updated cache page can be flushed before the transaction commits.
No-Steal: An updated cache page cannot be flushed before the transaction commits.
Force: All pages updated by a transaction are flushed (forced) to disk when it commits.
No-Force: Flushing of updated pages may be deferred until after the transaction commits.
These give rise to four different ways for handling recovery:
Steal/No-Force (Undo/Redo), Steal/Force (Undo/No-redo),
No-Steal/No-Force (Redo/No-undo) and No-Steal/Force (No-undo/No-redo).
Database Recovery
8 Recovery Scheme
Deferred Update (No Undo/Redo)
The data update goes as follows:
1. A set of transactions record their updates in the log.
2. At the commit point, under the WAL scheme, these updates are
saved on the database disk.
After reboot from a failure, the log is used to redo all the
transactions affected by this failure. No undo is required
because no AFIM is flushed to the disk before a transaction
commits.

Database Recovery
Deferred Update in a single-user system

There is no concurrent data sharing in a single-user
system. The data update goes as follows:
1. A set of transactions record their updates in the log.
2. At the commit point, under the WAL scheme, these updates are
saved on the database disk.
After reboot from a failure, the log is used to redo all the
transactions affected by this failure. No undo is required
because no AFIM is flushed to the disk before a transaction
commits.

Database Recovery
Deferred Update in a single-user system
(a) T1                  T2
    read_item (A)       read_item (B)
    read_item (D)       write_item (B)
    write_item (D)      read_item (D)
                        write_item (D)
(b)
[start_transaction, T1]
[write_item, T1, D, 20]
[commit, T1]
[start_transaction, T2]
[write_item, T2, B, 10]
[write_item, T2, D, 25] ← system crash

The [write_item, …] operations of T1 are redone.
T2 log entries are ignored by the recovery manager.
Database Recovery
Deferred Update with concurrent users
This environment requires some concurrency control mechanism to
guarantee the isolation property of transactions. During system recovery,
transactions that were recorded in the log after the last checkpoint are
redone. The recovery manager may scan some of the transactions
recorded before the checkpoint to get the AFIMs.

[Figure: timelines of transactions T1 through T5 relative to the checkpoint at time t1 and the system crash at time t2.]

Recovery in a concurrent users environment.

Database Recovery
Deferred Update with concurrent users
(a) T1                  T2                  T3                  T4
    read_item (A)       read_item (B)       read_item (A)       read_item (B)
    read_item (D)       write_item (B)      write_item (A)      write_item (B)
    write_item (D)      read_item (D)       read_item (C)       read_item (A)
                        write_item (D)      write_item (C)      write_item (A)

(b) [start_transaction, T1]
    [write_item, T1, D, 20]
    [commit, T1]
    [checkpoint]
    [start_transaction, T4]
    [write_item, T4, B, 15]
    [write_item, T4, A, 20]
    [commit, T4]
    [start_transaction, T2]
    [write_item, T2, B, 12]
    [start_transaction, T3]
    [write_item, T3, A, 30]
    [write_item, T2, D, 25] ← system crash

T2 and T3 are ignored because they did not reach their commit points.
T4 is redone because its commit point is after the last checkpoint.
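
The recovery decision for log (b) can be sketched as a redo pass. The tuple encoding of log records, the function name recover, and the on-disk starting values for A, B, and C are assumptions for illustration; D = 20 reflects T1's update, which reached disk before the checkpoint.

```python
log = [
    ("start", "T1"), ("write", "T1", "D", 20), ("commit", "T1"),
    ("checkpoint",),
    ("start", "T4"), ("write", "T4", "B", 15), ("write", "T4", "A", 20),
    ("commit", "T4"),
    ("start", "T2"), ("write", "T2", "B", 12),
    ("start", "T3"), ("write", "T3", "A", 30),
    ("write", "T2", "D", 25),
]


def recover(db, log):
    """Redo, in log order, the writes of every transaction whose commit record
    appears after the last checkpoint; uncommitted transactions are ignored."""
    cp = max((i for i, r in enumerate(log) if r[0] == "checkpoint"), default=-1)
    committed = {r[1] for r in log[cp + 1:] if r[0] == "commit"}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _op, _txn, item, afim = rec
            db[item] = afim
    return db


disk = {"A": 30, "B": 10, "C": 40, "D": 20}   # hypothetical disk state at the crash
print(recover(disk, log))
# {'A': 20, 'B': 15, 'C': 40, 'D': 20}
```

Running recover a second time on its own output leaves the database unchanged, since a redo only rewrites the same AFIMs.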

Database Recovery
Deferred Update with concurrent users
Two tables are required for implementing this protocol:

Active table: All active transactions are entered in this table.
Commit table: Transactions to be committed are entered in this table.

During recovery, all transactions in the commit table are redone and all
transactions in the active table are ignored, since none of their AFIMs
reached the database. It is possible that a commit table transaction may
be redone twice, but this does not create any inconsistency because
redo is “idempotent”, that is, one redo of an AFIM is equivalent to
multiple redos of the same AFIM.

Database Recovery
Recovery Techniques Based on Immediate Update

Undo/No-redo Algorithm

In this algorithm the AFIMs of a transaction are flushed to the database
disk under WAL before it commits. For this reason no transaction is ever
redone; during recovery the recovery manager undoes every transaction
that had not committed. A transaction that had completed execution and
was ready to commit, but had not yet committed, is also undone.

Database Recovery
Recovery Techniques Based on Immediate Update

Undo/Redo Algorithm (Single-user environment)

Recovery schemes of this category apply undo and also redo for
recovery. In a single-user environment no concurrency control is
required but a log is maintained under WAL. Note that at any time there
will be one transaction in the system and it will be either in the commit
table or in the active table. The recovery manager performs:

1. Undo of a transaction if it is in the active table.


2. Redo of a transaction if it is in the commit table.

Database Recovery
Recovery Techniques Based on Immediate Update
Undo/Redo Algorithm (Concurrent execution)
Recovery schemes of this category apply undo and also redo to recover
the database from failure. In a concurrent execution environment,
concurrency control is required and a log is maintained under WAL.
The commit table records transactions to be committed and the active
table records active transactions. To minimize the work of the recovery
manager, checkpointing is used. The recovery manager performs:

1. Undo of a transaction if it is in the active table.


2. Redo of a transaction if it is in the commit table.
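
A minimal Python sketch of the two steps above follows. It is an illustration, not the textbook's procedure: the record layout ("write", tid, item, BFIM, AFIM), the function name, and the sample values are assumptions, checkpointing is left out, and a strict schedule is assumed so that undoing by BFIM is safe.

```python
def recover_undo_redo(log, database):
    """Undo transactions in the active table, redo those in the commit table.

    Assumed record formats:
      ("start", tid), ("write", tid, item, bfim, afim), ("commit", tid)
    """
    commit_table = {rec[1] for rec in log if rec[0] == "commit"}
    active_table = {rec[1] for rec in log if rec[0] == "start"} - commit_table

    # 1. Undo: scan backward and restore the BFIM of every write made by
    #    a transaction that is still in the active table.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] in active_table:
            database[rec[2]] = rec[3]

    # 2. Redo: scan forward and reapply the AFIM of every write made by
    #    a transaction in the commit table.
    for rec in log:
        if rec[0] == "write" and rec[1] in commit_table:
            database[rec[2]] = rec[4]

    return active_table, commit_table


# Hypothetical log: T1 committed, T2 was still active at the crash.
SAMPLE_LOG = [
    ("start", "T1"),
    ("write", "T1", "X", 5, 7),
    ("commit", "T1"),
    ("start", "T2"),
    ("write", "T2", "Y", 1, 9),
    ("write", "T2", "X", 7, 8),
]
```

With this log the database ends with X = 7 and Y = 1 regardless of which of the logged writes had reached the disk before the crash.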

Database Recovery
Shadow Paging
The AFIM does not overwrite its BFIM but is recorded at another place on
the disk. Thus, at any time a data item has an AFIM and a BFIM (the shadow
copy of the data item) at two different places on the disk.

[Figure: the database disk holding X and Y together with X' and Y'.]

X and Y: Shadow copies of data items
X' and Y': Current copies of data items

Database Recovery
Shadow Paging
To manage access of data items by concurrent transactions two
directories (current and shadow) are used. The directory arrangement is
illustrated below. Here a page is a data item.
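Directory entry   Current Directory             Shadow Directory
                  (after updating pages 2, 5)   (not updated)
1                 Page 1                        Page 1
2                 Page 2 (new)                  Page 2 (old)
3                 Page 3                        Page 3
4                 Page 4                        Page 4
5                 Page 5 (new)                  Page 5 (old)
6                 Page 6                        Page 6

Database disk blocks, in the order shown in the figure: Page 5 (old),
Page 1, Page 4, Page 2 (old), Page 3, Page 6, Page 2 (new), Page 5 (new).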
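
The two-directory arrangement can be sketched in Python as follows. The class name ShadowPagedStore and its methods are hypothetical, and the sketch ignores concurrency and the reclaiming of old blocks; it only shows that updates go to fresh blocks while the shadow directory keeps pointing at the old ones.

```python
class ShadowPagedStore:
    """Toy model of shadow paging with a current and a shadow directory."""

    def __init__(self, pages):
        # Disk blocks; each directory maps a page number to a block index.
        self.blocks = list(pages)
        self.shadow = {n + 1: n for n in range(len(self.blocks))}
        self.current = dict(self.shadow)

    def read(self, page_no):
        return self.blocks[self.current[page_no]]

    def write(self, page_no, value):
        # Never overwrite the shadow copy: store the new version in a
        # fresh block and repoint only the current directory.
        self.blocks.append(value)
        self.current[page_no] = len(self.blocks) - 1

    def commit(self):
        # The current directory becomes the new shadow directory.
        self.shadow = dict(self.current)

    def recover(self):
        # After a crash, discard the current directory and reinstate the
        # shadow directory; no undo or redo of data items is needed.
        self.current = dict(self.shadow)
```

For example, after writing new versions of pages 2 and 5 and then calling recover() without a commit, read(2) and read(5) return the old versions again.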

Database Recovery
9 The ARIES Recovery Algorithm
The ARIES Recovery Algorithm is based on:

1. WAL (Write-Ahead Logging)
2. Repeating history during redo: ARIES retraces all actions of
the database system prior to the crash to reconstruct the database
state when the crash occurred.
3. Logging changes during undo: This prevents ARIES from
repeating the completed undo operations if a failure occurs
during recovery, which causes a restart of the recovery process.

Database Recovery
The ARIES Recovery Algorithm

The ARIES recovery algorithm consists of three steps:

1. Analysis: identifies the dirty (updated) pages in the buffer
and the set of transactions active at the time of the crash. The
appropriate point in the log where redo is to start is also
determined.
2. Redo: the necessary redo operations are applied.
3. Undo: the log is scanned backwards and the operations of
transactions active at the time of the crash are undone in reverse
order.

Database Recovery
The ARIES Recovery Algorithm
The Log and Log Sequence Number (LSN)
A log record is written for (a) a data update, (b) a transaction commit,
(c) a transaction abort, (d) an undo, and (e) a transaction end. In the
case of an undo, a compensating log record is written.

A unique LSN is associated with every log record. LSNs increase
monotonically, and an LSN indicates the disk address of the log record
it is associated with. In addition, each data page stores the LSN of the
latest log record corresponding to a change for that page.

A log record stores (a) the previous LSN of that transaction, (b) the
transaction ID, and (c) the type of log record.

Database Recovery
The ARIES Recovery Algorithm
The Log and Log Sequence Number (LSN)
A log record stores:
1. Previous LSN of that transaction: It links the log records of
each transaction. It acts as a back pointer to the previous
record of the same transaction.
2. Transaction ID
3. Type of log record

For a write operation the following additional information is logged:

4. Page ID for the page that includes the item
5. Length of the updated item
6. Its offset from the beginning of the page
7. BFIM of the item
8. AFIM of the item
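
The fields listed above can be modeled as a small Python data class. The class and field names, the use of 0 as "no previous record", and the helper history are assumptions made for illustration; the sample records for T2 reuse LSNs 2, 7 and 8 from the ARIES example in this chapter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int
    prev_lsn: int                    # 0 means no earlier record of this transaction
    tid: str
    rec_type: str                    # e.g. "update", "commit"
    # The remaining fields are filled in only for write (update) records.
    page_id: Optional[str] = None
    length: Optional[int] = None
    offset: Optional[int] = None
    bfim: Optional[bytes] = None
    afim: Optional[bytes] = None


def history(log_by_lsn, last_lsn):
    """Follow the prev_lsn back pointers from a transaction's latest record."""
    chain = []
    lsn = last_lsn
    while lsn != 0:
        record = log_by_lsn[lsn]
        chain.append(record.lsn)
        lsn = record.prev_lsn
    return chain


T2_LOG = {
    2: LogRecord(2, 0, "T2", "update", page_id="B"),
    7: LogRecord(7, 2, "T2", "update", page_id="C"),
    8: LogRecord(8, 7, "T2", "commit"),
}
```

Here history(T2_LOG, 8) walks the chain 8, 7, 2, which is the order in which the undo phase would visit the records of a transaction.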

Database Recovery
The ARIES Recovery Algorithm
The Transaction table and the Dirty Page table

For efficient recovery, the following tables are also stored in the log
during checkpointing:

Transaction table: Contains an entry for each active transaction, with
information such as the transaction ID, the transaction status, and the
LSN of the most recent log record for the transaction.

Dirty Page table: Contains an entry for each dirty page in the buffer,
which includes the page ID and the LSN corresponding to the earliest
update to that page.

Database Recovery
The ARIES Recovery Algorithm
Checkpointing
Checkpointing does the following:

1. Writes a begin_checkpoint record in the log.
2. Writes an end_checkpoint record in the log. With this record the
contents of the transaction table and dirty page table are appended to
the end of the log.
3. Writes the LSN of the begin_checkpoint record to a special file.
This special file is accessed during recovery to locate the last
checkpoint information.

To reduce the cost of checkpointing and allow the system to continue
to execute transactions, ARIES uses “fuzzy checkpointing”.

Database Recovery
The ARIES Recovery Algorithm
The following steps are performed for recovery:
1. Analysis phase: Start at the begin_checkpoint record and
proceed through the end_checkpoint record to the end of the log.
At the end_checkpoint record, access the transaction table and
dirty page table that were appended to the log. Log records that
follow the checkpoint may modify these tables. The analysis
phase compiles the set of redo and undo operations to be
performed and ends.
2. Redo phase: Starts from the point in the log up to which all
dirty pages are known to have been flushed and moves forward to
the end of the log. Any change to a page that appears in the
dirty page table is redone.
3. Undo phase: Starts from the end of the log and proceeds
backward while performing the appropriate undo operations. For
each undo it writes a compensating record in the log.
Recovery completes at the end of the undo phase.
Database Recovery
An example of the working of ARIES scheme
(a)  LSN  LAST-LSN  TRAN-ID  TYPE              PAGE-ID  Other Info.
     1    0         T1       update            C        -----
     2    0         T2       update            B        -----
     3    1         T1       commit                     -----
     4                       begin checkpoint
     5                       end checkpoint
     6    0         T3       update            A        -----
     7    2         T2       update            C        -----
     8    7         T2       commit                     -----

(b)  TRANSACTION TABLE                          DIRTY PAGE TABLE
     TRANSACTION ID  LAST LSN  STATUS           PAGE ID  LSN
     T1              3         commit           C        1
     T2              2         in progress      B        2

(c)  TRANSACTION TABLE                          DIRTY PAGE TABLE
     TRANSACTION ID  LAST LSN  STATUS           PAGE ID  LSN
     T1              3         commit           C        1
     T2              8         commit           B        2
     T3              6         in progress      A        6
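
The step from tables (b) to tables (c) can be reproduced with a short Python sketch of the analysis pass. The function name analysis and the tuple layout (LSN, transaction ID, type, page ID) are assumptions for illustration; a full ARIES implementation handles more record types, such as transaction end and compensating records.

```python
def analysis(tx_table, dirty_pages, log_after_checkpoint):
    """Update the checkpointed tables with the log records that follow them."""
    tx_table = {tid: dict(entry) for tid, entry in tx_table.items()}
    dirty_pages = dict(dirty_pages)
    for lsn, tid, rec_type, page_id in log_after_checkpoint:
        status = "commit" if rec_type == "commit" else "in progress"
        tx_table[tid] = {"last_lsn": lsn, "status": status}
        # A page enters the dirty page table with the LSN of its earliest
        # update; later updates to the same page leave the entry alone.
        if rec_type == "update" and page_id not in dirty_pages:
            dirty_pages[page_id] = lsn
    return tx_table, dirty_pages


# Tables (b), as stored with the end_checkpoint record.
TX_AT_CHECKPOINT = {
    "T1": {"last_lsn": 3, "status": "commit"},
    "T2": {"last_lsn": 2, "status": "in progress"},
}
DIRTY_AT_CHECKPOINT = {"C": 1, "B": 2}

# Log records 6 to 8 from part (a).
LOG_TAIL = [
    (6, "T3", "update", "A"),
    (7, "T2", "update", "C"),
    (8, "T2", "commit", None),
]

tx_after, dirty_after = analysis(TX_AT_CHECKPOINT, DIRTY_AT_CHECKPOINT, LOG_TAIL)
redo_start = min(dirty_after.values())   # smallest LSN in the dirty page table
to_undo = {t for t, e in tx_after.items() if e["status"] == "in progress"}
```

The resulting tables match part (c); redo would start at LSN 1, and T3 is the only transaction left to undo.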

Database Recovery
10 Recovery in multidatabase system

A multidatabase system is a special distributed database system
where one node may be running a relational database system under
Unix, another may be running an object-oriented system under
Windows, and so on. A transaction may run in a distributed fashion
at multiple nodes. In this execution scenario the transaction
commits only when all of these nodes agree to commit
individually the part of the transaction they were executing. This
commit scheme is referred to as “two-phase commit” (2PC). If any
one of these nodes fails or cannot commit its part of the
transaction, then the transaction is aborted. Each node recovers the
transaction under its own recovery protocol.
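
The commit rule described above can be sketched in Python. The function name two_phase_commit, the node names, and the idea of modeling each node as a callable that returns its vote are assumptions for illustration; a real coordinator would also log its decision and send it to every node in the second phase.

```python
def two_phase_commit(participants):
    """Decide commit or abort from the votes of all participating nodes.

    participants maps a node name to a callable that returns True when
    the node is able to commit its part of the transaction.
    """
    votes = {}
    # Phase 1: ask every node whether it can commit its part.
    for name, prepare in participants.items():
        try:
            votes[name] = bool(prepare())
        except Exception:
            votes[name] = False      # a failed node counts as a "no" vote
    # Phase 2: commit only if every node voted yes, otherwise abort.
    decision = "commit" if all(votes.values()) else "abort"
    return decision, votes


def node_down():
    raise ConnectionError("node is not reachable")
```

For example, two_phase_commit({"relational_node": lambda: True, "object_node": node_down}) yields the decision "abort", because one node could not vote to commit.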

Chapter 21

Object Database
Standards, Languages,
and Design

Chapter 21 Outline
21.1 Overview of the Object Model of ODMG
21.2 The Object Definition Language ODL
21.3 The Object Query Language OQL
21.4 Overview of C++ Binding
21.5 Object Database Conceptual Model
21.6 Summary

Chapter Objectives
 Discuss the importance of standards (e.g.,
portability, interoperability)
 Introduce Object Data Management Group
(ODMG): object model, object definition language
(ODL), object query language (OQL)
 Present ODMG object binding to programming
languages (e.g., C++)
 Present Object Database Conceptual Design

21.1 The Object Model of ODMG

 Provides a standard model for object
databases
 Supports object definition via ODL
 Supports object querying via OQL
 Supports a variety of data types and type
constructors

ODMG Objects and Literals
 The basic building blocks of the object model
are
– Objects
– Literals
 An object has four characteristics
1. Identifier: unique system-wide identifier
2. Name: unique within a particular database and/or
program; it is optional
3. Lifetime: persistent vs transient
4. Structure: specifies how object is constructed by the
type constructor and whether it is an atomic object

ODMG Literals
 A literal has a current value but not an identifier
 Three types of literals
1. atomic: predefined; basic data type values (e.g.,
short, float, boolean, char)
2. structured: values that are constructed by type
constructors (e.g., date, struct variables)
3. collection: a collection (e.g., array) of values or
objects

ODMG Interface Definition:
An Example
 Note: interface is ODMG’s keyword for class/type

interface Date : Object {
    enum Weekday {sun, mon, tue, wed, thu, fri, sat};
    enum Month {jan, feb, mar, …, dec};
    unsigned short year();
    unsigned short month();
    unsigned short day();

    boolean is_equal(in Date other_date);
};

Built-in Interfaces for
Collection Objects
 A collection object inherits the basic
collection interface, for example:
– cardinality()
– is_empty()
– insert_element()
– remove_element()
– contains_element()
– create_iterator()
Collection Types
 Collection objects are further specialized into
types like a set, list, bag, array, and dictionary
 Each collection type may provide additional
interfaces, for example, a set provides:
– create_union()
– create_difference()
– is_subset_of()
– is_superset_of()
– is_proper_subset_of()
Object Inheritance Hierarchy

Atomic Objects
 Atomic objects are user-defined objects and
are defined via keyword class
 An example:
class Employee (extent all_employees key ssn) {
    attribute string name;
    attribute string ssn;
    attribute short age;
    relationship Dept works_for;
    void reassign(in string new_name);
};

Class Extents
 An ODMG object can have an extent defined
via a class declaration
 Each extent is given a name and will contain all
persistent objects of that class
 For Employee class, for example, the extent is
called all_employees
 This is similar to creating an object of type
Set<Employee> and making it persistent

Class Key
 A class key consists of one or more
unique attributes
 For the Employee class, the key is ssn.
Thus each employee is expected to have a
unique ssn
 Keys can be composite, e.g.,
key (dnumber, dname)
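
As a rough analogy for extents and keys, the Employee class can be mimicked in Python. This is a sketch under stated assumptions: the dictionary used as the extent, the ValueError on a duplicate key, and the helper composite_key are hypothetical, and persistence is not modeled at all.

```python
class Employee:
    """Python stand-in for: class Employee (extent all_employees key ssn)."""

    all_employees = {}   # the extent, indexed by the key attribute ssn

    def __init__(self, name, ssn, age):
        if ssn in Employee.all_employees:
            raise ValueError(f"key violation: ssn {ssn!r} already exists")
        self.name = name
        self.ssn = ssn
        self.age = age
        Employee.all_employees[ssn] = self   # each new object joins the extent


def composite_key(obj, attributes=("dnumber", "dname")):
    """Build the value of a composite key from several attributes."""
    return tuple(getattr(obj, attribute) for attribute in attributes)
```

Creating a second Employee with an ssn that is already in all_employees raises an error, which mirrors the uniqueness that a key declaration is expected to enforce.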

Object Factory
 An object factory is used to generate individual
objects via its operations
 An example:
interface ObjectFactory {
Object new ();
};
 new() returns new objects with an object_id
 One can create their own factory interface by
inheriting the above interface

Interface and Class Definition
 ODMG supports two concepts for
specifying object types:
 Interface
 Class
 There are similarities and differences
between interfaces and classes
 Both have behaviors (operations) and state
(attributes and relationships)
ODMG Interface
 An interface is a specification of the
abstract behavior of an object type
 State properties of an interface (i.e., its
attributes and relationships) cannot be
inherited
 Objects cannot be instantiated from an
interface

ODMG Class
 A class is a specification of abstract behavior
and state of an object type
 A class is instantiable
 Supports “extends” inheritance to allow both
state and behavior inheritance among classes
 Multiple inheritance via “extends” is not
allowed

21.2 Object Definition Language

 ODL supports the semantic constructs of
the ODMG object model
 ODL is independent of any programming
language
 ODL is used to create object specifications
(classes and interfaces)
 ODL is not used for database manipulation

ODL Examples (1)
A Very Simple Class
 A very simple, straightforward class definition
(all examples are based on the university schema presented in Chapter 4
and graphically shown on page 680):

class Degree {
attribute string college;
attribute string degree;
attribute string year;
};

ODL Examples (2)
A Class With Key and Extent
 A class definition with “extent”, “key”, and more
elaborate attributes; still relatively straightforward

class Person (extent persons key ssn) {
    attribute struct Pname {string fname …} name;
    attribute string ssn;
    attribute date birthdate;

    short age();
};

ODL Examples (3)
A Class With Relationships
 Note extends (inheritance) relationship
 Also note “inverse” relationship

class Faculty extends Person (extent faculty) {
    attribute string rank;
    attribute float salary;
    attribute string phone;

    relationship Dept works_in inverse Dept::has_faculty;
    relationship set<GradStu> advises inverse GradStu::advisor;
    void give_raise (in float raise);
    void promote (in string new_rank);
};
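
What an inverse declaration implies can be sketched in Python: updating one side of the relationship keeps the other side consistent. The property-based implementation below is an assumption for illustration, not how an ODMG system is required to implement inverses.

```python
class Dept:
    def __init__(self, dname):
        self.dname = dname
        self.has_faculty = set()     # inverse side of Faculty.works_in


class Faculty:
    def __init__(self, name):
        self.name = name
        self._works_in = None

    @property
    def works_in(self):
        return self._works_in

    @works_in.setter
    def works_in(self, dept):
        # Remove this object from the old department's set, then add it
        # to the new one, so both directions always agree.
        if self._works_in is not None:
            self._works_in.has_faculty.discard(self)
        self._works_in = dept
        if dept is not None:
            dept.has_faculty.add(self)
```

Assigning f.works_in = cs adds f to cs.has_faculty, and reassigning f to another department removes it from cs.has_faculty again.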

Inheritance via “:” – An Example

interface Shape {
    attribute struct point {…} reference_point;
    float perimeter ();
};

class Triangle : Shape (extent triangles) {
    attribute short side_1;
    attribute short side_2;
};

21.3 Object Query Language
 OQL is ODMG’s query language
 OQL works closely with programming
languages such as C++
 Embedded OQL statements return objects that
are compatible with the type system of the
host language
 OQL’s syntax is similar to SQL with
additional features for objects
Simple OQL Queries
 Basic syntax: select…from…where…
    SELECT d.name
    FROM d in departments
    WHERE d.college = 'Engineering';
 An entry point to the database is needed for
each query
 An extent name (e.g., departments in
the above example) may serve as an entry
point
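
For readers more familiar with a programming language than with OQL, the query above reads much like a comprehension over a collection. The Python sketch below is only an analogy; the Department class and the sample data are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Department:
    name: str
    college: str


# The extent "departments" acts as the entry point (sample data is made up).
departments = [
    Department("Computer Science", "Engineering"),
    Department("History", "Arts"),
    Department("Electrical Engineering", "Engineering"),
]

# SELECT d.name FROM d in departments WHERE d.college = 'Engineering';
# The loop variable d plays the role of the iterator variable.
engineering_names = [d.name for d in departments if d.college == "Engineering"]
```

The result is a collection of names, one for each department whose college is Engineering.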
Iterator Variables
 Iterator variables are defined whenever a
collection is referenced in an OQL query
 The variable d in the previous example serves as an
iterator and ranges over each object in the
collection
 Syntactical options for specifying an iterator:
– d in departments
– departments d
– departments as d

Data Type of Query Results
 The data type of a query result can be any
type defined in the ODMG model
 A query does not have to follow the
select…from…where… format
 A persistent name on its own can serve as a
query whose result is a reference to the
persistent object, e.g., departments; whose
type is set<Department>
Path Expressions
 A path expression is used to specify a path
to attributes and objects in an entry point
 A path expression starts at a persistent
object name (or its iterator variable)
 The name will be followed by zero or more
dot connected relationship or attribute
names, e.g., departments.chair;

Views as Named Objects
 The define keyword in OQL is used to
specify an identifier for a named query
 The name should be unique; if it is not, the
new definition replaces the existing named query
 Once a query definition is created, it will
persist until deleted or redefined
 A view definition can include parameters

An Example of OQL View
 A view that returns the students who have a
minor in a given department:

define has_minor(dept_name) as
    select s
    from s in students
    where s.minor_in.dname = dept_name
 has_minor can now be used in queries
Single Elements from Collections

 An OQL query returns a collection
 OQL’s element operator can be used to
return a single element from a singleton
collection (one that contains exactly one element):
    element (select d from d in departments
             where d.dname = 'Software Engineering');
 If the collection is empty or has more than one
element, an exception is raised
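
The behavior of the element operator can be imitated by a small Python helper. The function below is an illustrative stand-in, and the choice of ValueError as the exception type is an assumption.

```python
def element(collection):
    """Return the only item of a singleton collection, else raise an error."""
    items = list(collection)
    if len(items) != 1:
        raise ValueError(
            f"element() needs exactly one item, got {len(items)}")
    return items[0]


# Made-up department names standing in for the extent "departments".
dnames = ["Software Engineering", "Physics", "History"]
selected = element(n for n in dnames if n == "Software Engineering")
```

Calling element on an empty selection, or on one that matches several items, raises the error instead of returning a value.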
Collection Operators
 OQL supports a number of aggregate
operators that can be applied to query results
 The aggregate operators include min,
max, count, sum, and avg and operate
over a collection
 count returns an integer; the others return the
same type as the elements of the collection

An Example of an OQL
Aggregate Operator
 To compute the average GPA of all seniors
majoring in Business:

avg (select s.gpa from s in students
     where s.class = 'senior' and
           s.majors_in.dname = 'Business');

Membership and Quantification

 OQL provides membership and
quantification operators:
– (e in c) is true if e is in the collection c
– (for all e in c: b) is true if all
elements e of collection c satisfy b
– (exists e in c: b) is true if at least one
e in collection c satisfies b
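
These three operators have close counterparts in Python, which may help in reading them. The snippet is an analogy only; the list of GPA values is made up.

```python
gpas = [3.2, 3.8, 2.9]                        # made-up collection c

is_member = 3.8 in gpas                       # (e in c)
all_above_2 = all(g > 2.0 for g in gpas)      # (for all e in c: b)
any_above_3_5 = any(g > 3.5 for g in gpas)    # (exists e in c: b)
all_above_3 = all(g > 3.0 for g in gpas)      # fails because of 2.9
```

The universal test fails as soon as a single element violates the condition, while the existential test succeeds as soon as a single element satisfies it.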

An Example of Membership
 To retrieve the names of all students who
completed CS101:

select s.name.fname, s.name.lname
from s in students
where 'CS101' in
    (select c.name from c in
     s.completed_sections.section.of_course);

Ordered Collections
 Collections that are lists or arrays allow
retrieving their first, last, and ith
elements
 OQL provides additional operators for
extracting a sub-collection and concatenating
two lists
 OQL also provides operators for ordering the
results
An Example of Ordered Operation

 To retrieve the last name of the faculty
member who earns the highest salary:

first (select struct
          (faculty: f.name.lastname, salary: f.salary)
       from f in faculty
       order by f.salary desc);

Grouping Operator
 OQL also supports a grouping operator
called group by
 To retrieve average GPA of majors in each
department having >100 majors:
select deptname, avg_gpa:
avg (select p.s.gpa from p in partition)
from s in students
group by deptname: s.majors_in.dname
having count (partition) > 100
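
The effect of group by, partition, and having can be traced with a Python sketch. The function name, the dictionary-based student records, and the sample data are assumptions for illustration; in the OQL query each partition holds the students of one department.

```python
from collections import defaultdict
from statistics import mean


def avg_gpa_by_dept(students, min_majors):
    """Average GPA per department, for departments with more than min_majors."""
    # group by deptname: one partition of students per department name.
    partitions = defaultdict(list)
    for s in students:
        partitions[s["dept"]].append(s)
    # having count(partition) > min_majors, then avg over each partition.
    return {
        dept: mean([p["gpa"] for p in partition])
        for dept, partition in partitions.items()
        if len(partition) > min_majors
    }


SAMPLE_STUDENTS = [
    {"name": "A", "dept": "CS", "gpa": 3.0},
    {"name": "B", "dept": "CS", "gpa": 4.0},
    {"name": "C", "dept": "Math", "gpa": 2.0},
]
```

With a threshold of 1 instead of 100, only CS qualifies in this sample, and its average GPA is 3.5.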

21.4 C++ Language Binding
 The C++ language binding specifies how ODL
constructs are mapped to C++ statements
and includes:
– a C++ class library
– a Data Manipulation Language (ODL/OML)
– a set of constructs called physical pragmas (to
allow programmers some control over the
physical storage concerns)

Class Library
 The class library added to C++ for the ODMG
standard uses the prefix d_ for class
declarations
 d_Ref<T> is defined for each database
class T
 To utilize ODMG’s collection types, various
templates are defined, e.g., d_Object<T>
specifies the operations to be inherited by all
objects
Template Classes
 A template class is provided for each type of
ODMG collections:
– d_Set<T>
– d_List<T>
– d_Bag<T>
– d_Varray<T>
– d_Dictionary<T>
 Thus a programmer can declare:
d_Set<d_Ref<Student>>
Data Types of Attributes
 The data types of ODMG database
attributes are also available to C++
programmers via the d_ prefix, e.g.,
d_Short, d_Long, d_Float
 Certain structured literals are also available,
e.g., d_Date, d_Time, d_Interval

Specifying Relationships
 To specify relationships, the keyword Rel_ is
used within the prefix of type names, e.g.,
d_Rel_Ref<Dept, has_majors> majors_in;
 The C++ binding also allows the creation of
extents using the library class d_Extent:
d_Extent<Person> All_Persons(CS101)

21.5 Object Database
Conceptual Design
 Object Database (ODB) vs Relational
Database (RDB)
– Relationships are handled differently
– Inheritance is handled differently
– Operations in ODB are expressed early on since
they are a part of the class specification

Relationships: ODB vs RDB (1)

 Relationships in ODB:
– relationships are handled by reference attributes
that include OIDs of related objects
– single and collection of references are allowed
– references for binary relationships can be
expressed in single direction or both directions
via inverse operator

Relationships: ODB vs RDB (2)

 Relationships in RDB:
– Relationships among tuples are specified by
attributes with matching values (via foreign
keys)
– Foreign keys are single-valued
– M:N relationships must be represented via a
separate relation (table)

Inheritance Relationship
in ODB vs RDB
 Inheritance structures are built in ODB (and
achieved via “:” and extends operators)
 RDB has no built-in support for inheritance
relationships; there are several options for
mapping inheritance relationships in an
RDB (see Chapter 7)

Early Specification of Operations

 Another major difference between ODB
and RDB is the specification of operations
– ODB: operations specified during design (as
part of class specification)
– RDB: may be delayed until implementation

Mapping EER Schemas
to ODB Schemas
 Mapping EER schemas into ODB schemas
is relatively simple especially since ODB
schemas provide support for inheritance
relationships
 Once mapping has been completed,
operations must be added to ODB schemas
since EER schemas do not include a
specification of operations
Mapping EER to ODB Schemas
Step 1
 Create an ODL class for each EER entity
type or subclass
– Multi-valued attributes are declared using set,
bag, or list constructors
– Composite attributes are mapped into tuple
constructors

Mapping EER to ODB Schemas
Step 2
 Add relationship properties or reference
attributes for each binary relationship into
the ODL classes participating in the
relationship
– Relationship cardinality: single-valued for 1:1
and N:1 directions; set-valued for 1:N and M:N
directions
– Relationship attributes: create via tuple
constructors

Mapping EER to ODB Schemas
Step 3
 Add appropriate operations for each class
– Operations are not available from the EER
schemas; original requirements must be
reviewed
– Corresponding constructor and destructor
operations must also be added

Mapping EER to ODB Schemas
Step 4
 Specify inheritance relationships via
extends clause
– An ODL class that corresponds to a sub-class in
the EER schema inherits the types and methods
of its super-class in the ODL schemas
– Other attributes of a sub-class are added by
following Steps 1-3

Mapping EER to ODB Schemas
Step 5
 Map weak entity types in the same way as
regular entities
– Weak entities that do not participate in any
relationships except their identifying relationship
may alternatively be represented as a composite
multi-valued attribute of the owner entity type

Mapping EER to ODB Schemas
Step 6
 Map categories (union types) to ODL
– The process is not straightforward
– May follow the same mapping used for EER-
to-relational mapping:
Declare a class to represent the category
Define 1:1 relationships between the category and
each of its super-classes

Mapping EER to ODB Schemas
Step 7
 Map n-ary relationships whose degree is
greater than 2
– Each relationship is mapped into a separate
class with appropriate reference to each
participating class

21.6 Summary
 Proposed standards for object databases presented
 Various constructs and built-in types of the
ODMG model presented
 ODL and OQL languages were presented
 An overview of the C++ language binding was
given
 Conceptual design of object-oriented database
discussed

Chapter 22
Object-Relational and
Extended-Relational
Systems

Overview of SQL and Its
Object-Relational Features
 The SQL Standard and Its Components
 Object-Relational Support in SQL-99
 Some New Operations and Features in SQL

Evolution and Current Trends
of Database Technology

The Informix Universal Server
 Extensible Data Types
 Support for User-Defined Routines
 Support for Inheritance
 Support for Indexing Extensions
 Support for External Data Sources
 Support for Data Blades Application
Programming Interface

Object-Relational Features of
Oracle 8
 Some Examples of Object-Relational
Features of Oracle
 Managing Large Objects and Other Storage
Features

Implementation and Related
Issues for Extended Type
Systems

The Nested Relational Model

Summary

Chapter 23
Database Security
and
Authorization

Chapter Outline
1 Database Security and Authorization
1.1 Introduction to Database Security Issues
1.2 Types of Security
1.3 Database Security and DBA
1.4 Access Protection, User Accounts, and Database Audits

2 Discretionary Access Control Based on Granting and Revoking Privileges


2.1 Types of Discretionary Privileges
2.2 Specifying Privileges Using Views
2.3 Revoking Privileges
2.4 Propagation of Privileges Using the GRANT OPTION
2.5 Specifying Limits on Propagation of Privileges

Chapter Outline(contd.)
3 Mandatory Access Control and Role-Based Access
Control for Multilevel Security
3.1 Comparing Discretionary Access Control
and Mandatory Access Control
3.2 Role-Based Access Control
3.3 Access Control Policies for E-Commerce and
the Web

4 Introduction to Statistical Database Security

Chapter Outline(contd.)
5 Introduction to Flow Control
5.1 Covert Channels
6 Encryption and Public Key Infrastructures
6.1 The Data and Advanced Encryption Standards
6.2 Public Key Encryption
6.3 Digital Signatures

1 Introduction to Database Security
Issues

 Types of Security
– Legal and ethical issues
– Policy issues
– System-related issues
– The need to identify multiple security levels

Introduction to Database Security
Issues (2)

Threats to databases
- Loss of integrity
- Loss of availability
- Loss of confidentiality

To protect databases against these types of threats,
four kinds of countermeasures can be
implemented: access control, inference control,
flow control, and encryption.

Introduction to Database
Security Issues (3)

A DBMS typically includes a database security and
authorization subsystem that is responsible for
ensuring the security of portions of a database
against unauthorized access.

Two types of database security mechanisms:
 Discretionary security mechanisms
 Mandatory security mechanisms

Introduction to Database
Security Issues (4)

The security mechanism of a DBMS must
include provisions for restricting access to
the database as a whole; this function is
called access control and is handled by
creating user accounts and passwords to
control the login process by the DBMS.

Introduction to Database
Security Issues (5)
A second security problem associated with databases is
that of controlling access to a statistical
database, which is used to provide statistical
information or summaries of values based on
various criteria.

The countermeasures to the statistical database
security problem are called inference control
measures.

Introduction to Database
Security Issues (6)
Another security issue is that of flow control, which
prevents information from flowing in such a way
that it reaches unauthorized users.

Channels that are pathways for information to flow
implicitly in ways that violate the security policy
of an organization are called covert channels.

Introduction to Database
Security Issues (7)
A final security issue is data encryption, which is
used to protect sensitive data (such as credit card
numbers) that is being transmitted via some type
of communication network.
The data is encoded using some coding algorithm.
An unauthorized user who accesses encoded data
will have difficulty deciphering it, but authorized
users are given decoding or decrypting algorithms
(or keys) to decipher the data.

1.2 Database Security and the DBA

The database administrator (DBA) is the central
authority for managing a database system. The
DBA’s responsibilities include granting privileges
to users who need to use the system and
classifying users and data in accordance with the
policy of the organization. The DBA has a DBA
account in the DBMS, sometimes called a system
or superuser account, which provides powerful
capabilities:

1.2 Database Security and the DBA

1. Account creation
2. Privilege granting
3. Privilege revocation
4. Security level assignment

The DBA is responsible for the overall security of
the database system.
Action 1 is access control, whereas 2 and 3 are
discretionary and 4 is used to control mandatory
authorization.

1.3 Access Protection, User Accounts,
and Database Audits
Whenever a person or group of persons needs to access a
database system, the individual or group must first apply
for a user account. The DBA will then create a new
account number and password for the user if there is a
legitimate need to access the database.

The user must log in to the DBMS by entering the account
number and password whenever database access is needed.

1.3 Access Protection, User Accounts,
and Database Audits(2)
The database system must also keep track of all operations on the database
that are applied by a certain user throughout each login session.

To keep a record of all updates applied to the database and of the particular
user who applied each update, we can modify the system log, which
includes an entry for each operation applied to the database that may be
required for recovery from a transaction failure or system crash.
If any tampering with the database is suspected, a database audit is
performed, which consists of reviewing the log to examine all accesses
and operations applied to the database during a certain time period.
A database log that is used mainly for security purposes is sometimes
called an audit trail.
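
A minimal Python sketch of an audit trail and a time-bounded audit might look as follows. The entry layout (timestamp, account, operation) and the names record and audit are assumptions made for illustration; the slides do not prescribe a log format.

```python
from datetime import datetime

# Hypothetical audit trail: one (timestamp, account, operation) entry per operation.
audit_trail = []

def record(account, operation, when):
    """Append one log entry for an operation applied by an account."""
    audit_trail.append((when, account, operation))

def audit(start, end):
    """Return every logged entry whose timestamp lies in [start, end]."""
    return [entry for entry in audit_trail if start <= entry[0] <= end]

# Made-up activity for three accounts.
record("A2", "INSERT INTO EMPLOYEE", datetime(2004, 1, 5, 9, 0))
record("A4", "UPDATE EMPLOYEE SET SALARY", datetime(2004, 1, 6, 14, 30))
record("A3", "SELECT FROM EMPLOYEE", datetime(2004, 1, 9, 11, 15))

# Review only the period in which tampering is suspected.
suspect_window = audit(datetime(2004, 1, 6), datetime(2004, 1, 7))
```

A real DBMS would write such entries itself for every operation; the point here is only that an audit is a filtered review of the log over a chosen time period.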

Discretionary Access Control
Based on Granting and
Revoking Privileges
The typical method of enforcing discretionary access
control in a database system is based on the granting and
revoking of privileges.

2.1 Types of Discretionary Privileges

 The account level: At this level, the DBA specifies the
particular privileges that each account holds independently
of the relations in the database.
 The relation (or table) level: At this level, the DBA can
control the privilege to access each individual relation or
view in the database.

2.1 Types of Discretionary Privileges (2)

The privileges at the account level apply to the capabilities
provided to the account itself and can include the
CREATE SCHEMA or CREATE TABLE privilege, to
create a schema or base relation; the CREATE VIEW
privilege; the ALTER privilege, to apply schema changes
such as adding or removing attributes from relations; the
DROP privilege, to delete relations or views; the MODIFY
privilege, to insert, delete, or update tuples; and the
SELECT privilege, to retrieve information from the
database by using a SELECT query.

2.1 Types of Discretionary Privileges (3)

The second level of privileges applies to the relation level,
whether they are base relations or virtual (view) relations.

The granting and revoking of privileges generally follow an
authorization model for discretionary privileges known as
the access matrix model, where the rows of a matrix M
represent subjects (users, accounts, programs) and the
columns represent objects (relations, records, columns,
views, operations). Each position M(i,j) in the matrix
represents the types of privileges (read, write, update) that
subject i holds on object j.
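
The access matrix can be sketched in Python as a sparse mapping from (subject, object) pairs to sets of privileges. The subjects, objects, and function names below are hypothetical and chosen only to illustrate M(i,j); they do not come from the slides.

```python
# Sparse access matrix: (subject, object) -> set of privileges held.
# All subjects and objects here are made up for illustration.
access_matrix = {
    ("alice", "EMPLOYEE"): {"read", "update"},
    ("alice", "DEPARTMENT"): {"read"},
    ("bob", "EMPLOYEE"): {"read"},
}

def holds(subject, obj, privilege):
    """Return True if position M(subject, obj) contains the privilege."""
    return privilege in access_matrix.get((subject, obj), set())

def grant(subject, obj, privilege):
    """Add a privilege to position M(subject, obj)."""
    access_matrix.setdefault((subject, obj), set()).add(privilege)

def revoke(subject, obj, privilege):
    """Remove a privilege from position M(subject, obj), if it is present."""
    access_matrix.get((subject, obj), set()).discard(privilege)
```

An empty position in the matrix simply means the subject holds no privileges on that object, which is why missing keys are treated as empty sets.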

2.1 Types of Discretionary Privileges (4)

To control the granting and revoking of relation privileges,
each relation R in a database is assigned an owner
account, which is typically the account that was used
when the relation was created in the first place. The owner
of a relation is given all privileges on that relation. In
SQL2, the DBA can assign an owner to a whole schema
by creating the schema and associating the appropriate
authorization identifier with that schema, using the
CREATE SCHEMA command. The owner account holder
can pass privileges on any of the owned relations to other
users by granting privileges to their accounts.

2.1 Types of Discretionary Privileges (5)

In SQL the following types of privileges can be granted on
each individual relation R:
 SELECT (retrieval or read) privilege on R: Gives the
account retrieval privilege. In SQL this gives the account
the privilege to use the SELECT statement to retrieve
tuples from R.
 MODIFY privileges on R: This gives the account the
capability to modify tuples of R. In SQL this privilege is
further divided into UPDATE, DELETE, and INSERT
privileges to apply the corresponding SQL command to R.
In addition, both the INSERT and UPDATE privileges can
specify that only certain attributes can be updated by the
account.

2.1 Types of Discretionary Privileges (6)

 REFERENCES privilege on R: This gives the account the
capability to reference relation R when specifying integrity
constraints. The privilege can also be restricted to specific
attributes of R.

Notice that to create a view, the account must have SELECT
privilege on all relations involved in the view definition.

2.2 Specifying Privileges Using Views

The mechanism of views is an important discretionary
authorization mechanism in its own right.

For example, if the owner A of a relation R wants another
account B to be able to retrieve only some fields of R, then A
can create a view V of R that includes only those attributes
and then grant SELECT on V to B. The same applies to
limiting B to retrieving only certain tuples of R; a view V’
can be created by defining the view by means of a query that
selects only those tuples from R that A wants to allow B to
access.

2.3 Revoking Privileges

In some cases it is desirable to grant a privilege to a user
temporarily.

For example, the owner of a relation may want to grant the
SELECT privilege to a user for a specific task and then
revoke that privilege once the task is completed. Hence, a
mechanism for revoking privileges is needed. In SQL, a
REVOKE command is included for the purpose of
canceling privileges.

2.4 Propagation of Privileges using the
GRANT OPTION
Whenever the owner A of a relation R grants a privilege on R
to another account B, the privilege can be given to B with or
without the GRANT OPTION. If the GRANT OPTION is
given, this means that B can also grant that privilege on R
to other accounts. Suppose that B is given the GRANT
OPTION by A and that B then grants the privilege on R to
a third account C, also with GRANT OPTION. In this
way, privileges on R can propagate to other accounts
without the knowledge of the owner of R. If the owner
account A now revokes the privilege granted to B, all the
privileges that B propagated based on that privilege should
automatically be revoked by the system.
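
The propagation and cascading revocation described above can be sketched in Python. This is a toy model for a single privilege on a single relation, not how any particular DBMS implements it; the class GrantGraph and its method names are assumptions made for illustration. A grant survives a revocation only if its grantor can still trace a chain of GRANT OPTION grants back to the owner.

```python
class GrantGraph:
    """Toy model of one privilege on one relation R (all names hypothetical)."""

    def __init__(self, owner):
        self.owner = owner
        self.grants = set()  # entries are (grantor, grantee, with_grant_option)

    def _grantors(self):
        """Accounts that can trace a GRANT OPTION chain back to the owner."""
        reachable = {self.owner}
        frontier = [self.owner]
        while frontier:
            current = frontier.pop()
            for grantor, grantee, with_option in self.grants:
                if grantor == current and with_option and grantee not in reachable:
                    reachable.add(grantee)
                    frontier.append(grantee)
        return reachable

    def has_privilege(self, account):
        return account == self.owner or any(g[1] == account for g in self.grants)

    def grant(self, grantor, grantee, with_grant_option=False):
        if grantor not in self._grantors():
            raise PermissionError(f"{grantor} does not hold the GRANT OPTION")
        self.grants.add((grantor, grantee, with_grant_option))

    def revoke(self, grantor, grantee):
        """Remove the grant, then cascade to grants that lost their support."""
        self.grants = {g for g in self.grants if (g[0], g[1]) != (grantor, grantee)}
        valid = self._grantors()
        self.grants = {g for g in self.grants if g[0] in valid}
```

With owner A, granting to B and then from B to C (both with the GRANT OPTION) gives C the privilege; calling revoke("A", "B") afterwards removes it from both B and C.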

2.5 An Example

Suppose that the DBA creates four accounts --A1, A2, A3, and A4-- and
wants only A1 to be able to create base relations; then the DBA must
issue the following GRANT command in SQL:

GRANT CREATETAB TO A1;

In SQL2 the same effect can be accomplished by having the DBA issue a
CREATE SCHEMA command as follows:

CREATE SCHEMA EXAMPLE AUTHORIZATION A1;

2.5 An Example(2)

User account A1 can create tables under the schema called EXAMPLE.

Suppose that A1 creates the two base relations EMPLOYEE and
DEPARTMENT; A1 is then the owner of these two relations and hence
has all the relation privileges on each of them.

Suppose that A1 wants to grant A2 the privilege to insert and delete tuples
in both of these relations, but A1 does not want A2 to be able to
propagate these privileges to additional accounts:

GRANT INSERT, DELETE ON EMPLOYEE, DEPARTMENT TO A2;

2.5 An Example(3)

EMPLOYEE(NAME, SSN, BDATE, ADDRESS, SEX, SALARY, DNO)

DEPARTMENT(DNUMBER, DNAME, MGRSSN)

2.5 An Example(4)

Suppose that A1 wants to allow A3 to retrieve information from either of
the two tables and also to be able to propagate the SELECT privilege
to other accounts.
A1 can issue the command:

GRANT SELECT ON EMPLOYEE, DEPARTMENT TO A3 WITH GRANT OPTION;

A3 can grant the SELECT privilege on the EMPLOYEE relation to A4 by
issuing:
GRANT SELECT ON EMPLOYEE TO A4;
(Notice that A4 cannot propagate the SELECT privilege because
GRANT OPTION was not given to A4.)

2.5 An Example(5)

Suppose that A1 decides to revoke the SELECT privilege on
the EMPLOYEE relation from A3; A1 can issue:
REVOKE SELECT ON EMPLOYEE FROM A3;
(The DBMS must now automatically revoke the SELECT
privilege on EMPLOYEE from A4, too, because A3
granted that privilege to A4 and A3 does not have the
privilege any more.)

2.5 An Example(6)

Suppose that A1 wants to give back to A3 a limited capability to SELECT from
the EMPLOYEE relation and wants to allow A3 to be able to propagate the
privilege. The limitation is to retrieve only the NAME, BDATE, and
ADDRESS attributes and only for the tuples with DNO=5.
A1 can then create the view:
CREATE VIEW A3EMPLOYEE AS
SELECT NAME, BDATE, ADDRESS
FROM EMPLOYEE
WHERE DNO = 5;
After the view is created, A1 can grant SELECT on the view
A3EMPLOYEE to A3 as follows:
GRANT SELECT ON A3EMPLOYEE TO A3 WITH GRANT OPTION;
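
To see what A3 would actually be able to retrieve, the view can be tried out with Python's built-in sqlite3 module. This is only a sketch: the three rows are made up, and SQLite has no GRANT command, so only the effect of the view definition is shown, not the privilege itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (NAME TEXT, SSN TEXT, BDATE TEXT, "
    "ADDRESS TEXT, SEX TEXT, SALARY REAL, DNO INTEGER)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Ann", "111", "1970-01-01", "1 Elm St", "F", 50000, 5),
        ("Bob", "222", "1975-05-05", "2 Oak St", "M", 40000, 4),
        ("Cy", "333", "1980-09-09", "3 Fir St", "M", 45000, 5),
    ],
)
# The view from the slide: three attributes, department 5 only.
conn.execute(
    "CREATE VIEW A3EMPLOYEE AS "
    "SELECT NAME, BDATE, ADDRESS FROM EMPLOYEE WHERE DNO = 5"
)
cursor = conn.execute("SELECT * FROM A3EMPLOYEE ORDER BY NAME")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

The view exposes neither SSN nor SALARY, and the employee in department 4 does not appear at all, which is the combination of attribute and tuple restriction the example asks for.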

2.5 An Example(7)

Finally, suppose that A1 wants to allow A4 to update only the
SALARY attribute of EMPLOYEE;
A1 can issue:

GRANT UPDATE ON EMPLOYEE (SALARY) TO A4;

(The UPDATE or INSERT privilege can specify particular
attributes that may be updated or inserted in a relation.
Other privileges (SELECT, DELETE) are not attribute
specific.)

2.6 Specifying Limits on Propagation
of Privileges
Techniques to limit the propagation of privileges have been
developed, although they have not yet been implemented
in most DBMSs and are not a part of SQL.

Limiting horizontal propagation to an integer number i
means that an account B given the GRANT OPTION can
grant the privilege to at most i other accounts.

Vertical propagation is more complicated; it limits the depth
of the granting of privileges.
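
One way to read these two limits is sketched below in Python. It is an illustrative model only, since the slides note that such limits are not part of SQL; the class LimitedGrants and its parameters are hypothetical. The sketch assumes that a vertical limit of 0 means no GRANT OPTION and that each further grant must carry a strictly smaller vertical limit, which is one interpretation of "depth".

```python
class LimitedGrants:
    """Toy model of one privilege with horizontal and vertical limits."""

    def __init__(self, owner):
        self.owner = owner
        self.limits = {}      # account -> (horizontal_limit, vertical_limit)
        self.granted_to = {}  # account -> set of accounts it has granted to

    def grant(self, grantor, grantee, horizontal=0, vertical=0):
        if grantor != self.owner:
            h, v = self.limits.get(grantor, (0, 0))
            already = self.granted_to.get(grantor, set())
            if v <= 0:
                raise PermissionError("no GRANT OPTION (vertical limit is 0)")
            if len(already | {grantee}) > h:
                raise PermissionError("horizontal limit exceeded")
            if vertical >= v:
                raise PermissionError("vertical limit must shrink at each step")
        self.limits[grantee] = (horizontal, vertical)
        self.granted_to.setdefault(grantor, set()).add(grantee)
```

For example, if owner A grants to B with horizontal=1 and vertical=1, then B may pass the privilege to exactly one account, and that account cannot pass it on.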

3 Mandatory Access Control and Role-Based
Access Control for Multilevel Security

The discretionary access control technique of granting and
revoking privileges on relations has traditionally been the
main security mechanism for relational database systems.
This is an all-or-nothing method: A user either has or does
not have a certain privilege.
In many applications, an additional security policy is needed
that classifies data and users based on security classes.
This approach, known as mandatory access control, would
typically be combined with the discretionary access control
mechanisms.

3 Mandatory Access Control and Role-Based
Access Control for Multilevel Security(2)

Typical security classes are top secret (TS), secret (S),
confidential (C), and unclassified (U), where TS is the
highest level and U the lowest: TS ≥ S ≥ C ≥ U

The commonly used model for multilevel security, known as
the Bell-LaPadula model, classifies each subject (user,
account, program) and object (relation, tuple, column,
view, operation) into one of the security classifications TS,
S, C, or U. We refer to the clearance (classification) of a
subject S as class(S) and to the classification of an object
O as class(O).

3 Mandatory Access Control and Role-Based
Access Control for Multilevel Security(3)

Two restrictions are enforced on data access based on the
subject/object classifications:
1. A subject S is not allowed read access to an object O
unless class(S) ≥ class(O). This is known as the simple
security property.
2. A subject S is not allowed to write an object O unless
class(S) ≤ class(O). This is known as the star property (or
* property).
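
The two restrictions can be written as a pair of checks in Python. The numeric ranks assigned to the classes and the function names are assumptions for illustration; only the ordering TS ≥ S ≥ C ≥ U and the two rules come from the slides.

```python
# Numeric ranks are an assumption; only their relative order matters.
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def can_read(subject_class, object_class):
    """Simple security property: read only if class(S) >= class(O)."""
    return LEVELS[subject_class] >= LEVELS[object_class]

def can_write(subject_class, object_class):
    """Star property: write only if class(S) <= class(O)."""
    return LEVELS[subject_class] <= LEVELS[object_class]
```

Together the rules mean that a subject cleared at S can read C-level data but cannot write it, so information the subject has seen cannot be copied down to a lower classification.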

3 Mandatory Access Control and Role-Based
Access Control for Multilevel Security(4)

To incorporate multilevel security notions into the relational
database model, it is common to consider attribute
values and tuples as data objects. Hence, each attribute
A is associated with a classification attribute C in the
schema, and each attribute value in a tuple is associated
with a corresponding security classification. In addition,
in some models, a tuple classification attribute TC is
added to the relation attributes to provide a classification
for each tuple as a whole. Hence, a multilevel relation
schema R with n attributes would be represented as
R(A1,C1,A2,C2, …, An,Cn,TC)
where each Ci represents the classification attribute
associated with attribute Ai.

3 Mandatory Access Control and Role-Based
Access Control for Multilevel Security(5)

The value of the TC attribute in each tuple t – which is the
highest of all attribute classification values within t –
provides a general classification for the tuple itself,
whereas each Ci provides a finer security classification
for each attribute value within the tuple.

The apparent key of a multilevel relation is the set of
attributes that would have formed the primary key in a
regular (single-level) relation.

3 Mandatory Access Control and Role-Based
Access Control for Multilevel Security(6)

A multilevel relation will appear to contain different data to subjects
(users) with different clearance levels. In some cases, it is
possible to store a single tuple in the relation at a higher
classification level and produce the corresponding tuples at a
lower-level classification through a process known as filtering.
In other cases, it is necessary to store two or more tuples at different
classification levels with the same value for the apparent key.
This leads to the concept of polyinstantiation where several
tuples can have the same apparent key value but have different
attribute values for users at different classification levels.
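
Filtering and the tuple classification TC can be sketched in Python. The tuple layout (attribute mapped to a value and its classification), the sample employee, and the function names are hypothetical; hidden values are shown as None, which is one possible way to present filtered attributes.

```python
# Numeric ranks are an assumption; only their relative order matters.
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def tuple_classification(row):
    """TC is the highest classification among the attribute values."""
    return max((cls for _, cls in row.values()), key=LEVELS.get)

def filter_row(row, key_attr, clearance):
    """Return the version of `row` seen by a subject with this clearance."""
    if LEVELS[row[key_attr][1]] > LEVELS[clearance]:
        return None  # the apparent key itself is not visible
    return {
        attr: (value if LEVELS[cls] <= LEVELS[clearance] else None)
        for attr, (value, cls) in row.items()
    }

# Hypothetical stored tuple: attribute -> (value, classification).
smith = {
    "Name": ("Smith", "U"),
    "Salary": (40000, "C"),
    "JobPerformance": ("Fair", "S"),
}
```

Here a single stored tuple yields three different appearances, one each for U, C, and S clearance. Polyinstantiation would arise if a lower-level user were then allowed to store a different value for an attribute that appears empty at that level.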

3 Mandatory Access Control and Role-Based
Access Control for Multilevel Security(7)

In general, the entity integrity rule for multilevel relations
states that all attributes that are members of the apparent
key must not be null and must have the same security
classification within each individual tuple.
In addition, all other attribute values in the tuple must have a
security classification greater than or equal to that of the
apparent key. This constraint ensures that a user can see
the key if the user is permitted to see any part of the
tuple at all.

3 Mandatory Access Control and Role-Based
Access Control for Multilevel Security(8)

Other integrity rules, called null integrity and
interinstance integrity, informally ensure
that if a tuple value at some security level
can be filtered (derived) from a
higher-classified tuple, then it is sufficient to store
the higher-classified tuple in the multilevel
relation.

3.1 Comparing Discretionary Access Control
and Mandatory Access Control

 Discretionary Access Control (DAC) policies are
characterized by a high degree of flexibility, which
makes them suitable for a large variety of application
domains.
 The main drawback of DAC models is their
vulnerability to malicious attacks, such as Trojan horses
embedded in application programs.

3.1 Comparing Discretionary Access Control
and Mandatory Access Control(2)

 By contrast, mandatory policies ensure a high degree of
protection; in a way, they prevent any illegal flow of
information.
 Mandatory policies have the drawback of being too rigid
and they are only applicable in limited environments.
 In many practical situations, discretionary policies are
preferred because they offer a better trade-off between
security and applicability.

3.2 Role-Based Access Control

Role-based access control (RBAC) emerged rapidly in the
1990s as a proven technology for managing and
enforcing security in large-scale enterprisewide systems.
Its basic notion is that permissions are associated with
roles, and users are assigned to appropriate roles. Roles
can be created using the CREATE ROLE and
DESTROY ROLE commands. The GRANT and
REVOKE commands discussed under DAC can then be
used to assign and revoke privileges from roles.
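
The basic notion, permissions attached to roles and users attached to roles, can be sketched in Python as two mappings. The role name, user name, and helper functions are hypothetical and only mirror the idea; they are not the SQL commands named above.

```python
role_permissions = {}  # role -> set of (privilege, object) pairs
user_roles = {}        # user -> set of role names

def create_role(role):
    role_permissions.setdefault(role, set())

def grant_to_role(role, privilege, obj):
    role_permissions[role].add((privilege, obj))

def revoke_from_role(role, privilege, obj):
    role_permissions[role].discard((privilege, obj))

def assign_role(user, role):
    user_roles.setdefault(user, set()).add(role)

def is_permitted(user, privilege, obj):
    """A user is permitted only through the roles assigned to that user."""
    return any(
        (privilege, obj) in role_permissions.get(role, set())
        for role in user_roles.get(user, set())
    )

# Hypothetical role and user.
create_role("payroll_clerk")
grant_to_role("payroll_clerk", "UPDATE", "EMPLOYEE")
assign_role("dana", "payroll_clerk")
```

Because privileges are never attached to users directly, revoking a privilege from a role changes what every holder of that role may do in one step.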

3.2 Role-Based Access Control(2)

 RBAC appears to be a viable alternative to traditional
discretionary and mandatory access controls; it ensures
that only authorized users are given access to certain data
or resources.
 Many DBMSs have allowed the concept of roles, where
privileges can be assigned to roles.
 Role hierarchy in RBAC is a natural way of organizing
roles to reflect the organization’s lines of authority and
responsibility.

3.2 Role-Based Access Control(3)

 Another important consideration in RBAC systems is the
possible temporal constraints that may exist on roles, such
as time and duration of role activations, and timed
triggering of a role by an activation of another role.
 Using an RBAC model is a highly desirable goal for
addressing the key security requirements of Web-based
applications.
In contrast, discretionary access control (DAC) and
mandatory access control (MAC) models lack capabilities
needed to support the security requirements of emerging
enterprises and Web-based applications.

3.3 Access Control Policies for
E-Commerce and the Web
 E-Commerce environments require elaborate policies that
go beyond traditional DBMSs.
– In an e-commerce environment the resources to be
protected are not only traditional data but also
knowledge and experience.
– The access control mechanism should be flexible
enough to support a wide spectrum of heterogeneous
protection objects.
 A related requirement is the support for content-based
access-control.

3.3 Access Control Policies for
E-Commerce and the Web(2)
 Another requirement is related to the heterogeneity of
subjects, which requires access control policies based on
user characteristics and qualifications.
– A possible solution, to better take into account user profiles in the
formulation of access control policies, is to support the notion of
credentials. A credential is a set of properties concerning a user
that are relevant for security purposes (for example, age, position
within an organization).
– It is believed that the XML language can play a key role in access
control for e-commerce applications.

4 Introduction to Statistical
Database Security
 Statistical databases are used mainly to produce statistics
on various populations.
 The database may contain confidential data on individuals,
which should be protected from user access.
 Users are permitted to retrieve statistical information on
the populations, such as averages, sums, counts,
maximums, minimums, and standard deviations.

4 Introduction to Statistical
Database Security(2)
A population is a set of tuples of a relation (table) that satisfy
some selection condition.
 Statistical queries involve applying statistical functions to
a population of tuples.

4 Introduction to Statistical
Database Security(3)
For example, we may want to retrieve the number of
individuals in a population or the average income in the
population. However, statistical users are not allowed to
retrieve individual data, such as the income of a specific
person. Statistical database security techniques must
prohibit the retrieval of individual data.
This can be achieved by prohibiting queries that retrieve
attribute values and by allowing only queries that involve
statistical aggregate functions such as COUNT, SUM,
MIN, MAX, AVERAGE, and STANDARD DEVIATION.
Such queries are sometimes called statistical queries.

4 Introduction to Statistical
Database Security(4)
 It is the DBMS’s responsibility to ensure confidentiality of
information about individuals, while still providing useful
statistical summaries of data about those individuals to
users.
 Provision of privacy protection of users in a statistical
database is paramount.
 In some cases it is possible to infer the values of
individual tuples from a sequence of statistical queries. This
is particularly true when the conditions result in a
population consisting of a small number of tuples.
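
The inference problem can be demonstrated with Python's built-in sqlite3 module. The PERSON table, its rows, and the helper statistical_query are made up for illustration. The helper admits only aggregate functions, yet two permitted queries still reveal one person's income because the selection condition isolates a population of size one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PERSON (NAME TEXT, CITY TEXT, DEGREE TEXT, INCOME REAL)")
# Made-up individuals for illustration only.
conn.executemany(
    "INSERT INTO PERSON VALUES (?, ?, ?, ?)",
    [
        ("Jane", "Bellaire", "PhD", 120000),
        ("Raj", "Bellaire", "MS", 80000),
        ("Lee", "Houston", "PhD", 95000),
    ],
)

def statistical_query(aggregate, condition):
    """Run one aggregate over the population selected by `condition`.

    Toy code: the condition is pasted into the SQL text, which would be
    unsafe with untrusted input.
    """
    if aggregate not in {"COUNT", "SUM", "AVG", "MIN", "MAX"}:
        raise ValueError("only statistical aggregate functions are allowed")
    sql = f"SELECT {aggregate}(INCOME) FROM PERSON WHERE {condition}"
    return conn.execute(sql).fetchone()[0]

condition = "CITY = 'Bellaire' AND DEGREE = 'PhD'"
population_size = statistical_query("COUNT", condition)
inferred_income = statistical_query("AVG", condition)
```

Since the COUNT query returns 1, the AVG over the same condition is exactly that individual's income, even though no query retrieved an attribute value directly.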

5 Introduction to Flow Control
 Flow control regulates the distribution or flow of
information among accessible objects. A flow between
object X and object Y occurs when a program reads values
from X and writes values into Y.
 Flow controls check that information contained in some
objects does not flow explicitly or implicitly into less
protected objects.
 A flow policy specifies the channels along which
information is allowed to move. The simplest flow policy
specifies just two classes of information: confidential (C)
and nonconfidential (N), and allows all flows except those
from class C to class N.
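The two-class flow policy can be sketched as a check performed before every copy. The class and function names are illustrative assumptions.

```python
from enum import Enum


class SecClass(Enum):
    N = "nonconfidential"
    C = "confidential"


def flow_allowed(source, target):
    """The simplest policy: every flow is allowed except C -> N."""
    return not (source is SecClass.C and target is SecClass.N)


def copy_value(values, classes, src, dst):
    """Read from object src and write into object dst if the policy permits."""
    if not flow_allowed(classes[src], classes[dst]):
        raise PermissionError(f"flow from {src} to {dst} violates the policy")
    values[dst] = values[src]
```

A flow from X to Y happens when a program reads X and writes Y, so the check sits on the write.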

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-832
5.1 Covert Channels

 A covert channel allows a transfer of information that
violates the security or the policy.
 A covert channel allows information to pass from a
higher classification level to a lower classification level
through improper means.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-833
5.1 Covert Channels(2)

 Covert channels can be classified into two broad
categories:
– storage channels do not require any temporal synchronization, in
that information is conveyed by accessing system information or
what is otherwise inaccessible to the user.
– in a timing channel the information is conveyed by the timing of
events or processes.
Some security experts believe that one way to avoid covert channels
is for programmers to not actually gain access to sensitive data that
a program is supposed to process after the program has been put
into operation.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-834
6 Encryption and Public Key
Infrastructures
 Encryption is a means of maintaining secure data in an
insecure environment.
 Encryption consists of applying an encryption algorithm
to data using some prespecified encryption key.
– the resulting data has to be decrypted using a decryption key to
recover the original data.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-835
6.1 The Data and Advanced Encryption
Standards
 The Data Encryption Standard (DES) is a system
developed by the U.S. government for use by the general
public. It has been widely accepted as a cryptographic
standard both in the United States and abroad.
 DES can provide end-to-end encryption on the channel
between the sender A and receiver B.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-836
6.1 The Data and Advanced Encryption
Standards(2)
 The DES algorithm is a careful and complex combination of
two of the fundamental building blocks of encryption:
substitution and permutation (transposition).
– The algorithm derives its strength from repeated application of
these two techniques for a total of 16 cycles.
– Plaintext (the original form of the message) is encrypted as blocks
of 64 bits.
 After questioning the adequacy of DES, the National
Institute of Standards and Technology (NIST) introduced the Advanced
Encryption Standard (AES).
– This algorithm has a block size of 128 bits (DES uses 64-bit blocks) and
supports keys of 128, 192, or 256 bits (DES uses a 56-bit key), and thus
takes longer to crack.
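To illustrate how substitution and permutation are combined over repeated cycles, here is a toy cipher on 8-bit blocks. It is not DES, which works on 64-bit blocks with a more elaborate round structure. The S-box, bit permutation, and key schedule are arbitrary choices made for this sketch.

```python
# Toy substitution-permutation cipher on 8-bit blocks. NOT DES and NOT secure.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
INV_SBOX = [SBOX.index(v) for v in range(16)]
PERM = [1, 5, 2, 0, 3, 7, 4, 6]            # output bit i comes from input bit PERM[i]
INV_PERM = [PERM.index(i) for i in range(8)]
ROUNDS = 16                                # same cycle count as the slide cites


def substitute(block, box):
    """Replace each 4-bit half of the block through the given S-box."""
    return (box[block >> 4] << 4) | box[block & 0xF]


def permute(block, table):
    """Rearrange the 8 bits of the block according to the table."""
    return sum(((block >> src) & 1) << dst for dst, src in enumerate(table))


def round_keys(key):
    """Derive one 8-bit key per round by rotating the key (toy schedule)."""
    return [((key << (i % 8)) | (key >> (8 - i % 8))) & 0xFF for i in range(ROUNDS)]


def encrypt(block, key):
    for k in round_keys(key):
        block = permute(substitute(block ^ k, SBOX), PERM)
    return block


def decrypt(block, key):
    for k in reversed(round_keys(key)):
        block = substitute(permute(block, INV_PERM), INV_SBOX) ^ k
    return block
```

Decryption undoes each cycle in reverse order: inverse permutation, inverse substitution, then the same round key.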

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-837
6.2 Public Key Encryption

 In 1976 Diffie and Hellman proposed a new kind of
cryptosystem, which they called public key encryption.
 Public key algorithms are based on mathematical functions
rather than operations on bit patterns.
– They also involve the use of two separate keys, in contrast to
conventional encryption, which uses only one key.
– The use of two keys can have profound consequences in the areas
of confidentiality, key distribution, and authentication.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-838
6.2 Public Key Encryption(2)

 The two keys used for public key encryption are referred
to as the public key and the private key.
– the private key is kept secret, but it is referred to as private key
rather than a secret key (the key used in conventional encryption)
to avoid confusion with conventional encryption.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-839
6.2 Public Key Encryption(3)

A public key encryption scheme, or infrastructure, has six
ingredients:
1. Plaintext: This is the data or readable message that is fed into the
algorithm as input.
2. Encryption algorithm: The encryption algorithm performs various
transformations on the plaintext.
3. and 4. Public and private keys: This is a pair of keys that has been
selected so that if one is used for encryption, the other is used for
decryption. The exact transformations performed by the encryption
algorithm depend on the public or private key that is provided as
input.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-840
6.2 Public Key Encryption(4)

5. Ciphertext: This is the scrambled message produced as output. It
depends on the plaintext and the key. For a given message, two
different keys will produce two different ciphertexts.
6. Decryption algorithm : This algorithm accepts the ciphertext and
the matching key and produces the original plaintext.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-841
6.2 Public Key Encryption(5)

The public key is made public, and the private key is known only
to its owner.
A general-purpose public key cryptographic algorithm relies
on one key for encryption and a different but related one
for decryption. The essential steps are as follows:
1. Each user generates a pair of keys to be used for the encryption and
decryption of messages.
2. Each user places one of the two keys in a public register or other
accessible file. This is the public key. The companion key is kept
private.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-842
6.2 Public Key Encryption(6)

3. If a sender wishes to send a private message to a receiver, the
sender encrypts the message using the receiver’s public key.
4. When the receiver receives the message, he or she decrypts it using
the receiver’s private key. No other recipient can decrypt the
message because only the receiver knows his or her private key.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-843
6.2 Public Key Encryption(7)

 The RSA public key encryption algorithm, one of the first
public key schemes, was introduced in 1978 by Ron
Rivest, Adi Shamir, and Len Adleman at MIT and is
named after them as the RSA scheme.
 The RSA encryption algorithm incorporates results from number
theory, combined with the difficulty of determining the prime
factors of a large target number.
 The RSA algorithm also operates with modular arithmetic – mod n.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-844
6.2 Public Key Encryption(8)
 Two keys, d and e, are used for decryption and encryption.
– An important property is that d and e can be interchanged.
– n is chosen as a large integer that is a product of two large distinct prime
numbers, a and b.
– The encryption key e is a randomly chosen number between 1 and n that
is relatively prime to $(a-1)(b-1)$.
– The plaintext block P is encrypted as $P^e \bmod n$.
– Because the exponentiation is performed mod n, factoring $P^e$ to uncover
the encrypted plaintext is difficult.
– However, the decryption key d is carefully chosen so that
$(P^e)^d \bmod n = P$.
– The decryption key d can be computed from the condition that
$d \cdot e \equiv 1 \pmod{(a-1)(b-1)}$.
– Thus, the legitimate receiver who knows d simply computes
$(P^e)^d \bmod n = P$ and recovers P without having to factor $P^e$.
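The key relationships can be checked with a toy example. The primes 61 and 53 and the fixed exponent 17 are illustrative assumptions chosen for repeatability. Real keys use randomly chosen primes that are hundreds of digits long.

```python
from math import gcd

a, b = 61, 53                  # toy primes, far too small for real use
n = a * b                      # public modulus
phi = (a - 1) * (b - 1)        # (a-1)(b-1)

e = 17                         # fixed here for repeatability; the slide picks it at random
assert 1 < e < n and gcd(e, phi) == 1   # e must be relatively prime to (a-1)(b-1)

# d is the inverse of e modulo (a-1)(b-1); three-argument pow with a
# negative exponent requires Python 3.8 or later.
d = pow(e, -1, phi)


def encrypt(P):
    """Ciphertext C = P^e mod n."""
    return pow(P, e, n)


def decrypt(C):
    """Plaintext P = C^d mod n."""
    return pow(C, d, n)
```

Because d and e can be interchanged, applying d first and e second also returns the original block, which is the property that digital signatures rely on.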

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-845
6.3 Digital Signatures

 A digital signature is an example of using encryption
techniques to provide authentication services in
e-commerce applications.
 A digital signature is a means of associating a mark
unique to an individual with a body of text.
– The mark should be unforgeable, meaning that others should be
able to check that the signature does come from the originator.
 A digital signature consists of a string of symbols.
– The signature must be different for each use. This can be achieved by
making each digital signature a function of the message that it is
signing, together with a time stamp.
– Public key techniques are a means of creating digital signatures.
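A sketch of a signature computed as a function of the message and a time stamp, using textbook RSA without padding. The two Mersenne primes, the exponent 65537, and the use of SHA-256 are assumptions made for illustration. A production system would use a vetted cryptographic library instead.

```python
import hashlib
from math import gcd

# Toy key pair built from two known Mersenne primes (a poor choice in practice).
A, B = 2**61 - 1, 2**89 - 1
N = A * B
PHI = (A - 1) * (B - 1)
E = 65537                      # public exponent
assert gcd(E, PHI) == 1
D = pow(E, -1, PHI)            # private exponent (Python 3.8+)


def digest(message: bytes, timestamp: str) -> int:
    """Hash the message together with its time stamp, reduced mod N."""
    data = message + b"|" + timestamp.encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(message: bytes, timestamp: str) -> int:
    """Only the holder of the private exponent D can produce this value."""
    return pow(digest(message, timestamp), D, N)


def verify(message: bytes, timestamp: str, signature: int) -> bool:
    """Anyone holding the public pair (E, N) can check the signature."""
    return pow(signature, E, N) == digest(message, timestamp)
```

Signing the same message at a different time gives a different signature, because the time stamp is part of the hashed input.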

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 23-846
Chapter 24
Enhanced Data Models
for Advanced
Applications



Active Database Concepts
and Triggers
 Generalized Model for Active Databases
and Oracle Triggers
 Design and Implementation Issues for
Active Databases
 Examples of Statement-Level Active Rules
in STARBURST
 Potential Applications for Active Databases
 Triggers in SQL-99
Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition
Copyright © 2004 Pearson Education, Inc.
Slide 24-849
Temporal Database Concepts
 Time Representation, Calendars, and Time
Dimensions
 Incorporating Time in Relational Databases Using
Tuple Versioning
 Incorporating Time in Object-Oriented Databases
Using Attribute Versioning
 Temporal Querying Constructs and the TSQL2
Language
 Time Series Data

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Slide 24-850
Multimedia Databases
 Introduction to Spatial Database Concepts
 Introduction to Multimedia Database
Concepts

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Slide 24-851
Introduction to Deductive
Databases
 Overview of Deductive Databases
 Prolog/Datalog Notation
 Datalog Notation
 Clausal Form and Horn Clauses
 Interpretation of Rules
 Datalog Programs and Their Safety
 Use of Relational Operations
 Evaluation of Nonrecursive Datalog Queries
Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition
Copyright © 2004 Pearson Education, Inc.
Slide 24-852
Summary

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Slide 24-853
Chapter 25
Distributed Databases
and Client–Server
Architectures



Distributed Database
Concepts
 Parallel Versus Distributed Technology
 Advantages of Distributed Databases
 Additional Functions of Distributed
Databases

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Slide 25-856
Data Fragmentation,
Replication, and Allocation
Techniques for Distributed
Database Design
 Data Fragmentation
 Data Replication and Allocation
 Example of Fragmentation, Allocation, and
Replication

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Slide 25-857
Types of Distributed Database
Systems

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Slide 25-858
Query Processing in
Distributed Databases
 Data Transfer Costs of Distributed Query
Processing
 Distributed Query Processing Using
Semijoin
 Query and Update Decomposition

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Slide 25-859
Overview of Concurrency
Control and Recovery in
Distributed Databases
 Distributed Concurrency Control Based on
a Distinguished Copy of a Data Item
 Distributed Concurrency Control Based on
Voting
 Distributed Recovery

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Slide 25-860
An Overview of 3-Tier Client-
Server Architecture

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Slide 25-861
Distributed Databases in
Oracle

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Slide 25-862
Summary

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Slide 25-863
Chapter 26

XML and Internet


Databases

Copyright © 2004 Pearson Education, Inc.
Chapter Outline

 Introduction
 Structured, Semi structured, and Unstructured
Data.
 XML Hierarchical (Tree) Data Model.
 XML Documents, DTD, and XML Schema.
 XML Documents and Databases.
 XML Querying.
– XPath
– XQuery

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-866
Introduction

 Although HTML is widely used for formatting and structuring Web
documents, it is not suitable for specifying structured data that is
extracted from databases.
 A new language, XML (eXtensible Markup Language), has
emerged as the standard for structuring and exchanging data over the
Web. XML can be used to provide more information about the
structure and meaning of the data in the Web pages rather than just
specifying how the Web pages are formatted for display on the screen.
 The formatting aspects are specified separately, for example by
using a formatting language such as XSL (eXtensible Stylesheet
Language).

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-867
Structured, Semi Structured and Unstructured Data.

 Information stored in databases is known as structured data because it is
represented in a strict format. The DBMS then checks to ensure that all data
follows the structures and constraints specified in the schema.
 In some applications, data is collected in an ad-hoc manner before it is
known how it will be stored and managed. This data may have a certain
structure, but not all the information collected will have identical structure.
This type of data is known as semi-structured data.
– In semi-structured data, the schema information is mixed in with the data
values, since each data object can have different attributes that are not
known in advance. Hence, this type of data is sometimes referred to as
self-describing data.
 A third category is known as unstructured data, because there is very
limited indication of the type of data. A typical example would be a text
document that contains information embedded within it. Web pages in
HTML that contain some data are considered as unstructured data.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-868
Structured, Semi Structured and Unstructured
Data (cont.)

 Semi-structured data may be displayed as a directed graph, as
shown in Figure 26.1.
 The labels or tags on the directed edges represent the schema
names—the names of attributes, object types (or entity types or
classes), and relationships.
 The internal nodes represent individual objects or composite
attributes.
 The leaf nodes represent actual data values of simple (atomic)
attributes.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-869
FIGURE 26.1

Representing semistructured data as a graph.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-870
XML Hierarchical
(Tree) Data
Model

FIGURE 26.3
A complex XML
element called
<projects>.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-871
XML Hierarchical (Tree) Data Model (cont.)

 The basic object in XML is the XML document. There are two
main structuring concepts that are used to construct an XML
document: elements and attributes. Attributes in XML provide
additional information that describes elements.
 As in HTML, elements are identified in a document by their
start tag and end tag. The tag names are enclosed between
angled brackets <…>, and end tags are further identified by a
slash </…>. Complex elements are constructed from other
elements hierarchically, whereas simple elements contain data
values.
 It is straightforward to see the correspondence between the
XML textual representation and the tree structure. In the tree
representation, internal nodes represent complex elements,
whereas leaf nodes represent simple elements. That is why the
XML model is called a tree model or a hierarchical model.
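The correspondence between the textual form and the tree can be seen by parsing a small document with Python's standard ElementTree module. The document and its status attribute are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented document: <projects> and <project> are complex elements,
# <name>, <number>, <ssn>, and <hours> are simple elements.
DOCUMENT = """<projects>
  <project status="active">
    <name>ProductX</name>
    <number>1</number>
    <worker>
      <ssn>123456789</ssn>
      <hours>32.5</hours>
    </worker>
  </project>
</projects>"""


def outline(element, depth=0):
    """Return one indented line per tree node, marking complex vs. simple."""
    kind = "complex" if len(element) else "simple"
    line = "  " * depth + f"{element.tag} [{kind}]"
    if element.attrib:
        line += f" attributes={dict(element.attrib)}"
    if kind == "simple":
        line += f" value={element.text!r}"
    lines = [line]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


ROOT = ET.fromstring(DOCUMENT)
```

Calling `print("\n".join(outline(ROOT)))` lists internal nodes as complex elements and leaf nodes as simple elements with their data values.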
Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition
Copyright © 2004 Pearson Education, Inc.
Chapter 26-872
XML Hierarchical (Tree) Data Model (cont.)
It is possible to characterize three main types of XML
documents:
1. Data-centric XML documents:
These documents have many small data items that follow a
specific structure, and hence may be extracted from a
structured database. They are formatted as XML documents in
order to exchange them or display them over the Web.
2. Document-centric XML documents:
These are documents with large amounts of text, such as news
articles or books. There are few or no structured data elements
in these documents.
3. Hybrid XML documents:
These documents may have parts that contain structured data
and other parts that are predominantly textual or unstructured.
Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition
Copyright © 2004 Pearson Education, Inc.
Chapter 26-873
XML Documents, DTD, and XML Schema.
Well-Formed
– It must start with an XML declaration to indicate the version of XML being used—as
well as any other relevant attributes.
– It must follow the syntactic guidelines of the tree model. This means that there should
be a single root element, and every element must include a matching pair of start tag
and end tag within the start and end tags of the parent element.
– A well-formed XML document is syntactically correct. This allows it to be
processed by generic processors that traverse the document and create an internal tree
representation.
 DOM (Document Object Model) - Allows programs to manipulate the resulting
tree representation corresponding to a well-formed XML document. The whole
document must be parsed beforehand when using DOM.
 SAX (Simple API for XML) - Allows processing of XML documents on the fly by
notifying the processing program whenever a start or end tag is encountered.
Valid
– A stronger criterion is for an XML document to be valid. In this case, the document
must be well-formed, and in addition the element names used in the start and end tag
pairs must follow the structure specified in a separate XML DTD (Document Type
Definition) file or XML schema file.
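A short sketch contrasting the two processing styles with Python's standard library parsers. These parsers check well-formedness only and do not validate against a DTD or XML schema. The sample documents and helper names are illustrative.

```python
import xml.dom.minidom
import xml.sax
from xml.parsers.expat import ExpatError

GOOD = "<project><name>ProductX</name><number>1</number></project>"
BAD = "<project><name>ProductX</project></name>"   # tags are not properly nested


def is_well_formed(text):
    """A document is well-formed if a generic parser can build its tree."""
    try:
        xml.dom.minidom.parseString(text)
    except ExpatError:
        return False
    return True


def dom_child_names(text):
    """DOM style: parse the whole document first, then walk the tree."""
    document = xml.dom.minidom.parseString(text)
    return [node.tagName for node in document.documentElement.childNodes
            if node.nodeType == node.ELEMENT_NODE]


class TagLogger(xml.sax.ContentHandler):
    """SAX style: the parser notifies us at every start and end tag."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def sax_events(text):
    handler = TagLogger()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.events
```

DOM needs the whole tree in memory before any processing starts, whereas the SAX handler sees each tag as it is read.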
Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition
Copyright © 2004 Pearson Education, Inc.
Chapter 26-874
XML Documents, DTD, and XML Schema (cont.)

FIGURE 26.4 An XML DTD file called projects.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-875
XML Documents, DTD, and XML Schema (cont.)
XML DTD Notation

 A * following the element name means that the element can be repeated zero or more
times in the document. This can be called an optional multivalued (repeating) element.
 A + following the element name means that the element can be repeated one or more
times in the document. This can be called a required multivalued (repeating) element.
 A ? following the element name means that the element can be repeated zero or one
times. This can be called an optional single-valued (non-repeating) element.
 An element appearing without any of the preceding three symbols must appear exactly
once in the document. This can be called a required single-valued (non-repeating)
element.
 The type of the element is specified via parentheses following the element. If the
parentheses include names of other elements, these would be the children of the
element in the tree structure. If the parentheses include the keyword #PCDATA or one
of the other data types available in XML DTD, the element is a leaf node. PCDATA
stands for parsed character data, which is roughly similar to a string data type.
 Parentheses can be nested when specifying elements.
 A bar symbol ( e1 | e2 ) specifies that either e1 or e2 can appear in the document.
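The occurrence symbols of a DTD content model mean the same thing as in regular expressions, so a model made only of element names can be checked with a regular expression. This is a sketch under that assumption. It does not handle #PCDATA or attribute declarations, and the element names in the usage note are illustrative.

```python
import re


def content_model_to_regex(model):
    """Translate a DTD content model such as "(a, b*, c?)" into a regex.

    Each element name becomes a group that matches "name;". The DTD
    symbols *, +, ?, | and parentheses keep their regex meaning, and the
    sequencing commas are dropped.
    """
    compact = re.sub(r"\s+", "", model)
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: f"(?:{re.escape(m.group(0))};)",
        compact,
    )
    return re.compile(pattern.replace(",", ""))


def children_match(model, children):
    """Check a list of child element names against the content model."""
    sequence = "".join(name + ";" for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None
```

For example, `children_match("(Name, Number, Location?, Worker*)", ["Name", "Number"])` is true because Location is optional and Worker may repeat zero times, while swapping Name and Number fails because a DTD sequence is ordered.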

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-876
XML Documents, DTD, and XML Schema (cont.)

Limitations of XML DTD

 First, the data types in DTD are not very general.
 Second, DTD has its own special syntax and so it requires specialized
processors. It would be advantageous to specify XML schema
documents using the syntax rules of XML itself so that the same
processors for XML documents can process XML schema
descriptions.
 Third, all DTD elements are always forced to follow the specified
ordering in the document, so unordered elements are not permitted.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-877
XML Documents, DTD, and XML Schema (cont.)

FIGURE 26.5 An XML schema file called company.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-878
FIGURE 26.5
(continued)
An XML schema file
called company.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-879
FIGURE 26.5
(continued)
An XML schema file
called company.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-880
FIGURE 26.5 (continued)
An XML schema file called company.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-881
XML Documents, DTD, and XML Schema (cont.)
XML Schema
 Schema Descriptions and XML Namespaces:
It is necessary to identify the specific set of XML schema language elements
(tags) by a file stored at a Web site location. The second line in our example
specifies the file used in this example, which is:
"http://www.w3.org/2001/XMLSchema".
Each such definition is called an XML namespace.
The file name is assigned to the variable xsd using the attribute xmlns (XML
namespace), and this variable is used as a prefix to all XML schema tags.
 Annotations, documentation, and language used:
The xsd:annotation and xsd:documentation are used for providing comments
and other descriptions in the XML document. The attribute xml:lang of the
xsd:documentation element specifies the language being used, e.g., “en”.
 Elements and types:
We specify the root element of our XML schema. In XML schema, the name
attribute of the xsd:element tag specifies the element name, which is called
company for the root element in our example. The structure of the company
root element is a xsd:complexType.
Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition
Copyright © 2004 Pearson Education, Inc.
Chapter 26-882
XML Documents, DTD, and XML Schema (cont.)
XML Schema
 First-level elements in the company database:
These elements are named employee, department, and project, and each is
specified in an xsd:element tag. If a tag has only attributes and no further
sub-elements or data within it, it can be ended with the slash symbol (/>) and
is termed an empty element.
 Specifying element type and minimum and maximum occurrences:
If we specify a type attribute in an xsd:element, this means that the structure
of the element will be described separately, typically using the
xsd:complexType element. The minOccurs and maxOccurs attributes are used for
specifying lower and upper bounds on the number of occurrences of an
element. The default is exactly one occurrence.
 Specifying Keys:
For specifying primary keys, the tag xsd:key is used.
For specifying foreign keys, the tag xsd:keyref is used. When specifying a
foreign key, the attribute refer of the xsd:keyref tag specifies the referenced
primary key whereas the tags xsd:selector and xsd:field specify the
referencing element type and foreign key.
Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition
Copyright © 2004 Pearson Education, Inc.
Chapter 26-883
XML Documents, DTD, and XML Schema (cont.)
XML Schema
 Specifying the structures of complex elements via complex types:
Complex elements in our example are Department, Employee, Project, and
Dependent, which use the tag xsd:complexType. We specify each of these as
a sequence of subelements corresponding to the database attributes of each
entity type by using the xsd:sequence and xsd:element tags of XML schema.
Each element is given a name and type via the attributes name and type of
xsd:element.
We can also specify minOccurs and maxOccurs attributes if we need to
change the default of exactly one occurrence. For (optional) database
attributes where null is allowed, we need to specify minOccurs = 0, whereas
for multivalued database attributes we need to specify maxOccurs =
“unbounded” on the corresponding element.
 Composite (compound) attributes:
Composite attributes from the ER schema are also specified as complex types in
the XML schema, as illustrated by the Address, Name, Worker, and
WorksOn complex types. These could have been directly embedded within
their parent elements.
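The occurrence defaults can be read back from a schema fragment with ElementTree. The fragment below is a hypothetical Employee complex type written for this sketch, not a copy of the company schema in Figure 26.5. Python's standard library does not validate against XML schema, so the sketch only inspects the declarations.

```python
import xml.etree.ElementTree as ET

XSD = {"xsd": "http://www.w3.org/2001/XMLSchema"}

# Hypothetical fragment; the element and type names are illustrative.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Employee">
    <xsd:sequence>
      <xsd:element name="employeeName" type="Name"/>
      <xsd:element name="employeeSSN" type="xsd:string"/>
      <xsd:element name="employeeSupervisorSSN" type="xsd:string" minOccurs="0"/>
      <xsd:element name="employeeDependent" type="Dependent"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def occurrence_bounds(schema_text, type_name):
    """Map each subelement of a complex type to (min, max); None = unbounded."""
    root = ET.fromstring(schema_text)
    ctype = root.find(f"xsd:complexType[@name='{type_name}']", XSD)
    bounds = {}
    for element in ctype.findall("xsd:sequence/xsd:element", XSD):
        low = int(element.get("minOccurs", "1"))       # default: exactly one
        high = element.get("maxOccurs", "1")
        bounds[element.get("name")] = (low, None if high == "unbounded" else int(high))
    return bounds
```

An optional attribute shows up as (0, 1) and a multivalued one as (0, None), while elements without either attribute keep the default (1, 1).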

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-884
XML Documents and Databases.
Approaches to Storing XML Documents
 Using a DBMS to store the documents as text:
We can use a relational or object DBMS to store whole XML documents as text
fields within the DBMS records or objects. This approach can be used if the
DBMS has a special module for document processing, and would work for
storing schemaless and document-centric XML documents.
 Using a DBMS to store the document contents as data elements:
This approach would work for storing a collection of documents that follow a
specific XML DTD or XML schema. Since all the documents have the same
structure, we can design a relational (or object) database to store the leaf-level
data elements within the XML documents.
 Designing a specialized system for storing native XML data:
A new type of database system based on the hierarchical (tree) model would
be designed and implemented. The system would include specialized indexing
and querying techniques, and would work for all types of XML documents.
 Creating or publishing customized XML documents from pre-existing
relational databases:
Because there are enormous amounts of data already stored in relational
databases, parts of these data may need to be formatted as documents for
exchanging or displaying over the Web.
Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition
Copyright © 2004 Pearson Education, Inc.
Chapter 26-885
XML Documents, DTD, and XML Schema (cont.)
Extracting XML Documents from Relational Databases.
Suppose that an application needs to extract XML documents for student,
course, and grade information from the university database. The data needed
for these documents is contained in the database attributes of the entity types
course, section, and student as shown below (part of the main ER), and the
relationships s-s and c-s between them.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-886
FIGURE 26.7
Subset of the UNIVERSITY database schema needed for
XML document extraction.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-887
XML Documents, DTD, and XML Schema (cont.)
Extracting XML Documents from Relational Databases.
One of the possible hierarchies that can be extracted from the database subset
could choose COURSE as the root.
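A sketch of extracting such a hierarchy with COURSE as the root, using SQLite and ElementTree. The table layout, column names, and sample rows are assumptions made for this example and are not taken from Figure 26.7.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_database():
    """Create a tiny hypothetical subset of the UNIVERSITY database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE course (cno TEXT PRIMARY KEY, cname TEXT);
        CREATE TABLE section (sno INTEGER PRIMARY KEY, cno TEXT, year INTEGER);
        CREATE TABLE student (ssn TEXT PRIMARY KEY, sname TEXT);
        CREATE TABLE enrolled (ssn TEXT, sno INTEGER, grade TEXT);
        INSERT INTO course VALUES ('CS101', 'Intro to Databases');
        INSERT INTO section VALUES (1, 'CS101', 2004);
        INSERT INTO student VALUES ('111', 'Smith'), ('222', 'Wong');
        INSERT INTO enrolled VALUES ('111', 1, 'A'), ('222', 1, 'B');
    """)
    return conn


def add(parent, tag, text=None):
    """Append a child element, optionally with a data value."""
    child = ET.SubElement(parent, tag)
    if text is not None:
        child.text = str(text)
    return child


def courses_to_xml(conn):
    """Nest sections under courses and students under sections."""
    root = ET.Element("courses")
    courses = conn.execute("SELECT cno, cname FROM course ORDER BY cno").fetchall()
    for cno, cname in courses:
        course = add(root, "course")
        add(course, "cno", cno)
        add(course, "cname", cname)
        sections = conn.execute(
            "SELECT sno, year FROM section WHERE cno = ? ORDER BY sno", (cno,)
        ).fetchall()
        for sno, year in sections:
            section = add(course, "section")
            add(section, "sno", sno)
            add(section, "year", year)
            students = conn.execute(
                "SELECT s.ssn, s.sname, e.grade FROM enrolled e "
                "JOIN student s ON s.ssn = e.ssn WHERE e.sno = ? ORDER BY s.ssn",
                (sno,),
            ).fetchall()
            for ssn, sname, grade in students:
                student = add(section, "student")
                add(student, "ssn", ssn)
                add(student, "sname", sname)
                add(student, "grade", grade)
    return root
```

`ET.tostring(courses_to_xml(build_database()), encoding="unicode")` serializes the result. Choosing STUDENT as the root instead would nest the same data the other way around.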

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-888
FIGURE 26.8
Hierarchical (tree)
view with COURSE
as the root.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-889
FIGURE 26.9
XML schema document with COURSE as the root.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-890
XML Documents, DTD, and XML Schema (cont.)

Breaking Cycles To Convert Graphs into Trees


It is possible to have a more complex subset with one or more cycles, indicating
multiple relationships among the entities.
Suppose that we need the information in all the entity types and relationships in
Figure 26.6 for a particular XML document, with STUDENT as the root element.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-891
FIGURE 26.6
An ER schema diagram for a simplified UNIVERSITY
database.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-892
XML Documents, DTD, and XML Schema (cont.)
Breaking Cycles To convert Graphs into Trees
One way to break the cycles is to replicate the entity types
involved in cycles.
 First, we replicate INSTRUCTOR as shown in part (2) of Figure 26.13, calling the
replica to the right INSTRUCTOR1. The INSTRUCTOR replica on the left
represents the relationship between instructors and the sections they teach,
whereas the INSTRUCTOR1 replica on the right represents the relationship
between instructors and the department each works in.

 We still have the cycle involving COURSE, so we can replicate COURSE in a
similar manner, leading to the hierarchy shown in part (3). The COURSE1
replica to the left represents the relationship between courses and their
sections, whereas the COURSE replica to the right represents the relationship
between courses and the department that offers each course.
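One simple way to automate the replication idea is a breadth-first traversal that attaches a numbered replica whenever an edge leads back to a node already in the tree. This is a sketch of that idea and not necessarily the procedure behind Figure 26.13. The edge list in the usage note is an assumed simplification of the UNIVERSITY schema.

```python
from collections import defaultdict, deque


def graph_to_tree(edges, root):
    """Return (parent, child) pairs of a tree that uses every edge once.

    A node reached a second time is not expanded again; a numbered
    replica (NAME1, NAME2, ...) is attached as a leaf instead. Parallel
    edges between the same pair of nodes are treated as one edge.
    """
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    placed_edges = set()
    seen = {root}
    replicas = defaultdict(int)
    tree = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for other in neighbours[node]:
            edge = frozenset((node, other))
            if edge in placed_edges:
                continue
            placed_edges.add(edge)
            if other in seen:
                replicas[other] += 1           # break the cycle with a replica
                tree.append((node, f"{other}{replicas[other]}"))
            else:
                seen.add(other)
                tree.append((node, other))
                queue.append(other)
    return tree
```

With the assumed edges STUDENT-SECTION, STUDENT-DEPARTMENT, SECTION-COURSE, SECTION-INSTRUCTOR, DEPARTMENT-COURSE, and DEPARTMENT-INSTRUCTOR and root STUDENT, the replicas COURSE1 and INSTRUCTOR1 end up under DEPARTMENT.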

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-893
FIGURE 26.13
Converting a graph with cycles into a hierarchical
(tree) structure.

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-894
XML Querying
XPath
 An XPath expression returns a collection of element nodes that satisfy certain
patterns specified in the expression.
 The names in the XPath expression are node names in the XML document tree
that are either tag (element) names or attribute names, possibly with additional
qualifier conditions to further restrict the nodes that satisfy the pattern.
 There are two main separators when specifying a path:
single slash (/) and double slash (//).
A single slash before a tag specifies that the tag must appear as a direct child of
the previous (parent) tag, whereas a double slash specifies that the tag can appear
as a descendant of the previous tag at any level.
 It is customary to include the file name in any XPath query allowing us to
specify any local file name or path name that specifies the path.
doc(www.company.com/info.XML)/company => COMPANY XML doc

Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition


Copyright © 2004 Pearson Education, Inc.
Chapter 26-895
XML Querying
1. Returns the COMPANY root node and all its descendant nodes, which
means that it returns the whole XML document.
2. Returns all department nodes (elements) and their descendant subtrees.
3. Returns all employeeName nodes that are direct children of an employee
node, such that the employee node has another child element
employeeSalary whose value is greater than 70000.
4. This returns the same result as the previous one except that we specified
the full path name in this example.
5. This returns all projectWorker nodes and their descendant nodes that are
children under a path /company/project and that have a child node hours
with value greater than 20.0 hours.

FIGURE 26.14
Some examples of XPath expressions on XML documents
that follow the XML schema file COMPANY in Figure 26.5.

XML Querying
XQuery
 XQuery uses XPath expressions, but has additional constructs.
 XQuery permits the specification of more general queries on one or
more XML documents.
 The typical form of a query in XQuery is known as a FLWR
expression, which stands for the four main clauses of XQuery and
has the following form:

FOR <variable bindings to individual nodes (elements)>
LET <variable bindings to collections of nodes (elements)>
WHERE <qualifier conditions>
RETURN <query result specification>
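As a rough analogy only, the four clauses can be mirrored by an ordinary loop. The sketch below is Python, not XQuery; the document contents and the firstName/lastName element names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; element names below employeeName are assumed.
doc = ET.fromstring("""
<company>
  <employee>
    <employeeName><firstName>John</firstName><lastName>Smith</lastName></employeeName>
    <employeeSalary>80000</employeeSalary>
  </employee>
  <employee>
    <employeeName><firstName>Joyce</firstName><lastName>English</lastName></employeeName>
    <employeeSalary>25000</employeeSalary>
  </employee>
</company>
""")

result = []
for emp in doc.findall("./employee"):                # FOR: bind a variable to each node
    salary = float(emp.findtext("employeeSalary"))   # LET: bind a helper variable
    if salary > 70000:                               # WHERE: qualifier condition
        name = emp.find("employeeName")
        # RETURN: specify what goes into the query result
        result.append((name.findtext("firstName"), name.findtext("lastName")))
```

In XQuery the LET clause binds a variable to a whole collection of nodes; the single-value binding above is a simplification.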
XML Querying
1. This query retrieves the first and last names of employees who earn more
than 70000. The variable $x is bound to each employeeName element that
is a child of an employee element, but only for employee elements that
satisfy the qualifier that their employeeSalary is greater than 70000.
2. This is an alternative way of retrieving the same elements retrieved by the
first query.
3. This query illustrates how a join operation can be performed by having
more than one variable. Here, the $x variable is bound to each
projectWorker element that is a child of project number 5, whereas the $y
variable is bound to each employee element. The join condition matches
ssn values in order to retrieve the employee names.
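The join described in the third query can be sketched as nested iteration over two variables with a matching condition. This is a Python analogy over invented data, not the XQuery text of the figure; the projectNumber element name is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical data for illustration only.
doc = ET.fromstring("""
<company>
  <employee><ssn>111</ssn><employeeName>Smith</employeeName></employee>
  <employee><ssn>222</ssn><employeeName>Wong</employeeName></employee>
  <project>
    <projectNumber>5</projectNumber>
    <projectWorker><ssn>222</ssn><hours>10.0</hours></projectWorker>
  </project>
  <project>
    <projectNumber>7</projectNumber>
    <projectWorker><ssn>111</ssn><hours>40.0</hours></projectWorker>
  </project>
</company>
""")

names = [
    y.findtext("employeeName")
    for p in doc.findall("./project")
    if p.findtext("projectNumber") == "5"   # restrict to project number 5
    for x in p.findall("./projectWorker")   # x plays the role of $x
    for y in doc.findall("./employee")      # y plays the role of $y
    if x.findtext("ssn") == y.findtext("ssn")  # join condition on ssn
]
```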

FIGURE 26.15
Some examples of XQuery queries on XML documents that
follow the XML schema file COMPANY in Figure 26.5.

Chapter 27

Data Mining Concepts



Overview of Data Mining
Technology

Association Rules
 Market-Basket Model, Support, and Confidence
 Apriori Algorithm
 Sampling Algorithm
 Frequent-Pattern Tree Algorithm
 Partition Algorithm
 Other Types of Association Rules
 Additional Considerations for Association Rules

Classification

Clustering

Approaches to Other Data
Mining Problems
 Discovery of Sequential Patterns
 Discovery of Patterns in Time Series
 Regression
 Neural Networks
 Genetic Algorithm

Applications of Data Mining

Commercial Data Mining
Tools

Summary

Chapter 28

Overview of Data
Warehousing and OLAP



Introduction, Definitions, and
Terminology

Characteristics of Data
Warehouses

Data Modeling for Data
Warehouses

Building a Data Warehouse

Typical Functionality of a Data
Warehouse

Data Warehouse Versus
Views

Problems and Open Issues in
Data Warehouses
 Difficulties of Implementing Data
Warehouses
 Open Issues in Data Warehousing

Summary

Chapter 29
Emerging Database
Technologies and
Applications

Chapter Outline
1 Mobile Databases
1.1 Mobile Computing Architecture
1.2 Characteristics of Mobile Environments
1.3 Data Management Issues
1.4 Application: Intermittently Synchronized Databases

2 Multimedia Databases
2.1 The Nature of Multimedia Data and Applications
2.2 Data Management Issues
2.3 Open Research Problems
2.4 Multimedia Database Applications

Chapter Outline(contd.)
3 Geographic Information Systems
3.1 GIS Applications
3.2 Data Management Requirements of GIS
3.3 Specific GIS Data Operations
3.4 An Example of GIS Software: ARC-INFO
3.5 Problems and Future issues in GIS

Chapter Outline(contd.)
4 GENOME Data Management
4.1 Biological Sciences and Genetics
4.2 Characteristics of Biological Data
4.3 The Human Genome Project and Existing Biological
Databases

Emerging Database Technologies and
Applications

 Emerging database technologies


 The major application domains

1 Mobile Databases

Recent advances in portable and wireless technology led to
mobile computing, a new dimension in data
communication and processing.

Portable computing devices coupled with wireless
communications allow clients to access data from virtually
anywhere and at any time.

1 Mobile Databases(2)

There are a number of hardware and software problems that
must be resolved before the capabilities of mobile
computing can be fully utilized.
Some of the software problems – which may involve data
management, transaction management, and database
recovery – have their origins in distributed database
systems.

1 Mobile Databases(3)

In mobile computing, these problems are more difficult,
mainly because of:
 The limited and intermittent connectivity afforded by
wireless communications.
 The limited life of the power supply (battery).
 The changing topology of the network.

– In addition, mobile computing introduces new architectural
possibilities and challenges.

1.1 Mobile Computing Architecture

The general architecture of a mobile platform is illustrated in
Figure 29.1.

Figure 29.1 A general architecture
1.1 Mobile Computing Architecture(2)

It is a distributed architecture in which a number of computers,
generally referred to as Fixed Hosts and Base Stations, are
interconnected through a high-speed wired network.

 Fixed hosts are general purpose computers configured to
manage mobile units.
 Base stations function as gateways to the fixed network for
the Mobile Units.

1.1 Mobile Computing Architecture(3)

Wireless Communications –
 The wireless medium has bandwidth significantly lower
than that of a wired network.
– The current generation of wireless technology has data rates that
range from tens to hundreds of kilobits per second (2G cellular
telephony) to tens of megabits per second (wireless Ethernet,
popularly known as WiFi).
– Modern (wired) Ethernet, by comparison, provides data rates on
the order of hundreds of megabits per second.
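To make the gap concrete, a small calculation can compare how long one update batch takes to transfer at each class of data rate. The specific rates and the 1 MB payload below are assumed representative values picked from the ranges on the slide, not measurements.

```python
# Assumed representative rates, chosen from the ranges quoted above.
RATES_BITS_PER_SEC = {
    "2G cellular (assumed 100 kbit/s)": 100_000,
    "WiFi (assumed 10 Mbit/s)": 10_000_000,
    "wired Ethernet (assumed 100 Mbit/s)": 100_000_000,
}


def transfer_seconds(payload_bytes, rate_bits_per_sec):
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return payload_bytes * 8 / rate_bits_per_sec


PAYLOAD_BYTES = 1_000_000  # hypothetical 1 MB batch of updates

times = {
    name: transfer_seconds(PAYLOAD_BYTES, rate)
    for name, rate in RATES_BITS_PER_SEC.items()
}

for name, seconds in times.items():
    print(f"{name}: {seconds:.2f} s")
```

Under these assumptions the same batch takes on the order of a minute over 2G cellular but well under a second over wired Ethernet.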

1.1 Mobile Computing Architecture(4)

Wireless Communications –
 Other characteristics that distinguish wireless connectivity
options:
– interference,
– locality of access,
– range,
– support for packet switching,
– seamless roaming throughout a geographical region.

1.1 Mobile Computing Architecture(5)

Wireless Communications –
 Some wireless networks, such as WiFi and Bluetooth, use
unlicensed areas of the frequency spectrum, which may
cause interference with other appliances, such as cordless
telephones.
 Modern wireless networks can transfer data in units called
packets, as is commonly done in wired networks, in order to
conserve bandwidth.

1.1 Mobile Computing Architecture(6)

Client/Network Relationships –
 Mobile units can move freely in a geographic mobility
domain, an area that is circumscribed by wireless network
coverage.
– To manage it, the entire mobility domain is divided into one or more
smaller domains, called cells, each of which is supported by at
least one base station.
– Mobile units move unrestricted throughout the cells of the domain,
while maintaining information access contiguity.

1.1 Mobile Computing Architecture(7)

Client/Network Relationships –
The communication architecture described earlier is designed
to give the mobile unit the impression that it is attached to
a fixed network, emulating a traditional client-server
architecture.
Wireless communications, however, make other architectures
possible. One alternative is a mobile ad-hoc network
(MANET), illustrated in Figure 29.2.

1.1 Mobile Computing Architecture(8)

1.1 Mobile Computing Architecture(9)

Client/Network Relationships –
 In a MANET, co-located mobile units do not need to
communicate via a fixed network, but instead, form their
own using cost-effective technologies such as Bluetooth.
 In a MANET, mobile units are responsible for routing their
own data, effectively acting as base stations as well as
clients.
– Moreover, they must be robust enough to handle changes in the
network topology, such as the arrival or departure of other mobile
units.
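The idea that mobile units route their own data and must cope with topology changes can be sketched as follows. Breadth-first search is used here only as a simple stand-in for a routing protocol; the unit names and link table are hypothetical, and real MANET protocols are considerably more involved.

```python
from collections import deque


def find_route(links, source, target):
    """Return a hop-minimal route from source to target, or None if unreachable."""
    frontier = deque([[source]])
    visited = {source}
    while frontier:
        path = frontier.popleft()
        if path[-1] == target:
            return path
        # Neighbors are sorted only to make the result deterministic.
        for neighbor in sorted(links.get(path[-1], ())):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None


def remove_unit(links, unit):
    """Topology after a mobile unit departs (returns a new link table)."""
    return {
        u: {v for v in neighbors if v != unit}
        for u, neighbors in links.items()
        if u != unit
    }


# Hypothetical ad-hoc network: each unit lists the peers within radio range.
links = {
    "A": {"B", "E"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"A", "D"},
}

route_before = find_route(links, "A", "D")                   # via E
route_after = find_route(remove_unit(links, "E"), "A", "D")  # E has left
```

When unit E departs, traffic from A to D has to be relayed through B and C instead, which is the kind of topology change the slide refers to.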

1.1 Mobile Computing Architecture(10)

Client/Network Relationships –
 MANET applications can be considered as peer-to-peer,
meaning that a mobile unit is simultaneously a client and a
server.
– Transaction processing and data consistency control become more
difficult since there is no central control in this architecture.
– Resource discovery and data routing by mobile units make
computing in a MANET even more complicated.
– Sample MANET applications are multi-user games, shared
whiteboard, distributed calendars, and battle information sharing.

1.2 Characteristics of Mobile
Environments
The characteristics of mobile computing include:
 Communication latency.
 Intermittent connectivity.
 Limited battery life.
 Changing client location.

1.2 Characteristics of Mobile
Environments(2)
The server may not be able to reach a client. A client may be
unreachable because it is dozing – in an energy-conserving
state in which many subsystems are shut down – or
because it is out of range of a base station.
In either case, neither client nor server can reach the other,
and modifications must be made to the architecture in
order to compensate for this case.
Proxies for unreachable components are added to the
architecture. For a client (and symmetrically for a server),
the proxy can cache updates intended for the server.
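A minimal sketch of such a proxy, assuming a hypothetical ServerProxy class that sits on the client side: updates submitted while the server is unreachable are cached in order and forwarded once contact is restored. The class and method names are illustrative and do not come from any particular system.

```python
class ServerProxy:
    """Client-side stand-in for a server that may be unreachable."""

    def __init__(self, send):
        self._send = send        # callable that delivers one update to the server
        self._pending = []       # updates cached while disconnected
        self.reachable = False

    def submit(self, update):
        """Deliver immediately if possible, otherwise cache the update."""
        if self.reachable:
            self._send(update)
        else:
            self._pending.append(update)

    def reconnect(self):
        """Mark the server reachable and flush cached updates in order."""
        self.reachable = True
        while self._pending:
            self._send(self._pending.pop(0))


delivered = []                       # stands in for the real server
proxy = ServerProxy(delivered.append)
proxy.submit("update-1")             # cached: server not reachable yet
proxy.submit("update-2")
cached_before = list(delivered)      # still empty at this point
proxy.reconnect()                    # flushes both updates in order
proxy.submit("update-3")             # delivered directly
```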

1.2 Characteristics of Mobile
Environments(3)
Mobile computing poses challenges for servers as well as
clients. The latency involved in wireless communication
makes scalability a problem. Because latency due to
wireless communications increases the time to service
each client request, the server can handle fewer clients.
One way servers relieve this problem is by broadcasting
data whenever possible.
– A server can simply broadcast data periodically.
– Broadcast also reduces the load on the server, as clients do not
have to maintain active connections to it.

1.2 Characteristics of Mobile
Environments(4)
Client mobility also poses many data management challenges.
 Servers must keep track of client locations in order to
efficiently route messages to them.
 Client data should be stored in the network location that
minimizes the traffic necessary to access it.
 The act of moving between cells must be transparent to the
client.
 The server must be able to gracefully divert the shipment of
data from one base to another, without the client noticing.
 Client mobility also allows new applications that are
location-based.
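Location tracking and transparent movement between cells can be sketched with a small registry that maps each client to its current base station. The LocationRegistry class and its method names are hypothetical; the sketch shows only the bookkeeping, not the wireless protocol.

```python
class LocationRegistry:
    """Server-side record of which base station currently serves each client."""

    def __init__(self):
        self._station_of = {}

    def handoff(self, client, base_station):
        """Record that the client is now served by base_station."""
        self._station_of[client] = base_station

    def route(self, client, message):
        """Return (base station, client, message) for delivery."""
        station = self._station_of.get(client)
        if station is None:
            raise LookupError(f"location of {client!r} is unknown")
        return (station, client, message)


registry = LocationRegistry()
registry.handoff("unit-7", "base-A")
first = registry.route("unit-7", "page 1")

# The client moves to another cell; the sender's call does not change.
registry.handoff("unit-7", "base-B")
second = registry.route("unit-7", "page 2")
```

The caller of route never names a base station, which is one way to read the requirement that movement between cells be transparent to the client.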

1.3 Data Management Issues

From a data management standpoint, mobile computing may
be considered a variation of distributed computing.
Mobile databases can be distributed under two possible
scenarios:
1. The entire database is distributed mainly among the wired
components, possibly with full or partial replication. A base station
or fixed host manages its own database with a DBMS-like
functionality, with additional functionality for locating mobile units
and additional query and transaction management features to meet
the requirements of mobile environments.
2. The database is distributed among wired and wireless components.
Data management responsibility is shared among base stations or
fixed hosts and mobile units.

1.3 Data Management Issues(2)

Data management issues as they apply to mobile databases:
 Data distribution and replication
 Transaction models
 Query processing
 Recovery and fault tolerance
 Mobile database design
 Location-based service
 Division of labor
 Security

1.4 Application: Intermittently
Synchronized Databases
Whenever clients connect – through a process known in
industry as synchronization of a client with a server –
they receive a batch of updates to be installed on their
local database. The primary characteristic of this scenario
is that the clients are mostly disconnected; the server is
not necessarily able to reach them. This environment has
problems similar to those in distributed and client-server
databases, and some from mobile databases.
This environment is referred to as Intermittently
Synchronized Database Environment (ISDBE).
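The synchronization step can be sketched as a client pulling the batch of updates it has not yet seen and installing them locally. The SyncServer and Client classes below are hypothetical and assume a simple append-only update log; commercial synchronization products work differently in detail.

```python
class SyncServer:
    """Keeps an append-only log of (key, value) updates."""

    def __init__(self):
        self._log = []

    def publish(self, key, value):
        self._log.append((key, value))

    def updates_since(self, position):
        """Return the updates after `position` and the new log position."""
        return self._log[position:], len(self._log)


class Client:
    """Mostly disconnected client with its own local database."""

    def __init__(self):
        self.local_db = {}
        self._position = 0   # how much of the server log has been installed

    def synchronize(self, server):
        """Client-initiated: fetch and install the pending batch of updates."""
        batch, self._position = server.updates_since(self._position)
        for key, value in batch:
            self.local_db[key] = value
        return len(batch)


server = SyncServer()
client = Client()
server.publish("price:widget", 10)
server.publish("price:gadget", 25)
first_batch = client.synchronize(server)    # installs two updates
server.publish("price:widget", 12)
second_batch = client.synchronize(server)   # installs only the new update
```

Note that only the client calls synchronize; the server never initiates contact, matching the scenario in which the server cannot reach the client at will.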

1.4 Application: Intermittently
Synchronized Databases(2)
The characteristics of Intermittently Synchronized
Databases (ISDBs) that make them distinct from mobile
databases are:
1. A client connects to the server when it wants to
exchange updates. The communication can be unicast –
one-on-one communication between the server and the
client – or multicast – one sender or server may
periodically communicate to a set of receivers or update
a group of clients.
2. A server cannot connect to a client at will.

1.4 Application: Intermittently
Synchronized Databases(3)
3. Issues of wireless versus wired client connections and
power conservation are generally immaterial.
4. A client is free to manage its own data and transactions
while it is disconnected. It can also perform its own
recovery to some extent.
5. A client has multiple ways of connecting to a server and, in
case of many servers, may choose a particular server to
connect to based on proximity, communication nodes
available, resources available, etc.

2 Multimedia Databases

In the years ahead, multimedia information systems are
expected to dominate our daily lives. Our houses will be
wired for bandwidth to handle interactive multimedia
applications. Our high-definition TV/computer
workstations will have access to a large number of
databases, including digital libraries, image and video
databases that will distribute vast amounts of
multisource multimedia content.

2.1 Multimedia Databases

DBMSs have been constantly adding to the types of data they
support. Today the following types of multimedia data
are available in current systems.
 Text: May be formatted or unformatted. For ease of
parsing structured documents, standards like SGML and
variations such as HTML are being used.
 Graphics: Examples include drawings and illustrations
that are encoded using some descriptive standards (e.g.,
CGM, PICT, PostScript).

2.1 Multimedia Databases(2)

 Images: Includes drawings, photographs, and so forth,
encoded in standard formats such as bitmap, JPEG, and
MPEG. Compression is built into JPEG and MPEG.
These images are not subdivided into components.
Hence querying them by content (e.g., find all images
containing circles) is nontrivial.
 Animations: Temporal sequences of image or graphic
data.

2.1 Multimedia Databases(3)

 Video: A set of temporally sequenced photographic data
for presentation at specified rates – for example, 30
frames per second.
 Structured audio: A sequence of audio components
comprising note, tone, duration, and so forth.
 Audio: Sample data generated from aural recordings in a
string of bits in digitized form. Analog recordings are
typically converted into digital form before storage.

2.1 Multimedia Databases(4)

 Composite or mixed multimedia data: A combination of
multimedia data types such as audio and video which
may be physically mixed to yield a new storage format
or logically mixed while retaining original types and
formats. Composite data also contains additional control
information describing how the information should be
rendered.

2.1 Multimedia Databases(5)

Nature of Multimedia Applications: Multimedia data may be
stored, delivered, and utilized in many different ways.
Applications may be categorized based on their data
management characteristics as follows:
 Repository applications: A large amount of multimedia
data as well as metadata is stored for retrieval purposes.
Examples include repositories of satellite images,
engineering drawings and designs, space photographs,
and radiology scanned pictures.

2.1 Multimedia Databases(6)

 Presentation applications: A large number of
applications involve delivery of multimedia data subject
to temporal constraints; simple multimedia viewing of
video data, for example, requires a system to simulate
VCR-like functionality. Complex and interactive
multimedia presentations involve orchestration
directions to control the retrieval order of components
in a series or in parallel. Interactive environments must
support capabilities such as real-time editing, analysis, or
annotating of video and audio data.

2.1 Multimedia Databases(7)

 Collaborative work using multimedia information: This
is a new category of applications in which engineers
may execute a complex design task by merging
drawings, fitting subjects to design constraints, and
generating new documentation, change notifications, and
so forth. Intelligent healthcare networks as well as
telemedicine will involve doctors collaborating among
themselves, analyzing multimedia patient data and
information in real time as it is generated.

2.2 Data Management Issues

Multimedia applications dealing with thousands of images,
documents, audio and video segments, and free text data
depend critically on appropriate modeling of the
structure and content of data and then designing
appropriate database schemas for storing and retrieving
multimedia information. Multimedia information
systems are very complex and embrace a large set of
issues:
 Modeling
– complex objects

2.2 Data Management Issues(2)
 Design
– the conceptual, logical, and physical design of
multimedia databases has not been fully addressed.
 Storage
– storing multimedia data on standard disk-like devices
presents problems of representation, compression,
mapping to device hierarchies, archiving, and buffering
during the input/output operation.
 Queries and retrieval
– “database” way of retrieving information is based on
query languages and internal index structures.

2.2 Data Management Issues(3)

 Performance
– for multimedia applications involving only documents
and text, performance constraints are subjectively
determined by the user.
– for applications involving video playback or audio-video
synchronization, physical limitations dominate.

2.4 Multimedia Database Applications

Large-scale applications of multimedia databases can be
expected to encompass a large number of disciplines and
enhance existing capabilities.
 Documents and records management
 Knowledge dissemination
 Education and training
 Marketing, advertising, retailing, entertainment, and
travel
 Real-time control and monitoring

3 Geographic Information Systems

Geographic information systems (GIS) are used to collect,
model, and analyze information describing physical
properties of the geographical world. The scope of GIS
broadly encompasses two types of data:
1. spatial data, originating from maps, digital images,
administrative and political boundaries, roads,
transportation networks, physical data, such as rivers,
soil characteristics, climatic regions, land elevations, and

3 Geographic Information Systems(2)

2. nonspatial data, such as socio-economic data (like
census counts), economic data, and sales or marketing
information.
GIS is a rapidly developing domain that
offers highly innovative approaches to meet some
challenging technical demands.

3.1 GIS Applications

It is possible to divide GISs into three categories:
 cartographic applications,
 digital terrain modeling applications, and
 geographic objects applications

3.1 GIS Applications(2)
GIS Applications
 Cartographic applications: irrigation, crop yield analysis, land
evaluation, planning and facilities management, landscape studies,
traffic pattern analysis
 Digital terrain modeling applications: Earth science, civil engineering
and military evaluation, soil surveys, air and water pollution studies,
flood control, water resource management
 Geographic objects applications: car navigation systems, geographic
market analysis, utility distribution and consumption, consumer product
and services – economic analysis

3.2 Data Management Requirements of
GIS
The functional requirements of the GIS applications above
translate into the following database requirements.

Data Modeling and Representation: GIS data can be
broadly represented in two formats:
1. Vector data represents geometric objects such as points,
lines, and polygons.

3.2 Data Management Requirements of
GIS(2)
2. Raster data is characterized as an array of points, where
each point represents the value of an attribute for a
real-world location. Informally, raster images are
n-dimensional arrays in which each entry is a unit of the
image and represents an attribute. Two-dimensional
units are called pixels, while three-dimensional units are
called voxels. Three-dimensional elevation data is stored
in a raster-based digital elevation model (DEM) format.
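The two formats can be contrasted with a short sketch. The dictionary layout for vector objects, the sample coordinates, and the elevation values are all assumptions made for illustration; real GIS packages use their own data structures.

```python
# Vector data: geometric objects described by coordinate lists.
well = {"type": "point", "coords": [(1.0, 1.0)]}
road = {"type": "line", "coords": [(0.0, 0.0), (3.0, 0.0)]}
parcel = {"type": "polygon", "coords": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]}


def polygon_area(coords):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Raster data: a two-dimensional array in which each pixel holds the value
# of one attribute (here, a made-up elevation in metres) for a location.
dem = [
    [10, 12, 15],
    [11, 14, 18],
    [13, 17, 22],
]


def elevation_at(grid, row, col):
    """Attribute value stored in the pixel at (row, col)."""
    return grid[row][col]


parcel_area = polygon_area(parcel["coords"])
centre_elevation = elevation_at(dem, 1, 1)
```

Vector objects carry exact geometry, so measures such as area follow from the coordinates; raster cells carry attribute values, so queries reduce to array lookups.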

3.2 Data Management Requirements of
GIS(3)
Another format, the triangular irregular network
(TIN), is a topological vector-based approach that models
surfaces by connecting sample points as vertices of
triangles and has a point density that may vary with the
roughness of the terrain. Rectangular grids (or elevation
matrices) are two-dimensional array structures. In
digital terrain modeling (DTM), the model also may be
used by substituting the elevation with some attribute of
interest such as population density or air temperature.
GIS data often includes a temporal structure in addition
to a spatial structure.
3.2 Data Management Requirements of
GIS(4)
Data Analysis: GIS data undergoes various types of analysis.
For example, in applications such as soil erosion studies,
environmental impact studies, or hydrological runoff
simulations, DTM data may undergo various types of
geomorphometric analysis – measurements such as
slope values, gradients (the rate of change in altitude),
aspect (the compass direction of the gradient), profile
convexity (the rate of change of gradient), plan convexity
(the convexity of contours), and other parameters.
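Gradient and aspect can be estimated from a raster DEM with finite differences. The sketch below assumes a grid whose rows run north to south and whose columns run west to east, uses central differences at an interior cell, and reports aspect as the compass bearing of the uphill gradient, following the definition above; some GIS packages report the downslope direction instead. The elevation values and cell size are invented.

```python
import math


def slope_and_aspect(dem, row, col, cell_size):
    """Gradient magnitude, slope angle, and aspect at an interior cell.

    Assumes row 0 is the northernmost row and column 0 the westernmost.
    Aspect is the compass bearing (degrees clockwise from north) of the
    direction in which elevation increases fastest; it is not meaningful
    on perfectly flat terrain.
    """
    dz_east = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    dz_north = (dem[row - 1][col] - dem[row + 1][col]) / (2 * cell_size)
    gradient = math.hypot(dz_east, dz_north)      # rise per unit distance
    slope_deg = math.degrees(math.atan(gradient))
    aspect_deg = math.degrees(math.atan2(dz_east, dz_north)) % 360.0
    return gradient, slope_deg, aspect_deg


# Hypothetical DEM: elevation (metres) rises steadily toward the east.
DEM = [
    [10.0, 20.0, 30.0],
    [10.0, 20.0, 30.0],
    [10.0, 20.0, 30.0],
]

gradient, slope_deg, aspect_deg = slope_and_aspect(DEM, 1, 1, cell_size=10.0)
```

With 10 m cells and a 10 m rise per cell toward the east, the gradient at the centre is 1.0, which corresponds to a 45 degree slope facing uphill due east.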

3.2 Data Management Requirements of
GIS(5)
Data Integration: GISs must integrate both vector and raster
data from a variety of sources. Sometimes edges and
regions are inferred from a raster image to form a vector
model, or conversely, raster images such as aerial
photographs are used to update vector models. Several
coordinate systems such as Universal Transverse
Mercator (UTM), latitude/longitude, and local cadastral
systems are used to identify locations. Data originating
from different coordinate systems requires appropriate
transformations.

3.2 Data Management Requirements of
GIS(6)
Data Capture: The first step in developing a spatial database
for cartographic modeling is to capture the two-dimensional
or three-dimensional geographical information
in digital form – a process that is sometimes impeded by
source map characteristics such as resolution, type of
projection, map scales, cartographic licensing, diversity of
measurement techniques, and coordinate system
differences. Spatial data can also be captured from remote
sensors in satellites such as Landsat, NORA, and Advanced
Very High Resolution Radiometer (AVHRR) as well as
SPOT HRV (High Resolution Visible Range Instrument).

3.3 Specific GIS Data Operations

GIS applications are conducted through the use of special
operators such as the following:
1. Interpolation
2. Interpretation
3. Proximity analysis
4. Raster image processing
5. Analysis of networks
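
To make one of these operators concrete, here is a minimal sketch of proximity analysis: finding the named points that lie within a given distance of a center point. It assumes points are given as (latitude, longitude) in degrees and uses the haversine great-circle formula with a mean Earth radius. The function names and data layout are hypothetical, and a real GIS would use a spatial index rather than a linear scan.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def within_radius(center, points, radius_km):
    """Return the names of points no farther than radius_km from center."""
    return [
        name
        for name, (lat, lon) in points.items()
        if haversine_km(center[0], center[1], lat, lon) <= radius_km
    ]
```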

3.3 Specific GIS Data Operations(2)

The functionality of a GIS database is also subject to other
considerations:
1. Extensibility
2. Data quality control
3. Visualization
Such requirements clearly illustrate that standard RDBMSs or
ODBMSs do not meet the special needs of GIS. It is
therefore necessary to design systems that support the
vector and raster representations and the spatial
functionality as well as the required DBMS features.

4.1 Genome Data Management

Biological Sciences and Genetics: The biological sciences
encompass an enormous variety of information.
Environmental science gives us a view of how species
live and interact in a world filled with natural
phenomena. Biology and ecology study particular
species. Anatomy focuses on the overall structure of an
organism, documenting the physical aspects of
individual bodies. Traditional medicine and physiology
break the organism into systems and tissues and strive to
collect information on the workings of these systems and
the organism as a whole.
4.1 Genome Data Management(2)

Histology and cell biology delve into the tissue and cellular
levels and provide knowledge about the inner structure
and function of the cell. This wealth of information that
has been generated, classified, and stored for centuries
has only recently become a major application of
database technology.

4.1 Genome Data Management(3)

Genetics has emerged as an ideal field for the application of
information technology. In a broad sense, it can be
thought of as the construction of models based on
information about genes – which can be defined as units
of heredity – and populations, and the seeking out of
relationships in that information.

4.1 Genome Data Management(4)

The study of genetics can be divided into three branches:
1. Mendelian genetics is the study of the transmission of
traits between generations.
2. Molecular genetics is the study of the chemical structure
and function of genes at the molecular level.
3. Population genetics is the study of how genetic
information varies across populations of organisms.

4.1 Genome Data Management(5)

The origins of molecular genetics can be traced to two
important discoveries:
1. In 1869, Friedrich Miescher discovered nuclein
and its primary component, deoxyribonucleic acid
(DNA). In subsequent research, DNA and a related
compound, ribonucleic acid (RNA), were found to be
composed of nucleotides (a sugar, a phosphate, and a
base, which combine to form nucleic acid) linked into
long polymers via the sugar and phosphate.

4.1 Genome Data Management(6)

2. The second discovery was the demonstration in 1944 by
Oswald Avery that DNA was indeed the molecular
substance carrying genetic information.

4.1 Genome Data Management(7)

Genes were shown to be composed of chains of nucleic acids
arranged linearly on chromosomes and to serve three
primary functions:
1. replicating genetic information between generations,
2. providing blueprints for the creation of polypeptides, and
3. accumulating changes – thereby allowing evolution to
occur.
Watson and Crick found the double-helix structure of DNA
in 1953, which gave molecular biology a new direction.

4.2 Characteristics of Biological Data

Biological data exhibits many special characteristics that
make management of biological information a
particularly challenging problem. This section summarizes
the characteristics of biological information and focuses on
bioinformatics, a multidisciplinary field that has
emerged. Bioinformatics addresses information
management of genetic information with special
emphasis on DNA sequence analysis.
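
As a hedged illustration of what very basic DNA sequence analysis looks like, the sketch below computes two common quantities for a sequence over the bases A, C, G, and T: its reverse complement and its GC content. The function names are hypothetical, and ambiguity codes such as N are not handled.

```python
# Translation table mapping each base to its complementary base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)
```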

4.2 Characteristics of Biological
Data(2)
Applications of bioinformatics span design of targets for
drugs, study of mutations and related diseases,
anthropological investigations of migration patterns of
tribes, and therapeutic treatments.

Characteristic 1: Biological data is highly complex when
compared with most other domains or applications.
Characteristic 2: The amount and range of variability in data
is high.
Characteristic 3: Schemas in biological databases change at
a rapid pace.

4.2 Characteristics of Biological
Data(3)
Characteristic 4: Representations of the same data by
different biologists will likely be different (even using
the same system).
Characteristic 5: Most users of biological data do not
require write access to the database; read-only access is
adequate.
Characteristic 6: Most biologists are not likely to have
knowledge of the internal structure of the database or
about schema design.

4.2 Characteristics of Biological
Data(4)
Characteristic 7: The context of data gives added meaning
for its use in biological applications.
Characteristic 8: Defining and representing complex queries
is extremely important to the biologist.
Characteristic 9: Users of biological information often
require access to “old” values of the data – particularly
when verifying previously reported results.
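
Characteristic 9 can be sketched as an append-only version table: a new value is stored as a new version rather than overwriting the old one, so earlier values stay available. This is only one possible design, shown here with Python's built-in sqlite3 module. The table name, column names, and helper functions are hypothetical.

```python
import sqlite3


def open_store():
    """Create an in-memory database with an append-only version table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE annotation_version (
               gene_id     TEXT    NOT NULL,
               version     INTEGER NOT NULL,
               description TEXT    NOT NULL,
               PRIMARY KEY (gene_id, version)
           )"""
    )
    return conn


def record(conn, gene_id, description):
    """Store a new version of a value; earlier versions are never overwritten."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM annotation_version WHERE gene_id = ?",
        (gene_id,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO annotation_version VALUES (?, ?, ?)",
        (gene_id, version, description),
    )
    return version


def value_at(conn, gene_id, version):
    """Return the value as it was at the given version, or None if absent."""
    row = conn.execute(
        "SELECT description FROM annotation_version WHERE gene_id = ? AND version = ?",
        (gene_id, version),
    ).fetchone()
    return row[0] if row else None
```

A user verifying a previously reported result would call value_at with the version that was current when the result was published.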

4.3 The Human Genome Project and
Existing Biological Databases
The term genome is defined as the total genetic information
that can be obtained about an entity. The human
genome, for example, generally refers to the complete
set of genes required to create a human being –
estimated to be more than 30,000 genes spread over 23
pairs of chromosomes, with an estimated 3 to 4 billion
nucleotides. The goal of the Human Genome
Project (HGP) has been to obtain the complete sequence
– the ordering of the bases – of those nucleotides.

4.3 The Human Genome Project and
Existing Biological Databases(2)
Some of the existing database systems that support or
have grown out of the Human Genome Project are described here.

GenBank – The preeminent DNA sequence database in the
world today is GenBank, maintained by the National
Center for Biotechnology Information (NCBI) of the
National Library of Medicine (NLM).

4.3 The Human Genome Project and
Existing Biological Databases(3)
GenBank –
• Established in 1978 as a repository for DNA sequence
data.
• Since then, it has expanded to include sequence tag data,
protein sequence data, three-dimensional protein
structure, taxonomy, and links to the medical literature
(MEDLINE).

4.3 The Human Genome Project and
Existing Biological Databases(4)
GenBank –
• As of release 135.0 in April 2003, GenBank contains
over 31 billion nucleotide bases of more than 24 million
sequences from over 100,000 species, with roughly 1400
new organisms being added each month.
• The database size in flat file format is over 100 GB
uncompressed and has been doubling every 15 months.
• International collaboration with the European Molecular
Biology Laboratory (EMBL) in the U.K. and the DNA
Data Bank of Japan (DDBJ) takes place on a daily basis.
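
The doubling figure above implies simple exponential growth: size(t) = size(0) × 2^(t / 15), with t in months. The sketch below evaluates that formula. It is a pure extrapolation under the assumption that the 15-month doubling time stays constant, not a statement about actual later GenBank sizes. The function name is hypothetical.

```python
def projected_size_gb(initial_gb, doubling_months, elapsed_months):
    """Extrapolate a size that doubles every `doubling_months` months."""
    return initial_gb * 2 ** (elapsed_months / doubling_months)
```

Under that assumption, 100 GB would grow to 200 GB after 15 months and 400 GB after 30 months.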

4.3 The Human Genome Project and
Existing Biological Databases(5)
GenBank –
• Other limited data sources (e.g., three-dimensional
structure and Online Mendelian Inheritance in Man
(OMIM)) have been added recently by reformatting the
existing OMIM and PDB databases and redesigning the
structure of the GenBank system to accommodate these
new data sets.
• The system is maintained as a combination of flat files,
relational databases, and files containing Abstract
Syntax Notation One (ASN.1).

4.3 The Human Genome Project and
Existing Biological Databases(6)
GenBank –
• The average user of the database is not able to access the
structure of the data directly for querying or other
functions, although complete snapshots of the database
are available for export in a number of formats,
including ASN.1. The query mechanism provided is via
the Entrez application (or its WWW version), which
allows keyword, sequence, and GenBank UID searching
through a static interface.

4.3 The Human Genome Project and
Existing Biological Databases(7)
The Genome Database (GDB) –
• Created in 1989, GDB is a catalog of human gene
mapping data. Gene mapping is a process that associates
a piece of information with a particular location on the
human genome.
• GDB data primarily describes map
information (distance and confidence limits) and
Polymerase Chain Reaction (PCR) probe data
(experimental conditions, PCR primers, and reagents
used).

4.3 The Human Genome Project and
Existing Biological Databases(8)
The Genome Database (GDB) –
• More recently, efforts have been made to add data on
mutations linked to genetic loci, cell lines used in
experiments, DNA probe libraries, and some limited
polymorphism and population data.
• The GDB system is built around SYBASE, a
commercial relational DBMS, and its data are modeled
using standard Entity-Relationship techniques.
– GDB distributes a Database Access Toolkit.

4.3 The Human Genome Project and
Existing Biological Databases(9)
The Genome Database (GDB) –
• As with GenBank, users are given only a very high-level
view of the data at the time of searching and thus cannot
make use of any knowledge gleaned from the structure
of the GDB tables. Search methods are most useful
when users are simply looking for an index into map or
probe data. Exploratory ad hoc searching is not
encouraged by present interfaces. Integration of the
database structures of GDB and OMIM was never fully
established.

4.3 The Human Genome Project and
Existing Biological Databases(10)
Online Mendelian Inheritance in Man –
• Online Mendelian Inheritance in Man (OMIM) is an
electronic compendium of information on the genetic
basis of human disease.
• Begun in hard-copy form by Victor McKusick in 1966
with 1500 entries, it was converted to a full-text
electronic form between 1987 and 1989 by GDB.
– In 1991 its administration was transferred from Johns Hopkins
University to the NCBI, and the entire database was converted
to NCBI’s GenBank format. Today it contains more than 14,000
entries.

4.3 The Human Genome Project and
Existing Biological Databases(11)
Online Mendelian Inheritance in Man –
• OMIM covers material on five disease areas based
loosely on organs and systems. Any morphological,
biochemical, behavioral, or other properties under study
are referred to as the phenotype of an individual (or a cell).
Mendel realized that genes can exist in numerous forms
known as alleles. A genotype refers to the actual allelic
composition of an individual.

4.3 The Human Genome Project and
Existing Biological Databases(12)
EcoCyc –
• The Encyclopedia of Escherichia coli Genes and
Metabolism (EcoCyc) is a recent experiment in
combining information about the genome and the
metabolism of E. coli K-12.
• The database was created in 1996 as a collaboration
between Stanford Research Institute and the Marine
Biological Laboratory.

4.3 The Human Genome Project and
Existing Biological Databases(13)
EcoCyc –
• An object-oriented data model was first used to
implement the system, with data stored in Ocelot, a
frame knowledge representation system. EcoCyc data
was arranged in a hierarchy of object classes based on
observations that
– the properties of a reaction are independent of an enzyme that
catalyzes it, and
– an enzyme has a number of properties that are “logically
distinct” from its reactions.
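
The two observations above suggest keeping reactions and enzymes as separate classes and relating them through a third object. The sketch below shows that idea with Python dataclasses. It is not EcoCyc's actual schema: the class names, fields, and placeholder data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reaction:
    """Properties of a reaction, independent of any enzyme that catalyzes it."""
    name: str
    substrates: List[str]
    products: List[str]


@dataclass
class Enzyme:
    """Properties of an enzyme that are distinct from its reactions."""
    name: str
    molecular_weight_kd: Optional[float] = None


@dataclass
class Catalysis:
    """Links one enzyme to one reaction it catalyzes (hypothetical class)."""
    enzyme: Enzyme
    reaction: Reaction


def enzymes_for(reaction: Reaction, links: List[Catalysis]) -> List[Enzyme]:
    """Return every enzyme linked to the given reaction object."""
    return [link.enzyme for link in links if link.reaction is reaction]
```

Because the link is its own object, one reaction can be catalyzed by several enzymes, and one enzyme can catalyze several reactions, without duplicating the properties of either.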

4.3 The Human Genome Project and
Existing Biological Databases(14)
EcoCyc –
• EcoCyc provides two methods of querying:
– direct (via predefined queries) and
– indirect (via hypertext navigation).

4.3 The Human Genome Project and
Existing Biological Databases(15)
Gene Ontology –
• The Gene Ontology (GO) Consortium was formed in 1998
as a collaboration among three model organism
databases: FlyBase, Mouse Genome Informatics (MGI),
and Saccharomyces (yeast) Genome Database (SGD).
– Its goal is to produce a structured, precisely defined, common,
controlled vocabulary for describing the roles of genes and gene
products in any organism.

4.3 The Human Genome Project and
Existing Biological Databases(16)
Gene Ontology –
• With the completion of genome sequencing of many
species, it has been observed that a large fraction of genes
among organisms display similarity in biological roles, and
biologists have acknowledged that there is likely to be a
single limited universe of genes and proteins that are
conserved in most or all living cells.
• The GO Consortium has developed three ontologies:
molecular function, biological process, and cellular
component, to describe attributes of genes, gene products,
or gene product groups.

4.3 The Human Genome Project and
Existing Biological Databases(17)
Gene Ontology –
• Each ontology comprises a set of well-defined
vocabularies of terms and relationships.
– The terms are organized in the form of directed acyclic graphs
(DAGs), in which a term node may have multiple parents and
multiple children.
– A child term can be an instance of (is a) or part of its parent.
– The latest release of the GO database has over 13,000 terms and
more than 18,000 relationships between terms.
– GO was implemented using MySQL, an open source relational
database, and a monthly database release is available in SQL and
XML formats.
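
The DAG organization described above can be sketched as follows: each term keeps a list of (parent, relation) pairs, so a term may have several parents, and the relation is either "is_a" or "part_of". This is an illustrative data structure only, not the GO database schema. The class name, method names, and term names in the usage note are hypothetical.

```python
from collections import defaultdict


class OntologyDAG:
    """Terms linked to parents by 'is_a' or 'part_of', kept acyclic."""

    RELATIONS = ("is_a", "part_of")

    def __init__(self):
        # Maps each child term to a list of (parent term, relation) pairs.
        self._parents = defaultdict(list)

    def add_relationship(self, child, parent, relation):
        """Add a child-to-parent edge, rejecting anything that forms a cycle."""
        if relation not in self.RELATIONS:
            raise ValueError("relation must be 'is_a' or 'part_of'")
        # A cycle would appear if the child is already above the parent.
        if child == parent or child in self.ancestors(parent):
            raise ValueError("relationship would create a cycle")
        self._parents[child].append((parent, relation))

    def parents(self, term):
        """Return the direct (parent, relation) pairs of a term."""
        return list(self._parents.get(term, []))

    def ancestors(self, term):
        """Return every term reachable by following parent links upward."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent, _relation in self._parents.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen
```

For example, a term "c" that is_a "a" and is part_of "b", where both "a" and "b" are children of "root", has two parents and three ancestors.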

Summary of the Major Genome-Related Databases

DATABASE NAME | MAJOR CONTENT | INITIAL TECHNOLOGY | CURRENT TECHNOLOGY | DB PROBLEM AREAS | PRIMARY DATA TYPES
GenBank | DNA/RNA sequence, protein | Text files | Flat-file/ASN.1 | Schema browsing, schema evolution, linking to other dbs | Text, numeric, some complex types
OMIM | Disease phenotypes and genotypes, etc. | Index cards/text files | Flat-file/ASN.1 | Unstructured, free text entries, linking to other dbs | Text
GDB | Genetic map linkage data | Flat file | Relational | Schema expansion/evolution, complex objects, linking to other dbs | Text, numeric
ACEDB | Genetic map linkage data, sequence data (non-human) | OO | OO | Schema expansion/evolution, linking to other dbs | Text, numeric
HGMDB | Sequence and sequence variants | Flat file, application specific | Flat file, application specific | Schema expansion/evolution, linking to other dbs | Text
EcoCyc | Biochemical reactions and pathways | OO | OO | Locked into class hierarchy, schema evolution | Complex types, text, numeric
