
Gambella University

College of Natural and Computational Science


Department of Computer Science

Fundamentals of Database Systems Module

By:
Aklilu Thomas (MSc.)

January 2024
Gambella, Ethiopia
Module Preface
This resource module is designed and developed in support of the Fundamentals of Database
Systems course. It provides learning resources and teaching ideas related to the course.

Dear students, in chapter one you study the introduction to database systems, typical users,
DBMS concepts, terminology, and architecture, as well as the progression of database
technologies over time and a brief history of data models, within weeks 1-2.

In chapter two, database system architecture, schemas, instances, state, data models, data independence,
database languages and interfaces, and the database development life cycle are studied within weeks 2-3.

In chapter three, database modeling, the concepts of the Entity-Relationship (ER) model
and ER diagrams are presented and used to illustrate conceptual database design, along with the
Enhanced-ER (EER) data model and EER diagrams, which incorporate additional modeling concepts
such as subclasses, specialization, generalization, union types (categories), and inheritance.
Conceptual design uses the Entity-Relationship model, logical design involves selecting a specific
database model, and physical design consists of the process of implementing the database in
secondary storage; these are studied within weeks 3-6.

In chapter four, logical database design; converting an ER diagram to relational tables,
normalization, First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF),
and other levels of normalization: Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF), and
Fifth Normal Form (5NF) are studied within weeks 7-8.

Chapter five is about physical database design: mapping the logical database design to a
physical database design, base relations for the target DBMS, enterprise constraints for the target
DBMS, selecting appropriate file organizations, using secondary indexes to improve performance,
designing user views, designing security mechanisms to satisfy user requirements, and designing
procedures and triggers are studied within weeks 9-10.

Chapter six is about the relational query languages: relational algebra, which includes the selection,
projection, renaming, cross-product, set-difference, union, intersection, and join operations, and
relational calculus are discussed within weeks 11-12.

In chapter seven, the Structured Query Language, data manipulation language, data definition language,
the syntax for writing SQL statements using SQL Server Management Studio, the CREATE DATABASE statement,
and the CREATE TABLE statement are studied within weeks 13-14.

Learning Objectives
This module introduces students to fundamental database concepts such as the overview,
design, and implementation of database systems.
At the end of this course, students will be able to:

 Understand what a database, a database system, and a DBMS are.
 Differentiate a database system from a file system.
 Identify the pros and cons of the manual approach, file-based approach, and database approach.
 Understand the basic principles of designing database systems using different database
models.
 Appreciate the use of database system in the real world.
 Design different types of databases.
 Understand database normalization & functional dependency.
 Understand the principles of relational database management systems and their languages.
 Understand file organizations and storage management, and index structure for files.
 Demonstrate queries in the relational algebra.
 Demonstrate queries in the tuple relational calculus.
 Create a relational database schema in SQL that incorporates key, entity integrity, and
referential integrity constraints.

Table of Contents
Chapter One................................................................................................................................................... 1
Introduction to Database Systems ................................................................................................................. 1
1.1. Introduction ................................................................................................................................... 1
1.2. Traditional File Based Approach ................................................................................................... 2
1.3. Database Approach ....................................................................................................................... 3
1.4. Users and Actors of the Database ................................................................................................... 5
1.5. Chapter One Review Questions .................................................................................................... 7
Chapter Two.................................................................................................................................................. 8
Database System Architecture ...................................................................................................................... 8
2.1. Introduction ................................................................................................................................... 8
2.2. Schemas, Instances and Database State ........................................................................................ 8
2.3. Data Model.................................................................................................................................... 9
2.4. Three-Schema Architecture ......................................................................................................... 13
2.5. Data Independence ...................................................................................................................... 14
2.6. Database Languages .................................................................................................................... 15
2.7. Database Interfaces...................................................................................................................... 15
2.9. Database Development Life Cycle (DDLC)................................................................................ 18
2.10. Chapter Two Review Questions ............................................................................................. 19
Chapter Three.............................................................................................................................................. 20
Database Modeling ..................................................................................................................................... 20
3.1. Introduction ................................................................................................................................. 20
3.2. The Three levels of Database Design ......................................................................................... 21
3.3. Conceptual Database Design ...................................................................................................... 22
3.3. Developing an E-R Diagram ....................................................................................................... 23
3.4. Structural Constraints on Relationship ......................................................................................... 25
3.5. Problem in ER Modeling ............................................................................................................. 28
3.6. Enhanced E-R (EER) Models ..................................................................................................... 30
3.7. Constraints on specialization and generalization ........................................................................ 34
3.8. Relational Database Model ......................................................................................................... 35
3.9. Building Blocks of the Relational Data Model ........................................................................... 36
3.13. Key constraints........................................................................................................................ 40
3.14. Relational Views ..................................................................................................................... 40

3.15. Chapter Three Review Questions............................................................................................ 41
Chapter Four ............................................................................................................................................... 43
Functional Dependency and Normalization ................................................................................................ 43
4.1. Introduction ................................................................................................................................. 43
4.2. Converting ER Diagram to Relational Tables ............................................................................ 44
4.3. Normalization ............................................................................................................................. 47
4.4. Functional Dependency (FD) ...................................................................................................... 48
4.5. Steps of Normalization................................................................................................................ 50
4.6. Chapter Four Review Questions ................................................................................................. 58
Chapter Five ................................................................................................................................................ 59
Record Storage and Primary File Organization .......................................................................................... 59
5.1. Introduction ................................................................................................................................. 59
5.2. Operation on Files ....................................................................................................................... 60
5.3. Hashing Techniques .................................................................................................................... 62
5.4. Choosing indexes ........................................................................................................................ 63
5.5. Multilevel Indexes ...................................................................................................................... 64
5.6. Dynamic Multilevel Indexes Using B-Trees and B+-Trees ........................................................ 64
5.7. Chapter Five Review Questions .................................................................................................. 65
Chapter Six.................................................................................................................................................. 66
Relational Algebra and Relational Calculus ............................................................................................... 66
6.1. Introduction ................................................................................................................................. 66
6.2. Relational Algebra ...................................................................................................................... 67
6.3. Select Operation .......................................................................................................................... 68
6.4. Project Operation ........................................................................................................................ 69
6.5. Rename Operation ...................................................................................................................... 70
6.6. Set Operations ............................................................................................................................. 71
6.7. CARTESIAN (cross product) Operation .................................................................................... 72
6.8. JOIN Operation ........................................................................................................................... 73
6.9. Relational Calculus ..................................................................................................................... 75
6.10. Quantifiers in Relational Calculus .......................................................................................... 78
6.11. Domain Relational Calculus ................................................................................................... 78
6.12. Chapter Six Review Questions ............................................................................................... 80
Chapter Seven ............................................................................................................................................. 81

The SQL Language ..................................................................................................................................... 81
7.1. Introduction ................................................................................................................................. 81
7.2. The SQL Language ..................................................................................................................... 81
7.3. Data Manipulation and Data Definition Language ..................................................................... 82
7.4. Writing SQL Statements using SQL Server Management Studio .............................................. 83
7.5. Chapter Seven Review Questions ............................................................................................... 90
References ................................................................................................................................................... 91

Chapter One
Introduction to Database Systems

1.1. Introduction

In this chapter, database systems, database management systems (DBMS), types of databases, the
pros and cons of the manual approach, file-based approach, and database approach, the basic principles
of designing database systems using different database models, and the use of database systems in the
real world are discussed.
After completing this chapter, the students will be able to:
 Understand what a database, a database system, and a DBMS are
 Differentiate a database system from a file system
 Identify the pros and cons of the manual approach, file-based approach, and database approach
 Understand the basic principles of designing database systems using different database models
 Appreciate the use of database systems in the real world.
 Design different types of databases

Activity 1.1
 Define a database system.
 Discuss the applications of a database management system (DBMS).

Database systems are designed to manage large data sets in an organization. Data management
involves both the definition and the manipulation of the data, which ranges from simple representation
of the data to considerations of structures for the storage of information. Data management
also considers the provision of mechanisms for the manipulation of information.

Today, databases are essential to every business. They are used to maintain internal records, to
present data to customers and clients on the World Wide Web, and to support many other
commercial processes. Databases are likewise found at the core of many modern organizations.

The power of databases comes from a body of knowledge and technology that has developed over
several decades and is embodied in specialized software called a database management system, or
DBMS. A DBMS is a powerful tool for creating and managing large amounts of data efficiently
and allowing it to persist over long periods of time, safely. These systems are among the most
complex types of software available.

Thus, to our question: What is a database? In essence, a database is nothing more than a
collection of shared information that exists over a long period of time, often many years. In
common parlance, the term database refers to a collection of data that is managed by a DBMS.

Data management has passed through different levels of development along with developments
in technology and services. These can best be described by categorizing them into three levels.
Even though each new level offers advantages and overcomes problems of the previous one, all
methods of data handling are still in use to some extent. The three major levels are the
manual approach, the traditional file-based approach, and the database approach.

1.2. Traditional File Based Approach

After the introduction of computers for data processing to the business community, the need to
use them for data storage and processing increased. There were, and still are, several computer
applications that use file-based processing for the purpose of data handling. Even though the
approach has evolved over time, the basic structure is still similar, if not identical.

 File-based systems were an early attempt to computerize the manual filing system.
 This approach is a decentralized computerized data handling method.
 A collection of application programs performs services for the end users. In such systems,
every application program that provides a service to end users defines and manages its own
data.
 Such systems have a number of programs for each of the different applications in the
organization.
 Since every application defines and manages its own data, the system is subject to serious
data duplication problems.
 A file, in the traditional file-based approach, is a collection of records which contain logically
related data.
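The duplication and format problems in the list above can be made concrete with a toy sketch in Python. The two "applications", their file formats, and the employee record are all hypothetical; the point is that each program defines and parses its own copy of the same data.

```python
# A toy illustration of the file-based approach: two application programs
# each define and manage their own copy of the same employee data, in
# incompatible formats, so the data is duplicated and cannot be processed
# jointly without format-specific code in every program.

# The payroll application stores comma-separated records.
payroll_file = "C001,Abebe,5000"

# The HR application stores the same employee as fixed-width fields.
hr_file = "C001 Abebe      5000"

# Each application needs its own parsing logic for its own file format.
payroll_record = payroll_file.split(",")
hr_record = [hr_file[0:5].strip(), hr_file[5:16].strip(), hr_file[16:].strip()]

print(payroll_record)  # ['C001', 'Abebe', '5000']
print(hr_record)       # ['C001', 'Abebe', '5000']
```

The same logical record exists twice, in two structures: exactly the redundancy and incompatibility described above.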

Limitations of the Traditional File Based approach

As business applications became more complex, demanding more flexible and reliable data handling
methods, the shortcomings of the file-based system became evident. These shortcomings include,
but are not limited to:

 Separation or isolation of data: available information in one application may not be
known to another; data synchronization is done manually.
 Limited data sharing: every application maintains its own data.
 Lengthy development and maintenance time.
 Duplication or redundancy of data (money and time costs and loss of data integrity).

 Data dependency on the application: the data structure is embedded in the application; hence,
a change in the data structure requires changing the application as well.
 Incompatible file formats or data structures (e.g. “C” and COBOL) between different
applications and programs, creating inconsistency and difficulty in processing data jointly.
 Fixed query processing, which is defined during application development.

The limitations of the traditional file-based data handling approach arise from two basic reasons:
1. Definition of the data is embedded in the application program which makes it difficult
to modify the database definition easily.
2. No control over the access and manipulation of the data beyond that imposed by the
application programs.
The most significant problems experienced by the traditional file-based approach to data handling
can be formalized by what are called “update anomalies”. We have three types of update
anomalies:

1. Modification Anomalies: a problem experienced when one or more data values are
modified in one application program but not in others containing the same data set.
2. Deletion Anomalies: a problem encountered when one record set is deleted from one
application but remains untouched in other application programs.
3. Insertion Anomalies: a problem experienced whenever there is a new data item to be
recorded and the recording is not made in all the applications. When the same data item
is inserted in different applications, there could be errors in encoding which cause the
new data item to be considered a totally different object.
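A modification anomaly, the first problem above, can be illustrated with a small hypothetical sketch in Python: two file-based applications each hold their own copy of the same customer record, and an update applied in only one of them leaves the data inconsistent.

```python
# Each "application" keeps its own records, as in a file-based system,
# so the same customer data is duplicated. The record contents here
# are made up for illustration.
billing_records = {"C001": {"name": "Abebe", "address": "Gambella"}}
shipping_records = {"C001": {"name": "Abebe", "address": "Gambella"}}

# The billing application updates the customer's address...
billing_records["C001"]["address"] = "Addis Ababa"

# ...but the shipping application still holds the old value:
# a modification anomaly.
inconsistent = (billing_records["C001"]["address"]
                != shipping_records["C001"]["address"])
print(inconsistent)  # True
```

In a database approach the address would be stored once and shared, so a single update would be visible to both applications.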

1.3. Database Approach


Following a famous paper written by Dr. Edgar Frank Codd in 1970, database systems changed
significantly. Codd proposed that database systems should present the user with a view of data
organized as tables called relations. Behind the scenes, there might be a complex data structure
that allowed rapid response to a variety of queries. But, unlike the user of earlier database systems,
the user of a relational system would not be concerned with the storage structure. Queries
could be expressed in a very high-level language, which greatly increased the efficiency of
database programmers. The database approach emphasizes the integration and sharing of data
throughout the organization.

Thus, in the database approach:


 A database is just a computerized record-keeping system, a kind of electronic filing cabinet.
 A database is a repository for a collection of computerized data files.
 A database is a shared collection of logically related data, and a description of that data, designed to
meet the information needs of an organization. Since it is a shared corporate resource, the
database is integrated with a minimum amount of, or no, duplication.
 Unlike the traditional file-based approach, in the database approach there is program-data
independence, that is, the separation of the data definition from the application. Thus the
application is not affected by changes made in the data structure and file organization.
 Each database application will perform some combination of creating a database and reading,
updating, and deleting data.
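The create/read/update/delete combination above can be sketched with Python's built-in sqlite3 module standing in for a DBMS; the student table and its columns are illustrative only, not part of the module's examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway in-memory database
cur = conn.cursor()

# Create: define a relation and add a record
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO student VALUES (1, 'Abebe')")

# Read
cur.execute("SELECT name FROM student WHERE id = 1")
read_name = cur.fetchone()[0]      # 'Abebe'

# Update
cur.execute("UPDATE student SET name = 'Almaz' WHERE id = 1")
cur.execute("SELECT name FROM student WHERE id = 1")
updated_name = cur.fetchone()[0]   # 'Almaz'

# Delete
cur.execute("DELETE FROM student WHERE id = 1")
cur.execute("SELECT COUNT(*) FROM student")
remaining = cur.fetchone()[0]      # 0

conn.close()
```

Note that the application issues high-level SQL statements and never touches the storage structure: the program-data independence described above.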

Benefits of the database approach


 Data can be shared: two or more users can access and use the same data instead of each user storing
the data redundantly.

 Improved accessibility of data: by using structured query languages, the users can easily access data
without programming experience.

 Redundancy can be reduced: isolated data is integrated in database to decrease the redundant data stored
at different applications.

 Quality data can be maintained: the different integrity constraints in the database approach will
maintain the quality of data, leading to better decision making.

 Inconsistency can be avoided: controlled data redundancy will avoid inconsistency of the data in the
database to some extent.

 Transaction support can be provided: the basic demands of any transaction support system are
implemented in a full-scale DBMS.

 Integrity can be maintained: data at different applications will be integrated together with additional
constraints to facilitate validity and consistency of shared data resource.

 Security measures can be enforced: the shared data can be secured by having different levels of
clearance and other data security mechanisms.

 Improved decision support: the database will provide information useful for decision making.
 Standards can be enforced: the different ways of using and dealing with data by different units of an
organization can be balanced and standardized by using the database approach.

 Compactness: since it is an electronic data handling method, the data is stored compactly (no
voluminous papers).

 Speed: data storage and retrieval are fast, as they use modern, fast computer systems.
 Less labour: unlike the other data handling methods, data maintenance will not demand many resources.
 Centralized information control: since relevant data in the organization will be stored at one repository,
it can be controlled and managed at the central level.

1.4. Users and Actors of the Database
As people are one of the components in a DBMS environment, there are groups of roles played by different
stakeholders in the design and operation of a database system, such as the database administrator (DBA),
database designer (DBD), application programmers, systems analysts, and end users.

1. Database Administrator (DBA)


• Responsible for overseeing, controlling, and managing the database resources (the database itself, the
DBMS and other related software)
• Authorizing access to the database
• Coordinating and monitoring the use of the database
• Responsible for determining and acquiring hardware and software resources
• Accountable for problems like poor security and poor performance of the system
• Involved in all steps of database development
• We can have further classifications of this role in big organizations having huge amounts of data
and user requirements.
a. Data Administrator (DA): responsible for the management of data resources. This
involves database planning and the development and maintenance of standards, policies,
and procedures at the conceptual and logical design phases.
b. Database Administrator (DBA): this is a more technically oriented role. The DBA is
responsible for the physical realization of the database and is involved in the physical
design, implementation, security, and integrity control of the database.
2. Database Designer (DBD)
• Identifies the data to be stored and chooses the appropriate structures to represent and store
the data.
• Should understand the user requirements and should choose how the user views the database.
• Involved in the design phase before the implementation of the database system.
• We have two distinct kinds of database designers, one involved in the logical and conceptual design and
another involved in the physical design.
a. Logical and Conceptual DBD
- Identifies data (entities, attributes, and relationships) relevant to the organization
- Identifies constraints on each data item
- Understands data and business rules in the organization
- Sees the database independent of any data model at the conceptual level and considers
one specific data model at the logical design phase.
b. Physical DBD
- Takes the logical design specification as input and decides how it should be physically
realized.
- Maps the logical data model onto the specified DBMS with respect to tables and
integrity constraints. (DBMS-dependent designing)

- Selects the specific storage structures and access paths to the database
- Designs the security measures required on the database
3. Application Programmer and Systems Analyst
• The systems analyst determines the user requirements and how the user wants to view the database.
• The application programmer implements these specifications as programs: codes, tests, debugs,
documents, and maintains the application program.
• The application programmer determines the interface for how to retrieve, insert, update, and
delete data in the database.
• The application could use any high-level programming language according to availability,
facilities, and the required service.
4. End Users
• Workers whose jobs require accessing the database frequently for various purposes. There are different
groups of users in this category.
a. Naïve Users:
- A sizable proportion of users
- Unaware of the DBMS
- Only access the database based on their access level and demand
- Use standard and pre-specified types of queries.
b. Sophisticated Users
- Users familiar with the structure of the database and the facilities of the DBMS.
- Have complex requirements
- Have higher-level queries
- Are most of the time engineers, scientists, business analysts, etc.
c. Casual Users
- Users who access the database occasionally.
- Need different information from the database each time.
- Use sophisticated database queries to satisfy their needs.
- Are most of the time middle- to high-level managers.
These users can be again classified as “Actors on the Scene” and “Workers Behind the
Scene”.

Actors on the Scene:


 Data Administrator
 Database Administrator
 Database Designer
 End Users

Workers behind the Scene:

 DBMS designers and implementers: those who design and implement different DBMS
software.

 Tool Developers: experts who develop software packages that facilitate database system
design and use. Prototype, simulation, and code generator developers could be
examples. Independent software vendors could also be categorized in this group.

 Operators and Maintenance Personnel: system administrators who are responsible for
actually running and maintaining the hardware and software of the database system and
the information technology facilities.

1.5. Chapter One Review Questions

1. Define the following terms: data, database, DBMS, database system.


2. What four main types of actions involve databases? Briefly discuss each.
3. Discuss the main characteristics of the database approach and how it differs from traditional
file systems?
4. What are the responsibilities of the DBA and the database designers?
5. What are the different types of database end users? Discuss the main activities of each.
6. Discuss the capabilities that should be provided by a DBMS.

Chapter Two
Database System Architecture
2.1. Introduction

In this chapter, database schemas, instances, database management systems (DBMS), the basic
principles of designing database systems using different database models, and the use of database
systems in the real world are discussed.
After completing this chapter, the students will be able to:
 Understand what a schema, an instance, and a database state are.
 Differentiate the database models.
 Understand architecture and data independence.
 Recognize the database languages and interfaces.
 Understand the database system environment.
 Recognize the classification of DBMS.

Activity 2.1
 Define the terms database schema, instance, and state.
 Discuss the overview of data models.

2.2. Schemas, Instances and Database State


When a database is designed using the relational data model, all the data is represented in the form
of tables. In such a representation, there are two basic components of the database: the definition
of the relations (tables) and the actual data stored in each table. The data definition is what we call
the schema, or the skeleton of the database, and the relations with the information they hold at some
point in time are the instance, or the flesh of the database.

Schemas
 A schema describes how data is to be structured; it is defined at setup/design time (also called
"metadata").
 Since it is used during the database development phase, the schema is rarely changed unless
there is a need for system maintenance which demands a change to the definition of a relation.
 Database Schema (Intension): specifies the name of each relation and the collection of its
attributes (specifically the names of the attributes).
• refers to a description of the database (or intension)
• specified during database design
• should not be changed except during maintenance
 Schema Diagrams
• a convention to display some aspects of a schema visually
 Schema Construct
• refers to each object in the schema, e.g. STUDENT (FName, LName, Id, Year, Dept, Sex)

Instances
 Instance: is the collection of data in the database at a particular point of time (snap-shot).
• Also called State or Snap Shot or Extension of the database.
• Refers to the actual data in the database at a specific point in time.
• State of database is changed any time we add, delete or update an item.
• Valid state: the state that satisfies the structure and constraints specified in the schema
and is enforced by DBMS.
 Since Instance is actual data of database at some point in time, changes rapidly.
 To define a new database, we specify its database schema to the DBMS; at this point the
database is empty.
 The database is initialized when we first load it with data.

2.3. Data Model


A specific DBMS has its own Data Definition Language to define a database schema, but this type
of language is too low level to describe the data requirements of an organization in a way that is
readily understandable by a variety of users. We need a higher-level language. Such a higher-level
description of the database schema is called a data model.

Data Model is a set of concepts to describe the structure of a database, and certain constraints that
the database should obey. A data model is a description of the way that data is stored in a database.
Data model helps to understand the relationship between entities and to create the most effective
structure to hold data. Data Model is a collection of tools or concepts for describing: Data, Data
relationships, Data semantics, and Data constraints. The main purpose of Data Model is
to represent the data in an understandable way.

Categories of data models include:

 Record-based model
 Object-based model
 Physical model

Record-based Data Models


Consist of a number of fixed-format records. Each record type defines a fixed number of fields,
and each field is typically of a fixed length.

• Hierarchical Data Model
• Network Data Model
• Relational Data Model

A) Hierarchical Model
 The simplest data model
 Record type is referred to as node or segment
 The top node is the root node
 Nodes are arranged in a hierarchical structure as a sort of upside-down tree
 A parent node can have more than one child node
 A child node can only have one parent node
 The relationship between parent and child is one-to-many
 Relation is established by creating physical link between stored records (each is stored
with a predefined access path to other records)
 To add new record type or relationship, the database must be redefined
and then stored in a new form.

Fig: A hierarchical model example with root node Department, child nodes Employee and Job,
and lower-level nodes Time Card and Activity.

ADVANTAGES of Hierarchical Data Model:
• Hierarchical Model is simple to construct and operate on
• Corresponds to a number of natural hierarchically organized domains
• e.g., assemblies in manufacturing, personnel organization in companies
• Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT
WITHIN PARENT etc.
DISADVANTAGES of Hierarchical Data Model:
• Navigational and procedural nature of processing
• Database is visualized as a linear arrangement of records
• Little scope for "query optimization"

B) Network Model

 Allows record types to have more than one parent, unlike the hierarchical model
 The network data model sees records as set members
 Each set has an owner and one or more members
 Does not allow direct many-to-many relationships between entities
 Like the hierarchical model, the network model is a collection of physically linked records
 Allows member records to have more than one owner

Fig: A network model example with record types Department, Job, Employee, Activity, and
Time Card, in which a record type (e.g. Employee) can have more than one owner.

ADVANTAGES of Network Data Model:


• Network Model is able to model complex relationships and represents semantics of add/delete
on the relationships.
• Can handle most situations for modeling using record types and relationship types.
• Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND
NEXT within set, GET etc. Programmers can do optimal navigation through the database.

DISADVANTAGES of Network Data Model:
• Navigational and procedural nature of processing
• Database contains a complex array of pointers that thread through a set of records.
• Little scope for automated "query optimization"

C) Relational Data Model

• Developed by Dr. Edgar Frank Codd in 1970 (famous paper, 'A Relational Model of Data for
Large Shared Data Banks')
• Terminologies originates from the branch of mathematics called set theory and predicate logic
and is based on the mathematical concept called Relation
• Can define more flexible and complex relationship
• Viewed as a collection of tables called “Relations” equivalent to collection of
record types
• Relation: Two dimensional table
• Stores information or data in the form of tables: rows and columns
• A row of the table is called tuple: equivalent to record
• A column of a table is called attribute: equivalent to fields
• Data value is the value of the Attribute
• Records are related by the data stored jointly in the fields of records in two tables or files. The
related tables contain information that creates the relation
• The tables seem to be independent but are in fact related somehow.
• No physical consideration of the storage is required by the user
• Many tables are merged together to come up with a new virtual view of the relationship

Alternative terminologies:
• Relation = Table = File
• Tuple = Row = Record
• Attribute = Column = Field

• The rows represent records (collections of information about separate items)
• The columns represent fields (particular attributes of a record)
• Conducts searches by using data in specified columns of one table to find additional data in
another table
• In conducting searches, a relational database matches information from a field in one table
with information in a corresponding field of another table to produce a third table that
combines requested data from both tables
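The idea that related tables are joined through jointly stored field values can be sketched with a small example; the EMPLOYEE and DEPARTMENT tables below are illustrative assumptions, and Python's sqlite3 module stands in for a full relational DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE DEPARTMENT (DeptId INTEGER PRIMARY KEY, DeptName TEXT)")
cur.execute("CREATE TABLE EMPLOYEE (EmpId INTEGER PRIMARY KEY, Name TEXT, DeptId INTEGER)")
cur.execute("INSERT INTO DEPARTMENT VALUES (10, 'Finance')")
cur.execute("INSERT INTO EMPLOYEE VALUES (1, 'Hana', 10)")

# Matching DeptId in EMPLOYEE with DeptId in DEPARTMENT produces a
# third, combined table of requested data from both relations.
cur.execute("""
    SELECT e.Name, d.DeptName
    FROM EMPLOYEE e JOIN DEPARTMENT d ON e.DeptId = d.DeptId
""")
print(cur.fetchall())  # [('Hana', 'Finance')]
conn.close()
```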

2.4. Three-Schema Architecture


The three-level database architecture is also called ANSI-SPARC Architecture.
 All users should be able to access the same data. This is important since the database has
a shared-data feature: all the data is stored in one location and all users will have
their own customized way of interacting with it.
 Users should not need to know physical database storage details. As there are naïve
users of the system, hardware-level or physical details should be a black box for such
users.
 DBA should be able to change database storage structures without affecting the users'
views. A change in file organization, access method should not affect the structure of the
data which in turn will have no effect on the users.
 Internal structure of database should be unaffected by changes to physical aspects of
storage, such as change of hard disk

All of the above functionalities, and much more, are possible due to the three-level ANSI-
SPARC architecture.

Fig: Three-level ANSI-SPARC Architecture of a Database

1. External Level /Schema: Users' view of the database. It describes that part of database that is
relevant to a particular user. Different users have their own customized view of the database
independent of other users.

2. Conceptual Level /Schema: Community view of the database. Describes what data is stored
in the database and the relationships among the data, along with the business constraints.

3. Internal Level /Schema: Physical representation of the database on the computer.
Describes how the data is stored in the database.
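As an illustration of how the external level can be realized on top of the conceptual level, many relational DBMSs use views. The sketch below is a minimal example using Python's sqlite3 module; the EMPLOYEE table and EmpPublic view are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Conceptual level: the community-wide base table.
cur.execute("""CREATE TABLE EMPLOYEE (
    Id INTEGER PRIMARY KEY, Name TEXT, Dept TEXT, Salary REAL)""")
cur.execute("INSERT INTO EMPLOYEE VALUES (1, 'Hana', 'Finance', 9000)")

# External level: a user-specific view that hides the Salary attribute.
cur.execute("CREATE VIEW EmpPublic AS SELECT Id, Name, Dept FROM EMPLOYEE")
cur.execute("SELECT * FROM EmpPublic")
print(cur.fetchall())  # [(1, 'Hana', 'Finance')]
conn.close()
```

The internal level is not visible here at all: how SQLite lays out pages and indexes on disk is hidden from both the view and the base table.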

2.5. Data Independence

i. Logical Data Independence:


• Refers to immunity of external schemas to changes in the conceptual schema.
• Conceptual schema changes (e.g. addition/removal of entities) should not require changes to
external schemas or rewrites of application programs.
• The capacity to change the conceptual schema without having to change the external schemas
and their application programs.

ii. Physical Data Independence


• The ability to modify the physical schema without changing logical schema
• Applications depend on the logical schema
• In general, the interfaces between the various levels and components should be well defined
so that changes in some parts do not seriously influence others.
• The capacity to change the internal schema without having to change the conceptual schema
• Refers to immunity of the conceptual schema to changes in the internal schema
• Internal schema changes e.g. using different file organizations, storage structures/devices
should not require change to conceptual or external schemas.

Fig: Data Independence and the ANSI-SPARC Three-level Architecture
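Logical data independence can be demonstrated directly with a view: the conceptual schema changes while the external schema stays intact. This is a minimal sketch assuming a hypothetical EMPLOYEE table and EmpView view, again using Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMPLOYEE (Id INTEGER PRIMARY KEY, Name TEXT)")
cur.execute("CREATE VIEW EmpView AS SELECT Id, Name FROM EMPLOYEE")
cur.execute("INSERT INTO EMPLOYEE VALUES (1, 'Hana')")

# Conceptual schema change: a new attribute is added to the base table...
cur.execute("ALTER TABLE EMPLOYEE ADD COLUMN Phone TEXT")

# ...yet the external schema (the view) and any program written
# against it continue to work unchanged.
cur.execute("SELECT * FROM EmpView")
print(cur.fetchall())  # [(1, 'Hana')]
conn.close()
```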

2.6. Database Languages

1) Data Definition Language (DDL)


• Allows the DBA or user to describe and name the entities, attributes and relationships required
for the application.
• Specification notation for defining the database schema

2) Data Manipulation Language (DML)


• Provides basic data manipulation operations on data held in the database.
• Language for accessing and manipulating the data organized by the appropriate data model
• DML is also known as a query language
a. Procedural DML: user specifies what data is required and how to get the data.
b. Non-Procedural DML: user specifies what data is required but not how it is to be
retrieved. SQL is the most widely used non-procedural query language.
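The procedural/non-procedural contrast can be sketched as follows; SQLite (through Python's sqlite3 module) serves as the example DBMS, and the STUDENT table is an illustrative assumption. The declarative query states only what is wanted, while the Python loop spells out how to filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE STUDENT (Id INTEGER, Name TEXT, Gpa REAL)")
cur.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)",
                [(1, 'Abel', 3.8), (2, 'Sara', 2.9), (3, 'Muna', 3.5)])

# Non-procedural: state WHAT is wanted; the DBMS decides HOW to get it.
cur.execute("SELECT Name FROM STUDENT WHERE Gpa >= 3.0")
declarative = {row[0] for row in cur.fetchall()}

# Procedural style: the program spells out HOW to scan and filter.
cur.execute("SELECT Name, Gpa FROM STUDENT")
procedural = {name for name, gpa in cur.fetchall() if gpa >= 3.0}

print(declarative == procedural)  # True: same result, different styles
conn.close()
```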

3) Data Control Language (DCL)


• Allows a DBA to define access control and privileges for users.
• It is a mechanism for implementing security at a database object level.
• Uses the Grant and Revoke SQL Statements

4) Fourth Generation Language (4GL)


• Query Languages
• Forms Generators
• Report Generators
• Graphics Generators
• Application Generators

2.7. Database Interfaces

User-friendly interfaces provided by a DBMS may include the following:

a. Menu-Based Interfaces for Web Clients or Browsing. These interfaces present the user
with the lists of options (called menus) that lead the user through the formulation of a
request.
b. Forms-Based Interfaces. Forms-based interfaces display a form to each user. Users can
fill out all the form entries to insert new data, or they can fill out only certain entries, in
which case the DBMS will retrieve matching data for the remaining entries.
c. Graphical User Interfaces. A GUI typically displays a schema to the user in diagrammatic
form. The user then can specify a query by manipulating the diagram. In many cases, GUIs
utilize both menus and forms. Most GUIs use a pointing device, such as a mouse, to select
certain parts of the displayed schema diagram.

d. Natural Language Interfaces. These interfaces accept requests written in English or some
other language and attempt to understand them.

2.8. Database System Environment


A DBMS is a software package used to design, manage, and maintain databases. Each DBMS should
have facilities to define the database, manipulate its content and control the database. These
facilities help the designer, the user as well as the database administrator to discharge their
responsibilities in designing, using and managing the database.

Fig. General architecture of a DBMS


A DBMS provides the following facilities:

 Data Definition Language (DDL):


• Language used to define each data element required by the organization.
• Commands for setting up schema or the intension of database.
• These commands are used to setup a database, create, delete and alter table with the facility
of handling constraints.
 Data Manipulation Language (DML):
• Is a core command used by end-users and programmers to store, retrieve, and access the data
in the database e.g. SQL

• Since the required data or query by the user is extracted using this type of language, it is
also called a "Query Language"
 Data Dictionary:
• Because a database is a self-describing system, this tool, the Data Dictionary, is used to store
and organize information about the data stored in the database.
 Data Control Language:
• Database is a shared resource that demands control of data access and usage. The database
administrator should have the facility to control the overall operation of the system.
• Data Control Languages are commands that will help the Database Administrator to control
the database.
• The commands include grant or revoke privileges to access the database or particular object
within the database and to store or remove database transactions
The DBMS is a software package that helps to design, manage, and use data using the database
approach. Taking a DBMS as a system, one can describe it with respect to its environment or other
systems interacting with it. The DBMS environment has five components. To design and
use a database, there will be the interaction or integration of Hardware, Software, Data, Procedure
and People.

1. Hardware: are components that one can touch and feel. These components are comprised
of various types of personal computers, mainframe or any server computers to be used in multi-
user system, network infrastructure, and other peripherals required in the system.
2. Software: are collection of commands and programs used to manipulate the hardware to
perform a function. These include components like the DBMS software, application programs,
operating systems, network software, language software and other relevant software.
3. Data: since the goal of any database system is to have better control of the data and making
data useful, Data is the most important component to the user of the database. There are two
categories of data in any database system: that is Operational and Metadata. Operational data is the
data actually stored in the system to be used by the user. Metadata is the data that is used to store
information about the database itself.
The structure of the data in the database is called the schema, which is composed of the
entities, the properties of entities, the relationships between entities, and business constraints.
4. Procedure: this is the rules and regulations on how to design and use a database. It includes
procedures like how to log on to the DBMS, how to use facilities, how to start and stop DBMS,
how to make backup, how to treat hardware and software failure, how to change the structure
of the database.
5. People: this component is composed of the people in the organization that are responsible or
play a role in designing, implementing, managing, administering and using the resources in the

database. This component ranges from people with a high level of knowledge about the
database and the design technology to others with no knowledge of the system except using the
data in the database.

2.9. Database Development Life Cycle (DDLC)


As it is one component in most information system development tasks, there are several steps in
designing a database system. Here more emphasis is given to the design phases of the system
development life cycle. The major steps in database design are;

1. Planning: identifying the information gap in an organization and proposing a database
solution to solve the problem.
2. Analysis: that concentrates more on fact finding about the problem or the opportunity.
Feasibility analysis, requirement determination and structuring, and selection of best design
method are also performed at this phase.
3. Design: in database development more emphasis is given to this phase. The phase is further
divided into three sub-phases.
a. Conceptual Design: concise description of the data, data type, relationship between
data and constraints on the data.
• There is no implementation or physical detail consideration.
• Used to elicit and structure all information requirements
b. Logical Design: a higher level conceptual abstraction with selected specific data model
to implement the data structure.
• It is particular DBMS independent and with no other physical considerations.
c. Physical Design: physical implementation of the logical design of the database with
respect to internal storage and file structure of the database for the selected DBMS.
• To develop all technology and organizational specification.
4. Implementation: the testing and deployment of the designed database for use.
5. Operation and Support: administering and maintaining the operation of the database
system and providing support to users. Tuning the database operations for best performance.

2.10. Chapter Two Review Questions

1. Define the following terms: data model, database schema, database state?
2. Discuss the main categories of data models. What are the basic differences among the
relational model, the object model, and the XML model?
3. What is the difference between a database schema and a database state?
4. Describe the three-schema architecture. Why do we need mappings among schema levels?
How do different schema definition languages support this architecture?
5. What is the difference between logical data independence and physical data independence?
Which one is harder to achieve? Why?
6. What is the difference between procedural and nonprocedural DMLs?
7. What is the difference between the two-tier and three-tier client/server architectures?
8. What is the additional functionality incorporated in n-tier architecture (n > 3)?

Chapter Three
Database Modeling
3.1. Introduction

In this chapter, database design, conceptual design, logical design, physical design, the Entity-
Relationship (ER) model, developing ER diagrams, structural constraints on relationships, problems
in ER modeling, Enhanced E-R (EER) models and constraints on specialization and generalization,
the relational data model, its important terms, properties and building blocks, and relational
views are discussed.

After completing this chapter, the students will be able to:


 Understand the database design phases such as conceptual design, logical design, and
physical design.
 Understand how to develop Entity Relationship (ER) diagram.
 Identify problems in ER Modeling such as the Fan-trap and Chasm-trap.
 Understand an Enhanced E-R (EER) Models.
 Differentiate EER concepts such as Generalization, Specialization, Sub classes, and Super
classes.
 Understand the meaning of relation, attribute, domain, tuple, degree, cardinality and
relational database schema.
 Differentiate database schema, instances and database state.
 Identify the relationship types between different entities in the organization.
 Understand the basic principles of degree of relationship and cardinality among
relationships.
 Design a database using relational data model.

Activity 3.1
 Define the database design phases such as conceptual design, logical design, and
physical design?
 Discuss how to develop Entity Relationship (ER) diagram?

Database design is the process of coming up with different kinds of specification for the data to be
stored in the database. The database design part is one of the middle phases we have in information
systems development where the system uses a database approach. Design is the part on which we
would be engaged to describe how the data should be perceived at different levels and finally how
it is going to be stored in a computer system.

Information System with Database application consists of several tasks which include:

• Planning of Information Systems Design
• Requirements Analysis
• Design (Conceptual, Logical and Physical Design)
• Implementation
• Testing and deployment
• Operation and Support
From these different phases, the prime interest of a database system will be the Design part which
is again sub divided into other three sub-phases. These sub-phases are: Conceptual Design, Logical
Design, and Physical Design.

In general, one has to go back and forth between these tasks to refine a database design, and
decisions in one task can influence the choices in another task. In developing a good design, one
should answer such questions as:

 What are the relevant Entities for the Organization?
 What are the important features of each Entity?
 What are the important Relationships?
 What are the important queries from the user?
 What are the other requirements of the Organization and the Users?

3.2. The Three levels of Database Design

There are three sub-phases in database design that are: Conceptual Design, Logical Design, and
Physical Design.
1. Conceptual Database Design
 Conceptual design is the process of constructing a model of the information used in an
enterprise, independent of any physical considerations.
 It is the source of information for the logical design phase.
 Mostly uses an Entity Relationship Model to describe the data at this level.
 After the completion of Conceptual Design one has to go for refinement of the schema,
which is verification of Entities, Attributes, and Relationships
2. Logical Database Design
 Logical design is the process of constructing a model of the information used in an
enterprise based on a specific data model (e.g. relational, hierarchical or network or
object), but independent of a particular DBMS and other physical considerations.

21
 Normalization process
 Collection of Rules to be maintained
 Discover new entities in the process
 Revise attributes based on the rules and the discovered Entities
3. Physical Database Design
 Physical design is the process of producing a description of the implementation of the
database on secondary storage. -- defines specific storage or access methods used by
database
 Describes the storage structures and access methods used to achieve efficient access to the
data.
 Tailored to a specific DBMS system -- Characteristics are function of DBMS and
operating systems.
 Includes estimate of storage space

3.3. Conceptual Database Design

Conceptual design revolves around discovering and analyzing organizational and user data
requirements. The important activities are to identify: Entities, Attributes, Relationships, and
Constraints. And based on these components develop the ER model using ER diagrams.

The Entity Relationship (E-R) Model


Entity-Relationship modeling is used to represent conceptual view of the database. The main
components of ER Modeling are:

 Entities
 Corresponds to entire table, not row
 Represented by Rectangle
 Attributes
 Represents the property used to describe an entity or a relationship
 Represented by Oval
 Relationships
 Represents the association that exist between entities
 Represented by Diamond

 Constraints
 Represent the constraint in the data
 Cardinality and Participation Constraints
Before working on the conceptual design of the database, one has to know and answer the
following basic questions.

• What are the entities and relationships in the enterprise?


• What information about these entities and relationships should we store in the database?
• What are the integrity constraints that hold? Constraints on each data with respect to
update, retrieval and store.
• Represent this information pictorially in ER diagrams, then map ER diagram into a
relational schema.

3.3. Developing an E-R Diagram

Designing conceptual model for the database is not a one linear process but an iterative activity
where the design is refined again and again.

 To identify the entities, attributes, relationships, and constraints on the data, there are different
set of methods used during the analysis phase.
 These include information gathered by…
• Interviewing end users individually and in a group
• Questionnaire survey
• Direct observation
• Examining different documents
 Analysis of requirements gathered
• Nouns → prospective entities
• Adjectives → prospective attributes
• Verbs/verb phrases → prospective relationships
 The basic E-R model is graphically depicted and presented for review.
 The process is repeated until the end users and designers agree that the E-R diagram is a fair
representation of the organization’s activities and functions.
 Checking for redundant relationships in the ER diagram. Relationships between entities
indicate access from one entity to another; it is therefore possible to access one entity
occurrence from another entity occurrence even if there are other entities and relationships
that separate them. This is often referred to as 'navigation' of the ER diagram.
 The last phase in ER modeling is validating an ER Model against requirement of the user.

Graphical Representations in ER Diagramming


 Entity is represented by a RECTANGLE containing the name of the entity.

Strong Entity Weak Entity

 Connected entities are called relationship participants


 Attributes are represented by OVALS and are connected to the entity by a line

 A derived attribute is indicated by a DOTTED LINE. (……)

 PRIMARY KEYS are underlined.

Key

 Relationships are represented by DIAMOND shaped symbols

• Weak Relationship is a relationship between Weak and Strong Entities.


• Strong Relationship is a relationship between two strong Entities.

Example: Build an ER Diagram for the following information:


 A student record management system will have the following two basic data object
categories with their own features or properties: Students will have an Id, Name, Dept, Age,
GPA and Course will have an Id, Name, Credit Hours.

 Whenever a student enroll in a course in a specific Academic Year and Semester, the Student
will have a grade for the course.

Fig: ER diagram for the student record system. The entities Students (Id, Name, Dept, DoB,
Age, GPA) and Course (Id, Name, Credit Hours) are connected by the Enrolled_In relationship,
which carries the attributes Academic Year, Semester and Grade.
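One way this ER design could later be mapped to relational tables (a preview of logical design) is sketched below; the exact table and column names are illustrative assumptions, rendered in Python's sqlite3 module. The many-to-many Enrolled_In relationship becomes its own table keyed by both participants plus Academic Year and Semester, with Grade as a relationship attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Students (Id INTEGER PRIMARY KEY, Name TEXT, Dept TEXT,
                       DoB TEXT, Age INTEGER, Gpa REAL);
CREATE TABLE Course   (Id INTEGER PRIMARY KEY, Name TEXT, CreditHours INTEGER);
-- The M:N Enrolled_In relationship becomes its own table whose key
-- combines both participants plus Academic Year and Semester;
-- Grade is the relationship's own attribute.
CREATE TABLE Enrolled_In (
    StudId INTEGER REFERENCES Students(Id),
    CourseId INTEGER REFERENCES Course(Id),
    AcademicYear TEXT, Semester TEXT, Grade TEXT,
    PRIMARY KEY (StudId, CourseId, AcademicYear, Semester));
INSERT INTO Students VALUES (1, 'Abel', 'CS', '2002-05-01', 21, 3.5);
INSERT INTO Course   VALUES (101, 'Database Systems', 4);
INSERT INTO Enrolled_In VALUES (1, 101, '2016', 'I', 'A');
""")
cur.execute("""SELECT s.Name, c.Name, e.Grade
               FROM Enrolled_In e
               JOIN Students s ON e.StudId = s.Id
               JOIN Course c   ON e.CourseId = c.Id""")
print(cur.fetchall())  # [('Abel', 'Database Systems', 'A')]
conn.close()
```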

3.4. Structural Constraints on Relationship

A multiplicity constraint is the number or range of possible occurrences of an entity type/relation
that may relate to a single occurrence/tuple of another entity type/relation through a particular
relationship. It is mostly used to ensure appropriate enterprise constraints.

One-to-one relationship:
 A customer is associated with at most one loan via the relationship borrower
 A loan is associated with at most one customer via borrower

E.g.: Relationship Manages between STAFF and BRANCH
 The multiplicity of the relationship is:
• One branch can only have one manager
• One employee could manage either one or no branches

One-To-Many Relationships
 In the one-to-many relationship a loan is associated with at most one customer via borrower,
a customer is associated with several (including 0) loans via borrower.

E.g.: Relationship Leads between STAFF and PROJECT. The multiplicity of the relationship:
 One staff may Lead one or more project(s)
 One project is Lead by one staff

Many-To-Many Relationship
 A customer is associated with several (possibly 0) loans via borrower.
 A loan is associated with several (possibly 0) customers via borrower.

E.g.: Relationship “Teaches” between INSTRUCTOR and COURSE. The multiplicity of the
relationship:
 One Instructor teaches one or more Course(s).
 One Course is taught by zero or more Instructor(s).

Participation of an Entity Set in a Relationship Set


Participation constraint of a relationship is involved in identifying and setting the mandatory or
optional feature of an entity occurrence to take a role in a relationship. There are two distinct
participation constraints with this respect, namely: Total Participation and Partial Participation

 Total participation: every tuple in the entity or relation participates in at least one relationship
by taking a role. This means, every tuple in a relation will be attached with at least one other
tuple. The entity with total participation in a relationship will be connected to the relationship
using a double line.

 Partial participation: some tuple in the entity or relation may not participate in the
relationship. This means, there is at least one tuple from that Relation not taking any role in
that specific relationship. The entity with partial participation in a relationship will be
connected to the relationship using a single line.

E.g. 1: Participation of EMPLOYEE in “belongs to” relationship with DEPARTMENT is total


since every employee should belong to a department.
Participation of DEPARTMENT in “belongs to” relationship with EMPLOYEE is total since
every department should have at least one employee.

E.g. 2: Participation of EMPLOYEE in “manages” relationship with DEPARTMENT, is partial


participation since not all employees are managers.

Participation of DEPARTMENT in “Manages” relationship with EMPLOYEE is total since every
department should have a manager.
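Total participation is often approximated in a relational implementation with a NOT NULL foreign key, which forces every EMPLOYEE tuple to take part in the "belongs to" relationship. The sketch below (Python's sqlite3, with hypothetical table names) enforces only the EMPLOYEE side; the total participation of DEPARTMENT would need an additional check outside the table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
cur = conn.cursor()
cur.executescript("""
CREATE TABLE DEPARTMENT (DeptId INTEGER PRIMARY KEY, Name TEXT);
-- Total participation of EMPLOYEE in 'belongs to': DeptId NOT NULL
-- forces every employee row to reference some department.
CREATE TABLE EMPLOYEE (
    EmpId INTEGER PRIMARY KEY, Name TEXT,
    DeptId INTEGER NOT NULL REFERENCES DEPARTMENT(DeptId));
INSERT INTO DEPARTMENT VALUES (1, 'HR');
INSERT INTO EMPLOYEE VALUES (1, 'Hana', 1);
""")
# An employee with no department violates the total-participation rule.
try:
    cur.execute("INSERT INTO EMPLOYEE VALUES (2, 'Abel', NULL)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)  # the insert is refused by the DBMS
conn.close()
```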

3.5. Problem in ER Modeling

The Entity-Relationship Model is a conceptual data model that views the real world as consisting
of entities and relationships. The model visually represents these concepts by the Entity-
Relationship diagram. The basic constructs of the ER model are entities, relationships, and
attributes. Entities are concepts, real or abstract, about which information is collected.
Relationships are associations between the entities. Attributes are properties which describe the
entities.
While designing the ER model one could face problems in the design which are called connection
traps. Connection traps are problems arising from misinterpreting certain relationships.
There are two types of connection traps:

1. Fan trap:

Occurs where a model represents a relationship between entity types, but the pathway between
certain entity occurrences is ambiguous.
It may exist where two or more one-to-many (1:M) relationships fan out from an entity. The problem
can be avoided by restructuring the model so that no 1:M relationships fan out from a single
entity while all the semantics of the relationship are preserved.
Example:

Semantics description of the problem;

Problem: Which car (Car1, Car3 or Car5) is used by Employee 6 (Emp6) working in Branch 1
(Br1)? From this ER model one cannot tell which car is used by which staff member, since a branch
can have more than one car and a branch is also populated by more than one employee. Thus we
need to restructure the model to avoid the connection trap.
To avoid the Fan Trap problem we can go for restructuring of the E-R Model. This will result in
the following E-R Model.

Semantic description of the restructured model: each employee (Emp1 through Emp7) is now
linked to exactly one branch (Br1 through Br4) and to exactly one car (Car1 through Car7), so the
pathway from a branch through an employee to a car is unambiguous.
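After restructuring, the question from the fan-trap example can be answered with an unambiguous query. The sketch below is a hypothetical relational rendering (using Python's sqlite3 module) of the restructured BRANCH, EMPLOYEE, CAR model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE BRANCH   (BrId TEXT PRIMARY KEY);
CREATE TABLE EMPLOYEE (EmpId TEXT PRIMARY KEY,
                       BrId TEXT REFERENCES BRANCH(BrId));
CREATE TABLE CAR      (CarId TEXT PRIMARY KEY,
                       EmpId TEXT REFERENCES EMPLOYEE(EmpId));
INSERT INTO BRANCH VALUES ('Br1');
INSERT INTO EMPLOYEE VALUES ('Emp6', 'Br1');
INSERT INTO CAR VALUES ('Car5', 'Emp6');
""")
# The pathway BRANCH -> EMPLOYEE -> CAR is now unambiguous:
cur.execute("""SELECT c.CarId FROM CAR c
               JOIN EMPLOYEE e ON c.EmpId = e.EmpId
               WHERE e.EmpId = 'Emp6' AND e.BrId = 'Br1'""")
print(cur.fetchone()[0])  # Car5
conn.close()
```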

2. Chasm Trap:

Occurs where a model suggests the existence of a relationship between entity types, but the
pathway does not exist between certain entity occurrences.
A chasm trap may exist when there are one or more relationships with a minimum multiplicity
(cardinality) of zero forming part of the pathway between related entities.
Example:

If we have a set of projects that are not currently active, then we cannot assign a project manager
for these projects. So there are projects with no project manager, making the participation have
a minimum value of zero.

Problem:
How can we identify which BRANCH is responsible for which PROJECT? We know that
whether the PROJECT is active or not, there is a responsible BRANCH. But which branch is a
question to be answered, and since we have a minimum participation of zero between EMPLOYEE
and PROJECT, we cannot identify the BRANCH responsible for each PROJECT.
The solution for this Chasm Trap problem is to add another relationship between the extreme
entities (BRANCH and PROJECT).

3.6. Enhanced E-R (EER) Models

 Object-oriented extensions to E-R model


 EER is important when we have a relationship between two entities and the participation is
partial between entity occurrences. In such cases EER is used to reduce the complexity in
participation and relationship complexity.
 ER diagrams consider entity types to be primitive objects
 EER diagrams allow refinements within the structures of entity types

 EER Concepts:
• Generalization
• Specialization
• Sub classes
• Super classes
• Attribute Inheritance
• Constraints on specialization and generalization

i. Generalization

 Generalization occurs when two or more entities represent categories of the same real-world
object.
 Generalization is the process of defining a more general entity type from a set of more
specialized entity types.
 A generalization hierarchy is a form of abstraction that specifies that two or more entities that
share common attributes can be generalized into a higher level entity type.
 Is considered as bottom-up definition of entities.
 Generalization hierarchy depicts relationship between higher level superclass and lower level
subclass.
 Generalization hierarchies can be nested. That is, a subtype of one hierarchy can be a supertype
of another. The level of nesting is limited only by the constraint of simplicity.
 Example: Account is a generalized form for saving and Current Accounts

ii. Specialization

 Is the result of taking a subset of a higher level entity set to form a lower level entity set.
 The specialized entities will have additional set of attributes (distinguishing characteristics)
that distinguish them from the generalized entity.
 Is considered as Top-Down definition of entities.
 Specialization process is the inverse of the Generalization process. Identify the distinguishing
features of some entity occurrences, and specialize them into different subclasses.
 Reasons for Specialization
• Attributes only partially applying to superclasses
• Relationship types only partially applicable to the superclass
• In many cases, an entity type has numerous sub-groupings of its entities that are
meaningful and need to be represented explicitly. This need requires the representation
of each subgroup in the ER model. The generalized entity is a superclass and the set of
specialized entities will be subclasses for that specific Superclass.
 Example: Saving Accounts and Current Accounts are Specialized entities for the generalized
entity Accounts. Manager, Sales, Secretary: are specialized employees.

iii. Subclass/Subtype
 An entity type whose tuples have attributes that distinguish its members from tuples of the
generalized or Superclass entities.
 When one generalized Superclass has various subgroups with distinguishing features and these
subgroups are represented by specialized form, the groups are called subclasses.
 Subclasses can be either mutually exclusive (disjoint) or overlapping (inclusive).
 A single subclass may inherit attributes from two distinct superclasses.
 A mutually exclusive category/subclass is when an entity instance can be in only one of the
subclasses.
 E.g.: An EMPLOYEE can either be SALARIED or PART-TIMER but not both.
 An overlapping category/subclass is when an entity instance may be in two or more subclasses.
 E.g.: A PERSON who works for a university can be both EMPLOYEE and a STUDENT at
the same time.

iv. Superclass /Supertype


 An entity type whose tuples share common attributes. Attributes that are shared by all entity
occurrences (including the identifier) are associated with the supertype.
 Is the generalized entity.

Relationship Between Superclass and Subclass


 The relationship between a superclass and any of its subclasses is called a superclass/subclass
or class/subclass relationship.
 An instance cannot be a member of a subclass only; i.e. every instance of a subclass is also
an instance of the Superclass.
 A member of a subclass is represented as a distinct database object, a distinct record that is
related via the key attribute to its super-class entity.
 An entity cannot exist in the database merely by being a member of a subclass; it must also be
a member of the super- class.
 An entity occurrence of the superclass need not belong to any of the subclasses
unless there is full (total) participation in the specialization.
 The relationship between a subclass and a Superclass is an “IS A” or “IS PART OF” type.
• Subclass IS PART OF Superclass
• Manager IS AN Employee
 All subclasses or specialized entity sets should be connected with the superclass using a line
to a circle where there is a subset symbol indicating the direction of subclass/superclass
relationship.

 We can also have subclasses of a subclass forming a hierarchy of specialization.
 Superclass attributes are shared by all subclasses of that superclass.
 Subclass attributes are unique for the subclass.

Attribute Inheritance
 An entity that is a member of a subclass inherits all the attributes of the entity as a member of
the superclass.
 The entity also inherits all the relationships in which the superclass participates.
 An entity may have more than one subclass category.
 All entities/subclasses of a generalized entity or superclass share a common unique identifier
attribute (primary key). i.e. The primary key of the superclass and subclasses are always
identical.

 Consider the EMPLOYEE supertype entity shown above. This entity can have several different
subtype entities (for example: HOURLY and SALARIED), each with distinct properties not
shared by other subtypes. But whether the employee is HOURLY or SALARIED, same
attributes (EmployeeId, Name, and DateHired) are shared.
 The Supertype EMPLOYEE stores all the properties that subclasses have in common, while
HOURLY employees have the unique attribute Wage (hourly wage rate) and SALARIED
employees have two unique attributes, StockOption and Salary.
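One common way to map such a supertype/subtype hierarchy to tables is to give each subtype its own table that reuses the supertype key as both primary key and foreign key. A minimal sqlite3 sketch, where column names follow the example above and the data values are made up:

```python
import sqlite3

# EMPLOYEE supertype with HOURLY and SALARIED subtype tables.
# Each subtype table shares the supertype key (EmployeeId).
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE employee (
    EmployeeId INTEGER PRIMARY KEY,
    Name       TEXT,
    DateHired  TEXT);
CREATE TABLE hourly (
    EmployeeId INTEGER PRIMARY KEY REFERENCES employee(EmployeeId),
    Wage       REAL);
CREATE TABLE salaried (
    EmployeeId  INTEGER PRIMARY KEY REFERENCES employee(EmployeeId),
    StockOption INTEGER,
    Salary      REAL);
""")
con.execute("INSERT INTO employee VALUES (1, 'Abebe', '2020-01-01')")
con.execute("INSERT INTO hourly VALUES (1, 45.0)")
# Inherited and subtype-specific attributes come back with one join:
name, wage = con.execute("""
    SELECT e.Name, h.Wage
    FROM employee e JOIN hourly h USING (EmployeeId)""").fetchone()
print(name, wage)
```

The shared primary key is what makes attribute inheritance work at the relational level: joining on `EmployeeId` recovers the supertype attributes for any subtype row.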

3.7. Constraints on specialization and generalization

Completeness Constraint

 The Completeness Constraint addresses the issue of whether or not an occurrence of a
Superclass must also have a corresponding Subclass occurrence.
 In other words, it specifies whether every instance of the supertype must also be represented
in at least one subtype.
 The Total Specialization Rule specifies that an entity occurrence should at least be a member
of one of the subclasses. Total Participation of superclass instances on subclasses is
diagrammed with a double line from the Supertype to the circle as shown below.
 E.g.: If we have EXTENSION and REGULAR as subclasses of a superclass STUDENT, then
it is mandatory that each student be either an EXTENSION or a REGULAR student. Thus the
participation of instances of STUDENT in the EXTENSION and REGULAR subclasses will be
total.

 The Partial Specialization Rule specifies that it is not necessary for all entity occurrences in
the superclass to be a member of one of the subclasses. Here we have an optional participation
on the specialization. Partial Participation of superclass instances on subclasses is diagrammed
with a single line from the Supertype to the circle.
 E.g.: If we have MANAGER and SECRETARY as subclasses of a superclass EMPLOYEE,
then it is not the case that all employees are either manager or secretary. Thus the participation
of instances of employee in MANAGER and SECRETARY subclasses will be partial.

Disjointness Constraints
 Specifies the rule whether one entity occurrence can be a member of more than one subclass;
i.e. it is a type of business rule that deals with the situation where an entity occurrence of a
Superclass may also have more than one Subclass occurrence.
 The Disjoint Rule restricts one entity occurrence of a superclass to membership in only one
of the subclasses. Example: an EMPLOYEE can either be SALARIED or a PART-TIMER, but
not both at the same time.
 The Overlap Rule allows one entity occurrence to be a member of more than one subclass.
Example: an EMPLOYEE working at the university can be both a STUDENT and an
EMPLOYEE at the same time.
 This is diagrammed by placing either the letter "d" for disjoint or "o" for overlapping inside
the circle on the Generalization Hierarchy portion of the E-R diagram.
 The two types of constraints on generalization and specialization (Disjointness and
Completeness constraints) are independent of one another. That is, whether subclasses are
disjoint or overlapping does not determine whether the tuples of the superclass have Total or
Partial participation in that specific specialization.
 From the two types of constraints we can have four possible constraints
• Disjoint AND Total
• Disjoint AND Partial
• Overlapping AND Total
• Overlapping AND Partial
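A disjoint-and-total specialization can be partially enforced in SQL with a discriminator column and a CHECK constraint. The sketch below is one possible approximation (SQL cannot express every EER constraint directly, and the names are illustrative):

```python
import sqlite3

# "Disjoint AND Total" approximated with a discriminator column:
# NOT NULL makes the specialization total, the CHECK makes it disjoint
# (each employee holds exactly one subtype value).
con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE employee (
    eid      INTEGER PRIMARY KEY,
    name     TEXT,
    emp_type TEXT NOT NULL CHECK (emp_type IN ('SALARIED', 'PARTTIME')))
""")
con.execute("INSERT INTO employee VALUES (1, 'Sara', 'SALARIED')")  # accepted
try:
    con.execute("INSERT INTO employee VALUES (2, 'Tola', 'INTERN')")  # not a valid subtype
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

An overlapping specialization would instead drop the single discriminator column, for example by keeping one membership table per subclass.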

3.8. Relational Database Model

Activity 3.2
 Define relation, entity attribute, tuple, relationship?
 Discuss the building blocks of a relational data model?

Important terms in relational data model


• Relation: a table with rows and columns.
• Attribute: a named column of a relation.
• Domain: a set of allowable values for one or more attributes.
• Tuple: a row of a relation.
• Degree: the degree of a relation is the number of attributes it contains (giving unary,
binary, ternary, and n-ary relations).
• Cardinality: the cardinality of a relation is the number of tuples it contains.
• Relational Database: a collection of normalized relations with distinct relation names.
• Relation Schema: a named relation defined by a set of attribute–domain name pairs.
Let A1, A2, …, An be attributes with domains D1, D2, …, Dn. Then the set {A1:D1,
A2:D2, …, An:Dn} is a relation schema.
A relation R, defined by a relation schema S, is a set of mappings from attribute names to their
corresponding domains. Thus a relation is a set of n-tuples of the form (A1:d1, A2:d2, …, An:dn)
where d1 ∈ D1, d2 ∈ D2, …, dn ∈ Dn.

Eg: Student (studentId char(10), studentName char(50), DOB date) is a relation schema for the
student entity in SQL

Relational Database schema: a set of relation schema each with distinct names. Suppose R1,
R2,……, Rn is the set of relation schema in a relational database then the relational database schema (R)
can be stated as R={ R1 , R2 ,…, Rn}
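The degree and cardinality of a relation can be observed directly once the relation is stored. A small sqlite3 sketch reusing the Student schema above (the inserted rows are made up):

```python
import sqlite3

# Degree = number of attributes (columns); cardinality = number of tuples (rows).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (studentId TEXT, studentName TEXT, DOB TEXT)")
con.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [("S1", "Abeba", "2000-05-01"),
                 ("S2", "Kebede", "1999-11-23")])
cur = con.execute("SELECT * FROM student")
degree = len(cur.description)        # 3 attributes -> ternary relation
cardinality = len(cur.fetchall())    # 2 tuples
print(degree, cardinality)
```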

Properties of Relational Databases


- A relation has a name that is distinct from all other relation names in the relational schema.
- Each tuple in a relation must be unique.
- All tables are LOGICAL ENTITIES.
- Each cell of a relation contains exactly one atomic (single) value.
- Each column (field or attribute) has a distinct name.
- The values of an attribute are all from the same domain.
- A table is either a BASE TABLE (Named Relation) or a VIEW (Unnamed Relation).
- Only Base Tables are physically stored.
- VIEWS are derived from BASE TABLES with SQL statements like: [SELECT .. FROM
.. WHERE .. ORDER BY]
- A relational database is a collection of tables, with each entity in one table.
- Attributes are fields (columns) in a table.
- Entries with repeating groups are said to be un-normalized.
- All values in a column represent the same attribute and have the same data format.

3.9. Building Blocks of the Relational Data Model


The building blocks of the relational data model are:

• Entities: real world physical or logical object.


• Attributes: properties used to describe each Entity or real world object.

• Relationship: the association between Entities
• Constraints: rules that should be obeyed while manipulating the data.

1. The ENTITIES - (persons, places, things etc.) which the organization has to deal with.
Relations can also describe relationships.
• The name given to an entity should always be a singular noun descriptive of each item to be
stored in it. E.g. : student NOT students.
• Every relation has a schema, which describes the columns or fields; the relation itself
corresponds to our familiar notion of a table.
• A relation is a collection of tuples, each of which contains values for a fixed number of
attributes
• Existence Dependency: the dependence of an entity on the existence of one or
more entities.
• Weak entity: an entity that cannot exist without the entity with which it has a
relationship; it is indicated by a double rectangle.
2. The ATTRIBUTES - the items of information which characterize and describe these entities.
• Attributes are pieces of information ABOUT entities. The analysis must of course identify
those which are actually relevant to the proposed application. Attributes will give rise to
recorded items of data in the database
• At this level we need to know such things as:
• Attribute name (be explanatory words or phrases)
• The domain from which attribute values are taken (A DOMAIN is a set of values
from which attribute values may be taken.) Each attribute has values taken from a
domain. For example, the domain of Name is string and that of Salary is real.
However, these are not shown on E-R models.
• Whether the attribute is part of the entity identifier (attributes which just describe
an entity and those which help to identify it uniquely)
• Whether it is permanent or time-varying (which attributes may change their values
over time)
• Whether it is required or optional for the entity (whose values will sometimes be
unknown or irrelevant)
• Types of Attributes
• (1) Simple (atomic) Vs Composite attributes
• Simple: contains a single value (not divided into sub parts). E.g. Age, gender
• Composite: divided into sub-parts (composed of other attributes). E.g. Name, Address
• (2) Single-valued Vs multi-valued attributes
• Single-valued: have only a single value (the value may change, but there is only one
value at a time). E.g. Name, Sex, ID No., color_of_eyes
• Multi-Valued: have more than one value E.g. Address, dependent-name, Person
may have several college degrees
• (3) Stored vs. Derived Attribute
• Stored: not possible to derive or compute. E.g. Name, Address
• Derived: The value may be derived (computed) from the values of other attributes.
E.g. Age (current year – year of birth), Length of employment (current date- start
date) Profit (earning-cost), G.P.A (grade point/credit hours)
• (4) Null Values
• NULL applies to attributes which are not applicable or which do not have values.
• You may enter the value NA (meaning not applicable).
• Value of a key attribute cannot be null.
• Default value - assumed value if no explicit value
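A derived attribute such as Age is typically computed in a query rather than stored. A minimal sqlite3 sketch (the table and values are illustrative):

```python
import sqlite3

# Age is a derived attribute: computed from the stored birth_year at query
# time, never stored in the table itself.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (name TEXT, birth_year INTEGER)")
con.execute("INSERT INTO person VALUES ('Meron', 2000)")
current_year = 2024
age = con.execute("SELECT ? - birth_year FROM person",
                  (current_year,)).fetchone()[0]
print(age)  # 24
```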

Entity versus Attributes


When designing the conceptual specification of the database, one should pay attention to the
distinction between an Entity and an Attribute.

• Consider designing a database of employees for an organization:


• Should address be an attribute of Employees or an entity (connected to Employees by a
relationship)?
• If we have several addresses per employee, address must be an entity (attributes
cannot be set-valued/multi valued)
• If the structure (city, Woreda, Kebele, etc) is important, e.g. want to retrieve employees
in a given city, address must be modeled as an entity (attribute values are atomic)

3. The RELATIONSHIPS between entities which exist and must be taken into account when
processing information. In any business processing one object may be associated with another
object due to some event. Such kind of association is what we call a RELATIONSHIP between
entity objects.
• One external event or process may affect several related entities.
• Related entities require setting of LINKS from one part of the database to another.
• A relationship should be named by a word or phrase which explains its function.

• Role names are different from the names of entities forming the relationship: one entity
may take on many roles, the same role may be played by different entities.
• For each RELATIONSHIP, one can talk about the Number of Entities and the Number of
Tuples participating in the association. These two concepts are called DEGREE and
CARDINALITY of a relationship respectively.

Degree of a Relationship
An important point about a relationship is how many entities participate in it. The number of
entities participating in a relationship is called the DEGREE of the relationship.
Among the Degrees of relationship, the following are the basic:

• UNARY/RECURSIVE RELATIONSHIP: Tuples/records of a single entity are related
with each other.
• BINARY RELATIONSHIPS: Tuples/records of two entities are associated in a
relationship
• TERNARY RELATIONSHIP: Tuples/records of three different entities are associated
And a generalized one:
• N-ARY RELATIONSHIP: Tuples from arbitrary number of entity sets are
participating in a relationship.

Cardinality of a Relationship
Another important concept about relationship is the number of instances/tuples that can be
associated with a single instance from one entity in a single relationship. The number of instances
participating or associated with a single instance from an entity in a relationship is called the
CARDINALITY of the relationship. The major cardinalities of a relationship are:

• ONE-TO-ONE: one tuple is associated with only one other tuple. E.g. Building–Location:
a single building is located at a single location, and a single location accommodates
only a single building.
• ONE-TO-MANY: one tuple can be associated with many other tuples, but not the reverse. E.g.
Department–Student
• MANY-TO-ONE, many tuples are associated with one tuple but not the reverse. E.g.
Employee – Department: as many employees belong to a single department.
• MANY-TO-MANY: one tuple is associated with many other tuples and from the other side,
with a different role name one tuple will be associated with many tuples. E.g. Student – Course
→ as a student can take many courses and a single course can be attended by many students.
However, the degree and cardinality of a relation are different from degree and cardinality of a
relationship.

3.13. Key constraints
If tuples need to be unique in the database, then we need to make each tuple distinct. To
do this we need relational keys that uniquely identify each record.

• Super Key: an attribute or set of attributes that uniquely identifies a tuple within a relation.
• Candidate Key: a super key such that no proper subset of that collection is a Super Key
within the relation. A candidate key has two properties: Uniqueness and Irreducibility.
 If a super key is having only one attribute, it is automatically a Candidate key.
 If a candidate key consists of more than one attribute it is called Composite Key.
• Primary Key: the candidate key that is selected to identify tuples uniquely within the
relation.
 The entire set of attributes in a relation can be considered as a primary key in a worst case.
• Foreign Key: an attribute, or set of attributes, within one relation that matches the candidate
key of some relation.
 A foreign key is a link between different relations to create a view or an unnamed relation

Relational Constraints/Integrity Rules


 Domain Integrity: No value of the attribute should be beyond the allowable limits.
 Entity Integrity: In a base relation, no attribute of a Primary Key can assume a value of
NULL.
 Referential Integrity: If a Foreign Key exists in a relation, either the Foreign Key value
must match a Candidate Key value in its home relation or the Foreign Key value must be
NULL.
 Enterprise Integrity: Additional rules specified by the users or database administrators of a
database are incorporated.
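Entity and referential integrity can be demonstrated with sqlite3. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; the schema names here are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
con.executescript("""
CREATE TABLE department (did INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE employee (
    eid   INTEGER PRIMARY KEY,                -- entity integrity: PK, never NULL
    ename TEXT,
    did   INTEGER REFERENCES department(did));-- referential integrity
""")
con.execute("INSERT INTO department VALUES (1, 'HR')")
con.execute("INSERT INTO employee VALUES (10, 'Alem', 1)")      # FK matches: OK
con.execute("INSERT INTO employee VALUES (11, 'Bekele', NULL)") # FK NULL: also OK
try:
    con.execute("INSERT INTO employee VALUES (12, 'Chala', 99)")  # no such department
except sqlite3.IntegrityError:
    fk_violation = True
print(fk_violation)
```

The two accepted inserts show the two legal states of a foreign key (matching a candidate key value, or NULL); the rejected one violates referential integrity.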

3.14. Relational Views


Relations are perceived as a Table from the users’ perspective. Actually, there are two kinds of
relation in relational database. The two categories or types of Relations are Named and Unnamed
Relations. The basic difference is on how the relation is created, used and updated:

1. Base Relation
A Named Relation corresponding to an entity in the conceptual schema, whose tuples are
physically stored in the database.

2. View (Unnamed Relation)


A View is the dynamic result of one or more relational operations operating on base relations
to produce another, virtual relation that does not actually exist as stored data. So a view is a
virtually derived relation that does not necessarily exist in the database but can be produced
upon request, at the time of request, by a particular user. The virtual table or relation can be
created from one or more relations by extracting some attributes and records, with or without
conditions.

Purpose of a view
 Hides unnecessary information from users: since only part of the base relation (Some
collection of attributes, not necessarily all) are to be included in the virtual table.
 Provide powerful flexibility and security: since unnecessary information will be hidden
from the user there will be some sort of data security.
 Provide customized view of the database for users: each user is going to be interfaced with
their own preferred data set and format by making use of the Views.
 A view of one base relation can be updated.
 Update on views derived from various relations is not allowed since it may violate the
integrity of the database.
 Update on view with aggregation and summary is not allowed. Since aggregation and
summary results are computed from a base relation and does not exist actually.

3.15. Chapter Three Review Questions

4. Define the following terms: entity, attribute, attribute value, relationship instance, composite
attribute, multivalued attribute, derived attribute, complex attribute, and key attribute.
5. What is an entity type? What is an entity set? Explain the differences among an entity, an
entity type, and an entity set.
6. Explain the difference between an attribute and a value set.
7. What is a relationship type? Explain the differences among a relationship instance, a
relationship type, and a relationship set.
8. What is a participation role? When is it necessary to use role names in the description of
relationship types?
9. Describe the two alternatives for specifying structural constraints on relationship types. What
are the advantages and disadvantages of each?
10. Under what conditions can an attribute of a binary relationship type be migrated to become
an attribute of one of the participating entity types?
11. When we think of relationships as attributes, what are the value sets of these attributes? What
class of data models is based on this concept?
12. What is meant by a recursive relationship type? Give some examples of recursive
relationship types.
13. When is the concept of a weak entity used in data modeling? Define the terms owner entity
type, weak entity type, identifying relationship type, and partial key.
14. Can an identifying relationship of a weak entity type be of a degree greater than two? Give
examples to illustrate your answer.
15. Discuss the conventions for displaying an ER schema as an ER diagram.
16. Discuss the naming conventions used for ER schema diagrams. Define the following terms as
they apply to the relational model of data: domain, attribute, n-tuple, relation schema,
relation state, degree of a relation, database schema, and database state.
17. Why are tuples in a relation not ordered?
18. Why are duplicate tuples not allowed in a relation?
19. What is the difference between a key and a superkey?
20. Why do we designate one of the candidate keys of a relation to be the primary key?
21. Discuss the characteristics of relations that make them different from ordinary tables and
files.
22. Discuss the various reasons that lead to the occurrence of NULL values in relations.
23. Discuss the entity integrity and referential integrity constraints. Why each is considered
important?
24. Define foreign key. What is this concept used for?
25. What is a transaction? How does it differ from an Update operation?

Chapter Four
Functional Dependency and Normalization

4.1. Introduction

In this chapter logical database design, converting ER diagram to relational tables, normalization,
functional dependency, first normal form, second normal form, third normal form, and other forms
of normalizations are discussed.
After completing this chapter, the students will be able to:
 Understand the logical database design.
 Understand how to convert ER diagram to relational tables.
 Identify steps of normalization such as first normal form, second normal form, and third
normal form.
 Design logical database using relational data model.

Activity 4.1
 Define the logical database design?
 Discuss steps of normalization such as first normal form, second normal form, and
third normal form?

The whole purpose of database design is to create an accurate representation of the data, the
relationships between the data, and the business constraints pertinent to the organization. Therefore,
one can use one or more techniques to design a database; one such technique was the E-R model.
In this chapter we use another technique, known as “Normalization”, with a different emphasis in
database design: it defines the structure of a database with a specific data model.
Logical design is the process of constructing a model of the information used in an enterprise based
on a specific data model (e.g. relational, hierarchical or network or object), but independent of a
particular DBMS and other physical considerations.
The focus in logical database design is the Normalization Process

 Normalization process
• Collection of Rules (Tests) to be applied on relations to obtain the minimal,
non-redundant set of attributes.
• Discover new entities in the process
• Revise attributes based on the rules and the discovered Entities
• Works by examining the relationship between attributes known as functional
dependency.

The purpose of normalization is to find the suitable set of relations that supports the data
requirements of an enterprise.
A suitable set of relations has the following characteristics:
 Minimal number of attributes to support the data requirements of the enterprise
 Attributes with close logical relationship (functional dependency) should be placed in the same
relation.
 Minimal redundancy with each attribute represented only once with the exception of the
attributes which form the whole or part of the foreign key, which are used for joining of related
tables.
The first step before applying the rules in relational data model is converting the conceptual design
to a form suitable for relational logical model, which is in a form of tables.

4.2. Converting ER Diagram to Relational Tables

Three basic rules to convert ER into tables or relations:


Rule 1: Entity Names will automatically be table names
Rule 2: Mapping of attributes: attributes will be columns of the respective tables.

 Atomic or single-valued or derived or stored attributes will be columns


 Composite attributes: the parent attribute will be ignored and the decomposed attributes (child
attributes) will be columns of the table.
 Multi-valued attributes: will be mapped to a new table where the primary key of the main table
will be posted for cross referencing.

Rule 3: Relationships: relationship will be mapped by using a foreign key attribute. Foreign key
is a primary or candidate key of one relation used to create association between tables.

 For a relationship with One-to-One Cardinality: post the primary or candidate key of one
of the table into the other as a foreign key. In cases where one entity is having partial
participation on the relationship, it is recommended to post the candidate key of the partial
participants to the total participant so as to save some memory location due to null values on
the foreign key attribute. E.g.: for a relationship between Employee and Department where
employee manages a department, the cardinality is one-to-one as one employee will manage
only one department and one department will have one manager. Here the PK of the Employee
can be posted to the Department, or the PK of the Department can be posted to the Employee.
But the Employee has partial participation in the relationship "Manages", as not all
employees are managers of departments. Thus, even though both ways are possible, it is
recommended to post the primary key of the Employee to the Department table as a foreign
key.
 For a relationship with One-to-Many Cardinality: post the primary key or candidate key
from the “one” side as a foreign key attribute to the “many” side. E.g.: for a relationship called
“Belongs To” between Employee (Many) and Department (One), the primary or candidate key
of the “one” side (Department) should be posted to the “many” side (the Employee table).
 For a relationship with Many-to-Many Cardinality: for relationships having many to many
cardinality, one has to create a new table (which is the associative entity) and post primary key
or candidate key from the participant entities as foreign key attributes in the new table along
with some additional attributes (if applicable). The same approach should be used for
relationships with degree greater than binary.
 For a relationship having Associative Entity property: in cases where the relationship has
its own attributes (associative entity), one has to create a new table for the associative entity
and post primary key or candidate key from the participating entities as foreign key attributes
in the new table
 Example to illustrate the major rules in mapping ER to relational schema:
The following ER has been designed to represent the requirement of an organization to capture
Employee Department and Project information. And Employee works for department where
an employee might be assigned to manage a department. Employees might participate on
different projects within the organization. An employee might as well be assigned to lead a
project where the starting and ending date of his/her project leadership and bonus will be
registered.

[Figure: ER diagram for the Employee–Department–Project example]
After we have drawn the ER diagram, the next step is to map the ER into a relational schema so
that the rules of the relational data model can be tested for each relation schema. The mapping is
done for the entities first, followed by the relationships, based on the mapping rules, as follows.

 Mapping EMPLOYEE Entity:

There will be Employee table with EID, Salary, FName and LName being the columns. The
composite attribute Name will be ignored as its decomposed attributes (FName and LName) are
columns in the Employee Table. The Tel attribute will be a new table as it is multi-valued.

Employee

EID FName LName Salary

Telephone

Tel EID

 Mapping DEPARTMENT Entity:

There will be Department table with DID, DName, and DLoc being the columns.

Department

DID DName DLoc

 Mapping PROJECT Entity:

There will be Project table with PID, PName, and PFund being the columns.

Project

PID PName PFund

 Mapping the MANAGES Relationship:

As the relationship has one-to-one cardinality, the PK or CK of one of the tables can be posted
into the other. But based on the recommendation, the PK or CK of the partial participant
(Employee) should be posted to the total participant (Department). This requires adding the
PK of Employee (EID) to the Department table as a foreign key. We can give the foreign key
another name, MEID, meaning "manager's employee id". This will affect the degree of the
Department table.

Department

DID DName DLoc MEID

 Mapping the WORKSFOR Relationship:

As the relationship has one-to-many cardinality, the PK or CK of the "one" side (the PK or CK
of the Department table) should be posted to the "many" side (the Employee table). This requires
adding the PK of Department (DID) to the Employee table as a foreign key. We can give the
foreign key another name, EDID, meaning "Employee's Department id". This will affect the
degree of the Employee table.

Employee

EID FName LName Salary EDID
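The mapped schema above can be written as DDL. The sketch below uses sqlite3; SQLite permits the circular Employee↔Department foreign keys at table-creation time because it checks foreign keys only when rows are inserted:

```python
import sqlite3

# DDL sketch of the mapped schema: Telephone holds the multi-valued Tel
# attribute, MEID implements MANAGES (1:1), EDID implements WORKSFOR (1:M).
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE employee (
    EID INTEGER PRIMARY KEY, FName TEXT, LName TEXT, Salary REAL,
    EDID INTEGER REFERENCES department(DID));     -- WORKSFOR (one-to-many)
CREATE TABLE telephone (
    Tel TEXT, EID INTEGER REFERENCES employee(EID), -- multi-valued attribute
    PRIMARY KEY (Tel, EID));
CREATE TABLE department (
    DID INTEGER PRIMARY KEY, DName TEXT, DLoc TEXT,
    MEID INTEGER REFERENCES employee(EID));       -- MANAGES (one-to-one)
CREATE TABLE project (
    PID INTEGER PRIMARY KEY, PName TEXT, PFund REAL);
""")
tables = sorted(r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"))
print(tables)
```

A table for the LEADS associative entity (with start date, end date, and bonus, plus foreign keys to Employee and Project) would be added the same way, following the many-to-many rule.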

After converting the ER diagram in to table forms, the next phase is implementing the process of
normalization, which is a collection of rules each table should satisfy.

4.3. Normalization

A relational database is merely a collection of data, organized in a particular manner. As the father
of the relational database approach, Codd created a series of rules (tests) called normal forms that
help define that organization. One of the best ways to determine what information should be stored
in a database is to clarify what questions will be asked of it and what data would be
included in the answers.

Database normalization is a series of steps followed to obtain a database design that allows for
consistent storage and efficient access of data in a relational database. These steps reduce data
redundancy and the risk of data becoming inconsistent.

Normalization is the process of identifying the logical associations between data items and
designing a database that will represent such associations but without suffering the update
anomalies which are; Insertion Anomalies, Deletion Anomalies, and Modification Anomalies.

Normalization may reduce system performance since data will be cross referenced from many
tables. Thus denormalization is sometimes used to improve performance, at the cost of reduced
consistency guarantees.

A normalization (decomposition) is normally considered "good" if it is lossless. Applying the
normalization rules eventually removes the update anomalies that may otherwise appear during data
manipulation after implementation. Update anomalies are the types of problems that can occur in
an insufficiently normalized table; they include:

1) Insertion anomalies
An "insertion anomaly" is a failure to place information about a new database entry into all the
places in the database where information about that new entry needs to be stored. Additionally, we
may have difficulty inserting some data at all. In a properly normalized database, information about a
new entry needs to be inserted into only one place in the database; in an inadequately normalized
database, information about a new entry may need to be inserted into more than one place and,
human fallibility being what it is, some of the needed additional insertions may be missed.

2) Deletion anomalies
A "deletion anomaly" is a failure to remove information about an existing database entry when it
is time to remove that entry. Additionally, deletion of one piece of data may result in loss of other
information. In a properly normalized database, information about an old, to-be-gotten-rid-of entry
needs to be deleted from only one place in the database; in an inadequately normalized database,
information about that old entry may need to be deleted from more than one place, and, human
fallibility being what it is, some of the needed additional deletions may be missed.

3) Modification anomalies
A modification of a database involves changing some value of the attribute of a table. In a properly
normalized database table, whatever information is modified by the user, the change will be
effected and used accordingly. To avoid the update anomalies in a given table, the solution is to
decompose it into smaller tables based on the rules of normalization. Such a decomposition must
have two important properties:

a. The lossless-join property ensures that any instance of the original relation can be
recovered from the instances of the smaller relations.
b. The dependency-preservation property implies that the constraints (dependencies) on the
original relation can be maintained by enforcing constraints on the smaller
relations, i.e. we do not have to perform a join operation to check whether a constraint
on the original relation is violated.
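The lossless-join property can be illustrated with a small, hypothetical Python sketch (the relation instance, attribute names, and values below are invented for illustration): decompose an instance into two projections, natural-join them back, and compare with the original.

```python
# Sketch: test whether a decomposition is lossless for one relation
# instance by re-joining the projections and comparing with the original.
# Relation instances are sets of attribute/value tuples; names are illustrative.

def project(rows, attrs):
    """Projection: keep only the given attributes, dropping duplicates."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}

def natural_join(p, q):
    """Natural join of two projected relations."""
    joined = set()
    for t1 in p:
        d1 = dict(t1)
        for t2 in q:
            d2 = dict(t2)
            common = set(d1) & set(d2)
            if all(d1[a] == d2[a] for a in common):
                merged = {**d1, **d2}
                joined.add(tuple(sorted(merged.items())))
    return joined

rows = [
    {"StudID": "125/97", "Year": 1, "Dorm": 401},
    {"StudID": "654/95", "Year": 3, "Dorm": 403},
]
original = {tuple(sorted(r.items())) for r in rows}

# Decompose on Year: R1(StudID, Year), R2(Year, Dorm)
r1 = project(rows, ["StudID", "Year"])
r2 = project(rows, ["Year", "Dorm"])
rejoined = natural_join(r1, r2)

print(rejoined == original)  # True: no spurious tuples for this instance
```

A failed check (extra tuples after the join) would show the decomposition generating spurious tuples for that instance.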
The purpose of normalization is to reduce the chances for anomalies to
occur in a database.

4.4. Functional Dependency (FD)

Before moving to the definition and application of normalization, it is important to have an


understanding of "functional dependency."

Data Dependency
The logical associations between data items that point the database designer in the direction of a
good database design are referred to as determinant or dependent relationships.
Two data items A and B are said to be in a determinant or dependent relationship if certain values
of data item B always appear with certain values of data item A. If data item A is the
determinant data item and B the dependent data item, then the direction of the association is from
A to B and not vice versa.

The essence of this idea is that if the existence of something, call it A, implies that B must exist
and have a certain value, then we say that "B is functionally dependent on A." We also often
express this idea by saying that "A functionally determines B," or that "B is a function of A," or
that "A functionally governs B." Often, the notions of functionality and functional dependency are
expressed briefly by the statement, "If A, then B." It is important to note that the value of B must
be unique for a given value of A, i.e., any given value of A must imply just one and only one value
of B, in order for the relationship to qualify for the name "function." (However, this does not
necessarily prevent different values of A from implying the same value of B.)

However, for the purpose of normalization, we are interested in finding 1..1 (one to one)
dependencies, lasting for all times (intension rather than extension of the database), and the
determinant having the minimal number of attributes.
X → Y holds if, whenever two tuples have the same value for X, they must have the same value
for Y.
The notation A → B is read as "B is functionally dependent on A". In general, a functional
dependency is a relationship among attributes: in relational databases, we can have a determinant
that governs one or several other attributes.
FDs are derived from the real-world constraints on the attributes; they are properties of the
database intension, not of its extension.
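As a rough illustration (the relation instance and attribute names below are invented), a functional dependency X → Y can be refuted on a given relation state by looking for two tuples that agree on X but differ on Y — remembering that a state can only refute an FD, never prove it, since FDs are properties of the intension:

```python
# Sketch: check whether X -> Y holds in one relation state.
# A violation in the data proves the FD does not hold; absence of a
# violation in one state does not prove it holds in general.

def fd_holds(rows, x_attrs, y_attrs):
    seen = {}
    for r in rows:
        x = tuple(r[a] for a in x_attrs)
        y = tuple(r[a] for a in y_attrs)
        if x in seen and seen[x] != y:
            return False        # two tuples agree on X but differ on Y
        seen[x] = y
    return True

emp = [
    {"EmpID": 12, "EmpName": "Abebe", "ProjNo": 1, "ProjLoc": "AAU"},
    {"EmpID": 12, "EmpName": "Abebe", "ProjNo": 2, "ProjLoc": "Unity"},
    {"EmpID": 16, "EmpName": "Lemma", "ProjNo": 1, "ProjLoc": "AAU"},
]

print(fd_holds(emp, ["EmpID"], ["EmpName"]))   # True in this state
print(fd_holds(emp, ["EmpID"], ["ProjLoc"]))   # False: EmpID 12 maps to two locations
```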
Partial Dependency
If an attribute which is not a member of the primary key is dependent on some part of the primary
key (if we have composite primary key) then that attribute is partially functionally dependent on
the primary key.
Let {A,B} be the primary key and C a non-key attribute. If {A,B} → C and B → C,
then C is partially functionally dependent on {A,B}.
Full Functional Dependency
If an attribute which is not a member of the primary key is not dependent on some part of the
primary key but the whole key (if we have composite primary key) then that attribute is fully
functionally dependent on the primary key.
Let {A,B} be the primary key and C a non-key attribute. If {A,B} → C holds but neither
B → C nor A → C holds, then C is fully functionally dependent on {A,B}.

Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of the following form: "If A
implies B, and if also B implies C, then A implies C."
Example:
If Mr X is a Human, and if every Human is an Animal, then Mr X must be an Animal.
A generalized way of describing transitive dependency:
If A functionally governs B, AND
if B functionally governs C,
THEN A transitively governs C,
provided that neither B nor C determines A.
In the usual notation: {(A → B) AND (B → C)} ⇒ A → C, provided that neither B nor C determines A.

4.5. Steps of Normalization

We have various levels or steps in normalization called Normal Forms. The level of complexity,
strength of the rule and decomposition increases as we move from one lower level Normal Form
to the higher.
A table in a relational database is said to be in a certain normal form if it satisfies certain
constraints. Each normal form below represents a stronger condition than the previous one.

Normalization towards a logical design consists of the following steps:

 UnNormalized Form (UNF): Identify all data elements.


 First Normal Form (1NF): Find the key with which you can find all data i.e. remove any
repeating group.
 Second Normal Form (2NF): Remove part-key dependencies (partial dependency). Make
all data dependent on the whole key.
 Third Normal Form (3NF): Remove non-key dependencies (transitive dependencies).
Make all data dependent on nothing but the key.
For most practical purposes, databases are considered normalized if they adhere to the third
normal form (there is no transitive dependency).

First Normal Form (1NF)
Requires that all column values in a table are atomic (e.g., a number is an atomic value, while a
list or a set is not).
We have two ways of achieving this:

1. Putting each repeating group into a separate table and connecting them with a primary
key-foreign key relationship.

2. Moving the repeating groups into new rows by repeating the non-repeating attributes,
known as "flattening" the table; then find the key with which you can find all data.
Definition: a table (relation) is in 1NF If:

 There are no duplicated rows in the table. Unique identifier.


 Each cell is single-valued (i.e., there are no repeating groups).
 Entries in a column (attribute, field) are of the same kind.
Example for First Normal form (1NF):

 UNNORMALIZED
EmpID FirstName LastName Skill SkillType School SchoolAdd SkillLevel
12 Abebe Mekuria SQL, Database, AAU, Sidist_Kilo 5
VB6 Programming Helico Piazza 8
16 Lemma Alemu C++ Programming Unity Gerji 6
IP Programming Jimma Jimma 4
City
28 Chane Kebede SQL Database AAU Sidist_Kilo 10
65 Almaz Belay SQL Database Helico Piazza 9
Prolog Programming Jimma Jimma City 8
Java Programming AAU Sidist_Kilo 6

24 Dereje Tamiru Oracle Database Unity Gerji 5


94 Alem Kebede Cisco Networking AAU Sidist_Kilo 7

 FIRST NORMAL FORM (1NF)


Remove all repeating groups: distribute the multi-valued attributes into different rows and identify
a unique identifier for the relation, so that it can be called a relation in a relational database, i.e.
flatten the table.

EmpID FirstName LastName SkillID Skill SkillType School SchoolAdd SkillLevel
12 Abebe Mekuria 1 SQL Database AAU Sidist_Kilo 5
12 Abebe Mekuria 3 VB6 Programming Helico Piazza 8
16 Lemma Alemu 2 C++ Programming Unity Gerji 6
16 Lemma Alemu 7 IP Programming Jimma Jimma 4
City
28 Chane Kebede 1 SQL Database AAU Sidist_Kilo 10
65 Almaz Belay 1 SQL Database Helico Piazza 9
65 Almaz Belay 5 Prolog Programming Jimma Jimma 8
City
65 Almaz Belay 8 Java Programming AAU Sidist_Kilo 6
24 Dereje Tamiru 4 Oracle Database Unity Gerji 5
94 Alem Kebede 6 Cisco Networking AAU Sidist_Kilo 7
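The flattening step above can be sketched in Python (a minimal sketch; the in-memory representation of the repeating groups is an assumption for illustration):

```python
# Sketch: "flattening" a repeating group into 1NF rows.
# Each employee carries a list of skill entries; flattening repeats the
# non-repeating attributes (EmpID, names) once per skill entry.

unnormalized = [
    {"EmpID": 12, "FirstName": "Abebe", "LastName": "Mekuria",
     "Skills": [("SQL", "Database", "AAU", "Sidist_Kilo", 5),
                ("VB6", "Programming", "Helico", "Piazza", 8)]},
    {"EmpID": 28, "FirstName": "Chane", "LastName": "Kebede",
     "Skills": [("SQL", "Database", "AAU", "Sidist_Kilo", 10)]},
]

flat = []
for emp in unnormalized:
    for skill, stype, school, addr, level in emp["Skills"]:
        flat.append({"EmpID": emp["EmpID"],
                     "FirstName": emp["FirstName"],
                     "LastName": emp["LastName"],
                     "Skill": skill, "SkillType": stype,
                     "School": school, "SchoolAdd": addr,
                     "SkillLevel": level})

print(len(flat))   # 3 rows: every cell is now single-valued
```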

Second Normal form (2NF)


No partial dependency of a non-key attribute on part of the primary key. This will result in a set of
relations with a level of Second Normal Form.

Any table that is in 1NF and has a single-attribute (i.e., a non-composite) key is automatically also
in 2NF.

Definition: a table (relation) is in 2NF If


 It is in 1NF and
 If all non-key attributes are dependent on the entire primary key.
i.e. no partial dependency.

Example for Second Normal Form (2NF):

EMP_PROJ

EmpID EmpName ProjNo ProjName ProjLoc ProjFund ProjMangID Incentive

EMP_PROJ rearranged

EmpID ProjNo EmpName ProjName ProjLoc ProjFund ProjMangID Incentive

Business rule: Whenever an employee participates in a project, he/she will be entitled for an
incentive.

This schema is in its 1NF since we don’t have any repeating groups or attributes with multi-valued
property. To convert it to a 2NF we need to remove all partial dependencies of non-key attributes
on part of the primary key.

{EmpID, ProjNo} EmpName, ProjName, ProjLoc, ProjFund, ProjMangID, Incentive


But in addition to this we have the following dependencies

FD1: {EmpID} → EmpName

FD2: {ProjNo} → ProjName, ProjLoc, ProjFund, ProjMangID

FD3: {EmpID, ProjNo} → Incentive
As we can see, some non-key attributes are partially dependent on some part of the primary key.
This can be witnessed by analyzing the first two functional dependencies (FD1 and FD2). Thus,
each functional dependency, with its dependent attributes, should be moved to a new relation
where the determinant will be the primary key.

EMPLOYEE

EmpID EmpName

PROJECT

ProjNo ProjName ProjLoc ProjFund ProjMangID

EMP_PROJ

EmpID ProjNo Incentive
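The decomposition driven by FD1–FD3 can be sketched as projections (a minimal Python sketch; the sample tuples and values are invented for illustration):

```python
# Sketch: removing the partial dependencies FD1 and FD2 by projection.
# Each FD's determinant becomes the key of a new relation; the full-key
# dependency FD3 stays with the composite key {EmpID, ProjNo}.

emp_proj = [
    {"EmpID": 12, "EmpName": "Abebe", "ProjNo": 1,
     "ProjName": "Payroll", "ProjLoc": "AAU", "Incentive": 500},
    {"EmpID": 12, "EmpName": "Abebe", "ProjNo": 2,
     "ProjName": "Library", "ProjLoc": "Unity", "Incentive": 300},
    {"EmpID": 16, "EmpName": "Lemma", "ProjNo": 1,
     "ProjName": "Payroll", "ProjLoc": "AAU", "Incentive": 400},
]

def project(rows, attrs):
    """Projection with duplicate elimination, sorted for readability."""
    return sorted({tuple(r[a] for a in attrs) for r in rows})

employee   = project(emp_proj, ["EmpID", "EmpName"])               # FD1
project_t  = project(emp_proj, ["ProjNo", "ProjName", "ProjLoc"])  # FD2
emp_proj_t = project(emp_proj, ["EmpID", "ProjNo", "Incentive"])   # FD3

print(employee)   # [(12, 'Abebe'), (16, 'Lemma')] -- each employee stored once
```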

Third Normal Form (3NF)


Eliminate columns dependent on another non-primary-key column: if attributes do not contribute to a
description of the key, remove them to a separate table. This level avoids update and delete
anomalies.

Definition: a Table (Relation) is in 3NF If


 It is in 2NF and

 There are no transitive dependencies between a primary key and


non-primary key attributes.
Example for third Normal form (3NF):
Assumption: Students of same batch (same year) live in one building or dormitory

STUDENT

StudID Stud_F_Name Stud_L_Name Dept Year Dormitory


125/97 Abebe Mekuria Info Sc 1 401
654/95 Lemma Alemu Geog 3 403
842/95 Chane Kebede CompSc 3 403
165/97 Alem Kebede InfoSc 1 401
985/95 Almaz Belay Geog 3 403

This schema is in 2NF since the primary key is a single attribute and there are no repeating
groups (multi-valued attributes).
Let’s take StudID, Year and Dormitory and examine the dependencies:
StudID → Year AND Year → Dormitory,
and Year cannot determine StudID, and Dormitory cannot determine StudID.
Then, transitively, StudID → Dormitory.
To convert the schema to 3NF we need to remove all transitive dependencies of non-key attributes
on other non-key attributes.

The non-primary key attributes, dependent on each other will be moved to another table and linked
with the main table using Candidate Key- Foreign Key relationship.

STUDENT

StudID Stud_F_Name Stud_L_Name Dept Year
125/97 Abebe Mekuria InfoSc 1
654/95 Lemma Alemu Geog 3
842/95 Chane Kebede CompSc 3
165/97 Alem Kebede InfoSc 1
985/95 Almaz Belay Geog 3

DORM

Year Dormitory
1 401
3 403
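The same decomposition can be sketched in Python (sample data taken from the STUDENT table above; the in-memory representation is illustrative):

```python
# Sketch: removing the transitive dependency StudID -> Year -> Dormitory
# by moving (Year, Dormitory) into its own DORM relation, linked back to
# STUDENT through the Year attribute.

students = [
    ("125/97", "Abebe", "Mekuria", "InfoSc", 1, 401),
    ("654/95", "Lemma", "Alemu",   "Geog",   3, 403),
    ("842/95", "Chane", "Kebede",  "CompSc", 3, 403),
    ("165/97", "Alem",  "Kebede",  "InfoSc", 1, 401),
    ("985/95", "Almaz", "Belay",   "Geog",   3, 403),
]

# STUDENT keeps Year as a foreign key; DORM holds one row per Year.
student = [(sid, f, l, dept, year) for sid, f, l, dept, year, _ in students]
dorm = sorted({(year, d) for *_, year, d in students})

print(dorm)  # [(1, 401), (3, 403)] -- each dormitory assignment stored once
```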

Generally, even though there are four additional levels of normalization, a table is commonly said to
be normalized if it reaches 3NF. A database with all tables in 3NF is said to be a normalized
database.
A mnemonic for remembering the rationale for normalization up to 3NF is the following:
1. No Repeating or Redundancy: no repeating fields in the table.
2. The Fields Depend Upon the Key: the table should solely depend on the key.
3. The Whole Key: no partial key dependency.
4. And Nothing But the Key: no transitive (inter-data) dependency.
So Help Me Codd: since Codd came up with these rules.

Boyce-Codd Normal Form (BCNF)


BCNF is based on functional dependency that takes in to account all the candidate keys in a
relation.

So, a table is in BCNF if it is in 3NF and if every determinant is a candidate key. Violation of
BCNF is very rare. The potential sources for violation of this rule are:

i. The relation contains two (or more) composite candidate keys;

ii. The candidate keys overlap, i.e. have a common attribute.
The issue is related to:
Isolating Independent Multiple Relationships - No table may contain two or more 1:N or N:M
relationships that are not directly related.

The correct solution, to cause the model to be in fourth normal form, is to ensure that all M:M
relationships are resolved independently if they are indeed independent.

Forth Normal form (4NF)


Isolate Semantically Related Multiple Relationships - There may be practical constrains on
information that justify separating logically related many-to-many relationships.

MVD (Multi-Valued Dependency): a dependency between attributes (for example A, B, and C) in a
relation such that for every value of A there is a set of values for B and a set of
values for C, but the sets of B and C values are independent of each other.
An MVD between attributes A, B, and C in a relation is represented as follows:
A →→ B
A →→ C
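An MVD can be tested on a single relation state along the same lines as an FD (a hedged sketch; the Course–Teacher–Book schema is a standard textbook illustration, and the data below is invented): for each value of A, the (B, C) pairs present must form the full cross product of the B-values and C-values seen with that A.

```python
# Sketch: testing the MVD A ->> B (with C the remaining attribute) on one
# relation state. As with FDs, a state can refute an MVD but not prove it.

from collections import defaultdict
from itertools import product

def mvd_holds(rows, a, b, c):
    groups = defaultdict(set)
    for r in rows:
        groups[r[a]].add((r[b], r[c]))
    for pairs in groups.values():
        bs = {p[0] for p in pairs}
        cs = {p[1] for p in pairs}
        if pairs != set(product(bs, cs)):
            return False         # (B, C) pairs are not independent
    return True

# Course ->> Teacher | Book: teachers and books vary independently.
ctb = [
    {"Course": "DB", "Teacher": "Abebe", "Book": "Elmasri"},
    {"Course": "DB", "Teacher": "Abebe", "Book": "Date"},
    {"Course": "DB", "Teacher": "Lemma", "Book": "Elmasri"},
    {"Course": "DB", "Teacher": "Lemma", "Book": "Date"},
]
print(mvd_holds(ctb, "Course", "Teacher", "Book"))  # True
```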

Def: A table is in 4NF if it is in BCNF and has no (non-trivial) multi-valued dependencies.

Fifth Normal Form (5NF)


Sometimes called Project-Join Normal Form (PJNF), 5NF is based on join dependency.

Join dependency: a property of decomposition that ensures no spurious tuples are generated when
the decomposed relations are rejoined to obtain the original relation.

Def: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and
if every join dependency in the table is a consequence of the candidate keys of the table.

Domain-Key Normal Form (DKNF)


A model free from all modification anomalies.

Def: A table is in DKNF if every constraint on the table is a logical consequence of the definition
of keys and domains.

The underlying ideas in normalization are simple enough. Through normalization we want to
design for our relational database a set of tables that;
(1) Contain all the data necessary for the purposes that the database is to serve,
(2) Have as little redundancy as possible,
(3) Accommodate multiple values for types of data that require them,
(4) Permit efficient updates of the data in the database, and
(5) Avoid the danger of losing data unknowingly

Problems associated with normalization
 Requires data to see the problems
 May reduce performance of the system
 Is time consuming,
 Difficult to design and apply and
 Prone to human error

4.6. Chapter Four Review Questions

1. Discuss attribute semantics as an informal measure of goodness for a relation schema.


2. Discuss insertion, deletion, and modification anomalies. Why are they considered bad?
Illustrate with examples.
3. Why should NULLs in a relation be avoided as much as possible? Discuss the problem of
spurious tuples and how we may prevent it.
4. State the informal guidelines for relation schema design that we discussed.
5. Illustrate how violation of these guidelines may be harmful.
6. What is a functional dependency? What are the possible sources of the information that defines
the functional dependencies that hold among the attributes of a relation schema?
7. Why can we not infer a functional dependency automatically from a particular relation state?
8. What does the term unnormalized relation refer to? How did the normal forms develop
historically from first normal form up to Boyce-Codd normal form?
9. Define first, second, and third normal forms when only primary keys are considered. How do
the general definitions of 2NF and 3NF, which consider all keys of a relation, differ from those
that consider only primary keys?
10. What undesirable dependencies are avoided when a relation is in 2NF?
11. What undesirable dependencies are avoided when a relation is in 3NF?
12. In what way do the generalized definitions of 2NF and 3NF extend the definitions beyond
primary keys?
13. Define Boyce-Codd normal form. How does it differ from 3NF? Why is it considered a
stronger form of 3NF?
14. What is multivalued dependency? When does it arise?
15. Does a relation with two or more columns always have an MVD? Show with an example.
16. Define fourth normal form. When is it violated? When is it typically applicable?
17. Define join dependency and fifth normal form.
18. Why is 5NF also called project-join normal form (PJNF)?
19. Why do practical database designs typically aim for BCNF and not aim for higher normal
forms?

Chapter Five
Record Storage and Primary File Organization

5.1. Introduction

In this chapter physical database design, file organizations, storage management, operations on
file, ordered and unordered indices, multilevel and single level indexes are discussed.

After completing this chapter, the students will be able to:


 Understand file organizations and storage management, and index structure for files
 Identify Operations on File
 Identify Ordered and unordered Indices
 Understand Multilevel and single level indexes

Activity 5.1
 Define the physical database design?
 Discuss the basic steps of physical database design?

We have established that there are three levels of database design:

 Conceptual design: producing a data model which accounts for the relevant entities and
relationships within the target application domain;
 Logical design: ensuring, via normalization procedures and the definition of integrity
rules, that the stored database will be non-redundant and properly connected;
 Physical design: specifying how database records are stored, accessed and related to
ensure adequate performance.
It is considered desirable to keep these three levels quite separate -- one of Codd's requirements
for an RDBMS is that it should maintain logical-physical data independence. The generality of the
relational model means that RDBMSs are potentially less efficient than those based on one of the
older data models where access paths were specified once and for all at the design stage.

However, the relational data model does not preclude (prevent) the use of traditional techniques
for accessing data - it is still essential to exploit (make use of) them to
achieve adequate performance with a database of any size.

We can consider the topic of physical database design from three aspects:

 What techniques for storing and finding data exist?


 Which are implemented within a particular DBMS?
 Which might be selected by the designer for a given application knowing the properties of
the data?
Thus the purpose of physical database design is:
1. How to map the logical database design to a physical database design.
2. How to design base relations for target DBMS.
3. How to design enterprise constraints for target DBMS.
4. How to select appropriate file organizations based on analysis of transactions.
5. When to use secondary indexes to improve performance.
6. How to estimate the size of the database
7. How to design user views
8. How to design security mechanisms to satisfy user requirements.
9. How to design procedures and triggers.
Physical database design is the process of producing a description of the implementation of the
database on secondary storage. Physical design describes the base relation, file organization, and
indexes used to achieve efficient access to the data, and any associated integrity constraints and
security measures.

 Sources of information for the physical design process include global logical data model
and documentation that describes model. Set of normalized relation.
 Logical database design is concerned with the what; physical database design is
concerned with the how.
 The process of producing a description of the implementation of the database on
secondary storage.
 Describes the storage structures and access methods used to achieve efficient access to
the data.

5.2. Operation on Files


Typical file operations include:
 OPEN: Readies the file for access, and associates a pointer that will refer to a
current file record at each point in time.
 FIND: Searches for the first file record that satisfies a certain condition, and
makes it the current file record.

 FINDNEXT: Searches for the next file record (from the current record) that satisfies a
certain condition, and makes it the current file record.
 READ: Reads the current file record into a program variable.
 INSERT: Inserts a new record into the file & makes it the current file record.
 DELETE: Removes the current file record from the file, usually by marking the record to
indicate that it is no longer valid.
 MODIFY: Changes the values of some fields of the current file record.
 CLOSE: Terminates access to the file.
 REORGANIZE: Reorganizes the file records.
- For example, the records marked deleted are physically removed from the file or a new
organization of the file records is created.
 READ_ORDERED: Read the file blocks in order of a specific field of the file.

Files of Unordered Records (Heap Files)


- Files of unordered records are also called a heap or a pile file.
 New records are inserted at the end of the file.
 A linear search through the file records is necessary to search for a record.
- This requires reading and searching half the file blocks on the average, and is hence quite
expensive.
- Record insertion is quite efficient.
- Reading the records in order of a particular field requires sorting the file records.

Files of Ordered Records (Sorted Files)


- File records are kept sorted by the values of an ordering field; such a file is also called a
sequential file.
- Best if records must be retrieved in some order, or only a ‘range’ of records is needed.
- Insertion is expensive: records must be inserted in the correct order.
 It is common to keep a separate unordered overflow (or transaction) file for new
records to improve insertion efficiency; this is periodically merged with the main
ordered file
- A binary search can be used to search for a record on its ordering field value.
 This requires reading and searching about log2 of the number of file blocks, an
improvement over linear search.
- Reading the records in order of the ordering field is quite efficient.
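The log2 behaviour of binary search over sorted blocks can be sketched with a simplified in-memory model (each "block" below is a small sorted list, and we count block reads; block size and keys are illustrative assumptions):

```python
# Sketch: binary search over a file of B sorted blocks, counting block
# reads -- roughly log2(B), versus B/2 on average for a linear scan.

import math

def binary_search_blocks(blocks, key):
    """Each block is a sorted list of record keys; blocks are sorted too."""
    reads, lo, hi = 0, 0, len(blocks) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]
        reads += 1                       # one disk-block read
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            return key in block, reads
    return False, reads

# 1024 blocks of 10 keys each: 0..9, 10..19, ...
blocks = [list(range(b * 10, b * 10 + 10)) for b in range(1024)]
found, reads = binary_search_blocks(blocks, 777)
print(found, reads)   # found with at most ~log2(1024) = 10 block reads
```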

Choose File Organization


The objective here is to determine an efficient file organization for each base relation. File
organizations include Heap, Hash, Indexed Sequential Access Method (ISAM), B+-Tree,
and Clusters. Most DBMSs provide little or no option to select the file organization. However, they
provide the user with an option to select an index for every relation.

5.3. Hashing Techniques
Hashing for disk files is called external hashing. The file blocks are divided into M equal-sized
buckets, numbered bucket 0, bucket 1, …, bucket M-1.

- Typically, a bucket corresponds to one (or a fixed number of) disk block.

One of the file fields is designated to be the hash key of the file. The record with hash key value
K is stored in bucket i, where i=h(K), and h is the hashing function.

- Search is very efficient on the hash key.


- Collisions occur when a new record hashes to a bucket that is already full.
- An overflow file is kept for storing such records.
- Overflow records that hash to each bucket can be linked together.
There are numerous methods for collision resolution, including the following:
a. Open addressing: Proceeding from the occupied position specified by the hash
address, the program checks the subsequent positions in order until an unused (empty)
position is found.
b. Chaining: For this method, various overflow locations are kept, usually by extending
the array with a number of overflow positions. In addition, a pointer field is added to
each record location. A collision is resolved by placing the new record in an unused
overflow location and setting the pointer of the occupied hash address location to the
address of that overflow location.
c. Multiple hashing: The program applies a second hash function if the first results in a
collision. If another collision results, the program uses open addressing or applies a
third hash function and then uses open addressing if necessary.
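Chaining (method b above) can be sketched with a simplified in-memory model (bucket count, bucket capacity, hash function, and record format below are illustrative assumptions):

```python
# Sketch: external hashing with chaining. M buckets, each of fixed
# capacity; records that collide into a full bucket go into an overflow
# list chained to that bucket.

M, CAPACITY = 5, 2

buckets = [[] for _ in range(M)]     # main bucket blocks
overflow = [[] for _ in range(M)]    # chained overflow per bucket

def h(key):
    return key % M                   # a simple illustrative hash function

def insert(key, record):
    i = h(key)
    if len(buckets[i]) < CAPACITY:
        buckets[i].append((key, record))
    else:
        overflow[i].append((key, record))   # bucket full: chain the record

def search(key):
    i = h(key)
    for k, rec in buckets[i] + overflow[i]:
        if k == key:
            return rec
    return None

for k in (3, 8, 13, 18):             # all four keys hash to bucket 3
    insert(k, f"rec{k}")

print(search(13))   # 'rec13' -- found in the overflow chain of bucket 3
print(search(99))   # None
```

A skewed hash function like this one shows why uniform distribution matters: every record landed in one bucket, so half of them went to the overflow chain.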
To reduce overflow records, a hash file is typically kept 70-80% full.

The hash function h should distribute the records uniformly among the buckets. Otherwise,
search time will be increased because many overflow records will exist.
Main disadvantages of static external hashing:

- Fixed number of buckets M is a problem if the number of records in the file grows or
shrinks.
- Ordered access on the hash key is quite inefficient (requires sorting the records)

5.4. Choosing indexes
Index is a data structure that helps us find data quickly. It can be a separate structure or in the
records themselves. Like sorted files, they speed up searches for a subset of records, based on
values in certain (“search key”) fields.

The objective here is to determine whether adding indexes will improve the performance of the
system. One approach is to keep tuples unordered and create as many secondary indexes as
necessary. Another approach is to order tuples in the relation by specifying a primary or clustering
index.
In this case, choose the attribute for ordering or clustering the tuples as:

 Attribute that is used most often for join operations - this makes join operation more
efficient, or
 Attribute that is used most often to access the tuples in a relation in order of that attribute.
If ordering attribute chosen is on the primary key of a relation, index will be a primary index;
otherwise, index will be a clustering index.

Each relation can only have either a primary index or a clustering index. Secondary indexes
provide a mechanism for specifying an additional key for a base relation that can be used to retrieve
data more efficiently.

Overhead involved in maintenance and use of secondary indexes that has to be balanced against
performance improvement gained when retrieving data.

This overhead includes:

 Adding an index record to every secondary index whenever tuple is inserted;


 Updating a secondary index when corresponding tuple is updated;
 Increase in disk space needed to store the secondary index;
 Possible performance degradation during query optimization to consider all secondary
indexes.

Guidelines for Choosing Indexes:


 Do not index small relations.
 Index PK of a relation if it is not a key of the file organization.
 Add secondary index to a FK if it is frequently accessed.
 Add secondary index to any attribute that is heavily used as a secondary key.
 Add secondary index on attributes that are involved in: selection or join criteria; ORDER
BY; GROUP BY; and other operations involving sorting (such as UNION or DISTINCT).

 Add secondary index on attributes involved in built-in functions.
 Add secondary index on attributes that could result in an index only plan
 Avoid indexing an attribute or relation that is frequently updated.
 Avoid indexing an attribute if the query will retrieve a significant proportion of the tuples
in the relation
 Avoid indexing attributes that consist of long character strings.

5.5. Multilevel Indexes


Because a single-level index is an ordered file, we can create a primary index to the index itself.

- In this case, the original index file is called the first-level index and the index to the index
is called the second-level index.

A multi-level index can be created for any type of first-level index (primary, secondary, clustering)
as long as the first-level index consists of more than one disk block. Such a multi-level index is a
form of search tree.

However, insertion and deletion of new index entries is a severe problem because every level of
the index is an ordered file.

If the primary index does not fit in memory, access becomes expensive. Solution: treat the primary
index kept on disk as a sequential file and construct a sparse index on it:

- Outer index – a sparse index of primary index


- Inner index – the primary index file

If even outer index is too large to fit in main memory, yet another level of index can be created,
and so on. Indices at all levels must be updated on insertion or deletion from the file.
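The outer/inner sparse-index idea can be sketched with a small in-memory model (block sizes and keys below are invented for illustration):

```python
# Sketch: a two-level index. The inner (primary) index is sparse over data
# blocks; the outer index is sparse over inner-index blocks. A lookup
# touches one outer entry, one inner block, then one data block.

import bisect

data_blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90], [100, 110, 120]]

# Inner index: (first key of data block, block number), one entry per block.
inner = [(blk[0], i) for i, blk in enumerate(data_blocks)]

# Outer index over inner-index "blocks" of 2 entries each.
inner_blocks = [inner[i:i + 2] for i in range(0, len(inner), 2)]
outer = [(blk[0][0], i) for i, blk in enumerate(inner_blocks)]

def lookup(key):
    # Find the last outer entry whose first key <= key, then repeat inside.
    oi = bisect.bisect_right([k for k, _ in outer], key) - 1
    iblk = inner_blocks[outer[oi][1]]
    ii = bisect.bisect_right([k for k, _ in iblk], key) - 1
    blk = data_blocks[iblk[ii][1]]
    return key in blk

print(lookup(80))   # True
print(lookup(85))   # False
```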

5.6. Dynamic Multilevel Indexes Using B-Trees and B+-Trees


Most multi-level indexes use B-tree or B+-tree data structures because of the insertion and deletion
problem.

- This leaves space in each tree node (disk block) to allow for new index entries.
These data structures are variations of search trees that allow efficient insertion and deletion of
new search values.

- In B-Tree and B+-Tree data structures, each node corresponds to a disk block.
- B-Tree is a type of multilevel index from another standpoint: it's a type of balanced tree.
- In a B-tree, all pointers to data records exist at all levels of the tree.
- Each node is kept between half-full and completely full.

- B+-tree indices are an alternative to indexed-sequential files
- A B+-tree can have fewer levels (or higher capacity of search values) than the corresponding
B-tree.

The advantage of B+-tree index files are:


- It automatically reorganizes itself with small, local changes in the face of insertions and
deletions; reorganization of the entire file is not required to maintain performance.

5.7. Chapter Five Review Questions

1. What is the difference between primary and secondary storage?


2. Why are disks, not tapes, used to store online database files?
3. Define the following terms: disk, disk pack, track, block, cylinder, sector, interblock gap, and
read/write head.
4. Discuss the process of disk initialization.
5. Discuss the techniques for allocating file blocks on disk.
6. What is the difference between a file organization and an access method?
7. What is the difference between static and dynamic files?
8. What are the typical record-at-a-time operations for accessing a file? Which of these depend
on the current file record?
9. Discuss the techniques for record deletion
10. Define the following terms: indexing field, primary key field, clustering field, secondary key
field, block anchor, dense index, and nondense (sparse) index.
11. What are the differences among primary, secondary, and clustering indexes?
12. How do these differences affect the ways in which these indexes are implemented? Which of
the indexes are dense, and which are not?
13. Why can we have at most one primary or clustering index on a file, but several secondary
indexes?
14. How does multilevel indexing improve the efficiency of searching an index file?
15. Explain what alternative choices exist for accessing a file based on multiple search keys.
16. What is partitioned hashing? How does it work? What are its limitations?
17. What is a grid file? What are its advantages and disadvantages?
18. Show an example of constructing a grid array on two attributes on some file.
19. What is a fully inverted file? What is an indexed sequential file?
20. How can hashing be used to construct an index?

Chapter Six
Relational Algebra and Relational Calculus
6.1. Introduction

In this chapter relational query languages, relational algebra, relational calculus, selection,
projection, rename operation, cross-product, set-difference, union, intersection, join operation,
existential and universal quantifiers in relational calculus and the implementation of domain
relational calculus are discussed.

After completing this chapter, the students will be able to:


 Understand the relational query languages.
 Recognize the concept of relational algebra and relational calculus.
 Understand how to implement selection, projection, rename operation, cross-product, set-
difference, union, intersection, and join operations in relational query languages.
 Differentiate the existential and universal quantifiers in relational calculus.
 Understand the implementation of domain relational calculus.

Activity 6.1
 Define relational query languages.
 Discuss the basic operations of relational algebra and relational calculus.

In addition to the structural component of any data model equally important is the manipulation
mechanism. This component of any data model is called the “query language”.
 Query languages allow manipulation and retrieval of data from a database.
 Query languages are not programming languages:
 QLs are not intended to be used for complex calculations.
 QLs support easy, efficient access to large data sets.
 The relational model supports simple, powerful query languages.

Formal Relational Query Languages


 There are varieties of Query languages used by relational DBMS for manipulating
relations.
 Some of them are procedural
 User tells the system exactly what and how to manipulate the data

 Others are non-procedural
 User states what data is needed rather than how it is to be retrieved.

Two mathematical Query Languages form the basis for Relational Query Languages: Relational
Algebra and Relational Calculus:
 Relational algebra is a procedural language: it can be used to tell the DBMS how to
build a new relation from one or more relations in the database.
 Relational calculus is a non-procedural language: it can be used to formulate the
definition of a relation in terms of one or more database relations.
 Formally the relational algebra and relational calculus are equivalent to each other. For
every expression in the algebra, there is an equivalent expression in the calculus.
 Both are non-user friendly languages. They have been used as the basis for other, higher-
level data manipulation languages for relational databases.

A query is applied to relation instances, and the result of a query is also a relation instance.
 Schemas of input relations for a query are fixed.
 The schema for the result of a given query is also fixed! Determined by definition of query
language constructs.

6.2. Relational Algebra

The basic set of operations for the relational model is known as the relational algebra. These
operations enable a user to specify basic retrieval requests.

The result of the retrieval is a new relation, which may have been formed from one or more
relations. The algebra operations thus produce new relations, which can be further manipulated
using operations of the same algebra.

A sequence of relational algebra operations forms a relational algebra expression, whose result
will also be a relation that represents the result of a database query (or retrieval request).

Relational algebra is a theoretical language with operations that work on one or more relations to
define another relation without changing the original relation. The output from one operation can
become the input to another operation (nesting is possible)

There are different basic operations that could be applied on relations on a database based
on the requirement.
a. Selection (σ): selects a subset of rows from a relation.
b. Projection (π): deletes unwanted columns from a relation.
c. Renaming (ρ): assigns a name to an intermediate relation produced by a single operation.
d. Cross-product (×): concatenates each tuple from one relation with every tuple from the other relation.
e. Set-difference (−): tuples in relation R1 but not in relation R2.
f. Union (∪): tuples in relation R1 or in relation R2.
g. Intersection (∩): tuples in both relation R1 and relation R2.
h. Join (⋈): tuples joined from two relations based on a condition. Join and intersection are derivable from the rest.
Using these, we can build up sophisticated database queries.

6.3. Select Operation


Selects the subset of tuples/rows in a relation that satisfy a selection condition. Selection is a unary operator (it is applied to a single relation), and the selection condition is applied to each tuple individually.

 The degree of the resulting relation is the same as that of the original relation, but the cardinality (number of tuples) is less than or equal to that of the original relation.
 The Selection operator is commutative.
 Conditions can be combined using the Boolean operators ∧ (AND), ∨ (OR), and ¬ (NOT).
 No duplicates appear in the result.
 The schema of the result is identical to the schema of the (only) input relation.
 The result relation can be the input for another relational algebra operation (operator composition).
Selection is a filter that keeps only those tuples that satisfy a qualifying condition (those satisfying the condition are selected; the others are discarded).

Notation:

σ<Selection Condition>(<Relation Name>)


Example: Find all employees with skill type Database.

σ<SkillType="Database">(Employee)


This query will extract every tuple, with all its attributes, from the Employee relation where
the SkillType attribute has the value "Database".

The resulting relation will be the following.

EmpID FName LName SkillID Skill SkillType School SchoolAdd SkillLevel


12 Abebe Mekuria 2 SQL Database AAU Sidist_Kilo 5
28 Chane Kebede 2 SQL Database AAU Sidist_Kilo 10
65 Almaz Belay 2 SQL Database Helico Piazza 9
24 Dereje Tamiru 8 Oracle Database Unity Gerji 5

If the query is all employees with a SkillType Database and School Unity the relational algebra
operation and the resulting relation will be as follows.

σ<SkillType="Database" ∧ School="Unity">(Employee)

EmpID FName LName SkillID Skill SkillType School SchoolAdd SkillLevel


24 Dereje Tamiru 8 Oracle Database Unity Gerji 5
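In a general-purpose language, selection is just a filter over the tuples of a relation. A minimal sketch, using a short sample of the Employee relation from the tables above (only a few attributes are kept for brevity):

```python
# Each tuple is a dict; the relation is a list of tuples.
employee = [
    {"EmpID": 12, "FName": "Abebe",  "SkillType": "Database",    "School": "AAU"},
    {"EmpID": 25, "FName": "Abera",  "SkillType": "Programming", "School": "Helico"},
    {"EmpID": 24, "FName": "Dereje", "SkillType": "Database",    "School": "Unity"},
]

def select(relation, condition):
    """sigma<condition>(relation): keep only tuples satisfying the condition."""
    return [t for t in relation if condition(t)]

# sigma<SkillType="Database" AND School="Unity">(Employee)
result = select(employee,
                lambda t: t["SkillType"] == "Database" and t["School"] == "Unity")
print(result)  # only Dereje's tuple qualifies
```

Note that the result has the same schema as the input and never more tuples, exactly as the properties above state.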

6.4. Project Operation


Selects certain attributes while discarding the others from the base relation. PROJECT creates
a vertical partitioning: one part with the needed columns (attributes) containing the results of the
operation, the other containing the discarded columns.

 Deletes attributes that are not in the projection list.


 The schema of the result contains exactly the fields in the projection list, with the same names that
they had in the (only) input relation.
 The projection operator has to eliminate duplicates.
 Note: real systems typically don't do duplicate elimination unless the user explicitly asks
for it.
 If the primary key is in the projection list, duplication will not occur. Duplicate
removal is necessary to ensure that the resulting table is also a relation.
Notation:

π<Selected Attributes>(<Relation Name>)


Example: To display the Name, Skill, and Skill Level of an employee, the query and the resulting
relation will be:

π<FName, LName, Skill, SkillLevel>(Employee)

FName LName Skill SkillLevel
Abebe Mekuria SQL 5
Lemma Alemu C++ 6
Chane Kebede SQL 10
Abera Taye VB6 8
Almaz Belay SQL 9
Dereje Tamiru Oracle 5
Selam Belay Prolog 8
Alem Kebede Cisco 7
Girma Dereje IP 4
Yared Gizaw Java 6

If we want to have the Name, Skill, and Skill Level of an employee with Skill SQL and SkillLevel
greater than 5 the query will be:

π<FName, LName, Skill, SkillLevel>(σ<Skill="SQL" ∧ SkillLevel>5>(Employee))


FName LName Skill SkillLevel
Chane Kebede SQL 10
Almaz Belay SQL 9
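Projection differs from selection in that it must eliminate duplicate tuples to keep the result a relation. A minimal sketch, with a deliberately duplicated row to show the elimination step:

```python
employee = [
    {"FName": "Chane", "LName": "Kebede", "Skill": "SQL", "SkillLevel": 10},
    {"FName": "Almaz", "LName": "Belay",  "Skill": "SQL", "SkillLevel": 9},
    {"FName": "Almaz", "LName": "Belay",  "Skill": "SQL", "SkillLevel": 9},  # duplicate row
]

def project(relation, attrs):
    """pi<attrs>(relation): keep the listed columns and drop duplicate tuples."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:  # duplicate elimination keeps the result a relation
            seen.add(row)
            out.append(row)
    return out

print(project(employee, ["FName", "Skill"]))  # [('Chane', 'SQL'), ('Almaz', 'SQL')]
```

If the projection list contained the primary key, every projected row would already be distinct and the `seen` check would never fire.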

6.5. Rename Operation


We may want to apply several relational algebra operations one after the other. The query could
be written in two different forms:

 Write the operations as a single relational algebra expression by nesting the operations.
 Apply one operation at a time and create intermediate result relations. In the latter case, we
must give names to the relations that hold the intermediate results; this is where the Rename Operation is needed.
If we want the Name, Skill, and Skill Level of employees with salary greater than 1500
who work for department 5, we can write the expression for this query using the two alternatives:

1. A single algebraic expression:


The query can be written as a single nested algebra expression:

π<FName, LName, Skill, Skill_Level>(σ<DeptNo=5 ∧ Salary>1500>(Employee))


2. Using an intermediate relation and the Rename Operation:

Step 1: Result1 ← σ<DeptNo=5 ∧ Salary>1500>(Employee)

Step 2: Result ← π<FName, LName, Skill, Skill_Level>(Result1)
Then Result will be equivalent to the relation we get using the first alternative.

6.6. Set Operations


The three main set operations are the Union, Intersection and Set Difference. The properties of
these set operations are similar with the concept we have in mathematical set theory. The difference
is that, in database context, the elements of each set, which is a Relation in Database, will be tuples.
The set operations are Binary operations which demand the two operand Relations to have type
compatibility feature.

Type Compatibility
Two relations R1 and R2 are said to be Type Compatible if:
a) The operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) have the same
number of attributes, and
b) The domains of corresponding attributes must be compatible; that is,
Dom(Ai)=Dom(Bi) for i=1, 2, ..., n.

A. UNION Operation
The result of this operation, denoted by R ∪ S, is a relation that includes all tuples that are either
in R, in S, or in both R and S. Duplicate tuples are eliminated. The two operands must be "type
compatible".
E.g., RelationOne ∪ RelationTwo

B. INTERSECTION Operation
The result of this operation, denoted by R ∩ S, is a relation that includes all tuples that are in both
R and S. The two operands must be "type compatible"
Eg: RelationOne ∩ RelationTwo

C. Set Difference (or MINUS) Operation


The result of this operation, denoted by R - S, is a relation that includes all tuples that are in R
but not in S. The two operands must be "type compatible"

The resulting relation for R1 ∪ R2, R1 ∩ R2, or R1 − R2 has the same attribute names as the first
operand relation R1 (by convention).

Some Properties of the Set Operators
Notice that both union and intersection are commutative operations; that is,

R ∪ S = S ∪ R, and R ∩ S = S ∩ R

Both union and intersection can be treated as n-ary operations applicable to any number of
relations, as both are associative operations; that is,

R ∪ (S ∪ T) = (R ∪ S) ∪ T, and (R ∩ S) ∩ T = R ∩ (S ∩ T)

The minus operation is not commutative; that is, in general,


R − S ≠ S − R
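Because type-compatible relations are just sets of same-shaped tuples, Python's built-in set type demonstrates all three operations and their properties directly. The sample tuples below are invented for illustration:

```python
# Two type-compatible relations as sets of (FName, Skill) tuples.
r1 = {("Chane", "SQL"), ("Abera", "VB6")}
r2 = {("Chane", "SQL"), ("Almaz", "SQL")}

print(r1 | r2)  # union: duplicates disappear automatically in a set
print(r1 & r2)  # intersection: tuples in both relations
print(r1 - r2)  # difference: tuples in r1 but not r2
print(r2 - r1)  # note: generally different from r1 - r2 (minus is not commutative)
```

Union and intersection here are commutative and associative exactly as the identities above state, while the two difference results differ.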

6.7. CARTESIAN (cross product) Operation


This operation is used to combine tuples from two relations in a combinatorial fashion: every
tuple in relation R is paired with every tuple in relation S.

 In general, the result of R(A1, A2, . . ., An) x S(B1,B2, . . ., Bm) is a relation Q with degree
n + m attributes Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
 Where R has n attributes and S has m attributes.
 The resulting relation Q has one tuple for each combination of tuples—one from R and one
from S.
 Hence, if R has n tuples, and S has m tuples, then | R x S | will have n * m tuples.
Example:
Employee

ID FName LName
123 Abebe Lemma
567 Belay Taye
822 Kefle Kebede
Department

DeptID DeptName MangID


2 Finance 567
3 Personnel 123
Then the Cartesian product between Employee and Dept relations will be of the form:

Employee X Dept:
ID FName LName DeptID DeptName MangID
123 Abebe Lemma 2 Finance 567
123 Abebe Lemma 3 Personnel 123

567 Belay Taye 2 Finance 567
567 Belay Taye 3 Personnel 123
822 Kefle Kebede 2 Finance 567
822 Kefle Kebede 3 Personnel 123
Basically, even though it is very important in query processing, the Cartesian product is not useful
by itself, since it relates every tuple in the first relation with every tuple in the second relation.
Thus, to make use of the Cartesian product, one has to use it with the Selection operation, which
discriminates tuples of a relation by testing whether each satisfies the selection condition.

In our example, to extract employee information about managers of the departments (Managers of
each department), the algebra query and the resulting relation will be.

π<ID, FName, LName, DeptName>(σ<ID=MangID>(Employee × Dept))


ID FName LName DeptName
123 Abebe Lemma Personnel
567 Belay Taye Finance
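The Cartesian-product-then-select pattern above can be sketched directly with `itertools.product`, using the Employee and Department tuples from the example:

```python
from itertools import product

# (ID, FName, LName) and (DeptID, DeptName, MangID), from the tables above.
employee = [(123, "Abebe", "Lemma"), (567, "Belay", "Taye"), (822, "Kefle", "Kebede")]
department = [(2, "Finance", 567), (3, "Personnel", 123)]

# Cartesian product: every Employee tuple paired with every Department tuple.
cross = [e + d for e, d in product(employee, department)]
assert len(cross) == len(employee) * len(department)  # |R x S| = n * m

# Selecting ID = MangID (positions 0 and 5) from the product recovers the managers.
managers = [row for row in cross if row[0] == row[5]]
print(managers)
```

The product alone has 3 × 2 = 6 rows, most of them meaningless pairings; the selection keeps only the two rows where the employee actually manages the department — which is exactly what the JOIN operation in the next section packages into a single step.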

6.8. JOIN Operation


A sequence of Cartesian product followed by select is used quite commonly to identify and select
related tuples from two relations; hence a special operation, called JOIN, was defined. Thus, in the
JOIN operation, the Cartesian product and the Selection operation are used together.

 The JOIN operation is denoted by the ⋈ symbol.


This operation is very important for any relational database with more than a single relation,
because it allows us to process relationships among relations.

The general form of a join operation on two relations R(A1, A2,. . ., An) and S(B1, B2, . . ., Bm)
is:

R ⋈<join condition> S is equivalent to σ<selection condition>(R × S)


 Where <join condition> and <selection condition> are the same
 Where, R and S can be any relation that results from general relational algebra expressions.
Since JOIN is an operation that needs two relations, it is a binary operation.
This type of JOIN is called a THETA JOIN (Θ-JOIN),

where Θ is the comparison operator used in the join condition.


Θ could be { <, ≤, >, ≥, ≠, = }

Example:
Thus in the above example we want to extract employee information about managers of the
departments, the algebra query using the JOIN operation will be.

Employee ⋈<ID=MangID> Dept

A. EQUIJOIN Operation
The most common use of join involves join conditions with equality comparisons only (=). Such
a join, where the only comparison operator used is the equal sign is called an EQUIJOIN. In the
result of an EQUIJOIN we always have one or more pairs of attributes (whose names need not be
identical) that have identical values in every tuple since we used the equality logical operator.

For example, the above JOIN expression is an EQUIJOIN since the logical operator used is the
equal to operator (=).

B. NATURAL JOIN Operation


We have seen that in EQUIJOIN one of each pair of attributes with identical values is extra, a new
operation called natural join was created to get rid of the second (or extra) attribute that we will
have in the result of an EQUIJOIN condition.

The standard definition of natural join requires that the two join attributes, or each pair of
corresponding join attributes, have the same name in both relations. If this is not the case, a
renaming operation on the attributes is applied first.

R1R S represents a natural join between R and S. The degree of R1 is degree of R plus
Degree of S less the number of common attributes.

C. OUTER JOIN Operation


OUTER JOIN is another version of the JOIN operation where non-matching tuples from a relation
are also included in the result, with NULL values for the attributes from the other relation.
There are two major types of OUTER JOIN:
1. RIGHT OUTER JOIN: non-matching tuples from the second
(right) relation are included in the result, with NULL values for the attributes of the first (left) relation.

2. LEFT OUTER JOIN: non-matching tuples from the first (left) relation are included
in the result, with NULL values for the attributes of the second (right) relation.

Notation for Left Outer Join:

R ⟕<join condition> S — theta left outer join

R ⟕ S — natural left outer join

When two relations are joined by a JOIN operator, there could be some tuples in the first relation
not having a matching tuple from the second relation, and the query is interested to display these
non-matching tuples from the first or second relation. Such query is represented by the OUTER
JOIN.

D. SEMIJOIN Operation
SEMI JOIN is another version of the JOIN operation where the resulting Relation will contain
those attributes of only one of the Relations that are related with tuples in the other Relation. The
following notation depicts the inclusion of only the attributes form the first relation (R) in the result
which are actually participating in the relationship.

R ⋉<join condition> S
Aggregate functions and Grouping statements
Some queries may involve aggregate functions (scalar aggregates, like totals in a report, or vector
aggregates, like subtotals in a report).

a) ℱ<AL>(R): scalar aggregate functions on relation R, with AL a list of (<aggregate function>,
<attribute>) pairs.

b) <GA>ℱ<AL>(R): vector aggregate functions on relation R, with AL a list of (<aggregate function>,
<attribute>) pairs and GA a grouping attribute.

Example (a): the number of employees in an organization (assume you have an Employee table).
This is a scalar aggregate:
PR(Num_Employees) ← ℱ<COUNT EmpId>(Employee), where PR = produce relation R.

Example (b): the number of employees in each department of an organization. This is a vector
aggregate:
PR(DeptId, Num_Employees) ← <DeptId>ℱ<COUNT EmpId>(Employee), where PR = produce relation R.
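The scalar/vector distinction is easy to see in code: a scalar aggregate collapses the whole relation to one value, while a vector aggregate produces one value per group. A sketch with invented (EmpId, DeptId) tuples:

```python
from collections import Counter

# Employee tuples: (EmpId, DeptId) -- sample values for illustration.
employee = [(1, 10), (2, 10), (3, 20), (4, 10)]

# Scalar aggregate: COUNT over the whole relation -> a single number.
num_employees = len(employee)

# Vector aggregate: COUNT of EmpId grouped by DeptId -> one count per department.
per_dept = Counter(dept_id for _, dept_id in employee)

print(num_employees)   # 4
print(dict(per_dept))  # {10: 3, 20: 1}
```

`Counter` plays the role of the grouping attribute plus COUNT; other aggregates (SUM, AVG, MAX) would group the same way and fold each group differently.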

6.9. Relational Calculus


A relational calculus expression creates a new relation, which is specified in terms of variables
that range over rows of the stored database relations (in tuple calculus) or over columns of the
stored relations (in domain calculus).
In a calculus expression, there is no order of operations to specify how to retrieve the query result.
A calculus expression specifies only what information the result should contain rather than how
to retrieve it.

In Relational calculus, there is no description of how to evaluate a query; this is the main
distinguishing feature between relational algebra and relational calculus. Relational calculus is
considered to be a nonprocedural language. This differs from relational algebra, where we must
write a sequence of operations to specify a retrieval request; hence relational algebra can be
considered as a procedural way of stating a query.

When applied to relational databases, the calculus is not the differential and integral calculus of
mathematics, but a form of first-order logic, or predicate calculus. A predicate is a truth-valued function with
arguments. When we substitute values for the arguments in the predicate, the function yields an
expression, called a proposition, which can be either true or false.
If a predicate contains a variable, as in 'x is a member of staff', there must be a range for x. When
we substitute some values of this range for x, the proposition may be true; for other values, it may
be false.

If COND is a predicate, then the set of all tuples evaluated to be true for the predicate COND will
be expressed as follows:

{t | COND(t)}
Where t is a tuple variable and COND (t) is a conditional expression involving t. The result of such
a query is the set of all tuples t that satisfy COND (t).

If we have a set of predicates to evaluate for a single query, the predicates can be connected using
∧ (AND), ∨ (OR), and ¬ (NOT).
A relational calculus expression creates a new relation, which is specified in terms of variables
that range over rows of the stored database relations (in tuple calculus) or over columns of the
stored relations (in domain calculus).

Tuple-oriented Relational Calculus


 The tuple relational calculus is based on specifying a number of tuple variables. Each tuple
variable usually ranges over a particular database relation, meaning that the variable may
take as its value any individual tuple from that relation.
 Tuple relational calculus is interested in finding tuples for which a predicate is true for a
relation. Based on use of tuple variables.
 Tuple variable is a variable that ‘ranges over’ a named relation: that is, a variable whose
only permitted values are tuples of the relation.
 If E is a tuple that ranges over a relation employee, then it is represented as EMPLOYEE(E)
i.e. Range of E is EMPLOYEE
 Then to extract all tuples that satisfy a certain condition, we will represent it as all tuples E
such that COND(E) is evaluated to be true.

{E | COND(E)}
The predicates can be connected using the Boolean operators:

∧ (AND), ∨ (OR), and ¬ (NOT)

COND(t) is a formula, and is called a Well-Formed Formula (WFF), if:

 COND is composed of n-ary predicates (a formula composed of n single predicates), and the
predicates are connected by any of the Boolean operators.
 Each atomic predicate is of the form A θ B, where θ ∈ { <, ≤, >, ≥, ≠, = }, which can be
evaluated to either true or false, and A and B are either constants or variables.
 Formulae should be unambiguous and should make sense.

Example (Tuple Relational Calculus)


 Extract all employees whose skill level is greater than or equal to 8

{E | Employee(E) ∧ E.SkillLevel ≥ 8}

EmpID FName LName SkillID Skill SkillType School SchoolAdd SkillLevel


28 Chane Kebede 2 SQL Database AAU Sidist_Kilo 10
25 Abera Taye 6 VB6 Programming Helico Piazza 8
65 Almaz Belay 2 SQL Database Helico Piazza 9
51 Selam Belay 4 Prolog Programming Jimma Jimma City 8
 To find only the EmpId, FName, LName, Skill, and the School where the skill was attended,
for employees with skill level greater than or equal to 8, the tuple relational
calculus expression will be:

{E.EmpId, E.FName, E.LName, E.Skill, E.School | Employee(E) ∧ E.SkillLevel ≥ 8}

EmpID FName LName Skill School


28 Chane Kebede SQL AAU
25 Abera Taye VB6 Helico
65 Almaz Belay SQL Helico
51 Selam Belay Prolog Jimma

E.FName means the value of the First Name (FName) attribute for the tuple E.
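The set-builder notation of tuple calculus maps almost directly onto a comprehension: the expression names what qualifies, not how to scan. A sketch of the query above, over a few sample tuples from the table:

```python
# Tuples of the Employee relation (a sample of the rows shown above).
employee = [
    {"EmpID": 28, "FName": "Chane", "Skill": "SQL",    "SkillLevel": 10},
    {"EmpID": 12, "FName": "Abebe", "Skill": "SQL",    "SkillLevel": 5},
    {"EmpID": 51, "FName": "Selam", "Skill": "Prolog", "SkillLevel": 8},
]

# {E | Employee(E) and E.SkillLevel >= 8} -- a predicate, not a procedure.
result = [e for e in employee if e["SkillLevel"] >= 8]

# Listing attributes before the bar projects them in the same expression:
names = [(e["EmpID"], e["FName"]) for e in employee if e["SkillLevel"] >= 8]
print(names)  # [(28, 'Chane'), (51, 'Selam')]
```

Nothing in the comprehension says which tuple to examine first or what index to use; like the calculus, it only states the membership condition.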

6.10. Quantifiers in Relational Calculus
To tell how many instances the predicate applies to, we can use the two quantifiers in the predicate
logic. One relational calculus expressed using Existential Quantifier can also be expressed using
Universal Quantifier.

1. Existential quantifier  (‘there exists’)


Existential quantifier used in formulae that must be true for at least one instance, such as:

 An employee with skill level greater than or equal to 8 will be:

{E | Employee(E) ∧ (∃E)(E.SkillLevel ≥ 8)}

This means, there exist at least one tuple of the relation employee where the value for the
SkillLevel is greater than or equal to 8.

2. Universal quantifier  (‘for all’)


Universal quantifier is used in statements about every instance, such as:

 An employee with skill level greater than or equal to 8 will be:

{E | Employee(E) ∧ (∀E)(E.SkillLevel ≥ 8)}

This means, for all tuples of relation employee where value for the SkillLevel attribute is greater
than or equal to 8.

Example:

 Let’s say that we have the following Schema (set of Relations)


Employee(EID, FName, LName, EDID) Project(PID, PName, PDID)
Dept(DID, DName, DMangID) WorksOn(WEID, WPID)
To find employees who work on projects controlled by department 5 the query will be:

{E | Employee(E) ∧ (∃P)(Project(P) ∧ (∃w)(WorksOn(w) ∧ P.PDID = 5 ∧ E.EID = w.WEID))}

6.11. Domain Relational Calculus


In tuple relational Calculus, we use variables that range over tuples of a relation, in the case of
domain relational calculus we use variables that range over domain elements (field variables).

 An expression in the domain relational calculus has the following general form:
{<x1, x2, ..., xn> | P(x1, x2, ..., xn, ..., xm)}
where x1, x2, ..., xn are the domain variables and P(x1, x2, ..., xn, ..., xm) is the
formula.

Formulas are of the form R(x1, x2, ..., xn), xi θ xj, or xi θ C, where θ ∈ { <, >, ≤, ≥, =, ≠ },
R is a relation of degree n, each xi is a domain variable, and C is a constant.
If f1 and f2 are formulas, then so are:

f1 ∧ f2, f1 ∨ f2, ¬f1, (∃x)f1, (∀x)f1

 The answer to such a query includes all tuples with attributes (x1, x2, ..., xn) that make
the formula P(x1, x2, ..., xn, ..., xm) true.
 A formula is recursively defined, starting with simple atomic formulas (getting tuples from
relations or making comparisons of values), and building bigger formulas using
the logical connectives; i.e., the predicate P can be a set of formulas combined by Boolean
operators.

Example: Consider the schema of relations in previous example


Query 1: List employees.

{<FName, LName> | (∃EID)(∃EDID)(Employee(EID, FName, LName, EDID))}


Query 2: Find the list of employees who work in the IS department. The domain relational calculus
expression for the query:

{<EID, FName, LName> | (∃EDID)(∃DID)(∃DName)(Employee(EID, FName,

LName, EDID) ∧ Dept(DID, DName, DMangID) ∧ DID = EDID ∧ DName = 'IS')}

Query 3: List the names of employees that do not manage any department.

{<FName, LName> | (∃EID)(∃EDID)(Employee(EID, FName, LName, EDID)

∧ ¬(∃DID)(∃DName)(∃DMangId)(Dept(DID, DName, DMangId) ∧ EID = DMangId))}

6.12. Chapter Six Review Questions

1. List the operations of relational algebra and the purpose of each.


2. What is union compatibility? Why do the UNION, INTERSECTION, and DIFFERENCE
operations require that the relations on which they are applied be union compatible?
3. Discuss some types of queries for which renaming of attributes is necessary in order to specify
the query unambiguously.
4. Discuss the various types of inner join operations. Why is theta join required?
5. What role does the concept of foreign key play when specifying the most common types of
meaningful join operations?
6. What is the FUNCTION operation? For what is it used?
7. How are the OUTER JOIN operations different from the INNER JOIN operations? How is the
OUTER UNION operation different from UNION?
8. In what sense does relational calculus differ from relational algebra, and in what sense are they
similar?
9. How does tuple relational calculus differ from domain relational calculus?
10. Discuss the meanings of the existential quantifier (∃) and the universal quantifier (∀).
11. Define the following terms with respect to the tuple calculus: tuple variable, range relation,
atom, formula, and expression.

Chapter Seven
The SQL Language
7.1. Introduction

In this chapter Structured Query Language, Data Manipulation Language, Data Definition
Language, the syntax to write SQL statements using SQL Server Management Studio, Create
Database Statement, Create Table Statement, Changing the Data type of a column, Adding a
Primary Key while creating a Table, SQL Drop statements, INSERT Command, UPDATE
Command, DELETE Command, Select Command, Order By clause, and Group by clause
commands are discussed.

After completing this chapter, the students will be able to:


 Understand the Structured Query Language.
 Differentiate the Data Manipulation Language and Data Definition Language.
 Understand how to follow the syntax to write SQL statements using SQL Server
Management Studio.
 Create a Database using the SQL Statement
 Create a Table using the SQL Statement

Activity 7.1
 Define the Structured Query Language?
 Discuss the data types and syntax of SQL commands?

7.2. The SQL Language


The name SQL stands for Structured Query Language. It is pronounced "S-Q-L" and can also be
pronounced "sequel".

- SQL is a standard language for accessing and manipulating databases.


- SQL is a computer language designed to get information from data that is stored in a
relational database.
- SQL is a nonprocedural, or declarative, computer language: you describe
what data to retrieve, delete, or insert, rather than how to perform the operation.

What Can SQL do?

SQL can execute queries against a database, create new databases, create new tables in a database,
retrieve data from a database, insert records in a database, update records in a database, delete
records from a database, create stored procedures in a database, create views in a database, and set
permissions on tables, procedures, and views.

7.3. Data Manipulation and Data Definition Language


The SQL can be divided into two parts:

 Data Manipulation Language (DML)


 Data Definition Language (DDL)

1. Data Manipulation Language (DML): is used to retrieve and manipulate data in a relational
database. The query and update commands form the Data Manipulation Language (DML) parts of
SQL are:

 SELECT: Extracts data from a database


 UPDATE: Updates data in a database
 DELETE: Deletes data from a database
 INSERT INTO: Inserts new data into a database

2. Data Definition Language (DDL): A part of SQL permits database tables to be created or
deleted. It also defines indexes (keys), specifies links between tables, and imposes constraints
between tables. The most important DDL statements in SQL are:

 CREATE DATABASE: Creates a new Database.


 ALTER DATABASE: Modifies a Database.
 DROP DATABASE: Deletes a Database.
 CREATE TABLE : Creates a new Table
 ALTER TABLE : Modifies a Table
 DROP TABLE : Deletes a Table
 CREATE INDEX: Creates an index (search key)
 DROP INDEX : Deletes an index

7.4. Writing SQL Statements using SQL Server Management Studio

1. Create Database Statement

Syntax:

- CREATE DATABASE database_name;

Example: Write the SQL program to create a database called University.

- CREATE DATABASE University;

2. Create Table Statement

Syntax:

- CREATE TABLE table_name (field1 datatype, field2 datatype, field3 datatype);

Example: Using SQL Server Create the following tables under the University Database

1. Department (dno, dname)

2. Student (idNo, fname, lname, age, sex, dnumber)

Solution:

CREATE TABLE Department ( DeptId int not null, DeptName varchar(15) not null );

CREATE TABLE Student ( idNo varchar(10) not null, FName varchar(15),LName varchar(15),

Age int, Sex char(1), DeptId int );
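The same DDL can be exercised in any SQL engine. A sketch using Python's built-in sqlite3 module (note this is SQLite, not SQL Server: an in-memory database stands in for the University database, and SQLite's type names differ slightly, e.g. it does not enforce varchar lengths):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

cur.execute("CREATE TABLE Department (DeptId INTEGER NOT NULL, DeptName TEXT NOT NULL)")
cur.execute("""CREATE TABLE Student (
    idNo   TEXT NOT NULL,
    FName  TEXT,
    LName  TEXT,
    Age    INTEGER,
    Sex    TEXT,
    DeptId INTEGER
)""")

# Confirm both tables now exist in the catalog.
cur.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
print([r[0] for r in cur.fetchall()])  # ['Department', 'Student']
```

Querying the catalog (`sqlite_master` here, `sys.tables` in SQL Server) is a quick way to verify that a DDL statement took effect.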

3. SQL Alter Table Syntax

To add a column in a table, use the following Syntax:

ALTER TABLE table_name

ADD column_name datatype

Example:

ALTER TABLE Student

ADD GPA float;

4. Changing the Data type of a column in a Relational Database

To change the data type of a column in a table, use the following Syntax:

SQL Server / MS Access:

ALTER TABLE table_name

ALTER COLUMN column_name datatype

Example:

ALTER TABLE Student

ALTER COLUMN Sex varchar (6)

5. Adding a Primary Key while creating a Table

Example:

CREATE DATABASE HumanResource;

CREATE TABLE Department (DeptId int IDENTITY(1,1) PRIMARY KEY, DeptName


varchar(50) NOT NULL);

6. Adding a Foreign Key while creating a Table

Example:

CREATE DATABASE HumanResource;

CREATE TABLE Employee (EmpId int primary key, Name nvarchar(15), City varchar(25),
DeptId int references Department(DeptId) on update cascade on delete cascade)

OR

CREATE TABLE Managers ( ManId int primary key, Name nvarchar(15), DeptId int null
foreign key (DeptID) references Department(DeptID) on update cascade on delete cascade)

OR

Create table Employee (EmpId int not null, Name nvarchar(15), DeptId int null, City varchar(25),
primary key(EmpId), CONSTRAINT fk foreign key(DeptId) references Department(DeptId))

7. SQL Drop statements

A drop statement is used to remove a Column, Table and an entire database when each of them is
needed to be deleted respectively.

a/ Dropping an existing column

ALTER TABLE table_name

DROP COLUMN column_name

Examples: alter table Person drop column Sex

ALTER TABLE Persons

DROP COLUMN Sex

b/ DROP TABLE Statement

The DROP TABLE statement is used to delete a table.

Syntax: DROP TABLE table_name

Examples: remove Employee table from the database

DROP TABLE Employee

8. Inserting Records (INSERT SQL Command)

Syntax:

USE database_name

INSERT INTO tablename [(first_column,...,last_column)] VALUES (first_value,...,last_value)

Example (Option 1):

USE University

INSERT INTO Student (StudID, FirstName, LastName, Sex) VALUES ('AB101', 'Abebe',
'Kebede', 'Male')

Example (Option 2):

USE University

INSERT INTO Student

VALUES ('AB101', 'Abebe', 'Kebede', 'Male')
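Both insert forms can be demonstrated with Python's sqlite3 module. A sketch under illustrative assumptions (the second student's values are invented sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StudID TEXT, FirstName TEXT, LastName TEXT, Sex TEXT)")

# Option 1: name the target columns explicitly.
conn.execute("INSERT INTO Student (StudID, FirstName, LastName, Sex) "
             "VALUES ('AB101', 'Abebe', 'Kebede', 'Male')")

# Option 2: omit the column list; values must match the column order exactly.
conn.execute("INSERT INTO Student VALUES ('AB102', 'Almaz', 'Tadesse', 'Female')")

rows = conn.execute("SELECT StudID FROM Student ORDER BY StudID").fetchall()
print(rows)  # [('AB101',), ('AB102',)]
```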

9. Updating Records (UPDATE SQL Command)

Syntax:

USE database_name

UPDATE tablename

SET columnname = 'newvalue' [, nextcolumn = 'newvalue2' ...]

WHERE columnname OPERATOR value [AND|OR columname OPERATOR value]

Example:

USE University

UPDATE Student

SET FirstName = 'Almaz', Sex = 'Female' WHERE FirstName = 'Abebe' AND Sex = 'Male'
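The same update can be run and checked with Python's sqlite3 module; a sketch for illustration only (the cursor's `rowcount` reports how many rows the UPDATE matched):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (FirstName TEXT, Sex TEXT)")
conn.execute("INSERT INTO Student VALUES ('Abebe', 'Male')")

# Update the matching row; WHERE limits which rows are changed.
cur = conn.execute("UPDATE Student SET FirstName = 'Almaz', Sex = 'Female' "
                   "WHERE FirstName = 'Abebe' AND Sex = 'Male'")
print(cur.rowcount)  # 1 (one row updated)

row = conn.execute("SELECT FirstName, Sex FROM Student").fetchone()
print(row)  # ('Almaz', 'Female')
```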

10. Deleting Records (DELETE Command)

Syntax:

USE database_name

DELETE [FROM] table_name

[WHERE search_condition]

Example:

USE University

DELETE FROM Student

WHERE FirstName = 'Almaz' AND Sex = 'Female'

- In the above example you can write "DELETE Student" instead of "DELETE FROM Student";
in SQL Server the FROM keyword is optional.
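A DELETE with a WHERE condition removes only the matching rows, which the sketch below demonstrates with Python's sqlite3 module (sample data invented for illustration; note that SQLite, unlike SQL Server, always requires the FROM keyword):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (FirstName TEXT, Sex TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [('Almaz', 'Female'), ('Abebe', 'Male')])

# Only the row matching the WHERE condition is deleted.
conn.execute("DELETE FROM Student WHERE FirstName = 'Almaz' AND Sex = 'Female'")
left = conn.execute("SELECT FirstName FROM Student").fetchall()
print(left)  # [('Abebe',)]
```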
11. Selecting Data from the Database Tables

Syntax for selecting data from SQL Server 2012 databases:

USE database_name

SELECT [ALL | DISTINCT] column1 [, column2] FROM table1 [, table2]

[WHERE conditions] [GROUP BY column-list] [HAVING conditions]

[ORDER BY column-list [ASC | DESC] ]

Example 1:

use AdventureWorksLT2012

SELECT * FROM SalesLT.Customer

use AdventureWorksLT2012

SELECT FirstName, LastName, EmailAddress FROM SalesLT.Customer

use AdventureWorksLT2012

SELECT FirstName + ' ' + LastName AS FULLNAME,EmailAddress

FROM SalesLT.Customer

Comparison Operators used in the WHERE clause are the following:

= Equal

> Greater than

< Less than

>= Greater than or equal

<= Less than or equal

<> or != Not equal to

Examples 2:

Use AdventureWorksLT2012

SELECT FirstName,LastName,EmailAddress

FROM SalesLT.Customer

WHERE LastName = 'Vargas'

Examples 3:

use AdventureWorksLT2012

SELECT FirstName,LastName,EmailAddress

FROM SalesLT.Customer

WHERE CustomerID > 600 AND CustomerID <= 800

Example 4:

use AdventureWorksLT2012

SELECT CustomerID,FirstName,LastName,EmailAddress

FROM SalesLT.Customer

WHERE CustomerID BETWEEN 600 AND 800
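Examples 2 through 4 can be reproduced with Python's sqlite3 module. The sketch below uses invented sample rows; it shows that string literals need quotes ('Vargas'), and that BETWEEN is inclusive at both ends:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerID INTEGER, FirstName TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [(500, 'Sara'), (650, 'Vargas'), (800, 'Kang'), (900, 'Lee')])

# Comparison operators in WHERE: strictly above 600, up to and including 800.
a = conn.execute("SELECT FirstName FROM Customer "
                 "WHERE CustomerID > 600 AND CustomerID <= 800 "
                 "ORDER BY CustomerID").fetchall()

# BETWEEN 600 AND 800 is inclusive on both ends (it would also match 600).
b = conn.execute("SELECT FirstName FROM Customer "
                 "WHERE CustomerID BETWEEN 600 AND 800 "
                 "ORDER BY CustomerID").fetchall()
print(a)  # [('Vargas',), ('Kang',)]
print(b)  # [('Vargas',), ('Kang',)]
```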

12. Order By clause

Syntax:

SELECT column1, column2

FROM list-of-tables

ORDER BY column-list [ASC | DESC];

Example 1:

Use AdventureWorksLT2012

SELECT CustomerID, FirstName, MiddleName, LastName, EmailAddress, CompanyName

FROM SalesLT.Customer

WHERE MiddleName IS NOT NULL

ORDER BY FirstName

Example 2:

Use AdventureWorksLT2012

SELECT ProductID,Name,Color,ListPrice

FROM SalesLT.Product

ORDER BY ListPrice
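ORDER BY defaults to ascending (ASC); adding DESC reverses the sort. A sketch with Python's sqlite3 module and invented product rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Name TEXT, ListPrice REAL)")
conn.executemany("INSERT INTO Product VALUES (?, ?)",
                 [('Chain', 12.0), ('Helmet', 35.0), ('Frame', 9.5)])

# Ascending is the default sort direction; DESC reverses it.
asc = conn.execute("SELECT Name FROM Product ORDER BY ListPrice").fetchall()
desc = conn.execute("SELECT Name FROM Product ORDER BY ListPrice DESC").fetchall()
print(asc)   # [('Frame',), ('Chain',), ('Helmet',)]
print(desc)  # [('Helmet',), ('Chain',), ('Frame',)]
```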

13. Group by clause

The GROUP BY clause works with the SELECT statement to arrange identical data into groups,
typically so that an aggregate function such as COUNT, SUM, or AVG can be applied per group.

Syntax:

SELECT column1[, column2, etc] FROM list-of-tables

GROUP BY column-list;

Example:

Select Color,count(*) from SalesLT.Product group by Color
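The grouping above (count of products per color) can be sketched with Python's sqlite3 module, using invented sample rows for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Name TEXT, Color TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?)",
                 [('Frame', 'Red'), ('Fork', 'Red'), ('Helmet', 'Blue')])

# One output row per distinct Color, with COUNT(*) applied per group.
groups = conn.execute("SELECT Color, COUNT(*) FROM Product "
                      "GROUP BY Color ORDER BY Color").fetchall()
print(groups)  # [('Blue', 1), ('Red', 2)]
```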

7.5. Chapter Seven Review Questions

1. Write the SQL program to create a database called Company.


2. Write the SQL statement to create a table called employee.
3. Write the SQL statement to create a table under the Company database.
4. Write the SQL program to create a table called Student with the columns such as idNo, fname,
lname, age, sex, dNo and use idNo as primary key.
5. Write the SQL program to remove a column called Salary from the table called Employee.
6. Write the SQL program to insert the record '012', 'Taidor', 'Kang' and 'Male' into the
Student table with columns StudID, FirstName, LastName and Sex by using the University
database.
7. Write the SQL program to change the record FirstName = 'Abyot', Age = 20, Sex = 'Male'
to FirstName = 'Hanna', Age = 18, Sex = 'Female' in the Customer table.
8. Write the SQL program to delete the record idNo = '06' from the Patient table.
9. Write the SQL program to retrieve or select data such as FirstName, LastName, Salary and
Tax from the Employee table to identify the employees whose Salary is greater than $2000.
