DBMS Lecture Notes part-1
B. V. Raju College
Database and Database Users
Introduction:
Databases and database technology have had a major impact on the growing use of
computers. Computers are used in business, engineering, medicine, libraries and many
other areas, and in all of them databases play a critical role.
A database is a collection of related data. Data are known facts that can be recorded.
For example, consider the names, phone numbers and addresses of people, which may be
written in an address book or stored on a computer hard disk using software
such as MS Access or Excel.
A database can be of any size and complexity, for example a computerized library. A
database may be generated and maintained manually or it may be computerized.
For example, a library database may be created and maintained manually, or
it may be created and maintained by a group of application programs or by a
database management system (DBMS).
DBMS: A DBMS is a collection of programs that is used to create and
maintain a database.
The DBMS is general purpose software that facilitates the processes of defining,
constructing, manipulating and sharing databases among various users and applications.
Defining: Defining the database involves specifying the data types, structures and constraints
of the data to be stored. This description of the data is also stored in the database; it is called
meta-data.
Constructing: Constructing the database is the process of storing the data on some storage
medium that is controlled by the DBMS.
Manipulating: Manipulating a database means retrieving specific data and updating the
database.
Sharing: Sharing a database allows multiple users and programs to access the database
simultaneously.
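The functions above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: SQLite stands in for the DBMS, and the student table, its columns and its rows are hypothetical. Sharing (simultaneous access by several users) is not shown.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS (assumption for illustration).
conn = sqlite3.connect(":memory:")

# Defining: specify the data types, structure and constraints of the data.
conn.execute(
    "CREATE TABLE student ("
    " student_number INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " major TEXT)"
)

# Constructing: store the data on a medium controlled by the DBMS.
conn.executemany(
    "INSERT INTO student (student_number, name, major) VALUES (?, ?, ?)",
    [(17, "Smith", "CS"), (8, "Brown", "CS")],  # hypothetical rows
)

# Manipulating: update the database, then retrieve specific data.
conn.execute("UPDATE student SET major = 'MATH' WHERE student_number = 8")
cs_students = conn.execute(
    "SELECT name FROM student WHERE major = 'CS'"
).fetchall()

# Meta-data: the description of the table is itself stored in the database.
metadata = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(cs_students)     # students whose major is still CS
print(metadata[0][0])  # table name as recorded in the catalog
```

Here sqlite_master plays the role of the catalog in which the data description is kept.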
Other important functions of a DBMS are protecting the database and maintaining it over a long
period. Protection includes system protection against hardware or software
malfunction and security protection against unauthorized access.
A large database may have a life cycle of many years, so the DBMS must be able
to maintain the database system by allowing the system to change over time.
The database and the DBMS software together are called a database system.
Ex: Consider a university database for maintaining information about students,
courses and grades.
The following diagram shows the database structure.
The database is organized as five files, each of which stores data records of the same type.
a. Student file: stores the data on each student.
b. Course file: stores the data on each course.
c. Section file: stores the data on each section of a course.
d. Grade report file: stores the grades that students receive in the sections.
e. Prerequisite file: stores the prerequisites of each course.
To define the database, we must specify the structure of the records of each file.
A student record includes data to represent the student name, student-number, class and major.
A course record includes data to represent the course-name, course-number, credit-hours
and department.
We must also specify a data type for each data element in a record.
Ex: Student name in the student file is a string of alphabetic characters. Student-number is an
integer. To construct the university database, we store the data for each student, course,
section, grade report and prerequisite as a record in the appropriate file. The records in the various files
may be related.
these specifications as programs, then they test and debug them. System analysts and
programmers are commonly referred to as software developers or software
engineers.
Advantages of using the DBMS:
1. Controlling redundancy: In traditional software development using file processing,
each user group maintains its own files for handling its data processing.
Ex: Consider a college database with two groups of users, the department and the library. In
traditional software development, each group maintains its own files. The department
keeps data on student details and marks, and the library keeps data on students
and books. Both groups keep the same student data in different files.
Storing the same data in multiple files is called redundancy. Redundancy
creates several problems:
(i) The same student data must be entered into multiple files.
(ii) Storage space is wasted.
(iii) Copies of the same data in different files may become inconsistent.
In the database approach, the views of different user groups are integrated
during database design, so that each data item is stored in only one place. This ensures
consistency, saves storage space and can improve the performance of queries.
2. Restricting unauthorized access: When multiple users share a large database, not all
users are authorized to access all the information in the database. For example,
only authorized persons are allowed to access financial data. Some users are
permitted only to retrieve data, and some users are permitted to retrieve and update
data.
A DBMS should provide a security and authorization subsystem, which the DBA uses to
create accounts and to specify account restrictions. The DBMS then enforces these restrictions
automatically.
3. Providing storage structures for efficient query processing: Database systems must
provide capabilities for efficiently executing queries and updates. Because the database is
stored on disk, the DBMS creates special data structures called indexes to speed up the disk search for
the desired records. Indexes are based on tree data structures or hash data
structures, suitably modified for disk search. The query processing module of the DBMS is
responsible for choosing an efficient query execution plan for each query based on the
existing storage structures.
4. Providing backup and recovery: The backup and recovery module of the DBMS is
responsible for recovering the data after a software or hardware failure.
For example, if the computer fails in the middle of an update transaction, the
database is left in an inconsistent state; the recovery system restores the database to the
state it was in before the transaction started executing.
5. Providing multiple user interfaces: The DBMS provides multiple user interfaces:
query languages for casual users, programming language interfaces for application
programmers, forms and command codes for parametric users, and menu driven and
natural language interfaces for standalone users.
6. Representing complex relationships among data: A database contains a variety of
data that are interrelated in many ways.
Ex: The DBMS must have the capability to represent relationships among the data and to
retrieve and update related data easily and efficiently.
7. Enforcing integrity constraints: The DBMS allows constraints to be defined on the data and enforces them. The simplest
type of constraint specifies the data type of each data item.
Ex: the type (such as number) and size of a data item.
The database designer identifies the constraints during the design of the database.
8. Enforcing standards: The DBA defines and enforces standards among the users. Standards can
be defined for names, formats and so on.
9. Reduced application development time: In the database approach, developing a new
application that retrieves data from an existing database takes very little time, whereas designing and
implementing a new database takes more time.
10. Flexibility: The structure of a database may need to change as requirements change. A modern DBMS allows
the structure of the database to be modified without affecting the stored data and the existing
application programs.
11. Economies of scale: The DBMS approach reduces the amount of overlap between
activities in different departments. The organization need not invest in separate powerful
processors and storage devices for each department. This reduces the overall cost of
operation.
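Two of the advantages listed above, storage structures for efficient query processing (item 3) and backup and recovery (item 4), can be sketched with Python's sqlite3 module. This is a minimal sketch under stated assumptions: the account table, the index name idx_account_owner and the simulated failure are all hypothetical, and SQLite's rollback stands in for a full recovery subsystem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acc_no INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "Asha", 500), (2, "Ravi", 300)],  # hypothetical rows
)
conn.commit()


def plan(sql):
    """Return the query plan text that SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)


# Item 3: the same query before and after an index is created.
query = "SELECT * FROM account WHERE owner = 'Ravi'"
plan_before = plan(query)  # no index on owner yet, so the table is scanned
conn.execute("CREATE INDEX idx_account_owner ON account(owner)")
plan_after = plan(query)   # the query processor can now choose the index

# Item 4: a failure in the middle of an update transaction.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE account SET balance = balance - 100 WHERE acc_no = 1")
        raise RuntimeError("simulated failure before the second update")
except RuntimeError:
    pass

# The half-finished update was undone, so the balances are unchanged.
balances = conn.execute("SELECT balance FROM account ORDER BY acc_no").fetchall()
print(plan_before)
print(plan_after)
print(balances)
```

The exact wording of the plan text differs between SQLite versions, so the sketch only prints it.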
2- DATABASE SYSTEMS CONCEPTS AND ARCHITECTURE
In the early days, the whole DBMS package was one tightly integrated system, whereas a modern
DBMS is based on a client-server architecture. In a client-server DBMS architecture, the functionality is divided into
two modules. The client module runs on a personal computer; it runs the
application programs and provides user friendly interfaces such as forms and menus. The
server module handles data storage, access, search and other functions.
Data models, schemas and interfaces: A database system provides different levels of data
abstraction. Data abstraction refers to hiding the details of data organization and storage and
highlighting the essential features of the data. A data model is a collection of concepts that can
be used to describe the structure of a database.
CATEGORIES OF DATA MODELS:
High level or conceptual data model: It provides concepts that are close to the way users perceive data.
Low level or physical data model: It describes the details of how the data is stored in the
computer, such as record formats, record ordering and access paths.
Conceptual data model: It uses concepts such as entities, attributes and relationships. An
entity represents a real world object, such as an employee in the database. An attribute
represents some property of an entity, such as the employee name or salary. A relationship
represents an association between two or more entities. Implementation data models include the
relational, hierarchical and network data models.
SCHEMAS, INSTANCES AND DATABASE STATE:
The description of a database is called the database schema.
Ex: A schema diagram displays some aspects of a schema, i.e., the names of record types and data items and some
types of constraints. Other constraints are not represented in the schema diagram.
The actual data in the database are called instances. The data in the database at a particular
moment in time is called the database state.
THREE SCHEMA ARCHITECTURE: The three schema architecture is as shown below.
Internal level: The internal level has an internal schema, which describes the physical storage
structure of the database, such as record formats, record ordering and access paths.
Conceptual level: The conceptual level has a conceptual schema, which describes entities,
attributes, relationships and constraints.
External level: The external level has external schemas, each of which describes the part of the
database that a particular user group is interested in and hides the rest of the database from that
user group.
The DBMS transforms a request specified on an external schema into a request on the
conceptual schema and then into a request on the
internal schema for processing over the stored database. If the request is a database retrieval,
the data extracted from the stored database must be reformatted to match the user's
external view. This process is called mapping.
DATA INDEPENDENCE:
Data independence means being able to change the schema at one level of a database system
without having to change the schema at the next higher level. There are two types of
data independence.
1. Logical data independence
2. Physical data independence
1. Logical data independence: This is the ability to change the conceptual schema without having to
change the external schemas. Changes to the conceptual schema, such as the addition
or deletion of relationships, must be possible without changing the external schemas.
2. Physical data independence: This is the ability to change the internal (physical) schema without having to
change the conceptual and external schemas. Changes to the physical storage
structure of the database, such as record formats, record ordering and
access paths, must be possible without changing the conceptual and external schemas.
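The two kinds of data independence can be sketched with Python's sqlite3 module. In this minimal sketch a base table plays the role of the conceptual schema, a view plays the role of an external schema and an index plays the role of an access path. The names employee, employee_directory and idx_employee_name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: a base table (hypothetical names and data).
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES ('111', 'Kumar', 40000)")

# External schema: a view that shows one user group only part of the database.
conn.execute("CREATE VIEW employee_directory AS SELECT ssn, name FROM employee")

external_query = "SELECT ssn, name FROM employee_directory ORDER BY ssn"
before = conn.execute(external_query).fetchall()

# Logical data independence: the conceptual schema gains a column,
# but the external view and the query written against it are unchanged.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")
after_logical_change = conn.execute(external_query).fetchall()

# Physical data independence: an access path (index) is added,
# but neither the table definition nor the view has to change.
conn.execute("CREATE INDEX idx_employee_name ON employee(name)")
after_physical_change = conn.execute(external_query).fetchall()

print(before, after_logical_change, after_physical_change)
```

The same external query returns the same rows before and after both changes.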
DBMS LANGUAGES AND INTERFACES
A DBMS provides various languages and interfaces for its users.
DBMS Languages:
Data definition language (DDL): The DDL is used by the DBA and by
database designers to define the conceptual schema and internal schema. The DBMS has
a DDL compiler, which processes DDL statements and stores the schema
description in the DBMS catalog.
Storage definition language (SDL): The SDL is used to specify the internal schema.
View definition language (VDL): The VDL is used to specify user views. In most DBMSs the
DDL is used to define both the conceptual schema and the external schemas.
Data manipulation language (DML): The DML is used to manipulate the
database. The manipulations are inserting, retrieving, updating and deleting data.
There are 2 types of DML.
1. A high level or non-procedural DML: It is used to specify complex database
operations concisely.
2. A low level or procedural DML: It is used to retrieve individual records from the
database and process each one separately.
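The difference between the two types of DML can be sketched in Python with sqlite3. This is only an illustration: SQL is used for the high level style, and a Python loop over a cursor imitates the record-at-a-time style of a procedural DML. The course table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_number TEXT, credit_hours INTEGER)")
conn.executemany(
    "INSERT INTO course VALUES (?, ?)",
    [("CS101", 4), ("MA201", 3), ("CS305", 4)],  # hypothetical rows
)

# High level (non-procedural) style: one statement says WHAT is wanted.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT course_number FROM course"
        " WHERE credit_hours = 4 ORDER BY course_number"
    )
]

# Low level (procedural) style: the program retrieves individual records
# and processes each one separately, spelling out HOW to find the answer.
procedural = []
cursor = conn.execute("SELECT course_number, credit_hours FROM course")
while True:
    record = cursor.fetchone()
    if record is None:
        break
    if record[1] == 4:
        procedural.append(record[0])
procedural.sort()

print(declarative)
print(procedural)
```

Both styles produce the same list of courses; only the way the request is expressed differs.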
DBMS interfaces:
1. Menu based interfaces: These interfaces present the user with lists of options (menus).
The user forms a request by choosing from these options.
2. Form based interfaces: A form based interface displays a form to each user. The user can
fill in all the entries of the form to insert new data, or fill in only certain entries, in
which case the DBMS retrieves matching data for the remaining entries.
3. Graphical user interfaces: A GUI displays a schema to the user in diagrammatic form.
The user can specify a query by manipulating the diagram.
4. Natural language interfaces: A natural language interface has its own schema, as well as a
dictionary of important words, which it uses to interpret the request. If the interpretation is
successful, the interface generates a high level query corresponding to the natural language request.
Otherwise, it asks the user to clarify the request.
5. Speech input and output: The query is given as speech and the answer is returned as speech.
6. Interfaces for the DBA: The database system contains privileged commands that can
be used only by the DBA staff. These commands include creating accounts and granting
authorization.
DATABASE SYSTEM ENVIRONMENT:
The following diagram shows the components of a DBMS.
The diagram is divided into two halves. The upper half shows the various users
and their interfaces. The lower half shows the internals of the DBMS, which are responsible for
storage of data and processing of transactions.
The upper half of the diagram shows interfaces for the DBA staff, for casual
users, for application programmers and for parametric users.
DDL compiler: The DDL compiler processes schema definitions specified in the DDL and stores the
descriptions of the schemas in the DBMS catalog. The catalog contains information such as the
names and sizes of files, names and data types of data items, storage details of each file,
constraints and other information.
Casual users: Casual users, who have an occasional need for information from the database, interact
with it using some form of interactive query interface.
Query compiler: The query compiler parses and analyzes the query, checks the names of data
elements and so on, and compiles the query into an internal form.
Query optimizer: The query optimizer is concerned with the reordering of operations, the elimination of
redundancies and the use of the correct indexes during execution.
Application programmers: They write programs in a high level language such as Java, C or
COBOL, which are submitted to the precompiler. The precompiler extracts the DML commands
from an application program written in the host programming language. These commands
are sent to the DML compiler for compilation into object code. The object code for the
DML commands and the rest of the program are linked, forming a canned transaction whose
executable code includes calls to the runtime database processor.
These canned transactions are useful to parametric users, who simply supply the
parameters to the transactions, so that each run is executed as a separate transaction.
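A canned transaction used by parametric users can be sketched as a Python function over sqlite3. This is a minimal sketch, not how any particular DBMS precompiler works: the account table and the deposit function are hypothetical, and Python plays the role of the host programming language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 500)")  # hypothetical starting data
conn.commit()


def deposit(connection, acc_no, amount):
    """A canned transaction: the DML command is fixed, only the parameters vary."""
    with connection:  # commit on success, roll back on error
        connection.execute(
            "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
            (amount, acc_no),
        )


# A parametric user (for example a bank teller) only supplies the parameters.
# Each call runs as a separate transaction.
deposit(conn, 1, 100)
deposit(conn, 1, 50)

balance = conn.execute("SELECT balance FROM account WHERE acc_no = 1").fetchone()[0]
print(balance)
```

The question marks are parameter placeholders, so the same command can be run repeatedly with different values.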
Lower half: The lower half of the diagram shows the runtime database processor, which
executes:
1. The privileged commands.
2. The executable query plans.
3. The canned transactions with runtime parameters.
Runtime database processor: It works with the system catalog and may update it with statistics.
It also works with the stored data manager, which carries out low level
input/output operations between disk and main memory.
The diagram shows the concurrency control and backup and recovery systems as
separate modules. They are integrated into the working of the runtime database processor
for purposes of transaction management.
CLASSIFICATION OF DATA MODELS:
Data models are classified into 3 types. They are
1. Hierarchical data model
2. Network data model
3. Relational data model
1. Hierarchical data model: The hierarchical data model consists of a set of nested
relationships that are one-to-many and one-to-one associations.
In the hierarchical data model the relations are presented in the form of a tree
structure in which the root segment is
kept at the top and further branches emanate downwards from the root segment. In
this model, the type of association can be one-to-one or one-to-many when read in the
downward direction of the tree; this means that a many-to-one association is not
permitted.
The above conceptual data model can be mapped in either of two ways, as
shown below.
In alternative 1, the student file is kept at the root segment of the tree and the
faculty file is kept at the bottom of the tree. On mapping the conceptual data model
into the hierarchical data model, the following facts are observed.
The association from student to enrollment is one-to-many, so it is mapped
without any modification. The association from enrollment to subject is many-to-one,
which is not permitted in the hierarchical data model. Hence it is modified into a
one-to-one association.
In alternative 1, while mapping the conceptual data model into the hierarchical data
model, the many-to-one associations present at two levels are modified into
one-to-one associations. These modifications increase the data redundancy.
In alternative 2, the faculty file is kept at the root of the tree and the student file is kept
at the bottom of the tree. On mapping the conceptual data model into the
hierarchical data model, the following facts are observed.
1. The association from faculty to subject is one-to-many, so it is mapped without
any modification.
2. The association from subject to enrollment is one-to-many, so it is mapped without any
modification.
3. The association from enrollment to student is many-to-one, which is not permitted in the
hierarchical data model. Hence it is modified into a one-to-one association.
Finally, the alternative that has less redundancy should be selected for implementation.
In alternative 2, only the association between enrollment and student changes, which means
only one association type is modified. Compared with alternative 1, alternative 2
has less redundancy, so it is the one implemented.
2. Network data model: A network data model consists of a set of pairwise
associations between the entities. The network data model was created to improve
database performance, to impose database standards and to represent complex relationships
more effectively than the hierarchical data model.
In the network data model, the database is a collection of records in one-to-many
relationships called sets. Each set relates two entities: one entity is the owner and the other
entity is the member.
Sets (table columns: Set Name, Owner, Member)
3. Relational data model: The relational data model was introduced in 1970 by
E. F. Codd. The foundation of the relational model is a mathematical concept known as a relation.
A relation is composed of intersecting rows and columns. Each row in a relation
represents a tuple. Each column represents an attribute.
The relational data model is implemented through a relational database
management system (RDBMS). Tables are related through the sharing of a common
attribute.
For example, consider the tables Agent and Customer shown below.
Agent
Agent_Code Agent_Name Agent_Address Agent_PhoneNo Agent_Areacode
Customer
Cust_No Cust_name Cust_Address Phone_No Agent_Code
By matching the Agent_Code in the Customer table with the Agent_Code in the
Agent table, we can find the agent details of that customer. The
relationship types possible in a relational data model are one-to-one, one-to-many
and many-to-many.
A relational diagram is a representation of the entities, the attributes within the entities
and the relationships between the entities.
Agent Customer
Agent_Code Cust_No
Agent_Name Cust_name
Agent_Address Cust_Address
Agent_PhoneNo Phone_No
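The matching of Agent_Code between the Customer and Agent tables can be sketched with Python's sqlite3 module. This is a minimal sketch: only some of the columns named above are used, and all rows and the helper function agent_of are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Agent ("
    " Agent_Code INTEGER PRIMARY KEY, Agent_Name TEXT, Agent_PhoneNo TEXT)"
)
conn.execute(
    "CREATE TABLE Customer ("
    " Cust_No INTEGER PRIMARY KEY, Cust_name TEXT,"
    " Agent_Code INTEGER REFERENCES Agent(Agent_Code))"  # the common attribute
)

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Agent VALUES (?, ?, ?)",
    [(501, "Lakshmi", "555-0101"), (502, "Prasad", "555-0102")],
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(1, "Anil", 501), (2, "Bhanu", 502), (3, "Chitra", 501)],
)


def agent_of(cust_no):
    """Find the agent details of a customer by matching Agent_Code."""
    return conn.execute(
        "SELECT a.Agent_Name, a.Agent_PhoneNo"
        " FROM Customer c JOIN Agent a ON c.Agent_Code = a.Agent_Code"
        " WHERE c.Cust_No = ?",
        (cust_no,),
    ).fetchone()


# One agent may serve many customers (a one-to-many relationship).
customers_of_501 = [
    row[0]
    for row in conn.execute(
        "SELECT Cust_name FROM Customer WHERE Agent_Code = 501 ORDER BY Cust_name"
    )
]

print(agent_of(3))
print(customers_of_501)
```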
Users access the centralized system via computer terminals that do not have
processing power and provide only display capabilities.
Some machines would be client sites only (for example, diskless workstations or PCs with disks
that have only client software installed). Other machines would be dedicated servers, and others would have
both client and server functionality. A client is a user machine that provides user
interface capabilities and local processing.
A server is a system containing both hardware and software that can provide services
to the client machines, such as file access, printing or database access.
There are two types of DBMS architecture.
1. Two tier
2. Three tier
1. Two tier client/server architecture for DBMS: In the two tier client/server architecture, the
user interface programs and application programs run on the client side. When
DBMS access is required, the client program
establishes a connection to the DBMS; once the connection is created, the client
program can communicate with the DBMS. A client program can send requests or transactions
using the ODBC API, which are then processed at the server site. Query results are sent
back to the client program.
2. Three tier and n tier architecture: Web applications use a three tier architecture, which
adds an intermediate layer between the client and the database server, as shown below.
1. The intermediate layer or middle tier is sometimes called the application server or web
server.
2. This server plays an intermediate role by storing business rules that are used to
access the data from the database server.
3. The application server improves security by checking the privileges of a client before
sending the request to the database server.
4. The application server accepts requests from the client, processes the requests, sends
database commands to the database server, and then passes the processed data from
the database server back to the client.
Fig(b):
1. The presentation layer displays information to the user. The business logic layer
handles intermediate rules and constraints before data is passed up to the user
or down to the DBMS.
2. The bottom layer includes all data management services. If the bottom layer is split
into 2 layers, this becomes a four tier architecture.
3. The layers between the user and the stored data can be divided into finer components, thereby
giving rise to n-tier architectures, where n may be four or five.
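The division of work among the three tiers can be sketched in a single Python file. This is only an illustration under stated assumptions: in a real system the tiers run on separate machines, whereas here they are plain functions, and the grade table, the PRIVILEGES rule table and the function names are hypothetical.

```python
import sqlite3

# Data tier: the database server (an in-memory SQLite database here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE grade (student TEXT, course TEXT, grade TEXT)")
db.execute("INSERT INTO grade VALUES ('Smith', 'CS101', 'A')")  # hypothetical row

# Middle tier: the application server, which stores the business rules.
PRIVILEGES = {"teacher": {"read_grades"}, "guest": set()}  # hypothetical rules


def get_grades(role, student):
    """Check the client's privileges, then send a command to the data tier."""
    if "read_grades" not in PRIVILEGES.get(role, set()):
        raise PermissionError(f"role {role!r} may not read grades")
    rows = db.execute(
        "SELECT course, grade FROM grade WHERE student = ?", (student,)
    )
    return rows.fetchall()


# Presentation tier: the client formats the data for display.
def show_grades(role, student):
    try:
        rows = get_grades(role, student)
    except PermissionError as err:
        return f"Access denied: {err}"
    return "\n".join(f"{course}: {grade}" for course, grade in rows)


teacher_view = show_grades("teacher", "Smith")
guest_view = show_grades("guest", "Smith")
print(teacher_view)
print(guest_view)
```

The client never talks to the database directly; every request passes through the middle tier, which is where the privilege check takes place.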
1. Requirements collection and analysis: The database designers interview the database
users to understand and document their data requirements. The result of this step is a
set of user requirements.
In parallel with specifying the data requirements, it is useful to specify the functional
requirements, which consist of the user defined
operations that will be applied to the database.
2. Conceptual design: This step is used to design the conceptual schema. The conceptual schema
description consists of the data requirements of the users and descriptions of the entity types,
relationships and constraints. These are expressed using a high level data model. The conceptual
schema may need to be modified if some functional requirements cannot be specified using the initial schema.
3. Logical design: This is the implementation of the database using a commercial DBMS.
Commercial DBMSs use an implementation data model such as the
relational or object relational data model, so the conceptual schema is
transformed from the high level data model into the implementation data model. This step is
called logical design.
4. Physical design: Physical design is the last step of database design, in which the
internal storage structures, indexes, access paths and file organization for the database
files are specified. In parallel with these activities, application programs are designed
and implemented as database transactions corresponding to the high level transaction
specifications.
ER DATA MODELS OR ER DIAGRAM:
The ER model describes entities, attributes and relationships.
1. Entity: An entity is a real world object. In the ER model the basic object is an
entity. An entity type is represented by a rectangle.
Ex: The attributes of the EMPLOYEE entity are emp-name, age, address, salary and job.
Several types of attributes in the ER model: There are 4 types of attributes in the ER model.
a) Simple versus composite
b) Single valued versus multivalued
c) Stored versus derived
d) Null values
a) Simple attributes: Attributes that are not divisible are called simple attributes.
They are represented by an oval.
b) Single valued versus multivalued attributes:
Single valued attributes: Attributes that have a single value for a particular entity are called
single valued attributes.
Ex: SSN.
Multivalued attributes: Attributes that can have a set of values for the same entity
are called multivalued attributes. They are represented by a double oval.
d) Null values for an attribute: A null value is used when we do not know the value of an
attribute for a particular entity.
Entity set: The collection of all entities of a particular entity type is called an entity
set.
Ex: EMPLOYEE.
Key attribute of an entity type: An entity type usually has an attribute whose values are distinct
for each individual entity in the entity set. Such an attribute is called a key attribute.
A key attribute has its name underlined inside the oval.
Value sets of attributes: Each attribute is associated with a value set.
Ex: If employee ages are allowed to be between 16 and 70, we can specify the value set of the
age attribute of EMPLOYEE to be the set of integers between 16 and 70.
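A value set such as the age range above can be enforced by the DBMS as a constraint. The following is a minimal sketch using Python's sqlite3 module and a CHECK constraint; the employee table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " ssn TEXT PRIMARY KEY,"
    " age INTEGER CHECK (age BETWEEN 16 AND 70))"  # value set of the age attribute
)

conn.execute("INSERT INTO employee VALUES ('111', 30)")  # inside the value set

try:
    conn.execute("INSERT INTO employee VALUES ('222', 12)")  # outside the value set
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored_ages = [row[0] for row in conn.execute("SELECT age FROM employee")]
print(rejected)
print(stored_ages)
```

The second insert is refused by the DBMS itself, so no application program has to repeat the check.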
Initial conceptual design of the company database:
1. An entity type DEPARTMENT with attributes Name, Number, Location, Manager and
Manager-start-date.
Key attributes – Name, Number.
Multi-valued attributes – Location.
2. An entity type PROJECT with attributes Number, Name, Location and Controlling-department.
Key attributes – Number, Name.
3. An entity type EMPLOYEE with attributes SSN, Name, Gender, Salary, Birthdate,
Supervisor, Address, Department and Works-on.
Key attributes – SSN.
Multi-valued attributes – Works-on.
Composite attributes – Name, Address.
4. An entity type DEPENDENT with attributes Dependent-name, Gender, Birth_date and
Relationship.
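The attribute types used in this initial design can be sketched with Python dataclasses. This is only an illustration, not a database definition: a nested class stands for a composite attribute and a list stands for a multi-valued attribute. The parts of Name (first, last) and all sample values are assumptions, and only some of the attributes are shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Name:
    """Composite attribute: made up of smaller parts (parts are assumed here)."""
    first: str
    last: str


@dataclass
class Department:
    number: int  # key attribute
    name: str    # key attribute
    locations: List[str] = field(default_factory=list)  # multi-valued attribute


@dataclass
class Employee:
    ssn: str     # key attribute
    name: Name   # composite attribute
    salary: int
    works_on: List[str] = field(default_factory=list)   # multi-valued attribute


# Hypothetical sample entities.
research = Department(number=5, name="Research", locations=["Hyderabad", "Bhimavaram"])
emp = Employee(
    ssn="123456789",
    name=Name("Ravi", "Varma"),
    salary=30000,
    works_on=["ProductX", "ProductY"],
)

print(research.locations)
print(emp.name.last, emp.works_on)
```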
RELATIONSHIP: Whenever an attribute of one entity type refers to another entity type,
some relationship exists.
Ex: The attribute Manager of DEPARTMENT refers to an employee who manages the
department. In the ER model these references should not be represented as attributes but as
relationships.
Relationship types, sets and instances:
A relationship type R among n entity types E1, E2, ..., En defines a set of associations, or a
relationship set, among entities from these entity types.
The relationship set R is a set of relationship instances ri, where each ri associates n
individual entities (e1, e2, ..., en) and each entity ej in ri is a member of entity type Ej.
A relationship type is represented by a diamond.
Ex: The relationship type WORKS_FOR between the two entity types EMPLOYEE and
DEPARTMENT associates each employee with the department for which the
employee works.
Each relationship instance in WORKS_FOR associates one EMPLOYEE entity
and one DEPARTMENT entity; each relationship instance ri is connected to the
EMPLOYEE and DEPARTMENT entities that participate in it.
RELATIONSHIP DEGREE, ROLE NAMES AND RECURSIVE RELATIONSHIPS:
Relationship degree: The degree of a relationship type is the number of entity types participating in the
relationship. Hence the WORKS_FOR relationship is of degree two. A relationship type of degree two
is called binary.
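A relationship set and its degree can be sketched with plain Python sets and tuples. This is a minimal sketch: the entity identifiers (e1, d1 and so on) and the helper functions are hypothetical.

```python
# Entity sets, with each entity identified by a simple key value (hypothetical).
EMPLOYEE = {"e1", "e2", "e3"}
DEPARTMENT = {"d1", "d2"}

# The relationship set WORKS_FOR: each instance associates one EMPLOYEE
# entity with one DEPARTMENT entity.
WORKS_FOR = {("e1", "d1"), ("e2", "d1"), ("e3", "d2")}


def is_relationship_set(instances, *entity_types):
    """Each instance must take exactly one member from each participating entity type."""
    return all(
        len(r) == len(entity_types)
        and all(e in members for e, members in zip(r, entity_types))
        for r in instances
    )


def degree(*entity_types):
    """The degree is the number of participating entity types."""
    return len(entity_types)


valid = is_relationship_set(WORKS_FOR, EMPLOYEE, DEPARTMENT)
invalid = is_relationship_set({("e1", "d9")}, EMPLOYEE, DEPARTMENT)  # d9 is no department
works_for_degree = degree(EMPLOYEE, DEPARTMENT)

print(valid, invalid, works_for_degree)
```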
The EMPLOYEE entity type participates twice in SUPERVISION, once in the role of
supervisor and once in the role of supervisee. Each relationship instance ri in SUPERVISION
associates two employee entities ej and ek, one of which plays the role of supervisor and the other
the role of supervisee. In the above diagram, the lines marked 1 represent the supervisor role
and those marked 2 represent the supervisee role.
Cardinality ratios for binary relationships:
Cardinality ratio: The cardinality ratio for a binary relationship specifies the maximum number of
relationship instances that an entity can participate in.
For example, in the WORKS_FOR relationship type, DEPARTMENT:EMPLOYEE is of cardinality ratio 1:N,
meaning that each department can be related to any number of employees, but an employee
can be related to only one department. The possible cardinality ratios for binary relationship types
are 1:1, 1:N, N:1 and M:N.
An example of a 1:1 binary relationship is MANAGES.
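The four cardinality ratios can be illustrated by counting how often each entity appears in a set of relationship instances. Note the assumption: a cardinality ratio is a constraint stated by the designer, while this sketch only reports the ratio observed in one database state. The function names and the sample instances are hypothetical.

```python
from collections import Counter


def max_participation(instances, position):
    """Largest number of instances that any single entity at this position appears in."""
    counts = Counter(r[position] for r in instances)
    return max(counts.values(), default=0)


def observed_ratio(instances):
    """Classify pairs (a, b) as 1:1, 1:N, N:1 or M:N, read as A:B."""
    left_repeats = max_participation(instances, 0) > 1   # one a with many b
    right_repeats = max_participation(instances, 1) > 1  # one b with many a
    if left_repeats and right_repeats:
        return "M:N"
    if left_repeats:
        return "1:N"
    if right_repeats:
        return "N:1"
    return "1:1"


# DEPARTMENT:EMPLOYEE pairs for WORKS_FOR (hypothetical instances).
works_for = {("d1", "e1"), ("d1", "e2"), ("d2", "e3")}
# EMPLOYEE:DEPARTMENT pairs for MANAGES.
manages = {("e1", "d1"), ("e3", "d2")}
# EMPLOYEE:PROJECT pairs for WORKS_ON.
works_on = {("e1", "p1"), ("e1", "p2"), ("e2", "p1")}

print(observed_ratio(works_for), observed_ratio(manages), observed_ratio(works_on))
```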
Attributes of relationship types: Relationship types can also have attributes, similar to those of
entity types.
Weak entity types: Entity types that do not have key attributes of their own are called weak entity types.
Entity types that do have a key attribute are called strong entity types. Entities of a weak entity type are
identified by being related to specific entities from another entity type in combination with
one of their attribute values. We call this other entity type the owner entity type, and we call
the relationship type that relates a weak entity type to its owner the identifying
relationship of the weak entity type.
A weak entity type always has total participation in its identifying relationship.
Ex: Consider the entity type DEPENDENT, related to EMPLOYEE, which is used to keep track
of the dependents of each employee via a 1:N relationship.
A weak entity type normally has a partial key, which is the set of attributes that can
uniquely identify weak entities that are related to the same owner entity.
Ex: If no two dependents of the same employee have the same name, the attribute
Name of DEPENDENT is the partial key.
In an ER diagram, both a weak entity type and its identifying relationship are distinguished by
surrounding their boxes and diamonds with double lines.
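One common way to hold a weak entity type in a relational DBMS is a table whose key combines the owner's key with the partial key. The following is a minimal sketch with Python's sqlite3 module; the table and column names (essn, name) and the rows are hypothetical, and this mapping is an illustration rather than part of the ER model itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE dependent ("
    " essn TEXT NOT NULL REFERENCES employee(ssn),"  # owner entity, always required
    " name TEXT NOT NULL,"                           # partial key
    " relationship TEXT,"
    " PRIMARY KEY (essn, name))"                     # owner key plus partial key
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)", [("111", "Kumar"), ("222", "Devi")]
)
conn.executemany(
    "INSERT INTO dependent VALUES (?, ?, ?)",
    [("111", "Anu", "daughter"), ("222", "Anu", "sister")],  # same name, different owners
)


def try_insert(row):
    """Return True if the DBMS accepts the dependent row, False if it is rejected."""
    try:
        conn.execute("INSERT INTO dependent VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


same_name_same_owner = try_insert(("111", "Anu", "son"))  # repeats the key (essn, name)
dependent_without_owner = try_insert(("999", "Raj", "son"))  # no employee with ssn 999
dependent_count = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]

print(same_name_same_owner, dependent_without_owner, dependent_count)
```

Two dependents may share a name as long as they belong to different employees, which is exactly what a partial key allows.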
Symbols used in ER:
RULES FOR ER DIAGRAM: The database designers provide the following description of the
company database after the requirements collection and analysis phase.
1. The company is organized into departments. Each department has a unique name, a
unique number and a particular employee who manages the department. We keep track of the start
date when that employee began managing the department.
2. A department controls a number of projects. Each project has a unique name, a unique number
and a single location.
3. We store each employee's name, social security number, address, salary, gender and birth
date. An employee is assigned to one department but may work on several projects. We
keep track of the number of hours per week that an employee works on each project. We
also keep track of the direct supervisor of each employee.
4. We want to keep track of the dependents of each employee for insurance purposes.
We keep each dependent's first name, gender, birth date and relationship to the
employee.
The entity types are DEPARTMENT, PROJECT, DEPENDENT and EMPLOYEE.
Relationship types:
MANAGES – EMPLOYEE – DEPARTMENT(1:1)
WORKS_FOR – EMPLOYEE –DEPARTMENT(M:1)
CONTROLS – DEPARTMENT – PROJECT(1:M)
SUPERVISION – EMPLOYEE(1:M)
WORKS_ON – EMPLOYEE – PROJECT(M:M)
DEPENDS_ON – EMPLOYEE – DEPENDENT(1:M)
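An M:M relationship type with an attribute, such as WORKS_ON with the hours worked per week, is commonly held in a relational DBMS as a separate table that references both participating entity types. The following is a minimal sketch with Python's sqlite3 module; the column names and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE project (pnumber INTEGER PRIMARY KEY, pname TEXT);
    -- WORKS_ON: one row per relationship instance, with its own attribute.
    CREATE TABLE works_on (
        essn  TEXT    REFERENCES employee(ssn),
        pno   INTEGER REFERENCES project(pnumber),
        hours REAL,                  -- attribute of the relationship type
        PRIMARY KEY (essn, pno)
    );
    """
)
conn.executemany("INSERT INTO employee VALUES (?, ?)", [("111", "Kumar"), ("222", "Devi")])
conn.executemany("INSERT INTO project VALUES (?, ?)", [(1, "ProductX"), (2, "ProductY")])
conn.executemany(
    "INSERT INTO works_on VALUES (?, ?, ?)",
    [("111", 1, 20), ("111", 2, 20), ("222", 1, 10)],
)

# Total hours per week for each employee across all projects.
hours_per_employee = conn.execute(
    "SELECT e.name, SUM(w.hours)"
    " FROM employee e JOIN works_on w ON w.essn = e.ssn"
    " GROUP BY e.ssn ORDER BY e.name"
).fetchall()

print(hours_per_employee)
```

An employee appears in several rows and so does a project, which is what makes the relationship M:M.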
UML CLASS DIAGRAM: In a UML class diagram, a class is displayed as a box.
Example of other notations:
UML class diagram: The UML class diagram can be considered an alternative notation to
ER diagrams. An entity in ER corresponds to an object in UML.
The following is the UML class diagram for the company conceptual schema.
In a class diagram, a class is displayed as a box that includes three sections. The top
section gives the class name, the middle section includes the attributes and the last section includes the
operations that can be applied to these objects.
Consider the EMPLOYEE class; its attributes are Name, SSN, Bdate, Address and
Salary. The designer can optionally specify the domain of an attribute by placing a colon (:)
followed by the domain name or description.
A multivalued attribute will generally be modeled as a separate class.
Ex: LOCATION.
Relationship types are called associations in UML terminology and relationship instances
are called links. A binary association is represented as a line connecting the participating
classes and may optionally have a name. A relationship attribute, called a link attribute, is
placed in a box that is connected to the association line by a dashed line.
The (min, max) notation is used to specify relationship constraints, which are called
multiplicities in UML terminology. Multiplicities are specified in the form min..max, and an
asterisk (*) indicates no maximum limit on participation. In UML, a single asterisk indicates a
multiplicity of 0..* and a single 1 indicates a multiplicity of 1..1.
In UML there are 2 types of relationships. They are association and aggregation.
Aggregation is meant to represent a relationship between a whole object and its
component parts and it has a distinct diagrammatic notation. We modeled the locations of
a department and the single location of a project as aggregations. UML also distinguishes
between unidirectional and bidirectional associations.
In the unidirectional case, the line connecting the classes is displayed with an arrow to
indicate that only one direction for accessing related objects is needed. If no arrow is
displayed, the bidirectional case is assumed, which is the default. The operations given in
each class are derived from the functional requirements of the application.
Weak entities can be modeled using the construct called qualified association. In UML this
represents both the identifying relationship and the partial key, which is placed in a box
attached to the owner class.
ENHANCED ENTITY RELATIONSHIP MODEL
Enhanced entity relationship model: The databases for engineering design and
manufacturing (CAD), telecommunication and complex software systems have more
complex requirements than the traditional applications. This led to the development of
additional semantic data modeling concepts that were incorporated into conceptual
data models such as the ER model.
The additional semantic data modeling concepts are
1. Sub classes, Super classes and Inheritance.
2. Specialization and generalization.
3. Various types of constraints on specialization and generalization.
4. Union types (categories).
1. Sub classes, Super classes and Inheritance: An entity type often has numerous sub
groupings of its entities that are meaningful and need to be represented explicitly.
For example, the entity type EMPLOYEE may be grouped further into SECRETARY,
ENGINEER, TECHNICIAN, SALARIED_EMPLOYEE, HOURLY_EMPLOYEE and so on.
Each of these sub groupings is called a sub class of the EMPLOYEE entity type and the
EMPLOYEE entity type is called the super class for each of these sub classes. The
following diagram shows how to represent these concepts diagrammatically in an EER
diagram.
The relationship between a super class and any one of its sub classes is called a
super class/sub class or class/sub class relationship.
For example, EMPLOYEE/SECRETARY and EMPLOYEE/TECHNICIAN are two
class/sub class relationships.
An entity that is a member of the sub class inherits all the attributes of the
entity as a member of the super class. The entity also inherits all the relationships in
which the super class participates.
2. Specialization and generalization:
Specialization: It is the process of defining a set of sub classes of an entity type.
This entity type is called the super class of the specialization.
For example, the set of sub classes {SECRETARY, ENGINEER, TECHNICIAN} is a
specialization of the super class EMPLOYEE that distinguishes among EMPLOYEE
entities based on the job type of each employee.
The following diagram shows a few entity instances that belong to sub classes
of the {SECRETARY, ENGINEER, TECHNICIAN} specialization.
In a class/sub class relationship, the entity in the sub class is the same as the entity in the
super class but is playing a specialized role.
For example, an EMPLOYEE specialized in the role of SECRETARY or an EMPLOYEE
specialized in the role of TECHNICIAN.
The specialization process allows us to do the following.
1. Define a set of sub classes of an entity type.
2. Establish additional specific attributes with each sub class.
3. Establish additional specific relationship types between each sub class and other
entity types.
Generalization: In Generalization we suppress the differences among several entity types,
identify their common features and generalize them into a single super class of which the
original entity types are special sub classes. Generalization refers to the process of defining
a generalized entity type from the given entity types.
For example, consider the entity types CAR and TRUCK. Because they have several
common attributes, they can be generalized into the entity type VEHICLE. Both CAR and
TRUCK are now sub classes of the generalized super class VEHICLE.
Constraints on Specialization and Generalization: When a specialization consists of a no. of
sub classes, we use the circle notation. When a specialization consists of a single sub class,
we do not use the circle notation. In some specializations we can determine
exactly the entities that will become members of each sub class by placing a condition on
the value of some attribute of the super class. Such sub classes are called predicate defined
sub classes.
For example, if the EMPLOYEE entity type has an attribute job-type, we can specify
membership in the SECRETARY sub class by the condition (job-type=’secretary’). This condition
is a constraint specifying that exactly those EMPLOYEE entities whose job-type value is
’secretary’ belong to the sub class.
If all the sub classes in a specialization have their condition on the same attribute of the
super class, the specialization is called an “attribute defined specialization” and the
attribute is called the “defining attribute of the specialization.”
Two other constraints may apply to a specialization. They are the disjointness constraint
and the completeness constraint.
Disjointness Constraint: It specifies that the sub classes of the specialization must be
disjoint, i.e. an entity can be a member of at most one of the sub classes of the
specialization. This is displayed by placing a ‘d’ in the circle. If the sub classes are not
constrained to be disjoint, they may overlap, i.e. an entity may be a member of
more than one sub class of the specialization. This is displayed by placing an ‘o’ in the circle.
Completeness constraint: The completeness constraint may be total or partial. A total
specialization constraint specifies that every entity in the super class must be a member of
at least one sub class in the specialization.
For example, if every EMPLOYEE must be either an HOURLY_EMPLOYEE or a
SALARIED_EMPLOYEE then the specialization is a total specialization of EMPLOYEE.
The total specialization is represented in the EER diagram by using a double line to connect
the super class to the circle. A single line is used to display a partial specialization, which
allows an entity not to belong to any of the sub classes.
For example, if some EMPLOYEE entities do not belong to any of the sub classes
{SECRETARY, ENGINEER, TECHNICIAN} then that specialization is partial.
The disjointness and completeness are independent. Hence we have the following 4
possible constraints on specialization.
1. Disjoint, total
2. Disjoint, partial
3. Overlapping, total
4. Overlapping, partial.
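The four combinations can be checked mechanically on a small example. The following is only a sketch in Python: the function name classify_specialization and the entity identifiers e1..e4 are made up for illustration, and the sets stand in for the entities that currently belong to each class.

```python
def classify_specialization(superclass, subclasses):
    """Return (disjoint|overlapping, total|partial) for one specialization.

    superclass: set of entity identifiers in the super class.
    subclasses: dict mapping a sub class name to its set of entity identifiers.
    """
    seen = set()
    disjoint = True
    for members in subclasses.values():
        # An entity already seen in another sub class breaks disjointness.
        if seen & members:
            disjoint = False
        seen |= members
    # Total: every super class entity is in at least one sub class.
    total = superclass <= seen
    return ("disjoint" if disjoint else "overlapping",
            "total" if total else "partial")


# Hypothetical entities: e4 has no job-type sub class.
employees = {"e1", "e2", "e3", "e4"}
by_job = {"SECRETARY": {"e1"}, "ENGINEER": {"e2"}, "TECHNICIAN": {"e3"}}
by_pay = {"HOURLY_EMPLOYEE": {"e1", "e3"}, "SALARIED_EMPLOYEE": {"e2", "e4"}}

print(classify_specialization(employees, by_job))
print(classify_specialization(employees, by_pay))
```

Note that this only inspects one set of entities. The constraint itself is declared on the schema, so the check can show that a declared constraint is violated but cannot prove that it was intended.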
Specialization and Generalization hierarchies and lattices: A sub class itself may have
further sub classes, forming a hierarchy or a lattice of specializations. A specialization
hierarchy has the constraint that every sub class participates as a sub class in only one
class/sub class relationship, that is, each sub class has only one parent. In a specialization
lattice, a sub class can be a sub class in more than one class/sub class relationship.
The following diagram shows a specialization lattice.
The PERSON entity type is specialized into the sub classes {EMPLOYEE, ALUMNI,
STUDENT}. This specialization is overlapping.
The sub class EMPLOYEE is the super class for the specialization {FACULTY, STAFF,
STUDENT_ASSISTANTS}.
The sub class STUDENT is the super class for the specialization {GRADUATE
STUDENT, UNDER GRADUATE}.
The sub class STUDENT_ASSISTANTS is the super class for the specialization
{RESEARCH ASSISTANTS, TEACHING ASSISTANTS}.
A sub class with more than one super class is called a shared sub class. Shared sub classes
lead to a lattice; if there are no shared sub classes, we have a hierarchy.
In the university database the shared sub class is STUDENT_ASSISTANTS, which
inherits attributes from both EMPLOYEE and STUDENT. Both of these inherit the same
attributes from PERSON. If an attribute in a super class is inherited more than once via
different paths in the lattice, then it is included only once in the shared sub class.
Some models and languages do not allow multiple inheritance.
MODELING OF UNION TYPES USING CATEGORIES:
Sometimes we need to model a single class/sub class relationship with more than one super
class, where the super classes represent different entity types. In this case, the sub class
will represent a collection of objects that is
a subset of the UNION of distinct entity types. We call such a sub class a union type or
category.
For example, consider three entity types PERSON, BANK and COMPANY. In a
database for vehicle registration, an owner of a vehicle
can be a person, a bank or a company. We need to create a class that includes entities of
all three types to play the role of vehicle owner.
A category OWNER that is a sub class of the UNION of the three entity sets of
COMPANY, BANK and PERSON is created for this purpose. We display categories in an EER
diagram as shown below.
The super classes COMPANY, BANK and PERSON are connected to the circle with the
‘U’ symbol, which stands for the set union operation. If a predicate is needed, it is displayed
next to the line from the super class to which the predicate applies.
A category such as REGISTERED_VEHICLE implies that only CARS and TRUCKS, but not
other types of entities, can be members of REGISTERED_VEHICLE. A category can be total or
partial. A total category holds the union of all entities in its super classes, whereas a partial
category can hold a subset of the union. A total category is represented by a double line
connecting the category and the circle whereas partial categories are indicated by a single
line.
THE UNIVERSITY DATABASE EXAMPLE
4. THE RELATIONAL DATA MODEL AND RELATIONAL DATABASE CONSTRAINTS
RELATIONAL MODEL CONCEPTS:
The relational model represents the database as a collection of relations. Each
relation looks like a table of values or records. When a relation is considered as a table of
values, each row in the table represents a collection of related data values or facts about an
entity.
The table name and column names are used to interpret the meaning of the values in
each row.
For example, in the STUDENT table each row represents facts about a particular student
entity. The column names, such as Name, Student-no and Class, are used to interpret the data
values in each row. All values in a column are of the same data type.
In relational model terminology, a row is called a tuple, a column is called an
attribute and the table is called a relation. The data type describing the types of values that
can appear in each column is represented by a domain of possible values.
DOMAINS, ATTRIBUTES, TUPLES AND RELATIONS:
A domain ‘D’ is a set of atomic values. Atomic means that each value in the domain is
not further divisible. A common method of specifying a domain is to specify a data type
from which the data values forming the domain are drawn. It is also useful to specify a name
for the domain, which helps in interpreting its values.
Examples of domain:
Phone-number: The set of 10 digit phone numbers.
Social-security-number: The set of 9 digit social security numbers.
Names: The set of character strings that represent names of persons.
For example, the data type for the domain Phone-number can be declared as a character
string of the form ddd-ddd-dddd, where each ‘d’ is a numeric digit.
Relation schema:
A relation schema R, denoted by R(A1,A2,…,An), is made up of a relation name ‘R’
and a list of attributes A1,A2,…,An. Each attribute Ai is the name of a role played by some
domain ‘D’ in the relation schema R.
‘D’ is called the domain of Ai and is denoted by dom(Ai). The degree of a relation is the no. of
attributes ‘n’ of its relation schema.
For ex, the relation schema STUDENT of degree seven is
STUDENT(Name, SSN, Home-phone, Address, Office-phone, Age, GPA)
Relation:
A relation ‘r’ of the relation schema R(A1,A2,…,An), denoted by r(R), is a set of n-tuples
r={t1,t2,…,tm}. Each n-tuple ‘t’ is an ordered list of ‘n’ values t=<v1,v2,…,vn>,
where each value vi, 1≤i≤n, is an element of dom(Ai) or a NULL value. The ith value in tuple
t, which corresponds to the attribute Ai, is referred to as t[Ai]. For example, the following
STUDENT relation corresponds to the STUDENT schema.
The attributes and tuples of a relation STUDENT
CHARACTERISTICS OF A RELATION:
The characteristics of a relation are
1. Ordering of tuples in a relation: A relation is defined as a set of tuples. Tuples in a
relation do not have any particular order. When we display a relation as a table, the
rows are displayed in a certain order, but this order is not part of the relation.
2. Ordering of values within tuples: An n-tuple is an ordered list of ‘n’ values, so the
ordering of values in a tuple is important.
3. Values and NULLs in the tuples: Each value in a tuple is atomic i.e., it is not
divisible into components in the relational model. Composite and multivalued
attributes are not allowed.
NULL values are used to represent the values of attributes that are unknown or that do not
apply to a tuple. For example, a STUDENT tuple has a NULL value for home_phone if the
student does not have a home phone.
RELATIONAL MODEL NOTATION:
The following notations are used in the relational model.
1) A relation schema ‘R’ of degree ‘n’ is denoted by R(A1,A2,…,An).
2) The letters Q,R,S denote relation names.
3) The letters q,r,s denote relation states.
4) The letters t,u,v denote tuples.
5) An attribute ‘A’ can be qualified with the relation name ‘R’ to which it belongs by
using the dot notation R.A . For ex, STUDENT.name or STUDENT.age.
6) An n-tuple ‘t’ in a relation r(R) is denoted by t=<v1,v2,…,vn> where vi is the
value corresponding to the attribute Ai.
The following notation refers to component values of tuples.
Both t[Ai] and t.Ai refer to the value vi in t for attribute Ai.
RELATIONAL MODEL CONSTRAINTS:
There are many restrictions or constraints on the actual values in the database. These
constraints are derived from the rules of the real world that the database represents.
Constraints on databases can be divided into 3 main categories.
1. Constraints that are inherent in the data model. These constraints are called implicit
constraints.
2. Constraints that are directly expressed in the schemas of the data model. These constraints
are called explicit or schema based constraints.
3. Constraints that cannot be directly expressed in the schemas of the data model and
must be expressed by the application programs. These constraints are
called application based constraints.
In the relational model we mainly use schema based constraints. The schema based
constraints are
a) Domain constraints.
b) Key constraints and constraints on NULL values.
c) Entity Integrity constraints and
d) Referential Integrity constraints.
a) Domain constraints: Domain constraints specify that within each tuple, the value of
each attribute ‘A’ must be an atomic value from the domain dom(A).
b) Key constraints and constraints on NULL values: A relation is defined as a set of
tuples. The elements of a set are distinct, so no two tuples can have the same
combination of values for all their attributes. Usually, there are also subsets of attributes in
the relation schema for which no two tuples can have the same combination of values.
Super Key : A subset of attributes that uniquely identifies a tuple (row) in a relation (table).
Eno   Ename    Salary   Dept_no   Voter_Id
1     Raju     20000    10        V12345
2     Prasad   40000    10        V12222
3     Raju     20000    20        V45666
{Eno} is a super key: no two rows have the same Eno.
{Ename} is not a super key: two employees may have the same name.
{Voter_Id} is a super key: no two rows have the same Voter_Id.
{Eno, Ename} is a super key: Eno by itself uniquely identifies a tuple, hence the combination of
Eno and Ename also uniquely identifies a tuple.
*Minimal Super Key (Key) :
Definition : A Minimal Super Key (Key) K is a super key with the additional property that removal of any
attribute from K will cause K not to be a super key any more.
{Eno} is a Minimal Super Key (a super key which has only one attribute is always minimal).
{Voter_Id} is a Minimal Super Key.
{Eno, Ename} is not a Minimal Super Key: removing Ename from {Eno, Ename} leaves {Eno}, which is
still a super key.
*Candidate Key :
Definition : If a relation schema has more than one key (Minimal Super Key) then each of them is called a
candidate key.
One of the candidate keys is arbitrarily designated to be the primary key, and the others are
called secondary keys (or alternate keys).
A key formed by combining two or more columns is called a composite key.
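The super key and minimal super key tests can be tried on the three sample rows above. This is a sketch in Python, not part of any DBMS: the helper names is_superkey and minimal_superkeys are made up for illustration, and the rows are copied from the table.

```python
from itertools import combinations

ATTRS = ("Eno", "Ename", "Salary", "Dept_no", "Voter_Id")
ROWS = [
    (1, "Raju", 20000, 10, "V12345"),
    (2, "Prasad", 40000, 10, "V12222"),
    (3, "Raju", 20000, 20, "V45666"),
]


def is_superkey(attrs, rows=ROWS):
    """True if no two rows agree on all of the given attributes."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(r[i] for i in idx) for r in rows]
    return len(set(projected)) == len(projected)


def minimal_superkeys(rows=ROWS):
    """Attribute sets that are super keys in these rows and contain no smaller one."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            contains_smaller_key = any(set(k) <= set(combo) for k in keys)
            if is_superkey(combo, rows) and not contains_smaller_key:
                keys.append(combo)
    return keys


print(is_superkey(("Eno",)))          # Eno values are all different
print(is_superkey(("Ename",)))        # "Raju" appears twice
print(is_superkey(("Eno", "Ename")))  # any superset of a super key is a super key
print(minimal_superkeys())
```

On these three rows the search also reports {Ename, Dept_no} and {Salary, Dept_no}, because the sample happens to contain no clash on them. A key is a property of the schema that must hold in every valid state, so a check on sample data can only rule candidates out; it cannot confirm that an attribute set is really a key.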
*Primary Key :
Definition : The candidate key that is chosen to uniquely identify the tuples in a relation.
A relation (table) can have many super keys and also many minimal super keys.
If a relation (table) has more than one minimal super key, each of them is called a candidate key,
one of which is designated as the primary key as described above.
NOT NULL: Another constraint on attributes specifies whether NULL values are permitted or
not permitted.
For ex, if every STUDENT tuple must have a valid, non-NULL value for the Name
attribute, then ‘Name’ of STUDENT is constrained to be NOT NULL.
RELATIONAL DATABASES & RELATIONAL DATABASE SCHEMAS:
A relational database contains many relations, with tuples in the relations that are
related in various ways.
A relational database schema ‘S’ is a set of relation schemas
S={R1,R2,…,Rn} and a set of integrity constraints.
The following figure shows the COMPANY relational database schema.
COMPANY={EMPLOYEE,DEPARTMENT,DEPT_LOCATIONS,PROJECT,WORKS_ON,
DEPENDENT}.
The underlined attributes represent the primary keys.
A database state not satisfying all the constraints is called an invalid state and a state
satisfying all the constraints is called a valid state.
Entity Integrity, Referential Integrity and Foreign keys:
The entity integrity constraint states that no primary key value can be NULL, because
the primary key value is used to identify individual tuples in a relation. With NULL values for
the primary key we could not identify some tuples.
The referential integrity constraint is specified between two relations and is used to
maintain the consistency among the tuples in the two relations.
For ex, the attribute Dno of EMPLOYEE gives the department number of each employee, so its
value must match the Dnumber value of some tuple in the DEPARTMENT relation.
To define referential integrity formally we first define the foreign key.
The concept of a foreign key is used to specify a referential integrity constraint between
the two relation schemas R1 & R2.
A set of attributes FK in relation schema R1 is a foreign key of R1 that references
relation R2 if it satisfies the following rules.
1. The attributes in FK have the same domains as the primary key attributes PK of R2. The
attributes FK are said to reference relation R2.
2. A value of FK in a tuple t1 either occurs as a value of PK for some tuple t2 or is NULL.
In the former case t1[FK]=t2[PK]
and we say that the tuple t1 references the tuple t2.
Referential Integrity Constraint is used to specify interdependencies between relations. This constraint
specifies a column or list of columns as a foreign key of the referencing table.
A foreign key means the values of a column in one table must also appear in a column in another
table. The foreign key in the child table will generally reference a primary key in the parent table.
The referencing table is called the child table & the referenced table is called the parent table.
Self Referential Integrity Constraint means a column in one table references the primary key
column(s) in the same table.
EMP (referencing relation)
Eno   Ename   Salary   Dept_no (FK)   Voter_Id
1     raju    20000    10             V12345
2     ravi    40000    10             V12222
3     Raju    25000    20             V45666
In EMP table Eno is the Primary Key. (Duplicates and NULL values are not allowed in Eno.)
In EMP table Dept_no is a foreign key which references the Dept_no column of the DEPT table. (A value
for Dept_no in the EMP table is accepted only if it exists in the Dept_no column of the DEPT table.)
DEPT (referenced relation)
Dept_no (PK)   Dname   Dloc
10             MTech   BVRM
20             MBA     HYD
30             MCA     BVRM
In DEPT table Dept_no is the Primary Key.
In the above relation Manger_Eno is a foreign key which references Eno in the same table. This means a
value in Manger_Eno is valid only if it exists in Eno of the same table.
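The foreign key rule for the EMP and DEPT tables shown earlier (EMP.Dept_no references DEPT.Dept_no) can be checked with a few lines of Python. This is a sketch only: fk_violations is a hypothetical helper, not a DBMS function, and NULL is represented by None.

```python
DEPT = [
    {"Dept_no": 10, "Dname": "MTech", "Dloc": "BVRM"},
    {"Dept_no": 20, "Dname": "MBA", "Dloc": "HYD"},
    {"Dept_no": 30, "Dname": "MCA", "Dloc": "BVRM"},
]
EMP = [
    {"Eno": 1, "Ename": "raju", "Salary": 20000, "Dept_no": 10, "Voter_Id": "V12345"},
    {"Eno": 2, "Ename": "ravi", "Salary": 40000, "Dept_no": 10, "Voter_Id": "V12222"},
    {"Eno": 3, "Ename": "Raju", "Salary": 25000, "Dept_no": 20, "Voter_Id": "V45666"},
]


def fk_violations(child, fk, parent, pk):
    """Return the child rows whose foreign key value is neither NULL (None)
    nor present as a primary key value in the parent table."""
    parent_keys = {row[pk] for row in parent}
    return [row for row in child
            if row[fk] is not None and row[fk] not in parent_keys]


print(fk_violations(EMP, "Dept_no", DEPT, "Dept_no"))  # every Dept_no exists in DEPT

# A hypothetical new employee in department 40, which does not exist in DEPT.
bad_row = {"Eno": 4, "Ename": "x", "Salary": 1, "Dept_no": 40, "Voter_Id": "V0"}
print(fk_violations(EMP + [bad_row], "Dept_no", DEPT, "Dept_no"))
```

The same function covers the self referential case by passing the same table as both child and parent.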
RELATIONAL DATABASE DESIGN BY
ER AND EER TO RELATIONAL MAPPING
In this chapter we learn how to design a
relational database schema based on a conceptual schema design. This corresponds to the
logical database design or data model mapping.
We use a seven step algorithm to convert the basic ER model constructs, namely entity
types (strong and weak), binary relationships (with various structural constraints), n-ary
relationships and attributes (simple, composite and multi-valued), into relations. We
also use some additional steps to map Specialization/Generalization and union types
into relations.
Relational database design using ER to Relational Mapping:
1. ER to Relational Mapping Algorithm:
The ER diagram for the COMPANY schema is converted into a relational database schema
using the seven steps.
Step 1: Mapping of Regular Entity Types:
For each regular (strong) entity type E in the ER schema, create a relation R that includes
all the simple attributes of E.
Include only the simple component attributes of the composite attributes. Choose one of
the key attributes of E as the primary key for R.
In our example, we create the relations EMPLOYEE, DEPARTMENT and PROJECT to
correspond to the regular entity types EMPLOYEE, DEPARTMENT and PROJECT. We choose
SSN, DNumber and PNumber as primary keys for the relations EMPLOYEE, DEPARTMENT
and PROJECT respectively.
Step 3: Mapping of binary 1:1 relationship types:
For each binary 1:1 relationship type ‘R’ in the ER schema, identify the relations S and
T that correspond to the entity types participating in R.
Foreign key approach: Choose one of the relations, say S, and include as a foreign key in S
the primary key of T. It is better to choose an entity type with total participation in ‘R’ in
the role of S.
In our ex., we choose the DEPARTMENT relation for the role of S because it participates
totally in the MANAGES relationship. We include the primary key of the EMPLOYEE relation as
a foreign key in the DEPARTMENT relation and rename it as Mgr_SSN. We also include the simple
attribute start_date of the MANAGES relationship in the DEPARTMENT relation and
rename it Mgr_start_date.
Option 8D: Single relation with multiple type attributes:
Create a single relation schema L with attributes Attrs(L) = {k,a1,a2,…,an} U
{attributes of S1} U … U {attributes of Sm} U {t1,t2,…,tm} and PK(L) = k. Each ti is a
Boolean type attribute indicating whether a tuple belongs to sub class Si.
Step 9: Mapping of Union types (Categories):
For mapping a category whose defining super classes have different keys, it is
customary to specify a new key attribute called a surrogate key when creating a
relation to correspond to the category. We also include the surrogate key as a foreign key in
the relations corresponding to the super classes.
For a category whose super classes have the same key, there is no need for a surrogate
key.
In the above ex., we create a relation OWNER to correspond to the OWNER category.
The primary key of the OWNER relation is the surrogate key, which we call owner_id.
We also include the surrogate key attribute owner_id as a foreign key in each relation
corresponding to a super class of the category.
The relational algebra operations can be divided into two groups. One group includes
the set operations. The set operations include UNION, INTERSECTION, SET DIFFERENCE
and CARTESIAN PRODUCT.
The other group includes SELECT, PROJECT and JOIN.
SELECT and PROJECT are unary operations that operate on one relation.
JOIN and the set operations are binary operations which operate on two relations.
SELECT operation:
The select operation selects the tuples that satisfy a given condition. We use the
lowercase Greek letter sigma (σ) to denote selection. The condition (predicate) appears as a
subscript to σ. The argument relation is given in parentheses following σ.
The comparison (<, >, ≤, ≥, =, ≠) and logical (AND, OR, NOT) operators are allowed in
conditions.
Ex:
1. Select the EMPLOYEE tuples whose dept is 4
σ Dno = 4 (EMPLOYEE)
2. Select the EMPLOYEE tuples whose salary is greater than 30,000
σ salary > 30000 (EMPLOYEE)
3. Select the tuples for all employees who either work in dept 4 and have salary > 25000 per
year or work in dept 5 and have salary > 30000
σ (Dno = 4 AND salary > 25000) OR (Dno = 5 AND salary > 30000) (EMPLOYEE)
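The three selections above can be imitated in Python by filtering a list of tuples with a predicate. This is only an illustrative sketch: the function select and the five sample EMPLOYEE rows are made up, and each tuple is represented as a dictionary.

```python
# Hypothetical EMPLOYEE state; only the attributes used in the conditions are shown.
EMPLOYEE = [
    {"name": "A", "Dno": 4, "salary": 26000},
    {"name": "B", "Dno": 4, "salary": 24000},
    {"name": "C", "Dno": 5, "salary": 38000},
    {"name": "D", "Dno": 5, "salary": 30000},
    {"name": "E", "Dno": 1, "salary": 55000},
]


def select(relation, predicate):
    """SELECT: keep the tuples for which the condition is true."""
    return [t for t in relation if predicate(t)]


dept4 = select(EMPLOYEE, lambda t: t["Dno"] == 4)
high_paid = select(EMPLOYEE, lambda t: t["salary"] > 30000)
mixed = select(
    EMPLOYEE,
    lambda t: (t["Dno"] == 4 and t["salary"] > 25000)
    or (t["Dno"] == 5 and t["salary"] > 30000),
)

print([t["name"] for t in dept4])
print([t["name"] for t in high_paid])
print([t["name"] for t in mixed])
```

The result of a selection always has the same attributes as the input relation; only the number of tuples can shrink.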
PROJECTION operation:
The projection operation selects certain columns from the relation and discards the other
columns. We use the Greek letter ‘π’ to denote projection. The column
names appear as a subscript of ‘π’, and only these columns appear in the result. The
argument relation is given in parentheses following ‘π’.
Syntax: π <attribute list> (Relation)
Ex: To list each employee's first name, last name and salary.
π firstname, lastname, salary (EMPLOYEE)
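A projection can be sketched in Python as follows. The function project and the sample rows are hypothetical, and the attribute names firstname and lastname are assumed for illustration. Because a relation is a set of tuples, duplicate rows in the result are removed.

```python
# Hypothetical EMPLOYEE state.
EMPLOYEE = [
    {"firstname": "John", "lastname": "Smith", "salary": 30000, "Dno": 5},
    {"firstname": "Joyce", "lastname": "English", "salary": 25000, "Dno": 5},
    {"firstname": "Alicia", "lastname": "Zelaya", "salary": 25000, "Dno": 4},
]


def project(relation, attributes):
    """PROJECT: keep only the listed attributes and drop duplicate rows."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


names_and_salary = project(EMPLOYEE, ["firstname", "lastname", "salary"])
departments = project(EMPLOYEE, ["Dno"])

print(names_and_salary)
print(departments)  # Dno 5 appears once although two employees work there
```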
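Sequences of operations and the RENAME operation: In general we can apply several
relational algebra operations one after the other, either as a single nested expression or by
giving names to the intermediate results.
For ex, to retrieve the firstname, lastname and salary of all employees who work in
department number 5, we apply a SELECT and then a PROJECT:
π firstname, lastname, salary (σ Dno = 5 (EMPLOYEE))
The general RENAME operation applied to a relation R of degree n takes one of three forms:
ρ S(B1,B2,…,Bn) (R) or ρ S (R) or ρ (B1,B2,…,Bn) (R)
where the symbol ρ (rho) is used to denote the RENAME operator, S is the new
relation name and B1,B2,…,Bn are the new attribute names.
The first expression renames both the relation and its attributes, the second renames
the relation only and the third renames the attributes only.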
Relational Algebra operations from set theory:
These are binary operations. Each operation takes two relations as input
and produces one relation as output. For UNION, INTERSECTION and SET DIFFERENCE the two
relations must be union compatible, i.e. have the same number of attributes with matching
domains.
UNION:
The union operation is denoted by ‘U’. The result of “R U S” is a relation that includes
all the tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated.
Ex: To retrieve the social security numbers of all employees who either work in department 5 or
directly supervise an employee who works in department 5.
INTERSECTION:
The intersection operation is denoted by ‘∩’. The result of R∩S is a relation that
includes all the tuples that are in both R and S.
SET DIFFERENCE (or MINUS) : The difference operation is denoted by ‘-’. The result of R-S
is a relation that includes all tuples that are in ‘R’ but not in ‘S’.
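UNION, INTERSECTION and SET DIFFERENCE behave exactly like the corresponding operations on Python sets when each tuple is written as a Python tuple. The two small relations below are made up for illustration; both have the same two attributes (first name, last name).

```python
# Two hypothetical relations with the same attributes (first name, last name).
R = {("Susan", "Yao"), ("Ramesh", "Shah"), ("John", "Smith")}
S = {("John", "Smith"), ("Ricardo", "Browne")}

union = R | S          # tuples in R or in S or in both; duplicates appear once
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Note that R - S and S - R are in general different relations, while union and intersection give the same result in either order.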
CARTESIAN PRODUCT (CROSS PRODUCT) operation:
The CARTESIAN PRODUCT operation is denoted by X. The result of R X S is a relation
that includes one tuple for every combination of a tuple from relation R with a tuple from
relation S.
Ex: To retrieve a list of the names of each female employee's dependents.
JOIN operation:
The JOIN operation, denoted by ⋈, is used to combine related tuples from two
relations into single tuples.
The general form of a JOIN operation on two relations R(A1, A2, …, An) and S(B1,
B2, …, Bm) is
R ⋈ <join condition> S
Variations of JOIN: The EQUIJOIN and NATURAL JOIN.
EQUI JOIN: It produces all the combinations of tuples from R and S that satisfy a join condition
with only equality comparisons.
NATURAL JOIN: It produces all the combinations of tuples from R and S that satisfy an equality
join condition in which the join attributes have the same name in both relations. The
natural join is represented by * and each join attribute appears only once in the result.
R * <join attributes> S or simply R * S.
For ex., to apply a natural join on the Dnumber attributes of DEPARTMENT and
DEPT_LOCATIONS, it is sufficient to write
DEPT_LOCATIONS ← DEPARTMENT * DEPT_LOCATIONS
PROJ_DEPT ← PROJECT * DEPT.
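A join can be pictured as a Cartesian product followed by a selection. The sketch below is in Python with made-up helper names (theta_join, natural_join) and made-up sample rows for DEPARTMENT and DEPT_LOCATIONS. It assumes both relations are non-empty lists of dictionaries and that, for theta_join, the two relations have no attribute names in common.

```python
# Hypothetical sample states.
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5},
    {"Dname": "Administration", "Dnumber": 4},
]
DEPT_LOCATIONS = [
    {"Dnumber": 5, "Dlocation": "Houston"},
    {"Dnumber": 5, "Dlocation": "Bellaire"},
    {"Dnumber": 4, "Dlocation": "Stafford"},
]


def theta_join(r, s, condition):
    """General JOIN: combine every pair of tuples that satisfies the condition."""
    return [{**t, **u} for t in r for u in s if condition(t, u)]


def natural_join(r, s):
    """NATURAL JOIN: equality on all attributes with the same name.

    Each common attribute is kept once in the result. If there are no
    common attributes this degenerates to the Cartesian product.
    """
    common = [a for a in r[0] if a in s[0]]
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]


locations = natural_join(DEPARTMENT, DEPT_LOCATIONS)
for row in locations:
    print(row)

# An EQUIJOIN written with an explicit equality condition (attribute names differ).
MANAGER = [{"Mgr_dept": 5, "Mgr_name": "Wong"}]
managed = theta_join(DEPARTMENT, MANAGER, lambda t, u: t["Dnumber"] == u["Mgr_dept"])
print(managed)
```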
DIVISION operation:
The division operation is denoted by ‘÷’. The division operation is applied to two
relations R(Z) ÷ S(X), where X is a subset of Z. The result of DIVISION is a relation T(Y), where
Y is the set of attributes of R that are not in S. For a tuple t to appear in the
result T, the values in t must appear in ‘R’ in combination with every tuple in S.
Ex: Retrieve the names of employees who work on all the projects that “John Smith” works on.
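The definition of division can be followed step by step in Python. This is a sketch under assumptions: divide is a made-up helper, SSN_PNOS stands for the (employee, project) pairs taken from WORKS_ON, SMITH_PNOS stands for the projects that John Smith works on, and the employee identifiers are invented.

```python
def divide(r, s, x_attrs):
    """DIVISION R(Z) / S(X): return the Y-values (Y = Z - X) that occur in R
    together with every tuple of S."""
    if not r:
        return []
    y_attrs = [a for a in r[0] if a not in x_attrs]
    s_rows = {tuple(u[a] for a in x_attrs) for u in s}
    # Group the X-values found in R by their Y-value.
    groups = {}
    for t in r:
        y = tuple(t[a] for a in y_attrs)
        groups.setdefault(y, set()).add(tuple(t[a] for a in x_attrs))
    return [dict(zip(y_attrs, y)) for y, xs in groups.items() if s_rows <= xs]


# Hypothetical data: e1 and e3 work on both project 1 and project 2.
SSN_PNOS = [
    {"Essn": "e1", "Pno": 1}, {"Essn": "e1", "Pno": 2},
    {"Essn": "e2", "Pno": 1},
    {"Essn": "e3", "Pno": 1}, {"Essn": "e3", "Pno": 2}, {"Essn": "e3", "Pno": 3},
]
SMITH_PNOS = [{"Pno": 1}, {"Pno": 2}]

result = divide(SSN_PNOS, SMITH_PNOS, ["Pno"])
print(result)
```

Employee e2 is left out because the pair (e2, 2) is missing from SSN_PNOS. To get the names asked for in the example, the resulting Essn values would then be joined with EMPLOYEE and projected on the name attributes.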
Syntax:
<grouping attributes> ℑ <function list> (R)
where <grouping attributes> is a list of attributes of relation ‘R’ and <function list> is a
list of <function> <attribute> pairs. The functions SUM,
AVERAGE, MAXIMUM, MINIMUM and COUNT are allowed in each pair and <attribute> is an
attribute of relation R.
Ex:
1. To retrieve each department number, the no. of employees in the department and
their average salary.
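Grouping and aggregation for this example can be sketched in Python as follows. The function count_and_average_by_dno and the sample EMPLOYEE rows are made up for illustration; Dno is the grouping attribute, and COUNT and AVERAGE are applied to each group.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE state.
EMPLOYEE = [
    {"ssn": "111", "Dno": 5, "salary": 30000},
    {"ssn": "222", "Dno": 5, "salary": 40000},
    {"ssn": "333", "Dno": 4, "salary": 25000},
]


def count_and_average_by_dno(employees):
    """Group the tuples by Dno, then apply COUNT and AVERAGE(salary) per group."""
    salaries = defaultdict(list)
    for e in employees:
        salaries[e["Dno"]].append(e["salary"])
    return {dno: (len(s), sum(s) / len(s)) for dno, s in salaries.items()}


summary = count_and_average_by_dno(EMPLOYEE)
for dno, (count, average) in summary.items():
    print(dno, count, average)
```

The result has one tuple per distinct value of the grouping attribute, not one per employee.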
Recursive operation:
This operation is applied to a recursive relationship between tuples of the same type. For
ex., the relationship between an employee and a supervisor.
The relationship is described by the foreign key super_ssn of the EMPLOYEE relation.
Ex: Retrieve the details of supervisors.
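Following the super_ssn foreign key repeatedly gives the supervisors of an employee at every level. The sketch below is in Python; the function all_supervisors and the four sample rows are made up, and None stands for a NULL super_ssn.

```python
# Hypothetical EMPLOYEE state: e4 reports to e2, e2 and e3 report to e1.
EMPLOYEE = [
    {"ssn": "e1", "super_ssn": None},
    {"ssn": "e2", "super_ssn": "e1"},
    {"ssn": "e3", "super_ssn": "e1"},
    {"ssn": "e4", "super_ssn": "e2"},
]


def all_supervisors(employees, ssn):
    """Follow super_ssn upwards and return the supervisors at every level,
    nearest first. Stops at a NULL super_ssn or if a cycle is detected."""
    super_of = {e["ssn"]: e["super_ssn"] for e in employees}
    chain = []
    current = super_of.get(ssn)
    while current is not None and current not in chain:
        chain.append(current)
        current = super_of.get(current)
    return chain


print(all_supervisors(EMPLOYEE, "e4"))
print(all_supervisors(EMPLOYEE, "e1"))
```

Each step of the loop corresponds to one join of EMPLOYEE with itself on super_ssn = ssn. The number of joins needed is not known in advance, which is why this kind of query cannot be written as a single fixed expression using only the basic operations.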
FUNCTIONAL DEPENDENCIES
AND
NORMALIZATION FOR RELATIONAL DATABASES
Definition:
A functional dependency, denoted by x → y, between two sets of attributes x and y that
are subsets of ‘R’ specifies a constraint on the possible tuples that can form a relation state r
of R. The constraint is that for any two tuples t1 and t2 in r that have t1[x] = t2[x], they must
also have t1[y] = t2[y].
1. This means that the values of the ‘y’ component of a tuple in ‘r’ depend on the
values of the x component.
2. Alternatively, the values of the x component determine the values of the y component. We
also say that there is a functional dependency from x to y or that y is functionally dependent on x.
The abbreviation for functional dependency is FD or f.d. The set of attributes x is called
the left hand side of the FD and y is called the right hand side of the FD.
Thus, x functionally determines y in a relation schema R, if and only if, whenever two
tuples of r(R) agree on their x-value they must necessarily agree on their y-value.
1. If x is a candidate key of R, this implies that x → y for any subset of attributes y of R.
2. If x → y in R, this does not say whether or not y → x in R.
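The definition translates directly into a test on a relation state: whenever two tuples agree on x they must agree on y. The following Python sketch uses a made-up helper fd_holds and three made-up rows.

```python
def fd_holds(relation, x, y):
    """True if no two tuples of the given state agree on x but differ on y."""
    seen = {}
    for t in relation:
        x_value = tuple(t[a] for a in x)
        y_value = tuple(t[a] for a in y)
        # setdefault stores y_value the first time x_value is met and
        # returns the stored value on later occurrences.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Hypothetical state of a relation with employee and department attributes.
rows = [
    {"Ssn": "111", "Ename": "A", "Dnumber": 5, "Dname": "Research"},
    {"Ssn": "222", "Ename": "B", "Dnumber": 5, "Dname": "Research"},
    {"Ssn": "333", "Ename": "C", "Dnumber": 4, "Dname": "Administration"},
]

print(fd_holds(rows, ["Dnumber"], ["Dname"]))
print(fd_holds(rows, ["Dname"], ["Ssn"]))
```

A functional dependency is a property of the schema, so a state can only show that an FD does not hold. A state in which the test succeeds does not prove the FD.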
Consider the relation schema EMP_PROJ.
Each FD is displayed as a horizontal line. The left hand side attributes of the FD are
connected by vertical lines to the line representing the FD, while the right hand side
attributes are connected by arrows pointing towards the attributes.
Inference rules for functional dependencies:
F is the set of functional dependencies that are specified on relation schema ‘R’.
However numerous other functional dependencies hold in all legal relation instances
that satisfy the dependencies in F. Those
other dependencies can be inferred from the functional dependencies in ‘F’.
For ex., if each department has one manager, so that Dept_no uniquely determines
Mgr_ssn (Dept_no → Mgr_ssn) and a manager has a unique phone number called
Mgr_phone (Mgr_ssn → Mgr_phone), then these two dependencies together imply that
Dept_no → Mgr_phone. This is an inferred FD.
Definition: The set of all dependencies that includes ‘F’ as well as all dependencies
that can be inferred from ‘F’ is called the closure of ‘F’. It is denoted by F+.
For ex, suppose that the set of functional dependencies on the relation schema
EMP_DEPT is
F = { Ssn → {Ename, Bdate, Address, Dnumber},
Dnumber → {Dname, DMgr_ssn} }
Some additional functional dependencies that we can infer from ‘F’ are the following
Ssn → {Dname, DMgr_ssn}
Ssn → Ssn
Dnumber → Dname
To have a systematic way to infer dependencies we need a set of
inference rules that can be used to infer new dependencies from a given set of
dependencies.
The FD {x, y} → z is abbreviated to xy → z and the FD {x,y,z} → {U,V} is abbreviated
to xyz → UV.
The following 6 rules IR1 through IR6 are well known inference rules for functional
dependencies.
IR1 (Reflexive rule) : if X ⊇ Y, then X → Y
IR2 (Augmentation rule) : {X → Y} |= XZ → YZ
IR3 (Transitive rule) : {X → Y, Y → Z} |= X → Z
IR4 (Decomposition rule) : {X → YZ} |= X → Y
IR5 (Union rule) : {X → Y, X → Z} |= X → YZ
IR6 (Pseudotransitive rule) : {X → Y, WY → Z} |= WX → Z
A systematic way to determine additional functional dependencies is first to
determine each set of attributes X that appears as the left-hand side of some functional
dependency in F and then to determine the set of all attributes that are dependent on X.
Definition: For each set of attributes X, we determine the set X+ of attributes that are
functionally determined by X based on F. X+ is called the closure of X under F.
Algorithm: Determining X+, the closure of X under F.
X+ := X;
repeat
    oldX+ := X+;
    for each functional dependency Y → Z in F do
        if X+ ⊇ Y then X+ := X+ ∪ Z;
until (X+ = oldX+);
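The algorithm above can be sketched in Python as follows. This is one possible rendering, assuming each FD is stored as a pair of attribute sets; the function name closure and the variable F are illustrative. The FDs used are the EMP_DEPT dependencies given earlier in these notes.

```python
def closure(attrs, fds):
    """Compute X+, the closure of the attribute set attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:  # plays the role of repeat ... until (X+ = oldX+)
        changed = False
        for lhs, rhs in fds:
            # If X+ already contains Y, every attribute of Z joins X+.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The EMP_DEPT dependencies from the notes.
F = [
    ({"Ssn"}, {"Ename", "Bdate", "Address", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "DMgr_ssn"}),
]

print(sorted(closure({"Ssn"}, F)))
print(sorted(closure({"Dnumber"}, F)))
```

Here the closure of Ssn contains every attribute of EMP_DEPT, while the closure of Dnumber contains only Dnumber, Dname and DMgr_ssn.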
Minimal sets of functional Dependencies:
A minimal cover of a set of functional dependencies E is a set of functional dependencies
F that satisfies the property that every dependency in E is in the closure F+ of F.
In addition, this property is lost if any dependency from the set F is removed. F must
have no redundancies in it, and the dependencies in F are in a standard form.
We can formally define a set of functional dependencies F to be minimal if it satisfies
the following conditions.
1. Every dependency in F has a single attribute for its right hand side.
2. We cannot replace any dependency X → A in F with a dependency Y → A, where Y is a
proper subset of X, and still have a set of dependencies that is equivalent to F.
3. We cannot remove any dependency from F and still have a set of dependencies that is
equivalent to F.
We can think of a minimal set of dependencies as being a set of dependencies in a
canonical form and with no redundancies.
1. Every dependency is in a canonical form with a single attribute as the right-hand side.
2. There are no redundancies in the dependencies, either by having redundant attributes
on the left-hand side of a dependency or by having a dependency that can be inferred
from the remaining FDs of F.
Definition: A minimal cover of a set of functional dependencies E is a minimal set of
dependencies that is equivalent to ‘E’.
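The three conditions for a minimal set suggest a three-step procedure: split the right-hand sides, drop redundant left-hand-side attributes, then drop redundant dependencies. The sketch below is one way to write this in Python, assuming FDs are stored as pairs of attribute sets. The function names are illustrative, and the result can depend on the order in which attributes and dependencies are examined, since a set of FDs may have more than one minimal cover.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """Return a minimal cover as a set of (frozenset_lhs, attribute) pairs."""

    def as_pairs(items):
        return [(set(lhs), {a}) for lhs, a in items]

    # Step 1: give every dependency a single right-hand-side attribute.
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]
    cover = list(dict.fromkeys(cover))  # drop duplicates, keep order

    # Step 2: drop left-hand-side attributes that are not needed.
    for i, (lhs, a) in enumerate(cover):
        for b in sorted(lhs):
            reduced = cover[i][0] - {b}
            if reduced and a in closure(reduced, as_pairs(cover)):
                cover[i] = (reduced, a)
    cover = list(dict.fromkeys(cover))

    # Step 3: drop dependencies that follow from the remaining ones.
    i = 0
    while i < len(cover):
        lhs, a = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if a in closure(lhs, as_pairs(rest)):
            cover = rest
        else:
            i += 1
    return set(cover)


# E = {B -> A, D -> A, AB -> D}
E = [(set("B"), set("A")), (set("D"), set("A")), (set("AB"), set("D"))]
for lhs, a in sorted(minimal_cover(E), key=lambda fd: (sorted(fd[0]), fd[1])):
    print("".join(sorted(lhs)), "->", a)
```

For the set E used here, the sketch keeps only B → D and D → A.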
Ex:
Problem 1: Find the minimal cover of the set of FDs E = {B → A, D → A, AB → D}.
The reduction proceeds as {B → A, D → A, AB → D} ⇒ {B → A, D → A, B → D} ⇒ {D → A, B → D}.
Solution:
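Step 1: Every dependency in E already has a single attribute on its right-hand side, so
step 1 is satisfied.
Step 2: Check whether AB → D has a redundant attribute on its left-hand side, that is,
whether it can be replaced by A → D or by B → D. Only one attribute should determine D.
Since B → A, augmenting with B (IR2) gives BB → AB, that is, B → AB ---(1)
AB → D is a given FD ---(2)
From (1) and (2), by IR3:
B → D
Hence AB → D is replaced by B → D, giving {B → A, D → A, B → D}.
Step 3: B → A is redundant because it follows from B → D and D → A by IR3. Removing it
gives the minimal cover {B → D, D → A}.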
Problem 2: Let the relation schema be R(ABCDEFGH) with the set of FDs
F = {CD → AB, C → D, D → EH, AE → C, A → E, B → D}. Find the minimal cover of F.
Solution: Step 1: After performing step 1 of the algorithm we get
F' = CD → A ---(1)
     CD → B ---(2)
     C → D  ---(3)
     D → E  ---(4)
     D → H  ---(5)
     AE → C ---(6)
     A → E  ---(7)
     B → D  ---(8)
Step 2: Eliminate redundant attributes on the LHS, using IR6: {X → Y, WY → Z} |= WX → Z.
Take FDs (1) & (3):
CD → A   (WY → Z)
C → D    (X → Y)
CC → A   (WX → Z)
C → A
Take FDs (2) & (3):
CD → B
C → D
CC → B
C → B
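Take FDs (6) & (7):
AE → C
A → E
By IR6:
AA → C
A → C
F1 = {C → A, C → B, A → C, C → D, D → E, D → H, A → E, B → D}
Step 3: Remove redundant functional dependencies.
C → D is eliminated because it can be derived from C → B and B → D.
A → E is eliminated because it can be derived from A → C, C → B, B → D and D → E.
F1 = {C → A, C → B, A → C, D → E, D → H, B → D}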
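Problem 3: F = {B → A, D → A, AB → D}. Find the minimal cover of F.
Solution: After performing step 1, F is unchanged: B → A, D → A, AB → D.
Step 2: Take AB → D and B → A.
AB → D
B → A
By IR6: BB → D, that is, B → D
F1 = {B → A, D → A, B → D}
Step 3: B → A is redundant, since it follows from B → D and D → A by IR3, so the minimal
cover is {B → D, D → A}.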
Problem 4: Consider the relation schema R(ESBADNM) and the following set G of FDs
on R: G = {S → EBAD, D → NM}. Calculate the closures S+ and D+ with respect to G.
Solution: S → EBAD
S+ = {SEBADNM}
D → NM
D+ = {DNM}
Normalization:
It is the process of analyzing the given relation schemas based on their functional
dependencies (FDs) and primary keys to achieve the following properties:
1) Minimizing redundancy (duplication of data)
2) Minimizing insertion, deletion and update anomalies.
A prime attribute of a relation schema is any attribute that is a member of some
candidate key of the relation schema.
• First Normal Form (1NF)
The domain of an attribute must include only atomic (simple, indivisible) values, and
the value of any attribute in a tuple must be a single value from the domain of
that attribute.
Or
• "1NF disallows relations within relations or relations as attribute values within tuples."
Example:
Department is not in 1NF (since Dept_Location is a multivalued attribute).
Dept_no   Dept_Name   Dept_Location
10        MCA         Bhimavaram, Hyderabad
20        MBA         Bhimavaram
30        BE          Vizag, Chennai
Dept_Locations
Dept_no   Dept_Location
10        Bhimavaram
10        Hyderabad
20        Bhimavaram
30        Vizag
30        Chennai
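The conversion shown in the two tables can be sketched in Python. This is only an illustration, assuming the non-1NF relation is held as a list of dictionaries whose Dept_Location value is a list; the function name to_first_normal_form is hypothetical.

```python
# The Department relation from the example, with a multivalued attribute.
department = [
    {"Dept_no": 10, "Dept_Name": "MCA", "Dept_Location": ["Bhimavaram", "Hyderabad"]},
    {"Dept_no": 20, "Dept_Name": "MBA", "Dept_Location": ["Bhimavaram"]},
    {"Dept_no": 30, "Dept_Name": "BE", "Dept_Location": ["Vizag", "Chennai"]},
]


def to_first_normal_form(rows, key, multivalued):
    """Move a multivalued attribute into its own relation keyed by `key`."""
    # The base relation keeps every attribute except the multivalued one.
    base = [{k: v for k, v in row.items() if k != multivalued} for row in rows]
    # The new relation holds one (key, single value) tuple per value.
    detail = [
        {key: row[key], multivalued: value}
        for row in rows
        for value in row[multivalued]
    ]
    return base, detail


departments, dept_locations = to_first_normal_form(
    department, "Dept_no", "Dept_Location"
)
for row in dept_locations:
    print(row["Dept_no"], row["Dept_Location"])
```

Every attribute value in both resulting relations is atomic, and the second relation matches the Dept_Locations table above.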
In the above three relations, all the functional dependencies are full functional dependencies.
Third Normal Form (3NF)
3NF does not allow transitive functional dependencies.
An FD X → Y is a transitive FD if there is a set of attributes A, which is neither a candidate
key nor a subset of any key, such that X → A and A → Y both hold.
A relation schema R is in 3NF if no non-prime attribute is transitively dependent on the
primary key.
Consider the following relation schema of EMP_DEPT
In the above relation schema S_C_I
{Student, Course} is the primary key (candidate key).
The functional dependencies are
FD1 : {Student, Course} -> Instructor
FD2 : Instructor -> Course
FD1: {Student, Course} -> Instructor satisfies the BCNF condition, since {Student, Course}
is a superkey. But FD2: Instructor -> Course does not satisfy the condition, since
Instructor is not a superkey. Hence S_C_I is not in BCNF.
So S_C_I can be decomposed into the following relations:
I_C
Instructor   Course
IRK          DBMS
KBR          OS
RAO          DBMS
I_S
Instructor   Student
IRK          Rama
KBR          Rama
IRK          Sita
KBR          Sita
RAO          Madhu
KBR          Madhu
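The superkey test used in the S_C_I discussion can be checked mechanically with attribute closures. The following Python sketch assumes FDs are stored as pairs of attribute sets; the function names closure and bcnf_violations are illustrative and not part of these notes.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the nontrivial FDs whose left-hand side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FDs never violate BCNF
        if not set(schema) <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations


S_C_I = {"Student", "Course", "Instructor"}
FDS = [
    ({"Student", "Course"}, {"Instructor"}),  # FD1
    ({"Instructor"}, {"Course"}),             # FD2
]

print(bcnf_violations(S_C_I, FDS))
print(bcnf_violations({"Instructor", "Course"}, [({"Instructor"}, {"Course"})]))
```

Only FD2 is reported for S_C_I, while the decomposed relation I_C, where Instructor is a key, reports no violation.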
Fourth Normal Form (Based on Multivalued Dependency)
A multivalued dependency (MVD) X ->> Y specified on relation schema R, where X and Y
are both subsets of R, specifies the following constraint on any relation state r of R: If two
tuples t1 and t2 exist in r such that t1[X] = t2[X], then two tuples t3 and t4 should also
exist in r with the following properties, where we use Z to denote (R - (X ∪ Y)):
• t1[X] = t2[X] = t3[X] = t4[X].
• t1[Y] = t3[Y] and t2[Y] = t4[Y].
• t2[Z] = t3[Z] and t1[Z] = t4[Z].
4NF: A relation R is in 4NF if, whenever X ->> Y is a nontrivial multivalued
dependency, X is a superkey.
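The MVD constraint can be tested on one relation state by looking for the required tuple t3 for every ordered pair (t1, t2); the tuple t4 is then covered when the pair is taken in the opposite order. The sketch below is an illustration in Python, assuming X and Y are disjoint; the function name mvd_holds and the sample EMP tuples are hypothetical and do not come from these notes.

```python
def mvd_holds(rows, x, y):
    """Check whether the MVD X ->> Y holds in a relation state.

    rows is a list of dicts with the same keys; x and y are lists of
    attribute names. Z is every attribute outside X and Y.
    """
    if not rows:
        return True
    z = [a for a in rows[0] if a not in x and a not in y]

    def project(row, attrs):
        return tuple(row[a] for a in attrs)

    present = {(project(r, x), project(r, y), project(r, z)) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if project(t1, x) == project(t2, x):
                # t3 carries t1's Y-value together with t2's Z-value.
                t3 = (project(t1, x), project(t1, y), project(t2, z))
                if t3 not in present:
                    return False
    return True


# Hypothetical state: an employee's projects are independent of dependents.
emp = [
    {"Ename": "Smith", "Pname": "X", "Dname": "John"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "X", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "John"},
]

print(mvd_holds(emp, ["Ename"], ["Pname"]))
print(mvd_holds(emp[:3], ["Ename"], ["Pname"]))
```

With all four tuples present, Ename ->> Pname holds; dropping the last tuple breaks it, because the combination (Smith, Y, John) is then missing.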
5NF - Project-join normal form (PJNF)
• Fifth normal form is based on Join Dependency:
• Join dependency (JD): A join dependency, denoted by JD(R1, R2, ..., Rn), specified
on relation schema R, specifies a constraint on the states r of R. The constraint states
that every legal state r of R should have a non-additive join decomposition into R1,
R2, ...,Rn; that is, for every such r we have