Unit 1 DBMS
Students, such as the courses they are enrolled in, details about their
scholarships, the courses they have studied in previous years or are taking
this year, and examination results. There may also be a database containing
details relating to the next year’s admissions and a database containing
details of the staff working at the university, giving personal details and
salary related details for the payroll system.
TRADITIONAL FILE PROCESSING SYSTEM
File processing systems were an early attempt to computerize the manual
filing system that we are all familiar with. A file system is a method for
storing and organizing computer files and the data they contain to make it
easy to find and access them. File systems may use a storage device such as
a hard disk or CD-ROM and involve maintaining the physical location of the
files.
Characteristics of File Processing System
Here is the list of some important characteristics of file processing system:
It is a group of files storing data of an organization.
Each file is independent from one another.
Each file is called a flat file.
Each file contained and processed information for one specific
function, such as accounting or inventory.
Files are designed by using programs written in
programming languages such as COBOL, C, C++.
The physical implementation and access procedures are written into
the application programs; therefore, physical changes resulted in intensive
rework on the part of the programmer.
As systems became more complex, file processing systems offered little
flexibility, presented many limitations, and were difficult to maintain.
Managing data with a file system is now outdated, but there are good
reasons for studying file systems in detail:
An understanding of the relatively simple characteristics of file
systems makes the complexity of database design easier to
understand.
An awareness of the problems of file systems can help us avoid those
problems in DBMS software.
If we want to convert file system data to a database system,
knowledge of the file system and its limitations is useful.
The limitations of File System are explained below:
Program-Data Dependence:
The file descriptions are stored in the application programs that access
the given file. For this reason, when we want to change the file, we must
also change all the application programs that access it, and when we
change an application program we may have to change the related file as
well. This is known as program-data dependence.
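The dependence described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (a 20-byte name followed by a 4-byte integer), the function name and the sample values are hypothetical and not taken from any particular file processing system.

```python
import struct

# Hypothetical record layout, hard-coded in the program:
# a 20-byte name followed by a 4-byte little-endian integer for marks.
RECORD_V1 = struct.Struct("<20si")

def read_student_v1(raw):
    """Decode one record using the layout compiled into this program."""
    name, marks = RECORD_V1.unpack(raw)
    return name.rstrip(b"\x00").decode("ascii"), marks

record = RECORD_V1.pack(b"Asha", 87)
student = read_student_v1(record)

# Suppose the file format changes: the name field grows to 30 bytes.
RECORD_V2 = struct.Struct("<30si")
new_record = RECORD_V2.pack(b"Asha", 87)

# The old program still embeds the old layout, so it can no longer read the file.
try:
    read_student_v1(new_record)
    old_program_still_works = True
except struct.error:
    old_program_still_works = False

print(student, old_program_still_works)
```

Because the layout lives inside the program rather than in a shared catalog, every program that reads the file would need the same change.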
Duplication of Data:
In the file system the applications are developed independently. This
process causes duplication of files. The duplication causes wastage of
storage space. It also leads to loss of data integrity and metadata
integrity: the same field name in different files may represent different
data items, and different field names in different files may represent the
same data item.
Limited Data Sharing:
Each application has its own private files with little opportunity to share
data outside of their own applications. A requested report may require data
from several incompatible files in separate systems. Such a report is
generally not possible with the file system.
Lengthy Development Times:
There is little opportunity to use previous development efforts. For each
new application the developer has to start from scratch by designing new
file formats and descriptions.
Excessive Program Maintenance:
The preceding factors (program-data dependence, duplication of data,
lengthy development times, and limited data sharing) cause a heavy program
maintenance load.
BASIC DEFINITIONS
Data:
It is the collection of raw facts.
Data consists of text, numbers, images, audio and video segments.
Data is the plural of the Latin term datum.
Example: In a college admission form the fields such as student name,
father’s name, dob, address and phone etc. are treated as data.
Information:
The processed data is called information.
Information increases the knowledge of the user who uses the data.
Data is the lowest level of knowledge and information is the second
level of knowledge.
Data by itself is not significant, but information is significant by
itself.
Observations and recording are done for data, analysis and
calculation are done for information.
Example:
Suppose that we have entered a student's data, such as the student's name
and marks in 3 subjects. After performing calculations we can generate
information such as total marks, percentage etc.
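The calculation in this example can be sketched as follows. The subject names, the marks and the assumption that each subject is marked out of 100 are hypothetical.

```python
# Raw data: marks in 3 subjects (hypothetical values, each assumed out of 100).
marks = {"Maths": 78, "Physics": 85, "Chemistry": 92}
MAX_PER_SUBJECT = 100

# Processing the data yields information.
total = sum(marks.values())
percentage = total * 100 / (MAX_PER_SUBJECT * len(marks))

print(f"total={total}, percentage={percentage:.2f}")
```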
Database:
A shared collection of logically related data and its description,
designed to meet the information needs of an organization.
The description of the data is known as the system catalog (or data
dictionary or metadata—the “data about data”). It is the self-
describing nature of a database that provides program–data
independence.
For example, a salesperson may maintain a small database of
customer contacts on a laptop that consists of a few megabytes
of data.
A large corporation may build a very large database consisting of
several terabytes of data on a large mainframe computer that is used
for decision support applications.
A data warehouse may contain petabytes of data.
DBMS (DATABASE MANAGEMENT SYSTEM)
It is software that manages the data stored in a database. It enables
the user to store, modify and extract data/information from a
database.
SQL (Structured Query Language) offers three facilities: data
definition, data manipulation, and data control. These are denoted DDL,
DML and DCL respectively.
The DBMS serves as the mediator between the user and database.
The DBMS receives all application requests from user and provides
the response.
There are many different types of Database Management Systems
ranging from small systems that run on personal computers to huge
systems that run on mainframes.
DBMS provides controlled access to the database. For example, it may
provide:
A security system, which prevents unauthorized users accessing
the database;
An integrity system, which maintains the consistency of stored
data;
A concurrency control system, which allows shared access of the
database;
A recovery control system, which restores the database to a
previous consistent state following a hardware or software failure;
A user-accessible catalog, which contains descriptions of the data
in the database.
Examples of DBMS software are MS Access, Oracle, and MS SQL
Server.
Meta Data:
Data that describe the properties or characteristics of other data.
Example: consider the following data
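As a minimal sketch of the distinction, assume a hypothetical student table held in an in-memory SQLite database. The table, its columns and the sample row are illustrative assumptions. The row returned by SELECT is data, while the column names and types reported by the catalog are metadata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, dob TEXT)"
)
conn.execute("INSERT INTO student VALUES (1, 'Asha', '2004-05-17')")

# Data: the stored facts themselves.
data = conn.execute("SELECT * FROM student").fetchall()

# Metadata: PRAGMA table_info returns one row per column, in the order
# (cid, name, type, notnull, dflt_value, pk). Keep only name and type.
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(data)
print(metadata)
```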
Hardware
Hardware identifies all the system's physical devices. It includes computers,
computer peripherals, network components etc.
Software
Software refers to the collection of programs used within the
database system. It includes:
1. Operating System
The operating System manages all the hardware components and
makes it possible for all other software to run on the computers.
UNIX, LINUX, Microsoft Windows etc are the popular operating
systems used in database environment.
2. DBMS Software
DBMS software manages the database within the database
system. Oracle Corporation's Oracle, IBM's DB2, MySQL (owned by Sun and
later by Oracle), and Microsoft's MS Access and SQL Server are popular
DBMS (RDBMS) software used in the database environment.
3. Application Programs and Utilities
Application programs and utilities software are used to access and
manipulate the data in the database and to manage the operating
environment of the database.
People in a Database System Environment
The people component includes the following five types of users in a
database system:
System Administrators
Data designer
Database Administrators
System Analysts and Programmers
End Users
System Administrators oversee the database system's general
operations.
Data Designers (Architects) prepare the conceptual design.
Database Administrators (DBAs) physically implement and
maintain the database according to the logical design.
System Analysts and Programmers design and implement the
application programs.
End Users are the people who use the application. For example,
in a banking system, the employees and the customers using
ATMs or online banking facilities are end users.
Procedures in a Database Environment
Procedures are the instructions and business rules that govern the design
and use of the database system.
Data in the Database
Data is the most basic and most important component of a database system.
It is the collection of facts stored in the database.
Roles in the Database Environment
There are four distinct types of people involved in the database environment:
Data & Database Administrators
Database Designers
Application Developers
End-Users
DA (data administrator) is responsible for the management of the data
resource:
Database planning
Development & maintenance of standards
Policies and procedures
Conceptual/logical database design
DBA (Database Administrator) is responsible for the physical realisation of
the database:
Physical database design & implementation
Security & integrity control
Maintenance of operational system
Ensuring satisfactory performance of the applications for
users
Database designers are concerned with:
Identifying the data
Identifying the relationships between entities and attributes
Identifying the relationships between the data
Understanding the constraints on the data (business rules)
The work of the logical database designers can be split into
two stages:
Conceptual database design
o Independent of implementation details such as application
programs and programming languages
Logical database design
o Specific data models
o E.g.: relational, network, hierarchical or object-oriented
Physical database designer decides how the logical database design
is to be physically realised. It involves:
Mapping the logical database design into a set of tables &
integrity constraints
Selecting specific storage structures and access methods for
the data
Designing any security measures
Application Developers
They work from the specifications produced by systems analysts.
Each program may contain statements that request the DBMS to
perform some operation:
Retrieving data
Inserting data
Deleting data
Updating data
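The four operations listed above can be sketched with Python's built-in `sqlite3` module standing in for the DBMS. The `account` table and its values are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acc_no INTEGER PRIMARY KEY, holder TEXT, balance REAL)"
)

# Inserting data
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(101, "Ravi", 500.0), (102, "Meena", 1200.0)],
)
# Updating data
conn.execute("UPDATE account SET balance = balance + 250 WHERE acc_no = ?", (101,))
# Deleting data
conn.execute("DELETE FROM account WHERE acc_no = ?", (102,))
# Retrieving data
rows = conn.execute("SELECT acc_no, holder, balance FROM account").fetchall()

print(rows)
```

In each case the program only states what it wants done; the DBMS decides how to carry it out.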
End Users
End Users are the Clients of the database and they can be
classified as:
o Naïve users
Typically unaware of the DBMS
o Sophisticated users
Familiar with the structure of the DBMS
May use a high-level query language to perform
required operation
History of Database Management Systems
It is believed that the Lack of structural independence was the main cause.
1970s-1990s: The relational DBMS emerged through the work of
Edgar Codd. He worked at IBM and was unhappy with the navigational
model of the CODASYL approach. To him, a search facility was essential,
and it was absent. In 1970 he proposed a new approach to database
construction in his paper "A Relational Model of Data for Large Shared
Data Banks", which made the creation of a relational DBMS possible and
easy to grasp.
This was a new system for entering data and working with big databases,
where the idea was to use tables of records. Tables are then linked by
one-to-one, one-to-many, or many-to-many relationships. When
elements took up space and were not useful, it was easy to remove them from
the original table, and all the other "entries" in other tables linked to this
record were removed as well. It is worth mentioning that two initial projects
were launched: System R at IBM and INGRES at the University of
California, Berkeley. In 1985 the object-oriented DBMS was developed, but it
did not achieve much commercial success because of the high,
hard-to-justify cost of changing systems and formats.
In 1990, the DBMS took on a new object-oriented approach combined with the
relational DBMS. With this approach, text, multimedia, and Internet and web
use in conjunction with a DBMS became possible.
A user view is a logical description of some portion of database that is
required by a user to perform some task. A user view is often a form or
report that comprises data from more than one table.
Increased Productivity:
There are two reasons for the rapid development of applications:
The programmer concentrates on the specific functions required for
new application, without having to worry about file design or low level
implementation details.
DBMS provides a number of high level productivity tools such as form
and report generators and high-level languages that automate the
activities of database design and implementation.
Enforcement of standards:
The standards include naming conventions, data quality standards and
uniform procedures for accessing, updating and protecting data.
The DBMS provides a powerful set of tools for developing and enforcing the
above standards.
Improved Data Quality:
The DBMS provides a number of tools and processes to improve data quality.
Two of the more important processes are the following.
Database designers can specify the integrity constraints that are
enforced by the DBMS. A constraint is a rule that can’t be violated by
database users.
One of the objectives of data warehouse environment is to clean up
operational data before they are placed in the data warehouse.
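As a minimal sketch of DBMS-enforced integrity constraints, the example below declares three rules on a hypothetical `result` table (a primary key, NOT NULL, and a CHECK on the range of marks) and shows SQLite rejecting rows that violate them. The table and values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE result (
        roll_no INTEGER PRIMARY KEY,
        marks   INTEGER NOT NULL CHECK (marks BETWEEN 0 AND 100)
    )
""")
conn.execute("INSERT INTO result VALUES (1, 87)")

# Each of these rows breaks one rule: range check, duplicate key, missing value.
rejected = []
for row in [(2, 140), (1, 55), (3, None)]:
    try:
        conn.execute("INSERT INTO result VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row)

stored = conn.execute("SELECT * FROM result").fetchall()
print(stored, rejected)
```

The rules are stated once in the schema, so no application program can bypass them.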
Greater impact of system failure:
This is similar to putting "all eggs in one basket". As the database is a
shared resource, it must be available to all users at all times, so a failure
of the system has a greater impact on the organization.
Complex Backup and Recovery procedures:
The organization must maintain complex procedures for backup and
recovery. Backup refers to maintaining an additional copy of the data to use
when the data in the database is damaged. Recovery procedures restore
the database when damage occurs.
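The idea of backup and recovery can be sketched with the `Connection.backup` method of Python's `sqlite3` module (available from Python 3.7). Real procedures are far more involved; the in-memory databases, the `staff` table and the simulated damage are assumptions for illustration only.

```python
import sqlite3

# The live database (hypothetical contents).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT)")
live.execute("INSERT INTO staff VALUES (1, 'Kiran')")
live.commit()

# Backup: keep an additional copy of the data.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulated damage: the rows are lost from the live database.
live.execute("DELETE FROM staff")
live.commit()
damaged = live.execute("SELECT * FROM staff").fetchall()

# Recovery: rebuild a working database from the backup copy.
restored = sqlite3.connect(":memory:")
backup.backup(restored)
recovered = restored.execute("SELECT * FROM staff").fetchall()

print(damaged, recovered)
```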
The Three-Level ANSI-SPARC (DATABASE) Architecture
The architecture of most commercial DBMSs available today is
based on the ANSI-SPARC database architecture.
The ANSI-SPARC three-level architecture has three main levels:
1. Internal Level
2. Conceptual Level
3. External Level
These three levels provide data abstraction: they hide the
low-level complexities from end users.
A database system should be efficient in performance and convenient
in use.
Using these three levels, it is possible to use complex structures at the
internal level for efficient operation and to provide a simpler, more
convenient interface at the external level.
1. Internal level:
This is the lowest level of data abstraction.
It describes how the data are actually stored on storage devices.
It is also known as physical level.
It provides internal view of physical storage of data.
It deals with complex low level data structures, file structures and
access methods in detail.
It also deals with data compression and encryption techniques, if
used.
2. Conceptual level:
This is the next higher level than internal level of data abstraction.
It describes what data are stored in the database and what
relationships exist among those data.
It is also known as Logical level.
It hides low level complexities of physical storage.
Database administrator and designers work at this level to
determine what data to keep in database.
Application developers also work on this level.
3. External Level:
This is the highest level of data abstraction.
It describes only the part of the entire database that concerns an end user.
It is also known as the view level.
End users need to access only part of the database rather than the entire
database.
Different users need different views of the database, so there can
be many view-level abstractions of the same database.
Advantages of Three-tier Architecture:
The main objective of it is to provide data abstraction.
Same data can be accessed by different users with different
customized views.
The user is not concerned about the physical data storage details.
Physical storage structure can be changed without requiring changes
in the conceptual structure of the database or in the users' views.
Conceptual structure of the database can be changed without
affecting end users.
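One way to picture the external level is with SQL views. In the sketch below a single `employee` table plays the part of the conceptual level, and two views give two kinds of user their own customized picture of it. The table, the view names and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: one table describing all employee data.
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Anita', 'Sales', 42000), (2, 'Vikram', 'Payroll', 51000);

    -- External level: a directory user sees names and departments, not salaries.
    CREATE VIEW directory_view AS SELECT name, dept FROM employee;

    -- External level: payroll staff see salaries, not departments.
    CREATE VIEW payroll_view AS SELECT emp_id, salary FROM employee;
""")

directory = conn.execute("SELECT * FROM directory_view ORDER BY name").fetchall()
payroll = conn.execute("SELECT * FROM payroll_view ORDER BY emp_id").fetchall()

print(directory)
print(payroll)
```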
DATA INDEPENDENCE
A major objective for the three-level architecture is to provide data
independence, which means that upper levels are unaffected by changes to
lower levels. There are two levels of data independence:
1. Physical Data Independence
2. Logical Data Independence
These are described below:
1. Physical Data Independence:
Physical Data Independence is the ability to modify the physical
schema without requiring any change in application programs.
Modifications at the internal levels are occasionally necessary to
improve performance. Possible modifications at internal levels are
change in file structures, compression techniques, hashing
algorithms, storage devices, etc.
Physical data independence separates conceptual levels from
the internal levels.
This makes it possible to provide a logical description of the database
without the need to specify physical structures.
Physical data independence is comparatively easy to achieve.
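Physical data independence can be sketched by changing an access path while leaving the "application program" untouched. Here the program is just a fixed query string, and the internal-level change is the creation of an index. The `book` table and its contents are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn TEXT, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [("111", "Database Systems", 2015), ("222", "Operating Systems", 2018)],
)

# The "application program": it never mentions how the data is stored.
QUERY = "SELECT title FROM book WHERE year = 2018"
before = conn.execute(QUERY).fetchall()

# Internal-level change: add an index (a new access method) on year.
conn.execute("CREATE INDEX idx_book_year ON book (year)")
after = conn.execute(QUERY).fetchall()

print(before == after)
```

The query text is identical before and after; only the way the DBMS may locate the rows has changed.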
2. Logical Data Independence:
Logical data independence is the ability to modify the conceptual schema
without requiring any change in application programs.
Modification at the logical level is necessary whenever the logical
structures of the database are altered.
Logical data independence separates external level from the
conceptual view.
Comparatively it is difficult to achieve logical data independence.
Application programs are heavily dependent on the logical structure of
the data they access, so a change in the logical structure often requires
the programs to change as well.
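A small sketch shows both sides of this. Assume a hypothetical `customer` table to which a `phone` column is later added. A program that names the columns it needs is unaffected by the change, while a program that relies on the full shape of the table sees its result change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Leela')")

# A program that asks only for the attributes it uses.
QUERY = "SELECT cust_id, name FROM customer"
before = conn.execute(QUERY).fetchall()

# Conceptual-schema change: a new attribute is added.
conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")
after = conn.execute(QUERY).fetchall()

# A program tied to the whole logical structure now gets a different row shape.
fragile = conn.execute("SELECT * FROM customer").fetchall()

print(before, after, fragile)
```

Adding a column is the easy case. Splitting or renaming tables usually needs views or program changes, which is why logical data independence is harder to achieve.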
Database Languages
A data sublanguage consists of two parts: a Data Definition Language
(DDL) and a Data Manipulation Language (DML). The DDL is used to
specify the database schema and the DML is used to both read and
update the database. These languages are called data sublanguages
because they do not include constructs for all computing needs, such as
conditional or iterative statements, which are provided by the high-level
programming languages. Many DBMSs have a facility for embedding the
sublanguage in a high-level programming language such as COBOL,
Fortran, Pascal, Ada, C, C++, C#, Java, or Visual Basic. In this case, the
high-level language is sometimes referred to as the host language.
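The split between sublanguage and host language can be sketched with Python as the host and SQLite's SQL dialect as the data sublanguage. Python is not among the host languages listed above and is used here only for illustration; the `stock` table and the reorder rule are hypothetical. The DDL and DML are SQL strings, while the loop and the condition come from the host language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: specify the schema.
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")

# DML: update the database.
conn.executemany(
    "INSERT INTO stock VALUES (?, ?)", [("pen", 3), ("ink", 40), ("pad", 0)]
)

# Iteration and conditions are supplied by the host language.
reorder = []
for item, qty in conn.execute("SELECT item, qty FROM stock ORDER BY item").fetchall():
    if qty < 5:
        conn.execute("UPDATE stock SET qty = qty + 50 WHERE item = ?", (item,))
        reorder.append(item)

final = dict(conn.execute("SELECT item, qty FROM stock").fetchall())
print(reorder, final)
```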
DATA MODELS
The structure of a database is described by its data model: a collection of
conceptual tools for describing data, data relationships, data semantics, and
consistency constraints.
Data models fall into three groups:
Record-based logical models
1. Relational model
2. Network model
3. Hierarchical model
Object-based logical models
1. The entity-relationship model
2. The object-oriented model
3. The semantic data model
4. The functional data model
Physical models
1. Unifying model
2. Frame-memory model
Record-based logical models
Record-based logical models are used in describing data at the logical
and view levels.
Record-based models are so named because the database is structured in
fixed-format records of several types.
Each record type defines a fixed number of fields or attributes.
Each field is usually of a fixed length.
The three most widely used record-based models are:
Relational model:
The relational model was introduced by E.F. Codd in 1970. The basic data
structure of the relational model is the table, in which the information
about an entity is stored. The data are represented in rows and columns. Each
row is an instance of an entity type. Each column is an attribute of the
entity. In this model there are no physical links as in the hierarchical and
network models.
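A minimal sketch of the point about links: in the relational model two tables are related only through matching column values, and the association is computed by a join when the query runs. The PAINTER and PAINTING tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE painter  (painter_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE painting (painting_id INTEGER PRIMARY KEY, title TEXT, painter_id INTEGER);
    INSERT INTO painter  VALUES (1, 'Painter A'), (2, 'Painter B');
    INSERT INTO painting VALUES (10, 'Sunrise', 1), (11, 'Harbour', 2), (12, 'Orchard', 1);
""")

# No stored pointers: rows are matched on the shared painter_id value.
rows = conn.execute("""
    SELECT p.name, w.title
    FROM painter AS p
    JOIN painting AS w ON w.painter_id = p.painter_id
    ORDER BY w.painting_id
""").fetchall()

print(rows)
```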
Advantages
o Structural independence
o Improved conceptual simplicity
o Easier database design, implementation, management, and use
o Ad hoc query capability with SQL
o Powerful database management system
Disadvantages
o Large hardware and system software overhead
o Possibility of Poor design and implementation
o May promote “islands of information” problems
Network model:
Data in the network model are represented by collections of records,
and relationships among data are represented by links, which can be
viewed as pointers.
The records in the database are organized as collections of arbitrary
graphs.
Disadvantages
o System complexity
o Lack of structural independence
Hierarchical model:
In the hierarchical model the data and the relationships among
the data are represented by records and links.
It is the same as the network model except that the records are
organized as collections of trees rather than
graphs.
Hierarchical Model
Advantages of the hierarchical model:
1. Simplicity
2. Data Security and Data Integrity
3. Efficiency
Disadvantages of the hierarchical model:
1. Implementation Complexity
2. Lack of structural independence
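The difference between the two link-based models can be sketched with plain Python objects, where a list of object references stands in for the links (pointers). The record names are hypothetical. In the tree every record is reached from exactly one parent; in the graph a record may have more than one.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    name: str
    children: list = field(default_factory=list)  # links to child records

def parents_of(target, roots):
    """Return the names of all records that link directly to `target`."""
    found, stack, seen = [], list(roots), set()
    while stack:
        rec = stack.pop()
        if id(rec) in seen:
            continue
        seen.add(id(rec))
        if any(child is target for child in rec.children):
            found.append(rec.name)
        stack.extend(rec.children)
    return sorted(found)

# Hierarchical model: COLLEGE -> DEPARTMENT -> COURSE forms a tree.
course = Record("COURSE")
dept = Record("DEPARTMENT", [course])
college = Record("COLLEGE", [dept])
tree_parents = parents_of(course, [college])

# Network model: COURSE is also linked from TEACHER, so the structure is a graph.
teacher = Record("TEACHER", [course])
graph_parents = parents_of(course, [college, teacher])

print(tree_parents, graph_parents)
```

In both cases a program must follow the links to reach a record, which is why a change in structure forces a change in the programs (lack of structural independence).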
Object-based logical models
Object-based logical models are used in describing data at the logical
and the view levels.
They provide fairly flexible structuring capabilities and allow data
constraints to be specified explicitly.
There are many different models; the more widely known ones are:
Entity-relationship model:
It is the graphical representation of entities, attributes and
relationships. It was developed by Peter Chen in 1976. ER models are
normally represented in an entity relationship diagram (ERD). The ER
model is based on the following components.
The entity name is a noun and is generally written in capital letters
and is written in the singular form. An entity is represented in the
ERD by a rectangle.
An attribute is represented by an ellipse. Each entity is described by a
set of attributes.
Relationships describe associations among data. The name of the
relationship usually is a verb. For example, a PAINTER paints many
PAINTINGS. Relationship is represented by a diamond.
The following diagram is an example for entity relationship model.
• Advantages
– Exceptional conceptual simplicity
– Visual representation
– Effective communication tool
– Integrated with the relational database model
• Disadvantages
– Limited constraint representation
– Limited relationship representation
– No data manipulation language
– Loss of information content
Object-oriented model:
In the object oriented data model, both data and their relationships
(operations) are contained in a single structure known as an object. In turn
the OODM is the basis for the object-oriented database management
system (OODBMS). A traditional database stores just data and not
procedures. In contrast, an Object oriented database (OODB) stores objects.
An object consists of data and related methods. The OODM is said to be a
semantic (meaningful) data model.
The OO data model is based on the following components.
Object: It is an abstraction of real world entity. It is equivalent to ER model
entity.
Attribute: It describes the properties of an object.
Method: It specifies how an operation is performed on data.
Class: It includes the data members and a set of methods.
Inheritance: It is the ability of an object to inherit the attributes and
methods of the classes above it.
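These components map directly onto the constructs of an object-oriented programming language. The sketch below uses ordinary Python classes rather than an actual OODBMS, and the class names and values are hypothetical.

```python
class Person:
    """Class: groups data members (attributes) with methods."""

    def __init__(self, name, birth_year):
        # Attributes: properties of the object.
        self.name = name
        self.birth_year = birth_year

    def age_in(self, year):
        # Method: specifies how an operation is performed on the data.
        return year - self.birth_year


class Student(Person):
    """Inheritance: Student receives the attributes and methods of Person."""

    def __init__(self, name, birth_year, course):
        super().__init__(name, birth_year)
        self.course = course


# Object: an abstraction of one real-world entity.
s = Student("Asha", 2004, "DBMS")
age = s.age_in(2024)
print(s.name, s.course, age)
```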
• Advantages
– Adds semantic content
– Visual presentation includes semantic content
– Database integrity
– Both structural and data independence
• Disadvantages
– Lack of Standards
– Complex navigational data access
– High system overhead slows transactions
Semantic model:
The Semantic Data Model (SDM), like other data models, is a way of
structuring data to represent it in a logical way. SDM differs from other data
models, however, in that it focuses on providing more meaning of the data
itself, rather than solely or primarily on the relationships and attributes of
the data.
SDM provides a high-level understanding of the data by abstracting it
further away from the physical aspects of data storage.
Functional data model:
In conventional database systems, procedures, data structures and actual
content are usually separated. Thus, a conventional database management
system (DBMS) lets users store, modify or
retrieve data that is structured in accordance with the current database
schema.
It should be especially noted, that a DBMS retrieves data as they were
stored into the database and additional procedures can be applied to such
data as an independent level of application programs.
In contrast, the functional data model provides a unified approach to
manipulating both data and procedures. The main idea of the functional data
model is to define all components of an information system in the form
of functions. Thus, for example, the functional data model defines data
objects, attributes and relationships as so-called database functions.
Moreover, a Functional Data Manipulation Language is a set of data
manipulation functions which can be applied to database functions. Finally,
users are provided with a special mechanism which is called Lambda
Calculus to define their own functions which can be seamlessly combined
with database and data
Physical Data Models
Physical data models describe how data is stored in the computer,
representing information such as record structures, record orderings, and
access paths. There are not as many physical data models as logical data
models; the most common ones are the unifying model and the frame
memory.
FUNCTIONS OF DBMS
There are several functions that a DBMS performs to ensure data integrity
and consistency of data in the database. The functions in the DBMS are
explained below
1. Data Dictionary Management
The Data Dictionary is a location where the DBMS stores definitions of the data
and their relationships (metadata). This function removes structural and
data dependency and provides data abstraction. The Data Dictionary is
hidden from the user and is used by Database Administrators and
Programmers.
2. Data Storage Management
This particular function is used for the storage of data and any related data
entry forms or screen definitions, report definitions etc. Users do not need to
know how data is stored or manipulated. The data storage structure affects
the speed of operation.
3. Data Transformation and Presentation
This function exists to transform any data entered into required data
structures. By using the data transformation and presentation function the
DBMS can determine the difference between logical and physical data
formats.
4. Security Management
This is one of the most important functions in the DBMS. Security
management determines specific users that are allowed to access the
database. Users are given a username and password. This function
determines what specific data any user can see or manage.
5. Multiuser Access Control
Data integrity and data consistency are the basis of this function. Multiuser
access control is a very useful tool in a DBMS: it enables multiple users to
access the database simultaneously without affecting the integrity of the
database.
6. Backup and Recovery Management
Backup and recovery functions are essential in a database. Recovery
management deals with how to recover the database after an outage. Backup
management refers to maintaining an additional copy of the data for data
safety and integrity.
7. Data Integrity Management
The DBMS enforces rules that reduce data redundancy and maximize
data consistency; this is what makes data integrity management possible.
8. Database Access Languages and Application Programming Interfaces
The database approach provides non-procedural query languages such as SQL
to give easy access to the database. It also provides APIs for programming
languages such as VB.NET and Java.
9. Database Communication Interfaces
A DBMS can provide access to the database using the Internet through Web
Browsers (Mozilla Firefox, Internet Explorer, Netscape).
Components of DBMS (structure of DBMS)
DBMSs are highly complex and sophisticated pieces of software that aim to
provide the services. It is not possible to generalize the component structure
of a DBMS, as it varies greatly from system to system.
The major software components in a DBMS environment are depicted below:
Query processor. This is a major DBMS component that transforms
queries into a series of low-level instructions directed to the database
manager.
Database manager (DM). The DM interfaces with user-submitted
application programs and queries. The DM accepts queries and
examines the external and conceptual schemas to determine what
conceptual records are required to satisfy the request. The DM then
places a call to the file manager to perform the request.
File manager. The file manager manipulates the underlying storage
files and manages the allocation of storage space on disk. It
establishes and maintains the list of structures and indexes defined in
the internal schema. If hashed files are used, it calls on the hashing
functions to generate record addresses. However, the file manager
does not directly manage the physical input and output of data.
Rather, it passes the requests on to the appropriate access methods,
which either read data from or write data into the system buffer (or
cache).
DML preprocessor: This module converts DML statements embedded
in an application program into standard function calls in the host
language. The DML preprocessor must interact with the query
processor to generate the appropriate code.
DDL compiler: The DDL compiler converts DDL statements into a set of
tables containing metadata. These tables are then stored in the
system catalog while control information is stored in data file headers.
Catalog manager: The catalog manager manages access to and
maintains the system catalog. The system catalog is accessed by most
DBMS components.
The major software components for the database manager are as follows:
Authorization control. This module confirms whether the user has
the necessary authorization to carry out the required operation.
Command processor. Once the system has confirmed that the user has
authority to carry out the operation, control is passed to the command
processor.
Integrity checker. For an operation that changes the database, the
integrity checker checks whether the requested operation satisfies
all necessary integrity constraints (such as key constraints).
Query optimizer. This module determines an optimal strategy for the
query execution.
Transaction manager. This module performs the required processing of
operations that it receives from transactions.
Scheduler. This module is responsible for ensuring that concurrent
operations on the database proceed without conflicting with one
another. It controls the relative order in which transaction operations
are executed.
Recovery manager. This module ensures that the database remains in
a consistent state in the presence of failures. It is responsible for
transaction commit and abort.
Buffer manager. This module is responsible for the transfer of data
between main memory and secondary storage, such as disk and tape.
The recovery manager and the buffer manager are sometimes referred
to collectively as the data manager. The buffer manager is sometimes
known as the cache manager.
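Transaction commit and abort, together with the integrity check that can trigger an abort, can be sketched with Python's `sqlite3` module. This is only an illustration: the `account` table, the `transfer` function and the rule that a balance may not go negative are hypothetical, and a single connection does not exercise the scheduler.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acc_no INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move `amount` from src to dst as one transaction: all or nothing."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src)
        )
        conn.commit()      # commit: both updates become permanent
        return True
    except sqlite3.IntegrityError:
        conn.rollback()    # abort: the credit already applied is undone
        return False

ok = transfer(1, 2, 30)        # succeeds
failed = transfer(1, 2, 500)   # would overdraw account 1, so it is aborted
balances = conn.execute(
    "SELECT acc_no, balance FROM account ORDER BY acc_no"
).fetchall()
print(ok, failed, balances)
```

After the aborted transfer the database is in the same consistent state as before it started.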
ERD DEFINITION & SYMBOLS USED IN ERD
ENTITY RELATIONSHIP MODELING
Entity-Relationship Model: A database can be modeled as a collection of
entities and the relationships among those entities in a graphical
representation. This model is called the Entity Relationship Model. It was
first developed by Peter Chen. It has become popular because it is easy to
understand.
The ER model is based on the following components:
Entities:
An entity may be a person, place, object, event or even a concept about
which the organization wishes to maintain data in the database.
Examples: Person : CUSTOMER, EMPLOYEE
Place : WAREHOUSE, CITY, STORE
Object: PRODUCT, MACHINE
Event : SALE, ADMISSION
Concept : ACCOUNT, COURSE
Attributes :
The properties or characteristics of an entity type. For example the
attributes of CUSTOMER entity may be customer number, customer name,
address, phone number etc.
Relationship:
The meaningful association between entities is called a relationship.
Relationships may be of many types, such as one-to-one, one-to-many,
and many-to-many.
E.g. EMPLOYEE is assigned PARKING PLACE
ORDER contains ORDERLINE
STUDENT takes COURSE.
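As a sketch of how such a relationship might be realised in tables, take STUDENT takes COURSE and assume it is many-to-many. A common approach, used here as an assumption rather than as the only mapping, is to give the relationship its own table holding one row per (student, course) pair. All names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);

    -- The relationship "takes" becomes a table of its own.
    CREATE TABLE takes (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO course  VALUES (10, 'DBMS'), (20, 'Networks');
    INSERT INTO takes   VALUES (1, 10), (1, 20), (2, 10);
""")

# Which students take the DBMS course?
dbms_students = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM student AS s
    JOIN takes  AS t ON t.student_id = s.student_id
    JOIN course AS c ON c.course_id  = t.course_id
    WHERE c.title = 'DBMS'
    ORDER BY s.name
""")]
print(dbms_students)
```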
Entity set
An entity set is a set of entities of the same type that share the same
properties.
Example: set of all persons, companies, trees, holidays
Entity Relationship Diagram (E R diagram): An ERD is the graphical
representation of an entity-relationship model. It shows components
such as entities, relationships and attributes using different symbols.
Rectangles represent entity sets.
Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to relationship sets.
Underline indicates primary key attributes
[Figure: basic ER model symbols for strong entity, weak entity and associative entity]
Strong Entity:
An entity whose existence is independent of other entities. Its primary key is
not derived from the attributes of other entities.
Weak Entity
An entity whose existence is dependent on another entity. Its primary key is
partially or fully derived from the related entity.
Example:
An associative entity
In ER Model attributes can be classified into the following types.
Simple and Composite Attribute
Single Valued and Multi Valued attribute
Stored and Derived Attributes
Identifier and composite identifier
Required and optional attributes
Simple and Composite Attribute
A simple attribute consists of a single atomic value and cannot be subdivided.
For example, the attributes age, sex etc. are simple attributes. A composite
attribute is an attribute that can be further subdivided. For example, the
attribute ADDRESS can be subdivided into street, city, state, and zip code.
Simple Attribute: An attribute that consists of a single atomic value.
Example: Salary, age etc.
Composite Attribute: An attribute whose value is not atomic.
Example: Address : ‘House_no:City:State’
Name : ‘First Name: Middle Name: Last Name’
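The difference between simple and composite attributes can be sketched in Python. This is only an illustration: the class names Person and Address and the sample values are assumptions, not part of the notes.

```python
from dataclasses import dataclass


@dataclass
class Address:
    """Composite attribute: made up of smaller component attributes."""
    house_no: str
    city: str
    state: str


@dataclass
class Person:
    age: int          # simple attribute: one atomic value
    address: Address  # composite attribute: can be subdivided further


# Hypothetical sample instance.
p = Person(age=30, address=Address("12", "Hyderabad", "Telangana"))
city = p.address.city  # one component of the composite attribute
```

A component such as city can be read on its own, while age has no smaller parts.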
Single Valued and Multi Valued attribute
A single valued attribute can have only one value. For example, a person
can have only one 'date of birth', 'age' etc. A single valued attribute may be
either simple or composite: 'date of birth' is a composite attribute (day,
month, year) and 'age' is a simple attribute, but both are single valued
attributes.
Multi valued attributes can have multiple values. For instance a person may
have multiple phone numbers, multiple degrees etc.
Stored and Derived Attributes
The value for the derived attribute is derived from the stored attribute. For
example 'Date of birth' of a person is a stored attribute. The value for the
attribute 'AGE' can be derived by subtracting the 'Date of Birth'(DOB) from
the current date. Stored attribute supplies a value to the related attribute.
Stored Attribute: An attribute that supplies a value to the related attribute.
Example: Date of Birth
Derived Attribute: An attribute whose value is derived from a stored
attribute.
Example: age, whose value is derived from the stored attribute Date of
Birth.
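The derivation of AGE from Date of Birth can be sketched as a small Python function. The function name age_on is an assumption made for this illustration; it counts whole years only.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive AGE (in whole years) from the stored attribute date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years
```

Because the value is computed on request, it never goes out of date, at the cost of doing the calculation every time it is needed.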
Identifier and composite identifier
An identifier is an attribute that uniquely identifies each instance of an entity.
For example, student_number is an identifier in the STUDENT table.
A composite identifier is an identifier formed by combining
two or more attributes. For example, (emp_num, course_id) is a composite
identifier in the CERTIFICATE entity.
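A composite identifier can be sketched with Python's built-in sqlite3 module. The column date_earned and the sample rows are assumptions added for illustration; only emp_num and course_id come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE certificate (
    emp_num     INTEGER NOT NULL,
    course_id   TEXT    NOT NULL,
    date_earned TEXT,                    -- hypothetical extra attribute
    PRIMARY KEY (emp_num, course_id)     -- composite identifier
)""")
conn.execute("INSERT INTO certificate VALUES (1, 'DB101', '2024-01-10')")
# Same employee, different course: the combination is still unique.
conn.execute("INSERT INTO certificate VALUES (1, 'OS201', '2024-03-05')")

# Repeating the same (emp_num, course_id) pair is rejected.
try:
    conn.execute("INSERT INTO certificate VALUES (1, 'DB101', '2024-05-01')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

row_count = conn.execute("SELECT COUNT(*) FROM certificate").fetchone()[0]
```

Neither emp_num nor course_id is unique by itself here; only the pair identifies a row.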
Required and Optional attributes:
A required attribute is an attribute that must have a value, i.e. it cannot be
null.
Example: In the EMPLOYEE table, emp_name must not be null, so it is
a required attribute.
An optional attribute is an attribute that may or may not have a value for each
instance of an entity. It usually accepts null.
Example: Email is an optional attribute in the STUDENT table, because not every
student has an email address.
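In a relational table, a required attribute is commonly declared NOT NULL and an optional attribute is left nullable. The sketch below uses Python's sqlite3 module; the table layout and sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE student (
    student_number INTEGER PRIMARY KEY,
    student_name   TEXT NOT NULL,   -- required attribute
    email          TEXT             -- optional attribute, may be NULL
)""")

# Leaving out the optional attribute is accepted; it is stored as NULL.
conn.execute("INSERT INTO student (student_number, student_name) VALUES (1, 'Meena')")
email = conn.execute(
    "SELECT email FROM student WHERE student_number = 1").fetchone()[0]

# Leaving out the required attribute is rejected.
try:
    conn.execute("INSERT INTO student (student_number, email) VALUES (2, 'x@example.com')")
    missing_name_accepted = True
except sqlite3.IntegrityError:
    missing_name_accepted = False
```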
DIFFERENCE BETWEEN COMPOSITE IDENTIFIER & COMPOSITE
ATTRIBUTE
A composite identifier consists of two or more attributes that together meet
the unique identification property, whereas a composite attribute is a single
attribute that can be divided into two or more meaningful components (attributes).
DOMAIN (2 MARKS)
A domain is the set of possible values for an attribute. For example, the domain
of the marks attribute includes the values from 0 to 100. Similarly, the domain
of the course attribute includes the values {B.Sc., B.Com, B.A., B.B.M}.
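One way to enforce a domain in a relational table is a CHECK constraint. The sketch below uses Python's sqlite3 module with the two domains from the text; the table name result and the helper try_insert are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE result (
    student_number INTEGER NOT NULL,
    course TEXT    NOT NULL CHECK (course IN ('B.Sc.', 'B.Com', 'B.A.', 'B.B.M')),
    marks  INTEGER NOT NULL CHECK (marks BETWEEN 0 AND 100)
)""")


def try_insert(student_number, course, marks):
    """Return True if the row is accepted, False if it violates a domain."""
    try:
        conn.execute("INSERT INTO result VALUES (?, ?, ?)",
                     (student_number, course, marks))
        return True
    except sqlite3.IntegrityError:
        return False
```

For example, try_insert(1, 'B.Sc.', 85) is accepted, while a marks value of 101 or a course outside the listed set is rejected.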
IMPLEMENTING MULTI VALUED ATTRIBUTE
An attribute that takes more than one value for each instance of an entity type
is called a multi valued attribute.
For example: in an EMPLOYEE entity, the attribute skill is a multi valued
attribute, because each employee may have more than one skill. This
attribute can be represented as follows:
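One common way to implement a multi valued attribute in tables is to move it into a separate table with one row per value. The sketch below uses Python's sqlite3 module. It is an assumption about a typical design, not necessarily the representation intended by the notes, and the table name employee_skill and sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_num  INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL
);
-- One row per (employee, skill) pair instead of a repeating column.
CREATE TABLE employee_skill (
    emp_num INTEGER NOT NULL REFERENCES employee(emp_num),
    skill   TEXT    NOT NULL,
    PRIMARY KEY (emp_num, skill)
);
""")
conn.execute("INSERT INTO employee VALUES (1, 'Asha')")
conn.executemany("INSERT INTO employee_skill VALUES (?, ?)",
                 [(1, "SQL"), (1, "Python"), (1, "COBOL")])

# All skills of employee 1, in alphabetical order.
skills = [row[0] for row in conn.execute(
    "SELECT skill FROM employee_skill WHERE emp_num = 1 ORDER BY skill")]
```

With this layout an employee can have any number of skills without changing the table structure.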
The value for the derived attribute is derived from the stored attribute. For
example 'Date of birth' of a person is a stored attribute. The value for the
attribute 'AGE' can be derived by subtracting the 'Date of Birth'(DOB) from
the current date. Stored attribute supplies a value to the related attribute.
The following are the advantages and disadvantages of storing and not
storing of derived attributes.
The PK of parent entity acts as foreign key in the child entity.
Example
In the above example, DEPENDENT is existence dependent on
EMPLOYEE. The relationship between these entities is called an identifying
relationship. Here the PK of the child entity is derived from the parent entity.
EXISTENCE DEPENDENCE
An entity is said to be existence dependent if it can exist in the database only
when the related entity occurrence exists. For example, a DEPENDENT entity
instance exists only when its associated EMPLOYEE entity instance exists.
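Existence dependence and the identifying relationship can be sketched with Python's sqlite3 module. The column names dep_num and dep_name and the use of ON DELETE CASCADE are assumptions for this illustration. Note that SQLite checks foreign keys only after the PRAGMA shown below is enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE employee (
    emp_num  INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL
);
-- Weak entity: its primary key includes the parent's key (emp_num).
CREATE TABLE dependent (
    emp_num  INTEGER NOT NULL REFERENCES employee(emp_num) ON DELETE CASCADE,
    dep_num  INTEGER NOT NULL,
    dep_name TEXT    NOT NULL,
    PRIMARY KEY (emp_num, dep_num)
);
""")
conn.execute("INSERT INTO employee VALUES (1, 'Asha')")
conn.execute("INSERT INTO dependent VALUES (1, 1, 'Kiran')")

# A dependent without an existing employee is rejected.
try:
    conn.execute("INSERT INTO dependent VALUES (99, 1, 'Nobody')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

# Deleting the employee removes its dependents as well.
conn.execute("DELETE FROM employee WHERE emp_num = 1")
remaining = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
```

The parent's PK (emp_num) appears in the child both as a foreign key and as part of the child's own primary key.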
RELATIONSHIP PARTICIPATION (OR) OPTIONAL AND MANDATORY
PARTICIPATION OF ENTITIES IN RELATIONSHIPS
Example:
“Is_Married_To” is a unary 1:1 relationship between the instances of
the PERSON entity type, i.e. one person is married to only one other
person.
“Manages” is a unary 1:M relationship between the instances of
EMPLOYEE, because an employee manages one or more other
employees.
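A unary 1:M relationship such as “Manages” is commonly implemented as a foreign key that refers back to the same table. The sketch below uses Python's sqlite3 module; the column manager_num and the sample employees are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    emp_num     INTEGER PRIMARY KEY,
    emp_name    TEXT NOT NULL,
    manager_num INTEGER REFERENCES employee(emp_num)  -- points back to the same table
)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (1, "Asha", None),   # has no manager
    (2, "Ravi", 1),      # managed by employee 1
    (3, "Meena", 1),     # managed by employee 1
])

# Employees managed by employee 1.
managed_by_asha = [row[0] for row in conn.execute(
    "SELECT emp_name FROM employee WHERE manager_num = 1 ORDER BY emp_name")]
```

Each employee row stores at most one manager, while one manager can be referenced by many rows, which gives the 1:M shape.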
Binary Relationship:
It is the relationship between the instances of two entity types. Binary
relationship is the most common type of relationship in data modeling. Here
the degree of the relationship is 2.
Example:
“Contains” is a binary 1:M relationship between the instances of
ORDER and ORDERLINE. It means an order may contain one or more
order lines.
“Registers_For” is a binary M:N relationship between the instances
of STUDENT and COURSE. It means a student may register for many
courses and, similarly, many students may register for a course.
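A binary M:N relationship such as “Registers_For” is usually implemented with a third table that holds one row per pairing. The sketch below uses Python's sqlite3 module; the column names and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_number INTEGER PRIMARY KEY, student_name TEXT NOT NULL);
CREATE TABLE course  (course_id TEXT PRIMARY KEY, title TEXT NOT NULL);
-- Each row records one student registering for one course.
CREATE TABLE registers_for (
    student_number INTEGER NOT NULL REFERENCES student(student_number),
    course_id      TEXT    NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (student_number, course_id)
);
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO course  VALUES ('DB101', 'Databases'), ('OS201', 'Operating Systems');
INSERT INTO registers_for VALUES (1, 'DB101'), (1, 'OS201'), (2, 'DB101');
""")

# One student, many courses.
courses_of_asha = [r[0] for r in conn.execute(
    "SELECT course_id FROM registers_for WHERE student_number = 1 ORDER BY course_id")]
# One course, many students.
students_in_db101 = [r[0] for r in conn.execute(
    "SELECT student_number FROM registers_for WHERE course_id = 'DB101' "
    "ORDER BY student_number")]
```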
Ternary Relationship:
It is the simultaneous relationship among the instances of three entity
types. It is recommended to convert ternary relationships to associative
entities. Here the degree of the relationship is 3.
Example:
“Is_Married_To” is a unary/recursive 1:1 relationship between the
instances of the PERSON entity type, i.e. one person is married to only
one other person.
“Manages” is a unary/recursive 1:M relationship between the instances
of EMPLOYEE, because an employee manages one or more other
employees.
the entities, using the format (x,y). The first value represents the minimum
number of associated entities, and the second value represents the
maximum number of associated entities.
Minimum Cardinality: The minimum number of instances of one entity
that may be associated with each instance of the other entity.
Maximum Cardinality: The maximum number of instances of one entity
that may be associated with each instance of the other entity.
Example:
The diagram below shows connectivities and cardinalities in both Chen and
Crow’s Foot notation.
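A (minimum, maximum) cardinality rule can be checked against actual data with a few lines of Python. The function name satisfies_cardinality and the PROFESSOR teaches CLASS rule of (1,4) are hypothetical, chosen only to illustrate the (x,y) format.

```python
from collections import Counter


def satisfies_cardinality(pairs, all_left, minimum, maximum):
    """Check that every left-hand entity is linked to between `minimum`
    and `maximum` right-hand entities (both bounds inclusive)."""
    counts = Counter(left for left, _ in pairs)
    # Counter returns 0 for entities that appear in no pair.
    return all(minimum <= counts[left] <= maximum for left in all_left)


# Hypothetical rule: each PROFESSOR teaches (1,4) classes.
teaches = [("P1", "C1"), ("P1", "C2"), ("P2", "C3")]
```

With professors P1 and P2 the rule holds. Adding a professor P3 with no classes breaks the minimum of 1, and lowering the maximum to 1 makes P1 break the rule.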
ENHANCED ENTITY RELATIONSHIP MODELING (EERM) (OR)
USE OF SUPER TYPE/ SUB TYPE RELATIONSHIPS IN DATA
MODELING.
Example 1 in Peter Chen notation:
Employee super type with three subtypes
Example 2 in Crow’s Foot Notation
GENERALIZATION AND SPECIALIZATION IN SUPER TYPE AND SUB
TYPE RELATIONSHIPS. (OR)
THE VARIOUS APPROACHES TO DEVELOP SUPER TYPE
SUB TYPE RELATIONSHIPS
(b) – Generalization to VEHICLE supertype
The example below demonstrates the specialization process.
Here an organization wishes to store details about a PRODUCT. Some
attributes are applicable to one set of instances, and other
attributes/relationships are specific to another set of instances, so the
specialization is done as follows.
SUB TYPE DISCRIMINATOR
Subtype Discriminator: An attribute of the supertype whose values
determine the target subtype(s)
Disjoint – a simple attribute with alternative values to indicate the
possible subtypes
Overlapping – a composite attribute whose subparts are related to
different subtypes. Each subpart contains a boolean value to
indicate whether or not the instance belongs to the associated
subtype
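The two kinds of subtype discriminator can be sketched in Python. All subtype names and discriminator codes below (HOURLY_EMPLOYEE, MANUFACTURED_PART and so on) are hypothetical and serve only to illustrate the idea.

```python
def disjoint_subtype(employee_type):
    """Disjoint case: one simple discriminator value selects exactly one subtype."""
    mapping = {
        "H": "HOURLY_EMPLOYEE",
        "S": "SALARIED_EMPLOYEE",
        "C": "CONSULTANT",
    }
    return mapping[employee_type]


def overlapping_subtypes(part_type):
    """Overlapping case: a composite discriminator with one boolean per subtype.
    Returns every subtype whose flag is set."""
    return [subtype for subtype, belongs in part_type.items() if belongs]
```

For instance, a discriminator value of "S" selects one subtype only, while a composite value such as {"MANUFACTURED_PART": True, "PURCHASED_PART": True} places the instance in both subtypes.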
Example.
CONSTRAINTS ON SUPER TYPE SUB TYPE RELATIONSHIPS.
(OR)
DISJOINTNESS CONSTRAINT AND COMPLETENESS
CONSTRAINT WITH EXAMPLE.
Disjointness Constraints: Whether an instance of a supertype may
simultaneously be a member of two (or more) subtypes.
Disjoint Rule: An instance of the supertype can be member of only
ONE of the subtypes
Overlap Rule: An instance of the supertype could be member of
more than one of the subtypes
Completeness Constraints: Whether an instance of a supertype must also
be a member of at least one subtype
Total Specialization Rule: It specifies that each entity instance of the
supertype must be a member of at least one subtype in the
relationship. It is represented with a double line in the ERD.
Partial Specialization Rule: It specifies that an entity instance of the
supertype is allowed not to belong to any subtype. It is
represented with a single line in the ERD.
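The disjointness and completeness rules can be expressed as a small validation function. The function name check_constraints, its parameters and the data layout (a dictionary from instance to set of subtypes) are assumptions for this sketch.

```python
def check_constraints(memberships, disjoint, total):
    """Return the supertype instances that violate the chosen rules.

    memberships maps each supertype instance to the set of subtypes it belongs to.
    disjoint=True applies the disjoint rule, otherwise the overlap rule.
    total=True applies total specialization, otherwise partial specialization.
    """
    violations = []
    for instance, subtypes in memberships.items():
        if disjoint and len(subtypes) > 1:
            # Disjoint rule: at most one subtype per instance.
            violations.append(instance)
        elif total and len(subtypes) == 0:
            # Total specialization: at least one subtype per instance.
            violations.append(instance)
    return violations
```

Under the overlap rule with partial specialization nothing is ever reported, since any number of subtypes (including none) is allowed.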
For additional information, refer to the following books:
1. Database System Concepts, Silberschatz, Korth, McGraw Hill, V edition.3rd Edition
2. Database Management Systems, Raghu Ramakrishnan, Johannes Gehrke, Tata McGraw Hill