DBMS Unit-I Notes
DBMS Unit-I Notes
DBMS Unit-I Notes
Database System Applications: A Historical Perspective, File Systems versus a DBMS, the Data
Introduction to Database Design: Database Design and ER Diagrams, Entities, Attributes, and
Entity Sets, Relationships and Relationship Sets, Additional Features of the ER Model,
Conceptual Design With the ER Model
Data:
Data is the known facts or figures that have implicit meaning. It can also be defined as it is the
representation of facts, concepts or instruction in a formal manner, which is suitable for
understanding and processing. Data can be represented in alphabets (A-Z, a-z),in digits(0-9) and
using special characters(+,-.#,$, etc) e. g: 25, “ ajit ” etc.
Information:
Information is the processed data on which decisions and actions are based. Information can be
defined as the organized and classified data to provide meaningful values. Eg: “The age of Ravi
is 25”
The traditional file oriented approach to information processing has for each application a
separate master file and its own set of personal file. In file oriented approach the program
dependent on the files and files become dependent on the files and files become dependents upon
the programs.
1) Data redundancy and inconsistency: The same information may be written in several files.
This redundancy leads to higher storage and access cost. It may lead data inconsistency that is
the various copies of the same data may longer agree for example a changed customer address
may be reflected in single file but not elsewhere in the system.
2) Difficulty in accessing data: The conventional file processing system does not allow data to
retrieve in a convenient and efficient manner according to user choice.
3) Data isolation: Because data are scattered in various file and files may be in different formats
with new application programs to retrieve the appropriate data is difficult.
4) Integrity Problems: Developers enforce data validation in the system by adding appropriate
code in the various application programs. However when new constraints are added, it is difficult
to change the programs to enforce them.
6) Concurrent access: In the file processing system it is not possible to access a same file for
transaction at same time.
7) Security problems: There is no security provided in file processing system to secure the data
from unauthorized user access.
Database:
It is a collection of interrelated data. These can be stored in the form of tables. A database can be
of any size and varying complexity.
Example: Customer database consists the fields as cname, cno, and ccity
My SQL
Oracle
IBM DB2
SQLite
Advantages of DBMS:
Concurrent Access: A DBMS schedules concurrent access to the data in such manner that users
can think of the data being accessed by only one user at a time.
Data Security: The DBA who has the ultimate responsibility for the data in the DBMS can
ensure that proper access procedures are followed including proper authentication schemas for
access to the DBS and additional check before permitting access to sensitive data.
Backup and Recovery: It provides backup and recovery subsystems which create automatic
backup of data, from hardware and software failures and restores the data if required.
Data Independence: Ina database system, the database management system provides the
interface between the application programs and the data. When changes are made to the data
representation, the meta data obtained by the DBMS is changed but the DBMS is continues to
provide the data to application program in the previously used way. The DBMs handles the task
of transformation of data wherever necessary.
Applications of DBMS:
Library Management System: There are thousands of books in the library so it is very difficult
to keep record of all the books in a copy or register. So DBMS used to maintain all the
information relate to book issue dates, name of the book, author and availability of the book.
Banking: For customer information, account activities, payments, deposits, loans, etc.
Airlines: For reservations and schedule information.
Universities: For to store the information about all instructors, students, departments, and course
offerings, colleges, grades.
Telecommunication: It helps to keep call records, monthly bills, maintaining balances, etc.
Manufacturing: It is used for the management of supply chain and for tracking production of
items.
For example distribution centre should keep a track of the product units that supplied into the
centre as well as the products that got delivered out from the distribution centre on each day.
HR Management: For information about employees, salaries, payroll, deduction, generation of
paychecks, etc.
Online Shopping:
Online shopping has become a big trend of these days. No one wants to go to shops and waste
his time. Everyone wants to shop from home. So all these products are added and sold only with
the help of DBMS. Purchase information, invoice bills and payment, all of these are done with
the help of DBMS.
History of DBMS
1960’s:
From the earliest days of computers storing and manipulating data have been a major application
focus. The first general purpose DBMS was designed by Charles Bachman at General Electric
in the early 1960s, called Integrated Data Store. It formed the basis for network data model,
which was standardized by the Conference on Data Systems Languages(CODASYL) type of
DBMS, the hierarchical DBMS.
In the late 1960’s IBM developed the Information Management System(IMS) DBMS, used even
today in many major illustrations. IMS formed as basis for an alternative data representation
framework called the hierarchical data model. The SABRE system for making airline
reservations was jointly developed by American Airlines and IBM at the same time and allows
the people to access the same data through computer network.
In 1970’s, Edger codd, at IBM, proposed a new data representation frame work called the
relational model: It sparked the rapid development of several DBMS’s based on the relational
model.
In 1980’s, the relational model consolidated this position as the dominant DBMS paradigm and
database systems continued to gain widespread use. The SQL query language for relational
databases developed at IBM. SQL was standardized in the late 1980’s and current standard
SQL: 1999, was adopted by ANSI and ISO.
In the late1980’s and 1990’s advances were made in many areas of database systems. Several
vendors (DB2, Oracle 8) extended their systems with ability to store new data types such as
images and text. Specialized systems have been developed by numerous for creating data
warehouses, consolidating data from several databases, and for carrying specialized analysis.
The Internet Age has perhaps influenced the data models much more. Data models were developed
using object oriented programming features, embedding with scripting languages like Hyper Text
Markup Language (HTML) for queries. With humongous data being available online, DBMS is
gaining more importance as more data is brought online and made over more accessible through
computer networking.
In the early years of computing, punch card was used in unit record machines for input, data
storage and processing this data. Data was entered offline and for both data, and computer
programs input. This input method is similar to voting machines now a days. This was the only
method, where it was fast to enter data, and retrieve it, but not to manipulate or edit it.
After that era, there was the introduction of the file type entries for data, then the DBMS as
hierarchical, network, and relational.
Components of DBMS
The database management system can be divided into five major components, they are:
1. Hardware
2. Software
3. Data
1. Procedures
2. Database Access Language
Hardware:
When we say Hardware, we mean computer, hard disks, I/O channels for data, and any other
physical component involved before any data is successfully stored into the memory.
When we run Oracle or MySQL on our personal computer, then our computer's Hard Disk, our
Keyboard using which we type in all the commands, our computer's RAM, ROM all become a
part of the DBMS hardware.
Software
This is the main component, as this is the program which controls everything. The DBMS
software is more like a wrapper around the physical database, which provides us with an easy-to-
use interface to store, access and update data.
The DBMS software is capable of understanding the Database Access Language and intrepret it
into actual database commands to execute them on the DB.
Data:
Data is that resource, for which DBMS was designed. The motive behind the creation of DBMS
was to store and utilize data.
In a typical Database, the user saved Data is present and meta data is stored.
Metadata is data about the data. This is information stored by the DBMS to better understand
the data stored in it.
For example: When I store my Name in a database, the DBMS will store when the name was
stored in the database, what is the size of the name, is it stored as related data to some other data,
or is it independent, all this information is metadata.
Procedures:
Procedures refer to general instructions to use a database management system. This includes
procedures to setup and install a DBMS, To login and logout of DBMS software, to manage
databases, to take backups, generating reports etc.
Database Access Language
Database Access Language is a simple language designed to write commands to access, insert,
update and delete data stored in any database.
A user can write commands in the Database Access Language and submit it to the DBMS for
execution, which is then translated and executed by the DBMS, can create new databases, tables,
insert data, fetch stored data, update data and delete the data using the access language.
Users
Database Administrators: Database Administrator or DBA is the one who manages the
complete database management system. DBA takes care of the security of the DBMS, it's
availability, managing the license keys, managing user accounts and access etc.
End User: These days all the modern applications, web or mobile, store user data. How do you
think they do it? Yes, applications are programmed in such a way that they collect user data and
store the data on DBMS systems running on their server. End users are the one who store,
retrieve, update and delete data.
Data Model:
Data Model is the modeling of the data description, data semantics, and consistency constraints
of the data. It provides the conceptual tools for describing the design of a database at each level
of data abstraction. Data models are used for to describe the structure of the database.
Hierarchical Model
In a Hierarchical database, model data is organized in a tree-like structure. The hierarchy starts
from the Root data, and expands like a tree, adding child nodes to the parent nodes. Data is
represented using a parent-child relationship. In Hierarchical DBMS parent may have many
children, but children have only one parent.
This model efficiently describes many real-world relationships like index of a book, recipes etc.
In hierarchical model, data is organized into tree-like structure with one one-to-many
relationship between two different types of data, for example, one department can have many
courses, many professors and of-course many students.
Network Model
The network database model allows each child to have multiple parents. It helps you to address
the need to model more complex relationships like as the orders/parts many-to-many
relationship. In this model, entities are organized in a graph which can be accessed through
several paths.
Relational model
Relational DBMS is the most widely used DBMS model because it is one of the easiest. This
model is based on normalizing data in the rows and columns of the tables. Relational model
stored in fixed structures and manipulated using SQL.
Ex: an instance of a student relation.
The above relation contains four attributes sid, name, age,gpa and five tuples or records or rows.
Object-Oriented Model
In Object-oriented Model data stored in the form of objects. The structure which is called classes
which display data within it. It defines a database as a collection of objects which stores both
data members values and operations.
Entity Relationship Model: The entity-relationship (E-R) data model uses a collection of basic
objects, called entities, and relationships among these objects. An entity is a “thing” or “object”
in the real world that is distinguishable from other objects. A relationship is an association
among several entities. For example, a depositor relationship associates a customer with each
account that she has
A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data. A major purpose of a database system is to provide users with an
abstract view of the data. That is, the system hides certain details of how the data are stored and
maintained.
Data Abstraction
Database systems are made-up of complex data structures. To ease the user interaction with
database, the developers hide internal irrelevant details from users. This process of hiding
irrelevant details from user is called data abstraction.
A data definition language (DDL) is used to define the external and conceptual schemas
Physical level: This is the lowest level of data abstraction. It the physical schema summarizes
how the relations described in the conceptual schema are actually stored on secondary storage
devices such as disks and tapes.
The access methods like sequential or random access and file organization methods like B+
trees, hashing used for the same.
Suppose we need to store the details of an employee. Blocks of storage and the amount of
memory used for these purposes is kept hidden from the user.
Conceptual level: The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. The conceptual schema describes all
relations that are stored in the database. In our sample university database, these relations contain
information about entities, such as students and faculty, and about relationships, such as
students’ enrollment in courses.
View level:
External schema allows data access to be customized (and authorized) at the level of individual
users or groups of users. Any given database has exactly one conceptual schema and one
physical schema because it has just one set of stored relations, but it may have several external
schemas. Each external schema consists of a collection of one or more views and relations from
the conceptual schema. A view is conceptually a relation, but the records in a view are not stored
in the DBMS, they are computed using a definition for the view, in terms of relations stored in
the DBMS.
For example, we might want to allow students to find out the names of faculty members teaching
courses, as well as course enrollments.
This can be done by defining the following view: Courseinfo (cid: string, fname: string,
enrollment: integer)
A user can treat a view just like a relation and ask questions about the records in the view. Even
though the records in the view are not stored explicitly, they are computed as needed. We did not
include Courseinfo in the conceptual schema because we can compute Courseinfo from the
relations in the conceptual schema, and to store it in addition would be redundant. Such
redundancy, in addition to the wasted space, could lead to inconsistencies.
Data Independence:
A very important advantage of using a DBMS is that it offers data independence. The ability to
modify schema definition in one level without affecting schema of that definition in the next
higher level is called data independence. Data independence is achieved through use of the three
levels of data abstraction;
There are two levels of data independence; they are Physical data independence and Logical data
independence.
Physical data independence is the ability to modify the physical schema without causing
application programs to be rewritten. Modifications at the physical level are occasionally
necessary to improve performance. It means we change the physical storage/level without
affecting the conceptual or external view of the data.
Logical data independence is the ability to modify the logical schema without causing
application program to be rewritten. Modifications at the logical level are necessary whenever
the logical structure of the database is altered.
For example, suppose that the Faculty relation in our university database is replaced by the
following two relations:
Faculty public(fid: string, fname: string, office: integer)
Faculty private(fid: string, sal: real)
The Courseinfo view relation can be redefined in terms of Faculty public and Faculty private,
which together contain all the information in Faculty, so that a user who queries Courseinfo will
get the same answers as before.
The design of a database at physical level is called physical schema, how the data stored in
blocks of storage is described at this level.
Design of database at logical level is called logical schema, programmers and database
administrators work at this level, at this level data can be described as certain types of data
records gets stored in data structures, however the internal details such as implementation of data
structure is hidden at this level (available at physical level).
Design of database at view level is called view schema. This generally describes end user
interaction with database systems.
DBMS Instance
Definition of instance: The data stored in database at a particular moment of time is called
instance of database. Database schema defines the variable declarations in tables that belong to a
particular database; the value of these variables at a moment of time is called the instance of that
database.
For example, lets say we have a single table student in the database, today the table has 100
records, so today the instance of the database has 100 records. Lets say we are going to add
another 100 records in this table by tomorrow so the instance of database tomorrow will have
200 records in table. In short, at a particular moment the data stored in database is called the
instance, that changes over time when we add or delete data from the database.
Database Architecture
The architecture of a database system is greatly influenced by the underlying computer system
on which the database is running. The Database systems can be Centralized, Client-server,
Parallel (multi-processor), Distributed, the architecture database is divided into two types: Two-
tier architecture and Three-tier architecture.
Two-tier architecture: In Two-tier architecture the application resides at the client machine,
where it invokes database system functionality at the server machine through query language
statements. Application program interface standards like ODBC and JDBC are used for
interaction between the client and the server.
Example: Transaction performed by the bank employee after filling the deposit form by the
customer. Another example is railway reservation made by the employee in reservation counter.
Three-tier architecture: In three-tier architecture, the client machine acts as merely a front end
and does not contain any direct database calls. Instead, the client end communicates with an
application server, usually through a forms interface. The application server in turn
communicates with a database system to access data. The business logic of the application,
which says what actions to carry out under what conditions, is embedded in the application
server, instead of being distributed across multiple clients. Three-tier applications are more
appropriate for large applications, and for applications that run on the Worldwide Web.
Example: Amount transferred by the user through application program (Net banking) without
physically going to the bank. Another example is railway reservation made by the user from his
location without going to the reservation counter.
Three tier architecture is provides scalability when more no of clients or users interact with the
database.
Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written by the application programmer previously.
For example, a user who wishes to find her account balance over the World Wide Web. Such a
user may access a form, where she enters her account number. An application program at the
Web server then retrieves the account balance, using the given account number, and passes this
information back to the user.
The typical user interface for naive users is a forms interface, where the user can fill in
appropriate fields of the form. Naive users may also simply read reports generated from the
database.
Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools are tools that enable an application programmer to
construct forms and reports without writing a program.
There are also special types of programming languages that combine imperative control
structures (for example, for loops, while loops and if-then-else statements) with statements of the
data manipulation language. These languages, sometimes called fourth-generation languages,
often include special features to facilitate the generation of forms and the display of data on the
screen. Most major commercial database systems include a fourth generation language.
Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language. They submit each such query to a query processor, whose
function is to break down DML statements into instructions that the storage manager
understands. Analysts who submit queries to explore data in the database fall in this category.
Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them view
summaries of data in different ways. For instance, an analyst can see total sales by region (for
example, North, South, East, and West), or by product, or by a combination of region and
product (that is, total sales of each product in each region). The tools also permit the analyst to
select specific regions, look at data in more detail (for example, sales by city within a region) or
look at the data in less detail (for example, aggregate products together by category). Another
class of tools for analysts is data mining tools, which help them find certain kinds of patterns in
data.
Sophisticated users who write specialized database applications that do not fit into the traditional
data processing framework. Among these applications are computer-aided design systems,
knowledge- base and expert systems, systems that store data with complex data types (for
example, graphics data and audio data), and environment-modeling systems.
Chapters 8 and 9 cover several of these applications.
Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is called
a database administrator (DBA).
Schema definition: The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
Schema and physical-organization modification: The DBA carries out changes to the schema
and physical organization to reflect the changing needs of the organization, or to alter the
physical organization to improve performance.
Granting of authorization for data access: By granting different types of authorization, the
database administrator can regulate which parts of the data base various users can access. The
authorization information is kept in a special system structure that the database system consults
whenever someone attempts to access the data in the system.
Routine maintenance:
Periodically backing up the database, either onto tapes or onto remote servers, to prevent
loss of data in case of disasters such as flooding.
Ensuring that enough free disk space is available for normal operations, and upgrading
disk space as required.
Monitoring jobs running on the database and ensuring that performance is not degraded
by very expensive tasks submitted by some users.
Storage Manager
A storage manager is a program module that provides the interface between the low level data
stored in the database and the application programs and queries submitted to the system. The
storage manager is responsible for the interaction with the file manager. The raw data are stored
on the disk using the file system, which is usually provided by a conventional operating system.
The storage manager translates the various DML statements into low-level file-system
commands. Thus, the storage manager is responsible for storing, retrieving, and updating data in
the database.
Authorization and integrity manager: Tests for the satisfaction of integrity constraints and
checks the authority of users to access data.
Transaction manager: Ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicting.
File manager: Manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
Buffer manager: It is responsible for fetching data from disk storage into main memory, and
deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than the
size of main memory.
Query evaluation engine, which executes low-level instructions generated by the DML compiler.
Introduction to Database Design
Database Design: The database design process can be divided into six steps.
Requirements Analysis: The very first step in designing a database application is to understand
what data is to be stored in the database, what applications must be built on top of it, and what
operations are most frequent and subject to performance requirements. In other words, we must
find out what the users want from the database.
This is usually an informal process that involves discussions with user groups, a study of the
current operating environment and how it is expected to change, analysis of any available
documentation on existing applications that are expected to be replaced or complemented by the
database, and so on. Several methodologies have been proposed for organizing and presenting
the information gathered in this step, and some automated tools have been developed to support
this process.
Conceptual Database Design: The information gathered in the requirements analysis step is
used to develop a high-level description of the data to be stored in the database, along with the
constraints that are known to hold over this data. This step is often carried out using the ER
model, or a similar high-level data model, and is discussed in the rest of this chapter.
Logical Database Design: We must choose a DBMS to implement our database design, and
convert the conceptual database design into a database schema in the data model of the chosen
DBMS. We will only consider relational DBMSs, and therefore, the task in the logical design
step is to convert an ER schema into a relational database schema.
Schema Refinement: The fourth step in database design is to analyze the collection of relations
in our relational database schema to identify potential problems, and to refine it. In contrast to
the requirements analysis and conceptual design steps, which are essentially subjective, schema
refinement can be guided by some elegant and powerful theory.
Physical Database Design: In this step we must consider typical expected workloads that our
database must support and further refine the database design to ensure that it meets desired
performance criteria. This step may simply involve building indexes on some tables and
clustering some tables, or it may involve a substantial redesign of parts of the database schema
obtained from the earlier design steps.
Security Design: In this step, we identify different user groups and different roles played by
various users (e.g., the development team for a product, the customer support representatives, the
product manager). For each role and user group, we must identify the parts of the database that
they must be able to access and the parts of the database that they should not be allowed to
access, and take steps to ensure that they can access only the necessary parts.
Entity-Relationship Model:
An Entity–relationship model (ER model) describes the structure of a database with the help of
a diagram, which is known as Entity Relationship Diagram (ER Diagram). An ER model is a
design or blueprint of a database that can later be implemented as a database.
The entity-relationship (E-R) data model perceives the real world as consisting of basic objects,
called entities, and relationships among these objects. It develops a conceptual design for the
database.
The E-R data model employs three basic notions: entity sets, relationship sets, and attributes.
For example, Suppose we design a school database. In this database, the student will be an
entity with attributes like address, name, id, age, etc.
Fig: ER Model
Entity Sets:
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects.
An entity has a set of properties, and the values for some set of properties may uniquely identify
an entity.
An entity set is a set of entities of the same type that share the same properties, or attributes. An
entity set is represented by a rectangle, and an attribute is represented by an oval.
For example, the Employees entity set could use name, social security number (ssn), and parking
lot (lot) as attributes.
name
ssn lot
Employees
An entity which doesn’t have its key attribute, it is uniquely identified by the key attribute of
another entity set is called weak entity. It can be represented by using double rectangle.
For example: weak entity transactions is identified using ATM ID in ATM Transactions.
The attributes have been simple; that is, they are not divided into subparts. Composite attributes,
on the other hand, can be divided into subparts (that is, other attributes).
The attributes in our examples all have a single value for a particular entity. For instance, the
CustID attribute for a specific customer entity refers to only one customer. Such attributes are
said to be single valued.
There may be instances where an attribute has a set of values for a specific entity. Consider an
customer entity set with the attribute Custphone. A customer may have more than one phone
numbers. This type of attribute is said to be multi valued. The Multi valued attribute is modeled
in ER using double circle
Derived attribute: The value for this type of attribute can be derived from the values of other
related attributes or entities. Derived attribute is modeled in ER using dotted ellipse.
Suppose that the student entity set has an attribute age, which indicates the students’s age. If the
student entity set also has an attribute date-of-birth, we can calculate age from date-of-birth, age
is a derived attribute.
Relationship Sets:
The relationship set Works In, in which each relationship indicates a department in which an
employee works. Note that several relationship sets might involve the same entity sets. For
example, we could also have a Manages relationship set involving Employees and Departments.
A relationship can also have descriptive attributes. Descriptive attributes are used to record
information about the relationship, rather than about any one of the par- ticipating entities; for
example, we may wish to record that Attishoo works in the pharmacy department as of January
1991. This information is captured by adding an attribute, since, to Works In.
Binary Relationship:
When there are two entities set participating in a relation, the relationship is called as binary
relationship.For example, Student is enrolled in Course.
Ternary Relationship:
When there are 3 entities set participating in a relation, the relationship is called as Ternary
The number of times an entity of an entity set participates in a relationship set is known as
cardinality. Cardinality can be of different types:
1. Key constraints:
The values of the attribute, values of an entity such that uniquely identify the entity.
Keys also help to identify relationship uniquely, thus distinguish relationship from each
other.
For example consider the relationship set called manages between the employees and
department’s entity sets such that each department has at most one manager, although
single employee is allowed to manage more than one department. The restriction each
department has at most one manager is the key constraint. This constraint represented in
ER diagram using arrow from Departments to Manages. Here ssn is key attribute of
employees entity set, did is key attribute of departments entity set
ssn
name since did dname budget
lot
Manages Departments
Employees
Participation Constraint:
Participation Constraint is applied on the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship. If
each student must enroll in a course, the participation of student will be total. Total
participation is shown by double line in ER diagram.
2. Partial Participation – The entity in the entity set may or may not participate in the
relationship. If some courses are not enrolled by any of the student, the participation of
course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total
participation and Course Entity set having partial participation.
Weak entities:
An entity can be identified uniquely only by considering some of its attributes in conjunction
with the primary key of another entity is called weak entity.
It must hold the following restrictions
The owner entity set and the weak entity set must participate in one to-many relationship set.
This relationship set is called identifying relationship set of the weak entity set. The weak entity
set must have the total participation in the identifying relationship set.
For example a dependents entity can be identified uniquely only if we take the key of the owing
entity employees. Here the relationship between dependents and policy is total, it indicates that
each dependent entity is appears in at most one policy relationship
pname age
ssn lot cost
name
Class Hierarchies:
Class hierarchies are used to classify the entities of entity set into sub classes. It represented
using ISA (is a) relationship and can be viewed in two ways.
Generalization: Generalization is the process of extracting common properties from a set of
entities and creates a generalized entity from it. It is a bottom-up approach in which two or more
entities can be generalized to a higher level entity if they have some attributes in common. For
Example,
STUDENT and FACULTY can be generalized to a higher level entity called PERSON as shown
in Figure 1. In this case, common attributes like P_NAME, P_ADD become part of higher entity
(PERSON) and specialized attributes like S_FEE become part of specialized entity (STUDENT).
Aggregation: Aggregation allows us to indicate that the relationship set participates in another
relationship set. An ER diagram is not capable of representing relationship between an entity and
a relationship which may be required in some scenarios. In those cases, a relationship with its
corresponding entities is aggregated into a higher level entity.
For Example, Employee working for a project may require some machinery. So, REQUIRE
relationship is needed between relationship WORKS_FOR and entity MACHINERY. Using
aggregation, WORKS_FOR relationship with its entities EMPLOYEE and PROJECT is
aggregated into single entity and relationship REQUIRES is created between aggregated entity
and MACHINERY.
Example2: suppose that we have entity set called projects and each project entity is sponsored
by one or more departments. A department that sponsors a project might assign employees to
monitor the sponsorship.
Ssn name lot
Employees
Monitors until
dname budget
since
did
Started_o
pid pbudget
n
Fig: Aggregation
Conceptual Design with the ER Model:
3. What are the relationship sets and their participating entity sets? Should we use binary or
ternary relationships?
Sometimes it is not clear whether a property should be modeled as an attribute or an entity set.
We have to record only one value; it can be modeled as attribute. Modeled as Entity set, we
have to record more than one value and we want to capture the structure of property in ER
model.
For example: to model a concept as an entity set rather than an attribute an attribute. Employees
works for a department, it records the interval during an employee works for a department.
did dname
name to e budget
Ssn lot from
Suppose an employee works a department more than one period, the problem is that we want to
record several values for descriptive attributes for each instance of the works_in relationship.
We could define duration as entity set in association with works_in relationship.
did name budget
Ssn name
lot
Departments
Employees Works_in
from Duration to
Consider the relationship manages suppose each department manager is given a discretionary
budget.
If the discretionary budget is sum that covers all departments managed by employee, in this case
manages relationship that involves a given employee will have in the dbudget field. It leading to
redundant storage of same information.
This problem can be addressed by adding a new entity set called managers is a hierarchy with
employees entity set, with dbudget is an attribute of manager, and since is an attribute of
manages relationship.
dname
ssn did
name lot budget
since
ISA
Manager dbudget
Consider the ER diagram an employee can own several policies, each policy can be owned by
several employees, and each dependent covered by several policies, and dependents entity must
be deleted if the owning employees entity is deleted.
Employees Dependents
Benficiary
purchaser
We can model Concept a project can be sponsored by any number of number of departments, a
department can sponsor one or more projects, and each sponsorship is monitored by one or more
employees using aggregation.
Employees
Monitors until
Started_on
Departments
Projects sponsors
Fig: Aggregation
If we don’t need to record the until attribute of monitors, i.e we can’t express the constraint each
sponsorship is monitored by at most one employee, then we might use ternary relationship.
Employees
dname
did
Started_on budget
pbudget
pid
ER Diagrams to practice:
ER diagram of Bank Management System