ER Diagram Representation
ER Diagram Representation
Let us now learn how the ER Model is represented by means of an ER diagram. Any object, for example,
entities, attributes of an entity, relationship sets, and attributes of relationship sets, can be represented
with the help of an ER diagram.
Entity
Entities are represented by means of rectangles. Rectangles are named with the entity set they represent.
Attributes
Attributes are the properties of entities. Attributes are represented by means of ellipses. Every ellipse
represents one attribute and is directly connected to its entity (rectangle).
If the attributes are composite, they are further divided in a tree like structure. Every node is then
connected to its attribute. That is, composite attributes are represented by ellipses that are connected with
an ellipse.
Relationship
Relationships are represented by diamond-shaped box. Name of the relationship is written inside the
diamond-box. All the entities (rectangles) participating in a relationship, are connected to it by a line.
Binary Relationship and Cardinality
A relationship where two entities are participating is called a binary relationship. Cardinality is the number
of instance of an entity from a relation that can be associated with the relation.
One-to-one − When only one instance of an entity is associated with the relationship, it is marked
as '1:1'. The following image reflects that only one instance of each entity should be associated
with the relationship. It depicts one-to-one relationship.
One-to-many − When more than one instance of an entity is associated with a relationship, it is
marked as '1:N'. The following image reflects that only one instance of entity on the left and more
than one instance of an entity on the right can be associated with the relationship. It depicts one-
to-many relationship.
Many-to-one − When more than one instance of entity is associated with the relationship, it is
marked as 'N:1'. The following image reflects that more than one instance of an entity on the left
and only one instance of an entity on the right can be associated with the relationship. It depicts
many-to-one relationship.
Many-to-many − The following image reflects that more than one instance of an entity on the left
and more than one instance of an entity on the right can be associated with the relationship. It
depicts many-to-many relationship.
Participation Constraints
Total Participation − Each entity is involved in the relationship. Total participation is represented by
double lines.
Partial participation − Not all entities are involved in the relationship. Partial participation is
represented by single lines.
Generalization Aggregation
The ER Model has the power of expressing database entities in a conceptual hierarchical manner. As the
hierarchy goes up, it generalizes the view of entities, and as we go deep in the hierarchy, it gives us the
detail of every entity included.
Going up in this structure is called generalization, where entities are clubbed together to represent a more
generalized view. For example, a particular student named Mira can be generalized along with all the
students. The entity shall be a student, and further, the student is a person. The reverse is
called specialization where a person is a student, and that student is Mira.
Generalization
As mentioned above, the process of generalizing entities, where the generalized entities contain the
properties of all the generalized entities, is called generalization. In generalization, a number of entities are
brought together into one generalized entity based on their similar characteristics. For example, pigeon,
house sparrow, crow and dove can all be generalized as Birds.
Specialization
Specialization is the opposite of generalization. In specialization, a group of entities is divided into sub-
groups based on their characteristics. Take a group ‘Person’ for example. A person has name, date of
birth, gender, etc. These properties are common in all persons, human beings. But in a company, persons
can be identified as employee, employer, customer, or vendor, based on what role they play in the
company.
Similarly, in a school database, persons can be specialized as teacher, student, or a staff, based on what
role they play in school as entities.
Inheritance
We use all the above features of ER-Model in order to create classes of objects in object-oriented
programming. The details of entities are generally hidden from the user; this process known
as abstraction.
Inheritance is an important feature of Generalization and Specialization. It allows lower-level entities to
inherit the attributes of higher-level entities.
For example, the attributes of a Person class such as name, age, and gender can be inherited by lower-
level entities such as Student or Teacher.
Codd's 12 Rules
Dr Edgar F. Codd, after his extensive research on the Relational Model of database systems, came up
with twelve rules of his own, which according to him, a database must obey in order to be regarded as a
true relational database.
These rules can be applied on any database system that manages stored data using only its relational
capabilities. This is a foundation rule, which acts as a base for all the other rules.
Rule 1: Information Rule
The data stored in a database, may it be user data or metadata, must be a value of some table cell.
Everything in a database must be stored in a table format.
Rule 2: Guaranteed Access Rule
Every single data element (value) is guaranteed to be accessible logically with a combination of table-
name, primary-key (row value), and attribute-name (column value). No other means, such as pointers, can
be used to access data.
Rule 3: Systematic Treatment of NULL Values
The NULL values in a database must be given a systematic and uniform treatment. This is a very
important rule because a NULL can be interpreted as one the following − data is missing, data is not
known, or data is not applicable.
Rule 4: Active Online Catalog
The structure description of the entire database must be stored in an online catalog, known as data
dictionary, which can be accessed by authorized users. Users can use the same query language to
access the catalog which they use to access the database itself.
Rule 5: Comprehensive Data Sub-Language Rule
A database can only be accessed using a language having linear syntax that supports data definition, data
manipulation, and transaction management operations. This language can be used directly or by means of
some application. If the database allows access to data without any help of this language, then it is
considered as a violation.
Rule 6: View Updating Rule
All the views of a database, which can theoretically be updated, must also be updatable by the system.
Rule 7: High-Level Insert, Update, and Delete Rule
A database must support high-level insertion, updation, and deletion. This must not be limited to a single
row, that is, it must also support union, intersection and minus operations to yield sets of data records.
Rule 8: Physical Data Independence
The data stored in a database must be independent of the applications that access the database. Any
change in the physical structure of a database must not have any impact on how the data is being
accessed by external applications.
Rule 9: Logical Data Independence
The logical data in a database must be independent of its user’s view (application). Any change in logical
data must not affect the applications using it. For example, if two tables are merged or one is split into two
different tables, there should be no impact or change on the user application. This is one of the most
difficult rule to apply.
Rule 10: Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints can be
independently modified without the need of any change in the application. This rule makes a database
independent of the front-end application and its interface.
Rule 11: Distribution Independence
The end-user must not be able to see that the data is distributed over various locations. Users should
always get the impression that the data is located at one site only. This rule has been regarded as the
foundation of distributed database systems.
Rule 12: Non-Subversion Rule
If a system has an interface that provides access to low-level records, then the interface must not be able
to subvert the system and bypass security and integrity constraints.
Key constraints
Domain constraints
Referential integrity constraints
Key Constraints
There must be at least one minimal subset of attributes in the relation, which can identify a tuple uniquely.
This minimal subset of attributes is called key for that relation. If there are more than one such minimal
subsets, these are called candidate keys.
Key constraints force that −
in a relation with a key attribute, no two tuples can have identical values for key attributes.
a key attribute cannot have NULL values.
Key constraints are also referred to as Entity Constraints.
Domain Constraints
Attributes have specific values in real-world scenario. For example, age can only be a positive integer.
The same constraints have been tried to employ on the attributes of a relation. Every attribute is bound to
have a specific range of values. For example, age cannot be less than zero and telephone numbers
cannot contain a digit outside 0-9.
Referential integrity Constraints
Referential integrity constraints work on the concept of Foreign Keys. A foreign key is a key attribute of a
relation that can be referred in other relation.
Referential integrity constraint states that if a relation refers to a key attribute of a different or same
relation, then that key element must exist.
ER Model to Relational Model
ER Model, when conceptualized into diagrams, gives a good overview of entity-relationship, which is
easier to understand. ER diagrams can be mapped to relational schema, that is, it is possible to create
relational schema using ER diagram. We cannot import all the ER constraints into relational model, but an
approximate schema can be generated.
There are several processes and algorithms available to convert ER Diagrams into Relational Schema.
Some of them are automated and some of them are manual. We may focus here on the mapping diagram
contents to relational basics.
ER diagrams mainly comprise of −
Mapping Process
Mapping Process
Mapping Process
Create tables for all higher-level entities.
Create tables for lower-level entities.
Add primary keys of higher-level entities in the table of lower-level entities.
In lower-level tables, add all other attributes of lower-level entities.
Declare primary key of higher-level table and the primary key for lower-level table.
Declare foreign key constraints.
SQL Overview
SQL is a programming language for Relational Databases. It is designed over relational algebra and tuple
relational calculus. SQL comes as a package with all major distributions of RDBMS.
SQL comprises both data definition and data manipulation languages. Using the data definition properties
of SQL, one can design and modify database schema, whereas data manipulation properties allows SQL
to store and retrieve data from database.
Data Definition Language
SQL uses the following set of commands to define database schema −
CREATE
Creates new databases, tables and views from RDBMS.
For example −
Create database dbms;
Create table article;
Create view for_students;
DROP
Drops commands, views, tables, and databases from RDBMS.
For example−
Drop object_type object_name;
Drop database dbms;
Drop table article;
Drop view for_students;
ALTER
Modifies database schema.
Alter object_type object_name parameters;
For example−
Alter table article add subject varchar;
This command adds an attribute in the relation article with the name subject of string type.
Data Manipulation Language
SQL is equipped with data manipulation language (DML). DML modifies the database instance by
inserting, updating and deleting its data. DML is responsible for all forms data modification in a database.
SQL contains the following set of commands in its DML section −
SELECT/FROM/WHERE
INSERT INTO/VALUES
UPDATE/SET/WHERE
DELETE FROM/WHERE
These basic constructs allow database programmers and users to enter data and information into the
database and retrieve efficiently using a number of filter options.
SELECT/FROM/WHERE
SELECT − This is one of the fundamental query command of SQL. It is similar to the projection
operation of relational algebra. It selects the attributes based on the condition described by
WHERE clause.
FROM − This clause takes a relation name as an argument from which attributes are to be
selected/projected. In case more than one relation names are given, this clause corresponds to
Cartesian product.
WHERE − This clause defines predicate or conditions, which must match in order to qualify the
attributes to be projected.
For example −
Select author_name
From book_author
Where age > 50;
This command will yield the names of authors from the relation book_author whose age is greater than 50.
INSERT INTO/VALUES
This command is used for inserting values into the rows of a table (relation).
Syntax−
INSERT INTO table (column1 [, column2, column3 ... ]) VALUES (value1 [, value2, value3 ... ])
Or
INSERT INTO table VALUES (value1, [value2, ... ])
For example −
INSERT INTO dbms (Author, Subject) VALUES ("anonymous", "computers");
UPDATE/SET/WHERE
This command is used for updating or modifying the values of columns in a table (relation).
Syntax −
UPDATE table_name SET column_name = value [, column_name = value ...] [WHERE condition]
For example −
UPDATE dbms SET Author="webmaster" WHERE Author="anonymous";
DELETE/FROM/WHERE
This command is used for removing one or more rows from a table (relation).
Syntax −
DELETE FROM table_name [WHERE condition];
For example −
DELETE FROM dbms
WHERE Author="unknown";
DBMS - Normalization
Functional Dependency
Functional dependency (FD) is a set of constraints between two attributes in a relation. Functional
dependency says that if two tuples have same values for attributes A1, A2,..., An, then those two tuples
must have to have same values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→) that is, X→Y, where X functionally
determines Y. The left-hand side attributes determine the values of attributes on the right-hand side.
Armstrong's Axioms
If F is a set of functional dependencies then the closure of F, denoted as F +, is the set of all functional
dependencies logically implied by F. Armstrong's Axioms are a set of rules, that when applied repeatedly,
generates a closure of functional dependencies.
Reflexive rule − If alpha is a set of attributes and beta is_subset_of alpha, then alpha holds beta.
Augmentation rule − If a → b holds and y is attribute set, then ay → by also holds. That is adding
attributes in dependencies, does not change the basic dependencies.
Transitivity rule − Same as transitive rule in algebra, if a → b holds and b → c holds, then a → c
also holds. a → b is called as a functionally that determines b.
Trivial Functional Dependency
Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a
trivial FD. Trivial FDs always hold.
Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial FD.
Completely non-trivial − If an FD X → Y holds, where x intersect Y = Φ, it is said to be a
completely non-trivial FD.
Normalization
If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database
administrator. Managing a database with anomalies is next to impossible.
Update anomalies − If data items are scattered and are not linked to each other properly, then it
could lead to strange situations. For example, when we try to update one data item having its
copies scattered over several places, a few instances get updated properly while a few others are
left with old values. Such instances leave the database in an inconsistent state.
Deletion anomalies − We tried to delete a record, but parts of it was left undeleted because of
unawareness, the data is also saved somewhere else.
Insert anomalies − We tried to insert data in a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent state.
First Normal Form
First Normal Form is defined in the definition of relations (tables) itself. This rule defines that all the
attributes in a relation must have atomic domains. The values in an atomic domain are indivisible units.
Each attribute must contain only a single value from its pre-defined domain.
Second Normal Form
Before we learn about the second normal form, we need to understand the following −
Prime attribute − An attribute, which is a part of the candidate-key, is known as a prime attribute.
Non-prime attribute − An attribute, which is not a part of the prime-key, is said to be a non-prime
attribute.
If we follow second normal form, then every non-prime attribute should be fully functionally dependent on
prime key attribute. That is, if X → A holds, then there should not be any proper subset Y of X, for which Y
→ A also holds true.
We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID. According to
the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent upon both and not on any
of the prime key attribute individually. But we find that Stu_Name can be identified by Stu_ID and
Proj_Name can be identified by Proj_ID independently. This is called partial dependency, which is not
allowed in Second Normal Form.
We broke the relation in two as depicted in the above picture. So there exists no partial dependency.
Third Normal Form
For a relation to be in Third Normal Form, it must be in Second Normal form and the following must satisfy
−
We find that in the above Student_detail relation, Stu_ID is the key and only prime key attribute. We find
that City can be identified by Stu_ID as well as Zip itself. Neither Zip is a superkey nor is City a prime
attribute. Additionally, Stu_ID → Zip → City, so there exists transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as follows −
Boyce-Codd Normal Form
Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict terms. BCNF states that
−
We understand the benefits of taking a Cartesian product of two relations, which gives us all the possible
tuples that are paired together. But it might not be feasible for us in certain cases to take a Cartesian
product where we encounter huge relations with thousands of tuples having a considerable large number
of attributes.
Join is a combination of a Cartesian product followed by a selection process. A Join operation pairs two
tuples from different relations, if and only if a given join condition is satisfied.
We will briefly describe various join types in the following sections.
Theta (θ) Join
Theta join combines tuples from different relations provided they satisfy the theta condition. The join
condition is denoted by the symbol θ.
Notation
R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, .., An) and (B1, B2,.. ,Bn) such that the attributes don’t
have anything in common, that is R1 ∩ R2 = Φ.
Theta join can use all kinds of comparison operators.
Student
101 Alex 10
102 Maria 11
Subjects
Class Subject
10 Math
10 English
11 Music
11 Sports
Student_Detail −
STUDENT ⋈Student.Std = Subject.Class SUBJECT
Student_detail
Equijoin
When Theta join uses only equality comparison operator, it is said to be equijoin. The above example
corresponds to equijoin.
Natural Join (⋈)
Natural join does not use any comparison operator. It does not concatenate the way a Cartesian product
does. We can perform a Natural Join only if there is at least one common attribute that exists between two
relations. In addition, the attributes must have the same name and domain.
Natural join acts on those matching attributes where the values of attributes in both the relations are
same.
Courses
CS01 Database CS
ME01 Mechanics ME
EE01 Electronics EE
HoD
Dept Head
CS Alex
ME Maya
EE Mira
Courses ⋈ HoD
Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only those tuples with
matching attributes and the rest are discarded in the resulting relation. Therefore, we need to use outer
joins to include all the tuples from the participating relations in the resulting relation. There are three kinds
of outer joins − left outer join, right outer join, and full outer join.
Left
A B
100 Database
101 Mechanics
102 Electronics
Right
A B
100 Alex
102 Maya
104 Mira
Courses HoD
A B C D
A B C D
Courses HoD
A B C D