DBMS Unit 2 Part 1


UNIT - 2

Relational Data Model in DBMS


In the relational data model, data and the relationships among data are represented by a collection
of inter-related tables (or relations). Each table is a group of columns and rows, where a column
represents an attribute of an entity and a row represents a record. The table name and column names
help to interpret the meaning of the values in each row. In formal relational model terminology, a
row is called a tuple, a column header is called an attribute, and the table is called a relation.
The data type describing the types of values that can appear in each column is represented by a
domain of possible values.

Sample relational model: a Student table with 3 columns and 4 records, and a Course table.

Table: Student

Stu_Id  Stu_Name  Stu_Age
111     Ashish    23
123     Saurav    22
169     Lester    24
234     Lou       26

Table: Course

Stu_Id  Course_Id  Course_Name
111     C01        Science
111     C02        DBMS
169     C22        Java
169     C39        Computer Networks

Here Stu_Id, Stu_Name & Stu_Age are attributes of table Student and Stu_Id, Course_Id &
Course_Name are attributes of table Course. The rows with values are the records (commonly
known as tuples).

Relational model Concepts

Table (Relation): In the relational model, relations are stored in table format, and each table
stores the data about its entities. A table has two properties: rows and columns. Rows represent
records and columns represent attributes.

A table is a collection of data elements organised in terms of rows and columns. A table is also
considered a convenient representation of a relation. However, a table can have duplicate rows of
data, while a true relation cannot have duplicate tuples. A table is the simplest form of data
storage.
Each table has a name in the database.
For example, the following table “STUDENT” stores the information of students in the database.
Table: STUDENT

Student_Id  Student_Name  Student_Addr      Student_Age
101         Chaitanya     Dayal Bagh, Agra  27
102         Ajeet         Delhi             26
103         Rahul         Gurgaon           24
104         Shubham       Chennai           25

Attribute (Field or Column name): Each column in a table is an attribute. Attributes are the
properties that define a relation.

The above table “STUDENT” has four fields (or attributes): Student_Id, Student_Name,
Student_Addr & Student_Age.

Tuple (Record or Row): A tuple is a single row of a table, which contains a single record.

Each row of a table is known as a record. It is also known as a tuple.

For example, the following row is a record that we have taken from the above table.

102 Ajeet Delhi 26

Domain: A domain is the set of values permitted for an attribute in a table. The values in a
domain are atomic (indivisible).

For example, a month-of-year domain can accept January, February, …, December as values, and a
date domain can accept all possible valid dates. We specify the domain of each attribute while
creating a table.

An attribute cannot accept values that are outside its domain. For example, in the above table
“STUDENT”, the Student_Id field has an integer domain, so that field cannot accept values that are
not integers; Student_Id cannot have values like “First” or 10.11.

Relation Schema: A relation schema is the design for the table. It includes none of the actual
data; like a blueprint, it describes which columns the table has and their data types. It may show
basic table constraints (e.g. whether a column can be NULL) but not how the table relates to other
tables.

A relation schema specifies the relation name (table name), the names and types of its attributes,
and the domain of each attribute.

A relation schema R, denoted by R(A1, A2, ..., An), is made up of a relation name R and a list
of attributes A1, A2, ..., An. Each attribute Ai is the name of a role played by some domain D
in the relation schema R. D is called the domain of Ai and is denoted by dom(Ai). A relation (or
relation state) r of the relation schema R(A1, A2, ..., An), also denoted by r(R), is a set of
n-tuples r = {t1, t2, ..., tm}, where each value in an n-tuple t = <v1, v2, ..., vn> belongs to
dom(Ai) or is NULL.
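The formal definition can be sketched in Python, assuming each domain dom(Ai) is modelled as a predicate (a function that returns True for permitted values). The names STUDENT_SCHEMA and is_valid_state are hypothetical, chosen only for illustration; a real DBMS enforces domains through column types and constraints.

```python
from typing import Any, Callable

# A domain dom(Ai) is modelled as a predicate on values (an illustrative assumption).
Domain = Callable[[Any], bool]


def is_int(v: Any) -> bool:
    return isinstance(v, int) and not isinstance(v, bool)


# Relation schema R(A1, ..., An): an ordered list of (attribute name, domain) pairs.
STUDENT_SCHEMA: list[tuple[str, Domain]] = [
    ("Student_Id", is_int),
    ("Student_Name", lambda v: isinstance(v, str) and v != ""),
    ("Student_Addr", lambda v: isinstance(v, str)),
    ("Student_Age", lambda v: is_int(v) and v > 0),
]


def is_valid_state(schema: list[tuple[str, Domain]], r: set[tuple]) -> bool:
    """Check that r(R) is a set of n-tuples whose i-th value lies in dom(Ai)."""
    n = len(schema)
    return all(
        len(t) == n and all(dom(v) for (_, dom), v in zip(schema, t))
        for t in r
    )


# A relation state r = {t1, ..., tm}, using the rows of the STUDENT table.
r = {
    (101, "Chaitanya", "Dayal Bagh, Agra", 27),
    (102, "Ajeet", "Delhi", 26),
    (103, "Rahul", "Gurgaon", 24),
    (104, "Shubham", "Chennai", 25),
}

print(is_valid_state(STUDENT_SCHEMA, r))                                   # True
print(is_valid_state(STUDENT_SCHEMA, r | {("First", "Ajeet", "Delhi", 26)}))  # False
```

Because r is a Python set, it also cannot hold duplicate tuples, which matches the definition of a relation state.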

Degree (Arity): The total number of attributes in a relation is known as the degree of the
relation.

For Example:

STUDENT (Student_Id, Student_Name, Student_Addr, Student_Age)

The STUDENT relation schema contains four attributes, so this relation has degree 4.

Cardinality: The total number of rows present in the table.

The number of tuples in a relation is known as its cardinality. The STUDENT relation defined
above has cardinality 4.
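As a small sketch using Python's built-in sqlite3 module (an in-memory database, chosen only for illustration), degree can be read from the column list of a query result and cardinality from COUNT(*):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (Student_Id INTEGER, Student_Name TEXT, "
    "Student_Addr TEXT, Student_Age INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
    [
        (101, "Chaitanya", "Dayal Bagh, Agra", 27),
        (102, "Ajeet", "Delhi", 26),
        (103, "Rahul", "Gurgaon", 24),
        (104, "Shubham", "Chennai", 25),
    ],
)

# Degree: number of attributes (columns) described by the result of SELECT *.
degree = len(conn.execute("SELECT * FROM STUDENT").description)

# Cardinality: number of tuples (rows) in the current relation instance.
cardinality = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]

print(degree, cardinality)  # 4 4
```

Inserting or deleting rows changes the cardinality of the instance, while the degree only changes if the schema itself is altered.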

Relation Instance: The finite set of tuples of a relation at a particular instant of time is
called a relation instance. Relation instances do not have duplicate tuples.

Table 1 shows the relation instance of STUDENT at a particular time. It can change whenever
there is an insertion, deletion or update in the database.
NULL Value: A value that is unknown or unavailable is called a NULL value. In a table it is
usually shown as a blank, but NULL is not the same as an empty string or zero.

A field with a NULL value is a field with no value.

A primary key cannot contain a NULL value.
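In SQL, NULL behaves differently from ordinary values: comparing anything with NULL using = gives "unknown" rather than true, so IS NULL is used to test for it. A minimal sketch with Python's sqlite3 module; the missing address for student 102 is a hypothetical value chosen for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

eq_result = conn.execute("SELECT NULL = NULL").fetchone()[0]     # None: "unknown", not true
is_result = conn.execute("SELECT NULL IS NULL").fetchone()[0]    # 1 (true)
blank_result = conn.execute("SELECT '' IS NULL").fetchone()[0]   # 0: an empty string is not NULL

conn.execute("CREATE TABLE STUDENT (Student_Id INTEGER PRIMARY KEY, Student_Addr TEXT)")
# Hypothetical data: the address of student 102 is unknown, so it is stored as NULL (None).
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?)", [(101, "Dayal Bagh, Agra"), (102, None)]
)

with_eq = conn.execute("SELECT Student_Id FROM STUDENT WHERE Student_Addr = NULL").fetchall()
with_is = conn.execute("SELECT Student_Id FROM STUDENT WHERE Student_Addr IS NULL").fetchall()
print(with_eq, with_is)  # [] [(102,)]
```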

Characteristics of Relations / Characteristics of Relational Database Model

As we know, a database contains several relations, and each relation must be uniquely identified;
otherwise, it would create a lot of confusion. Here, we discuss some characteristics that, when
followed, automatically make a relation distinct in a database.

1.) Each relation in a database must have a distinct or unique name which would separate it
from the other relations in a database.

2.) A relation must not have two attributes with the same name. Each attribute must have a
distinct name.

3.) Duplicate tuples must not be present in a relation.

4.) Each tuple must have exactly one data value for an attribute.

For example, in the first table, two students, Jhoson and Charles, are enrolled against
Roll_No. 265. This is not allowed: each Roll_No. must have exactly one student.

5.) Ordering of Tuples in a Relation: A relation is defined as a set of tuples, and the tuples
in a set have no particular order. The definition of a relation does not prefer one ordering
over another, so a relation is not sensitive to the ordering of its tuples.

6.) Ordering of Attributes in a Relation: The ordering of attributes is not important, because
the attribute name appears with its value. There is no reason to prefer having one attribute
value appear before another in a tuple.

7.) Values in a tuple: All values are considered atomic. A special NULL value is used to
represent values that are unknown or inapplicable to certain tuples. In general, NULL means
that the value is unknown, or that the value exists but is not available.

8.) Interpretation (Meaning) of a Relation: The relation schema can be interpreted as a
declaration or a type of assertion. Each tuple in the relation can then be interpreted as a fact
or a particular instance of the assertion.
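Characteristics 3, 5 and 6 follow directly from treating a relation as a set. A minimal sketch, assuming relations are modelled as Python sets of tuples and a single tuple as a mapping from attribute names to values (a modelling choice for illustration, not how a DBMS stores data):

```python
# Two relation instances built from STUDENT rows, listed in different orders
# and with one duplicate row in r2.
r1 = {(101, "Chaitanya"), (102, "Ajeet")}
r2 = {(102, "Ajeet"), (101, "Chaitanya"), (102, "Ajeet")}

print(r1 == r2)  # True: tuple order is irrelevant and the duplicate collapses
print(len(r2))   # 2

# If each tuple pairs attribute names with values, attribute order is irrelevant too.
t1 = {"Student_Id": 102, "Student_Name": "Ajeet"}
t2 = {"Student_Name": "Ajeet", "Student_Id": 102}
print(t1 == t2)  # True
```

SQL tables, by contrast, can hold duplicate rows unless a key or UNIQUE constraint is declared, which is the difference between a table and a true relation noted earlier.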

KEYS in DBMS
• Keys play an important role in a relational database.
• A key is used to uniquely identify any record or row of data in a table. It is also used to
establish and identify relationships between tables.

A key in a DBMS is an attribute or set of attributes that helps you identify a row (tuple) in a
relation (table). Keys allow you to find the relationship between two tables, and they uniquely
identify a row in a table by a combination of one or more of its columns.

Example:

Employee ID FirstName LastName

11 Andrew Johnson
22 Tom Wood
33 Alex Hale

In the above example, Employee ID is used as a key because it uniquely identifies an employee
record. In this table, no other employee can have the same Employee ID.

Why do we need a key?

Here are some reasons for using keys in a DBMS:

• Keys help you identify any row of data in a table. In a real-world application, a table
could contain thousands of records, and the records could be duplicated. Keys ensure that you
can uniquely identify a table record despite these challenges.

• Keys allow you to establish and identify relationships between tables.

• Keys help you enforce identity and integrity in those relationships.
Types of Keys in Database Management System
There are mainly eight different types of keys in a DBMS, and each key has its own
functionality:

• Super Key: A super key is a set of one or more attributes that uniquely identifies rows in a
table.

• Primary Key: A primary key is a column or group of columns in a table that uniquely
identifies every row in that table.

• Candidate Key: A candidate key is a set of attributes that uniquely identifies tuples in a
table. A candidate key is a minimal super key, i.e. a super key with no redundant attributes.

• Alternate Key: An alternate key is a candidate key that is not chosen as the primary key; like
the primary key, it uniquely identifies every row in the table.

• Foreign Key: A foreign key is a column that creates a relationship between two tables. The
purpose of foreign keys is to maintain data integrity and allow navigation between two
different instances of an entity.

• Compound Key: A compound key has two or more attributes that together allow you to uniquely
recognize a specific record. Each column may not be unique by itself within the table.

• Composite Key: A composite key is a combination of two or more columns that uniquely
identifies rows in a table.

• Surrogate Key: An artificial key that aims to uniquely identify each record is called a
surrogate key. Such keys are generated by the system, typically when there is no suitable
natural primary key.
Super Key
A super key is a set of one or more attributes that identifies rows in a table. A super key may
have additional attributes that are not needed for unique identification.

• The set of attributes that can uniquely identify a tuple in the given relation is known as a
super key.
• A super key is not restricted to any specific number of attributes; it may consist of any
number of attributes.
• A super key is a superset of a candidate key.
• Adding zero or more attributes to a candidate key generates a super key.
• A candidate key is a super key, but the reverse is not true.

Example:

EmpSSN      EmpNum  Empname
9812345098  AB05    Shown
9876512345  AB06    Roslyn
199937890   AB07    James

In the above example, EmpSSN and EmpNum are super keys, as is any set of attributes that contains
either of them, such as {EmpSSN, Empname}.
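A super key test can be sketched by projecting the rows onto a set of attributes and checking that no two projected values coincide. This is an illustrative helper (is_superkey and superkeys are hypothetical names), and it only checks one instance: passing the test is necessary but not sufficient, because keys are decided by the meaning of the data, not by one snapshot.

```python
from itertools import combinations

ATTRS = ("EmpSSN", "EmpNum", "Empname")
ROWS = [
    ("9812345098", "AB05", "Shown"),
    ("9876512345", "AB06", "Roslyn"),
    ("199937890", "AB07", "James"),
]


def is_superkey(attrs, rows, key):
    """True if no two rows agree on every attribute in `key` (this instance only)."""
    idx = [attrs.index(a) for a in key]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(projected) == len(set(projected))


def superkeys(attrs, rows):
    """All attribute subsets that pass the uniqueness test on `rows`."""
    return [
        key
        for size in range(1, len(attrs) + 1)
        for key in combinations(attrs, size)
        if is_superkey(attrs, rows, key)
    ]


print(is_superkey(ATTRS, ROWS, ("EmpSSN",)))   # True
print(is_superkey(ATTRS, ROWS, ("Empname",)))  # True on this data, although names can repeat

# A hypothetical fourth employee with a repeated name shows why Empname is not a key.
ROWS2 = ROWS + [("123456789", "AB08", "James")]
print(is_superkey(ATTRS, ROWS2, ("Empname",)))  # False
print(superkeys(ATTRS, ROWS2))  # every subset that contains EmpSSN or EmpNum
```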

Primary Key
A primary key is a column or group of columns in a table that uniquely identifies every tuple
(row) in that table. Primary key values cannot be duplicated: the same value cannot appear more
than once in the table. A table cannot have more than one primary key.

Rules for defining a primary key:

• The value of the primary key must always be unique.
• No two rows can have the same primary key value.
• Every row must have a primary key value.
• The value of a primary key field can never be NULL.
• The value in a primary key column cannot be modified or updated while a foreign key refers to
it (unless the update is cascaded to the referencing rows).
• The value of the primary key must be assigned when a record is inserted.
• A relation is allowed to have only one primary key.

Example:

In the following example, StudID is the primary key.

StudID  Roll No  First Name  LastName  Email
1       11       Tom         Price     [email protected]
2       12       Nick        Wright    [email protected]
3       13       Dana        Natan     [email protected]
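A minimal sketch of declaring StudID as the primary key with Python's sqlite3 module and watching the DBMS reject a duplicate value. The Email column is left out, and the rejected row ('Ann', 'Lee') is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Student (
           StudID    INTEGER PRIMARY KEY,
           RollNo    INTEGER,
           FirstName TEXT,
           LastName  TEXT
       )"""
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [(1, 11, "Tom", "Price"), (2, 12, "Nick", "Wright"), (3, 13, "Dana", "Natan")],
)

try:
    # StudID 2 already exists, so the primary key constraint rejects this row.
    conn.execute("INSERT INTO Student VALUES (2, 14, 'Ann', 'Lee')")
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("Rejected:", exc)  # prints SQLite's uniqueness error message

print(conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0])  # 3
```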
Candidate Key
A candidate key is a set of attributes that uniquely identifies tuples in a table. A candidate
key is a minimal super key: a super key with no redundant attributes. The primary key should be
selected from the candidate keys. Every table must have at least one candidate key. A table can
have multiple candidate keys but only a single primary key.

Properties of a candidate key:

• All the attributes in a candidate key are sufficient as well as necessary to identify each
tuple uniquely.
• Removing any attribute from the candidate key makes it fail to identify each tuple uniquely.
• The value of a candidate key must always be unique.
• The value of a candidate key can never be NULL.
• A candidate key should contain the minimum number of fields needed to ensure uniqueness.
• It uniquely identifies each record in a table.
• A candidate key may be a single attribute or a combination of more than one column
(attribute).
• A relation can have more than one candidate key.
• Attributes that appear in some candidate key are called prime attributes.

Example:

In the given table, StudID, Roll No, and Email are candidate keys, which help us uniquely
identify the student record in the table.

StudID  Roll No  First Name  LastName  Email
1       11       Tom         Price     [email protected]
2       12       Nick        Wright    [email protected]
3       13       Dana        Natan     [email protected]
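Because a candidate key is a minimal super key, candidate keys can be found by testing attribute sets in increasing size and keeping only those that contain no smaller key already found. A sketch under stated assumptions: Email is omitted because its values are not shown, and two hypothetical "Tom Wright" rows are added so that the names stop looking unique by accident.

```python
from itertools import combinations

ATTRS = ("StudID", "RollNo", "FirstName", "LastName")
ROWS = [
    (1, 11, "Tom", "Price"),
    (2, 12, "Nick", "Wright"),
    (3, 13, "Dana", "Natan"),
    (4, 14, "Tom", "Wright"),  # hypothetical rows: different students can share a name
    (5, 15, "Tom", "Wright"),
]


def is_superkey(key):
    idx = [ATTRS.index(a) for a in key]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)


def candidate_keys():
    found = []
    for size in range(1, len(ATTRS) + 1):
        for key in combinations(ATTRS, size):
            # Minimal: a super key that does not contain a smaller candidate key.
            if is_superkey(key) and not any(set(ck) <= set(key) for ck in found):
                found.append(key)
    return found


keys = candidate_keys()
prime = {a for ck in keys for a in ck}  # prime attributes
print(keys)           # [('StudID',), ('RollNo',)]
print(sorted(prime))  # ['RollNo', 'StudID']
```

Checking supersets against keys already found is enough, because any superset of a super key is also a super key.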

Alternate Key
An alternate key is a candidate key that is not chosen as the primary key. Like the primary key,
it is a column or group of columns that uniquely identifies every row in the table. A table can
have multiple choices for a primary key, but only one can be set as the primary key; all the
candidate keys that are not the primary key are called alternate keys.

Example:

In the given table, StudID, Roll No, and Email are all qualified to become a primary key. Since
StudID is chosen as the primary key, Roll No and Email become alternate keys.

StudID  Roll No  First Name  LastName  Email
1       11       Tom         Price     [email protected]
2       12       Nick        Wright    [email protected]
3       13       Dana        Natan     [email protected]
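In SQL, alternate keys are commonly enforced with UNIQUE NOT NULL constraints alongside the PRIMARY KEY. A sketch with Python's sqlite3 module; the email addresses are hypothetical placeholders because the originals are not shown, and try_insert is an illustrative helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Student (
           StudID    INTEGER PRIMARY KEY,       -- chosen primary key
           RollNo    INTEGER NOT NULL UNIQUE,   -- alternate key
           Email     TEXT    NOT NULL UNIQUE,   -- alternate key
           FirstName TEXT,
           LastName  TEXT
       )"""
)
# Hypothetical placeholder addresses.
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?, ?)",
    [
        (1, 11, "tom@example.com", "Tom", "Price"),
        (2, 12, "nick@example.com", "Nick", "Wright"),
        (3, 13, "dana@example.com", "Dana", "Natan"),
    ],
)


def try_insert(row):
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


r1 = try_insert((4, 12, "ann@example.com", "Ann", "Lee"))  # rejected: RollNo 12 exists
r2 = try_insert((5, 15, "tom@example.com", "Ann", "Lee"))  # rejected: Email already used
r3 = try_insert((6, 16, "ann@example.com", "Ann", "Lee"))  # accepted
print(r1, r2, r3, sep="\n")
```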

Foreign Key
A foreign key is a column that creates a relationship between two tables. Foreign keys are the
columns of a table that point to the primary key of another table. The purpose of foreign keys is
to maintain data integrity and allow navigation between two different instances of an entity. A
foreign key acts as a cross-reference between two tables, as it references the primary key of
another table.

NOTES:

• A foreign key references the primary key of the referenced table.
• A foreign key can take only those values that are present in the primary key of the
referenced relation.
• A foreign key may have a name other than that of the primary key it references.
• A foreign key can take the NULL value.
• There is no requirement for a foreign key to be unique.
• In fact, a foreign key is not unique most of the time.
• The referenced relation may also be called the master table or primary table.
• The referencing relation may also be called the foreign table (or child table).
Example:

Table: Department

DeptCode DeptName
001 Science
002 English
005 Computer

Table: Teacher

Teacher ID  Fname  Lname
B002        David  Warner
B017        Sara   Joseph
B009        Mike   Brunton

In this example, we have two tables, Teacher and Department, in a school. However, there is no
way to see which teacher works in which department.

By adding DeptCode to the Teacher table as a foreign key that references Department, we can
create a relationship between the two tables.

Teacher ID  DeptCode  Fname  Lname
B002        002       David  Warner
B017        002       Sara   Joseph
B009        001       Mike   Brunton

This concept is also known as Referential Integrity.
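A sketch of the Teacher/Department example in Python's sqlite3 module. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection; the rejected teacher row ('B020', 'Ann', 'Lee') and the table column name TeacherID are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE Department (DeptCode TEXT PRIMARY KEY, DeptName TEXT)")
conn.execute(
    """CREATE TABLE Teacher (
           TeacherID TEXT PRIMARY KEY,
           DeptCode  TEXT REFERENCES Department (DeptCode),
           Fname     TEXT,
           Lname     TEXT
       )"""
)
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [("001", "Science"), ("002", "English"), ("005", "Computer")],
)
conn.executemany(
    "INSERT INTO Teacher VALUES (?, ?, ?, ?)",
    [("B002", "002", "David", "Warner"),
     ("B017", "002", "Sara", "Joseph"),
     ("B009", "001", "Mike", "Brunton")],
)

try:
    # "009" is not a DeptCode in Department, so this insert is rejected.
    conn.execute("INSERT INTO Teacher VALUES ('B020', '009', 'Ann', 'Lee')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# The foreign key lets us see which teacher works in which department.
rows = conn.execute(
    "SELECT t.Fname, d.DeptName FROM Teacher t "
    "JOIN Department d ON t.DeptCode = d.DeptCode ORDER BY t.TeacherID"
).fetchall()
print(fk_rejected, rows)
# True [('David', 'English'), ('Mike', 'Science'), ('Sara', 'English')]
```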

Compound Key
A compound key has two or more attributes that allow you to uniquely recognize a specific
record. Each column may not be unique by itself within the table; however, when combined with
the other column or columns, the combination becomes unique. The purpose of a compound key is to
uniquely identify each record in the table.

Example:

OrderNo  ProductID  Product Name   Quantity
B005     JAP102459  Mouse          5
B005     DKT321573  USB            10
B005     OMG446789  LCD Monitor    20
B004     DKT321573  USB            15
B002     OMG446789  Laser Printer  3

In this example, neither OrderNo nor ProductID alone can be a primary key, because neither
uniquely identifies a record. However, a compound key of OrderNo and ProductID can be used,
because the combination uniquely identifies each record.
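A multi-column PRIMARY KEY enforces uniqueness of the pair rather than of each column. A minimal sketch with Python's sqlite3 module; the table name OrderLine and the two extra rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE OrderLine (
           OrderNo     TEXT NOT NULL,
           ProductID   TEXT NOT NULL,
           ProductName TEXT,
           Quantity    INTEGER,
           PRIMARY KEY (OrderNo, ProductID)   -- key made of two columns
       )"""
)
conn.executemany(
    "INSERT INTO OrderLine VALUES (?, ?, ?, ?)",
    [("B005", "JAP102459", "Mouse", 5),
     ("B005", "DKT321573", "USB", 10),
     ("B005", "OMG446789", "LCD Monitor", 20),
     ("B004", "DKT321573", "USB", 15),
     ("B002", "OMG446789", "Laser Printer", 3)],
)

try:
    # The pair (B005, DKT321573) already exists, so this row is rejected.
    conn.execute("INSERT INTO OrderLine VALUES ('B005', 'DKT321573', 'USB', 1)")
    duplicate_pair_rejected = False
except sqlite3.IntegrityError:
    duplicate_pair_rejected = True

# A repeated OrderNo with a new ProductID is fine: only the pair must be unique.
conn.execute("INSERT INTO OrderLine VALUES ('B004', 'JAP102459', 'Mouse', 2)")
count = conn.execute("SELECT COUNT(*) FROM OrderLine").fetchone()[0]
print(duplicate_pair_rejected, count)  # True 6
```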

Composite Key
A composite key is a combination of two or more columns that uniquely identifies rows in a
table. The combination of columns guarantees uniqueness, though individually uniqueness is not
guaranteed; hence, the columns are combined to uniquely identify records in the table.

A primary key made up of multiple attributes, rather than a single attribute, is called a
composite key.

The difference between a compound key and a composite key is that any part of a compound key can
be a foreign key, whereas the parts of a composite key may or may not be foreign keys.

Surrogate Keys
A surrogate key is an artificial key that aims to uniquely identify each record. It is typically
used when there is no natural primary key, and it lends no meaning to the data in the table. A
surrogate key is usually an integer, generated just before the record is inserted into the
table.

Fname  Lastname  Start Time  End Time
Anne   Smith     09:00       18:00
Jack   Francis   08:00       17:00
Anna   McLean    11:00       20:00
Shown  Willam    14:00       23:00

The example above shows the shift timings of different employees. No column in this table is
guaranteed to be unique, so a surrogate key is needed to uniquely identify each employee.

Properties of a surrogate key:

• It is unique for all the records of the table.
• It is updatable.
• It cannot be NULL, i.e. it must have some value.

Surrogate keys in SQL are used when:

• No attribute (or combination of attributes) qualifies as a primary key.
• The natural primary key is too big or complicated.
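A sketch of a system-generated surrogate key for the shift table, using SQLite's INTEGER PRIMARY KEY AUTOINCREMENT through Python's sqlite3 module. The table name Shift and column name ShiftID are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Shift (
           ShiftID   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
           Fname     TEXT,
           Lastname  TEXT,
           StartTime TEXT,
           EndTime   TEXT
       )"""
)
# The surrogate key is not supplied: the DBMS generates it on insert.
conn.executemany(
    "INSERT INTO Shift (Fname, Lastname, StartTime, EndTime) VALUES (?, ?, ?, ?)",
    [("Anne", "Smith", "09:00", "18:00"),
     ("Jack", "Francis", "08:00", "17:00"),
     ("Anna", "McLean", "11:00", "20:00"),
     ("Shown", "Willam", "14:00", "23:00")],
)
ids = [row[0] for row in conn.execute("SELECT ShiftID FROM Shift ORDER BY ShiftID")]
print(ids)  # [1, 2, 3, 4]
```

In SQLite, AUTOINCREMENT additionally prevents the values of deleted rows from being reused; a plain INTEGER PRIMARY KEY would also generate keys automatically.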

Difference Between Primary Key & Foreign Key
Here are the important differences between a primary key and a foreign key:

S. No. | Primary Key | Foreign Key
1. | A primary key is used to ensure that the data in the specific column is unique. | A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables.
2. | It uniquely identifies a record in the relational database table. | It refers to the field in a table which is the primary key of another table.
3. | Only one primary key is allowed in a table. | More than one foreign key is allowed in a table.
4. | It is a combination of the UNIQUE and NOT NULL constraints. | It can contain duplicate values.
5. | It does not allow NULL values. | It can contain NULL values.
6. | A primary key value cannot be deleted from the parent table while child rows still reference it. | A foreign key value can be deleted from the child table.
7. | Its constraint can be implicitly defined on temporary tables. | Its constraint cannot be defined on local or global temporary tables.
8. | In some systems the primary key is a clustered index by default, and the data in the table is physically organized in the sequence of the clustered index. | In many systems, a foreign key does not automatically create an index, clustered or non-clustered.
9. | No two rows can have identical values for a primary key. | A foreign key can contain duplicate values.
10. | An inserted value does not need to exist in any other table. | While inserting a value into the foreign key column, ensure that the value is present in the referenced primary key column.
Relational Model Constraints

Relational model constraints are restrictions specified on the data values in a relational
database.

Constraints limit the data that can be inserted into, updated in, or deleted from a table. The
whole purpose of constraints is to maintain data integrity during an update, delete, or insert on
a table.

Constraints on the database can generally be divided into three main categories:

1.) Inherent Model-Based Constraints or Implicit Constraints
2.) Schema-Based Constraints or Explicit Constraints
3.) Application-based Constraints or Semantic Constraints

1.) Inherent Model-Based Constraints or Implicit Constraints:

The constraints that are inherent (implicit) in a data model are inherent model-based
constraints or implicit constraints. For example, a relation in a database must not have
duplicate tuples, and the ordering of tuples and attributes is not significant.

2.) Schema-Based Constraints or Explicit Constraints:

The constraints that are specified while defining the schema of a database using DDL are
schema-based constraints or explicit constraints.

They are further categorized as domain constraints, key constraints, entity integrity
constraints, referential integrity constraints and constraints on Null Value.

3.) Application-based Constraints or Semantic Constraints:

Constraints that cannot be expressed while defining the database schema must be expressed and
enforced by the application programs. These are known as application-based or semantic
constraints, or business rules. For example, the salary of an employee cannot be more than the
salary of his or her supervisor.
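A sketch of the salary rule enforced in application code before data is written. The in-memory employees dictionary, the IDs E1 to E4 and the salary figures are all hypothetical; a real application would read the supervisor's salary from the database. (Some DBMSs can also express such rules with triggers; the sketch shows the application-side approach described above.)

```python
from typing import Optional

# Hypothetical EMPLOYEE data: emp_id -> (salary, supervisor_id or None)
employees: dict[str, tuple[int, Optional[str]]] = {
    "E1": (90000, None),
    "E2": (60000, "E1"),
}


def violates_salary_rule(salary: int, supervisor_id: Optional[str]) -> bool:
    """Business rule: an employee may not earn more than their supervisor."""
    if supervisor_id is None:
        return False
    supervisor_salary, _ = employees[supervisor_id]
    return salary > supervisor_salary


def hire(emp_id: str, salary: int, supervisor_id: Optional[str]) -> None:
    # The application checks the rule before writing, because the schema does not.
    if violates_salary_rule(salary, supervisor_id):
        raise ValueError(f"{emp_id}: salary exceeds supervisor's salary")
    employees[emp_id] = (salary, supervisor_id)


hire("E3", 50000, "E2")      # accepted
try:
    hire("E4", 70000, "E2")  # rejected: 70000 > 60000
except ValueError as err:
    print(err)
```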

The schema-based constraints include:

1.) Domain Constraint
2.) Key Constraint and Constraint on NULL values
3.) Entity Integrity Constraint
4.) Referential Integrity Constraint
1.) Domain Constraint

Each table has a certain set of columns, and each column accepts only one type of data, based on
its data type. The column does not accept values of any other data type.

Domain constraints can be defined as the definition of a valid set of values for an attribute.

The data types of a domain include string, character, integer, time, date, currency, etc. The
value of an attribute must be available in the corresponding domain.

Domain constraints are user-defined data types, and we can define them like this:

Domain Constraint = data type + Constraints (NOT NULL / UNIQUE / PRIMARY KEY /
FOREIGN KEY / CHECK / DEFAULT)

Example:

Here, value ‘A’ is not allowed since only integer values can be taken by the age attribute.
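A sketch of "data type + CHECK" with Python's sqlite3 module. SQLite uses flexible typing, so an INTEGER column on its own would store 'A' as text; the explicit CHECK on typeof() restricts Age to integers. The extra rule Age > 0 is an assumption added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Student (
           STU_ID TEXT PRIMARY KEY,
           Name   TEXT NOT NULL,
           -- domain of Age: integers only, and (assumed rule) greater than 0
           Age    INTEGER CHECK (typeof(Age) = 'integer' AND Age > 0)
       )"""
)
conn.execute("INSERT INTO Student VALUES ('S001', 'Akshay', 20)")  # accepted

try:
    conn.execute("INSERT INTO Student VALUES ('S002', 'Abhishek', 'A')")
    domain_rejected = False
except sqlite3.IntegrityError:
    domain_rejected = True  # 'A' lies outside the integer domain of Age

print(domain_rejected)  # True
```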

2.) Key Constraint and Constraint on NULL Values

An attribute that can uniquely identify a tuple in a relation is called the key of the table. All
the values of the primary key must be unique.

In a relation, a key can be either a single attribute or a subset of attributes that can identify
a particular tuple.

The key constraint specifies that no two tuples in a relation can have the same values for the
key (attribute or subset of attributes).

The constraint on NULL values defines whether an attribute is allowed to hold a NULL value. For
example, in a student tuple, the name attribute must be NOT NULL.

NULL values are not allowed in the primary key; hence the NOT NULL constraint is also a part of
the key constraint.
Example:

Consider the following Student table:

STU_ID  Name      Age
S001    Akshay    20
S001    Abhishek  21
S003    Shashank  20
S004    Rahul     20

This relation does not satisfy the key constraint, because the primary key values are not all
unique: S001 appears twice.
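A violation like this can be detected by counting how often each key value occurs. A small sketch; duplicate_keys is a hypothetical helper and key_index selects the key column.

```python
from collections import Counter

student = [
    ("S001", "Akshay", 20),
    ("S001", "Abhishek", 21),
    ("S003", "Shashank", 20),
    ("S004", "Rahul", 20),
]


def duplicate_keys(rows, key_index=0):
    """Return the key values that occur in more than one tuple."""
    counts = Counter(row[key_index] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


print(duplicate_keys(student))  # ['S001']
```

A DBMS with STU_ID declared as the primary key would reject the second S001 row at insert time, so such a check is mainly useful when auditing data loaded without constraints.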

3.) Entity Integrity Constraint

The entity integrity constraint specifies that no attribute of the primary key may contain a NULL
value in any relation. This is because a NULL value in the primary key violates the uniqueness
property: a tuple whose key is NULL cannot be distinguished from other tuples.

In other words, the primary key of a tuple can never be NULL, as the primary key is used to
identify individual tuples in a relation.
Example:

Consider the following Student table:

STU_ID  Name      Age
S001    Akshay    20
S002    Abhishek  21
S003    Shashank  20
        Rahul     20

This relation does not satisfy the entity integrity constraint, because the primary key of the
last tuple is NULL.
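A sketch of the same table in Python's sqlite3 module. SQLite has a documented quirk: a PRIMARY KEY column that is not an INTEGER PRIMARY KEY (and not in a STRICT or WITHOUT ROWID table) accepts NULL unless NOT NULL is declared, so NOT NULL is stated explicitly here to enforce entity integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is declared explicitly because of SQLite's PRIMARY KEY quirk.
conn.execute(
    "CREATE TABLE Student (STU_ID TEXT NOT NULL PRIMARY KEY, Name TEXT, Age INTEGER)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("S001", "Akshay", 20), ("S002", "Abhishek", 21), ("S003", "Shashank", 20)],
)

try:
    # Python None becomes SQL NULL, which the primary key must not contain.
    conn.execute("INSERT INTO Student VALUES (?, ?, ?)", (None, "Rahul", 20))
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

print(null_rejected)  # True
```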

4.) Referential Integrity Constraint

Referential integrity constraints work on the concept of foreign keys. A foreign key is an
attribute (or set of attributes) of one relation that refers to the primary key of another
relation.

A set of attributes FK in relation schema R1 is a foreign key of R1 that references relation R2
if it satisfies the following two rules:
(1) the attributes in FK have the same domain(s) as the primary key attributes PK of R2; and
(2) a value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK for
some tuple t2 in the current state r2(R2), that is, t1[FK] = t2[PK], or is NULL.
R1 is called the referencing relation and R2 is the referenced relation.

• The referential integrity constraint is specified between two relations (tables) and is used
to maintain consistency among the tuples in the two relations.
• This constraint is enforced when a foreign key references the primary key of a relation.
• It specifies that every value taken by the foreign key must either be present in the primary
key of the referenced relation or be NULL.

Important Results:

The following two important results emerge from the referential integrity constraint:
• We cannot insert a record into a referencing relation if the corresponding record does not
exist in the referenced relation.
• We cannot delete or update a record of the referenced relation if a corresponding record
exists in the referencing relation.

Example:

Consider the following two relations, ‘Student’ and ‘Department’. Here, relation ‘Student’
references the relation ‘Department’.
Student

STU_ID  Name      Dept_no
S001    Akshay    D10
S002    Abhishek  D10
S003    Shashank  D11
S004    Rahul     D14

Department

Dept_no  Dept_name
D10      ASET
D11      ALS
D12      ASFL
D13      ASHS

Here,
• The relation ‘Student’ does not satisfy the referential integrity constraint.
• This is because in relation ‘Department’, no primary key value is D14, the department number
used by student S004.
• Thus, the referential integrity constraint is violated.

Explanation:

In the above, Dept_no in the first relation ‘Student’ is the foreign key, and Dept_no in the
second relation ‘Department’ is the primary key. The value Dept_no = D14 in the foreign key of
‘Student’ is not allowed, since D14 is not defined in the primary key of ‘Department’. Therefore,
the referential integrity constraint is violated here.
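Rule (2) of the foreign key definition can be checked directly: every non-NULL foreign key value must appear among the referenced primary key values. A sketch on the Student and Department instances above; referential_violations is a hypothetical helper.

```python
student = [
    ("S001", "Akshay", "D10"),
    ("S002", "Abhishek", "D10"),
    ("S003", "Shashank", "D11"),
    ("S004", "Rahul", "D14"),
]
department = [("D10", "ASET"), ("D11", "ALS"), ("D12", "ASFL"), ("D13", "ASHS")]


def referential_violations(referencing, fk_index, referenced, pk_index=0):
    """Tuples whose FK value is neither NULL nor a PK value of the referenced relation."""
    pk_values = {row[pk_index] for row in referenced}
    return [
        row for row in referencing
        if row[fk_index] is not None and row[fk_index] not in pk_values
    ]


print(referential_violations(student, 2, department))  # [('S004', 'Rahul', 'D14')]
```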

Operations in Relational Model with Constraint Violations

Here, we will learn about the violations that can occur in a database as a result of changes
made to a relation.
There are mainly three basic operations that can change the state of relations in the database:
Insert, Delete, and Update (or Modify). They insert new data, delete old data, or modify existing
data records.
1. Insert – Insert is used to insert one or more new tuples into a relation in the database.
2. Delete – Delete is used to delete tuples from a table.
3. Update (or Modify) – Update (or Modify) is used to change the values of some attributes in
existing tuples.

Whenever one of these operations is applied, the integrity constraints specified on the
relational database schema should not be violated.

In this section we discuss the types of constraints that may be violated by each of these
operations and the types of actions that may be taken if an operation causes a violation.

1.) Insert Operation

The insert operation provides a list of attribute values for a new tuple that is to be inserted
into a relation.

Insert is used to insert data into the relation.

Insert can violate any of the four types of constraints.

Domain Constraint: The domain constraint can be violated if an attribute value is given that
does not appear in the corresponding domain or is not of the appropriate data type.

Example:

Assume that the domain constraint says that every value inserted into an attribute must be
greater than 10. Inserting a value of 10 or less then violates the domain constraint, so the
insertion is rejected.
Key Constraints: Key constraints can be violated if a key value in the new tuple t already
exists in another tuple in the relation r(R).

Example:

Insert (‘1200’, ‘Arjun’, ‘9976657777’, ‘Mumbai’) into EMPLOYEE

This insertion violates the key constraint if EID=1200 is already present in some tuple in the
same relation, so it gets rejected.

Entity Integrity Constraint: Entity integrity can be violated if any part of the primary key of
the new tuple t in the relation is NULL.

Example:

Insert (NULL, ‘Bikash’, ‘M’, ‘Jaipur’, ‘123456’) into EMP

The above insertion violates the entity integrity constraint because the primary key EID is NULL,
which is not allowed, so the insertion is rejected.

Referential Integrity: Referential integrity can be violated if the value of any foreign key in t
refers to a tuple that does not exist in the referenced relation.

If a value is inserted into the foreign key of relation 1 and there is no corresponding value in
the primary key of relation 2 that it refers to, referential integrity is violated.

Example:

When we try to insert a value, say 1200, into EID (foreign key) of table 1, and there is no
corresponding EID (primary key) in table 2, the insertion causes a violation and is rejected.

If an insertion violates one or more constraints, the default option is to reject the insertion.

If the insertion is not rejected, the cause of the violation must be corrected instead, for
example by first inserting the missing referenced tuple; such corrections can cascade through
related relations. Cascading is most commonly used for deletion: a foreign key with cascade
delete means that if a record in the parent table is deleted, the corresponding records in the
child table are automatically deleted. This is called a cascade delete.
2.) Delete Operation

To specify deletion, a condition on the attributes of the relation selects the tuple to be deleted.

In the above-given example, CustomerName= "Apple" is deleted from the table.

Delete is used to delete tuples from the table.

The Delete operation can violate only referential integrity. This occurs if the tuple which is
deleted is referenced by foreign keys from other tuples in the same database.

Here are some examples, using the ‘Student’ relation that references ‘Department’:

Operation: Delete the Student tuple with Dept No = 1.

Result: This deletion is acceptable, because no other tuples refer to Student tuples.

Operation: Delete the Department tuple with Dept No = 1.

Result: This deletion is not acceptable if there are tuples in Student that refer to this tuple.
Hence, if the tuple in Department is deleted, referential integrity violations will result.

Several options are available if a deletion operation causes a violation.

The first option, called restrict, is to reject the deletion.

The second option, called cascade, is to cascade (or propagate) the deletion by also deleting the
tuples that reference the tuple being deleted. Here, if a record in the parent table (referenced
relation) is deleted, the corresponding records in the child table (referencing relation) are
automatically deleted.

A third option, called set null or set default, is to modify the referencing attribute values
that cause the violation; each such value is either set to NULL or changed to reference another
default valid tuple.

Combinations of these three options are also possible.
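The restrict, cascade and set null options correspond to the SQL referential actions ON DELETE RESTRICT, ON DELETE CASCADE and ON DELETE SET NULL. A sketch with Python's sqlite3 module, reusing part of the Student/Department example; delete_department is a hypothetical helper that rebuilds the tables for each option and then deletes department D10.

```python
import sqlite3


def delete_department(on_delete: str):
    """Build Department/Student with the given ON DELETE action, then delete D10."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Department (Dept_no TEXT PRIMARY KEY, Dept_name TEXT)")
    # on_delete is one of a few fixed keywords here, so formatting it into SQL is safe.
    conn.execute(
        f"""CREATE TABLE Student (
                STU_ID  TEXT PRIMARY KEY,
                Name    TEXT,
                Dept_no TEXT REFERENCES Department (Dept_no) ON DELETE {on_delete}
            )"""
    )
    conn.executemany("INSERT INTO Department VALUES (?, ?)", [("D10", "ASET"), ("D11", "ALS")])
    conn.executemany(
        "INSERT INTO Student VALUES (?, ?, ?)",
        [("S001", "Akshay", "D10"), ("S002", "Abhishek", "D10"), ("S003", "Shashank", "D11")],
    )
    try:
        conn.execute("DELETE FROM Department WHERE Dept_no = 'D10'")
    except sqlite3.IntegrityError:
        return "rejected"
    return conn.execute("SELECT STU_ID, Dept_no FROM Student ORDER BY STU_ID").fetchall()


print(delete_department("RESTRICT"))  # rejected
print(delete_department("CASCADE"))   # [('S003', 'D11')]
print(delete_department("SET NULL"))  # [('S001', None), ('S002', None), ('S003', 'D11')]
```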

3.) Update Operation

You can see that in the below-given relation table CustomerName= 'Apple' is updated from
Inactive to Active.

Update (or Modify) – Update (or Modify) is used to make changes in the value of some
attributes in existing tuples.

Consider two tables, EMPLOYEE (Ssn, name, salary, Dno) and DEPARTMENT (Dno, Dname), where Dno in
EMPLOYEE is a foreign key referencing DEPARTMENT.

Operation: Update the salary of the EMPLOYEE tuple with Ssn = ‘123’ to 2800.

Result: Acceptable.

Operation: Update the Dno of the EMPLOYEE tuple with Ssn = ‘123’ to 7.

Result: Unacceptable if no DEPARTMENT tuple has Dno = 7, because it violates referential
integrity.

Operation: Update the Ssn of the EMPLOYEE tuple with Ssn = ‘123’ to ‘321’.

Result: Unacceptable if another EMPLOYEE tuple already has Ssn = ‘321’, because it violates the
primary key constraint.

Updating an attribute that is neither part of a primary key nor of a foreign key usually
causes no problems.
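The three update cases can be reproduced with Python's sqlite3 module. The sample data (departments 1 and 5, employees ‘123’ and ‘321’, their names and salaries) is hypothetical and chosen so that Dno = 7 does not exist and Ssn ‘321’ is already taken.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE DEPARTMENT (Dno INTEGER PRIMARY KEY, Dname TEXT)")
conn.execute(
    """CREATE TABLE EMPLOYEE (
           Ssn    TEXT NOT NULL PRIMARY KEY,
           name   TEXT,
           salary INTEGER,
           Dno    INTEGER REFERENCES DEPARTMENT (Dno)
       )"""
)
# Hypothetical sample data.
conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)", [(1, "Research"), (5, "Sales")])
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [("123", "Asha", 2500, 5), ("321", "Ravi", 3000, 1)],
)


def try_update(sql: str) -> str:
    try:
        conn.execute(sql)
        return "acceptable"
    except sqlite3.IntegrityError:
        return "unacceptable"


results = [
    try_update("UPDATE EMPLOYEE SET salary = 2800 WHERE Ssn = '123'"),  # non-key attribute
    try_update("UPDATE EMPLOYEE SET Dno = 7 WHERE Ssn = '123'"),        # no DEPARTMENT 7
    try_update("UPDATE EMPLOYEE SET Ssn = '321' WHERE Ssn = '123'"),    # '321' already exists
]
print(results)  # ['acceptable', 'unacceptable', 'unacceptable']
```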

4.) The Transaction Concept

A transaction is an executing program that includes some database operations, such as reading
from the database, or applying insertions, deletions, or updates to the database. At the end of
the transaction, it must leave the database in a valid or consistent state that satisfies all the
constraints specified on the database schema.

A single transaction may involve any number of retrieval operations and any number of update
operations. For example, a transaction to apply a bank withdrawal will typically read the user
account record, check if there is a sufficient balance, and then update the record by the
withdrawal amount.
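The bank withdrawal can be sketched as one explicit transaction with Python's sqlite3 module: read the balance, check it, then update it, and roll back if the check fails or an error occurs. The Account table, account ‘A1’ and the amounts are hypothetical.

```python
import sqlite3

# isolation_level=None disables the sqlite3 module's implicit transactions,
# so BEGIN / COMMIT / ROLLBACK are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Account (AccNo TEXT PRIMARY KEY, Balance INTEGER NOT NULL)")
conn.execute("INSERT INTO Account VALUES ('A1', 500)")  # hypothetical account


def withdraw(acc_no: str, amount: int) -> bool:
    """Read the balance, check it, and update it inside a single transaction."""
    conn.execute("BEGIN")
    try:
        (balance,) = conn.execute(
            "SELECT Balance FROM Account WHERE AccNo = ?", (acc_no,)
        ).fetchone()
        if balance < amount:
            conn.execute("ROLLBACK")  # insufficient balance: leave the database unchanged
            return False
        conn.execute(
            "UPDATE Account SET Balance = Balance - ? WHERE AccNo = ?", (amount, acc_no)
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")  # any error undoes the partial work
        raise


print(withdraw("A1", 200))  # True
print(withdraw("A1", 400))  # False
print(conn.execute("SELECT Balance FROM Account WHERE AccNo = 'A1'").fetchone()[0])  # 300
```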

Relational Algebra & Calculus
Preliminaries

A query language is a language in which a user requests information from the database. Query languages are considered higher-level languages than general-purpose programming languages.

Query languages are of two types:

1.) Procedural Query Language
2.) Non-Procedural Query Language

1.) Procedural Language: In procedural language, the user has to describe the specific
procedure to retrieve the information from the database.

Example: Relational algebra is a procedural query language; it specifies both what data is to be retrieved and how to retrieve it.

2.) Non-Procedural Language: In a non-procedural language, the user retrieves the information from the database without describing the specific procedure to retrieve it.

Example: Tuple relational calculus and domain relational calculus are non-procedural query languages; they specify what data is to be retrieved but not how to retrieve it.

Relational Algebra
Relational algebra is a procedural query language that works on the relational model. It consists of a set of operations that take one or two relations (tables) as input and produce a new relation as output, in response to a user's request for specific information.

It uses operators to perform queries. An operator is either unary (one input relation) or binary (two input relations). Operators accept relations as input and yield relations as output, so they can be nested: the output of one operation can be the input of another, and intermediate results are also relations.

Basic Relational Algebra Operations

Relational algebra operations are divided into the following groups:

Unary Relational Operations

• SELECT (symbol: σ)
• PROJECT (symbol: π)
• RENAME (symbol: ρ)

Relational Algebra Operations From Set Theory

• UNION (∪)
• INTERSECTION (∩)
• DIFFERENCE (-)
• CARTESIAN PRODUCT (X)

Binary Relational Operations

• JOIN
• DIVISION

1.) Select Operation (σ):

• The select operation selects the tuples that satisfy a given predicate.
• It is denoted by the sigma (σ) symbol.

Notation: σp(r)   (the predicate p is written as a subscript of σ)
Where:

σ is the selection operator
r is the relation, i.e. the name of the table
p is the selection predicate: a propositional logic formula that may use the connectives AND, OR and NOT, and the comparison operators =, ≠, <, ≤, >, ≥.

For Example: LOAN Relation

BRANCH_NAME LOAN_NO AMOUNT


Downtown L-17 1000
Redwood L-23 2000
Perryride L-15 1500
Downtown L-14 1500
Mianus L-13 500
Roundhill L-11 900
Perryride L-16 1300

Input:

σ BRANCH_NAME="Perryride" (LOAN)

Output:

BRANCH_NAME LOAN_NO AMOUNT


Perryride L-15 1500
Perryride L-16 1300
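A minimal Python sketch of selection, assuming a relation is represented as a list of dictionaries (one per tuple); `select` is a hypothetical helper name and the predicate is an ordinary Python function. It reproduces the Perryride example above and shows a predicate that combines two conditions with AND.

```python
LOAN = [
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "Redwood", "LOAN_NO": "L-23", "AMOUNT": 2000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-14", "AMOUNT": 1500},
    {"BRANCH_NAME": "Mianus", "LOAN_NO": "L-13", "AMOUNT": 500},
    {"BRANCH_NAME": "Roundhill", "LOAN_NO": "L-11", "AMOUNT": 900},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]


def select(relation, predicate):
    """σ_predicate(relation): keep the tuples for which predicate is true."""
    return [t for t in relation if predicate(t)]


perryride = select(LOAN, lambda t: t["BRANCH_NAME"] == "Perryride")
print([t["LOAN_NO"] for t in perryride])  # ['L-15', 'L-16']

# Predicates can combine comparisons with AND / OR / NOT.
big_downtown = select(LOAN, lambda t: t["BRANCH_NAME"] == "Downtown" and t["AMOUNT"] > 1200)
print([t["LOAN_NO"] for t in big_downtown])  # ['L-14']
```

Selection never changes the set of attributes; each selected tuple keeps all three columns.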

2. Project Operation (π):
• This operation keeps only the attributes that we wish to appear in the result; the rest of the attributes are eliminated from the table. Because the result is a relation, duplicate tuples are also removed.
• It is denoted by π.

Notation: π A1, A2, …, An (r)

Where

A1, A2, …, An are attribute names of relation r.

Example: CUSTOMER RELATION

NAME STREET CITY


Jones Main Harrison
Smith North Rye
Hays Main Harrison
Curry North Rye
Johnson Alma Brooklyn
Brooks Senator Brooklyn

Input:

π NAME, CITY (CUSTOMER)


Output:

NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
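A sketch of projection on the same list-of-dictionaries representation (an assumption; `project` is a hypothetical name). Projecting only CITY shows the duplicate elimination: six customers produce just three distinct cities.

```python
CUSTOMER = [
    {"NAME": "Jones", "STREET": "Main", "CITY": "Harrison"},
    {"NAME": "Smith", "STREET": "North", "CITY": "Rye"},
    {"NAME": "Hays", "STREET": "Main", "CITY": "Harrison"},
    {"NAME": "Curry", "STREET": "North", "CITY": "Rye"},
    {"NAME": "Johnson", "STREET": "Alma", "CITY": "Brooklyn"},
    {"NAME": "Brooks", "STREET": "Senator", "CITY": "Brooklyn"},
]


def project(relation, *attributes):
    """π_attributes(relation): keep the listed attributes and drop duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:  # a relation is a set, so each row appears once
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


print(len(project(CUSTOMER, "NAME", "CITY")))  # 6, the pairs shown above
print(project(CUSTOMER, "CITY"))
# [{'CITY': 'Harrison'}, {'CITY': 'Rye'}, {'CITY': 'Brooklyn'}]
```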

3. Union Operation (∪):
• Suppose there are two relations R and S. The union operation contains all the tuples that are in R, in S, or in both R and S.
• It eliminates duplicate tuples. It is denoted by ∪.

Notation: R ∪ S

A union operation must satisfy the following condition:

• R and S must have the same number of attributes, and the domains of corresponding attributes must be compatible.
• Duplicate tuples are eliminated automatically.

Example:

DEPOSITOR RELATION

CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284

BORROW RELATION

CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17

Input:

π CUSTOMER_NAME (BORROW) ∪ π CUSTOMER_NAME (DEPOSITOR)


Output:

CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes

Note: The output contains no duplicate names, even though some names (Smith and Jones) appear in both tables, and some names appear twice within a single table (Johnson in DEPOSITOR and Smith in BORROW).
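Python's built-in `set` type has the same set semantics, so union can be sketched with the `|` operator. In this sketch (an assumption, not a DBMS implementation) a relation is a pair (attribute names, set of tuples), and the hypothetical `union` helper first checks that both relations have the same number of attributes; it does not check domain compatibility.

```python
# A relation is (attribute names, set of tuples); the set removes duplicates.
DEPOSITOR = (
    ("CUSTOMER_NAME", "ACCOUNT_NO"),
    {("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"), ("Turner", "A-176"),
     ("Johnson", "A-273"), ("Jones", "A-472"), ("Lindsay", "A-284")},
)
BORROW = (
    ("CUSTOMER_NAME", "LOAN_NO"),
    {("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
     ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17")},
)


def project_customer_name(relation):
    """π CUSTOMER_NAME (relation); CUSTOMER_NAME is the first attribute here."""
    _, rows = relation
    return ("CUSTOMER_NAME",), {(row[0],) for row in rows}


def union(r, s):
    """R ∪ S, after checking that R and S have the same number of attributes."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    if len(r_attrs) != len(s_attrs):
        raise ValueError("not union-compatible: different number of attributes")
    return r_attrs, r_rows | s_rows


_, names = union(project_customer_name(BORROW), project_customer_name(DEPOSITOR))
print(sorted(name for (name,) in names))
# ['Curry', 'Hayes', 'Jackson', 'Johnson', 'Jones', 'Lindsay', 'Mayes', 'Smith', 'Turner', 'Williams']
```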

4. Set Intersection (∩):
• Suppose there are two relations R and S. The set intersection operation contains all tuples that are in both R and S.
• It is denoted by ∩.

Note: Only those rows that are present in both the tables will appear in the result set.

Notation: R ∩ S

Example: Using the above DEPOSITOR table and BORROW table

Input:

π CUSTOMER_NAME (BORROW) ∩ π CUSTOMER_NAME (DEPOSITOR)


Output:

CUSTOMER_NAME
Smith
Jones
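The same query can be sketched with Python sets, assuming each table is a list of (name, number) pairs; the `&` operator keeps the names present in both projections. The last line checks the identity R ∩ S = R - (R - S), which shows that intersection can be expressed with set difference.

```python
DEPOSITOR = [("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"), ("Turner", "A-176"),
             ("Johnson", "A-273"), ("Jones", "A-472"), ("Lindsay", "A-284")]
BORROW = [("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
          ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17")]

borrowers = {name for name, _ in BORROW}      # π CUSTOMER_NAME (BORROW)
depositors = {name for name, _ in DEPOSITOR}  # π CUSTOMER_NAME (DEPOSITOR)

both = borrowers & depositors  # names present in both relations
print(sorted(both))  # ['Jones', 'Smith']

# Intersection expressed with difference: R ∩ S = R - (R - S).
print(both == borrowers - (borrowers - depositors))  # True
```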

5. Set Difference (-):

• Suppose there are two relations R and S. The set difference operation contains all tuples that are in R but not in S.
• It is denoted by minus (-).

Notation: R - S

Example: Using the above DEPOSITOR table and BORROW table

Input:

π CUSTOMER_NAME (BORROW) - π CUSTOMER_NAME (DEPOSITOR)


Output:

CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
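A sketch of set difference with Python's `-` operator on the projected name sets (same assumed representation as a list of pairs). Computing both directions shows that, unlike union and intersection, difference is not commutative: R - S and S - R are different relations.

```python
DEPOSITOR = [("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"), ("Turner", "A-176"),
             ("Johnson", "A-273"), ("Jones", "A-472"), ("Lindsay", "A-284")]
BORROW = [("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
          ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17")]

borrowers = {name for name, _ in BORROW}      # π CUSTOMER_NAME (BORROW)
depositors = {name for name, _ in DEPOSITOR}  # π CUSTOMER_NAME (DEPOSITOR)

only_borrowers = borrowers - depositors   # in BORROW but not in DEPOSITOR
only_depositors = depositors - borrowers  # the reverse direction
print(sorted(only_borrowers))   # ['Curry', 'Hayes', 'Jackson', 'Williams']
print(sorted(only_depositors))  # ['Johnson', 'Lindsay', 'Mayes', 'Turner']
```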

6. Cartesian Product (X):
• The Cartesian product combines each row of one table with each row of the other table. It is also known as the cross product. It is denoted by X.

Notation: E X D
Example:

EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT


1 Smith A
2 Harry C
3 John B

DEPARTMENT

DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal

Input:
EMPLOYEE X DEPARTMENT
Output:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME


1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal

Note: The number of rows in the output is always the product of the numbers of rows in the two tables. In our example, EMPLOYEE has 3 rows and DEPARTMENT has 3 rows, so the output has 3 × 3 = 9 rows.
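The same product can be sketched with `itertools.product` from Python's standard library, assuming each tuple is a Python tuple; concatenating an EMPLOYEE tuple with a DEPARTMENT tuple gives the five output columns, in the same row order as the table above.

```python
from itertools import product

EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]  # EMP_ID, EMP_NAME, EMP_DEPT
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]   # DEPT_NO, DEPT_NAME

# Each EMPLOYEE tuple is paired with every DEPARTMENT tuple.
cross = [e + d for e, d in product(EMPLOYEE, DEPARTMENT)]

print(len(cross))  # 9 = 3 × 3
print(cross[0])    # (1, 'Smith', 'A', 'A', 'Marketing')
```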

7. Rename Operation (ρ):

• The rename operation is used to rename the output relation or an attribute of a relation.
• It is denoted by rho (ρ).

Rename (ρ) Syntax:

ρ(new_relation_name, old_relation_name)

Rename (ρ) Example

Suppose we have a table CUSTOMER; we fetch the customer names and rename the resulting relation to CUST_NAMES.

Table: CUSTOMER

Customer_Id Customer_Name Customer_City


C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi

Query:
ρ(CUST_NAMES, π (Customer_Name)(CUSTOMER))

Output:

CUST_NAMES
Steve
Raghu
Chaitanya
Ajeet
Carl
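A sketch of rename, assuming a relation is a triple (name, attribute names, rows). The hypothetical `rename` helper changes only the relation name (and, optionally, the attribute names); the rows themselves are untouched. The intermediate result of `project` is unnamed, which is why rename is useful here.

```python
CUSTOMER = (
    "CUSTOMER",
    ("Customer_Id", "Customer_Name", "Customer_City"),
    [
        ("C10100", "Steve", "Agra"),
        ("C10111", "Raghu", "Agra"),
        ("C10115", "Chaitanya", "Noida"),
        ("C10117", "Ajeet", "Delhi"),
        ("C10118", "Carl", "Delhi"),
    ],
)


def project(relation, *attributes):
    """π: the result is an unnamed intermediate relation (name None), duplicates removed."""
    _, schema, rows = relation
    positions = [schema.index(a) for a in attributes]
    result = []
    for row in rows:
        picked = tuple(row[p] for p in positions)
        if picked not in result:
            result.append(picked)
    return None, tuple(attributes), result


def rename(relation, new_name, new_attributes=None):
    """ρ: give the relation (and optionally its attributes) new names."""
    _, schema, rows = relation
    if new_attributes is not None:
        if len(new_attributes) != len(schema):
            raise ValueError("one new name is needed per attribute")
        schema = tuple(new_attributes)
    return new_name, schema, rows


CUST_NAMES = rename(project(CUSTOMER, "Customer_Name"), "CUST_NAMES")
print(CUST_NAMES[0])                        # CUST_NAMES
print([name for (name,) in CUST_NAMES[2]])  # ['Steve', 'Raghu', 'Chaitanya', 'Ajeet', 'Carl']
```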

8.) Join Operations (⋈):

• A join operation combines related tuples from different relations if and only if a given join condition is satisfied.
• A join operation is essentially a Cartesian product followed by a selection criterion.
• The join operation is denoted by ⋈.

Example:

EMPLOYEE

EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry

SALARY

EMP_CODE SALARY
101 50000
102 30000
103 25000

Operation: (EMPLOYEE ⋈ SALARY)

Result:

EMP_CODE EMP_NAME SALARY


101 Stephan 50000
102 Jack 30000
103 Harry 25000
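To make "a Cartesian product followed by a selection" concrete, this sketch (plain Python tuples, an assumed representation) builds EMPLOYEE X SALARY, selects the pairs with equal EMP_CODE, and then drops the repeated EMP_CODE column, which reproduces the result table above.

```python
from itertools import product

EMPLOYEE = [(101, "Stephan"), (102, "Jack"), (103, "Harry")]  # EMP_CODE, EMP_NAME
SALARY = [(101, 50000), (102, 30000), (103, 25000)]            # EMP_CODE, SALARY

# Step 1: Cartesian product, columns (EMP_CODE, EMP_NAME, EMP_CODE, SALARY).
cross = [e + s for e, s in product(EMPLOYEE, SALARY)]

# Step 2: selection, keep the pairs whose EMP_CODE values are equal.
matched = [t for t in cross if t[0] == t[2]]

# Step 3: projection, drop the second copy of EMP_CODE.
joined = [(code, name, salary) for code, name, _, salary in matched]

print(len(cross))  # 9 candidate pairs
print(joined)      # [(101, 'Stephan', 50000), (102, 'Jack', 30000), (103, 'Harry', 25000)]
```

This describes what a join means, not how a DBMS computes it; real systems normally avoid building the full product.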

Types of Join operations

Various forms of Join operation are:

Inner Joins:

• Theta Join
• EQUI Join
• Natural Join

Outer Joins:

• Left Outer Join
• Right Outer Join
• Full Outer Join

Inner Join:

In an inner join, only those tuples that satisfy the matching criteria are included, while the rest
are excluded.

Let's study various types of Inner Joins:

a.) Theta Join:

The general case of the JOIN operation is called a theta join. It is denoted by the symbol θ.

Example

A ⋈θ B

A theta join can use any comparison condition (=, ≠, <, ≤, >, ≥) in the selection criteria.

For Example:

A ⋈ A.column 2 > B.column 2 (B)

Result:

column 1 column 2
1 2
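A nested-loop sketch of the theta join. The relations A and B below are hypothetical two-column examples invented for illustration (their contents are not listed in the text); `theta_join` takes the join condition as a Python function, so any comparison can be used. With these values only the A tuple (1, 2) satisfies A.column 2 > B.column 2, paired with the B tuple (1, 1); the sketch keeps the columns of both relations.

```python
def theta_join(r, s, theta):
    """R ⋈θ S: concatenate every pair of tuples (r, s) that satisfies theta."""
    return [tr + ts for tr in r for ts in s if theta(tr, ts)]


# Hypothetical relations A and B, each with attributes (column 1, column 2).
A = [(1, 1), (1, 2)]
B = [(1, 1), (1, 3)]

result = theta_join(A, B, lambda a, b: a[1] > b[1])  # A.column 2 > B.column 2
print(result)  # [(1, 2, 1, 1)]
```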

b.) EQUI Join:

When a theta join uses only the equality condition, it becomes an equi join.

It is the most common kind of inner join. It matches tuples whose values are equal in the compared attributes; the equi join uses only the comparison operator (=).

Example:

CUSTOMER RELATION

CLASS_ID NAME
1 John
2 Harry
3 Jackson

PRODUCT

PRODUCT_ID CITY
1 Delhi
2 Mumbai
3 Noida

Input:

CUSTOMER ⋈ CUSTOMER.CLASS_ID = PRODUCT.PRODUCT_ID (PRODUCT)

Output:

CLASS_ID NAME PRODUCT_ID CITY


1 John 1 Delhi
2 Harry 2 Mumbai
3 Jackson 3 Noida
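Because an equi join only tests equality, matching tuples can be found with a dictionary lookup instead of comparing every pair (a hash-join strategy; whether a particular DBMS uses it is up to its optimizer). A sketch, assuming plain Python tuples and a hypothetical `equi_join` helper that takes the column positions to compare; here it joins on CLASS_ID = PRODUCT_ID, and the third output row pairs Jackson (CLASS_ID 3) with Noida.

```python
from collections import defaultdict

CUSTOMER = [(1, "John"), (2, "Harry"), (3, "Jackson")]  # CLASS_ID, NAME
PRODUCT = [(1, "Delhi"), (2, "Mumbai"), (3, "Noida")]   # PRODUCT_ID, CITY


def equi_join(r, s, r_col, s_col):
    """Join R and S on R[r_col] = S[s_col], using a dictionary (hash index) on S."""
    index = defaultdict(list)
    for ts in s:
        index[ts[s_col]].append(ts)
    # Each R tuple looks up only the S tuples with an equal value.
    return [tr + ts for tr in r for ts in index.get(tr[r_col], [])]


for row in equi_join(CUSTOMER, PRODUCT, 0, 0):  # CLASS_ID = PRODUCT_ID
    print(row)
# (1, 'John', 1, 'Delhi')
# (2, 'Harry', 2, 'Mumbai')
# (3, 'Jackson', 3, 'Noida')
```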

c.) NATURAL JOIN (⋈)

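Natural join can only be performed if there is a common attribute (column) between the relations. The name and type of the common attribute must be the same. The natural join matches tuples with equal values in the common attributes and keeps only one copy of each common attribute in the result.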

Example:

Consider the following two tables, C and D:

Num Square
2 4
3 9

Num Cube
2 8
3 27

C⋈D

Num Square Cube


2 4 8
3 9 27
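A sketch of the natural join, assuming each tuple is a Python dictionary from attribute name to value. The common attributes are found from the attribute names, and merging the two dictionaries keeps a single Num column. Following the rule stated above, the hypothetical `natural_join` raises an error when there is no common attribute.

```python
def natural_join(r, s):
    """R ⋈ S on all attributes the two relations have in common."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    if not common:
        raise ValueError("natural join needs at least one common attribute")
    return [
        {**tr, **ts}  # the common attributes are equal, so they appear once
        for tr in r
        for ts in s
        if all(tr[a] == ts[a] for a in common)
    ]


C = [{"Num": 2, "Square": 4}, {"Num": 3, "Square": 9}]
D = [{"Num": 2, "Cube": 8}, {"Num": 3, "Cube": 27}]
print(natural_join(C, D))
# [{'Num': 2, 'Square': 4, 'Cube': 8}, {'Num': 3, 'Square': 9, 'Cube': 27}]
```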

OUTER JOIN

The outer join operation is an extension of the join operation. It is used to deal with missing
information.

In an outer join, along with tuples that satisfy the matching criteria, we also include some or
all tuples that do not match the criteria.

Example:

EMPLOYEE

EMP_NAME STREET CITY


Ram Civil line Mumbai
Shyam Park street Kolkata
Ravi M.G. Street Delhi
Hari Nehru nagar Hyderabad

FACT_WORKERS

EMP_NAME BRANCH SALARY


Ram Infosys 10000
Shyam Wipro 20000
Kuber HCL 30000
Hari TCS 50000

Input:

(EMPLOYEE ⋈ FACT_WORKERS)

Output:

EMP_NAME STREET CITY BRANCH SALARY


Ram Civil line Mumbai Infosys 10000
Shyam Park street Kolkata Wipro 20000
Hari Nehru nagar Hyderabad TCS 50000

An outer join is basically of three types:

a.) Left outer join
b.) Right outer join
c.) Full outer join

a.) Left outer join ( ⟕ ):

• A left outer join R ⟕ S contains all combinations of tuples in R and S that are equal on their common attribute names.
• In addition, it contains the tuples in R that have no matching tuples in S.
• It is denoted by ⟕.

The left outer join keeps every tuple of the left relation. If no matching tuple is found in the right relation, the attributes of the right relation in the join result are filled with null values.

Example 1:

Consider the following two tables, A and B:

Num Square
2 4
3 9
4 16

Num Cube
2 8
3 27
5 125

A ⟕ B

Num Square Cube


2 4 8
3 9 27
4 16 NULL

Example 2: Using the above EMPLOYEE table and FACT_WORKERS table

Input:

EMPLOYEE ⟕ FACT_WORKERS

Output:

EMP_NAME STREET CITY BRANCH SALARY


Ram Civil line Mumbai Infosys 10000
Shyam Park street Kolkata Wipro 20000
Hari Nehru nagar Hyderabad TCS 50000
Ravi M.G. Street Delhi NULL NULL
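A sketch of the left outer join on the Num tables, assuming dictionaries as tuples and Python's `None` standing in for NULL; `left_outer_join` and its `key` parameter (the common attribute) are hypothetical names.

```python
def left_outer_join(r, s, key):
    """R ⟕ S on the common attribute `key`; None stands for NULL."""
    s_attrs = [a for a in s[0] if a != key]  # assumes s is non-empty
    result = []
    for tr in r:  # every tuple of the LEFT relation is kept
        matches = [ts for ts in s if ts[key] == tr[key]]
        if matches:
            result.extend({**tr, **ts} for ts in matches)
        else:
            # No partner in S: pad the attributes of S with NULL.
            result.append({**tr, **{a: None for a in s_attrs}})
    return result


A = [{"Num": 2, "Square": 4}, {"Num": 3, "Square": 9}, {"Num": 4, "Square": 16}]
B = [{"Num": 2, "Cube": 8}, {"Num": 3, "Cube": 27}, {"Num": 5, "Cube": 125}]
rows = left_outer_join(A, B, "Num")
for row in rows:
    print(row)
# {'Num': 2, 'Square': 4, 'Cube': 8}
# {'Num': 3, 'Square': 9, 'Cube': 27}
# {'Num': 4, 'Square': 16, 'Cube': None}
```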

b.) Right outer join (⟖):

• A right outer join R ⟖ S contains all combinations of tuples in R and S that are equal on their common attribute names.
• In addition, it contains the tuples in S that have no matching tuples in R.
• It is denoted by ⟖.

The right outer join keeps every tuple of the right relation. If no matching tuple is found in the left relation, the attributes of the left relation in the join result are filled with null values.

Example 1:

Consider the following two tables, A and B:

Num Square
2 4
3 9
4 16

Num Cube
2 8
3 27
5 125

A ⟖ B

Num Cube Square


2 8 4
3 27 9
5 125 NULL

Example 2: Using the above EMPLOYEE table and FACT_WORKERS Relation

Input:

EMPLOYEE ⟖ FACT_WORKERS

Output:

EMP_NAME BRANCH SALARY STREET CITY


Ram Infosys 10000 Civil line Mumbai
Shyam Wipro 20000 Park street Kolkata
Hari TCS 50000 Nehru nagar Hyderabad
Kuber HCL 30000 NULL NULL
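The right outer join is the mirror image: iterate over the right relation and pad the left relation's attributes with NULL. A sketch under the same assumptions (dictionaries as tuples, `None` for NULL, hypothetical `right_outer_join`); R ⟖ S yields the same rows as S ⟕ R, and here the right relation's columns come first, as in the Num Cube Square table above.

```python
def right_outer_join(r, s, key):
    """R ⟖ S on the common attribute `key`; None stands for NULL."""
    r_attrs = [a for a in r[0] if a != key]  # assumes r is non-empty
    result = []
    for ts in s:  # every tuple of the RIGHT relation is kept
        matches = [tr for tr in r if tr[key] == ts[key]]
        if matches:
            result.extend({**ts, **tr} for tr in matches)
        else:
            # No partner in R: pad the attributes of R with NULL.
            result.append({**ts, **{a: None for a in r_attrs}})
    return result


A = [{"Num": 2, "Square": 4}, {"Num": 3, "Square": 9}, {"Num": 4, "Square": 16}]
B = [{"Num": 2, "Cube": 8}, {"Num": 3, "Cube": 27}, {"Num": 5, "Cube": 125}]
rows = right_outer_join(A, B, "Num")
for row in rows:
    print(row)
# {'Num': 2, 'Cube': 8, 'Square': 4}
# {'Num': 3, 'Cube': 27, 'Square': 9}
# {'Num': 5, 'Cube': 125, 'Square': None}
```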

c.) Full outer join (⟗):

• A full outer join is like a left or right outer join, except that it contains all rows from both tables.
• In addition to the matching tuples, it contains the tuples in R that have no matching tuples in S and the tuples in S that have no matching tuples in R on their common attribute names.
• It is denoted by ⟗.

In a full outer join, all tuples from both relations are included in the result, irrespective of the matching condition.

Example 1:

Consider the following two tables, A and B:

Num Square
2 4
3 9
4 16

Num Cube
2 8
3 27
5 125

A⟗B

Num Square Cube


2 4 8
3 9 27
4 16 NULL
5 NULL 125

Example 2: Using the above EMPLOYEE table and FACT_WORKERS table

Input:

EMPLOYEE ⟗ FACT_WORKERS

Output:

EMP_NAME STREET CITY BRANCH SALARY


Ram Civil line Mumbai Infosys 10000
Shyam Park street Kolkata Wipro 20000
Hari Nehru nagar Hyderabad TCS 50000
Ravi M.G. Street Delhi NULL NULL
Kuber NULL NULL HCL 30000
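A sketch of the full outer join under the same assumptions (dictionaries as tuples, `None` for NULL, hypothetical `full_outer_join`): it performs a left outer join and then appends the right-hand tuples that never found a partner, padding the left relation's attributes with NULL.

```python
def full_outer_join(r, s, key):
    """R ⟗ S on the common attribute `key`; None stands for NULL."""
    r_attrs = [a for a in r[0] if a != key]  # assumes r and s are non-empty
    s_attrs = [a for a in s[0] if a != key]
    result, matched_s = [], set()
    for tr in r:
        found = False
        for i, ts in enumerate(s):
            if ts[key] == tr[key]:
                result.append({**tr, **ts})
                matched_s.add(i)
                found = True
        if not found:  # R tuple without a partner, as in a left outer join
            result.append({**tr, **{a: None for a in s_attrs}})
    for i, ts in enumerate(s):
        if i not in matched_s:  # S tuple without a partner, as in a right outer join
            result.append({key: ts[key], **{a: None for a in r_attrs}, **ts})
    return result


A = [{"Num": 2, "Square": 4}, {"Num": 3, "Square": 9}, {"Num": 4, "Square": 16}]
B = [{"Num": 2, "Cube": 8}, {"Num": 3, "Cube": 27}, {"Num": 5, "Cube": 125}]
rows = full_outer_join(A, B, "Num")
for row in rows:
    print(row)
# {'Num': 2, 'Square': 4, 'Cube': 8}
# {'Num': 3, 'Square': 9, 'Cube': 27}
# {'Num': 4, 'Square': 16, 'Cube': None}
# {'Num': 5, 'Square': None, 'Cube': 125}
```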

9.) Division Operator (÷):

Division operator A ÷ B can be applied if and only if:

• The attributes of B are a proper subset of the attributes of A.
• The relation returned by the division operator has attributes = (all attributes of A − all attributes of B).
• The relation returned by the division operator contains those tuples from relation A that are associated with every tuple of B.

Consider the relations STUDENT_SPORTS and ALL_SPORTS given in Table 2 and Table 3 below.

STUDENT_SPORTS

ROLL_NO SPORTS
1 Badminton
2 Cricket
2 Badminton
4 Badminton

Table 2

ALL_SPORTS

SPORTS
Badminton
Cricket

Table 3

To apply the division operator as

STUDENT_SPORTS ÷ ALL_SPORTS

• The operation is valid because the attributes of ALL_SPORTS are a proper subset of the attributes of STUDENT_SPORTS.
• The resulting relation has attributes {ROLL_NO, SPORTS} − {SPORTS} = {ROLL_NO}.
• The resulting relation contains those ROLL_NO values that are associated with every tuple of ALL_SPORTS {Badminton, Cricket}. ROLL_NO 1 and 4 are associated with Badminton only, while ROLL_NO 2 is associated with both. So, the resulting relation is:

ROLL_NO
2
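A sketch of division on Python sets, assuming STUDENT_SPORTS is a set of (ROLL_NO, SPORTS) pairs and ALL_SPORTS a set of sport names. `divide` follows the definition directly; `divide_with_basic_operations` (also a hypothetical name) uses the standard rewriting of division with projection, product and difference: π ROLL_NO (R) - π ROLL_NO ((π ROLL_NO (R) X S) - R).

```python
STUDENT_SPORTS = {(1, "Badminton"), (2, "Cricket"), (2, "Badminton"), (4, "Badminton")}
ALL_SPORTS = {"Badminton", "Cricket"}


def divide(r, s):
    """R(ROLL_NO, SPORTS) ÷ S(SPORTS): ROLL_NOs paired with every SPORTS value in S."""
    roll_nos = {roll for roll, _ in r}
    return {roll for roll in roll_nos if all((roll, sport) in r for sport in s)}


def divide_with_basic_operations(r, s):
    """The same result using only projection, Cartesian product and difference."""
    x = {roll for roll, _ in r}                                # π ROLL_NO (R)
    all_pairs = {(roll, sport) for roll in x for sport in s}  # π ROLL_NO (R) X S
    missing = all_pairs - r                                    # pairs that R lacks
    return x - {roll for roll, _ in missing}                   # drop ROLL_NOs with a missing pair


print(divide(STUDENT_SPORTS, ALL_SPORTS))                        # {2}
print(divide_with_basic_operations(STUDENT_SPORTS, ALL_SPORTS))  # {2}
```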

Summary

Operation (Symbol): Purpose

Select (σ): Selects the subset of tuples that satisfy a given selection condition.
Projection (π): Eliminates all attributes of the input relation except those mentioned in the projection list.
Union (∪): Includes all tuples that are in A, in B, or in both.
Set Difference (-): The result of A - B is a relation that includes all tuples that are in A but not in B.
Intersection (∩): Defines a relation consisting of all tuples that are in both A and B.
Cartesian Product (X): Combines every tuple of one relation with every tuple of the other, merging the columns of the two relations.
Inner Join: Includes only those tuples that satisfy the matching criteria.
Theta Join (θ): The general case of the JOIN operation; it can use any comparison condition.
EQUI Join: A theta join that uses only the equality condition.
Natural Join (⋈): Can only be performed if there is a common attribute (column) between the relations.
Outer Join: Includes the tuples that satisfy the matching criteria, together with some or all of the tuples that do not.
Left Outer Join (⟕): Keeps all tuples of the left relation.
Right Outer Join (⟖): Keeps all tuples of the right relation.
Full Outer Join (⟗): Includes all tuples from both relations in the result, irrespective of the matching condition.

Relational Calculus
Relational calculus is an alternative to relational algebra. In contrast to relational algebra, which is procedural, relational calculus is non-procedural or declarative: it tells the system what data is to be retrieved but not how to retrieve it.

It allows the user to describe the set of answers without specifying the procedure by which they should be computed. Relational calculus has had a big influence on the design of commercial query languages such as SQL and QBE (Query-By-Example).

Types of Relational Calculus


There are two types of relational calculus:

1.) Tuple Relational Calculus (TRC)
2.) Domain Relational Calculus (DRC)

Variables in TRC take tuples (rows) as values; TRC had a strong influence on SQL.

Variables in DRC range over field (attribute) values; DRC had a strong influence on QBE.

1. Tuple Relational Calculus (TRC)

Tuple relational calculus is a non-procedural query language that specifies which tuples of a relation to select. It can select tuples whose attribute values fall in a range, tuples with particular attribute values, and so on. The resulting relation can have zero or more tuples.

Notation:

It is denoted as below:

{t | P (t)} or {t | Condition (t)}. This is also known as an expression of relational calculus.

Where

t is the tuple variable; the tuples for which the condition holds form the result

P(t) is the condition used to fetch t

{t | EMPLOYEE (t) AND t.SALARY > 10000} selects the tuples from the EMPLOYEE relation whose salary is greater than 10000. It is an example of selecting a range of values.

{t | EMPLOYEE (t) AND t.DEPT_ID = 10} selects all the tuples of employees who work for Department 10.

The variable used in such an expression is called a tuple variable. In the examples above, t is the tuple variable, and t.SALARY and t.DEPT_ID refer to attributes of the tuple t. A tuple variable is called a bound variable if it is quantified by 'For All' (∀) or 'There Exists' (∃); a tuple variable that is not quantified is called a free variable. In both examples above, t is not quantified, so it is a free variable: it is the variable whose values are returned as the result.

For example, the following expression selects the employees who work in the same department as Alex:

{t | EMPLOYEE (t) AND ∃s (EMPLOYEE (s) AND s.EMP_NAME = 'Alex' AND s.DEPT_ID = t.DEPT_ID)}

Here s is a bound variable, because it is quantified by ∃, and t is a free variable. Renaming a bound variable (for example, replacing s by u everywhere inside the ∃ part) does not change the meaning of the query. A free variable, in contrast, is the one whose values the query returns, so the free variables determine what the result contains.

All the conditions used in a tuple expression are built as well-formed formulas (WFF). The conditions are combined using the logical operators AND, OR and NOT, and the quantifiers 'For All' (∀) and 'There Exists' (∃). A WFF in which all tuple variables are bound is called a closed WFF; an open WFF has at least one free variable.
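TRC expressions map naturally onto Python comprehensions: the free variable t becomes the loop variable, ∃ becomes `any(...)` and ∀ becomes `all(...)` over a relation. A sketch with hypothetical EMPLOYEE tuples (the text gives no data for this relation); the last query corresponds to {t | EMPLOYEE(t) AND ∀s (NOT EMPLOYEE(s) OR s.DEPT_ID ≠ 10 OR t.SALARY ≥ s.SALARY)}.

```python
# Hypothetical EMPLOYEE tuples; the text gives no data for this relation.
EMPLOYEE = [
    {"EMP_ID": 1, "EMP_NAME": "Alex", "SALARY": 12000, "DEPT_ID": 10},
    {"EMP_ID": 2, "EMP_NAME": "Ram", "SALARY": 9000, "DEPT_ID": 10},
    {"EMP_ID": 3, "EMP_NAME": "Hari", "SALARY": 15000, "DEPT_ID": 20},
]

# {t | EMPLOYEE(t) AND t.SALARY > 10000}: t is the free tuple variable.
high_paid = [t for t in EMPLOYEE if t["SALARY"] > 10000]

# {t | EMPLOYEE(t) AND t.DEPT_ID = 10}
dept_10 = [t for t in EMPLOYEE if t["DEPT_ID"] == 10]

# {t | EMPLOYEE(t) AND ∃s (EMPLOYEE(s) AND s.EMP_NAME = 'Alex' AND s.DEPT_ID = t.DEPT_ID)}
# s is bound by ∃, which becomes any(...) over the EMPLOYEE relation.
with_alex = [
    t for t in EMPLOYEE
    if any(s["EMP_NAME"] == "Alex" and s["DEPT_ID"] == t["DEPT_ID"] for s in EMPLOYEE)
]

# ∀ becomes all(...): employees paid at least as much as everyone in department 10.
top_vs_dept_10 = [
    t for t in EMPLOYEE
    if all(t["SALARY"] >= s["SALARY"] for s in EMPLOYEE if s["DEPT_ID"] == 10)
]

print([t["EMP_ID"] for t in high_paid])       # [1, 3]
print([t["EMP_ID"] for t in dept_10])         # [1, 2]
print([t["EMP_ID"] for t in with_alex])       # [1, 2]
print([t["EMP_ID"] for t in top_vs_dept_10])  # [1, 3]
```

Because each quantifier ranges over a finite, stored relation, every query here is safe; an unrestricted formula such as NOT EMPLOYEE(t) could not be evaluated this way.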

2.) Domain Relational Calculus

In contrast to tuple relational calculus, domain relational calculus uses variables that range over the values of individual attributes (domain variables), and lists the attributes to be selected from the relation based on the condition. It is similar to TRC, but it selects attribute values rather than whole tuples.

Notation:

It is denoted as below:

{< a1, a2, a3, … an > | P(a1, a2, a3, … an)}

Where
a1, a2, a3, … an are attributes of the relation and
P is the condition.

For example:

Select EMP_ID and EMP_NAME of employees who work for department 10:

{<EMP_ID, EMP_NAME> | <EMP_ID, EMP_NAME> ∈ EMPLOYEE ∧ DEPT_ID = 10}

Get the name of the department that Alex works for:

{<DEPT_NAME> | <DEPT_NAME> ∈ DEPT ∧ ∃DEPT_ID (<DEPT_ID> ∈ EMPLOYEE ∧ EMP_NAME = 'Alex')}

Here the inner formula ∃DEPT_ID (…) is evaluated to get the department ID of Alex, and that ID is then used to get the department name from the DEPT relation.

Let us consider another example, which selects EMP_ID, EMP_NAME and ADDRESS of the employees in the department where Alex works:

{<EMP_ID, EMP_NAME, ADDRESS, DEPT_ID> | <EMP_ID, EMP_NAME, ADDRESS, DEPT_ID> ∈ EMPLOYEE ∧ ∃DEPT_ID (<DEPT_ID> ∈ EMPLOYEE ∧ EMP_NAME = 'Alex')}

First, the inner formula ∃DEPT_ID (…) is evaluated to get the department ID of Alex, and then all the employees with that department ID are searched.

Other concepts of TRC, such as free variables, bound variables and WFFs, remain the same in DRC. The only difference is that DRC is based on the attributes of the relation.
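In DRC each variable stands for a single attribute value, which in Python corresponds to unpacking a tuple into one variable per attribute. A sketch with hypothetical data for EMPLOYEE(EMP_ID, EMP_NAME, ADDRESS, DEPT_ID) and DEPT(DEPT_ID, DEPT_NAME); the ∃ over DEPT_ID becomes `any(...)`, and the sketch links the DEPT tuple to Alex's EMPLOYEE tuple through an equal DEPT_ID value.

```python
# Hypothetical data: EMPLOYEE(EMP_ID, EMP_NAME, ADDRESS, DEPT_ID), DEPT(DEPT_ID, DEPT_NAME).
EMPLOYEE = [
    (1, "Alex", "Agra", 10),
    (2, "Ram", "Mumbai", 10),
    (3, "Hari", "Hyderabad", 20),
]
DEPT = [(10, "Sales"), (20, "Legal")]

# {<EMP_ID, EMP_NAME> | ... DEPT_ID = 10}: one domain variable per attribute.
q1 = [(emp_id, name) for (emp_id, name, _, dept_id) in EMPLOYEE if dept_id == 10]

# Department name of Alex: ∃ links the DEPT tuple to Alex's EMPLOYEE tuple.
q2 = [
    dept_name
    for (dept_id, dept_name) in DEPT
    if any(name == "Alex" and d == dept_id for (_, name, _, d) in EMPLOYEE)
]

# Employees (with address) in Alex's department.
q3 = [
    (emp_id, name, address, dept_id)
    for (emp_id, name, address, dept_id) in EMPLOYEE
    if any(n == "Alex" and d == dept_id for (_, n, _, d) in EMPLOYEE)
]

print(q1)  # [(1, 'Alex'), (2, 'Ram')]
print(q2)  # ['Sales']
print(q3)  # [(1, 'Alex', 'Agra', 10), (2, 'Ram', 'Mumbai', 10)]
```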

Difference between Tuple Relational Calculus (TRC) and Domain Relational Calculus (DRC)

TRC: The variables represent tuples from a specified relation.
DRC: The variables represent values drawn from a specified domain.

TRC: A tuple is a single element of a relation; in database terms, it is a row.
DRC: A domain is equivalent to a column data type and any constraints on the values of the data.

TRC: Filtering uses the tuples of the relation.
DRC: Filtering is done based on the domains of the attributes.

Notation
TRC: {t | P (t)} or {t | Condition (t)}
DRC: {<a1, a2, a3, …, an> | P (a1, a2, a3, …, an)}

Example
TRC: {t | EMPLOYEE (t) AND t.DEPT_ID = 10} selects all the tuples of employees who work for Department 10.
DRC: {<EMP_ID, EMP_NAME> | <EMP_ID, EMP_NAME> ∈ EMPLOYEE ∧ DEPT_ID = 10} selects EMP_ID and EMP_NAME of employees who work for department 10.

Difference Between Relational Algebra and Relational Calculus

Relational Algebra and Relational Calculus are the formal query languages of the relational model. Both form the basis of SQL, the query language used in most relational DBMSs. Relational Algebra is a procedural language.

Relational Calculus, on the other hand, is a declarative language. Relational Algebra and Relational Calculus can be further differentiated on several aspects, as summarized in the comparison chart below.

Comparison Chart

Basic
Relational Algebra: a procedural language.
Relational Calculus: a non-procedural language; it is a declarative language.

States
Relational Algebra: states how to obtain the result.
Relational Calculus: states what result we have to obtain.

Order
Relational Algebra: describes the order (sequence) in which operations have to be performed in the query.
Relational Calculus: does not specify the order (sequence) in which operations are performed in the query.

Domain
Relational Algebra: is not domain dependent.
Relational Calculus: can be domain dependent. An unsafe expression such as {t | NOT EMPLOYEE (t)} depends on the domain from which t takes its values, so relational calculus is restricted to safe expressions.

Related
Relational Algebra: a query language closely related to programming languages.
Relational Calculus: closely related to natural language.

Difference Between E-R Model and Relational Model in DBMS
The E-R Model and the Relational Model are both types of data model. A data model describes a way to design a database at the physical, logical and view levels. The main difference between the E-R Model and the Relational Model is that the E-R Model is entity specific, while the Relational Model is table specific.

Let us discuss some differences between the E-R Model and the Relational Model with the help of the comparison chart shown below.

Comparison Chart

Basic
E-R Model: It represents the collection of objects called entities and the relationships between those entities.
Relational Model: It represents the collection of tables and the relations between those tables.

Describe
E-R Model: It describes data as entity sets, relationship sets and attributes.
Relational Model: It describes data in a table as domains, attributes and tuples.

Relationship
E-R Model: It is easier to understand the relationships between entities.
Relational Model: Comparatively, it is less easy to derive a relation between tables.

Mapping
E-R Model: It describes mapping cardinalities.
Relational Model: It does not describe mapping cardinalities.

Codd's Rule for Relational DBMS
A database that has tables and constraints is not necessarily a relational database system: a database that simply uses the relational data model is not automatically a relational database management system (RDBMS). There are certain rules a database must follow to be a perfect RDBMS. These rules were developed by Dr Edgar F. Codd (E. F. Codd) in 1985 to define a perfect RDBMS. For an RDBMS to be a perfect RDBMS, it has to follow his rules, but in practice no RDBMS obeys all of them.

Dr E. F. Codd, also known to the world as the ‘Father of Database Management Systems’, was a computer scientist who invented the relational model for database management. The relational database was created on the basis of the relational model. Codd developed 13 rules, popularly known as Codd's 12 rules (they are numbered from zero to twelve), for a database to be an RDBMS, or to test a DBMS's concepts against his relational model. According to him, all these rules help to build a perfect RDBMS and hence keep the data and the relations among the objects in the database correct, and a DBMS is fully relational only if it abides by all twelve rules. Codd's rules thus define what qualities a DBMS requires in order to be a Relational Database Management System (RDBMS). Until now, however, no database follows all these rules; each obeys them only to some extent. For example, Oracle is said to follow only 8.5 of Codd’s rules. His twelve rules are fondly called ‘E. F. Codd’s Twelve Commandments’. His brilliant and seminal research paper ‘A Relational Model of Data for Large Shared Data Banks’ is a visual treat to read in its entirety.

Relational Database Management System

There is an unspoken rule in the jargon of Database Management Systems. As databases that implement all of E. F. Codd’s rules are scarce, this unspoken rule has been gaining traction.

• If a management system or software follows any 5-6 of the rules proposed by E. F. Codd, it qualifies as a Database Management System (DBMS).
• If a management system or software follows any 7-9 of the rules proposed by E. F. Codd, it qualifies as a semi-Relational Database Management System (semi-RDBMS).
• If a management system or software follows 9-12 of the rules proposed by E. F. Codd, it qualifies as a complete Relational Database Management System (RDBMS).

Dr Edgar F Codd’s Twelve Commandments

Let us see E.F Codd’s Twelve rules one by one:

Codd’s Rule 0 − Foundation rule

This is the Foundational Rule. It states that any database system must have the characteristics of a relational system, of a database and of a management system to be an RDBMS.
That means a database should be relational, with relations/mappings among the tables in the database. The tables have to be related to one another by means of constraints/relations. There should not be any independent tables hanging in the database.

An RDBMS is a database, i.e., it stores the data in a well-organized form called tables. It should be able to handle large amounts of information too. In short, it should meet the objectives of a database.

An RDBMS is a management system, which means it should be able to manage the data, relations, retrieval, updates, deletes and permissions on the objects. It should be able to handle all these administrative tasks without affecting the objectives of the database, and it should perform all these tasks by using query languages.

Codd’s Rule 1 − Information Rule

A database consists of a lot of data: user data, and data about that data, or metadata. Each group of this data must be stored in a table in the form of rows and columns, with the values in the cells of the table. The order of rows and columns in the table should not affect the meaning of the table. Each cell should contain a single value; there should not be any group or range of values separated by a comma, space or hyphen (normalized data). This should be the only way to store data in a database. This rule is satisfied by all relational databases.

For Example:

The order in which the personal details of ‘James’ and ‘Antony’ are stored in the PERSON table should not make any difference; it should be possible to store their rows in any order. Similarly, storing a person's name first and then his address should be the same as storing the address first and then the name. It does not make any difference to the meaning of the table.

Codd’s Rule 2 − Guaranteed Access Rule

This rule refers to the primary key. It states that any data/column/attribute in a table should be logically accessible by using the table in which it is stored, the primary key value of the row, and the column that we want to access. When the combination of these three is used, it should give the correct result. No column/cell value should be accessed directly without specifying the table and the primary key.

Each unique piece of data (atomic value) should be accessible by: Table Name + Primary Key (Row) + Attribute (Column).
NOTE: The ability to access data directly via a POINTER is a violation of this rule.

For example, the address of the student Kathy Troy should be accessed as STUDENT + STUDENT_ID (of Kathy Troy) + ADDRESS; this is the right way of getting any cell value.

Codd’s Rule 3 − Systematic Treatment of NULL

This rule is about handling NULLs in the database. As a database holds various types of data, each cell may have a different data type. If a cell value is unknown, not applicable or missing, it cannot be represented as zero or as an empty value; it must always be represented as NULL. NULL should behave the same way irrespective of the data type of the cell, and when it is used in a logical or arithmetic operation, it should produce the correct result.

For Example:

Adding NULL to the number 5 should result in NULL:

5 + unknown = unknown, so 5 + NULL = NULL

5 + NULL ≠ 5 and 5 + NULL ≠ 0

It should not result in zero or any other numeric value. The DBMS should be strong enough to handle these NULLs according to the situation and the data types.
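The behaviour described by this rule can be observed with SQLite through Python's built-in `sqlite3` module (an assumed, readily available engine; other DBMSs follow the same SQL three-valued logic). Python's `None` represents SQL NULL in the results, and the `sql` helper is a hypothetical convenience for evaluating fixed expressions only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def sql(expression):
    """Evaluate one fixed SQL expression in SQLite; None represents SQL NULL."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]


print(sql("5 + NULL"))      # None: 5 + unknown = unknown
print(sql("5 + 0"))         # 5
print(sql("NULL = NULL"))   # None: comparing two unknowns gives unknown, not true
print(sql("NULL IS NULL"))  # 1: IS NULL is the way to test for NULL
print(sql("NULL AND 0"))    # 0: false AND anything is false
print(sql("NULL OR 1"))     # 1: true OR anything is true
```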

Codd’s Rule 4 − Active Online Catalog

This rule is about the data dictionary. Metadata should be maintained for all the data in the database, and this metadata should also be stored as tables, rows and columns, with access privileges. In short, the metadata stored in the data dictionary should also obey all the characteristics of a database, and it should always be correct and up to date. We should be able to access this metadata by using the same query language that we use to access the database.

SELECT * FROM ALL_TABLES; -- In Oracle, ALL_TABLES is the data dictionary view that describes the tables the user owns or has access to. It is queried with the same SQL that we use for any other table in the database.
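The same idea can be tried in SQLite, whose catalog is the `sqlite_master` table; this is SQLite's equivalent, not Oracle's ALL_TABLES, and the STUDENT table is a hypothetical example. The point is that the metadata is read with an ordinary SELECT, exactly like user data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (STUDENT_ID INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT)")

# The catalog is itself a table, read with an ordinary SELECT.
rows = conn.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'").fetchall()
for name, ddl in rows:
    print(name)  # STUDENT
    print(ddl)   # the CREATE TABLE statement that defined the table
```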

Codd’s Rule 5 − Comprehensive Data Sub-Language Rule

An RDBMS database should not be accessed directly; it should always be accessed by using a strong query language. This query language should be able to access the data, manipulate the data and maintain the consistency and integrity of the database. The query language should also make sure that a transaction is either fully completed or not done at all.

For Example:

SQL (Structured Query Language) supports creating tables / views / constraints / indexes, accessing the records of tables / views (SELECT), manipulating records with INSERT / DELETE / UPDATE, providing security by granting different levels of access rights (GRANT and REVOKE), and maintaining integrity and consistency by using constraints.

A database without a query language is not an RDBMS. The database can be accessed by using the query language directly or by embedding it in an application.

Codd’s Rule 6 − View Updating Rule

Views are virtual tables created by queries to show a partial view of a table; that is, a view is a subset of a table, a partial table with only some rows and columns. This rule states that views should also be updatable in the same way as their tables.

For Example:

Suppose we have created a view on the EMPLOYEE table that contains the details of the employees who work for a particular department, say ‘Testing’. Here EMPLOYEE is the whole table and EMPLOYEE_TEST is the view with the Testing employees. According to this rule, we should be able to update the records in EMPLOYEE_TEST.

In real database systems, however, this is not always possible. The basic intention of creating a view is to give a group of data to the user in the form of a table. When lengthy queries have to be written to get some details from the database, a view shortens the query and makes it more meaningful. In such cases (for example, a view defined with joins or aggregates), updating the view is not feasible. Although updating a view would update the table used to create it, most databases restrict updates through views, so this rule is not fully followed by most databases.

Codd’s Rule 7 − High-level insert, update, and delete

This rule states that every query language used by the database should support INSERT,
DELETE and UPDATE on the records. It should also support set operations like UNION,
UNION ALL, MINUS, INTERSECT and INTERSECT ALL. All these operations should not
be restricted to single table or row at a time. It should be able to handle multiple tables and
rows in its operation.

For Example:

Suppose employees get a 5% annual hike. Their salaries then have to be updated to reflect the
new amount. Since the hike applies to all employees, the salaries should not be updated one
by one for thousands of employees. A single query should be able to update every employee's
salary at once.
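The annual hike can be sketched as one set-at-a-time UPDATE. This sketch again assumes SQLite through Python's `sqlite3` module, with hypothetical table columns and salaries. The single statement changes every row, and no application-side loop is involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE ("
    "emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL NOT NULL)"
)
# Hypothetical sample employees; emp_id is assigned automatically (1, 2, 3).
conn.executemany(
    "INSERT INTO EMPLOYEE (name, salary) VALUES (?, ?)",
    [("Asha", 40000), ("Ravi", 55000), ("Meena", 42000)],
)

# One statement applies the 5% hike to every row in the table;
# ROUND keeps the new salaries to two decimal places.
cursor = conn.execute("UPDATE EMPLOYEE SET salary = ROUND(salary * 1.05, 2)")
conn.commit()

rows_updated = cursor.rowcount  # number of rows changed by the single UPDATE
salaries = [
    row[0] for row in conn.execute("SELECT salary FROM EMPLOYEE ORDER BY emp_id")
]
print(rows_updated, salaries)
```

The same set-at-a-time idea applies to the set operations named in the rule. SQLite supports UNION, UNION ALL, INTERSECT and EXCEPT (its spelling of MINUS), but it does not support INTERSECT ALL.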

Codd’s Rule 8 − Physical Data Independence

Any change in the physical storage of the data should not affect the logical or external view
of the data.

For Example:

If the data stored on one disk is moved to another disk, the user viewing the data should not
notice any difference or delay in access time, and should be able to access the data exactly as
before. Similarly, if the name of the file that stores a table is changed, it should not affect the
table or the user viewing the table. This is known as physical data independence, and the
database should support this feature.
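Adding an index is one kind of physical change: it alters how rows can be located in storage, but it leaves the logical table unchanged. The sketch below assumes SQLite via Python's `sqlite3` and reuses the Student data from the start of this unit. The same query returns the same answer before and after the index exists, and its text never changes. The index name `idx_student_age` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (Stu_Id INTEGER PRIMARY KEY, Stu_Name TEXT, Stu_Age INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(111, "Ashish", 23), (123, "Saurav", 22), (169, "Lester", 24), (234, "Lou", 26)],
)

# The user's query; it is written once and never edited.
QUERY = "SELECT Stu_Name FROM STUDENT WHERE Stu_Age > 22 ORDER BY Stu_Name"

before = conn.execute(QUERY).fetchall()

# Physical change: add an index on Stu_Age. The storage structures change,
# but the table's logical definition and the query stay exactly the same.
conn.execute("CREATE INDEX idx_student_age ON STUDENT (Stu_Age)")
conn.commit()

after = conn.execute(QUERY).fetchall()
print(before == after, after)
```

Whether the query planner actually uses the new index is an internal decision of the database. The user sees only the result, which is the point of this rule.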

Codd’s Rule 9 − Logical Data Independence

This is similar to physical data independence: if there are any changes to the logical view,
they should not be reflected in the user view.

For Example:

If we split the EMPLOYEE table by department into multiple employee tables, the user viewing
the employee table should not notice that the records come from different tables. The split
tables should be combinable to show the same result as before. In our example, we can combine
them with UNION and display the results to the user.
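The split described above can be sketched with a view that keeps the old table name and recombines the pieces with UNION. This assumes SQLite through Python's `sqlite3`. The per-department table names `EMPLOYEE_TESTING` and `EMPLOYEE_DEV` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- EMPLOYEE has been split by department into two base tables (hypothetical names).
CREATE TABLE EMPLOYEE_TESTING (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
CREATE TABLE EMPLOYEE_DEV     (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
INSERT INTO EMPLOYEE_TESTING VALUES (1, 'Asha', 'Testing'), (3, 'Meena', 'Testing');
INSERT INTO EMPLOYEE_DEV     VALUES (2, 'Ravi', 'Development');

-- A view with the old table name recombines the pieces with UNION,
-- so users keep querying EMPLOYEE exactly as before the split.
CREATE VIEW EMPLOYEE AS
    SELECT emp_id, name, dept FROM EMPLOYEE_TESTING
    UNION
    SELECT emp_id, name, dept FROM EMPLOYEE_DEV;
""")

# The user's query is unchanged by the logical restructuring.
rows = conn.execute(
    "SELECT emp_id, name, dept FROM EMPLOYEE ORDER BY emp_id"
).fetchall()
for row in rows:
    print(row)
```

Because the split tables hold disjoint rows, UNION ALL would return the same rows without the duplicate-elimination step. UNION is used here only to match the text.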

In practice, however, this is difficult to achieve, because the logical view and the user view are
tied so closely that they are almost the same.

Codd’s Rule 10 − Integrity Independence

The database should be able to apply integrity rules by using its own query language. It should
not depend on any external factor or application to maintain integrity; the keys and constraints
in the database should be strong enough to enforce it. A good RDBMS should therefore be
independent of the front-end application. It should at least support primary key and foreign key
integrity constraints.

For Example:

Suppose we want to insert an employee for department 50 using an application, but department
50 does not exist in the system. In such a case, the application should not have to check whether
department 50 exists, insert the department if it does not, and then insert the employee.
All of this should be handled by the database.
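The department-50 case can be sketched with a foreign key that the database enforces by itself. This assumes SQLite via Python's `sqlite3`, with hypothetical DEPARTMENT and EMPLOYEE columns. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection. The application contains no code that looks up the department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE DEPARTMENT (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE EMPLOYEE (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES DEPARTMENT (dept_id)
);
INSERT INTO DEPARTMENT VALUES (10, 'Testing');
""")


def insert_employee(emp_id, name, dept_id):
    """Insert an employee; the database itself checks that the department exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO EMPLOYEE VALUES (?, ?, ?)", (emp_id, name, dept_id)
            )
        return "inserted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


ok = insert_employee(1, "Asha", 10)   # department 10 exists
bad = insert_employee(2, "Ravi", 50)  # department 50 does not exist
count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
print(ok)
print(bad)
print(count)
```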

Codd’s Rule 11 − Distribution Independence

The database can be located on the user's server or anywhere else on the network. The end user
should not need to know which database servers hold the data, and should be able to retrieve
records as if they were stored locally. Even if the database is spread across different servers,
access time should remain comparatively low.

Codd’s Rule 12 − Non-Subversion Rule

When a query is issued to the database, it is converted into a low-level language that the
underlying system can understand in order to retrieve the data. When records are accessed or
manipulated at this low level, there should be no loopholes that alter the integrity of the
database. In other words, the low-level operations must do exactly what the written query
specifies. They must not bypass the integrity rules, change the data integrity, or perform any
unwanted actions in the database.

For Example:

A query that updates a student's address should always be converted into low-level operations
that update only that address record in the student file. It should not update any other record in
the file or insert a malicious record into the file or memory.

Difference between DBMS and RDBMS

Although DBMS and RDBMS are both used to store information in a physical database, there
are some notable differences between them.

The main differences between DBMS and RDBMS are given below:

Parameter | DBMS | RDBMS

Storage | DBMS stores data as files. | RDBMS stores data in the form of tables.

Database structure | DBMS stores data in either a navigational or a hierarchical form. | RDBMS uses a tabular structure in which the headers are the column names and the rows contain the corresponding values.

Number of users | DBMS supports a single user only. | RDBMS supports multiple users.

ACID | In a regular database, data may not be stored following the ACID model, which can lead to inconsistencies in the database. | Relational databases are harder to construct, but they are consistent and well structured. They obey ACID (Atomicity, Consistency, Isolation, Durability).

Type of program | DBMS is a program for managing databases on computer networks and system hard disks. | RDBMS is a database system used to maintain the relationships among tables.

Hardware and software needs | Low hardware and software needs. | Higher hardware and software needs.

Integrity constraints | DBMS does not support integrity constraints; they are not imposed at the file level. | RDBMS supports integrity constraints at the schema level. Values outside a defined range cannot be stored in the corresponding RDBMS column.

Normalization | DBMS does not support normalization. | RDBMS can be normalized.

Distributed databases | DBMS does not support distributed databases. | RDBMS supports distributed databases.

Ideally suited for | DBMS mainly deals with small quantities of data. | RDBMS is designed to handle large amounts of data.

Dr. E. F. Codd's rules | DBMS satisfies fewer than 7 of Dr. E. F. Codd's rules. | RDBMS satisfies 8 to 10 of Dr. E. F. Codd's rules.

Client-server | DBMS does not support client-server architecture. | RDBMS supports client-server architecture.

Data fetching | Data fetching is slower for complex and large amounts of data. | Data fetching is faster because of the relational approach.

Data redundancy | Data redundancy is common in this model. | Keys and indexes help prevent data redundancy.

Data relationship | There is no relationship between data. | Data is stored in tables that are related to each other through foreign keys.

Security | There is no security. | Multiple levels of security; log files are created at the OS, command, and object levels.

Data access | Data elements must be accessed individually. | Data can be accessed easily using SQL queries, and multiple data elements can be accessed at the same time.

Examples | File systems, XML, the Windows Registry, etc. | MySQL, Oracle, SQL Server, etc.
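The "Integrity constraints" and "Data redundancy" rows can be illustrated with schema-level constraints. This sketch assumes SQLite via Python's `sqlite3`, and the 17 to 60 age range is a hypothetical rule chosen only for illustration. A CHECK constraint rejects a value outside the defined range, and the primary key rejects a duplicate Stu_Id. The application does not validate either case itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The 17-60 age range is a hypothetical rule chosen for illustration.
conn.execute("""
CREATE TABLE STUDENT (
    Stu_Id   INTEGER PRIMARY KEY,
    Stu_Name TEXT NOT NULL,
    Stu_Age  INTEGER CHECK (Stu_Age BETWEEN 17 AND 60)
)
""")


def add_student(stu_id, name, age):
    """Try to insert a row; the schema-level constraints decide whether it is stored."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?)", (stu_id, name, age))
        return "stored"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    add_student(111, "Ashish", 23),   # valid row
    add_student(123, "Saurav", 150),  # age outside the CHECK range
    add_student(111, "Lou", 26),      # duplicate primary key
]
for result in results:
    print(result)
```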

From these differences, we can say that RDBMS is an extension of DBMS. Many software
products on the market today are compatible with both DBMS and RDBMS. In other words, an
RDBMS application is also a DBMS application, and many modern products can be used in
either role.
