M2 DBMS
CHAPTER 1:
RELATIONAL MODEL CONCEPTS
Definition: The relational model means that the logical data structures—the data tables, views, and
indexes—are separate from the physical storage structures. This separation means that database
administrators can manage physical data storage without affecting access to that data as a logical
structure.
• The relational model uses a collection of tables to represent both data and the relationships among
those data.
• Each table has multiple columns, and each column has a unique name.
• Tables are also known as relations. The relational model is an example of a record-based model.
• Record-based models are so named because the database is structured in fixed-format records of
several types.
• Each table contains records of a particular type. Each record type defines a fixed number of fields,
or attributes.
• The columns of the table correspond to the attributes of the record type. The relational data model
is the most widely used data model, and a vast majority of current database systems are based on
the relational model.
Now, let us consider a relation STUDENT with attributes NAME, ADDRESS, ROLL_NO,
PHONE_NO and AGE, shown in this table:
1.3 TERMINOLOGIES:
2. Tuple: Each row in the relation is known as a tuple. The above relation contains 5 tuples, one
of which is shown as:
3. Table: In the relational model, every relation is stored in table format, together with its
attributes.
5. Relation Schema: A relation schema defines the structure of the relation and represents the
name of the relation with its attributes.
Ex: STUDENT (ROLL_NO, NAME, ADDRESS, PHONE_NO, AGE) is the relation schema for
STUDENT.
6. Column: The column represents the set of values for a particular attribute. The
column ROLL_NO is extracted from the relation STUDENT.
8. Relation Instance: The set of tuples of a relation at a particular instance of time is called a
relation instance. Table 1 shows the relation instance of STUDENT at a particular time. It can
change whenever there is an insertion, deletion, or update in the database.
9. Relation Key: These are the attributes, or sets of attributes, that are used to identify rows
uniquely and to relate one table to another. They are of the following types:
Primary key, Candidate key, Foreign key, Super key, Alternate key.
10. NULL VALUES: A value which is not known or is unavailable is called a NULL value. It
is represented by a blank space.
NOTE:
1. PRIMARY KEY
PRIMARY KEY in SQL is a column (or group of columns) that uniquely identifies the records in a
table. A primary key must contain unique values and cannot contain NULL. A primary key
automatically has a UNIQUE constraint defined on it, which ensures that there are no
duplicate values in that column.
• No duplicate values are allowed, i.e. the column assigned as the primary key must hold
unique values only.
• No NULL values are allowed in the primary key column, so every row must have a value in
that column.
• A table has only one primary key, although that primary key may consist of multiple columns.
• No new row can be inserted with a primary key value that already exists.
• Primary keys fall into two categories: a simple primary key consists of one column, and a
composite primary key consists of multiple columns.
• A primary key is defined in a CREATE TABLE or ALTER TABLE statement.
2. CANDIDATE KEY
A candidate key is a minimal set of attributes (or a single attribute) that uniquely identifies the tuples
in a relation or table. A candidate key is a minimal super key: removing any attribute from it destroys
uniqueness. A relation can have more than one candidate key, and exactly one of them is chosen as the
primary key. Unlike the primary key, a candidate key that is not chosen as the primary key may contain
NULL values (in SQL, a UNIQUE column permits them).
3. SUPER KEY
A super key is an attribute (or set of attributes) that uniquely identifies every tuple in a
relation. Not every super key is a candidate key, but every candidate key is a super key. In a relation,
the number of super keys is greater than or equal to the number of candidate keys.
4. FOREIGN KEY
A foreign key is a column or a group of columns in one table whose values refer to the primary key of
a row in another table. The table that contains the foreign key is called the referencing table or child
table, and the table that the foreign key references is known as the referenced table or parent table.
A table can have multiple foreign keys, according to its relationships with other tables.
5. ALTERNATE KEY
Keys are an important part of any relational database, and the alternate key is one of the various
types. The candidate keys that are not chosen as the primary key are known as alternate keys. They are
secondary candidate keys that can still uniquely identify a row in a table, so alternate keys are also
sometimes known as “secondary keys”.
6. COMPOSITE KEY
A composite key is a combination of two or more columns in a table that together uniquely identify
each row. When the columns are combined, uniqueness of a row is guaranteed, but a column taken
individually does not guarantee uniqueness. It can also be understood as a primary key made by
combining two or more attributes to uniquely identify every row in a table.
Note:
A combination of more than one candidate key is also unique for every row, but it is a super key
rather than a minimal key.
A composite key that serves as the primary key cannot contain NULL in any of its columns.
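The difference between super keys and candidate keys can be checked mechanically on a small relation instance. The sketch below is only an illustration: the STUDENT rows are invented, and the helper names `is_super_key` and `candidate_keys` are hypothetical. Note that a key is a property of the schema (it must hold in every legal state), so a single instance can only rule a key out, never prove it.

```python
from itertools import combinations

# Hypothetical STUDENT instance; the rows are invented for illustration.
STUDENT = [
    {"ROLL_NO": 1, "NAME": "Ram", "PHONE_NO": "9455123451", "AGE": 18},
    {"ROLL_NO": 2, "NAME": "Ramesh", "PHONE_NO": "9652431543", "AGE": 18},
    {"ROLL_NO": 3, "NAME": "Sujit", "PHONE_NO": "9156253131", "AGE": 20},
    {"ROLL_NO": 4, "NAME": "Ram", "PHONE_NO": "9156768971", "AGE": 18},
]

def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Minimal super keys of this instance, smallest attribute sets first."""
    attributes = sorted(rows[0])
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # A set that contains an already-found key is not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_super_key(rows, attrs):
                keys.append(attrs)
    return keys

print(is_super_key(STUDENT, ("ROLL_NO", "NAME")))  # a super key, but not minimal
print(candidate_keys(STUDENT))
```

In this instance (ROLL_NO, NAME) is a super key, but it is not a candidate key because ROLL_NO alone is already unique. Whichever candidate key is not picked as the primary key becomes an alternate key.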
• Relation schema: A relation schema R, denoted by R(A1, A2, ..., An), is made up of a relation name
R and a list of attributes A1, A2, ..., An.
• Each attribute Ai is the name of a role played by some domain D in the relation schema R.
• D is called the domain of Ai and is denoted by dom(Ai). A relation schema is used to describe a
relation; R is called the name of this relation.
• The degree (or arity) of a relation is the number of attributes n of its relation schema. A relation of
degree seven, which stores information about university students, would contain seven attributes
describing each student, as follows:
STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, Gpa)
• Using the data type of each attribute, the definition is sometimes written as:
STUDENT(Name: string, Ssn: string, Home_phone: string, Address: string, Office_phone: string,
Age: integer, Gpa: real)
• Domains for some of the attributes of the STUDENT relation: dom(Name) = Names; dom(Ssn) =
Social_security_numbers; dom(Home_phone) = USA_phone_numbers; dom(Office_phone) =
USA_phone_numbers.
1. Domain Constraints
• Every domain must contain atomic values (smallest indivisible units), which means composite and
multi-valued attributes are not allowed.
• We perform a datatype check here: when we assign a data type to a column, we limit the
values that it can contain. E.g. if we assign the datatype of attribute age as int, we can’t give it values
other than int datatype.
Explanation: In the above relation, Name is a composite attribute and Phone is a multi-valued attribute,
so it is violating the domain constraint.
Explanation: In the above table, EID is the primary key, and the first and the last tuple have the same
value in EID, i.e. 01, so it is violating the key constraint.
3. Entity Integrity Constraints:
Entity Integrity constraints say that no primary key can take a NULL value, since using the primary
key we identify each tuple uniquely in a relation.
Explanation: In the above relation, EID is made the primary key, and the primary key can’t take NULL
values but in the third tuple, the primary key is null, so it is violating Entity Integrity constraints.
Explanation: In the above tables, the DNO of Table 1 is the foreign key, and DNO in Table 2
is the primary key. DNO = 22 in the foreign key of Table 1 is not allowed because DNO =
22 is not available in table 2.
1. Simplified Model: In comparison to other database models, the relational database model is
simple, and data manipulation requires only straightforward SQL queries.
2. User-Friendliness: Users can quickly retrieve essential data without dealing with database
complexities as SQL queries are easy to execute.
3. Data Accuracy: Relational databases benefit from well-defined, organized structures, solving
Navyashree KS , CSE(DS), Asst Prof RNSIT,Bangalore
Module 2 DBMS(BCS403)
6. Normalization: As data grows more complex, efficient storage becomes necessary.
Normalization is a technique that breaks information into manageable parts, thereby
reducing storage overhead. It organizes the data in successive levels (normal forms), and each level
must be satisfied before moving to the next normalization level. Database normalization
also ensures the structural consistency of a relational database, thereby enabling accurate data
manipulation and maintenance of data integrity, which is critical for informed business decisions.
7. Security: The RDBMS imposes tight access controls, allowing only authenticated users to have
direct contact with the database. Unauthorized access attempts are denied, thus ensuring data
security and confidentiality.
DISADVANTAGES:
Relational databases manage structured data effectively and make it easy to establish relationships
between different datasets, which makes them indispensable in various domains. Despite these benefits,
relational databases also present certain drawbacks.
1. Maintenance Challenges: Over time, managing relational databases is hard because of the
increasing data volumes which necessitate the high involvement of developers and programmers to
take care of the system.
2. Cost Implications: Setting up and maintaining relational database systems is rather expensive.
The initial costs for acquiring the software can be quite high for smaller enterprises and even further
increased due to the need to hire specialized technicians who are knowledgeable in the specific
database software.
3. Physical Storage Requirements: Relational databases, organized around rows and columns,
require a huge amount of physical memory which grows with the data size. The increasing demand
for physical storage presents scalability issues.
4. Limited Scalability: The structural complexities of scaling relational databases across multiple
servers are predominantly linked to large data volumes. Scalability constraints prevent data
distribution across heterogeneous physical storage servers, thus leading to performance
degradation, such as latency and availability issues.
5. Structural Complexity: The table-like nature of relational databases limits their ability to
represent intricate object relationships, thus hindering the representation of complex data
relationships critical for different application scenarios.
6. Performance Degradation over Time: Relational databases can experience performance
degradation because queries depend on multiple tables. As the data volume and the number of
tables increase, response times may slow, causing query latency and possible system failures
under heavy user loads.
A relational database schema S is a set of relation schemas S = {R1, R2, ..., Rm} and a set of integrity
constraints IC. A relational database state DB of S is a set of relation states DB = {r1, r2, ..., rm} such
that each ri is a state of Ri and such that the ri relation states satisfy the integrity constraints specified
in IC.
Figure: a relational database schema that we call
COMPANY = {EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT, WORKS_ON,
DEPENDENT}. In each relation schema, the underlined attribute represents the primary key.
Referential integrity constraint: The referential integrity constraint is specified between two relations
and is used to maintain the consistency among tuples in the two relations. Informally, the referential
integrity constraint states that a tuple in one relation that refers to another relation must refer to an
existing tuple in that relation. For example, in the COMPANY database, the attribute Dno of EMPLOYEE
gives the department number for which each employee works; hence, its value in every EMPLOYEE
tuple must match the Dnumber value of some tuple in the DEPARTMENT relation.
To define referential integrity more formally, first we define the concept of a foreign key.
The conditions for a foreign key, given below, specify a referential integrity constraint between the two
relation schemas R1 and R2.
A set of attributes FK in relation schema R1 is a foreign key of R1 that references relation R2 if it
satisfies the following rules:
1. Attributes in FK have the same domain(s) as the primary key attributes PK of R2; the attributes FK
are said to reference or refer to the relation R2.
2. A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK for some tuple
t2 in the current state r2(R2) or is NULL. In the former case, we have t1[FK] = t2[PK], and we say that
the tuple t1 references or refers to the tuple t2. In this definition, R1 is called the referencing relation
and R2 is the referenced relation. If these two conditions hold, a referential integrity constraint from R1
to R2 is said to hold.
In the above image, the arrow indicates that the values of Dno in the employee table reference
the values in the department table. For better understanding, we will insert a new row into the
employee table as shown below.
Is the new row insertion consistent? No, the Dno 2 does not exist, but we are trying to insert an employee
working for department 2. By establishing a relationship between the tables, we can maintain consistency
and integrity among the tables of the database. The referential integrity constraint is established by
using the FOREIGN KEY keyword.
We have created the employee and department tables, but we didn’t implement the relationship. We
will create the employee and department tables again with the referential integrity constraint
using foreign keys. First, we will implement the department table.
The command for the creation of the department table is as shown below.
CREATE TABLE department (Dname varchar(15) NOT NULL, Dnumber int NOT NULL PRIMARY
KEY, Locations varchar(20));
Next, we will implement an employee table with the foreign key constraints, as shown below.
CREATE TABLE employee ( Fname varchar (15) NOT NULL, Minit varchar(1), Lname varchar(15)
NOT NULL, SSN varchar(9) NOT NULL, Bdate date, Address varchar(30), Sex varchar(1), Salary
float, Dno int NOT NULL, PRIMARY KEY (ssn), FOREIGN KEY (Dno) REFERENCES
department(Dnumber));
In the above command, the FOREIGN KEY is a keyword in the employee table. Dno is a foreign key
that references the Dnumber of the department table.
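The two CREATE TABLE commands above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not part of the original notes: the choice of SQLite, the sample rows and the helper `try_insert` are assumptions made for illustration. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""CREATE TABLE department (
    Dname varchar(15) NOT NULL,
    Dnumber int NOT NULL PRIMARY KEY,
    Locations varchar(20))""")
conn.execute("""CREATE TABLE employee (
    Fname varchar(15) NOT NULL, Minit varchar(1), Lname varchar(15) NOT NULL,
    SSN varchar(9) NOT NULL, Bdate date, Address varchar(30), Sex varchar(1),
    Salary float, Dno int NOT NULL,
    PRIMARY KEY (SSN),
    FOREIGN KEY (Dno) REFERENCES department(Dnumber))""")

# Hypothetical data: only department 1 exists.
conn.execute("INSERT INTO department VALUES ('Research', 1, 'Bangalore')")

def try_insert(ssn, dno):
    """Return True if the employee row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO employee (Fname, Lname, SSN, Dno) VALUES ('New', 'Hire', ?, ?)",
            (ssn, dno),
        )
        return True
    except sqlite3.IntegrityError:
        return False

accepted_existing_dept = try_insert("222222222", 1)  # department 1 exists
accepted_missing_dept = try_insert("333333333", 2)   # department 2 does not exist
print(accepted_existing_dept, accepted_missing_dept)
```

The second insert corresponds to the inconsistent row discussed earlier: the DBMS itself rejects an employee whose Dno has no matching Dnumber.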
triggers and assertions can be used. In SQL, CREATE ASSERTION and CREATE TRIGGER
statements can be used for this purpose.
Functional dependency constraint: A functional dependency constraint establishes a functional
relationship between two sets of attributes X and Y. This constraint specifies that the value of X
determines a unique value of Y in all states of a relation; it is denoted as a functional dependency X → Y.
We use functional dependencies and other types of dependencies as tools to analyze the quality of
relational designs and to improve them.
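A functional dependency X → Y can be tested against one relation state by checking that tuples with equal X-values never have different Y-values. The sketch below assumes an invented EMPLOYEE state and a hypothetical helper `fd_holds`. Since the constraint must hold in all states, such a test can only disprove a dependency, not establish it.

```python
def fd_holds(rows, X, Y):
    """Check X -> Y on one relation state: equal X-values must imply equal Y-values."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        # setdefault stores the first Y-value seen for this X-value.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Hypothetical EMPLOYEE state, invented for illustration.
EMPLOYEE = [
    {"SSN": "111", "Lname": "Rao", "Dno": 5, "Dname": "Research"},
    {"SSN": "222", "Lname": "Iyer", "Dno": 5, "Dname": "Research"},
    {"SSN": "333", "Lname": "Rao", "Dno": 4, "Dname": "Administration"},
]

print(fd_holds(EMPLOYEE, ["SSN"], ["Lname", "Dno"]))  # SSN determines the rest
print(fd_holds(EMPLOYEE, ["Dno"], ["Dname"]))         # Dno determines Dname
print(fd_holds(EMPLOYEE, ["Lname"], ["Dno"]))         # two employees named Rao differ in Dno
```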
State constraints (static constraints): Define the constraints that a valid state of the database must
satisfy.
Transition constraints (dynamic constraints): Defined to deal with state changes in the database.
The Insert operation provides a list of attribute values for a new tuple t that is to be inserted into a
relation R.
Insert can violate any of the four types of constraints:
➢ Domain constraint:
• A domain constraint is violated when a value given for an attribute does not appear in the
corresponding domain or is not of the appropriate datatype.
Example:
• Assume that the domain constraint says that every value inserted for an attribute must be
greater than 10. Inserting a value of 10 or less violates the domain constraint, so the
insertion is rejected.
➢ Entity Integrity constraint:
• Inserting NULL into any part of the primary key of a new tuple violates the entity integrity
constraint.
• Example: Insert (NULL, ‘Bikash’, ‘M’, ‘Jaipur’, ‘123456’) into EMP
• The above insertion violates the entity integrity constraint since the primary key EID is
NULL, which is not allowed, so it gets rejected.
➢ Key Constraints:
Inserting a new tuple whose key value already exists in another tuple of the same relation violates
the key constraint.
Example: Insert (‘1200’, ‘Arjun’, ‘9976657777’, ‘Mumbai’) into EMPLOYEE
This insertion violates the key constraint if EID=1200 is already present in some tuple in the same
relation, so it gets rejected.
➢ Referential integrity:
Inserting a foreign key value in relation 1 for which there is no corresponding primary key value in
the referenced relation 2 violates referential integrity.
Example: When we try to insert a value, say 1200, in EID (foreign key) of table 1 for which there is no
corresponding EID (primary key) in table 2, it causes a violation, so the insertion gets rejected.
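The first three kinds of insert violation can be reproduced with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the `emp` table, its columns and the helper `insert_outcome` are hypothetical, and a CHECK clause stands in for the "greater than 10" domain constraint. The primary key column is declared NOT NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical EMP table; CHECK plays the role of the domain constraint.
conn.execute("""CREATE TABLE emp (
    eid  TEXT NOT NULL PRIMARY KEY,
    name TEXT NOT NULL,
    age  INTEGER CHECK (age > 10))""")
conn.execute("INSERT INTO emp VALUES ('1200', 'Arjun', 30)")

def insert_outcome(eid, name, age):
    """Try one INSERT and report 'accepted' or the error message of the DBMS."""
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?, ?)", (eid, name, age))
        return "accepted"
    except sqlite3.IntegrityError as err:
        return str(err)

domain_violation = insert_outcome("1300", "Bikash", 7)  # age is outside the domain
entity_violation = insert_outcome(None, "Bikash", 25)   # NULL primary key
key_violation = insert_outcome("1200", "Kiran", 41)     # duplicate key value
valid_insert = insert_outcome("1400", "Meera", 28)

for outcome in (domain_violation, entity_violation, key_violation, valid_insert):
    print(outcome)
```

Each rejected insert leaves the table unchanged, so the relation state stays valid after every attempt.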
2. The Delete Operation
Deleting tuples from a relation can violate only referential integrity constraints.
➢ Referential Integrity Constraints
A violation occurs only if the tuple being deleted from relation 1 is referenced by foreign keys in
tuples of table 2. After such a deletion, the foreign key values in table 2 would refer to a tuple that
no longer exists, which violates the referential integrity constraint.
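The effect of a delete on referential integrity can be seen in a short sqlite3 session. The table contents and the helper `delete_department` below are assumptions for illustration. With no delete option specified, SQLite rejects the deletion of a referenced row once foreign key checking is enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE department (Dnumber INTEGER PRIMARY KEY, Dname TEXT NOT NULL)")
conn.execute("""CREATE TABLE employee (
    SSN TEXT NOT NULL PRIMARY KEY,
    Dno INTEGER NOT NULL REFERENCES department(Dnumber))""")
# Hypothetical rows, invented for illustration.
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(1, "Headquarters"), (5, "Research")])
conn.execute("INSERT INTO employee VALUES ('123456789', 5)")

def delete_department(dnumber):
    """Return True if the DELETE is accepted, False if it is rejected."""
    try:
        conn.execute("DELETE FROM department WHERE Dnumber = ?", (dnumber,))
        return True
    except sqlite3.IntegrityError:
        return False

deleted_unreferenced = delete_department(1)  # no employee refers to department 1
deleted_referenced = delete_department(5)    # employee 123456789 still refers to it
print(deleted_unreferenced, deleted_referenced)
```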
UPDATE may violate domain constraint and NOT NULL constraint on an attribute being modified.
Any of the other constraints may also be violated, depending on the attribute being updated:
➢ Updating the primary key (PK): It is similar to a DELETE followed by an INSERT, so options
similar to those for DELETE need to be specified.
➢ Updating a foreign key (FK): May violate referential integrity.
➢ Updating an ordinary attribute (neither PK nor FK): Can only violate domain constraints.
Ex: 1. Update the salary of an EMPLOYEE tuple with SSN = ‘999887777’ to 28000.
Result: Acceptable operation.
2. Update the Dno of the EMPLOYEE tuple with SSN = ‘999887777’ to 7.
Result: Acceptable operation, provided a DEPARTMENT tuple with Dnumber = 7 exists; otherwise
it violates referential integrity.
3. Update the SSN of the EMPLOYEE tuple with SSN = ‘999887777’ to ‘987654321’.
Result: Violates the primary key constraint if another tuple already has that SSN value, and violates
referential integrity if other tuples reference the old SSN value.
TRANSACTION CONCEPT:
• A database application program running against a relational database typically executes one or
more transactions.
• A transaction is an executing program that includes some database operations, such as reading
from the database, or applying insertions, deletions, or updates to the database.
• At the end of the transaction, it must leave the database in a valid or consistent state that satisfies
all the constraints specified on the database schema.
• A single transaction may involve any number of retrieval operations and any number of update
operations.
• EXAMPLE: A transaction to apply a bank withdrawal will typically read the user account record,
check if there is a sufficient balance, and then update the record by the withdrawal amount.
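The bank withdrawal example can be sketched as a transaction using Python's built-in sqlite3 module. The `account` table, the account number and the function `withdraw` are hypothetical names chosen for illustration. The `with conn:` block commits if every step succeeds and rolls back if any step raises an error, so the database is left in a consistent state either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical ACCOUNT table holding a single account.
conn.execute("""CREATE TABLE account (
    acc_no  TEXT NOT NULL PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
conn.commit()

def balance(acc_no):
    """Read the current balance of one account."""
    return conn.execute(
        "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)
    ).fetchone()[0]

def withdraw(acc_no, amount):
    """Run one withdrawal as a transaction; return True if it committed."""
    try:
        with conn:  # commit on success, roll back if the block raises
            row = conn.execute(
                "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)
            ).fetchone()
            if row is None or row[0] < amount:
                raise ValueError("unknown account or insufficient balance")
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, acc_no),
            )
        return True
    except ValueError:
        return False

first = withdraw("A-101", 200)   # enough money: the transaction commits
second = withdraw("A-101", 400)  # only 300 left: the transaction is rolled back
print(first, second, balance("A-101"))
```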
CHAPTER 2:
RELATIONAL ALGEBRA
2.1 Introduction
Relational algebra is the basic set of operations for the relational model. These operations enable a user
to specify basic retrieval requests as relational algebra expressions. The result of an operation is a new
relation, which may have been formed from one or more input relations. The relational algebra is very
important for several reasons. First, it provides a formal foundation for relational model operations.
Second, and perhaps more important, it is used as a basis for implementing and optimizing queries in
the query processing and optimization modules that are integral parts of relational database management
systems (RDBMSs). Third, some of its concepts are incorporated into the SQL standard query language
for RDBMSs.
Relational algebra is a procedural query language that provides the base of relational
databases and SQL. Relational algebra has various operators like select, project, rename, cartesian
product, union, intersection, set difference, joins, etc. These operators are used to form queries in
relational algebra.
The SELECT operation can also be visualized as a horizontal partition of the relation into two sets of
tuples: those tuples that satisfy the condition and are selected, and those tuples that do not satisfy the
condition and are discarded.
In general, the select operation is denoted by
σ<selection condition>(R)
where,
- the symbol σ (sigma) is used to denote the select operator
- the selection condition is a Boolean (conditional) expression specified on the attributes of
relation R
- tuples that make the condition true are selected and appear in the result of the operation
- tuples that make the condition false are filtered out and discarded from the result of the operation
We can also combine multiple conditions, as required in the problem, using the operators ∧ (AND)
and ∨ (OR). The selection operator with multiple conditions is written as
σ<condition 1> ∧ <condition 2>(R)
To select all the tuples of a relation we write the selection operation without any condition.
The Boolean expression specified in <selection condition> is made up of a number of clauses of the
form: <attribute name> <comparison op> <constant value>
or
<attribute name> <comparison op> <attribute name>
where
<attribute name> is the name of an attribute of R,
<constant value> is a constant value from the attribute domain,
<comparison op> is one of the operators {=, <, <=, >, >=, !=}.
Clauses can be connected by the standard Boolean operators and, or, and not to form a general
selection condition.
3. Select the tuples for all employees who either work in department 4 and make over $25,000
per year, or work in department 5 and make over $30,000:
σ(Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000)(EMPLOYEE)
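The query above can be tried on a small in-memory relation. In this sketch a relation is a list of Python dictionaries and the selection condition is an ordinary Boolean function; the function name `select` and the EMPLOYEE tuples are assumptions made for illustration.

```python
def select(relation, condition):
    """SELECT: keep exactly the tuples for which condition evaluates to true."""
    return [t for t in relation if condition(t)]

# Hypothetical EMPLOYEE tuples, invented for illustration.
EMPLOYEE = [
    {"Ssn": "111", "Dno": 4, "Salary": 43000},
    {"Ssn": "222", "Dno": 4, "Salary": 25000},
    {"Ssn": "333", "Dno": 5, "Salary": 38000},
    {"Ssn": "444", "Dno": 5, "Salary": 30000},
    {"Ssn": "555", "Dno": 1, "Salary": 55000},
]

result = select(
    EMPLOYEE,
    lambda t: (t["Dno"] == 4 and t["Salary"] > 25000)
    or (t["Dno"] == 5 and t["Salary"] > 30000),
)
print([t["Ssn"] for t in result])
```

The result has the same attributes as EMPLOYEE and never more tuples than EMPLOYEE, which matches the view of SELECT as a horizontal partition.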
1. For example, if we are interested in books published after 1960, we can write the selection
operation to retrieve just those books as:
• The operator is written with the Boolean condition as a subscript, and then the operand (the
input relation) is given in parentheses.
• Note that the Boolean condition refers to an attribute of the books relation, comparing it to a
constant value. The result of this operation is a relation with the same schema as books, but
with no name:
More complex Boolean expressions can be constructed from simple expressions using AND, OR,
and NOT. For instance, if we are interested in books published after 1960 as well as books by the author
with author_id equal to 6, we could write:
2. For example, we could find the books published after 1950, and then select from that result the
books with author_id equal to 6:
If the attribute list includes only nonkey attributes of R, duplicate tuples are likely to occur. The result
of the PROJECT operation is a set of distinct tuples, and hence a valid relation. This is known as
duplicate elimination. For example, consider the following PROJECT operation:
A given combination of values appears only once in the resulting relation even though this combination
of values appears twice in the EMPLOYEE relation.
The number of tuples in a relation resulting from a PROJECT operation is always less than or equal
to the number of tuples in R. Commutativity does not hold on PROJECT. However,
π<list1>(π<list2>(R)) = π<list1>(R)
as long as <list2> contains the attributes in <list1>; otherwise, the left-hand side is an incorrect
expression. In SQL, the PROJECT attribute list is specified in the SELECT clause of a query. For
example, the following operation:
For example, to retrieve the first name, last name, and salary of all employees who work in
department number 5, we must apply a SELECT and a PROJECT operation. We can write a single
relational algebra expression, also known as an in-line expression, as follows:
πFname, Lname, Salary(σDno=5(EMPLOYEE))
If no renaming is applied, the names of the attributes in the resulting relation of a SELECT operation
are the same as those in the original relation and in the same order. For a PROJECT operation with no
renaming, the resulting relation has the same attribute names as those in the projection list and in the
same order in which they appear in the list.
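Duplicate elimination, attribute order and the combination of SELECT with PROJECT can all be observed in a short sketch. The functions `select` and `project` and the EMPLOYEE tuples below are hypothetical, written only to illustrate the behaviour described above.

```python
def select(relation, condition):
    """SELECT: keep the tuples for which condition is true."""
    return [t for t in relation if condition(t)]

def project(relation, attrs):
    """PROJECT: keep only attrs, in list order, and drop duplicate tuples."""
    result = []
    for t in relation:
        reduced = {a: t[a] for a in attrs}
        if reduced not in result:  # duplicate elimination
            result.append(reduced)
    return result

# Hypothetical EMPLOYEE tuples, invented for illustration.
EMPLOYEE = [
    {"Fname": "Asha", "Lname": "Rao", "Sex": "F", "Salary": 25000, "Dno": 5},
    {"Fname": "Vikram", "Lname": "Iyer", "Sex": "M", "Salary": 40000, "Dno": 5},
    {"Fname": "Meera", "Lname": "Nair", "Sex": "F", "Salary": 25000, "Dno": 4},
    {"Fname": "Rahul", "Lname": "Shah", "Sex": "M", "Salary": 30000, "Dno": 5},
]

# Only nonkey attributes: the combination ('F', 25000) occurs twice but is kept once.
sex_salary = project(EMPLOYEE, ["Sex", "Salary"])

# In-line expression: PROJECT applied to the result of SELECT.
dept5 = project(select(EMPLOYEE, lambda t: t["Dno"] == 5), ["Fname", "Lname", "Salary"])

print(sex_salary)
print(dept5)
```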
2. Retrieve the SSN of all employees who either work in department 5 or directly supervise an employee
who works in department 5.
4. Retrieve the SSN of all the employees who work on all of the project numbers 1, 2 and 3.
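Both queries map onto set operations: the first is a UNION of two intermediate results, and the second is a DIVISION of the (SSN, project number) pairs by the set {1, 2, 3}. The sketch below uses Python sets; the data and the helper name `divide` are assumptions made for illustration.

```python
# Hypothetical data, invented for illustration.
EMPLOYEE = [
    {"Ssn": "111", "Super_ssn": "333", "Dno": 5},
    {"Ssn": "222", "Super_ssn": "333", "Dno": 5},
    {"Ssn": "333", "Super_ssn": "444", "Dno": 5},
    {"Ssn": "444", "Super_ssn": None, "Dno": 1},
    {"Ssn": "555", "Super_ssn": "444", "Dno": 4},
]
# WORKS_ON as (employee SSN, project number) pairs.
WORKS_ON = {
    ("111", 1), ("111", 2), ("111", 3),
    ("222", 1), ("222", 2),
    ("333", 3),
    ("555", 1), ("555", 2), ("555", 3), ("555", 10),
}

# Query 2: UNION of "works in department 5" and "supervises someone in department 5".
dep5_emps = [e for e in EMPLOYEE if e["Dno"] == 5]
result1 = {e["Ssn"] for e in dep5_emps}
result2 = {e["Super_ssn"] for e in dep5_emps if e["Super_ssn"] is not None}
query2 = result1 | result2

def divide(pairs, divisor):
    """DIVISION: the x values that are paired with every y in divisor."""
    xs = {x for x, _ in pairs}
    return {x for x in xs if all((x, y) in pairs for y in divisor)}

# Query 4: employees who work on all of the projects 1, 2 and 3.
query4 = divide(WORKS_ON, {1, 2, 3})

print(sorted(query2))
print(sorted(query4))
```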
A Join operation combines related tuples from different relations if and only if a given join
condition is satisfied. It is denoted by ⋈.
Syntax: R3 ← R1 ⋈<join_condition> R2
• Inner Join returns only those combinations of rows from both tables that satisfy the given
condition. It is the most widely used join operation and can be considered the default join type.
• An equijoin is an inner join that uses only equality comparisons in the join predicate.
If other comparison operators such as “>” are used, it can’t be
called an equijoin.
i) Theta Join
• Theta Join allows you to merge two tables based on the condition represented by theta.
Theta joins work for all comparison operators. It is denoted by symbol θ. The general case
of JOIN operation is called a Theta join.
3. OUTER JOIN: The outer join operation is an extension of the join operation. It is used to
deal with missing information.
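The join variants described above can be compared side by side in a small sketch. The functions `theta_join` and `left_outer_join` and the sample tuples are hypothetical. The sketch also assumes that the two relations have no attribute names in common, since the merged dictionary would otherwise overwrite one of the values. A left outer join is used here as one form of outer join.

```python
def theta_join(r, s, condition):
    """THETA JOIN: combine every pair of tuples that satisfies condition."""
    return [{**t, **u} for t in r for u in s if condition(t, u)]

def left_outer_join(r, s, condition):
    """Keep every tuple of r; pad with None where no tuple of s matches."""
    s_attrs = list(s[0]) if s else []
    result = []
    for t in r:
        matches = [{**t, **u} for u in s if condition(t, u)]
        padded = {**t, **{a: None for a in s_attrs}}
        result.extend(matches or [padded])
    return result

# Hypothetical tuples; the last employee has no department (missing information).
EMPLOYEE = [
    {"Lname": "Rao", "Dno": 5},
    {"Lname": "Iyer", "Dno": 4},
    {"Lname": "Khan", "Dno": None},
]
DEPARTMENT = [
    {"Dnumber": 4, "Dname": "Administration"},
    {"Dnumber": 5, "Dname": "Research"},
]

# Equijoin: the theta condition uses only equality.
equi = theta_join(EMPLOYEE, DEPARTMENT, lambda e, d: e["Dno"] == d["Dnumber"])

# A theta join with ">" is a valid join, but it is not an equijoin.
greater = theta_join(
    EMPLOYEE, DEPARTMENT,
    lambda e, d: e["Dno"] is not None and e["Dno"] > d["Dnumber"],
)

outer = left_outer_join(EMPLOYEE, DEPARTMENT, lambda e, d: e["Dno"] == d["Dnumber"])

print(equi)
print(greater)
print(outer)
```

The inner joins drop the employee without a department, while the outer join keeps that tuple and fills the department attributes with None, the counterpart of NULL.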
The aggregate function operation is denoted by
<grouping attributes> ℑ <function list> (R)
where <grouping attributes> is a list of attributes of R and <function list> is a list of pairs of
the form (<function>, <attribute>), where <function> is one of SUM, AVERAGE, MAX,
MIN, COUNT.
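Grouping and aggregation can be sketched in a few lines. The function `aggregate`, the naming of result attributes as FUNCTION_attribute and the EMPLOYEE tuples are assumptions made for this illustration; NULL handling is left out.

```python
from collections import defaultdict

# The five functions named in the text, applied to a list of values.
FUNCTIONS = {
    "SUM": sum,
    "AVERAGE": lambda values: sum(values) / len(values),
    "MAX": max,
    "MIN": min,
    "COUNT": len,
}

def aggregate(relation, grouping_attrs, function_list):
    """Group tuples by grouping_attrs and apply each (function, attribute) pair per group."""
    groups = defaultdict(list)
    for t in relation:
        groups[tuple(t[a] for a in grouping_attrs)].append(t)
    result = []
    for key, tuples in groups.items():
        row = dict(zip(grouping_attrs, key))
        for func, attr in function_list:
            row[f"{func}_{attr}"] = FUNCTIONS[func]([t[attr] for t in tuples])
        result.append(row)
    return result

# Hypothetical EMPLOYEE tuples, invented for illustration.
EMPLOYEE = [
    {"Ssn": "111", "Dno": 5, "Salary": 30000},
    {"Ssn": "222", "Dno": 5, "Salary": 40000},
    {"Ssn": "333", "Dno": 4, "Salary": 25000},
]

per_dept = aggregate(EMPLOYEE, ["Dno"], [("COUNT", "Ssn"), ("AVERAGE", "Salary")])
overall = aggregate(EMPLOYEE, [], [("MAX", "Salary")])  # no grouping: one group
print(per_dept)
print(overall)
```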
CHAPTER 3
ER-to-Relational Mapping Algorithm
Steps:
1. Mapping of Regular Entity Types.
2. Mapping of Weak Entity Types.
3. Mapping of Binary 1:1 Relationship Types.
4. Mapping of Binary 1:N Relationship Types.
5. Mapping of Binary M:N Relationship Types.
6. Mapping of Multivalued Attributes.
7. Mapping of N-ary Relationship Types.
Step 1: Mapping of Regular Entity Types.
For each regular (strong) entity type E in the ER schema, create a relation R that includes all
the simple attributes of E.
Include only the simple component attributes of a composite attribute. Choose one of the key
attributes of E as the primary key for R.
In our example, we create the relations EMPLOYEE, DEPARTMENT, and PROJECT to
correspond to the regular entity types EMPLOYEE, DEPARTMENT, and PROJECT from the
figure. The foreign key and relationship attributes, if any, are not included yet; they will be
added during subsequent steps.
These include the attributes Super_ssn and Dno of EMPLOYEE, Mgr_ssn and Mgr_start_date
of DEPARTMENT, and Dnum of PROJECT. In our example, we choose Ssn, Dnumber, and
Pnumber as primary keys for the relations EMPLOYEE, DEPARTMENT, and PROJECT,
respectively.
The knowledge that Dname of DEPARTMENT and Pname of PROJECT are unique keys is kept
for possible use later in the design.
The relations that are created from the mapping of entity types are sometimes called entity
relations because each tuple represents an entity instance.
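Step 3: Mapping of Binary 1:1 Relationship Types.
For each binary 1:1 relationship type R in the ER schema, identify the relations S and T that
correspond to the entity types participating in R. There are three possible approaches: (1) the
foreign key approach, (2) the merged relationship approach, and (3) the cross-reference
or relationship relation approach.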
1. Choose one of the relations—S, say—and include as a foreign key in S the primary key of T.
It is better to choose an entity type with total participation in R in the role of S. In our example,
we map the 1:1 relationship type MANAGES from Figure 9.1 by choosing the participating
entity type DEPARTMENT to serve in the role of S because its participation in the MANAGES
relationship type is total (every department has a manager).
2. An alternative mapping of a 1:1 relationship type is to merge the two entity types and the
relationship into a single relation. This is possible when both participations are total, as this
would indicate that the two tables will have the exact same number of tuples at all times.
3. The third option is to set up a third relation R for the purpose of cross-referencing the primary
keys of the two relations S and T representing the entity types.
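The foreign key approach for MANAGES can be sketched with Python's built-in sqlite3 module. The column types, the sample rows and the helper `add_department` are assumptions. The NOT NULL on Mgr_ssn expresses the total participation of DEPARTMENT, and the UNIQUE constraint is an addition of this sketch to keep the relationship 1:1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT NOT NULL PRIMARY KEY, Lname TEXT NOT NULL)")
# DEPARTMENT plays the role of S: it carries the primary key of EMPLOYEE (T) as Mgr_ssn.
conn.execute("""CREATE TABLE DEPARTMENT (
    Dnumber INTEGER PRIMARY KEY,
    Dname TEXT NOT NULL,
    Mgr_ssn TEXT NOT NULL UNIQUE REFERENCES EMPLOYEE(Ssn),
    Mgr_start_date TEXT)""")
# Hypothetical employees, invented for illustration.
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [("111", "Rao"), ("222", "Iyer")])

def add_department(dnumber, dname, mgr_ssn):
    """Return True if the department row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO DEPARTMENT (Dnumber, Dname, Mgr_ssn) VALUES (?, ?, ?)",
            (dnumber, dname, mgr_ssn),
        )
        return True
    except sqlite3.IntegrityError:
        return False

ok = add_department(5, "Research", "111")
no_manager = add_department(4, "Administration", None)  # total participation violated
second_dept = add_department(1, "Headquarters", "111")  # 111 already manages a department
unknown_mgr = add_department(6, "Sales", "999")         # no such employee
print(ok, no_manager, second_dept, unknown_mgr)
```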
Step 7: Mapping of N-ary Relationship Types. We use the relationship relation option.
• For each n-ary relationship type R, where n > 2, create a new relationship relation S to
represent R. Include as foreign key attributes in S the primary keys of the relations that
represent the participating entity types.
• Also include any simple attributes of the n-ary relationship type (or simple components of
composite attributes) as attributes of S.
• The primary key of S is usually a combination of all the foreign keys that reference the
relations representing the participating entity types.
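As an illustration of Step 7, consider a ternary relationship SUPPLY among SUPPLIER, PART and PROJECT with a simple attribute Quantity. This example schema is an assumption and does not come from the notes above, as is the helper `add_supply`. The relationship relation holds one foreign key per participating relation, and the three foreign keys together form its primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE SUPPLIER (Sname TEXT NOT NULL PRIMARY KEY);
CREATE TABLE PART (Part_no INTEGER PRIMARY KEY);
CREATE TABLE PROJECT (Proj_name TEXT NOT NULL PRIMARY KEY);
-- Relationship relation S for the ternary relationship type SUPPLY.
CREATE TABLE SUPPLY (
    Sname TEXT NOT NULL REFERENCES SUPPLIER(Sname),
    Part_no INTEGER NOT NULL REFERENCES PART(Part_no),
    Proj_name TEXT NOT NULL REFERENCES PROJECT(Proj_name),
    Quantity INTEGER,
    PRIMARY KEY (Sname, Part_no, Proj_name));
INSERT INTO SUPPLIER VALUES ('Acme');
INSERT INTO PART VALUES (10);
INSERT INTO PROJECT VALUES ('ProductX');
""")

def add_supply(sname, part_no, proj_name, quantity):
    """Return True if the SUPPLY tuple is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO SUPPLY VALUES (?, ?, ?, ?)",
                     (sname, part_no, proj_name, quantity))
        return True
    except sqlite3.IntegrityError:
        return False

first = add_supply("Acme", 10, "ProductX", 100)
duplicate = add_supply("Acme", 10, "ProductX", 250)  # same combination of foreign keys
dangling = add_supply("Acme", 99, "ProductX", 5)     # part 99 does not exist
print(first, duplicate, dangling)
```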