unit_2_DBMS
Unit II
Domain: Every attribute has a predefined set of permitted values, known as the
attribute domain.
Attributes: It refers to every column present in a table. The attributes refer to the properties
that help us define a relation. E.g., Employee_ID, Student_Rollno, SECTION, NAME, etc.
Keys: Each row is uniquely identified by a single attribute or a set of attributes, known as
the relation key.
Characteristics of relations:
All of the values present in a column hold the same data type
Relational model in DBMS is an approach to logically represent and manage the data
stored in a database by storing data in tables.
Mr. Vaibhav Jain, Asst. Professor, Computer Science & Engineering. Page 20
DBMS CS 503 AIST, SAGAR
Relations, Attributes and Tuples, Degree and Cardinality, Relational Schema and
Relation instance, and Relation Keys are some important components of the
Relational Model.
To maintain data integrity, constraints such as domain, key, and referential integrity
are implemented in the relational model.
Redundancy in data can lead to insertion, deletion, and update anomalies
in a relational database.
A perfect relational database follows and implements all 13 of Codd's rules (numbered 0 to 12).
Because of the use of tables and constraints, relational models are simple to use, easy
to manage, provide data integrity, and are query capable.
Increasing the amount of data can lead to performance and storage issues with
relational databases.
Keys
Keys are used to uniquely identify any record or row of data in a table. They are also used to
establish and identify relationships between tables.
Keys are of different types, e.g., Super Key, Candidate Key, Primary Key, Foreign Key, etc.
Super Key: A super key is a set of one or more attributes that uniquely identifies rows in a
table. Any set of columns whose combined values can identify every row of the table
uniquely is a super key.
o A super key is a superset of a candidate key (explained below). The Primary
Key of a table is picked from the candidate keys, which are the minimal super keys,
to be made the table’s identity attribute.
Candidate Key: A candidate key is a minimal set of one or more attributes that can
uniquely identify a tuple; no attribute can be removed from it without losing uniqueness.
Primary Key: It is the key chosen to identify one and only one instance of an
entity uniquely. An entity can contain multiple keys, as we saw in the PERSON table.
The key that is most suitable from that list becomes the primary key.
For each entity, the primary key selection is based on requirements and on the developers.
Alternate Key: A candidate key that is not chosen as the primary key is called an
alternate key. In other words, the total number of alternate keys is the total number
of candidate keys minus one (the primary key). Alternate keys may or may not exist: if
there is only one candidate key in a relation, it has no alternate key.
Foreign Key: A foreign key is a column (or set of columns) of one table that points to the
primary key of another table.
Composite Key: Whenever a primary key consists of more than one attribute, it is
known as a composite key. This key is also known as Concatenated Key.
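The key definitions above can be checked mechanically on a small sample. The sketch below is illustrative only: the attribute names and rows are made up, and uniqueness is tested on this sample rather than derived from a schema, so it demonstrates the definitions and is not a real key-discovery method.

```python
from itertools import combinations

# Hypothetical sample relation; attribute names and rows are assumptions.
ATTRIBUTES = ("emp_id", "email", "name")
ROWS = [
    {"emp_id": 1, "email": "a@x.org", "name": "Asha"},
    {"emp_id": 2, "email": "b@x.org", "name": "Ravi"},
    {"emp_id": 3, "email": "c@x.org", "name": "Asha"},
]

def is_super_key(attrs, rows):
    """True if no two rows agree on every attribute in attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)

def super_keys(attributes, rows):
    """All attribute sets that identify the sample rows uniquely."""
    return [
        set(combo)
        for size in range(1, len(attributes) + 1)
        for combo in combinations(attributes, size)
        if is_super_key(combo, rows)
    ]

def candidate_keys(attributes, rows):
    """Minimal super keys: no proper subset is itself a super key."""
    keys = super_keys(attributes, rows)
    return [k for k in keys if not any(other < k for other in keys)]

SUPER = super_keys(ATTRIBUTES, ROWS)
CANDIDATES = candidate_keys(ATTRIBUTES, ROWS)
```

In this sample, choosing emp_id as the primary key would leave email as the alternate key, while name alone is not a key because two rows share the same name.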
Integrity constraints:
Integrity constraints are a set of rules used to maintain the quality of information.
Integrity constraints ensure that data insertion, updating, and other processes are
performed in such a way that data integrity is not affected.
Thus, integrity constraints are used to guard against accidental damage to the database.
o There are four types of integrity constraints in DBMS:
Domain Constraint
Entity Constraint
Referential Integrity Constraint
Key Constraint
Domain Constraint: A domain integrity constraint is a set of rules or conditions
that restricts the kind of values a column can hold in a database table. The data
type of a domain can be string, integer, character, currency, etc.
Entity Integrity Constraint: Entity Integrity Constraint is used to ensure that the primary
key cannot be null. A primary key is used to identify individual records in a table and if the
primary key has a null value, then we can't identify those records. There can be null values
anywhere in the table except the primary key column.
Referential Integrity Constraint: A Referential Integrity Constraint ensures that there is
always a valid relationship between two relational database tables. When a foreign key
exists in one table, each of its values must either reference a corresponding value
in the other table or be null.
Key constraint: Keys are sets of attributes that are used to identify an entity within its entity
set uniquely. There could be multiple keys in a single entity set, but out of these multiple
keys, only one key will be the primary key. A primary key can only contain unique and not
null values in the relational database table.
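The four kinds of constraint can be seen in action with a small experiment. This is a minimal sketch using Python's built-in sqlite3 module; the department and employee tables, their columns, and the age range are assumptions made up for illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

def violates(conn, sql):
    """Run a statement and report whether it raised an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("""
    CREATE TABLE department (
        dept_code TEXT NOT NULL PRIMARY KEY,
        name      TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        age       INTEGER CHECK (age BETWEEN 18 AND 65),
        dept_code TEXT REFERENCES department(dept_code)
    )""")
conn.execute("INSERT INTO department VALUES ('D10', 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 30, 'D10')")

# Entity integrity: the primary key may not be NULL.
entity_violation = violates(conn, "INSERT INTO department VALUES (NULL, 'HR')")
# Domain constraint: age must fall inside the permitted range.
domain_violation = violates(conn, "INSERT INTO employee VALUES (2, 12, 'D10')")
# Key constraint: primary key values must be unique.
key_violation = violates(conn, "INSERT INTO employee VALUES (1, 40, 'D10')")
# Referential integrity: 'D99' does not exist in department.
referential_violation = violates(conn, "INSERT INTO employee VALUES (3, 40, 'D99')")
# A NULL foreign key is allowed.
null_reference_allowed = not violates(conn, "INSERT INTO employee VALUES (4, 40, NULL)")
```

Each rejected statement leaves the tables unchanged, which is the sense in which constraints guard against accidental damage.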
Intension: The intension of a given relation is independent of time. It is the permanent part
of the relation. It corresponds to what is specified in the relational schema. The intension thus
defines all permissible extensions. The intension is a combination of two things: a structure
and a set of integrity constraints.
Extension: The extension of a given relation is the set of tuples appearing in that relation at
any given instant. The extension thus varies with time. It changes as tuples are created,
destroyed, and updated.
Relational algebra is used to break down user requests and instruct the DBMS to execute them.
A relational query language is used by the user to communicate with the database. Such languages
generally operate at a higher level than general-purpose programming languages.
CREATE Command
DROP Command
ALTER Command
TRUNCATE Command
RENAME Command
SELECT Command
INSERT Command
UPDATE Command
DELETE Command
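The commands listed above can be tried end to end in a small script. This is a minimal sketch using Python's built-in sqlite3 module; the student table and its columns are assumptions made up for illustration. SQLite has no TRUNCATE statement, so a DELETE without a WHERE clause stands in for it here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: CREATE, ALTER, RENAME
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
cur.execute("ALTER TABLE student ADD COLUMN section TEXT")
cur.execute("ALTER TABLE student RENAME TO learner")

# DML: INSERT, UPDATE, DELETE
cur.executemany(
    "INSERT INTO learner (roll_no, name, section) VALUES (?, ?, ?)",
    [(1, "Asha", "A"), (2, "Ravi", "B"), (3, "Meena", "A")],
)
cur.execute("UPDATE learner SET section = 'C' WHERE roll_no = 2")
cur.execute("DELETE FROM learner WHERE roll_no = 3")

# Query: SELECT
rows_after_dml = cur.execute(
    "SELECT roll_no, name, section FROM learner ORDER BY roll_no"
).fetchall()

# SQLite has no TRUNCATE; DELETE without WHERE empties the table instead.
cur.execute("DELETE FROM learner")
count_after_truncate = cur.execute("SELECT COUNT(*) FROM learner").fetchone()[0]

# DDL: DROP removes the table definition itself.
cur.execute("DROP TABLE learner")
remaining_tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

The difference between emptying a table and dropping it shows in the last two steps: after the first the table still exists with zero rows, after the second it is gone from the catalog.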
SQL Join statement is used to combine data or rows from two or more tables based on a
common field between them. Different types of Joins are as follows:
o INNER JOIN
Equi join
Natural join
o OUTER JOIN
INNER JOIN: The INNER JOIN keyword selects rows from both tables as long as
the join condition (built with <, >, <=, >=, =, or !=) is satisfied. It creates the result set by
combining each pair of rows from the two tables for which the condition holds, most commonly
that the value of the common field is the same.
SELECT table1.column1, table1.column2, table2.column1, ...
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;
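The INNER JOIN syntax above can be run against two small tables. This is a minimal sketch using Python's built-in sqlite3 module; the student and course tables and their rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (roll_no INTEGER, course_id TEXT);
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meena');
    INSERT INTO course  VALUES (1, 'CS503'), (2, 'CS504'), (4, 'CS505');
""")

# Only roll numbers present in BOTH tables survive the inner join.
inner_rows = conn.execute("""
    SELECT student.roll_no, student.name, course.course_id
    FROM student
    INNER JOIN course
    ON student.roll_no = course.roll_no
    ORDER BY student.roll_no
""").fetchall()
```

Roll number 3 has no course and roll number 4 has no student, so neither appears in the result.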
Theta / conditional join: A Theta Join allows you to merge two tables based on the
condition represented by theta. Theta joins work for all comparison operators. The join is
denoted by the symbol θ. The general case of the JOIN operation is called a Theta join.
Equi join: An EQUI Join is a Theta join that uses only the equality (=)
condition. Joins can be difficult to implement efficiently in an RDBMS,
which is one reason why an RDBMS can have performance problems.
Natural join: A natural join is a type of Equi join that occurs implicitly by
comparing all identically named columns in both tables. The join result has only one column
for each pair of equally named columns.
Outer Join: An Outer Join doesn’t require each record in the two join tables to have a
matching record. In this type of join, the table retains each record even if no other matching
record exists.
Left Outer Join: Left Outer Join returns all the rows from the table on the left even if
no matching rows have been found in the table on the right. When no matching record
is found in the table on the right, NULL is returned.
Right Outer Join: Right Outer Join returns all the rows from the table on the
right even if no matching rows have been found in the table on the left. Where no
matches have been found in the table on the left, NULL is returned. RIGHT OUTER
JOIN is the opposite of LEFT OUTER JOIN.
Full Outer Join: In a Full Outer Join, all tuples from both relations are included in
the result, irrespective of the matching condition.
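The natural and outer joins described above can be compared on the same pair of tables. This is a minimal sketch using Python's built-in sqlite3 module; the tables and rows are assumptions made up for illustration. Because older SQLite versions lack RIGHT and FULL OUTER JOIN, the right outer join is written here as a LEFT OUTER JOIN with the table order swapped, which gives the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (roll_no INTEGER, course_id TEXT);
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meena');
    INSERT INTO course  VALUES (1, 'CS503'), (2, 'CS504'), (4, 'CS505');
""")

# Natural join: matches on the shared column name roll_no, which appears once.
natural_rows = conn.execute(
    "SELECT * FROM student NATURAL JOIN course ORDER BY roll_no"
).fetchall()

# Left outer join: every student is kept; missing course data becomes NULL (None).
left_rows = conn.execute("""
    SELECT student.roll_no, student.name, course.course_id
    FROM student
    LEFT OUTER JOIN course ON student.roll_no = course.roll_no
    ORDER BY student.roll_no
""").fetchall()

# Right outer join of student with course, written with the tables swapped.
right_rows = conn.execute("""
    SELECT course.roll_no, student.name, course.course_id
    FROM course
    LEFT OUTER JOIN student ON student.roll_no = course.roll_no
    ORDER BY course.roll_no
""").fetchall()
```

The unmatched student (roll number 3) shows up only in the left outer join, and the unmatched course row (roll number 4) only in the right outer join.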
Indexing
Indexing is a data structure technique used for quickly retrieving entries from
database files using the attributes that have been indexed. In database systems, indexing is
comparable to the index of a book. The indexed attributes define the index.
Indexing improves database performance by reducing the number of disk
accesses necessary when a query is run. An index is itself a data structure, used to
swiftly locate and access the data in a database table.
Structure of Index:
The search key is the index’s first column, and it contains a copy of
the table’s candidate key or primary key. The key values are saved in sorted
order so that the related data can be accessed quickly.
The data reference is the index’s second column. It contains a group of pointers
that point to the disk block where the value of a specific key can be found.
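The two-column structure described above (search key plus data reference) can be modelled in a few lines. This is a simplified in-memory sketch, not how a real DBMS stores an index: the block numbers, keys, and names are assumptions made up for illustration, and a dictionary stands in for disk blocks.

```python
from bisect import bisect_left

# Hypothetical data file: block number -> records (key, value) stored in that block.
DATA_BLOCKS = {
    0: [(101, "Asha"), (104, "Ravi")],
    1: [(107, "Meena"), (110, "Kiran")],
}

def build_index(blocks):
    """Return (search_key, block_number) pairs sorted by search key."""
    return sorted(
        (key, block_no)
        for block_no, records in blocks.items()
        for key, _ in records
    )

def lookup(index, blocks, key):
    """Binary-search the sorted search keys, then read only the referenced block."""
    keys = [k for k, _ in index]
    pos = bisect_left(keys, key)
    if pos == len(keys) or keys[pos] != key:
        return None  # key is not indexed
    block_no = index[pos][1]  # the data reference
    for record_key, value in blocks[block_no]:
        if record_key == key:
            return value
    return None

INDEX = build_index(DATA_BLOCKS)
```

Because the search keys are kept sorted, the lookup needs a binary search over the index and a single block read, instead of a scan over every block.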
Methods of Indexing:
1. Ordered Indices: To make searching easier and faster, the indices are frequently
arranged/sorted. Ordered indices are indices that have been sorted
2. Primary Index:
Primary indexing refers to the process of creating an index based on the table’s
primary key. These primary keys are specific to each record and establish a 1:1
relationship between them.
The searching operation is fairly efficient because primary keys are stored in
sorted order.
There are two types of primary indexes: dense indexes and sparse indexes.
3. Dense Index: Every search key value in the data file has an index record in the dense
index. It speeds up the search process. The total number of records present in the
index table and the main table are the same in this case. It requires extra space to hold
the index record. A pointer to the actual record on the disk and the search key are both
included in the index records.
4. Sparse Index: Only some of the records in the data file have index records. Each
index record points to a certain block. Rather than pointing to every record in the main table,
the index, in this case, points to records in the main table at intervals, leaving gaps
between the indexed records.
5. Clustering Index:
b. In this situation, we’ll combine two or more columns to obtain a unique value
and build an index on them to make it easier to find the record. This method
is called a clustering index.
c. Records with similar properties are grouped together, and indices for
these groups are constructed.
6. Secondary Index:
a. When using sparse indexing, the size of the mapping grows with the
size of the table. These mappings are frequently stored in primary memory to
speed up address fetching. The actual data is then fetched from secondary memory
using the address obtained through the mapping. Fetching the address
becomes slower as the mapping size increases. The sparse index becomes
inefficient in this scenario, so secondary indexing is used to solve this
problem.
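The difference between the dense and sparse indexes in the list above can be sketched as follows. This is a simplified in-memory model under stated assumptions: the data file is sorted by key, a Python list of lists stands in for disk blocks, and the sparse index keeps one entry per block (the first key of that block). All names and values are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical sorted data file, split into blocks of three records each.
BLOCKS = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

def build_dense_index(blocks):
    """One index entry per record: key -> (block number, slot in block)."""
    return {
        key: (b, slot)
        for b, block in enumerate(blocks)
        for slot, (key, _) in enumerate(block)
    }

def build_sparse_index(blocks):
    """One index entry per block: (first key in block, block number)."""
    return [(block[0][0], b) for b, block in enumerate(blocks)]

def dense_lookup(index, blocks, key):
    location = index.get(key)
    if location is None:
        return None
    b, slot = location
    return blocks[b][slot][1]

def sparse_lookup(index, blocks, key):
    """Find the last index entry with first key <= key, then scan that block."""
    first_keys = [k for k, _ in index]
    pos = bisect_right(first_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    for record_key, value in blocks[index[pos][1]]:
        if record_key == key:
            return value
    return None

DENSE = build_dense_index(BLOCKS)
SPARSE = build_sparse_index(BLOCKS)
```

The dense index holds nine entries for nine records, while the sparse index holds only three; the sparse lookup pays for the smaller index with a short scan inside the block it lands on.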
Triggers: Triggers are SQL statements that are automatically executed when there is a
change in the database. Triggers are executed in response to certain events (INSERT,
UPDATE or DELETE) on a particular table. They help maintain the integrity
of the data by changing the data of the database in a systematic fashion.
Advantages:
Enforcing security approvals on the tables that are present in the database
Normally triggers can be useful for inspecting the data changes in tables
Triggers give an alternative way to run scheduled tasks. Using triggers, we don’t have
to wait for the scheduled events to run because the triggers are invoked automatically
before or after a change is made to the data in a table
Disadvantages:
Triggers can only provide extended validations, i.e., not all kinds of validation. For
simple validations, you can use the NOT NULL, UNIQUE, CHECK and FOREIGN
KEY constraints
BEFORE | AFTER: It specifies when the trigger will be initiated, i.e., before the
triggering event or after it.
INSERT | UPDATE | DELETE: These are the DML operations, and we can use
any of them in a given trigger.
ON [TABLE_NAME]: It specifies the name of the table on which the trigger is going
to be applied.
FOR EACH ROW: A row-level trigger is executed once for each row affected by the
triggering statement.
TRIGGER BODY: It consists of the queries that need to be executed when the trigger is
called.
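The trigger clauses explained above fit together as in the following sketch. It uses Python's built-in sqlite3 module and SQLite's trigger dialect, which may differ in detail from other DBMSs; the account and audit_log tables and the trigger name are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account   (acc_no INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE audit_log (acc_no INTEGER, old_balance INTEGER, new_balance INTEGER);

    -- AFTER | UPDATE | ON account | FOR EACH ROW | trigger body
    CREATE TRIGGER log_balance_change
    AFTER UPDATE ON account
    FOR EACH ROW
    BEGIN
        INSERT INTO audit_log VALUES (OLD.acc_no, OLD.balance, NEW.balance);
    END;

    INSERT INTO account VALUES (1, 500), (2, 900);
    UPDATE account SET balance = balance - 100;
""")

# The single UPDATE touched two rows, so the row-level trigger fired twice.
audit_rows = conn.execute(
    "SELECT acc_no, old_balance, new_balance FROM audit_log ORDER BY acc_no"
).fetchall()
```

No code ever inserts into audit_log directly; the rows appear there only because the trigger runs automatically after each updated row.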
When a constraint involves two or more tables, the table constraint mechanism is sometimes
hard to use and results may not come out as expected. To cover such situations, SQL supports the
creation of assertions, which are constraints not associated with only one table. An assertion
statement should ensure that a certain condition will always hold in the database. The DBMS always
checks the assertion whenever modifications are made to the corresponding tables.
Syntax –
CREATE ASSERTION [ assertion_name ]
CHECK ( [ condition ] );
Relational algebra is used to break down user requests and instruct the DBMS to execute them.
A relational query language is used by the user to communicate with the database. Such languages
generally operate at a higher level than general-purpose programming languages. They are of two types:
Procedural Language
Non-Procedural Language
Procedural Language: We write the program code in a procedural language in the form of a
sequence of various instructions. A user must specify what the machines need to do and also
specify how to do it (by mentioning a procedure indicating individual steps). The execution
of these instructions occurs in a sequential manner. The instructions typically exist for
solving a specified set of problems.
Non-Procedural Language: In these types of languages, the user only needs to
specify what the device or system needs to do. We don’t have to specify how to perform the
specified operation. A non-procedural language is also called a functional or applicative
language. Its mode of operation consists of building functions from other given functions
(to construct larger and more complex functions).
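The contrast between the two styles can be seen by answering the same question both ways. This is a minimal sketch: the marks data and function names are assumptions made up for illustration, and an SQL query run through Python's built-in sqlite3 module stands in for the non-procedural style.

```python
import sqlite3

# Hypothetical data: (student name, score).
MARKS = [("Asha", 82), ("Ravi", 45), ("Meena", 67)]

def passed_procedural(rows, cutoff):
    """Procedural style: spell out HOW, step by step."""
    result = []
    for name, score in rows:       # visit every record in sequence
        if score >= cutoff:        # test the condition ourselves
            result.append(name)    # build the output ourselves
    result.sort()
    return result

def passed_declarative(rows, cutoff):
    """Non-procedural style: state WHAT is wanted and let the DBMS decide how."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE marks (name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO marks VALUES (?, ?)", rows)
    found = conn.execute(
        "SELECT name FROM marks WHERE score >= ? ORDER BY name", (cutoff,)
    ).fetchall()
    conn.close()
    return [name for (name,) in found]
```

Both functions return the same names for the same cutoff; only the first one fixes the order in which records are visited and tested.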
Relational algebra in DBMS is a procedural query language. Queries in relational algebra are
performed using operators. Relational Algebra is the fundamental building block of the modern
language SQL and of modern Database Management Systems such as Oracle Database, Microsoft SQL
Server, IBM Db2, etc.
1. Basic Operations
2. Derived Operations
Applying these operations over relations/tables will give us new relation as output.
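Select Operation (or σ): It selects the tuples of a relation that satisfy a given predicate.
The notation is − σp(r)
Here σ stands for the selection operator, p for the selection predicate, and r for the relation.
p is a propositional logic formula that may use connectors such as or, and, and not. These
terms may also make use of relational operators such as =, ≠, ≥, <, >, ≤.
Project Operation (or ∏): It projects the specified column(s) of a relation and discards the rest.
Union Operation (or ∪): It performs a binary union between two relations.
The notation is − r ∪ s
The following conditions must hold for a union operation to be valid:
r and s must have the same number of attributes.
The attribute domains must be compatible.
Duplicate tuples are eliminated from the result.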
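The basic operations above (selection, projection, and union) can be sketched over relations held as sets of tuples. This is an illustrative model only: the Relation type, the function names, and the sample data are assumptions, and union compatibility is simplified here to "same attribute names in the same order".

```python
from typing import NamedTuple

class Relation(NamedTuple):
    attrs: tuple      # attribute names, in order
    rows: frozenset   # set of tuples, one value per attribute

def select(r, predicate):
    """σ_p(r): keep the tuples for which predicate(tuple as dict) is true."""
    kept = frozenset(t for t in r.rows if predicate(dict(zip(r.attrs, t))))
    return Relation(r.attrs, kept)

def project(r, attrs):
    """∏_attrs(r): keep only the listed columns; the set drops duplicates."""
    idx = [r.attrs.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in r.rows))

def union(r, s):
    """r ∪ s: both relations must be union compatible."""
    if r.attrs != s.attrs:
        raise ValueError("relations are not union compatible")
    return Relation(r.attrs, r.rows | s.rows)

# Hypothetical sample relations.
CS = Relation(("roll_no", "name"), frozenset({(1, "Asha"), (2, "Ravi")}))
IT = Relation(("roll_no", "name"), frozenset({(2, "Ravi"), (3, "Meena")}))

ALL_STUDENTS = union(CS, IT)
AFTER_ONE = select(ALL_STUDENTS, lambda t: t["roll_no"] > 1)
NAMES = project(ALL_STUDENTS, ("name",))
```

Every function returns a new Relation, so the operations can be nested freely, for example project(select(ALL_STUDENTS, ...), ("name",)).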
Derived Operations
Join
Intersection
Division
Division Operator (÷): Division operator A ÷ B can be applied if and only if
the attributes of B are a proper subset of the attributes of A.
The relation returned by the division operator will have attributes = (All attributes of A –
All attributes of B)
The relation returned by the division operator will contain those tuples from relation A
which are associated with every tuple of B
Notation: A ÷ B
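Division is easiest to follow on a concrete case such as "which students are enrolled in every listed course". The sketch below is an illustrative model over plain Python sets; the function name, argument layout, and sample data are assumptions made up for this example.

```python
def divide(a_attrs, a_rows, b_attrs, b_rows):
    """A ÷ B: values over (attrs of A - attrs of B) that appear in A with every tuple of B."""
    if not set(b_attrs) < set(a_attrs):
        raise ValueError("attributes of B must be a proper subset of attributes of A")
    keep = [a for a in a_attrs if a not in b_attrs]
    keep_idx = [a_attrs.index(a) for a in keep]
    b_idx = [a_attrs.index(a) for a in b_attrs]
    required = set(b_rows)

    # Group the B-part of each A tuple under its remaining (kept) part.
    seen = {}
    for row in a_rows:
        kept_part = tuple(row[i] for i in keep_idx)
        b_part = tuple(row[i] for i in b_idx)
        seen.setdefault(kept_part, set()).add(b_part)

    # Keep only the groups that cover every tuple of B.
    result = {kept_part for kept_part, parts in seen.items() if required <= parts}
    return tuple(keep), result

# Hypothetical sample data.
ENROLLED_ATTRS = ("student", "course")
ENROLLED = {
    ("Asha", "DBMS"), ("Asha", "OS"),
    ("Ravi", "DBMS"),
    ("Meena", "DBMS"), ("Meena", "OS"), ("Meena", "CN"),
}
COURSE_ATTRS = ("course",)
COURSES = {("DBMS",), ("OS",)}

RESULT_ATTRS, RESULT = divide(ENROLLED_ATTRS, ENROLLED, COURSE_ATTRS, COURSES)
```

Ravi is excluded because he is paired with only one of the two required courses, while Meena qualifies even though she takes an extra course.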
Join Operations: Join Operations are binary operations that allow us to combine two or
more relations.
They are further classified into two types: Inner Join, and Outer Join.
Inner Join:
When we perform an Inner Join, only those tuples that satisfy a certain
condition are returned. It is classified into three types: Theta Join, Equi Join and Natural Join.
Theta Join (θ): Theta Join combines two relations using a condition. This condition is
represented by the symbol "theta" (θ). The conditions can be inequality conditions such
as >, <, >=, <=, etc.
Notation : R ⋈θ S
Equi Join: Equi Join is a special case of theta join where the condition can only contain
equality (=) comparisons.
Natural Join (⋈): A comparison operator is not used in a natural join. It does not concatenate
like a Cartesian product. A Natural Join can be performed only if two relations share at least
one common attribute. Furthermore, the attributes must share the same name and domain.
Notation : R ⋈ S
Outer Join: Unlike Inner Join, which includes only the tuples that satisfy the given condition,
Outer Join also includes some or all of the tuples that do not satisfy the given condition. It is
of three types: Left Outer Join, Right Outer Join, and Full Outer Join.
Left Outer Join: Left Outer Join returns the matching
tuples (tuples present in both relations) and the tuples which are only present in the Left Relation,
here R.
It is denoted by ⟕.
Right Outer Join: Right Outer Join returns the matching tuples and the tuples which are
only present in the Right Relation, here S.
It is denoted by ⟖.
Full Outer Join: Full Outer Join returns all the tuples from both relations. Where a tuple
has no match in the other relation, the attributes of the other relation are set to NULL in the output.
It is denoted by ⟗.
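Two of the join operations above, a theta join with an inequality condition and a full outer join, can be sketched over relations held as lists of dictionaries. This is an illustrative model only: the function names and sample data are assumptions, None plays the role of NULL, and the theta join assumes the two relations use different attribute names.

```python
def theta_join(r, s, condition):
    """R ⋈θ S: every pair of tuples (one from R, one from S) that satisfies θ."""
    return [{**x, **y} for x in r for y in s if condition(x, y)]

def full_outer_join(r, s, common):
    """R ⟗ S on the attributes in `common`; unmatched tuples are padded with None."""
    r_attrs = set().union(*(x.keys() for x in r))
    s_attrs = set().union(*(y.keys() for y in s))

    def matches(x, y):
        return all(x[a] == y[a] for a in common)

    result = [{**x, **y} for x in r for y in s if matches(x, y)]
    for x in r:  # tuples only present in the left relation
        if not any(matches(x, y) for y in s):
            result.append({**{a: None for a in s_attrs}, **x})
    for y in s:  # tuples only present in the right relation
        if not any(matches(x, y) for x in r):
            result.append({**{a: None for a in r_attrs}, **y})
    return result

# Hypothetical sample relations.
MOBILE = [{"model": "M1", "price": 100}, {"model": "M2", "price": 300}]
LAPTOP = [{"laptop": "L1", "cost": 200}, {"laptop": "L2", "cost": 500}]
STUDENT = [{"roll_no": 1, "name": "Asha"}, {"roll_no": 3, "name": "Meena"}]
COURSE = [{"roll_no": 1, "course": "CS503"}, {"roll_no": 4, "course": "CS505"}]

CHEAPER = theta_join(MOBILE, LAPTOP, lambda m, l: m["price"] < l["cost"])
EVERYONE = full_outer_join(STUDENT, COURSE, common=("roll_no",))
```

Restricting the full outer join to its first padding loop gives a left outer join, and restricting it to the second gives a right outer join.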
Relational Calculus
You would have learned SQL in Databases. Do you know the mathematical foundation of
SQL? The mathematical foundation of SQL is based upon Relational Algebra and Relational
Calculus. Tuple Relational Calculus and Domain Relational Calculus are the two ways
we can write Relational Calculus queries.
Procedural Language - Languages which clearly define how to get the required
results from the database are called Procedural Languages. Relational algebra is a
Procedural Language.
Declarative Language - Languages that only care about what to get from the
database, without getting into how to get the results, are called Declarative Languages.
Relational Calculus is a Declarative Language.
Tuple Relational Calculus in DBMS uses a tuple variable (t) that ranges over each row of the
table and checks whether the predicate is true or false for the given row. Depending on the given
predicate condition, it returns the row or part of the row.
{t | P(t)}
Here t is the tuple variable and P(t) is the predicate expression or condition.
Domain Relational Calculus uses domain Variables to get the column values required from
the database based on the predicate expression or condition.
{<x1,x2,x3,x4...> | P(x1,x2,x3,x4...)}
<x1,x2,x3,x4...> are domain variables used to get the column values required, and
P(x1,x2,x3...) is predicate expression or condition.
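Python set comprehensions mirror the two notations closely, which makes the difference between a tuple variable and domain variables easy to see. This is only an analogy under stated assumptions: the STUDENT relation, its attributes, and the age condition are made up for illustration.

```python
from collections import namedtuple

# Hypothetical relation STUDENT(roll_no, name, age).
Student = namedtuple("Student", ["roll_no", "name", "age"])
STUDENT = {
    Student(1, "Asha", 19),
    Student(2, "Ravi", 22),
    Student(3, "Meena", 21),
}

# TRC: { t | t ∈ STUDENT ∧ t.age > 20 }
# The variable t stands for a whole row.
trc_rows = {t for t in STUDENT if t.age > 20}

# TRC returning part of the row: { t.name | t ∈ STUDENT ∧ t.age > 20 }
trc_names = {t.name for t in STUDENT if t.age > 20}

# DRC: { <name, age> | ∃ roll_no (<roll_no, name, age> ∈ STUDENT ∧ age > 20) }
# Each variable stands for a single column value.
drc_pairs = {(name, age) for (roll_no, name, age) in STUDENT if age > 20}
```

In both forms the comprehension states only the condition the result must satisfy, not the order in which rows are examined.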
Conclusion:
Relational Calculus in DBMS tells us what we want from the database and not how to
get that.
TRC uses tuple variable and checks every Row with the Predicate expression
condition.
DRC uses domain variables and returns the required attribute or column based on the
condition.
TRC and DRC queries can give more than one tuple or attribute in the result.