unit_2_DBMS

The document provides an overview of relational data models, including concepts such as domains, tuples, attributes, relations, keys, and integrity constraints. It discusses relational query languages, SQL commands, various types of joins, indexing methods, and triggers in the context of database management systems. Additionally, it emphasizes the importance of maintaining data integrity and the characteristics of a well-structured relational database.

Uploaded by

Raj Thakur

DBMS CS 503 AIST, SAGAR

Unit II

Relational Data models: Domains, Tuples, Attributes, Relations, Characteristics of relations,
Keys, Key attributes of relation, Relational database, Schemas, Integrity constraints,
Referential integrity, Intension and Extension. Relational Query languages: SQL-DDL,
DML, integrity constraints, Complex queries, various joins, indexing, triggers, assertions.
Relational algebra and relational calculus: Relational algebra operations like Select, Project,
Join, Division, outer union. Types of relational calculus, i.e. Tuple-oriented and Domain-oriented
relational calculus, and its operations.

Domain: Every attribute has a predefined set of permitted values (its type and scope), which
is known as the attribute domain.

Tuples: A tuple is a single row of a table and represents a single record.

Attributes: An attribute is a column of a table. Attributes are the properties that define a
relation, e.g., Employee_ID, Student_Rollno, SECTION, NAME, etc.

Relations: A relation is a table, i.e. a set of tuples over the same attributes. Its schema gives
the relation's name along with its attributes.

Keys: Each row has a single attribute or a combination of attributes whose values identify it
uniquely. This is known as a relation key.
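
Example (a minimal sketch, not part of the original notes): the terms above can be mirrored in plain Python. The Student relation, its attribute names, and the SECTIONS domain are assumptions made up for illustration.

```python
from typing import NamedTuple

# Hypothetical domain: the permitted values of the "section" attribute.
SECTIONS = {"A", "B", "C"}

class Student(NamedTuple):
    """One tuple (row); the field names are the attributes (columns)."""
    student_rollno: int
    name: str
    section: str

def in_domain(row: Student) -> bool:
    """Check every attribute value of a row against its domain."""
    return (
        isinstance(row.student_rollno, int) and row.student_rollno > 0
        and isinstance(row.name, str)
        and row.section in SECTIONS
    )

# A relation is a set of tuples: duplicates collapse and order is irrelevant.
student_relation = {
    Student(1, "Asha", "A"),
    Student(2, "Ravi", "B"),
    Student(2, "Ravi", "B"),  # duplicate row, stored only once
    Student(3, "Meena", "A"),
}

def is_key(relation, attrs) -> bool:
    """attrs is a key if no two tuples agree on all of those attributes."""
    projected = [tuple(getattr(row, a) for a in attrs) for row in relation]
    return len(projected) == len(set(projected))
```

Here is_key(student_relation, ["student_rollno"]) holds, while ["section"] is not a key because two students share section "A".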

Characteristics of relations:

• Every row is unique
• All of the values present in a column hold the same data type
• Values are atomic
• The sequence of columns is not significant
• The sequence of rows is not significant
• The name of every column is unique

Key attributes of relation

• The relational model in DBMS is an approach to logically represent and manage the data
stored in a database by storing it in tables.

Mr. Vaibhav Jain, Asst. Professor, Computer Science & Engineering. Page 20
• Relations, attributes and tuples, degree and cardinality, relational schema and
relation instance, and relation keys are some important components of the
relational model.
• To maintain data integrity, constraints such as domain, key, and referential integrity
are implemented in the relational model.
• Redundancy in data can lead to insertion, deletion, and update anomalies
in a relational database.
• A perfect relational database follows and implements all 13 of Codd's rules (numbered 0 to 12).
• Because of the use of tables and constraints, relational models are simple to use, easy
to manage, provide data integrity, and support queries.
• Increasing the amount of data can lead to performance and storage issues with
relational databases.

Keys

Keys are used to uniquely identify any record or row of data in a table. They are also used to
establish and identify relationships between tables.
Keys are of different types, e.g. Super key, Candidate key, Primary key, Foreign key, etc.

Types of DBMS Keys:

• Super Key: A super key is any set of one or more attributes that identifies the rows of a
table uniquely. Every combination of columns that is capable of identifying the
other columns of the table uniquely is considered a super key.
   o A super key is a superset of a candidate key (explained below). The primary
     key of a table is picked from the minimal super keys (the candidate keys) to be
     the table's identity attribute.
• Candidate Key: A candidate key is a minimal set of one or more attributes that can
uniquely identify a tuple; no attribute can be removed from it without losing uniqueness.
• Primary Key: It is the key chosen to identify one and only one instance of an
entity uniquely. An entity can have multiple keys; the most suitable one becomes the
primary key. For each entity, the primary key selection is based on requirements and
on the developers.
• Alternate Key: There may be one or more attributes or combinations of attributes
that uniquely identify each tuple in a relation. These attributes or combinations of
attributes are called the candidate keys. One key is chosen as the primary key from
these candidate keys, and each remaining candidate key, if any, is termed an
alternate key. In other words, the total number of alternate keys is the total number
of candidate keys minus one (the primary key). An alternate key may or may not exist: if
there is only one candidate key in a relation, it has no alternate key.
• Foreign Key: A foreign key is a column (or set of columns) of a table that points to the
primary key of another table.
• Composite Key: Whenever a primary key consists of more than one attribute, it is
known as a composite key. This key is also known as a concatenated key.
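
Example (a hedged sketch using Python's built-in sqlite3 module): the key types above can be declared in SQL and tried out. The department, employee and enrollment tables and all column names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,         -- primary key
    dept_name TEXT NOT NULL UNIQUE         -- alternate key (another candidate key)
);
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    passport_no TEXT NOT NULL UNIQUE,      -- alternate key
    dept_id     INTEGER REFERENCES department(dept_id)  -- foreign key
);
CREATE TABLE enrollment (
    student_rollno INTEGER NOT NULL,
    course_code    TEXT NOT NULL,
    PRIMARY KEY (student_rollno, course_code)            -- composite key
);
""")

def violates(sql):
    """Return True if the statement is rejected by a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

conn.execute("INSERT INTO department VALUES (1, 'CSE')")
conn.execute("INSERT INTO employee VALUES (100, 'P123', 1)")
conn.execute("INSERT INTO enrollment VALUES (1, 'CS503')")

duplicate_primary = violates("INSERT INTO employee VALUES (100, 'P999', 1)")
duplicate_alternate = violates("INSERT INTO employee VALUES (101, 'P123', 1)")
missing_parent = violates("INSERT INTO employee VALUES (102, 'P456', 42)")
same_student_other_course = violates("INSERT INTO enrollment VALUES (1, 'CS501')")
duplicate_composite = violates("INSERT INTO enrollment VALUES (1, 'CS503')")
```

Only the fourth insert is accepted: a composite key is unique as a combination, so the same student may appear with a different course.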

Integrity constraints:

• Integrity constraints are a set of rules used to maintain the quality of information.
• Integrity constraints ensure that data insertion, updating, and other processes are
performed in such a way that data integrity is not affected.
• Thus, integrity constraints guard against accidental damage to the database.
   o There are four types of integrity constraints in DBMS:
      • Domain Constraint
      • Entity Integrity Constraint
      • Referential Integrity Constraint
      • Key Constraint

Domain Constraint: A domain integrity constraint is a set of rules or conditions that
restricts the kind of values a column can hold in a database table. The data
type of a domain can be string, integer, character, currency, etc.

Entity Integrity Constraint: The entity integrity constraint ensures that the primary
key cannot be null. A primary key is used to identify individual records in a table, and if the
primary key has a null value, then we cannot identify those records. There can be null values
anywhere in the table except the primary key column.

Referential Integrity Constraint: The referential integrity constraint ensures that there is
always a valid relationship between two tables. A foreign key in one table must
either reference a corresponding value in the other table or be null.

Key constraint: Keys are sets of attributes that are used to identify an entity within its entity
set uniquely. There can be multiple keys in a single entity set, but out of these
keys, only one will be the primary key. A primary key can contain only unique, non-null
values in the table.
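
Example (a sketch under stated assumptions, using Python's sqlite3 module): each of the four constraint types can be provoked with one insert. The course and student tables, the age range in the CHECK clause, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce REFERENCES
conn.executescript("""
CREATE TABLE course (
    course_code TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE student (
    -- NOT NULL is spelled out because SQLite, unlike most systems,
    -- does not imply it for a non-INTEGER primary key.
    rollno      TEXT NOT NULL PRIMARY KEY,
    name        TEXT NOT NULL,
    age         INTEGER CHECK (age BETWEEN 16 AND 60),   -- domain constraint
    course_code TEXT REFERENCES course(course_code)      -- referential integrity
);
INSERT INTO course VALUES ('CS503');
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "valid row": rejected("INSERT INTO student VALUES ('R1', 'Asha', 20, 'CS503')"),
    "domain": rejected("INSERT INTO student VALUES ('R2', 'Ravi', 7, 'CS503')"),
    "entity": rejected("INSERT INTO student VALUES (NULL, 'Meena', 21, 'CS503')"),
    "referential": rejected("INSERT INTO student VALUES ('R3', 'Kiran', 22, 'XX000')"),
    "null foreign key": rejected("INSERT INTO student VALUES ('R4', 'Zoya', 23, NULL)"),
    "key": rejected("INSERT INTO student VALUES ('R1', 'Copy', 20, 'CS503')"),
}
```

The valid row and the row with a null foreign key are accepted; the other four are rejected, one per constraint type.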


Intension and Extension

Intension: The intension of a given relation is independent of time. It is the permanent part
of the relation. It corresponds to what is specified in the relational schema. The intension thus
defines all permissible extensions. The intension is a combination of two things: a structure
and a set of integrity constraints.

Extension: The extension of a given relation is the set of tuples appearing in that relation at
any given instant. The extension thus varies with time. It changes as tuples are created,
destroyed, and updated.

| Intension | Extension |
|---|---|
| Corresponds to what is specified in the relational schema; it defines all permissible extensions. | The set of tuples appearing in the relation at any given instant. |
| Independent of time; it is the permanent part of the relation. | Varies with time. |
| A combination of two things: a structure and a set of integrity constraints. | Changes as tuples are created, destroyed, and updated. |
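
Example (a small sketch with Python's sqlite3 module; the student table is an assumption): the schema read back from the DBMS stays the same while the set of rows changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def intension():
    """Schema: (column name, declared type) pairs from SQLite's table_info pragma."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

def extension():
    """The set of tuples currently in the relation."""
    return set(conn.execute("SELECT rollno, name FROM student"))

schema_before, rows_before = intension(), extension()

conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi')")
conn.execute("DELETE FROM student WHERE rollno = 1")

schema_after, rows_after = intension(), extension()
```

After the inserts and the delete, schema_after equals schema_before (the intension), whereas rows_after differs from rows_before (the extension).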

Relational Query Languages

A relational query language is used by the user to communicate with the database.
Relational algebra is used to break user requests into operations and instruct the DBMS to
execute them. Query languages generally work at a higher level than general-purpose
programming languages.

These languages are divided into two types:

• Procedural Query Language (Relational Algebra)
• Non-Procedural (or Declarative) Language (Relational Calculus)

DDL Commands in SQL

Following are the five DDL commands in SQL:

• CREATE Command

CREATE TABLE table_name
(
column_Name1 data_type (size of the column),
column_Name2 data_type (size of the column),
column_Name3 data_type (size of the column),
...
column_NameN data_type (size of the column)
);

• DROP Command

DROP TABLE Table_Name;

• ALTER Command

ALTER TABLE name_of_table
ADD column_name column_definition;

• TRUNCATE Command

TRUNCATE TABLE Table_Name;

• RENAME Command

RENAME TABLE Old_Table_Name TO New_Table_Name;
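
Example (a hedged sketch with Python's sqlite3 module): the DDL commands can be tried on an in-memory database. Two SQLite-specific substitutions are assumptions of this sketch, not the syntax given above: SQLite has no TRUNCATE, so DELETE without a WHERE clause stands in for it, and renaming is written ALTER TABLE ... RENAME TO. The student/learner table names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names():
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]

def column_names(table):
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# CREATE
conn.execute("CREATE TABLE student (rollno INTEGER, name VARCHAR(40))")

# ALTER ... ADD
conn.execute("ALTER TABLE student ADD COLUMN section VARCHAR(2)")
columns_after_alter = column_names("student")

# TRUNCATE (emulated): remove all rows but keep the table definition
conn.execute("INSERT INTO student VALUES (1, 'Asha', 'A')")
conn.execute("DELETE FROM student")
rows_after_truncate = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# RENAME (SQLite spelling)
conn.execute("ALTER TABLE student RENAME TO learner")
tables_after_rename = table_names()

# DROP
conn.execute("DROP TABLE learner")
tables_after_drop = table_names()
```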

DML Commands in SQL

Following are the four main DML commands in SQL:

• SELECT Command

SELECT column_Name_1, column_Name_2, ....., column_Name_N
FROM Name_of_table;

• INSERT Command

INSERT INTO TABLE_NAME (column_Name1, column_Name2, column_Name3, .... column_NameN)
VALUES (value_1, value_2, value_3, .... value_N);

• UPDATE Command

UPDATE Table_name SET column_name1 = value_1, ....., column_nameN = value_N
WHERE condition;

• DELETE Command

DELETE FROM Table_Name WHERE condition;
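
Example (a minimal sketch with Python's sqlite3 module; the student table and its rows are assumptions): the four DML commands run in sequence, with values passed as parameters rather than pasted into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, section TEXT)")

# INSERT: add three rows
conn.executemany(
    "INSERT INTO student (rollno, name, section) VALUES (?, ?, ?)",
    [(1, "Asha", "A"), (2, "Ravi", "B"), (3, "Meena", "A")],
)

# UPDATE: change one column of the rows matching the WHERE condition
conn.execute("UPDATE student SET section = ? WHERE rollno = ?", ("C", 2))

# DELETE: remove the rows matching the WHERE condition
conn.execute("DELETE FROM student WHERE rollno = ?", (3,))

# SELECT: read back what is left
remaining = conn.execute(
    "SELECT rollno, name, section FROM student ORDER BY rollno"
).fetchall()
```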

Various Joins in DBMS

A SQL JOIN statement is used to combine rows from two or more tables based on a
common field between them. The different types of joins are as follows:

o INNER JOIN
   • Theta / conditional join
   • Equi join
   • Natural join
o OUTER JOIN
   • LEFT Outer JOIN
   • RIGHT Outer JOIN
   • FULL Outer JOIN

INNER JOIN: The INNER JOIN keyword selects the rows from both tables for which
the join condition (<, >, <=, >=, =, !=) is satisfied. The result set combines each pair of rows,
one from each table, that satisfies the condition; in the most common case the condition is
that the value of the common field is the same.

SELECT table1.column1, table1.column2, table2.column1, ....
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;

Theta / conditional join: Theta Join allows you to merge two tables based on the
condition represented by theta. Theta joins work for all comparison operators. It is denoted by
the symbol θ. The general case of the JOIN operation is called a Theta join.

Equi join: An Equi join is a Theta join that uses only the equality (=)
condition. Joins in general are among the more expensive operations for an RDBMS to
execute efficiently, which is one reason they matter so much for performance.

Natural join: A natural join is a type of Equi join that occurs implicitly by
comparing all the columns that have the same name in both tables. The join result has only
one column for each pair of equally named columns.

Outer Join: An Outer Join does not require each record in the two joined tables to have a
matching record. In this type of join, the result retains each record of one or both tables even
if no matching record exists.

Left Outer Join: Left Outer Join returns all the rows from the table on the left even if
no matching rows have been found in the table on the right. When no matching record
is found in the table on the right, NULL is returned for its columns.

Right Outer Join: Right Outer Join returns all the rows from the table on the
right even if no matching rows have been found in the table on the left. Where no
matches have been found in the table on the left, NULL is returned for its columns. RIGHT
outer JOIN is the opposite of LEFT JOIN.

Full Outer Join: In a Full Outer Join, all tuples from both relations are included in
the result, irrespective of the matching condition; columns with no match are NULL.
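
Example (a sketch with Python's sqlite3 module; the tables and rows are assumptions): the join types above on two small tables. RIGHT and FULL OUTER JOIN are only available in SQLite from version 3.39, so this sketch obtains the right outer join by swapping the operands of a LEFT OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (rollno INTEGER, name TEXT, dept_id INTEGER);
CREATE TABLE department (dept_id INTEGER, dept_name TEXT);
INSERT INTO student VALUES (1, 'Asha', 10), (2, 'Ravi', 20), (3, 'Meena', NULL);
INSERT INTO department VALUES (10, 'CSE'), (30, 'ECE');
""")

def run(sql):
    return conn.execute(sql).fetchall()

# Equi join written as INNER JOIN ... ON with an equality condition
inner = run("""
    SELECT s.name, d.dept_name FROM student s
    INNER JOIN department d ON s.dept_id = d.dept_id
    ORDER BY s.rollno""")

# Natural join: matches on the equally named column dept_id implicitly
natural = run("SELECT name, dept_name FROM student NATURAL JOIN department ORDER BY rollno")

# Theta join: any comparison operator may appear in the condition
theta = run("""
    SELECT s.name, d.dept_name FROM student s
    INNER JOIN department d ON s.dept_id < d.dept_id
    ORDER BY s.rollno, d.dept_id""")

# Left outer join: every student, NULL where no department matches
left_outer = run("""
    SELECT s.name, d.dept_name FROM student s
    LEFT OUTER JOIN department d ON s.dept_id = d.dept_id
    ORDER BY s.rollno""")

# Right outer join of student with department, via swapped operands
right_outer = run("""
    SELECT s.name, d.dept_name FROM department d
    LEFT OUTER JOIN student s ON s.dept_id = d.dept_id
    ORDER BY d.dept_id""")
```

A full outer join would contain the rows of left_outer plus the unmatched department row from right_outer.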

Indexing

Indexing is a data structure technique used for quickly retrieving entries from
database files using the attributes that have been indexed. In database systems, indexing is
comparable to the index of a book. The indexing attributes are used to define the index.

What is Indexing in DBMS?

Indexing is a technique for improving database performance by reducing the number of disk
accesses needed when a query is run. An index is a data structure used to
swiftly locate and access the data in a database table.

Structure of Index:

We can create indices using some columns of the database table.

• The search key is the index's first column. It contains a copy of
the table's candidate key or primary key. The key values are stored in sorted
order so that the related data can be accessed quickly.
• The data reference is the index's second column. It contains a set of pointers
to the disk blocks where the value of a specific key can be found.
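
Example (an illustrative sketch in plain Python, not how any particular DBMS stores its indexes): an index as a sorted list of (search key, data reference) pairs, searched with binary search. The data_blocks layout and the records in it are assumptions.

```python
from bisect import bisect_left

# Hypothetical data file: block number -> records (rollno, name) stored in that block.
data_blocks = {
    0: [(103, "Meena"), (101, "Asha")],
    1: [(104, "Kiran"), (102, "Ravi")],
}

# Index: (search key, data reference) pairs kept sorted by search key.
index = sorted(
    (rollno, block_no)
    for block_no, records in data_blocks.items()
    for rollno, _ in records
)
search_keys = [key for key, _ in index]

def lookup(rollno):
    """Binary-search the index, then read only the block it points to."""
    pos = bisect_left(search_keys, rollno)
    if pos == len(index) or search_keys[pos] != rollno:
        return None  # key not present in the index
    block_no = index[pos][1]
    for key, name in data_blocks[block_no]:
        if key == rollno:
            return name
    return None
```

A call such as lookup(102) touches one block instead of scanning the whole file, which is the saving in disk accesses that the index is meant to provide.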

Methods of Indexing:

1. Ordered Indices: To make searching easier and faster, indices are frequently
sorted. Indices that have been sorted are called ordered indices.

2. Primary Index:

• Primary indexing is the process of creating an index on the table's
primary key. Primary keys are unique to each record, so there is a 1:1
relationship between index entries and records.
• Searching is fairly efficient because the primary keys are stored in
sorted order.
• There are two types of primary indexes: dense indexes and sparse indexes.

3. Dense Index: In a dense index there is an index record for every search key value in the
data file. This speeds up searching. The number of records in the
index table and in the main table is the same in this case, so extra space is required to hold
the index records. Each index record contains the search key and a pointer to the actual
record on disk.

4. Sparse Index: Only some of the items in the data file have index records. Each index
record points to a block. Rather than pointing to each record in the main table,
the index points to records in the main table at intervals, with gaps between them.
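
Example (a rough sketch in plain Python under simplifying assumptions): a dense and a sparse index over the same sorted data file. The blocks list, the block size of two records, and the names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical sorted data file split into blocks of two records each.
blocks = [
    [(101, "Asha"), (102, "Ravi")],
    [(103, "Meena"), (104, "Kiran")],
    [(105, "Zoya"), (106, "Dev")],
]

# Dense index: one entry per search key value.
dense_index = {key: block_no for block_no, block in enumerate(blocks) for key, _ in block}

# Sparse index: one entry per block, holding the first key stored in that block.
sparse_keys = [block[0][0] for block in blocks]

def dense_lookup(key):
    """Go straight to the block named by the index entry for this key."""
    block_no = dense_index.get(key)
    if block_no is None:
        return None
    return dict(blocks[block_no]).get(key)

def sparse_lookup(key):
    """Find the last block whose first key is <= key, then scan inside that block."""
    block_no = bisect_right(sparse_keys, key) - 1
    if block_no < 0:
        return None
    return dict(blocks[block_no]).get(key)
```

The dense index here holds six entries and the sparse index three; the sparse one saves space at the cost of a short scan inside the block.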

5. Clustering Index:

a. A clustering index is defined on an ordered data file. Indices are sometimes
built on non-primary-key columns, which may or may not be unique for each record.

b. In this situation, two or more columns are combined to obtain a unique value,
and an index is built on them to make it easier to find the record. This
method is called a clustering index.

c. Records with similar properties are grouped together, and indices are
constructed for these groups.

6. Secondary Index:

a. When using sparse indexing, the size of the mapping grows with the
size of the table. These mappings are usually kept in primary memory to
speed up address fetching. The actual data is then read from secondary memory
using the address obtained from the mapping. As the mapping grows, fetching the
address becomes slower and the sparse index becomes ineffective, so secondary
indexing is used to solve this problem.

b. Secondary indexing introduces another level of indexing to reduce the
size of the mapping. Large ranges of the column values are chosen first,
so that the first-level mapping is small. Each range is then
subdivided into smaller ranges. Because the first-level mapping is kept in
primary memory, fetching addresses is faster. The second-level mapping
and the actual data are kept in secondary memory (hard disk).

Triggers: Triggers are SQL statements that are executed automatically when there is a
change in the database. They are executed in response to certain events (INSERT,
UPDATE or DELETE) on a particular table. Triggers help maintain the integrity
of the data by changing the data in the database in a systematic fashion.

Advantages:

• Enforcing security authorizations on the tables in the database
• Triggers provide another way to check the integrity of data
• Preventing invalid transactions
• Triggers handle errors from the database layer
• Triggers can be useful for auditing the data changes in tables
• Triggers give an alternative way to run scheduled tasks. Using triggers, we don't have
to wait for the scheduled events to run because the triggers are invoked automatically
before or after a change is made to the data in a table

Disadvantages:

• Triggers can only provide extended validations, not all kinds of validation. For
simple validations, you can use the NOT NULL, UNIQUE, CHECK and FOREIGN
KEY constraints
• Triggers may increase the overhead of the database
• Triggers can be difficult to troubleshoot because they execute automatically in the
database, which may be invisible to the client applications

The parts of a trigger definition are:

CREATE TRIGGER: These two keywords specify that a trigger block is going to
be declared.

TRIGGER_NAME: It names the trigger being created (or replaced).
The trigger name should be unique.

BEFORE | AFTER: It specifies when the trigger will be fired, i.e. before the
triggering event or after it.

INSERT | UPDATE | DELETE: These are the DML operations; any one of them can be
the event of a given trigger.

ON [TABLE_NAME]: It specifies the name of the table on which the trigger is going
to be applied.

FOR EACH ROW: It specifies a row-level trigger, which is executed once for each row
affected by the event.

TRIGGER BODY: It consists of the queries that need to be executed when the trigger is
called.
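
Example (a hedged sketch in SQLite's trigger dialect, run through Python's sqlite3 module; other DBMSs differ in details such as OR REPLACE and how errors are raised): one AFTER trigger that logs changes and one BEFORE trigger that blocks an invalid change. The account and audit_log tables and the trigger names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE audit_log (acc_no INTEGER, old_balance INTEGER, new_balance INTEGER);

-- AFTER trigger: record every balance change once it has happened
CREATE TRIGGER log_balance_change
AFTER UPDATE ON account
FOR EACH ROW
BEGIN
    INSERT INTO audit_log VALUES (OLD.acc_no, OLD.balance, NEW.balance);
END;

-- BEFORE trigger: abort the statement if the new balance would be negative
CREATE TRIGGER reject_negative_balance
BEFORE UPDATE ON account
FOR EACH ROW
WHEN NEW.balance < 0
BEGIN
    SELECT RAISE(ABORT, 'balance cannot be negative');
END;

INSERT INTO account VALUES (1, 500);
""")

conn.execute("UPDATE account SET balance = 300 WHERE acc_no = 1")

try:
    conn.execute("UPDATE account SET balance = -50 WHERE acc_no = 1")
    overdraft_rejected = False
except sqlite3.DatabaseError:
    overdraft_rejected = True

audit_rows = conn.execute("SELECT * FROM audit_log").fetchall()
balance = conn.execute("SELECT balance FROM account WHERE acc_no = 1").fetchone()[0]
```

The first update is logged by the AFTER trigger; the second is stopped by the BEFORE trigger, so neither the balance nor the log changes.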

What are Assertions

When a constraint involves two or more tables, the table constraint mechanism is sometimes
cumbersome and the results may not come out as expected. To cover such situations, SQL
supports the creation of assertions, which are constraints not associated with only one table.
An assertion statement ensures that a certain condition always holds in the database. The
DBMS checks the assertion whenever modifications are made to the corresponding tables.

Syntax:

CREATE ASSERTION [ assertion_name ]
CHECK ( [ condition ] );
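
Example (an emulation sketch, since many widely used systems, SQLite among them, do not implement CREATE ASSERTION): the assertion's condition is written as a query that counts violations and is re-checked after every change. The tables, the budget rule and the modify_checked helper are all assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, budget INTEGER NOT NULL);
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, salary INTEGER NOT NULL);
INSERT INTO department VALUES (10, 1000);
""")

# Condition spanning two tables: no department's total salary may exceed its budget.
# The query counts violations, so the assertion holds when it returns 0.
VIOLATIONS_SQL = """
SELECT COUNT(*) FROM department d
WHERE d.budget < (SELECT COALESCE(SUM(e.salary), 0)
                  FROM employee e WHERE e.dept_id = d.dept_id)
"""

def modify_checked(sql, params=()):
    """Apply a change, then keep it only if the assertion still holds."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(sql, params)
        if conn.execute(VIOLATIONS_SQL).fetchone()[0] != 0:
            raise ValueError("assertion violated: salaries exceed department budget")

modify_checked("INSERT INTO employee VALUES (1, 10, 600)")
try:
    modify_checked("INSERT INTO employee VALUES (2, 10, 600)")
    second_insert_kept = True
except ValueError:
    second_insert_kept = False

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The second insert would push total salaries to 1200 against a budget of 1000, so it is rolled back and only one employee row remains.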

Relational Query Languages

As noted earlier, relational query languages are divided into two types:

• Procedural Query Language
• Non-Procedural Language

Procedural Language: In a procedural language, we write the program code as a
sequence of instructions. The user must specify what the machine needs to do and also
how to do it (by giving a procedure with the individual steps). The instructions are
executed sequentially. They are typically written to solve a specified set of problems.

Non-Procedural Language: In these languages, the user only needs to
specify what the system needs to do, not how to perform the
specified operation. A non-procedural language is also called an applicative or functional
language. Its mode of operation is to build new functions from other given functions
(for the construction of larger and more complex functions).
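
Example (a small sketch; the rows and the age condition are assumptions): the same request, the names of students older than 17, written once as explicit steps and once as a declarative SQL query run through Python's sqlite3 module.

```python
import sqlite3

rows = [(1, "Asha", 19), (2, "Ravi", 17), (3, "Meena", 21)]

# Procedural: we spell out HOW to find the answer, step by step.
procedural_result = []
for rollno, name, age in rows:
    if age > 17:
        procedural_result.append(name)

# Non-procedural: we state WHAT we want; the DBMS decides how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", rows)
declarative_result = [
    name
    for (name,) in conn.execute("SELECT name FROM student WHERE age > 17 ORDER BY rollno")
]
```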

Relational Algebra in DBMS:

Relational algebra in DBMS is a procedural query language. Queries in relational algebra are
performed using operators. Relational algebra is the fundamental building block of
SQL and of modern Database Management Systems such as Oracle Database, Microsoft SQL
Server, IBM Db2, etc.

Types of Relational Operations

In Relational Algebra, we have two types of operations.

1. Basic Operations

2. Derived Operations

Applying these operations to relations/tables gives a new relation as output.

Relational Algebra Basic Operations:

The following are the fundamental operations of relational algebra:

• Select Operation "sigma" (σ)
• Project Operation "pi" (∏)
• Union Operation (∪)
• Set Difference Operation (−)
• Cartesian Product Operation (×)
• Rename Operation "rho" (ρ)

Select Operation (σ):

It selects the tuples (rows) of a relation that satisfy the provided predicate.

The notation is: σp(r)

Here σ is the selection operator, p is the selection predicate, and r stands for the relation.
p is a propositional logic formula that may use connectives such as or, and, and not. Its
terms may use relational operators such as =, ≠, ≥, <, >, ≤.

Project Operation (∏): It returns only the listed columns of a relation; duplicate rows in
the result are eliminated.

The notation is: ∏c1,c2,...,cn (relation name)

Union Operation (∪): It performs a binary union between two relations.

The notation is: r ∪ s

The following conditions must hold for a union operation to be valid:

• r and s must have the same number of attributes.
• The domains of corresponding attributes must be compatible.
• Duplicate tuples are eliminated automatically.
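
Example (a minimal sketch in plain Python; the row representation and the sample relations are assumptions): select, project and union over relations stored as sets of rows.

```python
def relation(*rows):
    """Build a relation: a set of rows, each row a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def select(r, predicate):
    """Selection: keep the tuples for which the predicate p holds."""
    return {row for row in r if predicate(dict(row))}

def project(r, attrs):
    """Projection: keep only the listed attributes; the set removes duplicates."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in r}

def union(r, s):
    """Union: both relations must have the same attributes."""
    schemas = {frozenset(a for a, _ in row) for row in r | s}
    if len(schemas) > 1:
        raise ValueError("relations are not union-compatible")
    return r | s

students = relation(
    {"name": "Asha", "age": 19},
    {"name": "Ravi", "age": 17},
    {"name": "Meena", "age": 19},
)
alumni = relation({"name": "Asha", "age": 19}, {"name": "Dev", "age": 25})

adults = select(students, lambda t: t["age"] > 17)   # selection with p: age > 17
ages = project(students, {"age"})                     # projection on age
everyone = union(students, alumni)                    # students ∪ alumni
```

Projecting on age yields two tuples, not three, because the duplicate value 19 is eliminated; the union yields four tuples because Asha appears in both relations.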

Derived Operations

• Join
• Intersection
• Division

Division Operator (÷): The division operator A ÷ B can be applied if and only if:

• The attributes of B are a proper subset of the attributes of A.
• The relation returned by the division operator has attributes = (all attributes of A −
all attributes of B).
• The relation returned by the division operator contains those tuples of A (restricted to
these attributes) that are associated with every tuple of B.
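
Example (a sketch in plain Python; the enrolled and required relations are assumptions): A ÷ B answers questions of the form "which students are enrolled in every required course".

```python
def rows(attrs, values):
    """Build a relation as a set of frozenset((attribute, value), ...) rows."""
    return {frozenset(zip(attrs, v)) for v in values}

def divide(a, b):
    """Return A ÷ B for two relations in the representation built by rows()."""
    b_attrs = {attr for row in b for attr, _ in row}

    def rest(row):
        # The part of an A-tuple that lies outside B's attributes.
        return frozenset((attr, v) for attr, v in row if attr not in b_attrs)

    candidates = {rest(row) for row in a}
    # Keep a candidate only if it is paired in A with every tuple of B.
    return {c for c in candidates if all((c | b_row) in a for b_row in b)}

enrolled = rows(
    ("student", "course"),
    [("Asha", "DBMS"), ("Asha", "OS"), ("Ravi", "DBMS"), ("Meena", "OS"), ("Meena", "DBMS")],
)
required = rows(("course",), [("DBMS",), ("OS",)])

result = divide(enrolled, required)
```

Ravi is dropped because he is not associated with the OS tuple of B; the result has the single attribute student, i.e. the attributes of A minus those of B.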

Join Operations: Join operations are binary operations that allow us to combine two
relations; more than two relations are combined by applying joins repeatedly.

They are classified into two types: Inner Join and Outer Join.

Inner Join:

When we perform an Inner Join, only those tuples that satisfy the join condition are
returned. It is further classified into three types: Theta Join, Equi Join and Natural Join.

Theta Join (θ): Theta Join combines two relations using a condition, represented by the
symbol θ. The condition can use any comparison operator, including inequalities such
as >, <, >=, <=, etc.

Notation: R ⋈θ S

Equi Join: Equi Join is a special case of theta join where the condition can only contain
equality (=) comparisons.

Natural Join (⋈): No comparison operator is written in a natural join; tuples are matched on
equality of all the attributes the two relations have in common, and each common attribute
appears only once in the result, unlike the plain concatenation of a Cartesian product.
A Natural Join can be performed only if the two relations share at least
one common attribute, with the same name and domain.

Notation: R ⋈ S

Outer Join: Unlike Inner Join, which includes only the tuples that satisfy the given condition,
Outer Join also includes some or all of the tuples that do not satisfy it. It is
of three types: Left Outer Join, Right Outer Join, and Full Outer Join. For two relations R and S:

Left Outer Join: Left Outer Join returns the matching tuples (tuples present in both
relations) and the tuples that are present only in the left relation, here R. The attributes of S
are set to NULL for the unmatched tuples.

It is denoted by ⟕.

Right Outer Join: Right Outer Join returns the matching tuples and the tuples that are
present only in the right relation, here S. The attributes of R are set to NULL for the
unmatched tuples.

It is denoted by ⟖.

Full Outer Join: Full Outer Join returns all the tuples from both relations. Where a tuple has
no matching tuple in the other relation, the attributes of the other relation are set to NULL
in the output relation.

It is denoted by ⟗.
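
Example (a simplified sketch in plain Python; relations are lists of dicts, None plays the role of NULL, and the sample relations R and S are assumptions):

```python
def theta_join(r, s, condition):
    """R ⋈θ S: pairs of tuples from R and S that satisfy the condition.
    Assumes equally named attributes hold equal values in the paired tuples."""
    return [{**t, **u} for t in r for u in s if condition(t, u)]

def natural_join(r, s):
    """R ⋈ S: equality on all attributes the two relations have in common."""
    return theta_join(r, s, lambda t, u: all(t[a] == u[a] for a in t.keys() & u.keys()))

def left_outer_join(r, s, s_attrs):
    """R ⟕ S: natural join plus unmatched R tuples padded with None (NULL)."""
    result = []
    for t in r:
        matches = natural_join([t], s)
        padding = {a: None for a in s_attrs if a not in t}
        result.extend(matches or [{**t, **padding}])
    return result

R = [
    {"rollno": 1, "name": "Asha", "dept_id": 10},
    {"rollno": 2, "name": "Ravi", "dept_id": 20},
]
S = [
    {"dept_id": 10, "dept_name": "CSE"},
    {"dept_id": 30, "dept_name": "ECE"},
]

inner = natural_join(R, S)
left = left_outer_join(R, S, ["dept_id", "dept_name"])
# A right outer join is a left outer join with the operands swapped.
right = left_outer_join(S, R, ["rollno", "name", "dept_id"])
```

A full outer join would contain the tuples of left together with the unmatched tuple of right.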

Relational Calculus

You will have learned SQL in Databases. Do you know the mathematical foundation of
SQL? It is based upon Relational Algebra and Relational
Calculus. Tuple Relational Calculus and Domain Relational Calculus are the two ways
we can write Relational Calculus queries.

What is Relational Calculus?

Before understanding Relational Calculus in DBMS, we need to understand Procedural
Language and Declarative Language.

• Procedural Language: Languages that define clearly how to get the required
results from the database are called procedural languages. Relational algebra is a
procedural language.
• Declarative Language: Languages that only state what to get from the
database, without getting into how to get the results, are called declarative languages.
Relational Calculus is a declarative language.

Relational Calculus is of two types:

• Tuple Relational Calculus (TRC)
• Domain Relational Calculus (DRC)

Tuple Relational Calculus (TRC):

Tuple Relational Calculus in DBMS uses a tuple variable (t) that ranges over the rows of the
table and checks whether the predicate is true or false for each row. Depending on the
predicate, it returns the row or part of the row.

General form: {t | P(t)}

Example: {T.name | Student(T) ∧ T.age > 17}

OR

{T.name | T ∈ Student ∧ T[age] > 17}

where T is the tuple variable.

Domain Relational Calculus (DRC):

Domain Relational Calculus uses domain variables to get the column values required from
the database based on the predicate expression or condition.

General form: {<x1, x2, x3, x4, ...> | P(x1, x2, x3, x4, ...)}

<x1, x2, x3, x4, ...> are domain variables used to get the column values required, and
P(x1, x2, x3, ...) is the predicate expression or condition.

Example: {<name, age> | <name, age> ∈ student ∧ age > 17}
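
Example (an informal analogy, not a formal semantics): Python set comprehensions read almost like the two calculus expressions above. The student relation and its rows are assumptions.

```python
from typing import NamedTuple

class Student(NamedTuple):
    name: str
    age: int

student = {Student("Asha", 19), Student("Ravi", 17), Student("Meena", 21)}

# TRC: {T.name | Student(T) ∧ T.age > 17}
# The variable t ranges over whole tuples of the relation.
trc_result = {t.name for t in student if t.age > 17}

# DRC: {<name, age> | <name, age> ∈ student ∧ age > 17}
# The variables name and age range over column values.
drc_result = {(name, age) for (name, age) in student if age > 17}
```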

Conclusion:

• Relational Calculus in DBMS states what we want from the database and not how to
get it.
• Relational Calculus is a declarative language.
• TRC uses a tuple variable and checks every row against the predicate expression
or condition.
• DRC uses domain variables and returns the required attributes or columns based on the
condition.
• Any query that can be written in TRC can also be written in DRC, and vice versa.
• TRC and DRC queries can give more than one tuple or attribute in the result.

************** END OF UNIT 2 ********************