
Database Management Module - 2-2

The document discusses relational query languages and relational database design. It describes procedural query languages which specify the sequence of queries to meet a user request, and non-procedural languages which specify what is to be done without specifying how. Relational algebra is a procedural query language that uses operators to perform queries. Relational calculus is a declarative, non-procedural language. Structured Query Language (SQL) is the most widely used relational database language.

Uploaded by

22Sneha JhaIT2

Database Management

Module -2

1. Relational Query Language


2. Relational Database Design
Relational Query Language
A relational query language is the language through which the user communicates with the database. It breaks a user request into operations, grounded in relational algebra or relational calculus, and instructs the DBMS to execute them. Relational query languages can be procedural or non-procedural.

[Figure: relational query languages are divided into procedural query languages and non-procedural query languages]
Procedural Query Language

● A procedural query language consists of a set of queries that instruct the DBMS to perform operations in a given sequence to meet the user request.
● For example, a get_CGPA procedure would contain queries to get the marks of a student in each subject, calculate the total marks, and then decide the CGPA based on that total.
● A procedural query language tells the database both what is required and how to get it. Relational algebra is a procedural query language.
Non-procedural languages

● Non-procedural languages are fact-oriented programming languages. Programs written in non-procedural languages specify what is to be done and do not state exactly how a result is to be evaluated.
● In a non-procedural language, the user specifies what has to be done but does not get into how it has to be done. Such languages are sometimes called applicative or functional languages because they work with the help of mathematical functions.
● Programs in non-procedural languages can return values of any data type, and they tend to be small. Common examples of non-procedural languages are LISP, SQL and PROLOG.
Relational Algebra
Relational algebra is a procedural query language. It gives a step-by-step process to obtain the result of a query, and it uses operators to perform queries.

Types of Relational operation



1. Select Operation:

● The select operation selects tuples that satisfy a given predicate.


● It is denoted by sigma (σ).

Notation: σ p(r)

Where:
σ denotes the selection operation
r is the relation
p is the selection predicate, a propositional logic formula that may use connectives such as AND, OR and NOT, and comparison operators such as =, ≠, ≥, <, >, ≤.
Eg: LOAN relation

Input: σ BRANCH_NAME="perryride" (LOAN)

Output:
2. Project Operation:

● This operation lists the attributes that we wish to appear in the result. The rest of the attributes are eliminated from the table.
● It is denoted by ∏.

Notation: ∏ A1, A2, ..., An (r)

Where

A1, A2, ..., An are attribute names of relation r.


Input: ∏ NAME, CITY (CUSTOMER)

Output:
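
The selection and projection operators described above can be sketched in Python on a few made-up rows. The sample rows, the helper names `select` and `project`, and the spelling "Perryride" are assumptions for illustration; the original LOAN and CUSTOMER tables are not reproduced in this text.

```python
# Hypothetical sample rows; not the tables from the original slides.
LOAN = [
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]
CUSTOMER = [
    {"NAME": "Jones", "STREET": "Main", "CITY": "Harrison"},
    {"NAME": "Smith", "STREET": "North", "CITY": "Rye"},
    {"NAME": "Smith", "STREET": "Park", "CITY": "Rye"},
]


def select(relation, predicate):
    """sigma_p(r): keep the tuples for which the predicate is true."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """pi_A1..An(r): keep only the listed attributes and drop duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


perryride_loans = select(LOAN, lambda row: row["BRANCH_NAME"] == "Perryride")
name_city = project(CUSTOMER, ["NAME", "CITY"])
print(perryride_loans)
print(name_city)
```

Note that the projection collapses the two Smith rows into one, because a relation is a set of tuples.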
3. Union Operation:
● Suppose there are two relations R and S. The union operation contains all the tuples that are in R, in S, or in both.
● It eliminates duplicate tuples. It is denoted by ∪.

Notation: R ∪ S

A union operation must satisfy the following conditions:

● R and S must have the same number of attributes, with compatible domains.
● Duplicate tuples are eliminated automatically.
Depositor relation:

Borrow relation:
Input: ∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:
4. Set Intersection:

● Suppose there are two relations R and S. The set intersection operation contains all tuples that are in both R and S.
● It is denoted by ∩.

Notation: R ∩ S

Eg: Let us consider the previous tables.

Input: ∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:
5. Set Difference:
● Suppose there are two relations R and S. The set difference operation contains all tuples that are in R but not in S.
● It is denoted by minus (-).

Notation: R - S

Using the above DEPOSITOR table and BORROW table


Input:∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
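
Union, intersection and difference map directly onto Python set operators, which gives a quick way to check the three queries above. The rows below are hypothetical stand-ins for the BORROW and DEPOSITOR tables, which are not reproduced in this text.

```python
# Hypothetical (CUSTOMER_NAME, number) rows.
BORROW = [("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15")]
DEPOSITOR = [("Johnson", "A-101"), ("Smith", "A-121"), ("Jones", "A-472")]

# Projection on CUSTOMER_NAME; a set drops duplicates automatically.
borrowers = {name for name, _ in BORROW}
depositors = {name for name, _ in DEPOSITOR}

union = borrowers | depositors         # in BORROW or DEPOSITOR or both
intersection = borrowers & depositors  # in both BORROW and DEPOSITOR
difference = borrowers - depositors    # in BORROW but not in DEPOSITOR

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```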
6. Cartesian product

● The Cartesian product is used to combine each row in one table with each row in the other table. It is also known as
a cross product.
● It is denoted by X.
Notation: E X D
Input: EMPLOYEE X DEPARTMENT

Output:
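
As a rough illustration, the Cartesian product can be sketched with `itertools.product`, which pairs every row of one table with every row of the other. The EMPLOYEE and DEPARTMENT rows are invented for this sketch.

```python
from itertools import product

# Hypothetical rows: (EMP_ID, EMP_NAME, EMP_DEPT) and (DEPT_NO, DEPT_NAME).
EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# Each result tuple is an EMPLOYEE tuple followed by a DEPARTMENT tuple.
cross = [emp + dept for emp, dept in product(EMPLOYEE, DEPARTMENT)]

for row in cross:
    print(row)
```

With 2 employee rows and 3 department rows the result has 2 × 3 = 6 rows, including pairs that do not belong together, which is why a Cartesian product is usually followed by a selection.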
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).

Example: We can use the rename operator to rename the STUDENT relation to STUDENT1.

Input: ρ(STUDENT1, STUDENT)


Relational Calculus
● Relational calculus is a non-procedural query language. In a non-procedural query language, the user is not concerned with the details of how to obtain the end results.
● Relational calculus tells what to do but never explains how to do it.

Types of Relational calculus:


1. Tuple Relational Calculus (TRC)

● Tuple relational calculus is used to select tuples from a relation. In TRC, the filtering variable ranges over the tuples of a relation.
● The result can have one or more tuples.

Notation: {T | P (T)} or {T | Condition (T)}

Where

T is the resulting tuples

P(T) is the condition used to fetch T


For example:

Input: { T.name | Author(T) AND T.article = 'database' }


OUTPUT: This query selects the tuples from the AUTHOR relation. It returns a tuple with 'name'
from Author who has written an article on 'database'.

● TRC (tuple relational calculus) can be quantified. In TRC, we can use Existential (∃) and
Universal Quantifiers (∀).

For example:

Input: { R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}


OUTPUT: This query selects the tuples from the AUTHOR relation. It returns a tuple with 'name'
from Author who has written an article on 'database'.
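
A set comprehension reads much like a TRC expression: it states the condition a tuple must meet, not the steps to find it. The sketch below mirrors both queries above on invented AUTHOR rows. Letting the result variable range over AUTHOR in the second query is a simplification made here, not part of the calculus.

```python
# Hypothetical AUTHOR relation.
AUTHOR = [
    {"name": "Asha", "article": "database"},
    {"name": "Asha", "article": "networks"},
    {"name": "Ravi", "article": "networks"},
    {"name": "Meera", "article": "database"},
]

# { T.name | Author(T) AND T.article = 'database' }
names = {t["name"] for t in AUTHOR if t["article"] == "database"}

# { R | there exists T in Authors (T.article = 'database' AND R.name = T.name) }
names_exists = {
    r["name"]
    for r in AUTHOR
    if any(t["article"] == "database" and t["name"] == r["name"] for t in AUTHOR)
}

print(sorted(names))
print(sorted(names_exists))
```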
2. Domain Relational Calculus (DRC)

● The second form of relational calculus is known as domain relational calculus. In domain relational calculus, the filtering variables range over the domains of attributes.
● Domain relational calculus uses the same operators as tuple calculus. It uses the logical connectives ∧ (and), ∨ (or) and ¬ (not).
● It uses existential (∃) and universal (∀) quantifiers to bind the variables.

Notation:{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where

a1, a2 are attributes

P stands for formula built by inner attributes


For example:

INPUT: {< article, page, subject > | < article, page, subject > ∈ javatpoint ∧ subject = 'database'}

OUTPUT: This query yields the article, page, and subject from the relation javatpoint, where the subject is 'database'.
Relational Algebra vs Relational Calculus
Relational Algebra:
● It is a procedural language.
● It states how to obtain the result.
● The order in which the operations have to be performed is specified.
● It is independent of the domain.
● It is nearer to a programming language.
● SQL includes only some features from relational algebra.
● It is one of the languages in which queries can be expressed, but a language must be able to express every relational calculus query to be relationally complete.

Relational Calculus:
● It is a declarative language.
● It states what result we have to obtain.
● The order is not specified.
● It can be domain-dependent.
● It is not nearer to a programming language.
● SQL is based to a greater extent on tuple relational calculus.
● A database language is relationally complete if every query expressible in relational calculus can be expressed in it.
Structured Query Language(SQL)

● SQL is a standard language for storing, manipulating and retrieving data in databases.
● SQL is the standard language for relational database systems. Relational Database Management Systems (RDBMS) such as MySQL, MS Access, Oracle, Sybase, Informix, Postgres and SQL Server all use SQL as their standard database language.

Applications of SQL
SQL is one of the most widely used query languages for databases. Some of its applications are listed here:
● Allows users to access data in relational database management systems.
● Allows users to describe the data.
● Allows users to define the data in a database and manipulate that data.
● Allows SQL to be embedded within other languages using SQL modules, libraries and pre-compilers.
● Allows users to create and drop databases and tables.
● Allows users to create views, stored procedures and functions in a database.
● Allows users to set permissions on tables, procedures and views.
SQL Commands
● SQL commands are instructions used to communicate with the database. They are also used to perform specific tasks, functions, and queries on data.
● SQL can perform various tasks such as creating a table, adding data to tables, dropping a table, modifying a table, and setting permissions for users.

Types of SQL Commands


There are five types of SQL commands:
● DDL
● DML
● DCL
● TCL
● DQL
1. Data Definition Language (DDL)

● DDL changes the structure of the database schema, for example by creating, altering or deleting a table.
● In many systems, such as Oracle, all DDL commands are auto-committed, which means they permanently save all their changes in the database.

Here are some commands that come under DDL:

● CREATE
● ALTER
● DROP
● TRUNCATE

a. CREATE

It is used to create a new table in the database.

Syntax: CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,....]);

Eg: CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE);


b. DROP:

It is used to delete both the structure of the table and the records stored in it.

Syntax: DROP TABLE table_name;

Eg: DROP TABLE EMPLOYEE;

c. ALTER:

It is used to alter the structure of the database. This change could be either to modify the
characteristics of an existing attribute or probably to add a new attribute.

Syntax:

1. To add a new column in the table

ALTER TABLE table_name ADD column_name COLUMN-definition;


2. To modify existing column in the table:

ALTER TABLE table_name MODIFY(column_definitions....);


Eg:
ALTER TABLE STU_DETAILS ADD(ADDRESS VARCHAR2(20));
ALTER TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));
d. TRUNCATE:
It is used to delete all the rows from the table and free the space used by the table.
Syntax: TRUNCATE TABLE table_name;
Eg: TRUNCATE TABLE EMPLOYEE;
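
The DDL commands above can be tried out with Python's built-in `sqlite3` module. This is only a sketch under some assumptions: SQLite is used because it needs no server, it has no `VARCHAR2` type, no `ALTER TABLE ... MODIFY` and no `TRUNCATE TABLE`, so `VARCHAR` and an unqualified `DELETE` stand in for them. The statements in the text follow Oracle syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# CREATE: define a new table.
cur.execute("CREATE TABLE employee (name VARCHAR(20), email VARCHAR(100), dob DATE)")

# ALTER: add a new column to the existing table.
cur.execute("ALTER TABLE employee ADD COLUMN address VARCHAR(20)")
columns = [row[1] for row in cur.execute("PRAGMA table_info(employee)")]

# TRUNCATE stand-in: SQLite empties a table with DELETE and no WHERE clause.
cur.execute("INSERT INTO employee (name) VALUES ('Asha')")
cur.execute("DELETE FROM employee")
rows_left = cur.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# DROP: remove the table structure and its records.
cur.execute("DROP TABLE employee")
tables = cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(columns, rows_left, tables)
```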
2. Data Manipulation Language
● DML commands are used to modify the data in the database. They are responsible for all forms of change to the stored data.
● DML commands are not auto-committed, which means their changes are not saved permanently until the transaction is committed. Until then they can be rolled back.

Here are some commands that come under DML:

● INSERT
● UPDATE
● DELETE

a. INSERT:

The INSERT statement is a SQL query. It is used to insert data into the row of a table.

Syntax:

INSERT INTO TABLE_NAME (col1, col2, col3, .... colN)
VALUES (value1, value2, value3, .... valueN);
Or
INSERT INTO TABLE_NAME
VALUES (value1, value2, value3, .... valueN);
Eg: INSERT INTO Book (Author, Subject) VALUES ('Sonoo', 'DBMS');

b. UPDATE
This command is used to update or modify the value of a column in the table.
Syntax:
UPDATE table_name SET [column_name1= value1,...column_nameN = valueN] [WHERE CONDITION];

Eg:
UPDATE students
SET User_Name = 'Sonoo'
WHERE Student_Id = '3';
c. DELETE:
It is used to remove one or more rows from a table.
Syntax: DELETE FROM table_name [WHERE condition];
Eg:
DELETE FROM javatpoint
WHERE Author='Sonoo';
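
A minimal sketch of the three DML commands, again using Python's `sqlite3` module on an in-memory database. The `book` table and the second author are invented for the example, and values are passed as `?` parameters rather than written into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE book (author TEXT, subject TEXT)")

# INSERT with and without an explicit column list.
cur.execute("INSERT INTO book (author, subject) VALUES (?, ?)", ("Sonoo", "DBMS"))
cur.execute("INSERT INTO book VALUES (?, ?)", ("Ravi", "Networks"))

# UPDATE the rows that satisfy the WHERE condition.
cur.execute("UPDATE book SET subject = ? WHERE author = ?", ("SQL", "Sonoo"))

# DELETE the rows that satisfy the WHERE condition.
cur.execute("DELETE FROM book WHERE author = ?", ("Ravi",))

conn.commit()  # DML changes become permanent only after a commit
remaining = cur.execute("SELECT author, subject FROM book").fetchall()
print(remaining)
```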
3. Data Control Language

DCL commands are used to grant and take back authority from any database user.

Here are some commands that come under DCL:

● Grant
● Revoke
1. Grant: It is used to give user access privileges to a database.

Example: GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_USER;

2. Revoke: It is used to take back permissions from the user.

Example: REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2;


4. Transaction Control Language

TCL commands can only be used with DML commands such as INSERT, DELETE and UPDATE.

DDL operations are automatically committed in the database, which is why TCL commands cannot be used while creating or dropping tables.

Here are some commands that come under TCL:

● COMMIT
● ROLLBACK
● SAVEPOINT
a. Commit: The COMMIT command is used to save all the changes made in the current transaction to the database.

Syntax: COMMIT;

Example:
DELETE FROM CUSTOMERS
WHERE AGE = 25;
COMMIT;
b. Rollback: The ROLLBACK command is used to undo changes that have not yet been saved to the database.

Syntax: ROLLBACK;

Example:
DELETE FROM CUSTOMERS
WHERE AGE = 25;
ROLLBACK;

c. SAVEPOINT: It is used to mark a point within a transaction to which the transaction can later be rolled back, without rolling back the entire transaction.

Syntax: SAVEPOINT SAVEPOINT_NAME;
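
The effect of COMMIT, ROLLBACK and SAVEPOINT can be observed with a small `sqlite3` sketch. The connection is opened with `isolation_level=None` so that Python does not start transactions on its own and every `BEGIN`, `COMMIT` and `ROLLBACK` is issued explicitly. The `customers` rows and the savepoint name are made up for the example.

```python
import sqlite3

# isolation_level=None: transactions are controlled only by the SQL below.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE customers (name TEXT, age INTEGER)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Asha", 25), ("Ravi", 30), ("Meera", 25)],
)


def count_rows():
    return cur.execute("SELECT COUNT(*) FROM customers").fetchone()[0]


# ROLLBACK undoes the uncommitted delete.
cur.execute("BEGIN")
cur.execute("DELETE FROM customers WHERE age = 25")
cur.execute("ROLLBACK")
after_rollback = count_rows()

# SAVEPOINT: undo only the work done after the savepoint, then commit the rest.
cur.execute("BEGIN")
cur.execute("DELETE FROM customers WHERE name = 'Ravi'")
cur.execute("SAVEPOINT before_age_delete")
cur.execute("DELETE FROM customers WHERE age = 25")
cur.execute("ROLLBACK TO SAVEPOINT before_age_delete")
cur.execute("COMMIT")
after_commit = count_rows()

print(after_rollback, after_commit)
```

After the first transaction all three rows are still present. After the second, only the delete made before the savepoint has been committed, so two rows remain.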
5. Data Query Language

DQL is used to fetch the data from the database.


It uses only one command:

● SELECT

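a. SELECT: The column list of SELECT corresponds to the projection operation of relational algebra, and the WHERE clause corresponds to the selection operation. It is used to retrieve the chosen attributes of the rows that satisfy the condition described by the WHERE clause.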
Syntax:
SELECT expressions
FROM TABLES
WHERE conditions;
Example:
SELECT emp_name
FROM employee
WHERE age > 20;
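
The example query can be run as follows with Python's `sqlite3` module. The three employee rows are invented, and an `ORDER BY` is added only so that the printed order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (emp_name TEXT, age INTEGER)")
cur.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Asha", 19), ("Ravi", 34), ("Meera", 27)],
)

# The column list projects emp_name; the WHERE clause selects rows with age > 20.
query = "SELECT emp_name FROM employee WHERE age > 20 ORDER BY emp_name"
names = [row[0] for row in cur.execute(query)]
print(names)
```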
SQL Rules

Rules:
SQL follows the following rules:

● SQL keywords are not case sensitive. By convention, they are written in uppercase.
● SQL statements are not tied to text lines. A single SQL statement can be written on one line or spread over multiple lines.
● Using SQL statements, you can perform most of the actions in a database.
● SQL is based on tuple relational calculus and relational algebra.
SQL process:

● When an SQL command is executed on an RDBMS, the system figures out the best way to carry out the request, and the SQL engine determines how to interpret the task.
● Various components take part in this process, such as the optimization engine, the query engine, the query dispatcher and the classic query engine.
● The classic query engine handles all non-SQL queries, but the SQL query engine does not handle logical files.
SQL Datatype
● SQL Datatype is used to define the values that a column can contain.
● Every column is required to have a name and data type in the database table.
1. Binary DataTypes

There are Three types of binary Datatypes which are given below:
2. Approximate Numeric Datatype :

3. Exact Numeric Datatype


4. Character String Datatype

5. Date and time Datatypes


SQL Operator

There are various types of SQL operator:

1. SQL Arithmetic Operators

Let's assume 'variable a' and 'variable b'. Here, 'a' contains 20 and 'b'
contains 10.
2. SQL Comparison Operators:
Let's assume 'variable a' and 'variable b'. Here, 'a' contains 20 and 'b' contains 10.
3. SQL Logical Operators
Open Source Database and Commercial Database

1. Open Source Database:


○ An open source database is a database whose source code anyone can view, and which is open and free to download.
○ An open source database allows users to create a system based on their unique requirements and business needs. It is free and can also be shared. The source code can be modified to match any user preference.
○ Open source databases address the need to analyze data from a growing number of new applications at lower cost.
○ Some small additional and affordable costs may apply, even for community versions.
○ Open source databases provide limited technical support to end users. Installation and updates are administered by the user.
○ Examples: MySQL, PostgreSQL, MongoDB etc.
The most common open source databases include:

● Key-value databases — Store key and value data in memory for speedy
lookup.
● Document databases — Store document information.
● Wide-column store databases — Similar to key-value with a large
number of columns. They are well suited for analyzing huge data sets.
● Graph databases — Explore the relationships that link data together, allowing
rapid execution of complex queries over millions of connections. Use cases
include recommendations, social networks and fraud detection.
2. Commercial Database
● Commercial databases are those that have been created for commercial purposes only.
● They are premium products and are not free like open source databases.
● With a commercial database, technical support is guaranteed.
● Installation and updates are administered by the software vendor.
● Examples: Oracle, IBM DB2 etc.
Difference between Open Source Database and Commercial Database :

Open Source Database:
● Anyone can easily view its source code.
● It is free or has a small additional and affordable cost.
● It provides limited technical support.
● The software is available under free licensing.
● Users need to rely on community support.
● Installation and updates are administered by the user.

Commercial Database:
● It has been created for commercial purposes only.
● It is premium and is not free like an open source database.
● It provides guaranteed technical support.
● The software is available under a high licensing cost.
● Users get dedicated support from the vendor they buy from.
● Installation and updates are administered by the software vendor.
Database Design Objective

● Eliminate Data Redundancy: the same piece of data shall not be stored in more than one place, because duplicate data not only wastes storage space but also easily leads to inconsistencies.
● Ensure Data Integrity and Accuracy: data integrity is the maintenance and assurance of the accuracy and consistency of data over its entire life-cycle. It is a critical aspect of the design, implementation, and usage of any system which stores, processes, or retrieves data.
The relational model has provided the basis for:
● Research on the theory of data/relationship/constraint
● Numerous database design methodologies
● The standard database access language called structured query language (SQL)
● Almost all modern commercial database management systems
Relational Database Design Process

Step 1: Define the Purpose of the Database (Requirement Analysis)

● Gather the requirements and define the objective of your database. Drafting sample input forms, queries and reports often helps.
Step 2: Gather Data, Organize in tables and Specify the Primary Keys

● Once you have decided on the purpose of the database, gather the data that needs to be stored in the database. Divide the data into subject-based tables. Choose one column (or a few columns) as the so-called primary key, which uniquely identifies each of the rows.
Step 3: Create Relationships among Tables
Step 4: Refine & Normalize the Design

● add more columns,
● create a new table for optional data using a one-to-one relationship,
● split a large table into two smaller tables.
Keys in DBMS

● A key in DBMS is an attribute or a set of attributes that help to uniquely identify a tuple (or
row) in a relation (or table). Keys are also used to establish relationships between the
different tables and columns of a relational database. Individual values in a key are called key
values.
● A key is used in the definitions of various kinds of integrity constraints. A table in a database
represents a collection of records or events for a particular relation. Now there can be
thousands and thousands of such records, some of which may be duplicated.
● There should be a way to identify each record separately and uniquely, i.e. no duplicates.
Keys allow us to be free from this hassle.
● A key could either be a combination of more than one attribute (or columns) or just a single
attribute. The main motive of this is to give each record a unique identity.
Types of Keys in DBMS

There are broadly seven types of keys in DBMS:

1. Primary Key
2. Candidate Key
3. Super Key
4. Foreign Key
5. Composite Key
6. Alternate Key
7. Unique Key
1. Primary Key

A primary key is a column of a table or a set of columns that helps to identify every record present
in that table uniquely. There can be only one primary Key in a table. Also, the primary Key cannot
have the same values repeating for any row. Every value of the primary key has to be different with
no repetitions.

The PRIMARY KEY (PK) constraint put on a column or set of columns will not allow them to have
any null values or any duplicates. One table can have only one primary key constraint.
2. Super Key

A super key is any set of attributes that uniquely identifies the rows of a table. This means that every set of columns capable of uniquely identifying the rows of that table is considered a super key.

A super key is a superset of a candidate key. The primary key of a table is picked from the set of super keys to be made the table's identity attribute.


3. Candidate Key

A candidate key is a minimal set of attributes that uniquely identifies the rows of a table. The primary key of a table is selected from the candidate keys, so candidate keys have the same properties as the primary key explained above. There can be more than one candidate key in a table.

4. Alternate Key

As stated above, a table can have multiple choices for a primary key; however, it can choose only one. So, all the candidate keys which did not become the primary key are called alternate keys.
5. Foreign Key
A foreign key is used to establish a relationship between two tables. A foreign key requires each value in a column or set of columns to match the primary key of the referenced table. Foreign keys help to maintain data and referential integrity.

6. Composite Key
Composite Key is a set of two or more attributes that help identify each tuple in a table uniquely. The
attributes in the set may not be unique when considered separately. However, when taken all together, they
will ensure uniqueness.
7. Unique Key

A unique key is a column or set of columns that uniquely identifies each record in a table. All values will have to be unique in this key. A unique key differs from a primary key because it can contain a null value (some systems allow only one null, others allow several), whereas a primary key cannot have any null values.
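
The relationship between super keys, candidate keys and composite keys can be illustrated with a small Python sketch that tests which attribute sets are unique in a given table. The STUDENT rows and the helper names are assumptions for illustration. Checking one sample of rows can only show that a set of attributes is unique in that sample; a real key is a rule about every valid state of the table.

```python
from itertools import combinations

# Hypothetical STUDENT rows.
STUDENT = [
    {"roll_no": 1, "email": "a@example.com", "name": "Asha", "city": "Pune"},
    {"roll_no": 2, "email": "r@example.com", "name": "Ravi", "city": "Pune"},
    {"roll_no": 3, "email": "m@example.com", "name": "Asha", "city": "Delhi"},
]


def is_superkey(rows, attrs):
    """True if no two rows agree on all the attributes in attrs."""
    seen = set()
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values in seen:
            return False
        seen.add(values)
    return True


def candidate_keys(rows, attributes):
    """Minimal super keys, found by trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(key) <= set(attrs) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


keys = candidate_keys(STUDENT, ["roll_no", "email", "name", "city"])
print(keys)
```

In this sample, `roll_no` and `email` are each unique on their own, and `(name, city)` is a composite key: neither attribute is unique alone, but the pair is. If `roll_no` is chosen as the primary key, the other two are alternate keys.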
Types of dependencies in DBMS

A dependency in DBMS is a relation between two or more attributes. It has the following types in DBMS −

● Functional Dependency
● Fully-Functional Dependency
● Transitive Dependency
● Multivalued Dependency
● Partial Dependency
Functional Dependency

A functional dependency is a relationship between two attributes, or two sets of attributes, of a relation. It typically exists between the primary key and a non-key attribute within a table.

X → Y

The left side of an FD is known as the determinant, and the right side is known as the dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee table, because if we know the Emp_Id, we can tell the employee name associated with it.

Functional dependency can be written as:

Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
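
Whether a functional dependency is violated by a given set of rows can be checked mechanically: X → Y fails as soon as two rows agree on X but differ on Y. A minimal sketch, with invented employee rows and a hypothetical helper `fd_holds`:

```python
# Hypothetical employee rows.
EMPLOYEE = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Mumbai"},
]


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs while differing on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


print(fd_holds(EMPLOYEE, ["Emp_Id"], ["Emp_Name"]))
print(fd_holds(EMPLOYEE, ["Emp_Name"], ["Emp_Address"]))
```

The first check succeeds. The second fails because the two rows named Asha have different addresses. A successful check on sample rows does not prove the dependency in general, since an FD is a statement about all valid data.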
Types of Functional dependency

1. Trivial functional dependency

● A → B has trivial functional dependency if B is a subset of A.


● The following dependencies are also trivial like: A → A, B → B

Example:

Consider a table with two columns Employee_Id and Employee_Name.


{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency as Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.
In a trivial functional dependency, the dependent is always a subset of the determinant. To put it another way, a functional dependency is said to be trivial if its right-side attributes are a subset of its left-side attributes.

If Y is a subset of X, the functional dependency X -> Y is referred to as trivial.

Example
Because the dependent {Name} is a subset of the determinant {Employee_Id, Name}, the functional dependency {Employee_Id, Name} -> {Name} is trivial.

Similarly, Employee_Id -> Employee_Id, Name -> Name and Age -> Age are trivial.
2. Non-trivial functional dependency

● A → B is a non-trivial functional dependency if B is not a subset of A.

● When the intersection of A and B is empty, A → B is called completely non-trivial.

Example:

ID → Name,

Name → DOB
A non-trivial functional dependency is the opposite of a trivial one. Formally, the dependent is not a subset of the determinant in a non-trivial functional dependency.

If Y is not a subset of X, the functional dependency X -> Y is said to be non-trivial. Here X and Y are both sets of attributes of the relation, and Y is not a subset of X.
Because Name (the dependent) is not a subset of Employee_Id, Employee_Id -> Name is a non-trivial functional dependency.

The functional dependency {Employee_Id, Name} -> {Age} is likewise non-trivial.
Fully Functional Dependency
An attribute is fully functionally dependent on a set of attributes if it is functionally dependent on that set and not on any of its proper subsets.
For example, an attribute Q is fully functionally dependent on P if it is functionally dependent on P and not on any proper subset of P.

The above relations state:

{EmpID, ProjectID, ProjectCost} -> Days

However, Days is not fully functionally dependent on this set,

because the proper subset {EmpID, ProjectID} can already determine the Days spent on the project by the employee.

This gives our fully functional dependency −

{EmpID, ProjectID} -> Days


Transitive Dependency
When an indirect relationship causes a functional dependency, it is called a transitive dependency.
If P -> Q and Q -> R both hold, then P -> R is a transitive dependency.

Multivalued Dependency
A multivalued dependency occurs when the existence of one or more rows in a table implies the existence of one or more other rows in the same table.
If a table has attributes P, Q and R, then Q and R are multi-valued facts of P.
It is represented by a double arrow − (->->)
For our example:
P->->Q
P->->R
In the above case, the multivalued dependency exists only if Q and R are independent attributes.
Partial Dependency
Partial Dependency occurs when a nonprime attribute is functionally dependent on part of a candidate key.
The 2nd Normal Form (2NF) eliminates the Partial Dependency. Let us see an example −
<StudentProject>

In the above table, we have partial dependency; let us see how −

The prime key attributes are StudentID and ProjectNo.
As stated, the non-prime attributes, i.e. StudentName and ProjectName, are partially dependent if they are functionally dependent on part of a candidate key.
StudentName can be determined by StudentID alone, which makes the relation partially dependent.
ProjectName can be determined by ProjectNo alone, which also makes the relation partially dependent.
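
Partial dependencies can be found mechanically by asking whether any proper subset of the key already determines a non-prime attribute. The sketch below assumes that the relation has a single candidate key {StudentID, ProjectNo} and exactly the two functional dependencies named above. The helper names are hypothetical.

```python
from itertools import combinations

ATTRIBUTES = ["StudentID", "ProjectNo", "StudentName", "ProjectName"]
KEY = ("StudentID", "ProjectNo")  # assumed to be the only candidate key
FDS = [
    (("StudentID",), ("StudentName",)),
    (("ProjectNo",), ("ProjectName",)),
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Pairs (part of key, non-prime attribute) where the part alone suffices."""
    non_prime = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) & non_prime):
                found.append((part, attr))
    return found


partial = partial_dependencies(KEY, ATTRIBUTES, FDS)
print(partial)
```

Each reported pair is a reason the relation is not in 2NF. Moving StudentName into a table keyed by StudentID, and ProjectName into a table keyed by ProjectNo, removes both.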
Armstrong’s Axioms in Functional Dependency in DBMS

● Armstrong's Axioms is a set of rules.

● It provides a simple technique for reasoning about functional dependencies.

● It was developed by William W. Armstrong in 1974.

● It is used to infer all the functional dependencies on a relational database.

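Various Axioms Rules

A. Primary Rules
● Reflexivity: if Y is a subset of X, then X → Y.
● Augmentation: if X → Y, then XZ → YZ for any attribute set Z.
● Transitivity: if X → Y and Y → Z, then X → Z.

B. Secondary Rules (derivable from the primary rules)
● Union: if X → Y and X → Z, then X → YZ.
● Decomposition: if X → YZ, then X → Y and X → Z.
● Pseudo-transitivity: if X → Y and WY → Z, then WX → Z.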
A set of functional dependencies cannot be reduced any further if it has the following properties:

1. The right-hand side of every functional dependency holds only one attribute.

2. The left-hand side of every functional dependency cannot be reduced: removing an attribute from it changes the closure of the set.

3. Removing any functional dependency changes the closure of the set.

A set of functional dependencies with the above three properties is called canonical or minimal.
How to find functional dependencies for a relation?
The functional dependencies of a relation depend on the domain of the relation. Consider the STUDENT relation given in Table 1.

We know that STUD_NO is unique for each student. So STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE, STUD_NO->STUD_COUNTRY and STUD_NO->STUD_AGE all hold.

Similarly, STUD_STATE->STUD_COUNTRY holds, because if two records have the same STUD_STATE, they will have the same STUD_COUNTRY as well.

For the relation STUDENT_COURSE, COURSE_NO->COURSE_NAME holds, as two records with the same COURSE_NO will have the same COURSE_NAME.
Functional Dependency Set: Functional Dependency set or FD set of a relation is
the set of all FDs present in the relation. For Example, FD set for relation STUDENT
shown in table 1 is:

{ STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE,
STUD_NO->STUD_STATE, STUD_NO->STUD_COUNTRY, STUD_NO ->
STUD_AGE, STUD_STATE->STUD_COUNTRY }
Attribute Closure: Attribute closure of an attribute set can be defined as set of attributes
which can be functionally determined from it.

How to find attribute closure of an attribute set?

To find the attribute closure of an attribute set:
1. Add the elements of the attribute set to the result set.
2. Repeatedly add to the result set every attribute that can be functionally determined from the attributes already in the result set, until nothing more can be added.
Using FD set of table 1, attribute closure can be determined as:
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE}
(STUD_STATE)+ = {STUD_STATE, STUD_COUNTRY}
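
The closure procedure can be sketched in a few lines of Python. The FD set is the one given above for the STUDENT relation. The function name `attribute_closure` and the representation of each FD as a pair of tuples are choices made for this sketch.

```python
# FD set of the STUDENT relation, each FD written as (left side, right side).
FDS = [
    (("STUD_NO",), ("STUD_NAME",)),
    (("STUD_NO",), ("STUD_PHONE",)),
    (("STUD_NO",), ("STUD_STATE",)),
    (("STUD_NO",), ("STUD_COUNTRY",)),
    (("STUD_NO",), ("STUD_AGE",)),
    (("STUD_STATE",), ("STUD_COUNTRY",)),
]


def attribute_closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    closure = set(attrs)  # step 1: start with the attribute set itself
    changed = True
    while changed:        # step 2: repeat until nothing new can be added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


print(sorted(attribute_closure(["STUD_NO"], FDS)))
print(sorted(attribute_closure(["STUD_STATE"], FDS)))
```

Because (STUD_NO)+ contains every attribute of STUDENT, STUD_NO is a super key of the relation. This is the usual way closures are used to find keys.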
Relational Decomposition
● When a relation in the relational model is not in an appropriate normal form, decomposition of the relation is required.
● In a database, decomposition breaks a table into multiple tables.
● If a relation is not decomposed properly, it may lead to problems like loss of information.
● Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies, and redundancy.

Types of Relation decomposition:

1. Lossless Decompositions
2. Dependency Preserving
Lossless Decomposition

● Lossless join decomposition is a decomposition of a relation R into relations R1 and R2 such that a natural join of R1 and R2 returns the original relation R. This is effective in removing redundancy from databases while preserving the original data.
● In other words, lossless decomposition makes it feasible to reconstruct the relation R from the decomposed tables R1 and R2 by using joins.
● In lossless decomposition, the attributes common to R1 and R2 must form a candidate key or super key of R1, of R2, or of both.
● Decomposition of a relation R into R1 and R2 is a lossless-join decomposition if at least one of the following functional dependencies is in F+ (the closure of the functional dependencies):
R1 ∩ R2 → R1
R1 ∩ R2 → R2
EMPLOYEE_DEPARTMENT table:

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

Now, when these two relations are joined on the common column "EMP_ID", the resultant relation will look like:
Hence, the decomposition is a lossless-join decomposition.
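The check described above (decompose, join back, compare with the original) can be sketched in Python. The rows below are hypothetical sample data, since the original tables are not reproduced here; the sketch also only tests one relation instance, whereas losslessness is a property of the schema and its functional dependencies.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    seen = {tuple((a, row[a]) for a in attrs) for row in rows}
    return [dict(t) for t in seen]


def natural_join(r1, r2):
    """Natural join on every attribute name the two relations share."""
    if not r1 or not r2:
        return []
    common = set(r1[0]) & set(r2[0])
    return [
        {**a, **b}
        for a in r1
        for b in r2
        if all(a[c] == b[c] for c in common)
    ]


def as_set(rows):
    """Order-insensitive form of a relation, used for comparison."""
    return {frozenset(row.items()) for row in rows}


# Hypothetical sample rows for EMPLOYEE_DEPARTMENT.
EMPLOYEE_DEPARTMENT = [
    {"EMP_ID": 1, "EMP_NAME": "Asha", "DEPT_ID": 10, "DEPT_NAME": "Sales"},
    {"EMP_ID": 2, "EMP_NAME": "Ravi", "DEPT_ID": 20, "DEPT_NAME": "HR"},
    {"EMP_ID": 3, "EMP_NAME": "Meena", "DEPT_ID": 10, "DEPT_NAME": "Sales"},
]

# Decomposition sharing EMP_ID, which is a key of both parts.
employee = project(EMPLOYEE_DEPARTMENT, ["EMP_ID", "EMP_NAME"])
department = project(EMPLOYEE_DEPARTMENT, ["EMP_ID", "DEPT_ID", "DEPT_NAME"])
is_lossless = as_set(natural_join(employee, department)) == as_set(EMPLOYEE_DEPARTMENT)

# For contrast: sharing only DEPT_NAME (not a key) produces spurious tuples.
by_name = project(EMPLOYEE_DEPARTMENT, ["EMP_NAME", "DEPT_NAME"])
spurious = natural_join(by_name, department)
is_lossy = as_set(spurious) != as_set(EMPLOYEE_DEPARTMENT)
print(is_lossless, is_lossy, len(spurious))
```

In the second split, the join pairs every "Sales" employee name with every "Sales" employee id, so the result contains rows that were never in the original table.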
Dependency Preserving

● It is an important constraint of the database.
● In dependency preservation, every dependency must be satisfied by at least one decomposed table.
● If a relation R is decomposed into relations R1 and R2, then each dependency of R must either be a part of R1 or R2 or be derivable from the combination of the functional dependencies of R1 and R2.
● For example, suppose there is a relation R(A, B, C, D) with functional dependency set {A->BC}. The relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD A->BC is a part of relation R1(ABC).
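The example above can be checked mechanically. The sketch below uses the standard textbook test, which avoids computing the full closure F+: for each FD X -> Y it grows X using only what can be derived inside each decomposed schema, then checks that Y is reached. The function names are illustrative, not part of the original notes.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(fds, schemas):
    """True if every FD in `fds` can be enforced using the schemas alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in schemas:
                # Attributes derivable from what we have, restricted to this schema.
                gained = closure(result & schema, fds) & schema
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True


F = [({"A"}, {"B", "C"})]  # A -> BC
preserved = preserves_dependencies(F, [set("ABC"), set("AD")])
# Contrast: B and C end up in different tables, so A -> BC cannot be checked locally.
not_preserved = preserves_dependencies(F, [set("AB"), set("CD")])
print(preserved, not_preserved)
```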
Multivalued Dependency

● Multivalued dependency occurs when two attributes in a table are independent of each other but both depend on a third attribute.
● A multivalued dependency involves at least two attributes that depend on a third attribute, which is why it always requires at least three attributes.

Example: Suppose there is a bike manufacturing company which produces two colors (white and black) of each model every year.
Here the columns COLOR and MANUF_YEAR depend on BIKE_MODEL and are independent of each other. In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. These dependencies are represented as:
BIKE_MODEL →→ MANUF_YEAR
BIKE_MODEL →→ COLOR
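To make the definition concrete, the sketch below tests a multivalued dependency X →→ Y on one relation instance: whenever two rows agree on X, the row that takes its Y values from the first and its remaining values from the second must also be present. The `BIKES` rows are hypothetical sample data, since the original table is not reproduced here.

```python
def mvd_holds(rows, x, y):
    """Check X ->> Y on one relation instance (rows are dicts)."""
    z = set(rows[0]) - set(x) - set(y)  # the remaining attributes
    present = {frozenset(row.items()) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Mix t1's X and Y values with t2's remaining values.
                mixed = {a: (t2[a] if a in z else t1[a]) for a in t1}
                if frozenset(mixed.items()) not in present:
                    return False
    return True


# Hypothetical data: every model appears with every (year, color) combination.
BIKES = [
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2020, "COLOR": "White"},
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2020, "COLOR": "Black"},
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2021, "COLOR": "White"},
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2021, "COLOR": "Black"},
    {"BIKE_MODEL": "M2", "MANUF_YEAR": 2021, "COLOR": "White"},
    {"BIKE_MODEL": "M2", "MANUF_YEAR": 2021, "COLOR": "Black"},
]

color_mvd = mvd_holds(BIKES, ["BIKE_MODEL"], ["COLOR"])
year_mvd = mvd_holds(BIKES, ["BIKE_MODEL"], ["MANUF_YEAR"])
# Dropping one (year, color) combination of M1 breaks the independence.
broken = mvd_holds(BIKES[1:], ["BIKE_MODEL"], ["COLOR"])
print(color_mvd, year_mvd, broken)
```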
Join Dependency

● Join dependency is a further generalization of multivalued dependency.
● If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD) exists.
● Here R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relation R(A, B, C, D).
● Alternatively, R1 and R2 are a lossless decomposition of R.
● A JD ⋈ {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2, ..., Rn is a lossless-join decomposition of R.
● *((A, B, C), (C, D)) is a JD of R if the join of these projections of R is equal to the relation R.
● Here, *(R1, R2, R3, ...) is used to indicate that the relations R1, R2, R3 and so on are a JD of R.
Inclusion Dependency
● Multivalued dependencies and join dependencies can be used to guide database design, although both are less common than functional dependencies.
● Inclusion dependencies are quite common. They typically have little influence on the design of the database.
● An inclusion dependency is a statement that some columns of a relation are contained in other columns.
● A foreign key is an example of an inclusion dependency: the foreign key column(s) of the referring relation are contained in the primary key column(s) of the referenced relation.
● Suppose we have two relations R and S which were obtained by translating two entity sets such that every R entity is also an S entity.
● An inclusion dependency holds if projecting R on its key attributes yields a relation that is contained in the relation obtained by projecting S on its key attributes.
● We should not split groups of attributes that participate in an inclusion dependency.
● In practice, most inclusion dependencies are key-based, that is, they involve only keys.
Query Processing in DBMS

Query processing is the activity of extracting data from the database. It takes several steps to fetch the data. The steps involved are:

1. Parsing and translation
2. Optimization
3. Evaluation
Parsing and Translation
● Query processing includes certain activities for data retrieval.
● Initially, the user query is given in a high-level database language such as SQL. It is then translated into expressions that can be used at the physical level of the file system.
● After this, a variety of query-optimizing transformations and the actual evaluation of the query take place. Thus, before processing a query, the system needs to translate it from its human-readable form into an internal form that the system can work with.
● SQL (Structured Query Language) is well suited for humans, but it is not well suited for the internal representation of the query within the system.
● Relational algebra is well suited for the internal representation of a query. The translation process in query processing is similar to the work of a parser in a compiler.
● When a user executes a query, the parser generates the internal form of the query: it checks the syntax of the query and verifies the names of the relations in the database, the tuples, and finally the required attribute values.
● The parser creates a tree of the query, known as a 'parse tree', and then translates it into relational algebra. In doing so, it also replaces every use of a view in the query with the definition of that view.
Suppose a user executes a query. As we have learned, there are various methods of extracting data from the database. Suppose a user wants to fetch the names of the employees whose salary is greater than 10000. In SQL, the following query is used:

select emp_name from Employee where salary>10000;

Thus, to make the system understand the user query, it needs to be translated into relational algebra. The query can be written in relational algebra in more than one way, for example:

● πemp_name (σsalary>10000 (Employee))
● πemp_name (σsalary>10000 (πemp_name, salary (Employee)))

After translating the given query, we can execute each relational algebra operation by using different algorithms. This is how query processing begins.
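As a rough illustration of how one SQL query maps to more than one relational algebra expression, the sketch below evaluates the query above in two operator orders. The `select` and `project` helpers and the `EMPLOYEE` rows are hypothetical stand-ins, not part of any DBMS; like SQL without DISTINCT, `project` keeps duplicates.

```python
def select(rows, predicate):
    """Relational selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """Relational projection (pi): keep only the named attributes."""
    return [{a: row[a] for a in attrs} for row in rows]


# Hypothetical sample data.
EMPLOYEE = [
    {"emp_id": 1, "emp_name": "Asha", "salary": 15000},
    {"emp_id": 2, "emp_name": "Ravi", "salary": 9000},
    {"emp_id": 3, "emp_name": "Meena", "salary": 12000},
]


def high_salary(row):
    return row["salary"] > 10000


# Plan A: select first, then project emp_name.
plan_a = project(select(EMPLOYEE, high_salary), ["emp_name"])

# Plan B: drop unused columns first, then select, then project emp_name.
plan_b = project(
    select(project(EMPLOYEE, ["emp_name", "salary"]), high_salary),
    ["emp_name"],
)
print(plan_a)
print(plan_b)
```

Both plans return the same rows; they differ only in how much data flows between the operators, which is what the optimizer later compares.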
Evaluation

In addition to the relational algebra translation, the translated relational algebra expression must be annotated with instructions that specify how to evaluate each operation. Thus, after translating the user query, the system executes a query evaluation plan.

Query Evaluation Plan

● In order to fully evaluate a query, the system needs to construct a query evaluation plan.
● The annotations in the evaluation plan may specify the algorithm to be used for a specific operation or the particular index to use.
● A relational algebra operation annotated in this way is referred to as an evaluation primitive. The evaluation primitives carry the instructions needed for the evaluation of the operation.
● Thus, a query evaluation plan defines a sequence of primitive operations used for evaluating a query. The query evaluation plan is also referred to as the query execution plan.
● A query execution engine is responsible for generating the output of the given query. It takes the query execution plan, executes it, and returns the answers to the user query.
Optimization
● The cost of query evaluation can vary for different types of queries. Because the system is responsible for constructing the evaluation plan, the user does not need to write the query efficiently.
● Usually, a database system generates an efficient query evaluation plan, which minimizes its cost. This task is performed by the database system and is known as query optimization.
● To optimize a query, the query optimizer needs an estimated cost for each operation, because the overall cost depends on the memory allocated to the operations, their execution costs, and so on.
Evaluation of Expressions

For evaluating an expression that carries multiple operations in it, we can perform the
computation of each operation one by one. However, in the query processing system, we use
two methods for evaluating an expression carrying multiple operations. These methods are:

1. Materialization
2. Pipelining
Materialization

● In this method, the expression is evaluated one relational operation at a time, with each operation evaluated in an appropriate order.
● After each operation is evaluated, its output is materialized in a temporary relation for subsequent use. This is the source of the method's main disadvantage.
● The disadvantage is that it needs to construct temporary relations to hold the results of the evaluated operations.
● These temporary relations are written to disk unless they are small in size.
● Evaluate one operation at a time: evaluate the expression in a bottom-up manner and store intermediate results in temporary files. For example, for (A ⋈ B) ⋈ (C ⋈ D):
Store the result of A ⋈ B in a temporary file.
Store the result of C ⋈ D in a temporary file.
Finally, join the results stored in the temporary files.
● Overall cost = sum of the costs of the individual operations + cost of writing intermediate results to disk. The cost of writing results to temporary files and reading them back is quite high.
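The bottom-up evaluation with temporary results can be sketched as follows. This is only an in-memory illustration: the `temp_files` dictionary stands in for temporary files on disk, and the four small relations and their shared column `k` are hypothetical.

```python
def join(left, right, key):
    """Equi-join of two lists of dict rows on a shared column."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# Hypothetical relations sharing the join column "k".
A = [{"k": 1, "a": "a1"}, {"k": 2, "a": "a2"}]
B = [{"k": 1, "b": "b1"}, {"k": 2, "b": "b2"}]
C = [{"k": 2, "c": "c2"}, {"k": 3, "c": "c3"}]
D = [{"k": 2, "d": "d2"}, {"k": 3, "d": "d3"}]

temp_files = {}  # stands in for temporary files on disk

# Each intermediate result is fully computed and stored before it is used.
temp_files["A_join_B"] = join(A, B, "k")
temp_files["C_join_D"] = join(C, D, "k")

# The final join reads the stored intermediate results back.
result = join(temp_files["A_join_B"], temp_files["C_join_D"], "k")

# Every intermediate tuple is written once and read once: the extra cost.
tuples_written = sum(len(rows) for rows in temp_files.values())
print(result)
print(tuples_written)
```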
Pipelining
● Pipelining is an alternative to the materialization method. It enables us to evaluate several relational operations of the expression simultaneously in a pipeline.
● In this approach, the output of one operation is passed on to the next operation as it is produced, and the chain continues until all the relational operations are evaluated. Thus, there is no need to store a temporary relation in pipelining.
● This advantage often makes pipelining a better approach than materialization. The costs of the two approaches can differ considerably, but each approach performs best in different cases.
● Thus, both approaches are useful in their place.
● Evaluate several operations simultaneously: the result of one operation is passed to the next operation. Evaluate the expression in a bottom-up manner and do not store intermediate results in temporary files.
● Do not store the result of A ⋈ B in a temporary file. Instead, the result is passed directly to the next operation involving C, and so on.
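Python generators give a convenient way to picture pipelining: each operator pulls one row at a time from the operator below it, so no intermediate relation is ever built. The operators and the `EMPLOYEE` rows below are hypothetical, and the `scanned` list exists only to show how much input has been read at each point.

```python
scanned = []  # records which rows the scan has produced so far


def scan(rows):
    for row in rows:
        scanned.append(row["emp_name"])
        yield row


def select(rows, predicate):
    for row in rows:
        if predicate(row):
            yield row


def project(rows, attrs):
    for row in rows:
        yield {a: row[a] for a in attrs}


# Hypothetical sample data.
EMPLOYEE = [
    {"emp_name": "Asha", "salary": 15000},
    {"emp_name": "Ravi", "salary": 9000},
    {"emp_name": "Meena", "salary": 12000},
]

pipeline = project(
    select(scan(EMPLOYEE), lambda r: r["salary"] > 10000),
    ["emp_name"],
)

first = next(pipeline)             # pulls just enough input for one output row
scanned_after_first = list(scanned)
rest = list(pipeline)              # drains the remaining rows
print(first, scanned_after_first, rest)
```

The first result is available after only one input row has been scanned, whereas a materializing evaluator would finish the whole selection before starting the projection.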
Query Equivalence
● Two relational expressions are said to be equivalent if both expressions generate the same set of records. When two expressions are equivalent, we can use them interchangeably, i.e., we can use whichever expression gives better performance.
● An equivalence rule says that expressions of two forms are equivalent because both produce the same output on any legal database instance. It means that we can replace an expression of the first form with one of the second form, and vice versa. The query optimizer uses such equivalence rules to transform expressions into logically equivalent ones.
● The optimizer uses various equivalence rules on relational-algebra expressions for transforming the relational expressions. For describing each rule, we will use the following symbols:

θ, θ1, θ2 … : Used for denoting the predicates.
L1, L2, L3 … : Used for denoting the lists of attributes.
E, E1, E2 … : Used for denoting the relational-algebra expressions.
Let's discuss a number of equivalence rules:
Joins in DBMS

● A join is an operation in a DBMS (Database Management System) that combines the rows of two or more tables based on related columns between them. It is denoted by ⨝.
● The main purpose of a join is to retrieve data from multiple tables using their common columns; in other words, a join is used to perform a multi-table query.
● Types of join:

1. Inner join
2. Outer join

Inner Join

● An inner join is a join operation that combines two or more tables based on related columns and returns only the rows that have matching values in the tables.

Inner join is of three types:

● Equi Join
● Natural Join
● Theta Join


Equi Join

● Equi join is a type of inner join in which the join condition uses the equality ('=') operator.
Natural Join

Natural join is a type of inner join in which no comparison operator needs to be written. In a natural join, the joined columns must have the same name and domain, and there should be at least one common attribute between the two tables.
Theta (θ) Join

Theta join combines tuples from different relations provided they satisfy the theta
condition. The join condition is denoted by the symbol θ.

Notation:

R1 ⋈θ R2

R1 and R2 are relations having attributes (A1, A2, ..., An) and (B1, B2, ..., Bn) such that the attributes have nothing in common, that is, R1 ∩ R2 = Φ.
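The three inner joins differ only in the join condition, which the following sketch tries to show. All table contents and column names are hypothetical, and the nested-loop evaluation is chosen for clarity rather than efficiency.

```python
def theta_join(r1, r2, theta):
    """Combine every pair of rows for which the condition theta holds."""
    return [{**a, **b} for a in r1 for b in r2 if theta(a, b)]


def natural_join(r1, r2):
    """Equality on all columns with the same name; the condition is implicit."""
    common = set(r1[0]) & set(r2[0])
    return theta_join(r1, r2, lambda a, b: all(a[c] == b[c] for c in common))


# Hypothetical sample data.
STUDENT = [
    {"stud_id": 1, "name": "Asha", "age": 18},
    {"stud_id": 2, "name": "Ravi", "age": 20},
]
ENROLL = [{"sid": 1, "course": "DB"}, {"sid": 3, "course": "OS"}]
COURSE_RULE = [{"subject": "DB", "min_age": 19}, {"subject": "OS", "min_age": 17}]
GRADE = [{"stud_id": 1, "grade": "A"}, {"stud_id": 4, "grade": "C"}]

# Equi join: the condition uses only '='.
equi = theta_join(STUDENT, ENROLL, lambda s, e: s["stud_id"] == e["sid"])

# Theta join: any comparison operator may appear in the condition.
theta = theta_join(STUDENT, COURSE_RULE, lambda s, c: s["age"] > c["min_age"])

# Natural join: joins on the shared column name stud_id.
natural = natural_join(STUDENT, GRADE)

print(len(equi), len(theta), len(natural))
```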
Outer Join

Outer join is a type of join that retrieves matching as well as non-matching records from the related tables.

There are three types of outer join:

● Left outer join
● Right outer join
● Full outer join
Left Outer Join

It is also called left join. This type of outer join retrieves all records from the left table and only the matching records from the right table.
Right Outer Join

It is also called right join. This type of outer join retrieves all records from the right table and only the matching records from the left table.
Full Outer Join

In a full outer join, all the rows from both tables are included in the result table.
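The three outer joins can be sketched in plain Python as below. The `EMP` and `DEPT` rows are hypothetical, unmatched columns are filled with `None` (playing the role of SQL NULL), and the helpers assume both inputs are non-empty and share a single join column.

```python
def left_outer_join(left, right, key):
    """All rows of `left`; columns of `right` are None where nothing matches."""
    right_attrs = [a for a in right[0] if a != key]
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            out.append({**l, **{a: None for a in right_attrs}})
    return out


def right_outer_join(left, right, key):
    """A right outer join is a left outer join with the operands swapped."""
    return left_outer_join(right, left, key)


def full_outer_join(left, right, key):
    """Left outer join plus the rows of `right` that matched nothing."""
    out = left_outer_join(left, right, key)
    left_keys = {l[key] for l in left}
    left_attrs = [a for a in left[0] if a != key]
    for r in right:
        if r[key] not in left_keys:
            out.append({**{a: None for a in left_attrs}, **r})
    return out


# Hypothetical sample data.
EMP = [{"dept_id": 10, "emp": "Asha"}, {"dept_id": 30, "emp": "Ravi"}]
DEPT = [{"dept_id": 10, "dept": "Sales"}, {"dept_id": 20, "dept": "HR"}]

left_rows = left_outer_join(EMP, DEPT, "dept_id")
right_rows = right_outer_join(EMP, DEPT, "dept_id")
full_rows = full_outer_join(EMP, DEPT, "dept_id")
print(left_rows)
print(right_rows)
print(full_rows)
```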
Query Optimization in DBMS
There are two methods of query optimization. The query optimizer uses these two techniques to determine which process or expression to consider for evaluating the query.

1. Cost based Optimization (Physical)

This is based on the cost of the query. The query can use different paths based on indexes, constraints, sorting methods etc. This method mainly uses statistics such as record size, number of records, number of records per block, number of blocks, table size, whether the whole table fits in a block, organization of tables, uniqueness of column values, size of columns etc.

2. Heuristic Optimization (Logical)

This method is also known as rule based optimization. It is based on the equivalence rules on relational expressions, so the number of query combinations to consider is reduced. Hence the cost of the query is reduced too.
1. Cost based Optimization (Physical)
Suppose we have a series of tables joined in a query.

T1 ⋈ T2 ⋈ T3 ⋈ T4 ⋈ T5 ⋈ T6

For the above query we can have any order of evaluation. We can start with any two tables in any order and start evaluating the query. In general, n tables can be joined in (2(n-1))! / (n-1)! ways.

For example, if 5 tables are involved in a join, there are 8! / 4! = 1680 combinations. But when the query optimizer runs, it does not always evaluate all of these. It uses dynamic programming, in which it generates the costs of the join orders for each combination of tables.

The least cost for each combination of tables is calculated only once, then stored for future use. That is, given a set of tables T = {T1, T2, T3, ..., Tn}, the optimizer generates the least-cost join order for each combination of the tables and stores it.
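The count of join orders quoted above grows very quickly, which is why the optimizer avoids enumerating them all. A short check of the formula (the function name is illustrative):

```python
from math import factorial


def join_orders(n):
    """Number of join orders for n tables: (2(n-1))! / (n-1)!"""
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in range(2, 8):
    print(n, join_orders(n))

five_tables = join_orders(5)
six_tables = join_orders(6)
```

For 5 tables this gives 1680, matching the example, and for the six-table query T1 ⋈ ... ⋈ T6 it already gives 30240.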
2. Heuristic Optimization (Logical)
This method creates a relational tree for the given query based on the equivalence rules. By providing alternative ways of writing and evaluating the query, the equivalence rules give a better path for evaluating it. A rule need not improve the query in every case, so the result needs to be examined after the rules are applied. The most important rules followed in this method are listed below:

● Perform all selection operations as early as possible in the query. This should be the first and foremost set of actions on the tables in the query. By performing the selection early, we reduce the number of records involved in the query, rather than using the whole tables throughout the query.

Suppose we have a query to retrieve the students with age 18 who are studying in class DESIGN_01. We can get the student details from the STUDENT table and the class details from the CLASS table. We can write this query in two different ways.
Here both queries return the same result. But when we look at them closely, we can see that the first query joins the two tables first and then applies the filters. That means it traverses the whole tables to join them, so the number of records involved is larger. The second query applies the filters on each table first. This reduces the number of records in each table (in the CLASS table, the number of records reduces to one in this case). Then it joins these intermediate tables. Hence the cost in this case is comparatively lower.
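The difference between the two approaches can be made visible by counting how many row pairs a simple nested-loop join has to compare. The rows, the column names, and the pair count as a cost measure are all assumptions made for this sketch.

```python
# Hypothetical sample data.
STUDENT = [
    {"stud": "Asha", "age": 18, "class_id": "DESIGN_01"},
    {"stud": "Ravi", "age": 19, "class_id": "DESIGN_01"},
    {"stud": "Meena", "age": 18, "class_id": "TEST_01"},
    {"stud": "John", "age": 18, "class_id": "DESIGN_01"},
]
CLASS = [
    {"class_id": "DESIGN_01", "room": "R1"},
    {"class_id": "TEST_01", "room": "R2"},
    {"class_id": "DEV_01", "room": "R3"},
]


def join(left, right, key):
    """Nested-loop join; also returns how many row pairs were compared."""
    pairs = len(left) * len(right)
    rows = [{**l, **r} for l in left for r in right if l[key] == r[key]]
    return rows, pairs


# First way: join the whole tables, then filter.
joined, pairs_join_first = join(STUDENT, CLASS, "class_id")
late = [r for r in joined if r["age"] == 18 and r["class_id"] == "DESIGN_01"]

# Second way: filter each table, then join the smaller inputs.
students_18 = [s for s in STUDENT if s["age"] == 18]
design_01 = [c for c in CLASS if c["class_id"] == "DESIGN_01"]
early, pairs_filter_first = join(students_18, design_01, "class_id")

print(pairs_join_first, pairs_filter_first)
print([r["stud"] for r in early])
```

Both ways return the same students, but the filtered inputs require far fewer comparisons, and the gap widens as the tables grow.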

Instead of rewriting the query, the optimizer creates the relational algebra expression and tree for the above case.

● Perform all projections as early as possible in the query. This is similar to selection, but it reduces the number of columns in the query.
Query optimization

Query optimization is of great importance for the performance of a relational database, especially for the execution of complex SQL statements. A
query optimizer decides the best methods for implementing each query.

The query optimizer selects, for instance, whether or not to use indexes for a given query, and which join methods to use when joining multiple tables. These decisions have a tremendous effect on SQL performance, and query optimization is a key technology for every application, from operational systems to data warehouses and analytical systems to content management systems.

The main principles of query optimization are as follows −

● Understand how your database is executing your query − The first phase of query optimization is understanding what the database is doing. Different databases have different commands for this. For example, in MySQL, one can use the “EXPLAIN [SQL Query]” statement to see the query plan. In Oracle, one can use “EXPLAIN PLAN FOR [SQL Query]” to see the query plan.
● Retrieve as little data as possible − The more information a query retrieves, the more resources the database has to expend to process and store these records. For example, if you only need to fetch one column from a table, do not use ‘SELECT *’.
● Store intermediate results − Sometimes the logic for a query can be quite complex. It is possible to produce the desired outcome through the use of subqueries, inline views, and UNION-type statements. With those methods, the intermediate results are not saved in the database but are used directly within the query. This can lead to performance issues, particularly when the intermediate results have a huge number of rows.
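As a small, runnable illustration of inspecting a query plan, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in for the MySQL and Oracle commands mentioned above; the table, index, and data are hypothetical, and the exact plan text depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee (emp_name, salary) VALUES (?, ?)",
    [("Asha", 15000), ("Ravi", 9000), ("Meena", 12000)],
)
# Hypothetical index on the filtered column.
conn.execute("CREATE INDEX idx_employee_salary ON employee (salary)")

# Fetch only the column that is needed instead of SELECT *.
query = "SELECT emp_name FROM employee WHERE salary > 10000"

# Each plan row ends with a human-readable description of one step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for row in plan:
    print(row[-1])

names = sorted(name for (name,) in conn.execute(query))
print(names)
```

Comparing the plan before and after creating the index is a simple way to see whether the database actually uses it.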
Various query optimization strategies are as follows −

● Use Index − Using an index is the first strategy one should use to speed up a query.
● Aggregate Table − Pre-populate tables at higher levels of aggregation so that less information needs to be parsed.
● Vertical Partitioning − Partition the table by columns. This reduces the amount of information a SQL query needs to process.
● Horizontal Partitioning − Partition the table by data value, most often by time. This reduces the amount of information a SQL query needs to process.
● De-normalization − The process of de-normalization combines multiple tables into a single table. This speeds up query execution because fewer table joins are required.
● Server Tuning − Each server has its own parameters, and tuning them so that the server can take full advantage of the hardware resources can significantly speed up query execution.
