Database Management Module - 2-2
Module -2
Relational Query Language
● A procedural query language consists of a set of queries that instruct the DBMS to perform
various operations in sequence to meet the user's request.
● For example, a get_CGPA procedure would contain queries to get the marks of a
student in each subject, calculate the total marks, and then decide the CGPA based
on the total marks.
● A procedural query language tells the database both what is required from the
database and how to get it. Relational algebra is a procedural
query language.
Non-procedural languages
1. Select Operation:
Notation: σ p(r)
Where:
σ is the selection operator
r is the relation
p is the selection predicate: a propositional logic formula that may use connectives such as AND, OR
and NOT, and relational operators such as =, ≠, ≥, <, >, ≤.
Eg:Loan Relation
Output:
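The selection operation can be sketched in a few lines of Python. The LOAN rows below are hypothetical stand-ins (the original example table is not reproduced here), and `select` is an illustrative helper name, not part of any library.

```python
# Hypothetical LOAN relation: each tuple is a dict of attribute -> value.
loan = [
    {"branch_name": "Downtown", "loan_no": "L-17", "amount": 1000},
    {"branch_name": "Perryridge", "loan_no": "L-15", "amount": 1500},
    {"branch_name": "Perryridge", "loan_no": "L-16", "amount": 1300},
]

def select(relation, predicate):
    """sigma_p(r): keep only the tuples of r for which predicate p is true."""
    return [t for t in relation if predicate(t)]

# sigma branch_name = 'Perryridge' AND amount > 1400 (LOAN)
perryridge_big = select(
    loan, lambda t: t["branch_name"] == "Perryridge" and t["amount"] > 1400
)
print(perryridge_big)
```

The predicate is passed as a function, so AND, OR and NOT map directly onto Python's `and`, `or` and `not`.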
2. Project Operation:
● This operation lists the attributes that we wish to appear in the result.
The rest of the attributes are eliminated from the table.
● It is denoted by ∏.
Where
Output:
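As a rough illustration of projection, the sketch below keeps only the requested attributes and removes the duplicate tuples that result. The CUSTOMER rows and the helper name `project` are assumptions made for this example.

```python
def project(relation, attributes):
    """Keep only the listed attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:  # a relation is a set, so duplicates are removed
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

# Hypothetical CUSTOMER relation.
customer = [
    {"name": "Jones", "street": "Main", "city": "Harrison"},
    {"name": "Smith", "street": "North", "city": "Rye"},
    {"name": "Hays", "street": "Main", "city": "Harrison"},
]

street_city = project(customer, ["street", "city"])
print(street_city)
```

Two of the three customers share the same street and city, so the projected relation has only two tuples.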
3. Union Operation:
● Suppose there are two relations R and S. The union operation contains all the
tuples that are in R, in S, or in both R and S.
● It eliminates duplicate tuples. It is denoted by ∪.
Notation: R ∪ S
Borrow Relations:
Input: ∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
4. Set Intersection:
● Suppose there are two relations R and S. The set intersection operation contains all tuples that are in both R and S.
● It is denoted by ∩.
Notation: R ∩ S
Output:
5. Set Difference:
● Suppose there are two relations R and S. The set difference operation contains all tuples that are in R but not in S.
● It is denoted by minus (-).
Notation: R - S
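Union, intersection and difference behave like the corresponding operations on Python sets. The sketch below uses hypothetical customer names for the projected BORROW and DEPOSITOR relations; it is only meant to show the three operators side by side.

```python
# Hypothetical results of projecting CUSTOMER_NAME from BORROW and DEPOSITOR.
borrow = {"Johnson", "Smith", "Hayes"}
depositor = {"Johnson", "Smith", "Lindsay"}

union = borrow | depositor          # R ∪ S: in either relation, duplicates removed
intersection = borrow & depositor   # R ∩ S: in both relations
difference = borrow - depositor     # R - S: in R but not in S

print(sorted(union), sorted(intersection), sorted(difference))
```

Set difference is not symmetric: `depositor - borrow` would contain only "Lindsay".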
6. Cartesian Product:
● The Cartesian product combines each row in one table with each row in the other table. It is also known as
a cross product.
● It is denoted by X.
Notation: E X D
Input: EMPLOYEE X DEPARTMENT
Output:
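A minimal sketch of the Cartesian product, assuming small hypothetical EMPLOYEE and DEPARTMENT relations stored as tuples. Every employee row is paired with every department row, so the result has (rows in E) × (rows in D) tuples.

```python
from itertools import product

# Hypothetical relations: EMPLOYEE(emp_id, emp_name, dept) and DEPARTMENT(dept_no, dept_name).
employee = [(1, "Smith", "A"), (2, "Harry", "C")]
department = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# EMPLOYEE X DEPARTMENT: concatenate each employee tuple with each department tuple.
cross = [e + d for e, d in product(employee, department)]

for row in cross:
    print(row)
```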
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
● Tuple relational calculus (TRC) is used to select the tuples in a relation. In TRC, the filtering variable
ranges over the tuples of a relation.
● The result can have one or more tuples.
Where
● TRC (tuple relational calculus) can be quantified: we can use the existential (∃) and
universal (∀) quantifiers.
For example:
● The second form of relational calculus is known as domain relational calculus. In domain relational calculus,
the filtering variables range over the domains of attributes.
● Domain relational calculus uses the same operators as tuple calculus. It uses the logical connectives ∧
(and), ∨ (or) and ¬ (not).
● It uses the existential (∃) and universal (∀) quantifiers to bind the variables.
Notation:{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where
OUTPUT: This query will yield the article, page, and subject from the relation javatpoint, where
the subject is a database.
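A Python set comprehension reads much like the domain calculus notation { a1, ..., an | P(a1, ..., an) }: it states which values are wanted and the condition they must satisfy, not the steps to fetch them. The rows of the javatpoint relation below are hypothetical.

```python
# Hypothetical javatpoint(article, page, subject) relation.
javatpoint = [
    ("Keys", 12, "database"),
    ("Loops", 40, "java"),
    ("Joins", 27, "database"),
]

# { <article, page, subject> | <article, page, subject> ∈ javatpoint ∧ subject = 'database' }
result = {(a, p, s) for (a, p, s) in javatpoint if s == "database"}
print(sorted(result))
```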
Relational Algebra vs Relational Calculus
Relational Algebra | Relational Calculus
It is a procedural language. | It is a declarative language.
Relational algebra states how to obtain the result. | Relational calculus states what result we have to obtain.
The order in which the operations have to be performed is specified. | The order is not specified.
Relational algebra is independent of the domain. | Relational calculus can be domain-dependent.
Relational algebra is nearer to a programming language. | Relational calculus is not nearer to a programming language.
SQL includes only some features from relational algebra. | SQL is based to a greater extent on the tuple relational calculus.
Relational algebra is one of the languages in which queries can be expressed, but the queries should also be expressible in relational calculus to be relationally complete. | For a database language to be relationally complete, the query written in it must be expressible in relational calculus.
Structured Query Language(SQL)
● SQL is a standard language for storing, manipulating and retrieving data in databases.
● SQL is the standard language for relational database systems. Relational Database Management
Systems (RDBMS) such as MySQL, MS Access, Oracle, Sybase, Informix, Postgres and SQL Server all use SQL as
their standard database language.
Applications of SQL
SQL is one of the most widely used query languages for databases. Some of its applications are listed here:
● Allows users to access data in the relational database management systems.
● Allows users to describe the data.
● Allows users to define the data in a database and manipulate that data.
● Allows SQL to be embedded within other languages using SQL modules, libraries & pre-compilers.
● Allows users to create and drop databases and tables.
● Allows users to create views, stored procedures and functions in a database.
● Allows users to set permissions on tables, procedures and views.
SQL Commands
● SQL commands are instructions used to communicate with the database and to perform
specific tasks, functions, and queries on data.
● SQL can perform various tasks such as creating a table, adding data to tables, dropping a table, modifying a table, and setting
permissions for users.
● DDL (Data Definition Language) changes the structure of the table: creating a table, deleting a table, altering a table, etc.
● In many systems (for example Oracle and MySQL), DDL commands are auto-committed, which means they permanently save all their changes in the database.
● CREATE
● ALTER
● DROP
● TRUNCATE
a. CREATE
It is used to create a new table in the database.
b. DROP
It is used to delete both the structure and the records stored in the table.
c. ALTER:
It is used to alter the structure of the database. This change could be either to modify the
characteristics of an existing attribute or probably to add a new attribute.
Syntax:
● INSERT
● UPDATE
● DELETE
a. INSERT:
The INSERT statement is used to insert rows of data into a table.
Syntax:
b. UPDATE
This command is used to update or modify the value of a column in the table.
Syntax:
UPDATE table_name SET [column_name1= value1,...column_nameN = valueN] [WHERE CONDITION]
Eg:
UPDATE students
SET User_Name = 'Sonoo'
WHERE Student_Id = '3'
c. DELETE:
It is used to remove one or more rows from a table.
Syntax: DELETE FROM table_name [WHERE condition];
Eg:
DELETE FROM javatpoint
WHERE Author='Sonoo';
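The three DML commands can be tried end to end with Python's built-in sqlite3 module. This is only a sketch: the in-memory database, the column types, and the second row are assumptions made for the demonstration, while the UPDATE statement mirrors the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE students (Student_Id TEXT PRIMARY KEY, User_Name TEXT)")

# INSERT: add rows to the table.
conn.execute("INSERT INTO students (Student_Id, User_Name) VALUES ('3', 'Old Name')")
conn.execute("INSERT INTO students (Student_Id, User_Name) VALUES ('4', 'Other')")

# UPDATE: change a column value in the rows matched by WHERE.
conn.execute("UPDATE students SET User_Name = 'Sonoo' WHERE Student_Id = '3'")

# DELETE: remove the rows matched by WHERE.
conn.execute("DELETE FROM students WHERE Student_Id = '4'")

rows = conn.execute("SELECT Student_Id, User_Name FROM students").fetchall()
print(rows)
```

Without a WHERE clause, UPDATE and DELETE would affect every row in the table.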
3. Data Control Language
DCL commands are used to grant and take back authority from any database user.
● Grant
● Revoke
1. Grant: It is used to give user access privileges to a database.
TCL commands can be used only with DML commands such as INSERT, DELETE and UPDATE.
DDL operations such as creating or dropping tables are automatically committed in the database, which is why TCL commands cannot be used with
them.
● COMMIT
● ROLLBACK
● SAVEPOINT
a. Commit: Commit command is used to save all the transactions to the database.
Syntax: COMMIT;
b. Rollback: Rollback command is used to undo transactions that have not already
been saved to the database.
Syntax: ROLLBACK;
c. SAVEPOINT: It marks a point within a transaction to which the transaction can later be rolled back, without rolling
back the entire transaction.
Syntax: SAVEPOINT SAVEPOINT_NAME;
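The interplay of COMMIT, ROLLBACK and SAVEPOINT can be sketched with Python's sqlite3 module. The `account` table and the savepoint name are hypothetical, and `isolation_level=None` is used so that the transaction statements are issued explicitly rather than by the driver. Other database systems may differ in the details.

```python
import sqlite3

# isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("SAVEPOINT before_second")             # mark a point in the transaction
conn.execute("INSERT INTO account VALUES (2, 200)")
conn.execute("ROLLBACK TO SAVEPOINT before_second")  # undo only the second insert
conn.execute("COMMIT")                               # make the first insert permanent
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()

conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES (3, 300)")
conn.execute("ROLLBACK")                             # undo the whole uncommitted transaction
after_rollback = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()

print(balances, after_rollback)
```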
5. Data Query Language
● SELECT
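a. SELECT: The column list of SELECT corresponds to the projection operation of relational algebra, and the WHERE clause corresponds to the selection operation. It is used to retrieve the listed attributes
from the rows that satisfy the condition described by the WHERE clause.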
Syntax:
SELECT expressions
FROM TABLES
WHERE conditions;
Example:
SELECT emp_name
FROM employee
WHERE age > 20;
SQL Rules
Rules:
SQL follows the following rules:
● SQL keywords are not case sensitive. Generally, keywords of SQL are written
in uppercase.
● SQL statements are not tied to text lines: a single SQL statement can be written on
one or multiple text lines.
● Using SQL statements, you can perform most of the actions in a database.
● SQL is based on tuple relational calculus and relational algebra.
SQL process:
There are Three types of binary Datatypes which are given below:
2. Approximate Numeric Datatype :
Let's assume 'variable a' and 'variable b'. Here, 'a' contains 20 and 'b'
contains 10.
2. SQL Comparison Operators:
Let's assume 'variable a' and 'variable b'. Here, 'a' contains 20 and 'b' contains 10.
3. SQL Logical Operators
Open Source Database and Commercial Database
● Key-value databases — Store key and value data in memory for speedy
lookup.
● Document databases — Store document information.
● Wide-column store databases — Similar to key-value with a large
number of columns. They are well suited for analyzing huge data sets.
● Graph databases — Explore the relationships that link data together, allowing
rapid execution of complex queries over millions of connections. Use cases
include recommendations, social networks and fraud detection.
2. Commercial Database
● Commercial databases are those created for commercial purposes.
● They are premium products and are not free like open source databases.
● With a commercial database, technical support is guaranteed.
● Installation and updates are administered by the software vendor.
● Examples: Oracle, IBM DB2, etc.
Difference between Open Source Database and Commercial Database :
● Eliminate Data Redundancy: the same piece of data should not be stored in more than one place, because
duplicate data not only wastes storage space but also easily leads to inconsistencies.
● Ensure Data Integrity and Accuracy: data integrity is the maintenance and assurance of the accuracy and
consistency of data over its entire life cycle. It is a critical aspect of the design, implementation, and
usage of any system that stores, processes, or retrieves data.
The relational model has provided the basis for:
● Research on the theory of data/relationship/constraint
● Numerous database design methodologies
● The standard database access language called structured query language (SQL)
● Almost all modern commercial database management systems
Relational Database Design Process
● Gather the requirements and define the objective of your database. Drafting out sample input forms,
queries and reports often helps.
Step 2: Gather Data, Organize in tables and Specify the Primary Keys
● Once you have decided on the purpose of the database, gather the data that needs to be stored in the
database. Divide the data into subject-based tables. Choose one column (or a few columns) as the so-called
primary key, which uniquely identifies each of the rows.
Step 3: Create Relationships among Tables
Step 4: Refine & Normalize the Design
● A key in DBMS is an attribute or a set of attributes that help to uniquely identify a tuple (or
row) in a relation (or table). Keys are also used to establish relationships between the
different tables and columns of a relational database. Individual values in a key are called key
values.
● A key is used in the definitions of various kinds of integrity constraints. A table in a database
represents a collection of records or events for a particular relation. Now there can be
thousands and thousands of such records, some of which may be duplicated.
● There should be a way to identify each record separately and uniquely, i.e. no duplicates.
Keys allow us to be free from this hassle.
● A key could either be a combination of more than one attribute (or columns) or just a single
attribute. The main motive of this is to give each record a unique identity.
Types of Keys in DBMS
1. Primary Key
2. Candidate Key
3. Super Key
4. Foreign Key
5. Composite Key
6. Alternate Key
7. Unique Key
1. Primary Key
A primary key is a column of a table or a set of columns that helps to identify every record present
in that table uniquely. There can be only one primary Key in a table. Also, the primary Key cannot
have the same values repeating for any row. Every value of the primary key has to be different with
no repetitions.
The PRIMARY KEY (PK) constraint put on a column or set of columns will not allow them to have
any null values or any duplicates. One table can have only one primary key constraint.
2. Super Key
A super key is a set of one or more columns that uniquely identifies rows in a table. This means that
any set of columns capable of uniquely identifying the other columns of the table is a super key.
A super key is a superset of a candidate key. The primary key of a table is picked from the set of super keys.
3. Candidate Key
Candidate keys are minimal sets of attributes that uniquely identify rows of a table. The primary key of a
table is selected from the candidate keys, so candidate keys have the same properties as
the primary keys explained above. There can be more than one candidate key in a table.
4. Alternate Key
As stated above, a table can have multiple choices for a primary key; however, it can choose only
one. So, all the keys which did not become the primary Key are called alternate keys.
5. Foreign Key
Foreign Key is used to establish relationships between two tables. A foreign key will require each value in a
column or set of columns to match the Primary Key of the referential table. Foreign keys help to maintain
data and referential integrity.
6. Composite Key
Composite Key is a set of two or more attributes that help identify each tuple in a table uniquely. The
attributes in the set may not be unique when considered separately. However, when taken all together, they
will ensure uniqueness.
7. Unique Key
Unique Key is a column or set of columns that uniquely identifies each record in a table. All non-null values
must be unique in this key. A unique key differs from a primary key because it may contain null values
(some systems, such as SQL Server, allow only one null; others, such as PostgreSQL and MySQL, allow several), whereas a primary key cannot have any null values.
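How key constraints are enforced can be observed with a small sqlite3 experiment. The tables, columns and the helper `rejected` are hypothetical; note also that SQLite only checks foreign keys after `PRAGMA foreign_keys = ON`, which is an SQLite-specific detail and not general SQL behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable foreign key checks
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT UNIQUE)"
)
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, "
    "dept_id INTEGER REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (10, 1)")

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_primary = rejected("INSERT INTO department VALUES (1, 'Legal')")  # repeated primary key
dup_unique = rejected("INSERT INTO department VALUES (2, 'Sales')")   # repeated unique value
orphan_fk = rejected("INSERT INTO employee VALUES (11, 99)")          # no such department
print(dup_primary, dup_unique, orphan_fk)
```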
Types of dependencies in DBMS
● Functional Dependency
● Fully-Functional Dependency
● Transitive Dependency
● Multivalued Dependency
● Partial Dependency
Functional Dependency
A functional dependency is a relationship that exists between two attributes (or sets of attributes). It typically exists between the primary key and a non-key
attribute within a table.
X → Y
The left side of the FD is known as the determinant; the right side is known as the dependent.
For example:
Here Emp_Id attribute can uniquely identify the Emp_Name attribute of employee table because if we know the Emp_Id, we can tell
that employee name associated with it.
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
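Whether a dependency X → Y holds in a given table instance can be checked mechanically: no two rows may agree on X and differ on Y. The sketch below assumes a small hypothetical employee table; `fd_holds` is an illustrative name. A table instance can only refute a dependency, it cannot prove that the dependency holds for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this particular set of rows."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # The first time a determinant value is seen, remember its dependent value;
        # afterwards every row with that determinant must carry the same dependent.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True

# Hypothetical employee rows.
employee = [
    {"Emp_Id": 1, "Emp_Name": "Asha"},
    {"Emp_Id": 2, "Emp_Name": "Ben"},
    {"Emp_Id": 3, "Emp_Name": "Asha"},
]

print(fd_holds(employee, ["Emp_Id"], ["Emp_Name"]))  # Emp_Id -> Emp_Name
print(fd_holds(employee, ["Emp_Name"], ["Emp_Id"]))  # Emp_Name -> Emp_Id
```

Here two different employees share the name "Asha", so the name does not determine the id.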
Types of Functional dependency
Example:
Likewise, dependencies such as Name → Name and Employee Id → Employee Id are trivial,
because the dependent is contained in the determinant.
2. Non-trivial functional dependency
Example:
ID → Name,
Name → DOB
A non-trivial functional dependency is the opposite of a trivial one. Formally speaking,
in a non-trivial functional dependency the dependent is not a subset of the determinant.
The functional dependency {Employee Id, Name} → {Age} is likewise non-trivial.
Fully-functionally Dependency
An attribute is fully functionally dependent on a set of attributes if it is functionally dependent on that set and not on any of its proper subsets.
For example, an attribute Q is fully functionally dependent on P if it is functionally dependent on P and not on any proper subset of P.
Whereas the subset {EmpID, ProjectID} can easily determine the {Days} spent on the project by the employee.
Multivalued Dependency
When the existence of one or more rows in a table implies one or more other rows in the same table, a multivalued
dependency occurs.
If a table has attributes P, Q and R, then Q and R are multi-valued facts of P.
It is represented by a double arrow (->->).
For our example:
P->->Q
P->->R
In the above case, a multivalued dependency exists only if Q and R are independent attributes.
Partial Dependency
Partial Dependency occurs when a nonprime attribute is functionally dependent on part of a candidate key.
The 2nd Normal Form (2NF) eliminates the Partial Dependency. Let us see an example −
<StudentProject>
A. Primary Rules
B. Secondary Rules
Sometimes a set of functional dependencies cannot be reduced any further. This is the case if the set
has the following properties:
1. The right-hand side of every functional dependency holds only one attribute.
3. Reducing any functional dependency may change the content of the set.
A set of functional dependencies with the above three properties is also called
canonical or minimal.
How to find functional dependencies for a relation?
Functional Dependencies in a relation are dependent on the
domain of the relation. Consider the STUDENT relation given
in Table 1.
{ STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE,
STUD_NO->STUD_STATE, STUD_NO->STUD_COUNTRY, STUD_NO ->
STUD_AGE, STUD_STATE->STUD_COUNTRY }
Attribute Closure: Attribute closure of an attribute set can be defined as set of attributes
which can be functionally determined from it.
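Attribute closure is usually computed by repeatedly applying the dependencies until nothing new can be added. The following is a minimal sketch of that fixed-point loop, using the STUDENT dependencies listed above; the function name `closure` and the representation of each dependency as a (left side, right side) pair of sets are choices made for this example.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole left side, we also know the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_PHONE"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_NO"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
]

print(sorted(closure({"STUD_NO"}, fds)))
print(sorted(closure({"STUD_STATE"}, fds)))
```

Because the closure of STUD_NO contains every attribute of the relation, STUD_NO is a super key; the closure of STUD_STATE contains only the state and the country.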
1. Lossless Decompositions
2. Dependency Preserving
Lossless Decomposition
● Lossless join decomposition is a decomposition of a relation R into relations R1, R2 such that if we
perform a natural join of relation R1 and R2, it will return the original relation R. This is effective in
removing redundancy from databases while preserving the original data…
● In other words by lossless decomposition, it becomes feasible to reconstruct the relation R from
decomposed tables R1 and R2 by using Joins.
● In Lossless Decomposition, we select the common attribute and the criteria for selecting a common
attribute is that the common attribute must be a candidate key or super key in either relation R1, R2,
or both.
● Decomposition of a relation R into R1 and R2 is a lossless-join decomposition if at least one of the
following functional dependencies are in F+ (Closure of functional dependencies)
EMPLOYEE_DEPARTMENT table:
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
Now, when these two relations are joined on the common column "EMP_ID", then the resultant
relation will look like:
Hence, the decomposition is Lossless join decomposition
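The lossless property can be checked on a concrete instance by decomposing a table, joining the pieces again, and comparing the result with the original. The rows below are hypothetical (the original EMPLOYEE_DEPARTMENT table is not reproduced here), and `project` and `natural_join` are illustrative helpers that assume non-empty relations.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(r1, r2):
    """Join two non-empty relations on all attributes they have in common."""
    common = [a for a in r1[0] if a in r2[0]]
    return [{**a, **b} for a in r1 for b in r2 if all(a[c] == b[c] for c in common)]

def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(r.items())) for r in rows}

# Hypothetical EMPLOYEE_DEPARTMENT instance.
employee_department = [
    {"EMP_ID": 22, "EMP_NAME": "Denim", "DEPT_ID": 827, "DEPT_NAME": "Sales"},
    {"EMP_ID": 33, "EMP_NAME": "Alina", "DEPT_ID": 438, "DEPT_NAME": "Marketing"},
    {"EMP_ID": 46, "EMP_NAME": "Stephan", "DEPT_ID": 869, "DEPT_NAME": "Finance"},
]

employee = project(employee_department, ["EMP_ID", "EMP_NAME"])
department = project(employee_department, ["EMP_ID", "DEPT_ID", "DEPT_NAME"])

rejoined = natural_join(employee, department)  # joined on the common column EMP_ID
is_lossless_here = as_set(rejoined) == as_set(employee_department)
print(is_lossless_here)
```

The join reproduces the original rows because the common column EMP_ID is a key of both decomposed relations in this instance.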
Dependency Preserving
● Multivalued dependency occurs when two attributes in a table are independent of each other but, both depend on a
third attribute.
● A multivalued dependency consists of at least two attributes that are dependent on a third attribute that's why it
always requires at least three attributes.
Example: Suppose there is a bike manufacturer which produces two colors (white and black) of each model every
year.
Here the columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each other. In this case,
these two columns are said to be multivalued dependent on BIKE_MODEL. The representation of these
dependencies is shown below:
BIKE_MODEL → → MANUF_YEAR
BIKE_MODEL → → COLOR
Join Dependency
Query processing is the activity of extracting data from the database. It
takes several steps to fetch the data from the database. The steps
involved are:
Thus, to make the system understand the user query, the query needs to be translated into relational algebra. We
can bring this query into relational algebra form as:
After translating the given query, we can execute each relational algebra operation by using different algorithms.
In this way, query processing begins its work.
Evaluation
For this, in addition to the relational algebra translation, the translated relational algebra
expression must be annotated with instructions that specify how to evaluate each operation. Thus, after translating the user query, the
system executes a query evaluation plan.
● In order to fully evaluate a query, the system needs to construct a query evaluation plan.
● The annotations in the evaluation plan may refer to the algorithms to be used for the particular index or the specific
operations.
● Such relational algebra with annotations is referred to as Evaluation Primitives. The evaluation primitives carry the
instructions needed for the evaluation of the operation.
● Thus, a query evaluation plan defines a sequence of primitive operations used for evaluating a query. The query
evaluation plan is also referred to as the query execution plan.
● A query execution engine is responsible for generating the output of the given query. It takes the query execution
plan, executes it, and finally makes the output for the user query.
Optimization
● The cost of query evaluation can vary for different types of queries. Because the system is responsible for
constructing the evaluation plan, the user does not need to write the query efficiently.
● Usually, a database system generates an efficient query evaluation plan that minimizes the cost. This task
is performed by the database system and is known as query optimization.
● To optimize a query, the query optimizer needs an estimated cost for each operation, because
the overall cost depends on the memory allocated to the operations, their execution costs, and so on.
Evaluation of Expressions
For evaluating an expression that carries multiple operations in it, we can perform the
computation of each operation one by one. However, in the query processing system, we use
two methods for evaluating an expression carrying multiple operations. These methods are:
1. Materialization
2. Pipelining
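The difference between the two methods can be sketched in Python. Under the usual definitions, materialization evaluates one operation at a time and stores each intermediate result in full, while pipelining passes each tuple on to the next operation as soon as it is produced. The rows and function names below are hypothetical.

```python
# Hypothetical input relation.
rows = [
    {"name": "Asha", "age": 25},
    {"name": "Ben", "age": 19},
    {"name": "Chen", "age": 31},
]

def materialized(relation):
    # Step 1: the selection result is stored completely (a temporary relation).
    selected = [r for r in relation if r["age"] > 20]
    # Step 2: the projection reads that stored intermediate result.
    return [r["name"] for r in selected]

def pipelined(relation):
    # Generators: each tuple flows through selection and projection one at a time,
    # so no intermediate relation is ever stored.
    selected = (r for r in relation if r["age"] > 20)
    return (r["name"] for r in selected)

print(materialized(rows))
print(list(pipelined(rows)))
```

Both plans produce the same answer; they differ only in how much intermediate data has to be held at once.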
Materialization
● A join is an operation in a DBMS (Database Management System) that combines the rows of two or more
tables based on related columns between them. Its main purpose is to retrieve data from
multiple tables at once using their common columns; in other words, a join is used to perform a multi-table query. It is denoted by ⨝.
● Types of Join
Inner Join
Outer join
● Inner join
● Inner join is a join operation in a DBMS that combines two or more tables based on related columns and
returns only rows that have matching values among the tables.
Inner join is of three types:
● Equi Join
● Natural Join
● Theta join
Equi Join
Natural Join
Natural join is a type of inner join that needs no comparison
operator. In a natural join, the joined columns should have the same name and domain, and there
should be at least one common attribute between the two tables.
Theta (θ) Join
Theta join combines tuples from different relations provided they satisfy the theta
condition. The join condition is denoted by the symbol θ.
Notation:
R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, .., An) and (B1, B2,.. ,Bn) such that
the attributes don’t have anything in common, that is R1 ∩ R2 = Φ.
Outer Join
Outer join is a type of join that retrieves matching as well as non-matching records
from related tables.
Left Outer Join
It is also called left join. This type of outer join retrieves all records from the left table and
the matching records from the right table.
Right Outer Join
It is also called right join. This type of outer join retrieves all records from the right table
and the matching records from the left table.
Full Outer Join
In a full outer join, all the rows from both tables are included in the result table.
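The join types can be compared on a small example. The sketch below implements an inner join and a left outer join over lists of dicts; the tables and helper names are hypothetical, unmatched columns are filled with None (playing the role of NULL), and the right outer join is obtained by swapping the arguments.

```python
def inner_join(left, right, key):
    """Return only the combinations whose key values match."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def left_outer_join(left, right, key):
    """Return every left row; pad with None where the right side has no match."""
    right_attrs = [a for a in right[0] if a != key]  # assumes right is non-empty
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            out.append({**l, **{a: None for a in right_attrs}})
    return out

# Hypothetical tables.
employee = [{"emp": "Asha", "dept_id": 1}, {"emp": "Ben", "dept_id": 2}]
department = [{"dept_id": 1, "dept_name": "Sales"}, {"dept_id": 3, "dept_name": "Legal"}]

inner = inner_join(employee, department, "dept_id")
left = left_outer_join(employee, department, "dept_id")
right = left_outer_join(department, employee, "dept_id")  # right outer join by swapping
print(inner, left, right, sep="\n")
```

A full outer join would contain the rows of both the left and the right outer join, with the matching rows listed once.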
Query Optimization in DBMS
There are two methods of query optimization. The query optimizer uses these two techniques to
determine which process or expression to consider for evaluating the query.
Cost-based optimization is based on the cost of the query. The query can use different paths based on indexes, constraints,
sorting methods etc. This method mainly uses statistics such as record size, number of records, number
of records per block, number of blocks, table size, whether the whole table fits in a block, organization of
tables, uniqueness of column values, size of columns etc.
Heuristic optimization is also known as rule-based optimization. It is based on the equivalence rules on
relational expressions, so the number of query combinations to consider is reduced, and hence the cost of
the query is reduced too.
1. Cost based Optimization (Physical)
Suppose we have a series of tables joined in a query.
T1 ⋈ T2 ⋈ T3 ⋈ T4 ⋈ T5 ⋈ T6
For the above query we can have any order of evaluation. We can take any two tables in any order and start
evaluating the query. In general, n tables can be joined in (2(n-1))! / (n-1)! ways.
For example, suppose we have 5 tables involved in a join; then we can have 8! / 4! = 1680 combinations. But
when the query optimizer runs, it does not evaluate all of these. It uses dynamic programming, where
it generates the costs for join orders of any combination of tables.
The cost is calculated and generated only once. The least cost for each table combination is then stored in the
database and reused. That is, given a set of tables T = {T1, T2, T3, ..., Tn}, it generates
the least-cost combination for all the tables and stores it.
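To see how quickly the number of join orders grows, the formula (2(n-1))! / (n-1)! can be evaluated directly. The function name `join_orders` is chosen only for this example.

```python
from math import factorial

def join_orders(n):
    """Number of join orders for n tables: (2(n-1))! / (n-1)!"""
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in range(2, 8):
    print(n, join_orders(n))
```

For 5 tables this gives the 1680 combinations mentioned above, and the count keeps growing rapidly with each extra table, which is why the optimizer avoids enumerating every order.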
2. Heuristic Optimization (Logical)
This method creates a relational tree for the given query based on the equivalence rules. By providing
alternative ways of writing and evaluating the query, these equivalence rules give a better path to evaluate the
query. The rules need not give a better plan in all cases; the result has to be examined after applying them. The most
important rules followed in this method are listed below:
● Perform all the selection operations as early as possible in the query. This should be the first and
foremost set of actions on the tables in the query. By performing the selection operation, we can
reduce the number of records involved in the query, rather than using the whole tables throughout
the query.
Suppose we have a query to retrieve the students with age 18 who are studying in class DESIGN_01. We can get all
the student details from the STUDENT table, and the class details from the CLASS table. We can write this query in two
different ways.
Here both queries return the same result. But when we observe them closely, we can see that the first query
joins the two tables first and then applies the filters. That means it traverses the whole tables to join, so the number of
records involved is larger. The second query applies the filters on each table first. This reduces the number of
records in each table (in the CLASS table, the number of records reduces to one in this case). Then it joins these
intermediate tables. Hence the cost in this case is comparatively less.
Instead of rewriting the query text, the optimizer builds the relational algebra expression and its tree for the above case.
● Perform all the projections as early as possible in the query. This is similar to selection but reduces the number of
columns in the query.
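The effect of performing selections early can be demonstrated by counting intermediate rows. The STUDENT and CLASS rows below are hypothetical, and the two plans are written out by hand rather than produced by an optimizer.

```python
# Hypothetical STUDENT and CLASS tables.
student = [
    {"name": "A", "age": 18, "class_id": 1},
    {"name": "B", "age": 19, "class_id": 1},
    {"name": "C", "age": 18, "class_id": 2},
    {"name": "D", "age": 18, "class_id": 1},
]
klass = [
    {"class_id": 1, "class_name": "DESIGN_01"},
    {"class_id": 2, "class_name": "DESIGN_02"},
]

# Plan 1: join the whole tables first, then filter.
joined_all = [(s, c) for s in student for c in klass if s["class_id"] == c["class_id"]]
plan1 = [s["name"] for s, c in joined_all
         if s["age"] == 18 and c["class_name"] == "DESIGN_01"]

# Plan 2: filter each table first, then join the smaller intermediate tables.
students_18 = [s for s in student if s["age"] == 18]
design_01 = [c for c in klass if c["class_name"] == "DESIGN_01"]
joined_filtered = [(s, c) for s in students_18 for c in design_01
                   if s["class_id"] == c["class_id"]]
plan2 = [s["name"] for s, c in joined_filtered]

print(plan1, len(joined_all))
print(plan2, len(joined_filtered))
```

Both plans return the same students, but the second one joins fewer rows.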
Query optimization
Query optimization is of great importance for the performance of a relational database, especially for the execution of complex SQL statements. A
query optimizer decides the best methods for implementing each query.
The query optimizer selects, for instance, whether or not to use indexes for a given query, and which join methods to use when joining multiple
tables. These decisions have a tremendous effect on SQL performance, and query optimization is a key technology for every application, from
operational systems to data warehouses and analytical systems to content management systems.
● Understand how your database is executing your query − The first phase of query optimization is understanding what the database is
doing. Different databases have different commands for this. For example, in MySQL, one can use the “EXPLAIN [SQL Query]”
statement to see the query plan. In Oracle, one can use “EXPLAIN PLAN FOR [SQL Query]” to see the query plan.
● Retrieve as little data as possible − The more information returned by the query, the more resources the database has to
expend to process and store these records. For example, if you only need to fetch one column from a table, do not use ‘SELECT *’.
● Store intermediate results − Sometimes the logic for a query can be quite complex. It is possible to produce the desired outcome through
the use of subqueries, inline views, and UNION-type statements. With those methods, the intermediate results are not saved in the
database but are used directly within the query. This can lead to performance issues, particularly when the intermediate results have a
huge number of rows.
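Inspecting a query plan can be tried without a database server by using SQLite, which offers `EXPLAIN QUERY PLAN` as its counterpart to the MySQL and Oracle commands mentioned above. This is a swapped-in system for illustration only; the table, index name and helper `plan` are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, age INTEGER)")

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT emp_name FROM employee WHERE age = 30"

plan_without_index = plan(query)   # expected to describe a full table scan
conn.execute("CREATE INDEX idx_employee_age ON employee (age)")
plan_with_index = plan(query)      # expected to mention the new index

print(plan_without_index)
print(plan_with_index)
```

Comparing the two plans shows whether the database actually uses an index for a query, which is the first thing to check before tuning.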
The main query optimization strategies are as follows −
● Use Index − Using an index is the first strategy one should try to speed up a query.
● Aggregate Table − Pre-populate tables at higher levels of aggregation so that less information needs
to be parsed.
● Vertical Partitioning − Partition the table by columns. This method reduces the amount of
information a SQL query needs to process.
● Horizontal Partitioning − Partition the table by data value, most often time. This method reduces
the amount of information a SQL query needs to process.
● De-normalization − The process of de-normalization combines multiple tables into a single table. This speeds up
query execution because fewer table joins are required.
● Server Tuning − Each server has its own parameters. Tuning them so that the server takes full
advantage of the hardware resources can significantly speed up query execution.