
DBMS Unit-4 Notes

The document provides an overview of database systems, focusing on relational algebra and relational calculus, highlighting their differences and applications. It discusses functional dependencies, types of dependencies, and the importance of normalization in minimizing redundancy and ensuring data integrity. Additionally, it outlines various normal forms and their significance in database design, along with the advantages and disadvantages of normalization.

Uploaded by

mayur474645
Copyright
© All Rights Reserved

Unit-4

Introduction to Database System

Relational Algebra
Relational Algebra is a procedural language: a query specifies
the order in which operations have to be performed. The
basic operations in relational algebra are:
1. Select (σ)
2. Project (Π)
3. Union (U)
4. Set Difference (-)
5. Cartesian product (X)
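
The five basic operations can be sketched in Python by modelling a relation as a pair (attribute names, set of tuples). This is a minimal teaching sketch, not how a DBMS implements them; the function names and the sample relations student, new_student and course are hypothetical.

```python
from itertools import product

# A relation is modelled as (attribute_names, set_of_tuples).
# Using a set means duplicate tuples disappear automatically, as in relational algebra.

def select(rel, pred):
    """σ: keep only the tuples for which pred(row_as_dict) is true."""
    attrs, rows = rel
    return attrs, {r for r in rows if pred(dict(zip(attrs, r)))}

def project(rel, cols):
    """Π: keep only the listed columns (duplicates collapse in the set)."""
    attrs, rows = rel
    idx = [attrs.index(c) for c in cols]
    return tuple(cols), {tuple(r[i] for i in idx) for r in rows}

def union(r, s):
    """∪: both relations must have the same attributes (union-compatible)."""
    if r[0] != s[0]:
        raise ValueError("relations are not union-compatible")
    return r[0], r[1] | s[1]

def difference(r, s):
    """Set difference: tuples that are in r but not in s."""
    if r[0] != s[0]:
        raise ValueError("relations are not union-compatible")
    return r[0], r[1] - s[1]

def cartesian(r, s):
    """×: every tuple of r concatenated with every tuple of s."""
    return r[0] + s[0], {a + b for a, b in product(r[1], s[1])}

# Hypothetical sample relations.
student = (("roll_no", "name"), {(42, "abc"), (43, "pqr")})
new_student = (("roll_no", "name"), {(43, "pqr"), (44, "xyz")})
course = (("course",), {("DBMS",), ("OS",)})

print(select(student, lambda t: t["roll_no"] > 42))  # (('roll_no', 'name'), {(43, 'pqr')})
print(project(student, ["name"]))
print(union(student, new_student))
print(difference(student, new_student))             # (('roll_no', 'name'), {(42, 'abc')})
print(len(cartesian(student, course)[1]))            # 4
```

Each function takes whole relations and returns a new relation, so expressions can be nested in a chosen order, which is exactly what makes relational algebra procedural.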

Relational Calculus
Relational Calculus is a formal, non-procedural query
language. It is also known as a declarative language. In
Relational Calculus, the order in which operations have to
be performed is not specified; a query describes only what
result we have to obtain. Relational Calculus has two variations:
1. Tuple Relational Calculus (TRC)
2. Domain Relational Calculus (DRC)
A tuple relational calculus query is written as:
{ t | P(t) }
where
t: a tuple variable
P(t): a condition (predicate) that is true for every tuple t in the result.
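
A Python set comprehension reads almost exactly like the TRC notation { t | P(t) }: it states only the condition a result tuple must satisfy and leaves the iteration strategy to the language. The sketch below writes the query { t | t ∈ Employee ∧ t.salary > 10000 } this way, plus a DRC-style variant whose variables range over attribute values instead of whole tuples. The Employee relation and its rows are hypothetical.

```python
from collections import namedtuple

Employee = namedtuple("Employee", ["emp_name", "salary"])

# Hypothetical Employee relation.
employee = {
    Employee("Asha", 12000),
    Employee("Ravi", 9000),
    Employee("Meena", 15000),
}

# TRC: { t | t ∈ Employee ∧ t.salary > 10000 }   (t is a tuple variable)
trc_result = {t for t in employee if t.salary > 10000}

# DRC: { <n> | ∃s (<n, s> ∈ Employee ∧ s > 10000) }   (n and s are domain variables)
drc_result = {n for (n, s) in employee if s > 10000}

print(sorted(drc_result))  # ['Asha', 'Meena']
```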
Difference between Relational Algebra and Relational Calculus:

1. Language type
Relational Algebra: It is a procedural language.
Relational Calculus: It is a declarative (non-procedural) language.

2. Procedure
Relational Algebra: It specifies how to obtain the result.
Relational Calculus: It specifies what result we have to obtain.

3. Order
Relational Algebra: The order in which the operations have to be performed is specified.
Relational Calculus: The order is not specified.

4. Domain
Relational Algebra: It is independent of the domain.
Relational Calculus: It can be domain-dependent.

5. Programming language
Relational Algebra: It is nearer to a programming language.
Relational Calculus: It is nearer to natural language.

6. Inclusion in SQL
Relational Algebra: SQL includes some features from relational algebra.
Relational Calculus: SQL is based to a greater extent on tuple relational calculus.

7. Relational completeness
Relational Algebra: It is one of the languages in which queries can be expressed; to be relationally complete, it must be able to express every query that relational calculus can express.
Relational Calculus: A database language is relationally complete if every query expressible in relational calculus can also be expressed in that language.

8. Query evaluation
Relational Algebra: The evaluation of a query relies on the specified order in which the operations must be performed.
Relational Calculus: The order of operations does not matter for the evaluation of queries.

9. Database access
Relational Algebra: It provides a solution in terms of what is required and how to get that information, by following a step-by-step description.
Relational Calculus: It provides a solution in terms as simple as what is required, and lets the system find the way to obtain it.

10. Expressiveness
Relational Algebra: The expressiveness of any given language is judged using relational algebra operations as a standard.
Relational Calculus: The completeness of a language is measured by whether it is at least as powerful as calculus; that is, any relation defined by some expression of the calculus is also definable by some expression of the language in question.

Functional Dependency
In relational database management, a functional
dependency is a constraint that specifies the relationship
between two sets of attributes, where one set of attributes
determines the value of another. It is denoted as X → Y,
where the attribute set X on the left side of the arrow is
called the determinant and Y is called the dependent.
X → Y holds if any two tuples that agree on the values of X
also agree on the values of Y.
Example:
roll_no name dept_name dept_building
42 abc CO A4
43 pqr IT A3
44 xyz CO A4
From the table above, together with the meaning of the
attributes, we can identify some valid functional dependencies:
o roll_no → {name, dept_name, dept_building}: roll_no
can determine the values of name, dept_name and
dept_building, hence this is a valid functional dependency.
o roll_no → dept_name: since roll_no can determine the
whole set {name, dept_name, dept_building}, it can also
determine its subset dept_name.
o dept_name → dept_building: dept_name identifies
dept_building accurately, since each department is
located in a single building.

Here are some invalid functional dependencies:
o name → dept_name: students with the same name can
belong to different departments, hence this is not a
valid functional dependency.
o dept_building → dept_name: there can be multiple
departments in the same building. For example, if
departments ME and EC were both located in building B2
(such rows are not shown in the table above),
dept_building would not determine dept_name, so
dept_building → dept_name is an invalid functional
dependency.
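
Whether an FD X → Y holds in a particular table instance can be checked mechanically: no two rows may agree on X but differ on Y. The sketch below (the function name fd_holds is hypothetical) also shows an important limitation: an instance can only refute an FD, never prove it. In the three-row example table, even name → dept_name and dept_building → dept_name happen to hold, so they are ruled out only by knowledge of the real world. The extra row added in the sketch is hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on the lhs attributes but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen; later rows must match it.
        if seen.setdefault(key, val) != val:
            return False
    return True

students = [
    {"roll_no": 42, "name": "abc", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 43, "name": "pqr", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 44, "name": "xyz", "dept_name": "CO", "dept_building": "A4"},
]

print(fd_holds(students, ["roll_no"], ["name", "dept_name", "dept_building"]))  # True
print(fd_holds(students, ["dept_name"], ["dept_building"]))                     # True
print(fd_holds(students, ["name"], ["dept_name"]))  # True: this instance cannot refute it

# Hypothetical extra row: a second student named abc in another department.
more_students = students + [
    {"roll_no": 45, "name": "abc", "dept_name": "IT", "dept_building": "A3"},
]
print(fd_holds(more_students, ["name"], ["dept_name"]))  # False
```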
Types of Functional Dependencies in DBMS
1. Trivial Functional Dependency
In this, the dependent is always a subset of the determinant,
i.e., if X → Y and Y is a subset of X, then it is called a
trivial functional dependency.
Example:
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18

Here, {roll_no, name} → name is a trivial functional
dependency, since the dependent name is a subset of the
determinant set {roll_no, name}.

2. Non-trivial Functional Dependency


In this, the dependent is not a subset of the determinant,
i.e., if X → Y and Y is not a subset of X, then it is called a
non-trivial functional dependency.
Here, {roll_no, name} → age is a non-trivial functional
dependency, since age is not a subset of {roll_no, name}.

3. Multivalued Functional Dependency


In this, the attributes in the dependent set are not
dependent on each other. If a → {b, c} and there exists no
functional dependency between b and c, then it is called
a multivalued functional dependency.
Here, roll_no → {name, age} is a multivalued functional
dependency, since the dependents name and age are not
dependent on each other. (This is different from the
multi-valued dependency X →→ Y used in 4NF.)

4. Transitive Functional Dependency


In this, the dependent is indirectly dependent on the
determinant, i.e., if a → b and b → c, then according to the
axiom of transitivity, a → c.
For example,
enrol_no name dept building_no
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2

Here, enrol_no → dept and dept → building_no. Hence,
enrol_no → building_no is an indirect functional
dependency, called a transitive functional dependency.

5. Fully Functional Dependency


In this, an attribute or a set of attributes depends on the
whole of another set of attributes. An attribute Q is fully
functionally dependent on a set of attributes P if it is
functionally dependent on P and not on any proper subset of P.
For example, if XY → Z holds but neither X → Z nor Y → Z
holds, then Z is fully functionally dependent on XY.

6. Partial Functional Dependency


In this, a non-key attribute depends on a part of the
composite key rather than on the whole key. If a relation R
has attributes X, Y and Z, where {X, Y} is the composite
key and Z is a non-key attribute, then X → Z is a partial
functional dependency in an RDBMS.
For example, if XY → Z holds and Z can also be determined
by X alone (or by Y alone), then Z is partially dependent on
the key XY.

Advantages of Functional Dependencies


1. Data Normalization
Data normalization is the process of organizing data in a
database in order to minimize redundancy and increase
data integrity. Functional dependencies play an
important part in data normalization, since they show
which attributes belong together in a relation.
2. Query Optimization
With the help of functional dependencies, we can decide
how tables are connected and which attributes need to
be projected to retrieve the required data from the
tables. This helps in query optimization and improves
performance.
3. Consistency of Data
Functional dependencies help keep data consistent: in a
design based on them, each fact is stored in one place,
so changing it does not leave conflicting copies in other
rows of the database.
4. Data Quality Improvement
Functional dependencies help ensure that the data in the
database is accurate, complete, and up to date. This
reduces the errors and inaccuracies that can occur during
data analysis and decision making, and thus improves the
overall quality of data in the database.

The closure of a set F of FDs is the set F+ of all FDs that can
be inferred from F using Armstrong's axioms (reflexivity,
augmentation, and transitivity).
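
Since F+ can be very large, a common way to test whether a particular FD X → Y belongs to F+ is to compute the attribute closure X+ (every attribute determined by X) and check that Y ⊆ X+. The sketch below implements the standard fixed-point algorithm for X+; the function names are hypothetical, and the FDs are taken from the transitive-dependency example above.

```python
def attribute_closure(attrs, fds):
    """Return X+: all attributes functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs of sets. Any FD whose left-hand side is
    already inside the result adds its right-hand side; repeat until nothing changes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """X → Y is in F+ exactly when Y is a subset of X+."""
    return set(rhs) <= attribute_closure(lhs, fds)

# F from the transitive dependency example: enrol_no → dept, dept → building_no
F = [({"enrol_no"}, {"dept"}), ({"dept"}, {"building_no"})]

print(sorted(attribute_closure({"enrol_no"}, F)))  # ['building_no', 'dept', 'enrol_no']
print(implies(F, {"enrol_no"}, {"building_no"}))   # True (derived by transitivity)
print(implies(F, {"dept"}, {"enrol_no"}))          # False
```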

Data modification anomalies are divided into three types:
o Insertion Anomaly: an insertion anomaly occurs when a
new tuple cannot be inserted into a relation because
some other data is missing.
o Deletion Anomaly: a deletion anomaly is the situation
where deleting some data results in the unintended
loss of other important data.
o Update Anomaly: an update anomaly occurs when
updating a single data value requires multiple rows of
data to be updated.
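
The update and deletion anomalies can be reproduced on a few rows of an unnormalized table. The sketch below reuses rows from the TEACHER table of the 2NF section, where a teacher's age is repeated once per subject; the in-memory list representation is only an illustration.

```python
# Unnormalized rows: [TEACHER_ID, SUBJECT, TEACHER_AGE]; the age repeats per subject.
teacher = [
    [25, "Chemistry", 30],
    [25, "Biology", 30],
    [47, "English", 35],
]

# Update anomaly: teacher 25's age is stored twice. Changing only one copy
# leaves the relation contradicting itself.
teacher[0][2] = 31
ages_of_25 = {row[2] for row in teacher if row[0] == 25}
print(sorted(ages_of_25))  # [30, 31]: two different ages for one teacher

# Deletion anomaly: deleting teacher 47's only subject row also deletes
# the only record of that teacher's age.
teacher = [row for row in teacher if not (row[0] == 47 and row[1] == "English")]
age_47_still_known = any(row[0] == 47 for row in teacher)
print(age_47_still_known)  # False
```
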
Normalization
A large database defined as a single relation may result in
data duplication. This repetition of data may result in:
o Making relations very large.
o Difficulty in maintaining and updating data, as it would
involve searching many records in the relation.
o Wastage and poor utilization of disk space and
resources.
o An increased chance of errors and inconsistencies.
So to handle these problems, we should analyze and
decompose relations with redundant data into smaller,
simpler, and well-structured relations. Normalization can
be defined as follows:
o Normalization is the process of organizing the data
in the database.
o Normalization is a process of decomposing relations
into relations with fewer attributes.
o Normalization is used to minimize the redundancy in
a relation or set of relations. It is also used to
eliminate undesirable characteristics like insertion,
update, and deletion anomalies.
o Normalization divides larger tables into smaller tables
and links them using relationships.
o Normal forms are used to reduce redundancy in
database tables.


Why do we need Normalization?
The main reason for normalizing the relations is
removing these anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data
integrity and other problems as the database grows.

Types of Normal Forms


Normalization works through a series of stages called
normal forms. The normal forms apply to individual
relations. A relation is said to be in a particular normal
form if it satisfies that form's constraints.

Following are the various types of normal forms:

Normal Form  Description
1NF  A relation is in 1NF if every attribute contains only atomic (indivisible) values.
2NF  A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF  A relation is in 3NF if it is in 2NF and no transitive dependency exists.
BCNF  A stronger version of 3NF, known as Boyce-Codd normal form.
4NF  A relation is in 4NF if it is in Boyce-Codd normal form and has no non-trivial multi-valued dependency.
5NF  A relation is in 5NF if it is in 4NF and does not contain any join dependency; joining should be lossless.

Advantages of Normalization
o Normalization helps to minimize data redundancy.

o Greater overall database organization.

o Data consistency within the database.

o Much more flexible database design.

o Enforces the concept of relational integrity.


Disadvantages of Normalization
o You cannot start building the database before knowing
what the user needs.
o Performance degrades when relations are normalized
to higher normal forms, i.e., 4NF and 5NF, because
queries then need more joins.
o It is very time-consuming and difficult to normalize
relations of a higher degree.
o Careless decomposition leads to a bad database design.

First Normal Form (1NF)


o A relation is in 1NF if every attribute contains only
atomic values.
o It states that an attribute of a table cannot hold
multiple values; it must hold a single value.
o First normal form disallows multi-valued attributes,
composite attributes, and their combinations.
o Example: Relation EMPLOYEE is not in 1NF because
of the multi-valued attribute EMP_PHONE.


EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389, 8589830302 Punjab
The decomposition of the EMPLOYEE table into 1NF is shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
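
The 1NF decomposition amounts to splitting each multi-valued EMP_PHONE entry into one row per phone number. The sketch below assumes, for illustration, that the multiple values arrive as a comma-separated string, which is how they appear in the non-1NF table; the function name to_1nf is hypothetical.

```python
# EMPLOYEE rows as in the non-1NF table; EMP_PHONE holds a comma-separated list.
employee = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (20, "Harry", "8574783832", "Bihar"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]

def to_1nf(rows):
    """Return one row per phone number so that every value is atomic."""
    result = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):
            result.append((emp_id, name, phone.strip(), state))
    return result

for row in to_1nf(employee):
    print(*row)
# 14 John 7272826385 UP
# 14 John 9064738238 UP
# 20 Harry 8574783832 Bihar
# 12 Sam 7390372389 Punjab
# 12 Sam 8589830302 Punjab
```

Phone numbers are kept as strings, since they are identifiers rather than quantities to calculate with.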

Second Normal Form (2NF)


o For 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are
fully functionally dependent on the primary key, i.e.,
no non-prime attribute depends on a proper subset of
a candidate key.


Example: A school can store data about teachers and the
subjects they teach. In a school, a teacher can teach more
than one subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, the candidate key is {TEACHER_ID,
SUBJECT}. The non-prime attribute TEACHER_AGE depends
on TEACHER_ID alone, which is a proper subset of the
candidate key. That is why the table violates the rule for 2NF.
To convert the given table into 2NF, we decompose it
into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38

TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
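
One way to confirm that this decomposition loses no information is to project the original TEACHER rows onto the two new schemas and join them back on TEACHER_ID. The sketch below does this with Python sets; because TEACHER_ID is a key of TEACHER_DETAIL, the natural join returns exactly the original rows.

```python
# Original TEACHER relation: (TEACHER_ID, SUBJECT, TEACHER_AGE)
teacher = {
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
}

# Project onto the two decomposed schemas.
teacher_detail = {(tid, age) for tid, _, age in teacher}     # (TEACHER_ID, TEACHER_AGE)
teacher_subject = {(tid, subj) for tid, subj, _ in teacher}  # (TEACHER_ID, SUBJECT)

# Natural join on TEACHER_ID.
rejoined = {
    (tid, subj, age)
    for tid, subj in teacher_subject
    for tid2, age in teacher_detail
    if tid == tid2
}

print(sorted(teacher_detail))  # [(25, 30), (47, 35), (83, 38)]
print(rejoined == teacher)     # True: the decomposition is lossless
```

Note that each teacher's age is now stored once, so the update anomaly on TEACHER_AGE can no longer occur.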

Third Normal Form (3NF)


o A relation is in 3NF if it is in 2NF and does not contain
any transitive dependency.
o 3NF is used to reduce data duplication. It is also
used to achieve data integrity.
o If a relation is in 2NF and no non-prime attribute is
transitively dependent on a key, then the relation is in
third normal form.
A relation is in third normal form if at least one of the
following conditions holds for every non-trivial functional
dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part
of some candidate key.
Example: EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal


Super keys in the table above: {EMP_ID}, {EMP_ID, EMP_NAME},
{EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
Candidate key: {EMP_ID}
Non-prime attributes: in the given table, all attributes
except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and
EMP_ZIP depends on EMP_ID. The non-prime attributes
(EMP_STATE, EMP_CITY) are therefore transitively
dependent on the super key (EMP_ID), which violates the
rule of third normal form. That is why we need to move
EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table
with EMP_ZIP as the primary key.
EMPLOYEE table
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007

EMPLOYEE_ZIP table
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
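
The 3NF decomposition can be tried out in SQL. The sketch below uses Python's built-in sqlite3 module with an in-memory database (SQLite rather than MySQL, an assumption made so that it runs anywhere). It declares EMP_ZIP as the key of EMPLOYEE_ZIP and shows that joining the two tables on EMP_ZIP rebuilds the original EMPLOYEE_DETAIL rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE_ZIP (
    EMP_ZIP   TEXT PRIMARY KEY,  -- TEXT keeps leading zeros such as 02228
    EMP_STATE TEXT,
    EMP_CITY  TEXT
);
CREATE TABLE EMPLOYEE (
    EMP_ID   INTEGER PRIMARY KEY,
    EMP_NAME TEXT,
    EMP_ZIP  TEXT REFERENCES EMPLOYEE_ZIP (EMP_ZIP)
);
""")
conn.executemany("INSERT INTO EMPLOYEE_ZIP VALUES (?, ?, ?)", [
    ("201010", "UP", "Noida"), ("02228", "US", "Boston"), ("60007", "US", "Chicago"),
    ("06389", "UK", "Norwich"), ("462007", "MP", "Bhopal"),
])
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    (222, "Harry", "201010"), (333, "Stephan", "02228"), (444, "Lan", "60007"),
    (555, "Katharine", "06389"), (666, "John", "462007"),
])

# Joining on EMP_ZIP rebuilds the rows of the original EMPLOYEE_DETAIL table.
rows = conn.execute("""
    SELECT e.EMP_ID, e.EMP_NAME, e.EMP_ZIP, z.EMP_STATE, z.EMP_CITY
    FROM EMPLOYEE AS e
    JOIN EMPLOYEE_ZIP AS z ON e.EMP_ZIP = z.EMP_ZIP
    ORDER BY e.EMP_ID
""").fetchall()
for row in rows:
    print(*row)
```

Each city and state is now stored once per ZIP code instead of once per employee, so changing the city for a ZIP code touches a single row.
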
Boyce-Codd Normal Form (BCNF)
o BCNF is the advanced version of 3NF. It is stricter than
3NF.
o A table is in BCNF if, for every non-trivial functional
dependency X → Y, X is a super key of the table.
o For BCNF, the table should be in 3NF, and for every
FD, the LHS must be a super key.


Example: Let's assume there is a company where
employees work in more than one department.
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549

In the above table, the functional dependencies are as follows:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor
EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it
into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
364 UK

EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549

EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing

Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now the decomposition is in BCNF, because the left-hand
side of both functional dependencies is a key of its table.
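
The BCNF test can be automated with attribute closure: for an FD X → Y that applies within a table, X is a super key exactly when X+ contains every attribute of the table. The sketch below is a simplification that checks only the given FDs whose attributes all lie inside the table (a complete test would consider every FD of F+ projected onto the table); the function names are hypothetical.

```python
def attribute_closure(attrs, fds):
    """X+: every attribute determined by attrs under the FDs (fixed-point loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(table, fds):
    """Return the non-trivial FDs inside `table` whose left side is not a super key."""
    table = set(table)
    violations = []
    for lhs, rhs in fds:
        if not (lhs | rhs) <= table:   # FD does not apply to this table
            continue
        if rhs <= lhs:                 # trivial FD, always allowed
            continue
        if not table <= attribute_closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations

fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]
employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
decomposed = {
    "EMP_COUNTRY": {"EMP_ID", "EMP_COUNTRY"},
    "EMP_DEPT": {"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"},
    "EMP_DEPT_MAPPING": {"EMP_ID", "EMP_DEPT"},
}

print(len(bcnf_violations(employee, fds)))  # 2: neither left-hand side is a key
for name, attrs in decomposed.items():
    print(name, bcnf_violations(attrs, fds))  # [] for every table
```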

Fourth normal form (4NF)


o A relation is in 4NF if it is in Boyce-Codd normal
form and has no multi-valued dependency.
o For a dependency A →→ B, if a single value of A is
associated with multiple values of B, independently of
the other attributes, then the relation has a
multi-valued dependency.
Example: STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and
HOBBY are two independent attributes: there is no
relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has
two courses, Computer and Math, and two hobbies,
Dancing and Singing. So there is a multi-valued
dependency on STU_ID, which leads to unnecessary
repetition of data (because the two attributes are
independent, every course of a student has to be stored
with every hobby of that student).
So to bring the above table into 4NF, we can decompose it
into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics

STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
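
Joining the two 4NF tables back on STU_ID shows what the multi-valued dependency STU_ID →→ COURSE means in practice: each course of a student is paired with each of that student's hobbies. The sketch below does the join with Python sets. Note that the STUDENT table as printed lists only two of the four combinations for student 21; a relation that actually satisfies the MVD must contain all four, and storing that cross product is the redundancy 4NF removes.

```python
student_course = {
    (21, "Computer"), (21, "Math"), (34, "Chemistry"), (74, "Biology"), (59, "Physics"),
}
student_hobby = {
    (21, "Dancing"), (21, "Singing"), (34, "Dancing"), (74, "Cricket"), (59, "Hockey"),
}

# Natural join on STU_ID.
joined = {
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
}

for row in sorted(r for r in joined if r[0] == 21):
    print(*row)
# 21 Computer Dancing
# 21 Computer Singing
# 21 Math Dancing
# 21 Math Singing
print(len(joined))  # 7
```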

Fifth normal form (5NF)


o A relation is in 5NF if it is in 4NF and does not contain
any join dependency (other than those implied by its
candidate keys), and joining should be lossless.
o 5NF is satisfied when all the tables are broken into as
many tables as possible, without losing information, in
order to avoid redundancy.
o 5NF is also known as Project-join normal form
(PJ/NF).
Example
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1

In the above table, John takes both Computer and Math
classes for Semester 1, but he doesn't take the Math class
for Semester 2. In this case, the combination of all three
fields is required to identify a valid row.
Suppose we add a new semester, Semester 3, but do not
yet know the subject or who will be taking that subject, so
we would leave LECTURER and SUBJECT as NULL. But all
three columns together act as the primary key, so we
can't leave the other two columns blank.
So to bring the above table into 5NF, we can decompose
it into three relations P1, P2 and P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math

P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen

P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
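
The claim that P1, P2 and P3 together are lossless can be checked by joining them. The sketch below projects the original table onto the three schemas and joins them with Python sets: P1 ⋈ P2 alone produces spurious rows (for example, John teaching Math in Semester 2), and only the further join with P3 on (SEMESTER, LECTURER) brings back exactly the original five rows. This is the join dependency that 5NF is concerned with.

```python
# Original relation: (SUBJECT, LECTURER, SEMESTER)
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# Projections onto the three decomposed schemas (sets drop duplicate rows).
p1 = {(sem, subj) for subj, _, sem in original}    # P1(SEMESTER, SUBJECT)
p2 = {(subj, lect) for subj, lect, _ in original}  # P2(SUBJECT, LECTURER)
p3 = {(sem, lect) for _, lect, sem in original}    # P3(SEMESTER, LECTURER)

# P1 ⋈ P2 on SUBJECT, producing (SUBJECT, LECTURER, SEMESTER) rows.
p12 = {(subj, lect, sem) for sem, subj in p1 for subj2, lect in p2 if subj == subj2}

# (P1 ⋈ P2) ⋈ P3 on (SEMESTER, LECTURER).
p123 = {(subj, lect, sem) for subj, lect, sem in p12 if (sem, lect) in p3}

print(sorted(p12 - original))  # spurious rows from the two-way join
# [('Math', 'Akash', 'Semester 1'), ('Math', 'John', 'Semester 2')]
print(p123 == original)        # True
```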
SQL
SQL stands for Structured Query Language. SQL is used to
query and manipulate data in relational database
management systems such as Oracle, MySQL, and SQLite.

Components of SQL
1. Keywords:
Keywords are words with a special meaning in SQL; they
may be reserved or non-reserved. Examples of reserved
keywords are INTO, UPDATE, SELECT, DELETE, DROP,
DESC, and ASC.
2. Identifiers:
Identifiers are the names of database objects, such as
function names, schema names, and table names.
3. Clauses:
Clauses are the components of queries and SQL
statements, such as WHERE, GROUP BY, HAVING, and
ORDER BY.
4. Expressions:
Expressions produce either scalar values or tables
consisting of columns and rows of data.
5. Boolean Conditions:
Conditions are expressions that evaluate to the Boolean
value TRUE or FALSE (or UNKNOWN when NULLs are
involved). They limit the effect of statements or queries.
6. Queries:
Queries retrieve data based on specific criteria. Queries
are statements that start with the SELECT clause, because
they retrieve data from the underlying database.
7. Statements:
SQL statements may persistently affect schema and data,
or control transactions, program flow, connections,
sessions, or diagnostics. Examples are the INSERT,
UPDATE, DROP, and DELETE statements, since they
modify the database structure or data.
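
The components listed above can be seen together in a single query. The sketch below runs one in SQLite through Python's sqlite3 module; the table contents are hypothetical, and the comments label the clauses, the Boolean conditions and an expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Statement (DDL); Employee_details, Emp_ID, City and Salary are identifiers.
conn.execute("CREATE TABLE Employee_details (Emp_ID INTEGER, City TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee_details VALUES (?, ?, ?)",
    [(1, "Pune", 30000), (2, "Pune", 50000), (3, "Delhi", 25000),
     (4, "Delhi", 15000), (5, "Delhi", 45000), (6, "Agra", 60000)],  # hypothetical rows
)

query = """
SELECT City, AVG(Salary) AS avg_salary   -- query; AVG(Salary) is an expression
FROM Employee_details
WHERE Salary > 20000                     -- WHERE clause with a Boolean condition
GROUP BY City                            -- GROUP BY clause
HAVING COUNT(*) >= 2                     -- HAVING: condition on each group
ORDER BY avg_salary DESC                 -- ORDER BY clause; DESC is a keyword
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Pune', 40000.0), ('Delhi', 35000.0)]
```

WHERE filters individual rows before grouping, while HAVING filters whole groups afterwards, which is why Agra (one qualifying row) is dropped.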

Basic use of SQL:


1. It modifies database table and index structures.
2. It adds, updates, and deletes rows of data.
3. It retrieves subsets of information from the relational
database management system. This information can
be used for analytical applications, transaction
processing, and other applications that need to
communicate with a relational database.

SQL Data Types


Data types are used to represent the nature of the data
that can be stored in a database table. For example, if we
want to store string data in a particular column of a
table, we have to declare a string data type for that
column.
Data types are mainly classified into three categories in
every database:
o String Data types

o Numeric Data types

o Date and time Data types

MySQL String Data Types

CHAR(Size)  It is used to specify a fixed-length string that can contain numbers, letters, and special characters. Its size can be 0 to 255 characters. Default is 1.
VARCHAR(Size)  It is used to specify a variable-length string that can contain numbers, letters, and special characters. Its size can be from 0 to 65535 characters.
BINARY(Size)  It is equal to CHAR() but stores binary byte strings. Its size parameter specifies the column length in bytes. Default is 1.
VARBINARY(Size)  It is equal to VARCHAR() but stores binary byte strings. Its size parameter specifies the maximum column length in bytes.
TEXT(Size)  It holds a string with a maximum length of 65,535 characters.
TINYTEXT  It holds a string with a maximum length of 255 characters.
MEDIUMTEXT  It holds a string with a maximum length of 16,777,215 characters.
LONGTEXT  It holds a string with a maximum length of 4,294,967,295 characters.
ENUM(val1, val2, val3, ...)  It is used for a string object that can have only one value, chosen from a list of possible values. An ENUM list can contain up to 65535 values. If you insert a value that is not in the list, a blank value is inserted (in strict SQL mode, an error is raised instead).
SET(val1, val2, val3, ...)  It is used to specify a string that can have 0 or more values, chosen from a list of possible values. You can list up to 64 values in a SET list.
MySQL Numeric Data Types

BIT(Size)  It is used for a bit-value type. The number of bits per value is specified in size. Its size can be 1 to 64. The default value is 1.
INT(size)  It is used for integer values. Its signed range is from -2147483648 to 2147483647, and its unsigned range is from 0 to 4294967295. The size parameter specifies the maximum display width, which is 255.
INTEGER(size)  It is equal to INT(size).
FLOAT(size, d)  It is used to specify a floating-point number. Its size parameter specifies the total number of digits. The number of digits after the decimal point is specified by the d parameter.
FLOAT(p)  It is used to specify a floating-point number. MySQL uses the p parameter to determine whether to use FLOAT or DOUBLE. If p is from 0 to 24, the data type becomes FLOAT(). If p is from 25 to 53, the data type becomes DOUBLE().
DOUBLE(size, d)  It is a normal-size floating-point number. Its size parameter specifies the total number of digits. The number of digits after the decimal point is specified by the d parameter.
DECIMAL(size, d)  It is used to specify a fixed-point number. Its size parameter specifies the total number of digits. The number of digits after the decimal point is specified by the d parameter. The maximum value for size is 65, and the default value is 10. The maximum value for d is 30, and the default value is 0.
DEC(size, d)  It is equal to DECIMAL(size, d).
BOOL  It is used to specify the Boolean values true and false. Zero is considered false, and nonzero values are considered true.

MySQL Date and Time Data Types

DATE  It is used to specify a date in the format YYYY-MM-DD. Its supported range is from '1000-01-01' to '9999-12-31'.
DATETIME(fsp)  It is used to specify a date and time combination. Its format is YYYY-MM-DD hh:mm:ss. Its supported range is from '1000-01-01 00:00:00' to '9999-12-31 23:59:59'.
TIMESTAMP(fsp)  It is used to specify a timestamp. Its value is stored as the number of seconds since the Unix epoch ('1970-01-01 00:00:00' UTC). Its format is YYYY-MM-DD hh:mm:ss. Its supported range is from '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC.
TIME(fsp)  It is used to specify a time. Its format is hh:mm:ss. Its supported range is from '-838:59:59' to '838:59:59'.
YEAR  It is used to specify a year in four-digit format. Values allowed in four-digit format are from 1901 to 2155, and 0000.

Basic Queries of SQL


1. INSERT INTO Statement
This SQL statement inserts data or records into an existing table of the SQL database.
It can insert a single record or multiple records in a single query statement.

Syntax to insert a single record:

INSERT INTO table_name (column_name1, column_name2, ..., column_nameN)
VALUES (value_1, value_2, ..., value_N);

Example of inserting a single record:

INSERT INTO Employee_details (Emp_ID, First_name, Last_name, Salary, City)
VALUES (101, 'Akhil', 'Sharma', 40000, 'Bangalore');

This example inserts 101 in the first column, Akhil in the second column, Sharma in the
third column, 40000 in the fourth column, and Bangalore in the last column of the
table Employee_details. String values such as 'Akhil' must be enclosed in single quotes.
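
The INSERT example can be run as-is in SQLite through Python's sqlite3 module. The sketch below assumes a simple Employee_details schema (the column types are an assumption, since the text does not define the table) and also shows the multi-row form mentioned above; the rows for Sumit and Neha are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee_details (
        Emp_ID     INTEGER PRIMARY KEY,
        First_name TEXT,
        Last_name  TEXT,
        Salary     INTEGER,
        City       TEXT
    )
""")

# The single-record example from the text (string values in single quotes).
conn.execute("""
    INSERT INTO Employee_details (Emp_ID, First_name, Last_name, Salary, City)
    VALUES (101, 'Akhil', 'Sharma', 40000, 'Bangalore')
""")

# Several records in one statement: one parenthesised tuple per row.
conn.execute("""
    INSERT INTO Employee_details (Emp_ID, First_name, Last_name, Salary, City)
    VALUES (102, 'Sumit', 'Verma', 35000, 'Pune'),
           (103, 'Neha', 'Rao', 52000, 'Delhi')
""")
conn.commit()

for row in conn.execute("SELECT * FROM Employee_details ORDER BY Emp_ID"):
    print(row)
# (101, 'Akhil', 'Sharma', 40000, 'Bangalore')
# (102, 'Sumit', 'Verma', 35000, 'Pune')
# (103, 'Neha', 'Rao', 52000, 'Delhi')
```

In application code, values are better passed through placeholders, for example conn.execute("INSERT INTO Employee_details VALUES (?, ?, ?, ?, ?)", values), than pasted into the SQL text; this avoids quoting mistakes and SQL injection.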

2. UPDATE Statement
This SQL statement changes or modifies the data stored in the SQL database.

Syntax of UPDATE Statement:

UPDATE table_name
SET column_name1 = new_value_1, column_name2 = new_value_2, ..., column_nameN = new_value_N
[WHERE condition];

Example of UPDATE Statement:

UPDATE Employee_details
SET Salary = 100000
WHERE Emp_ID = 10;

This example changes the Salary of the employees in the Employee_details table
whose Emp_ID is 10. If the WHERE clause is omitted, every row of the table is updated.

3. DELETE Statement
This SQL statement deletes stored data from the SQL database.
Syntax of DELETE Statement:

DELETE FROM table_name
[WHERE condition];

Example of DELETE Statement:

DELETE FROM Employee_details
WHERE First_Name = 'Sumit';

This example deletes the records of the employees in the Employee_details table
whose First_Name is Sumit. If the WHERE clause is omitted, all rows are deleted.
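
The UPDATE and DELETE examples can be tried in SQLite through Python's sqlite3 module. In the sketch below the table schema and its three rows are hypothetical; cursor.rowcount reports how many rows each statement's WHERE clause matched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_details (Emp_ID INTEGER PRIMARY KEY, First_name TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee_details VALUES (?, ?, ?)",
    [(10, "Ravi", 50000), (11, "Sumit", 42000), (12, "Sumit", 39000)],  # hypothetical rows
)

# UPDATE only the row(s) matched by the WHERE clause.
cur = conn.execute("UPDATE Employee_details SET Salary = 100000 WHERE Emp_ID = 10")
print(cur.rowcount)  # 1 row updated

# DELETE every employee named Sumit (column names are case-insensitive in SQLite).
cur = conn.execute("DELETE FROM Employee_details WHERE First_Name = 'Sumit'")
print(cur.rowcount)  # 2 rows deleted
conn.commit()

print(conn.execute("SELECT * FROM Employee_details").fetchall())  # [(10, 'Ravi', 100000)]
```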


4. View in SQL
A view is a SQL query stored in the database with a name linked to it. It can present all
rows of a table or only selected rows. A view can be created from a single table or from
multiple tables. Users create a view so that the data stored in specific tables can be
presented as a virtual table. A view also enables the administrator to restrict access to
the data, so that a user can view or edit only the particular elements of a table they need,
without changing the rest.

Creating Views
If the user wants to create a view in the database, the user can do so with the
CREATE VIEW statement. A view can be created from a single table or multiple tables.
Views are mainly created by the database administrator.

Syntax to Implement VIEW

CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

We have used a single table in the above syntax, but the user can include multiple tables
in the SELECT statement, using the same syntax as in any other SQL SELECT query.
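
The CREATE VIEW syntax can be tried in SQLite through Python's sqlite3 module. In the sketch below the view name Bangalore_staff and the table rows are hypothetical. The view exposes only two columns and only Bangalore rows, which is how a view restricts access; and because a view stores no rows of its own, a row added to the base table later shows up in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_details "
    "(Emp_ID INTEGER PRIMARY KEY, First_name TEXT, Salary INTEGER, City TEXT)"
)
conn.executemany(
    "INSERT INTO Employee_details VALUES (?, ?, ?, ?)",
    [(101, "Akhil", 40000, "Bangalore"), (102, "Sumit", 35000, "Pune"),
     (103, "Neha", 52000, "Bangalore")],  # hypothetical rows
)

# The view shows only two columns and only the Bangalore rows; Salary stays hidden.
conn.execute("""
    CREATE VIEW Bangalore_staff AS
    SELECT Emp_ID, First_name
    FROM Employee_details
    WHERE City = 'Bangalore'
""")
print(conn.execute("SELECT * FROM Bangalore_staff ORDER BY Emp_ID").fetchall())
# [(101, 'Akhil'), (103, 'Neha')]

# A view stores no rows of its own: new rows in the base table appear in it.
conn.execute("INSERT INTO Employee_details VALUES (104, 'Ravi', 30000, 'Bangalore')")
print(conn.execute("SELECT COUNT(*) FROM Bangalore_staff").fetchone()[0])  # 3
```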

Query Processing in DBMS


Query processing is the activity performed in extracting
data from the database. Query processing takes several
steps to fetch the data from the database. The steps
involved are:
1. Parsing and translation
2. Optimization
3. Evaluation
1. Parsing and Translation
SQL (Structured Query Language) is well suited for
humans, but it is not suitable as the system's internal
representation of a query. Relational algebra is well
suited for the internal representation of a query. The
translation step in query processing works like a parser.
When a user executes a query, the parser checks the
syntax of the query and verifies that the relations and
attributes it names exist in the database, in order to
generate the internal form of the query. The parser
creates a tree of the query, known as a 'parse tree',
which is then translated into relational algebra. During
this step, every use of a view in the query is replaced by
the view's definition.
The working of query processing is shown in the diagram below:

Suppose a user wants to fetch the names of employees
whose salary is greater than 10000. For this, the
following SQL query is executed:
select emp_name from Employee where salary > 10000;
To make the system understand the user query, it needs
to be translated into relational algebra. This query can be
written in relational algebra in either of the following
equivalent forms:
o $\pi_{emp\_name}(\sigma_{salary>10000}(\pi_{emp\_name,\,salary}(Employee)))$
o $\pi_{emp\_name}(\sigma_{salary>10000}(Employee))$
After translating the given query, the system can execute
each relational algebra operation by using different
algorithms. In this way, query processing begins its
work.
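
The sketch below evaluates the salary query in two equivalent relational algebra forms, π_emp_name(σ_salary>10000(π_emp_name,salary(Employee))) and π_emp_name(σ_salary>10000(Employee)), and checks that they agree. The helpers sigma and pi are hypothetical stand-ins for σ and π, and the Employee rows are made up; note that an employee earning exactly 10000 is excluded, because the condition is salary > 10000.

```python
def sigma(pred, rel):
    """σ: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def pi(cols, rel):
    """π: keep only cols, removing duplicate tuples as relational projection does."""
    out = []
    for t in rel:
        row = {c: t[c] for c in cols}
        if row not in out:
            out.append(row)
    return out

# Hypothetical Employee relation.
employee = [
    {"emp_name": "Asha", "salary": 12000},
    {"emp_name": "Ravi", "salary": 9000},
    {"emp_name": "Meena", "salary": 10000},
    {"emp_name": "John", "salary": 15000},
]

def high_salary(t):
    return t["salary"] > 10000

# π_emp_name(σ_salary>10000(π_emp_name,salary(Employee)))
plan_a = pi(["emp_name"], sigma(high_salary, pi(["emp_name", "salary"], employee)))
# π_emp_name(σ_salary>10000(Employee))
plan_b = pi(["emp_name"], sigma(high_salary, employee))

print(plan_a == plan_b)  # True
print(plan_b)            # [{'emp_name': 'Asha'}, {'emp_name': 'John'}]
```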
2. Optimization
o The cost of query evaluation can vary for different
types of queries. Because the system is responsible
for constructing the evaluation plan, the user does
not need to write the query efficiently.
o Usually, a database system generates an efficient
query evaluation plan that minimizes its cost. This
task, performed by the database system, is known
as query optimization.
o To optimize a query, the query optimizer needs an
estimated cost of each operation, because the
overall cost depends on the memory allocated to the
various operations, their execution costs, and so on.
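
To make cost-based choice concrete, the sketch below uses a deliberately simple, hypothetical cost model: each operation costs the number of tuples it reads plus the number it writes, a selection keeps a given percentage of its input, and a projection keeps every tuple. Real optimizers use far richer statistics (disk blocks, indexes, memory), so the numbers only illustrate why, for the salary query in the parsing example, a plan that applies the selection first is preferred.

```python
def plan_cost(steps, n_rows, select_percent):
    """Estimated cost of a plan: sum over operations of (tuples read + tuples written)."""
    total, rows = 0, n_rows
    for op in steps:
        if op == "select":
            out = rows * select_percent // 100  # estimated selectivity
        elif op == "project":
            out = rows                          # no duplicate elimination assumed
        else:
            raise ValueError(f"unknown operation {op!r}")
        total += rows + out
        rows = out
    return total

plans = {
    "project, select, project": ["project", "select", "project"],
    "select, project": ["select", "project"],
}
# Hypothetical statistics: 1000 Employee tuples, 10% have salary > 10000.
costs = {name: plan_cost(steps, 1000, 10) for name, steps in plans.items()}
best = min(costs, key=costs.get)

print(costs)  # {'project, select, project': 3300, 'select, project': 1300}
print(best)   # select, project
```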

3. Evaluation
In addition to the relational algebra translation, the
translated relational algebra expression must be
annotated with the instructions used for specifying and
evaluating each operation. Thus, after translating the user
query, the system executes a query evaluation plan.
Query Evaluation Plan
o In order to fully evaluate a query, the system needs
to construct a query evaluation plan.
o The annotations in the evaluation plan may specify
the algorithm to be used for a specific operation, or
the particular index to use.
o Relational algebra operations annotated in this way
are known as evaluation primitives. Evaluation
primitives carry the instructions needed for
evaluating an operation.
o Thus, a query evaluation plan defines a sequence of
primitive operations used for evaluating a query. The
query evaluation plan is also known as the query
execution plan.
o A query execution engine is responsible for
generating the output of the given query. It takes
the query execution plan, executes it, and finally
produces the output for the user query.
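
An evaluation plan can be pictured as a list of annotated primitives that an execution engine runs. The sketch below is a hypothetical, iterator-style (pipelined) engine in Python: each primitive names the algorithm to use (here only a linear scan, a filter and a projection), and rows are pulled through the chain one at a time. The primitive names and the plan format are invented for illustration.

```python
def linear_scan(table):
    """Primitive: read every tuple of a stored table."""
    for row in table:
        yield row

def filter_rows(pred, child):
    """Primitive: pass on only the tuples that satisfy pred (σ)."""
    for row in child:
        if pred(row):
            yield row

def project_rows(cols, child):
    """Primitive: keep only cols (π); duplicates are not removed here."""
    for row in child:
        yield tuple(row[c] for c in cols)

def execute(plan, tables):
    """Toy execution engine: build the operator chain bottom-up, then pull results."""
    it = None
    for op, arg in plan:
        if op == "linear_scan":
            it = linear_scan(tables[arg])
        elif op == "filter":
            it = filter_rows(arg, it)
        elif op == "project":
            it = project_rows(arg, it)
        else:
            raise ValueError(f"unknown primitive {op!r}")
    return list(it)

tables = {"Employee": [  # hypothetical rows
    {"emp_name": "Asha", "salary": 12000},
    {"emp_name": "Ravi", "salary": 9000},
    {"emp_name": "John", "salary": 15000},
]}
plan = [
    ("linear_scan", "Employee"),
    ("filter", lambda r: r["salary"] > 10000),
    ("project", ["emp_name"]),
]
print(execute(plan, tables))  # [('Asha',), ('John',)]
```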

Finally, after selecting an evaluation plan, the system
evaluates the query and produces its output.

Concurrency Control
Concurrency Control is the management procedure that
is required for controlling concurrent execution of the
operations that take place on a database.
But before learning about concurrency control, we
should understand concurrent execution.
Concurrent Execution in DBMS
o In a multi-user system, multiple users can access and
use the same database at the same time; this is known
as concurrent execution of the database. It means
that operations from different users run on the same
database simultaneously.
o While working on database transactions, multiple
users can perform different operations, and in that
case the database is executed concurrently.
o Such simultaneous execution must be carried out so
that no operation interferes with the other executing
operations, thus maintaining the consistency of the
database. However, concurrent execution of
transaction operations gives rise to several
challenging problems that need to be solved.

Problems with Concurrent Execution
In a database transaction, the two main operations
are READ and WRITE. These two operations must be
managed during concurrent execution of transactions,
because if they are interleaved in an uncontrolled
manner, the data may become inconsistent. The
following problems can occur with the concurrent
execution of operations:

Problem 1: Lost Update Problem (W-W Conflict)
The problem occurs when two different database
transactions perform read/write operations on the
same database items in an interleaved manner (i.e.,
concurrent execution), so that the write of one
transaction overwrites the write of the other. This
makes the values of the items incorrect and hence
makes the database inconsistent.
For example:
Consider the diagram below, where two transactions,
TX and TY, are performed on the same account A, and
the balance of account A is $300.
o At time t1, transaction TX reads the value of account
A, i.e., $300 (read only).
o At time t2, transaction TX deducts $50 from account
A, which becomes $250 (deducted locally, but not yet
written).
o Meanwhile, at time t3, transaction TY reads the value
of account A, which is still $300 because TX has not
written its update yet.
o At time t4, transaction TY adds $100 to account A,
which becomes $400 (added locally, but not yet
written).
o At time t6, transaction TX writes the value of account
A, which is updated to $250, as TY has not written
its value yet.
o Similarly, at time t7, transaction TY writes the value
of account A, i.e., the $400 it computed at time t4.
This means the value written by TX is lost, i.e., the
$250 update is lost.
Hence the data becomes incorrect, and the database
becomes inconsistent.
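The interleaving above can be replayed with a small Python simulation. Each transaction keeps a private copy of A (standing in for its local buffer), and run_schedule is a hypothetical helper that applies the steps in the given order. Running the same steps serially shows the value the interleaved schedule should have produced.

```python
def run_schedule(schedule, initial_balance):
    """Replay (transaction, operation, amount) steps against account A."""
    db = {"A": initial_balance}  # the shared database
    local = {}                   # each transaction's private copy of A
    for txn, op, amount in schedule:
        if op == "read":
            local[txn] = db["A"]
        elif op == "update":
            local[txn] += amount  # changes the private copy only
        elif op == "write":
            db["A"] = local[txn]  # overwrites the shared value
    return db["A"]

# Interleaved schedule from the example (amount is ignored for read/write).
interleaved = [
    ("TX", "read", 0),      # t1
    ("TX", "update", -50),  # t2
    ("TY", "read", 0),      # t3
    ("TY", "update", 100),  # t4
    ("TX", "write", 0),     # t6
    ("TY", "write", 0),     # t7
]

# The same steps run serially: all of TX, then all of TY.
serial = ([step for step in interleaved if step[0] == "TX"]
          + [step for step in interleaved if step[0] == "TY"])

print(run_schedule(interleaved, 300))  # 400: TX's -$50 is lost
print(run_schedule(serial, 300))       # 350: both updates applied
```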

Problem 2: Dirty Read Problem (W-R Conflict)
The dirty read problem occurs when one transaction
updates an item of the database and then fails, but
before the change is rolled back, the updated item is
read by another transaction. This creates a Write-Read
(W-R) conflict between the two transactions.
For example:
Consider two transactions, TX and TY, in the diagram
below, performing read/write operations on account
A, where the available balance in account A is $300:

o At time t1, transaction TX reads the value of account
A, i.e., $300.
o At time t2, transaction TX adds $50 to account A, which
becomes $350.
o At time t3, transaction TX writes the updated value to
account A, i.e., $350.
o Then, at time t4, transaction TY reads account A, which
is read as $350.
o Then, at time t5, transaction TX rolls back due to a
server problem, and the value changes back to $300
(as initially).
o However, transaction TY has already read $350 and
treats it as committed, although it never was. This is
a dirty read, hence the name Dirty Read Problem.
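A minimal Python sketch of the same timeline, assuming (for illustration only) that TX keeps a before-image of A so that its rollback can restore the original value; dirty_read_demo is a hypothetical name.

```python
def dirty_read_demo(initial_balance):
    """Replay the dirty-read timeline; return (value TY saw, final value of A)."""
    db = {"A": initial_balance}
    before_image = db["A"]   # saved so that TX can be rolled back
    tx_local = db["A"]       # t1: TX reads A
    tx_local += 50           # t2: TX adds $50
    db["A"] = tx_local       # t3: TX writes A (not committed)
    ty_seen = db["A"]        # t4: TY reads the uncommitted value
    db["A"] = before_image   # t5: TX rolls back, restoring A
    return ty_seen, db["A"]

print(dirty_read_demo(300))  # (350, 300): TY used a value that was never committed
```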
Thus, to maintain consistency in the database and avoid
such problems in concurrent execution, management is
needed, and that is where the concept of Concurrency
Control comes into play.

Concurrency Control
Concurrency Control is required for controlling and
managing the concurrent execution of database
operations and thus avoiding inconsistencies in the
database. To maintain consistency under concurrent
execution, we use concurrency control protocols.

Concurrency Control Protocols
Concurrency control protocols ensure the atomicity,
consistency, isolation, durability, and serializability of
the concurrent execution of database transactions.
These protocols are categorized as:
o Lock Based Concurrency Control Protocol
o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol

1. Lock-Based Protocol
In this type of protocol, no transaction can read or
write a data item until it acquires an appropriate lock
on it. There are two types of lock:

a) Shared lock:
• It is also known as a Read-only lock. Under a shared
lock, the data item can only be read by the transaction.
• It can be shared among transactions because a
transaction holding a shared lock cannot update the
data item.
b) Exclusive lock:
• Under an exclusive lock, the data item can be both
read and written by the transaction.
• This lock is exclusive: while one transaction holds it,
no other transaction can lock the item, so multiple
transactions cannot modify the same data
simultaneously.
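The compatibility rules for the two lock types can be sketched in Python. LockTable is a hypothetical, single-threaded lock manager. It grants a request only if the requested mode is compatible with every lock that other transactions hold on the item. Where a real lock manager would queue a refused request, this sketch simply reports the refusal.

```python
from collections import defaultdict

# Lock compatibility matrix: (mode already held, mode requested) -> grantable?
COMPATIBLE = {
    ("S", "S"): True,   # many readers may share an item
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,  # only one transaction may hold an exclusive lock
}

class LockTable:
    def __init__(self):
        self.holders = defaultdict(dict)  # item -> {transaction: mode}

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        for other, held in self.holders[item].items():
            if other != txn and not COMPATIBLE[(held, mode)]:
                return False
        # Grant (or upgrade S -> X); never downgrade an X lock to S.
        if self.holders[item].get(txn) != "X":
            self.holders[item][txn] = mode
        return True

    def release(self, txn, item):
        self.holders[item].pop(txn, None)

locks = LockTable()
print(locks.request("T1", "A", "S"))  # True: first shared lock
print(locks.request("T2", "A", "S"))  # True: shared locks are compatible
print(locks.request("T3", "A", "X"))  # False: readers still hold A
```

Once T1 and T2 release their shared locks, T3's exclusive request would be granted.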

2. Timestamp Ordering Protocol
o The Timestamp Ordering Protocol orders transactions
based on their timestamps. Transactions are ordered in
ascending order of their creation time.
o An older transaction has higher priority, which is why
it executes first. To determine the timestamp of a
transaction, this protocol uses the system time or a
logical counter.
o A lock-based protocol manages the order between
conflicting pairs of transactions at execution time,
whereas timestamp-based protocols start working as
soon as a transaction is created.
o Let's assume there are two transactions, T1 and T2.
Suppose transaction T1 entered the system at time 007
and transaction T2 entered the system at time 009. T1
has the higher priority, so it executes first, as it
entered the system first.
o The timestamp ordering protocol also maintains the
timestamps of the last 'read' and 'write' operations
on each data item.
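These read and write timestamps are used to check each operation. Below is a minimal Python sketch of the standard basic timestamp-ordering rules, assuming R_TS(X) and W_TS(X) hold the largest timestamps of the transactions that have read and written item X. The class name is hypothetical, and refinements such as Thomas's write rule are omitted.

```python
class TimestampOrdering:
    """Basic timestamp-ordering checks (no Thomas write rule)."""

    def __init__(self):
        self.read_ts = {}   # item -> largest timestamp that has read it (R_TS)
        self.write_ts = {}  # item -> largest timestamp that has written it (W_TS)

    def read(self, ts, item):
        # A younger transaction has already written the item: reject the read.
        if ts < self.write_ts.get(item, 0):
            return "rollback"
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return "ok"

    def write(self, ts, item):
        # A younger transaction has already read or written the item.
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return "rollback"
        self.write_ts[item] = ts
        return "ok"

to = TimestampOrdering()
print(to.read(9, "A"))   # ok: T2 (timestamp 009) reads A
print(to.write(7, "A"))  # rollback: younger T2 has already read A
```

A rolled-back transaction is normally restarted with a new, larger timestamp. Because operations are either accepted immediately or rejected, no transaction ever waits for another.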

Advantages of TO protocol
o This protocol ensures conflict serializability, since
every edge in the precedence graph points from an
older transaction to a younger one, so the graph
cannot contain a cycle.
o This protocol ensures freedom from deadlock, since
no transaction ever waits.
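Conflict serializability can be checked with a precedence graph. An edge Ti → Tj is added when an operation of Ti comes before, and conflicts with, an operation of Tj (same data item, different transactions, at least one write), and the schedule is conflict serializable exactly when the graph has no cycle. The Python sketch below uses hypothetical helper names and a simple (transaction, 'R'/'W', item) encoding of a schedule.

```python
from itertools import combinations

def precedence_graph(schedule):
    """schedule: list of (transaction, 'R' or 'W', item) in execution order."""
    edges = set()
    # combinations() keeps list order, so the first step always ran earlier.
    for (ti, op_i, item_i), (tj, op_j, item_j) in combinations(schedule, 2):
        if ti != tj and item_i == item_j and "W" in (op_i, op_j):
            edges.add((ti, tj))  # Ti's conflicting operation came first
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {node: "new" for node in graph}  # new -> active -> done

    def visit(node):
        state[node] = "active"
        for nxt in graph[node]:
            if state[nxt] == "active":
                return True  # back edge: cycle found
            if state[nxt] == "new" and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(state[n] == "new" and visit(n) for n in graph)

# Lost-update interleaving: conflicts in both directions.
lost_update = [("TX", "R", "A"), ("TY", "R", "A"), ("TX", "W", "A"), ("TY", "W", "A")]
# Older T1 (timestamp 007) runs entirely before younger T2 (timestamp 009).
ordered = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]

print(has_cycle(precedence_graph(lost_update)))  # True: not conflict serializable
print(precedence_graph(ordered))                 # {('T1', 'T2')}
```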

3. Validation Based Protocol
The validation-based protocol is also known as the
optimistic concurrency control technique. In the
validation-based protocol, a transaction is executed in
the following three phases:
1. Read phase: In this phase, transaction T is executed.
It reads the values of various data items and stores
them in temporary local variables. It performs all
write operations on these temporary variables without
updating the actual database.
2. Validation phase: In this phase, the values of the
temporary variables are validated against the actual
data to check whether they violate serializability.
3. Write phase: If the transaction passes validation,
its temporary results are written to the database or
system; otherwise, the transaction is rolled back.

Each transaction Ti is associated with the following timestamps:
Start(Ti): the time when Ti started its execution.
Validation(Ti): the time when Ti finished its read phase
and started its validation phase.
Finish(Ti): the time when Ti finished its write phase.
o Serializability is determined during the validation
process; it cannot be decided in advance.
o It allows a greater degree of concurrency while
transactions execute, and it works best when
transactions have few conflicts.
o In that case, few transactions need to be rolled back.
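Below is a minimal Python sketch of the commonly used validation test built on these timestamps. For a transaction Tj and every Ti with Validation(Ti) < Validation(Tj), it requires one of two conditions. Either Finish(Ti) < Start(Tj), or Start(Tj) < Finish(Ti) < Validation(Tj) and Ti wrote nothing that Tj read. The Txn class and validate function are hypothetical names, and a transaction still in its write phase is conservatively treated as a conflict.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Txn:
    start: int                    # Start(T)
    validation: int               # Validation(T)
    finish: Optional[int] = None  # Finish(T); None while still running
    read_set: Set[str] = field(default_factory=set)
    write_set: Set[str] = field(default_factory=set)

def validate(tj: Txn, earlier: List[Txn]) -> bool:
    """Check Tj against every Ti with Validation(Ti) < Validation(Tj)."""
    for ti in earlier:
        if ti.finish is None:
            return False  # Ti has not finished writing: treat as a conflict
        if ti.finish < tj.start:
            continue      # Ti finished completely before Tj started
        if tj.start < ti.finish < tj.validation and not (ti.write_set & tj.read_set):
            continue      # overlap, but Tj read nothing that Ti wrote
        return False
    return True

t1 = Txn(start=1, validation=3, finish=4, write_set={"A"})
t2 = Txn(start=2, validation=5, read_set={"B"})
t3 = Txn(start=2, validation=5, read_set={"A"})
print(validate(t2, [t1]))  # True: T1 wrote A, T2 read only B
print(validate(t3, [t1]))  # False: T3 may have read A before T1 wrote it
```

If validate returns False, the transaction is rolled back instead of entering its write phase.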
